1-Oct-98 7:14:07-GMT,3775;000000000001
Return-Path:
Received: from mailrelay1.cc.columbia.edu (mailrelay1.cc.columbia.edu [128.59.35.143])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id DAA20582
    for ; Thu, 1 Oct 1998 03:14:05 -0400 (EDT)
Received: from sagitta.cia.com (sagitta.cybersurf.net [206.186.113.4])
    by mailrelay1.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id DAA11389
    for ; Thu, 1 Oct 1998 03:14:04 -0400 (EDT)
Received: from cybersurf.net (anzu.cybersurf.net [206.186.111.67])
    by sagitta.cia.com (8.8.5/8.8.5) with ESMTP id AAA29349
    for ; Thu, 1 Oct 1998 00:13:47 -0700
Message-ID: <36132BC6.D5494815@cybersurf.net>
Date: Thu, 01 Oct 1998 00:14:14 -0700
From: Geoffrey Waigh
X-Mailer: Mozilla 4.06 [en] (Win95; U)
MIME-Version: 1.0
To: fdc@columbia.edu
Subject: Re: Terminal Graphics Proposal
References: <9810010139.AA14114@unicode.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I was going to send this to offline@unicode.org (the list for offline
unicode discussions), but assumed from your request for private
correspondence you were not aware of the forum.

Frank da Cruz wrote:
>
> D R A F T # 1
>
> ABSTRACT
>
> A selection of terminal graphics characters is proposed for Unicode [24]
> and ISO 10646 [19] to allow Unicode-based terminal emulation software to
> (a) display glyphs that are found on popular types of terminals but
> currently are not available in Unicode, and (b) interoperate with other
> Unicode applications.

I can see clear merit in handling b), but I'm leery of the code space
consumption that a) is having here. In general, my feeling is that if 98%
emulation does the job in an adequate fashion for non-perfectionists,
then that is the way to go.

[On control character display]
I don't think that the existing C0 control code graphics being indicated
as horizontal and a preference for diagonal glyphs warrants disunification.
I think that is a font variation and so long as the control code is
represented with one of its standard names (2 or 3 letters, horizontal or
diagonal), the information is being properly conveyed and the user can
understand what is going on. However, I think it would be useful to
complete the control code set.

[On Hex code display]
That seems kind of wasteful for a debugging mode. Do the terminals that
produce this output have escape sequences for enabling this mode, or is
it strictly a terminal configuration option? (Of course by that measure
the control character codes come under scrutiny...)

[On math symbols]
I cannot comment on these since our customers had 0 interest in the
technical symbols, and so aside from glancing at the code pages and
realizing they wouldn't map to Unicode very well I didn't work with them.

[On Line and Blocks]
Again, I didn't have to deal with the terminals that form the bulk of
these codes and cannot comment.

> 9. UNFINISHED BUSINESS
>
> The selection of characters presented in this draft is far from
> comprehensive. Hundreds of other terminals from the past 30+ years are
> likely to have glyphs or entire character sets covered neither here nor
> in Unicode, and these might or might not be important in some application
> somewhere. Readers are invited, therefore, to propose any needed
> additions, bearing in mind that Unicode code space is not unlimited.

And hopefully the compleatists out there will let sleeping dogs lie.
Which is not to say that some other terminals might be worth supporting,
but I suspect that the cost to the rest of the world in terms of
codepoint space for most of them means that doing the emulation with
alternate glyphs or custom fonts is appropriate.
Geoffrey Waigh
gpw@cybersurf.net

1-Oct-98 11:27:35-GMT,3161;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id HAA23098
    for ; Thu, 1 Oct 1998 07:27:34 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
    by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id EAA59550
    ; Thu, 1 Oct 1998 04:24:53 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
    id AA15098; Thu, 1 Oct 98 04:16:20 -0700
Message-Id: <9810011116.AA15098@unicode.org>
Errors-To: uni-bounce@unicode.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Uml-Sequence: 6029 (1998-10-01 11:15:47 GMT)
From: Kevin Bracey
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 04:15:46 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

In message <9810010143.AA14122@unicode.org> Frank da Cruz wrote:

> Sorry for the length of the following; if you're not interested,
> skip it. The intention is to bring likeminded parties out of the woodwork;
> if you are one, please contact me and we can continue the topic offline.
>
> - Frank
>

Very well put together proposal - I like it. A few comments (on the
mailing list, so keeping it short):

>
> Unicode already has a block of Control Pictures at U+2400 through U+2421,
> but (except for "NL" at U+2424) these go horizontally across the character
> cell, rather than diagonally, thus making them difficult to distinguish
> from normal alphanumeric text. A new, parallel block of C0 control
> pictures is needed in which the abbreviations are displayed diagonally.

That's a glyph variation - the Unicode Standard explicitly states that
you can use whatever preferred glyph you like for these. Indeed, IIRC,
ISO 10646-1 has considerably different suggested glyphs for these
characters.

> E080 SP Space (like U+2420 but arranged diagonally)
> E081 DEL Delete (Rubout) (2-character name: DT)

These two are glyph variants of U+2420 and U+2421.

> E082 LS1 Locking Shift 1 (ISO name for SO)
> E083 LS0 Locking Shift 0 (ISO name for SI)

Maybe these two could be considered glyph variants of U+240E and U+240F?
Probably not, I suppose.

> Hexadecimal byte values, 2 hex digits each. Like display controls, but for
> all 256 8-bit byte values, showing the byte code in hexadecimal, rather
> than the (context-dependent) name. For hex debugging (in terminal
> emulators, line monitors, protocol analyzers, etc). Should be arranged
> diagonally within the character cell as shown in Figure 5.1:

Fair enough - but who are you to specify diagonality? These are just
characters with the semantic meaning "Graphic representation of octet
value xx".

> E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1)

I would suggest U+FFFD for this.

--
Kevin Bracey, Senior Software Engineer
Acorn Computers Ltd                     Tel: +44 (0) 1223 725228
Acorn House, 645 Newmarket Road         Fax: +44 (0) 1223 725328
Cambridge, CB5 8PB, United Kingdom      WWW: http://www.acorn.co.uk/

1-Oct-98 14:43:20-GMT,2442;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA04241
    for ; Thu, 1 Oct 1998 10:43:19 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
    by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id HAA47416
    ; Thu, 1 Oct 1998 07:41:57 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
    id AA15592; Thu, 1 Oct 98 07:35:05 -0700
Message-Id: <9810011435.AA15592@unicode.org>
Errors-To: uni-bounce@unicode.org
X-Uml-Sequence: 6030 (1998-10-01 14:34:45 GMT)
From: Frank da Cruz
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 07:34:43 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

> > Unicode already has a block of Control Pictures at U+2400 through U+2421,
> > but (except for "NL" at U+2424) these go horizontally across the character
> > cell, rather than diagonally, thus making them difficult to distinguish
> > from normal alphanumeric text. A new, parallel block of C0 control
> > pictures is needed in which the abbreviations are displayed diagonally.
>
> That's a glyph variation - the Unicode Standard explicitly states that
> you can use whatever preferred glyph you like for these. Indeed, IIRC,
> ISO 10646-1 has considerably different suggested glyphs for these characters.
>

My concern is that the pictures in the Unicode book go horizontally.
Although I do not claim to be an expert on Unicode fonts, I have never
seen one that implemented this block, so I don't actually know how it
looks. However, I'd say that the horizontal arrangement would make it
extremely difficult for the viewer to discern the cell boundaries, as in:

  NULSOHSTXETXEOTENQACKBELDELNAKSYNETBCANSUBESCCANACKSSASS3SPAEPACSISCI

And thus, at minimum, the table in the book should be altered to show all
control pictures arranged diagonally, and all future control picture
additions should also be arranged that way.

> > E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1)
>
> I would suggest U+FFFD for this.
>

U+FFFD means "this character is not in Unicode" (or in this font), which
is not quite the same meaning as "this character is illegal in this
context" on the VT terminals. Anyway, reverse question mark is a regular
glyph character on the Wyse and Televideo models.
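
[Editor's sketch, not part of the original message. The existing block
under discussion can be exercised from code. The Python below maps C0
control codes, space, and DEL onto the Control Pictures code points named
in the thread (U+2400 through U+2421). The function names control_picture
and show_controls are hypothetical. Whether the abbreviations then appear
horizontally or diagonally depends entirely on the font in use, which is
the point several respondents make.]

```python
def control_picture(byte: int) -> str:
    """Return the Control Pictures character for a C0 control, space, or DEL.

    Any other byte value is returned unchanged as its Latin-1 character;
    C1 controls (0x80-0x9F) are NOT given pictures here.
    """
    if 0x00 <= byte <= 0x1F:
        return chr(0x2400 + byte)   # U+2400..U+241F: pictures for C0 controls
    if byte == 0x20:
        return chr(0x2420)          # U+2420: picture for space
    if byte == 0x7F:
        return chr(0x2421)          # U+2421: picture for DEL
    return chr(byte)


def show_controls(data: bytes) -> str:
    """Render a byte string with its control codes made visible."""
    return "".join(control_picture(b) for b in data)


if __name__ == "__main__":
    # ESC [ 2 J followed by CR LF, as a terminal's "display controls" mode might show it.
    print(show_controls(b"\x1b[2J\r\n"))
```

[A usage note: show_controls(b"A\r\n") yields "A" followed by U+240D and
U+240A, one picture per control code.]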
- Frank

1-Oct-98 16:24:10-GMT,921;000000000011
Return-Path:
Received: from aleve.media.mit.edu (aleve.media.mit.edu [18.85.2.171])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id MAA03674
    for ; Thu, 1 Oct 1998 12:24:05 -0400 (EDT)
Received: from pinotnoir.media.mit.edu (nelson@pinotnoir.media.mit.edu [18.85.16.104])
    by aleve.media.mit.edu (8.8.7/ML970927) with ESMTP id MAA00747;
    Thu, 1 Oct 1998 12:23:52 -0400 (EDT)
Received: (from nelson@localhost)
    by pinotnoir.media.mit.edu (8.8.5/8.8.5) id MAA04913;
    Thu, 1 Oct 1998 12:23:52 -0400
Date: Thu, 1 Oct 1998 12:23:52 -0400
Message-Id: <199810011623.MAA04913@pinotnoir.media.mit.edu>
From: nelson@media.mit.edu (Nelson Minar)
To: Frank da Cruz
Subject: Re: Terminal Graphics Proposal
In-Reply-To: <9810010143.AA14122@unicode.org>
References: <9810010143.AA14122@unicode.org>

Wow, that was an impressive proposal.

1-Oct-98 16:59:50-GMT,2868;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id MAA14961
    for ; Thu, 1 Oct 1998 12:59:49 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
    by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id JAA32536
    ; Thu, 1 Oct 1998 09:58:25 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
    id AA16971; Thu, 1 Oct 98 09:48:27 -0700
Message-Id: <9810011648.AA16971@unicode.org>
Errors-To: uni-bounce@unicode.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Uml-Sequence: 6034 (1998-10-01 16:48:08 GMT)
From: John Cowan
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 09:48:07 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal
Content-Transfer-Encoding: 7bit

Frank da Cruz wrote:

> More useful in a terminal emulator, however, is the ability to display
> the official abbreviation [1,18], or "name", of the control character in a
> single cell. [...]
>
> Some control characters have two-character abbreviations (such as CR, LF,
> HT, FF), while others are three characters (NUL, SOH, DC1, DLE). Some
> terminals compress three-letter abbreviations to the two-character forms
> shown in Table 4.2. All terminals, however, display the abbreviations
> diagonally in the character cell, as shown in Figure 4.1. [...]
>
> Unicode already has a block of Control Pictures at U+2400 through U+2421,
> but (except for "NL" at U+2424) these go horizontally across the character
> cell, rather than diagonally, thus making them difficult to distinguish from
> normal alphanumeric text. A new, parallel block of C0 control pictures is
> needed in which the abbreviations are displayed diagonally.

This reflects a failure to understand the semantics of the Control
Pictures block, specifically the range U+2400 - U+241F, which is
documented on page 6-84 of the Unicode Standard 2.0.

# [F]or the control code graphics U+2400 -> U+241F only the
# semantic is encoded in the Unicode Standard. This allows a particular
# application to use the graphic representation it prefers.
# [...] The [code points U+2400 to U+241F] are not associated with
# specific glyphs, but rather are available to encode any
# desired pictorial representation of the given control code.

The horizontal representations printed on page 7-188, therefore, are not
standardized in any way. Diagonal representations would be entirely
equivalent; the distinction is one of font only.

--
John Cowan      http://www.ccil.org/~cowan      cowan@ccil.org
    You tollerday donsk? N. You tolkatiff scowegian? Nn.
    You spigotty anglease? Nnn. You phonio saxo? Nnnn.
        Clear all so! 'Tis a Jute....
        (Finnegans Wake 16.5)

1-Oct-98 17:15:40-GMT,2779;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA19770
    for ; Thu, 1 Oct 1998 13:15:38 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
    by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA28598
    ; Thu, 1 Oct 1998 10:11:05 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
    id AA17056; Thu, 1 Oct 98 09:53:13 -0700
Message-Id: <9810011653.AA17056@unicode.org>
Errors-To: uni-bounce@unicode.org
X-Uml-Sequence: 6035 (1998-10-01 16:51:54 GMT)
From: Rick McGowan
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 09:51:51 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

Frank da Cruz sent a long proposal. It looks like a pretty thorough
analysis, though I've not made it all the way through. One thing leapt
out that I thought I'd mention...

> Unicode already has a block of Control Pictures at U+2400 through U+2421,
> but (except for "NL" at U+2424) these go horizontally across the character
> cell, rather than diagonally, thus making them difficult to distinguish
> from normal alphanumeric text. A new, parallel block of C0 control
> pictures is needed in which the abbreviations are displayed diagonally

I think rather that the current control pictures are a SUGGESTION of the
possible glyphs for particular functions. The glyphs for them even
changed between Unicode 1.0 and 2.0! So I would have to seriously
question adding a parallel set of pictures. Unless there is some need for
having multiple, parallel representations for THE SAME CODE on the SAME
TERMINAL, I don't see any point to adding several glyphic variations.
Pick your glyphs and use the existing control pix for existing controls.

Of course, there are a *lot* of controls, many control sets, and some
degree of overlap, as Frank's proposal points out rather dramatically.
I would suggest that he take up an attempt at serious unification of
these things, and collect all of the wonderful data he's gathered into a
"white paper" on how to use control pictures for what terminals, etc.
With mapping tables, and a list of the minimum required additions to
support full cross-mappings.

This proposal contains a lot of data. It would be best to do as much
unification work as possible up-front, rather than relying on UTC and/or
WG2 to take it up. The proposal would stand a greater chance of success.
If the committees look at it and say that it needs much work to clarify
what can and cannot be unified, then they're less likely to act quickly.
In my opinion.

And the bibliography is impressive.

Rick

1-Oct-98 17:24:04-GMT,1128;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA21767
    for ; Thu, 1 Oct 1998 13:24:03 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
    by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA60476
    ; Thu, 1 Oct 1998 10:20:31 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
    id AA17189; Thu, 1 Oct 98 10:03:08 -0700
Message-Id: <9810011703.AA17189@unicode.org>
Errors-To: uni-bounce@unicode.org
X-Uml-Sequence: 6036 (1998-10-01 17:00:19 GMT)
From: Rick McGowan
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 10:00:18 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

> I do not claim to be an expert on Unicode fonts, I have never seen one that
> implemented this block, so I don't actually know how it looks.

Frank -- Go out to http://www.indigo.ie/egt/celtscript/ and look for
Everson Mono. There's a PS font that implements these, with completely
different glyphs.
Rick

1-Oct-98 18:29:33-GMT,6717;000000000011
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
    by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA09734
    for ; Thu, 1 Oct 1998 14:29:31 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
    by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA54084
    ; Thu, 1 Oct 1998 11:19:28 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
    id AA17232; Thu, 1 Oct 98 10:06:03 -0700
Message-Id: <9810011706.AA17232@unicode.org>
Errors-To: uni-bounce@unicode.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Uml-Sequence: 6037 (1998-10-01 17:03:51 GMT)
From: Paul Keinanen
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 10:03:50 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

Kevin Bracey said most that I was planning to say about this interesting
proposal, but here are some more observations.

Frank da Cruz wrote:

>Table 4.2: C0 Control Pictures
>
>  Code  Name  2X    Code  Name  2X
>  E000  NUL   NU    E010  DLE   DL
>  E001  SOH   SH    E011  DC1   D1
>  E002  STX   SX    E012  DC2   D2
>  E003  ETX   EX    E013  DC3   D3
>  E004  EOT   ET    E014  DC4   D4
>  E005  ENQ   EQ    E015  NAK   NK
>  E006  ACK   AK    E016  SYN   SY
>  E007  BEL   BL    E017  ETB   EB
>  E008  BS    BS    E018  CAN   CN
>  E009  HT    HT    E019  EM    EM
>  E00A  LF    LF    E01A  SUB   SU
>  E00B  VT    VT    E01B  ESC   EC
>  E00C  FF    FF    E01C  FS    FS
>  E00D  CR    CR    E01D  GS    GS
>  E00E  SO    SO    E01E  RS    RS
>  E00F  SI    SI    E01F  US    US
>
>There is little to gain by defining separate 2- and 3-character glyphs for
>control characters that have 3-character names; therefore it is suggested
>that the full abbreviation (from the Name column) be used, with the
>characters arranged diagonally within each cell (rather than horizontally as
>in the U+2400 block), and that the 2X column be ignored.

As far as I know, the Unicode standard does not specify the writing
direction or actual representation of these characters.
I would think that the two or three character forms are just variations
of the same glyph. To me, it would make perfect sense from a readability
point of view to use e.g. AK (horizontally, diagonally or vertically
spaced) for a very small font and use ACK for larger fonts with more
available pixels.

If all octet values (00 .. FF) are also going to be displayed, there
might be some ambiguity with some of the two letter codes, e.g. FF, D1,
D2, D3, D4, EB and EC, which should be noted in the actual font design.

>C1 Control characters are specified in ISO-6429 and used in the VT220
>family of terminals [5] and the Wyse 370 [26], where they are represented
>in the right half of the "display controls" font as shown in Table 4.3 (DEC
>terminals use the full name, Wyse terminals use the 2X name). As with C0
>controls, the "name" is displayed diagonally within the character cell.
>Unicode presently includes no C1 control pictures.

Looking through various EBCDIC code pages (e.g. IBM278, IBM880) and other
unnumbered sets it appears that these control codes are all also
available in EBCDIC, but of course at different positions (e.g. IND at
0x24). Some references to these sets are "IBM NLS RM Vol2 SE09-8002-01,
March 1990" and "IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987".

Based on this observation, it is strange that the C0 control pictures are
in the Unicode standard, but not the C1 control pictures.

>Table 4.3: C1 Control Pictures
>Note that three of the C1 control pictures are unassigned (the ones marked
>by "(1)", that would be at U+E020, U+E021, and U+E039 if these were
>assigned). These positions should be left vacant in case names are assigned
>to these characters in a future revision of ISO 6429.

In ISO 8859-1 these are listed as

80 PADDING CHARACTER (PAD)
81 HIGH OCTET PRESET (HOP)
99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)

>Table 4.4 shows the names of control characters unique to EBCDIC (that is,
>the ones it does not share with ASCII).

There seem to be different names for the same EBCDIC control characters
and some of these names are equivalent to the ASCII names. Just wondering
what should be done to these control pictures? Some examples below.

> E082 LS1 Locking Shift 1 (ISO name for SO)
> E083 LS0 Locking Shift 0 (ISO name for SI)
> E084 IS4 ISO Name for FS: Information Separator 4
> E085 IS3 ISO Name for GS: Information Separator 3
> E086 IS2 ISO Name for RS: Information Separator 2
> E087 IS1 ISO Name for US: Information Separator 1

>5. HEX BYTES
>
>Hexadecimal byte values, 2 hex digits each. Like display controls, but for
>all 256 8-bit byte values, showing the byte code in hexadecimal, rather than
>the (context-dependent) name. For hex debugging (in terminal emulators,
>line monitors, protocol analyzers, etc). Should be arranged diagonally
>within the character cell as shown in Figure 5.1:

These would be very nice :-). Note the possible ambiguity with some two
character control pictures e.g. FF, EB etc. So special precautions should
be taken when designing the fonts.

>8. MISCELLANEOUS SINGLE-CELL GLYPHS
>Notes:
> (1) The reverse question is essential in VT terminal emulation, where it
> indicates that an invalid code was received, or a parity or other
> error was detected. It also stands for SUB and/or RS in Wyse display
> controls mode, and is the glyph for 0xFF in the Televideo Multinational
> Character Set [23]. And it is also a glyph in the DG Special
> Graphics Character Set [2].

Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no
need to re-invent it.

>9. UNFINISHED BUSINESS
>No attempt was made to account for the many Viewdata, Videotex, Minitel,
>NAPLPS, or other mosaic graphics character sets. These should be tackled,
>if appropriate, by someone who knows something about them.

And not forgetting the tele-text block characters on European TVs.
With the introduction of TV cards for PCs that also contain a teletext
decoder, there is a need to display the text and block graphics on a PC.
As far as I remember, the block graphic format is more or less the same
as Viewdata with 2 columns and 3 rows per character cell, thus requiring
64 glyphs.

All in all a very interesting proposal. By using as much existing
characters from current Unicode standard, I guess there would be a
greater likelihood of getting things officially approved.

Paul

1-Oct-98 19:21:23-GMT,4957;000000000001
Return-Path:
Received: (from fdc@localhost)
    by watsun.cc.columbia.edu (8.8.5/8.8.5) id PAA23982;
    Thu, 1 Oct 1998 15:20:06 -0400 (EDT)
Date: Thu, 1 Oct 98 15:20:04 EDT
From: Frank da Cruz
To: unicode@unicode.org
Subject: Re: Terminal Graphics Proposal
In-Reply-To: Your message of Thu, 1 Oct 1998 10:03:50 -0700 (PDT)
Message-ID:

> As far as I know, the Unicode standard does not specify the writing
> direction or actual representation of [control pictures]. I would think that
> the two or three character forms are just variations of the same glyph.
>

This seems to be the consensus, and the most prominent reaction to the
proposal. Still, if I were a font maker working from the Unicode book,
I'd probably copy the pictures in it, so again, I'd suggest the next
edition show the characters diagonally within the cell, and the
accompanying text (which if I can overlook, so can a font maker :-) point
out the importance of visually preserving the character-cell boundaries
by some means. The diagonal arrangement is used on all terminals I have
seen that support display controls, so this would be the most obvious
method.

> If all octet values (00 .. FF) are also going to be displayed, there might
> be some ambiguity with some of the two letter codes, e.g. FF, D1, D2, D3,
> D4, EB and EC, which should be noted in the actual font design.
>

Good point. One path to disambiguation would be to show hex digits A-F in
lower case. Sounds OK?
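
[Editor's sketch, not part of the original message. The ambiguity raised
above can be checked mechanically. The Python below takes the 2X column of
Table 4.2 as quoted earlier in the thread and intersects it with the 256
two-digit hex labels, first in upper case and then in lower case. The
names TWO_LETTER_NAMES, hex_labels, and collisions are hypothetical. It
compares label strings only; it says nothing about whether a given font
keeps upper and lower case visually distinct in a small cell.]

```python
# 2X column of Table 4.2 (two-letter names for the 32 C0 controls).
TWO_LETTER_NAMES = """
NU SH SX EX ET EQ AK BL BS HT LF VT FF CR SO SI
DL D1 D2 D3 D4 NK SY EB CN EM SU EC FS GS RS US
""".split()


def hex_labels(uppercase: bool) -> set:
    """All 256 two-digit hex byte labels, in the requested letter case."""
    fmt = "{:02X}" if uppercase else "{:02x}"
    return {fmt.format(b) for b in range(256)}


def collisions(uppercase: bool) -> list:
    """Two-letter control names that are also valid hex byte labels."""
    return sorted(set(TWO_LETTER_NAMES) & hex_labels(uppercase))


if __name__ == "__main__":
    print(collisions(uppercase=True))    # ['D1', 'D2', 'D3', 'D4', 'EB', 'EC', 'FF']
    print(collisions(uppercase=False))   # []
```

[The upper-case result matches the seven codes listed in the quoted
paragraph; with lower-case hex digits the two label sets no longer
overlap.]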

> >C1 Control characters are specified in ISO-6429 and used in the VT220
> >family of terminals [5] and the Wyse 370 [26], where they are represented
> >in the right half of the "display controls" font as shown in Table 4.3 (DEC
> >terminals use the full name, Wyse terminals use the 2X name). As with C0
> >controls, the "name" is displayed diagonally within the character cell.
> >Unicode presently includes no C1 control pictures.
>
> Looking through various EBCDIC code pages (e.g. IBM278, IBM880) and other
> unnumbered sets it appears that these control codes are all also available
> in EBCDIC, but of course at different positions (e.g. IND at 0x24). Some
> references to these sets are "IBM NLS RM Vol2 SE09-8002-01, March 1990" and
> "IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987".
>

Thanks for pointing this out -- I'll be sure to unify all duplicates in
the next go-round.

> In ISO 8859-1 these are listed as
>
> 80 PADDING CHARACTER (PAD)
> 81 HIGH OCTET PRESET (HOP)
> 99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
>

I suppose that's a good enough source, though I wonder why they are not
named in ISO 6429!

> >Table 4.4 shows the names of control characters unique to EBCDIC (that
> >is, the ones it does not share with ASCII).
>
> There seem to be different names for the same EBCDIC control characters
> and some of these names are equivalent to the ASCII names. Just wondering
> what should be done to these control pictures? Some examples below.
>

In the spirit of unification, I would venture that if two different
control characters have the same name, only one control picture is
needed.

> >Notes:
> > (1) The reverse question is essential in VT terminal emulation...
>
> Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no
> need to re-invent it.
>

But that one is upside down. The one I'm talking about is upright but
flipped on its vertical axis.
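
[Editor's sketch, not part of the original message. The two existing
candidates suggested in the thread can be identified by their formal
character names using Python's standard unicodedata module. The variable
name latin1_bf is hypothetical. Neither character is named as a
left-to-right mirrored question mark, which is the distinction being
drawn above.]

```python
import unicodedata

# Byte 0xBF decoded as ISO 8859-1 (Latin-1), the character suggested as a substitute.
latin1_bf = bytes([0xBF]).decode("latin-1")
print(f"U+{ord(latin1_bf):04X}", unicodedata.name(latin1_bf))
# prints: U+00BF INVERTED QUESTION MARK

# U+FFFD, the other existing character suggested earlier in the thread.
print("U+FFFD", unicodedata.name("\ufffd"))
# prints: U+FFFD REPLACEMENT CHARACTER
```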

Clearly an important component of this proposal, before it reaches its
final stage, is a collection of pictures of the proposed characters. I'll
do my best to scan in the relevant pages from the many terminal manuals,
for what it's worth -- many of them are crude and unclear to begin with,
some of them are even hand-drawn. Others are wedded to dot matrices of
specific dimensions and, in fact, are shown as large tty graphics. I
wonder how one proceeds from such elusive sources to create a definitive
picture of each character, and then to translate this into the style of a
particular font. Oh well, not my problem :-)

> All in all a very interesting proposal. By using as much existing
> characters from current Unicode standard, I guess there would be a
> greater likelihood of getting things officially approved.
>

And of course, many characters in many of these sets are indeed well
covered by existing Unicode characters and so never appeared in the
proposal in the first place. I considered fully enumerating each
character set and noting which characters already did and did not have
suitable Unicode equivalents, but that would have made the proposal much
too long.

Thanks to you and everyone else for the helpful and supportive comments.
I think the next step will be to run a new draft (updated according to
comments from this list) past the broad constituencies of some of the
terminals it treats, for which there are several well-suited newsgroups.

Thanks again!
- Frank 1-Oct-98 19:58:53-GMT,1419;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA06793 for ; Thu, 1 Oct 1998 15:58:51 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id MAA16728 ; Thu, 1 Oct 1998 12:57:48 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA19780; Thu, 1 Oct 98 12:42:14 -0700 Message-Id: <9810011942.AA19780@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Uml-Sequence: 6043 (1998-10-01 19:39:42 GMT) From: John Cowan Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 1 Oct 1998 12:39:38 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Content-Transfer-Encoding: 7bit Paul Keinanen wrote: > Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no need > to re-invent it. No, that is the inverted (head over heels) question mark. What is being described here is a reversed (left-to-right) question mark. -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! 'Tis a Jute.... 
(Finnegans Wake 16.5) 1-Oct-98 21:32:59-GMT,2116;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA03320 for ; Thu, 1 Oct 1998 17:32:57 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA17196 ; Thu, 1 Oct 1998 14:28:31 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA20910; Thu, 1 Oct 98 13:56:11 -0700 Message-Id: <9810012056.AA20910@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6046 (1998-10-01 20:54:15 GMT) From: Rick McGowan Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 1 Oct 1998 13:54:14 -0700 (PDT) Subject: Re: Terminal Graphics Proposal > Still, if I were a font maker working from the Unicode book, I'd > probably copy the pictures in it, so again, I'd suggest the next edition > show the characters diagonally within the cell, and the accompanying text > (which if I can overlook, so can a font maker :-) Yes, yes, but... People should read, Grasshopper. It is that for which we write. People who do not "R.T.F.M." waste everyone else's time asking questions that are answered in "T.F.M." The Internet is absolutely RAMPANT with that behavior and it always has been. (Well, society itself, for that matter, is full of such behavior, so I shouldn't knock the net...) It would be a poor, poor font designer indeed who, having NOT Read The Formidable Manual, tried to implement a font for the control pictures. I would not pity such a person, only the victims of the resulting "font". > I wonder how one proceeds from such elusive sources to create a definitive > picture of each character, and then to translate this into the style of a > particular font. Oh well, not my problem :-) Actually, Grasshopper... it *is* your problem. Nobody else is going to do this for you. I suggest you gird your loins and heave to. 
This is a chance for you to make a useful and lasting contribution to History. Cheerily, Rick 1-Oct-98 21:53:31-GMT,1581;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA06957 for ; Thu, 1 Oct 1998 17:53:30 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA11468 ; Thu, 1 Oct 1998 14:53:30 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA21562; Thu, 1 Oct 98 14:41:30 -0700 Message-Id: <9810012141.AA21562@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Uml-Sequence: 6047 (1998-10-01 21:40:59 GMT) From: Asmus Freytag Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 1 Oct 1998 14:40:56 -0700 (PDT) Subject: Re: Terminal Graphics Proposal >And thus, at minumum, the table in the book should be altered to show all >control pictures arranged diagonally, and all future control picture additions >should also be arranged that way. We are looking into this for Unicode 3.0. Although the mail discussion makes clear that the distinction between characters and glyphs is widely known, it makes no sense to depart from the established use in the one area the characters are intended for! Since the two glyph forms are equivalent (i.e. there's no question of changing the identity of the characters) such a change is editorial in nature. For what it's worth, ISO 10646 uses the diagonal forms (although incorrectly in a roman type face). 
A./

1-Oct-98 22:06:05-GMT,1299;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id SAA08132 for ; Thu, 1 Oct 1998 18:06:03 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id PAA40338 ; Thu, 1 Oct 1998 15:05:50 -0700
Received: by unicode.org (NX5.67g/NX3.0S) id AA21781; Thu, 1 Oct 98 14:52:06 -0700
Message-Id: <9810012152.AA21781@unicode.org>
Errors-To: uni-bounce@unicode.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Uml-Sequence: 6048 (1998-10-01 21:50:21 GMT)
From: Asmus Freytag
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 14:50:20 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

>> >Notes:
>> > (1) The reverse question is essential in VT terminal emulation...
>>
>> Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no
>> need to re-invent it.
>>
>But that one is upside down. The one I'm talking about is upright but
>flipped on its vertical axis.
>

This important character is already on the list of characters to be added in
one of the coming amendments in ISO 10646.

A./

1-Oct-98 23:12:25-GMT,1652;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id TAA15789 for ; Thu, 1 Oct 1998 19:12:24 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id QAA65550 ; Thu, 1 Oct 1998 16:11:33 -0700
Received: by unicode.org (NX5.67g/NX3.0S) id AA22944; Thu, 1 Oct 98 16:06:19 -0700
Message-Id: <9810012306.AA22944@unicode.org>
Errors-To: uni-bounce@unicode.org
X-Uml-Sequence: 6049 (1998-10-01 23:05:56 GMT)
From: kenw@sybase.com (Kenneth Whistler)
Reply-To: unicode@unicode.org
To: Unicode List
Cc: kenw@sybase.com
Date: Thu, 1 Oct 1998 16:05:51 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal (reverse QMark)

>
> >> >Notes:
> >> > (1) The reverse question is essential in VT terminal emulation...
> >>
> >> Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no
> >> need to re-invent it.
> >>
> >But that one is upside down. The one I'm talking about is upright but
> >flipped on its vertical axis.
> >
>
> This important character is already on the list of characters to be added
> in one of the coming amendments in ISO 10646.
>

As Asmus mentioned, this one is already on its way. It is encoded in
Amendment 18 to 10646, which is just entering its last round of balloting:

U+2426 SYMBOL FOR SUBSTITUTE FORM TWO

with the requisite shape of the reversed question mark. This character is
derived from ISO 2047, also shows up in DIN 66 213, and in various terminal
emulations.

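The inverted/reversed distinction discussed above can be checked against a present-day character database. The following is only a sketch and is not part of the original messages: it assumes Python 3 and its standard `unicodedata` module, and the dictionary keys `latin1_0xBF` and `amendment_18` are labels invented here for illustration. It looks up the names of the two code points cited in the thread.

```python
import unicodedata

# Code points taken from the thread: 0xBF in ISO Latin-1, and U+2426 as
# cited for Amendment 18. The dictionary keys are illustrative labels only.
candidates = {
    "latin1_0xBF": bytes([0xBF]).decode("latin-1"),
    "amendment_18": "\u2426",
}

def describe(ch):
    """Return (code point label, Unicode character name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

report = {label: describe(ch) for label, ch in candidates.items()}

for label, (code_point, name) in report.items():
    print(f"{label}: {code_point} {name}")
```

The names reported by the database describe the two characters as an inverted question mark and a substitute symbol respectively, which matches the point made in the replies: the Latin-1 character is not the one the proposal needs.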
--Ken 2-Oct-98 0:27:41-GMT,1442;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id UAA23786 for ; Thu, 1 Oct 1998 20:27:41 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id RAA31536 ; Thu, 1 Oct 1998 17:27:33 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA24011; Thu, 1 Oct 98 17:17:58 -0700 Message-Id: <9810020017.AA24011@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6054 (1998-10-02 00:16:05 GMT) From: Markus Kuhn Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 1 Oct 1998 17:16:04 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Paul Keinanen wrote on 1998-10-01 17:03 UTC: > In ISO 8859-1 these are listed as > > 80 PADDING CHARACTER (PAD) > 81 HIGH OCTET PRESET (HOP) > > 99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI) Are you sure about the source? Last time I looked into ISO/IEC 8859-1:1986(E), it was certainly free of any control characters. ISO 8859 defines only graphical characters. What exactly is your source on this? Markus -- Markus G. 
Kuhn, Security Group, Computer Lab, Cambridge University, UK email: mkuhn at acm.org, home page: 2-Oct-98 0:48:25-GMT,2800;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id UAA27172 for ; Thu, 1 Oct 1998 20:48:24 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id RAA11956 ; Thu, 1 Oct 1998 17:48:34 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA23953; Thu, 1 Oct 98 17:15:49 -0700 Message-Id: <9810020015.AA23953@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6053 (1998-10-02 00:15:13 GMT) From: Markus Kuhn Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 1 Oct 1998 17:15:12 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Frank da Cruz wrote on 1998-10-01 14:34 UTC: > My concern is that the pictures in the Unicode book go horizontally. Not much love went into the U+24XX glyphs used to print Unicode 2.0. The OCR symbols look quite strange as well. Unicode is a character set, not a font. Keeping things readable is the duty of the font designer. Of course most good fonts will have the Control Pictures with diagonal letters. The ISO 10646-1 standard shows them all nicely diagonally. It is a good idea for font designers to have BOTH the Unicode 2.0 and the ISO 10646 standard on their desk, to see a few glyph variations as the two standards were printed using different fonts. > Although > I do not claim to be an expert on Unicode fonts, I have never seen one that > implemented this block, so I don't actually know how it looks. One X11 ISO 10646-1 font that implements this block is available from http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz See the included README file for instructions on how to have a quick look at it with xfd. 
I don't claim that the control pictures in there are extremely beautiful (doing ENQ in a 6x13 matrix is quite challenging), but I think it is quite readable. > However, I'd > say that the horizontal arrangement would make it extremely difficult for the > viewer to discern the cell boundaries, as in: > > NULSOHSTXETXEOTENQACKBELDELNAKSYNETBCANSUBESCCANACKSSASS3SPAEPACSISCI > > And thus, at minumum, the table in the book should be altered to show all > control pictures arranged diagonally, and all future control picture additions > should also be arranged that way. I agree that the glyphs used to print the ISO 10646-1 standard are much better here than those used in the Unicode 2.0 standard for the U+24XX range. Markus -- Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK email: mkuhn at acm.org, home page: 2-Oct-98 5:47:14-GMT,2024;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id BAA03026 for ; Fri, 2 Oct 1998 01:47:13 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id WAA11502 ; Thu, 1 Oct 1998 22:47:21 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25821; Thu, 1 Oct 98 22:41:05 -0700 Message-Id: <9810020541.AA25821@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Uml-Sequence: 6056 (1998-10-02 05:40:41 GMT) From: Paul Keinanen Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 1 Oct 1998 22:40:39 -0700 (PDT) Subject: Re: Terminal Graphics Proposal At 12:32 1.10.1998 -0700, Frank da Cruz wrote: >> If all octet values (00 .. FF) are also going to be displayed, there might >> be some ambiguity with some of the two letter codes, e.g. FF, D1, D2, D3, >> D4, EB and EC, which should be noted in the actual font design. >> >Good point. 
One path to disambiguation would be to show hex digits A-F in
>lower case. Sounds OK?

I was also thinking about that, but since the characters are really small to
begin with, trying to make them lower case on a low-resolution matrix would
make them even harder to read.

>> >Notes:
>> > (1) The reverse question is essential in VT terminal emulation...
>>
>> Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no
>> need to re-invent it.
>>
>But that one is upside down. The one I'm talking about is upright but
>flipped on its vertical axis.

Sorry about that, I did not read your text thoroughly enough.

In all those DEC and other systems I have used, the inverted question mark
(not the reverse question mark) has been used for (parity etc.) error
indication. I assumed, incorrectly, that you were referring to this usage.

Paul

2-Oct-98 5:53:42-GMT,2125;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id BAA03730 for ; Fri, 2 Oct 1998 01:53:42 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id WAA09530 ; Thu, 1 Oct 1998 22:53:47 -0700
Received: by unicode.org (NX5.67g/NX3.0S) id AA25825; Thu, 1 Oct 98 22:41:06 -0700
Message-Id: <9810020541.AA25825@unicode.org>
Errors-To: uni-bounce@unicode.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Uml-Sequence: 6057 (1998-10-02 05:40:51 GMT)
From: Paul Keinanen
Reply-To: unicode@unicode.org
To: Unicode List
Date: Thu, 1 Oct 1998 22:40:50 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

At 17:16 1.10.1998 -0700, Markus Kuhn wrote:
>Paul Keinanen wrote on 1998-10-01 17:03 UTC:
>> In ISO 8859-1 these are listed as
>>
>> 80 PADDING CHARACTER (PAD)
>> 81 HIGH OCTET PRESET (HOP)
>>
>> 99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
>
>Are you sure about the source?
Last time I looked into >ISO/IEC 8859-1:1986(E), it was certainly free of any control >characters. ISO 8859 defines only graphical characters. >What exactly is your source on this? So that explains why ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT list no code points in the C0 and C1 range. Then I am just wondering why ftp://dkuug.dk/i18n/charmaps/CP819 (alias Latin1 alias ISO_8859-1:1987) lists /x80 PADDING CHARACTER (PAD) /x81 HIGH OCTET PRESET (HOP) /x99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI) and ftp://dkuug.dk/i18n/charmaps.646/ISO_8859-1:1987 lists the same code point values for these control characters /d128 /d129 /d153 So I just wonder, where they at dkuug.dk/i18n have taken these C0 and C1 codes from, unfortunately these tables did not contain any references (as did most EBCDIC tables). Paul 2-Oct-98 9:37:48-GMT,1880;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id FAA07143 for ; Fri, 2 Oct 1998 05:37:47 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id CAA14408 ; Fri, 2 Oct 1998 02:37:14 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA26800; Fri, 2 Oct 98 01:54:04 -0700 Message-Id: <9810020854.AA26800@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6061 (1998-10-02 08:51:08 GMT) From: Michael Everson Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 2 Oct 1998 01:51:01 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id FAA07143 Ar 17:16 -0700 1998-10-01, scríobh Markus Kuhn: >Paul Keinanen wrote on 1998-10-01 17:03 UTC: >> In ISO 8859-1 these are listed as >> >> 80 PADDING CHARACTER (PAD) >> 81 HIGH OCTET PRESET (HOP) >> >> 99 SINGLE GRAPHIC CHARACTER 
INTRODUCER (SGCI) > >Are you sure about the source? Last time I looked into >ISO/IEC 8859-1:1986(E), it was certainly free of any control >characters. ISO 8859 defines only graphical characters. >What exactly is your source on this? I am not following the Terminal Graphics Proposal thread in great detail because I think Elvish more relevant to my work :-) but I would like to say that I hope lots of hardcopy examples will be forwarded to WG2 so that we who are not so expert in the field can evaluate it appropriately. Cf. the Western Musical Symbols or Syriac proposals. Michael Everson PS. Yes, I would make TTFs for them if necessary. 2-Oct-98 11:58:41-GMT,1798;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id HAA15340 for ; Fri, 2 Oct 1998 07:58:41 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id EAA13470 ; Fri, 2 Oct 1998 04:53:01 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA29143; Fri, 2 Oct 98 04:44:36 -0700 Message-Id: <9810021144.AA29143@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6067 (1998-10-02 11:44:14 GMT) From: Kevin Bracey Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 2 Oct 1998 04:44:12 -0700 (PDT) Subject: Re: Terminal Graphics Proposal In message <9810020026.AA24186@unicode.org> Markus Kuhn wrote: > > Unicode is a character set, not a font. Keeping things readable is the > duty of the font designer. Of course most good fonts will have the Control > Pictures with diagonal letters. The ISO 10646-1 standard shows them > all nicely diagonally. It is a good idea for font designers to have > BOTH the Unicode 2.0 and the ISO 10646 standard on their desk, to see a few > glyph variations as the two standards were printed using different fonts. > I heartily concur with this. 
IMHO, most of ISO 10646-1's glyphs are a lot better
than the Unicode Standard's.

--
Kevin Bracey, Senior Software Engineer
Acorn Computers Ltd                       Tel: +44 (0) 1223 725228
Acorn House, 645 Newmarket Road           Fax: +44 (0) 1223 725328
Cambridge, CB5 8PB, United Kingdom        WWW: http://www.acorn.co.uk/

2-Oct-98 15:07:41-GMT,2466;000000000001
Return-Path:
Received: from www.im.se (fw.im.se [193.14.22.222]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id LAA25411 for ; Fri, 2 Oct 1998 11:07:37 -0400 (EDT)
Received: from imhps.im.se (imhps.im.se [192.36.35.5]) by www.im.se (8.8.7/8.8.7) with ESMTP id QAA28984; Fri, 2 Oct 1998 16:48:31 +0200 (METDST)
Received: from msxsth1.im.se by imhps.im.se (1.37.109.16/IM-3.12) id AA088490748; Fri, 2 Oct 1998 17:05:48 +0200
Received: by msxsth1 with Internet Mail Service (5.5.2232.9) id ; Fri, 2 Oct 1998 17:03:39 +0200
Message-Id:
From: Karlsson Kent - keka
To: "'Paul Keinanen'" , "'Rick McGowan'" , "'Frank da Cruz'" , "'Markus Kuhn'" , "'Ken Whistler'"
Cc: "'Asmus Freytag'" , "'Kevin Bracey'" , "'John Cowan'"
Subject: RE: Terminal Graphics Proposal
Date: Fri, 2 Oct 1998 17:03:15 +0200
Mime-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain; charset="iso-8859-1"

I'm ***not*** REALLY interested in the control code display-instead-of-do
characters, since I find them to be a thing of the past* (no flames, please,
I already know that some of you disagree). And I know TUS says one can use
ANY (in some way appropriate) glyphs for them.

It still disturbs me that they have no compatibility decompositions.
(Compare the decompositions for some characters.) The glyphs for
these (nonce, imho) symbol characters are still fairly fixed to actually be
a short (2-3) sequence of letters/digits. I think it would be reasonable to
have compatibility decompositions for these characters too. This would
affect collation also: they would be sorted according to the constituent
letters (which is what I, at least, would expect).

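The point about missing compatibility decompositions can be illustrated with a present-day character database. This sketch is not from the original messages: it assumes Python 3 and its standard `unicodedata` module, and the choice of U+2460 CIRCLED DIGIT ONE as the comparison character is an assumption made here, since the message does not say which characters were meant. The helper name `compat_info` is likewise hypothetical.

```python
import unicodedata

def compat_info(ch):
    """Collect the name, raw decomposition field, and NFKD form of one character."""
    return {
        "name": unicodedata.name(ch),
        "decomposition": unicodedata.decomposition(ch),  # "" when none is defined
        "nfkd": unicodedata.normalize("NFKD", ch),
    }

# A control picture from the U+24xx block discussed in the thread.
nul_picture = compat_info("\u2400")

# Comparison character chosen for this sketch only (an assumption):
# a symbol that does carry a compatibility decomposition.
circled_one = compat_info("\u2460")

for info in (nul_picture, circled_one):
    print(info["name"], repr(info["decomposition"]), repr(info["nfkd"]))
```

Under compatibility normalization the circled digit folds to a plain digit, while the control picture is left unchanged, so it would not sort by its constituent letters unless a collation tailoring said so.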
I probably should not say this, but... If you are absolutely hardbent on having
symbols for control codes, there should be some for the Unicode control codes
too (like paragraph separator, left-to-right-mark, etc.) They need not be
constructed from letters...

R.
/kent k

*(though the newly suggested hexadecimal-digit-pair display ones might
continue to be useful; though hexadecimal digit quadruples would fill an
entire plane and more! ;-)

2-Oct-98 16:42:26-GMT,2281;000000000001
Return-Path:
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id MAA23664; Fri, 2 Oct 1998 12:41:04 -0400 (EDT)
Date: Fri, 2 Oct 1998 12:41:04 -0400 (EDT)
From: Frank da Cruz
Message-Id: <199810021641.MAA23664@watsun.cc.columbia.edu>
To: unicode@unicode.org
Subject: Re: Terminal Graphics Proposal
In-Reply-To: Your message of Fri, 2 Oct 1998 01:51:01 -0700 (PDT)

Ar Fri, 2 Oct 1998 01:51:01 -0700 (PDT) scríobh Michael Everson:
> ... I hope lots of hardcopy examples will be forwarded to WG2 so that we
> who are not so expert in the field can evaluate it appropriately.
>
I'll be happy to provide copies of the character set table from all the
relevant manuals. How do I forward them to WG2?

> PS. Yes, I would make TTFs for them if necessary.
>
What's a TTF? You mean Web-viewable glyph tables like on your website?

Thanks!

- Frank

P.S. By the way, I realize some people find this focus on arcane,
"obsolete", ""legacy"" technology amusing, but it might have certain
unanticipated benefits. For historic or scholarly purposes, the UTC has an
interest in encoding scripts that are no longer in active use; one might
view these glyphs in the same way. I am always amazed by the vigor with
which the history of computing is discarded and wiped out on a continuing
basis. Computing is quite likely to dominate human life from now on; some
day everyone will look around and wonder how it all happened, and nobody
will know.
At least now (I hope) we'll be able to publish works -- electronic or otherwise -- in a Unicode font, illustrating how people used computers in ancient times (the 1970s and 80s), for the continued amusement of generations to come. P.P.S. Those interested in preserving the signs and symbols of bygone eras of computing might also want to take a look at Fred Hoyle's book, The Black Cloud, circa 1954, which I read a long time ago but don't have any more. As I recall, it included fragments of computer programs written in the strange punch-card symbols of the time -- lozenges, etc -- which I dimly recall from my youthful experiences with IBM EAM equipment. Does anyone have a copy handy? I wonder if it can be printed in Unicode; perhaps here is fodder for another fun proposal... 2-Oct-98 17:29:14-GMT,2329;000000000001 Return-Path: Received: from inergen.sybase.com (inergen.sybase.com [192.138.151.43]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA07088 for ; Fri, 2 Oct 1998 13:29:12 -0400 (EDT) Received: from smtp1.sybase.com (sybgate.sybase.com [130.214.220.35]) by inergen.sybase.com (8.8.4/8.8.4) with SMTP id KAA10504; Fri, 2 Oct 1998 10:26:44 -0700 (PDT) Received: from birdie.sybase.com by smtp1.sybase.com (4.1/SMI-4.1/SybH3.5-030896) id AA25742; Fri, 2 Oct 98 10:25:29 PDT Received: by birdie.sybase.com (5.x/SMI-SVR4/SybEC3.5) id AA16847; Fri, 2 Oct 1998 10:25:23 -0700 Date: Fri, 2 Oct 1998 10:25:23 -0700 From: kenw@sybase.com (Kenneth Whistler) Message-Id: <9810021725.AA16847@birdie.sybase.com> To: keka@im.se Subject: RE: Terminal Graphics Proposal Cc: keinanen@sci.fi, rmcgowan@apple.com, fdc@watsun.cc.columbia.edu, Markus.Kuhn@cl.cam.ac.uk, kenw@sybase.com, asmusf@ix.netcom.com, kbracey@acorn.com, cowan@locke.ccil.org X-Sun-Charset: US-ASCII Kent said: > I'm ***not*** REALLY interested in the control code display-instead-of-do > characters, since I find them to be a thing of the past* (no flames, please, > I alredy know that some of you disagree). 
And I know TUS says one can use > ANY (in some way appropriate) glyps for them. > > It still disturbs me that thay have no compatibility decompositions. > (Compare the decompositons for some characters.) I disagree completely on this. Proposing compatibility decompositions for glyphs which have arbitrary content (such as these) confuses apples and oranges. These 32 glyphs for control codes could contain 3-letter acronyms, or 2-letter acronyms, or could be substituted out for something completely different, such as the ISO 2047 set. Compatibility decompositions in that context would be completely misleading. > The glyphs for > these (nonce, imho) symbol characters are still fairly fixed to actually be > a short (2-3) sequence of letters/digits. I think it would be reasonable to > have compatibility decompositions for these characters too. This would > affect collation also: they would be sorted according to the constituent > letters (which is what I, at least, would expect). Again, I disagree. These should *not* sort as "NUL", "SOH", etc. --Ken 2-Oct-98 19:42:24-GMT,854;000000000001 Return-Path: Received: from lotus.kanji.com (lotus.kanji.com [206.230.42.4]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA20222 for ; Fri, 2 Oct 1998 15:42:17 -0400 (EDT) Received: by kanji.com via sendmail from stdin id (Debian Smail3.2.0.101) for fdc@watsun.cc.columbia.edu; Fri, 2 Oct 1998 13:30:46 -0600 (MDT) Message-Id: Date: Fri, 2 Oct 1998 13:30:46 -0600 (MDT) From: Jon Babcock To: fdc@watsun.cc.columbia.edu Subject: The Black Hole _The Black Hole_ by Fred Hoyle(ISBN: 0899683444) is available on www.amazon.com for $26.96 plus shipping. Just noticed your note in a unicode ML msg and thought you might be interested. (I've no connection with Amazon, btw.) 
Jon -- Jon Babcock 2-Oct-98 20:25:29-GMT,1979;000000000011 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id QAA02116 for ; Fri, 2 Oct 1998 16:25:25 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id MAA08278 ; Fri, 2 Oct 1998 12:15:57 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA02257; Fri, 2 Oct 98 11:15:01 -0700 Message-Id: <9810021815.AA02257@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6079 (1998-10-02 18:13:04 GMT) From: Michael Everson Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 2 Oct 1998 11:13:03 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id QAA02116 Ar 09:54 -0700 1998-10-02, scríobh Frank da Cruz: >What's a TTF? You mean Web-viewable glyph tables like on your website? A True Type Font. >- Frank > >P.S. By the way, I realize some people find this focus on arcane, >"obsolete", ""legacy"" technology amusing, but it might have certain >unanticipated benefits. For historic or scholarly purposes, the UTC has an >interest in encoding scripts that are no longer in active use; one might >view these glyphs in the same way. I am always amazed by the vigor with >which the history of computing is discarded and wiped out on a continuing >basis. I for my part do NOT!!!! want to see these terminal graphic things in the BMP. They belong in Plane 1. -- Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Guthán: +353 1 478-2597 ** Facsa: +353 1 478-2597 (by arrangement) 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. 
Átha Cliath; Éire

2-Oct-98 22:55:26-GMT,1768;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id SAA02054 for ; Fri, 2 Oct 1998 18:55:24 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id PAA19284 ; Fri, 2 Oct 1998 15:48:52 -0700
Received: by unicode.org (NX5.67g/NX3.0S) id AA04658; Fri, 2 Oct 98 14:30:05 -0700
Message-Id: <9810022130.AA04658@unicode.org>
Errors-To: uni-bounce@unicode.org
X-Uml-Sequence: 6084 (1998-10-02 21:24:10 GMT)
From: Frank da Cruz
Reply-To: unicode@unicode.org
To: Unicode List
Date: Fri, 2 Oct 1998 14:24:06 -0700 (PDT)
Subject: Re: Terminal Graphics Proposal

> I for my part do NOT!!!! want to see these terminal graphic things in the
> BMP. They belong in Plane 1.
>
Perhaps, but as the lawyers say, the door was opened by the characters
already included in blocks at U+2400, U+2500, U+2600, and U+2700. In any
case, the intention here is to help Unicode become somewhat more
"technology-neutral". Terminal emulation is a fact of life, and important
to a significant number of serious and productive computer users; why should
its special glyphs be excluded from the same status enjoyed by dingbats and
astrological signs? Seriously, I think terminal emulation is far more
mainstream than many Unicoders seem to think, and I hope it is a worthy goal
to welcome this constituency into the fold, thus allowing them to continue
their work in their accustomed manner, rather than according to the dictates
of haute couture, with the added bonus of uniform access to the world's
writing systems.

- Frank 2-Oct-98 23:32:19-GMT,4256;000000000001 Return-Path: Received: from orpheus.amdahl.com (orpheus.amdahl.com [159.199.101.3]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with SMTP id TAA05771 for ; Fri, 2 Oct 1998 19:32:18 -0400 (EDT) Received: from minerva.amdahl.com([129.212.33.25]) (3880 bytes) by orpheus.amdahl.com via sendmail with P:smtp/R:match-mx-hosts/T:smtp (sender: ) id for ; Fri, 2 Oct 1998 16:28:58 -0700 (PDT) (Smail-3.2.0.102 1998-Aug-2 #1 built 1998-Aug-14) Received: from libra by minerva.amdahl.com with smtp (Smail3.1.29.1 #5) id m0zPEc1-0001AFC; Fri, 2 Oct 98 16:27 PDT Message-Id: From: "Tony Harminc" To: Frank da Cruz , unicode@unicode.org Date: Fri, 2 Oct 1998 19:29:05 -0400 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Re: Terminal Graphics Proposal Priority: normal In-reply-to: <9810010137.AA14105@unicode.org> X-mailer: Pegasus Mail for Win32 (v3.01a) On 30 Sep 98, at 18:35, Frank da Cruz wrote: > 8. MISCELLANEOUS SINGLE-CELL GLYPHS > > Table 8.1: Miscellaneous Single-Cell Terminal Glyphs > > Code Description Reference > E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1) > E0F1 Box with X inside DG Math 06/07, GCGID SP500000 > E0F2 Human stick figure with hat SNI Facet 04/14 > E0F3 Clock (with hands at 3:00) SNI Klammern 05/01 > E0F4 Overscore asterisk IBM 3270 > E0F5 Overscore semicolon IBM 3270 > E0F6 Padlock (keyboard locked) IBM 3270 This last one introduces a bit of a problem, I think. It differs from all other characters mentioned in that it is never displayed in the data portion of a 3270 screen, but rather occurs "below the line" as an indication of keyboard status. If it is to be included, then there are several more uniquely 3270 characters that can be seen below the line; I don't know formal names for them, and indeed they generally don't appear in IBM's CDRA documents. 
Roughly, they are: Outline up arrow (indication of upshifted condition) Outline down arrow (indication of downshifted (override) condition) Key (indication of terminal physically locked (I think this may be what is meant by E0F6 above) Stick figure (terminal is connected to "operator" (really to a supervisory program)) Solid block (terminal is connected to "application program") 4 in square box (terminal is connected to 3274-type control unit) 6 in square box (terminal is connected to 3276-type control unit) Lightning bolt (communication failure) Rectangle with slash (machine check) Printer symbol with slash (associated printer has an error condition) and most problematic: Left half of clock (these two form a doublewidth clock (set at 6:10 Right half of clock or 2:30, though I'm sure the time would be considered a matter of glyph - indeed at least one non-IBM manufacturer's clock symbol was 5:50 or 10:30) Now it's entirely reasonable to argue that all the above (and I may have forgotten a couple) have no business being encoded at all. Indeed some terminal emulators use graphical means to produce the symbols. In any case there is nothing in the 3270 architecture that specifies use of any of them, and an emulator program can use other means to communicate the same information to the user. However a number of Windows-based emulators I know do use glyphs encoded in a font that they supply to produce at least a subset of the symbols. (It should be pointed out that a number of "ordinary" glyphs can also appear below the line, but I can think of no reason not to unify them with the upper case letters, numbers, and so on.) That IBM doesn't include them in CDRA may be a good reason to exclude them from this proposal. But they can be genuinely useful for writers of emulators. What to do ? And how many clocks and stick figures is it reasonable to encode ? 
Tony Harminc 3-Oct-98 9:44:33-GMT,2720;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id FAA27978 for ; Sat, 3 Oct 1998 05:44:32 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id CAA46594 ; Sat, 3 Oct 1998 02:44:30 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA07667; Sat, 3 Oct 98 02:33:15 -0700 Message-Id: <9810030933.AA07667@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6089 (1998-10-03 09:32:58 GMT) From: Michael Everson Reply-To: unicode@unicode.org To: Unicode List Date: Sat, 3 Oct 1998 02:32:57 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id FAA27978 Ar 14:24 -0700 1998-10-02, scríobh Frank da Cruz: >> I for my part do NOT!!!! want to see these terminal graphic things in the >> BMP. They belong in Plane 1. >> >Perhaps, but as the lawyers say, the door was opened by the characters >already included in blocks at U+2400, U+2500, U+2600, and U+2700. I will not support their inclusion in the BMP unless there is a really good reason. (I'd still make TTFs if necessary though, because I am a loon.) The list of characters I saw was rather long. >In any >case, the intention here is to help Unicode become somewhat more >"technology-neutral". The UCS is going to be used for centuries. Do we really think VT100 emulation will be needed via BMP support? >Terminal emulation is a fact of life, and important >to a significant number of serious and productive computer users; why should >its special glyphs be excluded from the same status enjoyed by dingbats and >astrological signs? Because the dingbats are used in typography, and astrological signs have a definite semantic. 
>Seriously, I think terminal emulation is far more >mainstream than many Unicoders seem to think, and I hope it is a worthy goal >to welcome this consituency into the fold, thus allowing them to continue >their work in their accustomed manner, rather than according to the dictates >of haute couture, with the added bonus of uniform access to the world's >writing systems. I don't see the argument for BMP here. -- Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Guthán: +353 1 478-2597 ** Facsa: +353 1 478-2597 (by arrangement) 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire 3-Oct-98 21:07:30-GMT,1792;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA02053 for ; Sat, 3 Oct 1998 17:07:29 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA55830 ; Sat, 3 Oct 1998 14:02:52 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA09212; Sat, 3 Oct 98 13:52:08 -0700 Message-Id: <9810032052.AA09212@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Uml-Sequence: 6091 (1998-10-03 20:51:56 GMT) From: Elliotte Rusty Harold Reply-To: unicode@unicode.org To: Unicode List Date: Sat, 3 Oct 1998 13:51:54 -0700 (PDT) Subject: Re: Terminal Graphics Proposal > E0B4 Latin capital letter H with bar SNI Math 04/05 (2) > E0B5 Latin small letter h with bar SNI Math 04/06 (2) Is E0B5 supposed to be Planck's constant over 2*PI? If so, it's encoded at 210F, 0127, and 045B. And your E0B4 is at 0126. 
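The code points cited above can be checked against a current copy of the Unicode character database. A minimal sketch, assuming Python's standard `unicodedata` module; the names it reports come from whatever Unicode version the interpreter ships with, not from the 1998 standard under discussion:

```python
import unicodedata

# Code points cited in the message above as existing encodings
# of "h with bar" (small and capital).
cited = [0x210F, 0x0127, 0x045B, 0x0126]

# Look up the formal character name for each code point.
names = {cp: unicodedata.name(chr(cp)) for cp in cited}

for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")
```

Only U+210F is named for the physical constant; the other three are letters that merely share a similar shape.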
+-----------------------+------------------------+-------------------+ | Elliotte Rusty Harold | elharo@sunsite.unc.edu | Writer/Programmer | +-----------------------+------------------------+-------------------+ | XML: Extensible Markup Language (IDG Books 1998) | | http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/ | +----------------------------------+---------------------------------+ | Read Cafe au Lait for Java News: http://sunsite.unc.edu/javafaq/ | | Read Cafe con Leche for XML News: http://sunsite.unc.edu/xml/ | +----------------------------------+---------------------------------+ 4-Oct-98 21:50:26-GMT,3358;000000000001 Return-Path: Received: from www.im.se (fw.im.se [193.14.22.222]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA11663 for ; Sun, 4 Oct 1998 17:50:19 -0400 (EDT) Received: from imhps.im.se (imhps.im.se [192.36.35.5]) by www.im.se (8.8.7/8.8.7) with ESMTP id XAA24796; Sun, 4 Oct 1998 23:33:11 +0200 (METDST) Received: from msxsth1.im.se by imhps.im.se (1.37.109.16/IM-3.12) id AA239507835; Sun, 4 Oct 1998 23:50:35 +0200 Received: by msxsth1 with Internet Mail Service (5.5.2232.9) id ; Sun, 4 Oct 1998 23:48:37 +0200 Message-Id: From: Karlsson Kent - keka To: "'kenw@sybase.com'" Cc: keinanen@sci.fi, rmcgowan@apple.com, fdc@watsun.cc.columbia.edu, Markus.Kuhn@cl.cam.ac.uk, asmusf@ix.netcom.com, kbracey@acorn.com, cowan@locke.ccil.org Subject: RE: Terminal Graphics Proposal Date: Sun, 4 Oct 1998 23:47:53 +0200 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2232.9) Content-Type: text/plain > ---------- > From: kenw@sybase.com > Sent: fredag 2 oktober 1998 19:25 > To: keka@im.se > Cc: keinanen@sci.fi; rmcgowan@apple.com; fdc@watsun.cc.columbia.edu; > Markus.Kuhn@cl.cam.ac.uk; kenw@sybase.com; asmusf@ix.netcom.com; > kbracey@acorn.com; cowan@locke.ccil.org > Subject: RE: Terminal Graphics Proposal > > Kent said: > > > I'm ***not*** REALLY interested in the control code > display-instead-of-do > > characters, since I find 
them to be a thing of the past* (no flames, please,
> > I already know that some of you disagree). And I know TUS says one can use
> > ANY (in some way appropriate) glyphs for them.
> >
> > It still disturbs me that they have no compatibility decompositions.
> > (Compare the decompositions for some characters.)
>
> I disagree completely on this. Proposing compatibility decompositions
> for glyphs which have arbitrary content (such as these) confuses
> apples and oranges. These 32 glyphs for control codes could contain
> 3-letter acronyms, or 2-letter acronyms, or could be substituted out
> for something completely different, such as the ISO 2047 set. Compatibility
> decompositions in that context would be completely misleading.
>
> > The glyphs for
> > these (nonce, imho) symbol characters are still fairly fixed to actually be
> > a short (2-3) sequence of letters/digits. I think it would be reasonable to
> > have compatibility decompositions for these characters too. This would
> > affect collation also: they would be sorted according to the constituent
> > letters (which is what I, at least, would expect).
>
> Again, I disagree. These should *not* sort as "NUL", "SOH", etc.
>
> --Ken
>

When looking in an index for a terminal emulator, say, which is something I actually do sometimes (still), both I and, I expect, the average reader would expect to find NL under N, FF under F and so on, rather than quite unexpectedly before A.

In practice the "content" (glyphs) for these characters does not appear to be arbitrary. It appears to be rather fixed, and not much intended for future arbitrary glyph invention. The only variation I have seen, correct me if I am wrong, is between two- and three-letter acronyms, for which differing code positions would be tolerable.
Kind regards /kent k

5-Oct-98 16:34:10-GMT,6749;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id MAA25339; Mon, 5 Oct 1998 12:32:44 -0400 (EDT) Date: Mon, 5 Oct 98 12:32:43 EDT From: Frank da Cruz To: unicode@unicode.org Subject: Re: Terminal Graphics Proposal In-Reply-To: Your message of Sat, 3 Oct 1998 02:32:57 -0700 (PDT) Message-ID:

> Ar 14:24 -0700 1998-10-02, scríobh Frank da Cruz:
> >> I for my part do NOT!!!! want to see these terminal graphic things in the
> >> BMP. They belong in Plane 1.
> >>
> >Perhaps, but as the lawyers say, the door was opened by the characters
> >already included in blocks at U+2400, U+2500, U+2600, and U+2700.
>
> I will not support their inclusion in the BMP unless there is a really good
> reason. (I'd still make TTFs if necessary though, because I am a loon.) The
> list of characters I saw was rather long.
>
Hurray for Loons!

> >Terminal emulation is a fact of life, and important
> >to a significant number of serious and productive computer users; why should
> >its special glyphs be excluded from the same status enjoyed by dingbats and
> >astrological signs?
>
> Because the dingbats are used in typography, and astrological signs have a
> definite semantic.
>
But what about the block at U+2500? It was included to allow for character-cell graphics that are possible on the PC -- and the so-called ANSI emulations that are based on it -- but it excludes other types of terminals that are just as important ("existing standards"). The blocks at U+2580 and U+25A0 are also clearly intended for character-cell graphic applications, but they are incomplete. This proposal aims to fill some holes in existing categories.

The argument for including the missing characters (not necessarily all of them), stated as clearly as I can, is:

1. There are numerous terminal emulation products on the market, with a user base numbering in the millions.
2.
Increasingly, these products are used on systems -- like Windows NT -- that have Unicode fonts.
3. Many terminal-based applications take full advantage of the features and glyph repertoires of the terminals they are designed for.
4. The glyph repertoires of many common terminals -- VT100/VT220, Wyse, Siemens Nixdorf, Data General, etc. -- include glyphs that are not presently in Unicode.
5. Customers of terminal emulation products demand complete and accurate emulation.
6. In order to succeed, makers of terminal emulation software must create private fonts that encode the missing glyphs in the Private Use Area (which, as an aside, unnecessarily drives up the cost of the product for the end user).
7. Because of the closed and proprietary nature of this process, each terminal emulation product potentially (and in fact) encodes the same characters at different places.
8. Other applications use the Private Use Area for other purposes (and other glyphs).
9. The result is that terminal emulation products do not interoperate with each other (who cares), or (the real point) with other applications on the same platform.

For example, a VT100 or HP forms-based screen cannot be pasted into a word processing document without changing the forms borders (etc., depending on exactly how they are encoded) into whatever other glyphs happen to be defined at the same code points in the font used by the other application. Ditto for mathematical formulae displayed on DEC or Siemens Nixdorf screens. Ditto for character-cell illustrations or tables in numerous online texts intended for display on any of the widespread terminals.

> >In any
> >case, the intention here is to help Unicode become somewhat more
> >"technology-neutral".
>
> The UCS is going to be used for centuries. Do we really think VT100
> emulation will be needed via BMP support?
>
How does one answer a question like this? Should it be based purely on numbers?
For example, if there are currently millions of users of terminal emulators (there are), is it right to turn our backs on them while at the same time we encode writing systems that are used by only a handful of scholars? Or, to turn the question on its head, what is wrong with VT100 emulation? The fact that the popular trade press would like us all to live in a GUI world that we all know is unreliable, mysterious, proprietary, and constantly in flux, rather than in a proven, productive, stable, dependable, and cost-effective open environment should not be a factor in this discussion, any more than it should be in deciding whether to encode Linear B. Here in New York City we have thousands of people whose jobs are to sit in front of a 3270 (or other) terminal all day and respond to telephone calls. These include 911 (police/fire emergency) operators, EMS dispatchers, heat-complaint bureau and poison control agents, and car rental and airline reservation clerks (to name a few). These are what we like to call "mission critical" applications, and they must be (what we like to call) "rock solid". These people use a particular application all day, every day. They are trained on it, they must be able to use it effectively. At some point, the aging terminals will be replaced by PCs, because the terminals wear out and almost nobody makes them any more, but the applications themselves will not go away, nor should they. The new PCs will need to do exactly what the terminals did. We don't want our 911 operators to become needlessly confused when some strange symbol shows up on their screen in place of the one they expect. Taking this a step further, the people who write the training, operations, and procedures manuals for these systems need to be able to show the terminal screens and quote individual glyphs in the text. 
This is legitimate, real-world, nuts-and-bolts stuff that might not grab headlines in PC Week (but then I think that's an excellent indicator of its importance :-)

The original proposal included:

  Math symbols:             34
  Line/Box/Block symbols:   31
  Misc symbols:              7
  Control pictures:        115
  Hex bytes:               256
  TOTAL:                   443

The single biggest category is hex bytes, which so far seems to have received a warm reception. Thus the greatest controversy seems to swirl around the smallest number of characters. We begin by unifying the proposed diagonal C0 control pictures with the ones already at U+2400:

  Math symbols:             34
  Line/Box/Block symbols:   31
  Misc symbols:              7
  Control pictures:         81
  Hex bytes:               256
  TOTAL:                   409

If we eliminate the hex bytes, this brings the total down to 153.

- Frank

5-Oct-98 19:07:40-GMT,2001;000000000001 Return-Path: Received: from dfw-ix10.ix.netcom.com (dfw-ix10.ix.netcom.com [206.214.98.10]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA13364 for ; Mon, 5 Oct 1998 15:07:39 -0400 (EDT) Received: (from smap@localhost) by dfw-ix10.ix.netcom.com (8.8.4/8.8.4) id OAA18941; Mon, 5 Oct 1998 14:03:12 -0500 (CDT) Received: from stl-wa51-59.ix.netcom.com(207.220.40.187) by dfw-ix10.ix.netcom.com via smap (V1.3) id rma018829; Mon Oct 5 14:02:24 1998 Message-Id: <3.0.5.32.19981005120405.00a62a20@popd.ix.netcom.com> X-Sender: asmusf@popd.ix.netcom.com X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32) Date: Mon, 05 Oct 1998 12:04:05 -0700 To: Karlsson Kent - keka , "'kenw@sybase.com'" From: Asmus Freytag Subject: RE: Terminal Graphics Proposal Cc: keinanen@sci.fi, rmcgowan@apple.com, fdc@watsun.cc.columbia.edu, Markus.Kuhn@cl.cam.ac.uk, kbracey@acorn.com, cowan@locke.ccil.org In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"

>> >> Again, I disagree. These should *not* sort as "NUL", "SOH", etc.
>> >> --Ken >> >When looking in an index for a terminal emulator say, which is somethign I >actually do sometimes (still), both I and I expect the average reader would >expect to find NL under N, FF under F and so on, rather than quite >unexpectedly before A. > I would side with Ken. If the emulator manual used the character codes that correspond to the 'Control code Pictures', I would in fact expect them to sort with all the other control code pictures and special symbols. If the index wanted to focus on the names for the control functions, it would use the charcter codes for the Latin letters and spell out FF etc. There is no need to burdern *every* single implementation of the standard sort with the table entries since such simple solutions are possible. A./ 6-Oct-98 17:58:19-GMT,1532;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA10835 for ; Tue, 6 Oct 1998 13:58:17 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA09096 ; Tue, 6 Oct 1998 10:57:23 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA04434; Tue, 6 Oct 98 10:43:08 -0700 Message-Id: <9810061743.AA04434@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Uml-Sequence: 6096 (1998-10-06 17:42:51 GMT) From: "Julie Doll Allen" Reply-To: unicode@unicode.org To: Unicode List Date: Tue, 6 Oct 1998 10:42:50 -0700 (PDT) Subject: RE: Terminal Graphics Proposal Content-Transfer-Encoding: 7bit Asmus wrote: Finally, once you feel that your proposal is pretty stable, there are brand new instructions on how to submit proposals to Unicode on the web site (the page should be called proposals.html, but I'm not sure where you will find it). It would be useful to assemble the kinds of information that are needed, esp. 
the answers to the form. ---------[end snip]------- The new page is at: http://www.unicode.org/pending/proposals.html I am still adding links to get to it, but it can be accessed from What's New or, of course, directly. Julie Allen Editor Unicode, Inc. 6-Oct-98 19:09:10-GMT,1153;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA02708 for ; Tue, 6 Oct 1998 15:09:09 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id MAA53736 ; Tue, 6 Oct 1998 12:07:35 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA05236; Tue, 6 Oct 98 12:01:06 -0700 Message-Id: <9810061901.AA05236@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT X-Uml-Sequence: 6097 (1998-10-06 19:00:47 GMT) From: "Tony Harminc" Reply-To: unicode@unicode.org To: Unicode List Date: Tue, 6 Oct 1998 12:00:46 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Content-Transfer-Encoding: 7BIT On 5 Oct 98, at 13:57, Frank da Cruz wrote: > The single biggest category is hex bytes, which so far seems to have received > a warm reception. Btw, should the hex bytes have the Number property ? 
Tony Harminc 6-Oct-98 20:11:20-GMT,955;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id QAA20909 for ; Tue, 6 Oct 1998 16:11:19 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id NAA67062 ; Tue, 6 Oct 1998 13:10:21 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA05696; Tue, 6 Oct 98 13:04:37 -0700 Message-Id: <9810062004.AA05696@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6098 (1998-10-06 20:04:23 GMT) From: Rick McGowan Reply-To: unicode@unicode.org To: Unicode List Date: Tue, 6 Oct 1998 13:04:21 -0700 (PDT) Subject: Re: Terminal Graphics Proposal > The single biggest category is hex bytes, which so far seems to have received > a warm reception. What does "warm reception" mean? Rick 6-Oct-98 20:49:45-GMT,1073;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id QAA03497 for ; Tue, 6 Oct 1998 16:49:41 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id NAA10010 ; Tue, 6 Oct 1998 13:45:33 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA05911; Tue, 6 Oct 98 13:31:48 -0700 Message-Id: <9810062031.AA05911@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6099 (1998-10-06 20:30:21 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: unicode@unicode.org To: Unicode List Date: Tue, 6 Oct 1998 13:30:20 -0700 (PDT) Subject: Re: Terminal Graphics Proposal > > On 5 Oct 98, at 13:57, Frank da Cruz wrote: > > > The single biggest category is hex bytes, which so far seems to have received > > a warm reception. > > Btw, should the hex bytes have the Number property ? > Clearly not. 
--Ken > Tony Harminc > 6-Oct-98 21:10:51-GMT,1278;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA08700 for ; Tue, 6 Oct 1998 17:10:49 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA53574 ; Tue, 6 Oct 1998 14:09:56 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA06001; Tue, 6 Oct 98 13:37:31 -0700 Message-Id: <9810062037.AA06001@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Uml-Sequence: 6100 (1998-10-06 20:37:07 GMT) From: John Cowan Reply-To: unicode@unicode.org To: Unicode List Date: Tue, 6 Oct 1998 13:37:05 -0700 (PDT) Subject: Re: Terminal Graphics Proposal Content-Transfer-Encoding: 7bit Tony Harminc wrote: > Btw, should the hex bytes have the Number property ? IMHO no. They are "Symbol, other". -- John Cowan http://www.ccil.org/~cowan cowan@ccil.org You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5) 8-Oct-98 0:08:32-GMT,16887;000000000401 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id UAA27987; Wed, 7 Oct 1998 20:07:16 -0400 (EDT) Date: Wed, 7 Oct 98 20:07:15 EDT From: Frank da Cruz To: unicode@unicode.org Subject: Collected Comments on Terminal Graphics Proposal Message-ID: Thanks to all who commented on the Terminal Graphics proposal. Here are some collected responses to particular points. 
Geoffrey Waigh wrote:
> > A selection of terminal graphics characters is proposed for Unicode [24]
> > and ISO 10646 [19] to allow Unicode-based terminal emulation software to
> > (a) display glyphs that are found on popular types of terminals but
> > currently are not available in Unicode, and (b) interoperate with other
> > Unicode applications.
>
> I can see clear merit in handling b), but I'm leery of the code space
> consumption that a) is having here. In general, my feeling is that
> if 98% emulation does the job in an adequate fashion for
> non-perfectionists, then that is the way to go.
>
When a company is in the market for a terminal emulator, one of the factors affecting its choice is the quality of the emulation, which includes the ability to display all the same glyphs the terminal displays. If product A can do this (but has to use a custom font to do so) and product B is a good citizen and sticks with Unicode -- which prevents it from displaying the same glyphs properly -- many companies will choose product A because its emulation is better, even though they might suffer down the road with its nonstandard encodings -- and maybe even total lack of Unicode support.

> [On Hex code display]
>
> That seems kind of wasteful for a debugging mode. Do the terminals
> that produce this output have escape sequences for enabling this
> mode, or is it strictly a terminal configuration option? (Of course
> by that measure the control character codes come under scrutiny...)
>
This is the largest block in the proposal, and it can be dispensed with. I do believe, however, that many developers, help-desk people, network managers, etc., will find it handy in debugging not only terminal sessions but Web pages, word processors, network protocols, and files using Unicode-based tools.
Kevin Bracey wrote: > > Unicode already has a block of Control Pictures at U+2400 through > > U+2421, but (except for "NL" at U+2424) these go horizontally across the > > character cell, rather than diagonally, thus making them difficult to > > distinguish from normal alphanumeric text. A new, parallel block of C0 > > control pictures is needed in which the abbreviations are displayed > > diagonally. > > That's a glyph variation - the Unicode Standard explicitly states that you > can use whatever preferred glyph you like for these. Indeed, IIRC, ISO > 10646-1 has considerably different suggested glyphs for these characters. > (And many others concurred.) OK, this block is removed from Draft 2 of the proposal, but some suggestions added for the next edition of the Unicode Standard. Asmus Freytag wrote [On the same topic...]: > And thus, at minumum, the table in the book should be altered to show all > control pictures arranged diagonally, and all future control picture > additions should also be arranged that way. We are looking into this for Unicode 3.0. Although the mail discussion makes clear that the distinction between characters and glyphs is widely known, it makes no sense to depart from the established use in the one area the characters are intended for! Since the two glyph forms are equivalent (i.e. there's no question of changing the identity of the characters) such a change is editorial in nature. For what it's worth, ISO 10646 uses the diagonal forms (although incorrectly in a roman type face). Kevin Bracey wrote: > > E080 SP Space (like U+2420 but arranged diagonally) > > E081 DEL Delete (Rubout) (2-character name: DT) > > These two are glyph variants of U+2420 and U+2421. > OK, these are removed too. > > E082 LS1 Locking Shift 1 (ISO name for SO) > > E083 LS0 Locking Shift 0 (ISO name for SI) > > Maybe these two could be considered glyph variants of U+240E and u+240F? > Probably not, I suppose. > I've left them in, along with IS1 through IS4. 
> > E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1) > > I would suggest U+FFFD for this. > This was discussed at some length, but I've left it in, since many terminals display this glyph, and for different purposes. It does not always mean "unknown character received". > Even ISO-Latin1 contains the reverse question mark at 0xBF, so it is no > need to re-invent it. > As noted in previous postings, the ISO one is upside down, whereas this one is upright. Asmus Freytag wrote: > This important character [reverse question mark] is already on the list of > characters to be added in one the coming amendments in ISO 10646. kenw@sybase.com (Kenneth Whistler) wrote: > As Asmus mentioned, this one is already on its way. It is encoded in > Amendment 18 to 10646, which is just entering its last round of ballotting: > > U+2426 SYMBOL FOR SUBSTITUTE FORM TWO > > with the requisite shape of the reversed question mark. > Thanks; draft 2 amended accordingly. Rick McGowan wrote: > Of course, there are a *lot* of controls, many control sets, and some > degree of overlap, as Frank's proposal points out rather dramatically. I > would suggest that he take up an attempt at serious unification of these > things, and collect all of the wonderful data he's gathered into a "white > paper" on how to use control pictures for what terminals, etc. With > mapping tables, and a list of the minimum required additions to support > full cross-mappings. > I have tried to do this in Draft 2. Paul Keinanen wrote: > If all octet values (00 .. FF) are also going to be displayed, there might > be some ambiguity with some of the two letter codes, e.g. FF, D1, D2, D3, > D4, EB and EC, which should be noted in the actual font design. > Thanks for noticing! A caution to this effect has been added to Draft 2. 
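The ambiguity noted above can be illustrated mechanically: a two-character control abbreviation collides with a hex-byte picture whenever both of its characters are hexadecimal digits. The sketch below assumes one commonly seen set of two-character C0 abbreviations; that list is an illustration only and is not taken from the proposal, and real terminals vary.

```python
import string

# ASSUMPTION: an illustrative set of two-character abbreviations for the
# 32 C0 controls; not the proposal's own table.
C0_TWO_LETTER = [
    "NU", "SH", "SX", "EX", "ET", "EQ", "AK", "BL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DL", "D1", "D2", "D3", "D4", "NK", "SY", "EB",
    "CN", "EM", "SB", "EC", "FS", "GS", "RS", "US",
]

HEX_DIGITS = set(string.hexdigits.upper())  # 0-9 and A-F


def is_hex_pair(label):
    """True if the label could also be read as a two-digit hex byte."""
    return len(label) == 2 and all(ch in HEX_DIGITS for ch in label)


# Abbreviations a reader could mistake for a hex-byte picture.
ambiguous = [label for label in C0_TWO_LETTER if is_hex_pair(label)]
print(ambiguous)
```

Under that assumed list, the collisions are the same seven labels named in the message, so a font would need some visual device (orientation, framing, or weight) to keep the two families apart.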
> > C1 Control characters are specified in ISO-6429 and used in the VT220
> > family of terminals [5] and the Wyse 370 [26], where they are
> > represented in the right half of the "display controls" font as shown in
> > Table 4.3 (DEC terminals use the full name, Wyse terminals use the 2X
> > name). As with C0 controls, the "name" is displayed diagonally within
> > the character cell. Unicode presently includes no C1 control pictures.
>
> Looking through various EBCDIC code pages (e.g. IBM278, IBM880) and other
> unnumbered sets it appears that these control codes are all also available
> in EBCDIC, but of course at different positions (e.g. IND at 0x24). Some
> references to these sets are "IBM NLS RM Vol2 SE09-8002-01, March 1990"
> and "IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987".
>
Thanks for the reference. I found a complete listing of modern EBCDIC (which has changed considerably since the System/360 days!) in the CDRA Registry, and have totally revised the EBCDIC controls section in Draft 2.

> >Note that three of the C1 control pictures are unassigned (the ones
> >marked by "(1)", that would be at U+E020, U+E021, and U+E039 if these
> >were assigned). These positions should be left vacant in case names are
> >assigned to these characters in a future revision of ISO 6429.
>
> In ISO 8859-1 these are listed as
>
> 80 PADDING CHARACTER (PAD)
> 81 HIGH OCTET PRESET (HOP)
> 99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
>
I have both the ISO and ECMA versions of this standard and find no reference to these or any other control characters. Nor can I find these characters in ISO 6429 or any of the control sets in the ISO Registry. Can you give a more precise source?
> Then I am just wondering why: > > ftp://dkuug.dk/i18n/charmaps/CP819 (alias Latin1 alias ISO_8859-1:1987) > lists: > /x80 PADDING CHARACTER (PAD) > /x81 HIGH OCTET PRESET (HOP) > /x99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI) > > and ftp://dkuug.dk/i18n/charmaps.646/ISO_8859-1:1987 > lists the same code point values for these control characters > > /d128 > /d129 > /d153 > > So I just wonder, where they at dkuug.dk/i18n have taken these C0 and C1 > codes from, unfortunately these tables did not contain any references (as > did most EBCDIC tables). > > 5. HEX BYTES > > > > Hexadecimal byte values, 2 hex digits each. Like display controls, but > > for all 256 8-bit byte values... > > These would be very nice :-). Note the possible ambiguity with some two > character control pictures r.g. FF, EB etc. So special precautions should be > taken when designing the fonts. > Noted in Draft 2. Karlsson Kent - keka wrote: > ... (though the newly suggested hexadecimal-digit-pair display ones might > continue to be useful; though hexadecimal digit quadruples would fill an > entier plane and more! ;-) Rick McGowan wrote: > > The single biggest category is hex bytes, which so far seems to have > > received a warm reception. > > What does "warm reception" mean? > Some nice comments (like the ones just above). Paul Keinanen wrote: > > No attempt was made to account for the many Viewdata, Videotex, Minitel, > > NAPLPS, or other mosaic graphics character sets. These should be > > tackled, if appropriate, by someone who knows something about them. > > And not forgetting the tele-text block characters on European TVs. With > the introduction of TV cards for PCs that also contains a teletext > decoder, so there is a need to display the text and block graphics on > PC. As far as I remember, the block graphic format is more or less the > same as Viewdata with 2 columns and 3 rows per character cell, thus > requiring 64 glyphs. 
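The figure of 64 follows from the cell layout quoted above: 2 columns by 3 rows gives 6 sub-cells, each either on or off, so 2^6 = 64 patterns. A minimal sketch that enumerates them; the `render` helper and the "#"/"." drawing convention are illustrative choices, not part of any teletext or Viewdata specification:

```python
from itertools import product

COLUMNS, ROWS = 2, 3  # cell layout quoted above

# Each of the 6 sub-cells is either off (0) or on (1).
patterns = list(product((0, 1), repeat=COLUMNS * ROWS))


def render(pattern):
    """Draw one pattern as ROWS lines of COLUMNS characters."""
    rows = [pattern[r * COLUMNS:(r + 1) * COLUMNS] for r in range(ROWS)]
    return "\n".join(
        "".join("#" if bit else "." for bit in row) for row in rows
    )


print(len(patterns))          # number of distinct mosaic glyphs
print(render(patterns[45]))   # one arbitrary example pattern
```

The count includes the all-off pattern, which is indistinguishable from a space, and the all-on pattern, which is a full block; an encoding that unifies those with existing characters would need fewer new code points.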
>
There are numerous mosaic graphics, Teletex, and similar character sets in the ISO Register. Quite honestly, I have never even seen such a terminal and do not feel qualified to propose how/if/when/whether this class of glyphs should be handled in Unicode.

> All in all a very interesting proposal. By using as many existing
> characters from the current Unicode standard, I guess there would be a greater
> likelihood of getting things officially approved.
>
In most places, the proposal does not bother to enumerate all of the characters used by these terminals that are already in Unicode -- and this evidently leaves the false impression that they were not researched. Indeed they were! If such an enumeration is necessary to get the proposal passed, of course it can be provided.

Rick McGowan wrote:
> > Still, if I were a font maker working from the Unicode book, I'd
> > probably copy the pictures in it, so again, I'd suggest the next edition
> > show the characters diagonally within the cell, and the accompanying text
> > (which if I can overlook, so can a font maker :-)
>
> Yes, yes, but... People should read, Grasshopper. It is that for which we
> write.
>
Yes, I should know this as well as anyone, having written several books myself, which serve to varying degrees as software manuals, and which, if users of the software would only read them, would save me my daily 6-8 hours of question answering -- hence the smiley :-)

Karlsson Kent - keka wrote:
> I probably should not say this, but... If you are absolutely hardbent on
> having symbols for control codes, there should be some for the Unicode
> control codes too (like paragraph separator, left-to-right-mark, etc.)
> They need not be constructed from letters...
>
I have added a section on these to Draft 2. They are not needed for terminal emulators (at least not yet), but might be handy in other contexts.

Tony Harminc wrote:
> > E0F6 Padlock (keyboard locked) IBM 3270
>
> This last one introduces a bit of a problem, I think.
It differs > from all other characters mentioned in that it is never displayed in > the data portion of a 3270 screen, but rather occurs "below the line" > as an indication of keyboard status. If it is to be included, then > there are several more uniquely 3270 characters that can be seen > below the line; I don't know formal names for them, and indeed they > generally don't appear in IBM's CDRA documents. Roughly, they are: > > Outline up arrow (indication of upshifted condition) > Outline down arrow (indication of downshifted (override) condition) > Key (indication of terminal physically locked (I think > this may be what is meant by E0F6 above) > Stick figure (terminal is connected to "operator" (really to a > supervisory program)) > Solid block (terminal is connected to "application program") > 4 in square box (terminal is connected to 3274-type control unit) > 6 in square box (terminal is connected to 3276-type control unit) > Lightning bolt (communication failure) > Rectangle with slash (machine check) > Printer symbol with slash (associated printer has an error condition) > These have been added in Draft 2 -- but just the ones not already in Unicode (such as outline arrows, "4 in square box" which is really just an inverse video "4" as far as a terminal is concerned, etc). > and most problematic: > Left half of clock (these two form a doublewidth clock (set at 6:10 > Right half of clock or 2:30, though I'm sure the time would be > considered a matter of glyph - indeed at least > one non-IBM manufacturer's clock symbol was 5:50 > or 10:30) > I don't have an actual 3270 terminal to look at just now, but I did manage to scrape up the IBM 3270 Component Description manual, which lists (and illustrates) all the special glyphs shown in the Operator Information Area, in which there is nothing to suggest that the clock is made from two character cells. 
In fact, it looks quite round to me :-) Even if it is made from pieces, I assume there is no way to see them in isolation, and so there should be no harm in encoding the clock as a single glyph (and then, if necessary, show it in double size). > Now it's entirely reasonable to argue that all the above (and I may > have forgotten a couple) have no business being encoded at all. > Indeed some terminal emulators use graphical means to produce the > symbols. In any case there is nothing in the 3270 architecture that > specifies use of any of them, and an emulator program can use other > means to communicate the same information to the user. However a > number of Windows-based emulators I know do use glyphs encoded in a > font that they supply to produce at least a subset of the symbols. > (It should be pointed out that a number of "ordinary" glyphs can also > appear below the line, but I can think of no reason not to unify them > with the upper case letters, numbers, and so on.) > Right. The reason for including the special glyphs appears at the top of this message. > That IBM doesn't include them in CDRA may be a good reason to exclude > them from this proposal. But they can be genuinely useful for > writers of emulators. What to do ? And how many clocks and stick > figures is it reasonable to encode ? > In Draft 2, I'm listing one of each (I retired the SNI 3:00 clock and stick figure with hat). (Yes, I know that on the RS/6000 there is a little animated "running man" who can stop, fall down, etc, as an indicator of the system status, but that's above and beyond...) Elliotte Rusty Harold wrote: > > E0B4 Latin capital letter H with bar SNI Math 04/05 (2) > > E0B5 Latin small letter h with bar SNI Math 04/06 (2) > > Is E0B5 supposed to be Planck's constant over 2*PI? If so, it's encoded at > 210F, 0127, and 045B. And your E0B4 is at 0126. > Who knows what it's supposed to be! 
In any case, I looked harder and found barred H's and T's, dotted L's, etc (which look just right for the SNI character set), as well as some Engs, in Latin Extended A (U+0100..) and so removed them from the proposal. As a result of all your comments, and further research, Draft 2 should be much tighter in terms of unifications, but also more complete -- win some, lose some :-) It's coming up in the next message. NOTE: If it is the sense of the readers that these proposals should no longer be posted here, but rather just pointers to them, I'm happy to comply. In case you want to skip the next draft in email, the pointer is: ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt Thanks again! - Frank 8-Oct-98 1:12:12-GMT,1571;000000000011 Return-Path: Received: from mail-out1.apple.com (mail-out1.apple.com [17.254.0.52]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id VAA08504 for ; Wed, 7 Oct 1998 21:12:11 -0400 (EDT) Received: from mailgate.apple.com (A17-128-100-225.apple.com [17.128.100.225]) by mail-out1.apple.com (8.8.5/8.8.5) with ESMTP id SAA40798 for ; Wed, 7 Oct 1998 18:06:40 -0700 Received: from scv4.apple.com (scv4.apple.com) by mailgate.apple.com (mailgate.apple.com - SMTPRS 2.0.15) with ESMTP id for ; Wed, 07 Oct 1998 18:06:30 -0700 Received: from rangda (rangda.apple.com [17.202.14.171]) by scv4.apple.com (8.8.5/8.8.5) with SMTP id SAA56586 for ; Wed, 7 Oct 1998 18:06:29 -0700 Message-Id: <199810080106.SAA56586@scv4.apple.com> To: fdc@watsun.cc.columbia.edu Subject: Re: Terminal Graphics Draft 2 Date: Wed, 7 Oct 1998 18:06:29 -0700 From: Rick McGowan Reply-To: rmcgowan@apple.com Received: by Apple.Mailer (2.95.2) Thanks Frank for working on the next draft. But PLEASE, in future *DO NOT* post 59.1k worth of draft document to the Unicode list!!! I get enough megabytes of e-mail per day. And some people pay for downloading and connect-time. A pointer to: ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt is sufficient. 
Anyone out there without access to either a web browser OR ftp should ask for a copy of the draft via private e-mail. Rick 8-Oct-98 6:44:07-GMT,2499;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id CAA22615 for ; Thu, 8 Oct 1998 02:44:06 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id XAA08294 ; Wed, 7 Oct 1998 23:43:45 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA14555; Wed, 7 Oct 98 23:39:36 -0700 Message-Id: <9810080639.AA14555@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Uml-Sequence: 6106 (1998-10-08 06:39:23 GMT) From: brox@corena.no (Bjorn Brox) Reply-To: unicode@unicode.org To: Unicode List Date: Wed, 7 Oct 1998 23:39:21 -0700 (PDT) Subject: Re: Collected Comments on Terminal Graphics Proposal Content-Transfer-Encoding: 8bit Frank da Cruz wrote this: > > Thanks to all who commented on the Terminal Graphics proposal. Here are > some collected responses to particular points. ... > > And not forgetting the tele-text block characters on European TVs. With > > the introduction of TV cards for PCs that also contains a teletext > > decoder, so there is a need to display the text and block graphics on > > PC. As far as I remember, the block graphic format is more or less the > > same as Viewdata with 2 columns and 3 rows per character cell, thus > > requiring 64 glyphs. > > > There are numerous mosaic graphics, Teletex, and similar character sets in > the ISO Register. Quite honestly, I have never even seen such a terminal > and do not feel qualified to propose how/if/when/whether this class of glyphs > should be handled in Unicode. The national norwegian teletext service on WWW is using a Teletex-font (nrkttv.ttf) when properly configured. 
http://www.nrk.no/teksttv (sorry, it's in Norwegian) Wouldn't it be nice to be able to cut and paste from such a window? You should also take a look on the corporate use subarea defined by Adobe Systems. http://www.adobe.com/supportservice/devrelations/typeforum/corporateuse.txt http://www.adobe.com/supportservice/devrelations/typeforum/unicodegn.html Some of your maths symbols, and probably others is covered by this range.. -- Bjorn Brox, CORENA Norge AS, http://www.corena.no/ Kirkegaardsvn. 45, P.O.Box 1024, N-3601 Kongsberg, NORWAY Phone: +47 32737435, Fax: +47 32736877, Mobile: +47 92638590 8-Oct-98 8:55:12-GMT,5092;000000000001 Return-Path: Received: from www.im.se (fw.im.se [193.14.22.222]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id EAA14769 for ; Thu, 8 Oct 1998 04:55:07 -0400 (EDT) Received: from imhps.im.se (imhps.im.se [192.36.35.5]) by www.im.se (8.9.1/8.9.1) with ESMTP id KAA07148; Thu, 8 Oct 1998 10:36:53 +0200 (METDST) Received: from msxsth1.im.se by imhps.im.se (1.37.109.16/IM-3.12) id AA106666867; Thu, 8 Oct 1998 10:54:27 +0200 Received: by msxsth1 with Internet Mail Service (5.5.2232.9) id ; Thu, 8 Oct 1998 10:52:29 +0200 Message-Id: From: Karlsson Kent - keka To: "'Frank da Cruz'" Cc: "'kenw@sybase.com'" , "'rmcgowan@apple.com'" , "'asmusf@ix.netcom.com'" , "'kbracey@acorn.com'" , "'Markus.Kuhn@cl.cam.ac.uk'" , "'cowan@locke.ccil.org'" Subject: RE: Terminal Graphics Draft 2 Date: Thu, 8 Oct 1998 10:52:00 +0200 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2232.9) Content-Type: text/plain; charset="iso-8859-1" > Rusty Harold, Paul Keinanen, Karlsson Kent, Rick McGowan, Kenneth > Whistler. > Well, it is Kent Karlsson (but so what...). > Math Symbols > Although most math symbols found on terminals are already in Unicode, > certain terminal-based applications rely on the ability to construct > large > symbols (integral and summation signs, braces, brackets) from smaller > character-cell-sized pieces. Section 6. 
> Some of these are used INTERNALLY in Knuth's TeX (and INTERNALLY in the now hopefully retired troff). Users CANNOT access these glyph pieces directly in these systems, if I remember correctly (nor would they want to). I don't know to what extent they may be considered to be "characters" in dvi files (which are never written by humans). > 4. HEX BYTES > > Hexadecimal byte values, 2 hex digits each, allow any 8-bit byte to be > displayed in hexadecimal in a single character cell (and therefore allow > any > Unicode character value to be displayed in two cells), > Well, if in UTF-16 one would need 2 OR 4 such per character. If in UTF-8 one would need from 1 to 4 such per character (assuming only UTF-16 space is used, otherwise UTF-8 can have up to 6 octets per character). > Table 5.0: Unicode Control Characters > > Code Val Name Description > E000 2000 NQ SP Symbol for En Quad > E001 2001 MQ SP Symbol for Em Quad > E002 2002 EN SP Symbol for En Space > E003 2003 EM SP Symbol for Em Space > E004 2004 3/M SP Symbol for Three-Per-Em-Space > E005 2005 4/M SP Symbol for Four-Per-Em-Space > E006 2006 6/M SP Symbol for Six-Per-Em-Space > E007 2007 F SP Symbol for Figure Space > E008 2008 P SP Symbol for Punctuation Space > E009 2009 TH SP Symbol for Thin Space > E00A 200A H SP Symbol for Hair Space > E00B 200B ZW SP Symbol for Zero-Width Space > E00C 200C ZW NJ Symbol for Zero-Width Non-Joiner > E00D 200D ZW J Symbol for Zero-Width Joiner > E00E 200E LRM Symbol for Left-to-Right Mark > E00F 200F RLM Symbol for Right-to-Left Mark > E010 2028 L SEP Symbol for Line Separator > E011 2029 P SEP Symbol for Paragraph Separator > E012 202A LRE Symbol for Left-to-Right Embedding > E013 202B RLE Symbol for Right-to-Left Embedding > E014 202C PDF Symbol for Pop Directional Formatting > E015 202D LRO Symbol for Left-to-Right Override > E016 202E RLO Symbol for Right-to-Left Override > E017 206A I SS Symbol for Inhibit Symmetric Swapping > E018 206B A SS Symbol for Activate Symmetric 
Swapping > E019 206C I AFS Symbol for Inhibit Arabic Form Shaping > E01A 206D A AFS Symbol for Activate Arabic Form Shaping > E01B 206E NA DS Symbol for National Digit Shapes > E01C 206F NO DS Symbol for Nominal Digit Shapes > E01D FEFF ZWN BSP Symbol for Zero Width No Break Space > E01E FFFE FF FE Symbol for Not A Character (Byte Order) (2) > E01F FFFE FF FF Symbol for Not A Character (2) > I think these have more room for glyph invention, since there is no need to be at all compatible with existing terminals. Like grey(ish) arrows, grey(ish) section mark or pilcrow, grey(ish) 'space box with annotation', etc., rather than letters. > Table 5.2: C1 Control Characters > > Code Val Name 2X Description > 80 80 (1) > 81 81 (1) > E022 82 BPH Symbol for Break Permitted Here (2) > E023 83 NBH Symbol for No Break Here (2) > Aren't these two 'control codes' the same as the Unicode characters Zero Width Space and Zero Width No-Break Space? > E024 84 IND IN Symbol for Index (3) > E025 85 NEL NL Symbol for Next Line > newline??? again? Regards /kent k 8-Oct-98 13:09:08-GMT,2066;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id JAA11405 for ; Thu, 8 Oct 1998 09:09:07 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id GAA30778 ; Thu, 8 Oct 1998 06:08:22 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA15702; Thu, 8 Oct 98 06:03:47 -0700 Message-Id: <9810081303.AA15702@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6107 (1998-10-08 13:02:16 GMT) From: "Hart, Edwin F." Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 06:02:15 -0700 (PDT) Subject: RE: Terminal Graphics Proposal I can see value for encoding the paired hex digits (00 to FF) in the proposal. 
However with appropriate rendering software, I could also see having them merely as a glyph variant for rendering. I might compare this to encoding the Braille script. 1. Two of the glyphs could be used to display a 16-bit Unicode character (e.g., for debugging or for displaying an unknown character) 2. Protocol analyzers for communications and LANs use these glyphs to display captured data. (Today, the Network Associates (formerly, Network General) Sniffer is perhaps the most widely recognized device. 20 years ago, it was the Spectron DataScope. Both of these are likely trademarks.) They have 2 display modes, hex and text (typically ASCII, EBCDIC). In my youth, these devices used hardware fonts in ROM and TV-resolution CRTs. Now, these devices tend to be PCs or computers with embedded software. If the manufacturers of such equipment want to display characters beyond the 7-bit ASCII set, Unicode is the natural choice. 3. The glyphs are used for debugging communication problems and software problems with the "real" terminals (rather than PCs emulating the terminals). Ed Hart 8-Oct-98 13:27:17-GMT,1078;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id JAA15084; Thu, 8 Oct 1998 09:27:15 -0400 (EDT) Date: Thu, 8 Oct 98 9:27:15 EDT From: Frank da Cruz To: rmcgowan@apple.com Subject: Re: Terminal Graphics Draft 2 In-Reply-To: Your message of Wed, 7 Oct 1998 18:06:29 -0700 Message-ID: > Thanks Frank for working on the next draft. But PLEASE, in future *DO NOT* > post 59.1k worth of draft document to the Unicode list!!! I get enough > megabytes of e-mail per day. And some people pay for downloading and > connect-time. > > A pointer to: > > ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt > > is sufficient. Anyone out there without access to either a web browser OR > ftp should ask for a copy of the draft via private e-mail. > OK. 
That was my original idea (I suggested keeping the discussion private to spare the bulk of Unicode readers, but nobody seemed to want it that way). I'll post pointers from now on. - Frank 8-Oct-98 16:28:04-GMT,1105;000000000001 Return-Path: Received: from inergen.sybase.com (inergen.sybase.com [192.138.151.43]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id MAA08787 for ; Thu, 8 Oct 1998 12:28:03 -0400 (EDT) Received: from smtp1.sybase.com (sybgate.sybase.com [130.214.220.35]) by inergen.sybase.com (8.8.4/8.8.4) with SMTP id JAA17259; Thu, 8 Oct 1998 09:29:17 -0700 (PDT) Received: from birdie.sybase.com by smtp1.sybase.com (4.1/SMI-4.1/SybH3.5-030896) id AA16596; Thu, 8 Oct 98 09:28:05 PDT Received: by birdie.sybase.com (5.x/SMI-SVR4/SybEC3.5) id AA19822; Thu, 8 Oct 1998 09:27:56 -0700 Date: Thu, 8 Oct 1998 09:27:56 -0700 From: kenw@sybase.com (Kenneth Whistler) Message-Id: <9810081627.AA19822@birdie.sybase.com> To: keka@im.se Subject: RE: Terminal Graphics Draft 2 Cc: fdc@watsun.cc.columbia.edu, kenw@sybase.com X-Sun-Charset: US-ASCII > > E024 84 IND IN Symbol for Index (3) > > E025 85 NEL NL Symbol for Next Line > > > newline??? again? I agree with Kent. Shouldn't this simply be: U+2424 SYMBOL FOR NEWLINE ? 
--Ken 8-Oct-98 18:33:27-GMT,2068;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA18263 for ; Thu, 8 Oct 1998 14:33:25 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA43790 ; Thu, 8 Oct 1998 10:10:50 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA16630; Thu, 8 Oct 98 09:57:59 -0700 Message-Id: <9810081657.AA16630@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6109 (1998-10-08 16:57:33 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 09:57:32 -0700 (PDT) Subject: RE: Terminal Graphics Draft 2 > > > E024 84 IND IN Symbol for Index (3) > > > E025 85 NEL NL Symbol for Next Line > > > > > newline??? again? > > I agree with Kent. Shouldn't this simply be: > > U+2424 SYMBOL FOR NEWLINE > That depends on what the (unstated) semantics are for U+2424. I expect it simply represents a "line terminator", like LF in UNIX, CR on the Macintosh, or CRLF in DOS. NEL stands for Next Line (not Newline). The definition of NEL in ISO 6429 [8.3.87] is rather complex: "The effect of NEL depends on the setting of the DEVICE COMPONENT SELECT MODE (DCSM) and on the parameter value of SELECT IMPLICIT MOVEMENT DIRECTION (SIMD)." Several paragraphs go on to explain this. Confusingly, terminals that support C1 controls and that also use 2-character abbreviations for them abbreviate NEL as NL. However, the VT220 class of terminals actually puts "NEL" on the screen in display-controls mode. All this in contrast to EBCDIC, which defines an actual NL character, which I doubt carries the ISO NEL semantics. - Frank P.S. Sorry for getting your name backwards, Kent. And for omitting Geoffrey Waigh from the acknowledgements. Both errors fixed in my working copy. Also, no more posting long drafts; just pointers from now on. 
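
The distinction drawn above between the C1 control NEL (0x85) and the
existing U+2424 SYMBOL FOR NEWLINE can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the helper names
control_picture and show_controls are hypothetical, the mapping covers only
the C0 controls and DEL that already have control pictures in U+2400..U+2421,
and Python's treatment of U+0085 as a line boundary is one implementation's
choice, not a statement of the ISO 6429 semantics.

```python
import unicodedata

NEL = "\u0085"                 # C1 control NEXT LINE (value 0x85)
SYMBOL_FOR_NEWLINE = "\u2424"  # existing control picture mentioned above


def control_picture(ch: str) -> str:
    """Return the existing control picture for a C0 control or DEL.

    Hypothetical helper: any other character, including C1 controls
    such as NEL, is returned unchanged, since the Control Pictures
    block has no dedicated picture for it.
    """
    code = ord(ch)
    if code < 0x20:      # C0 controls map to U+2400..U+241F
        return chr(0x2400 + code)
    if code == 0x7F:     # DEL maps to U+2421
        return "\u2421"
    return ch


def show_controls(text: str) -> str:
    """Render text roughly as a 'display controls' mode would (sketch)."""
    return "".join(control_picture(c) for c in text)


if __name__ == "__main__":
    print(show_controls("a\r\nb"))
    print(unicodedata.name(SYMBOL_FOR_NEWLINE))
    print(unicodedata.category(NEL))
```

With these definitions, show_controls("a\r\nb") yields the pictures for CR
and LF between "a" and "b", while a string containing NEL passes through
unchanged, which is the gap the proposed C1 control pictures would fill.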
8-Oct-98 18:55:44-GMT,765;000000000001 Return-Path: Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA24567 for ; Thu, 8 Oct 1998 14:55:40 -0400 (EDT) Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id <41YQFQF6>; Thu, 8 Oct 1998 14:55:38 -0400 Message-ID: <91D1D51C2955D111B82B00805F19989501CD7119@aples2.jhuapl.edu> From: "Hart, Edwin F." To: "'Frank da Cruz'" Subject: RE: Terminal Graphics Draft 2 Date: Thu, 8 Oct 1998 14:55:36 -0400 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2232.9) Content-Type: text/plain Thanks for championing this effort. It seems to be going very well. Ed 8-Oct-98 19:01:11-GMT,751;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id PAA26296; Thu, 8 Oct 1998 15:01:05 -0400 (EDT) Date: Thu, 8 Oct 98 15:01:05 EDT From: Frank da Cruz To: "Hart, Edwin F." Subject: RE: Terminal Graphics Draft 2 In-Reply-To: Your message of Thu, 8 Oct 1998 14:55:36 -0400 Message-ID: > Thanks for championing this effort. > > It seems to be going very well. > Thanks for saying so, Ed (and good to hear from you). Yup, we old timers have to stick up for what's right :-) Maybe a few more USS Yorktown incidents will get more people longing for the good old days when things actually worked... - Frank 8-Oct-98 19:23:24-GMT,2765;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id PAA03172; Thu, 8 Oct 1998 15:23:22 -0400 (EDT) Date: Thu, 8 Oct 98 15:23:22 EDT From: Frank da Cruz To: "Hart, Edwin F." Subject: RE: Terminal Graphics Draft 2 In-Reply-To: Your message of Thu, 8 Oct 1998 15:11:38 -0400 Message-ID: > This is how the work really gets done. Someone with the need and the > expertise gets the ball rolling. I must say that I was impressed by the > depth of your first draft. Doing Kermit did not hurt. 
>
Yes, I learned long ago that the person who cares -- and can write it
down -- can usually get it done.

> This seems like an issue that SHARE needs to support given all of the legacy
> systems in use by our organizational members.
>
Don't say "legacy"! I hate that. It means "Not Microsoft Windows and so
deserves to be ground into fine powder at the earliest opportunity but
because we are so stupid and lazy we can't do it yet, please don't hate us,
we are so ashamed." (Seriously, my personal mission is to expunge that word
from the computing lexicon.)

> Since we swapped our
> mainframe for lots of VMS and NT systems, I don't go to SHARE anymore. It's
> nice to have an issue where I know that SHARE has a definite interest in the
> outcome. I believe that you have made the point about the need for terminal
> emulation and the key players on the UTC accept the argument. If you can
> sell Rick McGowan, Ken Whistler, and Asmus Freytag, the rest of the UTC will
> accept the proposal.
>
Then I guess it looks good, since they are mainly quibbling about individual
characters and not the entire idea.

> BTW, pardon my ignorance, but what was the Yorktown incident?
>
The US Navy has a "smart ship" program, meaning everything is controlled by
computer. The USS Yorktown is a guided missile frigate entirely controlled
by a network of PCs running Windows NT (installed, naturally, over the
vigorous objections of the technical people). According to a front page
article in Government Computer News, the network froze and the ship turned
itself off, engine, rudder, and all. No amount of prodding would bring it
back to life. It had to be ignominiously towed back to port. Evidently this
has happened more than once. The fact that no missiles were launched is a
pretty lucky break.

Sigh. Microsoft evidently has a pretty cozy deal with the US government --
NT is the only platform that any government installation can just buy,
without any sort of approval.
Anything else -- mainframes, UNIX, VMS, etc -- requires mountains of paperwork, RFPs, RFQs, sealed bids, etc etc. Oh well, don't get me started :-) - Frank 8-Oct-98 19:27:09-GMT,834;000000000001 Return-Path: Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA04303 for ; Thu, 8 Oct 1998 15:27:09 -0400 (EDT) Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id <41YQFQK2>; Thu, 8 Oct 1998 15:27:02 -0400 Message-ID: <91D1D51C2955D111B82B00805F19989501CD711C@aples2.jhuapl.edu> From: "Hart, Edwin F." To: "'Frank da Cruz'" Subject: RE: Terminal Graphics Draft 2 Date: Thu, 8 Oct 1998 15:27:01 -0400 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2232.9) Content-Type: text/plain; charset="iso-8859-1" Thanks for the feedback. I had not heard of the Yorktown. I'll try to avoid the term, legacy. : ) Best regards, Ed 8-Oct-98 19:42:48-GMT,6418;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA10390 for ; Thu, 8 Oct 1998 15:42:34 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA23382 ; Thu, 8 Oct 1998 11:24:11 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA17365; Thu, 8 Oct 98 10:52:54 -0700 Message-Id: <9810081752.AA17365@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6112 (1998-10-08 17:51:24 GMT) From: Rick McGowan Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 10:51:23 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Frank -- I reviewed the latest draft and have more comments... > ... All proposed characters have Combining Class 0 (although > some of the characters are designed to "combine" (connect) with other > characters in adjacent cells). You might re-word the above a bit: ... 
(although some of the corresponding glyphs must be designed to
"combine" (connect) with other glyphs in adjacent display cells).

> Digital VT220 and higher terminals, as well as Televideo, Wyse, HP, Perkin
> Elmer, and other models, allow the user to select whether control characters
> are acted upon or displayed graphically. Unicode itself includes its own

Well, to my mind that indicates that these aren't NEEDED to be encoded at
all! Just set the terminal emulator into the "display controls" mode and let
it display the glyphs that the emulator has for the control codes. They
should not need to be encoded, since they're merely a variant representation
that the terminal does internally. It's unfortunate that my argument is
weakened by the fact that we already have a bunch of control pix encoded...

I do have a problem generally with adding "picture" characters to correspond
to existing things that are Unicode-specific. For instance, I see it as just
pointless to add "Symbol for En Quad" or "Symbol for Right-to-Left Override"
or other such things, unless you can show that this and other such codes are
absolutely necessary for supporting the emulation of these Actual Physical
Terminals. Of course you say:

> (1) There is no known need for these symbols when emulating current
> terminals. In the future, if/when terminals are based on Unicode, they
> might be useful in that context. In the meantime, makers of word
> processors, Web browsers, etc, might have a use for these glyphs.

So, it's my opinion that we should note the possible use, and move on.
I.e., don't propose adding them now on the off chance that *someone* might
have a use for them. Wait for a use. It's perfectly possible in
implementations to have a "show control" mode that shows controls as glyphs,
without having the pictures for them encoded as characters.

So if characters aren't in support of the graphical requirements for Actual
Physical Terminals, they should not be proposed.
Or should be proposed separately.

((Here's a little side-bar... It's sometimes desirable to separate things
into independent proposals so that characters which appear to be
"non-controversial" or less controversial can be put into one proposal and
the controversial stuff in another. That way, when committees look at them
"formally" and vote, things move more quickly. In practice, this can lead to
forward progress in pieces rather than multiple rounds back into the draft
stage for an entire set of stuff. This happened recently with Tibetan
extensions that were approved for addition. The ad-hoc group of experts
removed everything controversial and quickly came to consensus on an agreed
set for immediate proposal. If they had waited until the last bit of
controversy were resolved for a few items, they would still not have a
proposal today.))

I guess it would be nice to see this document broken into two really major
sections -- one is an analysis of the existing controls, with
recommendations about usage and mappings to character sets for popular
Actual Physical Terminals, as best you're able to determine. The other
section would be proposed additions.

> Table 5.2: C1 Control Characters

Table 5.2 is particularly valuable information of the "here's what exists"
variety... and given the widespread use of ISO-6429 controls, it is probably
worth adding these. You also say in the notes "ISO-6428". Is that different
from 6429? Or just a typo?

> 5.3. EBCDIC Control Pictures

Likewise, this is valuable information. It would be good to somehow call
out the proposed additions, perhaps by putting an asterisk before or after
the names. Because they're in EBCDIC order I found it a bit hard to discern
precisely which are proposed additions.

Someone from IBM should look at the 3270 stuff... I suppose someone will do
so.

Another thing that should be discussed is when adding "symbol for foo" one
should also add "foo" itself.
For instance, there is no "Start of Field" control character; but a picture
of it is being proposed. Probably UTC needs to hash through *that* issue...

> Table 6.1: Math Symbols for Terminals

You should look at the glyph pieces in the Adobe Symbol font, which is a
widely used font. Many of these are contained in the Symbol font (0xE6 to
0xFE inclusive).

I believe the following two characters are just masculine and feminine
ordinal indicators, and are already encoded between 0x80 and 0xFF, as part
of ISO Latin 1. They are probably just variant glyphs... unless the
documentation distinguishes them and they occur in pairs with lower-case.
Do you mean "small" or "capital"? Or are they really different?

> E0B3 Latin small letter a with underbar SNI Math 04/04 (2)
> E0B4 Latin capital letter O with underbar SNI Math 04/09 (2)

By the way, I'm opposed quite strongly to adding the 256 "hex bytes" under
any circumstances. Good thing they're an independent proposal.

The total proposed, including Hex Bytes, is 448. Without Hex Bytes, it's a
modest 192, and I think it could be reduced with a little more unification.
Of course reduction will offset the expected increase due to other terminals
clamoring to be included...

Rick

8-Oct-98 20:29:17-GMT,1895;000000000001
Return-Path:
Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151])
	by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id QAA23231
	for ; Thu, 8 Oct 1998 16:29:16 -0400 (EDT)
Received: from unicode.org (unicode2.apple.com [17.254.3.212])
	by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id MAA06860
	; Thu, 8 Oct 1998 12:55:10 -0700
Received: by unicode.org (NX5.67g/NX3.0S)
	id AA18332; Thu, 8 Oct 98 12:46:59 -0700
Message-Id: <9810081946.AA18332@unicode.org>
Errors-To: uni-bounce@unicode.org
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Uml-Sequence: 6114 (1998-10-08 19:46:44 GMT)
From: "Hart, Edwin F."
Reply-To: unicode@unicode.org To: Unicode List Cc: "'Unicode List'" Date: Thu, 8 Oct 1998 12:46:43 -0700 (PDT) Subject: RE: Terminal Graphics Draft 2 What should the "customary and familiar mnemonic" be? One of my concerns is that the names for the EBCDIC controls seemed to vary from device to device and with different editions of the IBM "green card" (System/360 Reference Summary) (and "yellow card" and "pink card"). I'm unsure how much of this has solidified and/or disappeared with IBM's SNA devices because I have not looked at any of this in over 10 years. Ed Hart ---------- From: Frank da Cruz [SMTP:fdc@watsun.cc.columbia.edu] Sent: 08 October, 1998 13:37 To: Unicode List Subject: RE: Terminal Graphics Draft 2 . . . I'm sure we could also find other examples of control characters in the C1 and EBCDIC sets whose semantics are the same or close but whose names differ; I don't think that means we should unify them. The purpose of "display controls" is to show the customary and familiar mnemonic for each control character in its context so people can read them easily. - Frank 8-Oct-98 20:52:59-GMT,1934;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id QAA01890 for ; Thu, 8 Oct 1998 16:52:58 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id NAA30836 ; Thu, 8 Oct 1998 13:18:15 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA18816; Thu, 8 Oct 98 13:09:10 -0700 Message-Id: <9810082009.AA18816@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6115 (1998-10-08 20:09:01 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Cc: unicode@unicode.org Date: Thu, 8 Oct 1998 13:09:00 -0700 (PDT) Subject: RE: Terminal Graphics Draft 2 > What should the "customary and familiar mnemonic" be? 
> > One of my concerns is that the names for the EBCDIC controls seemed to vary > from device to device and with different editions of the IBM "green card" > (System/360 Reference Summary) (and "yellow card" and "pink card"). I'm > unsure how much of this has solidified and/or disappeared with IBM's SNA > devices because I have not looked at any of this in over 10 years. > I took the EBCDIC names from the current reference, and do indeed note the fact that the names have changed over the years, and include a table of original names for reference. Is there any sentiment in favor of actually encoding the old names? By the way, I also omitted the mnemonics of numerous special-purpose control sets found in the ISO Register since, to my knowledge, no terminal ever displays these mnemonics in "display controls" mode, nor does any kind of protocol analyzer or data scope. I think people who look at "display controls" screens will be satisfied with the proposed familiar set of mnemonics. But I could be wrong. - Frank 8-Oct-98 21:19:51-GMT,6421;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA09529 for ; Thu, 8 Oct 1998 17:19:47 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA10974 ; Thu, 8 Oct 1998 14:18:08 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA19370; Thu, 8 Oct 98 14:07:51 -0700 Message-Id: <9810082107.AA19370@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6117 (1998-10-08 21:07:34 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 14:07:31 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Rick McGowan wrote: > Frank -- I reviewed the latest draft and have more comments... > I appreciate it, thanks. 
> I do have a problem generally with adding "picture" characters to > correspond to existing things that are Unicode-specific. > ... > So, it's my opinion that we should note the possible use, and move on. > I.e., don't propose adding them now on the off chance that *someone* might > have a use for them. Wait for a use. > Fine with me! > Table 5.2 is particularly valuable information of the "here's what exists" > variety... and given the widespread use of ISO-6429 controls, it is probably > worth adding these. You also say in the notes "ISO-6428". Is that > different from 6429? Or just a typo? > It's a typo; thanks for spotting it. > > 5.3. EBCDIC Control Pictures > > Likewise, this is valuable information. It would be good to somehow call > out the proposed additions, perhaps by putting an asterisk before or after > the names. Because they're in EBCDIC order I found it a bit hard to discern > precisely which are proposed additions. > I suppose the proposal is rather dense -- the inevitable tug-of-war between saying everything everywhere, thus making it so long nobody will read it, and presuming it is read from top to bottom so everything is explained in advance but must be remembered (the topological sort). The marking of new additions is in the left ("Code") column. If the code is Exxx it is to be added; otherwise it is already in Unicode (usually in the U+2xxx's). But OK, I'll try to highlight them better. > Someone from IBM should look at the 3270 stuff... I suppose someone will > do so. > I was hoping for some feedback from the IBM mainframe camp too; not just 3270 users, but also those who analyze and debug 3270 data streams. If any readers happen to know people outside this group who might be interested, please feel free to forward the proposal to them. > Another thing that should be discussed is when adding "symbol for foo" one > should also add "foo" itself.
For instance, there is no "Start of Field" > control character; but a picture of it is being proposed. Probably UTC > needs to hash through *that* issue... > Oh what a tangled web we weave... I think in this case we have an exception to the rule. I think we can say that Unicode is ISO/ASCII based rather than EBCDIC based. The structure of U+0000 through U+00FF is identical with ASCII (= ISO 646 International Reference Version) + ISO 8859-1, with the layout of ISO 4873 (C0, GL, C1, GR). The C0 control set is, indeed, the ASCII C0 set (and that of ISO 646; ISO Registry number #001). Granted, the C1 area is left unspecified, but what else could it be but that of ISO 6429? I think it would be pretty weird (note: this is how we spell "weird" this week...) to add EBCDIC controls to an ISO/ASCII based character set. Personally, I'd rather leave them out and use the positions they would occupy for something more useful. But the *symbols* for them do need encoding, since we will be using Unicode-based software to analyze EBCDIC and/or 3270 data streams (wire bearing EBCDIC comes into PC, which uses Unicode internally). However, I would heartily welcome review by IBM or other EBCDIC/3270-centric party of the specific repertoire of glyphs in the proposal. > You should look at the glyph pieces in the Adobe Symbol font, which is a > widely used font. Many of these are contained in the Symbol font (0xE6 to > 0xFE inclusive). > All the more reason to add them to Unicode. Another, as Kent Karlsson pointed out earlier today, is that they are used in TeX (see the original TeX and METAFONT book, p.175: TeX Standard Extension Fonts). > I believe the following two characters are just masculine and feminine > ordinal indicators, and are already encoded between 0x80 and 0xFF, as part > of ISO Latin 1. They are probably just variant glyphs... unless the > documentation distinguishes them and they occur in pairs with lower-case. > Do you mean "small" or "capital"? 
Or are they really different? > > > E0B3 Latin small letter a with underbar SNI Math 04/04 (2) > > E0B4 Latin capital letter O with underbar SNI Math 04/09 (2) > Well, "small" means lowercase; "capital" actually means "big" -- who can tell with an "O"! Hopefully I'll be able to post GIFs of scanned pages soon; that'll be a day's work! The reason these need to be encoded separately from feminine/masculine ordinals is their size -- they fill the whole cell, like a regular letter. Since terminal emulators and data analyzers use fixed-pitch fonts, we can't just switch to another point size to display these characters, since that will wreck the matrix arrangement of the screen. > By the way, I'm opposed quite strongly to adding the 256 "hex bytes" under > any circumstances. Good thing they're an independent proposal. > I certainly would not want to see them hold up the rest. > The total proposed, including Hex Bytes is 448. Without Hex Bytes, it's a > modest 192, and I think it could be reduced with a little more unification. > Of course reduction will offset the expected increase due to other > terminals clamoring to be included... > Yes, I see the mobs starting to form on the street below, waving placards emblazoned with vertical lightnings with solidi; diagonal lightnings with horizontal bars; European no-parking signs; Canadian moose-crossing signs... Seriously, the hex bytes are entirely separable from the rest. I'll be glad to cut them loose unless somebody speaks up strongly in their favor. Thanks again!
- Frank 8-Oct-98 21:31:49-GMT,6665;000000000011 Return-Path: Received: from orpheus.amdahl.com (orpheus.amdahl.com [159.199.101.3]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with SMTP id RAA13325 for ; Thu, 8 Oct 1998 17:31:47 -0400 (EDT) Received: from minerva.amdahl.com([129.212.33.25]) (6252 bytes) by orpheus.amdahl.com via sendmail with P:smtp/R:match-mx-hosts/T:smtp (sender: ) id for ; Thu, 8 Oct 1998 14:31:40 -0700 (PDT) (Smail-3.2.0.102 1998-Aug-2 #1 built 1998-Aug-14) Received: from libra by minerva.amdahl.com with smtp (Smail3.1.29.1 #5) id m0zRNdh-0001THC; Thu, 8 Oct 98 14:30 PDT Message-Id: From: "Tony Harminc" To: Frank da Cruz Date: Thu, 8 Oct 1998 17:31:54 -0400 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Re: Terminal Graphics Draft 2 Priority: normal In-reply-to: <9810080053.AA13270@unicode.org> X-mailer: Pegasus Mail for Win32 (v3.01a) Just a few very minor comments. In general my comments are not meant to be inserted as text - they're for your information. I've left headings in to help identify the places. > This document represents a survey of the following terminals: > IBM 3164 and 3270 [15,16,27] Really, "3270" is not a terminal, i.e. there has never been a device made by IBM with that model number. Rather, 3270 is an architecture, with a large number of IBM terminals having been made that conform in varying degrees to its specifications. Typical 3270 model numbers are 3277 (the earliest implementation c. 1971), 3278 (the thing that most people think of when they think of a "real" 3270, c. 1977), and 3178 (a simpler, cheaper version c. 1982). > 3.1. Temporary Reference Code Assignments > > The characters proposed in this document are assigned temporary Unicode > values from the Private Use area, strictly for reference within (or to) > this document only. Final values should be assigned out of the Private ^^^^^^ Should probably read "outside of" to avoid ambiguity > 5.3. 
EBCDIC Control Pictures > Table 5.3 shows the EBCDIC control characters [29], in EBCDIC order. The > Code column shows the Unicode value; those starting with 24 are already in > Unicode block U+2400; those starting with E need to be added. The Val > column shows the EBCDIC value (hex). The Name column shows the EBCDIC > abbreviation for the code, and the description lists "Symbol for" plus the > EBCDIC name. There are no known "2X" forms in use. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I don't understand what this means. Are you saying that none of the EBCDIC control characters in the range X'20'-X'2F' are in use ? This is certainly not true. It probably means something else, but it's not obvious to me. > 5.5. 3270 Terminal Operator Status Indicators > > The IBM 3270 terminal displays a variety of unique glyphs in its Operator > Information Area [15, Figure A-4]. Although they are not encoded in any IBM > character set (known to me), they nevertheless appear on the screen, and are > therefore required for accurate terminal emulation. These glyphs are listed > in Table 5.5. In particular, they are not assigned GCGIDs in [29 as updated]. > Table 5.5: 3270 Terminal Operator Status Indicators > > Code Description > E080 Human stick figure > E081 Human stick figure in box > E082 Clock at 6:10 (or 1:30) Oops - I think I meant 2:30. :-) The hands are the same length. As for the double vs single width issue, I did look at some "real" 3270s today, unfortunately made by Memorex/Telex (who were never known for great faithfulness to the IBM model). They have a very tall, thin, squished looking clock that is clearly a single cell. One PC-based terminal emulator I have is TCP3270 from McGill University (since sold to Hummingbird Communications), and it ships a font with the two clock halves in separate characters in order to get a satisfactorily round clock face of legible size. It winds up being a 1 1/2 width character. 
> E083 White rectangle with stroke (1) > E084 Black rectangle with stroke (2) > E085 Lightning with stroke (3) > E086 Security key (4) > E087 Black and White Right-Pointing Triangles (5) Elsewhere it was suggested that the 4 and 6 in boxes were just inverse video characters; I think they are different. In particular, if we have a "white" numeral, then the surrounding box is also "white", and the background inside the box is "black". > Notes: > (1) A rectangle like the one at U+25AD with an oblique stroke through it. > Note that "white" and "black" are used in the sense of the Unicode > standard, and do not imply any particular colors or measure of goodness. > (2) A rectangle like the one at U+25AC with an oblique stroke through it. > (3) A horizontal lightning symbol with an oblique stroke through it. > (4) A picture of a key (indicating the keyboard is locked). This should not be unified with other lock or key-like symbols, in particular with the locked padlock commonly used to indicate shift lock. (This isn't in Unicode, I believe, but I think is part of Alan LaBonte's keyboard standards, and so might get in via that route.) This one is a key (rather than a lock) to show that a (physical) key is needed to use the terminal. > 10. REFERENCES > [13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie, > NY, 1970. I would ditch the reference to the S/360 POO - it's been pretty much obsolete since 1970 or so. Fine for a historic reference, but I think [29] (as updated) is the better ref. > [14] IBM National Language Design Guide, Volume 2: National Language > Support Reference Manual, 4th Edition, North York, ON, 1994. (Order number SE09-8002-03) > [29] IBM Character Data Representation Architecture, Level 1 Registry, > IBM Canada Ltd., National Language Technical Centre, Ontario, > SC09-1391-00, 1990.
The above publication is obsolete, and is replaced by: IBM Character Data Representation Architecture, Registration and Registry, IBM Canada Ltd., Toronto, SC09-2190-00, 1995. (This is a 300 page book also containing two CD-ROMs.) Thanks for doing all this work. I hope the views of the likes of Michael Everson ("Unicode will be in use for centuries" with the implication that all these silly terminal emulations are just dinosaurs) will not prevail. Cheers... Tony H. 8-Oct-98 21:52:29-GMT,2360;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA20099 for ; Thu, 8 Oct 1998 17:52:27 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA41846 ; Thu, 8 Oct 1998 14:48:19 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA19672; Thu, 8 Oct 98 14:28:58 -0700 Message-Id: <9810082128.AA19672@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6118 (1998-10-08 21:27:30 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 14:27:29 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 I neglected to answer this one... > > Digital VT220 and higher terminals, as well as Televideo, Wyse, HP, Perkin > > Elmer, and other models, allow the user to select whether control > > characters are acted upon or displayed graphically. Unicode itself > > includes its own ... > > Well, to my mind that indicates that these aren't NEEDED to be encoded at > all! Just set the terminal emulator into the "display controls" mode and > let it display the glyphs that the emulator has for the control codes. > Good point. 
Indeed, this character set, unlike others in the VT220 (and higher), cannot be selected by the host using ISO 2022 escape sequences, which makes sense -- the familiar "transparent mode" conundrum -- once entered, how then to exit, since everything is transparent? However, other terminals that do not comply with ISO 2022 rules, such as the Wyse 60, do allow the host to command them into "display controls" mode. In any case, the control-picture symbols must be encoded because we're concerned not only about the terminal emulator but also the applications it must interact with, as in: User: "Help, my screen is messed up!" Help desk: "OK, click on Debug in the Terminal window menu bar and repeat what you did before." User: "Now my screen is REALLY messed up!" Help desk: "Let's have a look. Please use your mouse to copy it and paste it into your email window and send it to us." This is, of course, looking forward to the day when All Is Unicode... - Frank 8-Oct-98 22:14:17-GMT,2992;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id SAA25881 for ; Thu, 8 Oct 1998 18:14:16 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id PAA15352 ; Thu, 8 Oct 1998 15:11:49 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA19887; Thu, 8 Oct 98 14:44:41 -0700 Message-Id: <9810082144.AA19887@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6119 (1998-10-08 21:43:01 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 14:43:00 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 John Cowan wrote: > I agree with Rick: don't propose characters that someone might need > someday. There are quite enough characters, and indeed whole > scripts, that are not yet available! > > > 2421 7F DEL DT Symbol for Delete (3) > > [...]
> > > (3) Not, strictly speaking, a control character, but not a visible > > one either. > > DEL is a control character in every sense, despite its position at 7F. > It depends who you ask. ISO 6429, 3rd Edition, 1992, says (in Annex F, section 8.1) "The character DELETE..., not being a control function in the strict sense, has been removed from the body of this International Standard." DELETE and SPACE are special to ISO 4873 and ISO 2022, in which "character sets" (as we think of them, monolithically) are actually composed of a control portion (C0 or C1), SPACE (or not), a graphics portion, and DELETE (or not). SPACE is a character set unto itself, and so is DELETE. If one is present, the other must be too, in which case the graphics set is a 94-byte character set (with a little "byte" taken out of the northwest and southeast corner), and this is crucial to its identification. When SPACE and DELETE are not present, it is a 96-byte set (such as "the right half of ISO-8859 Latin Alphabet 1"). > > 5.3. EBCDIC Control Pictures > > Note: the EBCDIC/Unicode mapping tables at the Unicode FTP site > map the EBCDIC-specific controls onto the C1 space, but the mapping > seems to make no sense. For example, EBCDIC 09 (Superscript) > is mapped to Unicode 008D (Reverse Line Feed). Why? > IBM provides its own EBCDIC / ASCII control-character mapping in the CDRA. Of course it is inadequate as there are 64 EBCDIC control characters (three undefined), but there are only 32 in ASCII. In any case, we don't care about ASCII/EBCDIC mapping here. The need is for glyphs for visual representation of each EBCDIC control character, by name so people who live in the EBCDIC / 3270 world who must debug EBCDIC and/or 3270 data streams using Unicode-based software will be able to see these control characters represented by the names they are known by in the EBCDIC/3270 world. 
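For the ASCII side of this discussion, a rough sketch of what a "display controls" rendering looks like using only the control pictures already in Unicode block U+2400 may be useful. It is written in Python; the function name show_controls is made up for illustration and is not taken from any emulator. It covers the C0 set and DEL only. EBCDIC is deliberately left out, since the symbols proposed for it have only temporary reference codes in the draft.

```python
def show_controls(data: bytes) -> str:
    """Render a byte stream with C0 controls and DEL shown as control pictures.

    Assumption: the input is ASCII or Latin-1; bytes from 0x80 up are passed
    through as Latin-1 code points (C1 controls are not pictured here).
    """
    out = []
    for b in data:
        if b < 0x20:
            # U+2400..U+241F are the pictures for the C0 controls, in order.
            out.append(chr(0x2400 + b))
        elif b == 0x7F:
            # U+2421 is SYMBOL FOR DELETE.
            out.append(chr(0x2421))
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    print(show_controls(b"AT\r\nOK\x7f"))
```

Because the result is ordinary Unicode text, it can be copied out of the terminal window and pasted into a mail message, which is the help-desk scenario the control pictures are meant to serve.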
- Frank 8-Oct-98 23:08:25-GMT,7301;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id TAA10645 for ; Thu, 8 Oct 1998 19:08:24 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id QAA41818 ; Thu, 8 Oct 1998 16:02:17 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA20721; Thu, 8 Oct 98 15:56:45 -0700 Message-Id: <9810082256.AA20721@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6120 (1998-10-08 22:56:30 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 15:56:28 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 "Tony Harminc" wrote: > Just a few very minor comments. In general my comments are not meant > to be inserted as text - they're for your information. I've left > headings in to help identify the places. > I appreciate them, many thanks! > > This document represents a survey of the following terminals: > > > > IBM 3164 and 3270 [15,16,27] > > Really, "3270" is not a terminal, i.e. there has never been a device > made by IBM with that model number. Rather, 3270 is an architecture... > Right -- sloppy wording again. > with a large number of IBM terminals having been made that conform in > varying degrees to its specifications. Typical 3270 model numbers > are 3277 (the earliest implementation c. 1971), 3278 (the thing that > most people think of when they think of a "real" 3270, c. 1977) > The one that weighs about 700 pounds :-) (318 Kg) (Actually it's not so heavy compared to the 2741...) > > 5.3. EBCDIC Control Pictures > ... > > EBCDIC name. There are no known "2X" forms in use. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > I don't understand what this means. Are you saying that none of the > EBCDIC control characters in the range X'20'-X'2F' are in use ? This > is certainly not true. 
It probably means something else, but it's > not obvious to me. > Sorry, sequential reading required again :-) You probably skipped directly to the EBCDIC sections. "2X" was my shorthand for 2-character abbreviations for 3-character mnemonics, such as used in the Display Controls font of Wyse and Televideo terminals (see Section 5.1). > > Table 5.5: 3270 Terminal Operator Status Indicators > > > > Code Description > > E080 Human stick figure > > E081 Human stick figure in box > > E082 Clock at 6:10 (or 1:30) > > Oops - I think I meant 2:30. :-) > Oops, right. > As for the double vs single width issue, I did look at some "real" > 3270s today, unfortunately made by Memorex/Telex (who were never > known for great faithfulness to the IBM model). They have a very > tall, thin, squished looking clock that is clearly a single cell. > One PC-based terminal emulator I have is TCP3270 from McGill > University (since sold to Hummingbird Communications), and it ships a > font with the two clock halves in separate characters in order to get > a satisfactorily round clock face of legible size. It winds up being > a 1 1/2 width character. > Do you think this is worth worrying about? We certainly have a lot of glyphs in Unicode that are more complex than this but, presumably, fit in a single cell. > > E083 White rectangle with stroke (1) > > E084 Black rectangle with stroke (2) > > E085 Lightning with stroke (3) > > E086 Security key (4) > > E087 Black and White Right-Pointing Triangles (5) > > Elsewhere it was suggested that the 4 and 6 in boxes were just > inverse video characters; I think they are different. In particular, > if we have a "white" numeral, then the surrounding box is also > "white", and the background inside the box is "black". > Do you think it matters? We need to conserve code points whenever possible; in this case it would seem to me that no information is lost by displaying full-cell inverse video digits. I should probably have a look at a real 327x...
> > (4) A picture of a key (indicating the keyboard is locked). > > This should not be unified with other lock or key-like symbols, in > particular with the locked padlock commonly used to indicate shift > lock. (This isn't in Unicode, I believe, but I think is part of Alan > LaBonte's keyboard standards, and so might get in via that route.) > This one is a key (rather than a lock) to show that a (physical) key > is needed to use the terminal. > Noted. And I do remember seeing a padlock on an IBM 327x terminal screen, don't I? It was a long time ago, maybe I dreamed it. Anyway, I can't find any reference to it now. > > [13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie, > > NY, 1970. > > I would ditch the reference to the S/360 POO - it's been pretty much > obsolete since 1970 or so. Fine for a historic reference, but I think > [29] (as updated) is the better ref. > This is my reference for Table 5.3A. Thanks for the other updated references. > Thanks for doing all this work. I hope the views of the likes of > Michael Everson ("Unicode will be in use for centuries" with the > implication that all these silly terminal emulations are just > dinosaurs) will not prevail. > Michael's views are mainstream. In any case, Michael is a passionate advocate of some of my favorite scripts :-) I certainly would not want to see (say) hex bytes squeezing out (say) Nordic or Irish runes. Terminals (or emulators -- including xterms and the like), protocol analyzers, escape sequences, termcaps, timesharing systems, mainframes, etc, are approximately as widespread now as they ever were, but around them has grown an entirely new world of Windows and GUIs and Web browsers, and this is all we hear about in the mass market. Younger people are not even aware of this older world, quietly doing its job in its unglamorous "machine rooms", just as passengers on an ocean liner could be unaware of what went on below decks. 
I have seen Columbia students drop their jaws in amazement upon first seeing a terminal emulation screen -- let alone an actual terminal -- "What's THAT???? It's so UGLY!". But that world is there; we all depend on it, and it's not going anywhere. (Really. Nothing ever quite disappears. I have heard of a payroll system that was originally written in the early 1960s for an IBM ... OK, I forget the exact model number -- 1104 or something like that. When that machine was replaced by a 7094 (?), the same payroll system ran under an 1104 emulator. When the 7094 was replaced by a 360, it still ran on the 1104 emulator, which itself ran on a 7094 emulator. And so on and so on, legend has it, to this day.) - Frank P.S. This is coming to you from the original IBM Thomas J Watson Research Laboratory, where IBM developed much of its 1940s and 50s technology before turning the building over to Columbia University in 1955 and moving to Yorktown Heights (no, I wasn't here then). "THIMK" :-) P.P.S. A slightly updated draft (2.5), based on today's discussion, is in the usual place (get out your clickers): ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt (End) 9-Oct-98 5:28:10-GMT,2059;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id BAA01525 for ; Fri, 9 Oct 1998 01:28:09 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id WAA47442 ; Thu, 8 Oct 1998 22:27:49 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA21907; Thu, 8 Oct 98 22:19:52 -0700 Message-Id: <9810090519.AA21907@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Uml-Sequence: 6121 (1998-10-09 05:19:27 GMT) From: Geoffrey Waigh Reply-To: unicode@unicode.org To: Unicode List Date: Thu, 8 Oct 1998 22:19:25 -0700 (PDT) Subject: Re: 
Terminal Graphics Draft 2 Content-Transfer-Encoding: 7bit To interject a small point on the matter of high-fidelity terminal emulation; the vast majority of emulation users are not going to quibble over the exact appearance or dimensions of the status area indicators. When I worked for a company doing terminal emulation, the customers only wanted it to run their applications correctly and support a bizarre array of kludges^H^H^H^H^H^H^Hfeatures that let them improve performance/functionality of their system. Quite a few terminal features were missing until a new customer needed it and I was appalled at how divergent some of the character code to glyph mappings that customers ended up with were. [When I was migrating the system to Unicode the fonts, character sets and character -> glyph mappings were cleaned up.] Also on the matter of debugging 5250/3270 streams our developers always used an ASCII text representation. There wasn't any desire that I can recall for fancy debugging fonts from either our technical staff or our customers. (Then again the customers usually left terminal protocol analysis to our support group.) 
Geoffrey Waigh gpw@cybersurf.net 9-Oct-98 7:25:55-GMT,2599;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id DAA20023 for ; Fri, 9 Oct 1998 03:25:53 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id AAA43726 ; Fri, 9 Oct 1998 00:24:28 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA22250; Fri, 9 Oct 98 00:12:21 -0700 Message-Id: <9810090712.AA22250@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6122 (1998-10-09 07:12:10 GMT) From: Paul Keinanen Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 00:12:09 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id DAA20023 At 13:44 8.10.1998 -0700, John Cowan wrote: >Frank da Cruz wrote: > >> 2421 7F DEL DT Symbol for Delete (3) > >[...] > >> (3) Not, strictly speaking, a control character, but not a visible >> one either. > >DEL is a control character in every sense, despite its position at 7F. The situation with Delete/Rubout is a bit complicated, depending on usage. In the VTxxx environment it is clearly a control function (erasing the previous character), but originally the code for Delete/Rubout was chosen for Teletype style paper tape manual entry. If you hit the wrong key, you manually stepped back the tape one character and overpunched all 7 holes (7F in 7 bit+odd parity) or all 8 holes (FF in 7 bit+even parity) and the reader/computer would then ignore the character. In that sense it is a dummy nonspaced character. When using a Teletype with a paper tape reader, you had to be careful to have the computer in correct mode for manual resp. paper tape reader input. 
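The parity arithmetic behind the two Rubout values can be checked with a short sketch. This is a minimal Python illustration, assuming the parity bit sits in the eighth (high) bit; the helper name with_parity is invented for the example.

```python
def with_parity(code7: int, even: bool) -> int:
    """Return the 8-bit value of a 7-bit code with its parity bit in bit 7."""
    if not 0 <= code7 <= 0x7F:
        raise ValueError("expected a 7-bit code")
    ones = bin(code7).count("1")
    # Even parity: total number of 1 bits (data plus parity) must be even.
    # Odd parity: total number of 1 bits must be odd.
    parity_bit = ones % 2 if even else (ones + 1) % 2
    return code7 | (parity_bit << 7)


if __name__ == "__main__":
    DEL = 0x7F  # all seven data holes punched
    print(f"odd parity:  {with_parity(DEL, even=False):02X}")
    print(f"even parity: {with_parity(DEL, even=True):02X}")
```

With seven data holes punched the count of ones is already odd, so odd parity leaves the eighth hole unpunched (7F) while even parity punches it as well (FF), matching the two cases described above.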
Due to this duality, should Rubout and Delete be considered to be two separate characters, although they seem to have the same code point in all ASCII based character sets? Probably not, as we try to unify other characters as well, but at least the Rubout functionality should also be included in the description. You could of course argue that Rubout is a nonspaced dummy character in just the same way as NUL; but since NUL is considered a control character, Rubout should also be considered a control character. So indeed, there can be many interpretations. Paul Keinänen 9-Oct-98 13:40:53-GMT,1567;000000000001 Return-Path: Received: from mailrelay1.cc.columbia.edu (mailrelay1.cc.columbia.edu [128.59.35.143]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id JAA02739 for ; Fri, 9 Oct 1998 09:40:52 -0400 (EDT) Received: from cyber (gate1.rrds.co.uk [195.166.41.35]) by mailrelay1.cc.columbia.edu (8.8.5/8.8.5) with SMTP id JAA00654 for ; Fri, 9 Oct 1998 09:40:48 -0400 (EDT) Sender: pawillia@rrds.co.uk Message-Id: <361E101C.E3A323C1@rrds.co.uk> Date: Fri, 09 Oct 1998 14:31:09 +0100 From: Paul Williams Organization: Racal Radar Defence Systems Ltd X-Mailer: Mozilla 4.02 [en] (X11; I; SunOS 5.5.1 sun4m) MIME-Version: 1.0 To: fdc@columbia.edu Subject: Terminal Graphics for Unicode. Trainspotter Alert! Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Frank, [Your email inbox must be huge. Please don't feel obliged to reply to this.] I've just been reading your very interesting proposal to add characters found in terminal repertoires to the Unicode standard. Although I expect that point 2 in the Problems section doesn't hold true for the VT320, i.e.
"Lack of definitive, high-quality pictures of the glyphs in some cases", you may still like to take a look at: http://www.celigne.co.uk/terminal/built-in_glyphs.html I do realise that this is the work of a complete anorak, but it is part of my project to write a "real" VT320 terminal emulator and document it completely. Regards, Paul [not speaking on behalf of my employer] 9-Oct-98 13:54:43-GMT,3358;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id JAA06407; Fri, 9 Oct 1998 09:54:39 -0400 (EDT) Sender: Frank da Cruz Date: Fri, 9 Oct 98 9:54:38 EDT From: Kermit Software Support To: Karlsson Kent - keka Subject: Re: Kermit95 In-Reply-To: Your message of Fri, 9 Oct 1998 11:52:00 +0200 Cc: kermit-support@columbia.edu Reply-To: kermit-support@columbia.edu Message-ID: > I have a few questions: > > 1. Can one use Unicode (UTF-16, either endianism, or UTF-8) on the host > side, i.e. 'on the wire' (between the host and Kermit)?=A0 (This would > be for a Unicode enabled application on the host.) > Not yet. So far nobody has asked for it. In any case, I have not heard of any platform that offers UTF-8 host-terminal sessions. Well, maybe Plan 9? But yes, we do plan to add UTF-8 support, even in advance of user demand. > 2. Is the Unicode enabled Kermit available also on Win95/98? > it only for NT? > It's only for NT because, at present, Kermit 95 is a console application, and Unicode is not supported in console windows on Windows 95/98. > 3. Can one use Kanji/Hangul etc. with Kermit95? Can one use proportional > width fonts in general with Kermit95? > DCBS is problematic in console windows. Proportional-width fonts don't usually make sense in a terminal screen. However, when K-95 is converted to full GUI form, the user should be able to choose any font at all, including a proportional one. 
From the Kermit 95 FAQ: There is no explicit support in Kermit 95 for Chinese, Japanese, or Korean (CJK), but you still might be able to view CJK in a Kermit 95 window if (a) your PC is configured to allow it; (b) the CJK character set used on your PC is the same as that on the host; (c) K95 has been told to "set terminal character-set transparent" and "set terminal bytesize 8". However, the ability to view CJK text does not necessarily mean you can also enter it. According to Microsoft Knowledge Base Article Q156793, CJK Input Method Editors are not available in Console windows under Windows 95, even though they are available in Windows NT 3.51 and later. Explicit support for CJK terminal emulation is planned for future releases of Kermit 95, after release of the GUI. We do, however, support translation among the full range of Kanji character sets during file transfer. > (I.e. can I use, e.g. Bitstreams Cyberbit font to display > Latin/Hangul/Kanji/Hiragana/...?) > Maybe in NT. Certainly not in Win95/98 at present. > 4. Can Kermit95 properly handle (a non-empty subset of) combining > characters? Conjoining Hangul Jamo? > No. > 5. Can Kermit95 interpret ESC-sequences, so that (terminalish) forms > can be drawn, and filled in? (ESC-sequence positioning would be to a "cell > grid", but text strings should still be displayable via a proportional font). > Yes, except for the proportional font part. All terminals emulated by Kermit 95 use fixed-pitch fonts, and so all of Kermit 95's emulations assume a regular matrix of screen cells. Are you aware of any terminal that does not follow this model? > 6. Can Kermit be used together with an IME (to input, e.g. Kanji). > (Just asking...) > See FAQ quote above.
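The cell-grid model described in the answer to question 5 can be sketched as follows. This is only an illustration in Python, not Kermit 95 code: it assumes an ANSI/VT100-style terminal in which ESC [ row ; col H positions the cursor, and the helper names cup and draw_form are hypothetical.

```python
ESC = "\x1b"


def cup(row: int, col: int) -> str:
    """Return the ANSI/VT100 cursor-position sequence (1-based row and column)."""
    return f"{ESC}[{row};{col}H"


def draw_form(fields):
    """Build one string that places each label at its cell on the grid.

    fields is a list of (row, col, label) tuples. Each character of a label
    is assumed to occupy exactly one screen cell, which is why the model
    presupposes a fixed-pitch font.
    """
    return "".join(cup(row, col) + label for row, col, label in fields)


if __name__ == "__main__":
    # Hypothetical two-field form; repr() shows the escape sequences
    # instead of sending them to the terminal.
    form = draw_form([(2, 5, "Name: ________"), (4, 5, "Date: ________")])
    print(repr(form))
```

Because every position is a (row, column) pair, a proportional font would break the alignment between the host's idea of the screen and what is actually displayed.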
- Frank 9-Oct-98 14:09:23-GMT,3918;000000000001 Return-Path: Received: (from jaltman@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id KAA10541; Fri, 9 Oct 1998 10:09:21 -0400 (EDT) Sender: Jeffrey Altman Date: Fri, 9 Oct 98 10:09:20 EDT From: kermit-support@watsun.cc.columbia.edu To: kermit-support@columbia.edu Cc: Karlsson Kent - keka , kermit-support@columbia.edu Subject: Re: Kermit95 In-Reply-To: Your message of Fri, 9 Oct 98 9:54:38 EDT Reply-To: kermit-support@watsun.cc.columbia.edu Message-ID: Let me elaborate on some of the responses in more detail. > > 2. Is the Unicode enabled Kermit available also on Win95/98? > > it only for NT? > > > It's only for NT because, at present, Kermit 95 is a console application, > and Unicode is not supported in console windows on Windows 95/98. All versions of K95 use Unicode internally during terminal emulation. In other words, all remote character-sets are translated to Unicode and stored in the screen buffer. On NT, Unicode is used for display because it is supported by the OS. On Win95/98 and OS/2 the Unicode characters are converted to the equivalent character (if one exists) in the code page used on the local system. Adding support for host based UTF-7 or UTF-8 in conjunction with an existing terminal emulation will not be difficult. If you have a host application that does this, we would like to know about it. > > 3. Can one use Kanji/Hangul etc. with Kermit95? Can one use proportional > > width fonts in general with Kermit95? > > > DCBS is problematic in console windows. Proportional-width fonts don't > usually make sense in a terminal screen. However, when K-95 is converted to > full GUI form, the user should be able to choose any font at all, including a > proportional one. 
> > From the Kermit 95 FAQ: > > There is no explicit support in Kermit 95 for Chinese, Japanese, or > Korean (CJK), but you still might be able to view CJK in a Kermit 95 > window if (a) your PC is configured to allow it; (b) the CJK character > set used on your PC is the same as that on the host; (c) K95 has been > told to "set terminal character-set transparent" and "set terminal > bytesize 8". However, the ability to view CJK text does not necessarily > mean you can also enter it. According to Microsoft Knowledge Base > Article Q156793, CJK Input Method Editors are not available in Console > windows under Windows 95, even though they are available in Windows NT > 3.51 and later. Explicit support for CJK terminal emulation is planned > for future releases of Kermit 95, after release of the GUI. > > We do, however, support translation among the full range of Kanji character > sets during file transfer. > > > (I.e. can I use, e.g. Bitstreams Cyberbit font to display > > Latin/Hangul/Kanji/Hiragana/...?) > > > Maybe in NT. Certainly not in Win95/98 at present. The primary reason that we have not implemented DBCS support for K95 is that there are no freely distributed monospaced Unicode fonts that populate the CJK areas. The Bitstream Cyberbit font is proportionally spaced and cannot be used in Console windows. Monotype does have fully populated versions of Lucida Console and Monotype.com fonts that can be purchased. You would have to contact them for pricing info. These fonts can then be used in Win95/98 and NT. In NT you would now have the ability to display the CJK characters. However, as Win95/98 does not have code page support for the CJK characters (at least in the Western releases) you would still not be able to display the CJK characters.
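The two-stage translation described above (remote character set to Unicode in the screen buffer, then Unicode to the local code page where the OS cannot display Unicode) might be sketched like this. It is a rough illustration only: Python's built-in codecs stand in for K95's own translation tables, the function names are hypothetical, and the choice of ISO 8859-2 and code page 437 is an assumption made for the example.

```python
def to_screen_buffer(data: bytes, remote_charset: str) -> str:
    """Stage 1: translate bytes from the host's character set to Unicode."""
    return data.decode(remote_charset, errors="replace")


def to_local_display(text: str, local_code_page: str) -> bytes:
    """Stage 2: convert Unicode to the local code page.

    Characters with no equivalent in the code page become '?' here; that
    substitution is this sketch's choice, not documented K95 behavior.
    """
    return text.encode(local_code_page, errors="replace")


if __name__ == "__main__":
    # 0xA3 is LATIN CAPITAL LETTER L WITH STROKE in ISO 8859-2;
    # 0xE9 is LATIN SMALL LETTER E WITH ACUTE.
    buffer_text = to_screen_buffer(b"\xa3\xe9", "iso8859_2")
    print(buffer_text)
    # Code page 437 has e-acute but no L-with-stroke.
    print(to_local_display(buffer_text, "cp437"))
```

Keeping the screen buffer in Unicode means only the final display step depends on what the local platform can render.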
Jeffrey Altman * Sr.Software Designer * Kermit-95 for Win32 and OS/2 The Kermit Project * Columbia University 612 West 115th St #716 * New York, NY * 10025 http://www.kermit-project.org/k95.html * kermit-support@kermit-project.org 9-Oct-98 14:20:35-GMT,3605;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA13757 for ; Fri, 9 Oct 1998 10:20:34 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id HAA45070 ; Fri, 9 Oct 1998 07:19:55 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA23282; Fri, 9 Oct 98 07:13:18 -0700 Message-Id: <9810091413.AA23282@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6123 (1998-10-09 14:12:43 GMT) From: Markus Kuhn Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 07:12:42 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Frank da Cruz wrote on 1998-10-08 21:27 UTC: > In any case, the control-picture symbols must be encoded because we're > concerned not only about the terminal emulator but also the applications it > must interact with, as in: > > User: "Help, my screen is messed up!" > > Help desk: "OK, click on Debug in the Terminal window menu bar > and repeat what you did before." > > User: "Now my screen is REALLY messed up!" > > Help desk: "Let's have a look. Please use your mouse to copy it > and paste it into your email window and send it to us." > > This is, of course, looking forward to the day when All Is Unicode... The helpdesk can already get the same effect today by much simpler asking for a bitmap of the screen or window to be mailed: Help desk: "What GUI are you using?" User: "X11" Help desk: "Excellent choice. Just enter the shell command 'xwd | uuencode - | mail helpdesk' and then click on the messed-up window" User: "Done." 
Help desk: "Ah, now I see your problem. That is easy to fix ..." Let's not create unnecessary complicated user requirements. I highly welcome attempts to complete Unicode with the various technical character set symbols that various terminal types have, after proper unification according to the well-proven character/glyph model. Also symbols that are not part of the user accessible character set of a terminal but that appear as part of the normal look-and-feel of this terminal in status lines, etc. should be added to Unicode, in order to allow to emulate one terminal inside another terminal emulator (e.g., an IBM 3270 emulator that runs inside an UTF-8 enhanced xterm). I am sceptical however, whether all the many control symbols really need to have a place in the BMP. I think that debugging tools can quite easily provide them using some other replacement notation, that might or might not bypass the usual font mechanisms. I have been used to see ^M in a different color as a common replacement symbol for Carriage Return (Ctrl-M) for over 15 years, and I never missed a much less readable CR glyph for debugging purposes. Debugging is done by experts and experts are used to cope with any replacement notation anyway. We can do hexdumps quite nicely with 0-9A-F without having to resort to hex-byte-glyphs and control abbreviation glyphs. I think the criterion for inclusion of terminal emulator characters should be whether the character can ever be seen by a normal user in normal (non-debugging, non-configure) operation on the screen of the terminal. Markus -- Markus G. 
Kuhn, Security Group, Computer Lab, Cambridge University, UK email: mkuhn at acm.org, home page: 9-Oct-98 14:49:23-GMT,2248;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA22940 for ; Fri, 9 Oct 1998 10:49:21 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id HAA22448 ; Fri, 9 Oct 1998 07:47:47 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA23387; Fri, 9 Oct 98 07:29:59 -0700 Message-Id: <9810091429.AA23387@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6124 (1998-10-09 14:29:42 GMT) From: Markus Kuhn Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 07:29:41 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Frank da Cruz wrote on 1998-10-08 22:56 UTC: > (Really. Nothing ever quite disappears. I have heard of a payroll system > that was originally written in the early 1960s for an IBM ... OK, I forget > the exact model number -- 1104 or something like that. When that machine > was replaced by a 7094 (?), the same payroll system ran under an 1104 > emulator. When the 7094 was replaced by a 360, it still ran on the 1104 > emulator, which itself ran on a 7094 emulator. And so on and so on, legend > has it, to this day.) Another question is which terminals should actually be supported. Many of the ones you mentioned have died away already. I am aware of the IBM 3270 family and the DEC VT100 family having a long and healthy life (to Michael Everson: those might indeed still be around in a hundred years from now, we will know after Y2K), but much of the rest is probably not sufficiently mainstream to deserve consideration in Unicode. Do you have any form of data on the terminal emulator market regarding more exotic terminal types?
Are there really more than a few hundred people out there who use applications that depend on a terminal type radically different from a DEC VT340 or an IBM 3278? Markus -- Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK email: mkuhn at acm.org, home page: 9-Oct-98 15:38:18-GMT,1363;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id LAA08268; Fri, 9 Oct 1998 11:38:10 -0400 (EDT) Date: Fri, 9 Oct 98 11:38:09 EDT From: Frank da Cruz To: Otto Stolz Subject: Re: Terminal Graphics Draft 2 In-Reply-To: Your message of Fri, 9 Oct 1998 17:31:21 -0600 Message-ID: > Hello, > > on 1998-10-09 at 17:19, I wrote to the Unicode list: > > Cf. figure 3-1 in IBM form GA27-2837-8 > > "IBM 3270 Information Display System Character Set Reference". > > If you want this old (Aug 1968), pre-CDRA pamphlet, you can have it > for the asking. It contains various keyboard layouts, I/O interface > code charts and some ancillary material (including special 3270 control > character assignments, cf. my forthcoming note to the Unicode list). > I'd love to have it -- as you can guess, I collect such things. Thanks! Frank da Cruz The Kermit Project Columbia University 612 West 115th Street New York NY 10025-7799 USA P.S. Another rare object I was hoping to find was a Siemens (or Nixdorf?) BA80 terminal manual. Nobody at SNI acknowledges there ever was such a thing, but I don't believe them. This is just a "shot in the dark" -- ignore it if you don't know what I'm talking about.
- Frank 9-Oct-98 15:43:42-GMT,1745;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id LAA09826 for ; Fri, 9 Oct 1998 11:43:40 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id IAA17762 ; Fri, 9 Oct 1998 08:42:00 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA23887; Fri, 9 Oct 98 08:33:42 -0700 Message-Id: <9810091533.AA23887@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6125 (1998-10-09 15:33:15 GMT) From: Otto Stolz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 08:33:14 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Am 1998-10-8 um 12:19 hat John Cowan geschrieben: > A typical (though not the only) > glyph for U+2424 is the one which appears on the Enter key of PC > keyboards. Please describe. On my keyboards (both PC and X-Terminal), the Enter key has the word "Enter" engraved, whilst the Return key has a U+21B2 Downwards Arrow with Tip Leftwards (or is it a glyph variant of U+21B5?). If I remember correctly, my 3270 terminal had similar engravings on these keys, viz. "DatFreig" (Datenfreigabe = German translation of "Enter"), and U+21B5, respectively. Cf. figure 3-1 in IBM form GA27-2837-8 "IBM 3270 Information Display System Character Set Reference". Btw., the semantics of these 3270 keys were quite different: the Enter key sends data to the host, whilst the Return key is just a local cursor movement without sending anything.
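For readers who want to check the character names for the code points mentioned in this exchange, a small lookup using Python's standard unicodedata module is one way to do it. This is an illustrative sketch only; the helper name describe is hypothetical, and the names printed come from whatever version of the Unicode Character Database ships with the Python installation.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the bundled Unicode data."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


if __name__ == "__main__":
    # The three code points under discussion: the two arrows seen on
    # Return/Enter keys, and the control picture for newline.
    for cp in (0x21B2, 0x21B5, 0x2424):
        print(describe(cp))
```

The lookup shows that U+21B2 and U+21B5 are separately encoded arrows, while U+2424 is a graphic symbol in the Control Pictures block rather than a control code.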
Best wishes, Otto Stolz 9-Oct-98 16:02:21-GMT,842;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id MAA14500; Fri, 9 Oct 1998 12:02:17 -0400 (EDT) Date: Fri, 9 Oct 98 12:02:17 EDT From: Frank da Cruz To: Otto Stolz Subject: Re: Terminal Graphics Draft 2 In-Reply-To: Your message of Fri, 9 Oct 1998 17:58:28 -0600 Message-ID: > Am 1998-10-9 um 17:31 hat Otto Stolz geschrieben: > > ancillary material (including special 3270 controll > > character assignments, cf. my forthcoming note to the Unicode list). > > This is only to tell you that no note is forthcoming, as I eventually have > found all of those control characters in either table 5.3 or 5.4 of your > 2nd draft. > OK, that will please the minimalists :-) Thanks. - Frank 9-Oct-98 16:25:12-GMT,1442;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id MAA20078 for ; Fri, 9 Oct 1998 12:25:11 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id JAA65830 ; Fri, 9 Oct 1998 09:24:47 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA24229; Fri, 9 Oct 98 09:18:58 -0700 Message-Id: <9810091618.AA24229@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Uml-Sequence: 6126 (1998-10-09 16:18:46 GMT) From: John Cowan Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 09:18:45 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Content-Transfer-Encoding: 7bit Otto Stolz wrote: > On my keyboards (both PC and X-Terminal), the Enter key has the word > "Enter" engraved, whilst the Return key has a U+21B2 Downwards Arrow > with Tip Leftwards (or is it a glyph variant of U+21B5?). Mm, you're right. I don't know what I was thinking of. 
-- John Cowan http://www.ccil.org/~cowan cowan@ccil.org You tollerday donsk? N. You tolkatiff scowegian? Nn. You spigotty anglease? Nnn. You phonio saxo? Nnnn. Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5) 9-Oct-98 17:17:51-GMT,1805;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA04954 for ; Fri, 9 Oct 1998 13:17:50 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA52330 ; Fri, 9 Oct 1998 10:17:12 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA24788; Fri, 9 Oct 98 10:04:28 -0700 Message-Id: <9810091704.AA24788@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6129 (1998-10-09 17:04:14 GMT) From: Rick McGowan Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:04:12 -0700 (PDT) Subject: careless quotation and forwarding It's nice to quote a little context when you're replying to some previous message. People do that a lot by pre-pending ">" to the quoted lines. Sometimes, however, people don't take enough care with REMOVING the irrelevant portions of the note they're quoting. I get a little tired of scenarios like this: Someone quotes a few lines of a note, intersperses comments, then just LEAVES the rest of the note. There was one note yesterday on this list that ended with 717 lines quoted from another note, WITHOUT COMMENT. This was some 30k of excess garbage that clogged up my inbox, and I had to scroll all the way through it to verify that it was UNCOMMENTED, and hence irrelevant to forward. Please take more care in removing excess quoted material -- parts of notes that are irrelevant or upon which you are not commenting. It doesn't take that much time to do, and saves all of your readers the trouble of fruitlessly wading through the excess. 
Thanks, Rick 9-Oct-98 17:37:03-GMT,1854;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA11282 for ; Fri, 9 Oct 1998 13:37:01 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA56198 ; Fri, 9 Oct 1998 10:35:01 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA24939; Fri, 9 Oct 98 10:15:15 -0700 Message-Id: <9810091715.AA24939@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6130 (1998-10-09 17:13:48 GMT) From: Rick McGowan Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:13:47 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Frank wrote... > The reason these need to be encoded separately from feminine/masculine > ordinals are their size Ah, I figured so. In this case, they should just be unified with the existing codes. > Since terminal emulators and data analyzers use fixed-pitch fonts, we can't > just switch to another point size to display these characters, since that will > wreck the matrix arrangement of the screen. Eh? There are plenty of fixed-pitch fonts that include masculine and feminine ordinal indicators taking up the same size cell as everything else. Big or small, unless you need to distinguish these from small ordinals for the emulation of ONE Actual Physical Terminal, there's no point in encoding them again. > Seriously, the hex bytes are entirely separable from the rest. I'll be > glad to cut them loose unless somebody speaks up strongly in their favor. I'll repeat myself strongly in their disfavor. I think you should remove them from this proposal, whether or not you make another proposal for them. 
Rick 9-Oct-98 17:49:15-GMT,1449;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA14124 for ; Fri, 9 Oct 1998 13:49:12 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA57790 ; Fri, 9 Oct 1998 10:47:35 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25247; Fri, 9 Oct 98 10:36:02 -0700 Message-Id: <9810091736.AA25247@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6131 (1998-10-09 17:35:33 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:35:32 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Frank wrote: > > (Really. Nothing ever quite disappears. I have heard of a payroll system > that was originally written in the early 1960s for an IBM ... OK, I forget > the exact model number -- 1104 or something like that. When that machine > was replaced by a 7094 (?), the same payroll system ran under an 1104 > emulator. When the 7094 was replaced by a 360, it still ran on the 1104 > emulator, which itself ran on a 7094 emulator. And so on and so on, legend > has it, to this day.) Somewhat orthogonally, many of these old dinosaurs are about to be cleared away by the asteroid known as Y2K on its way to impact. 
--Ken 9-Oct-98 18:10:55-GMT,1548;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA19499 for ; Fri, 9 Oct 1998 14:10:53 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA42778 ; Fri, 9 Oct 1998 11:09:06 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25845; Fri, 9 Oct 98 10:55:36 -0700 Message-Id: <9810091755.AA25845@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6135 (1998-10-09 17:54:18 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:54:14 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 > > The reason these need to be encoded separately from feminine/masculine > > ordinals are their size > > Ah, I figured so. In this case, they should just be unified with the > existing codes. > See my response to Otto Stolz on this. Again, I don't care so much whether these particular characters are encoded, but they illustrate a point worth making, namely that unifications that work in the GUI world don't necessarily work in an environment where we must use a fixed-pitch font. If a "big" feminine ordinal arrives, I can't just display the regular one at a bigger point size with a lower baseline, because (a) I might not be able to (maybe it's a console application), and (b) cells must be fixed size. 
- Frank 9-Oct-98 18:11:50-GMT,2295;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA19718 for ; Fri, 9 Oct 1998 14:11:47 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA45088 ; Fri, 9 Oct 1998 11:09:08 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25534; Fri, 9 Oct 98 10:47:19 -0700 Message-Id: <9810091747.AA25534@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6132 (1998-10-09 17:45:54 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: unicode@unicode.org To: Unicode List Cc: kenw@sybase.com Date: Fri, 9 Oct 1998 10:45:47 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 John Cowan wrote: > > But the SYMBOL FOR NEWLINE is not tied to U+2424, which is a LINE > SEPARATOR, not a line terminator. A typical (though not the only) > glyph for U+2424 is the one which appears on the Enter key of PC > keyboards. No. U+2424 *is* SYMBOL FOR NEWLINE It is a graphic symbol for the NEWLINE function. U+2424 is *not* a line separator, nor a line terminator. It is not a control function or control code at all. It *is* the character which should be used when displaying a graphic symbol for the control function NEWLINE. And here we are talking about the EBCDIC NL (etc.), not C `\n'. U+21B5 DOWNWARDS ARROW WITH CORNER LEFTWARDS is the character which can be used to represent that which typically appears on the Enter key of PC keyboards (i.e., serving as a graphic symbol for CARRIAGE RETURN). Incidentally, there is another pile of graphic symbols for keyboard functions coming down the pike in Amendment 22 to 10646 (based on ISO 9995-7). These should be checked to verify that there are no duplicates against the collection of symbols being proposed for terminal emulation.
(Examples: symbols for compose, enter, alternate, shift lock, undo, print screen, clear screen, delete, etc.) --Ken > > -- > John Cowan http://www.ccil.org/~cowan cowan@ccil.org > You tollerday donsk? N. You tolkatiff scowegian? Nn. > You spigotty anglease? Nnn. You phonio saxo? Nnnn. > Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5) > 9-Oct-98 18:22:26-GMT,2272;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA22300 for ; Fri, 9 Oct 1998 14:22:24 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA40550 ; Fri, 9 Oct 1998 11:20:39 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25711; Fri, 9 Oct 98 10:53:19 -0700 Message-Id: <9810091753.AA25711@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6133 (1998-10-09 17:51:01 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: unicode@unicode.org To: Unicode List Cc: kenw@sybase.com Date: Fri, 9 Oct 1998 10:51:00 -0700 (PDT) Subject: RE: Terminal Graphics Draft 2 > I can't think of any connections where both NL and NEL would be > used in the same data stream, since data streams tend to be > either ASCII/ISO or EBCDIC, but not a mixture. > > However, real terminals (VT220-520) print "NEL" when in display- > controls mode, so why make an exception to the rule of printing > the actual name in this one case? I think this would be more > confusing than doing what the actual terminal does. Especially > considering the "metareferences" we might make to these characters > in Unicode texts; e.g. "An ISO data stream will show the [NEL] > character, whereas an EBCDIC data stream will show the [NL] > character"... > > I'm sure we could also find other examples of control characters > in the C1 and EBCDIC sets whose semantics are the same or close > but whose names differ; I don't think that means we should unify > them. 
The purpose of "display controls" is to show the customary > and familiar mnemonic for each control character in its context so > people can read them easily. And since your collection of display controls lists both the three-letter and two-letter mnemonics for these things, I cannot see any argument for disunification. This is the thing meant for what in your chart is: E025 85 NEL NL Symbol for Next Line U+2424 is the correct character for the graphic symbol display of "NEL" or "NL Symbol or Next Line (or Symbol for Newline). --Ken > > - Frank > 9-Oct-98 18:40:04-GMT,3883;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA28263 for ; Fri, 9 Oct 1998 14:40:02 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA40650 ; Fri, 9 Oct 1998 11:26:03 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25671; Fri, 9 Oct 98 10:52:15 -0700 Message-Id: <9810091752.AA25671@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6134 (1998-10-09 17:51:41 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:51:40 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 > If the Feminine, and Masculine, Ordinal Indicators (U+00AA, and U+00BA, > respectively) were written in the same fixed-pitch font as the surrounding > text, they also would occupy one cell, each, won't they? > Yes, but they would be too small. The SNI glyphs are full-size base characters, but the ordinal indicator glyphs are superscripts. > As Frank had written in both of his own drafts: > > arriving at a sufficient set of character-cell terminal graphics for > > Unicode is complicated by the well-known problems that affect other > > preexisting character sets to varying degrees: > > 1. Lack of official names for the characters of some of the sets. > > 2. 
Lack of definitive, high-quality pictures of the glyphs in some cases. > > 3. Lack of descriptions of the purpose and intended use of the glyphs. > > I think, those are good reasons not to take the glyphs in the Siemens > Nixdorf 97801-5xx Benutzerhandbuch too seriously -- good reasons to unify > these characters with the above-mentioned U+00AA and U+00BA. The only > reason, IMHO, not to unify them would be existence of a character set > containing different glyphs both for the proposed characters and the > existing ones -- as Rick has already noted. > I tend to agree. The "strange" SNI glyphs are not a high priority, to me personally at least. I have, however, posted a message to the Sinix newsgroup (of SNI customers) to see if any strong opinions come to the surface. All I can say from my own experience is that there was heavy demand for accurate SNI terminal emulation for Windows 95/98/NT, and we met that demand as best we could within the limitations of the code pages and fonts available to us. For those of you not familiar with SNI 97801, it probably has the most advanced ISO 2022 implementation and repertoire of character sets of any terminal ever built -- at least in the West (it lacks Hebrew, Arabic, and CJK, but includes ISO 8859-1,2,3,4,5,7,9, various ISO 646 versions, plus a selection of "strange" private sets, and a wide variety of input methods). To answer Otto's point with a question: what is a character set? I can see both a superscript feminine ordinal and a "big" feminine ordinal on the same screen simply by sending ISO 2022 escape sequences to switch "character sets". So in a sense, all character sets that can be designated and invoked by ISO 2022 escape sequences form one big character set :-) See, for example: http://www.columbia.edu/kermit/kuishots.html Go down to Shot 3. This screen was produced using ISO 2022 escape sequences from the host to a VT320 terminal emulator on Windows 95, with Lucida Console as the (Unicode) font. 
The same screen could be produced by sending the exact same data stream to the 97801. (This screen does not show any of the SNI "strange" glyphs, but I hope it illustrates the point.) Again, I have no great investment in these characters, and so far our SNI users have not complained about their absence, but before striking them I hope to hear some additional testimony from them. - Frank 9-Oct-98 19:46:19-GMT,1944;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id PAA16362; Fri, 9 Oct 1998 15:46:15 -0400 (EDT) Date: Fri, 9 Oct 1998 15:46:15 -0400 (EDT) Message-Id: <199810091946.PAA16362@watsun.cc.columbia.edu> From: Frank da Cruz To: Michael Everson Subject: Re: Terminal Graphics Draft 2 In-Reply-To: Your message of Fri, 9 Oct 1998 12:20:23 -0700 (PDT) > Everson Mono is fixed-pitch font that includes masculine and feminine > ordinal indicators taking up the same size cell as everything else. > Hi Michael. Did we discuss Everson Mono before? I mean, the possibility of packaging it with Kermit 95? (Since Microsoft so resolutely and, may I say, arrogantly, refuses to decently populate Lucida Console.) I assume we would have to license it and pay some money. Can you give me a rough idea of the terms? And could it be used as a plug-in replacement for Lucida Console on Windows NT (but with so many of those huge gaps filled in)? - Frank P.S. If your offer to make TTF glyphs for them still stands, the local photocopier has been fixed so I can start copying from books and manuals to sheets of paper. These, of course, I can send by post or fax, or scan them. P.P.S. What did John Cowan mean about Ireland? I had discussions about this with various people last year and learned that the name of the country was a somewhat sensitive topic, but the consensus seemed to favor "Republic of Ireland" rather than "Éire" (which is controversial since it came from Eamon de Valera (sp?) 
who agreed with England on the partition) or "Ireland" which is confusing to some people (e.g. postal authorities). By the way, in case you are interested in the results of that discussion, I'd be glad to send it -- it is a discourse on how to address postal mail to many lands, the Isles to the north of continental Europe being the most interesting case :-) 9-Oct-98 19:53:17-GMT,1561;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id PAA18311 for fdc; Fri, 9 Oct 1998 15:53:15 -0400 (EDT) Date: Fri, 9 Oct 1998 15:53:15 -0400 (EDT) From: Frank da Cruz Message-Id: <199810091953.PAA18311@watsun.cc.columbia.edu> To: fdc@watsun.cc.columbia.edu Path: news.columbia.edu!watsun.cc.columbia.edu!fdc From: fdc@watsun.cc.columbia.edu (Frank da Cruz) Newsgroups: de.comp.os.sinix Subject: SNI 97801 Emulation vs Unicode Date: 9 Oct 1998 17:55:14 GMT Organization: Columbia University Lines: 24 Message-ID: <6vlim2$9s6$1@apakabar.cc.columbia.edu> NNTP-Posting-Host: watsun.cc.columbia.edu Xref: news.columbia.edu de.comp.os.sinix:1873 I would like to know if anybody is using applications with SNI 97801 terminals (or emulators) that require any of the following character sets: 1. Klammern (Brackets) -- pieces of brackets, braces, integral signs, little clocks, and other funny symbols. 2. Facet -- shapes for drawing pictures (mosaic graphics), but not like Videotex / Teletex. 3. "IBM" -- contains some unique symbols not found in any IBM code page, like rotated hex digit-pairs, superscript "proportional-to" symbol, superscript infinity symbol, etc. These three character sets contain glyphs that are not in Unicode. The question is whether they should be added. I don't care about the other 97801 sets (Math, Euro, German, International, or the Latin alphabets) because they are already in Unicode. Thank you! 
Frank da Cruz Columbia University 9-Oct-98 21:07:45-GMT,2657;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA08937 for ; Fri, 9 Oct 1998 17:07:44 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA51036 ; Fri, 9 Oct 1998 14:06:05 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA29038; Fri, 9 Oct 98 13:53:48 -0700 Message-Id: <9810092053.AA29038@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6139 (1998-10-09 20:53:35 GMT) From: "Alain" Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 13:53:33 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id RAA08937 At 08:33 98-10-09 -0700, Otto Stolz wrote: >On 1998-10-8 at 12:19, John Cowan wrote: >> A typical (though not the only) >> glyph for U+2424 is the one which appears on the Enter key of PC >> keyboards. > >Please describe. > >On my keyboards (both PC and X-Terminal), the Enter key has the word >"Enter" engraved, whilst the Return key has a U+21B2 Downwards Arrow >with Tip Leftwards (or is it a glyph variant of U+21B5?). > >If I remember correctly, my 3270 terminal had similar engravings >on these keys, viz. "DatFreig" (Datenfreigabe = German translation of >"Enter"), and U+21B5, respectively. Cf. figure 3-1 in IBM form GA27-2837-8 >"IBM 3270 Information Display System Character Set Reference". > >Btw., the semantics of these 3270 keys were quite different: the Enter >key sends data to the host, whilst the Return key is just a local cursor >movement without sending anything.
> >Best wishes, > Otto Stolz [Alain] : According to ISO/IEC 9995-7 (Symbols used for keyboard functions), "Enter" (fr: Validation) and "Return" (fr: Retour) are indeed two very different functions, with two different international symbols. On 3270's they are used simultaneously. On PCs, only the Return function is used *generally*, unless you use a terminal emulator, in which case, Enter is also used (generally dedicating the same scan code as the right-hand-side Control key; some applications also use the Return key of the numeric keypad as an Enter function). Alain LaBonté Québec Project editor, ISO/IEC 9995 series (8 parts) Coeditor, ISO/IEC 9995-7 (with Bernard Chauvois and Fred Bealle) [and author/designer of several keyboard drivers for PCs] 9-Oct-98 21:37:01-GMT,7519;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA17342 for ; Fri, 9 Oct 1998 17:36:59 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA43772 ; Fri, 9 Oct 1998 14:36:11 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA29631; Fri, 9 Oct 98 14:28:34 -0700 Message-Id: <9810092128.AA29631@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6142 (1998-10-09 21:25:14 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 14:25:12 -0700 (PDT) Subject: Terminal Graphics: Assorted Responses Paul Keinanen wrote [About DEL and RUB]: > ... > Due to this duality, should Rubout and Delete be considered to be two > separate characters, although they seem to have the same code point in all > ASCII based character sets? Probably not... > I don't think they should be separated. The name Rubout was rubbed out decades (literally) ago. ANSI X3.4-1977 does not contain the word Rubout.
Markus Kuhn wrote: > > I highly welcome attempts to complete Unicode with the various technical > character set symbols that various terminal types have, after proper > unification according to the well-proven character/glyph model. > Good... > Also symbols that are not part of the user accessible character set of > a terminal but that appear as part of the normal look-and-feel of this > terminal in status lines, etc. should be added to Unicode, in order to allow > to emulate one terminal inside another terminal emulator (e.g., an IBM 3270 > emulator that runs inside an UTF-8 enhanced xterm). > Good... > I am sceptical > however, whether all the many control symbols really need to have a place > in the BMP. I think that debugging tools can quite easily provide them > using some other replacement notation, that might or might not bypass the > usual font mechanisms. > Indeed they can, but then will such debugging tools be interoperable with other applications? I think it is a worthy goal to be able to paste terminal screens -- even when they contain debugging information -- into other applications. For example, for publication purposes, e.g. by people who write networking and data communications textbooks, manuals, and "for dummies" books. I think there is a nontrivial market there :-) I don't think we should go out of our way to anticipate how people will use these characters, or what kind of people will use them. > Another question is which terminals should actually be supported. Many > of the ones you mentioned have died away already. > Like the Perkin Elmers. But they are not the basis for any proposed characters, only illustrations of features like "display controls". The VT100 family, Wyse, Televideo, IBM, and SNI are all very current. The Heath/Zenith 19 hasn't been manufactured in quite a while, but it remains one of the most popular terminals to emulate due to its powerful but simple command set and several unique features. 
People who were not even born until after the last H19s vanished are using emulators for them today, with matching termcaps (or 3270 protocol converter terminal types) on the other end. In the IBM world, the H19 is especially popular since it lets the host change the cursor shape, and Series/1-based protocol converters use this feature to tell the user whether the 3270 is in insert or overwrite mode. For this reason alone, countless people stick with this emulation rather than "modernize" to VT100 (or beyond). > Do you have any form of data on the terminal emulator market regarding > more exotic terminal types? Are there really more than a few hundred > people out there who use applications that depend on a terminal type > radically different from a DEC VT340 or an IBM 3278? > I can tell you as a maker of terminal emulation software that there is indeed a significant and insistent demand for all sorts of terminals you never heard of. The original list for Kermit 95 was simply VT100, VT220, and ANSI. In the three years since we released it -- that is, here in the late 1990s -- quite contrary to our expectations, customer demands have compelled us to expand the list as follows:

[C:\k95\] K-95> set terminal type ? One of the following:
  aixterm    beterm    hft       qansi      tvi910+   vt100   wy30
  ansi-bbs   dg200     hp2621a   qnx        tvi925    vt102   wy370
  at386      dg210     hpterm    scoansi    tvi950    vt220   wy50
  avatar/0+  dg217     hz1500    sni-97801  vc404     vt320   wy60
  ba80       heath19   linux     tty        vip7809   vt52
[C:\k95\] K-95>

You'll see that some of them are true antiques: the Volker Craig 404, the Hazeltine 1500, ... Customers require these emulations because they have applications that are hardwired to use them. And yes, some of these applications are "dinosaurs", but they do not die so easily, and who is to say they should?
kenw@sybase.com (Kenneth Whistler) wrote [of ancient programs]: > > Somewhat orthogonally, many of these old dinosaurs are about to be > cleared away by the asteroid known as Y2K on its way to impact. > And many won't -- there is a huge industry that installs patches in ancient binaries for which source code is lost, or the source language is long forgotten (or no longer compilable or assemblable). And then, whenever the "window" rolls around, they'll have to do it again :-) > Incidentally, there is another pile of graphic symbols for keyboard > functions coming down the pike in Amendment 22 to 10646 (based on > ISO 9995-7). These should be checked to verify that there are no > duplicates against the collection of symbols being proposed for > terminal emulation. (Examples: symbols for compose, enter, alternate, > shift lock, undo, print screen, clear screen, delete, etc.) > For sure. I assume someone who is party to both proposals will take responsibility? If not, is Amendment 22 in a public place so I can check it myself? > And since your collection of display controls lists both the > three-letter and two-letter mnemonics for these things [NEL and NL], > I cannot see any argument for disunification. This is the thing meant > for what in your chart is: > > E025 85 NEL NL Symbol for Next Line > > U+2424 is the correct character for the graphic symbol > display of "NEL" or "NL Symbol or Next Line (or Symbol for Newline). > Well, we've beaten this one to death, but I would say we should be consistent. Our choices are these: 1. Encode the full form and no "2X" forms, i.e. no abbreviations of abbreviations should be encoded. 2. Encode both forms (I'm not advocating that). 3. List 2X forms as glyph alternatives and allow font designers to use *all* full forms or *all* 2X forms. 4. Encode only 2X forms. Only in the last case, I think, does it make sense to unify NEL and NL.
Otherwise NEL is a full C1 form and NL is an EBCDIC form, which unfortunately happens to coincide with the "2X" representation of NEL. Any other argument for unification would lead us to unify the symbols for all controls -- C0, C1, EBCDIC, Unicode, and otherwise -- that have similar functions but different names, which would defeat the purpose of having these glyphs to begin with. - Frank 9-Oct-98 22:19:51-GMT,1665;000000000001 Return-Path: Received: (from jaltman@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id SAA26443; Fri, 9 Oct 1998 18:19:46 -0400 (EDT) Sender: Jeffrey Altman Date: Fri, 9 Oct 98 18:19:46 EDT From: kermit-support@watsun.cc.columbia.edu To: Karlsson Kent - keka Cc: kermit-support@columbia.edu Subject: Re: RE2: Kermit95 In-Reply-To: Your message of Fri, 9 Oct 1998 20:49:52 +0200 Reply-To: kermit-support@watsun.cc.columbia.edu Message-ID: One more note regarding Unicode UTF-8 and text terminals. If the text terminal is an ANSI X3.64-1979 derivative and the character set is UTF-8, all of the terminal command sequences must use the 7-bit equivalents to the 8-bit C1 controls. This may be a reason why UTF-7 might be preferred in a text terminal environment. I do not believe that UTF-7 would interfere with the C1 control character range. Also, modifications would need to be made to how the terminal responds to character-set invocation commands. In general, I believe that all of the ISO 2022 rules for character-set handling would need to be ignored. The result is that you would be restricted to using only those characters available in Unicode and could not use any of the Special Graphics characters available in most terminal emulations for box drawing.
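[Illustration, not part of the original message.] The point about 7-bit equivalents can be sketched in a few lines of Python. The sketch assumes the usual ECMA-48 rule that an 8-bit C1 control in the range 0x80-0x9F is equivalent to ESC followed by the byte minus 0x40; the function name c1_to_7bit is hypothetical. It also shows why the 8-bit forms collide with UTF-8: the UTF-8 encoding of an ordinary box-drawing character contains bytes in the C1 range.

```python
ESC = 0x1B

def c1_to_7bit(controls: bytes) -> bytes:
    """Rewrite 8-bit C1 controls (0x80-0x9F) as ESC + (byte - 0x40).

    Apply this to the host's control stream BEFORE any text is encoded
    as UTF-8; running it over UTF-8 text would corrupt multi-byte
    characters, because their trailing bytes can fall in the same range.
    """
    out = bytearray()
    for b in controls:
        if 0x80 <= b <= 0x9F:
            out += bytes([ESC, b - 0x40])
        else:
            out.append(b)
    return bytes(out)

# 8-bit CSI (0x9B) followed by "2J" becomes the 7-bit form ESC [ 2 J.
seven_bit = c1_to_7bit(b"\x9b2J")

# U+2500 (BOX DRAWINGS LIGHT HORIZONTAL) in UTF-8 is E2 94 80; the last
# two bytes would be taken for C1 controls by an 8-bit terminal.
box_utf8 = "\u2500".encode("utf-8")
c1_collisions = [b for b in box_utf8 if 0x80 <= b <= 0x9F]
```

Keeping all controls in 7-bit form leaves every byte above 0x7F free to be interpreted as part of a UTF-8 sequence.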
Jeffrey Altman * Sr.Software Designer * Kermit-95 for Win32 and OS/2 The Kermit Project * Columbia University 612 West 115th St #716 * New York, NY * 10025 http://www.kermit-project.org/k95.html * kermit-support@kermit-project.org 9-Oct-98 17:08:26-GMT,2889;000000000011 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA02699 for ; Fri, 9 Oct 1998 13:08:24 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA36498 ; Fri, 9 Oct 1998 10:05:17 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA24722; Fri, 9 Oct 98 10:01:01 -0700 Message-Id: <9810091701.AA24722@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Uml-Sequence: 6128 (1998-10-09 17:00:45 GMT) From: Otto Stolz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:00:44 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 Frank da Cruz had proposed: > E0B3 Latin small letter a with underbar SNI Math 04/04 (2) > E0B4 Latin capital letter O with underbar SNI Math 04/09 (2) Rick McGowan wrote: > I believe [those] two characters are just masculine and feminine > ordinal indicators, and are already encoded between 0x80 and 0xFF, as part > of ISO Latin 1. They are probably just variant glyphs... unless the > documentation distinguishes them and they occur in pairs with lower-case. Am 1998-10-8 um 14:07 hat Frank da Cruz geschrieben: > The reason these need to be encoded > separately from feminine/masculine ordinals are their size -- they fill the > whole cell, like a regular letter. Since terminal emulators and data > analyzers use fixed-pitch fonts, we can't just switch to another point size > to display these characters, since that will wreck the matrix arrangement > of the screen. This is what I cannot understand. 
If the Feminine, and Masculine, Ordinal Indicators (U+00AA, and U+00BA, respectively) were written in the same fixed-pitch font as the surrounding text, they also would occupy one cell, each, won't they? As Frank had written in both of his own drafts: > arriving at a sufficient set of character-cell terminal graphics for > Unicode is complicated by the well-known problems that affect other > preexisting character sets to varying degrees: > 1. Lack of official names for the characters of some of the sets. > 2. Lack of definitive, high-quality pictures of the glyphs in some cases. > 3. Lack of descriptions of the purpose and intended use of the glyphs. I think, those are good reasons not to take the glyphs in the Siemens Nixdorf 97801-5xx Benutzerhandbuch too seriously -- good reasons to unify these characters with the above-mentioned U+00AA and U+00BA. The only reason, IMHO, not to unify them would be existence of a character set containing different glyphs both for the proposed characters and the existing ones -- as Rick has already noted. 
Best wishes, Otto Stolz 9-Oct-98 18:40:04-GMT,3883;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA28263 for ; Fri, 9 Oct 1998 14:40:02 -0400 (EDT) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA40650 ; Fri, 9 Oct 1998 11:26:03 -0700 Received: by unicode.org (NX5.67g/NX3.0S) id AA25671; Fri, 9 Oct 98 10:52:15 -0700 Message-Id: <9810091752.AA25671@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6134 (1998-10-09 17:51:41 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 9 Oct 1998 10:51:40 -0700 (PDT) Subject: Re: Terminal Graphics Draft 2 > If the Feminine, and Masculine, Ordinal Indicators (U+00AA, and U+00BA, > respectively) were written in the same fixed-pitch font as the surrounding > text, they also would occupy one cell, each, won't they? > Yes, but they would be too small. The SNI glyphs are full-size base characters, but the ordinal indicator glyphs are superscripts. > As Frank had written in both of his own drafts: > > arriving at a sufficient set of character-cell terminal graphics for > > Unicode is complicated by the well-known problems that affect other > > preexisting character sets to varying degrees: > > 1. Lack of official names for the characters of some of the sets. > > 2. Lack of definitive, high-quality pictures of the glyphs in some cases. > > 3. Lack of descriptions of the purpose and intended use of the glyphs. > > I think, those are good reasons not to take the glyphs in the Siemens > Nixdorf 97801-5xx Benutzerhandbuch too seriously -- good reasons to unify > these characters with the above-mentioned U+00AA and U+00BA. 
The only > reason, IMHO, not to unify them would be existence of a character set > containing different glyphs both for the proposed characters and the > existing ones -- as Rick has already noted. > I tend to agree. The "strange" SNI glyphs are not a high priority, to me personally at least. I have, however, posted a message to the Sinix newsgroup (of SNI customers) to see if any strong opinions come to the surface. All I can say from my own experience is that there was heavy demand for accurate SNI terminal emulation for Windows 95/98/NT, and we met that demand as best we could within the limitations of the code pages and fonts available to us. For those of you not familiar with SNI 97801, it probably has the most advanced ISO 2022 implementation and repertoire of character sets of any terminal ever built -- at least in the West (it lacks Hebrew, Arabic, and CJK, but includes ISO 8859-1,2,3,4,5,7,9, various ISO 646 versions, plus a selection of "strange" private sets, and a wide variety of input methods). To answer Otto's point with a question: what is a character set? I can see both a superscript feminine ordinal and a "big" feminine ordinal on the same screen simply by sending ISO 2022 escape sequences to switch "character sets". So in a sense, all character sets that can be designated and invoked by ISO 2022 escape sequences form one big character set :-) See, for example: http://www.columbia.edu/kermit/kuishots.html Go down to Shot 3. This screen was produced using ISO 2022 escape sequences from the host to a VT320 terminal emulator on Windows 95, with Lucida Console as the (Unicode) font. The same screen could be produced by sending the exact same data stream to the 97801. (This screen does not show any of the SNI "strange" glyphs, but I hope it illustrates the point.) 
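[Illustration, not part of the original message.] The designate-and-invoke mechanism can be sketched as a toy decoder in Python. The thread does not give the 97801's private sets or their designators, so this sketch substitutes the well-known DEC Special Graphics set (final byte "0") as a stand-in, handles only the G0 and G1 designations ESC ( F and ESC ) F plus the SO and SI shifts, and maps only six box-drawing positions. The function name decode and the table name DEC_SPECIAL are hypothetical.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Partial DEC Special Graphics mapping (stand-in for a private set).
DEC_SPECIAL = {
    0x6A: "\u2518",  # j -> lower right corner
    0x6B: "\u2510",  # k -> upper right corner
    0x6C: "\u250C",  # l -> upper left corner
    0x6D: "\u2514",  # m -> lower left corner
    0x71: "\u2500",  # q -> horizontal line
    0x78: "\u2502",  # x -> vertical line
}

def decode(stream: bytes) -> str:
    """Decode a 7-bit stream, tracking which set is designated to G0/G1
    and which of the two is currently invoked."""
    designated = ["B", "B"]   # final bytes for G0 and G1; "B" = ASCII
    invoked = 0               # index of the set currently in use
    out = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b == ESC and i + 2 < len(stream) and stream[i + 1] in b"()":
            # ESC ( F designates to G0, ESC ) F designates to G1.
            slot = 0 if stream[i + 1] == ord("(") else 1
            designated[slot] = chr(stream[i + 2])
            i += 3
            continue
        if b == SO:
            invoked = 1       # shift out: use G1
        elif b == SI:
            invoked = 0       # shift in: back to G0
        elif designated[invoked] == "0" and b in DEC_SPECIAL:
            out.append(DEC_SPECIAL[b])
        else:
            out.append(chr(b))
        i += 1
    return "".join(out)

# The same three bytes "lqk" appear as letters, then as box-drawing
# characters after G1 is designated and invoked, then as letters again.
screen = decode(b"lqk \x1b)0\x0elqk\x0f lqk")
```

The same byte values yield different glyphs on one screen depending on the sets in force, which is the sense in which all designatable sets behave like one large character set.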
Again, I have no great investment in these characters, and so far our SNI users have not complained about their absence, but before striking them I hope to hear some additional testimony from them. - Frank 30-Oct-98 20:53:04-GMT,1988;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA29389 for ; Fri, 30 Oct 1998 15:53:03 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id MAA49598 ; Fri, 30 Oct 1998 12:50:52 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA17044; Fri, 30 Oct 98 11:48:35 -0800 Message-Id: <9810301948.AA17044@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6361 (1998-10-30 19:45:46 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: unicode@unicode.org To: Unicode List Cc: kenw@sybase.com Date: Fri, 30 Oct 1998 11:45:42 -0800 (PST) Subject: Re: Terminal Emulation Doug commented: > > Excluding the hex-byte characters (which almost nobody seems to like), > we're only talking about 256 characters, aren't we? I guess I don't > understand why the opposition is so vigorous. As *glyphs*, nobody cares. They're fine. Anybody who wants to use glyphs like these to represent hex byte values may feel free to do so, and nobody will object. As *characters*, they are useless dreck. There is no reason to introduce into a text stream a *character*--say U+2841--to serve as a visible symbolic placeholder for the byte value 0x41. What purpose does this serve? Debuggers translate *byte values* into visibly displayed glyphs (either unitary, as proposed here, or simply as sequences of glyphs for the hex digits, i.e. "41"). Adding an arbitrary layer of textual *characters* in between just gets in the way of what the debugger should be doing. Unicode is a *character* encoding standard. It is not a glyph registry. 
People who want a registry of well-defined glyphs that font vendors can use to produce common collections of displayable glyphs (for terminal emulations or whatever) should be talking to AFII, instead. --Ken 30-Oct-98 22:51:33-GMT,1287;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA17156 for ; Fri, 30 Oct 1998 17:51:32 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA35960 ; Fri, 30 Oct 1998 14:44:39 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA18475; Fri, 30 Oct 98 13:06:37 -0800 Message-Id: <9810302106.AA18475@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6366 (1998-10-30 21:01:33 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 30 Oct 1998 13:01:30 -0800 (PST) Subject: Re: Terminal Emulation Michael Everson wrote: > > 2. The letterlike characters from the SNI Math set that I had in the > > first draft but later withdrew are in fact from ISO Registration > > 103: Teletex Supplementary Set of Graphic Characters from CCITT > > (ITU-T) T.61. These include Lappish Eng... > > Sami eng. > Right, I knew that, sorry. (I was hurriedly copying the names from the Teletex standard, which says "Lappish"...) Revisions coming up momentarily. 
- Frank 30-Oct-98 23:12:33-GMT,2135;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id SAA18729 for ; Fri, 30 Oct 1998 18:12:33 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id PAA25438 ; Fri, 30 Oct 1998 15:10:28 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA18765; Fri, 30 Oct 98 13:16:27 -0800 Message-Id: <9810302116.AA18765@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6367 (1998-10-30 21:14:12 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 30 Oct 1998 13:14:11 -0800 (PST) Subject: Terminal Charsets Proposal Well, I ran out of time -- gotta go now, probably won't be back till Monday, so there's no way to get these to the appropriate parties on paper by Fedex on November 2nd. Maybe November 3rd?... Anyway, the proposal is now split into 3: regular glyphs, control pics, and hex bytes. Michael Everson's glyph map is included, as well as an archive of the email discussion and a document from SNI about their glyphs. I would have liked to spend more time polishing -- and will do so before I send them in for real. In the meantime, any comments will be appreciated (but not responded to for a few days). 
HEX BYTE PICTURES FOR UNICODE (plain text) ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt ADDITIONAL CONTROL PICTURES FOR UNICODE (plain text) ftp://kermit.columbia.edu/kermit/ucsterminal/control TERMINAL GRAPHICS FOR UNICODE (plain text) ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt Glyph Map (PDF, binary, contributed by Michael Everson) ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-emulation.pdf Clarification of SNI Glyphs (Microsoft Word 7.0, binary, from SNI) ftp://kermit.columbia.edu/kermit/ucsterminal/sni-charsets.doc Discussion (Unicode list e-mail, plain text) ftp://kermit.columbia.edu/kermit/ucsterminal/mail.txt - Frank 30-Oct-98 23:43:13-GMT,1360;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id SAA23024 for ; Fri, 30 Oct 1998 18:43:12 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id PAA36380 ; Fri, 30 Oct 1998 15:34:42 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA19518; Fri, 30 Oct 98 13:48:53 -0800 Message-Id: <9810302148.AA19518@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6370 (1998-10-30 21:45:39 GMT) From: Rick McGowan Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 30 Oct 1998 13:45:38 -0800 (PST) Subject: Re: Terminal Emulation Doug Ewell wrote and Everson commented... > > ... to include characters of debatable usefulness > > rather than excluding them. > > No way! Include characters of limited usefulness, perhaps. But not of > debatable usefulness. I missed this one. Good call, Michael! Nobody wants things of debatable usefulness. Either the committees as a whole are convinced of some utility, and the characters go IN, or they're not convinced of utility, and the characters stay OUT. (Luckily, nobody has to use or like every single character...) 
Rick 31-Oct-98 2:30:27-GMT,1649;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id VAA08939 for ; Fri, 30 Oct 1998 21:30:27 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id SAA27418 ; Fri, 30 Oct 1998 18:27:33 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA00927; Fri, 30 Oct 98 16:46:07 -0800 Message-Id: <9810310046.AA00927@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6382 (1998-10-31 00:45:41 GMT) From: "Joan Aliprand" Reply-To: unicode@unicode.org To: Unicode List Date: Fri, 30 Oct 1998 16:45:40 -0800 (PST) Subject: Re: Terminal Charsets Proposal - alternative deadlines REPLY TO 10/30/98 15:25 FROM UNICODE@UNICODE.ORG: Terminal Charsets Proposal Frank: > Well, I ran out of time -- gotta go now, probably won't be back till > Monday, so there's no way to get these to the appropriate parties on > paper by Fedex on November 2nd. Maybe November 3rd?... You have two chances: (a) Yes, I guess Arnold could hold off for one day. (I'll leave a message for him.) (b) But if you get your proposal to the Unicode Office by Monday, November 16, there is enough time to make copies for distribution at the UTC/L2 meeting (even with the Thanksgiving holiday intervening). And it is possible for us to pull a copy from a site. (However, then you are trusting that software and connections work correctly.) 
-- Joan Aliprand Chair, UTC To: UNICODE@UNICODE.ORG 1-Nov-98 9:01:20-GMT,2440;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id EAA11750 for ; Sun, 1 Nov 1998 04:01:19 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id BAA19848 ; Sun, 1 Nov 1998 01:00:40 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA06934; Sun, 1 Nov 98 00:42:50 -0800 Message-Id: <9811010842.AA06934@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline X-Uml-Sequence: 6387 (1998-11-01 08:42:33 GMT) From: Doug Ewell Reply-To: unicode@unicode.org To: Unicode List Date: Sun, 1 Nov 1998 00:42:31 -0800 (PST) Subject: Re: Terminal Emulation Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id EAA11750 Kenneth Whistler wrote: > Doug commented: > >> Excluding the hex-byte characters (which almost nobody seems to >> like), we're only talking about 256 characters, aren't we? I >> guess I don't understand why the opposition is so vigorous. > > As *glyphs*, nobody cares. They're fine. Anybody who wants to use > glyphs like these to represent hex byte values may feel free to do > so, and nobody will object. > > As *characters*, they are useless dreck. ... Sorry, I guess my use of the word "excluding" was somehow misleading. I did not mean to appear to be supporting addition of the hex bytes into Unicode. I meant to say that, IF the hex bytes were removed from the proposal, we would be left with a single 256-character block (which is not even fully populated) and that I wouldn't have guessed that its addition would have caused so much controversy. 
I should also point out for the benefit of Michael, Rick, and others that I nearly used the phrase "limited usefulness" instead of "debatable usefulness," and in retrospect should have. I meant "debatable" from the perspective of individual users, but "limited" from the perspective of the committees. All character sets have at least one character that SOMEBODY might think is not necessary, as evidenced by the case of the gentleman who wanted to replace the supposedly useless vertical bar in ASCII with the Euro symbol. Cheers, -Doug 1-Nov-98 11:24:09-GMT,2545;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id GAA20339 for ; Sun, 1 Nov 1998 06:24:08 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id DAA63194 ; Sun, 1 Nov 1998 03:23:26 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA07272; Sun, 1 Nov 98 03:17:44 -0800 Message-Id: <9811011117.AA07272@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Uml-Sequence: 6388 (1998-11-01 11:17:31 GMT) From: Michael Everson Reply-To: unicode@unicode.org To: Unicode List Date: Sun, 1 Nov 1998 03:17:30 -0800 (PST) Subject: Re: Terminal Emulation Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id GAA20339 Ar 00:42 -0800 1998-11-01, scríobh Doug Ewell: >Sorry, I guess my use of the word "excluding" was somehow misleading. >I did not mean to appear to be supporting addition of the hex bytes >into Unicode. I meant to say that, IF the hex bytes were removed >>from the proposal, we would be left with a single 256-character block >(which is not even fully populated) and that I wouldn't have guessed >that its addition would have caused so much controversy. 
>I should also point out for the benefit of Michael, Rick, and others >that I nearly used the phrase "limited usefulness" instead of >"debatable usefulness," and in retrospect should have. I meant >"debatable" from the perspective of individual users, but "limited" >from the perspective of the committees. Any character the users aren't sure they want should not be proposed to the committees. >All character sets have at >least one character that SOMEBODY might think is not necessary, as >evidenced by the case of the gentleman who wanted to replace the >supposedly useless vertical bar in ASCII with the Euro symbol. >>>Shhhhh!<<< One oughtn't want say this too loudly. Best not encourage >>>people to think of such things. (In ASCII one writes "EUR " or "E" if >>>one cannot otherwise represent the EURO SIGN.) -- Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Guthán: +353 1 478-2597 ** Facsa: +353 1 478-2597 (by arrangement) 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire 2-Nov-98 14:35:21-GMT,2873;000000000011 Return-Path: Received: from timone.hac.awii.com (nat17.awii.com [208.133.247.17]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id JAA22408 for ; Mon, 2 Nov 1998 09:35:21 -0500 (EST) Received: by timone with Internet Mail Service (5.0.1460.8) id <4D96PAWD>; Mon, 2 Nov 1998 09:36:05 -0500 Message-ID: From: "O'Leary, Sean (NJ)" To: "'Frank da Cruz'" Subject: RE: Terminal Charsets Proposal Date: Mon, 2 Nov 1998 09:36:03 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.0.1460.8) Content-Type: text/plain Frank, I have found cases where a hex bytes area would be extremely useful. To me, the hex bytes are at least as useful as the Braille or control character encodings. It does not seem likely that the hex bytes will make it into Unicode's BMP, but I am still interested in tracking which directions this proposal goes. 
I would be interested in reviewing your site at: ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt but the site rejected my login attempts. Is this site for public viewing? Thanks, Sean O'Leary Software Internationalization Automated Wagering International (201) 489-5950 email: oleary@awii.com > -----Original Message----- > From: Frank da Cruz [SMTP:fdc@watsun.cc.columbia.edu] > Sent: Friday, October 30, 1998 4:14 PM > To: Unicode List > Subject: Terminal Charsets Proposal > > Well, I ran out of time -- gotta go now, probably won't be back till > Monday, so there's no way to get these to the appropriate parties on > paper by Fedex on November 2nd. Maybe November 3rd?... > > Anyway, the proposal is now split into 3: regular glyphs, control pics, > and hex bytes. > > Michael Everson's glyph map is included, as well as an archive of the > email discussion and a document from SNI about their glyphs. > > I would have liked to spend more time polishing -- and will do so before > I send them in for real. In the meantime, any comments will be > appreciated (but not responded to for a few days). 
> > HEX BYTE PICTURES FOR UNICODE (plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt > > ADDITIONAL CONTROL PICTURES FOR UNICODE (plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/control > > TERMINAL GRAPHICS FOR UNICODE (plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt > > Glyph Map (PDF, binary, contributed by Michael Everson) > ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-emulation.pdf > > Clarification of SNI Glyphs (Microsoft Word 7.0, binary, from SNI) > ftp://kermit.columbia.edu/kermit/ucsterminal/sni-charsets.doc > > Discussion (Unicode list e-mail, plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/mail.txt > > - Frank 2-Nov-98 19:30:37-GMT,2884;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA11073 for ; Mon, 2 Nov 1998 14:30:29 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id LAA69238 ; Mon, 2 Nov 1998 11:27:30 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA14632; Mon, 2 Nov 98 11:10:50 -0800 Message-Id: <9811021910.AA14632@unicode.org> Errors-To: uni-bounce@unicode.org Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 X-Uml-Sequence: 6401 (1998-11-02 19:10:25 GMT) From: Mark Davis Reply-To: unicode@unicode.org To: Unicode List Date: Mon, 2 Nov 1998 11:10:24 -0800 (PST) Subject: New draft Unicode technical reports available for review Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id OAA11073 Unicode Technical Committee (UTC) meeting #78 will be held the first week of December. As at every meeting, technical reports on http://www.unicode.org/unicode/reports/techreports.html will come up for discussion or approval. 
These papers can have significant impact on the recommendations for implementations and on Unicode conformance. Topics include how Unicode text is normalized for program identifiers and on the web, how Unicode text should line-break, how to deal with characters that can either have full-width or half-width in East Asian contexts, and how to sort Unicode characters. If you have any feedback on these topics, be sure to review the documents, and send your feedback to the contact point listed in each paper. For consideration at any UTC meeting, you should make sure that your comments are sent well before the meeting dates.

The draft technical reports include:

UTR #15: Unicode Normalization Forms
UTR #14: Line Breaking Properties
UTR #13: Unicode Newline Guidelines
UTR #11: East Asian Character Width
UTR #10: Unicode Collation Algorithm

In addition, we will now be posting proposed draft technical reports as they become available. These are in an earlier stage of development, and have not yet been considered by the UTC, so feedback is especially valuable. Topics include how to handle Unicode characters in regular expressions, the structure and terminology for character encodings, handling Unicode on EBCDIC systems, coding annotations (Ruby), and the Unicode BIDI algorithm.

UTR #18: Unicode Regular Expression Guidelines
UTR #17: Character Encoding Model
UTR #16: EBCDIC-Friendly UCS Transformation Format
UTR #12: Support for Interlinear Annotations
UTR #9: The Bidirectional Algorithm Reference Implementation

(UTR #6: Standard Compression Scheme for Unicode (SCSU) has also been updated--editorial fixes only.)
4-Nov-98 4:19:06-GMT,3638;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id XAA22069 for ; Tue, 3 Nov 1998 23:19:05 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id UAA71048 ; Tue, 3 Nov 1998 20:18:22 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA25646; Tue, 3 Nov 98 20:06:44 -0800 Message-Id: <9811040406.AA25646@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6418 (1998-11-04 04:06:32 GMT) From: "Julia Oesterle (Unicode)" Reply-To: unicode@unicode.org To: Unicode List Date: Tue, 3 Nov 1998 20:06:31 -0800 (PST) Subject: Re: Terminal Emulation from Frank da Cruz this one went astray...resend. Date: Tue, 3 Nov 98 19:25:39 EST From: Frank da Cruz Subject: Re: Terminal Emulation > > Rich McGowan wrote: > > > I'd suggest the best course of action for now would be to bring the > > proposal to the attention of UTC members. (Some of them are on the > > Unicode list, others aren't.) Ken and I can do this part, which merely > > involves sending UTC some pointers and a blurb about the proposal, to > > solicit their consideration and feedback. > > > Thanks. I think it's ready for this. 
The proposal has been split into > three (as noted previously), and updated again today for polish and > consistency (from the rushed hatchet job of last Friday): > > HEX BYTE PICTURES FOR UNICODE (plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt > > ADDITIONAL CONTROL PICTURES FOR UNICODE (plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt > > TERMINAL GRAPHICS FOR UNICODE (plain text) > ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt > > Glyph Map (PDF, contributed by Michael Everson) (*) > ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-emulation.pdf > > Clarification of SNI Glyphs (Microsoft Word 7.0, from SNI) > ftp://kermit.columbia.edu/kermit/ucsterminal/sni-charsets.doc > > Discussion (plain text -- from Unicode mailing list) > ftp://kermit.columbia.edu/kermit/ucsterminal/mail.txt > > I think the hex bytes proposal is worth another look by those on both > sides > of the fence. It has been beefed up considerably in the motivation / > justification department. > > After several more days of comment, I'll send them in on paper. > > Thanks again to everybody for all the help (and patience). > > - Frank > > (*) Michael's glyph map is based on previous drafts; some of the > characters > shown in it have since been eliminated from the proposal.
> ------------------------------Header-------------------------------------- > --- > From watsun.cc.columbia.edu!fdc Tue Nov 3 17:09:57 1998 > Received: from watsun.cc.columbia.edu by unicode.com id aa28351; > 3 Nov 98 17:09 PST > Received: from watsun.cc.columbia.edu (watsun.cc.columbia.edu > [128.59.39.2]) > by mail1.dynamic.com (8.8.5/8.8.5) with ESMTP id QAA15248 > for ; Tue, 3 Nov 1998 16:25:31 -0800 > Received: (from fdc@localhost) > by watsun.cc.columbia.edu (8.8.5/8.8.5) id TAA24416; > Tue, 3 Nov 1998 19:25:40 -0500 (EST) > Date: Tue, 3 Nov 98 19:25:39 EST > From: Frank da Cruz > Subject: Re: Terminal Emulation > In-Reply-To: Your message of Thu, 29 Oct 1998 14:18:14 -0800 > To: unicode@unicode.com > Message-ID: 9-Nov-98 21:09:50-GMT,2560;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id QAA13219 for ; Mon, 9 Nov 1998 16:09:49 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id NAA26578 ; Mon, 9 Nov 1998 13:09:21 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA18733; Mon, 9 Nov 98 13:00:04 -0800 Message-Id: <9811092100.AA18733@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6450 (1998-11-09 20:59:44 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: unicode@unicode.org To: Unicode List Cc: kenw@sybase.com Date: Mon, 9 Nov 1998 12:59:43 -0800 (PST) Subject: Re: Displaying Plane 1 characters (annotating the code table Markus Scherer noted: > However, it probably makes sense for files as an easy and somewhat compact > format, and it makes sense for the number of possible characters: 1M + 64k, > including 128k+6400 private use character code points. There are about 38000 > characters assigned so far, with about 20000-30000 more in the pipeline. 
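The code-space figures quoted above ("1M + 64k", "128k+6400 private use") can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 17-plane layout reachable through surrogate pairs, with planes 15 and 16 reserved for private use, and the constant names are illustrative rather than taken from the thread.

```python
# Sketch: check the code-space arithmetic quoted above.
# Assumption: 17 planes (BMP plus 16 reached via surrogate pairs),
# with planes 15 and 16 set aside for private use.

PLANE_SIZE = 0x10000                      # 65,536 code points per plane ("64k")
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1     # 1,024 high surrogates
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1      # 1,024 low surrogates

# Every high/low pair addresses one code point beyond the BMP ("1M").
supplementary = HIGH_SURROGATES * LOW_SURROGATES
total = supplementary + PLANE_SIZE        # "1M + 64k"

# Private use: U+E000..U+F8FF in the BMP, plus two whole planes ("128k").
bmp_private_use = 0xF8FF - 0xE000 + 1
plane_private_use = 2 * PLANE_SIZE
private_use = plane_private_use + bmp_private_use

print(total, private_use)
```

The "128k" here is the nominal size of two planes; it does not subtract the two noncharacter positions at the end of each plane.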
Here are the exact values of what currently is encoded and what Unicode 3.0 will contain (synched with the prospective content of the republication of ISO/IEC 10646-1):

Unicode 2.1:

   6813 Misc. characters
  20902 Unihan
  11172 Johab Hangul
   6400 Private use
   2048 Surrogates
     65 Controls
      2 Not characters
  18134 Unassigned assignable

  38887 Assigned graphic characters

Unicode 3.0 (prospective, as of November 3, 1998):

  10554 Misc. characters
  20902 Unihan
   6582 Unihan Extension A
  11172 Johab Hangul
   6400 Private use
   2048 Surrogates
     65 Controls
      2 Not characters
   7811 Unassigned assignable

  49210 Assigned graphic characters

For a net gain of 10323 new characters.

Others have noted the following, but I would like to reiterate, so that *correct* rumors can circulate, instead of incorrect ones: Unicode 3.0 will *not* contain any encoded characters requiring surrogates. The republication of ISO/IEC 10646-1 will *not* contain any encoded characters outside of the Basic Multilingual Plane. Plane 1 (and 2 and 14) are for ISO/IEC 10646-2, which is still in working draft and which has not yet even started a CD ballot. When 10646 Part 2 progresses far enough, we anticipate publishing a Version 4.0 of the Unicode Standard -- and *that* will make use of surrogate codes to access encoded characters on Planes 1 and beyond.

--Ken Whistler

9-Nov-98 21:53:20-GMT,3634;000000000001 Return-Path: Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id QAA28069; Mon, 9 Nov 1998 16:52:57 -0500 (EST) Date: Mon, 9 Nov 98 16:52:57 EST From: Frank da Cruz To: Joan Aliprand cc: Ken Whistler , Rick McGowan , "Hart, Edwin F." Subject: Re: Terminal Charsets Proposal - alternative deadlines In-Reply-To: Your message of Fri, 30 Oct 1998 16:45:40 -0800 (PST) Message-ID: > > Well, I ran out of time -- gotta go now, probably won't be back till > > Monday, so there's no way to get these to the appropriate parties on > > paper by Fedex on November 2nd. Maybe November 3rd?...
> > You have two chances: > ... > (b) But if you get your proposal to the Unicode Office by Monday, November > 16, there is enough time to make copies for distribution at the UTC/L2 > meeting (even with the Thanksgiving holiday intervening). > Well, I posted an announcement of the latest drafts about a week ago (to the wrong address, but you reposted them, thanks!) and have not heard a peep, so I suppose they must be ready to go. > And it is possible for us to pull a copy from a site. (However, then you > are trusting that software and connections work correctly.) > The relevant files are all available via anonymous ftp to kermit.columbia.edu [128.59.39.2], directory kermit/ucsterminal. Transfer all files in text mode except the ones marked (*): -rw-rw-r-- 1 fdc 1067 Nov 4 11:14 README.TXT -rw-rw-r-- 1 fdc 41001 Nov 3 19:12 control.txt -rw-rw-r-- 1 fdc 14665 Nov 3 19:12 hex.txt -rw-rw-r-- 1 fdc 257434 Nov 9 16:14 mail.txt -rw-rw-r-- 1 fdc 42496 Oct 30 12:46 sni-charsets.doc (*) -rw-rw-r-- 1 fdc 88216 Oct 30 12:45 terminal-emulation.pdf (*) -rw-rw-r-- 1 fdc 38763 Nov 3 19:12 ucsterminal.txt -rw-rw-r-- 1 fdc 44534 Sep 30 21:27 ucsterminal_01.txt -rw-rw-r-- 1 fdc 59180 Oct 7 20:03 ucsterminal_02.txt -rw-rw-r-- 1 fdc 37651 Oct 30 15:52 ucsterminal_03.txt (*) Transfer these in binary mode. 
The three proposals are: -rw-rw-r-- 1 fdc 41001 Nov 3 19:12 control.txt -rw-rw-r-- 1 fdc 14665 Nov 3 19:12 hex.txt -rw-rw-r-- 1 fdc 38763 Nov 3 19:12 ucsterminal.txt The Unicode mail list discussion is: -rw-rw-r-- 1 fdc 257434 Nov 9 16:14 mail.txt The glyph maps are in a PDF file from Michael Everson: -rw-rw-r-- 1 fdc 88216 Oct 30 12:45 terminal-emulation.pdf Clarification on the mysterious Siemens Nixdorf glyphs is in the following Microsoft Word file: -rw-rw-r-- 1 fdc 42496 Oct 30 12:46 sni-charsets.doc And the following are earlier drafts of the original monolithic proposal: -rw-rw-r-- 1 fdc 44534 Sep 30 21:27 ucsterminal_01.txt -rw-rw-r-- 1 fdc 59180 Oct 7 20:03 ucsterminal_02.txt -rw-rw-r-- 1 fdc 37651 Oct 30 15:52 ucsterminal_03.txt I also have a set of "exhibits" on paper, which are photocopies of character-set tables from a selection of terminal manuals. These are listed at the end of ucsterminal.txt. I don't have any way to put them online, so I'll be glad to send them by fedex to any address you designate. However, I don't think they could arrive at a Post Office box in time for the deadline. Can I consider the proposals submitted? (Your web page says to send paper "unless prior arrangements have been made for receipt of electronic copy"). Thanks! - Frank 9-Nov-98 22:49:26-GMT,1317;000000000011 Return-Path: Received: from RLG.ORG (rlg.org [204.161.104.131]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with SMTP id RAA15456 for ; Mon, 9 Nov 1998 17:49:24 -0500 (EST) Message-Id: <199811092249.RAA15456@watsun.cc.columbia.edu> Date: Mon, 9 Nov 98 14:48:50 PST From: "Joan Aliprand" To: fdc@watsun.cc.columbia.edu Subject: Re: Terminal Charsets Proposal - alternative deadlines REPLY TO 11/09/98 13:52 FROM FDC@WATSUN.CC.COLUMBIA.EDU "Frank da Cruz": Re: Terminal Charsets Proposal - alternative deadlines Dear Frank, I have put your Terminal Charsets Proposal on the agenda for the UTC/L2 joint meeting in December. 
However, the agenda has many items that are time-critical for Version 3.0, so I cannot guarantee whether this proposal will be discussed at this meeting. I have been unable to connect to the Kermit FTP server this afternoon. (I tried from the Kermit Web site, as well as the direct IP address you gave.) If problems persist, I may have to ask you to send the main documents (i.e., the three proposals) by express mail or fax to the Unicode Office. Yours sincerely, -- Joan Aliprand Chair, UTC To: FDC@WATSUN.CC.COLUMBIA.EDU cc: KEN(KENW@SYBASE.COM), RICK(RMCGOWAN@APPLE.COM), HART(EDWIN.HART@JHUAPL.EDU) 21-Nov-98 12:43:55-GMT,3190;000000000005 Return-Path: Received: from mailrelay1.cc.columbia.edu (mailrelay1.cc.columbia.edu [128.59.35.143]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id HAA01746 for ; Sat, 21 Nov 1998 07:43:48 -0500 (EST) Received: from heaton.cl.cam.ac.uk (heaton.cl.cam.ac.uk [128.232.32.11]) by mailrelay1.cc.columbia.edu (8.8.5/8.8.5) with SMTP id HAA17231 for ; Sat, 21 Nov 1998 07:43:35 -0500 (EST) Received: from trillium.cl.cam.ac.uk (cl.cam.ac.uk) [128.232.8.5] (mgk25) by heaton.cl.cam.ac.uk with esmtp (Exim 1.82 #1) id 0zhCNx-0001jE-00; Sat, 21 Nov 1998 12:43:33 +0000 X-Mailer: exmh version 2.0.2+CL 2/24/98 To: fdc@columbia.edu cc: unicode@unicode.org Subject: UCS Terminal Emulation Draft X-URL: http://www.cl.cam.ac.uk/~mgk25/ Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 21 Nov 1998 12:43:31 +0000 From: Markus Kuhn Message-Id: A few questions/suggestions on ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt which I am implementing at the moment. > * E0A6 Extensible UR or LL brace section IBM SS240000 > * E0A7 Extensible LR or UL brace section IBM SS250000 I don't understand why there are not four of these. How can UR and LL be unified? > * E0AE Right ceiling corner DEC Tech 03/05 > * E0AF Right floor corner DEC Tech 03/06 What are these good for? 
Big floor-ceiling operators can already be constructed using the bracket segments. And why are there only right versions of these? > E0EF Box drawing double dash H DGL 03/12 (5) > (5) Similar to U+2504 but double rather than triple. What is the difference to U+254c (BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL)? Michael's glyph in ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-emulation.pdf doesn't seem to fit the description here. This character is still a bit confusing. > E0D8 H line - Scan 5 DSG 07/01, Wyse ANSI 02/02 (2) I think, this one should be unified with U+2500. The E0D6-E0DA characters should also be renamed, as a scan line count is ambiguous and resolution dependent. Something like E0D6 BOX DRAWINGS LIGHT HORIZONTAL UPPER ONE SIXTH E0D7 BOX DRAWINGS LIGHT HORIZONTAL UPPER TWO SIXTH E0D9 BOX DRAWINGS LIGHT HORIZONTAL LOWER TWO SIXTH E0DA BOX DRAWINGS LIGHT HORIZONTAL UPPER ONE SIXTH and together with 2500 we would then have all the required lines. An implementation of these characters is now available in http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz in the file 6x13-future.bdf, in which I collect proposed implementations of post-Unicode 2.1 characters for my 6x13 font. It would be nice if you could have a look at these characters. [BTW: The 6x13.bdf file is now complete and will be added to various Linux distributions in a few days. This is your last chance to send me bug reports and suggestions for this free Unicode xterm font before wide distribution.] Markus -- Markus G. 
Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: 23-Nov-98 18:59:08-GMT,4546;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id NAA05690 for ; Mon, 23 Nov 1998 13:59:05 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id KAA68464 ; Mon, 23 Nov 1998 10:56:50 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA19230; Mon, 23 Nov 98 10:49:18 -0800 Message-Id: <9811231849.AA19230@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6651 (1998-11-23 18:49:05 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Cc: unicode@unicode.org Date: Mon, 23 Nov 1998 10:49:02 -0800 (PST) Subject: Re: UCS Terminal Emulation Draft Hi Markus. > A few questions/suggestions on > > ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt > > which I am implementing at the moment. > > > * E0A6 Extensible UR or LL brace section IBM SS240000 > > * E0A7 Extensible LR or UL brace section IBM SS250000 > > I don't understand why there are not four of these. How can UR and LL be > unified? > Because they look exactly the same :-) (IBM being clever...) > > * E0AE Right ceiling corner DEC Tech 03/05 > > * E0AF Right floor corner DEC Tech 03/06 > > What are these good for? Big floor-ceiling operators can already be > constructed using the bracket segments. And why are there only right > versions of these? > They're not centered vertically or horizontally. Do you have a DEC terminal manual? They look about like this: +------------------+ +------------------+ | | | | | -------------+ | | | | | | | | | | | | | | | | | | -------------+ | | | | | +------------------+ +------------------+ 03/05 03/06 > > E0EF Box drawing double dash H DGL 03/12 (5) > > (5) Similar to U+2504 but double rather than triple. 
> > What is the difference to U+254c (BOX DRAWINGS LIGHT DOUBLE DASH > HORIZONTAL)? Michael's glyph in > > ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-emulation.pdf > > doesn't seem to fit the description here. This character is still a bit > confusing. > I might have missed the glyph at U+254C -- I think this one might be a candidate for unification. The DG character, however, has wider spacing between the dashes (for what it's worth). > > E0D8 H line - Scan 5 DSG 07/01, Wyse ANSI 02/02 (2) > > I think, this one should be unified with U+2500. > I think I commented on this in the proposal. Yes, they should be unified, but only if it can be guaranteed that the unified character works in both contexts (PC-style box drawing and VT-style box drawing). I don't see any reason why it shouldn't, but I'm not a font designer. > The E0D6-E0DA > characters should also be renamed, as a scan line count is ambiguous and > resolution dependent. Something like > > E0D6 BOX DRAWINGS LIGHT HORIZONTAL UPPER ONE SIXTH > E0D7 BOX DRAWINGS LIGHT HORIZONTAL UPPER TWO SIXTH > E0D9 BOX DRAWINGS LIGHT HORIZONTAL LOWER TWO SIXTH > E0DA BOX DRAWINGS LIGHT HORIZONTAL UPPER ONE SIXTH > > and together with 2500 we would then have all the required lines. > I'm certainly not averse to this. > An implementation of these characters is now available in > > http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz > > in the file 6x13-future.bdf, in which I collect proposed implementations > of post-Unicode 2.1 characters for my 6x13 font. > You've encoded them in the private-use area, right? Hopefully final resting places will be designated for them in the U+2xxx region, and the repertoire and/or sequencing might be altered. For that matter the entire proposal might be rejected. In the latter case, of course, we can just keep these characters where they are. > It would be nice if you > could have a look at these characters. 
[BTW: The 6x13.bdf file is now > complete and will be added to various Linux distributions in a few days. > This is your last chance to send me bug reports and suggestions for this > free Unicode xterm font before wide distribution.] > I don't have a way to look at BDF files at the moment so can't comment -- let's take this offline... Thanks! - Frank 23-Nov-98 22:37:12-GMT,1527;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id RAA12301 for ; Mon, 23 Nov 1998 17:37:11 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id OAA49562 ; Mon, 23 Nov 1998 14:33:24 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA21773; Mon, 23 Nov 98 14:15:36 -0800 Message-Id: <9811232215.AA21773@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6657 (1998-11-23 22:15:25 GMT) From: Frank da Cruz Reply-To: unicode@unicode.org To: Unicode List Date: Mon, 23 Nov 1998 14:15:24 -0800 (PST) Subject: Re: Glyphs of new Unicode 3.0 symbols Roman Czyborra wrote: > These exist in http://czyborra.com/unifont/. > > > 237E BELL SYMBOL > If this is an approved addition, and it is indeed a picture of a bell, it can be unified with the "Picture of Bell" character in the "Additional Control Pictures for Unicode" proposal. > I also would like to see a standardized APPLE. > I thought corporate logos were off limits. Note that Data General terminals also include a DG-logo glyph. There are no doubt several others. Well, come to think of it, the number of such logos would be bounded by the number of corporations (and other organizations). Looks, sounds, and smells like a can of worms to me! 
- Frank 30-Nov-98 20:43:16-GMT,6376;000000000001 Return-Path: Received: from public.lists.apple.com (public.lists.apple.com [17.254.0.151]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA22648 for ; Mon, 30 Nov 1998 15:43:09 -0500 (EST) Received: from unicode.org (unicode2.apple.com [17.254.3.212]) by public.lists.apple.com (8.9.1a/8.9.1) with SMTP id MAA37976 ; Mon, 30 Nov 1998 12:42:00 -0800 Received: by unicode.org (NX5.67g/NX3.0S) id AA00477; Mon, 30 Nov 98 12:13:45 -0800 Message-Id: <9811302013.AA00477@unicode.org> Errors-To: uni-bounce@unicode.org X-Uml-Sequence: 6845 (1998-11-30 20:12:10 GMT) From: kenw@sybase.com (Kenneth Whistler) Reply-To: considered_harmful@unicode.org To: Unicode List Cc: kenw@sybase.com Date: Mon, 30 Nov 1998 12:12:09 -0800 (PST) Subject: Re: Glyphs of new Unicode 3.0 symbols MIME-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Transfer-Encoding: 8bit Roman suggested: > > Speaking of Unicode 3.0 (thank you all for the many enlightening > details!) I would like to express my wish for the following additions > to the Unicode 3.0 CD-ROM for implementor's convenience: > > 2. Add an "age" field to the unidata.txt to specify since which > Unicode version each character has been defined: > "1.0", "1.1", "2.0", "2.1", or "3.0" This is under active consideration for a much revised and extended form of the Unicode Character Database data to accompany the release of the Unicode Standard, Version 3.0. However, do not expect it to simply be an additional field for the UnicodeData-X.Y.Z.txt file. The format and field content of that file have been fixed for long enough that there are multiple implementations out there that parse it with particular assumptions about its format. There is an ongoing discussion, but chances are that new data files will be introduced, with similar, but new formats, for additional information provided about characters in the future. > > 3. 
Add an "ASCII transliteration" mapping to each Unicode character > so that it can be rendered readable in ASCII contexts This suggestion got thoroughly chewed over last week. Suffice it to say that this is *way* down the priority list for those of us working on the properties, attributes, and sundry characteristics of characters. I consider this to be A) a black hole, and B) a great opportunity for the vendors and industrious entrepreneurs to come up with appropriate solutions for different classes of applications and groups of customers. It is certainly not ripe for an ad hoc standardization by the Unicode Consortium. > > 4. Make the names.txt equivalent to the book's charts by illustrating > it with UTF-8 characters, for example > > 0025 % PERCENT SIGN > x (arabic percent sign - 066A ٪) > x (per mille sign - 2030 ‰) > x (per ten thousand sign - 2031 ‱) This is, of course, a fairly simple thing to do, but it has annoying edge cases, since there are four digit years and four digit standards citations in the file that have to be filtered so they don't produce erroneous conversions. (For an example of the problem, see the note under U+0197 in the Unicode Standard, Version 2.0.) The transformation from the format of the text-only version of the names list to the formatted, final version of the names list is fairly complex and subtle. We will certainly again be placing the text-only version of the names list on the CD-ROM, but the amount of special-purpose massaging we do to it is a matter of resource contention with other tasks for publication. > > 6. Add mapping tables for the other ISO standards listed as source > standards in chapter R.1 but not in mappings/iso*/ As someone else speculated, much of this information is not just "available" and being held back -- it is implicit in mountains of standards documents, explicit but scattered in various vendors' implementations of mappings, but not sitting ready somewhere to just stick on the CD-ROM.
We'll put what we have available, but even reviewing and updating the sometimes outdated information in the tables we *do* have is going to be a major task. Frank asked: > > > 237E BELL SYMBOL > > > If this is an approved addition, and it is indeed a picture of a > bell, it can be unified with the "Picture of Bell" character in the > "Additional Control Pictures for Unicode" proposal. This one is from ISO 2047 (see also DIN 66 213). Yes, it could (and should) have been unified with U+2407 SYMBOL FOR BELL, but that is not what the ISO committee decided. But this is not the only instance in which a graphic representation of a control code has taken on a life of its own as a separate graphic character. Think of U+237E BELL SYMBOL now as a cute little mushroom with legs in the technical symbols area. A propos for representation of a door buzzer, or whatever... But if the terminal graphics proposal needs both a "BEL" and a character for a picture of a bell, this is it. Roman asked: > And is U+3004 JAPANESE INDUSTRIAL STANDARD SYMBOL no corporate symbol? Yep, but there are always exceptions. This one is in Unicode because, although this symbol is not in JIS standards (X 0208, for example), it is universally used in Japanese JIS dictionaries as a little symbol to indicate the JIS value of a character. So even if someone could point to a claim that this is a trademarked logo, it has been genericized by usage. Tim Partridge asked: > Back to the subject of what would be useful on the Unicode 3.0 CD, how about > a list of the characters used by various languages? (Perhaps with > classifications like "essential" and "only in foreign words".) Could the > European subsetters be persuaded to contribute their data? The Cyrillic and > Arabic blocks also merit attention. 
This would be a nice thing to have, but is also a tremendous amount of work and an open-ended project, since there are disagreements about the status of various letters even within well-known languages, and there are potentially 1000's of languages to deal with. As for whether the European subsetters could be persuaded to contribute their data, it might be more efficient for us to simply point at their results for European languages when they stabilize and are available in a public place. [CEN Workshop Agreement (CWA) on Alphabets of Europe] --Ken Whistler 7-Dec-98 15:57:11-GMT,7925;000000000005 Return-Path: Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA21896 for ; Mon, 7 Dec 1998 10:57:07 -0500 (EST) Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id ; Mon, 7 Dec 1998 10:56:32 -0500 Message-ID: <91D1D51C2955D111B82B00805F19989501CD7290@aples2.jhuapl.edu> From: "Hart, Edwin F." To: "'da Cruz, Frank'" Cc: "'Aliprand, Joan'" , "'Whistler, Ken'" , "'McGowan, Rick'" , "'Thewlis, Dave'" Subject: UTC response to your request to encode characters for terminal em ulation Date: Mon, 7 Dec 1998 10:56:31 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2232.9) Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id KAA21896 Frank, What follows is the text of the response of the UTC to your proposals. (If you would prefer, I also have a Word97 version.) The UTC needs some additional information from you (and IBM and SHARE) before it decides about the "Terminal Graphics for Unicode" proposal. The next UTC meeting is in February in Palo Alto and we would appreciate your response by then. If you want, we can talk about this. Best regards, Ed Edwin F. 
Hart Applied Physics Laboratory 11100 Johns Hopkins Road Laurel, MD 20723-6099 +1-240-228-6926 (from Washington, DC area) +1-443-778-6926 (from Baltimore area) +1-240-228-1093 (fax) edwin.hart@jhuapl.edu 1998-December-07 To: Frank da Cruz From: Unicode Technical Committee Subject: Response to your three proposals to encode characters for terminal emulation. Thank you for your three proposals for encoding terminal-emulation characters in Unicode. The proposals were well organized, thorough, and well researched. Results The UTC acknowledges the concerns raised in your three documents. The UTC had extended discussions on these documents at its December, 1998 meeting in San José. Here is the result of these discussions for each paper. 1. Document L2/98-353, "Additional Control Pictures for Unicode" Status: rejected The UTC believes that the proposed glyphs would be used as an alternate way to display control characters rather than to interchange information; e.g., to document control sequences. The UTC decided not to encode these glyphs in Unicode. However, the UTC noted that a bell glyph may have value in other contexts and so could be encoded for another purpose in the future. In addition, the UTC noted that Unicode 3.0 aligns the abbreviations for control characters along a diagonal as you had requested. As a secondary concern, encoding glyphs for control characters is an open-ended proposition. The UTC knows that multiple sets of control characters are defined for the C1 control area. For example, ISO had two standards defining control characters, ISO/IEC 6429 and ISO 6630. When someone proposes a new set of C1 control characters, should they also be considered for encoding? What should be encoded? Should exactly one glyph be encoded per control-character code position or should multiple glyphs be encoded for the same control-character code position? These are examples of concerns underlying the UTC decision rather than a request for you to answer the questions. 2. 
Document L2/98-354, "Terminal Graphics for Unicode"

Status: deferred for additional information

The UTC has requested more information before it makes a decision.

Table 5.1, range of E080 to E087. The UTC has requested an official position from IBM and feedback from SHARE on the glyphs used in the status area of a 3270 display.

Table 5.2, range of E0A0 to E0AD. The UTC has requested that Microsoft provide a list of the full set of glyphs used to construct mathematical entities (brackets, braces, sigma, etc.). Previously, the UTC had decided not to encode these as characters. However, once this information is available, the UTC will revisit the issue.

In addition, the UTC would appreciate your response to the following:

a. What is the full set of terminal-emulation glyphs that you considered and how did you map those not in your proposal into Unicode? The UTC's concern is for round-trip integrity and distinguishing different characters so that the UTC avoids mapping the characters in your proposal to the same Unicode characters you used already for other glyphs in your full set. (The concern is not the characters from standard coded character sets like 7-bit ASCII and the ISO/IEC 8859 series, but rather the set of symbols outside of these sets.)

b. Which of the following proposed characters could be unified with (mapped into) Unicode characters?

   1) Can you provide (a) the source glyphs for the proposed E0AC and E0AD sigma/summation parts, and also (b) better glyphs for them?

   2) What is the purpose of the proposed E0AE and E0AF characters? Are they supposed to be full corners for a box, or partial corners, or to provide the top and bottom corners of right brackets, or to provide serifs for the sigma (E0AC and E0AD)? Could the proposed E0AE and E0AF characters be unified with 231D top right corner and 231F bottom right corner?

   3) For the proposed E0B0, could it be unified with either 2713 check mark or 221A radical?
Is "small" the distinguishing characteristic for not unifying it with 2713 or 221A? 4) For the proposed E0B1, should it be unified with either 237B not check mark or 2415 symbol for negative acknowledge? 5) What is the purpose of the proposed E0D0 and E0D1 characters? Are they to be used to construct extended brackets and braces with the E0A0 to E0AB "extensible" characters? If so, then they should be moved to the mathematical symbol area of your proposal. If not, please explain how they might be used. 6) Could the proposed E0D2 to E0D5 triangle characters be unified with the 25E2 to 25E5 black triangle characters? 7) Could the proposed E0E5 diamond be unified with 25C6 black diamond? 8) Could the proposed E0EC be unified with 21E5 rightward arrow to bar? 9) Could the proposed E0ED be unified with 21E4 leftward arrow to bar? 3. Document L2/98-355, "Hex Byte Pictures for Unicode" Status: rejected The UTC considers that these are glyphs and, as such, they are out of the scope of the Unicode standard. Representing hex bytes visibly is a font-rendering issue rather than an information interchange issue. Suggestions Here are some suggestions for you to consider to help you meet your requirements. 1. Code the glyphs in the Private Use Area. If at some time in the future, the terminal emulation vendors are using the assignments, then you may resubmit your proposal (except for the pictures for hex bytes) to Unicode with this additional justification. It is beyond the scope of the UTC to encode characters in the Private Use Area or to endorse any particular use of characters in the Private Use Area. If the terminal emulation community believes that consistent use of private-use code positions is desirable, you might consider registering your code assignments in a registry for the Private Use Area such as the Conscript Registry. Note that Unicode does not endorse any registry for the Private Use Area. Both Adobe and Apple have described how each uses the Private Use Area. 
You may want to contact these organizations for additional information.

2. Register the glyphs with AFII (Association for Font Information Interchange). AFII is the registration authority for the ISO/IEC 10036 glyph registry. AFII charges a nominal fee for registering glyphs. If you are interested in pursuing this, contact AFII (afii@unicode.org) for more information.

7-Dec-98 15:57:11-GMT,7925;000000000015
Return-Path:
Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id KAA21896 for ; Mon, 7 Dec 1998 10:57:07 -0500 (EST)
Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id ; Mon, 7 Dec 1998 10:56:32 -0500
Message-ID: <91D1D51C2955D111B82B00805F19989501CD7290@aples2.jhuapl.edu>
From: "Hart, Edwin F."
To: "'da Cruz, Frank'"
Cc: "'Aliprand, Joan'", "'Whistler, Ken'", "'McGowan, Rick'", "'Thewlis, Dave'"
Subject: UTC response to your request to encode characters for terminal emulation
Date: Mon, 7 Dec 1998 10:56:31 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by watsun.cc.columbia.edu id KAA21896

Frank,

What follows is the text of the response of the UTC to your proposals. (If you would prefer, I also have a Word97 version.) The UTC needs some additional information from you (and IBM and SHARE) before it decides about the "Terminal Graphics for Unicode" proposal. The next UTC meeting is in February in Palo Alto and we would appreciate your response by then.

If you want, we can talk about this.

Best regards,
Ed

Edwin F.
Hart Applied Physics Laboratory 11100 Johns Hopkins Road Laurel, MD 20723-6099 +1-240-228-6926 (from Washington, DC area) +1-443-778-6926 (from Baltimore area) +1-240-228-1093 (fax) edwin.hart@jhuapl.edu 1998-December-07 To: Frank da Cruz From: Unicode Technical Committee Subject: Response to your three proposals to encode characters for terminal emulation. Thank you for your three proposals for encoding terminal-emulation characters in Unicode. The proposals were well organized, thorough, and well researched. Results The UTC acknowledges the concerns raised in your three documents. The UTC had extended discussions on these documents at its December, 1998 meeting in San José. Here is the result of these discussions for each paper. 1. Document L2/98-353, "Additional Control Pictures for Unicode" Status: rejected The UTC believes that the proposed glyphs would be used as an alternate way to display control characters rather than to interchange information; e.g., to document control sequences. The UTC decided not to encode these glyphs in Unicode. However, the UTC noted that a bell glyph may have value in other contexts and so could be encoded for another purpose in the future. In addition, the UTC noted that Unicode 3.0 aligns the abbreviations for control characters along a diagonal as you had requested. As a secondary concern, encoding glyphs for control characters is an open-ended proposition. The UTC knows that multiple sets of control characters are defined for the C1 control area. For example, ISO had two standards defining control characters, ISO/IEC 6429 and ISO 6630. When someone proposes a new set of C1 control characters, should they also be considered for encoding? What should be encoded? Should exactly one glyph be encoded per control-character code position or should multiple glyphs be encoded for the same control-character code position? These are examples of concerns underlying the UTC decision rather than a request for you to answer the questions. 2. 
Document L2/98-354, "Terminal Graphics for Unicode"

Status: deferred for additional information

The UTC has requested more information before it makes a decision.

Table 5.1, range of E080 to E087. The UTC has requested an official position from IBM and feedback from SHARE on the glyphs used in the status area of a 3270 display.

Table 5.2, range of E0A0 to E0AD. The UTC has requested that Microsoft provide a list of the full set of glyphs used to construct mathematical entities (brackets, braces, sigma, etc.). Previously, the UTC had decided not to encode these as characters. However, once this information is available, the UTC will revisit the issue.

In addition, the UTC would appreciate your response to the following:

a. What is the full set of terminal-emulation glyphs that you considered and how did you map those not in your proposal into Unicode? The UTC's concern is for round-trip integrity and distinguishing different characters so that the UTC avoids mapping the characters in your proposal to the same Unicode characters you used already for other glyphs in your full set. (The concern is not the characters from standard coded character sets like 7-bit ASCII and the ISO/IEC 8859 series, but rather the set of symbols outside of these sets.)

b. Which of the following proposed characters could be unified with (mapped into) Unicode characters?

   1) Can you provide (a) the source glyphs for the proposed E0AC and E0AD sigma/summation parts, and also (b) better glyphs for them?

   2) What is the purpose of the proposed E0AE and E0AF characters? Are they supposed to be full corners for a box, or partial corners, or to provide the top and bottom corners of right brackets, or to provide serifs for the sigma (E0AC and E0AD)? Could the proposed E0AE and E0AF characters be unified with 231D top right corner and 231F bottom right corner?

   3) For the proposed E0B0, could it be unified with either 2713 check mark or 221A radical?
Is "small" the distinguishing characteristic for not unifying it with 2713 or 221A? 4) For the proposed E0B1, should it be unified with either 237B not check mark or 2415 symbol for negative acknowledge? 5) What is the purpose of the proposed E0D0 and E0D1 characters? Are they to be used to construct extended brackets and braces with the E0A0 to E0AB "extensible" characters? If so, then they should be moved to the mathematical symbol area of your proposal. If not, please explain how they might be used. 6) Could the proposed E0D2 to E0D5 triangle characters be unified with the 25E2 to 25E5 black triangle characters? 7) Could the proposed E0E5 diamond be unified with 25C6 black diamond? 8) Could the proposed E0EC be unified with 21E5 rightward arrow to bar? 9) Could the proposed E0ED be unified with 21E4 leftward arrow to bar? 3. Document L2/98-355, "Hex Byte Pictures for Unicode" Status: rejected The UTC considers that these are glyphs and, as such, they are out of the scope of the Unicode standard. Representing hex bytes visibly is a font-rendering issue rather than an information interchange issue. Suggestions Here are some suggestions for you to consider to help you meet your requirements. 1. Code the glyphs in the Private Use Area. If at some time in the future, the terminal emulation vendors are using the assignments, then you may resubmit your proposal (except for the pictures for hex bytes) to Unicode with this additional justification. It is beyond the scope of the UTC to encode characters in the Private Use Area or to endorse any particular use of characters in the Private Use Area. If the terminal emulation community believes that consistent use of private-use code positions is desirable, you might consider registering your code assignments in a registry for the Private Use Area such as the Conscript Registry. Note that Unicode does not endorse any registry for the Private Use Area. Both Adobe and Apple have described how each uses the Private Use Area. 
You may want to contact these organizations for additional information.

2. Register the glyphs with AFII (Association for Font Information Interchange). AFII is the registration authority for the ISO/IEC 10036 glyph registry. AFII charges a nominal fee for registering glyphs. If you are interested in pursuing this, contact AFII (afii@unicode.org) for more information.

8-Dec-98 1:00:34-GMT,1315;000000000001
Return-Path:
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id UAA21174; Mon, 7 Dec 1998 20:00:17 -0500 (EST)
Date: Mon, 7 Dec 98 20:00:17 EST
From: Frank da Cruz
To: "Hart, Edwin F."
Cc: "'Aliprand, Joan'", "'Whistler, Ken'", "'McGowan, Rick'", "'Thewlis, Dave'"
Subject: Re: UTC response to your request to encode characters for terminal emulation
In-Reply-To: Your message of Mon, 7 Dec 1998 10:56:31 -0500
Message-ID:

> What follows is the text of the response of the UTC to your proposals. (If
> you would prefer, I also have a Word97 version.)
>
No thanks, I *like* plain text :-)

> The UTC needs some
> additional information from you (and IBM and SHARE) before it decides about
> the "Terminal Graphics for Unicode" proposal. The next UTC meeting is in
> February in Palo Alto and we would appreciate your response by then.
>
OK, I'm happy to keep following up. I'll try to have a detailed response for you by end of January. In the meantime, please let me know what comes in from IBM and SHARE.

Thanks for your consideration and the detailed response.

- Frank

8-Dec-98 14:52:44-GMT,2723;000000000015
Return-Path:
Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id JAA13274 for ; Tue, 8 Dec 1998 09:52:37 -0500 (EST)
Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id ; Tue, 8 Dec 1998 09:52:27 -0500
Message-ID: <91D1D51C2955D111B82B00805F19989501CD72A0@aples2.jhuapl.edu>
From: "Hart, Edwin F."
To: "'Frank da Cruz'"
Subject: RE: UTC response to your request to encode characters for terminal emulation
Date: Tue, 8 Dec 1998 09:52:20 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain; charset="iso-8859-1"

Frank,

Since you sent your reports in plain ASCII text, I had thought that this was your preferred medium.

I'm sorry that the response was not positive. The UTC recognized your concerns but felt that the resolution was more appropriate in the font/rendering arena rather than coding in Unicode. If you can get written support from terminal emulation vendors, it would strengthen your case.

Some of the UTC players were rather vehement about not coding the hex digits so this part is really dead.

Regarding the glyphs used in the 3270 status area, my feeling is that unless these are communicated from the controller to the 3270 terminal, the UTC will reject these.

Email me with your phone number if you want to talk about any of this.

Ed

Edwin F. Hart
Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
+1-240-228-6926 (from Washington, DC area)
+1-443-778-6926 (from Baltimore area)
+1-240-228-1093 (fax)
edwin.hart@jhuapl.edu

----------
From: Frank da Cruz [SMTP:fdc@watsun.cc.columbia.edu]
Sent: 07 December, 1998 20:00
To: Hart, Edwin F.
Cc: 'Aliprand, Joan'; 'Whistler, Ken'; 'McGowan, Rick'; 'Thewlis, Dave'
Subject: Re: UTC response to your request to encode characters for terminal emulation

> What follows is the text of the response of the UTC to your proposals. (If
> you would prefer, I also have a Word97 version.)
>
No thanks, I *like* plain text :-)

> The UTC needs some
> additional information from you (and IBM and SHARE) before it decides about
> the "Terminal Graphics for Unicode" proposal. The next UTC meeting is in
> February in Palo Alto and we would appreciate your response by then.
>
OK, I'm happy to keep following up. I'll try to have a detailed response for you by end of January.
In the meantime, please let me know what comes in from IBM and SHARE.

Thanks for your consideration and the detailed response.

- Frank

8-Dec-98 15:57:40-GMT,2245;000000000001
Return-Path:
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id KAA02356; Tue, 8 Dec 1998 10:57:22 -0500 (EST)
Date: Tue, 8 Dec 98 10:57:22 EST
From: Frank da Cruz
To: "Hart, Edwin F."
Subject: RE: UTC response to your request to encode characters for terminal emulation
In-Reply-To: Your message of Tue, 8 Dec 1998 09:52:20 -0500
Message-ID:

> Since you sent your reports in plain ASCII text, I had thought that this was
> your preferred medium.
>
You were right.

> I'm sorry that the response was not positive. The UTC recognized your
> concerns but felt that the resolution was more appropriate in the
> font/rendering arena rather than coding in Unicode. If you can get written
> support from terminal emulation vendors, it would strengthen your case.
>
But this is a competitive market. The other terminal emulation makers compete with us, so that will be tough, or at least awkward. And quite honestly, in a way this whole proposal was partly against my better judgement, since the other emulation companies have been profiting from our work for years, and had this proposal been approved, it would have solved a big problem for them.

> Some of the UTC players were rather vehement about not coding the hex digits
> so this part is really dead.
>
I expected that, but thought it should be entered into the record anyway, because I think it addresses issues that will come up again and again.

> Regarding the glyphs used in the 3270 status area, my feeling is that unless
> these are communicated from the controller to the 3270 terminal, the UTC
> will reject these.
>
It's not a big deal. It would have been nice to have standardized glyphs, and I feel I did my duty by proposing them.
So now we'll go ahead and put everything in the private use area and distribute custom fonts just like everybody else, and live with the fallout.

Although I do plan to provide the requested responses, I don't feel there is much point, since there is no chance that any of this will get into Unicode 3.0 anyway, and I can't afford to drag this out forever -- we have deadlines too.

- Frank

8-Dec-98 19:26:20-GMT,1202;000000000011
Return-Path:
Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id OAA01837 for ; Tue, 8 Dec 1998 14:26:19 -0500 (EST)
Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id ; Tue, 8 Dec 1998 14:26:15 -0500
Message-ID: <91D1D51C2955D111B82B00805F19989501CD72A8@aples2.jhuapl.edu>
From: "Hart, Edwin F."
To: "'da Cruz, Frank'"
Subject: feedback from IBM on 3270 status symbols
Date: Tue, 8 Dec 1998 14:26:12 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain

Dave,

IBM has raised the possibility of using the symbols for documentation and this is certainly a valid use.

The symbols are internal to the 3270 display terminal rather than communicated between it and the controller.

Best regards,
Ed

Edwin F. Hart
Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
+1-240-228-6926 (from Washington, DC area)
+1-443-778-6926 (from Baltimore area)
+1-240-228-1093 (fax)
edwin.hart@jhuapl.edu

8-Dec-98 19:37:08-GMT,2025;000000000001
Return-Path:
Received: (from fdc@localhost) by watsun.cc.columbia.edu (8.8.5/8.8.5) id OAA04895; Tue, 8 Dec 1998 14:36:17 -0500 (EST)
Date: Tue, 8 Dec 98 14:36:17 EST
From: Frank da Cruz
To: "Hart, Edwin F."
Subject: Re: feedback from IBM on 3270 status symbols
In-Reply-To: Your message of Tue, 8 Dec 1998 14:26:12 -0500
Message-ID:

> Dave,
>
Who's Dave?

> IBM has raised the possibility of using the symbols for documentation
> and this is certainly a valid use.
> The UTC probably won't see it that way, since I made that argument for many of the other characters that they rejected. > The symbols are internal to the 3270 display terminal rather than > communicated between it and the controller. > All symbols are internal to terminals. The same can be said for (e.g.) the backwards question mark, which is not sent by the host, but is displayed on screen to indicate some kind of error, and which was accepted by the UTC (although perhaps in some other context), or many other symbols that are displayed in response to communications from the host, but not necessarily mapped to a particular character. No big deal -- I think I made all the relevant arguments already. Even if these symbols are not sent by the host or the controller, they still are shown on the screen, and therefore PC based emulators will also need to show them on the screen. If these emulators are based on Unicode, but Unicode does not include these characters, then all such emulators will have to bundle custom fonts, each one probably incompatible with the other. On the other hand, if some company like Monotype makes a Unicode "terminal emulation font" with all these characters at well-defined positions (and in fact, this is exactly what will happen), then this will become a de facto standard anyway, which is as good as a real standard except it will conflict with other uses of the Private Use area. - Frank 8-Dec-98 20:33:52-GMT,3517;000000000001 Return-Path: Received: from aples2.jhuapl.edu (aples2.jhuapl.edu [128.244.26.86]) by watsun.cc.columbia.edu (8.8.5/8.8.5) with ESMTP id PAA20309 for ; Tue, 8 Dec 1998 15:33:48 -0500 (EST) Received: by aples2.jhuapl.edu with Internet Mail Service (5.5.2232.9) id ; Tue, 8 Dec 1998 15:33:48 -0500 Message-ID: <91D1D51C2955D111B82B00805F19989501CD72A9@aples2.jhuapl.edu> From: "Hart, Edwin F." 
To: "'Frank da Cruz'"
Subject: RE: feedback from IBM on 3270 status symbols
Date: Tue, 8 Dec 1998 15:33:47 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain

Comments are embedded in your message.

Edwin F. Hart
Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
+1-240-228-6926 (from Washington, DC area)
+1-443-778-6926 (from Baltimore area)
+1-240-228-1093 (fax)
edwin.hart@jhuapl.edu

----------
From: Frank da Cruz [SMTP:fdc@watsun.cc.columbia.edu]
Sent: 08 December, 1998 14:36
To: Hart, Edwin F.
Subject: Re: feedback from IBM on 3270 status symbols

> Dave,
>
Who's Dave?

Dave is my SHARE manager. I decided originally to write to him and then changed my mind and decided to write directly to you.

> IBM has raised the possibility of using the symbols for documentation
> and this is certainly a valid use.
>
The UTC probably won't see it that way, since I made that argument for many of the other characters that they rejected.

Well, I was not smart enough to repeat this argument at the meeting. You needed a better mouthpiece. : )

> The symbols are internal to the 3270 display terminal rather than
> communicated between it and the controller.
>
All symbols are internal to terminals. The same can be said for (e.g.) the backwards question mark, which is not sent by the host, but is displayed on screen to indicate some kind of error, and which was accepted by the UTC (although perhaps in some other context), or many other symbols that are displayed in response to communications from the host, but not necessarily mapped to a particular character.

All symbols are internal to terminals. Yes, but the third set represents graphic characters from a code table presumably invoked by an ISO/IEC 2022 control sequence and then by the 7-bit/8-bit code positions on the wire. Since the characters are communicated on the wire (and hence have inherent information content), the UTC is willing to consider encoding them.
The alternate control characters and hex digits could be displayed using an alternate font and appropriate rendering software. No big deal -- I think I made all the relevant arguments already. Even if these symbols are not sent by the host or the controller, they still are shown on the screen, and therefore PC based emulators will also need to show them on the screen. If these emulators are based on Unicode, but Unicode does not include these characters, then all such emulators will have to bundle custom fonts, each one probably incompatible with the other. On the other hand, if some company like Monotype makes a Unicode "terminal emulation font" with all these characters at well-defined positions (and in fact, this is exactly what will happen), then this will become a de facto standard anyway, which is as good as a real standard except it will conflict with other uses of the Private Use area. - Frank
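
For reference, the unification candidates named in the UTC letter (items 2 through 9 under question b) can be tabulated mechanically. What follows is only an illustrative Python sketch and not part of the correspondence. The table name CANDIDATES and the helpers is_private_use and describe are hypothetical, and pairing E0AE/E0AF and E0D2..E0D5 with their candidates in listed order is an assumption, since the letter gives only ranges. Character names come from Python's standard unicodedata module, so 221A, which the letter calls "radical", appears under its formal name SQUARE ROOT.

```python
import unicodedata

# Proposed code point -> existing Unicode candidates named in the UTC letter.
# Assumption: E0AE/E0AF and E0D2..E0D5 are paired with their candidates in
# the order listed; the letter itself gives only the ranges.
CANDIDATES = {
    0xE0AE: [0x231D],
    0xE0AF: [0x231F],
    0xE0B0: [0x2713, 0x221A],
    0xE0B1: [0x237B, 0x2415],
    0xE0D2: [0x25E2],
    0xE0D3: [0x25E3],
    0xE0D4: [0x25E4],
    0xE0D5: [0x25E5],
    0xE0E5: [0x25C6],
    0xE0EC: [0x21E5],
    0xE0ED: [0x21E4],
}


def is_private_use(cp: int) -> bool:
    """Return True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX NAME', with '<no name>' if it has none."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<no name>')}"


if __name__ == "__main__":
    # Print one line per proposed character with its unification candidates.
    for proposed, existing in CANDIDATES.items():
        note = "private use" if is_private_use(proposed) else "assigned"
        names = ", ".join(describe(c) for c in existing)
        print(f"U+{proposed:04X} ({note}) -> {names}")
```

The proposed E0xx values all fall in the range E000 to F8FF, which Unicode reserves for private use, so is_private_use returns True for every key in the table and False for every candidate.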