[15589] in Athena Bugs
sun4 8.1.11: /usr/bin/tr
daemon@ATHENA.MIT.EDU (Jacob Morzinski)
Tue Oct 21 01:29:13 1997
To: bugs@MIT.EDU
Date: Tue, 21 Oct 1997 01:29:10 EDT
From: "Jacob Morzinski" <jmorzins@MIT.EDU>
AARGH. All the eighth bits got stripped from the example text.
So, here's the bug report again, this time with MIME junk:
System name: portnoy
Type and version: SPARC/5 8.1.11 (with mkserv)
Display type: cgthree
What were you trying to do?
Use Solaris's /usr/bin/tr to translate lowercase characters
to uppercase:
/usr/bin/tr '[:lower:]' '[:upper:]' < latin1
What's wrong:
/usr/bin/tr makes an off-by-one error when using the POSIX [:lower:] and
[:upper:] classes to convert the characters [\337-\366\370-\377] to uppercase.
(That's the characters [ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ].)
The bug also exists under Solaris 2.6.
Sample invocation:
% setenv LC_CTYPE iso_8859_1
% cat < latin1
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0100-0137, 0x40-0x5f)
`abcdefghijklmnopqrstuvwxyz{|}~ (0140-0176, 0x60-0x7e)
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞß (0300-0337, 0xc0-0xdf)
àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ (0340-0377, 0xe0-0xff)
% /usr/bin/tr '[:lower:]' '[:upper:]' < latin1
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0100-0137, 0X40-0X5F)
`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~ (0140-0176, 0X60-0X7E)
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞÀ (0300-0337, 0XC0-0XDF)
ÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØ÷ÙÚÛÜÝÞþÿ (0340-0377, 0XE0-0XFF)
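Note especially the last two lines of each output.
"à" (0340) has been upcased into "Á" (0301),
"á" (0341) has been upcased into "Â" (0302),
and so on. Likewise "ß" (0337), which has no uppercase form in
ISO 8859-1, has become "À" (0300), while "þ" (0376) has not been
upcased at all. Somehow we've gotten off-by-one errors.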
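One way to account for every wrong byte above, offered only as a guess (it is not taken from the Solaris tr source): tr may be expanding [:lower:] and [:upper:] into sorted lists and pairing them by position, as it would for a-z and A-Z, rather than applying the locale's case mapping. In ISO 8859-1, [:lower:] has two more members than [:upper:] (ß and ÿ have no capitals), so every pairing from ß onward slips by one, and the last two lowercase bytes fall off the end of the shorter list. The Python sketch below models that hypothesis; the class memberships are assumptions inferred from the character list in this report, and the names are illustrative.

```python
# Hypothesis only: tr pairs the members of [:lower:] and [:upper:] by
# position instead of using the locale's case mapping.  The member lists
# are assumptions for the iso_8859_1 locale, inferred from this report.
LOWER = (list(range(0x61, 0x7B))      # a-z
         + [0xDF]                     # sharp s: lowercase, but no capital
         + list(range(0xE0, 0xF7))    # a-grave .. o-diaeresis
         + list(range(0xF8, 0x100)))  # o-slash .. y-diaeresis
UPPER = (list(range(0x41, 0x5B))      # A-Z
         + list(range(0xC0, 0xD7))    # A-grave .. O-diaeresis
         + list(range(0xD8, 0xDF)))   # O-slash .. thorn


def positional_table(src, dst):
    """Map src[i] to dst[i]; members of src beyond len(dst) stay unchanged."""
    table = bytearray(range(256))     # identity mapping to start with
    for s, d in zip(src, dst):        # zip stops at the shorter list
        table[s] = d
    return bytes(table)


PAIRED = positional_table(LOWER, UPPER)


def suspected_tr(data: bytes) -> bytes:
    """Translate bytes the way the hypothetical positional pairing would."""
    return data.translate(PAIRED)


if __name__ == "__main__":
    out = suspected_tr(bytes(range(0xE0, 0x100)))
    # Print octal codes, as in the report, to avoid terminal encoding issues.
    print(" ".join(format(b, "03o") for b in out))
```

Run on the 0340-0377 row, the model prints 301 through 326, then 330 367 331 332 333 334 335 336 376 377, which is the tr output above written in octal; it also maps ß (0337) to À (0300), as on the 0300-0337 row.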
What should have happened:
% perl5 -ne 'print uc($_)' < latin1
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0100-0137, 0X40-0X5F)
`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~ (0140-0176, 0X60-0X7E)
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞß (0300-0337, 0XC0-0XDF)
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ÷ØÙÚÛÜÝÞÿ (0340-0377, 0XE0-0XFF)
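For comparison, the expected mapping can be written out by hand. The Python sketch below is an assumption based on the ISO 8859-1 chart, not on any Solaris locale table: a-z and 0340-0376, except the division sign ÷ (0367), move down by 040 (0x20), while ß (0337) and ÿ (0377) have no single-byte capital and stay as they are. The function names are illustrative.

```python
# Expected ISO 8859-1 upcasing, written out by hand.  Assumption: based on
# the Latin-1 chart, not taken from any Solaris locale file.


def latin1_upper_byte(b: int) -> int:
    """Uppercase one ISO 8859-1 byte; bytes without a capital are unchanged."""
    if 0x61 <= b <= 0x7A:                  # a-z -> A-Z
        return b - 0x20
    if 0xE0 <= b <= 0xFE and b != 0xF7:    # a-grave .. thorn, skipping the division sign
        return b - 0x20
    return b                               # includes sharp s (0xDF) and y-diaeresis (0xFF)


# 256-entry table usable with bytes.translate().
UPPER_TABLE = bytes(latin1_upper_byte(b) for b in range(256))


def latin1_upper(data: bytes) -> bytes:
    return data.translate(UPPER_TABLE)


if __name__ == "__main__":
    out = latin1_upper(bytes(range(0xE0, 0x100)))
    # Octal codes, as in the report: 300..326, 367, 330..336, 377.
    print(" ".join(format(b, "03o") for b in out))
```

Python's own str.upper() is avoided on purpose: it turns ß into "SS" and ÿ into U+0178, neither of which fits in a single Latin-1 byte, so it would not reproduce the perl5 output above.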
Please describe any relevant documentation references:
(/usr/man/man1/tr.1)
2. The next example translates all lower-case characters in
file1 to upper-case and writes the results to standard
output.
tr "[:lower:]" "[:upper:]" <file1
Thank you,
--
Jacob Morzinski jmorzins@mit.edu
You can reconstruct the "latin1" file from the body of the message, but
if it makes things simpler, here's a uuencoded version of it:
begin 644 latin1
M0$%"0T1%1D=(24I+3$U.3U!14E-455976%E:6UQ=7E\@(" @(" @(" @(" @
M(" @*# Q,# M,#$S-RP@,'@T,"TP>#5F*0I@86)C9&5F9VAI:FML;6YO<'%R
M<W1U=G=X>7I[?'U^(" @(" @(" @(" @(" @(" H,#$T,"TP,3<V+" P>#8P
M+3!X-V4I"L#!PL/$Q<;'R,G*R\S-SL_0T=+3U-76U]C9VMO<W=[?(" @(" @
M(" @(" @(" @("@P,S P+3 S,S<L(#!X8S M,'AD9BD*X.'BX^3EYN?HZ>KK
M[.WN[_#Q\O/T]?;W^/GZ^_S]_O\@(" @(" @(" @(" @(" @*# S-# M,#,W
.-RP@,'AE,"TP>&9F*0KZ
`
end
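If only the message body is at hand, a file with the same four ranges can be regenerated. A minimal Python sketch with a hypothetical build_latin1() helper; it assumes a single space before each annotation, whereas the original file pads with several spaces, so the result is close to, but not byte-for-byte identical with, the uuencoded copy.

```python
# Hypothetical helper that rebuilds a test file with the same four byte
# ranges as "latin1".  Assumption: one space separates each run from its
# annotation; the original pads with more, so this is not byte-identical.

RANGES = [
    (0o100, 0o137),  # @ through _
    (0o140, 0o176),  # ` through ~
    (0o300, 0o337),  # upper-half capitals, multiply sign, sharp s
    (0o340, 0o377),  # upper-half small letters and divide sign
]


def build_latin1() -> bytes:
    lines = []
    for lo, hi in RANGES:
        run = bytes(range(lo, hi + 1))
        # Same annotation style as the report, e.g. " (0100-0137, 0x40-0x5f)".
        note = f" (0{lo:o}-0{hi:o}, 0x{lo:02x}-0x{hi:02x})\n"
        lines.append(run + note.encode("ascii"))
    return b"".join(lines)


if __name__ == "__main__":
    with open("latin1", "wb") as f:
        f.write(build_latin1())
```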