[15588] in Athena Bugs
sun4 8.1.11: /usr/bin/tr
daemon@ATHENA.MIT.EDU (Jacob Morzinski)
Tue Oct 21 01:10:13 1997
To: bugs@MIT.EDU
Date: Tue, 21 Oct 1997 01:10:09 EDT
From: "Jacob Morzinski" <jmorzins@MIT.EDU>
System name: portnoy
Type and version: SPARC/5 8.1.11 (with mkserv)
Display type: cgthree
What were you trying to do?
Use Solaris's /usr/bin/tr to translate lowercase characters
to uppercase:
/usr/bin/tr '[:lower:]' '[:upper:]' < latin1
What's wrong:
/usr/bin/tr makes an off-by-one error when trying to POSIX-ly
convert the characters [\337-\366\370-\377] to upper case.
(Those are the characters [_`abcdefghijklmnopqrstuvxyz{|}~].)
The bug also exists under Solaris 2.6.
Sample invocation:
% setenv LC_CTYPE iso_8859_1
% cat < latin1
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0100-0137, 0x40-0x5f)
`abcdefghijklmnopqrstuvwxyz{|}~ (0140-0176, 0x60-0x7e)
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0300-0337, 0xc0-0xdf)
`abcdefghijklmnopqrstuvwxyz{|}~ (0340-0377, 0xe0-0xff)
% /usr/bin/tr '[:lower:]' '[:upper:]' < latin1
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0100-0137, 0X40-0X5F)
`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~ (0140-0176, 0X60-0X7E)
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^@ (0300-0337, 0XC0-0XDF)
ABCDEFGHIJKLMNOPQRSTUVXwYZ[\]^~ (0340-0377, 0XE0-0XFF)
Note especially the last line of each output.
"`" (0340) has been upcased into "A" (0301),
"a" (0341) has been upcased into "B" (0302),
and so on. Somehow we've gotten off-by-one errors.
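One way this symptom could arise, offered only as a hypothesis that nothing in this report confirms: if tr expands each class into an ordered list and pairs the entries by position, a lowercase character with no uppercase counterpart shifts every later pair by one. The Python sketch below assumes the ISO 8859-1 class memberships given in its comments (in particular that 0337 counts as lowercase). The name positional_table is hypothetical and does not come from the Solaris source.

```python
# Assumed ISO 8859-1 class membership (not taken from the Solaris locale data).
# 0xDF and 0xFF are lowercase letters with no uppercase form in this charset;
# 0xD7 and 0xF7 are the multiplication and division signs, not letters.
LOWER = list(range(0x61, 0x7B)) + list(range(0xDF, 0xF7)) + list(range(0xF8, 0x100))
UPPER = list(range(0x41, 0x5B)) + list(range(0xC0, 0xD7)) + list(range(0xD8, 0xDF))


def positional_table(lower, upper):
    """Map the i-th lowercase byte to the i-th uppercase byte (hypothetical bug)."""
    table = list(range(256))          # start from the identity mapping
    for lo, up in zip(lower, upper):  # zip stops when the shorter list runs out
        table[lo] = up
    return bytes(table)


table = positional_table(LOWER, UPPER)
for byte in (0o337, 0o340, 0o341, 0o366, 0o376):
    print(f"{byte:04o} -> {table[byte]:04o}")
```

Under these assumptions 0337 maps to 0300, 0340 to 0301, and 0341 to 0302, which matches the tr output shown above. The ASCII letters still pair up correctly because the two lists only fall out of step at 0337.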
What should have happened:
% perl5 -ne 'print uc($_)' < latin1
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0100-0137, 0X40-0X5F)
`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~ (0140-0176, 0X60-0X7E)
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ (0300-0337, 0XC0-0XDF)
@ABCDEFGHIJKLMNOPQRSTUVwXYZ[\]^ (0340-0377, 0XE0-0XFF)
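As a cross-check that does not depend on any locale setting, the expected mapping can be written out directly: in ISO 8859-1 each lowercase letter that has an uppercase form sits exactly 040 (0x20) above it. This is a minimal Python sketch, and latin1_upper is an illustrative name rather than an existing function.

```python
def latin1_upper(data: bytes) -> bytes:
    """Upcase ISO 8859-1 bytes, leaving bytes with no uppercase form alone."""
    table = bytearray(range(256))
    for byte in range(256):
        is_ascii_lower = 0x61 <= byte <= 0x7A
        # 0xE0-0xFE are lowercase letters except 0xF7 (the division sign);
        # 0xDF and 0xFF have no single-byte uppercase form, so they stay put.
        is_high_lower = 0xE0 <= byte <= 0xFE and byte != 0xF7
        if is_ascii_lower or is_high_lower:
            table[byte] = byte - 0x20
    return data.translate(bytes(table))


sample = bytes(range(0xE0, 0x100))
print(latin1_upper(sample).hex())
```

Like the perl5 output above, this leaves 0337, 0367, and 0377 unchanged and moves everything else in 0340-0376 down by 040.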
Please describe any relevant documentation references:
(/usr/man/man1/tr.1)
2. The next example translates all lower-case characters in
file1 to upper-case and writes the results to standard
output.
tr "[:lower:]" "[:upper:]" <file1
Thank you,
--
Jacob Morzinski jmorzins@mit.edu
You can reconstruct the "latin1" file from the body of this message, but
if it makes things simpler, here's a uuencoded version of it:
begin 644 latin1
M0$%"0T1%1D=(24I+3$U.3U!14E-455976%E:6UQ=7E\@(" @(" @(" @(" @
M(" @*# Q,# M,#$S-RP@,'@T,"TP>#5F*0I@86)C9&5F9VAI:FML;6YO<'%R
M<W1U=G=X>7I[?'U^(" @(" @(" @(" @(" @(" H,#$T,"TP,3<V+" P>#8P
M+3!X-V4I"L#!PL/$Q<;'R,G*R\S-SL_0T=+3U-76U]C9VMO<W=[?(" @(" @
M(" @(" @(" @("@P,S P+3 S,S<L(#!X8S M,'AD9BD*X.'BX^3EYN?HZ>KK
M[.WN[_#Q\O/T]?;W^/GZ^_S]_O\@(" @(" @(" @(" @(" @*# S-# M,#,W
.-RP@,'AE,"TP>&9F*0KZ
end
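If uudecode is not handy, a roughly equivalent test file can be generated from the byte ranges listed in the sample. This Python sketch is an approximation: the function name build_sample and the padding width between the characters and the labels are assumptions, so the result is not guaranteed to be byte-identical to the uuencoded file.

```python
def build_sample(pad: int = 16) -> bytes:
    """Return four labelled lines covering 0100-0176 and 0300-0377."""
    # Octal ranges as given in the sample listing of this report.
    rows = [(0o100, 0o137), (0o140, 0o176), (0o300, 0o337), (0o340, 0o377)]
    lines = []
    for first, last in rows:
        chars = bytes(range(first, last + 1))
        label = f"({first:04o}-{last:04o}, 0x{first:02x}-0x{last:02x})"
        # pad is an assumed width; the original spacing is not reproduced exactly.
        lines.append(chars + b" " * pad + label.encode("ascii") + b"\n")
    return b"".join(lines)


data = build_sample()
print(len(data.splitlines()), "lines,", len(data), "bytes")
```

Writing the result in binary mode, for example with open("latin1", "wb").write(build_sample()), gives a file that can be fed to tr in the same way as the original.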