[660] in linux-security and linux-alert archive
Re: [linux-security] two comments..
daemon@ATHENA.MIT.EDU (Detlef Lannert)
Thu Apr 4 15:03:41 1996
From: Detlef Lannert <lannert@lannert.rz.uni-duesseldorf.de>
To: hobbit@avian.org (*Hobbit*)
Date: Thu, 4 Apr 1996 15:01:51 +0200 (MET DST)
Cc: linux-security@tarsier.cv.nrao.edu, best-of-security@suburbia.net
In-Reply-To: <199604031802.NAA20575@narq.avian.org> from "*Hobbit*" at Apr 3, 96 01:02:59 pm
Reply-to: lannert@uni-duesseldorf.de
Hobbit wrote:
[Mod: Quoting trimmed. --Jeff.]
> static unsigned char bflgs[] = {
> 1, 1, 1, 1, 1, 1, 1, 1, /* nul - ^G */
[ 14 lines of obvious contents deleted ]
> 1, 1, 1, 1, 0, 1, 0, 0 /* x - del */
> }; /* bflgs */
>
> and characters are easily sanitized/checked/whatever by or-ing out the
> high bit from the character in question and doing something like
>
> register unsigned char q;
> register int x;
>
> for (x = 0; x <= len; x++) {
> q = (unsigned char) (string[x] & 0x7f); /* rip hibits */
> if (q == 0)
> break; /* end of string */
> if (bflgs[q] == 0) {
> appropriate code /* bad char, no donut */
> } else {
> appropriate code /* good char, DTRT */
> }
> } /* for */
Objection, Your Honour!
(a) Not every machine uses US-ASCII encoding internally. Those with, say,
EBCDIC are getting rare, but they still exist, and there may be other
strange architectures as well. POSIX does not make any assumptions about
the specific order in which uc/lc letters, digits, etc. appear, and a
sensible program should not either.
(b) Not everyone uses the English alphabet. There are a few of us who
sometimes use those strange characters in the code range of 128..255.
Even if full Unicode multibyte encodings are not supported, Latin-1
(ISO-8859-1) and the like should be acceptable. Just folding the upper
half of the possible one-byte values onto the lower half and then indexing
a table will produce, hmm, unsatisfactory results.
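To make the "unsatisfactory results" concrete, here is a small sketch of
what the "rip hibits" step does to a few Latin-1 letters. The helper name
fold_high_bit is made up for this illustration; the mask is the one from
the quoted loop, and the comments assume ISO-8859-1 input on an ASCII
machine.

```c
/* Hypothetical helper: the "rip hibits" step from the quoted loop. */
static unsigned char fold_high_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7f);
}

/*
 * Assuming ISO-8859-1 input on an ASCII machine:
 *   0xE9 (e acute)     folds to 0x69, the letter 'i'
 *   0xFC (u umlaut)    folds to 0x7C, the pipe '|'
 *   0xE0 (a grave)     folds to 0x60, the backquote '`'
 *   0xBB (right guillemet) folds to 0x3B, the semicolon ';'
 */
```

So a harmless letter can end up being checked as if it were a shell
metacharacter, and two different letters can land on the same table slot.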
[Mod: For both of these points it strikes me that such
decisions--allowing extended character codes and addressing
ASCII/EBCDIC/etc. issues--should be made case-by-case depending upon the
intended use of the passed characters and the desired level of code
portability, respectively. There isn't One Great Solution here, IMHO,
so let's try not to "what if" this to death.... --Jeff.]
Let me suggest a different approach. It might have other flaws -- I'm not
a C guru and will be thankful for any corrections and/or improvements --
but it is quite short and, I hope, gets the code issue right. It burns
a few more bytes of storage, but that shouldn't really matter in this age
of the pentiums and megabytes. Here it goes:
/* initialization */
unsigned char badchars[] = "'\n\033\"";  /* or such; \033 is ESC, since \e is non-standard */
unsigned char allchars[256] = { 0 };     /* nonzero entry = bad character */
unsigned char *p;
for (p = badchars; *p; p++)
    allchars[*p] = 1;  /* or whatsoever */
/* some thousand lines later ... */
unsigned char string[...];
int x;
unsigned char q;
/* ... */
for (x = 0; x < len; x++) {  /* assumes len counts the characters, without the NUL */
    q = allchars[string[x]];  /* full 8-bit index, no folding */
    if (q != 0) {
        /* handle a bad character */
    } else {
        /* do something useful */
    }
}
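For anyone who wants to try it, the same idea can be wrapped into two small
functions. This is only a sketch: the names init_badchars and first_bad are
invented here, and it assumes len is the number of characters to check,
not counting any terminating NUL.

```c
#include <stddef.h>

/* Lookup table indexed by the full 8-bit value; nonzero means "bad". */
static unsigned char allchars[256];

/* Mark every character of the NUL-terminated list `bad` in the table. */
static void init_badchars(const char *bad)
{
    const unsigned char *p;

    for (p = (const unsigned char *)bad; *p != 0; p++)
        allchars[*p] = 1;
}

/* Return the index of the first bad character in s[0..len-1], or -1. */
static long first_bad(const unsigned char *s, size_t len)
{
    size_t x;

    for (x = 0; x < len; x++) {
        if (allchars[s[x]] != 0)
            return (long)x;
    }
    return -1;
}
```

After init_badchars("'\n\033\"") a call such as first_bad(buf, n) reports
where the first unwanted character sits, and values above 0x7f are looked
up under their own index instead of being folded.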
I agree with your further remarks on using such a table to denote various
character classes, and on being restrictive wrt the acceptable control
characters. But -- if at all possible -- let the >0x7f characters pass!
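A table with class bits could look roughly like this. Again a sketch under
assumptions: the CC_* names, init_cclass and acceptable are invented, the
policy of tolerating tab is just an example, and it relies on iscntrl()
from <ctype.h> in the default "C" locale, where common C libraries do not
report the values above 0x7f as control characters.

```c
#include <ctype.h>

#define CC_META 0x01  /* explicitly listed troublesome characters */
#define CC_CTRL 0x02  /* control characters, as reported by iscntrl() */

/* One byte of class bits per possible 8-bit value. */
static unsigned char cclass[256];

/* Build the table from <ctype.h> and a character list, not numeric codes. */
static void init_cclass(const char *meta)
{
    const unsigned char *p;
    int c;

    for (c = 0; c < 256; c++)
        cclass[c] = iscntrl(c) ? CC_CTRL : 0;
    cclass[(unsigned char)'\t'] = 0;  /* assumed policy: tab is tolerated */
    for (p = (const unsigned char *)meta; *p != 0; p++)
        cclass[*p] |= CC_META;
}

/* A character is acceptable if it carries none of the rejected bits. */
static int acceptable(unsigned char c)
{
    return (cclass[c] & (CC_META | CC_CTRL)) == 0;
}
```

With init_cclass("'\"`;|") the quote characters and ESC are rejected, tab
and ordinary letters pass, and so does a Latin-1 letter like 0xE9.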
Detlef
--
Detlef Lannert +49-211-8113905 E-Mail: lannert@uni-duesseldorf.de
PGP 2.x key available (finger lannert@clio.rz.uni-duesseldorf.de)
"Ordnung ist etwas fuer Leute, die nicht eins sind mit der Welt."
- Max Frisch