[32394] in Perl-Users-Digest
Perl-Users Digest, Issue: 3661 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Apr 9 16:09:28 2012
Date: Mon, 9 Apr 2012 13:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 9 Apr 2012 Volume: 11 Number: 3661
Today's topics:
Digest::combined 0.1 released <oneingray@gmail.com>
f python? <xahlee@gmail.com>
Re: f python? <martin.hellwig@gmail.com>
Re: f python? <dmcanzi@uwaterloo.ca>
Re: f python? <kaz@kylheku.com>
Re: f python? <hjp-usenet2@hjp.at>
Re: f python? <jurgenex@hotmail.com>
Re: f python? <kaz@kylheku.com>
Re: f python? <nobody@nowhere.com>
Re: f python? <xahlee@gmail.com>
Re: f python? (Seymour J.)
Re: f python? <roy@panix.com>
Re: f python? <kaz@kylheku.com>
Re: f python? <kaz@kylheku.com>
Re: How to keep the script from stopping or hanging <abhishek.jain.1985@gmail.com>
Re: looking for a hexagon tiling module <mvdwege@mail.com>
Missing utf8_heavy.pl <jimoeDESPAM@sohnen-moe.com>
Re: Missing utf8_heavy.pl <hjp-usenet2@hjp.at>
Re: Missing utf8_heavy.pl <jimoeDESPAM@sohnen-moe.com>
Re: Missing utf8_heavy.pl <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 10 Apr 2012 02:13:02 +0700
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Digest::combined 0.1 released
Message-Id: <86aa2kvhvl.fsf_-_@gray.siamics.net>
>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
[...]
>> PS. I'll try to file an RT ticket against Digest::SHA on whether my
>> module could be added to the distribution.
> I strongly suspect the answer will be 'no', given that Digest::SHA is
> in the core and really rather important (it's part of the CPAN
> toolchain, for instance). Still, there's no harm in asking.
This question was briefly discussed on CPAN RT [1] and it was
decided to implement a more generic Digest::combined Perl module
[2], available as a separate distribution, which I've happily
released today [3].
[1] https://rt.cpan.org/Ticket/Display.html?id=76044
[2] http://search.cpan.org/perldoc?Digest::combined
[3] http://search.cpan.org/~onegray/Digest-combined-0.1/
--
FSF associate member #7257
------------------------------
Date: Sun, 8 Apr 2012 04:11:20 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: f python?
Message-Id: <5d729516-89d5-44b0-9916-fcea22b1a610@v7g2000pbs.googlegroups.com>
hi guys,
sorry am feeling a bit prolific lately.
today's show, is: 〈Fuck Python〉
http://xahlee.org/comp/fuck_python.html
------------------------------------
Fuck Python
By Xah Lee, 2012-04-08
fuck Python.
just fucking spend 2 hours and still going.
here's the short story.
so recently i switched to a Windows version of python. Now, Windows
version takes path using win backslash, instead of cygwin slash. This
fucking broke my find/replace scripts that takes a dir level as input.
Because i was counting slashes.
Ok no problem. My sloppiness. After all, my implementation wasn't
portable. So, let's fix it. After a while, discovered there's the
「os.sep」. Ok, replace 「"/"」 to 「os.sep」, done. Then, bang, all hell
went loose. Because, the backslash is used as escape in string, so any
regex that manipulate path got fucked majorly. So, now you need to
find a quoting mechanism. Then, fuck python doc incomprehensible
scattered comp-sci-r-us BNF shit. Then, fuck python for “os.path” and
“os” modules then string object and string functions inconsistent
ball. And FUCK Guido who wants to fuck change python for his idiotic
OOP concept of “elegance” so that some of these are deprecated.
So after several exploration of “repr()”, “format()”, “‹str›.count()”,
“os.path.normpath()”, “re.split()”, “len(re.search().group())” etc,
after a long time, let's use “re.escape()”. 2 hours has passed. Also,
discovered that “os.path.walk” is now deprecated, and one is supposed
to use the sparkling “os.walk”. In the process of refreshing my
python, the “os.path.walk” semantics is really one fucked up fuck.
Meanwhile, the “os.walk” went into incomprehensible OOP object and
iterators fuck.
now, it's close to 3 hours. This fix is supposed to be done in 10 min.
I'd have done it in elisp in just 10 minutes if not for my
waywardness.
This is Before
def process_file(dummy, current_dir, file_list):
    current_dir_level = len(re.split("/", current_dir)) - len(re.split("/", input_dir))
    cur_file_level = current_dir_level+1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            if re.search(r"\.html$", a_file, re.U) and os.path.isfile(current_dir + "/" + a_file):
                replace_string_in_file(current_dir + "/" + a_file)
This is After
def process_file(dummy, current_dir, file_list):
    current_dir = os.path.normpath(current_dir)
    cur_dir_level = re.sub( "^" + re.escape(input_dir), "", current_dir).count( os.sep)
    cur_file_level = cur_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            # plain os.sep here: re.escape() is only for building regex patterns
            if re.search(r"\.html$", a_file, re.U) and os.path.isfile(current_dir + os.sep + a_file):
                replace_string_in_file(current_dir + os.sep + a_file)
                # print "%d %s" % (cur_file_level, (current_dir + os.sep + a_file))
Complete File
# -*- coding: utf-8 -*-
# Python
# find & replace strings in a dir
import os, sys, shutil, re
# if this is not empty, then only these files will be processed
my_files = []
input_dir = "c:/Users/h3/web/xahlee_org/lojban/hrefgram2/"
input_dir = "/cygdrive/c/Users/h3/web/zz"
input_dir = "c:/Users/h3/web/xahlee_org/"
min_level = 2; # files and dirs inside input_dir are level 1.
max_level = 2; # inclusive
print_no_change = False
find_replace_list = [
    (
        u"""<iframe style="width:100%;border:none" src="http://xahlee.org/footer.html"></iframe>""",
        u"""<iframe style="width:100%;border:none" src="../footer.html"></iframe>""",
    ),
]
def replace_string_in_file(file_path):
    "Replaces all findStr by repStr in file file_path"
    temp_fname = file_path + "~lc~"
    backup_fname = file_path + "~bk~"
    # print "reading:", file_path
    input_file = open(file_path, "rb")
    file_content = unicode(input_file.read(), "utf-8")
    input_file.close()
    num_replaced = 0
    for a_pair in find_replace_list:
        num_replaced += file_content.count(a_pair[0])
        output_text = file_content.replace(a_pair[0], a_pair[1])
        file_content = output_text
    if num_replaced > 0:
        print "◆ ", num_replaced, " ", file_path.replace("\\", "/")
        shutil.copy2(file_path, backup_fname)
        output_file = open(file_path, "r+b")
        output_file.read() # we do this way instead of “os.rename” to preserve file creation date
        output_file.seek(0)
        output_file.write(output_text.encode("utf-8"))
        output_file.truncate()
        output_file.close()
    else:
        if print_no_change == True:
            print "no change:", file_path
    # os.remove(file_path)
    # os.rename(temp_fname, file_path)
def process_file(dummy, current_dir, file_list):
    current_dir = os.path.normpath(current_dir)
    cur_dir_level = re.sub( "^" + re.escape(input_dir), "", current_dir).count( os.sep)
    cur_file_level = cur_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            # plain os.sep here: re.escape() is only for building regex patterns
            if re.search(r"\.html$", a_file, re.U) and os.path.isfile(current_dir + os.sep + a_file):
                replace_string_in_file(current_dir + os.sep + a_file)
                # print "%d %s" % (cur_file_level, (current_dir + os.sep + a_file))
input_dir = os.path.normpath(input_dir)
if (len(my_files) != 0):
    for my_file in my_files:
        replace_string_in_file(os.path.normpath(my_file) )
else:
    os.path.walk(input_dir, process_file, "dummy")
print "Done."
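For comparison, here is a minimal sketch of the same level filter done without regexes or hand-counted separators. It assumes Python 3 (the script above is Python 2) and relies on os.walk, os.path.relpath and os.path.join. The helper names depth_below and html_files_at_levels are made up for illustration and are not part of the original script.

```python
import os


def depth_below(root, path):
    """Return how many directory levels `path` lies below `root` (0 if equal)."""
    rel = os.path.relpath(path, root)
    if rel == os.curdir:
        return 0
    return len(rel.split(os.sep))


def html_files_at_levels(root, min_level, max_level):
    """Yield .html files whose level is within [min_level, max_level].

    Files directly inside `root` count as level 1, as in the script above.
    """
    root = os.path.normpath(root)
    for current_dir, _dirs, files in os.walk(root):
        file_level = depth_below(root, current_dir) + 1
        if min_level <= file_level <= max_level:
            for name in files:
                if name.endswith(".html"):
                    # os.path.join picks the right separator for the platform
                    yield os.path.join(current_dir, name)
```

Since the depth comes from the relative path, no separator ever ends up inside a regex, so nothing needs quoting.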
------------------------------
Date: Sun, 08 Apr 2012 14:10:03 +0100
From: "Martin P. Hellwig" <martin.hellwig@gmail.com>
Subject: Re: f python?
Message-Id: <4F818E2B.2030100@gmail.com>
On 08/04/2012 12:11, Xah Lee wrote:
<cut all>
Hi Xah,
You clearly didn't want help on this subject, as you really know how to
do it anyway. But having read your posts over the years, I'd like to
give you an observation on your persona, free of charge! :-)
You are actually a talented writer, some may find your occasional
profanity offensive but at least it highlights your frustration.
You are undoubtedly and provably a good mathematician and, more important
than that, self-taught. You have a natural feel for design (otherwise you
would not clash with others' view of programming).
You know a mixture of programming languages.
Whether you like it or not, you are in the perfect position to create a
new programming language and design a new programming paradigm.
Unhindered by all the legacy crap that keeps people like me behind (I
actually like BNF, for example).
It is likely I am wrong, but if that is your destiny there is no point
fighting it.
Cheers and good luck,
Martin
------------------------------
Date: Sun, 8 Apr 2012 17:03:28 +0000 (UTC)
From: "David Canzi" <dmcanzi@uwaterloo.ca>
Subject: Re: f python?
Message-Id: <jlsgd0$psj$1@rumours.uwaterloo.ca>
Xah Lee <xahlee@gmail.com> wrote:
>hi guys,
>
>sorry am feeling a bit prolific lately.
>
>today's show, is: 'Fuck Python'
>http://xahlee.org/comp/fuck_python.html
>
>------------------------------------
>Fuck Python
> By Xah Lee, 2012-04-08
>
>fuck Python.
>
>just fucking spend 2 hours and still going.
>
>here's the short story.
>
>so recently i switched to a Windows version of python. Now, Windows
>version takes path using win backslash, instead of cygwin slash. This
>fucking broke my find/replace scripts that takes a dir level as input.
>Because i was counting slashes.
>
>Ok no problem. My sloppiness. After all, my implementation wasn't
>portable. So, let's fix it. After a while, discovered there's the
>'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
>went loose. Because, the backslash is used as escape in string, so any
>regex that manipulate path got fucked majorly.
When Microsoft created MS-DOS, they decided to use '\' as
the separator in file names. This was at a time when several
previously existing interactive operating systems were using
'/' as the file name separator and at least one was using '\'
as an escape character. As a result of Microsoft's decision
to use '\' as the separator, people have had to do extra work
to adapt programs written for Windows to run in non-Windows
environments, and vice versa. People have had to do extra work
to write software that is portable between these environments.
People have done extra work while creating tools to make writing
portable software easier. And people have to do extra work when
they use these tools, because using them is still harder than
writing portable code for operating systems that all used '/'
as their separator would have been.
If you added up the cost of all the extra work that people have
done as a result of Microsoft's decision to use '\' as the file
name separator, it would probably be enough money to launch the
Burj Khalifa into geosynchronous orbit.
So, when you say fuck Python, are you sure you're shooting at the
right target?
--
David Canzi | TIMTOWWTDI (tim-toe-woe-dee): There Is More Than One
| Wrong Way To Do It
------------------------------
Date: Sun, 8 Apr 2012 17:25:38 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: f python?
Message-Id: <20120408101320.971@kylheku.com>
["Followup-To:" header set to comp.lang.lisp.]
On 2012-04-08, David Canzi <dmcanzi@uwaterloo.ca> wrote:
> Xah Lee <xahlee@gmail.com> wrote:
>>hi guys,
>>
>>sorry am feeling a bit prolific lately.
>>
>>today's show, is: 'Fuck Python'
>>http://xahlee.org/comp/fuck_python.html
>>
>>------------------------------------
>>Fuck Python
>> By Xah Lee, 2012-04-08
>>
>>fuck Python.
>>
>>just fucking spend 2 hours and still going.
>>
>>here's the short story.
>>
>>so recently i switched to a Windows version of python. Now, Windows
>>version takes path using win backslash, instead of cygwin slash. This
>>fucking broke my find/replace scripts that takes a dir level as input.
>>Because i was counting slashes.
>>
>>Ok no problem. My sloppiness. After all, my implementation wasn't
>>portable. So, let's fix it. After a while, discovered there's the
>>'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
>>went loose. Because, the backslash is used as escape in string, so any
>>regex that manipulate path got fucked majorly.
>
> When Microsoft created MS-DOS, they decided to use '\' as
> the separator in file names.
This is false. The MS-DOS (dare I say it) "kernel" accepts both forward and
backslashes as separators.
The application-level choice was once configurable through a variable
in COMMAND.COM. Then they hard-coded it to backslash.
However, Microsoft operating systems continued to recognize slash as a
path separator, and they still do to this day.
Only, there are broken userland programs on Windows which don't know this.
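The two-separator behaviour can be observed from Python on any platform through the standard ntpath module, which implements the Windows path rules. This is a minimal sketch assuming Python 3, and the sample path is hypothetical. Note that it shows how Python's Windows path module treats the two separators, not what the kernel itself does.

```python
import ntpath

# A path that mixes forward slashes and backslashes.
mixed = "c:/Users/h3\\web/index.html"

# normpath rewrites every forward slash to a backslash.
normalized = ntpath.normpath(mixed)

# split accepts either character as the separator before the last component.
head, tail = ntpath.split(mixed)

print(normalized)
print(head, tail)
```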
> So, when you say fuck Python, are you sure you're shooting at the
> right target?
I would have to say, probably yes.
------------------------------
Date: Sun, 8 Apr 2012 19:32:40 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: f python?
Message-Id: <slrnjo3its.56k.hjp-usenet2@hrunkner.hjp.at>
On 2012-04-08 17:03, David Canzi <dmcanzi@uwaterloo.ca> wrote:
> If you added up the cost of all the extra work that people have
> done as a result of Microsoft's decision to use '\' as the file
> name separator, it would probably be enough money to launch the
> Burj Khalifa into geosynchronous orbit.
So we have another contender for the Most Expensive One-byte Mistake?
Poul-Henning Kamp nominated the C/Unix guys:
http://queue.acm.org/detail.cfm?id=2010365
hp
--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Bill Code on asrg@irtf.org
------------------------------
Date: Sun, 08 Apr 2012 10:49:34 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: f python?
Message-Id: <4oj3o7lpuakrf317jgjsn9eubfqtl42s1m@4ax.com>
"David Canzi" <dmcanzi@uwaterloo.ca> wrote:
>Xah Lee <xahlee@gmail.com> wrote:
Please check whom you are replying to.
Do not feed the trolls, please.
jue
------------------------------
Date: Sun, 8 Apr 2012 19:14:45 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: f python?
Message-Id: <20120408114313.85@kylheku.com>
On 2012-04-08, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> On 2012-04-08 17:03, David Canzi <dmcanzi@uwaterloo.ca> wrote:
>> If you added up the cost of all the extra work that people have
>> done as a result of Microsoft's decision to use '\' as the file
>> name separator, it would probably be enough money to launch the
>> Burj Khalifa into geosynchronous orbit.
>
> So we have another contender for the Most Expensive One-byte Mistake?
The one byte mistake in DOS and Windows is recognizing two characters as path
separators. All code that correctly handles paths is complicated by having to
look for a set of characters instead of just scanning for a byte.
> http://queue.acm.org/detail.cfm?id=2010365
DOS backslashes are already mentioned in that page, but alas it perpetuates the
clueless myth that DOS and windows do not recognize any other path separator.
Worse, the one byte Unix mistake being covered is, disappointingly, just a
clueless rant against null-terminated strings.
Null-terminated strings are infinitely better than the ridiculous encapsulation of length + data.
For one thing, if s is a non-empty null terminated string then, cdr(s) is also
a string representing the rest of that string without the first character,
where cdr(s) is conveniently defined as s + 1.
Not only can compilers compress storage by recognizing that string literals are
the suffixes of other string literals, but a lot of string manipulation code is
simplified, because you can treat a pointer to interior of any string as a
string.
Because they are recursively defined, you can do elegant tail recursion on null
terminated strings:
const char *rec_strchr(const char *in, int ch)
{
  if (*in == 0)
    return 0;
  else if (*in == ch)
    return in;
  else
    return rec_strchr(in + 1, ch);
}
length + data also raises the question: what type is the length field? One
byte? Two bytes? Four? And then you have issues of byte order. Null terminated
C strings can be written straight to a binary file or network socket and be
instantly understood on the other end.
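To illustrate the byte-order point, here is a small sketch assuming Python 3 and a two-byte length field. The field width is an arbitrary choice for the example, as are the function names pack_prefixed and pack_terminated.

```python
import struct


def pack_prefixed(text, fmt):
    """Encode text as <length field><data>; fmt fixes field width and byte order."""
    data = text.encode("ascii")
    return struct.pack(fmt, len(data)) + data


def pack_terminated(text):
    """Encode text as <data><NUL>; there is no length field to agree on."""
    return text.encode("ascii") + b"\x00"


little = pack_prefixed("perl", "<H")   # length stored low byte first
big = pack_prefixed("perl", ">H")      # length stored high byte first
terminated = pack_terminated("perl")

# A reader that guesses the wrong byte order gets the wrong length.
misread = struct.unpack("<H", big[:2])[0]

print(little, big, terminated, misread)
```

The two prefixed encodings carry the same string but differ on the wire, so reader and writer have to agree on the length format in advance.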
Null terminated strings have simplified all kinds of text manipulation, lexical
scanning, and data storage/communication code resulting in immeasurable
savings over the years.
------------------------------
Date: Mon, 09 Apr 2012 00:43:49 +0100
From: Nobody <nobody@nowhere.com>
Subject: Re: f python?
Message-Id: <pan.2012.04.08.23.44.00.697000@nowhere.com>
On Sun, 08 Apr 2012 04:11:20 -0700, Xah Lee wrote:
> Ok no problem. My sloppiness. After all, my implementation wasn't
> portable. So, let's fix it. After a while, discovered there's the
> os.sep. Ok, replace "/" to os.sep, done. Then, bang, all hell
> went loose. Because, the backslash is used as escape in string, so any
> regex that manipulate path got fucked majorly. So, now you need to
> find a quoting mechanism.
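import os, re
# escape the separators: a bare backslash inside [...] would be read as a regex escape
if os.altsep is not None:
    sep_re = '[%s%s]' % (re.escape(os.sep), re.escape(os.altsep))
else:
    sep_re = '[%s]' % re.escape(os.sep)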
But really, you should be ranting about regexps rather than Python.
They're convenient if you know exactly what you want to match, but a
nuisance if you need to generate the expression based upon data which is
only available at run-time (and re.escape() only solves one very specific
problem).
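One way to build such a separator pattern from run-time data is sketched here, assuming Python 3. Escaping each separator with re.escape() keeps a backslash from being read as a regex escape inside the character class. The function name make_sep_re and the sample paths are invented for the example.

```python
import re


def make_sep_re(sep, altsep=None):
    """Compile a character class matching the separator(s) given at run time."""
    chars = re.escape(sep)
    if altsep is not None:
        chars += re.escape(altsep)
    return re.compile("[%s]" % chars)


# Windows-style: backslash is the separator, slash the alternative.
windows_sep = make_sep_re("\\", "/")
# POSIX-style: slash only.
posix_sep = make_sep_re("/")

windows_parts = windows_sep.split("c:\\Users/h3\\web")
posix_parts = posix_sep.split("home/h3/web")

print(windows_parts)
print(posix_parts)
```

In a real script the arguments would be os.sep and os.altsep; they are passed explicitly here so both cases can be tried on one machine.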
------------------------------
Date: Sun, 8 Apr 2012 22:45:13 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Re: f python?
Message-Id: <5429c402-0db2-4937-8b0b-6ef81c51e536@v7g2000pbs.googlegroups.com>
Xah Lee wrote:
« http://xahlee.org/comp/fuck_python.html »
David Canzi wrote
«When Microsoft created MS-DOS, they decided to use '\' as the
separator in file names. This was at a time when several previously
existing interactive operating systems were using '/' as the file name
separator and at least one was using '\' as an escape character. As a
result of Microsoft's decision to use '\' as the separator, people
have had to do extra work to adapt programs written for Windows to run
in non-Windows environments, and vice versa. People have had to do
extra work to write software that is portable between these
environments. People have done extra work while creating tools to
make writing portable software easier. And people have to do extra
work when they use these tools, because using them is still harder
than writing portable code for operating systems that all used '/' as
their separator would have been.»
namekuseijin wrote:
> yes, absolutely. But you got 2 inaccuracies there: 1) Microsoft
> didn't create DOS; 2) fucking DOS was written in C, and guess what, it
> uses \ as escape character. Fucking microsoft.
>
> > So, when you say fuck Python, are you sure you're shooting at the
> > right target?
>
> I agree. Fuck winDOS and fucking microsoft.
No. The choice to use backslash rather than slash is actually a good one,
because slash is one of the useful chars, far more so than backslash.
Users should be able to use it in file names.
i don't know the detailed history of path separator, but if i were to
blame, it's fuck unix. The entirety of unix, unix geek, unixers, unix
fuckheads. Fuck unix.
〈On Unix Filename Characters Problem〉
http://xahlee.org/UnixResource_dir/writ/unix_filename_chars.html
〈On Unix File System's Case Sensitivity〉
http://xahlee.org/UnixResource_dir/_/fileCaseSens.html
〈UNIX Tar Problem: File Length Truncation, Unicode Name Support〉
http://xahlee.org/comp/unix_tar_problem.html
〈What Characters Are Not Allowed in File Names?〉
http://xahlee.org/mswin/allowed_chars_in_file_names.html
〈Unicode Support in File Names: Windows, Mac, Emacs, Unison, Rsync,
USB, Zip〉
http://xahlee.org/mswin/unicode_support_file_names.html
〈The Nature of the Unix Philosophy〉
http://xahlee.org/UnixResource_dir/writ/unix_phil.html
Xah
------------------------------
Date: Mon, 09 Apr 2012 08:19:46 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: f python?
Message-Id: <4f82d3e2$1$fuzhry+tra$mr2ice@news.patriot.net>
In <20120408114313.85@kylheku.com>, on 04/08/2012
at 07:14 PM, Kaz Kylheku <kaz@kylheku.com> said:
>Null-terminated strings are infinitely better than the ridiculous
>encapsulation of length + data.
ROTF,LMAO!
>For one thing, if s is a non-empty null terminated string then,
>cdr(s) is also a string representing the rest of that string
>without the first character,
Are you really too clueless to differentiate between C and LISP?
>Null terminated strings have simplified all kinds of text
>manipulation, lexical scanning, and data storage/communication
>code resulting in immeasurable savings over the years.
Yeah, especially code that needs to deal with lengths and nulls. It's
great for buffer overruns too.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Mon, 09 Apr 2012 08:45:03 -0400
From: Roy Smith <roy@panix.com>
Subject: Re: f python?
Message-Id: <roy-0AC15F.08450209042012@news.panix.com>
In article <4f82d3e2$1$fuzhry+tra$mr2ice@news.patriot.net>,
Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid> wrote:
> >Null terminated strings have simplified all kinds of text
> >manipulation, lexical scanning, and data storage/communication
> >code resulting in immeasurable savings over the years.
>
> Yeah, especially code that needs to deal with lengths and nulls. It's
> great for buffer overruns too.
I once worked on a C++ project that used a string class which kept a
length count, but also allocated one extra byte and stuck a null at the
end of every string.
------------------------------
Date: Mon, 9 Apr 2012 18:55:28 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: f python?
Message-Id: <20120409111329.694@kylheku.com>
On 2012-04-09, Shmuel Metz <spamtrap@library.lspace.org.invalid> wrote:
> In <20120408114313.85@kylheku.com>, on 04/08/2012
> at 07:14 PM, Kaz Kylheku <kaz@kylheku.com> said:
>
>>Null-terminated strings are infinitely better than the ridiculous
>>encapsulation of length + data.
>
> ROTF,LMAO!
>
>>For one thing, if s is a non-empty null terminated string then,
>>cdr(s) is also a string representing the rest of that string
>>without the first character,
>
> Are you really too clueless to differentiate between C and LISP?
In Lisp we can burn a list literal like '(a b c) into ROM, and compute (b c)
without allocating any memory.
Null-terminated C strings do the same thing.
In some Lisp systems, in fact, "CDR coding" was used to save space when
allocating a list all at once. This created something very similar to
a C string: a vector-like object of all the CARs, with a terminating
convention marking the end.
It's logically very similar.
I need not repeat the elegant recursion example for walking a C string.
That example is not possible with the length + data representation.
(Not without breaking the encapsulation and passing the length as a separate
recursion parameter to a recursive routine that works with the raw data part of
the string.)
>>Null terminated strings have simplified all kinds of text
>>manipulation, lexical scanning, and data storage/communication
>>code resulting in immeasurable savings over the years.
>
> Yeah, especially code that needs to deal with lengths and nulls.
To get the length of a string, you call a function, in either representation,
so it is not any more complicated from a coding point of view. The function is,
of course, more expensive if the string is null terminated, but you can code
with awareness of this and not call length wastefully.
If all else was equal (so that the expense of the length operation were
the /only/ issue) then of course the length + data would be better.
However, all else is not equal.
One thing that is darn useful, for instance, is that
p + strlen(p) still points to a string which is length zero, and this
sort of thing is widely exploited in text processing code. e.g.
  size_t digit_prefix_len = strspn(input_string, "0123456789");
  const char *after_digits = input_string + digit_prefix_len;
  if (*after_digits == 0) {
    /* string consists only of digits: nothing after digits */
  } else {
    /* process part after digits */
  }
It's nice that after_digits is a bona-fide string just like input_string,
without any memory allocation being required.
We can lexically analyze a string without ever asking it what its length is,
and as we march down the string, the remaining suffix of that string is always
a string so we can treat it as one, recurse on it, whatever.
Code that needs to deal with null "characters" is manipulating binary data, not
text, and should use a suitable data structure for that.
> It's great for buffer overruns too.
If we scan for a null terminator which is not there, we have a buffer overrun.
If a length field in front of string data is incorrect, we also have a buffer
overrun.
A pattern quickly emerges here: invalid, corrupt data produced by buggy code
leads to incorrect results, and behavior that is not well-defined!
------------------------------
Date: Mon, 9 Apr 2012 19:00:48 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: f python?
Message-Id: <20120409113544.49@kylheku.com>
On 2012-04-09, Roy Smith <roy@panix.com> wrote:
> In article <4f82d3e2$1$fuzhry+tra$mr2ice@news.patriot.net>,
> Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid> wrote:
>
>> >Null terminated strings have simplified all kinds of text
>> >manipulation, lexical scanning, and data storage/communication
>> >code resulting in immeasurable savings over the years.
>>
>> Yeah, especially code that needs to deal with lengths and nulls. It's
>> great for buffer overruns too.
>
> I once worked on a C++ project that used a string class which kept a
> length count, but also allocated one extra byte and stuck a null at the
> end of every string.
Me too! I worked on numerous C++ projects with such a string template
class.
It was usually called
std::basic_string
and came from this header called:
#include <string>
which also instantiated it into two flavors under two nicknames:
std::basic_string<char> being introduced as std::string, and
std::basic_string<wchar_t> as std::wstring.
This class had a c_str() function which retrieved a null-terminated
string and so most implementations just stored the data that way, but
some of the versions of that class cached the length of the string
to avoid doing a strlen or wcslen operation on the data.
------------------------------
Date: Mon, 9 Apr 2012 11:17:05 -0700 (PDT)
From: Abhishek Jain <abhishek.jain.1985@gmail.com>
Subject: Re: How to keep the script from stopping or hanging
Message-Id: <8339513.396.1333995425405.JavaMail.geo-discussion-forums@pboo1>
Does anyone think of WARN ??
------------------------------
Date: Sun, 08 Apr 2012 15:07:47 +0200
From: Mart van de Wege <mvdwege@mail.com>
Subject: Re: looking for a hexagon tiling module
Message-Id: <86zkam1ie4.fsf@gaheris.avalon.lan>
Eli the Bearded <*@eli.users.panix.com> writes:
> I'd like a module to manipulate co-ordinates on an X,Y plane that
> have an overlapping hexagon grid. Consider, for example, a game
> that uses a hexagonal grid to move around. Chinese Checkers (star
> halma) is one such game. You would want functions to turn a mouse
> click into a hexagon tile id of some sort, and you'd want functions
> to calculate how to draw the tiles if the side is S pixels, or
> if you want to fit N tiles across on a M wide field.
>
I've actually done the basic math to generate this using Cairo:
http://mvdwege.wordpress.com/2011/07/07/math-for-fun/
I still have to work out the rest of the code to really generate maps,
and my final usage scenario is different from yours, but hopefully it's
useful.
Regards,
Mart
--
"We will need a longer wall when the revolution comes."
--- AJS, quoting an uncertain source.
------------------------------
Date: Sun, 08 Apr 2012 11:49:22 -0700
From: James Moe <jimoeDESPAM@sohnen-moe.com>
Subject: Missing utf8_heavy.pl
Message-Id: <_oGdnZLFcbUuQBzSnZ2dnUVZ5r6dnZ2d@giganews.com>
Hello,
perl v5.14.2
assp v1.7.5.7 (yes, it is old)
opensuse v12.1
ASSP = anti-spam smtp proxy, a bayesian spam filter.
I attempted to move ASSP from a server that was running perl v5.8.6 to
another server. After installing and configuring, all attempts to send
mail result in this error:
"Mainloop: Can't locate utf8_heavy.pl in @INC"
(@INC contains: .
/usr/lib/perl5/site_perl/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/site_perl/5.14.2
/usr/lib/perl5/vendor_perl/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.14.2
/usr/lib/perl5/5.14.2/x86_64-linux-thread-multi
/usr/lib/perl5/5.14.2
/usr/lib/perl5/site_perl /usr/local/bin/assp-v1)
at /usr/lib/perl5/5.14.2/utf8.pm line 17.;
despite the fact that utf8_heavy.pl (and ./unicore and ./Unicode) exists
in /usr/lib/perl5/site_perl/5.14.2/
How do I resolve this problem?
--
James Moe
jmm-list at sohnen-moe dot com
------------------------------
Date: Sun, 8 Apr 2012 21:12:33 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Missing utf8_heavy.pl
Message-Id: <slrnjo3op1.n21.hjp-usenet2@hrunkner.hjp.at>
On 2012-04-08 18:49, James Moe <jimoeDESPAM@sohnen-moe.com> wrote:
> Hello,
> perl v5.14.2
> assp v1.7.5.7 (yes, it is old)
> opensuse v12.1
>
> ASSP = anti-spam smtp proxy, a bayesian spam filter.
> I attempted to move ASSP from a server that was running perl v5.8.6 to
> another server. After installing and configuring, all attempts to send
> mail result in this error:
>
> "Mainloop: Can't locate utf8_heavy.pl in @INC"
> (@INC contains: .
> /usr/lib/perl5/site_perl/5.14.2/x86_64-linux-thread-multi
[...]
> /usr/lib/perl5/5.14.2
> /usr/lib/perl5/site_perl /usr/local/bin/assp-v1)
> at /usr/lib/perl5/5.14.2/utf8.pm line 17.;
>
> despite the fact that utf8_heavy.pl (and ./unicore and ./Unicode) exists
> in /usr/lib/perl5/site_perl/5.14.2/
utf8_heavy.pl is part of the perl core. It should not be in
/usr/lib/perl5/site_perl, but in /usr/lib/perl5/5.14.2. Did you copy
that yourself?
hp
--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Bill Code on asrg@irtf.org
------------------------------
Date: Sun, 08 Apr 2012 23:04:34 -0700
From: James Moe <jimoeDESPAM@sohnen-moe.com>
Subject: Re: Missing utf8_heavy.pl
Message-Id: <e-KdndhZAJ1p5h_SnZ2dnUVZ5hmdnZ2d@giganews.com>
On 04/08/2012 12:12 PM, Peter J. Holzer wrote:
>> I attempted to move ASSP from a server that was running perl v5.8.6 to
>> another server. After installing and configuring, all attempts to send
>> mail result in this error:
>>
>> "Mainloop: Can't locate utf8_heavy.pl in @INC"
>> (@INC contains: .
>>
>> despite the fact that utf8_heavy.pl (and ./unicore and ./Unicode) exists
>> in /usr/lib/perl5/site_perl/5.14.2/
>
> utf8_heavy.pl is part of the perl core. It should not be in
> /usr/lib/perl5/site_perl, but in /usr/lib/perl5/5.14.2. Did you copy
> that yourself?
>
The files are also in /usr/lib/perl5/5.14.2/. Regardless, since the
files are in both places of the search path, how is it that they cannot
be found?
The installation is what I ended up with after an update from opensuse
v11.4 to v12.1.
--
James Moe
jmm-list at sohnen-moe dot com
------------------------------
Date: Mon, 9 Apr 2012 20:17:36 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Missing utf8_heavy.pl
Message-Id: <slrnjo69u0.fdh.hjp-usenet2@hrunkner.hjp.at>
On 2012-04-09 06:04, James Moe <jimoeDESPAM@sohnen-moe.com> wrote:
> On 04/08/2012 12:12 PM, Peter J. Holzer wrote:
>>> I attempted to move ASSP from a server that was running perl v5.8.6 to
>>> another server. After installing and configuring, all attempts to send
>>> mail result in this error:
>>>
>>> "Mainloop: Can't locate utf8_heavy.pl in @INC"
>>> (@INC contains: .
>>>
>>> despite the fact that utf8_heavy.pl (and ./unicore and ./Unicode) exists
>>> in /usr/lib/perl5/site_perl/5.14.2/
>>
>> utf8_heavy.pl is part of the perl core. It should not be in
>> /usr/lib/perl5/site_perl, but in /usr/lib/perl5/5.14.2. Did you copy
>> that yourself?
>>
> The files are also in /usr/lib/perl5/5.14.2/. Regardless, since the
> files are in both places of the search path, how is it that they cannot
> be found?
Hard to say from a distance. You could try strace to check what it is
really doing.
Preferably with a simple script like this:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use 5.010;
binmode STDOUT, ":encoding(UTF-8)";
my $s = "Käsefüße";
say $s;
say lc $s;
say uc $s;
__END__
But the simple fact that you have a core file in a non-core location is a
strong indication that your perl installation is messed up. Maybe one of
your utf8_heavy.pl files doesn't match your perl version, or maybe some
of the files that utf8_heavy.pl is trying to load are missing (although I
would expect a different error message in this case). Or maybe
permissions are wrong ...
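The "missing or unreadable" possibilities can be checked quickly before reaching for strace. This is a rough sketch, assuming Python 3 is available on the box and that it is run as the same user as the failing program. The function name probe is invented, and the directory list is copied from the error message quoted earlier in this thread.

```python
import os


def probe(inc_dirs, filename="utf8_heavy.pl"):
    """Return (directory, status) pairs in search order.

    "missing" also covers a directory the current user cannot search.
    """
    report = []
    for directory in inc_dirs:
        path = os.path.join(directory, filename)
        if not os.path.exists(path):
            status = "missing"
        elif not os.access(path, os.R_OK):
            status = "unreadable"
        else:
            status = "readable"
        report.append((directory, status))
    return report


# Two of the @INC entries from the quoted error message.
inc = [
    "/usr/lib/perl5/site_perl/5.14.2",
    "/usr/lib/perl5/5.14.2",
]
for directory, status in probe(inc):
    print("%-10s %s" % (status, directory))
```

If both copies show up as readable for an ordinary login but the daemon still fails, the difference between the two environments is the next thing to look at.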
> The installation is what I ended up with after an update from opensuse
> v11.4 to v12.1.
Maybe something went wrong during the update. Have you tried a fresh
installation of opensuse v12.1?
hp
--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Bill Code on asrg@irtf.org
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3661
***************************************