[30468] in Perl-Users-Digest


Perl-Users Digest, Issue: 1711 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jul 11 14:10:02 2008

Date: Fri, 11 Jul 2008 11:09:26 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 11 Jul 2008     Volume: 11 Number: 1711

Today's topics:
        empty first element after split <mh_usenet@gmx.de>
    Re: empty first element after split <jimsgibson@gmail.com>
    Re: empty first element after split <donotreply@somethingveryunique.de>
    Re: empty first element after split <daveb@addr.invalid>
    Re: empty first element after split <daveb@addr.invalid>
    Re: FAQ 1.12 What's the difference between "perl" and " <semysig@gmail.com>
    Re: FAQ 1.12 What's the difference between "perl" and " <semysig@gmail.com>
    Re: help with regular expression <semysig@gmail.com>
        Help: Upgrade Problem <openlinuxsource@gmail.com>
    Re: Help: Upgrade Problem <spamtrap@dot-app.org>
    Re: Help: Upgrade Problem <openlinuxsource@gmail.com>
    Re: How to get Perl CGI working on Mac OS 10.5? <szrRE@szromanMO.comVE>
    Re: How to get Perl CGI working on Mac OS 10.5? <spamtrap@dot-app.org>
    Re: How to get Perl CGI working on Mac OS 10.5? <szrRE@szromanMO.comVE>
    Re: How to get Perl CGI working on Mac OS 10.5? <cwilbur@chromatico.net>
    Re: How to get Perl CGI working on Mac OS 10.5? <spamtrap@dot-app.org>
    Re: How to get Perl CGI working on Mac OS 10.5? <szrRE@szromanMO.comVE>
    Re: Question about Encode (Windows-1252 to utf-8) worldcyclist@gmail.com
    Re: Question regarding Encode worldcyclist@gmail.com
    Re: Question regarding Encode <fawaka@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 11 Jul 2008 17:14:27 +0200
From: Michael Hamer <mh_usenet@gmx.de>
Subject: empty first element after split
Message-Id: <g57tcj$gpc$01$1@news.t-online.com>

Hi,

the Perl Cookbook suggests in Recipe 11.10 "Reading and Writing Hash 
Records to Text Files" the following code to read a hash from a file:

$/ = "";                # paragraph read mode
while (<>) {
     my @fields = split /^([^:]+):\s*/m;
     shift @fields;      # for leading null field
     push(@Array_of_Records, { map /(.*)/, @fields });
}

or, a bit simpler for testing:

$text="a:b\nc:d\ne:f";
my @fields = split /^([^:]+):\s*/m,$text;
foreach $key (@fields)
{
	print "field is $key\n";
}

This code works as intended, but I don't understand it. Why is 
$fields[0] empty after the split? I would have expected it to contain 
the "a" but that is found in $fields[1].


------------------------------

Date: Fri, 11 Jul 2008 08:40:07 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: empty first element after split
Message-Id: <110720080840074669%jimsgibson@gmail.com>

In article <g57tcj$gpc$01$1@news.t-online.com>, Michael Hamer
<mh_usenet@gmx.de> wrote:

> Hi,
> 
> the Perl Cookbook suggests in Recipe 11.10 "Reading and Writing Hash 
> Records to Text Files" the following code to read a hash from a file:
> 
> $/ = "";                # paragraph read mode
> while (<>) {
>      my @fields = split /^([^:]+):\s*/m;
>      shift @fields;      # for leading null field
>      push(@Array_of_Records, { map /(.*)/, @fields });
> }
> 
> or, a bit simpler for testing:
> 
> $text="a:b\nc:d\ne:f";
> my @fields = split /^([^:]+):\s*/m,$text;
> foreach $key (@fields)
> {
>   print "field is $key\n";
> }
> 
> This code works as intended, but I don't understand it. Why is 
> $fields[0] empty after the split? I would have expected it to contain 
> the "a" but that is found in $fields[1].

It is because this code is using split in an inverted fashion.
Normally, split is looking for substrings separated by delimiters,
returning the substrings and discarding the delimiters. Here,
parentheses are used to capture a portion of the delimiters, and split
is returning the captured portion intermixed with the substrings.
Therefore, the first field ('a' in your example) is actually part of a
delimiter, not part of a substring, and it is the portion of the string
that precedes the first delimiter that ends up in $fields[0]. Since
there is nothing before the 'a', there is nothing in $fields[0].

-- 
Jim Gibson
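You can see what Jim describes by dumping the split result. The sketch below is not from the Cookbook. It runs the test string from the question under `use strict` and prints the raw list with Data::Dumper. It then applies the recipe's `shift` and `map /(.*)/` steps. Each value keeps its trailing newline until `/(.*)/` strips it, because `.` does not match a newline.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq  = 1;   # show "\n" as an escape, not a line break
$Data::Dumper::Indent = 0;   # keep the dump on one line

my $text   = "a:b\nc:d\ne:f";
my @fields = split /^([^:]+):\s*/m, $text;

# First comes the (empty) text before the first delimiter, then each
# captured key followed by the value text that comes after it.
print Dumper(\@fields), "\n";

shift @fields;                        # discard the empty leading field

# In list context /(.*)/ returns its capture; '.' does not match "\n",
# so the newline left on each value is dropped.
my %record = map /(.*)/, @fields;
print "$_ => $record{$_}\n" for sort keys %record;
```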


------------------------------

Date: 11 Jul 2008 15:45:39 GMT
From: "Heiko Eißfeldt" <donotreply@somethingveryunique.de>
Subject: Re: empty first element after split
Message-Id: <Xns9AD8B4A997914heikohexcode@151.189.20.10>

Michael Hamer <mh_usenet@gmx.de> wrote in news:g57tcj$gpc$01$1@news.t-online.com:

> $text="a:b\nc:d\ne:f";
> my @fields = split /^([^:]+):\s*/m,$text;
> foreach $key (@fields)
> {
>      print "field is $key\n";
> }
> 
> This code works as intended, but I don't understand it. Why is 
> $fields[0] empty after the split? I would have expected it to contain 
> the "a" but that is found in $fields[1].


Because there is nothing before the first delimiter match.
A simple split returns the items separated by the delimiter, without the
delimiter parts.
If you want parts of the delimiter as well, you need to capture them.

The delimiter is ^([^:]+):\s*, which matches "a:". Everything
before the ":" is captured, so you get
undef, a, b, c, d, e, f
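Here is a minimal comparison on the same test string. The two calls differ only in the pair of capturing parentheses, and only the second one returns the keys.

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# No parentheses: the delimiters (and so the keys) are thrown away.
my @plain = split /^[^:]+:\s*/m, $text;      # ("", "b\n", "d\n", "f")

# Parentheses: each captured key is returned between the fields.
my @captured = split /^([^:]+):\s*/m, $text; # ("", "a", "b\n", "c", "d\n", "e", "f")

printf "plain:    %d elements\n", scalar @plain;     # 4
printf "captured: %d elements\n", scalar @captured;  # 7
```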



------------------------------

Date: Fri, 11 Jul 2008 17:57:51 +0200
From: Dave B <daveb@addr.invalid>
Subject: Re: empty first element after split
Message-Id: <g57vsn$af8$1@registered.motzarella.org>

Michael Hamer wrote:
> Hi,
> 
> the Perl Cookbook suggests in Recipe 11.10 "Reading and Writing Hash 
> Records to Text Files" the following code to read a hash from a file:
> 
> $/ = "";                # paragraph read mode
> while (<>) {
>      my @fields = split /^([^:]+):\s*/m;
>      shift @fields;      # for leading null field
>      push(@Array_of_Records, { map /(.*)/, @fields });
> }
> 
> or, a bit simpler for testing:
> 
> $text="a:b\nc:d\ne:f";
> my @fields = split /^([^:]+):\s*/m,$text;
> foreach $key (@fields)
> {
> 	print "field is $key\n";
> }
> 
> This code works as intended, but I don't understand it. Why is 
> $fields[0] empty after the split? I would have expected it to contain 
> the "a" but that is found in $fields[1].

My understanding is that, since the separator is not the single blank " ",
and the string starts with a delimiter, there's always a null field "before"
the first delimiter, and split by default does not remove leading empty
fields (this is similar to what happens with awk when the separator is not
the default blank).

If you split on /:/, you should get no empty leading fields (though there
may be other reasons to prefer the original method).

-- 
D.
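The rule Dave describes can be checked directly. This sketch shows three things:

- An empty leading field appears whenever the pattern matches at the very start of the string.
- `split ' '` is a special case that behaves like awk.
- Splitting the test string on /:/ leaves keys and values joined by newlines, which is one reason to prefer the original pattern.

```perl
use strict;
use warnings;

# A delimiter matched at the very start leaves an empty first field.
my @comma = split /,/, ",x,y";           # ("", "x", "y")

# The special ' ' pattern works like awk: leading whitespace is skipped.
my @awk = split ' ', "  x y";            # ("x", "y")

# An explicit whitespace regex does produce the empty leading field.
my @regex = split /\s+/, "  x y";        # ("", "x", "y")

# Splitting the test string on /:/ gives no leading empty field, but
# each middle element now holds a value, a newline and the next key.
my @colon = split /:/, "a:b\nc:d\ne:f";  # ("a", "b\nc", "d\ne", "f")

# Prints 3, 2, 3 and 4 on separate lines.
print scalar(@$_), "\n" for \@comma, \@awk, \@regex, \@colon;
```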


------------------------------

Date: Fri, 11 Jul 2008 17:59:32 +0200
From: Dave B <daveb@addr.invalid>
Subject: Re: empty first element after split
Message-Id: <g57vvt$af8$2@registered.motzarella.org>

"Heiko Eißfeldt" wrote:

> The delimiter is ^([^:]+):\s* which matches a:. Everything
> before the : is captured, so you get 
> undef, a, b, c, d, e, f

I think the first field is the empty string, rather than undef (but I may be
wrong).

-- 
D.
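A small check on the same test string confirms that the leading field is the empty string, not undef. The second split is an added illustration, not from the thread. It shows where undef does come from: a capture group that did not take part in a particular delimiter match.

```perl
use strict;
use warnings;

my @fields = split /^([^:]+):\s*/m, "a:b\nc:d\ne:f";

# defined() tells undef apart from the empty string.
print defined $fields[0] ? "defined" : "undef", "\n";   # defined
print "length: ", length $fields[0], "\n";              # length: 0

# Where undef does appear: a capture group that did not take part in a
# given delimiter match. Here "," matches without the (-) group.
my @mixed = split /(-)|,/, "x,y-z";      # ("x", undef, "y", "-", "z")
print defined $mixed[1] ? "defined" : "undef", "\n";    # undef
```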


------------------------------

Date: Fri, 11 Jul 2008 10:22:46 -0700
From: "Gordon Corbin Etly" <semysig@gmail.com>
Subject: Re: FAQ 1.12 What's the difference between "perl" and "Perl"?
Message-Id: <6dpj77F3q2n6U1@mid.individual.net>

Jürgen Exner wrote:
> sln@netherlands.com wrote:
> > On Thu, 10 Jul 2008 11:52:43 -0700, "Gordon Corbin Etly" wrote:

> I am trying, man, I am trying.

He wasn't saying that to me. Read his posts more carefully. Please don't 
misquote.


> Unfortunately he keeps changing his identity every day.

This is a lie. I have never changed my identity. I have never left any 
question as to who I am. I have always posted with the same first and 
last name, so you always know it was me, and you always have a choice of 
not reading it.


-- 
Gordon C. Etly
Email: perl -e "print q{}.reverse(q{moc.liamg@ylte.nodrog})" 




------------------------------

Date: Fri, 11 Jul 2008 10:31:51 -0700
From: "Gordon Corbin Etly" <semysig@gmail.com>
Subject: Re: FAQ 1.12 What's the difference between "perl" and "Perl"?
Message-Id: <6dpjo9F3nee2U1@mid.individual.net>

Glenn Jackman wrote:
> At 2008-07-10 02:52PM, "Gordon Corbin Etly" wrote:
> >  Then please read the article for yourself:

> > < http://tinyurl.com/6joyuz >
> >  " Perl not only stands for the Practical Extraction and Report
> >  " Language, but it also stands for the Pathologically Eclectic
> >  " Rubbish Lister.
> >
> >  These are Larry Wall's own words.

> Here are some more:
>
> http://www.linuxjournal.com/article/3394
>
>    Eventually I came up with the name "pearl", with the gloss
>    Practical Extraction and Report Language. The "a" was still in the
>    name when I made that one up. But I heard rumors of some obscure
>    graphics language named "pearl", so I shortened it to "perl". (The
>    "a" had already disappeared by the time I gave Perl its alternate
>    gloss, Pathologically Eclectic Rubbish Lister.)
>    ...
>    we realized about the time of Perl 4 that it was useful to
>    distinguish between "perl" the program and "Perl" the language.

How does this negate the validity of "Practical Extraction and Report 
Language" (or even the alternate phrase)? It seems to me that he has 
given two expansions in two separate articles. Whether it's a "gloss" 
or "not only stands for", or some such, it's a meaning that Larry Wall 
himself gave, and as such, there is nothing wrong with writing that in a 
shortened form. No one, I repeat, NO ONE has given any real reason that 
makes this truly invalid. What you and others keep doing is throwing out 
the same tired excuses of why you don't like it, or trying to pull 
technicalities based on the insignificant wording of a quotation, and 
ignoring what he actually said. Twice. Two separate columns. How many 
more are needed?



-- 
Gordon C. Etly
Email: perl -e "print q{}.reverse(q{moc.liamg@ylte.nodrog})" 




------------------------------

Date: Fri, 11 Jul 2008 10:15:28 -0700
From: "Gordon Corbin Etly" <semysig@gmail.com>
Subject: Re: help with regular expression
Message-Id: <6dpipjF3pj1kU1@mid.individual.net>

John W. Krahn wrote:
> Gordon Corbin Etly wrote:

> > While I realize this wasn't the best display of typing skill, I
> > hardly think it was necessary to correct two small mistakes;

> Then I guess you missed the other two corrections?

I see them now after reading it again. It was nice of you, but it really 
wasn't necessary, as they were just typing mistakes resulting from 
composing a post in haste (as opposed to genuinely having an insufficient 
understanding of the language).


--
Gordon C. Etly
Email: perl -e "print q{}.reverse(q{moc.liamg@ylte.nodrog})" 




------------------------------

Date: Fri, 11 Jul 2008 22:44:56 +0800
From: Amy Lee <openlinuxsource@gmail.com>
Subject: Help: Upgrade Problem
Message-Id: <pan.2008.07.11.14.44.56.290829@gmail.com>

Hello,

If I download perl-5.10.0 and upgrade to it, do I need to reinstall
the modules?

Thank you~

Regards,

Amy


------------------------------

Date: Fri, 11 Jul 2008 11:16:52 -0400
From: Sherman Pendley <spamtrap@dot-app.org>
Subject: Re: Help: Upgrade Problem
Message-Id: <m1zloo78nv.fsf@dot-app.org>

Amy Lee <openlinuxsource@gmail.com> writes:

> What if I download perl-5.10.0 and to upgrade, whether I need to
> reinstall the modules?

Yes, you will need to reinstall any modules that have an XS
component. That is, those that have a compiled binary, not just pure
Perl code. That's usually not necessary for a minor update, from 5.8.x
to 5.8.y for example, but going to 5.10 from an earlier 5.x release is
a major update.

sherm--

-- 
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net
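Here is a rough sketch for listing which installed modules have an XS component before upgrading. It uses the core module ExtUtils::Installed and looks for files with the platform's dynamic-library extension, `$Config{dlext}`. The helper name `has_compiled_part` is hypothetical. The approach assumes the modules were installed with a .packlist, as `make install` and the CPAN shell write one. Modules installed by an OS package manager may not be listed.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if any of the given paths ends in this
# platform's dynamic-library extension (e.g. "so" on Linux), which
# indicates a compiled (XS) part that must be rebuilt for a new perl.
sub has_compiled_part {
    my @files = @_;
    my $ext   = quotemeta $Config{dlext};
    return scalar grep { /\.$ext\z/ } @files;
}

# ExtUtils::Installed reads .packlist files; wrap in eval so the
# script still runs quietly if the module is unavailable.
my $installed = eval { require ExtUtils::Installed; ExtUtils::Installed->new };
if ($installed) {
    my @modules = $installed->modules;
    for my $module (sort @modules) {
        next if $module eq 'Perl';    # the core distribution itself
        my @files = eval { $installed->files($module, 'all') };
        print "$module\n" if has_compiled_part(@files);
    }
}
```

Run it with the old perl before upgrading so that it reads the old @INC. The modules it prints are the ones to reinstall under 5.10.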


------------------------------

Date: Fri, 11 Jul 2008 23:48:07 +0800
From: Amy Lee <openlinuxsource@gmail.com>
Subject: Re: Help: Upgrade Problem
Message-Id: <pan.2008.07.11.15.48.06.69638@gmail.com>

On Fri, 11 Jul 2008 11:16:52 -0400, Sherman Pendley wrote:

> Amy Lee <openlinuxsource@gmail.com> writes:
> 
>> What if I download perl-5.10.0 and to upgrade, whether I need to
>> reinstall the modules?
> 
> Yes, you will need to reinstall any modules that have an XS
> component. That is, those that have a compiled binary, not just pure
> Perl code. That's usually not necessary for a minor update, from 5.8.x
> to 5.8.y for example, but going to 5.10 from an earlier 5.x release is
> a major update.
> 
> sherm--
Thank you very much~

Amy


------------------------------

Date: Fri, 11 Jul 2008 10:05:00 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: How to get Perl CGI working on Mac OS 10.5?
Message-Id: <g583rt01j46@news4.newsguy.com>

Keith Keller wrote:
> On 2008-07-10, vol30w60 <vol30w60@yahoo.com> wrote:
>> I've tried searching this topic and only found a few tutorials on
>> this, but I still cannot get CGI working.
>
> Then you probably have an Apache problem, not a Perl problem.
>
>> When I try to access a cgi script, it outputs the code (does not
>> process the script):
>
> Then you definitely have an Apache problem, not a Perl problem.
>
> Since you have an Apache problem, I suggest you try asking your
> question in an Apache newsgroup.  (I believe alt.apache.configuration
> might be a good bet.)


<side question>
I never understood why that news group wasn't created under the 'comp.*' 
hierarchy instead of under 'alt.*'. Was there any particular 
reason for this?
</side question>

-- 
szr 




------------------------------

Date: Fri, 11 Jul 2008 13:26:56 -0400
From: Sherman Pendley <spamtrap@dot-app.org>
Subject: Re: How to get Perl CGI working on Mac OS 10.5?
Message-Id: <m1abgo1gdb.fsf@dot-app.org>

"szr" <szrRE@szromanMO.comVE> writes:

> I never understood why that news group wasn't created up the 'comp.*' 
> hierarchy and why it was made under 'alt.*', was there any particular 
> reason for this?

Huh? You posted this to comp.lang.perl.misc. There *is* an alt.perl
group, but this ain't it. :-)

sherm--

-- 
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: Fri, 11 Jul 2008 10:41:18 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: How to get Perl CGI working on Mac OS 10.5?
Message-Id: <g585vv01l3u@news4.newsguy.com>

Sherman Pendley wrote:
> "szr" <szrRE@szromanMO.comVE> writes:
>
>> I never understood why that news group wasn't created up the 'comp.*'
>> hierarchy and why it was made under 'alt.*', was there any particular
>> reason for this?
>
> Huh? You posted this to comp.lang.perl.misc. There *is* an alt.perl
> group, but this ain't it. :-)

I wasn't directing this at the 'alt.*' hierarchy, but rather the 
'comp.*' hierarchy. I am just curious why it was created in 'alt.*' 
and not 'comp.*', like the groups for many other major languages, databases, 
protocols and such. I would have thought a dedicated Apache news group would 
have been a ripe candidate for the 'comp.*' hierarchy.

(And I don't think I've ever used or seen the word "hierarchy" so much 
in a single short paragraph before. :-) )

-- 
szr 




------------------------------

Date: Fri, 11 Jul 2008 13:34:50 -0400
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: How to get Perl CGI working on Mac OS 10.5?
Message-Id: <86lk08gw91.fsf@mithril.chromatico.net>

>>>>> "szr" == szr  <szrRE@szromanMO.comVE> writes:

    szr> <side question> I never understood why that news group
    szr> [alt.apache.configuration] wasn't created up the 'comp.*'
    szr> hierarchy and why it was made under 'alt.*', was there any
    szr> particular reason for this?  </side question>

Most likely, because there's a lengthy and involved process for creating
groups under comp.*, but creating an alt.* group only requires a single
sysadmin issuing a command.

Charlton




-- 
Charlton Wilbur
cwilbur@chromatico.net


------------------------------

Date: Fri, 11 Jul 2008 13:57:19 -0400
From: Sherman Pendley <spamtrap@dot-app.org>
Subject: Re: How to get Perl CGI working on Mac OS 10.5?
Message-Id: <m1iqvccni8.fsf@dot-app.org>

"szr" <szrRE@szromanMO.comVE> writes:

> are. I would of thought a dedicated apache news group would have been a 
> ripe candidate for the 'comp.*' hierarchy.

Oh, you were talking about the Apache group! Sorry, my bad - I thought
you were talking about this one.

sherm--

-- 
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: Fri, 11 Jul 2008 10:57:57 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: How to get Perl CGI working on Mac OS 10.5?
Message-Id: <g586v501loh@news4.newsguy.com>

Charlton Wilbur wrote:
>>>>>> "szr" == szr  <szrRE@szromanMO.comVE> writes:
>
>    szr> <side question> I never understood why that news group
>    szr> [alt.apache.configuration] wasn't created up the 'comp.*'
>    szr> hierarchy and why it was made under 'alt.*', was there any
>    szr> particular reason for this?  </side question>
>
> Most likely, because there's a lengthy and involved process for
> creating groups under comp.*, but creating an alt.* group only
> requires a single sysadmin issuing a command.

I am aware of that, but it didn't seem to be much of a problem for the 
hundreds of programming, database, web, and other such groups that are 
well established. It just always felt odd that a group for a major open-source 
project like Apache ended up in the 'alt' realm. Oh well.

-- 
szr 




------------------------------

Date: Fri, 11 Jul 2008 06:54:24 -0700 (PDT)
From: worldcyclist@gmail.com
Subject: Re: Question about Encode (Windows-1252 to utf-8)
Message-Id: <ae41dcaf-e31d-4c33-bc83-704045925730@l42g2000hsc.googlegroups.com>

On Jul 9, 11:34 am, Jürgen Exner <jurge...@hotmail.com> wrote:
> williams.wil...@gmail.com wrote:
> >My quandary is that now I need to tackle multiple files in a directory
> >and another developer mentioned that if "UTF-8" and "Windows-1252" are
> >intermixed in a file that it may get confused and I should do a
> >transliteration like..
>
> Unless the file format supports multiple encodings within the same file
> (like e.g. a MIME email) a file can have only one encoding.
>
> >tr/\x93/\N{LEFT DOUBLE QUOTATION MARK}/;
>
> Nuts!
>
> >I am impressed with Encode but any advice or words that anyone wants
> >to throw in would be greatly appreciated.
>
> The only way to survive the encoding nightmare and stay sane is to
> standardize _ALL_ your data on _ONE SINGLE_ encoding. I strongly
> recommend UTF-8, but that's up to you.
> Any conversion between this standard format and other formats happens
> (if at all) _ONLY_ for user interaction, e.g. to support legacy email
> clients which don't support UTF-8 or accept input from a web page in ISO
> 8859-15 or even Greek, Arabic or Chinese or similar tasks. Of course, if
> at all possible even this user interaction should use the agreed-upon
> standard.
>
> jue
> (with a decade of internationalizing and localizing software)

I have seen this before with other CMSs, where someone types something
and then cuts and pastes from Word, and the data is mixed when stored in
MySQL. MySQL doesn't care what you have it encoded in, but the problem
comes when automated routines create XML files that are then stored with
mixed encoding (CMS data stored into MySQL, another routine generates
static XML files from the faulty data for use elsewhere).

It certainly makes the point that the data needs to be validated before
going into the db, but I can feel the poster's pain regarding this issue.

Maybe specifying your IN and OUT filehandles as ':bytes' would help (to
preserve the data and inhibit automatic encoding that may result in
unexpected changes to your already formatted UTF-8). Once you have read
it in, use the transliteration method you described before to change
things. I'm not a huge fan of that method either, but that's the way it
was done not too many years ago.

I'd like to see other suggestions on this one too.
JC
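The sketch below follows Jürgen's advice: decode at the input boundary and encode once on output. It assumes the whole input is Windows-1252, which Encode calls `cp1252`. `cp1252_to_utf8` and `convert_file` are hypothetical helper names. Decoding covers the whole cp1252 table, including the curly quotes 0x93 and 0x94, rather than only the bytes a hand-written `tr///` lists.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode once on the way in, work with Perl character strings inside,
# and encode once on the way out.
sub cp1252_to_utf8 {
    my ($octets) = @_;
    my $chars = decode('cp1252', $octets);    # "\x93" becomes U+201C
    return encode('UTF-8', $chars);
}

# The same idea applied to whole files with PerlIO layers.
sub convert_file {
    my ($src, $dst) = @_;
    open my $in,  '<:encoding(cp1252)', $src or die "Cannot open $src: $!";
    open my $out, '>:encoding(UTF-8)',  $dst or die "Cannot open $dst: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Cannot close $dst: $!";
}
```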


------------------------------

Date: Fri, 11 Jul 2008 06:53:39 -0700 (PDT)
From: worldcyclist@gmail.com
Subject: Re: Question regarding Encode
Message-Id: <c1380803-0097-4ae2-b783-3f352e5dab6b@r66g2000hsg.googlegroups.com>


> > Maybe I need some kind of check to see if a file is encoded a certain
> > way before figuring out how to jump into it. I can't ever remember using
> > Encode before and now we need it on a massive scope.
>
> There are some heuristic algorithms to do just that, but to be honest I
> would assume all data is in the same encoding unless you have proof
> otherwise. If it isn't, your CMS *REALLY* screwed up.

I have seen this before with other CMSs, where someone types something
and then cuts and pastes from Word, and the data is mixed when stored in
MySQL. MySQL doesn't care what you have it encoded in, but the problem
comes when automated routines create XML files that are then stored with
mixed encoding (CMS data stored into MySQL, another routine generates
static XML files from the faulty data for use elsewhere).

It certainly makes the point that the data needs to be validated before
going into the db, but I can feel the poster's pain regarding this issue.

Maybe specifying your IN and OUT filehandles as ':bytes' would help (to
preserve the data and inhibit automatic encoding that may result in
unexpected changes to your already formatted UTF-8). Once you have read
it in, use the transliteration method you described before to change
things. I'm not a huge fan of that method either, but that's the way it
was done not too many years ago.

I'd like to see other suggestions on this one too.
JC
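If a file really does mix encodings line by line, one common heuristic is to try strict UTF-8 first. When the bytes are not valid UTF-8, fall back to cp1252. The sketch below assumes each line is in a single encoding, and `decode_mixed` is a hypothetical name. It can misread a cp1252 line whose bytes happen to form valid UTF-8 (for example "Ã©"). It is a stopgap, not a substitute for validating data before it goes into the database.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a line that may be UTF-8 or Windows-1252.
sub decode_mixed {
    my ($octets) = @_;
    # Strict UTF-8 first: FB_CROAK makes invalid input die inside eval,
    # LEAVE_SRC keeps $octets intact for the fallback.
    my $chars = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    # Not valid UTF-8: assume the line came from a cp1252 source.
    return defined $chars ? $chars : decode('cp1252', $octets);
}

# Convert the files named on the command line, line by line, to UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
for my $file (@ARGV) {
    open my $in, '<:raw', $file or die "Cannot open $file: $!";
    print decode_mixed($_) while <$in>;
    close $in;
}
```

For example, with a hypothetical script name: perl fix-mixed.pl input.xml > output.xml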



------------------------------

Date: Fri, 11 Jul 2008 16:12:52 +0200
From: Leon Timmermans <fawaka@gmail.com>
Subject: Re: Question regarding Encode
Message-Id: <1921f$48776a64$89e0e08f$2682@news1.tudelft.nl>

On Fri, 11 Jul 2008 06:53:39 -0700, worldcyclist wrote:

> 
> I have seen this before with other CMSs where someone types something
> and then cuts
> and pastes from Word and then the data is mixed when stored in MySQL.
> MySQL doesn't care what you have it encoded in

Actually that's not true. MySQL has excellent support for various 
encodings and collations. See chapter 9 of the MySQL reference manual for 
more information on that. Most programmers don't seem to use it though.

> Certainly makes the point that the data needs to be validated before
> going into the db, but I can feel the poster's pain regarding this 
> issue.
 
Full agreement there.

Leon Timmermans


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1711
***************************************

