[24824] in Perl-Users-Digest
Perl-Users Digest, Issue: 6975 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Sep 8 18:07:11 2004
Date: Wed, 8 Sep 2004 15:05:13 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 8 Sep 2004 Volume: 10 Number: 6975
Today's topics:
Re: create connection from html form into insert MySQL <1usa@llenroc.ude.invalid>
DBD::ODBC,SQL Server,brackets - escape? <wunkalunka@elvis.com>
Re: DBD::ODBC,SQL Server,brackets - escape? <emschwar@pobox.com>
Efficient Data Storage <aaron@deloachcorp.com>
Re: Efficient Data Storage <tadmc@augustmail.com>
Re: Efficient Data Storage <uri@stemsystems.com>
Re: Efficient Data Storage <spamtrap@dot-app.org>
Finding date 1 day earlier than a given date! (Edward)
Re: Finding date 1 day earlier than a given date! <mritty@gmail.com>
Re: Finding date 1 day earlier than a given date! <tadmc@augustmail.com>
Re: Finding date 1 day earlier than a given date! <noreply@gunnar.cc>
geoip is garbage under load (elia Mazzawi)
Re: how many days ago is 2003-07-20 ? (M.J.T. Guy)
Newbie: Parsing help <lepi_MAKNI_ME_@fly.srk.fer.hr>
Re: Newbie: Parsing help <miknrene@drizzle.com>
parsing UTF-8 chars out of POST data (Aaron Anodide)
Re: parsing XML using a regular expression <leifwessman@hotmail.com>
Re: parsing XML using a regular expression <HelgiBriem_1@hotmail.com>
Re: parsing XML using a regular expression <tadmc@augustmail.com>
Re: parsing XML using a regular expression <tadmc@augustmail.com>
Re: parsing XML using a regular expression <noreply@gunnar.cc>
Re: parsing XML using a regular expression <1usa@llenroc.ude.invalid>
Re: parsing XML using a regular expression <tadmc@augustmail.com>
Re: parsing XML using a regular expression <jerf@jerf.org>
Re: parsing XML using a regular expression <matternc@comcast.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 8 Sep 2004 16:22:32 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: create connection from html form into insert MySQL data script
Message-Id: <Xns955E7DE42DC26asu1cornelledu@132.236.56.8>
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in
news:2q8j87Fm4k1tU1@uni-berlin.de:
> PHP2 wrote:
>> I still have not any advice.. only some buuuh messages
>
> Liar!
>
> http://www.mail-archive.com/beginners%40perl.org/msg61543.html
>
Gunnar, you're good!
Sinan.
------------------------------
Date: Wed, 08 Sep 2004 17:51:18 GMT
From: Derf <wunkalunka@elvis.com>
Subject: DBD::ODBC,SQL Server,brackets - escape?
Message-Id: <Xns955E8318B9094wunkalunkaelviscom@24.93.44.119>
I have inherited a database where they decided to put the column names
of the tables in brackets (i.e. [Machine Serial No],[Machine Model No],
etc). I've never worked with this before and somehow my SQL query is not
sending the brackets:
update MachineData
set [Machine Model No] = '$model'
DBD::ODBC::st execute failed: [Microsoft][ODBC SQL Server Driver][SQL
Server]Line 2: Incorrect syntax near 'Machine Model No'.
I tried:
update MachineData
set \[Machine Model No\] = '$model'
and got the same error, so I'm wondering if there is a different escape
or some other simple helper I'm missing?
Thanks for the help
Derf
------------------------------
Date: Wed, 08 Sep 2004 12:05:38 -0600
From: Eric Schwartz <emschwar@pobox.com>
Subject: Re: DBD::ODBC,SQL Server,brackets - escape?
Message-Id: <etopt4wprf1.fsf@wilson.emschwar>
Derf <wunkalunka@elvis.com> writes:
> I have inherited a database where they decided to put the column names
> of the tables in brackets (i.e. [Machine Serial No],[Machine Model No],
> etc). I've never worked with this before and somehow my SQL query is not
> sending the brackets:
<snip>
Can you show your Perl code? Showing the SQL by itself doesn't help,
but if we can see how you're sending it to the DB (or at least how
you're trying to send it), we can suggest a better way.
> update MachineData
> set [Machine Model No] = '$model'
Look at using DBI's prepare/execute pair. You are using DBI, aren't
you? You didn't post any Perl code, so I can't tell. Also, you'll
probably want to use a placeholder for that query, or else you have to
worry about what to do if and when model numbers gain punctuation.
-=Eric
--
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
-- Blair Houghton.
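
A minimal sketch of the prepare/execute advice above, assuming a
[Machine Serial No] column identifies the row (the original post shows no
WHERE clause, so that part is hypothetical, as is the build_update helper).
The SQL sits in a single-quoted Perl string, so the brackets are passed
through untouched, and the ? placeholders mean the values never have to be
quoted by hand. The DBI calls are left as comments because they need a
connected $dbh:

```perl
use strict;
use warnings;

# Build the statement text and its bind values separately.
# build_update() is a hypothetical helper, not part of DBI.
sub build_update {
    my ($model, $serial) = @_;
    my $sql = 'update MachineData'
            . ' set [Machine Model No] = ?'
            . ' where [Machine Serial No] = ?';   # assumed key column
    return ($sql, $model, $serial);
}

# A model number with punctuation, the case placeholders protect against.
my ($sql, @bind) = build_update("X-100's", 'SN42');
print "$sql\n";

# With a connected DBI handle $dbh (not shown here):
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@bind);
```

Printing $sql just before prepare() is also a quick way to check whether
the brackets really are missing from what gets sent.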
------------------------------
Date: Wed, 8 Sep 2004 10:53:08 -0500
From: "Aaron DeLoach" <aaron@deloachcorp.com>
Subject: Efficient Data Storage
Message-Id: <AP-dnUJ-JO5-t6LcRVn-rw@eatel.net>
Hello,
I have run into unfamiliar ground. Some guidance would be appreciated.
This project has grown from 1,000 or so users to over 50,000 users. The
project has been an overall success, so it's time to spend a little on the
investment. Currently, we are getting our own servers (in lieu of ISP shared
servers) set up with mod_perl and are revisiting a lot of the code to make
things more efficient. Hopefully, in a month or so we can make the switch.
At present, user records are stored each in a single file using the
Data::Dumper module and the whole project works through the %user = eval
<FILE> method. User files are stored in directories named after the first
two characters of the user ID to keep the directories smaller, in theory,
for quicker searching of files (?). The records are read/written throughout
the use of the program in the method described.
I don't know how much efficiency would be gained by using an alternate
storage method. Perhaps MySQL? None of us are very familiar with databases,
although it doesn't seem very hard. We are looking into storing the records
as binary files which seems promising, but would like some input on the data
storage/retrieval methods available before we do anything.
I should mention that the project was first written in Perl and will remain
that way. Some suggestions were to investigate a different language. But
that's out of the question for now. We would rather increase efficiency in
the Perl code. Servers will remain Linux/Apache.
Any thoughts?
------------------------------
Date: Wed, 8 Sep 2004 11:10:06 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Efficient Data Storage
Message-Id: <slrncjubmu.1is.tadmc@magna.augustmail.com>
Aaron DeLoach <aaron@deloachcorp.com> wrote:
> This project has grown from 1,000 or so users to over 50,000 users.
> At present, user records are stored each in a single file using the
> Data::Dumper module
> I don't know how much efficiency would be gained by using an alternate
> storage method. Perhaps MySQL?
Some form of relational database would be an easy way to get
performance gains over a roll-your-own flat file approach.
I'd recommend PostgreSQL over MySQL though.
> We are looking into storing the records
> as binary files which seems promising, but would like some input on the data
> storage/retrieval methods available before we do anything.
If you use an RDBMS you won't _need_ to do anything with regard
to storage and retrieval as the DB will handle all of that for you.
That wheel has been invented and heavily refined, just roll with it! :-)
> Any thoughts?
Use an RDBMS.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 08 Sep 2004 16:42:49 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Efficient Data Storage
Message-Id: <x7hdq890h8.fsf@mail.sysarch.com>
>>>>> "AD" == Aaron DeLoach <aaron@deloachcorp.com> writes:
AD> At present, user records are stored each in a single file using
AD> the Data::Dumper module and the whole project works through the
AD> %user = eval <FILE> method. User files are stored in directories
AD> named after the first two characters of the user ID to keep the
AD> directories smaller, in theory, for quicker searching of files
AD> (?). The records are read/written throughout the use of the
AD> program in the method described.
as tad suggested a dbms would be a good idea if you want to migrate from
a flat file. but just using File::Slurp will get you some immediate
speedups over <FILE> with almost no code changes.
also changing from data::dumper to Storable will also speed things up
and also require minimal code changes. try those before you make the
leap to a dbms.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
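
A rough sketch of the Data::Dumper to Storable swap suggested above. The
record fields and the file name are made up for illustration; nstore() and
retrieve() are Storable's own functions. nstore() writes in network byte
order, which may help if the new servers differ from the old ones:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Scratch directory standing in for the real per-user directory tree.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/ab1234.sto";

# Hypothetical user record; the real fields are not shown in the thread.
my %user = (id => 'ab1234', name => 'Example User', logins => 17);

# Write: replaces printing Data::Dumper output to the file.
nstore(\%user, $file);

# Read: replaces the %user = eval <FILE> step, with no eval of file contents.
my %copy = %{ retrieve($file) };

print "$copy{name} has logged in $copy{logins} times\n";
```

Existing Data::Dumper files would still need a one-time conversion pass
before the eval-based reader is retired.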
------------------------------
Date: Wed, 08 Sep 2004 16:36:02 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: Efficient Data Storage
Message-Id: <XoidnTYs-smu8KLcRVn-tg@adelphia.com>
Aaron DeLoach wrote:
> At present, user records are stored each in a single file using the
> Data::Dumper module and the whole project works through the %user = eval
> <FILE> method.
This suggests a minor tweak that could result in big gains under
mod_perl. Under traditional CGI, the file needs to be read and eval()'d
for each hit on the CGI.
Reducing the time it takes to read a user record is a good idea, but
with mod_perl you can also reduce the number of times a record is read.
You could take advantage of mod_perl's persistent environment here; keep
a hash of user records, and use an "Orcish Maneuver" to read and eval a
record only if the record you want is currently undef:
$users{$this_user} ||= get_user($this_user);
The same sort of thing can be done for output templates, XSLT
transformer objects, and more. It's a very common technique for writing
mod_perl optimized code - Google for "Orcish Maneuver" for many examples.
There are naturally trade-offs to consider too. For example, if the file
has changed, the new data won't be read until the next time a new server
instance spawns. If your traffic is very high, and your server instances
have a lifetime measured in seconds, that may not be a problem. If not,
you might need a more involved conditional that also checks for the age
of the file, instead of the simplistic ||= used above.
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
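
One way the "more involved conditional" mentioned above might look, as a
sketch only: the cache remembers each file's modification time and re-reads
the file when that time changes. load_record() and cached_record() are
hypothetical names standing in for get_user(), and the record is kept as
plain text to keep the example short:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %cache;        # under mod_perl this would persist between requests
my $loads = 0;    # counts real disk reads, for demonstration only

# Hypothetical stand-in for get_user(): slurp the record from disk.
sub load_record {
    my ($path) = @_;
    $loads++;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Return the cached record unless the file's mtime has changed.
sub cached_record {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    die "Cannot stat $path: $!" unless defined $mtime;
    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        $entry = $cache{$path}
               = { mtime => $mtime, data => load_record($path) };
    }
    return $entry->{data};
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "name=Example\n";
close $fh;

cached_record($path) for 1 .. 3;     # one disk read, two cache hits
my $old = time - 100;
utime $old, $old, $path or die "utime failed: $!";
cached_record($path);                # mtime changed, so the file is re-read
print "disk reads: $loads\n";
```

The stat() on every request is the price of freshness, and mtime has
one-second resolution, so two writes within the same second can go
unnoticed. Each server process also keeps its own copy of the cache.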
------------------------------
Date: 8 Sep 2004 09:07:32 -0700
From: egoduk@hotmail.com (Edward)
Subject: Finding date 1 day earlier than a given date!
Message-Id: <587a2724.0409080807.63eb2599@posting.google.com>
Hi all,
The below script works fine for all dates (change the value in $time),
except if the date is the 2nd of the month.
Given a date of 02/09/04 the script returns the date 32/08/04.
It's kind of the correct result in terms of number of days/months, but
should obviously read 01/09/04!
Anyone know how I can fix it??
Thanks,
Edward.
#!/path/to/perl
use POSIX;
$time="01/09/04";
my ($d, $m, $y) = split ('/', $time);
my $s = mktime (0, 0, 0, $d - 1, $m - 1, $y - 1900);
($d, $m, $y) = (localtime($s - 86400))[3..5];
$time = sprintf ('%02d/%02d/%04d', $d + 1, $m + 1, $y + 1900);
print "$time\n";
------------------------------
Date: Wed, 08 Sep 2004 16:40:27 GMT
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Finding date 1 day earlier than a given date!
Message-Id: <%BG%c.7349$AB6.4002@trndny04>
"Edward" <egoduk@hotmail.com> wrote in message
news:587a2724.0409080807.63eb2599@posting.google.com...
> The below script works fine for all dates (change the value in $time),
> except if the date is the 2nd of the month.
>
> Given a date of 02/09/04 the script returns the date 32/08/04.
No it doesn't. As written, it prints nothing at all. Partially because
you didn't enable warnings, and partially because you gave the time in a
2-digit format, but told mktime to expect it in a 4-digit format.
DO NOT RETYPE CODE. Always always always copy and paste what you
actually have into your post.
> It's kind of the correct result in terms of number of days/months, but
> should obviously read 01/09/04!
>
> Anyone know how I can fix it??
> #!/path/to/perl
use strict;
use warnings;
> use POSIX;
>
> $time="01/09/04";
my $time = '01/09/2004';
> my ($d, $m, $y) = split ('/', $time);
> my $s = mktime (0, 0, 0, $d - 1, $m - 1, $y - 1900);
This makes $s a timestamp to represent one day ago. So far so good...
> ($d, $m, $y) = (localtime($s - 86400))[3..5];
This subtracts another full day from the time [1]
> $time = sprintf ('%02d/%02d/%04d', $d + 1, $m + 1, $y + 1900);
This creates a time one day ahead of $s.
> print "$time\n";
So the question becomes - why are you subtracting two days and then
adding one? Granted, x - 2 + 1 is always equal to x - 1, but wouldn't it
be easier to just subtract the one day and be done with it? This is, of
course, the cause of the problem you're seeing. You subtract two days
from Sept 2nd, leaving you with Aug 31, and then you manually add to the
day count, rather than having the time functions do the day count.
#!/usr/bin/perl
use strict;
use warnings;
use POSIX;
my $time="02/09/2004";
my ($d, $m, $y) = split ('/', $time);
my $s = mktime (0, 0, 0, $d - 1, $m - 1, $y - 1900);
($d, $m, $y) = (localtime($s))[3..5];
$time = sprintf ('%02d/%02d/%04d', $d, $m + 1, $y + 1900);
print "$time\n";
__END__
I suspect that your problem was in not understanding the $mday parameter
to mktime() and $mday return value from localtime(). Allow me to
suggest you reread
perldoc -f localtime
and
perldoc POSIX
Paul Lalli
[1] This is generally a bad idea, as not every day consists of exactly
86400 seconds. Daylight Saving Time must be accounted for. The
builtins you used account for it, your manual calculation of 60 x 60 x
24 does not.
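
A small sketch that wraps the corrected approach in a function, so that it
can be tried on month and year boundaries. previous_day() is a hypothetical
name; passing noon rather than midnight is an extra precaution against
DST shifts and is not part of the code above:

```perl
use strict;
use warnings;
use POSIX qw(mktime);

# Takes dd/mm/yyyy and returns the previous calendar day in the same format.
sub previous_day {
    my ($date) = @_;
    my ($d, $m, $y) = split m{/}, $date;
    # mktime() normalizes out-of-range fields, so day 0 of a month
    # becomes the last day of the month before it.
    my $s = mktime(0, 0, 12, $d - 1, $m - 1, $y - 1900);
    my ($pd, $pm, $py) = (localtime $s)[3 .. 5];
    # localtime's day-of-month is already 1-based; only month and year
    # need adjusting.
    return sprintf '%02d/%02d/%04d', $pd, $pm + 1, $py + 1900;
}

print previous_day($_), "\n" for '02/09/2004', '01/09/2004', '01/03/2004';
```

No 86400 appears anywhere: all the date arithmetic is done on the day
field and left to mktime() to sort out.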
------------------------------
Date: Wed, 8 Sep 2004 11:43:35 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Finding date 1 day earlier than a given date!
Message-Id: <slrncjudln.1lj.tadmc@magna.augustmail.com>
Edward <egoduk@hotmail.com> wrote:
> Given a date of 02/09/04 the script returns the date 32/08/04.
> #!/path/to/perl
You should always enable warnings when developing Perl code.
> use POSIX;
>
> $time="01/09/04";
> my ($d, $m, $y) = split ('/', $time);
> my $s = mktime (0, 0, 0, $d - 1, $m - 1, $y - 1900);
^^^^^^^^^
^^^^^^^^^ $y + 100
> ($d, $m, $y) = (localtime($s - 86400))[3..5];
> $time = sprintf ('%02d/%02d/%04d', $d + 1, $m + 1, $y + 1900);
^^^^^^
The day value from localtime() is not off-by-one.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 08 Sep 2004 18:51:17 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Finding date 1 day earlier than a given date!
Message-Id: <2q8rfhFsr32aU1@uni-berlin.de>
Edward wrote:
> The below script works fine for all dates (change the value in
> $time), except if the date is the 2nd of the month.
No, it does not. It does not work for any date.
> Anyone know how I can fix it??
Read the docs for the function you are using and find out which
arguments it expects. Compare that with what you actually are passing
to the function.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: 8 Sep 2004 10:12:59 -0700
From: mazzawi@gmail.com (elia Mazzawi)
Subject: geoip is garbage under load
Message-Id: <518a5d0.0409080912.67c573f@posting.google.com>
i wrote this perl handler, it uses apache, mod_perl and geoip.
The handler uses geoip to resolve the same ip with every apache
request to that handler. when geoip returns an unexpected result a
warning is logged in apache's error log.
geoip returns garbage and sometimes inconsistent results. this starts
and becomes more frequent as the load on the server increases and does
not happen under low load.
we were able to replicate the problem by hammering a server from 20
boxes using apache ab. apache was running with 50 or so processes.
am i doing something wrong, or does geoip mess up under load?
use strict;
use Geo::IP qw(GEOIP_STANDARD);
my $GC = Geo::IP->open("/opt/viper-2.0/lib/Viper/Geo/GeoIPCity-133.dat",
GEOIP_STANDARD );
my $GISP = Geo::IP->open("/opt/viper-2.0/lib/Viper/Geo/GeoIPISP-122.dat",
GEOIP_STANDARD );
my ( $RemoteAddr, $GeoRecord, $ISP, $country, $city, $state,
$postalCode, $areaCode);
my ( $lastISP, $lastcountry, $lastcity, $laststate, $lastpostalCode,
$lastareaCode) = ('Comspec Communications Inc', 'CA', 'Toronto', 'ON',
'm6b1p5' , 0);
sub handler {
#look up IP in geoIP
$RemoteAddr = '192.139.80.'. int(rand(255));
$GeoRecord = $GC->record_by_name($RemoteAddr);
$ISP = $GISP->org_by_name($RemoteAddr);
if ( $lastISP ne $ISP ){
warn "$ISP ne $lastISP\n";
}
if ($GeoRecord) {
$country = $GeoRecord->country_code;
if ( $lastcountry ne $country){
warn "$lastcountry ne $country\n";
}
$city = $GeoRecord->city;
if ($lastcity ne $city){
warn "$lastcity ne $city\n";
}
$state = $GeoRecord->region;
if ($laststate ne $state){
warn "$laststate ne $state\n";
}
$postalCode = $GeoRecord->postal_code;
if ($lastpostalCode ne $postalCode){
warn "$lastpostalCode ne $postalCode\n";
}
$areaCode = $GeoRecord->area_code;
if ($lastareaCode ne $areaCode){
warn "$lastareaCode ne $areaCode\n";
}
}
return 200;
}
1;
------------------------------
Date: 8 Sep 2004 17:01:27 GMT
From: mjtg@cus.cam.ac.uk (M.J.T. Guy)
Subject: Re: how many days ago is 2003-07-20 ?
Message-Id: <chndt7$nko$1@pegasus.csx.cam.ac.uk>
Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote:
>Do you have a simple, water-tight solution using only Time::Local?
>Note that in the presence of DST a day may have more or less than 24
>hours.
Easy - always do your calculations in GMT. Then DST issues never arise.
Mike Guy
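
A sketch of the GMT approach for the question in the subject line, using
only Time::Local as asked. days_between() is a hypothetical name, and the
end date is passed in rather than taken from the clock so that the result
is reproducible:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Whole days from one YYYY-MM-DD date to another, computed in GMT.
sub days_between {
    my ($from, $to) = @_;
    my @t = map {
        my ($y, $m, $d) = split /-/;
        timegm(0, 0, 0, $d, $m - 1, $y);    # midnight GMT on that date
    } $from, $to;
    # Every GMT day is exactly 86400 seconds, so the division is exact.
    return ($t[1] - $t[0]) / 86400;
}

print days_between('2003-07-20', '2004-09-08'), "\n";
```

For "days ago" relative to today, the second argument would be built from
(gmtime)[5,4,3], or from localtime if the local calendar date is the one
that matters.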
------------------------------
Date: Wed, 08 Sep 2004 21:48:58 +0200
From: lepi <lepi_MAKNI_ME_@fly.srk.fer.hr>
Subject: Newbie: Parsing help
Message-Id: <chnpsp$fhh$1@bagan.srce.hr>
Hi,
I know very little about regex, but I think it can help me now.
If I have string like this:
$some="I have something here. Speed: 200 km/h. Color is white. OK!!";
I want to parse this string and get two new strings
$speed="200";
$color="white";
How can I do this???
Please explain given regex...
Thanks
------------------------------
Date: Wed, 08 Sep 2004 14:05:05 -0700
From: Michael Slass <miknrene@drizzle.com>
Subject: Re: Newbie: Parsing help
Message-Id: <m31xhcjwu6.fsf@eric.rossnet.com>
lepi <lepi_MAKNI_ME_@fly.srk.fer.hr> writes:
>Hi,
>
>I know very little about regex, but I think it can help me now.
>
>If I have string like this:
>
>$some="I have something here. Speed: 200 km/h. Color is white. OK!!";
>
>I want to parse this string and get two new strings
>
>$speed="200";
>$color="white";
>
>How can I do this???
try this:
my ($speed, $color) = ($some =~ m/Speed:\D*(\d+).*?Color is\W+(\w+)/g);
>Please explain given regex...
To understand why this works (or doesn't for you), you can read all
about perl regular expressions with:
perldoc perlre
--
Mike Slass
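
Since an explanation was asked for, here is the same pattern spread out
with the /x modifier so that each piece can carry a comment. It is meant to
match the same text as the one-line version; the only changes are the
escaped space in "Color\ is", which /x would otherwise ignore, and the
dropped /g, which is not needed for a single match:

```perl
use strict;
use warnings;

my $some = "I have something here. Speed: 200 km/h. Color is white. OK!!";

my ($speed, $color) = $some =~ m{
    Speed:       # the literal text "Speed:"
    \D*          # skip any non-digit characters (here, one space)
    (\d+)        # capture a run of digits: the speed
    .*?          # skip as little as possible ...
    Color\ is    # ... up to the literal text "Color is"
    \W+          # skip non-word characters (the space)
    (\w+)        # capture a run of word characters: the color
}x;

print "speed=$speed color=$color\n";
```

In list context a successful match returns the captured groups, which is
why the two variables can be assigned directly from the match.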
------------------------------
Date: 8 Sep 2004 10:00:22 -0700
From: anodide@hotmail.com (Aaron Anodide)
Subject: parsing UTF-8 chars out of POST data
Message-Id: <2db1147f.0409080900.272f1b96@posting.google.com>
Hello,
My question first: What is the correct way to deal with % signs in the
POST data?
Here's my situation - I have a cgi script receiving POST data:
PASS=hello%C2%A3
The %C2%A3 was generated by pressing ALT+156 (british pound sign).
Legacy code I'm using calls CGI::unescape to process the %'s, so in
this case it effectively calls (i don't know why it uses eval):
eval '$password = CGI::unescape($in[$i]);';
However, when this returns, length($password) = 7.
If I set a local variable to the same string:
$password1 = "hello£"; (this time using alt-156 directly in my
editor)
Then length($password1) = 6.
Then I call an external validation program, a C++ program compiled in
UNICODE:
system( "validate", $password );
It fails, because C2 and A3 appear as unique characters in argv[1].
BUT, if I call:
system( "validate", $password1 );
Then the program works.
Thanks in advance for anyone who takes the time to think about this
for me.
Aaron Anodide
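
A sketch of why the two lengths differ, assuming the form data is UTF-8.
Unescaping the % sequences gives raw bytes, and the pound sign is two
bytes (C2 A3) in UTF-8; decoding those bytes with the core Encode module
gives a six-character string. The hand-rolled substitution stands in for
CGI::unescape and ignores the "+" for space rule. Which encoding the
external program expects in argv is not stated in the post, so this only
illustrates the byte/character difference:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw = 'hello%C2%A3';

# Turn each %XX escape into the byte it stands for.
(my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
my $byte_count = length $bytes;

# Interpret the byte string as UTF-8 to get a character string.
my $chars = decode('UTF-8', $bytes);
my $char_count = length $chars;

printf "%d bytes, %d characters, last code point U+%04X\n",
    $byte_count, $char_count, ord substr($chars, -1);
```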
------------------------------
Date: 8 Sep 2004 08:17:27 -0700
From: "Leif Wessman" <leifwessman@hotmail.com>
Subject: Re: parsing XML using a regular expression
Message-Id: <chn7q7$k53@odak26.prod.google.com>
Bernard El-Hagin wrote:
> "Leif Wessman" <leifwessman@hotmail.com> wrote:
>
> >
> > Bernard El-Hagin wrote:
> >> "Leif Wessman" <leifwessman@hotmail.com> wrote:
> >>
> >>
> >> This
> >>
> >>
> >> > I'm trying to parse some xml with a regular expression (yes, i
> >> > know that there is several XML modules that I can use).
> >>
> >>
> >> when put together with this
> >>
> >>
> >> > My problem is that I'm not that good in creating regular
> >> > expressions. [...]
> >>
> >>
> >> suggests using one of the modules you claim to know about.
> >
> > How can I then LEARN anything?
>
>
> LEARN to use the right tool for the job.
>
>
> --
> Cheers,
> Bernard
Bernard. Thanks for your comments.
Now, does anyone else have any suggestions for me? I would like to
parse a simple XML file as stated in my first posting. I would not like
to use an XML-parser.
Leif
------------------------------
Date: Wed, 08 Sep 2004 15:19:03 +0000
From: Helgi Briem <HelgiBriem_1@hotmail.com>
Subject: Re: parsing XML using a regular expression
Message-Id: <ql8uj0tbdakg3h0u0jkesm60o447m5tr7s@4ax.com>
On 8 Sep 2004 08:17:27 -0700, "Leif Wessman" <leifwessman@hotmail.com>
wrote:
>> >> > I'm trying to parse some xml with a regular expression (yes, i
>> >> > know that there is several XML modules that I can use).
>> > How can I then LEARN anything?
>> LEARN to use the right tool for the job.
>Bernard. Thanks for your comments.
>
>Now, does anyone else have any suggestions for me? I would like to
>parse a simple XML file as stated in my first posting. I would not like
>to use an XML-parser.
Go somewhere where they teach people the wrong
way to do things.
--
Helgi Briem hbriem AT simnet DOT is
Never worry about anything that you see on the news.
To get on the news it must be sufficiently rare
that your chances of being involved are negligible!
------------------------------
Date: Wed, 8 Sep 2004 11:03:43 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: parsing XML using a regular expression
Message-Id: <slrncjubav.1is.tadmc@magna.augustmail.com>
Leif Wessman <leifwessman@hotmail.com> wrote:
> I'm trying to parse some xml with a regular expression (yes, i know
> that there is several XML modules that I can use).
You have headed off the 2nd question.
The 1st question is: why do you want to do it with regular expressions
rather than with a real parser?
If you tell us the constraints that prompt your approach, that will
help us a lot for providing advice...
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 8 Sep 2004 11:05:03 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: parsing XML using a regular expression
Message-Id: <slrncjubdf.1is.tadmc@magna.augustmail.com>
Leif Wessman <leifwessman@hotmail.com> wrote:
> Bernard El-Hagin wrote:
>> "Leif Wessman" <leifwessman@hotmail.com> wrote:
>> > I'm trying to parse some xml with a regular expression
>> suggests using one of the modules you claim to know about.
> How can I then LEARN anything?
You did not say that this was merely a learning exercise.
If you don't say otherwise, folks will assume that this is
for a Real Program, and provide advice accordingly.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 08 Sep 2004 18:22:16 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: parsing XML using a regular expression
Message-Id: <2q8pp4Fsqkh0U1@uni-berlin.de>
Leif Wessman wrote:
>
> while ($xml =~
> /<item>.*?<id>(.*?)<\/id>.*?(<name>(.*?)<\/name>)?.*?<\/item>/gs) {
---------------------------------------------------^^^
Using the ? quantifier right before .* is not a good idea. Use two
regexes:
while ( $xml =~ /<item>.*?<id>(.*?)<\/id>(.*?)<\/item>/gs ) {
print "id : $1\n";
if ( $2 and $2 =~ /<name>(.+?)<\/name>/ ) {
print "name: $1\n";
}
}
As regards efficiency, reading "quite large" files into memory should
be avoided.
local $/ = '<item>';
while (<XML>) {
if ( /<id>(.+)<\/id>/ ) {
print "id: $1";
if ( /<name>(.+)<\/name>/ ) {
print ", name: $1";
}
print "\n";
}
}
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
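
A runnable variation on the record-at-a-time idea above, for
experimenting. The sample data is invented, and it differs from the code
above in two ways: the record separator is the closing </item> tag rather
than the opening one, and the matches are non-greedy. It only copes with
input this regular, which is the limitation the rest of the thread keeps
pointing out:

```perl
use strict;
use warnings;

# Hypothetical sample input; the second item has no <name>.
my $xml = <<'XML';
<items>
<item><id>1</id><name>first</name></item>
<item><id>2</id></item>
<item><id>3</id><name>third</name></item>
</items>
XML

# An in-memory filehandle stands in for the large file on disk.
open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";

my @found;
{
    local $/ = '</item>';    # each read returns one item, not one line
    while (my $chunk = <$fh>) {
        next unless $chunk =~ m{<id>(.*?)</id>}s;
        my $id = $1;
        my ($name) = $chunk =~ m{<name>(.*?)</name>}s;
        push @found, defined $name ? "id: $id, name: $name" : "id: $id";
    }
}
close $fh;

print "$_\n" for @found;
```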
------------------------------
Date: 8 Sep 2004 16:20:27 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: parsing XML using a regular expression
Message-Id: <Xns955E7D89EEF27asu1cornelledu@132.236.56.8>
"Leif Wessman" <leifwessman@hotmail.com> wrote in news:chn7q7$k53
@odak26.prod.google.com:
> Now, does anyone else have any suggestions for me? I would like to
> parse a simple XML file as stated in my first posting. I would not like
> to use an XML-parser.
Do it yourself?
Sinan.
------------------------------
Date: Wed, 8 Sep 2004 12:08:42 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: parsing XML using a regular expression
Message-Id: <slrncjuf4q.1sj.tadmc@magna.augustmail.com>
Leif Wessman <leifwessman@hotmail.com> wrote:
> I would not like
> to use an XML-parser.
Why not?
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 08 Sep 2004 21:09:23 GMT
From: Jeremy Bowers <jerf@jerf.org>
Subject: Re: parsing XML using a regular expression
Message-Id: <pan.2004.09.08.17.09.36.664754@jerf.org>
On Wed, 08 Sep 2004 07:15:46 -0700, Leif Wessman wrote:
> How can I then LEARN anything?
The only thing you'll learn by "parsing" (not really) XML with regular
expressions is either A: Why it is impossible (and even with RE
extensions that make it not really RE, extremely impractical), B: Some
really bad habits that only work on a horribly constrained subset of XML,
or, if you're *really* unlucky, C: Both.
------------------------------
Date: Wed, 08 Sep 2004 17:56:58 -0400
From: Chris Mattern <matternc@comcast.net>
Subject: Re: parsing XML using a regular expression
Message-Id: <ncmdnZ2FovG3HaLcRVn-uQ@comcast.com>
Leif Wessman wrote:
>
> Bernard El-Hagin wrote:
>> "Leif Wessman" <leifwessman@hotmail.com> wrote:
>>
>>
>> This
>>
>>
>> > I'm trying to parse some xml with a regular expression (yes, i
>> > know that there is several XML modules that I can use).
>>
>>
>> when put together with this
>>
>>
>> > My problem is that I'm not that good in creating regular
>> > expressions. [...]
>>
>>
>> suggests using one of the modules you claim to know about.
>>
>>
>> --
>> Cheers,
>> Bernard
>
> How can I then LEARN anything?
>
Then I would suggest reading one of the XML modules. You CANNOT
properly parse XML with regular expressions. You will only buy
yourself grief if you try and the primary lesson you will learn
is "don't use REs to parse XML". You might learn a little about
REs in the process, but you could learn better by using REs to
do something REs are suited to. Parsing XML is nontrivial.
Reading and understanding the XML modules is the best way to
see how it's done.
--
Christopher Mattern
"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6975
***************************************