[18070] in Perl-Users-Digest
Perl-Users Digest, Issue: 230 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Feb 7 06:10:39 2001
Date: Wed, 7 Feb 2001 03:10:15 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <981544215-v10-i230@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Wed, 7 Feb 2001 Volume: 10 Number: 230
Today's topics:
Re: Pattern Matching <miguenther@lucent.com>
Re: Pattern Matching (Bernard El-Hagin)
Re: Pattern Matching (Leonhard Pang)
Re: Perl / Java and checkbox <ron@savage.net.au>
Re: PLEASE HELP A NEWBIE <shutupsteve@aNwOdSaPnAgM.com>
Re: PLEASE HELP A NEWBIE <ashley@pcraft.com>
Re: PLEASE HELP A NEWBIE <uri@sysarch.com>
printing to a specific line and char position <sshewmaker@transy.edu>
Re: printing to a specific line and char position <miguenther@lucent.com>
Re: printing to a specific line and char position (Philip Lees)
Re: Radical readdir suggestion <ldo@geek-central.gen.new_zealand>
Re: Radical readdir suggestion <ldo@geek-central.gen.new_zealand>
Re: Radical readdir suggestion <uri@sysarch.com>
Re: Radical readdir suggestion (Martien Verbruggen)
Re: Radical readdir suggestion <iltzu@sci.invalid>
regexp for splitting up file paths? <chris.burn@bigfoot.com>
Stream? Pipe? Socket? <s997659@ee.cuhk.edu.hk>
Re: Stream? Pipe? Socket? <hafateltec@hotmail.com>
Re: Stream? Pipe? Socket? <krahnj@acm.org>
Re: Stream? Pipe? Socket? <uri@sysarch.com>
Threading in Perl <andrew_ralph@yahoo.com>
Re: XML::Parser question <bart.lateur@skynet.be>
Re: XML::Parser question (Martien Verbruggen)
Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 7 Feb 2001 09:29:49 +0100
From: "Michael Guenther" <miguenther@lucent.com>
Subject: Re: Pattern Matching
Message-Id: <95r12q$ing@nntpb.cb.lucent.com>
Hi
I have a solution, but it has some bad side effects.
You can remove all whitespace (\n, \r, \t, \f and spaces; that is, \s) from
your text and your match string, and then
match them. That works, but the way I do it destroys your original text.
You can save it in a different variable, but I think there must be a better way.
Try
####################
$text = "this was the
text he want
to
match using
as few lines of
code as
possible";
$match ="usingas few lines";
print "DEBUG org Text->$text<-GUBED\n";
$text =~s/[\s ]//g;
$match=~s/[\s ]//g;
print "DEBUG Text->$text<-GUBED\n";
print "DEBUG Match->$match<-GUBED\n";
if ($text =~/($match)/){
print "DEBUG got it->$1<-GUBED\n";
};
############
if there is a better solution ping me
Thanks
Michael
------------------------------
Date: Wed, 7 Feb 2001 09:06:04 +0000 (UTC)
From: bernard.el-hagin@lido-tech.net (Bernard El-Hagin)
Subject: Re: Pattern Matching
Message-Id: <slrn9823vq.2uq.bernard.el-hagin@gdndev25.lido-tech>
On Wed, 07 Feb 2001 04:20:32 GMT, Charles Warner
<charleswarner@bigfoot.com> wrote:
>Hello,
>
>I have a pattern matching question that I am not sure about. Let's say I
>have a block of text in a variable $text:
>
>$text = qq~This is the
>text I want
>to
>
>match using
>as few lines of
>code
>as possible.~;
>
>I want to be able to extract the text "using as few lines" from the block.
>Of course the text won't actually be formatted that way. Do you have any
>suggestions on how to do this with pattern matching?
You're probably asking how to span several lines with a regex. If that's
the case read about the 's' modifier to the m// operator.
Cheers,
Bernard
--
#requires 5.6.0
perl -le'* = =[[`JAPH`]=>[q[Just another Perl hacker,]]];print @ { @ = [$ ?] }'
------------------------------
Date: 7 Feb 2001 09:37:54 GMT
From: lp@bluewin.ch (Leonhard Pang)
Subject: Re: Pattern Matching
Message-Id: <904168E1Clpbluewinch@127.0.0.1>
charleswarner@bigfoot.com (Charles Warner) wrote in <k24g6.7018$iM6.859258
@newsread1.prod.itd.earthlink.net>:
[snip]
>I have a pattern matching question that I am not sure about. Let's say I
>have a block of text in a variable $text:
>
>$text = qq~This is the
>text I want
>to
>
>match using
>as few lines of
>code
>as possible.~;
>
return 1 if ($text =~ /using\s+as\s+few\s+lines/);
-Leonhard
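For reference, the same idea can be built from the phrase itself, so the original $text is left untouched. This is only a sketch: find_phrase is a name made up for this example, and it assumes that any run of whitespace between words should count as a match.

```perl
use strict;
use warnings;

# Hypothetical helper: find $phrase in $text, treating any run of
# whitespace (spaces, tabs, newlines) between words as equivalent.
sub find_phrase {
    my ($text, $phrase) = @_;

    # Quote each word so regex metacharacters are taken literally,
    # then require one or more whitespace characters between words.
    my $pattern = join '\s+', map { quotemeta($_) } split ' ', $phrase;

    return $text =~ /($pattern)/ ? $1 : undef;
}

my $text = "This is the\ntext I want\nto\n\nmatch using\nas few lines of\ncode\nas possible.";

my $found = find_phrase($text, 'using as few lines');
print defined $found ? "matched: [$found]\n" : "no match\n";
```

The returned string is the text as it appears in $text, line breaks included, so the caller can still collapse the whitespace afterwards if a single-line result is wanted.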
------------------------------
Date: Wed, 7 Feb 2001 22:47:30 +1100
From: "Ron Savage" <ron@savage.net.au>
Subject: Re: Perl / Java and checkbox
Message-Id: <MJ9g6.3179$sS4.114588@ozemail.com.au>
This will get you going. Of course, it's up to you how to handle the click
on the submit button...
-----><8-----
#!D:/Perl/bin/perl
use integer;
use strict;
use warnings;
use CGI;
use CGI::Carp;
# ---------------------------------------------------
my($dir_name) = 'D:/Temp';
opendir(INX, $dir_name) || die("Can't opendir($dir_name): $!");
my(@file) = grep{! /^\.\.?$/} readdir(INX);
closedir(INX);
my($q) = CGI -> new();
print $q -> header(),
$q -> start_html(),
$q -> start_form({action => $q -> url()}),
$q -> h1('Display checkbox per file name'),
# (map{$q -> checkbox({name => 'file', label => $_, checked => 0, force => 1}) . $q -> br()} @file),
(map{$q -> checkbox({name => 'file', label => $_, checked => 0}) . $q -> br()} @file),
$q -> submit(),
$q -> end_form(),
$q -> end_html();
-----><8-----
--
Cheers
Ron Savage
ron@savage.net.au
http://savage.net.au/index.html
[snip]
> List some filenames within a directory (the number of files is undefined)
> Put a "checkbox" next to each of these listed filenames
> Allow a user to tick as many of these boxes as required
> Delete all these files that have ticks in the checkboxes.
[snip]
------------------------------
Date: Wed, 07 Feb 2001 06:20:44 GMT
From: "Stephen Deken" <shutupsteve@aNwOdSaPnAgM.com>
Subject: Re: PLEASE HELP A NEWBIE
Message-Id: <%O5g6.722$gb1.75759@news4.aus1.giganews.com>
> So you've immediately contradicted your subject. Not good.
At least it netted me a reply. That's more than I can say for the previous
subject, which is "POSIX::strftime() error under RH7, perl 5.6.0, POSIX
1.03". So much for being descriptive.
> I think you need to show some code. The following works
> fine for me with 5.6:
I did, with the last post, but I will again:
perl -MPOSIX -e 'print strftime( "%B", 0, 0, 0, 0, 0, 0 ) . "\n";'
...which, on three unrelated systems running perl 5.6.0, returns 'December'
rather than 'January'. On those same systems, an installation of perl
5.005_03 returns 'January'.
> Are you sure you're passing the right args to strftime()?
Positive. strftime( '%b', localtime( time ) ) returns January, because the
date on the server is February. The (extensive) script that I've built
works flawlessly under 5.005_03 on the broken systems.
--sjd;
------------------------------
Date: Wed, 07 Feb 2001 01:58:58 -0700
From: "Ashley M. Kirchner" <ashley@pcraft.com>
Subject: Re: PLEASE HELP A NEWBIE
Message-Id: <3A810E52.B186DCE6@pNcOrSaPfAtM.com>
I'm going to chime in real quick here:
System: Redhat 7.0, Perl 5.6.0:
> use POSIX;
> $d = strftime('%b %B', localtime);
> print "$d\n";
-> Feb February
However:
perl -MPOSIX -e 'print strftime( "%B", 0, 0, 0, 0, 0, 0 ) . "\n";'
-> December
AMK4
--
H | Hi, I'm currently out of my mind. Please leave a message. BEEEEP!
|____________________________________________________________________
------------------------------
Date: Wed, 07 Feb 2001 10:14:12 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: PLEASE HELP A NEWBIE
Message-Id: <x78znipzpn.fsf@home.sysarch.com>
perl -MPOSIX -e 'print strftime( "%B", 0, 0, 0, 0, 0, 0 ) . "\n";'
December
perl -v
This is perl, v5.6.0 built for sun4-solaris
perl -MPOSIX -e 'print strftime( "%B", 0, 0, 0, 0, 0, 0 ) . "\n";'
January
perl -v
This is perl, version 5.005_03 built for sun4-solaris
i would call that a bug.
uri
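One possible reading, offered as an assumption rather than a diagnosis of the systems above: the fourth argument to strftime() is the day of the month, which counts from 1, and the POSIX module documentation for later perls says the arguments are made consistent as though by mktime() before the system strftime() is called. Under that reading, day 0 of January is the last day of the previous December. The sketch below passes a day of 1 instead. The year 2000 and the C locale are choices made for this example only.

```perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

# Use the C locale so month names come out in English.
setlocale(LC_TIME, 'C');

# Argument order: sec, min, hour, mday, mon, year.
# mday counts from 1, mon counts from 0, year is years since 1900.
my $january = strftime('%B', 0, 0, 0, 1, 0, 100);    # 1 January 2000
print "$january\n";

# The same call with mday = 0. On perls that normalise the arguments
# this is expected to name the previous month; no output is claimed here.
my $day_zero = strftime('%B', 0, 0, 0, 0, 0, 100);
print "$day_zero\n";
```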
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: Wed, 7 Feb 2001 00:14:10 -0500
From: "Valen1260" <sshewmaker@transy.edu>
Subject: printing to a specific line and char position
Message-Id: <95qlhb$h9n$1@news.uky.edu>
I remember, long ago in BASIC, that I could specify to what line and
character position I wanted to print a statement? Does Perl have anything
like this? I want a counter to run without scrolling the screen madly out
of view. In other words, I want the new count to overwrite the last.
Thanks in advance.
------------------------------
Date: Wed, 7 Feb 2001 09:38:21 +0100
From: "Michael Guenther" <miguenther@lucent.com>
Subject: Re: printing to a specific line and char position
Message-Id: <95r1iq$io2@nntpb.cb.lucent.com>
Hi
Maybe you can do it with the Curses module from CPAN.
Michael
Valen1260 <sshewmaker@transy.edu> wrote in message
news:95qlhb$h9n$1@news.uky.edu...
> I remember, long ago in BASIC, that I could specify to what line and
> character position I wanted to print a statement? Does Perl have anything
> like this? I want a counter to run without scrolling the screen madly out
> of view. In other words, I want the new count to overwrite the last.
>
> Thanks in advance.
>
>
------------------------------
Date: Wed, 07 Feb 2001 09:03:45 GMT
From: pjlees@ics.forthcomingevents.gr (Philip Lees)
Subject: Re: printing to a specific line and char position
Message-Id: <3a810d7b.68316533@news.grnet.gr>
On Wed, 7 Feb 2001 00:14:10 -0500, "Valen1260" <sshewmaker@transy.edu>
wrote:
>I remember, long ago in BASIC, that I could specify to what line and
>character position I wanted to print a statement? Does Perl have anything
>like this? I want a counter to run without scrolling the screen madly out
>of view. In other words, I want the new count to overwrite the last.
print "Processing line $line_no of $total_lines\r";
works for me. The \r does a carriage return without a line feed.
(I know this doesn't answer your more general question. Maybe somebody
else will do that.)
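A slightly fuller sketch of both approaches follows. The "\r" counter is the technique shown above. The cursor-positioning helper is an addition for this example: the name at is made up, and it assumes a terminal that understands ANSI/VT100 escape sequences, which not every console does.

```perl
use strict;
use warnings;

$| = 1;    # unbuffer STDOUT so each update shows immediately

# Hypothetical helper: build a string that moves the cursor to
# ($row, $col), both counted from 1, and then writes $text there.
# Assumes an ANSI/VT100-compatible terminal.
sub at {
    my ($row, $col, $text) = @_;
    return sprintf "\e[%d;%dH%s", $row, $col, $text;
}

# Overwrite the same line with a running counter using "\r".
my $total = 5;
for my $n (1 .. $total) {
    print "\rProcessing line $n of $total";
}
print "\n";

# Write a label at line 1, column 10 of the screen.
print at(1, 10, 'count: 42'), "\n";
```

If a later message is shorter than the one it overwrites, the tail of the old text stays on screen, so padding the output to a fixed width with printf is usually worth doing.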
Phil
--
Philip Lees
ICS-FORTH, Heraklion, Crete, Greece
Ignore coming events if you wish to send me e-mail
'The aim of high technology should be to simplify, not complicate' - Hans Christian von Baeyer
------------------------------
Date: Wed, 07 Feb 2001 21:19:04 +1300
From: Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand>
Subject: Re: Radical readdir suggestion
Message-Id: <ldo-075172.21190407022001@news.wave.co.nz>
In article <3a7e6526.4cce$d8@news.op.net>, mjd@plover.com (Mark Jason
Dominus) wrote:
>In article <ldo-BD3CE2.16005405022001@news.wave.co.nz>,
>Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
>>What is the use of readdir returning the "." and ".." entries?
>
>The Principle of Least Surprise.
In what way?
>>Has anybody ever written a Perl script that depended on these entries
>>being returned in order to work?
>
>Yes.
Such as...?
------------------------------
Date: Wed, 07 Feb 2001 21:21:33 +1300
From: Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand>
Subject: Re: Radical readdir suggestion
Message-Id: <ldo-E8934A.21213307022001@news.wave.co.nz>
In article <slrn97s6gk.64e.mgjv@verbruggen.comdyn.com.au>,
mgjv@tradingpost.com.au wrote:
>On Mon, 05 Feb 2001 16:00:54 +1300,
> Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
>>
>> What is the use of readdir returning the "." and ".." entries?
>
>Breaking the way readdir works, and
>making exceptions for 'special' entries is IMO a bad thing. . and ..
>are part of the way the file system works. They should be there.
There is a subtle distinction between how the file system is
_implemented_ and how it _works_. On UNIX, there is no OS call to return
the pathname of the current working directory (strange, but true).
That's why you need those "." and ".." entries in _every_
directory--they are an essential part of the mechanism for navigating
your way around the filesystem tree. They are not part of the
directory's _contents_ per se.
(Strange that you should choose to burden every directory on every UNIX
filesystem with the extra space needed to hold these special entries,
rather than just fix the oversight in the original design of the kernel,
but there you go...)
>> Note that readdir already works this way on platforms where directories
>> do not have such entries (eg MacOS).
>
>But readdir presumably still returns _all_ entries in the directory,
>or equivalent, right?
Correct. On MacOS, "." and ".." could indeed be valid names of files or
folders, and readdir should return them if present. They are not
"reserved" names in the way they are on UNIX.
However, now we need to draw a distinction between directory _entries_
and directory _contents_. Namely, in UNIX, the "." entry points at the
directory itself, while ".." points at its parent--thus, neither of
those is actually part of the contents of the directory itself. I
believe that _all_ real-world uses of readdir are concerned with
returning the contents of the directory, not with its entries. Thus, on
platforms where those entries are not part of the directory contents
(UNIX, DOS/Windows), they should not be returned.
Remember, the whole point of a cross-platform, higher-level scripting
language like Perl is precisely so you can write scripts that abstract
away from peculiarities of the hardware or OS implementation. Yes, you
can write a routine that skips "." and ".." on UNIX and DOS/Windows, but
processes them on MacOS (and I guess on VMS, where "." and ".." would in
fact be the same file :)). But Perl is supposed to save you work, not
add to it, right?
------------------------------
Date: Wed, 07 Feb 2001 10:20:48 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Radical readdir suggestion
Message-Id: <x766impzen.fsf@home.sysarch.com>
>>>>> "LD" == Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> writes:
LD> There is a subtle distinction between how the file system is
LD> _implemented_ and how it _works_. On UNIX, there is no OS call to return
LD> the pathname of the current working directory (strange, but true).
LD> That's why you need those "." and ".." entries in _every_
LD> directory--they are an essential part of the mechanism for navigating
LD> your way around the filesystem tree. They are not part of the
LD> directory's _contents_ per se.
they are normal directory entries and are hard links to the current and
parent dirs.
LD> (Strange that you should choose to burden every directory on every UNIX
LD> filesystem with the extra space needed to hold these special entries,
LD> rather than just fix the oversight in the original design of the kernel,
LD> but there you go...)
no, you are mistaken. the whole concept of a unix filesystem was to
separate the file name from its location. so when you are in some random
dir how could you find its parent dir? all you have in the process is an
inode of the dir you are in. you use the .. and . dir entries to
navigate.
LD> However, now we need to draw a distinction between directory _entries_
LD> and directory _contents_. Namely, in UNIX, the "." entry points at the
LD> directory itself, while ".." points at its parent--thus, neither of
LD> those is actually part of the contents of the directory itself. I
LD> believe that _all_ real-world uses of readdir are concerned with
LD> returning the contents of the directory, not with its entries. Thus, on
LD> platforms where those entries are not part of the directory contents
LD> (UNIX, DOS/Windows), they should not be returned.
wrong. . and .. are part of the directory contents. readdir is not meant
to have any sort of filtering. it just returns all of the entries in a
dir.
LD> Remember, the whole point of a cross-platform, higher-level scripting
LD> language like Perl is precisely so you can write scripts that abstract
LD> away from peculiarities of the hardware or OS implementation. Yes, you
LD> can write a routine that skips "." and ".." on UNIX and DOS/Windows, but
LD> processes them on MacOS (and I guess on VMS, where "." and ".." would in
LD> fact be the same file :)). But Perl is supposed to save you work, not
LD> add to it, right?
no, you write a simple wrapper to handle that. and never touch it
again. readdir is a low level call. deal with it.
uri
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: Wed, 7 Feb 2001 21:45:13 +1100
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: Radical readdir suggestion
Message-Id: <slrn9829pp.vht.mgjv@martien.heliotrope.home>
On Wed, 07 Feb 2001 21:21:33 +1300,
Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
> In article <slrn97s6gk.64e.mgjv@verbruggen.comdyn.com.au>,
> mgjv@tradingpost.com.au wrote:
>
>>On Mon, 05 Feb 2001 16:00:54 +1300,
>> Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
>>>
>>> What is the use of readdir returning the "." and ".." entries?
>>
>>Breaking the way readdir works, and
>>making exceptions for 'special' entries is IMO a bad thing. . and ..
>>are part of the way the file system works. They should be there.
>
> There is a subtle distinction between how the file system is
> _implemented_ and how it _works_. On UNIX, there is no OS call to return
nitpicking :)
>>But readdir presumably still returns _all_ entries in the directory,
>>or equivalent, right?
>
> Correct. On MacOS, "." and ".." could indeed be valid names of files or
> folders, and readdir should return them if present. They are not
> "reserved" names in the way they are on UNIX.
>
> However, now we need to draw a distinction between directory _entries_
> and directory _contents_. Namely, in UNIX, the "." entry points at the
> directory itself, while ".." points at its parent--thus, neither of
> those is actually part of the contents of the directory itself. I
The file foo is also not part of the content of a directory. The name
foo points to a file on the file system somewhere. The names . and ..
point to files on the file system somewhere. They are in that respect
not special at all. They get automatically created by a mkdir, and they
always point to directories, but that's pretty much where their
specialness, as far as how they are part of a directory, ends.
> believe that _all_ real-world uses of readdir are concerned with
> returning the contents of the directory, not with its entries. Thus, on
> platforms where those entries are not part of the directory contents
> (UNIX, DOS/Windows), they should not be returned.
In Unix, they are as much a part of the directory as any other name. I
don't know about DOS/Windows, but I suspect it's very much the same
there.
> Remember, the whole point of a cross-platform, higher-level scripting
> language like Perl is precisely so you can write scripts that abstract
> away from peculiarities of the hardware or OS implementation. Yes, you
> can write a routine that skips "." and ".." on UNIX and DOS/Windows, but
> processes them on MacOS (and I guess on VMS, where "." and ".." would in
> fact be the same file :)). But Perl is supposed to save you work, not
> add to it, right?
But, it would be too much of a surprise to people who actually know how
the file system works, and who expect to see those entries there. On
Unix, . and .. are part of the directory. They always have been, and
they always will. They should be there.
Martien
--
Martien Verbruggen |
Interactive Media Division | You can't have everything, where
Commercial Dynamics Pty. Ltd. | would you put it?
NSW, Australia |
------------------------------
Date: 7 Feb 2001 10:54:29 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: Radical readdir suggestion
Message-Id: <981542616.20645@itz.pp.sci.fi>
In article <x766impzen.fsf@home.sysarch.com>, Uri Guttman wrote:
>
>no, you write a simple wrapper to handle that. and never touch it
>again.
..until someone ports perl to a new platform. Or just uses your
script on some platform you weren't familiar with.
> readdir is a low level call. deal with it.
Then it should've been called sysreaddir(). And we ought to have a
readdir() that makes the common thing easy by returning only the names
that point downwards in the directory tree.
The quickest solution would be to write a wrapper module and put it on
the CPAN. Maybe one already exists. But then we'd also have to make
perlfunc read: "Warning: Never use raw readdir() except for platform
specific code. Use Some::Module instead." And include that module in
the standard distribution.
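A minimal sketch of the kind of wrapper discussed in this thread, using File::Spec's no_upwards() so that the platform's own spelling of the "self" and "parent" entries is filtered out. The name list_dir is made up for this example, and this is not a claim about any existing CPAN module.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical wrapper: return the names in $dir, sorted, minus the
# entries that refer to the directory itself or to its parent.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't opendir($dir): $!";
    my @names = File::Spec->no_upwards(readdir $dh);
    closedir $dh;
    @names = sort @names;
    return @names;
}

print "$_\n" for list_dir('.');
```

Leaving the decision to File::Spec avoids hard-coding the regex /^\.\.?$/ in every script, which is the portability concern raised above.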
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
"The slushpile was, I gather, of record-breaking -- indeed, even floorboard-
breaking -- proportions." -- Charlie Stross in rec.arts.sf.composition
Please ignore Godzilla and its pseudonyms - do not feed the troll.
------------------------------
Date: Wed, 7 Feb 2001 11:00:28 -0000
From: "Chris Burn" <chris.burn@bigfoot.com>
Subject: regexp for splitting up file paths?
Message-Id: <95r9qh$1b9sj$1@news.wam.net>
Hi, does anyone know of a regexp that will break up a file path into the
following components?
so for filepath = /home/chris/test.0001.jpg
path = /home/chris/
filehead = test.
filenumber = 0001
fileext = .jpg
thanks for the help
chris burn
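One way to approach this, as a sketch only: a single regex with four captures. The helper name split_numbered_path is made up for this example, and it assumes forward-slash separators and a name of the form head.number.ext, as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split a path such as /home/chris/test.0001.jpg
# into directory, head, number and extension.
# Returns an empty list if the name does not have that shape.
sub split_numbered_path {
    my ($filepath) = @_;
    my ($path, $head, $number, $ext) = $filepath =~ m{
        ^ ( .* / )?        # directory part, up to the last slash (optional)
          ( [^/]*? \. )    # file head, including its trailing dot
          ( \d+ )          # file number
          ( \. [^./]+ )    # extension, including its leading dot
        \z
    }x or return;
    return (defined $path ? $path : '', $head, $number, $ext);
}

my @parts = split_numbered_path('/home/chris/test.0001.jpg');
print join(' | ', @parts), "\n";
```

For paths that must work across platforms, File::Basename's fileparse() could handle the directory part, leaving the regex to deal only with the file name.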
------------------------------
Date: Wed, 7 Feb 2001 13:04:38 +0800
From: Immortal Love <s997659@ee.cuhk.edu.hk>
Subject: Stream? Pipe? Socket?
Message-Id: <Pine.GSO.4.05.10102071303360.23605-100000@sparc53.ee.cuhk.edu.hk>
I would like to ask what they are and what is their difference? thx.
------------------------------
Date: Wed, 7 Feb 2001 16:06:38 +1000
From: "Mike McPherson" <hafateltec@hotmail.com>
Subject: Re: Stream? Pipe? Socket?
Message-Id: <95qon2$uit$1@brokaw.wa.com>
"Immortal Love" <s997659@ee.cuhk.edu.hk> wrote in message
news:Pine.GSO.4.05.10102071303360.23605-100000@sparc53.ee.cuhk.edu.hk...
> I would like to ask what they are and what is their difference? thx.
Try perldoc perlipc for starters.
:)
------------------------------
Date: Wed, 07 Feb 2001 06:29:11 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: Stream? Pipe? Socket?
Message-Id: <3A80EC3E.DF656066@acm.org>
Immortal Love wrote:
>
> I would like to ask what they are and what is their difference? thx.
Get the book "Advanced Programming in the UNIX Environment" by W.
Richard Stevens and all will be explained.
John
------------------------------
Date: Wed, 07 Feb 2001 09:35:23 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Stream? Pipe? Socket?
Message-Id: <x7d7cuq1ic.fsf@home.sysarch.com>
IL> I would like to ask what they are and what is their difference? thx.
and your perl question is?
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: Wed, 7 Feb 2001 10:10:41 -0000
From: "Andrew Ralph" <andrew_ralph@yahoo.com>
Subject: Threading in Perl
Message-Id: <Sa9g6.67$WE.856@news.uk.colt.net>
Hi,
I "inherited" some code that uses unix forking and I now need to port this
to windows.
Is there any threading support for perl in windows ? I seem to remember
looking a while ago and I couldn't really seem to find anything. Does
anyone know of something that I could use ?
Thanks,
Andy.
------------------------------
Date: Wed, 07 Feb 2001 09:16:21 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: XML::Parser question
Message-Id: <b7328tg1s1h6ci33a80lk1v1h944iu593k@4ax.com>
kumar22@my-deja.com wrote:
>I'm wondering if anyone who's used it can tell me how to use the
>option "style=>'tree'" to get some sort of data structure that can be
>easily worked with.
If you ask me, (and I think you do ;-), style Tree is crap. IMO. It
returns TWO items for each element: the tag name, and the attributes
plus the contents. Just use Data::Dumper, and dump the tree that
XML::Parser returns. So you'd have to process two elements at a time,
which isn't exactly handy. I think that it would be a lot handier, if it
returned just ONE item per element.
So I wrote a variation that I find a lot handier, and which you might
like, too. For a lack of a better name, I called it CompactTree (because
one item per element is compacter than two).
Here's the implementation. Put it in a file "CompactTree.pm" in the
directory XML/Parser, next to "Expat.pm" and do "use
XML::Parser::CompactTree" in your script, apart from the (style =>
'XML::Parser::CompactTree') attribute.
package XML::Parser::CompactTree;

sub Init {
    my $expat = shift;
    $expat->{Stack} = [];
    $expat->{TOS} = $expat->{Tree} = [];
}

sub Start {
    my $expat = shift;
    my $tag = shift;
    my $newlist = [ $tag => { @_ } ];
    push @{ $expat->{Stack} }, $expat->{TOS};
    push @{ $expat->{TOS} }, $newlist;
    $expat->{TOS} = $newlist;
}

sub End {
    my $expat = shift;
    my $tag = shift;
    $expat->{TOS} = pop @{ $expat->{Stack} };
}

sub Char {
    my $expat = shift;
    my $text = shift;
    $text =~ tr/\012/\n/; # Mac
    my $clist = $expat->{TOS};
    unless (ref $clist->[-1]) {
        $clist->[-1] .= $text;
    } else {
        push @$clist, $text;
    }
}

sub Final {
    my $expat = shift;
    delete $expat->{Stack};
    delete $expat->{TOS};
    return delete $expat->{Tree};
}

1;
__END__

=head1 Name

XML::Parser::CompactTree

=head1 Description

This module can be used as an alternative, similar to XML::Parser::Tree,
but this version returns a tree with one scalar per XML element,
instead of two.

Each of those scalars is either a simple scalar, for plain text, or an
array ref, for a tagged XML element. The format of such an element is:

    [ $tag, { %attributes }, @contents ]

=head1 Example

    use XML::Parser;
    require XML::Parser::CompactTree;
    my $parser = new XML::Parser(Style => 'XML::Parser::CompactTree');
    # assuming $xml is properly set:
    my $tree = $parser->parse($xml);
    use Data::Dumper;
    print Dumper($tree);

=head1 Author

Bart Lateur, <bart.lateur@skynet.be>, (P) december 1999

=cut
--
Bart.
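To show why one item per element is convenient, here is a small sketch that walks a node in the [ $tag, { %attributes }, @contents ] format described above. The tree is built by hand so the example runs without XML::Parser, and node_text is a name made up for this example. Dumping the real parser output with Data::Dumper first is a good way to check how the top level is wrapped.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the plain text found anywhere inside a
# node of the form [ $tag, { %attributes }, @contents ].
sub node_text {
    my ($node) = @_;
    return $node unless ref $node;           # plain text node
    my ($tag, $attr, @contents) = @$node;    # a copy, so the tree is not modified
    return join '', map { node_text($_) } @contents;
}

# A hand-built node in the documented format, standing in for parser output.
my $bar = [ bar => { num => 1 }, 'Some ', [ b => {}, 'bold' ], ' text' ];

print node_text($bar), "\n";
```

Each child is a single scalar, so a plain map over @contents is enough. With the standard Tree style the same loop would have to consume the list two items at a time.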
------------------------------
Date: Wed, 7 Feb 2001 21:35:11 +1100
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: XML::Parser question
Message-Id: <slrn98296u.vht.mgjv@martien.heliotrope.home>
On Wed, 07 Feb 2001 01:41:58 GMT,
kumar22@my-deja.com <kumar22@my-deja.com> wrote:
> Martien asked:
>> Did you read the section of the manual page for XML::Parser with the
>> title 'Tree'?
>
> I did, actually. This section says:
>
> "Parse will return a parse tree for the document. Each node in the tree
> takes the form of a tag, content pair. Text nodes are represented with
> a pseudo-tag of ``0'' and the string that is their content. For
> elements, the content is an array reference. The first item in the
> array is a (possibly empty) hash reference containing attributes. The
> remainder of the array is a sequence of tag-content pairs representing
> the content of the element."
And it continues with an example structure.
> OK, cool. Now, what's a good way to dereference everything, and wind
> up with an array that contains other arrays and hashes and scalars
> rather than references to them?
Huh? In Perl you can't have hashes or arrays contained in other arrays.
Complex data structures are always built up with references to hashes
and arrays. That's how it works. The perllol and perldsc documentation
talks quite extensively about how to build and use these structures. The
perlref documentation talks more generally about references and
dereferencing.
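As a short illustration of that point, the sketch below builds an array whose elements are references to a hash and to another array, then reaches through them. The variable names are made up for this example.

```perl
use strict;
use warnings;

# An array cannot hold another array directly, only a reference to one.
my @inner = (1, 2, 3);
my %attr  = (num => 1);
my @outer = ('bar', \%attr, \@inner);

# Dereferencing gets back at the data behind each reference.
my $num    = $outer[1]{num};     # same as $outer[1]->{num}
my $second = $outer[2][1];       # same as $outer[2]->[1]
my @copy   = @{ $outer[2] };     # a flat copy of the inner array

print "num=$num second=$second copy=@copy\n";
```

The copy in @copy is independent of @inner, while changes made through $outer[2] affect @inner itself.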
Here's a little example that recreates the XML from source, more or
less.
#!/usr/local/bin/perl -w
use strict;
use XML::Parser;

my $parser = XML::Parser->new(Style => 'Tree');
my $tree = $parser->parse(\*DATA);
print_node(@$tree);

sub print_node
{
    my ($name, $content) = @_;
    my $args = shift @$content;
    print "<$name";
    print qq/ $_="$args->{$_}"/ for keys %$args;
    print ">";
    while (@$content)
    {
        my $child_name = shift @$content;
        my $child_content = shift @$content;
        $child_name eq "0" and print $child_content
            or print_node($child_name, $child_content);
    }
    print "</$name>";
}
__DATA__
<foo>
<bar num="1">Some text</bar>
<bar num="2" bagarph = "boo"/>
<bar num="3"><baz>Some text <in>in</in> baz</baz></bar>
</foo>
You'll have to consult the perlref, perllol and perldata documentation
if you don't understand the dereferences in there.
Martien
--
Martien Verbruggen |
Interactive Media Division | In a world without fences, who needs
Commercial Dynamics Pty. Ltd. | Gates?
NSW, Australia |
------------------------------
Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 230
**************************************