[28842] in Perl-Users-Digest
Perl-Users Digest, Issue: 86 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jan 29 18:06:26 2007
Date: Mon, 29 Jan 2007 15:05:11 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 29 Jan 2007 Volume: 11 Number: 86
Today's topics:
Downloading lots and lots and lots of files <coolneo@yahoo.com>
Re: Downloading lots and lots and lots of files <yankeeinexile@gmail.com>
Re: Downloading lots and lots and lots of files <purlgurl@purlgurl.net>
Re: Downloading lots and lots and lots of files <coolneo@yahoo.com>
Re: Downloading lots and lots and lots of files <abigail@abigail.be>
Re: Downloading lots and lots and lots of files <Peter@PSDT.com>
Re: Downloading lots and lots and lots of files <tzz@lifelogs.com>
Re: Downloading lots and lots and lots of files xhoster@gmail.com
Re: Downloading lots and lots and lots of files <tzz@lifelogs.com>
Re: Downloading lots and lots and lots of files <greg.ferguson@icrossing.com>
Re: Downloading lots and lots and lots of files <coolneo@yahoo.com>
Re: Downloading lots and lots and lots of files <rvtol+news@isolution.nl>
Re: Downloading lots and lots and lots of files <tzz@lifelogs.com>
Re: Downloading lots and lots and lots of files <bik.mido@tiscalinet.it>
French Accents appear incorrectly... nigel@bouteyres.com
Re: French Accents appear incorrectly... <greg.ferguson@icrossing.com>
html tags and perl <ysmay13@_NO_SPAM_poczta.fm>
Re: html tags and perl <abigail@abigail.be>
Re: html tags and perl <bik.mido@tiscalinet.it>
Re: html tags and perl <ysmay13@_NO_SPAM_poczta.fm>
Re: html tags and perl <ysmay13@_NO_SPAM_poczta.fm>
Re: html tags and perl krakle@visto.com
Re: html tags and perl <ysmay13@_NO_SPAM_poczta.fm>
Re: HTML::TokeParser (NOSPAM)
Re: socket and fork <m@remove.this.part.rtij.nl>
Re: Style questions <cbigam@somewhereelse.nucleus.com>
Re: Style questions <bik.mido@tiscalinet.it>
subpattern reference using vaiable subpattern index <iler.ml@gmail.com>
Re: subpattern reference using vaiable subpattern index <mritty@gmail.com>
Tar on Windows XP <david@cs.cf.ac.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 29 Jan 2007 06:44:02 -0800
From: "coolneo" <coolneo@yahoo.com>
Subject: Downloading lots and lots and lots of files
Message-Id: <1170081842.925051.117310@m58g2000cwm.googlegroups.com>
First, what I am doing is legit... I'm NOT trying to grab someone
else's content. I work for a non-profit organization and we have
something going on with Google where they are providing digitized
versions of our material. They (Google) provided some information on
how to write a (shell) script to download the digitized versions using
wget.
There are about 50,000 items, ranging in size from 15MB to 600MB. My
script downloads them fine, but it would be much faster if I could
multi-thread(?) it. I'm running wget using the sys command on a
Windows box (I know, I know, but the whole place is Windows so I don't
have much of a choice).
Am I on the right track? Or should I be doing this differently?
Thanks!
J
------------------------------
Date: 29 Jan 2007 09:00:32 -0600
From: Lawrence Statton XE2/N1GAK <yankeeinexile@gmail.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <87ps8x27zj.fsf@gmail.com>
"coolneo" <coolneo@yahoo.com> writes:
>
> There are about 50,000 items, raning in size from 15MB-600MB. My
> script downloads them fine, but it would be much faster if i could
> multi-thread(?) it.
Moving 5 Terabytes of data is going to take a long, long time no
matter how many threads you throw at the job. If you had 50,000 files
of a few kilobytes each, then you *might* see some improvement because
of the overhead of setting up and tearing down connections, but with
larger files you're most likely network bound.
>
> Am I on the right track? Or should I be doing this differently?
>
Never underestimate the bandwidth of a station wagon filled with
magtape.
Or, updating for the 21st century: An SUV with a box of DVDs.
--
Lawrence Statton - lawrenabae@abaluon.abaom s/aba/c/g
Computer software consists of only two components: ones and
zeros, in roughly equal proportions. All that is required is to
place them into the correct order.
------------------------------
Date: Mon, 29 Jan 2007 07:04:52 -0800
From: Purl Gurl <purlgurl@purlgurl.net>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <45BE0D14.5030607@purlgurl.net>
coolneo wrote:
> There are about 50,000 items, raning in size from 15MB-600MB. My
> script downloads them fine, but it would be much faster if i could
> multi-thread(?) it.
You indicate you have already downloaded those files.
Why do you want to download those files again?
Purl Gurl
------------------------------
Date: 29 Jan 2007 07:25:04 -0800
From: "coolneo" <coolneo@yahoo.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <1170084302.265948.317600@j27g2000cwj.googlegroups.com>
On Jan 29, 10:04 am, Purl Gurl <purlg...@purlgurl.net> wrote:
> coolneo wrote:
> > There are about 50,000 items, raning in size from 15MB-600MB. My
> > script downloads them fine, but it would be much faster if i could
> > multi-thread(?) it.
> You indicate you have already downloaded those files.
>
> Why do you want to download those files again?
>
> Purl Gurl
I managed to download about 21,000 of the 50,000 items over the course
of some time. Initially, Google was processing these items at a slow
rate but lately they have picked it up.
Bandwidth is indeed a concern, and I understand downloading 5TB will
take a long long time, but I think it would be a little shorter if I
could spawn off 4 downloads at a time, or even 2, during our off
business hours and the weekend (I get . The average file size is
125MB. We have a 200mb pipe, so it's not entirely unreasonable (is
it?).
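
As a rough check on whether this is reasonable, the arithmetic can be sketched in a few lines of Perl. This is only an estimate under stated assumptions: "200mb pipe" is read as 200 megabits per second, a megabyte is taken as 10^6 bytes, the link is assumed to be fully available to the download, and the sub name transfer_hours is made up for this illustration.

```perl
use strict;
use warnings;

# Hours needed to move $items files of $avg_mb megabytes each over a link
# of $mbit_per_s megabits per second, assuming the link is saturated the
# whole time (an optimistic assumption).
sub transfer_hours {
    my ($items, $avg_mb, $mbit_per_s) = @_;
    my $total_megabits = $items * $avg_mb * 8;   # 8 bits per byte
    return $total_megabits / $mbit_per_s / 3600; # seconds -> hours
}

printf "all 50,000 items: %.1f hours\n", transfer_hours(50_000, 125, 200);
printf "remaining 29,000: %.1f hours\n", transfer_hours(29_000, 125, 200);
```

In practice the server, the disks, and other traffic on the link will lower the achievable rate, so the result is best treated as a lower bound on the time required.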
------------------------------
Date: 29 Jan 2007 15:26:01 GMT
From: Abigail <abigail@abigail.be>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <slrners4g2.tcf.abigail@alexandra.abigail.be>
coolneo (coolneo@yahoo.com) wrote on MMMMDCCCXCIX September MCMXCIII in
<URL:news:1170081842.925051.117310@m58g2000cwm.googlegroups.com>:
== First, what I am doing is legit... I'm NOT trying to grab someone
== elses content. I work for a non-profit organization and we have
== something going on with Google where they are providing digitized
== versions of our material. They (Google) provided some information on
== howto write a script (shell) to download the digitized version using
== wget.
==
== There are about 50,000 items, raning in size from 15MB-600MB. My
== script downloads them fine, but it would be much faster if i could
== multi-thread(?) it. I'm running the wget using the sys command on a
== windows box (i know, i know, but the whole place is windows so I don't
== have much of a choice).
==
== Am I on the right track? Or should I be doing this differently?
Before you do anything, first check with Google whether they allow multiple
connections, and if they do, how many connections you may start.
It won't do you much good to start 100 downloads in parallel if Google
holds up 95 of them.
Of course, it's quite likely that the network is the bottleneck.
Starting up many simultaneous connections isn't going to help in
that case.
Finally, I wouldn't use threads. I'd either fork() or use a select()
loop, depending on the details of the work that needs to be done.
But then, I'm a Unix person.
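
A minimal sketch of the fork() approach, assuming a system where fork() behaves as on Unix (on Windows, Perl emulates it with threads, so behaviour may differ). The sub name run_parallel and its interface are invented for this illustration; the job is any code reference that returns true on success.

```perl
use strict;
use warnings;

# Run $job->($item) for every item in @$items, keeping at most $max
# child processes alive at once. Returns the number of failed jobs.
sub run_parallel {
    my ($max, $items, $job) = @_;
    die "need at least one worker" unless $max >= 1;
    my %running;      # pid => item currently being processed
    my $failed = 0;
    for my $item (@$items) {
        # All slots busy: wait for any child to finish first.
        if (keys %running >= $max) {
            my $pid = wait();
            $failed++ if $? != 0;
            delete $running{$pid};
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do the work and report success through the exit status.
            exit($job->($item) ? 0 : 1);
        }
        $running{$pid} = $item;
    }
    # Reap the children that are still running.
    while (keys %running) {
        my $pid = wait();
        last if $pid == -1;
        $failed++ if $? != 0;
        delete $running{$pid};
    }
    return $failed;
}
```

With a list of URLs in @urls, something like run_parallel(4, \@urls, sub { system('wget', '-c', $_[0]) == 0 }) would keep four wget processes going at a time.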
Abigail
--
A perl rose: perl -e '@}-`-,-`-%-'
------------------------------
Date: Mon, 29 Jan 2007 15:42:10 GMT
From: Peter Scott <Peter@PSDT.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <pan.2007.01.29.15.42.08.850665@PSDT.com>
On Mon, 29 Jan 2007 06:44:02 -0800, coolneo wrote:
> First, what I am doing is legit... I'm NOT trying to grab someone
> elses content. I work for a non-profit organization and we have
> something going on with Google where they are providing digitized
> versions of our material. They (Google) provided some information on
> howto write a script (shell) to download the digitized version using
> wget.
>
> There are about 50,000 items, ranging in size from 15MB-600MB. My
> script downloads them fine, but it would be much faster if i could
> multi-thread(?) it. I'm running the wget using the sys command on a
> windows box (i know, i know, but the whole place is windows so I don't
> have much of a choice).
You could try
http://search.cpan.org/~marclang/ParallelUserAgent-2.57/lib/LWP/Parallel.pm
Looks like you'll need Cygwin.
--
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/
------------------------------
Date: Mon, 29 Jan 2007 12:20:43 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <g69fy9tlpg4.fsf@dhcp-65-162.kendall.corp.akamai.com>
On 29 Jan 2007, coolneo@yahoo.com wrote:
> I managed to download about 21,000 of the 50,000 items over the course
> of some time. Initally, Google was processing these items at a slow
> rate but lately they have picked it up.
> Bandwidth is indeed a concern, and I understand downloading 5TB will
> take a long long time, but I think it would be a little shorter if I
> could spawn off 4 downloads at a time, or even 2, during our off
> business hours and the weekend (I get . The average file size is
> 125MB. We have a 200mb pipe, so it's not entirely unreasonable (is
> it?).
You should contact Google and request the data directly. I guarantee
you they will be happy to avoid the load on their network and
servers, since HTTP is not the best way to transfer lots of data.
Ted
------------------------------
Date: 29 Jan 2007 17:22:48 GMT
From: xhoster@gmail.com
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <20070129122431.198$aL@newsreader.com>
Abigail <abigail@abigail.be> wrote:
>
> Of course, it's quite likely that the network is the bottleneck.
> Starting up many simultaneous connections isn't going to help in
> that case.
>
> Finally, I wouldn't use threads. I'd either fork() or use a select()
> loop, depending on the details of the work that needs to be done.
> But then, I'm a Unix person.
I probably wouldn't even use fork. I'd just make 3 (or 4, or 10, whatever)
different to do lists, and start up 3 (or 4, or 10) completely independent
programs from the command line.
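
Splitting one to-do list into several can itself be scripted. The following is a small sketch; the sub name split_todo is hypothetical, and it simply deals the URLs out round-robin.

```perl
use strict;
use warnings;

# Deal @urls out round-robin into $n separate lists.
# Returns $n array references.
sub split_todo {
    my ($n, @urls) = @_;
    my @lists = map { [] } 1 .. $n;
    push @{ $lists[ $_ % $n ] }, $urls[$_] for 0 .. $#urls;
    return @lists;
}
```

Each resulting list could then be written to its own file and handed to a separate wget process with its -i (input file) option.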
Xho
------------------------------
Date: Mon, 29 Jan 2007 12:25:40 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <g69bqkhlp7v.fsf@dhcp-65-162.kendall.corp.akamai.com>
On 29 Jan 2007, abigail@abigail.be wrote:
> Of course, it's quite likely that the network is the bottleneck.
> Starting up many simultaneous connections isn't going to help in
> that case.
This depends on the error rates and the latency between the two sides
(each file may be on a different server in a different part of the
world, for all we know). Generally, 4 downloads are faster than 1,
because of the synchronized way TCP/IP works, but of course they
create a bigger load on the client and on the server.
Ted
------------------------------
Date: 29 Jan 2007 09:55:28 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <1170093328.273739.220150@v33g2000cwv.googlegroups.com>
coolneo wrote:
> [...] They (Google) provided some information on
> howto write a script (shell) to download the digitized version using
> wget.
>
> There are about 50,000 items, raning in size from 15MB-600MB. My
> script downloads them fine, but it would be much faster if i could
> multi-thread(?) it. I'm running the wget using the sys command on a
> windows box (i know, i know, but the whole place is windows so I don't
> have much of a choice).
>
> Am I on the right track? Or should I be doing this differently?
You didn't say if this is a one-time job or something that'll be ongoing.
If it's a one-time job, then I'd split that file list into however
many processes I want to run, then start that many shell jobs and just
let 'em run until it's done. It's not elegant, it's brute force, but
sometimes that's plenty good.
If you're going to be doing this regularly, then LWP::Parallel is
pretty sweet. You can have each LWP agent shift an individual URL off
the list and slowly whittle it down.
The I/O issues mentioned are going to be worse on a single box though.
You can hit a point where the machine is network I/O bound, so you
might want to consider confiscating a couple of PCs and running a separate
job on each PC, as long as you're on a switch and a fast pipe.
I'd also seriously consider a modern sneaker-net, and see about buying
some hard-drives that'll hold the entire set of data, and send them to
Google, have them fill the drives, and then return them overnight air.
That might be a lot faster, and then you could reuse the drives later.
------------------------------
Date: 29 Jan 2007 11:04:13 -0800
From: "coolneo" <coolneo@yahoo.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <1170097453.333562.186230@l53g2000cwa.googlegroups.com>
On Jan 29, 12:20 pm, Ted Zlatanov <t...@lifelogs.com> wrote:
> On 29 Jan 2007, cool...@yahoo.com wrote:
>
> > I managed to download about 21,000 of the 50,000 items over the course
> > of some time. Initally, Google was processing these items at a slow
> > rate but lately they have picked it up.
> > Bandwidth is indeed a concern, and I understand downloading 5TB will
> > take a long long time, but I think it would be a little shorter if I
> > could spawn off 4 downloads at a time, or even 2, during our off
> > business hours and the weekend (I get . The average file size is
> > 125MB. We have a 200mb pipe, so it's not entirely unreasonable (is
> > it?).
> You should contact Google and request the data directly. I guarantee
> you they will be happy to avoid the load on their network and
> servers, since HTTP is not the best way to transfer lots of data.
>
> Ted
Ted, I didn't provide some additional information that may make
you think differently:
Google is kinda odd sometimes. It took them forever to allow multiple
download streams, and then they provide this web interface to recall
data in text format with wget. I mean, for Google, you figure they
could do better. I think they would prefer to not give us anything at
all. Once we have it there is always the chance we'll give it away or
lose it or have it stolen (by Microsoft!).
Another thing I didn't mention is that this can grow to much larger
than the 50,000, in which case, I'd much rather just auto-download,
than deal with media.
------------------------------
Date: Mon, 29 Jan 2007 20:34:23 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <epllrh.1js.1@news.isolution.nl>
coolneo wrote:
> recall data in text format with wget.
I assume it is gz-compressed?
--
Affijn, Ruud
"Gewoon is een tijger."
------------------------------
Date: Mon, 29 Jan 2007 15:33:25 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <g69k5z5k1yi.fsf@dhcp-65-162.kendall.corp.akamai.com>
On 29 Jan 2007, coolneo@yahoo.com wrote:
> Google is kinda odd sometimes. It took them forever to allow multiple
> download streams, and then they provide this web interface to recall
> data in text format with wget. I mean, for Google, you figure they
> could do better. I think they would prefer to not give us anything at
> all. Once we have it there is always the chance we'll give it way or
> lose it or have it stolen (by Microsoft!).
As a business decision it may make sense; technically it's nonsense :)
At the very least they should give you an rsync interface. It's a
single TCP stream, it's fast, and it can be resumed if the connection
should abort. HTTP is low on my list of transport mechanisms for
large files.
> Another thing I didn't mention is that this can grow to much larger
> than the 50,000, in which case, I'd much rather just auto-download,
> than deal with media.
Sure. I was talking about your initial data load; subsequent loads
can be incremental.
I would also suggest limiting to N downloads per hour, to avoid bugs
or other situations (unmounted disk, for example) where you're
repeatedly requesting all the data you already have. That's a very
nasty situation.
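
One way to sketch such a guard, under assumptions: the local file name is taken to be the last path segment of the URL, a non-empty file is taken to mean "already downloaded", and the sub name pending_downloads is invented here. The check for the target directory covers the unmounted-disk case.

```perl
use strict;
use warnings;
use File::Spec;

# Return at most $limit URLs from @$urls whose target file is not yet
# present (and non-empty) in $dir. Dies if $dir is missing, so that an
# unmounted disk does not look like "nothing downloaded yet".
sub pending_downloads {
    my ($urls, $dir, $limit) = @_;
    die "target directory $dir is missing\n" unless -d $dir;
    my @todo;
    for my $url (@$urls) {
        last if @todo >= $limit;
        # Assumption: the file is saved under the URL's last path segment.
        my ($name) = $url =~ m{([^/]+)\z};
        next unless defined $name;
        next if -s File::Spec->catfile($dir, $name);   # already have it
        push @todo, $url;
    }
    return @todo;
}
```

Calling this once per hour with $limit set to N gives the cap described above; a partially downloaded file would still count as present, so a real script would also need some completeness check.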
Ted
------------------------------
Date: Mon, 29 Jan 2007 23:00:03 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Downloading lots and lots and lots of files
Message-Id: <5hrsr2pc4v2fvkp89p3e940hnuo3r706o4@4ax.com>
On 29 Jan 2007 06:44:02 -0800, "coolneo" <coolneo@yahoo.com> wrote:
>There are about 50,000 items, raning in size from 15MB-600MB. My
>script downloads them fine, but it would be much faster if i could
>multi-thread(?) it. I'm running the wget using the sys command on a
There's no sys command, BTW...
>windows box (i know, i know, but the whole place is windows so I don't
>have much of a choice).
Well, one (cheap) option that has not been mentioned is the simple
minded parallelization you can get with piped open()s...
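
A minimal sketch of that idea: every command is started with a piped open() before any output is read, so the commands run side by side. The sub name run_piped is made up for this example, and the list form of piped open used here is an assumption about the platform (it needs a reasonably recent Perl on Windows).

```perl
use strict;
use warnings;

# Start every command at once via piped open(), then collect the output.
# Each command is an array reference: [program, arg, ...].
# Returns one array reference of output lines per command, in order.
sub run_piped {
    my @commands = @_;
    my @handles;
    for my $cmd (@commands) {
        open(my $fh, '-|', @$cmd) or die "Can't start @$cmd: $!";
        push @handles, $fh;
    }
    my @output;
    for my $fh (@handles) {
        push @output, [ <$fh> ];   # blocks until this command finishes
        close($fh);                # sets $? to the command's exit status
    }
    return @output;
}
```

For downloads the output is mostly progress chatter, so the point is the concurrency rather than the text that comes back.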
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
------------------------------
Date: 29 Jan 2007 09:36:44 -0800
From: nigel@bouteyres.com
Subject: French Accents appear incorrectly...
Message-Id: <1170092204.389765.257560@l53g2000cwa.googlegroups.com>
Hi,
I'm creating a French web site. I've used Dreamweaver to develop most
of the site and have simply typed the French with accents. This all
displays fine. I've used Dreamweaver to create a library object for
the menu of the site (to avoid having multiple copies of exactly the
same HTML). I've now uploaded the library object to my server in order
that a Perl program can read the contents of that file and output a
page of HTML complete with the menu for the site, but here I've run
into a problem: the accented letters are being displayed incorrectly.
For example an 'é' (e with an acute accent) is being displayed as a
capital A with two dots over it followed by the copyright symbol. This
only happens for the text that is being read from the library object
file; an e with an acute accent typed into the Perl program and
printed directly displays correctly. So I think the problem must be
related to how I read and then write the library object file. Here is
my code:
#######################################################################
######################
# Open and read the contents of the headerMenu template...
open(INPUT, "../Library/headerMenuFr.lbi") ||
&printErrorMessage("Error 001 : Can't open the 'headerMenuFr' file in:
$program");
while(<INPUT>) {
push(@records1,$_);
}
close(INPUT)|| &printErrorMessage("Error 012 : Can't close the
'headerMenuFr' file in: $program");
# Now read through the header and print it...
$recnum = @records1;
for ($i=0; $i<$recnum; $i++) {
print $records1[$i];
}
#######################################################################
#####################
If anyone has any bright ideas as to what I'm doing wrong and how to=20
correct it, I'd like to hear them!
Thanks in advance,
Nigel
------------------------------
Date: 29 Jan 2007 10:09:28 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: French Accents appear incorrectly...
Message-Id: <1170094168.515493.256380@v33g2000cwv.googlegroups.com>
nigel@bouteyres.com wrote:
[...]
> I'm creating a French web site. I've used Dreamweaver to develop most
> of the site and have simply typed the french with accents. This all
> displays fine. I've used DreamWeaver to create a library object for
> the menu of the site (to avoid having multiple copies of exactly the
> same html). I've now uploaded the library object to my server in order
> that a perl program can read the contents of that file and output a
> page of html complete with the menu for the site, but here I've run
> into a problem: the accented letters are being displayed incorrectly.
> For example an 'é' (e with an acute accent) is being displayed as a
> capital A with two dots over it folowed by the copyright symbol. This
> only happens for the text that is being read from the library object
> file, an e with an acute accent typed into the perl program and
> printed directly displays correctly. So I think the problem must be
> related to how I read and then write the library object file. Here is
> my code:
OK, first, did you EMBED the accented characters, or did you let
Dreamweaver do the RIGHT thing and use HTML entities?
If you embedded the high-bit-set characters, or used the ALT+number
method of entering an accented character... I'm all for having you
flogged, because that creates a real mess. But it's easily fixed. Just
tell Dreamweaver to convert the characters to HTML entities and resave
the file. And, then, never, EVER directly enter accented characters
again.
Now, typing quick 'n dirty on the fly...
>
> #######################################################################
> ######################
>
> # Open and read the contents of the headerMenu template...
> open(INPUT, "../Library/headerMenuFr.lbi") ||
> &printErrorMessage("Error 001 : Can't open the 'headerMenuFr' file in:
> $program");
# ALWAYS...
use warnings;
use strict;
# ALWAYS use the three parm form of open...
open (my $INPUT, '<','...') or die $!;
>
> while(<INPUT>) {
> push(@records1,$_);
> }
while (<$INPUT>) {
print ;
}
> close(INPUT)|| &printErrorMessage("Error 012 : Can't close the
> 'headerMenuFr' file in: $program");
close ($INPUT) or die "Can't close: $!\n";
> # Now read through the header and print it...
>
> $recnum = @records1;
> for ($i=0; $i<$recnum; $i++) {
>
> print $records1[$i];
> }
>
> #######################################################################
> #####################
>
> If anyone has any bright ideas as to what I'm doing wrong and how to
> correct it, I'd like to hear them!
If you ABSOLUTELY must embed the characters instead of using HTML
entities, then you'll deserve having to learn about Unicode and
opening the IO layers, but be prepared because you'll be opening
Pandora's box.
perldoc -f open
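
For what it's worth, an 'é' that shows up as two characters is the usual sign of UTF-8 bytes (0xC3 0xA9) being displayed as ISO-8859-1. If that is what is happening here (an assumption, since the thread does not say how the .lbi file was saved or which charset the page declares), the bytes can be converted while reading. The sub names below are invented for this sketch.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert a string of UTF-8 encoded bytes into ISO-8859-1 encoded bytes.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl characters
    return encode('ISO-8859-1', $chars);   # characters -> Latin-1 bytes
}

# Read a file assumed to be UTF-8 and return its lines as Latin-1 bytes.
sub read_menu_as_latin1 {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Can't open $path: $!";
    my @lines = map { utf8_to_latin1($_) } <$in>;
    close($in) or die "Can't close $path: $!";
    return @lines;
}
```

The other direction is to leave the bytes alone and have the script send a Content-Type header that declares charset=UTF-8, in which case any accented characters typed directly into the Perl source would need to be UTF-8 as well.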
------------------------------
Date: Mon, 29 Jan 2007 19:47:38 +0100
From: Kio <ysmay13@_NO_SPAM_poczta.fm>
Subject: html tags and perl
Message-Id: <eplfgo$rah$1@news.onet.pl>
hi,
I've got a Perl script which parses an HTML file and prints the src
text from an <img> tag to the screen.
I don't know how to write a sub which can take an HTML $file and change
all the src contents of the img tags to something else. I don't want to use a
regexp.
For example:
<img src="xxx.jpg"> change to <img src="zzz.jpg">
Thanks for your help
Damian
------------------------------
Date: 29 Jan 2007 21:04:46 GMT
From: Abigail <abigail@abigail.be>
Subject: Re: html tags and perl
Message-Id: <slrnersob7.tcf.abigail@alexandra.abigail.be>
Kio (ysmay13@_NO_SPAM_poczta.fm) wrote on MMMMDCCCXCIX September MCMXCIII
in <URL:news:eplfgo$rah$1@news.onet.pl>:
.. hi,
..
.. I've got a perl script which is parsing a HTML file and printing a src
.. text from a <img> tag on a screen.
..
.. I dont know how to write a sub which can take a html $file and change
.. all the src contents from the img tags to sth else. I dont want to use a
.. regexp.
That's silly. If you want to code with artificial restrictions (in this
case "don't use regular expressions"), you're better off asking your question
in a more appropriate forum.
Abigail
--
perl -we '$@="\145\143\150\157\040\042\112\165\163\164\040\141\156\157\164".
"\150\145\162\040\120\145\162\154\040\110\141\143\153\145\162".
"\042\040\076\040\057\144\145\166\057\164\164\171";`$@`'
------------------------------
Date: Mon, 29 Jan 2007 22:11:33 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: html tags and perl
Message-Id: <dgosr2trmnt4bgpnvqi2sju0qhgoo0707g@4ax.com>
On Mon, 29 Jan 2007 19:47:38 +0100, Kio <ysmay13@_NO_SPAM_poczta.fm>
wrote:
>I dont know how to write a sub which can take a html $file and change
>all the src contents from the img tags to sth else. I dont want to use a
>regexp.
Whoa! People want to use regexen all the time. And we tell them to
use some suitable module instead, which is what I recommend to you
too.
><img src="xxx.jpg"> change to <img src="zzz.jpg">
Earlier today someone posted a question about HTML::TokeParser, and
people answered giving some explicit example. You may enjoy reading
that thread. Your case is actually easier than what was being asked
there...
Michele
------------------------------
Date: Mon, 29 Jan 2007 22:45:26 +0100
From: Kio <ysmay13@_NO_SPAM_poczta.fm>
Subject: Re: html tags and perl
Message-Id: <eplpu5$247$1@news.onet.pl>
Abigail wrote:
> That's silly.
No it's not.
> If you want to code with artificial restrictions (in this
> case "don't use regular expressions"), you're better off asking your question
> in a more appropriate forum.
I can use a regexp for those:
<img src="whatever.jpg">
and then change it ...
but if I have something like:
<img alt="aa" src="whatever.jpg"> what then? A regexp is not going to work
on it :/
What about links inside JS? A regexp is not going to work again :/
Damian
------------------------------
Date: Mon, 29 Jan 2007 23:46:17 +0100
From: Kio <ysmay13@_NO_SPAM_poczta.fm>
Subject: Re: html tags and perl
Message-Id: <epltg7$coq$1@news.onet.pl>
Michele Dondi wrote:
> Earlier today someone posted a question about HTML::TokeParser, and
> people answered giving some explicit example. You may enjoy reading
> that thread. Your case is actually easier than what was being asked
> there...
I wrote sth like this:
use HTML::TokeParser;
my $p = HTML::TokeParser->new(\$html_file);
while (my $token = $p->get_tag('img'))
{
#how can i write sth to the src attr ?
#like: new attr="/e/.jpg"
}
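
One possible way to fill in that loop, offered only as a sketch: get_tag() skips everything between the tags it returns, so to write a modified document it is easier to walk every token with get_token() and copy the original text through, rebuilding only the img start tags. The sub name rewrite_img_src and the callback interface are invented for this example, and $html is assumed to hold the document text (not a file name).

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a copy of $html in which the src attribute of every <img> tag
# has been replaced by $map->($old_src). All other markup is copied as is.
sub rewrite_img_src {
    my ($html, $map) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse document";
    my $out = '';
    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S' && $t->[1] eq 'img' && defined $t->[2]{src}) {
            my ($attr, $attrseq) = @{$t}[2, 3];
            $attr->{src} = $map->($attr->{src});
            # Rebuild the tag, keeping the original attribute order.
            $out .= '<img';
            for my $name (@$attrseq) {
                next if $name eq '/';        # left over from <img ... />
                my $value = $attr->{$name};
                $value =~ s/&/&amp;/g;       # values arrive entity-decoded
                $value =~ s/"/&quot;/g;
                $out .= qq{ $name="$value"};
            }
            $out .= '>';
        }
        else {
            # The original source text sits at a type-dependent index.
            $out .= $type eq 'S'  ? $t->[4]
                  : $type eq 'E'  ? $t->[2]
                  : $type eq 'PI' ? $t->[2]
                  :                 $t->[1];  # T, C and D tokens
        }
    }
    return $out;
}
```

A call such as rewrite_img_src($html, sub { "/e/$_[0]" }) would prefix every image path with /e/. Only tags named img are touched, so the src attributes of other elements are left alone; the rebuilt tags always use double quotes, whatever quoting the original had.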
Damian
------------------------------
Date: 29 Jan 2007 13:08:46 -0800
From: krakle@visto.com
Subject: Re: html tags and perl
Message-Id: <1170104925.994472.244150@a34g2000cwb.googlegroups.com>
On Jan 29, 12:47 pm, Kio <ysmay13@_NO_SPAM_poczta.fm> wrote:
> I dont want to use a
> regexp.
Why not?
>
> for example:
>
> <img src="xxx.jpg"> change to <img src="zzz.jpg">
>
$file =~ s/xxx\.jpg/zzz\.jpg/g;
Now why would you want to use more code for a slower less efficient
way to replace just to avoid regex?
------------------------------
Date: Mon, 29 Jan 2007 22:51:53 +0100
From: Kio <ysmay13@_NO_SPAM_poczta.fm>
Subject: Re: html tags and perl
Message-Id: <eplqa6$35h$1@news.onet.pl>
krakle@visto.com wrote:
>> I dont want to use a
>> regexp.
>
> Why not?
>
Because in some cases it's not going to work.
>> for example:
>>
>> <img src="xxx.jpg"> change to <img src="zzz.jpg">
>>
>
> $file =~ s/xxx\.jpg/zzz\.jpg/g;
>
OK, but I have to change the path for all the IMG SRC attributes, JUST those!
Sometimes you can have .jpg, .png or .gif etc.
You can have:
<img src="a.jpg">
<img src='a.jpg'>
<img src=a.jpg>
<img alt="a" src="a.jpg">
etc.
or even img src in JS like: src\="" whatever ...
You have to know that other tags have SRC as well; I have to change just
those from IMG.
Damian
------------------------------
Date: Mon, 29 Jan 2007 08:06:48 -0600
From: "Mumia W. (NOSPAM)" <paduille.4060.mumia.w+nospam@earthlink.net>
Subject: Re: HTML::TokeParser
Message-Id: <epl0vk$qr9$1@aioe.org>
On 01/29/2007 04:30 AM, avlee wrote:
>
> Hello
>
> I want to find tag "X" in which i later want to find all "Y" tags.
> Is it possible to do using HTML::TokeParser ?
> Right now i found only "X" tag:
>
> my $parser = HTML::TokeParser->new(\$data);
> my $tag = $parser->get_tag($X_tag);
>
> How to force parser to search "Y" tags from this moment - but only
> inside "X" tag ?
> (not the whole document) ? [...]
Probably you'll have to do "$parser->get_tag(qw(X /X Y /Y));" and write
logic that determines when you have a Y element that is inside of an X
element.
I haven't used HTML::TokeParser much, but a cursory reading of the POD
suggests that the above is what you want.
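
A rough sketch of that logic, with an invented sub name (tags_inside) and the simplifying assumptions that the outer and inner tag names differ and that the outer element is properly closed:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the start-tag tokens of every $inner element that occurs inside
# an $outer element. Each token is what get_tag() returns for a start
# tag: [$tag, $attr, $attrseq, $text].
sub tags_inside {
    my ($html, $outer, $inner) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse document";
    my $depth = 0;    # how many $outer elements are currently open
    my @found;
    while (my $tag = $p->get_tag($outer, "/$outer", $inner)) {
        my $name = $tag->[0];
        if    ($name eq $outer)    { $depth++ }
        elsif ($name eq "/$outer") { $depth-- if $depth > 0 }
        elsif ($depth > 0)         { push @found, $tag }
    }
    return @found;
}
```

For example, tags_inside($data, 'div', 'a') would collect the a tags that sit inside any div and ignore those outside.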
--
Windows Vista and your freedom in conflict:
http://techdirt.com/articles/20061019/102225.shtml
------------------------------
Date: Mon, 29 Jan 2007 20:16:02 +0100
From: Martijn Lievaart <m@remove.this.part.rtij.nl>
Subject: Re: socket and fork
Message-Id: <pan.2007.01.29.19.16.01.901914@remove.this.part.rtij.nl>
On Mon, 29 Jan 2007 14:01:55 +0100, Larry wrote:
> In article <pan.2007.01.28.19.20.37.752870@remove.this.part.rtij.nl>,
> Martijn Lievaart <m@remove.this.part.rtij.nl> wrote:
>
>> Did you look at perldoc perlipc? It's right in there.
>
> well, that's what I was looking for, still I have some difficulties
> figuring out how fork() works. this is what I've got:
Read the part that starts with "And here’s a multithreaded version".
It's exactly what you want.
M4
--
Redundancy is a great way to introduce more single points of failure.
------------------------------
Date: Mon, 29 Jan 2007 22:00:35 GMT
From: "Colin B." <cbigam@somewhereelse.nucleus.com>
Subject: Re: Style questions
Message-Id: <45be6e7f@news.nucleus.com>
Michele Dondi <bik.mido@tiscalinet.it> wrote:
> On Thu, 25 Jan 2007 22:48:18 GMT, "Colin B."
> <cbigam@somewhereelse.nucleus.com> wrote:
>
>>I'm not a programmer, and never have been. I can grind out a decent shell
>>script, and beat my head against perl until things work, but I'm seldom
>>happy with the inelegance of the solution. Does anyone have any good style
>>guides for perl?
>
> There's the one that comes with Perl:
>
> perldoc perlstyle
Cool. Thanks!
>>And more specifically, I've got the following general case that keeps coming
>>up routinely.
>>
>>foreach (`some_unix_command_that_puts_out_multiple_lines`) {
>
> Nothing wrong with this. Personally I have an idiosyncrasy with
> backticks and prefer qx with alternative delimiters, but that's just
> me. (That's also in shell scripts, where I prefer $() all the time,
> but there the latter has the additional, *technical*, advantage of
> allowing nesting.)
While I can appreciate this, I do too much stuff that is necessarily in a
Bourne shell, where $() doesn't work.
> Other than that, if your unix command puts out *lots* of multiple
> lines, you may want to use a "piped" open() instead, and read line by
> line, although all in all it is a more verbose solution.
Duly noted--this is mostly for 'file line quickies at the command line,'
but I'll make a mental note to use open() instead, in significant (or at
least permanent) scripts.
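
For reference, the piped open() form might look like the sketch below. The sub name for_each_line is made up for the illustration; the command is passed as a list so that no shell is involved.

```perl
use strict;
use warnings;

# Run a command and hand each line of its output to $callback as it
# arrives, instead of collecting everything first as backticks do.
# Returns the number of lines read.
sub for_each_line {
    my ($cmd, $callback) = @_;    # $cmd is [program, arg, ...]
    open(my $out, '-|', @$cmd) or die "Can't run @$cmd: $!";
    my $count = 0;
    while (my $line = <$out>) {
        chomp $line;
        $callback->($line);
        $count++;
    }
    close($out) or warn "@$cmd exited with status $?\n";
    return $count;
}
```

Something like for_each_line(['ls', '-l'], sub { print "got: $_[0]\n" }) processes the listing one line at a time.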
> split() in void context is deprecated. Perl emits a warning if
> warnings are on. I strongly recommend you put the following two lines
> at the beginning of your scripts:
So I'm not entirely sure about this. I've now read the man page and it
says that implicit split() to @_ is deprecated because of the very fact of
clobbering @_. After some thinking, this made good sense and I can see why
I wouldn't want to (generally) do it.
However, is this really void context? I would have thought that it was in
list context, because of the fact that it's dumping to @_. Or is it that
populating @_ is the default behaviour (in this case) for void context?
Actually, what is the definitive definition of void context? If something
dumps its result to @_ or $_ or the like, then is it actually in void
context?
Thanks,
Colin
------------------------------
Date: Mon, 29 Jan 2007 23:46:51 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Style questions
Message-Id: <d7usr2tf5v823ujpkrocpkq5n5khobd5f0@4ax.com>
On Mon, 29 Jan 2007 22:00:35 GMT, "Colin B."
<cbigam@somewhereelse.nucleus.com> wrote:
>However, is this really void context? I would have thought that it was in
>list context, because of the fact that it's dumping to @_. Or is it that
>populating @_ is the default behaviour (in this case) for void context?
The latter.
>Actually, what is the definitive definition of void context? If something
>dumps its result to @_ or $_ or the like, then is it actually in void
>context?
Context is not defined by what is "dumped" to where; it is determined
by where the expression appears. If a statement consists solely of an
expression whose value is not used (it is not assigned, passed as an
argument, or returned), then that expression is evaluated in void
context.
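A small sketch that makes the three contexts visible, using wantarray(), which returns undef in void context, false in scalar context and true in list context. The sub name and the @seen array are made up for the example.

```perl
use strict;
use warnings;

my @seen;

# Records the context in which it was called.
sub report_context {
    my $w = wantarray;
    push @seen, !defined $w ? 'void' : $w ? 'list' : 'scalar';
    return;
}

report_context();             # statement on its own: void context
my $s = report_context();     # rhs of a scalar assignment: scalar context
my @l = report_context();     # rhs of a list assignment: list context

print join(', ', @seen), "\n";   # prints "void, scalar, list"
```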
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
------------------------------
Date: 29 Jan 2007 07:17:41 -0800
From: "Yakov" <iler.ml@gmail.com>
Subject: subpattern reference using vaiable subpattern index
Message-Id: <1170083861.239923.107790@v33g2000cwv.googlegroups.com>
I have submatch index in a variable, $SUB=5. How do I write submatch
reference $5 using $SUB ?
Is ${$SUB} ok ?
2. Is there array of subpatterns ($1..$<N>) so I can take its size ?
Yakov
------------------------------
Date: 29 Jan 2007 07:32:04 -0800
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: subpattern reference using vaiable subpattern index
Message-Id: <1170084721.015847.160520@q2g2000cwa.googlegroups.com>
On Jan 29, 10:17 am, "Yakov" <iler...@gmail.com> wrote:
> I have submatch index in a variable, $SUB=5. How do I write submatch
> reference $5 using $SUB ?
> Is ${$SUB} ok ?
What happened when you tried it?
> 2. Is there array of subpatterns ($1..$<N>) so I can take its size ?
Just evaluate the pattern match in list context.
my @matches = $x =~ /(foo).*(bar).*(baz)/;
print "Found " . @matches . " matches\n";
If that's not what you meant, please post a short-but-complete program
that demonstrates what you're trying to do and how you're having
difficulty.
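As a rough sketch of picking a capture by a variable index without resorting to a symbolic reference like ${$SUB} (which "use strict" forbids): either index into the list returned by the match, or use the built-in offset arrays @- and @+. The sample string and the value of $SUB are assumptions for the example.

```perl
use strict;
use warnings;

my $x   = "foo 1 bar 2 baz";   # sample input, invented for illustration
my $SUB = 2;                   # which capture group we want

# List context returns the captures; $1 lives at index 0.
my @matches = $x =~ /(foo).*(bar).*(baz)/
    or die "no match";
my $by_array = $matches[$SUB - 1];

# @- and @+ hold the start and end offsets of each group from the
# last successful match, so group $SUB can be cut out with substr.
my $by_offset = substr($x, $-[$SUB], $+[$SUB] - $-[$SUB]);

# $#+ is the number of capture groups in the last successful match.
my $groups = $#+;

print "group $SUB of $groups: $by_array / $by_offset\n";
```

Both approaches give the same text here; the array form is usually simpler, while @- and @+ help when the positions matter too.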
Have you read the Posting Guidelines that are posted here twice a
week?
Paul Lalli
------------------------------
Date: Mon, 29 Jan 2007 22:27:50 GMT
From: "David Walker" <david@cs.cf.ac.uk>
Subject: Tar on Windows XP
Message-Id: <Gxuvh.18350$8j7.13192@newsfe1-win.ntli.net>
Hi
I am trying to create a tar archive on Windows XP with the Perl code below.
However, when I look at the archive created (using WinZip 10.0) all the
directory information is lost, and when I untar it using WinZip all I get is
all the files in the same directory. Can some kind person please tell me how
I can create the tar file so that when it is untar'd the directory
structure will be preserved.
Thanks
David
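use strict;
use warnings;
use Archive::Tar;
use IO::Zlib;
use File::Find;

my $dir     = "c:/docume~1/david/somedir";
my $archive = "c:/docume~1/david/archive.tar";
my $tar     = Archive::Tar->new;
my @files;

# Collect the full path of every entry found under $dir.
find(\&wanted, $dir);
$tar->add_files(@files);
$tar->write($archive);

# Called by find() once per file or directory.
sub wanted {
    push(@files, $File::Find::name);
}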
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 86
*************************************