[29781] in Perl-Users-Digest
Perl-Users Digest, Issue: 1024 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Nov 11 14:09:45 2007
Date: Sun, 11 Nov 2007 11:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 11 Nov 2007 Volume: 11 Number: 1024
Today's topics:
Convert some files from html to plaintext <lucavilla@cashette.com>
Re: Convert some files from html to plaintext <jurgenex@hotmail.com>
Re: launch a DOS program from a Perl script? <jurgenex@hotmail.com>
Re: launch a DOS program from a Perl script? <rkb@i.frys.com>
Perl newbie regular expression usage question/help dfairman16@hotmail.com
Re: Perl newbie regular expression usage question/help <newsgroups@debain.org>
Re: Perl newbie regular expression usage question/help dfairman16@hotmail.com
Re: Perl newbie regular expression usage question/help <mark.clementsREMOVETHIS@wanadoo.fr>
Re: Perl newbie regular expression usage question/help <jurgenex@hotmail.com>
Re: Perl newbie regular expression usage question/help dfairman16@hotmail.com
Re: Perl newbie regular expression usage question/help dfairman16@hotmail.com
Re: Perl newbie regular expression usage question/help <tadmc@seesig.invalid>
Re: Perl newbie regular expression usage question/help dfairman16@hotmail.com
Re: Perl newbie regular expression usage question/help <noreply@gunnar.cc>
sleep/fork/shell/SIGCHLD interaction problem <gerph@gerph.org>
To extract numbers from files with Perl <lucavilla@cashette.com>
Re: To extract numbers from files with Perl <lucavilla@cashette.com>
Re: To extract numbers from files with Perl <bik.mido@tiscalinet.it>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 11 Nov 2007 10:02:14 -0800
From: Luca Villa <lucavilla@cashette.com>
Subject: Convert some files from html to plaintext
Message-Id: <1194804134.823911.115500@19g2000hsx.googlegroups.com>
I have many html files named like these:
c:\dir\femo-black.html
c:\dir\loren-white.html
c:\dir\spark-white.html
c:\dir\kim-black.html
c:\dir\paul-white.html
How can I convert only the files named "c:\dir\*-white.html" to
plaintext files named c:\dir\(original filename)-text.txt?
BTW do you know a better Perl module than HTML::FormatText (
http://search.cpan.org/~sburke/HTML-Format-2.04/lib/HTML/FormatText.pm)
to convert HTML to plaintext?
------------------------------
Date: Sun, 11 Nov 2007 18:37:03 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Convert some files from html to plaintext
Message-Id: <jZHZi.578$OJ.213@trndny06>
Luca Villa wrote:
> I have many html files named like these:
>
> c:\dir\femo-black.html
> c:\dir\loren-white.html
> c:\dir\spark-white.html
> c:\dir\kim-black.html
> c:\dir\paul-white.html
>
> How can I convert only the files named "c:\dir\*-white.html"
perldoc -f glob
> to plaintext files
Many ways, depending on what you consider the plaintext equivalent of an
HTML file. After all, HTML contains more information than plaintext and
therefore a lossless conversion is not possible. One way would be to use
lynx with the text-output option.
Another way is described in the Perl FAQ: "perldoc -q HTML"
"How do I remove HTML from a string?"
> named c:\dir\(original filename)-text.txt?
Depending upon how you generate the target text, e.g. by redirecting the
output of lynx to that file or by writing to that file or ...
jue
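
A minimal sketch of the glob step, assuming "(original filename)-text.txt"
means the name with ".html" replaced by "-text.txt". The helper name
text_name_for is hypothetical, and the conversion itself is left as a
comment because it depends on the tool chosen.

```perl
use strict;
use warnings;

# Hypothetical helper: map "c:/dir/loren-white.html" to
# "c:/dir/loren-white-text.txt". Names not ending in .html are returned as-is.
sub text_name_for {
    my ($html_path) = @_;
    (my $txt_path = $html_path) =~ s/\.html\z/-text.txt/i;
    return $txt_path;
}

# Forward slashes work on Windows and avoid backslash escaping in the pattern.
for my $html_path (glob 'c:/dir/*-white.html') {
    my $txt_path = text_name_for($html_path);
    print "$html_path -> $txt_path\n";
    # The conversion (lynx, HTML::FormatText, ...) would go here.
}
```

Only the *-white.html files are visited, so the *-black.html files are
never touched.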
------------------------------
Date: Sun, 11 Nov 2007 16:22:34 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: launch a DOS program from a Perl script?
Message-Id: <e%FZi.4094$cD.2577@trndny08>
Luca Villa wrote:
> This seems to be the solution:
> system "HTML2TXT $_ > ".(/(.*)html/)[0]."txt" for <c:/dir/*-red.html>;
Well, I am not one to argue with you, but that is not even valid Perl code:
<quote>
Bareword found where operator expected at t.pl line 5, near "<c:/>dir"
(Missing operator before dir?)
Bareword found where operator expected at t.pl line 5, near "*-red"
(Missing operator before red?)
syntax error at t.pl line 5, near "<c:/>dir"
t.pl had compilation errors.
<\quote>
You may want to fix that first.
jue
------------------------------
Date: Sun, 11 Nov 2007 10:58:43 -0800
From: Ron Bergin <rkb@i.frys.com>
Subject: Re: launch a DOS program from a Perl script?
Message-Id: <1194807523.465853.129260@i13g2000prf.googlegroups.com>
On Nov 11, 8:22 am, "Jürgen Exner" <jurge...@hotmail.com> wrote:
> Luca Villa wrote:
> > This seems to be the solution:
> > system "HTML2TXT $_ > ".(/(.*)html/)[0]."txt" for <c:/dir/*-red.html>;
>
> Well, I am not one to argue with you, but that is not even valid Perl code:
> <quote>
> Bareword found where operator expected at t.pl line 5, near "<c:/>dir"
> (Missing operator before dir?)
> Bareword found where operator expected at t.pl line 5, near "*-red"
> (Missing operator before red?)
> syntax error at t.pl line 5, near "<c:/>dir"
> t.pl had compilation errors.
> <\quote>
> You may want to fix that first.
>
> jue
It might help if you type in the command correctly when testing.
<c:/>dir/*-red.html>
is not the same as
<c:/dir/*-red.html>
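
For readers who find the one-liner hard to follow, here is an expanded
sketch of the same idea. HTML2TXT is the external converter named in the
thread and is assumed to be on the PATH; the helper name html2txt_command
is hypothetical.

```perl
use strict;
use warnings;

# Build the shell command for one input file.
sub html2txt_command {
    my ($html_path) = @_;
    # Greedy match: captures everything before the last "html",
    # including the trailing dot, so appending "txt" yields "name.txt".
    my ($stem) = $html_path =~ /(.*)html/;
    return "HTML2TXT $html_path > ${stem}txt";
}

# <...> with a wildcard pattern is the same as calling glob().
for my $html_path (<c:/dir/*-red.html>) {
    my $command = html2txt_command($html_path);
    system($command) == 0
        or warn "Command failed (status $?): $command\n";
}
```

Checking the return value of system makes a missing or failing converter
visible instead of silently producing empty output files.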
------------------------------
Date: Sun, 11 Nov 2007 07:18:28 -0800
From: dfairman16@hotmail.com
Subject: Perl newbie regular expression usage question/help
Message-Id: <1194794308.113216.300970@22g2000hsm.googlegroups.com>
Hi all
What I want my Perl programme to print out is
w
=
1
AND
(
(
x.y
=
"FRED ("
)
OR
(
z
=
2
)
)
generated from
w =1 AND ( (x.y="Fred (") OR (z=2) )
The problem is that although Perl and regular expressions are suitable
for the task, my starting point is that I have to learn Perl ... well
I'm learning it anyway, slowly, and the only thing that appears simple
is Hello World. Regular expressions are really complex, more than I
had thought. The tricky bit for the regular expression I am trying to
write is that I don't care about spaces except when they are enclosed in
double quotes (the "FRED (" bit in the query example above).
The query above is a real cut down simple version of the query string
that I would like to start to work on. The real query is a lot more
complex but I want to get it working for a very simple one first, then
extend the Perl, and during extending it learn more of the subset of
Perl I need to know to get it all done.
At the risk of being accused of asking someone to do my project for
me, as above is a starting point I would really appreciate being
pointed in the right direction. Not that it should matter but I
include the information for completeness, I am using Perl 5.8.8 on
Suse linux 8.
Thank you
David Fairman
UK
ps. I don't clear my email from this email account so reply in this
newsgroup. This email address is just one I use to stop me getting
spam at home.
------------------------------
Date: Sun, 11 Nov 2007 15:31:25 +0000 (UTC)
From: Samuel <newsgroups@debain.org>
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <fh778c$frc$2@tamarack.fernuni-hagen.de>
On Sun, 11 Nov 2007 07:18:28 -0800, dfairman16 wrote:
> What I want my Perl programme to print out is
>
> w
> =
> 1
> AND
> (
> (
> x.y
> =
> "FRED ("
> )
> OR
> (
> z
> =
> 2
> )
> )
>
>
> generated from
>
> w =1 AND ( (x.y="Fred (") OR (z=2) )
You don't want to use plain regular expressions to do this, you want a
lexer/parser.
http://www.google.com/search?q=lex+yacc+perl
-Samuel
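
To illustrate what a lexer does for this kind of input, here is a small
hand-written sketch using \G and the /gc modifiers. The token classes
(STRING, IDENT, OP, PAREN) and the sub name lex are assumptions made for
this example, not something taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical token classes, tried in order at the current position.
my @RULES = (
    [ STRING => qr/"[^"]*"/       ],   # quoted text, quotes included
    [ IDENT  => qr/\w+(?:\.\w+)*/ ],   # w, x.y, AND, 2, ...
    [ OP     => qr/=/             ],
    [ PAREN  => qr/[()]/          ],
);

sub lex {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    TOKEN: while (1) {
        $input =~ /\G\s+/gc;                  # skip whitespace between tokens
        last if pos($input) >= length $input;
        for my $rule (@RULES) {
            my ($type, $re) = @$rule;
            if ($input =~ /\G($re)/gc) {      # /c keeps pos() on failure
                push @tokens, [ $type, $1 ];
                next TOKEN;
            }
        }
        die "Unexpected character at offset " . pos($input) . "\n";
    }
    return @tokens;
}

print "$_->[0]\t$_->[1]\n" for lex('w =1 AND ( (x.y="Fred (") OR (z=2) )');
```

Because the STRING rule is tried first, whitespace and parentheses inside
double quotes stay part of a single token. A parser would then work on
the token list rather than on raw characters.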
------------------------------
Date: Sun, 11 Nov 2007 07:52:23 -0800
From: dfairman16@hotmail.com
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <1194796343.593447.256620@19g2000hsx.googlegroups.com>
On Nov 11, 3:31 pm, Samuel <newsgro...@debain.org> wrote:
> On Sun, 11 Nov 2007 07:18:28 -0800, dfairman16 wrote:
> > What I want my Perl programme to print out is
>
> > w
> > =
> > 1
> > AND
> > (
> > (
> > x.y
> > =
> > "FRED ("
> > )
> > OR
> > (
> > z
> > =
> > 2
> > )
> > )
>
> > generated from
>
> > w =1 AND ( (x.y="Fred (") OR (z=2) )
>
> You don't want to use plain regular expressions to do this, you want a
> lexer/parser.
>
> http://www.google.com/search?q=lex+yacc+perl
>
> -Samuel
You are right, I do. I just didn't know this is what I wanted and
therefore didn't have the right terms to Google it out. Thank you very
much.
David
------------------------------
Date: Sun, 11 Nov 2007 17:10:19 +0100
From: Mark Clements <mark.clementsREMOVETHIS@wanadoo.fr>
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <47372943$0$5097$ba4acef3@news.orange.fr>
dfairman16@hotmail.com wrote:
> On Nov 11, 3:31 pm, Samuel <newsgro...@debain.org> wrote:
>> On Sun, 11 Nov 2007 07:18:28 -0800, dfairman16 wrote:
>>> What I want my Perl programme to print out is
>>> w
>>> =
>>> 1
>>> AND
>>> (
>>> (
>>> x.y
>>> =
>>> "FRED ("
>>> )
>>> OR
>>> (
>>> z
>>> =
>>> 2
>>> )
>>> )
>>> generated from
>>> w =1 AND ( (x.y="Fred (") OR (z=2) )
>> You don't want to use plain regular expressions to do this, you want a
>> lexer/parser.
>>
>> http://www.google.com/search?q=lex+yacc+perl
>>
>> -Samuel
>
> You are right, I do. I just didn't know this is what I wanted and
> therefore didn't have the right terms to Google it out. Thank you very
> much.
You may have found this already but I found
http://www.perl.com/pub/a/2006/01/05/parsing.html?page=1
to be a good introduction.
Mark
------------------------------
Date: Sun, 11 Nov 2007 16:26:58 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <m3GZi.1196$eV.1007@trndny04>
dfairman16@hotmail.com wrote:
> What I want my Perl programme to print out is
>
> w
> =
> 1
> AND
> (
> (
> x.y
> =
> "FRED ("
> )
> OR
> (
> z
> =
> 2
> )
> )
>
> generated from
>
> w =1 AND ( (x.y="Fred (") OR (z=2) )
It appears like you want to split the line at spaces:
split (/\s*/, 'w =1 AND ( (x.y="Fred (") OR (z=2)');
This is assuming that the non-split "FRED (" was an oversight in your sample
result above.
jue
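
A short sketch comparing two split patterns on the original query, since
the difference is easy to miss. It assumes that "split at spaces" means
splitting on runs of whitespace.

```perl
use strict;
use warnings;

my $query = 'w =1 AND ( (x.y="Fred (") OR (z=2) )';

# /\s*/ can match the empty string, so split breaks between every
# character (and drops the whitespace).
my @by_char = split /\s*/, $query;

# /\s+/ needs at least one whitespace character, so words stay together.
my @by_word = split /\s+/, $query;

print scalar(@by_char), " pieces with /\\s*/\n";
print join('|', @by_word), "\n";
```

Neither pattern keeps "Fred (" together or separates "=" from its
operands, which is why the thread moves on to tokenizing.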
------------------------------
Date: Sun, 11 Nov 2007 08:43:13 -0800
From: dfairman16@hotmail.com
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <1194799393.320978.311770@22g2000hsm.googlegroups.com>
On Nov 11, 4:26 pm, "Jürgen Exner" <jurge...@hotmail.com> wrote:
> dfairma...@hotmail.com wrote:
> > What I want my Perl programme to print out is
>
> > w
> > =
> > 1
> > AND
> > (
> > (
> > x.y
> > =
> > "FRED ("
> > )
> > OR
> > (
> > z
> > =
> > 2
> > )
> > )
>
> > generated from
>
> > w =1 AND ( (x.y="Fred (") OR (z=2) )
>
> It appears like you want to split the line at spaces:
> split (/\s*/, 'w =1 AND ( (x.y="Fred (") OR (z=2)');
> This is assuming that the non-split "FRED (" was an oversight in your sample
> result above.
>
> jue
Thanks for your help, Jürgen. I almost had something similar to what
you wrote in my Perl regular expression. The "FRED (" was not an
oversight however - the token I want is "FRED (" and including the
spaces. This is my big problem, spaces everywhere except for between
quotes should be treated as just white space, in quotes they are not.
Thank you
David
------------------------------
Date: Sun, 11 Nov 2007 08:50:11 -0800
From: dfairman16@hotmail.com
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <1194799811.363329.57790@50g2000hsm.googlegroups.com>
On Nov 11, 4:43 pm, dfairma...@hotmail.com wrote:
> On Nov 11, 4:26 pm, "Jürgen Exner" <jurge...@hotmail.com> wrote:
>
> > dfairma...@hotmail.com wrote:
> > > What I want my Perl programme to print out is
>
> > > w
> > > =
> > > 1
> > > AND
> > > (
> > > (
> > > x.y
> > > =
> > > "FRED ("
> > > )
> > > OR
> > > (
> > > z
> > > =
> > > 2
> > > )
> > > )
>
> > > generated from
>
> > > w =1 AND ( (x.y="Fred (") OR (z=2) )
>
> > It appears like you want to split the line at spaces:
> > split (/\s*/, 'w =1 AND ( (x.y="Fred (") OR (z=2)');
> > This is assuming that the non-split "FRED (" was an oversight in your sample
> > result above.
>
> > jue
>
> Thanks for your help Jürgen. I almost had something similar to what
> you wrote in my Perl regular expression. The "FRED (" was not an
> oversight however - the token I want is "FRED (" and including the
> spaces. This is my big problem, spaces everywhere except for between
> quotes should be treated as just white space, in quotes they are not.
>
> Thank you
>
> David
Oh, and the quotes should be part of the token too, ie. they shouldn't
be discarded.
------------------------------
Date: Sun, 11 Nov 2007 11:18:34 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <slrnfjeeba.h3q.tadmc@tadmc30.sbcglobal.net>
dfairman16@hotmail.com <dfairman16@hotmail.com> wrote:
> the token I want is "FRED (" and including the
> spaces. This is my big problem, spaces everywhere except for between
> quotes should be treated as just white space, in quotes they are not.
Now you have a Question that is Asked Frequently:
How can I split a [character] delimited string except when
inside [character]?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
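
One way to handle the quoted-whitespace part of that FAQ is the core
module Text::ParseWords. The sketch below assumes whitespace is the only
delimiter of interest; it does not separate "=" or parentheses from
their neighbours, so it covers just this one step.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $query = 'w =1 AND ( (x.y="Fred (") OR (z=2) )';

# Split on runs of whitespace, except inside quotes.
# A true second argument keeps the quote characters in the result.
my @words = quotewords('\s+', 1, $query);

print "$_\n" for @words;
```

The space inside "Fred (" no longer causes a split, and the double quotes
are kept as part of the word, as requested earlier in the thread.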
------------------------------
Date: Sun, 11 Nov 2007 09:53:33 -0800
From: dfairman16@hotmail.com
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <1194803613.900811.254260@57g2000hsv.googlegroups.com>
On Nov 11, 5:18 pm, Tad McClellan <ta...@seesig.invalid> wrote:
> dfairma...@hotmail.com <dfairma...@hotmail.com> wrote:
> > the token I want is "FRED (" and including the
> > spaces. This is my big problem, spaces everywhere except for between
> > quotes should be treated as just white space, in quotes they are not.
>
> Now you have a Question that is Asked Frequently:
>
> How can I split a [character] delimited string except when
> inside [character]?
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
OK, the link is
http://www.perl.com/doc/FAQs/FAQ/oldfaq-html/Q4.28.html
I'm glad
1. You haven't come down on me with wrath
2. I mentioned at the outset I'm learning <g>
Thank you
David.
------------------------------
Date: Sun, 11 Nov 2007 19:35:23 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Perl newbie regular expression usage question/help
Message-Id: <5pp0bsFsioidU1@mid.individual.net>
dfairman16@hotmail.com wrote:
> What I want my Perl programme to print out is
>
> w
> =
> 1
> AND
> (
> (
> x.y
> =
> "FRED ("
> )
> OR
> (
> z
> =
> 2
> )
> )
>
> generated from
>
> w =1 AND ( (x.y="Fred (") OR (z=2) )
local $_ = 'w =1 AND ( (x.y="Fred (") OR (z=2) )';
print "$_\n" for /".*?"|\w+\.\w+|\w+|\S/g;
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 11 Nov 2007 15:41:34 +0000
From: Justin Fletcher <gerph@gerph.org>
Subject: sleep/fork/shell/SIGCHLD interaction problem
Message-Id: <Pine.LNX.4.63.0711111530310.4685@buttercup.gerph.org>
Hiya,
I'm having a problem trying to get a simple program to respond the way
that I expect. The basic premise is thus :
1. Fork a child.
2. Sleep for a while.
3. Do other stuff.
This seems pretty simple, and I have a SIGCHLD handler which will catch my
forked process if it exits. I thought everything was fine. Then I found
that if I press ctrl-Z to suspend the parent whilst I'm running the
program and then background it, it hangs. I've reduced the problem to the
simplest I can, as follows :
----
#!/bin/perl
$SIG{'CHLD'} = sub {
    print "SIGCHLD\n";
    $pid = wait;
    print "leave SIGCHLD for pid $pid\n";
};
print "Forking to do some long running task\n";
unless ($pid = fork) {
    $SIG{'CHLD'} = 'DEFAULT';
    exec "tail -f /dev/null";
    die "failed\n";
};
print "Sleeping\n";
sleep 50;
print "Waking\n";
----
The problem is that if I press ctrl-Z whilst the program is sleeping, and
then resume it in the background with 'bg', a SIGCHLD is triggered. The
handler then does a 'wait' to get the PID and hangs because there isn't a
child that's exited. We never leave the SIGCHLD handler (unless the long
running task completes). The use of 'tail -f /dev/null' is purely to
simulate a task which just keeps running.
In the shell, the following sequence is seen:
----
justin@buttercup:~/Root/perltest$ perl testsleep.pl
Forking to do some long running task
Sleeping
[1]+ Stopped perl testsleep.pl
justin@buttercup:~/Root/perltest$ bg
[1]+ perl testsleep.pl &
SIGCHLD
justin@buttercup:~/Root/perltest$
----
I'm running bash 3.1.17, linux kernel 2.6.18, from debian stable, with
perl 5.8.8.
I believe this sort of construct to be normal and even recommended from
the perlipc pages; so... am I doing something wrong ? is bash ? is the
kernel ? is perl ?
I'm hoping I'm just misunderstanding how process control should be done.
--
Gerph <http://gerph.org/>
... And you never see me walking toward you.
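
A likely explanation is that SIGCHLD can also be delivered when a child
is stopped or continued, not only when it exits, so a blocking wait in
the handler has nothing to collect. The sketch below swaps the blocking
wait for a non-blocking waitpid loop, similar to the pattern shown in
perlipc. The %exit_status hash and the short-lived child are assumptions
made to keep the example self-contained.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # provides WNOHANG

my %exit_status;    # pid => raw wait status of children reaped so far

# Reap every child that has actually exited, without ever blocking.
$SIG{CHLD} = sub {
    local ($!, $?);    # do not clobber the main program's error state
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $?;
    }
};

my $pid = fork;
die "fork failed: $!\n" unless defined $pid;
if ($pid == 0) {
    # Short-lived stand-in for the long-running task.
    exit 0;
}

# sleep returns early when a signal arrives, so loop until the child is reaped.
sleep 1 until exists $exit_status{$pid};
print "child $pid reaped, status $exit_status{$pid}\n";
```

If the handler runs because the child was merely stopped or continued,
waitpid returns 0 and the loop ends at once, so the parent carries on.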
------------------------------
Date: Sun, 11 Nov 2007 08:58:46 -0800
From: Luca Villa <lucavilla@cashette.com>
Subject: To extract numbers from files with Perl
Message-Id: <1194800326.116382.189620@19g2000hsx.googlegroups.com>
I have thousands of files named like these:
c:\input\pumico-home.html
c:\input\ofofo-home.html
c:\input\cimaba-office.html
c:\input\plata-home.html
c:\input\plata-office.html
c:\input\zito-home.html
I need a Perl script that, only for the files that match
"c:\input\*-home.html", performs some regular expression extractions like
in these two examples:
for a "pumico-home.html" that contains:
ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz
it generates a "pumico-home-extract.txt" file that contains these
three couples of numbers, delimited by "|":
12.80|1|1.25|2|0.32|1
for a "ofofo-home.html" that contains:
lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz
it generates a "ofofo-home-extract.txt" file that contains these two
couples of numbers, delimited by "|":
7.44|3|5.11|2
Note that the numbers are always in couples as in the examples. The
number of couples in each source file can vary from one to hundreds...
I already found the regular expressions that extract the numbers:
abcdef(\d+\.\d\d)ttt
mnopq(\d+)zzz
I'm stuck on the rest... (including file handling...)
Thanks in advance for any help
------------------------------
Date: Sun, 11 Nov 2007 10:14:27 -0800
From: Luca Villa <lucavilla@cashette.com>
Subject: Re: To extract numbers from files with Perl
Message-Id: <1194804867.843947.295810@o38g2000hse.googlegroups.com>
quasi-solution:
{
    local @ARGV = <c:/input/*-home.html>;
    local $^I   = '.extract.txt';
    local $\    = $/;
    while ( <> ) {
        print join '|', /([\d.]+)/g if /\d/;
    }
}
This is still not the solution because it puts the new file in
pumico-home.html and the old file in pumico-home.html.extract.txt
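
With $^I the edited text replaces the original file and the suffix names
the backup, which is the reverse of what is wanted here. A sketch that
avoids in-place editing and writes a separate output file follows; it
assumes each number pair always appears in the order shown in the
examples, and the sub names extract_pairs and extract_dir are
hypothetical.

```perl
use strict;
use warnings;

# Return the captured numbers from one file's contents, joined by "|".
sub extract_pairs {
    my ($html) = @_;
    my @numbers = $html =~ /abcdef(\d+\.\d\d)ttt.*?mnopq(\d+)zzz/gs;
    return join '|', @numbers;
}

# Hypothetical helper: process every *-home.html file in $dir.
# Note that glob splits its pattern on spaces, so $dir must not contain any.
sub extract_dir {
    my ($dir) = @_;
    for my $in (glob "$dir/*-home.html") {
        (my $out = $in) =~ s/\.html\z/-extract.txt/;
        open my $ifh, '<', $in or die "Cannot read $in: $!\n";
        my $html = do { local $/; <$ifh> };    # slurp the whole file
        close $ifh;
        open my $ofh, '>', $out or die "Cannot write $out: $!\n";
        print {$ofh} extract_pairs($html), "\n";
        close $ofh or die "Cannot close $out: $!\n";
    }
}

extract_dir('c:/input') if -d 'c:/input';
```

The source files are only opened for reading, so pumico-home.html stays
untouched and the result lands in pumico-home-extract.txt.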
------------------------------
Date: Sun, 11 Nov 2007 19:17:58 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: To extract numbers from files with Perl
Message-Id: <qghej35h04q1ahc7vd77c0rf6jblqet8fa@4ax.com>
On Sun, 11 Nov 2007 08:58:46 -0800, Luca Villa
<lucavilla@cashette.com> wrote:
>I need a Perl script that, only for the files that match
>"c:\input\*-home.html", performs some regular expression extractions like
>in these two examples:
You can directly use glob().
>for a "pumico-home.html" that contains:
>ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz
>
>it generates a "pumico-home-extract.txt" file that contains these
perldoc -f open
>three couples of numbers, delimited by "|":
>12.80|1|1.25|2|0.32|1
local ($,,$\)=("|", "\n");
print /\d+(?:\.\d+)?/g;
>I'm stuck on the rest... (including file handling...)
That is in the docs.
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
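
The two special variables used above can be tried in isolation. In this
sketch $, is the separator print puts between list items and $\ is
appended after the whole list; the in-memory filehandle and the sub name
format_numbers are assumptions made so the result can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: return what print would emit for the numbers in $text.
sub format_numbers {
    my ($text) = @_;
    open my $fh, '>', \my $buffer or die "Cannot open in-memory handle: $!\n";
    {
        # $, separates the printed items, $\ terminates the print.
        local ($, , $\) = ('|', "\n");
        print {$fh} $text =~ /\d+(?:\.\d+)?/g;
    }
    close $fh;
    return $buffer;
}

print format_numbers('lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz');
```

This pattern picks up every number in the text, so it only gives the
wanted pairs when the files contain no other digits.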
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1024
***************************************