[23091] in Perl-Users-Digest
Perl-Users Digest, Issue: 5312 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Aug 5 14:06:50 2003
Date: Tue, 5 Aug 2003 11:05:48 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 5 Aug 2003 Volume: 10 Number: 5312
Today's topics:
$name getting changed to $_ (Brian)
Re: $name getting changed to $_ <pinyaj@rpi.edu>
Re: $name getting changed to $_ <grazz@pobox.com>
Re: $name getting changed to $_ <noreply@gunnar.cc>
Re: [regex] Can't get it to be ungreedy <jane.doe@acme.com>
Re: [regex] Can't get it to be ungreedy (matija)
Re: [regex] Can't get it to be ungreedy <noreply@gunnar.cc>
Re: [regex] Can't get it to be ungreedy <jane.doe@acme.com>
Re: [regex] Can't get it to be ungreedy (Tad McClellan)
Re: [regex] Can't get it to be ungreedy ctcgag@hotmail.com
Re: [regex] Can't get it to be ungreedy <jane.doe@acme.com>
Re: [regex] Can't get it to be ungreedy <matthew.garrish@sympatico.ca>
Re: Anyone build the Berkeley DBXML library for AS and <bobx@linuxmail.org>
Best way to search for multiple strings in a line? (Scott Stark)
Re: Best way to search for multiple strings in a line? <mpapec@yahoo.com>
Re: Best way to search for multiple strings in a line? <bwalton@rochester.rr.com>
bluescreens with perl for NT (Bohne)
Re: bluescreens with perl for NT <matthew.garrish@sympatico.ca>
Re: bluescreens with perl for NT <abuse@sgrail.org>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 5 Aug 2003 09:51:39 -0700
From: die2self@yahoo.com (Brian)
Subject: $name getting changed to $_
Message-Id: <509f2711.0308050851.46463b32@posting.google.com>
I am stumped here - Can someone look at my code and tell me why my
$name variable is being changed to what is stored in $_? Thx.
foreach(@name) {
    if ($files[$x] ne "") {
        open SETI, $files[$x] or die "Can not open $files[$x] :$!";
        print "name after open = @name\n";
        while (<SETI>) {
            ### $name actually changes here - moved it for readability
            if ($. eq 39) {
                print "Default Input = $_\n";
                print "name after if = @name\n";
                chomp;
                /(\d.) hr (\d.)/;
                $Avg_Time[$x] = $1*60+$2;
            }
        }
        $x++;
        close SETI;
    }
}
Please let me know if you need any more information.
Thx - Brian.
------------------------------
Date: Tue, 5 Aug 2003 13:03:09 -0400
From: Jeff 'japhy' Pinyan <pinyaj@rpi.edu>
To: Brian <die2self@yahoo.com>
Subject: Re: $name getting changed to $_
Message-Id: <Pine.SGI.3.96.1030805130017.441726A-100000@vcmr-64.server.rpi.edu>
[posted & mailed]
On 5 Aug 2003, Brian wrote:
>I am stumped here - Can someone look at my code and tell me why my
>$name variable is being changed to what is stored in $_? Thx.
>
>foreach(@name) {
[snip]
> while (<SETI>) {
That's why.
When you do a for-loop on an array like that, the variable you use to
iterate over the array (by default, $_) is *aliased* to the element you're
working on:
@stuff = (1, 2, 3);
for (@stuff) { $_ += 5 }
print "@stuff"; # 6 7 8
That is, changes made to $_ affect the element in the array.
Additionally, when you use a while loop on a filehandle, if you don't
specify a variable, $_ is used. The problem is that it's the SAME $_
you happen to be using to iterate over your array.
while (<FILE>)
# is actually
while (defined($_ = <FILE>))
So, change at least ONE of those loops.
for my $n (@name) { ... }
while (my $line = <FILE>) { ... }
like so.
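A minimal sketch of both behaviours, assuming an in-memory filehandle as a stand-in for the real file. The names @clobbered, @intact and $data are made up for illustration and do not come from the original script:

```perl
use strict;
use warnings;

my $data = "first line\nsecond line\n";   # stand-in for the file contents

# Problem shape: the inner while loop assigns to the same $_ that
# foreach has aliased to the current array element.
my @clobbered = ('alice', 'bob');
for (@clobbered) {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (<$fh>) { }          # every read overwrites the array element
    close $fh;
}

# Fixed shape: both loops use their own lexical variable.
my @intact = ('alice', 'bob');
for my $n (@intact) {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) { }
    close $fh;
}

my $still_defined = grep { defined } @clobbered;
print "defined elements left in \@clobbered: $still_defined\n";
print "\@intact is still: @intact\n";
```

In the first loop the final read at end of file stores undef in $_, so the elements end up undefined rather than holding the last line. Naming either loop variable is enough to break the link; naming both is the safer habit.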
--
Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
------------------------------
Date: Tue, 05 Aug 2003 17:05:40 GMT
From: Steve Grazzini <grazz@pobox.com>
Subject: Re: $name getting changed to $_
Message-Id: <EtRXa.10118$W%3.2079@nwrdny01.gnilink.net>
Brian <die2self@yahoo.com> wrote:
> I am stumped here - Can someone look at my code and tell me why my
> $name variable is being changed to what is stored in $_? Thx.
[ snip non-essentials ]
> foreach(@name) {
> while (<SETI>) {
> ### $name actually changes here - moved it for readability
> }
> }
First, foreach() makes $_ an alias to the current element of @name,
so that if you modify or assign to $_, you alter that element. And
then "while(<FH>)" just assigns to $_.
--
Steve
------------------------------
Date: Tue, 05 Aug 2003 19:06:30 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: $name getting changed to $_
Message-Id: <bgoop5$qcl0o$1@ID-184292.news.uni-berlin.de>
Brian wrote:
> I am stumped here - Can someone look at my code and tell me why my
> $name variable is being changed to what is stored in $_?
You have no $name variable, just a @name variable, so how could it be
changed?
It would have been possible to write
foreach my $name (@name) {
Read about foreach loops at
http://www.perldoc.com/perl5.8.0/pod/perlsyn.html
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 03 Aug 2003 14:34:15 +0200
From: Jane Doe <jane.doe@acme.com>
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <t60qivcu0tgpqmpg359lg9gcb3qhh08qha@4ax.com>
On Sun, 03 Aug 2003 00:32:13 +0200, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:
>I'd say it works exactly as expected. It matches everything from the
>_first_ occurrence of <tr> until the first occurrence of </tr> after
>the string 'Item2'.
>
>You may want to try:
>
> s|<tr>\s*<td>\s*Item2.+?</tr>||sig;
Thanks Gunnar and others. I tried the ideas you gave, and did read the
PerlDoc "What does it mean that regexes are greedy? How can I get
around it?" _before_ asking the question... but Privoxy is still
acting greedy, no matter what I try. I'll come up with another trick
somehow :-)
Thx again
JD.
------------------------------
Date: 3 Aug 2003 08:01:29 -0700
From: mpapec@yahoo.com (matija)
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <ab28ebc3.0308030701.7fc40849@posting.google.com>
On Sat, 02 Aug 2003 23:13:47 +0200, Jane Doe <jane.doe@acme.com> wrote:
>figure out why Privoxy searches for patterns in greedy mode, regardless
>of my use of either the U switch, or the ? limiter after a counter
>like .* or .+ .
>
>Here's the starting HTML, the Privoxy filter, and the output:
>
>1. I just want to remove the complete "Item2" row, as shown :
>
><html>
><head>
></head>
><body>
> <table>
> <tr>
> <td>
> Item1
> </td>
> </tr>
> <!-- I want to remove this part -->
> <tr>
> <td>
> Item2
> </td>
> </tr>
> <!-- until this point -->
> </table>
></body>
></html>
You might get away with an eval regex (the /e modifier) if your app supports it,
$table =~ s{(<tr.+?</tr>)}{
my $tr = $1;
#do something to $tr
..
$tr;
}iges;
Still, it's probably highly inefficient, but as you're not using perl..
------------------------------
Date: Sun, 03 Aug 2003 17:02:19 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <bgj8jh$okoj7$1@ID-184292.news.uni-berlin.de>
Jane Doe wrote:
> On Sun, 03 Aug 2003 00:32:13 +0200, Gunnar Hjalmarsson
> <noreply@gunnar.cc> wrote:
>>You may want to try:
>>
>> s|<tr>\s*<td>\s*Item2.+?</tr>||sig;
>
> Thanks Gunnar and others. I tried the ideas you gave, and did read the
> PerlDoc "What does it mean that regexes are greedy? How can I get
> around it?" _before_ asking the question... but Privoxy is still
> acting greedy, no matter what I try.
If by that you mean that it also removes 'Item1', it sounds weird.
Maybe Privoxy has its very own regex language. As Abigail pointed out,
there is no /U switch in Perl, and maybe Privoxy doesn't understand \s.
Can't help you more, though. After all, this is a Perl group.
> I'll come up with another trick somehow :-)
Good luck.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 03 Aug 2003 21:06:30 +0200
From: Jane Doe <jane.doe@acme.com>
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <r7nqiv0hgkq6o7ok16dfrgsb3rfjftikjk@4ax.com>
On Sun, 03 Aug 2003 17:02:19 +0200, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:
>If you by that mean that it also removes 'Item1', it sounds weird.
Someone mentioned that I could take a look at the [^...] syntax to
exclude patterns. Might do the trick.
Thank you for your help anyway :-)
JD.
------------------------------
Date: Sun, 3 Aug 2003 17:41:16 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <slrnbir3sc.be1.tadmc@magna.augustmail.com>
Jane Doe <jane.doe@acme.com> wrote:
> Someone mentioned that I could take a look at the [^...] syntax to
> exclude patterns.
You cannot use the [^...] syntax to exclude patterns.
A "character class", even a negated one, matches a *single character*.
You can use the [^...] syntax to exclude _characters_, not patterns.
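A small sketch of the difference, assuming the thing to be excluded is the string "<tr>". The variable names are illustrative only. The negative lookahead in the second pattern is one common way to exclude a sequence and is not something proposed in the thread:

```perl
use strict;
use warnings;

# [^<tr>] is a set of four single characters, not the string "<tr>".
my $no_such_chars  = qr/^[^<tr>]+$/;

# A negative lookahead, checked at every position, excludes the sequence.
my $no_such_string = qr/^(?:(?!<tr>).)+$/s;

for my $text ('data', 'a<tr>b') {
    printf "%-8s class: %-8s lookahead: %s\n", $text,
        ($text =~ $no_such_chars  ? 'match' : 'no match'),
        ($text =~ $no_such_string ? 'match' : 'no match');
}
```

The word 'data' fails the character class simply because it contains a 't', even though it has nothing to do with a table row.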
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 04 Aug 2003 16:09:36 GMT
From: ctcgag@hotmail.com
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <20030804120936.887$9W@newsreader.com>
Jane Doe <jane.doe@acme.com> wrote:
> On Sun, 03 Aug 2003 00:32:13 +0200, Gunnar Hjalmarsson
> <noreply@gunnar.cc> wrote:
> >I'd say it works exactly as expected. It matches everything from the
> >_first_ occurrence of <tr> until the first occurrence of </tr> after
> >the string 'Item2'.
> >
> >You may want to try:
> >
> > s|<tr>\s*<td>\s*Item2.+?</tr>||sig;
>
> Thanks Gunnar and others. I tried the ideas you gave, and did read the
> PerlDoc "What does it mean that regexes are greedy? How can I get
> around it?" _before_ asking the question... but Privoxy is still
> acting greedy,
No, it isn't (or at least you haven't demonstrated such). You just do
not know what it means to be greedy or non-greedy. The nongreedy
quantifier between <tr> and Item2 means it will find the first Item2 after
a <tr>, not that it will find the last <tr> before an Item2.
Xho
------------------------------
Date: Tue, 05 Aug 2003 00:46:18 +0200
From: Jane Doe <jane.doe@acme.com>
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <i0otivk84muadkqnjktbm5v90olienj4nn@4ax.com>
On 04 Aug 2003 16:09:36 GMT, ctcgag@hotmail.com wrote:
>No, it isn't (or at least you haven't demonstrated such).
Mmm... Using the regex I gave (s|<tr>.+Item2.+</tr>||sigU), Privoxy
returns this:
<body>
<table>
</table>
</body>
ie. making it non-greedy with either U or the ? quantifier doesn't
limit the search to the second line.
>The nongreedy quantifier between <tr> and Item2 means it will find the first Item2 after
>a <tr>, not that it will find the last <tr> before an Item2.
OK, but I expected a non-greedy regex to backtrack when finding Item2,
and stop when it found the first occurrence of <tr> before Item2.
Obviously, it doesn't. The search goes on...
Thx anyhow
JD.
------------------------------
Date: Mon, 4 Aug 2003 19:39:21 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: [regex] Can't get it to be ungreedy
Message-Id: <G8CXa.1245$_a4.290903@news20.bellglobal.com>
"Jane Doe" <jane.doe@acme.com> wrote in message
news:i0otivk84muadkqnjktbm5v90olienj4nn@4ax.com...
>
> OK, but I expected a non-greedy regex to backtrack when finding Item2,
> and stop when it found the first occurrence of <tr> before Item2.
> Obviously, it doesn't. The search goes on...
>
You have to keep in mind that regexes do what you tell them to do, not what
you would like them to do. Your regex works from left to right, not from the
middle out. In other words: find a <tr> (which will be the first in the
file, since you have nothing before <tr> in your regex), then do a
non-greedy match until you find Item2, then if you find Item2 keep going
until you find the next </tr>. And that's what it's doing.
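The left-to-right behaviour can be reproduced in plain Perl. The sketch below runs two substitutions on a cut-down copy of the table from the thread. The first is the non-greedy pattern under discussion. The second adds a negative lookahead so the match cannot run across another <tr>. Whether Privoxy's filter engine accepts that construct is an assumption and has not been checked here; the variable names are illustrative.

```perl
use strict;
use warnings;

my $html = <<'HTML';
<table>
 <tr>
  <td>
   Item1
  </td>
 </tr>
 <tr>
  <td>
   Item2
  </td>
 </tr>
</table>
HTML

# Non-greedy, but the match still starts at the FIRST <tr>,
# so both rows are removed.
(my $both_gone = $html) =~ s|<tr>.+?Item2.+?</tr>||si;

# No character consumed before Item2 may begin another <tr>, so the
# match can only start at the <tr> that directly encloses Item2.
(my $row2_gone = $html) =~ s|<tr>(?:(?!<tr>).)*?Item2.*?</tr>||si;

print $both_gone, "---\n", $row2_gone;
```

Non-greedy quantifiers shorten the right-hand end of a match; they never move its starting point to the right, which is why the first pattern still swallows the Item1 row.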
It's almost hopeless to parse any markup language by a single regular
expression. I still haven't found anything that compares with Omnimark (Perl
and James Clark are the next best thing), but I don't suppose there's any
point in telling you to program in another language, since it looks like
you're stuck with what you have. And even with those tools, most html isn't
going to parse cleanly anyway.
I did have a similar problem once, and the only solution I came up with was
to read the file into an array and then read through the array looking for,
in your case, Item2. Whenever you hit a <tr> in a line, save that entry
number to a variable. Then when you find Item2, start looking for the next
</tr>. When you get that entry number, you can then wipe out the
corresponding array entries in between and do a little cleanup on the start
and end (i.e., to make sure nothing precedes the opening <tr> or follows the
</tr> on those lines).
I offer this up only as a kludge and not as an elegant solution. And there
are, as you would find, still a lot of problems inherent in this method
(i.e., everything and more on one line, missing </tr> tags that will destroy
your data, etc.), but when you're getting desperate...
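A rough sketch of that array-scanning kludge, under the loud assumption that every <tr> and </tr> sits on a line of its own, so the cleanup of partial lines is skipped. The helper remove_row and the sample @page are hypothetical, not code from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: drop the <tr>...</tr> block that contains $needle.
# Assumes each <tr> and </tr> is alone on its line.
sub remove_row {
    my ($needle, @lines) = @_;
    my $start;                                  # index of the latest <tr>
    for my $i (0 .. $#lines) {
        $start = $i if $lines[$i] =~ /<tr>/i;
        next unless defined($start) && index($lines[$i], $needle) >= 0;
        # Needle found: look for the next closing tag from here on.
        for my $j ($i .. $#lines) {
            next unless $lines[$j] =~ m{</tr>}i;
            splice @lines, $start, $j - $start + 1;
            return @lines;
        }
        last;                   # no closing tag found: leave the data alone
    }
    return @lines;
}

my @page = map { "$_\n" } ('<table>', '<tr>', '<td>Item1</td>', '</tr>',
                           '<tr>', '<td>Item2</td>', '</tr>', '</table>');
my @kept = remove_row('Item2', @page);
print @kept;
```

Bailing out when no closing tag is found is a deliberate choice here, since a missing </tr> would otherwise delete everything up to the next row.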
Matt
------------------------------
Date: Mon, 04 Aug 2003 03:01:19 GMT
From: "Bob X" <bobx@linuxmail.org>
Subject: Re: Anyone build the Berkeley DBXML library for AS and Windows?
Message-Id: <30kXa.9686$gi.4821527@news2.news.adelphia.net>
According to the docs that is not the case. It has to be built with the same
compiler Perl was. Since I use ActiveState that means VC++.
"David Segleau" <dave@sleepycat.com> wrote in message
news:5saWa.19546$Oz4.6877@rwcrnsc54...
> You can use the free GCC compiler available from Cygwin.
>
> Dave
>
> "Bob" <bobx@linuxmail.org> wrote in message
> news:1001ff04.0307290651.a57c8b4@posting.google.com...
> > I am on Windows and do not have a compiler. Had anyone compiled the
> > Berkeley DBXML library for Windows? I would like to try out DBXML.
> >
> > Bob
>
>
------------------------------
Date: 3 Aug 2003 10:53:26 -0700
From: sstark@us.ibm.com (Scott Stark)
Subject: Best way to search for multiple strings in a line?
Message-Id: <ce94ec71.0308030953.50eed78b@posting.google.com>
I hope this isn't too dumb of a question, but what's the best way to
search for an array of items in the lines of a file? The following
seems inefficient, especially with a large number of search strings:
while($line=<FILE>){
foreach $s (@searchStrings){
push(@found,$line) if($line=~/$s/);
}
}
thanks,
Scott
------------------------------
Date: Sun, 03 Aug 2003 22:37:13 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: Best way to search for multiple strings in a line?
Message-Id: <b7sqivkfdrpr7snjcqheh4cm2t2239f58j@4ax.com>
X-Ftn-To: Scott Stark
sstark@us.ibm.com (Scott Stark) wrote:
>I hope this isn't too dumb of a question, but what's the best way to
>search for an array of items in the lines of a file? The following
>seems inefficient, especially with a large number of search strings:
>
>while($line=<FILE>){
> foreach $s (@searchStrings){
> push(@found,$line) if($line=~/$s/);
> }
>}
my $s = join '|', map { '\Q'.$_.'\E' } @searchStrings;
$s = qr/$s/; #compile regex
while($line = <FILE>){
push (@found,$line) if $line =~ /$s/;
}
if this doesn't work replace first line with,
my $s = join '|', map { qq/\Q$_\E/ } @searchStrings;
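A self-contained sketch of the single-alternation idea, using the built-in quotemeta function instead of \Q...\E. As far as can be told, \Q inside single quotes is never processed, because the escape is handled during string interpolation and not by the regex engine, so quotemeta is the more predictable spelling. The sample strings and lines are invented for illustration:

```perl
use strict;
use warnings;

my @searchStrings = ('a.b', 'c+d', 'plain');    # hypothetical sample data
my @lines = ("xxa.bxx\n", "aXb\n", "c+d\n", "ccd\n", "plain text\n");

# quotemeta escapes regex metacharacters so each string matches literally.
my $alternation = join '|', map { quotemeta } @searchStrings;
my $re = qr/$alternation/;                      # compiled once, outside the loop

my @found = grep { $_ =~ $re } @lines;
print @found;
```

An empty @searchStrings would produce an empty pattern, which matches every line, so a real script would probably want to check for that case first.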
--
Matija
------------------------------
Date: Sun, 03 Aug 2003 21:06:37 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: Best way to search for multiple strings in a line?
Message-Id: <3F2D7959.4020908@rochester.rr.com>
Scott Stark wrote:
> I hope this isn't too dumb of a question, but what's the best way to
> search for an array of items in the lines of a file? The following
> seems inefficient, especially with a large number of search strings:
>
> while($line=<FILE>){
> foreach $s (@searchStrings){
> push(@found,$line) if($line=~/$s/);
> }
> }
...
> Scott
You might benefit from the use of the "study" function (perldoc -f
study) for your task. Check out the examples given there.
Also, if you are truly searching for strings rather than patterns,
consider using the "index" function instead of regexps. It should be
faster most of the time.
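A short sketch of the index approach for fixed strings. The helper name contains_any and the sample data are assumptions made for the example, and no timing claim is made for it:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line contains any of the fixed strings.
sub contains_any {
    my ($line, @needles) = @_;
    for my $needle (@needles) {
        # index returns -1 when the substring is absent.
        return 1 if index($line, $needle) >= 0;
    }
    return 0;
}

my @searchStrings = ('foo', 'a.b');
my @lines = ("has foo here\n", "has aXb here\n", "has a.b here\n");
my @found = grep { contains_any($_, @searchStrings) } @lines;
print @found;
```

Because index compares plain characters, 'a.b' needs no escaping and does not match 'aXb'.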
I assume from the way you coded it above that you know for certain your
strings do not contain any regexp metacharacters. If they do, you will
need to quote them using \Q and \E.
Don't build a new regexp every time through the file read loop -- that
requires each regexp to be compiled for each line. You could use the
alternation metacharacter | to build a single regexp that will match any
of your search strings, as was suggested by a previous poster. Or you
could build an array of regexps and later apply them one at a time.
Maybe like:
for(@searchStrings){push @regexps,qr/\Q$_\E/}
Then later:
while($line=<FILE>){
study $line;
for(@regexps){push @found,$line if $line=~$_}
}
Finally, consider that a line may match more than one of your strings.
If that happens, your code will put the line in @found more than once.
Is that desired behavior? If not, you might want to terminate the
foreach loop when you have a successful match. That would also speed
things up, as additional matches would not be tested for once a match is
found.
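A sketch of the precompiled-regexp variant with an early exit, so that a line matching several strings is stored only once. The sample data is invented, and lexical variables replace the globals used in the thread:

```perl
use strict;
use warnings;

my @searchStrings = ('foo', 'bar');             # hypothetical sample data
my @regexps = map { qr/\Q$_\E/ } @searchStrings;

my @lines = ("foo and bar\n", "only bar\n", "neither\n");
my @found;
for my $line (@lines) {
    for my $re (@regexps) {
        if ($line =~ $re) {
            push @found, $line;
            last;               # stop testing this line after the first hit
        }
    }
}
print @found;
```

Without the `last`, the first sample line would be pushed twice, once for each string it contains.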
HTH.
--
Bob Walton
------------------------------
Date: 4 Aug 2003 05:53:32 -0700
From: simjesse@aol.com (Bohne)
Subject: bluescreens with perl for NT
Message-Id: <bfedec4.0308040453.70441081@posting.google.com>
Hi there
I am having problems with bluescreens.
Mainly when I am using the perl debugger.
I am using Perl v5.6.1
NT 4.00.1381
Norton Antivirus 7.61.934
Has anybody else encountered this problem?
Any help available?
Thanks
Simone
------------------------------
Date: Mon, 4 Aug 2003 15:59:13 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: bluescreens with perl for NT
Message-Id: <kWyXa.2880$ef7.288838@news20.bellglobal.com>
"Bohne" <simjesse@aol.com> wrote in message
news:bfedec4.0308040453.70441081@posting.google.com...
> I am having problems with bluescreens.
> Mainly when I am using the perl debugger.
>
> I am using Perl v5.6.1
> NT 4.00.1381
> Norton Antivirus 7.61.934
>
What does "mainly" mean? Do bsods occur all the time, regardless of whether
you're running Perl scripts or not? Do they only occur when you run your
Perl scripts? Do they occur regardless of what Perl script you're running?
Do they only occur when you run a specific script?
I once had a problem with perl (okay, me...) causing bsods on an NT box, but
all I can remember of that unfortunate incident is that it had something to
do with a system call gone awry. I would look for something suspicious like
that if it is just one script that is causing your problems...
Matt
------------------------------
Date: Mon, 04 Aug 2003 20:53:22 GMT
From: derek / nul <abuse@sgrail.org>
Subject: Re: bluescreens with perl for NT
Message-Id: <8thtivchi2fc2u7qvl7jkdgncun8s35quv@4ax.com>
>"Bohne" <simjesse@aol.com> wrote in message
>news:bfedec4.0308040453.70441081@posting.google.com...
>> I am having problems with bluescreens.
>> Mainly when I am using the perl debugger.
>>
>> I am using Perl v5.6.1
>> NT 4.00.1381
>> Norton Antivirus 7.61.934
>>
Reapply service pack 6a
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5312
***************************************