[15621] in Perl-Users-Digest

Perl-Users Digest, Issue: 3034 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri May 12 21:10:33 2000

Date: Fri, 12 May 2000 18:10:14 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <958180214-v9-i3034@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Fri, 12 May 2000     Volume: 9 Number: 3034

Today's topics:
    Re: Regex Question (hopefully, an educated guess) mcnuttj@missouri.edu
    Re: Regular expression ? <thunderbear@bigfoot.com>
    Re: Regular expression ? <jeff@vpservices.com>
    Re: regular expression (Abigail)
    Re: regx question (dc)
    Re: Running perl code thru crontab <s2mdalle@titan.vcu.edu>
        SNMP::Session mcnuttj@missouri.edu
    Re: split the big file <s2mdalle@titan.vcu.edu>
    Re: split the big file <lr@hpl.hp.com>
        two hashes to one <jwboer@NOSPAM.chriscom.nl>
    Re: two hashes to one <lr@hpl.hp.com>
    Re: two hashes to one <jwboer@NOSPAM.chriscom.nl>
        upgrading mparker200@my-deja.com
    Re: upgrading (Brandon Metcalf)
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 12 May 2000 23:49:15 GMT
From: mcnuttj@missouri.edu
Subject: Re: Regex Question (hopefully, an educated guess)
Message-Id: <8fi59r$4ok$1@dipsy.missouri.edu>

Tad McClellan <tadmc@metronet.com> wrote:

:>@TEMP = snmpwalk($community, $ip, $mib);  # snmpwalk is in a sub I wrote
:>SYSTEM: foreach ( @TEMP ) {

: You know that you don't need @TEMP at all, don't you?
:    SYSTEM: foreach ( snmpwalk($community, $ip, $mib) ) {

:>	if ( /^sysDescr.+ : (.*)$/ ) {

I tried that.  The 'snmpwalk' function got called anew at each pass
through the loop, thus the first line of the output was processed ad
infinitum.
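A small check on that point. Perl evaluates the list expression of a foreach loop once, before the first iteration, so Tad's version should call the sub only one time. The sketch below uses a hypothetical fake_snmpwalk() in place of the poster's own sub and counts how often it is called. If the first line really did come back forever, the loop was probably shaped differently. For example, a while loop wrapped around a function call re-runs the call on every pass.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for the poster's snmpwalk() sub:
# counts how often it is called and returns two fixed lines.
sub fake_snmpwalk {
    $calls++;
    return ('sysDescr.0 : box', 'sysName.0 : core1');
}

my @seen;
foreach ( fake_snmpwalk() ) {          # the list is built once, then iterated
    push @seen, $1 if /^\w+\.\d+ : (.*)$/;
}
print "calls=$calls seen=@seen\n";     # prints "calls=1 seen=box core1"
```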

:>So here's the question:  Is that regular expression a good one?  
: If it works it is "a good one".

<grin>

:>More
:>specifically:
:>
:>1)  How do the ^ and $ requirements affect the speed?  Faster or slower?

: Faster (generally). They reduce the number of alternatives 
: for the regex engine to consider.

: And since perl matches patterns left-to-right, anchoring to
: the beginning of the string is (nearly?) always a big gain.

Yes!  <arm pumping>  That's the mindset I've been using.  The more
specific, the faster the pattern.  As few .+'s as possible, etc.

:>2)  How do the specific strings affect the speed?  If the code around it
:>is done properly, I could, for example, use / : (.*)$/ and it would work
:>just fine, but is that slower than using the far-more-specific regex I
:>have above?

: I dunno.

: Why don't you benchmark it and find out for yourself with
: your real data (we can't do it for you 'cause we don't
: have your data)?

:    perldoc Benchmark

Yeah.  Found out about that yesterday.  I'm going to try that.  The
project has taken a huge turn for the better in a more important area (see
below).
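Following Tad's suggestion, here is a minimal Benchmark sketch that compares the anchored, specific pattern with the looser / : (.*)$/. The sample lines are made up to mimic the "name : value" layout of snmpwalk output. The timings only mean something once @lines holds your real data. Note that the two patterns are not equivalent: the loose one matches every line.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up lines in the "name : value" layout; substitute real snmpwalk output.
my @lines = (
    'sysDescr.0 : Hypothetical switch firmware 1.0',
    'sysObjectID.0 : 1.3.6.1.4.1.0',
    'sysUpTime.0 : 12345',
) x 100;

my $specific = qr/^sysDescr.+ : (.*)$/;   # the pattern from the original post
my $loose    = qr/ : (.*)$/;              # the shorter alternative asked about

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, {
    specific => sub { for (@lines) { my ($v) = /$specific/ } },
    loose    => sub { for (@lines) { my ($v) = /$loose/ } },
});
```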

:>3)  If won't work in all cases, 

: Show us some of those.

: Or even one of those.

<grin>  Typo.  'If' == 'It'.  What I meant was that I can't always use
split with the pattern mentioned below.

:>but someone mentioned before that 'split'
:>might work better in a case like this.  I could split on /\s+:\s+/, since
:>the colon will never appear on the "left" side of the string (it might
:>appear on the "right", 

: Have you considered the implications of using the 3rd argument
: to split?

: If you set it to 2, you get everything to the left of the first
: colon as a list element, and all of the rest of the string in
: the other list element:

:    my($left, $right) = split /\s+:\s+/, $_, 2;

This is the LIMIT argument, right?  Yet another thing I've learned today.
Boy, the last two weeks have been educational.  :-P

:>but by then, the pattern is already matched).  As
:>long as I don't use s///g, it'll only match once, and I'll be all right,
:>right?

: You never want to use s/// or s///g or even m//g with split().

: It will not match only once. split() keeps applying it over
: and over to generate its return list.

Ahhh...  I'm glad I haven't tried it yet.  If I go with split(), I will
*have* to use the LIMIT arg.
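A quick illustration of Tad's two points, using a hypothetical line that has a colon on the right-hand side. Without LIMIT, split() applies the pattern at every " : ". With LIMIT 2, it stops after the first one.

```perl
use strict;
use warnings;

# Hypothetical line with a second " : " inside the value.
my $line = 'sysDescr.0 : Firmware 2.1 : build 7';

my @all            = split /\s+:\s+/, $line;     # no LIMIT: splits at every match
my ($left, $right) = split /\s+:\s+/, $line, 2;  # LIMIT 2: at most two fields

print scalar(@all), " fields without LIMIT\n";   # prints "3 fields without LIMIT"
print "left=[$left] right=[$right]\n";           # right keeps "Firmware 2.1 : build 7"
```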

:>4)  How much do (\w+), (.+), (\s+), (\d+), etc. slow things down?  

: Ummm, compared to what?

Well, mostly compared to each other.  I imagine it depends upon the data,
though, and the placement of those regexes in the pattern.  I'll just have
to use Benchmark and see where I get.  This was rather a dumb question.

:>I avoid
:>the use of things like (.*) (the code shown above is an exception), but
:>what about the + quantifier and the parentheses?


: + is likely faster than *

: .* can match zero times, so if the engine doesn't match any
: chars at that position, it has to keep trying to match.

: But if it can't match any characters at the position corresponding 
: to .+, it can stop right there and return false.

That's what I thought.  Thanks!

: parens slow things down due to copying the matched chars to
: memory, so don't use "memory parens":

:    
:      if ( /^sysDescr.+ : (?:.*)$/ ) {
:                          ^^^  ^
:                          ^^^  ^ does grouping without memory

: Of course that isn't going to help if you later need the
: matched chars...

I haven't had to use grouping yet where I haven't later needed the data
anyway.  In other words, if (\.\.\.) is there, I need $1 = '...'.

I had to write that sentence very carefully.  :-)
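A small sketch of the difference Tad describes. After a match against the capturing pattern, $1 holds the text after " : ". After a match against the (?:...) version, $1 is undefined because that pattern has no capture group. In this particular pattern the group has no quantifier, so plain .* would behave the same as (?:.*).

```perl
use strict;
use warnings;

my $s = 'sysDescr.0 : box';    # made-up sample line
my ($captured, $after_noncapture);

if ($s =~ /^sysDescr.+ : (.*)$/)   { $captured = $1 }          # memory parens
if ($s =~ /^sysDescr.+ : (?:.*)$/) { $after_noncapture = $1 }  # no group 1 here

print "captured: $captured\n";
print "after (?:...): ", (defined $after_noncapture ? $after_noncapture : 'undef'), "\n";
```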

:>Speed is the objective here.  

: So you have already profiled your code then?

Of course not!  <silly grin>  I don't know how yet.

<deep breath, apologetic look>  I'm learning as quick as I can, guys.  How
do I profile it (http://what.where.[org|net|com|edu])?  :-)

: And you know for a fact that it is this part of your code
: that is slow?

Actually, no.  I'm fairly sure that the patterns I have written are fast,
especially given your answers above.  The slowest part of my code is an
external program (snmpwalk - part of scotty Tcl/Tk tools) that I call
rather often.  More below...

I just wanted to see what I could improve while I was learning how to 'use
SNMP' (I love perl that's grammatically correct in English).

:>Ideas and comments welcome.  

: See above.

Many thanks!

I have - at last! - figured out the SNMP module included with ucd-snmp (I
couldn't download it from CPAN for some reason).  I'll post my next
question in a moment.  It's finally a halfway intelligent and very
specific question.

:>Flames can be
:>sent to /dev/null  

: Oh no!

: I gotta start reading the complete article before I decide
: to answer.

: If I had seen that earlier I would have killfiled you and moved
: on. But since I have it typed in already, I'll only do one
: of those things.

: So long.

: [ People say that when they know they are doing something
:   wrong. I find apologizing for what you're about to do,
:   and then doing it anyway to be most offensive.

:   It seems strange to me that you included it, because you
:   seemed to have followed netiquette as far as I can tell...
: ]

:>Consider the various
:>'perlfaq' man pages thumbed through, but not *combed* through.  :-)

: That's all that 'netiquette requires, so you had nothing to fear.

<ponders>

Ahhh...  I didn't realize it might be taken that way.  Was just being
silly.  The "flames" comment and the line "#include <std.dsclmr>" used to
be in my stock tagline.

<shrug>  Netiquette is different from NG to NG.  You can get away with
stuff in comp.os.linux.* that you can't even consider in comp.lang.perl.
I can see both sides, really, but I tend to side with the newbie, only
because I have *been* the newbie so often.

And, as you see, a newbie willing to learn can go from "What's a module?"
to "Where are the specs for SNMP::Session::getnext?" in a week or two, IF
s/he can get some help.

Which I did.  Gratuitous thanks to the people who explained 'perldoc' and
CPAN to me.  The 'M' in 'RTFM' got a lot thicker yesterday.  :-)

Later...

--J


------------------------------

Date: Sat, 13 May 2000 00:08:25 +0200
From: =?iso-8859-1?Q?Thorbj=F8rn?= Ravn Andersen <thunderbear@bigfoot.com>
Subject: Re: Regular expression ?
Message-Id: <391C80D9.7B2F9CB7@bigfoot.com>

"Godzilla!" wrote:

> paying attention. As you know, I often say,
> 
> "...read for comprehension."

This works best when the writer writes for clarity.

> Would you expect less from an English professor?

No.  Please enlighten us with brilliant, clear code.  Please stay with
standard 7-bit ASCII.


-- 
  Thorbjørn Ravn Andersen          "... plus .. Tubular Bells!"
  http://www.mip.sdu.dk/~ravn/


------------------------------

Date: Fri, 12 May 2000 18:02:32 -0700
From: Jeff Zucker <jeff@vpservices.com>
Subject: Re: Regular expression ?
Message-Id: <391CA9A8.51BC00F1@vpservices.com>

"Godzilla!" wrote:
> 
> > > 305¦0_0¦2.01¦A
> 
> 
> What you are looking at is not a pipe symbol.
> 

Fine, if that is the deepest level at which you are willing to discuss
your own code, there really is no point in discussing it with you.  I
have tried several times in several ways to open a dialogue with you and
you consistently refuse to respond with anything but obfuscation and
misleading counter arguments that have no bearing on the original
criticism.  I give up.  Goodbye.

-- 
Jeff


------------------------------

Date: 12 May 2000 22:30:37 GMT
From: abigail@foad.org (Abigail)
Subject: Re: regular expression
Message-Id: <slrn8hp1gb.bgd.abigail@ucan.foad.org>

On Thu, 11 May 2000 11:07:42 +0200, Nils <ii4533@fh-wedel.de> wrote:
++ Hi!
++ I am looking for a regular expression which matches something like
++ this:
++  <!-- Back:Text -->
++ :Text might be anything except '-->' (something like ':this is a link
++ <a href=url> go here </a>' SHOULD be a valid expression for Text!). It
++ should also be valid if :Text is not present.
++ 
++ the expression I've got at moment is :
++ /<!--\s*Back(:.*)?\s*-->/
++ my problem is that (:.*)? also matches '-->', but I have no idea how to
++ say that it should NOT match '-->'.
++ 
++ anyone an idea? thanks nils

One of the many ways:

  /^<!--\s*Back:?([^-]+|-(?!->))*\s*-->$/;
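A quick check of Abigail's pattern, unchanged, against a few illustrative strings that are not from the original post. The alternation ([^-]+|-(?!->)) consumes any run of non-dashes, or a single dash that does not begin "->". Because of that, the body can never swallow the closing "-->".

```perl
use strict;
use warnings;

# Abigail's pattern, as posted.
my $re = qr/^<!--\s*Back:?([^-]+|-(?!->))*\s*-->$/;

my @samples = (
    '<!-- Back:Text -->',
    '<!-- Back -->',
    '<!-- Back:this is a link <a href=url> go here </a> -->',
    '<!-- Back:a-b -->',          # a lone dash inside the text is fine
    '<!-- Back:a --> b -->',      # the body may not run past the first -->
);
for my $s (@samples) {
    printf "%-58s %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```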



Abigail


------------------------------

Date: Sat, 13 May 2000 00:15:41 GMT
From: ANTISPAMfartknocker@cyberdude.com (dc)
Subject: Re: regx question
Message-Id: <391c9cf3.448740624@24.2.2.74>

Thanks for your suggestions.  Anchoring made a pretty big difference.
Surprisingly, at least to me, using the mini-matches was actually
slower by a very small degree.  I'm pretty sure this has to do with
the test data I ran it on -- /usr/dict/words under Solaris and Redhat.
I used the Benchmark perl module (following the example in _Effective Perl
Programming_) and ran them both through the words file 100 times.  I
even switched the order of testing, though I'm not sure that makes a
difference (judging from the times it did not).

>>>>>> "Bart" == Bart Lateur <bart.lateur@skynet.be> writes:
>
>Bart> At least, anchoring the regex will improve speed a lot in the case of
>Bart> failure, because it prevents just that backtracking.
>
>Bart> 	/^(?=.*a)(?=.*e)(?=.*i)(?=.*o).*u/
>

On 11 May 2000 17:08:32 -0700, merlyn@stonehenge.com (Randal L.
Schwartz) wrote:
>
>No, the best you can do is to use mini-matches, not maxi-matches:
>
>/^(?=.*?a)(?=.*?e)(?=.*?i)(?=.*?o).*?u/
>
>That way you find the first a, not the last one!
>


   Anti-Spam Address in Use.  Remove the AntiSpam
    in above address before replying to this
    message.  


------------------------------

Date: Fri, 12 May 2000 17:21:27 -0500
From: "David Allen" <s2mdalle@titan.vcu.edu>
Subject: Re: Running perl code thru crontab
Message-Id: <8fi08l$347$2@bob.news.rcn.net>

In article <8fhfaj$btm$1@nnrp1.deja.com>, vpanicker@my-deja.com wrote:
> I am facing problems running a perl code thru the crontab , as my env
> variable LD_LIBRARY_PATH is not getting set, which is not letting me
> call shared libraries. It seems %ENV in perl passes impoverished env.
> setting in cronjobs.

Yeah, it's a bitch.  :(

> Can anyone suggest a way out on this. Thanx With regards vinod

A couple of things - you can RTFM on cron; I'll bet it has a
configuration file that lets you set what the environment of executed
cron jobs should be.

You could add whatever was in your LD_LIBRARY_PATH to your
usual system library configuration file.  This is often /etc/ld.so.conf on
Linux, anyway; run ldconfig afterward so the change takes effect.

You could write a wrapper script and execute that instead of your perl
job.  I.e. if you want to run 'foobar.pl' every minute, make your cron
entry:

* * * * * /path/to/foobar.wrapper

and then make foobar.wrapper (executable, e.g. with chmod +x) be:

#!/bin/sh
LD_LIBRARY_PATH=whatever
export LD_LIBRARY_PATH
exec /path/to/perl /path/to/foobar.pl

-- 
David Allen
http://opop.nols.com/
----------------------------------------
Firearms are second only to the Constitution in importance; they are the 
peoples' liberty's teeth. 
	-- George Washington



------------------------------

Date: 13 May 2000 00:34:39 GMT
From: mcnuttj@missouri.edu
Subject: SNMP::Session
Message-Id: <8fi7uv$add$1@dipsy.missouri.edu>

Okay, here's (finally) a decent question:

Where can I find some English docs/examples, in man, perldoc, or HTML
form, for the SNMP module?  Here's what I'm trying to do:

$SNMP::use_sprint_value = 1;
$sess = new SNMP::Session(DestHost => $ARGV[0], Community => $ARGV[1]);
$val = 'true';
while ( $val eq 'true' ) {
        $mib = 'rcStgPortEnableStp';
        $val = $sess->getnext($mib);                
        print "Got $val.\n";
}

Using snmpwalk (scotty) to walk the MIB shows its form to be:

rcStgPortEnableStp.1.1 : <value>
rcStgPortEnableStp.2.1 : <value>
rcStgPortEnableStp.3.1 : <value>
rcStgPortEnableStp.4.1 : <value>
 .
 .
rcStgPortEnableStp.25.1 : <value>
rcStgPortEnableStp.33.1 : <value>
rcStgPortEnableStp.34.1 : <value>
rcStgPortEnableStp.35.1 : <value>
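If the scotty snmpwalk output stays in the picture, here is a sketch of pulling the two indices apart. It assumes every line has exactly the "name.port.stg : value" shape shown above. The sample values are made up.

```perl
use strict;
use warnings;

# Sample lines in the "name.port.stg : value" layout shown above (values made up).
my @walk = (
    'rcStgPortEnableStp.1.1 : true',
    'rcStgPortEnableStp.25.1 : false',
    'rcStgPortEnableStp.35.1 : true',
);

my %enabled;    # port ID => value, for spanning tree group 1 only
for (@walk) {
    # Captures: MIB name, first index (port ID), second index (STG), value.
    next unless /^(\w+)\.(\d+)\.(\d+) : (.*)$/;
    my ($name, $port, $stg, $value) = ($1, $2, $3, $4);
    next unless $name eq 'rcStgPortEnableStp' && $stg == 1;
    $enabled{$port} = $value;
}
print "port $_: $enabled{$_}\n" for sort { $a <=> $b } keys %enabled;
```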

The first \d+ value is a "port ID" that I know how to translate.  I don't
know where the second value (1) comes from, except that the MIB suggests
something about all the ports living in Spanning Tree Group 1 (STG).
Given the context, this makes sense to *me*, *however*...

I want to know how to teach perl to get those values and query the *next*
one, regardless of the fact that they aren't sequential.  This would be a
huge kludge:

for ( my $x=0 ; $x < MAXPORTID ; $x++ ) { ... }

Besides, the code at the top *runs*, but it runs *infinitely*, always
querying "rcStgPortEnableStp.1.1" (which infinitely returns "true" in my
case).

SO...  How do I find out how to adjust the MIB so that it queries the next
sensible value?

Thanks!

--J


------------------------------

Date: Fri, 12 May 2000 17:28:15 -0500
From: "David Allen" <s2mdalle@titan.vcu.edu>
Subject: Re: split the big file
Message-Id: <8fi0le$6rs$1@bob.news.rcn.net>

In article <2089d0c2.02b7e7ed@usw-ex0104-087.remarq.com>, Samay
<samay1NOsaSPAM@hotmail.com.invalid> wrote:
> Hi, I have log file with size around 500 MB. I would like to divide into
> small files based upon the particular word it contains.. What are the
> effective ways?
> 
> I am looking for effective ways for code and performance. My patterns
> are simple strings. such as 'japan' or '/america/newyork/'
> 
> I did.. open FILE1, ">file1"; open FILE2, ">file2";
> .
> open FILE50,">file50";
>
> while(<IN>){
>    if(/pattern1/){
>       print FILE1; next;
>    }
>    if(/pattern2/){
>       print FILE2; next
> 
> ..for 50 patterns..
> }

If you want to match 50 patterns against a schload of lines, you should
precompile them and shove them in an array that you can run through
really quickly.  If you need to know which regex matched, then maybe
use a hash or something...maybe a bit like this:

my %REGEXS = ( "Match Numbers"  => "\d",
               "Match anything" => ".");

while(<FILE>){
    foreach $key (keys %REGEXS){
        if(m/$REGEXS{$key}/g){
            print "$key matched.\n";
        }
    }
}

You'll get a performance boost by compiling the regular expressions once instead
of each time through the loop.

> 
> This will give me average time.. I am looking for something better.
> Memory could be issue, space is not an issue..

One way to find split points might be to use tell() to find out how many
bytes into the file you are, rather than counting lines or whatnot.  When you
find a split point, get the position with tell(), and then use a subroutine that
opens the file again, reads only the data between the last tell() point and the
current one, and saves that into a separate file.  (Do it in 1024-byte chunks or
whatever, NOT by lines, so it goes quickly and doesn't munch too much memory.)

Hope it helps...
-- 
David Allen
http://opop.nols.com/
----------------------------------------
Firearms are second only to the Constitution in importance; they are the 
peoples' liberty's teeth. 
	-- George Washington



------------------------------

Date: Fri, 12 May 2000 16:07:56 -0700
From: Larry Rosler <lr@hpl.hp.com>
Subject: Re: split the big file
Message-Id: <MPG.13862ea9299f819e98aa6c@nntp.hpl.hp.com>

In article <8fi0le$6rs$1@bob.news.rcn.net> on Fri, 12 May 2000 17:28:15 
-0500, David Allen <s2mdalle@titan.vcu.edu> says...

 ...

> If you want to match 50 patterns against a schload of lines, you should
> precompile them and shove them in an array that you can run through
> really quickly.  If you need to know which regex matched, then maybe
> use a hash or something...maybe a bit like this:
> 
> my %REGEXS = ("Match Numbers"  => "\d",
>               "Match anything" => ".");

Perl 5.6.0 with warnings enabled will tell you what you did wrong on the 
first line.  Hurray!
 
> while(<FILE>){
>     foreach $key (keys %REGEXS){
> 	if(m/$REGEXS{$key}/g){
>              print "$key matched.\n";
>         }
>     }
> }
> 
> You'll get a performance boost by compiling the regular expressions once instead
> of each time through the loop.

Yes, but your code doesn't compile them (even after correction to single 
quotes).  You need 'regex' quotes:

  my %REGEXS = ("Match Numbers"  => qr/\d/,
                "Match anything" => qr/./);
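The sketch below combines Larry's qr// correction with the original problem. It compiles each literal string once and sends each line to the first output handle whose string the line contains, which mirrors the "next" in the original loop. split_log() is a hypothetical helper, and the demo uses in-memory filehandles (Perl 5.8 or later) in place of real files. Because the patterns are plain strings, \Q...\E keeps characters such as / and . literal. index() would be another reasonable choice.

```perl
use strict;
use warnings;

# Route each input line to the first output handle whose literal string it contains.
# $rules is an array ref of [ literal_string, output_filehandle ] pairs.
sub split_log {
    my ($in, $rules) = @_;
    # Compile each literal once; \Q...\E (quotemeta) keeps '/' and '.' literal.
    my @compiled = map { my ($s, $fh) = @$_; [ qr/\Q$s\E/, $fh ] } @$rules;
    while (my $line = <$in>) {
        for my $c (@compiled) {
            if ($line =~ $c->[0]) {
                print { $c->[1] } $line;
                last;    # first match wins, like the `next` in the original loop
            }
        }
    }
}

# Demo with in-memory handles; real code would open the log and output files.
my $log = "x japan y\nfoo /america/newyork/ bar\nnothing here\n";
open my $in, '<', \$log or die $!;
my ($jp, $ny) = ('', '');
open my $jp_fh, '>', \$jp or die $!;
open my $ny_fh, '>', \$ny or die $!;
split_log($in, [ [ 'japan', $jp_fh ], [ '/america/newyork/', $ny_fh ] ]);
close $_ for $jp_fh, $ny_fh;
print "japan: $jp", "newyork: $ny";
```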

 ...

-- 
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: Sat, 13 May 2000 01:26:03 +0200
From: "Jan Willem Boer" <jwboer@NOSPAM.chriscom.nl>
Subject: two hashes to one
Message-Id: <fq0T4.28$b3.1797@24hoursnet-reader-1>

Hi,
i am new to perl.

I try to make one hash out of 2. This code doesn't seem to work:

    $numberOfRecords = push(%database, %addRecordInfo);

How should i do this?

Thanks,
Jan Willem




------------------------------

Date: Fri, 12 May 2000 17:06:28 -0700
From: Larry Rosler <lr@hpl.hp.com>
Subject: Re: two hashes to one
Message-Id: <MPG.13863c5c48c7940898aa6f@nntp.hpl.hp.com>

In article <fq0T4.28$b3.1797@24hoursnet-reader-1> on Sat, 13 May 2000 
01:26:03 +0200, Jan Willem Boer <jwboer@NOSPAM.chriscom.nl> says...
> Hi,
> i am new to perl.
> 
> I try to make one hash out of 2. This code doesn't seem to work:
> 
>     $numberOfRecords = push(%database, %addRecordInfo);
> 
> How should i do this?

Permit me not to use your names, especially the one with the studlyCaps.

This makes a new hash out of two:

    %new = (%data, %add);

This adds the contents of a second hash to an existing hash:

    %data = (%data, %add); # SLOW; don't do it this way!

This uses the least memory:
 
    while (my ($key, $value) = each %add) { $data{$key} = $value }

This is likely to be faster, but uses more memory:

    @data{keys %add} = values %add; # I like this hash slice best.

In each case, if there are duplicate keys, the second one prevails.
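Here is a runnable sketch of Larry's merges using small made-up hashes. It shows that each form ends up with the same contents and that %add wins on the duplicate key b. The slice version relies on keys and values returning their elements in the same order, which Perl guarantees as long as the hash is not modified in between.

```perl
use strict;
use warnings;

my %data = (a => 1, b => 2);
my %add  = (b => 20, c => 30);

my %new = (%data, %add);                 # new hash; later pairs win

my %copy1 = %data;                       # each(): least memory
while (my ($key, $value) = each %add) { $copy1{$key} = $value }

my %copy2 = %data;                       # hash slice
@copy2{keys %add} = values %add;

for my $h (\%new, \%copy1, \%copy2) {
    print join(',', map { "$_=$h->{$_}" } sort keys %$h), "\n";   # a=1,b=20,c=30
}
```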

-- 
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: Sat, 13 May 2000 02:26:40 +0200
From: "Jan Willem Boer" <jwboer@NOSPAM.chriscom.nl>
Subject: Re: two hashes to one
Message-Id: <4j1T4.29$b3.1792@24hoursnet-reader-1>

"Larry Rosler" <lr@hpl.hp.com> schreef in bericht
news:MPG.13863c5c48c7940898aa6f@nntp.hpl.hp.com...

> This makes a new hash out of two:
>
>     %new = (%data, %add);
>
> This adds the contents of a second hash to an existing hash:
>
>     %data = (%data, %add); # SLOW; don't do it this way!
>
> This uses the least memory:
>
>     while (my ($key, $value) = each %add) { $data{$key} = $value }
>
> This is likely to be faster, but uses more memory:
>
>     @data{keys %add} = values %add; # I like this hash slice best.
>
> In each case, if there are duplicate keys, the second one prevails.


wow thanks!

i think the first options (although slow) may work for me. The problem is
that $value in your example is another hash. So: the %add hash looks like
(1, {"name", "mrLuup", "address", "Fastlane"}, 2, { etc})

or doesn't that make difference?
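On the follow-up question: when the values are hash references, merging copies the references, not the inner hashes. The merged hash and %add therefore end up pointing at the same records. The sketch below uses the record layout from the post; record 2 is made up.

```perl
use strict;
use warnings;

my %database = (1 => { name => 'mrLuup',  address => 'Fastlane'  });
my %add      = (2 => { name => 'someone', address => 'Elsewhere' });  # made-up record

@database{keys %add} = values %add;   # copies the references, not the inner hashes

$add{2}{address} = 'Changed';
print $database{2}{address}, "\n";    # prints "Changed": both hashes share record 2
```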

JW




------------------------------

Date: Fri, 12 May 2000 22:18:56 GMT
From: mparker200@my-deja.com
Subject: upgrading
Message-Id: <8fi002$vef$1@nnrp1.deja.com>

How can I tell which perl modules have been installed on a Solaris
system?  I think that they should be installed in site_perl but that is
not always the case.

Thanks for your help


Sent via Deja.com http://www.deja.com/
Before you buy.


------------------------------

Date: 12 May 2000 23:10:06 GMT
From: bmetcalf@baynetworks.com (Brandon Metcalf)
Subject: Re: upgrading
Message-Id: <8fi30e$jeb$1@spinner.corpeast.baynetworks.com>

mparker200@my-deja.com writes:

 > How can I tell which perl modules have been installed on a Solaris
 > system?  I think that they should be installed in site_perl but that is
 > not always the case.

perldoc perllocal
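perllocal lists the modules installed with "make install". The sketch below uses the core ExtUtils::Installed module instead, which builds its list from the .packlist files that "make install" writes. Modules installed some other way may not appear in its output.

```perl
use strict;
use warnings;
use ExtUtils::Installed;

# Builds its list from the .packlist files that "make install" leaves under @INC.
my $inst    = ExtUtils::Installed->new;
my @modules = $inst->modules;

for my $module (sort @modules) {
    my $version = $inst->version($module);
    printf "%-40s %s\n", $module, defined $version ? $version : '(no version)';
}
```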

-brandon


------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 3034
**************************************
