[17166] in Perl-Users-Digest


Perl-Users Digest, Issue: 4578 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Oct 11 00:10:31 2000

Date: Tue, 10 Oct 2000 21:10:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <971237411-v9-i4578@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Tue, 10 Oct 2000     Volume: 9 Number: 4578

Today's topics:
        regex challenge <martini@invision.net>
    Re: regex challenge <wyzelli@yahoo.com>
    Re: regex challenge <wyzelli@yahoo.com>
    Re: regex challenge (Martien Verbruggen)
    Re: regex challenge <wyzelli@yahoo.com>
    Re: regex challenge <godzilla@stomp.stomp.tokyo>
    Re: regex challenge <james@NOSPAM.demon.co.uk>
    Re: regex challenge (Martien Verbruggen)
    Re: regex challenge <uri@sysarch.com>
    Re: regex challenge <uri@sysarch.com>
    Re: regex challenge <jeffp@crusoe.net>
    Re: regex challenge (Martien Verbruggen)
    Re: regex challenge <stephenk@cc.gatech.edu>
        system processes again <nmr_boy@hotmail.com>
    Re: system processes again <bwalton@rochester.rr.com>
    Re: TCP Servers with IO::Socket <uri@sysarch.com>
    Re: UltraNewbie: grepping weblog <lr@hpl.hp.com>
    Re: Writing multple lines to a file at once? (Brett W. McCoy)
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 10 Oct 2000 22:04:15 -0400
From: Matt Martini <martini@invision.net>
Subject: regex challenge
Message-Id: <39E3CA9F.21ABE4D0@invision.net>

I am looking for a regex (an elegant one) that will convert a mac
address from one form to another.
The SNMP module returns mac addresses in the format:

    " AB 21 34 65 78 09 "

Yes, including the quotes and leading/trailing spaces.  I want to
convert this into a more standard
notation for mac addresses:

    AB21:3465:7809

Now, this is trivial with:

$newval =~
s/"\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s"/$1$2:$3$4:$5$6/;

but that is yucky.  Even:

$newval =~ s/([0-9a-f]+)\s([0-9a-f]+)\s/$1$2:/g;
$newval =~ s/".|."//g;

is better but not perlish.   I'm looking for something that is more
elegant.  The hard part for
me is how do I change every other space into a colon?

Thanks,
Matt

PS email to matt@invision.net requested.

--

___________________________http://www.invision.net/________________________

 Matthew E. Martini, PE        InVision, LLC.       (516) 543-1000
 Chief Technology Officer      matt@invision.net    (516) 864-8896 Fax
_______________________________________________________________________pgp_





------------------------------

Date: Wed, 11 Oct 2000 12:00:59 +0930
From: "Wyzelli" <wyzelli@yahoo.com>
Subject: Re: regex challenge
Message-Id: <LeQE5.10$Dr4.2413@vic.nntp.telstra.net>

"Matt Martini" <martini@invision.net> wrote in message
news:39E3CA9F.21ABE4D0@invision.net...
> I am looking for a regex (an elegant one) that will convert a mac
> address from one form to another.
> The SNMP module returns mac addresses in the format:
>
>     " AB 21 34 65 78 09 "
>
> Yes, including the quotes and leading/trailing spaces.  I want to
> convert this into a more standard
> notation for mac addresses:
>
>     AB21:3465:7809
>

One option is to remove all the spaces and quotes, and then break it
into three sets of 4 'anythings'.

tr/ "//d;
s/(.{4})(.{4})(.{4})/$1:$2:$3/;

If you wish to be particular about what you want to be allowed in the
character places then you will have to be tighter in the regex, which
will make it longer.

tr/ "//d;
s/([0-9A-F]{4})([0-9A-F]{4})([0-9A-F]{4})/$1:$2:$3/;

But that is getting back to where you were...

It depends a lot on how tight your data is that you are going to feed to
the regex.
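
The two-step approach above can be sketched as a small sub. This is only an illustration: the name mac_by_fours is hypothetical, and the anchors and the lower-case hex digits are additions that were not in the post.

```perl
use strict;
use warnings;

# Hypothetical helper: strip spaces and quotes, then split into three groups of four.
sub mac_by_fours {
    my ($mac) = @_;
    $mac =~ tr/ "//d;    # delete every space and double quote
    # Anchored and limited to hex digits (an added assumption), so anything
    # that is not exactly 12 hex digits is rejected instead of half-converted.
    $mac =~ s/^([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})$/$1:$2:$3/
        or return;       # undef in scalar context when the input does not fit
    return $mac;
}

print mac_by_fours('" AB 21 34 65 78 09 "'), "\n";    # prints AB21:3465:7809
```

How strict the pattern should be depends on how much the SNMP output can be trusted, as noted above.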



Wyzelli


--
push@x,$_ for(a..z);push@x,' ';
@z='092018192600131419070417261504171126070002100417'=~/(..)/g;
foreach $y(@z){$_.=$x[$y]}y/jp/JP/;print;






------------------------------

Date: Wed, 11 Oct 2000 12:06:53 +0930
From: "Wyzelli" <wyzelli@yahoo.com>
Subject: Re: regex challenge
Message-Id: <ikQE5.11$Dr4.2717@vic.nntp.telstra.net>

"Wyzelli" <wyzelli@yahoo.com> wrote in message
news:LeQE5.10$Dr4.2413@vic.nntp.telstra.net...
>
> tr/ "file://d;
> s/(.{4})(.{4})(.{4})/$1:$2:$3/;

I don't know how that comes across for you but Outlook keeps changing
the tr command.

It is actually tr| "||d where the | is actually a /.

Wyzelli
--
@x='074117115116032097110111116104101114032080101114108032104097099107101114'=~/(...)/g;
print chr for @x;




------------------------------

Date: Wed, 11 Oct 2000 02:38:42 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: regex challenge
Message-Id: <slrn8u7kl9.8q8.mgjv@verbruggen.comdyn.com.au>

On Wed, 11 Oct 2000 12:00:59 +0930,
	Wyzelli <wyzelli@yahoo.com> wrote:
> "Matt Martini" <martini@invision.net> wrote in message
> news:39E3CA9F.21ABE4D0@invision.net...
> >
> >     " AB 21 34 65 78 09 "
> >
> > Yes, including the quotes and leading/trailing spaces.  I want to
> > convert this into a more standard
> > notation for mac addresses:
> >
> >     AB21:3465:7809
[snip]
> 
> tr/ "//d;
> s/([0-9A-F]{4})([0-9A-F]{4})([0-9A-F]{4})/$1:$2:$3/;

You can make this slightly shorter:

tr/ "//d;
1 while s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | Useful Statistic: 75% of the people
Commercial Dynamics Pty. Ltd.   | make up 3/4 of the population.
NSW, Australia                  | 


------------------------------

Date: Wed, 11 Oct 2000 12:24:16 +0930
From: "Wyzelli" <wyzelli@yahoo.com>
Subject: Re: regex challenge
Message-Id: <AAQE5.13$Dr4.2789@vic.nntp.telstra.net>

"Martien Verbruggen" <mgjv@tradingpost.com.au> wrote in message
news:slrn8u7kl9.8q8.mgjv@verbruggen.comdyn.com.au...

> 1 while s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;

I find the use of the + in the first capture very interesting.  I played
around with that a bit and think it very clever.

My brain copes better with having the size specified in the first group
and the + in the second, but both seem to work quite well.  Another
trick to add to my collection. :)

1 while s/([A-F0-9]{4})([A-F0-9]+)/$1:$2/;

I can almost understand how that works! <grin>

Actually, would that involve less backtracking?  Maybe a benchmark is in
order. (going to fiddle some more....)
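
A benchmark along those lines could be sketched with the standard Benchmark module. The sub names and the sample string are assumptions made for illustration; no timings are claimed here, since they vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $raw = 'AB2134657809';    # sample value, spaces and quotes already removed

# The + in the first capture, as posted by Martien.
sub plus_first {
    my ($s) = @_;
    1 while $s =~ s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;
    return $s;
}

# The size in the first capture and the + in the second.
sub fixed_first {
    my ($s) = @_;
    1 while $s =~ s/([A-F0-9]{4})([A-F0-9]+)/$1:$2/;
    return $s;
}

# The single three-group substitution from the earlier post.
sub one_shot {
    my ($s) = @_;
    $s =~ s/([0-9A-F]{4})([0-9A-F]{4})([0-9A-F]{4})/$1:$2:$3/;
    return $s;
}

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    plus_first  => sub { plus_first($raw) },
    fixed_first => sub { fixed_first($raw) },
    one_shot    => sub { one_shot($raw) },
});
```

cmpthese prints a table of rates with the relative differences between the three candidates.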

Wyzelli
--
#Modified from the original by Jim Menard
for(reverse(1..100)){$s=($_!=1)? 's':'';print"$_ bottle$s of beer on the
wall,\n";
print"$_ bottle$s of beer,\nTake one down, pass it around,\n";
$_--;$s=($_==1)?'':'s';print"$_ bottle$s of beer on the
wall\n\n";}print'*burp*';





------------------------------

Date: Tue, 10 Oct 2000 19:58:06 -0700
From: "Godzilla!" <godzilla@stomp.stomp.tokyo>
Subject: Re: regex challenge
Message-Id: <39E3D73E.431BA3AF@stomp.stomp.tokyo>

Matt Martini wrote:

(snipped)

> The SNMP module returns mac addresses in the format:
 
>     " AB 21 34 65 78 09 "

> convert this into a more standard notation for mac addresses:

>     AB21:3465:7809

TEST SCRIPT:
____________


#!/usr/local/bin/perl

print "Content-Type: text/plain\n\n";

$in = '" AB 21 34 65 78 09 "';

print "Input:\n  $in\n\n";

$in =~ tr/ "//d;

$out = "${\substr($in, 0, 4)}:${\substr($in, 4, 4)}:${\substr($in, 8, 4)}";

print "Output:\n  $out";

exit;


PRINTED RESULTS:
________________

Input:
  " AB 21 34 65 78 09 "

Output:
  AB21:3465:7809


Godzilla!


------------------------------

Date: Wed, 11 Oct 2000 03:58:23 +0100
From: James Taylor <james@NOSPAM.demon.co.uk>
Subject: Re: regex challenge
Message-Id: <ant1102231cbfNdQ@oakseed.demon.co.uk>

In article <39E3CA9F.21ABE4D0@invision.net>, Matt Martini wrote:
> Even:
> 
> $newval =~ s/([0-9a-f]+)\s([0-9a-f]+)\s/$1$2:/g;
> $newval =~ s/".|."//g;
> 
> is better but not perlish.

If two statements is okay, are these neat enough for you:

$newval =~ s/\W//g;               # Removes the spaces and quotes
$newval =~ s/\w{4}\B/$&:/g;       # Inserts the colons

If the style of input data can be relied upon then \w is visually
neater than [0-9a-fA-F].
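
As a sketch of why the \B matters: it only matches where the next character is also a word character, so no colon is added after the final group. The sub name is hypothetical, and a capture group is used here in place of $& purely as a stylistic assumption.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two substitutions above.
sub mac_word_chars {
    my ($mac) = @_;
    $mac =~ s/\W//g;              # removes the spaces and quotes
    $mac =~ s/(\w{4})\B/$1:/g;    # colon after each group of four, but not at the end
    return $mac;
}

print mac_word_chars('" AB 21 34 65 78 09 "'), "\n";    # prints AB21:3465:7809
```

Note that \w also accepts letters outside A-F and the underscore, which is the trade-off mentioned above.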

-- 
James Taylor <james (at) oakseed demon co uk>
PGP key available ID: 3FBE1BF9
Fingerprint: F19D803624ED6FE8 370045159F66FD02



------------------------------

Date: Wed, 11 Oct 2000 03:13:18 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: regex challenge
Message-Id: <slrn8u7mm5.8q8.mgjv@verbruggen.comdyn.com.au>

On Wed, 11 Oct 2000 12:24:16 +0930,
	Wyzelli <wyzelli@yahoo.com> wrote:
> "Martien Verbruggen" <mgjv@tradingpost.com.au> wrote in message
> news:slrn8u7kl9.8q8.mgjv@verbruggen.comdyn.com.au...
> 
> > 1 while s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;
> 
> I find the use of the + in the first capture very interesting.  I played
> around with that a bit and think it very clever.

Added to my toolset a while ago, based on the FAQ entry on how to add
commas to numbers :) 

> My brain copes better with having the size specified in the first group
> and the + in the second, but both seem to work quite well.  Another
> trick to add to my collection. :)
> 
> 1 while s/([A-F0-9]{4})([A-F0-9]+)/$1:$2/;

Yes, that does the same in this case. If, however, the string length
isn't necessarily a whole multiple of the group you create, there is a
difference.

Try both with 'ABCD1234DCBA56'
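
A small script to try exactly that, as a sketch; the sub names are made up for the comparison.

```perl
use strict;
use warnings;

# Greedy + first: the fixed {4} group is peeled off the right-hand end.
sub group_from_right {
    my ($s) = @_;
    1 while $s =~ s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;
    return $s;
}

# Fixed {4} first: groups are peeled off the left-hand end.
sub group_from_left {
    my ($s) = @_;
    1 while $s =~ s/([A-F0-9]{4})([A-F0-9]+)/$1:$2/;
    return $s;
}

for my $in ('AB2134657809', 'ABCD1234DCBA56') {
    printf "%-15s right: %-18s left: %s\n",
        $in, group_from_right($in), group_from_left($in);
}
```

For the 12-character string both give AB21:3465:7809. For the 14-character string the first form leaves the short group at the front (AB:CD12:34DC:BA56) and the second leaves it at the back (ABCD:1234:DCBA:56).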

> I can almost understand how that works! <grin>
> 
> Actually, would that involve less backtracking?  Maybe a benchmark is in
> order. (going to fiddle some more....)

I doubt there will be much, if any, difference. However, if you're
interested in speed, your original solution is going to be much faster
:). I was more interested in economy of keystrokes.

use re 'debug' shows they do pretty much the same thing, in a
different order.

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | Failure is not an option. It comes
Commercial Dynamics Pty. Ltd.   | bundled with your Microsoft product.
NSW, Australia                  | 


------------------------------

Date: Wed, 11 Oct 2000 03:14:41 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: regex challenge
Message-Id: <x73di41334.fsf@home.sysarch.com>

>>>>> "MV" == Martien Verbruggen <mgjv@tradingpost.com.au> writes:

  MV> On Wed, 11 Oct 2000 12:00:59 +0930,
  MV> 	Wyzelli <wyzelli@yahoo.com> wrote:
  >> "Matt Martini" <martini@invision.net> wrote in message
  >> news:39E3CA9F.21ABE4D0@invision.net...
  >> >
  >> >     " AB 21 34 65 78 09 "
  >> >
  >> > Yes, including the quotes and leading/trailing spaces.  I want to
  >> > convert this into a more standard
  >> > notation for mac addresses:
  >> >
  >> >     AB21:3465:7809
  MV> [snip]
  >> 
  >> tr/ "//d;
  >> s/([0-9A-F]{4})([0-9A-F]{4})([0-9A-F]{4})/$1:$2:$3/;

  MV> You can make this slightly shorter:

  MV> tr/ "//d;
  MV> 1 while s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;

bleh! let's cheat a little:

	s/(\w\w) (\w\w) /$1$2:/g ; chop ;

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: Wed, 11 Oct 2000 03:22:24 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: regex challenge
Message-Id: <x7y9zwysd0.fsf@home.sysarch.com>

>>>>> "G" == Godzilla!  <godzilla@stomp.stomp.tokyo> writes:

  G> Matt Martini wrote:
  G> (snipped)

you snipped the important part and didn't read the subject. typical lack
of reading comprehension from moronzilla.


From: Matt Martini <martini@invision.net>
Subject: regex challenge
Newsgroups: comp.lang.perl.misc
Date: Tue, 10 Oct 2000 22:04:15 -0400
Organization: Posted via Supernews, http://www.supernews.com
Reply-To: martini@invision.net
Path: typhoon.ne.mediaone.net!chnws05.ne.mediaone.net!24.147.2.43!chnws02.mediaone.net!newsfeed2.skycache.com!newsfeed.skycache.com!Cidera!newsfeed.direct.ca!look.ca!logbridge.uoregon.edu!sn-xit-03!sn-post-01!supernews.com!corp.supernews.com!not-for-mail
Message-ID: <39E3CA9F.21ABE4D0@invision.net>
X-Mailer: Mozilla 4.75 (Macintosh; U; PPC)
X-Accept-Language: en,en-US
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; x-mac-type="54455854"; x-mac-creator="4D4F5353"
Content-Transfer-Encoding: 7bit
X-Complaints-To: newsabuse@supernews.com
Lines: 41
Xref: chnws05.ne.mediaone.net comp.lang.perl.misc:277593

I am looking for a regex (an elegant one) that will convert a mac
address from one form to another.

see his request? did you read it? can you understand it?

  G> $in =~ tr/ "//d;

  G> $out = "${\substr($in, 0, 4)}:${\substr($in, 4, 4)}:${\substr($in, 8, 4)}";

can anyone find the regular expression in the above code?

so we have code which doesn't answer the question, use of a dubious
operation (which is not legal in perl4, moronzilla's favorite dead
camel) and the longest answer as well.

3 strikes. now get out of this newsgroup and go learn python. they need
you!!

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: Tue, 10 Oct 2000 23:32:04 -0400
From: Jeff Pinyan <jeffp@crusoe.net>
Subject: Re: regex challenge
Message-Id: <Pine.GSO.4.21.0010102320470.14163-100000@crusoe.crusoe.net>

[posted & mailed]

On Oct 10, Matt Martini said:

>The SNMP module returns mac addresses in the format:
>
>    " AB 21 34 65 78 09 "
>
>Yes, including the quotes and leading/trailing spaces.  I want to
>convert this into a more standard
>notation for mac addresses:
>
>    AB21:3465:7809

Depending how exact the MAC address is formatted, this should work:

  ($addr = substr($MAC,2,-2)) =~ s/ /('',':')[$i++%2]/eg;

That alternates between '' and ':' for replacements for spaces.
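
Wrapped in a sub, the same idea might look like the sketch below. The sub name is hypothetical; the lexical counter is an addition so that each call starts counting from zero, which the one-liner only does on its first use.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substr + s///e one-liner above.
sub mac_alternate {
    my ($mac) = @_;
    my $i = 0;    # number of spaces replaced so far in this call
    # substr(..., 2, -2) drops the leading quote+space and the trailing space+quote.
    (my $addr = substr($mac, 2, -2)) =~ s/ /('', ':')[$i++ % 2]/eg;
    return $addr;
}

print mac_alternate('" AB 21 34 65 78 09 "'), "\n";    # prints AB21:3465:7809
```

Even-numbered spaces (0, 2, 4) are deleted and odd-numbered ones (1, 3) become colons.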

-- 
Jeff "japhy" Pinyan     japhy@pobox.com     http://www.pobox.com/~japhy/
PerlMonth - An Online Perl Magazine            http://www.perlmonth.com/
The Perl Archive - Articles, Forums, etc.    http://www.perlarchive.com/
CPAN - #1 Perl Resource  (my id:  PINYAN)        http://search.cpan.org/





------------------------------

Date: Wed, 11 Oct 2000 03:36:47 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: regex challenge
Message-Id: <slrn8u7o26.8q8.mgjv@verbruggen.comdyn.com.au>

On Wed, 11 Oct 2000 03:14:41 GMT,
	Uri Guttman <uri@sysarch.com> wrote:
> >>>>> "MV" == Martien Verbruggen <mgjv@tradingpost.com.au> writes:
> 
>   MV> On Wed, 11 Oct 2000 12:00:59 +0930,
>   MV> 	Wyzelli <wyzelli@yahoo.com> wrote:
>   >> "Matt Martini" <martini@invision.net> wrote in message
>   >> news:39E3CA9F.21ABE4D0@invision.net...
>   >> >
>   >> >     " AB 21 34 65 78 09 "
>   >> >
>   >> > Yes, including the quotes and leading/trailing spaces.  I want to
>   >> > convert this into a more standard
>   >> > notation for mac addresses:
>   >> >
>   >> >     AB21:3465:7809
>   MV> [snip]
>   >> 
>   >> tr/ "//d;
>   >> s/([0-9A-F]{4})([0-9A-F]{4})([0-9A-F]{4})/$1:$2:$3/;
> 
>   MV> You can make this slightly shorter:
> 
>   MV> tr/ "//d;
>   MV> 1 while s/([A-F0-9]+)([A-F0-9]{4})/$1:$2/;
> 
> bleh! let's cheat a little:
> 
> 	s/(\w\w) (\w\w) /$1$2:/g ; chop ;

Heh, good one. You need to cheat a little more though :)

s/(" )?(\w\w) (\w\w) "?/$2$3:/g; chop;

or 

s/(?:" )?(\w\w) (\w\w) "?/$1$2:/g; chop;

Since 

$_ = '" AB 21 34 65 78 09 "';
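
To see the difference on that value, here is a sketch that runs both one-liners on a copy of the string. The sub names are only labels for this comparison.

```perl
use strict;
use warnings;

# The shorter "cheat": it leaves the quotes in place.
sub short_cheat {
    local $_ = shift;
    s/(\w\w) (\w\w) /$1$2:/g;
    chop;    # removes the last character, here the closing quote
    return $_;
}

# The longer "cheat": the optional quote pieces are consumed by the match.
sub longer_cheat {
    local $_ = shift;
    s/(?:" )?(\w\w) (\w\w) "?/$1$2:/g;
    chop;    # removes the last character, here the surplus colon
    return $_;
}

my $in = '" AB 21 34 65 78 09 "';
print short_cheat($in),  "\n";    # prints " AB21:3465:7809:
print longer_cheat($in), "\n";    # prints AB21:3465:7809
```

The shorter version does produce AB21:3465:7809 when the input has no quotes and no leading space, for example 'AB 21 34 65 78 09 '.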

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | 42.6% of statistics is made up on the
Commercial Dynamics Pty. Ltd.   | spot.
NSW, Australia                  | 


------------------------------

Date: Tue, 10 Oct 2000 23:37:00 -0400
From: Stephen Kloder <stephenk@cc.gatech.edu>
Subject: Re: regex challenge
Message-Id: <39E3E05C.1413FB13@cc.gatech.edu>

Matt Martini wrote:

> I am looking for a regex (an elegant one) that will convert a mac
> address from one form to another.
> The SNMP module returns mac addresses in the format:
>
>     " AB 21 34 65 78 09 "
>
> Yes, including the quotes and leading/trailing spaces.  I want to
> convert this into a more standard
> notation for mac addresses:
>
>     AB21:3465:7809
>
> Now, this is trivial with:
>
> $newval =~
> s/"\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s([0-9a-f]+)\s"/$1$2:$3$4:$5$6/;
>
> but that is yucky.  Even:
>
> $newval =~ s/([0-9a-f]+)\s([0-9a-f]+)\s/$1$2:/g;
> $newval =~ s/".|."//g;
>
> is better but not perlish.   I'm looking for something that is more
> elegant.  The hard part for
> me is how do I change every other space into a colon?
>

Here is the most elegant method I can think of:
$old =~ tr/" //d;    #remove spaces and quotes
$new = join ':', ($old =~ /\w{4}/g);    # group by 4's

Of course, \w can be replaced by [\dA-F] if necessary, and {4} can be
replaced by {2,4} if parity is uncertain (for whatever reason).
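
A sketch of the join approach as a sub: a /g match in list context with no capture groups returns every matched substring, which is what join receives. The sub name and the explicit hex class are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: clean the string, then join the groups of four.
sub mac_join {
    my ($old) = @_;
    $old =~ tr/" //d;                             # remove spaces and quotes
    return join ':', $old =~ /[0-9A-Fa-f]{4}/g;   # list-context match returns each group
}

print mac_join('" AB 21 34 65 78 09 "'), "\n";              # prints AB21:3465:7809
print join(':', 'ABCD1234DCBA56' =~ /\w{2,4}/g), "\n";      # prints ABCD:1234:DCBA:56
```

The second line shows the {2,4} variant keeping a short trailing group instead of dropping it.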

--
Stephen Kloder               |   "I say what it occurs to me to say.
stephenk@cc.gatech.edu       |      More I cannot say."
Phone 404-874-6584           |   -- The Man in the Shack
ICQ #65153895                |            be :- think.




------------------------------

Date: Wed, 11 Oct 2000 13:08:05 +1000
From: Joel Mackay <nmr_boy@hotmail.com>
Subject: system processes again
Message-Id: <39E3D995.EBFEE840@hotmail.com>

Hi,
I am trying to write a CGI script that will allow people with a browser
to run some number crunching software on my (web serving) UNIX machine.
 One normally invokes the software with something
like:
dyana </pathname/inputfile &

If i write a perl program that is:

#!/usr/sbin/perl
`dyana </pathname/inputfile &`

and run it from the command line, this works fine (it is a ~30 min
calculation).
However, if i put exactly the same code into a CGI script and execute it
from the browser, the process doesn't seem to start. No error is found
in the error log and the program has a+x permissions...
Can anyone help me here, or do i need to provide more info?

thanks,
joel





------------------------------

Date: Wed, 11 Oct 2000 02:38:07 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: system processes again
Message-Id: <39E3D28C.8E2DB5E1@rochester.rr.com>

Joel Mackay wrote:
> 
> Hi,
> I am trying to write a CGI script that will allow people with a browser
> to run some number crunching software on my (web serving) UNIX machine.
>  One normally invokes the software with something
> like:
> dyana </pathname/inputfile &
> 
> If i write a perl program that is:
> 
> #!/usr/sbin/perl
> `dyana </pathname/inputfile &`
> 
> and run it from the command line, this works fine (it is a ~30 min
> calculation).
> However, if i put exactly the same code into a CGI script and execute it
> from the browser, the process doesn't seem to start. No error is found
> in the error log and the program has a+x permissions...
> Can anyone help me here, or do i need to provide more info?
> 
> thanks,
> joel

You should carefully check the following items (none of which have
anything to do with Perl):

1.  Is "dyana" on the path for the "user" your web server runs under? 
If not, you will need to specify an absolute path to your executable.
2.  Does "inputfile" have read permission by the user your web server
runs under?  If not, you'll have to straighten up the permissions.
3.  I'm sure that in 30 minutes' effort, something must get written back
to disk to indicate the results of that exercise.  Does the user your
web server runs under have permission to write to wherever that gets
written?  Or perhaps it emails it back to the user via a pipe?
4.  Any other tweaky permissions problems which may exist -- it could be
that the user your web server runs under doesn't even have write access
to /tmp, and maybe your executable uses that?  Etc.
5.  Does Perl itself have execute permission by the user under which the
web server is running?  And all the packages you use have read
permission?  Etc.
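
Several of these checks can be made from inside the CGI script itself, so that they run as the web server's user. The following is only a sketch: the sub name, the argument names and all three paths are hypothetical, and Perl's file tests reflect the effective user of whichever process runs them.

```perl
use strict;
use warnings;

# Returns a list of problems found; an empty list means every check passed.
sub preflight {
    my (%arg) = @_;
    my @problems;
    push @problems, "cannot execute $arg{exe}"
        unless -f $arg{exe} && -x $arg{exe};
    push @problems, "cannot read $arg{input}"
        unless -r $arg{input};
    push @problems, "cannot write to directory $arg{outdir}"
        unless -d $arg{outdir} && -w $arg{outdir};
    return @problems;
}

# Hypothetical locations; substitute the real absolute paths.
my @problems = preflight(
    exe    => '/usr/local/bin/dyana',
    input  => '/pathname/inputfile',
    outdir => '/tmp',
);

print "Content-Type: text/plain\n\n";
if (@problems) {
    print "$_\n" for @problems;
}
else {
    print "all checks passed\n";
}
```

Requested from the browser, this reports what the web server's user can and cannot reach, which is often different from what an interactive login sees.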

You may get better results with questions like these in a CGI newsgroup,
such as comp.infosystems.www.authoring.cgi, as your request is off-topic
here.
-- 
Bob Walton


------------------------------

Date: Wed, 11 Oct 2000 02:54:18 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: TCP Servers with IO::Socket
Message-Id: <x7aecc1413.fsf@home.sysarch.com>

>>>>> "TR" == Troy Rasiah <troyr@vicnet.net.au> writes:

  >> the perl cookbook has several server examples

  TR> thanks uri...would you happen to know if the perl cookbook has an online
  TR> resource

go to oreilly.com, search for the cookbook and all the source code is
online there.

but i suggest you buy the book anyhow as you will learn much from it.

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: Tue, 10 Oct 2000 18:41:45 -0700
From: Larry Rosler <lr@hpl.hp.com>
Subject: Re: UltraNewbie: grepping weblog
Message-Id: <MPG.144d6533610acea498ae21@nntp.hpl.hp.com>

In article <8rt9j5$klr$1@news.panix.com> on 9 Oct 2000 20:25:09 GMT, 
James Young <jamesN0@5PAM.young.net> says...
> Hi guys - Still in the early stages of learning perl.  I'm trying to write a
> script to do the following:  
> 
> 1) examine a web log and return the number of hits per each particular
> category in my log. I'm currently using:
> 
> cat /tmp/logfile.tmp | grep "cat=1" | wc -l 
> cat /tmp/logfile.tmp | grep "cat=2" | wc -l 
> cat /tmp/logfile.tmp | grep "cat=3" | wc -l 
> etc...

Though this isn't a shell or Unix tools group, I can't let that go 
without showing you how to do each operation in one process instead of 
three.

  grep -c "cat=1" /tmp/logfile.tmp
  ...

> but I think perl would be the preferred solution if I knew what I was
> doing.

Indeed, because you can gather all the data with one pass through the 
log file.

> 2) Once I get the total number of hits per category, I would also like to get
> total number of hits per category in each individual language (also defined
> in my web logs).  For this, I am using:
> 
> cat access_log | grep -i "cat=2" | grep -i "lang=EN" | wc -l >> result
> etc...

Which means you are reading the file 60 times!  Ugh.

> I would love to be able to define 12 categories in perl (cat = 1, 2, 3, etc.)
> and five languages (EN, SP, FR, TU, PR) and have perl generate a simple
> report with both total hits per category, as well as hits per category, per
> language (each language would have hits for 12 categories).  
> 
> This is what I have so far - I know it's bad, but at least I'm trying....
> 
> Can anyone help give me a jump start?

The following code is a jump start in the sense that it shows one way to 
read the data.  Producing your specific reports by searching the data 
structures is up to you.  Ask for help again if needed.

I have chosen to use an array for the 12 categories (with a wasted first 
spot, but that is easy to deal with or to eliminate by subtracting 1), 
and for each category a hash on the language.  It might be easier to 
generate the reports if you used a hash on the language, each of whose 
entries was an array on the categories.  That's up to you also.

Elements appear in the data structure as they are read from the data, 
not as predefined.  (The Perl literature calls this 'autovivification'.)

The regex assumes that each line has 'cat=' followed by 'lang=' followed 
by the count.  Adapt and adjust as needed.


#!/usr/bin/perl -wl
use strict;
use Data::Dumper;

my @data;

/cat=(\d+)\s+lang=([A-Z]+)\s+(\d+)/ and $data[$1]{$2} += $3
    while <DATA>;

print Dumper \@data;
__END__
cat=1 lang=EN 123
cat=1 lang=EN   1
cat=2 lang=EN   2
cat=2 lang=SP   3
cat=1 lang=SP 456

Output:

$VAR1 = [
          undef,
          {
            'SP' => '456',
            'EN' => '124'
          },
          {
            'SP' => '3',
            'EN' => '2'
          }
        ];
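
The alternative layout mentioned earlier (a hash on the language, each entry an array on the categories) might be sketched as follows, together with one possible report loop. The sub name, the report format and the use of an in-memory list in place of the log file are assumptions; the regex and the sample data are the ones from the code above.

```perl
use strict;
use warnings;

# Builds per-language arrays indexed by category, plus per-category totals.
sub tally {
    my (@lines) = @_;
    my (%by_lang, @total);
    for (@lines) {
        next unless /cat=(\d+)\s+lang=([A-Z]+)\s+(\d+)/;
        $by_lang{$2}[$1] += $3;    # autovivifies the language entry and its array
        $total[$1]       += $3;
    }
    return (\%by_lang, \@total);
}

my ($by_lang, $total) = tally(
    'cat=1 lang=EN 123',
    'cat=1 lang=EN   1',
    'cat=2 lang=EN   2',
    'cat=2 lang=SP   3',
    'cat=1 lang=SP 456',
);

# Slot 0 is the wasted first element, so the report starts at category 1.
for my $cat (1 .. $#$total) {
    printf "cat=%d total=%d\n", $cat, $total->[$cat] // 0;
    for my $lang (sort keys %$by_lang) {
        printf "  %s %d\n", $lang, $by_lang->{$lang}[$cat] // 0;
    }
}
```

The // operator needs perl 5.10 or later; it supplies a zero for categories or languages that never appeared in the data.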

-- 
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: Wed, 11 Oct 2000 01:30:24 GMT
From: bmccoy@news.speakeasy.org (Brett W. McCoy)
Subject: Re: Writing multple lines to a file at once?
Message-Id: <slrn8u7gmn.trk.bmccoy@chapelperilous.net>

On Tue, 10 Oct 2000 21:30:27 GMT, billgerba@my-deja.com
<billgerba@my-deja.com> wrote:

>print <<OUTFILE;
>la dee da la dee da
>la dee da la dee da
>la dee da la dee da
>la dee da la dee da
>la dee da la dee da
>OUTFILE

How about

print OUTFILE <<EOF;
la dee da la dee da
la dee da la dee da
la dee da la dee da
la dee da la dee da
la dee da la dee da
EOF
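
For completeness, a sketch of the same here-document with the filehandle opened first. It uses a lexical filehandle and a temporary file from File::Temp so that it runs anywhere; the sub name and those choices are assumptions, not part of the original question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Writes the whole block to $path with a single print.
sub write_block {
    my ($path) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    print {$out} <<'EOF';
la dee da la dee da
la dee da la dee da
la dee da la dee da
la dee da la dee da
la dee da la dee da
EOF
    close $out or die "Cannot close $path: $!";
    return;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);    # throwaway file for the demo
close $tmp_fh;
write_block($path);
print "wrote $path\n";
```

The terminating EOF has to start in the first column, even when the print statement itself is indented.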

-- 
Brett W. McCoy
                                              http://www.chapelperilous.net
---------------------------------------------------------------------------
I have already given two cousins to the war and I stand ready to sacrifice
my wife's brother.
		-- Artemus Ward


------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 4578
**************************************
