[31429] in Perl-Users-Digest


Perl-Users Digest, Issue: 2681 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Nov 18 00:09:41 2009

Date: Tue, 17 Nov 2009 21:09:05 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 17 Nov 2009     Volume: 11 Number: 2681

Today's topics:
    Re: Convert HTML to XML sln@netherlands.com
    Re: My perl script to run arbitrary tasks in parallel despen@verizon.net
    Re: My perl script to run arbitrary tasks in parallel <ben@morrow.me.uk>
    Re: My perl script to run arbitrary tasks in parallel <ignoramus28865@NOSPAM.28865.invalid>
    Re: parse the logs and search for matching string <cyrusgreats@gmail.com>
    Re: parse the logs and search for matching string <tadmc@seesig.invalid>
    Re: parse the logs and search for matching string <cyrusgreats@gmail.com>
    Re: parse the logs and search for matching string sln@netherlands.com
    Re: parse the logs and search for matching string <cyrusgreats@gmail.com>
    Re: parse the logs and search for matching string <tadmc@seesig.invalid>
    Re: parse the logs and search for matching string <tadmc@seesig.invalid>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 17 Nov 2009 17:12:50 -0800
From: sln@netherlands.com
Subject: Re: Convert HTML to XML
Message-Id: <15i6g5pg4p7f7gsnvcpuhtae9slv2oe950@4ax.com>

On Mon, 16 Nov 2009 13:40:49 -0800 (PST), Ninja Li <nickli2000@gmail.com> wrote:

>On Nov 16, 2:04 pm, Lawrence Statton <yankeeinex...@gmail.com> wrote:
>> Ninja Li <nickli2...@gmail.com> writes:
>>
>> HTML::TreeBuilder really is the "right" tool for parsing HTML you get
>> from the web. One of its major strengths is it can generate reasonable
>> parse-trees from even unreasonable HTML.
>>
>> Keep in mind that scraping earnings.com's website may be in violation of
>> their terms of use, and you should make sure you have appropriate
>> permission before doing that in an automated way.
>>
>> --L
>
>Thanks for your help and concern. We are a client of the website and
>are trying to move from an Excel-based program to Perl.

I looked at the source to the page link you provided.
I hope that's not a violation and the Feds aren't gonna come get me.

I wouldn't call it scraping would you? I'd guess Yaaahooei/Googleballs
own the web cause they do it all the time.

I've heard there is some kind of Perl module that will turn table data
into some kind of hash for you. I have personal software (written by me)
that sucks table data out of html/xml like buttaa. Unfortunately you can't
get it.

Look for that module on cpan or somewhere.

-sln


------------------------------

Date: Tue, 17 Nov 2009 21:15:55 -0500
From: despen@verizon.net
Subject: Re: My perl script to run arbitrary tasks in parallel
Message-Id: <iciqd8h92c.fsf@verizon.net>

Ignoramus28865 <ignoramus28865@NOSPAM.28865.invalid> writes:

> On 2009-11-17, despen@verizon.net <despen@verizon.net> wrote:
>> Ignoramus28865 <ignoramus28865@NOSPAM.28865.invalid> writes:
>>
>>> On 2009-11-17, despen@verizon.net <despen@verizon.net> wrote:
>>>> Ignoramus30118 <ignoramus30118@NOSPAM.30118.invalid> writes:
>> small_%.jpg: %.jpg
>> 	convert -geometry 320x240 $< $@
>> 	chmod a+r $@
>> small_%.gif: %.gif
>> 	convert -geometry 320x240 -interlace plane  $< $@
>> 	chmod a+r $@
>>
>> Thumbnails only get created when the image is updated.
>> The same Makefile, uploads everything that's changed to the web server.
>
> I think that you have a great approach, but in my case, I do not want
> to keep the originals. (which may be addressed by adding rm in the
> makefile, I am not sure)
>
> Also, would this makefile make small_small_small_mypic.jpg if you
> invoke it a few times, starting with mypic.jpg? Am I missing anything?

No, you don't invoke it a few times.
Here is an example:

large:=$(wildcard *.jpg *.gif)

all: $(addprefix small_,$(large))

small_%.jpg: %.jpg
	convert -geometry 320x240 $< $@
	chmod a+r $@
small_%.gif: %.gif
	convert -geometry 320x240 -interlace plane  $< $@
	chmod a+r $@

What that does is first wildcard expand all the .jpg and .gif files
in the current directory into the variable "large".

Then it declares the default target "all" as all the .jpg/gif files
with a "small_" prefix.

So now make has all these targets, small_x.jpg, small_z.gif.
It looks for a rule that matches and finds the 2 rules shown.

If the file small_x.jpg already exists and was last updated AFTER
x.jpg was, it does nothing.

Otherwise it runs the rule shown.

How many it runs at one time depends on the -j option to make.

It can run no commands or hundreds, depending on how many files there
are and whether the "small_" file is current.


The real power in Makefiles comes from the chains of dependency.
You start with the images, make the thumbnails, then make uploading
dependent on the thumbnails.

Only what's needed gets done, and you run as many processes at a time
as you want.
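
The timestamp rule described above can be sketched in Perl. This is
only an illustration of the comparison, not how make is implemented;
the name needs_rebuild is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $target must be (re)built from $source,
# i.e. the target is missing or the source was modified more recently.
sub needs_rebuild {
    my ($source, $target) = @_;
    return 1 unless -e $target;
    my $source_mtime = (stat $source)[9];
    my $target_mtime = (stat $target)[9];
    return $source_mtime > $target_mtime ? 1 : 0;
}

# Example: list the thumbnails a run would regenerate.
# for my $img (grep { !/^small_/ } glob '*.jpg *.gif') {
#     print "rebuild small_$img\n" if needs_rebuild($img, "small_$img");
# }
```

A target whose timestamp equals that of its source counts as current
here, which is an assumption of the sketch.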


------------------------------

Date: Wed, 18 Nov 2009 03:34:45 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: My perl script to run arbitrary tasks in parallel
Message-Id: <lj6ct6-hn32.ln1@osiris.mauzo.dyndns.org>

[f'ups set to clpmisc]

Quoth despen@verizon.net:
> Ignoramus28865 <ignoramus28865@NOSPAM.28865.invalid> writes:
> 
> > Also, would this makefile make small_small_small_mypic.jpg if you
> > invoke it a few times, starting with mypic.jpg? Am I missing anything?
> 
> No, you don't invoke it a few times.
> Here is an example:
> 
> large:=$(wildcard *.jpg *.gif)

If any small_*.jpg files exist, this will include them.

> all: $(addprefix small_,$large)

This will then try to create a corresponding small_small_*.jpg file.
While it is possible to fix this, you quickly run up against the extreme
ugliness of trying to do anything complicated with make. 

(For completeness I should also point out that this makefile is
gmake-specific. BSD make has facilities that would make this easier, but
it's still not going to be pretty.)

A straightforward Perl solution could look something like (untested):

    use Parallel::ForkManager;

    my $J = Parallel::ForkManager->new(3);

    # -M is a file's age in days, so a smaller value means newer
    my @todo =
        grep { !-e "small_$_" or -M $_ < -M "small_$_" }
        grep !/^small_/,
        <*.jpg *.gif>;

    for (@todo) {
        $J->start and next;

        # process the file in $_, using system "convert ..." or
        # Image::Magick or ...

        $J->finish;
    }

    $J->wait_all_children;

Ben



------------------------------

Date: Tue, 17 Nov 2009 22:05:40 -0600
From: Ignoramus28865 <ignoramus28865@NOSPAM.28865.invalid>
Subject: Re: My perl script to run arbitrary tasks in parallel
Message-Id: <qaydnQS0h-UJ757WnZ2dnUVZ_jmdnZ2d@giganews.com>

On 2009-11-18, Ben Morrow <ben@morrow.me.uk> wrote:
> [f'ups set to clpmisc]
>
> Quoth despen@verizon.net:
>> Ignoramus28865 <ignoramus28865@NOSPAM.28865.invalid> writes:
>> 
>> > Also, would this makefile make small_small_small_mypic.jpg if you
>> > invoke it a few times, starting with mypic.jpg? Am I missing anything?
>> 
>> No, you don't invoke it a few times.
>> Here is an example:
>> 
>> large:=$(wildcard *.jpg *.gif)
>
> If any small_*.jpg files exist, this will include them.

Yep

>> all: $(addprefix small_,$large)
>
> This will then try to create a corresponding small_small_*.jpg file.
> While it is possible to fix this, you quickly run up against the extreme
> ugliness of trying to do anything complicated with make. 

You can give small ones the .JPG extension as opposed to .jpg.

I have a big perl script for creating nice HTML indices of directory
trees containing text, images, and PDF files and video files. 

The script does a few things, such as creating thumbnails, including
GIF animation based video thumbnails, html index files, RSS feeds etc.

The way it names thumbnails is ./.-THUMBNAIL.original-file-name.jpg

So they are invisible in regular wildcards and ls, and do not clutter
anything.
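
As a small illustration of why that naming keeps thumbnails out of the
way: Perl's glob, like the shell, does not match a leading dot with
"*". The helper names below are made up for this sketch, and it
assumes the thumbnail keeps the full original name with ".jpg"
appended; only the ".-THUMBNAIL." prefix comes from the description
above.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Glob qw(bsd_glob);

# Hypothetical helper: hidden thumbnail path for an original file,
# following the ./.-THUMBNAIL.original-file-name.jpg pattern.
sub thumbnail_name {
    my ($original) = @_;
    my ($name, $dir) = fileparse($original);
    return "$dir.-THUMBNAIL.$name.jpg";
}

# Originals in a directory. '*' skips names that start with a dot,
# so the thumbnails never show up in this list.
sub visible_images {
    my ($dir) = @_;
    return bsd_glob("$dir/*.jpg");
}
```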

> (For completeness I should also point out that this makefile is
> gmake-specific. BSD make has facilities that would make this easier, but
> it's still not going to be pretty.)
>
> A straightforward Perl solution could look something like (untested):
>
>     use Parallel::ForkManager;
>
>     my $J = Parallel::ForkManager->new(3);
>
>     my @todo = 
>         grep { -M $_ > -M "small_$_" }
>         grep !/^small_/,
>         <*.jpg *.gif>
>
>     for (@todo) {
>         $J->start and next;
>
>         # process the file in $_, using system "convert ..." or
>         # Image::Magick or ...
>
>         $J->finish;
>     }
>
>     $J->wait_all_children;
>
> Ben
>

The nice aspect of my script is that it could let you do this sort of
thing with one command, e.g.:

(
 for i in *.jpg; do 
   echo convert $i -geometry 20%x20% $(basename $i .jpg).JPG
 done
) | parallel.pl

as well as a myriad other things. The job of this script is
parallelization; what exactly to parallelize -- images, videos, or
porn downloads -- would be the user's decision.
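
parallel.pl itself is not shown in this digest. As a rough sketch of
the idea only, assuming one shell command per input line and a fixed
job limit, the core could look like the following. The name
run_parallel and its arguments are invented for illustration, and it
uses nothing beyond the core fork, exec and wait.

```perl
use strict;
use warnings;

# Hypothetical sketch: run each command string with at most $max_jobs
# children alive at once. Returns the commands that exited non-zero.
sub run_parallel {
    my ($max_jobs, @commands) = @_;
    $max_jobs = 1 if $max_jobs < 1;
    my (%running, @failed);

    # Block until one child exits and record it if it failed.
    my $reap_one = sub {
        my $pid = wait();
        if ($pid == -1) {        # no children left to wait for
            %running = ();
            return;
        }
        my $status = $?;
        my $cmd = delete $running{$pid};
        push @failed, $cmd if $status != 0;
    };

    for my $cmd (@commands) {
        $reap_one->() while keys(%running) >= $max_jobs;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the command.
            exec($cmd) or exit 127;
        }
        $running{$pid} = $cmd;
    }
    $reap_one->() while %running;
    return @failed;
}

# Usage in the spirit of the pipeline above (assumed input format):
# chomp(my @commands = <STDIN>);
# my @failed = run_parallel(4, @commands);
# warn "failed: $_\n" for @failed;
```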

i


------------------------------

Date: Tue, 17 Nov 2009 15:14:21 -0800 (PST)
From: Obama <cyrusgreats@gmail.com>
Subject: Re: parse the logs and search for matching string
Message-Id: <f154aee9-2102-46d8-95a8-00eddcdc2c04@u16g2000pru.googlegroups.com>

On Nov 17, 2:22 pm, s...@netherlands.com wrote:
> On Tue, 17 Nov 2009 13:57:33 -0800 (PST), Obama <cyrusgre...@gmail.com> wrote:
> >On Nov 17, 1:39 pm, Obama <cyrusgre...@gmail.com> wrote:
> >> Hello my good people out there,
> >> I need to write script to  parse a log file and does the following:
>
> >> - search for name of server (in this case network-1:Test1)
> >> - search for start time (in this case Fri May 25 00:13:20)
> >> - search for end time (in this case May 25 00:13:49)
> >> - search for amount (in this case 2048 KB)
>
> >> Note that these lines will not typically be sequential. There may be
> >> numerous between the start and end lines, and many servers. So I
> >> assume I can place them matching name into array and then print the
> >> result into excel-sheet!
>
> >> log file: The below lines are examples.
> >> src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> >> src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
> >> KB)
> >> ...
>
> >> ..
> >> src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> >> src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
> >> (212048 KB)
>
> >> my code..
> >> ----------------------
> >> open (FILE, $file) || die "Can't open datafile: $file";
> >>  my @lines = reverse <FILE>;
> >>  my $count = 1;
>
> >>  foreach my $line ( @lines ) {
> >>    next if $line =~ /^#/;     # skip comments
> >>    next if $line =~ /^\s*$/;  # skip empty lines
> >>    next unless $line =~ /$match/;
>
> >> #here I need your advice
> >>    ($key, $val) = $line =~ /([^src]*) ........../x or die "bad data:
> >> invalid field '$_' in chunk $.";
> >>    if ($key) {
> >>      $worksheet->write($row, $column , $val);
>
> >>      $row++;
> >>      $count++;
> >>      last;
> >>     }
>
> >Correction: the line in log could start with src or dst
>
> I am going to try to help you but you leave out so many
> really important details.
>
> Are these adjacent lines?
> Is there a Start and End?
> Are the fields of fixed width?
> Is the size KB after End on the same line?
> Does it wrap because of your news client?
> What happens when either Start/End is invalid?
> Throw the pair away?
>
>   src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
>   src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
> ~vs:
>   src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
>   src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
>   (212048 KB)
>
> These are some serious questions.
> Do you have access to the code that generates these logs?

I'm not sure what you mean by code; these are logs from a server.

> If so, what is the intended data format?

The format is plain text. When I open the log I see the following
lines:

src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
KB)
dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
dst Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
(212048 KB)
src Fri May 25 00:19:20 EDT myserver1:Test1 network-1:Test1 Start
src Fri May 25 00:19:49 EDT myserver1:Test1 network-1:Test1 End (2028
KB)
dst Sat May 26 00:10:20 EDT myserver2:Test2 network-2:Test2 Start
dst Sat May 26 00:10:49 EDT myserver2:Test2 network-2:Test2 End
(2198048 KB)

I think it is easy to parse the log file

I think if I use the following, then I get the server name and start date:
\w*\s+(\w+\s+\d+\s+\d+\:\d+\:\d+)\w+\s+\w+\s+\w+:+\w+\s+(\w+:+\w+)\s+Start

and if I use the following, then I get the end date, server name and
amount:
\w*\s+(\w+\s+\d+\s+\d+\:\d+\:\d+)\w+\s+\w+\s+\w+:+\w+\s+(\w+:+\w+)\s+End\s+(\(\d+\s+\w+\))

can I have this:

 foreach my $line ( @lines ) {
   next if $line =~ /^#/;     # skip comments
   next if $line =~ /^\s*$/;  # skip empty lines

#get start info
($key, $start_time, $server_name) = $line =~ /\w*\s+\w+\s+\d+\s+(\d+\:\d+\:\d+)\w+\s+\w+\s+\w+:+\w+\s+(\w+:+\w+)\s+Start/x
    or die "bad data: invalid field '$_' in chunk $.";

#get the end info
($key, $end_time, $server_name, $amount) = $line =~ /\w*\s+(\w+\s+\d+\s+\d+\:\d+\:\d+)\w+\s+\w+\s+\w+:+\w+\s+(\w+:+\w+)\s+End\s+(\(\d+\s+\w+\))/x
    or die "bad data: invalid field '$_' in chunk $.";

But I could be wrong...




------------------------------

Date: Tue, 17 Nov 2009 17:16:00 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: parse the logs and search for matching string
Message-Id: <slrnhg6bg4.adk.tadmc@tadbox.sbcglobal.net>

Obama <cyrusgreats@gmail.com> wrote:

> I need to write script to  parse a log file and does the following:
>
> - search for name of server (in this case network-1:Test1)
> - search for start time (in this case Fri May 25 00:13:20)
> - search for end time (in this case May 25 00:13:49)
> - search for amount (in this case 2048 KB)

> log file: The below lines are examples.
> src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
> KB)
> src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
> (212048 KB)


You should take care that long lines are not line wrapped by
your news software.


> open (FILE, $file) || die "Can't open datafile: $file";


Don't you want to know _why_ it failed?


    open (FILE, $file) || die "Can't open datafile '$file' $!";
                                                           ^^
                                                           ^^
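
The same check in the three-argument form with a lexical filehandle,
as a sketch. The name open_log is made up, and the message format is
only one reasonable choice.

```perl
use strict;
use warnings;

# Hypothetical helper: open a log file for reading, or die with the
# operating system's reason ($!) included in the message.
sub open_log {
    my ($file) = @_;
    open(my $fh, '<', $file)
        or die "Can't open datafile '$file': $!";
    return $fh;
}
```

Using "or" rather than "||" keeps the check working even if the
parentheses around open's arguments are later dropped.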

>  my @lines = reverse <FILE>;


Why do you think you need to reverse the lines?


> #here I need your advice


You sure do...


>    ($key, $val) = $line =~ /([^src]*) ........../x or die "bad data:


What do you think that will match?

(it matches any line!)

If you want to match only lines that start with "src", then:

    /^src..../x

The caret (^) means different things in different places.
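
A quick way to see the difference, as a sketch (the two helper names
are made up). Inside brackets, [^src] is a negated character class
meaning "any one character other than s, r or c", and with * it can
match zero characters, so it succeeds on every line. Outside brackets,
^ anchors the match to the start of the string.

```perl
use strict;
use warnings;

sub matches_negated_class { my ($line) = @_; return $line =~ /([^src]*)/ ? 1 : 0 }
sub matches_anchored_src  { my ($line) = @_; return $line =~ /^src/ ? 1 : 0 }

# Compare both patterns on a src line, a dst line and an empty line.
for my $line ('src Fri May 25 00:13:20 EDT', 'dst Sat May 26 00:15:20 EDT', '') {
    printf "%-30s class=%d anchor=%d\n", "'$line'",
        matches_negated_class($line), matches_anchored_src($line);
}
```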

Your question is nearly incoherent, but I'll take a wild guess
at helping you.

Have you seen the Posting Guidelines that are posted here frequently?


----------------------------
#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>) {
    if ( /^src (.*?)EDT \S+ (\S+) (\S+)( \(([^)]+))?/ ) {
        print "$3 date: $1\n    server: $2";
        print "\n    size: $5\n" if $3 eq 'End';
        print "\n";
    }
}

__DATA__
src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End (212048 KB)
----------------------------

output:

Start date: Fri May 25 00:13:20 
    server: network-1:Test1
End date: Fri May 25 00:13:49 
    server: network-1:Test1
    size: 2048 KB

Start date: Sat May 26 00:15:20 
    server: network-2:Test2
End date: Sat May 26 00:15:49 
    server: network-2:Test2
    size: 212048 KB


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Tue, 17 Nov 2009 16:05:06 -0800 (PST)
From: Obama <cyrusgreats@gmail.com>
Subject: Re: parse the logs and search for matching string
Message-Id: <17ec15f0-76ac-473c-a4dc-6e86d8981c59@i12g2000prg.googlegroups.com>

On Nov 17, 3:16 pm, Tad McClellan <ta...@seesig.invalid> wrote:
> Obama <cyrusgre...@gmail.com> wrote:
> > I need to write script to  parse a log file and does the following:
>
> > - search for name of server (in this case network-1:Test1)
> > - search for start time (in this case Fri May 25 00:13:20)
> > - search for end time (in this case May 25 00:13:49)
> > - search for amount (in this case 2048 KB)
> > log file: The below lines are examples.
> > src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> > src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
> > KB)
> > src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> > src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
> > (212048 KB)
>
> You should take care that long lines are not line wrapped by
> your news software.
>
> > open (FILE, $file) || die "Can't open datafile: $file";
>
> Don't you want to know _why_ it failed?
>
>     open (FILE, $file) || die "Can't open datafile '$file' $!";
>                                                            ^^
>                                                            ^^
>
> >  my @lines = reverse <FILE>;
>
> Why do you think you need to reverse the lines?
>
> > #here I need your advice
>
> You sure do...
>
> >    ($key, $val) = $line =~ /([^src]*) ........../x or die "bad data:
>
> What do you think that will match?
>
> (it matches any line!)
>
> If you want to match only lines that start with "src", then:
>
>     /^src..../x
>
> The caret (^) means different things in different places.
>
> Your question in nearly incoherent, but I'll take a wild guess
> at helping you.
>
> Have you seen the Posting Guidelines that are posted here frequently?
>
> ----------------------------
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> while (<DATA>) {
>     if ( /^src (.*?)EDT \S+ (\S+) (\S+)( \(([^)]+))?/ ) {
>         print "$3 date: $1\n    server: $2";
>         print "\n    size: $5\n" if $3 eq 'End';
>         print "\n";
>     }
>
> }
>
> __DATA__
> src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
> src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End (212048 KB)
> ----------------------------
>
> output:
>
> Start date: Fri May 25 00:13:20
>     server: network-1:Test1
> End date: Fri May 25 00:13:49
>     server: network-1:Test1
>     size: 2048 KB
>
> Start date: Sat May 26 00:15:20
>     server: network-2:Test2
> End date: Sat May 26 00:15:49
>     server: network-2:Test2
>     size: 212048 KB
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Thank you so much that is almost exactly what I'm looking for but the
line does not always start with "src" this could be "dst" and start
and end at the end of each line are the key

src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Request
src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1
Softlock_add
src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
<-----------------
src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
KB) <-----------------
dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Request
dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2
Softlock_add
dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
<-----------------
dst Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
(212048 KB) <-----------------
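
One way the earlier pattern might be adapted to this, as a sketch that
has not been run against real logs: accept either prefix with
(?:src|dst) and require the last word to be Start or End, so Request
and Softlock_add lines are skipped. It assumes each entry is on one
physical line (no news-client wrapping) and that the timezone is
always EDT, as in the samples. The name parse_event is made up.

```perl
use strict;
use warnings;

# Returns a hash ref for Start/End lines, undef for anything else.
sub parse_event {
    my ($line) = @_;
    return undef
        unless $line =~ /^(?:src|dst) (.*?) EDT \S+ (\S+) (Start|End)\b(?: \(([^)]+)\))?/;
    my %event = (date => $1, server => $2, type => $3);
    $event{size} = $4 if defined $4;   # only End lines carry a size
    return \%event;
}

# Usage (assumed filehandle $fh opened on the log):
# while (my $line = <$fh>) {
#     my $event = parse_event($line) or next;
#     print "$event->{type} $event->{server} $event->{date}\n";
# }
```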


------------------------------

Date: Tue, 17 Nov 2009 16:20:52 -0800
From: sln@netherlands.com
Subject: Re: parse the logs and search for matching string
Message-Id: <93f6g5dssa75dojqe2gk11gqb3qqc2t6q2@4ax.com>

On Tue, 17 Nov 2009 16:05:06 -0800 (PST), Obama <cyrusgreats@gmail.com> wrote:

>On Nov 17, 3:16 pm, Tad McClellan <ta...@seesig.invalid> wrote:
>> Obama <cyrusgre...@gmail.com> wrote:
>> > I need to write script to  parse a log file and does the following:
>>
>> > - search for name of server (in this case network-1:Test1)
>> > - search for start time (in this case Fri May 25 00:13:20)
>> > - search for end time (in this case May 25 00:13:49)
>> > - search for amount (in this case 2048 KB)
>> > log file: The below lines are examples.
>> > src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
>> > src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
>> > KB)
>> > src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
>> > src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
>> > (212048 KB)
>>
>> You should take care that long lines are not line wrapped by
>> your news software.
>>
>> > open (FILE, $file) || die "Can't open datafile: $file";
>>
>> Don't you want to know _why_ it failed?
>>
>>     open (FILE, $file) || die "Can't open datafile '$file' $!";
>>                                                            ^^
>>                                                            ^^
>>
>> >  my @lines = reverse <FILE>;
>>
>> Why do you think you need to reverse the lines?
>>
>> > #here I need your advice
>>
>> You sure do...
>>
>> >    ($key, $val) = $line =~ /([^src]*) ........../x or die "bad data:
>>
>> What do you think that will match?
>>
>> (it matches any line!)
>>
>> If you want to match only lines that start with "src", then:
>>
>>     /^src..../x
>>
>> The caret (^) means different things in different places.
>>
>> Your question in nearly incoherent, but I'll take a wild guess
>> at helping you.
>>
>> Have you seen the Posting Guidelines that are posted here frequently?
>>
>> ----------------------------
>> #!/usr/bin/perl
>> use warnings;
>> use strict;
>>
>> while (<DATA>) {
>>     if ( /^src (.*?)EDT \S+ (\S+) (\S+)( \(([^)]+))?/ ) {
>>         print "$3 date: $1\n    server: $2";
>>         print "\n    size: $5\n" if $3 eq 'End';
>>         print "\n";
>>     }
>>
>> }
>>
>> __DATA__
>> src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
>> src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
>> src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
>> src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End (212048 KB)
>> ----------------------------
>>
>> output:
>>
>> Start date: Fri May 25 00:13:20
>>     server: network-1:Test1
>> End date: Fri May 25 00:13:49
>>     server: network-1:Test1
>>     size: 2048 KB
>>
>> Start date: Sat May 26 00:15:20
>>     server: network-2:Test2
>> End date: Sat May 26 00:15:49
>>     server: network-2:Test2
>>     size: 212048 KB
>>
>> --
>> Tad McClellan
>> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
>
>Thank you so much that is almost exactly what I'm looking for but the
>line does not always start with "src" this could be "dst" and start
>and end at the end of each line are the key
>
>src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Request
>src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1
>Softlock_add
>src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
><-----------------
>src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
>KB) <-----------------
>dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Request
>dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2
>Softlock_add
>dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
><-----------------
>dst Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
>(212048 KB) <-----------------

It is up to you to determine the results. Tad just showed you how
to parse the line (and a very nice regex it is).

The regex gets exponentially more complicated the more ambiguity
you expect it to resolve.

If you need to get more out of it just split it into all of its major
parts, then resolve what you need and what you don't.

-sln

-----------------
use strict;
use warnings;

while (<DATA>)
{
	my @all = split /[()\s]+/;
	print "$_\n" for (@all);
	print "\n";
}

__DATA__
src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End (212048 KB)

-----------------

src
Fri
May
25
00:13:20
EDT
myserver1:Test1
network-1:Test1
Start

src
Fri
May
25
00:13:49
EDT
myserver1:Test1
network-1:Test1
End
2048
KB

src
Sat
May
26
00:15:20
EDT
myserver2:Test2
network-2:Test2
Start

src
Sat
May
26
00:15:49
EDT
myserver2:Test2
network-2:Test2
End
212048
KB
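
Going one step further than the field dump above, the split fields can
be paired into one record per transfer. This is a sketch under
assumptions the thread does not confirm: each entry sits on a single
line, a Start is followed later by the matching End for the same
server, and unmatched lines are silently dropped. The name pair_events
and the record keys are made up.

```perl
use strict;
use warnings;

# Takes log lines, returns a list of hash refs
# { server, start, end, kb }, one for every Start/End pair seen.
sub pair_events {
    my (@lines) = @_;
    my (%open_start, @records);
    for my $line (@lines) {
        my @f = split /[()\s]+/, $line;
        next unless @f >= 9 && $f[0] =~ /^(?:src|dst)$/;
        # Fields 1..4 are weekday, month, day, time; 7 is the server.
        my ($time, $server, $type) = ("@f[1..4]", $f[7], $f[8]);
        if ($type eq 'Start') {
            $open_start{$server} = $time;
        }
        elsif ($type eq 'End' && exists $open_start{$server}) {
            push @records, {
                server => $server,
                start  => delete $open_start{$server},
                end    => $time,
                kb     => $f[9],
            };
        }
    }
    return @records;
}
```

Each record could then be written out as one spreadsheet row, for
instance with the $worksheet->write call from the original post.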



------------------------------

Date: Tue, 17 Nov 2009 16:40:00 -0800 (PST)
From: Obama <cyrusgreats@gmail.com>
Subject: Re: parse the logs and search for matching string
Message-Id: <eb2886b2-f962-443b-8efd-e0c84211936b@t11g2000prh.googlegroups.com>

On Nov 17, 4:20 pm, s...@netherlands.com wrote:
> On Tue, 17 Nov 2009 16:05:06 -0800 (PST), Obama <cyrusgre...@gmail.com> wrote:
> >On Nov 17, 3:16 pm, Tad McClellan <ta...@seesig.invalid> wrote:
> >> Obama <cyrusgre...@gmail.com> wrote:
> >> > I need to write script to  parse a log file and does the following:
>
> >> > - search for name of server (in this case network-1:Test1)
> >> > - search for start time (in this case Fri May 25 00:13:20)
> >> > - search for end time (in this case May 25 00:13:49)
> >> > - search for amount (in this case 2048 KB)
> >> > log file: The below lines are examples.
> >> > src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> >> > src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
> >> > KB)
> >> > src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> >> > src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
> >> > (212048 KB)
>
> >> You should take care that long lines are not line wrapped by
> >> your news software.
>
> >> > open (FILE, $file) || die "Can't open datafile: $file";
>
> >> Don't you want to know _why_ it failed?
>
> >>     open (FILE, $file) || die "Can't open datafile '$file' $!";
> >>                                                            ^^
> >>                                                            ^^
>
> >> >  my @lines = reverse <FILE>;
>
> >> Why do you think you need to reverse the lines?
>
> >> > #here I need your advice
>
> >> You sure do...
>
> >> >    ($key, $val) = $line =~ /([^src]*) ........../x or die "bad data:
>
> >> What do you think that will match?
>
> >> (it matches any line!)
>
> >> If you want to match only lines that start with "src", then:
>
> >>     /^src..../x
>
> >> The caret (^) means different things in different places.
>
> >> Your question in nearly incoherent, but I'll take a wild guess
> >> at helping you.
>
> >> Have you seen the Posting Guidelines that are posted here frequently?
>
> >> ----------------------------
> >> #!/usr/bin/perl
> >> use warnings;
> >> use strict;
>
> >> while (<DATA>) {
> >>     if ( /^src (.*?)EDT \S+ (\S+) (\S+)( \(([^)]+))?/ ) {
> >>         print "$3 date: $1\n    server: $2";
> >>         print "\n    size: $5\n" if $3 eq 'End';
> >>         print "\n";
> >>     }
>
> >> }
>
> >> __DATA__
> >> src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> >> src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
> >> src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> >> src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End (212048 KB)
> >> ----------------------------
>
> >> output:
>
> >> Start date: Fri May 25 00:13:20
> >>     server: network-1:Test1
> >> End date: Fri May 25 00:13:49
> >>     server: network-1:Test1
> >>     size: 2048 KB
>
> >> Start date: Sat May 26 00:15:20
> >>     server: network-2:Test2
> >> End date: Sat May 26 00:15:49
> >>     server: network-2:Test2
> >>     size: 212048 KB
>
> >> --
> >> Tad McClellan
> >> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
>
> >Thank you so much that is almost exactly what I'm looking for but the
> >line does not always start with "src" this could be "dst" and start
> >and end at the end of each line are the key
>
> >src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Request
> >src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1
> >Softlock_add
> >src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> ><-----------------
> >src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048
> >KB) <-----------------
> >dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Request
> >dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2
> >Softlock_add
> >dst Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> ><-----------------
> >dst Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End
> >(212048 KB) <-----------------
>
> It is up to you to determine the results. Tad just showed you how
> to parse the line (and a very nice regex it is).
>
> The regex gets exponentionally more complicated the more ambiguity
> you expect it to resolve.
>
> If you need to get more out of it just split it into all of its major
> parts, then resolve what you need and what you don't.
>
> -sln
>
> -----------------
> use strict;
> use warnings;
>
> while (<DATA>)
> {
>         my @all = split /[()\s]+/;
>         print "$_\n" for (@all);
>         print "\n";
>
> }
>
> __DATA__
> src Fri May 25 00:13:20 EDT myserver1:Test1 network-1:Test1 Start
> src Fri May 25 00:13:49 EDT myserver1:Test1 network-1:Test1 End (2048 KB)
> src Sat May 26 00:15:20 EDT myserver2:Test2 network-2:Test2 Start
> src Sat May 26 00:15:49 EDT myserver2:Test2 network-2:Test2 End (212048 KB)
>
> -----------------
>
> src
> Fri
> May
> 25
> 00:13:20
> EDT
> myserver1:Test1
> network-1:Test1
> Start
>
> src
> Fri
> May
> 25
> 00:13:49
> EDT
> myserver1:Test1
> network-1:Test1
> End
> 2048
> KB
>
> src
> Sat
> May
> 26
> 00:15:20
> EDT
> myserver2:Test2
> network-2:Test2
> Start
>
> src
> Sat
> May
> 26
> 00:15:49
> EDT
> myserver2:Test2
> network-2:Test2
> End
> 212048
> KB

thank you all, you guys are awesome!


------------------------------

Date: Tue, 17 Nov 2009 19:03:37 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: parse the logs and search for matching string
Message-Id: <slrnhg6hps.aoa.tadmc@tadbox.sbcglobal.net>


[ Please do not quote an entire article. It is seen as bad manners.
  Quote just enough to establish the context for the comments
  you plan to add.
]


Obama <cyrusgreats@gmail.com> wrote:
> On Nov 17, 3:16 pm, Tad McClellan <ta...@seesig.invalid> wrote:

>> If you want to match only lines that start with "src", then:
>>
>>     /^src..../x

>> --
>> Tad McClellan
>> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


Please do not quote .sigs, as that is seen as bad manners too.


> Thank you so much that is almost exactly what I'm looking for but the
> line does not always start with "src" this could be "dst"


Then modify the code I gave you so that it works for both
"src" and "dst" lines.

If your code to do that does not work, then post it here (see the
Posting Guidelines first please) and we will help you fix it.

We are not here to write your program for you, we are here to help
you write your program.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Tue, 17 Nov 2009 19:06:28 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: parse the logs and search for matching string
Message-Id: <slrnhg6hv7.aoa.tadmc@tadbox.sbcglobal.net>

Obama <cyrusgreats@gmail.com> wrote:

[ snip 200 lines of quoted text ]

Posting 200 lines to add 1 line is an abuse.

If you continue to abuse us...


> thank you all, you guys are awesome!


 ... we will become noticeably less awesome!


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2681
***************************************

