[22762] in Perl-Users-Digest
Perl-Users Digest, Issue: 4983 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed May 14 11:11:08 2003
Date: Wed, 14 May 2003 08:10:18 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 14 May 2003 Volume: 10 Number: 4983
Today's topics:
newbie need help <peng.zhao@epfl.ch>
Re: newbie need help (Anno Siegel)
Re: newbie need help <josef.moellers@fujitsu-siemens.com>
Re: newbie need help <tore@aursand.no>
Re: newbie need help <barryk2@SPAM-KILLER.mts.net>
Re: removing spaces in a string <barryk2@SPAM-KILLER.mts.net>
Which module to use for ordered hashes? <kopetnik@s-pam-nie.yahoo.com>
Re: Which module to use for ordered hashes? <tore@aursand.no>
Re: Which module to use for ordered hashes? <kopetnik@s-pam-nie.yahoo.com>
Re: Which module to use for ordered hashes? <bob@nowhere.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 14 May 2003 13:08:15 +0200
From: Peng Zhao <peng.zhao@epfl.ch>
Subject: newbie need help
Message-Id: <3EC2239F.3050401@epfl.ch>
hi folks,
I have to process some huge files containing lines like the following:
time event source
---------------------------------------------
...
11.322490 GET_PAGE _o2831/1133 200
11.330820 GET_PAGE _o2820/1122 200
11.476960 GET_PAGE _o2438/740 200
11.491210 GET_PAGE _o2711/1013 200
11.502060 GET_PAGE _o2536/838 200
13.048310 GET_PAGE _o2887/1189 200
13.054010 GET_PAGE _o2582/884 200
13.073399 REQUEST_DONE _o2711/1013 7 191989
13.182151 REQUEST_DONE _o2831/1133 6 722069
13.347800 GET_PAGE _o1731/33 200
13.554530 GET_PAGE _o2427/729 200
13.630060 REQUEST_DONE _o2536/838 4 408025
13.652410 GET_PAGE _o2619/921 200
13.654830 GET_PAGE _o2661/963 200
13.683290 GET_PAGE _o2648/950 200
...
One of the tasks I want to do is to output the time difference between
GET_PAGE and REQUEST_DONE from the same source.
As they are huge files, I have to process them line by line.
Is there an easy way to do it without keeping all sources in an array?
thanks in advance for any help.
cheers,
P.
------------------------------
Date: 14 May 2003 11:14:00 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: newbie need help
Message-Id: <b9t8do$4sv$2@mamenchi.zrz.TU-Berlin.DE>
Peng Zhao <peng.zhao@epfl.ch> wrote in comp.lang.perl.misc:
> hi folks,
>
> I have to process some huge files containing following lines:
>
> time event source
> ---------------------------------------------
> ...
> 11.322490 GET_PAGE _o2831/1133 200
> 11.330820 GET_PAGE _o2820/1122 200
> 11.476960 GET_PAGE _o2438/740 200
> 11.491210 GET_PAGE _o2711/1013 200
> 11.502060 GET_PAGE _o2536/838 200
> 13.048310 GET_PAGE _o2887/1189 200
> 13.054010 GET_PAGE _o2582/884 200
> 13.073399 REQUEST_DONE _o2711/1013 7 191989
> 13.182151 REQUEST_DONE _o2831/1133 6 722069
> 13.347800 GET_PAGE _o1731/33 200
> 13.554530 GET_PAGE _o2427/729 200
> 13.630060 REQUEST_DONE _o2536/838 4 408025
> 13.652410 GET_PAGE _o2619/921 200
> 13.654830 GET_PAGE _o2661/963 200
> 13.683290 GET_PAGE _o2648/950 200
> ...
>
>
> one of the task I want to do is to output the time difference between
> GET_PAGE and REQUEST_DONE from the same source.
> as they are huge files, I have to process it line by line.
>
> Is there an easy way to do it without keeping all sources in an array ?
Well, you'd more probably keep them in a hash for this purpose. If you
delete() those entries whose REQUEST_DONE you have seen, you don't
have to keep them all but only those that are "pending".
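A minimal sketch of that idea, assuming whitespace-separated fields in the order time, event, source as in the sample above. The helper name pair_requests and the tiny inline sample are made up for illustration; the peak counter is only there to show how small the hash stays.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads a log from a filehandle line by line and
# returns the [source, elapsed] pairs plus the largest number of
# requests that were pending at any one time.
sub pair_requests {
    my ($fh) = @_;
    my ( %pending, @results );
    my $peak = 0;
    while ( my $line = <$fh> ) {
        my ( $time, $event, $source ) = split ' ', $line;
        next unless defined $source;    # skip blank or short lines
        if ( $event eq 'GET_PAGE' ) {
            $pending{$source} = $time;
            my $count = keys %pending;
            $peak = $count if $count > $peak;
        }
        elsif ( $event eq 'REQUEST_DONE' && exists $pending{$source} ) {
            my $start = delete $pending{$source};    # entry is gone now
            push @results, [ $source, $time - $start ];
        }
    }
    return ( \@results, $peak );
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = <<'END';
11.0 GET_PAGE a 200
11.5 GET_PAGE b 200
12.0 REQUEST_DONE a 7 1
12.5 GET_PAGE c 200
13.0 REQUEST_DONE b 4 2
END

open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
my ( $results, $peak ) = pair_requests($fh);
close $fh;

printf "%s\t%.3f\n", @$_ for @$results;
print "peak pending: $peak\n";
```

For a real log you would open the file instead of the in-memory string; the hash never holds more than the requests still waiting for their REQUEST_DONE.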
Anno
------------------------------
Date: Wed, 14 May 2003 13:16:58 +0200
From: Josef Möllers <josef.moellers@fujitsu-siemens.com>
Subject: Re: newbie need help
Message-Id: <3EC225AA.E018F959@fujitsu-siemens.com>
Peng Zhao wrote:
>
> hi folks,
>
> I have to process some huge files containing following lines:
>
> time event source
> ---------------------------------------------
> ...
> 11.322490 GET_PAGE _o2831/1133 200
> 11.330820 GET_PAGE _o2820/1122 200
> 11.476960 GET_PAGE _o2438/740 200
> 11.491210 GET_PAGE _o2711/1013 200
> 11.502060 GET_PAGE _o2536/838 200
> 13.048310 GET_PAGE _o2887/1189 200
> 13.054010 GET_PAGE _o2582/884 200
> 13.073399 REQUEST_DONE _o2711/1013 7 191989
> 13.182151 REQUEST_DONE _o2831/1133 6 722069
> 13.347800 GET_PAGE _o1731/33 200
> 13.554530 GET_PAGE _o2427/729 200
> 13.630060 REQUEST_DONE _o2536/838 4 408025
> 13.652410 GET_PAGE _o2619/921 200
> 13.654830 GET_PAGE _o2661/963 200
> 13.683290 GET_PAGE _o2648/950 200
> ...
>
> one of the task I want to do is to output the time difference between
> GET_PAGE and REQUEST_DONE from the same source.
> as they are huge files, I have to process it line by line.
>
> Is there an easy way to do it without keeping all sources in an array ?
Consider hashes, keeping only those "GET_PAGE" entries that have not yet
been "REQUEST_DONE"ed, delete()ing the REQUEST_DONEed.
--
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett
------------------------------
Date: Wed, 14 May 2003 14:10:53 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: newbie need help
Message-Id: <pan.2003.05.14.12.10.49.391517@aursand.no>
On Wed, 14 May 2003 13:08:15 +0200, Peng Zhao wrote:
> I have to process some huge files containing following lines:
> [...]
Try this little script; it seems to do what you want, but it was hacked
together in just a few minutes. I myself see a lot of optimizations which
could be done.
----
#!/usr/bin/perl
#
use strict;
use warnings;

my %entries = ();

while ( <DATA> ) {
    chomp;
    my ( $time, $command, $source, @foo ) = split( /\s+/ );
    if ( $command eq 'REQUEST_DONE' ) {
        if ( exists $entries{$source} ) {
            my $diff = $time - $entries{$source};
            print $source . "\t" . $diff . "\n";
            # Drop the finished request so only pending ones
            # stay in memory.
            delete $entries{$source};
        }
        else {
            # REQUEST_DONE found for a source, but the
            # source doesn't have a GET_PAGE time set.
        }
    }
    elsif ( $command eq 'GET_PAGE' ) {
        $entries{$source} = $time;
    }
    else {
        # Unknown $command
    }
}
__DATA__
11.322490 GET_PAGE _o2831/1133 200
11.330820 GET_PAGE _o2820/1122 200
11.476960 GET_PAGE _o2438/740 200
11.491210 GET_PAGE _o2711/1013 200
11.502060 GET_PAGE _o2536/838 200
13.048310 GET_PAGE _o2887/1189 200
13.054010 GET_PAGE _o2582/884 200
13.073399 REQUEST_DONE _o2711/1013 7 191989
13.182151 REQUEST_DONE _o2831/1133 6 722069
13.347800 GET_PAGE _o1731/33 200
13.554530 GET_PAGE _o2427/729 200
13.630060 REQUEST_DONE _o2536/838 4 408025
13.652410 GET_PAGE _o2619/921 200
13.654830 GET_PAGE _o2661/963 200
----
--
Tore Aursand <tore@aursand.no>
------------------------------
Date: Wed, 14 May 2003 09:01:17 -0500
From: Barry Kimelman <barryk2@SPAM-KILLER.mts.net>
Subject: Re: newbie need help
Message-Id: <MPG.192c08046b45da8b9897ce@news.mts.net>
[This followup was posted to comp.lang.perl.misc]
In article <3EC2239F.3050401@epfl.ch>, Peng Zhao (peng.zhao@epfl.ch)
says...
> hi folks,
>
> I have to process some huge files containing following lines:
>
> time event source
> ---------------------------------------------
> ...
> 11.322490 GET_PAGE _o2831/1133 200
> 11.330820 GET_PAGE _o2820/1122 200
> 11.476960 GET_PAGE _o2438/740 200
> 11.491210 GET_PAGE _o2711/1013 200
> 11.502060 GET_PAGE _o2536/838 200
> 13.048310 GET_PAGE _o2887/1189 200
> 13.054010 GET_PAGE _o2582/884 200
> 13.073399 REQUEST_DONE _o2711/1013 7 191989
> 13.182151 REQUEST_DONE _o2831/1133 6 722069
> 13.347800 GET_PAGE _o1731/33 200
> 13.554530 GET_PAGE _o2427/729 200
> 13.630060 REQUEST_DONE _o2536/838 4 408025
> 13.652410 GET_PAGE _o2619/921 200
> 13.654830 GET_PAGE _o2661/963 200
> 13.683290 GET_PAGE _o2648/950 200
> ...
>
>
> one of the task I want to do is to output the time difference between
> GET_PAGE and REQUEST_DONE from the same source.
> as they are huge files, I have to process it line by line.
>
> Is there an easy way to do it without keeping all sources in an array ?
>
> thanks in advance for any help.
>
> cheers,
>
> P.
Try the following code.
Note : you may want to add more error checking...
use strict;
use warnings;

my $filename = $ARGV[0];
defined $filename or die("Usage: $0 logfile\n");
open(my $input, '<', $filename) or
    die("Can't open $filename : $!\n");
my %get_page = ();
while ( my $buffer = <$input> ) {
    chomp $buffer;
    my ( $time, $type, $source ) = split(' ', $buffer);
    next unless defined $source;    # skip blank or malformed lines
    if ( $type eq "GET_PAGE" ) {
        $get_page{$source} = $time;
    } elsif ( $type eq "REQUEST_DONE" ) {
        if ( exists $get_page{$source} ) {
            my $difference = $time - $get_page{$source};
            print "Source : $source , request time : $difference\n";
            delete $get_page{$source};
        }
        else {
            # found a REQUEST_DONE without a corresponding
            # GET_PAGE
        }
    } else {
        # unexpected type ???
    }
}
close $input;
--
---------
Barry Kimelman
Winnipeg, Manitoba, Canada
email : bkimelman@hotmail.com
------------------------------
Date: Wed, 14 May 2003 08:43:20 -0500
From: Barry Kimelman <barryk2@SPAM-KILLER.mts.net>
Subject: Re: removing spaces in a string
Message-Id: <MPG.192c03f8c5420ccf9897cd@news.mts.net>
[This followup was posted to comp.lang.perl.misc]
In article <337213e2.0305130932.730d10ac@posting.google.com>, Kevin
(kgiles@optonline.net) says...
> how do you remove spaces in a string variable?
>
> thanks
>
> kg
>
$buffer =~ s/ //g;
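Note that the substitution above removes only literal space characters. A small sketch of two variations, in case tabs or newlines should go as well; the sample string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $buffer = "a b\tc  d\n";

# tr with the /d flag deletes the listed characters; here only spaces.
( my $no_spaces = $buffer ) =~ tr/ //d;

# \s matches any whitespace, so tabs and the newline are removed too.
( my $no_white = $buffer ) =~ s/\s+//g;

print "[$no_spaces]\n";
print "[$no_white]\n";
```

Copying into a new variable first, as done here, leaves $buffer itself untouched.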
--
---------
Barry Kimelman
Winnipeg, Manitoba, Canada
email : bkimelman@hotmail.com
------------------------------
Date: Wed, 14 May 2003 13:18:22 +0200
From: Lechu <kopetnik@s-pam-nie.yahoo.com>
Subject: Which module to use for ordered hashes?
Message-Id: <v994cvgpho8o22b4i1cd21r2f9vuuu4phd@4ax.com>
There are some implementations of ordered hashes on CPAN. (modules
Tie::*)
Which one to use? I need something simple and efficient.
Lech
------------------------------
Date: Wed, 14 May 2003 14:00:23 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: Which module to use for ordered hashes?
Message-Id: <pan.2003.05.14.12.00.20.669090@aursand.no>
On Wed, 14 May 2003 13:18:22 +0200, Lechu wrote:
> There are some implementations of ordered hashes on CPAN.
> [...]
That's right, and this question seems to be answered in the FAQ:
How can I always keep my hash sorted?
You can look into using the DB_File module and tie() using
the $DB_BTREE hash bindings as documented in "In Memory
Databases" in DB_File. The Tie::IxHash module from CPAN
might also be instructive.
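The two modules named there give different orderings: a DB_File BTREE keeps keys in sorted order, while Tie::IxHash keeps them in insertion order. A module-free sketch of both orderings follows, using a plain hash plus a parallel array; this is a stand-in for illustration, not how either module is implemented, and the names and data are made up.

```perl
use strict;
use warnings;

my %age;
my @order;    # keys in the order they were first inserted

for my $pair ( [ carol => 31 ], [ alice => 27 ], [ bob => 45 ] ) {
    my ( $name, $years ) = @$pair;
    push @order, $name unless exists $age{$name};    # remember first insert
    $age{$name} = $years;
}

my @by_insertion = @order;           # what an insertion-ordered hash gives
my @by_key       = sort keys %age;   # what a sorted (BTREE-like) hash gives

print "insertion order: @by_insertion\n";
print "sorted order:    @by_key\n";
```

If deletes are needed, the parallel array has to be kept in sync by hand, which is the bookkeeping a tied module does for you.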
--
Tore Aursand <tore@aursand.no>
------------------------------
Date: Wed, 14 May 2003 14:33:33 +0200
From: Lechu <kopetnik@s-pam-nie.yahoo.com>
Subject: Re: Which module to use for ordered hashes?
Message-Id: <lnd4cv0vas27um123ag7v6i6i48nhon0sc@4ax.com>
On Wed, 14 May 2003 14:00:23 +0200, Tore Aursand <tore@aursand.no>
wrote:
>On Wed, 14 May 2003 13:18:22 +0200, Lechu wrote:
>> There are some implementations of ordered hashes on CPAN.
>> [...]
>The Tie::IxHash module from CPAN
> might also be instructive.
Unfortunately this module doesn't provide a solution for
multidimensional hashes:
my $hash={};
tie %$hash,'Tie::IxHash';
# here the order is preserved
$hash->{a}={};
$hash->{b}={};
$hash->{c}={};
$hash->{d}={};
$hash->{e}={};
$hash->{f}={};
# here the order is lost
$hash->{a}{a}={};
$hash->{a}{b}={};
$hash->{a}{c}={};
$hash->{a}{d}={};
$hash->{a}{e}={};
$hash->{a}{f}={};
------------------------------
Date: Wed, 14 May 2003 09:12:31 -0500
From: bob <bob@nowhere.com>
Subject: Re: Which module to use for ordered hashes?
Message-Id: <3ec24ecb_1@127.0.0.1>
On Wed, 14 May 2003 07:33:33 -0500, Lechu wrote:
> On Wed, 14 May 2003 14:00:23 +0200, Tore Aursand <tore@aursand.no>
> wrote:
>
>>On Wed, 14 May 2003 13:18:22 +0200, Lechu wrote:
>>> There are some implementations of ordered hashes on CPAN. [...]
>
>>The Tie::IxHash module from CPAN
>> might also be instructive.
> Unfortunately this module doesn't provide solution for multidimensional
> hashes:
>
>
> my $hash={};
> tie %$hash,'Tie::IxHash';
>
> # here the order is preserved
> $hash->{a}={};
> $hash->{b}={};
> $hash->{c}={};
> $hash->{d}={};
> $hash->{e}={};
> $hash->{f}={};
There's a workaround for this. The solution I used in one project is to
insert something like this here:
my $out = {};
tie %{$out}, 'Tie::IxHash';
#THIS PART UNTESTED
# tie the inner hash too, so the keys under {a} keep their order
$hash->{a} = $out;
#but this ought to work:
# $out->{a} = {};
# $out->{b} = {};
# etc
> # here the order is lost
> $hash->{a}{a}={};
> $hash->{a}{b}={};
> $hash->{a}{c}={};
> $hash->{a}{d}={};
> $hash->{a}{e}={};
> $hash->{a}{f}={};
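The workaround can be wrapped in a small helper so that every level is tied explicitly. This is only a sketch, assuming Tie::IxHash is installed from CPAN; the helper name ordered_hash is made up for illustration.

```perl
use strict;
use warnings;
use Tie::IxHash;    # CPAN module, not part of core Perl

# Hypothetical helper: returns a reference to a fresh hash that is
# tied to Tie::IxHash, so its keys come back in insertion order.
sub ordered_hash {
    tie my %h, 'Tie::IxHash';
    return \%h;
}

my $hash = ordered_hash();

# Each inner hash must be tied as well; a plain {} would lose its order.
$hash->{$_}    = ordered_hash() for qw(c a b);
$hash->{c}{$_} = ordered_hash() for qw(z y x);

print join( ',', keys %$hash ), "\n";
print join( ',', keys %{ $hash->{c} } ), "\n";
```

The tie only applies to the hash it is called on, which is why autovivified inner hashes such as $hash->{a}{a} = {} end up as ordinary unordered hashes.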
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 4983
***************************************