[22762] in Perl-Users-Digest


Perl-Users Digest, Issue: 4983 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed May 14 11:11:08 2003

Date: Wed, 14 May 2003 08:10:18 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 14 May 2003     Volume: 10 Number: 4983

Today's topics:
        newbie need help <peng.zhao@epfl.ch>
    Re: newbie need help (Anno Siegel)
    Re: newbie need help <josef.moellers@fujitsu-siemens.com>
    Re: newbie need help <tore@aursand.no>
    Re: newbie need help <barryk2@SPAM-KILLER.mts.net>
    Re: removing spaces in a string <barryk2@SPAM-KILLER.mts.net>
        Which module to use for ordered hashes? <kopetnik@s-pam-nie.yahoo.com>
    Re: Which module to use for ordered hashes? <tore@aursand.no>
    Re: Which module to use for ordered hashes? <kopetnik@s-pam-nie.yahoo.com>
    Re: Which module to use for ordered hashes? <bob@nowhere.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 14 May 2003 13:08:15 +0200
From: Peng Zhao <peng.zhao@epfl.ch>
Subject: newbie need help
Message-Id: <3EC2239F.3050401@epfl.ch>

Hi folks,

I have to process some huge files containing lines like the following:

time		event		source
---------------------------------------------
 ...
11.322490       GET_PAGE        _o2831/1133     200
11.330820       GET_PAGE        _o2820/1122     200
11.476960       GET_PAGE        _o2438/740      200
11.491210       GET_PAGE        _o2711/1013     200
11.502060       GET_PAGE        _o2536/838      200
13.048310       GET_PAGE        _o2887/1189     200
13.054010       GET_PAGE        _o2582/884      200
13.073399       REQUEST_DONE    _o2711/1013     7       191989
13.182151       REQUEST_DONE    _o2831/1133     6       722069
13.347800       GET_PAGE        _o1731/33       200
13.554530       GET_PAGE        _o2427/729      200
13.630060       REQUEST_DONE    _o2536/838      4       408025
13.652410       GET_PAGE        _o2619/921      200
13.654830       GET_PAGE        _o2661/963      200
13.683290       GET_PAGE        _o2648/950      200
 ...


One of the tasks I want to do is to output the time difference between
GET_PAGE and REQUEST_DONE from the same source.
As these are huge files, I have to process them line by line.

Is there an easy way to do it without keeping all sources in an array?

thanks in advance for any help.

cheers,

P.



------------------------------

Date: 14 May 2003 11:14:00 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: newbie need help
Message-Id: <b9t8do$4sv$2@mamenchi.zrz.TU-Berlin.DE>

Peng Zhao  <peng.zhao@epfl.ch> wrote in comp.lang.perl.misc:
> hi folks,
> 
> I have to process some huge files containing following lines:
> 
> time		event		source
> ---------------------------------------------
> ...
> 11.322490       GET_PAGE        _o2831/1133     200
> 11.330820       GET_PAGE        _o2820/1122     200
> 11.476960       GET_PAGE        _o2438/740      200
> 11.491210       GET_PAGE        _o2711/1013     200
> 11.502060       GET_PAGE        _o2536/838      200
> 13.048310       GET_PAGE        _o2887/1189     200
> 13.054010       GET_PAGE        _o2582/884      200
> 13.073399       REQUEST_DONE    _o2711/1013     7       191989
> 13.182151       REQUEST_DONE    _o2831/1133     6       722069
> 13.347800       GET_PAGE        _o1731/33       200
> 13.554530       GET_PAGE        _o2427/729      200
> 13.630060       REQUEST_DONE    _o2536/838      4       408025
> 13.652410       GET_PAGE        _o2619/921      200
> 13.654830       GET_PAGE        _o2661/963      200
> 13.683290       GET_PAGE        _o2648/950      200
> ...
> 
> 
> one of the task I want to do is to output the time difference between 
> GET_PAGE and REQUEST_DONE from the same source.
> as they are huge files, I have to process it line by line.
> 
> Is there an easy way to do it without keeping all sources in an array ?

Well, you'd more probably keep them in a hash for this purpose.  If you
delete() those entries whose REQUEST_DONE you have seen, you don't
have to keep them all but only those that are "pending".
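
A minimal sketch of that idea, assuming whitespace-separated fields in the
order time, event, source. The helper name pair_requests() and its return
shape are illustrative assumptions, not something from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: takes log lines and returns a list of
# "source<TAB>difference" strings plus the hash of sources that are
# still waiting for their REQUEST_DONE.
sub pair_requests {
    my @lines = @_;
    my ( %pending, @report );
    for my $line (@lines) {
        my ( $time, $event, $source ) = split ' ', $line;
        next unless defined $source;    # skip blank or malformed lines
        if ( $event eq 'GET_PAGE' ) {
            $pending{$source} = $time;
        }
        elsif ( $event eq 'REQUEST_DONE' && exists $pending{$source} ) {
            # delete() returns the stored start time and frees the entry
            my $start = delete $pending{$source};
            push @report, sprintf( "%s\t%.6f", $source, $time - $start );
        }
    }
    return ( \@report, \%pending );
}

my ( $report, $pending ) = pair_requests(
    "11.491210       GET_PAGE        _o2711/1013     200",
    "11.502060       GET_PAGE        _o2536/838      200",
    "13.073399       REQUEST_DONE    _o2711/1013     7       191989",
);
print "$_\n" for @$report;
print scalar( keys %$pending ), " source(s) still pending\n";
```

With this approach the hash never holds more entries than there are
requests in flight, however large the file is.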

Anno


------------------------------

Date: Wed, 14 May 2003 13:16:58 +0200
From: Josef Möllers <josef.moellers@fujitsu-siemens.com>
Subject: Re: newbie need help
Message-Id: <3EC225AA.E018F959@fujitsu-siemens.com>

Peng Zhao wrote:
>
> hi folks,
>
> I have to process some huge files containing following lines:
>
> time            event           source
> ---------------------------------------------
> ...
> 11.322490       GET_PAGE        _o2831/1133     200
> 11.330820       GET_PAGE        _o2820/1122     200
> 11.476960       GET_PAGE        _o2438/740      200
> 11.491210       GET_PAGE        _o2711/1013     200
> 11.502060       GET_PAGE        _o2536/838      200
> 13.048310       GET_PAGE        _o2887/1189     200
> 13.054010       GET_PAGE        _o2582/884      200
> 13.073399       REQUEST_DONE    _o2711/1013     7       191989
> 13.182151       REQUEST_DONE    _o2831/1133     6       722069
> 13.347800       GET_PAGE        _o1731/33       200
> 13.554530       GET_PAGE        _o2427/729      200
> 13.630060       REQUEST_DONE    _o2536/838      4       408025
> 13.652410       GET_PAGE        _o2619/921      200
> 13.654830       GET_PAGE        _o2661/963      200
> 13.683290       GET_PAGE        _o2648/950      200
> ...
>
> one of the task I want to do is to output the time difference between
> GET_PAGE and REQUEST_DONE from the same source.
> as they are huge files, I have to process it line by line.
>
> Is there an easy way to do it without keeping all sources in an array ?


Consider a hash: keep only those GET_PAGE entries that have not yet seen
their REQUEST_DONE, and delete() each entry once its REQUEST_DONE arrives.

-- 
Josef Möllers (penguin keeper at FSC)
	If failure had no penalty success would not be a prize
						-- T.  Pratchett


------------------------------

Date: Wed, 14 May 2003 14:10:53 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: newbie need help
Message-Id: <pan.2003.05.14.12.10.49.391517@aursand.no>

On Wed, 14 May 2003 13:08:15 +0200, Peng Zhao wrote:
> I have to process some huge files containing following lines:
> [...]

Try this little script.  It seems to do what you want, but it was hacked
together in just a few minutes, and I can see a lot of optimizations that
could still be made.

----
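#!/usr/bin/perl
#
use strict;
use warnings;


my %entries = ();
while ( <DATA> ) {
        chomp;
        # split ' ' ignores leading whitespace and splits on runs of it
        my ( $time, $command, $source ) = split ' ';
        next unless defined $source;    # skip blank or malformed lines
        if ( $command eq 'REQUEST_DONE' ) {
                if ( exists $entries{$source} ) {
                        # delete() returns the stored GET_PAGE time and
                        # removes the entry, so only pending sources
                        # stay in the hash.
                        my $start = delete $entries{$source};
                        my $diff  = $time - $start;
                        print $source . "\t" . $diff . "\n";
                }
                else {
                        # REQUEST_DONE found for a source, but the
                        # source doesn't have a GET_PAGE time set.
                }
        }
        elsif ( $command eq 'GET_PAGE' ) {
                $entries{$source} = $time;
        }
        else {
                # Unknown $command
        }
}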
 
 
__DATA__
11.322490       GET_PAGE        _o2831/1133     200
11.330820       GET_PAGE        _o2820/1122     200
11.476960       GET_PAGE        _o2438/740      200
11.491210       GET_PAGE        _o2711/1013     200
11.502060       GET_PAGE        _o2536/838      200
13.048310       GET_PAGE        _o2887/1189     200
13.054010       GET_PAGE        _o2582/884      200
13.073399       REQUEST_DONE    _o2711/1013     7       191989
13.182151       REQUEST_DONE    _o2831/1133     6       722069
13.347800       GET_PAGE        _o1731/33       200
13.554530       GET_PAGE        _o2427/729      200
13.630060       REQUEST_DONE    _o2536/838      4       408025
13.652410       GET_PAGE        _o2619/921      200
13.654830       GET_PAGE        _o2661/963      200
----


-- 
Tore Aursand <tore@aursand.no>



------------------------------

Date: Wed, 14 May 2003 09:01:17 -0500
From: Barry Kimelman <barryk2@SPAM-KILLER.mts.net>
Subject: Re: newbie need help
Message-Id: <MPG.192c08046b45da8b9897ce@news.mts.net>

[This followup was posted to comp.lang.perl.misc]

In article <3EC2239F.3050401@epfl.ch>, Peng Zhao (peng.zhao@epfl.ch) 
says...
> hi folks,
> 
> I have to process some huge files containing following lines:
> 
> time		event		source
> ---------------------------------------------
> ...
> 11.322490       GET_PAGE        _o2831/1133     200
> 11.330820       GET_PAGE        _o2820/1122     200
> 11.476960       GET_PAGE        _o2438/740      200
> 11.491210       GET_PAGE        _o2711/1013     200
> 11.502060       GET_PAGE        _o2536/838      200
> 13.048310       GET_PAGE        _o2887/1189     200
> 13.054010       GET_PAGE        _o2582/884      200
> 13.073399       REQUEST_DONE    _o2711/1013     7       191989
> 13.182151       REQUEST_DONE    _o2831/1133     6       722069
> 13.347800       GET_PAGE        _o1731/33       200
> 13.554530       GET_PAGE        _o2427/729      200
> 13.630060       REQUEST_DONE    _o2536/838      4       408025
> 13.652410       GET_PAGE        _o2619/921      200
> 13.654830       GET_PAGE        _o2661/963      200
> 13.683290       GET_PAGE        _o2648/950      200
> ...
> 
> 
> one of the task I want to do is to output the time difference between 
> GET_PAGE and REQUEST_DONE from the same source.
> as they are huge files, I have to process it line by line.
> 
> Is there an easy way to do it without keeping all sources in an array ?
> 
> thanks in advance for any help.
> 
> cheers,
> 
> P.

Try the following code.
Note : you may want to add more error checking...


use strict;
use warnings;

my $filename = $ARGV[0];
defined $filename or die("Usage: $0 logfile\n");

# three-argument open with a lexical filehandle
open(my $input, '<', $filename) or
    die("Can't open $filename : $!\n");

my %get_page = ();
while ( my $buffer = <$input> ) {
    chomp $buffer;
    my ( $time, $type, $source ) = split(' ', $buffer);
    next unless defined $source;    # skip blank or malformed lines
    if ( $type eq "GET_PAGE" ) {
        $get_page{$source} = $time;
    } elsif ( $type eq "REQUEST_DONE" ) {
        if ( exists $get_page{$source} ) {
            my $difference = $time - $get_page{$source};
            print "Source : $source , request time : $difference\n";
            # forget the source so only pending requests are kept
            delete $get_page{$source};
        }
        else {
            # found a REQUEST_DONE without a corresponding
            # GET_PAGE
        }
    } else {
        # unexpected type ???
    }
}
close $input;

-- 
---------

Barry Kimelman
Winnipeg, Manitoba, Canada
email : bkimelman@hotmail.com


------------------------------

Date: Wed, 14 May 2003 08:43:20 -0500
From: Barry Kimelman <barryk2@SPAM-KILLER.mts.net>
Subject: Re: removing spaces in a string
Message-Id: <MPG.192c03f8c5420ccf9897cd@news.mts.net>

[This followup was posted to comp.lang.perl.misc]

In article <337213e2.0305130932.730d10ac@posting.google.com>, Kevin 
(kgiles@optonline.net) says...
> how do you remove spaces in a string variable?
> 
> thanks
> 
> kg
> 

$buffer =~ s/ //g;
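
Note that the substitution above removes only literal space characters.
As a small sketch of the alternatives (the variable names and the sample
string are made up for illustration), \s also covers tabs and newlines,
and tr/ //d does the same job as s/ //g without a regex:

```perl
use strict;
use warnings;

my $buffer = "  foo bar\tbaz \n";

# Each line copies $buffer first, then modifies only the copy.
( my $no_spaces = $buffer ) =~ s/ //g;      # literal spaces only
( my $no_white  = $buffer ) =~ s/\s+//g;    # all whitespace, incl. tab and newline
( my $tr_spaces = $buffer ) =~ tr/ //d;     # literal spaces only, via transliteration

print "[$no_white]\n";
```

Which one fits depends on whether tabs and line endings should survive.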

-- 
---------

Barry Kimelman
Winnipeg, Manitoba, Canada
email : bkimelman@hotmail.com


------------------------------

Date: Wed, 14 May 2003 13:18:22 +0200
From: Lechu  <kopetnik@s-pam-nie.yahoo.com>
Subject: Which module to use for ordered hashes?
Message-Id: <v994cvgpho8o22b4i1cd21r2f9vuuu4phd@4ax.com>

There are some implementations of ordered hashes on CPAN. (modules
Tie::*)

Which one to use? I need something simple and efficient.

Lech


------------------------------

Date: Wed, 14 May 2003 14:00:23 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: Which module to use for ordered hashes?
Message-Id: <pan.2003.05.14.12.00.20.669090@aursand.no>

On Wed, 14 May 2003 13:18:22 +0200, Lechu wrote:
> There are some implementations of ordered hashes on CPAN.
> [...]

That's right, and this question seems to be answered in the FAQ:

  How can I always keep my hash sorted?
 
  You can look into using the DB_File module and tie() using
  the $DB_BTREE hash bindings as documented in "In Memory
  Databases" in DB_File.  The Tie::IxHash module from CPAN
  might also be instructive.


-- 
Tore Aursand <tore@aursand.no>



------------------------------

Date: Wed, 14 May 2003 14:33:33 +0200
From: Lechu  <kopetnik@s-pam-nie.yahoo.com>
Subject: Re: Which module to use for ordered hashes?
Message-Id: <lnd4cv0vas27um123ag7v6i6i48nhon0sc@4ax.com>

On Wed, 14 May 2003 14:00:23 +0200, Tore Aursand <tore@aursand.no>
wrote:

>On Wed, 14 May 2003 13:18:22 +0200, Lechu wrote:
>> There are some implementations of ordered hashes on CPAN.
>> [...]

>The Tie::IxHash module from CPAN
>  might also be instructive.
Unfortunately this module doesn't provide a solution for
multidimensional hashes:


my $hash={};
tie %$hash,'Tie::IxHash';

# here the order is preserved
$hash->{a}={};
$hash->{b}={};
$hash->{c}={};
$hash->{d}={};
$hash->{e}={};
$hash->{f}={};

# here the order is lost
$hash->{a}{a}={};
$hash->{a}{b}={};
$hash->{a}{c}={};
$hash->{a}{d}={};
$hash->{a}{e}={};
$hash->{a}{f}={}; 


------------------------------

Date: Wed, 14 May 2003 09:12:31 -0500
From: bob <bob@nowhere.com>
Subject: Re: Which module to use for ordered hashes?
Message-Id: <3ec24ecb_1@127.0.0.1>

On Wed, 14 May 2003 07:33:33 -0500, Lechu wrote:

> On Wed, 14 May 2003 14:00:23 +0200, Tore Aursand <tore@aursand.no>
> wrote:
> 
>>On Wed, 14 May 2003 13:18:22 +0200, Lechu wrote:
>>> There are some implementations of ordered hashes on CPAN. [...]
> 
>>The Tie::IxHash module from CPAN
>>  might also be instructive.
> Unfortunately this module doesn't provide solution for multidimensional
> hashes:
> 
> 
> my $hash={};
> tie %$hash,'Tie::IxHash';
> 
> # here the order is preserved
> $hash->{a}={};
> $hash->{b}={};
> $hash->{c}={};
> $hash->{d}={};
> $hash->{e}={};
> $hash->{f}={};


There's a workaround for this.  The solution I used in one project is to
insert something like this here:

my $out = {};

tie %{$out}, 'Tie::IxHash';

#THIS PART UNTESTED
$hash->{a}{a}= $out;

#but this ought to work:
# $out->{a} = {};
# $out->{b} = {};
# etc

> # here the order is lost
> $hash->{a}{a}={};
> $hash->{a}{b}={};
> $hash->{a}{c}={};
> $hash->{a}{d}={};
> $hash->{a}{e}={};
> $hash->{a}{f}={};
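
A minimal sketch of the general point in this thread: tie() affects only
the one hash it is applied to, and inner hashes created by
autovivification are plain hashes, so each nested level has to be tied
before it is filled.  To keep the example runnable with core Perl only,
the OrderedHash package below is a hypothetical stand-in for
Tie::IxHash, and new_ordered() is an illustrative helper; with
Tie::IxHash installed, the tie line would presumably name that class
instead.

```perl
use strict;
use warnings;

# Hypothetical, minimal insertion-ordered hash (stand-in for Tie::IxHash).
package OrderedHash;

sub TIEHASH { my ($class) = @_; return bless { keys => [], data => {} }, $class }
sub STORE {
    my ( $self, $k, $v ) = @_;
    # remember the key only the first time it is stored
    push @{ $self->{keys} }, $k unless exists $self->{data}{$k};
    $self->{data}{$k} = $v;
}
sub FETCH  { my ( $self, $k ) = @_; return $self->{data}{$k} }
sub EXISTS { my ( $self, $k ) = @_; return exists $self->{data}{$k} }
sub DELETE {
    my ( $self, $k ) = @_;
    @{ $self->{keys} } = grep { $_ ne $k } @{ $self->{keys} };
    return delete $self->{data}{$k};
}
sub CLEAR    { my ($self) = @_; $self->{keys} = []; $self->{data} = {} }
sub FIRSTKEY { my ($self) = @_; $self->{pos} = 0; return $self->{keys}[0] }
sub NEXTKEY  { my ($self) = @_; return $self->{keys}[ ++$self->{pos} ] }

package main;

# Return a reference to a fresh, empty ordered hash.
sub new_ordered {
    tie my %h, 'OrderedHash';
    return \%h;
}

my $hash = new_ordered();
$hash->{a} = new_ordered();              # tie the inner level explicitly
$hash->{a}{$_} = {} for qw(z y x w);     # stored in the tied inner hash
$hash->{b} = new_ordered();

print join( ',', keys %$hash ), "\n";
print join( ',', keys %{ $hash->{a} } ), "\n";
```

The design choice here is to create every level through one constructor,
so no level is ever left to autovivification.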




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending Perl questions to the -request address; I don't have time to
answer them, even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4983
***************************************

