
Perl-Users Digest, Issue: 1390 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Mar 25 11:09:44 2008

Date: Tue, 25 Mar 2008 08:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 25 Mar 2008     Volume: 11 Number: 1390

Today's topics:
    Re: Pattern matching <yankeeinexile@gmail.com>
    Re: Perl code to fill-in online form <hjp-usenet2@hjp.at>
        Readline using foreach and while <hundredrabh@gmail.com>
    Re: Readline using foreach and while <benkasminbullock@gmail.com>
    Re: Readline using foreach and while <noreply@gunnar.cc>
    Re: Readline using foreach and while <someone@example.com>
    Re: Readline using foreach and while <jurgenex@hotmail.com>
    Re: strategys other than subroutine and OO? <szrRE@szromanMO.comVE>
    Re: The huge amount response data problem <falconzyx@gmail.com>
    Re: The huge amount response data problem <falconzyx@gmail.com>
    Re: The huge amount response data problem <RedGrittyBrick@SpamWeary.foo>
    Re: The huge amount response data problem <jurgenex@hotmail.com>
    Re: use base and @ISA with package re-definition <Peter@PSDT.com>
    Re: use base and @ISA with package re-definition <bigmattstud@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 25 Mar 2008 08:13:45 -0600
From: Lawrence Statton <yankeeinexile@gmail.com>
Subject: Re: Pattern matching
Message-Id: <877ifqlwme.fsf@hummer.cluon.com>

Deepan Perl XML Parser <deepan.17@gmail.com> writes:
> 
> No i am writing my own XML parser.

Don't.  There are many good XML parsers out there, the world doesn't
need another one.
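To give an idea of how little code an existing parser needs - a sketch using XML::Twig (the module choice and the sample document are mine, not the poster's; any of the mature CPAN parsers would do):

```perl
#!/usr/bin/perl
# Minimal sketch: parse XML with an existing CPAN parser instead of
# writing your own. Assumes XML::Twig is installed.
use strict;
use warnings;
use XML::Twig;

my $xml = '<books><book title="Programming Perl"/><book title="Perl Cookbook"/></books>';

my $twig = XML::Twig->new();
$twig->parse($xml);    # dies with a useful diagnostic on malformed XML
for my $book ( $twig->root->children('book') ) {
    print $book->att('title'), "\n";
}
```

You get well-formedness checking, entity handling, and encoding support for free - all the things a hand-rolled regex "parser" gets wrong.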

-- 
	Lawrence Statton - lawrenabae@abaluon.abaom s/aba/c/g
Computer  software  consists of  only  two  components: ones  and
zeros, in roughly equal proportions.   All that is required is to
place them into the correct order.


------------------------------

Date: Tue, 25 Mar 2008 08:25:18 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Perl code to fill-in online form
Message-Id: <slrnfuha72.asg.hjp-usenet2@hrunkner.hjp.at>

On 2008-03-25 00:47, Babul <minhaztuhin@hotmail.com> wrote:
> On Mar 24, 3:16 pm, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid>
> wrote:
>> Babulwrote:
>> > The reason I am using Perl,
[...]
>>
>> Thanks for sharing your intentions.  If you have questions, then post
>> your code and ask, otherwise no one really needs to know your
>> intentions.
>
> If you think that you are really helpful, please see my code at the
> beginning of the discussion. Ben Bullock suggested me to use
> greasemonkey, so I explained here why I am using Perl.

Be more careful in quoting, then. You quoted a lot from Ben's reply, but
you didn't quote the part about greasemonkey. Please quote the part you
are replying to (and only that) - otherwise nobody knows what you are
talking about.

The part which you did quote, however, probably did identify your
problem. Did you try his advice?
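As a starting point, HTML::Form (part of libwww-perl, which any LWP user already has) can parse and fill a form without a browser. A minimal offline sketch - the form, field names, and values here are invented for illustration:

```perl
#!/usr/bin/perl
# Sketch: fill in an HTML form with HTML::Form from libwww-perl.
# The form markup and field names below are made up for illustration.
use strict;
use warnings;
use HTML::Form;

my $html = <<'HTML';
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>
HTML

my $form = HTML::Form->parse( $html, 'http://example.com/' );
$form->value( user => 'babul' );
$form->value( pass => 'secret' );

my $request = $form->make_request;   # an HTTP::Request, ready for LWP::UserAgent
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

WWW::Mechanize wraps this same machinery together with the fetching, cookie handling, and link following, so for real sites it is usually the shorter route.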

	hp


------------------------------

Date: Tue, 25 Mar 2008 00:25:01 -0700 (PDT)
From: Saurabh Jain <hundredrabh@gmail.com>
Subject: Readline using foreach and while
Message-Id: <de27040f-e3df-41cf-bc25-965244ac5a44@u10g2000prn.googlegroups.com>

Hi,
   Is there any difference in reading a file using a while or a
foreach in perl?

If I do:

 foreach(<filehandle>) {
    my $local = <filehandle>;  # I assumed this would advance the file descriptor here
    print " local $local\n";
 }

it does not work as expected. But if I do:

 while(<filehandle>) {
    my $local = <filehandle>;  # I assumed this would advance the file descriptor here
    print " local $local\n";
 }

it works fine. Is there something wrong, or some difference between the
two operations? Or am I missing something?

Thanks and Regards,
Saurabh


Small example to replicate the issue; my file name is test.pl:

#!/usr/bin/perl

open (handle, "test.pl") || die "\n $0 Could not open $! \n";

my $line = <handle>;     # read a line till \n or eof
print " line $line";
#foreach (<handle>) {    # Not as expected
while (<handle>) {       # Works as expected
    $line = <handle>;    # read a line till \n or eof
    print " in side $line";
    $line = <handle>;    # read a line till \n or eof
    print " in side $line";
    $line = <handle>;    # read a line till \n or eof
}
close handle;


------------------------------

Date: Tue, 25 Mar 2008 01:06:47 -0700 (PDT)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: Readline using foreach and while
Message-Id: <21188ba1-8471-4b8f-9429-b47677603fd7@e6g2000prf.googlegroups.com>

On Mar 25, 4:25 pm, Saurabh Jain <hundredr...@gmail.com> wrote:
> Hi,
>    Is there any difference in reading a file using a while or a
> foreach in perl?

The foreach version seems to first read the whole of the file into an
array, and then go through it line by line:

#!/usr/bin/perl
#use warnings;
use strict;
open (handle,"testangleop.pl") or die "$0 Could not open $!";

my $line = <handle>;		#read a line till \n or eof
print "0 line $line";
foreach (<handle>) {		# Not as expected
#while(<handle>){  # Works as expected
    print $_;
    $line =<handle>;		#read a line till \n or eof
    print "1 in side $line";
    $line =<handle>;		#read a line till \n or eof
    print "2 in side $line";
    $line =<handle>;		#read a line till \n or eof
    print "3 in side $line";
}

The while seems to increment through the loop.

See also

http://www.unix.org.ua/orelly/perl/prog3/ch02_11.htm


------------------------------

Date: Tue, 25 Mar 2008 09:37:09 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Readline using foreach and while
Message-Id: <64rrq3F2ccqmtU1@mid.individual.net>

Saurabh Jain wrote:
>    Is there any difference in reading a file using a while or a
> foreach in perl?

Yes.

     foreach (<FILEHANDLE>)

reads the whole file at once, and creates a list in memory of all the 
lines, so that method is inefficient and not recommended in most cases.

     while (<FILEHANDLE>)

reads one line at a time.

Please study "perldoc perlsyn" for more comprehensive descriptions.
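A small self-contained demo of the difference (the temp-file setup is just for illustration):

```perl
#!/usr/bin/perl
# Demonstrates why an extra readline inside foreach(<$fh>) returns undef:
# foreach evaluates <$fh> once, in list context, reading the whole file.
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( $tmp, $file ) = tempfile();
print $tmp "one\ntwo\nthree\nfour\n";
close $tmp;

open my $in, '<', $file or die "$file: $!";
foreach my $line (<$in>) {     # all four lines are read here, at once
    my $extra = <$in>;         # the handle is already at EOF
    print defined $extra ? "extra=$extra" : "extra=undef\n";
}
close $in;

open $in, '<', $file or die "$file: $!";
while ( my $line = <$in> ) {   # one line is read per iteration
    my $extra = <$in>;         # reads the *next* line
    print defined $extra ? "extra=$extra" : "extra=undef\n";
}
close $in;
```

The foreach loop prints "extra=undef" four times; the while loop alternates lines between $line and $extra.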

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Tue, 25 Mar 2008 11:59:23 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Readline using foreach and while
Message-Id: <vO5Gj.122942$C61.40783@edtnps89>

Ben Bullock wrote:
> On Mar 25, 4:25 pm, Saurabh Jain <hundredr...@gmail.com> wrote:
>> Hi,
>>    Is there any difference in reading a file using a while or a
>> foreach in perl?
> 
> The foreach version seems to first read the whole of the file into an
> array, and then go through it line by line:

perldoc -q "What is the difference between a list and an array"


John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Tue, 25 Mar 2008 13:00:01 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Readline using foreach and while
Message-Id: <q0thu39qpemvfq2o5rm6lk4i1e95m2m8io@4ax.com>

Saurabh Jain <hundredrabh@gmail.com> wrote:
>   Is there any difference in reading a file using a while or a
>foreach in perl?

Yes. Reading a file line by line works only with while. foreach reads the
whole file at once.

>If I do :
> foreach(<filehandle>) {
>    my $local = <filehandle>;  # I assumed I will increment the file

You just tried to read from a file that is at EOF already.

>descriptor here
>    print " local $local\n";
>}
>
>But if I do :
>while(<filehandle>) {
>    my $local = <filehandle>;  # I assumed I will increment the file

You are alternating between reading one line into $_ (in the while
condition) and one line into $local. Is this what you meant to do?

>descriptor here
>    print " local $local\n";
>}

>
>Small example to replicate the issue
>my file name is test.pl
>
>#!/usr/bin/perl
>
>open (handle,"test.pl")||die "\n $0 Could not open $! \n";
>
>my $line = <handle>;#read a line till \n or eof
>print " line $line";
>#foreach(<handle>){ # Not as expected
>while(<handle>){  # Works as expected
>$line =<handle>;#read a line till \n or eof
>
>print " in side $line";
>$line =<handle>;#read a line till \n or eof
>print " in side $line";
>$line =<handle>;#read a line till \n or eof

And here you are reading one line into $_ (in the while condition) and then
successively three lines into $line. This may make sense if you know that a
data set has a fixed format of always 4 lines. But in 99% of all cases it's
a bug.
As for the foreach version: it already slurped the whole file into a list,
so there is nothing left that could be read into $line.
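For completeness, the rare fixed-format case where the extra reads inside the loop are intentional might look like this (the four-line record layout is invented for illustration):

```perl
#!/usr/bin/perl
# Sketch: deliberately reading fixed four-line records - the one case
# where extra readlines inside the loop body make sense.
use strict;
use warnings;
use File::Temp qw(tempfile);

my ( $tmp, $file ) = tempfile();
print $tmp "id1\na\nb\nc\nid2\nd\ne\nf\n";   # two records of four lines each
close $tmp;

open my $in, '<', $file or die "$file: $!";
while ( my $id = <$in> ) {                    # line 1 of each record
    chomp $id;
    chomp( my @rest = map { scalar <$in> } 1 .. 3 );   # lines 2-4
    print "$id: @rest\n";
}
close $in;
```

Even here you would want to check for undef in @rest, in case the file ends mid-record.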
 
jue


------------------------------

Date: Tue, 25 Mar 2008 00:11:23 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: strategys other than subroutine and OO?
Message-Id: <fsa8ir028em@news3.newsguy.com>

Willem wrote:
> Peter wrote:
> ) On 2008-03-24 05:46, szr <szrRE@szromanMO.comVE> wrote:
> )> Actually it would have to read like:
> )>
> )>    while (<>) {
> )>       some_code();
> )>       if ($x) {
> )>           more_code();
> )>       }
> )>       elsif ($x && $y) {
> )>          even_more_code();
> )>       }
> )>       elsif ($x && $y && $z) {
> )>          if_all_goes_well();
> )>       }
> )>    }

See below...


> )> in order to equate to:
> )>
> )>    while (<>) {
> )>       some_code();
> )>       if ($x) {
> )>          more_code();
> )>          if ($y) {
> )>             even_more_code();
> )>             if ($z) {
> )>                if_all_goes_well();
> )>             }
> )>          }
> )>       }
> )>    }
> )>
> )
> ) Nope. Anyone up for a third try?
> )
> ) hp
>
> Assuming more_code and even_more_code have no effect on $y and $z:
> (Which is an unwarranted assumption, by the way)
>
>    while (<>) {
>       some_code();
>       if ($x && not $y) {
>          more_code();
>       }
>       elsif ($x && not $z) {
>          more_code();
>          even_more_code();
>       }
>       elsif ($x) {
>          more_code();
>          even_more_code();
>          if_all_goes_well();
>       }
>    }
>
> Which is quite silly, of course.

Actually I should have written it as:

   while (<>) {
      some_code();
      if ($x) {
          more_code();
      }
      if ($x && $y) {
         even_more_code();
      }
      if ($x && $y && $z) {
         if_all_goes_well();
      }
   }

That way it falls through giving the same effect as the original and is 
possibly a bit easier to read.

-- 
szr 




------------------------------

Date: Tue, 25 Mar 2008 01:25:37 -0700 (PDT)
From: "falconzyx@gmail.com" <falconzyx@gmail.com>
Subject: Re: The huge amount response data problem
Message-Id: <68de44d6-0de4-4587-9b5b-7a9b9041b89a@u10g2000prn.googlegroups.com>

On Mar 25, 3:06 pm, Ben Bullock <benkasminbull...@gmail.com> wrote:
> Your code is hopelessly inefficient. 100,000 strings of even twenty
> characters is at least two megabytes of memory. Then you've doubled
> that number with the creation of the URL, and then you are creating
> arrays of all these things, so you've used several megabytes of
> memory.
>
> Instead of first creating a huge array of names, then a huge array of
> URLs, why don't you just read in one line of the file at a time, then
> try to get data from each URL? Read in one line of the first file,
> create its URL, get the response data, store it, then go back and get
> the next line of the file, etc. A 100,000 line file actually isn't
> that big.
>
> But if you are getting all these files from the internet, the biggest
> bottleneck is probably the time the code spends waiting for a response
> from the web servers it's requested. You'd have to think about making
> parallel requests somehow to solve that.

Thanks Ben,

However, is there any good solution that uses the threads module? I use
that, and run out of memory from time to time even after I refactored
the code as you suggested.
I tried Thread::Pool and some other thread modules that I found.
Is Perl really not suited for multi-threaded programming?

Thanks again to everyone.


------------------------------

Date: Tue, 25 Mar 2008 02:01:37 -0700 (PDT)
From: "falconzyx@gmail.com" <falconzyx@gmail.com>
Subject: Re: The huge amount response data problem
Message-Id: <b362d7fb-c218-4170-9e35-e3f199053eb4@e23g2000prf.googlegroups.com>

On Mar 25, 4:25 pm, "falcon...@gmail.com" <falcon...@gmail.com> wrote:
> [...]
> Thanks Ben,
>
> However, is there any good solution that uses the threads module? I use
> that, and run out of memory from time to time even after I refactored
> the code as you suggested.
> I tried Thread::Pool and some other thread modules that I found.
> Is Perl really not suited for multi-threaded programming?
>
> Thanks again to everyone.

here is my refactored code (getstore() needs LWP::Simple, which was
missing, and the read loop was skipping every other line):

use threads;
use LWP::UserAgent;
use LWP::Simple qw(getstore);   # getstore() comes from LWP::Simple
use Data::Dumper;
use strict;

get_request();

sub get_request {
    open (FH, "...") or die "can not open file $!";
    while ( my $i = <FH> ) {    # was: while (<FH>) { my $i = <FH>; ... }
        chomp $i;               # drop the newline before building the URL
        my $url = ".../$i";
        my $t = threads->new(\&get_html, $url);
        $t->join();
    }
    close (FH);
}

sub get_html {
    my ($url) = @_;
    my $user_agent = LWP::UserAgent->new();
    my $response = $user_agent->request(HTTP::Request->new('GET', $url));
    my $content = $response->content;
    format_html($content);
}

sub format_html {
    my ($content) = @_;
    my $html_data = $content;
    my $word;
    my $data;
    while ( $html_data =~ m{...}igs ) {
        $word = $1;
    }
    while ( $html_data =~ m{...}igs ) {
        $data = $1;
        save_data( $word, $data );
    }
    while ( $data =~ m{...}igs ) {
        my $title = $1;
        my $sound = $1 . $2;
        if ( defined($sound) ) {
            save_sound( $word, $title, $sound );
        }
    }
}

sub save_data {
    my ( $word, $data ) = @_;
    open ( FH, " > ..." ) or die "Can not open $!";
    print FH $data;
    close(FH);
}

sub save_sound {
    my ( $word, $title, $sound ) = @_;
    getstore("....", "...") or warn $!;
}


------------------------------

Date: Tue, 25 Mar 2008 09:49:20 +0000
From: RedGrittyBrick <RedGrittyBrick@SpamWeary.foo>
Subject: Re: The huge amount response data problem
Message-Id: <47e8caa2$0$32055$da0feed9@news.zen.co.uk>

falconzyx@gmail.com wrote:
> On Mar 25, 3:06 pm, Ben Bullock <benkasminbull...@gmail.com> wrote:
>> [...]
> 
> Thanks Ben,
> 
> However, is there any good solution that use threads method? I use
> that, and out of memory time by time after I refactor the code as you
> told

That's because, if your file contains 100000 lines, your program tries 
to create 100000 simultaneous threads, doesn't it?

I would create a pool with a fixed number of threads (say 10). I'd read 
the file, adding tasks to a queue of the same size; after filling the 
queue I'd pause reading the file until the queue has a spare slot. 
Maybe this could be achieved by sleeping a while (say 100 ms) and 
re-checking whether the queue is still full. When a thread is created 
or has finished a task, it should remove a task from the queue and 
process it. If the queue is empty, the thread should sleep for a while 
(say 200 ms) and try again. You'd need some mechanism to signal threads 
that all tasks have been queued (maybe a flag, a special marker task, a 
signal, or a certain number of consecutive failed attempts to find 
work).
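A sketch of that design using the core threads and Thread::Queue modules: the blocking dequeue replaces the sleep-and-recheck, and one undef marker per worker is the "all tasks have been queued" signal. The task itself (squaring numbers) is a stand-in for fetching one URL:

```perl
#!/usr/bin/perl
# Sketch of a fixed-size worker pool built on the core threads and
# Thread::Queue modules. Requires a threads-enabled perl.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $POOL_SIZE = 10;
my $queue     = Thread::Queue->new();

sub worker {
    my @done;
    # dequeue() blocks until a task (or shutdown marker) is available,
    # so no explicit sleeping or re-checking is needed.
    while ( defined( my $task = $queue->dequeue() ) ) {
        push @done, $task * $task;    # "process" the task (stand-in work)
    }
    return @done;                     # undef marker received: shut down
}

my @workers = map { threads->create( \&worker ) } 1 .. $POOL_SIZE;

$queue->enqueue($_) for 1 .. 100;     # queue all the tasks
$queue->enqueue(undef) for @workers;  # one end-of-work marker per worker

my @results = map { $_->join() } @workers;
printf "processed %d tasks\n", scalar @results;
```

For the original problem, bounding memory would also mean throttling the enqueue side once the queue reaches some size, as described above; the pending() method gives you the current queue length to test against.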

I've never tried to program something like this in Perl so I'd imagine 
someone (probably several people) has already solved this and added 
modules to CPAN to assist in this sort of task.

There's probably some OO Design Patterns that apply too.

> I try thread::Pool and some other thread module that I found.
> Doesn't it really Perl suit for mutil threads programming??

I find it hard to understand what you are saying but I think the answer 
is: Yes, Perl is well suited to programming with multiple threads (or 
processes).

-- 
RGB


------------------------------

Date: Tue, 25 Mar 2008 13:03:30 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: The huge amount response data problem
Message-Id: <mothu3lt3upv1b38ghchmmgfc8hqolha07@4ax.com>

"falconzyx@gmail.com" <falconzyx@gmail.com> wrote:
>construct almost 200000 URL addresses to send and parse response data.
>And the speed is very very slow.
>
>Please give me some advices that what I should do to improve the speed

Get a T1 line.

jue


------------------------------

Date: Tue, 25 Mar 2008 12:09:19 GMT
From: Peter Scott <Peter@PSDT.com>
Subject: Re: use base and @ISA with package re-definition
Message-Id: <pan.2008.03.25.12.09.13.642197@PSDT.com>

On Mon, 24 Mar 2008 18:34:07 -0700, bigmattstud wrote:
> I'm trying to load multiple classes with the same name from different
> directories in order to implement a pluggable deployment system.  I
> have managed to get the classes to be redefined by playing with the
> global symbol table, and it seems to be behaving the way I want except
> for the definition of @ISA.  Here's some sample code:
> 
> Here's the driving script:
> 
> #  Create a class and instantiate it from the first directory
> print "Constructing MyMod from D:/Data/Perl/Module1\n" ;
> push @INC,'D:/Data/Perl/Module1' ;
> require MyMod ;
> print "MyMod is a ",join(',',@MyMod::ISA),"\n" ;
> $build = MyMod->new() ;
> print "\n" ;
> 
> #  Clean out the previous definition of MyMod by deleting the entries
> from the INC hash and the global symbol
> #  table
> delete $INC{"MyMod.pm"} ;
> delete $main::{'MyMod::'} ;
> pop @INC ;
> 
> #  Create a class and instantiate it from the second directory
> print "Constructing MyMod from D:/Data/Perl/Module2\n" ;
> push @INC,"D:/Data/Perl/Module2" ;
> require MyMod ;
> print "MyMod is a ",join(',',@MyMod::ISA),"\n" ;
> $build = MyMod->new() ;

[snip]

> This is the output:
> 
> Constructing MyMod from D:/Data/Perl/Module1
> MyMod is a BaseBuild1
> Inside BaseBuild1 constructor
> 
> Constructing MyMod from D:/Data/Perl/Module2
> MyMod is a BaseBuild1
> Inside BaseBuild2 constructor

Your problem stems from the line

  delete $main::{'MyMod::'} ;

which falls into the "don't do that" category.  There have been attempts
in the past to patch perl to issue a warning or error on such statements. 
Even now it is fraught with danger:

$ perl -e 'delete $main::{"foo::"}; push @foo::ISA, "bar"'
Segmentation fault

Oops.  If you want further proof that this is too dangerous to use, run
your program under the debugger and right before the second statement

  print "MyMod is a ",join(',',@MyMod::ISA),"\n" ;

examine the value of @MyMod::ISA.  Scary.

Uri's answer is the best help.  If you wanted to persist in the direction
you were going, instead of deleting the whole symbol table, just empty
@MyMod::ISA and turn off subroutine redeclaration warnings.
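Spelled out as a self-contained sketch - the two generated module files stand in for the poster's D:/Data/Perl/Module1 and Module2 directories:

```perl
#!/usr/bin/perl
# Self-contained sketch of "empty @ISA and silence redefinition warnings"
# when reloading a same-named class from different files. The generated
# module files stand in for the poster's two directories.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
for my $n ( 1, 2 ) {
    open my $fh, '>', "$dir/MyMod$n.pm" or die $!;
    print $fh "package MyMod;\n",
              "our \@ISA = ('BaseBuild$n');\n",
              "sub new { bless {}, shift }\n",
              "1;\n";
    close $fh;
}

sub reload_mymod {
    my ($path) = @_;
    @MyMod::ISA = ();      # forget the previous parent class
    delete $INC{$path};    # let require load the file again
    local $^W = 0;         # silence "Subroutine ... redefined" warnings
    require $path;
}

reload_mymod("$dir/MyMod1.pm");
print "MyMod is a @MyMod::ISA\n";
reload_mymod("$dir/MyMod2.pm");
print "MyMod is a @MyMod::ISA\n";
```

This prints "MyMod is a BaseBuild1" and then "MyMod is a BaseBuild2" - the @ISA confusion from the original post is gone, without touching the symbol table.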

-- 
Peter Scott
http://www.perlmedic.com/
http://www.perldebugged.com/



------------------------------

Date: Tue, 25 Mar 2008 05:44:30 -0700 (PDT)
From: bigmattstud@gmail.com
Subject: Re: use base and @ISA with package re-definition
Message-Id: <921f5702-869d-44ea-8249-664b98c48f79@d4g2000prg.googlegroups.com>

On Mar 25, 11:09 pm, Peter Scott <Pe...@PSDT.com> wrote:

> Uri's answer is the best help.  If you wanted to persist in the direction
> you were going, instead of deleting the whole symbol table, just empty
> @MyMod::ISA and turn off subroutine redeclaration warnings.
>

I'm not sure that Uri's answer is going to help in my case because of
difficulties in ensuring that there will never be namespace conflicts,
but I did try your suggestion and it seems to work better.  I also
cleaned it up to remove a bit more of the other magic being done.

#  Create a class and instantiate it from the first directory
print "Constructing MyMod from D:/Data/Perl/Module1\n" ;
require 'D:/Data/Perl/Module1/MyMod.pm' ;
print "MyMod is a ",join(',',@MyMod::ISA),"\n" ;
$build = MyMod->new() ;
print "\n" ;

#  Clean out the previous definition of MyMod
undef @MyMod::ISA ;

#  Create a class and instantiate it from the second directory
print "Constructing MyMod from D:/Data/Perl/Module2\n" ;
require 'D:/Data/Perl/Module2/MyMod.pm' ;
print "MyMod is a ",join(',',@MyMod::ISA),"\n" ;
$build = MyMod->new() ;

This is the output:

Constructing MyMod from D:/Data/Perl/Module1
MyMod is a BaseBuild1
Inside BaseBuild1 constructor

Constructing MyMod from D:/Data/Perl/Module2
MyMod is a BaseBuild2
Inside BaseBuild2 constructor

Thanks for your help



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1390
***************************************

