[28198] in Perl-Users-Digest
Perl-Users Digest, Issue: 9562 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Aug 4 14:10:15 2006
Date: Fri, 4 Aug 2006 11:10:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 4 Aug 2006 Volume: 10 Number: 9562
Today's topics:
Re: Recursion <koko_loko_0@yahoo.co.uk>
Re: Recursion <koko_loko_0@yahoo.co.uk>
Re: Recursion xhoster@gmail.com
Re: Recursion xhoster@gmail.com
Re: Recursion <koko_loko_0@yahoo.co.uk>
Re: Regex...HTML::Parser...Getting webpage data? <wbresson@gmail.com>
Re: Regex...HTML::Parser...Getting webpage data? <mritty@gmail.com>
Re: Regex...HTML::Parser...Getting webpage data? <jgibson@mail.arc.nasa.gov>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 4 Aug 2006 15:47:11 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eavmlf$mn8$1@news.al.sw.ericsson.se>
<xhoster@gmail.com> wrote in message
news:20060803140819.424$dp@newsreader.com...
> That said, your program does seem to be N**2, rather than NlogN, and I see
> no reason that it should be. Even with all the badness built in, it
should
> still be NlogN, or maybe N(logN)**2, not N**2. I don't quite get it.
>
>
> > I wonder how good QuickSort can be in Perl so it will show me how bad or
> > good my algorithm is.
>
> It can be at least 100 times faster. ("use sort '_quicksort';" give me
> a built in sort that takes <1 second to sort 305780 numbers, versus 140
> seconds for your sort). OK, so that may not qualify as being "in Perl",
> because I'm sure much of it in C, but is available from Perl and still
> uses the Perl variable access methods and such.
>
I re-wrote so that the subroutine works with indexes of the original array.
100000 for 14 seconds, 305780 for 112. Very very bad.
I wonder how much it can be optimized and what are bottlenecks here:
_____________________________________________________________
print "Enter the number of elements to be sorted: \n";
chomp($size=<STDIN>);
foreach (1..$size) {push @array, int(rand(1000))}
$start=time();
qs(0,$#array);
#print "@array \n";
print "The array of $size elements Quicksorted in ",time()-$start," seconds.
\n";
sub qs{
my $left=shift;
my $right=shift;
my @smaller;
my @bigger;
for ($i=$left;$i<$right;$i++){
if ($array[$i] <= $array[$right]) {push @smaller,$array[$i]}
else {push @bigger,$array[$i]}
}
$pivot = $left + @smaller;
$array[$pivot] = $array[$right];
if ($pivot>$left){
$i=$pivot;
while (@smaller){$array[--$i]=pop @smaller};
if ($pivot-1>$left){qs($left,$pivot-1)}
}
if ($pivot<$right){
$i=$pivot;
while (@bigger){$array[++$i]=pop @bigger};
if ($pivot+1<$right){qs($pivot+1,$right)}
}
}
______________________________________________________________
kokolo
------------------------------
Date: Fri, 4 Aug 2006 16:09:02 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eavnue$n67$1@news.al.sw.ericsson.se>
"Ted Zlatanov" <tzz@lifelogs.com> wrote in message
news:g69y7u5g0ks.fsf@CN1374059D0130.kendall.corp.akamai.com...
> Try doing this with numeric offsets:
>
> qs(\@data, $start, $pivot, $end);
>
> then in qs:
>
> # untested example code
> sub qs
> {
> my $data = shift @_;
> my $start = shift @_; # start offset of range to be qsorted
> my pivot = shift @_; # pivot offset of range to be qsorted
> my $end = shift @_; # end offset of range to be qsorted
> ... examples of usage follow ...
> # get element 15 of $data
> $data->[15];
> ...
> # swap elements 12 and 22 of $data
> ($data->[12], $data->[22]) = ($data->[22], $data->[12]);
> ...
> # sort a subrange from inside qs()
> qs($data, $newstart, $newpivot, $newend);
> ...
> }
>
> I think the offsets are what you've been missing... If you just pass
> a reference, you'll never know what part of it to sort, and if you
> make a new array you're not sorting the original.
>
> Ted
I made it with sending $left and $right to the subroutine and sorting the
original array between these limits.You can take a look at the other post
for the code.
Is there still a need to send the array reference? Even in your example is
it a benefit if you work with the reference and not with the original array?
kokolo
------------------------------
Date: 04 Aug 2006 15:42:45 GMT
From: xhoster@gmail.com
Subject: Re: Recursion
Message-Id: <20060804115109.036$ZI@newsreader.com>
"kokolo" <koko_loko_0@yahoo.co.uk> wrote:
>
> I made it with sending $left and $right to the subroutine and sorting the
> original array between these limits.You can take a look at the other post
> for the code.
> Is there still a need to send the array reference?
That depends. Are you happy needing to create a separate sort routine
for every differently named array you want to sort?
> Even in your example
> is it a benefit if you work with the reference and not with the original
> array?
Yes. Then you can tell the sort routine which array to sort.
Xho
--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
------------------------------
Date: 04 Aug 2006 15:50:18 GMT
From: xhoster@gmail.com
Subject: Re: Recursion
Message-Id: <20060804115842.187$8B@newsreader.com>
"kokolo" <koko_loko_0@yahoo.co.uk> wrote:
> <xhoster@gmail.com> wrote in message
> news:20060803140819.424$dp@newsreader.com...
> > That said, your program does seem to be N**2, rather than NlogN, and I
> > see no reason that it should be. Even with all the badness built in,
> > it
> should
> > still be NlogN, or maybe N(logN)**2, not N**2. I don't quite get it.
> >
> >
> > > I wonder how good QuickSort can be in Perl so it will show me how bad
> > > or good my algorithm is.
> >
> > It can be at least 100 times faster. ("use sort '_quicksort';" give me
> > a built in sort that takes <1 second to sort 305780 numbers, versus 140
> > seconds for your sort). OK, so that may not qualify as being "in
> > Perl", because I'm sure much of it in C, but is available from Perl and
> > still uses the Perl variable access methods and such.
> >
>
> I re-wrote so that the subroutine works with indexes of the original
> array. 100000 for 14 seconds, 305780 for 112. Very very bad.
It also gives the wrong answer!
> I wonder how much it can be optimized and what are bottlenecks here:
Don't wonder, profile! Devel::SmallProf
>
> sub qs{
> my $left=shift;
> my $right=shift;
> my @smaller;
> my @bigger;
>
> for ($i=$left;$i<$right;$i++){
>
> if ($array[$i] <= $array[$right]) {push @smaller,$array[$i]}
> else {push @bigger,$array[$i]}
> }
>
> $pivot = $left + @smaller;
> $array[$pivot] = $array[$right];
Your array now has two of whatever was in $array[$right], and none of
whatever was in $array[$pivot].
Xho
--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
------------------------------
Date: Fri, 4 Aug 2006 17:40:52 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eavtam$p74$1@news.al.sw.ericsson.se>
<xhoster@gmail.com> wrote in message
news:20060804115842.187$8B@newsreader.com...
> It also gives the wrong answer!
>
> > I wonder how much it can be optimized and what are bottlenecks here:
>
> Don't wonder, profile! Devel::SmallProf
>
> >
> > sub qs{
> > my $left=shift;
> > my $right=shift;
> > my @smaller;
> > my @bigger;
> >
> > for ($i=$left;$i<$right;$i++){
> >
> > if ($array[$i] <= $array[$right]) {push @smaller,$array[$i]}
> > else {push @bigger,$array[$i]}
> > }
> >
> > $pivot = $left + @smaller;
> > $array[$pivot] = $array[$right];
>
> Your array now has two of whatever was in $array[$right], and none of
> whatever was in $array[$pivot].
>
> Xho
Damn, I can't get it. It is obviously wrong but it works 3-4 times out of 5
and it tricked me. It looked suspisious to me as there were too many
repeating numbers .
As for what you mentioned about $pivot, $array[$pivot] is saved either in
@smaller or @bigger.
I need to re-think it as I cannot figure out why it works if I extract
if ($pivot-1>$left){qs($left,$pivot-1)}
if ($pivot+1<$right){qs($pivot+1,$right)}
from the "if" statements and put them at the end of the sub.
kokolo
------------------------------
Date: 4 Aug 2006 08:44:58 -0700
From: "Wesley Bresson" <wbresson@gmail.com>
Subject: Re: Regex...HTML::Parser...Getting webpage data?
Message-Id: <1154706298.589975.262540@s13g2000cwa.googlegroups.com>
xhoster@gmail.com wrote:
> "Wesley Bresson" <wbresson@gmail.com> wrote:
> > I tried your code for this page and it errored out but
> > I'm assuming its either my windows perl that is messing it up or extra
> > spaces in the copy/paste, I saved the page to the same dir that I was
> > running from but no go. I'll look at it more later, thanks for your
> > help
> >
> > C:\Users\Me\Desktop>perl -0777 -lne 's/\s+/ /g;/2006 1oz Silver
> > American Eagles
> > .+?20 - 99.*?\$(\d{1,5}\.\d\d)/and print "$1\n";'
> > Silver_American_Eagles.html
> > Can't find string terminator "'" anywhere before EOF at -e line 1.
>
> On Windows, you need to wrap your -e program in double quotes rather
> than single quotes which means you need to change any double quotes
> occuring inside the script to something else, like qq'$1\n'
>
> Or just put the program into a file.
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> $/=undef; # same as the -0777 command line
> $_=<>; # slurp
> s/\s+/ /g;
> /2006 1oz Silver American Eagles.+?20 - 99.*?\$(\d{1,5}\.\d\d)/
> and print "$1\n";
> __END__
>
> --
> -------------------- http://NewsReader.Com/ --------------------
> Usenet Newsgroup Service $9.95/Month 30GB
Thanks, I can see that works now. Now, hang in with a newbie, but I'm
trying to understand why exactly your code works.
$/=undef --- inputs the whole file instead of one line by one line
correct ? Why is it needed ?
$_=<> --not sure....is this what inputs the file off of the command
line ?
s/\s+/ /g; --not sure...is this taking out the white spaces ? If so why
is it needed ?
------------------------------
Date: 4 Aug 2006 08:51:11 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Regex...HTML::Parser...Getting webpage data?
Message-Id: <1154706671.838650.113290@b28g2000cwb.googlegroups.com>
Wesley Bresson wrote:
> xhoster@gmail.com wrote:
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > $/=undef; # same as the -0777 command line
> > $_=<>; # slurp
> > s/\s+/ /g;
> > /2006 1oz Silver American Eagles.+?20 - 99.*?\$(\d{1,5}\.\d\d)/
> > and print "$1\n";
> Thanks, I can see that works now. Now, hang in with a newbie, but I'm
> trying to understand why exactly your code works.
>
>
> $/=undef --- inputs the whole file instead of one line by one line
> correct ?
Not quite. $/ is the "input record separator" variable. It determines
what makes a "line" when the readline operator is used. As you
surmised, setting it to undef makes the entire file the "line". But
this doesn't do any reading of the file by itself.
> Why is it needed ?
Because HTML can have linebreaks in it. You need to match the entire
string - you can't process the file "line by line" because the pattern
your searching for could span more than one line.
> $_=<> --not sure....is this what inputs the file off of the command
> line ?
Yes. This is what does the reading. Ordinarily, this would read one
line from either the first file given on the command line, or if there
are none, from STDIN. However, because the $/ variable was set to
undef earlier, this reads the entire file, and puts it into $_.
> s/\s+/ /g; --not sure...is this taking out the white spaces ?
Again, not quite. It's replacing all sequences of one or more
whitespace characters (which includes space characters, newlines, and
tabs) with one single whitespace character. It does so for every
sequence of one or more whitespace characters it finds (that's the /g
part)
> If so why is it needed ?
Because your pattern is specifically looking for one space to separate
each "word". Just because that's how it appears in the browser does
not mean that's how it's formatted in the HTML. HTML doesn't care
about whitespace. So it's possible there's actually 5 newlines and 3
tabs between the two words that your pattern is expecting a single
whitespace character between.
Paul Lalli
------------------------------
Date: Fri, 04 Aug 2006 10:05:11 -0700
From: Jim Gibson <jgibson@mail.arc.nasa.gov>
Subject: Re: Regex...HTML::Parser...Getting webpage data?
Message-Id: <040820061005115322%jgibson@mail.arc.nasa.gov>
In article <1154706298.589975.262540@s13g2000cwa.googlegroups.com>,
Wesley Bresson <wbresson@gmail.com> wrote:
> xhoster@gmail.com wrote:
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > $/=undef; # same as the -0777 command line
> > $_=<>; # slurp
> > s/\s+/ /g;
> > /2006 1oz Silver American Eagles.+?20 - 99.*?\$(\d{1,5}\.\d\d)/
> > and print "$1\n";
> > __END__
> >
> > --
> > -------------------- http://NewsReader.Com/ --------------------
> > Usenet Newsgroup Service $9.95/Month 30GB
>
> Thanks, I can see that works now. Now, hang in with a newbie, but I'm
> trying to understand why exactly your code works.
>
>
> $/=undef --- inputs the whole file instead of one line by one line
> correct ? Why is it needed ?
To read the entire file at once and process it, after getting rid of
newlines, since the text you are looking for may be on more than one
line.
>
> $_=<> --not sure....is this what inputs the file off of the command
> line ?
The <> is the input operator and returns the result of a read-line
operation. Since $/ is undef, your input file is treated as one big
line, and the whole file ends up as a string in the $_ variable.
>
> s/\s+/ /g; --not sure...is this taking out the white spaces ? If so why
> is it needed ?
It is changing all occurences of whitespace (\s) to a single space,
concatenating any successive whitespace characters (\s+) into one space
character. Since newlines (\n) are whitespace, this also removes all
newlines from your string and you can use space characters in your
regular expression.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 9562
***************************************