[16628] in Perl-Users-Digest
Perl-Users Digest, Issue: 4040 Volume: 9
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 17 03:10:26 2000
Date: Thu, 17 Aug 2000 00:10:17 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <966496217-v9-i4040@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Thu, 17 Aug 2000 Volume: 9 Number: 4040
Today's topics:
Sorting is very slow <mecks@ust.hk>
Re: Sorting is very slow <uri@sysarch.com>
Re: Sorting is very slow <srmalloy@home.com>
Re: Sorting is very slow <uri@sysarch.com>
Re: Sorting is very slow (Gwyn Judd)
Re: Sorting is very slow (Logan Shaw)
Re: windows headache <lr@hpl.hp.com>
Re: windows headache (Keith Calvert Ivey)
Re: windows headache <yhvhboy1@home.com>
Re: windows headache <yhvhboy1@home.com>
Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 17 Aug 2000 11:44:02 +0800
From: Jim Chim <mecks@ust.hk>
Subject: Sorting is very slow
Message-Id: <399B5F82.42E16A26@ust.hk>
This is a multi-part message in MIME format.
--------------24306586744174C0DA66310A
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Hi,
I have write a code to sort some meteorological data which contain
281560 data point.
And write a sortin scheme to sort it. I know bubble sort only. I found
that it take more
than 10 hours to sort the data in PIII 800MHz machine. Can anyone give
me some idea
of how to improve the scheme so that it work faster. Thanks.
My code is attached in this message.
Jim
--------------24306586744174C0DA66310A
Content-Type: application/x-perl;
name="tmp.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="tmp.pl"
#/usr/sbin/perl
.
.
.
o
#Sort by the gust, from small value into large value
for ($i=1; $i<=$c; $i++) {
for ($j=$i; $j<=$c; $j++) {
if ($gust[$i] > $gust[$j]) {
($wdir[$i],$wdir[$j]) = swapnum($wdir[$i],$wdir[$j]);
($gust[$i],$gust[$j]) = swapnum($gust[$i],$gust[$j]);
} # end if
} # end for $j
} # end for $i
# print "gust=@gust \n";
#----------------------------------------------------------------------------------------
#Subroutine to swap two number
sub swapnum {
my ($a, $b)=@_;
my $tmp=0;
$tmp = $a;
$a = $b;
$b = $tmp;
($a,$b);
} # end sub
--------------24306586744174C0DA66310A--
------------------------------
Date: Thu, 17 Aug 2000 04:36:14 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Sorting is very slow
Message-Id: <x7vgx0wlkw.fsf@home.sysarch.com>
>>>>> "JC" == Jim Chim <mecks@ust.hk> writes:
JC> I have write a code to sort some meteorological data which
JC> contain 281560 data point. And write a sortin scheme to sort
JC> it. I know bubble sort only. I found that it take more than 10
JC> hours to sort the data in PIII 800MHz machine. Can anyone give me
JC> some idea of how to improve the scheme so that it work
JC> faster. Thanks.
did you realize perl has a builtin sort function?
(i hear the loud SMACK to his forehead across the net).
and for more on sorting read perlfaq4 'how do isort by anything'
and
http://www.sysarch.com/perl/sort_paper.html
uri
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: Thu, 17 Aug 2000 05:33:17 GMT
From: Sean Malloy <srmalloy@home.com>
Subject: Re: Sorting is very slow
Message-Id: <CHWbOXOG13w0IBoz0L56RiGf1fUN@4ax.com>
Jim Chim <mecks@ust.hk> wrote:
> I have write a code to sort some meteorological data which contain
>281560 data point.
>And write a sortin scheme to sort it. I know bubble sort only. I found
>that it take more
>than 10 hours to sort the data in PIII 800MHz machine. Can anyone give
>me some idea
>of how to improve the scheme so that it work faster. Thanks.
> My code is attached in this message.
The bubble sort is an O(n^2) algorithm; that is, for any data volume
n, the amount of time the sort takes is proportional to the square of
n. It's a very bad sort algorithm, compared to others. Quicksort, for
example is O(n ln(n)) -- for a data volume n, the amount of time it
takes will be proportional to n times the natural log of n. So, for
any volume of data, quicksort will be (n/ln(n)) times faster. For
281560 data points, that comes out to 22438 times faster. For a
processing time of ~10 hours with bubblesort, you would need less than
1.75 hours with quicksort.
Go to http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html and
look at the demos of the various sort algorithms; each one has a link
to Java code for the sort algorithm.
Also, if you put 'sorting algorithm' into a search engine like
Dogpile, you'll get a _ton_ of references that you can use to get sort
algorithms and code samples. ePaperPress.com has a page for sort
algorithms at http://epaperpress.com/s_man.html -- and there are many
more.
Pseudocode for a quicksort routine is listed below; this was found at
http://www.dcc.uchile.cl/~rbaeza/handbook/algs/4/413.sort.c.html
sort( r, lo, up )
ArrayToSort r;
int lo, up;
{
int i, j;
ArrayEntry tempr;
while ( up>lo ) {
i = lo;
j = up;
tempr = r[lo];
/*** Split file in two ***/
while ( i<j ) {
for ( ; r[j].k > tempr.k; j-- );
for ( r[i]=r[j]; i<j && r[i].k<=tempr.k; i++ );
r[j] = r[i];
}
r[i] = tempr;
/*** Sort recursively, the smallest first ***/
if ( i-lo < up-i ) { sort(r,lo,i-1); lo = i+1; }
else { sort(r,i+1,up); up = i-1; }
}
}
--
Sean R. Malloy | American Non Sequitur
Naval Medical Center | Society
San Diego, CA 92134-5000 |
srmalloy@home.net | "We may not make sense,
srmalloy@nmcsd.med.navy.mil | but we do like pizza"
FORMAL NOTICE: unsolicited commercial email will be read
at a charge of $500 per item. Receipt of such email shall
be considered to constitute acceptance of contract, and
will be billed immediately.
------------------------------
Date: Thu, 17 Aug 2000 05:41:08 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Sorting is very slow
Message-Id: <x7r97owikq.fsf@home.sysarch.com>
>>>>> "SM" == Sean Malloy <srmalloy@home.com> writes:
SM> Pseudocode for a quicksort routine is listed below; this was found at
SM> http://www.dcc.uchile.cl/~rbaeza/handbook/algs/4/413.sort.c.html
why did you post a quicksort algorithm instead of telling him that perl
has a sort function?
uri
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: Thu, 17 Aug 2000 06:13:03 GMT
From: tjla@guvfybir.qlaqaf.bet (Gwyn Judd)
Subject: Re: Sorting is very slow
Message-Id: <slrn8pn0ja.s14.tjla@thislove.dyndns.org>
I was shocked! How could Jim Chim <mecks@ust.hk>
say such a terrible thing:
>This is a multi-part message in MIME format.
Please don't do that.
>Hi,
> I have write a code to sort some meteorological data which contain
>281560 data point.
>And write a sortin scheme to sort it. I know bubble sort only. I found
What happened when you typed "perldoc -f sort" or "perldoc -q sort" or
looked up sort in <Insert your perl booK>?
>that it take more
>than 10 hours to sort the data in PIII 800MHz machine. Can anyone give
>me some idea
>of how to improve the scheme so that it work faster. Thanks.
> My code is attached in this message.
>#/usr/sbin/perl
>#Sort by the gust, from small value into large value
> for ($i=1; $i<=$c; $i++) {
> for ($j=$i; $j<=$c; $j++) {
> if ($gust[$i] > $gust[$j]) {
> ($wdir[$i],$wdir[$j]) = swapnum($wdir[$i],$wdir[$j]);
> ($gust[$i],$gust[$j]) = swapnum($gust[$i],$gust[$j]);
> } # end if
> } # end for $j
> } # end for $i
># print "gust=@gust \n";
I believe this will be faster:
(push @sorted_gust, $_->[1]),
(push @sorted_wdir, $_->[0]) for
sort {$a->[1] <=> $b->[1]}
map { [shift @wdir => $_] } @gust;
I haven't tested it though as I don't have your data. You might be
better off storing your data in an array-of-arrays rather than in two
seperate arrays:
push @data, shift @gust, shift @wdir while @gust;
@data = sort {@a->[0] <=> @b->[0]} @data;
You can then access the gust data as $data[$index][0] and the wdir data
as $data[$index][1]. You can see how this has made the sort process a
lot simpler and easier to understand
>#Subroutine to swap two number
> sub swapnum {
> my ($a, $b)=@_;
> my $tmp=0;
> $tmp = $a;
> $a = $b;
> $b = $tmp;
> ($a,$b);
> } # end sub
I recommend that when you are writing in Perl, you write Perl rather
than C:
sub swapnum
{
return ($_[1], $_[0]);
}
Although with my version that is not needed.
As a sidenote, does anyone else feel Perl needs something like Haskell's
"zip", "unzip" and related functions? For the uneducated, zip takes two
lists and joins them into a two dimensional array (well list actually
but why split hairs) by forming pairs of items. unzip does the reverse.
I don't think I've explained this well so here is a poor example:
sub zip
{
my (@a, @b) = ($_[0], $_[1]);
return (map {shift @b => $_} @b);
}
sub unzip
{
my @a, @b;
(push @a, $_->[0]),
(push @b, $_->[1])
for @_;
(\@a, \@b);
}
Then I could have written the above as something like:
@result = unzip (sort {$a->[0] <=> $b->[0]} (zip @gust, @wdir));
@gust = @{$result[0]};
@wdir = @{$result[1]};
Which is maybe easier to understand. Or maybe not.
--
Gwyn Judd (print `echo 'tjla@guvfybir.qlaqaf.bet' | rot13`)
I think it's a new feature. Don't tell anyone it was an accident. :-)
-- Larry Wall on s/foo/bar/eieio in <10911@jpl-devvax.JPL.NASA.GOV>
------------------------------
Date: 17 Aug 2000 01:14:20 -0500
From: logan@cs.utexas.edu (Logan Shaw)
Subject: Re: Sorting is very slow
Message-Id: <8nfvrs$e7h$1@provolone.cs.utexas.edu>
In article <CHWbOXOG13w0IBoz0L56RiGf1fUN@4ax.com>,
Sean Malloy <srmalloy@home.com> wrote:
> Quicksort, for
>example is O(n ln(n)) -- for a data volume n, the amount of time it
>takes will be proportional to n times the natural log of n. So, for
>any volume of data, quicksort will be (n/ln(n)) times faster. For
>281560 data points, that comes out to 22438 times faster. For a
>processing time of ~10 hours with bubblesort, you would need less than
>1.75 hours with quicksort.
O.K., you're saying that 10 divided by 1.75 equals 22438?
Do you mean 1.75 *seconds*? :-)
Also, I think someone should point out that these 281560 data points
might be taking up a lot of memory, depending on how much data there is
per data point. If the working set size is too large, the system is
going to have to page heavily in order to perform this sort. If that's
the case, even quick sort won't totally fix it. Instead, something
like merge sort might be a better idea, because it can do the sorting
without having the entire data set in memory.
In truth, though, if the data set is large, I'd probably just read in
every element and add it one of the tied DB hash formats that is
capable of traversing the stored hash in order. I think DB_File does
this. Anyway, presumably this writes to disk in a binary search tree
format or something similar; if so, creating it should be O (n log n)
and accessing it in order should be O (n). And the great thing is that
it can all take place on disk and not in memory.
- Logan
------------------------------
Date: Wed, 16 Aug 2000 21:51:49 -0700
From: Larry Rosler <lr@hpl.hp.com>
Subject: Re: windows headache
Message-Id: <MPG.14050f3ba47630dc98ac87@nntp.hpl.hp.com>
In article <399B43C2.9284E30B@home.com>, yhvhboy1@home.com says...
...
> # but to do this:
> while (<IN>) {
> print OUT $_;
> }
> # has the same effect. the outfile is still 85 bytes.
> # so what do i do?
> # perhaps my computer has an anti-perl virus?
Which parts of the two responses pointing you toward binmode() didn't
you understand?
--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com
------------------------------
Date: Thu, 17 Aug 2000 03:16:46 GMT
From: kcivey@cpcug.org (Keith Calvert Ivey)
Subject: Re: windows headache
Message-Id: <399f4b0c.10818465@news.newsguy.com>
yhvhboy1 <yhvhboy1@home.com> wrote:
># but to do this:
>while (<IN>) {
> print OUT $_;
>}
># has the same effect. the outfile is still 85 bytes.
># so what do i do?
># perhaps my computer has an anti-perl virus?
Did you use binmode(IN) after opening the file, as Larry Rosler
suggested in his other message?
Please stop writing your messages as series of Perl comments.
The # at the beginning of each line makes it look like you're
quoting something.
--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC
------------------------------
Date: Thu, 17 Aug 2000 06:31:53 GMT
From: yhvhboy1 <yhvhboy1@home.com>
Subject: Re: windows headache
Message-Id: <399B87EE.1308FD8E@home.com>
Larry Rosler wrote:
> In article <399B43C2.9284E30B@home.com>, yhvhboy1@home.com says...
>
> ...
>
> > # but to do this:
> > while (<IN>) {
> > print OUT $_;
> > }
> > # has the same effect. the outfile is still 85 bytes.
> > # so what do i do?
> > # perhaps my computer has an anti-perl virus?
>
> Which parts of the two responses pointing you toward binmode() didn't
> you understand?
apparently all of them. sorry
# also sorry about the perl comments.
i thought they were appropriate-
i have problems. ;)
------------------------------
Date: Thu, 17 Aug 2000 06:45:49 GMT
From: yhvhboy1 <yhvhboy1@home.com>
Subject: Re: windows headache
Message-Id: <399B8B33.5189320D@home.com>
Larry Rosler wrote:
> In article <399B43C2.9284E30B@home.com>, yhvhboy1@home.com says...
>
> ...
>
> > # but to do this:
> > while (<IN>) {
> > print OUT $_;
> > }
> > # has the same effect. the outfile is still 85 bytes.
> > # so what do i do?
> > # perhaps my computer has an anti-perl virus?
>
> Which parts of the two responses pointing you toward binmode() didn't
> you understand?
and although i do appreciate your help
i still have problems with this file reading...
it almost worked-- but it still is not perfect-
it didn't copy the file quite right
but now instead of ending the copy prematurely
it is doing the opposite- and adding bytes to it
so my pictures show up distorted, and my command.com
isn't working right.
anything else i can do to avoid this?
thanks you guys.
------------------------------
Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V9 Issue 4040
**************************************