[12621] in Perl-Users-Digest
Perl-Users Digest, Issue: 30 Volume: 9
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Jul 6 23:17:11 1999
Date: Tue, 6 Jul 1999 20:06:52 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 6 Jul 1999 Volume: 9 Number: 30
Today's topics:
Re: hiring perl programmers <ptimmins@itd.sterling.com>
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Andrew Allen)
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Mark W. Schumann)
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Mark W. Schumann)
Re: hiring perl programmers <revjack@radix.net>
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Benjamin Franz)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers <uri@sysarch.com>
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Bart Lateur)
Re: hiring perl programmers <uri@sysarch.com>
Re: hiring perl programmers (Abigail)
Re: hiring perl programmers (Greg Bacon)
Re: hiring perl programmers (Bart Lateur)
Digest Administrivia (Last modified: 1 Jul 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 03 Jul 1999 07:02:04 GMT
From: Patrick Timmins <ptimmins@itd.sterling.com>
Subject: Re: hiring perl programmers
Message-Id: <7lkcha$l8$1@nnrp1.deja.com>
In article <377a5462@cs.colorado.edu>,
tchrist@mox.perl.com (Tom Christiansen) wrote:
> 1) You have an array of non-negative integers. Move all the
> zero-valued elements to the front of that array.
foreach (@orig) {
    push(@zero,  $_) if $_ == 0;
    push(@other, $_) if $_ != 0;
}
push(@zero, @other);
@orig = @zero;
> 2) You have the head node of a doubly linked list and a target
> value. Delete the first node in the list whose value matches
> your target.
Given @list, you can run this list_link() sub whenever @list changes.
sub list_link {
    $dl_list{0}{back}     = "";
    $dl_list{$#list}{fwd} = "";
    for ($i = 0; $i <= $#list; $i++) {
        $dl_list{$i}{value} = $list[$i];
        ( $dl_list{$i+1}{back} = $list[$i] ) if ($i <= $#list - 1);
        ( $dl_list{$i-1}{fwd}  = $list[$i] ) if ($i >= 1);
    }
}
e.g., to delete the first occurrence of target:
sub delete_target {
    undef $found_i;
    for ($i = 0; $i <= $#list; $i++) {
        if ($list[$i] eq $target) {
            $found_i = $i;
            last;
        }
    }
    return unless defined $found_i;   # target not in list; nothing to delete
    for ($i = 0; $i <= $#list; $i++) {
        push(@front, $list[$i]) if $i < $found_i;
        push(@back,  $list[$i]) if $i > $found_i;
    }
    push(@front, @back);
    @list = @front;
    list_link();
}
So do I get to work for 'Tom Christiansen Perl Consultancy'? :)
$monger{Omaha}[0]
Patrick Timmins
ptimmins@itd.sterling.com
Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
------------------------------
Date: 3 Jul 1999 03:33:11 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7nrilj.31h.abigail@alexandra.delanet.com>
Tom Christiansen (tchrist@mox.perl.com) wrote on MMCXXXII September
MCMXCIII in <URL:news:377d55d8@cs.colorado.edu>:
..
.. In comp.lang.perl.misc, abigail@delanet.com writes:
.. : for (my $i = my $j = 0; $i < @array; $i ++) {
.. : next if $array [$i];
.. : if ($i != $j) {@array [$i, $j] = @array [$j, $i];}
.. : $j ++;
.. : }
..
.. We considered:
..
.. t = a[i];
.. a[i] = a[j];
.. a[j] = t;
..
.. To be suboptimal. Think about it. :-)
I did, and I'm not sure if I understand. The code Tom writes looks very
C-ish to me, and AFAIK, C doesn't have a Perlish slice, so the fact a
slice isn't being used can't be it.
But what is? There are tricks using ^=, even in one expression, to
swap 2 integers, but that has mainly hack value. I wouldn't put that in
production code, and I wouldn't produce that during an interview either,
unless the subject of 'cute little tricks' is on the table.
Could it be that Tom and the other interviewers had preferred seeing
a[i] = a[j];
a[j] = 0;
Perhaps. But is it worth it? The t = a[i]; a[i] = a[j]; a[j] = t; is
very common code, and so is the Perlish @array [$i, $j] = @array [$j, $i];
Kernighan and Pike [1] spend seven and a half pages in the first chapter
discussing the merits of consistency and the use of common idiom. Usage of
t = a[i];
a[i] = a[j];
a[j] = t;
certainly has its advantages. I would only consider using
a[i] = a[j];
a[j] = 0;
if it was part of a tight, time critical loop.
[1] Kernighan and Pike: "The Practice of Programming". Addison-Wesley, 1999.
Abigail
--
sub J::FETCH{Just }$_.='print+"@{[map';sub J::TIESCALAR{bless\my$J,J}
sub A::FETCH{Another}$_.='{tie my($x),$';sub A::TIESCALAR{bless\my$A,A}
sub P::FETCH{Perl }$_.='_;$x}qw/J A P';sub P::TIESCALAR{bless\my$P,P}
sub H::FETCH{Hacker }$_.=' H/]}\n"';eval;sub H::TIESCALAR{bless\my$H,H}
-----------== Posted via Newsfeeds.Com, Uncensored Usenet News ==----------
http://www.newsfeeds.com The Largest Usenet Servers in the World!
------== Over 73,000 Newsgroups - Including Dedicated Binaries Servers ==-----
------------------------------
Date: Sat, 03 Jul 1999 12:52:42 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <377f076f.2903500@news.skynet.be>
Abigail wrote:
>~~ I think this still won't work OK for the first node.
>
>Indeed.
Well, it depends on the exact structure of your linked lists. If you
provide an extra header cell, it WILL work. Referencing this list is done
by referencing this header cell, which will keep working even if the
first item in the list changes. So it has other benefits as well.
>~~ You mean the O(fn(N)) stuff? Blerk. I always hated that.
>
>But it is important! How could you ever decide which algorithm to use
>if you don't know this stuff?
It's not the "ultimate answer". It only talks about scaling. It says
NOTHING about absolute speed. And that is what matters in the real
world. For example: according to this theory, there's nothing to be
gained from the Schwartzian Transform for sorting. It's still
O(n*log(n)).
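For lurkers: the transform Bart mentions precomputes each sort key once (decorate, sort, undecorate) instead of recomputing it inside every comparison. The comparison count is still O(n*log(n)), which is exactly his point; only the constant shrinks. A minimal sketch, in Python since the idea is language-independent (the word list is made up):

```python
words = ["pearl", "perl", "camel", "tk"]  # hypothetical sample data

# Decorate-sort-undecorate (the Schwartzian Transform):
# compute each sort key (here, string length) exactly once,
# sort the (key, value) pairs, then strip the keys back off.
decorated = [(len(w), w) for w in words]      # decorate: n key computations
decorated.sort()                              # sort on the precomputed keys
sorted_words = [w for (_, w) in decorated]    # undecorate

print(sorted_words)  # -> ['tk', 'perl', 'camel', 'pearl']
```

The saving is real when the key function is expensive (a stat call, a regex capture), but the asymptotic class never changes.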
>It's Omega (N * M) worst case indeed. push() isn't O(1), it can be linear.
>However, starting from an array of size N, M pushes will have a total
>running time of O (N + M), making it O(1) *amortized* time.
"amortized"? What is that? I can't see how anything O(N+M) can turn into
O(1).
>Yes, but incorrect algorithms don't score at all.
Incorrect? Buggy, maybe but not incorrect. I've tested it. It seems to
be doing the right thing, even for the first item in the list. And the
last. And the middle one.
#! perl -w
# Warning! DO NOT USE in production code!
# This code results in circular references, which will only be
# disposed of when the script terminates!
# compose a 3-item list (plus a header cell)
my $listhead;
{
    my @list = ( {}, { VALUE => 'one' }, { VALUE => 'two' },
                 { VALUE => 'three' } );
    $listhead = $list[0];
    foreach my $i (1 .. $#list) {
        $list[$i]->{PREVIOUS} = $list[$i-1];
        $list[$i-1]->{NEXT}   = $list[$i];
    }
}
printitems($listhead);
if (find_and_unlink($listhead, 'one')) {
    print "Found it!\n\n";
    printitems($listhead);
}
sub printitems {
    my ($listhead) = @_;
    my $previous = $listhead;
    print "List items:\n";
    while (my $this = $previous->{NEXT}) {
        print "This: $this->{VALUE}\n";
        print "  Previous: @{[
            exists $previous->{VALUE} ? $previous->{VALUE} : '[LISTHEAD]' ]}\n";
        print "  Next: @{[
            $this->{NEXT} ? $this->{NEXT}->{VALUE} : '[NIL]' ]}\n";
        $previous = $this;
    }
    print "\n";
}
sub find_and_unlink {
    my ($listhead, $find) = @_;
    my $previous = $listhead;
    while (my $this = $previous->{NEXT}) {
        if ($this->{VALUE} eq $find) {
            my $next = delete $this->{NEXT};
            $previous->{NEXT} = $next;
            $next->{PREVIOUS} = $previous if $next;
            delete $this->{PREVIOUS};
            return $this;
        }
        $previous = $this;
    }
}
__END__
Bart.
------------------------------
Date: Sat, 03 Jul 1999 13:05:13 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <37830a7d.3685357@news.skynet.be>
Abigail wrote:
>.. We considered:
>..
>.. t = a[i];
>.. a[i] = a[j];
>.. a[j] = t;
>..
>.. To be suboptimal. Think about it. :-)
>
>I did, and I'm not sure if I understand.
Me neither. At best, in assembler, this shouldn't take more than 3 opcodes
(t is typically a register). Suboptimal? Huh?
>Could it be that Tom and the other interviewers had preferred seeing
> a[i] = a[j];
> a[j] = 0;
>Perhaps. But is it worth it?
Definitely not. Say the requirements changed to "the items in the
array are integers; now put the negative integers in front". Then this
trick surely would not work. The simple exchange would.
Bart.
------------------------------
Date: Sat, 03 Jul 1999 16:52:41 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <377e3768.362559@news.skynet.be>
Tom Christiansen wrote:
>We considered:
>
> t = a[i];
> a[i] = a[j];
> a[j] = t;
>
>To be suboptimal. Think about it. :-)
Wait a minute. Maybe I'm getting your point. (Maybe not :-)
Is it the array subscripting, so that the addresses of a[i] and a[j] are
calculated twice? You could first get the addresses, and then use them.
ai = &a[i]; aj = &a[j];
t = *ai; *ai = *aj; *aj = t;
(or whatever the correct syntax for those wretched pointers may be...)
Then again, many modern processors have array indexing built-in. It may
even be a compiler optimization.
If you (and your colleagues) think that is sub-optimal, you may be
stretching it a bit too far for my taste. If such people ruled
the computing world, perl would not even exist, because perl, as a
whole, is far more "sub-optimal" than this one code snippet.
Bart.
------------------------------
Date: 3 Jul 1999 18:01:35 GMT
From: ada@fc.hp.com (Andrew Allen)
Subject: Re: hiring perl programmers
Message-Id: <7llj5v$a41$1@fcnews.fc.hp.com>
Tom Christiansen (tchrist@mox.perl.com) wrote:
: [courtesy cc of this posting mailed to cited author]
: In comp.lang.perl.misc, abigail@delanet.com writes:
: : for (my $i = my $j = 0; $i < @array; $i ++) {
: : next if $array [$i];
: : if ($i != $j) {@array [$i, $j] = @array [$j, $i];}
: : $j ++;
: : }
: We considered:
: t = a[i];
: a[i] = a[j];
: a[j] = t;
: To be suboptimal. Think about it. :-)
I can't believe no one has yet mentioned
a[i] = a[i] ^ a[j];
a[j] = a[i] ^ a[j];
a[i] = a[i] ^ a[j];
just to show off. Of course, no one would ever show off in _this_ group :)
Andrew
------------------------------
Date: 3 Jul 1999 16:14:34 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7nsv97.31h.abigail@alexandra.delanet.com>
Bart Lateur (bart.lateur@skynet.be) wrote on MMCXXXII September MCMXCIII
in <URL:news:377f076f.2903500@news.skynet.be>:
!! Abigail wrote:
!!
!! >~~ I think this still won't work OK for the first node.
!! >
!! >Indeed.
!!
!! Well, it depends on the exact structure of your linked lists. If you
!! provide an extra header cell, it WILL work. Referencing this list is done
!! by referencing this header cell, which will keep working even if the
!! first item in the list changes. So it has other benefits as well.
But you never stated this extra requirement. Which makes your algorithm
incomplete, and hence wrong.
!! >~~ You mean the O(fn(N)) stuff? Blerk. I always hated that.
!! >
!! >But it is important! How could you ever decide which algorithm to use
!! >if you don't know this stuff?
!!
!! It's not the "ultimate answer". It only talks about scaling. It says
!! NOTHING about absolute speed. And that is what matters in the real
!! world. For example: according to this theory, there's nothing to be
!! gained from the Schwartzian Transform for sorting. It's still
!! O(n*log(n)).
Irrelevant. I never said that it was the ultimate answer. It's necessary,
but not sufficient.
!! >It's Omega (N * M) worst case indeed. push() isn't O(1), it can be linear.
!! >However, starting from an array of size N, M pushes will have a total
!! >running time of O (N + M), making it O(1) *amortized* time.
!!
!! "amortized"? What is that? I can't see how anything O(N+M) can turn into
!! O(1).
O (N + M) total work divided over N + M operations is O (1) per operation.
You are doing M pushes, and have an array of length N to start with. So,
you have N + M elements to charge the copying over. Hence, O (1) amortized
time.
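The accounting can be checked mechanically. A minimal sketch (Python for brevity; the capacity-doubling growth policy, and the array starting exactly full, are assumptions of the model, not a description of perl's internals) counts the element copies done across M pushes:

```python
def count_copies(n_initial, m_pushes):
    """Count element copies done by a doubling-growth array during m pushes.

    Assumes the array starts full at size n_initial; whenever it is full,
    it reallocates at double the capacity, copying every current element.
    """
    size = capacity = max(n_initial, 1)
    copies = 0
    for _ in range(m_pushes):
        if size == capacity:
            copies += size      # reallocate: copy all current elements over
            capacity *= 2
        size += 1
    return copies

# Total copies stay proportional to N + M, i.e. O(1) amortized per push:
n, m = 1000, 1_000_000
assert count_copies(n, m) < 2 * (n + m)
```

Each individual push is usually free but occasionally linear; spread over all N + M elements, the per-push cost is bounded by a constant.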
!! >Yes, but incorrect algorithms don't score at all.
!!
!! Incorrect? Buggy, maybe but not incorrect.
Eh? What's the difference?
!! Incorrect? Buggy, maybe but not incorrect. I've tested it. It seems to
!! be doing the right thing, even for the first item in the list. And the
!! last. And the middle one.
Only after adding the extra requirements you initially didn't mention.
Abigail
--
perl5.004 -wMMath::BigInt -e'$^V=Math::BigInt->new(qq]$^F$^W783$[$%9889$^F47]
.qq]$|88768$^W596577669$%$^W5$^F3364$[$^W$^F$|838747$[8889739$%$|$^F673$%$^W]
.qq]98$^F76777$=56]);$^U=substr($]=>$|=>5)*(q.25..($^W=@^V))=>do{print+chr$^V
%$^U;$^V/=$^U}while$^V!=$^W'
------------------------------
Date: Sat, 03 Jul 1999 21:44:47 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <377e8351.499707@news.skynet.be>
Abigail wrote:
>!! Incorrect? Buggy, maybe but not incorrect.
>
>Eh? What's the difference?
That's why I'm a programmer and not a mathematician. I use *some* theory
to get the basics right. Enough to get a feel for the general case, and
where I may expect problems. Then I do practical tests on a computer to
iron out the bugs. Including the border cases. I don't care for doing
*everything* on paper, like a real mathematician.
And BTW, I would be crap at determining what may have happened 1 second
after the Big Bang. I don't care, anyway.
Bart.
------------------------------
Date: 3 Jul 1999 18:20:15 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7nt6ks.31h.abigail@alexandra.delanet.com>
Bart Lateur (bart.lateur@skynet.be) wrote on MMCXXXII September MCMXCIII
in <URL:news:377e8351.499707@news.skynet.be>:
"" Abigail wrote:
""
"" >!! Incorrect? Buggy, maybe but not incorrect.
"" >
"" >Eh? What's the difference?
""
"" That's why I'm a programmer and not a mathematician. I use *some* theory
"" to get the basics right. Enough to get a feel for the general case, and
"" where I may expect problems. Then I do practical tests on a computer to
"" iron out the bugs. Including the bordecases. I don't care for doing
"" *everything* on paper, like a real mathematician.
Let me remind myself not to hire you.
Abigail
--
srand 123456;$-=rand$_--=>@[[$-,$_]=@[[$_,$-]for(reverse+1..(@[=split
//=>"IGrACVGQ\x02GJCWVhP\x02PL\x02jNMP"));print+(map{$_^q^"^}@[),"\n"
------------------------------
Date: Sun, 04 Jul 1999 08:47:02 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <377f1e87.1375376@news.skynet.be>
Abigail wrote:
>Let me remind myself not to hire you.
Fat chance. I flunked your test, anyway.
Bart.
------------------------------
Date: 4 Jul 1999 14:42:33 -0400
From: catfood@apk.net (Mark W. Schumann)
Subject: Re: hiring perl programmers
Message-Id: <7lo9up$oar@junior.apk.net>
In article <377d1d7b@cs.colorado.edu>,
Tom Christiansen <tchrist@mox.perl.com> wrote:
>You're all missing the point. The point wasn't to see how well you
>know the standard library. The goal was first of all, to see whether
>you can understand basic algorithms, and second of all, whether you've
>a clue what it means to be parsimonious with respect to time and space
>in designing those algorithms. I'm afraid that anyone who called a
>built-in sort function *flunked* miserably.
I would have used the built-in sort to solve that problem, probably
in a production system.
My projects rarely call for the fastest or tightest code possible.
Usually they have to be fast _enough_. If the code is fast _enough_
(translation: it doesn't make the user wait or hold up production
processes) then usually the next priorities are readability and quick
comprehension by moderately experienced Perl programmers.
Unless the array in question is particularly large, or unless speed
is especially critical, the built-in sort wins IME.
------------------------------
Date: 4 Jul 1999 13:50:54 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7nvb7q.h6v.abigail@alexandra.delanet.com>
Mark W. Schumann (catfood@apk.net) wrote on MMCXXXIII September MCMXCIII
in <URL:news:7lo9up$oar@junior.apk.net>:
!!
!! I would have used the built-in sort to solve that problem, probably
!! in a production system.
!!
!! My projects rarely call for the fastest or tightest code possible.
!! Usually they have to be fast _enough_. If the code is fast _enough_
!! (translation: it doesn't make the user wait or hold up production
!! processes) then usually the next priorities are readability and quick
!! comprehension by moderately experienced Perl programmers.
!!
!! Unless the array in question is particularly large, or unless speed
!! is especially critical, the built-in sort wins IME.
While this is true, I would not use the built-in sort. If I did, then
each time I saw that code, I would have to suppress the urge to fix it.
Omega (N log N) solutions for O (N) problems look *icky*.
I remember once recoding a piece of another programmer's code, just
because his code was quadratic and I could do it in linear time. Never
mind that the set of elements had size 1, and that it was very unlikely
it would ever be more than 1. It just looked wrong.
Abigail
--
perl -we 'print split /(?=(.*))/s => "Just another Perl Hacker\n";'
------------------------------
Date: 4 Jul 1999 14:58:32 -0400
From: catfood@apk.net (Mark W. Schumann)
Subject: Re: hiring perl programmers
Message-Id: <7loaso$qah@junior.apk.net>
In article <slrn7nvb7q.h6v.abigail@alexandra.delanet.com>,
Abigail <abigail@delanet.com> wrote:
>Mark W. Schumann (catfood@apk.net) wrote on MMCXXXIII September MCMXCIII
>in <URL:news:7lo9up$oar@junior.apk.net>:
>!!
>!! I would have used the built-in sort to solve that problem, probably
>!! in a production system.
>!!
>!! My projects rarely call for the fastest or tightest code possible.
>!! Usually they have to be fast _enough_. If the code is fast _enough_
>!! (translation: it doesn't make the user wait or hold up production
>!! processes) then usually the next priorities are readability and quick
>!! comprehension by moderately experienced Perl programmers.
>!!
>!! Unless the array in question is particularly large, or unless speed
>!! is especially critical, the built-in sort wins IME.
>
>While this is true, I would not use the built-in sort. If I did, then
>each time I saw that code, I would have to suppress the urge to fix it.
>Omega (N log N) solutions for O (N) problems look *icky*.
But you would have to fix it with something that is at least a little
less concise than
@a = sort @a;
Where there is a concise solution that works, other solutions look
*icky* to me. Even if they're significantly faster. Unless the
concise solution is _not_fast_enough_.
>I remember once recoding a piece of code of some other programmer, just
>because his code was quadratic, and I could do it in linear time. Never
>mind that the set of elements had size 1, and it being very unlikely it
>would ever be more than 1. It just looked wrong.
Doesn't the "P" in Perl stand for "practical"?
Oops. It can also be for "pathological."
------------------------------
Date: 4 Jul 1999 22:50:54 GMT
From: revjack <revjack@radix.net>
Subject: Re: hiring perl programmers
Message-Id: <7looge$cas$2@news1.Radix.Net>
Keywords: Hexapodia as the key insight
Tom Christiansen explains it all:
:You're all missing the point. The point wasn't to see how well you
:know the standard library. The goal was first of all, to see whether
:you can understand basic algorithms, and second of all, whether you've
:a clue what it means to be parsimonious with respect to time and space
:in designing those algorithms. I'm afraid that anyone who called a
:built-in sort function *flunked* miserably.
What's the best answer, Tom? Let the lurkers learn.
------------------------------
Date: Mon, 05 Jul 1999 12:45:40 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <3781a4c1.504101@news.skynet.be>
Abigail wrote [about O(fn(n))]:
>Irrelevant. I never said that it was the ultimate answer. It's necessary,
>but not sufficient.
It's not very useful. It is an outdated simplification. It "estimates"
the behaviour for extremely large data sizes, but it DOES NOT take into
account what happens if the data set gets really large, larger than the
available memory: caching and thrashing.
I have had a case where a simple linear search, O(n), was SEVERAL TIMES
FASTER than a binary search, O(log(n)). No tricks. The same computer,
the same programming language (not Perl), the same comparison functions,
same data. It was an array of 1024 items: average number of comparisons
for linear search: 512; for binary search: 10. Yet the linear search was
faster, by several times.
Why? I have only one explanation: caching. What the binary search
presumably did, was for each comparison loading a data block into the
cache, do the comparison, jump to some other point in the data which
made the cache dump the data, and reload a partly different set, etc.
Ten times.
The linear search probably made the cache load only once, and did all
the comparisons in the cache.
Bart.
------------------------------
Date: Mon, 05 Jul 1999 15:53:57 GMT
From: snowhare@long-lake.nihongo.org (Benjamin Franz)
Subject: Re: hiring perl programmers
Message-Id: <py4g3.34$u75.3265@typhoon-sf.snfc21.pbi.net>
In article <3781a4c1.504101@news.skynet.be>,
Bart Lateur <bart.lateur@skynet.be> wrote:
>Abigail wrote [about O(fn(n))]:
>
>>Irrelevant. I never said that it was the ultimate answer. It's
>>necessary, but not sufficient.
>
>It's not very useful. It is an outdated simplification. It
>"estimates" the behaviour for extremely large data sizes, but
>it DOES NOT take into account what happens if the data set gets
>really large, larger than the available memory: caching and
>thrashing.
This is completely wrong. Think about it.
>I have had a case, where a simple linear search, O(n), was SEVERAL
>TIMES FASTER than a binary search, O(log(n)). No tricks. The same
>computer, the same programming language (not Perl), the same
>comparison functions, same data. It was an array of 1024 items:
>average number of comparisons for linear search: 512; for binary
>search: 10. Yet the linear search was faster, by several times.
[snip attribution of effect to caching]
And a bubblesort can be faster than a mergesort for N < 30. You
are making a mistake here. N = 1024 is a *small* dataset. Small
enough that the entire dataset could be cached in memory - making
the constant term of the linear search's O(N) cost *very* small.
The point about O complexity is that it states unequivocally that
for _large enough_ N, C*O(log N) will *ALWAYS* beat K*O(N),
whatever the constants C and K are.
Let's say that your linear O(N) search was 10x faster for 1024
elements than the binary O(log(N)) search. That calibrates the
relative size of their constant terms. Now let's grow the dataset
to N=102400. The O(N) routine is now 100x slower than it was,
while the O(log(N)) routine is only 1.66x slower than it was: The
binary search is now 6x faster than the linear search.
Grow the dataset by *another* factor of 1000x and the linear
search is running 100000x slower while the binary search is only
down to 2.66x slower. The trend is obvious. No matter _how_ small
that constant is - it is utterly overwhelmed by O complexity at
some point.
Additionally, caching is by definition finite. At some point your
dataset outgrows *any* cache and linear searching loses that 'small
constant' edge (it may even invert depending on your query patterns)
*because it can't all be cached*. This is where *space* O
complexity comes into play as well. Caching is O(N) *space*
complex if performed without limit. Binary searches are
O(1) *space* complexity.
Lastly, *assuming* that caching was in fact the source of the
performance difference, it could be neutralized by the search
program by *explicitly* scanning the entire dataset at the start
of the first run. This would place the two programs on equal footing
with regard to I/O cycles thereafter, at an O(N) cost on the very
first run (one O(N) scan plus m O(log n) searches, versus m O(n)
searches). This would of course only work for small datasets (in
general, the exact same datasets where the linear search
outperformed the binary search originally).
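Franz's arithmetic is easy to reproduce. A minimal sketch (Python; the constants are calibrated only to his hypothetical "10x faster at N = 1024" figure, not measured on any real machine):

```python
import math

# Cost models: at N = 1024 the cache-friendly linear search is 10x faster,
# which fixes the ratio of the two constant terms.
K = 1.0      # linear search: cost per element scanned
C = 512.0    # binary search: cost per probe (chosen so linear wins 10x at N=1024)

def linear_cost(n):
    return K * n / 2                 # average of n/2 comparisons

def binary_cost(n):
    return C * math.log2(n)          # about log2(n) probes

# Linear wins 10x at N = 1024, exactly as in Bart's anecdote...
assert binary_cost(1024) / linear_cost(1024) == 10.0
# ...but grow N by 100x and the binary search is already ~6x faster:
assert linear_cost(102_400) / binary_cost(102_400) > 5
```

No matter how large the constant C is made, there is always an N beyond which the log term wins; that is the whole content of the O() claim.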
--
Benjamin Franz
PGP: 1024/77579DB1 FP=67 B0 18 4A 28 70 BE 78 DE 5A 28 1E 10 A5 8E 31
------------------------------
Date: Mon, 05 Jul 1999 17:05:30 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <3784e595.2145095@news.skynet.be>
Benjamin Franz wrote:
>Lets say that your linear O(N) search was 10x faster for 1024
>elements than the binary O(log(N)) search. That calibrates the
>relative size of their constant terms. Now let's grow the dataset
>to N=102400. The O(N) routine is now 100x slower than it was,
>while the O(log(N)) routine is only 1.66x slower than it was: The
>binary search is now 6x faster than the linear search.
I accept the drift of your reasoning. But I do not accept the exact
figures. That was the main point of my post: caching of the dataset will
throw off exact speed ratio figures. Try sorting 20Mb of data if all
that fits into memory is 10Mb. The speed loss will be immense, compared
to execution time for sorting 8Mb. Far worse than an optimistic 2.66/1
ratio.
O(fn(n)) theory simply ignores this.
Bart.
------------------------------
Date: 05 Jul 1999 13:39:13 -0400
From: Uri Guttman <uri@sysarch.com>
Subject: Re: hiring perl programmers
Message-Id: <x7iu7zxagu.fsf@home.sysarch.com>
>>>>> "BL" == Bart Lateur <bart.lateur@skynet.be> writes:
BL> I accept the drift of your reasoning. But I do not accept the exact
BL> figures. That was the main point of my post: caching of the dataset will
BL> throw off exact speed ratio figures. Try sorting 20Mb of data if all
BL> that fits into memory is 10Mb. The speed loss will be immense, compared
BL> to execution time for sorting 8Mb. Far worse than an optimistic 2.66/1
BL> ratio.
BL> O(fn(n)) theory simply ignores this.
no it doesn't. there is a whole branch of sort/search theory that deals
with multilevel access issues like ram and disk. ever heard of tape
sorts? they were invented (and probably still used) when tapes were much
larger than ram size. they can be effectively applied to these sorts
too. they have different O() values than regular in ram sorts. you can
only compare O() functions if the assumptions about the various
algorithms are the same. in memory sorts (bubble vs binary vs. quick)
are one example. if you bring in cache issues, all the sorts are
different and must assume that the sort size is larger than ram
size. just letting virtual memory handle it is a loser since there is
knowledge about the algorithm that the virtual system doesn't know that
can be exploited to improve the sorts.
FYI: a tape sort just reads in enough records to fill ram (including
overhead like tree pointers). it sorts them and writes this block out to
disk/tape. repeat until all the input is sorted into blocks. then you do
a multiple file merge on the sorted blocks. if all the blocks are online
(enough tape drives or disk files exist) this phase is easy. if you
have fewer drives than blocks and have to mount them, there is another
algorithm to do the merge. it is based on fibonacci numbers and it
optimally merges N off line blocks to 1 in the fewest passes. i bet the
commercial mega-sort programs still use this in some form.
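the two phases uri describes map directly onto what textbooks now call an external merge sort. a minimal in-memory simulation in Python (the ram size of 4 records is invented for the demo; a real sort would write each run out to tape or disk):

```python
import heapq

def external_sort(records, ram_size=4):
    """Simulate a tape/external merge sort: sort ram-sized runs, then merge."""
    # Phase 1: read enough records to fill "ram", sort them, and
    # write the sorted run out (here, just kept in a list of runs).
    runs = [sorted(records[i:i + ram_size])
            for i in range(0, len(records), ram_size)]
    # Phase 2: k-way merge of the sorted runs. heapq.merge streams them,
    # holding only one record per run in memory at a time.
    return list(heapq.merge(*runs))

data = [9, 3, 7, 1, 8, 2, 6, 5, 4, 0]
assert external_sort(data, ram_size=4) == sorted(data)
```

the merge phase never needs more than one record per run in ram, which is why the technique works when the data is far bigger than memory.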
so stop comparing apples and oranges. sort theory looks at the most
costly elements of an algorithm and disk access is more costly than ram
access.
for more on perl sorting, see larry rosler and yours truly present our
paper on efficient sorting in perl at monterey.
uri
--
Uri Guttman ----------------- SYStems ARCHitecture and Software Engineering
uri@sysarch.com --------------------------- Perl, Internet, UNIX Consulting
Have Perl, Will Travel ----------------------------- http://www.sysarch.com
The Best Search Engine on the Net ------------- http://www.northernlight.com
"F**king Windows 98", said the general in South Park before shooting Bill.
------------------------------
Date: Mon, 05 Jul 1999 18:26:47 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <3780f7e4.2571588@news.skynet.be>
Uri Guttman wrote:
>you can
>only compare O() functions if the assumptions about the various
>algorithms are the same.
Exactly.
Cache issues are strange. Execution speed depends not only on the data
size, but also on how close together subsequent accesses are. It's like
estimating a processor's execution time by adding up times per opcode.
That no longer works when caching is involved. Jumping around can have
serious consequences on your execution time.
Same thing with data: data access time per byte depends strongly on how
close together subsequent accesses are. Life is no longer simple. Or:
O(...) no longer depends just on N.
Bart.
------------------------------
Date: 5 Jul 1999 14:54:33 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7o23b4.h6v.abigail@alexandra.delanet.com>
Bart Lateur (bart.lateur@skynet.be) wrote on MMCXXXIV September MCMXCIII
in <URL:news:3781a4c1.504101@news.skynet.be>:
-- Abigail wrote [about O(fn(n))]:
--
-- >Irrelevant. I never said that it was the ultimate answer. It's necessary,
-- >but not sufficient.
--
-- It's not very useful. It is an outdated simplification.
"Outdated" As in replaced with what? Barts thumb?
-- It "estimates"
-- the behaviour for extremely large data sizes, but it DOES NOT take into
-- account what happens if the data set gets really large, larger than the
-- available memory: caching and trashing.
Wrong. *THAT IS EXACTLY WHAT IT IS DOING*. An example of that follows:
-- I have had a case, where a simple linear search, O(n), was SEVERAL TIMES
-- FASTER than a binary search, O(log(n)). No tricks. The same computer,
-- the same programming language (not Perl), the same comparison functions,
-- same data. It was an array of 1024 items: average number of comparisons
-- for linear search: 512; for binary search: 10. Yet the linear search was
-- faster, by several times.
And it scales of course really, really well to larger datasets. (Not!)
Now, I don't buy it that 512 comparisons are faster than 10 comparisons.
Unless you are doing something else as well.
-- Why? I have only one explanation: caching. What the binary search
-- presumably did, was for each comparison loading a data block into the
-- cache, do the comparison, jump to some other point in the data which
-- made the cache dump the data, and reload a partly different set, etc.
-- Ten times.
Ah. Sure. And it didn't do it 512 times for the linear search? That
doesn't make much sense, does it? Do you realize that if you had had
a decent cache manager, your binary search would have been much
faster than your linear search?
-- The linear search probably made the cache load only once, and did all
-- the comparisons in the cache.
Whatever.
But this only shows you've no clue how to interpret "O(f(n))" information.
I bet you don't even know what it means. Due to your silly cache manager,
your binary search algorithm had a large constant C, such that the binary
search takes less than C * log n time. However, for larger datasets, that
will still be much less than D * n, for any D > 0.
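The scaling argument is easy to check by counting comparisons rather than
timing anything. A minimal sketch (hypothetical worst-case data, not Bart's
benchmark): the linear search's comparison count grows with n, the binary
search's with log n.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count comparisons for a linear scan of an array.
sub linear_comparisons {
    my ($aref, $target) = @_;
    my $count = 0;
    for my $x (@$aref) {
        $count++;
        return $count if $x == $target;
    }
    return $count;
}

# Count comparisons for a binary search of a sorted array.
sub binary_comparisons {
    my ($aref, $target) = @_;
    my ($lo, $hi, $count) = (0, $#$aref, 0);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        $count++;
        if    ($aref->[$mid] < $target) { $lo = $mid + 1 }
        elsif ($aref->[$mid] > $target) { $hi = $mid - 1 }
        else                            { return $count }
    }
    return $count;
}

for my $n (1024, 102_400) {
    my @data   = (0 .. $n - 1);
    my $target = $n - 1;    # worst case for the linear search
    printf "n=%6d  linear=%6d  binary=%2d\n",
        $n, linear_comparisons(\@data, $target),
        binary_comparisons(\@data, $target);
}
```

Growing n a hundredfold multiplies the linear count by a hundred but adds
only a handful of comparisons to the binary count.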
Abigail
--
tie $" => A; $, = " "; $\ = "\n"; @a = ("") x 2; print map {"@a"} 1 .. 4;
sub A::TIESCALAR {bless \my $A => A} # Yet Another silly JAPH by Abigail
sub A::FETCH {@q = qw /Just Another Perl Hacker/ unless @q; shift @q}
-----------== Posted via Newsfeeds.Com, Uncensored Usenet News ==----------
http://www.newsfeeds.com The Largest Usenet Servers in the World!
------== Over 73,000 Newsgroups - Including Dedicated Binaries Servers ==-----
------------------------------
Date: 5 Jul 1999 14:56:24 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7o23ek.h6v.abigail@alexandra.delanet.com>
Bart Lateur (bart.lateur@skynet.be) wrote on MMCXXXIV September MCMXCIII
in <URL:news:3784e595.2145095@news.skynet.be>:
[] Benjamin Franz wrote:
[]
[] >Let's say that your linear O(N) search was 10x faster for 1024
[] >elements than the binary O(log(N)) search. That calibrates the
[] >relative size of their constant terms. Now let's grow the dataset
[] >to N=102400. The O(N) routine is now 100x slower than it was,
[] >while the O(log(N)) routine is only 1.66x slower than it was: The
[] >binary search is now 6x faster than the linear search.
[]
[] I accept the drift of your reasoning. But I do not accept the exact
[] figures. That was the main point of my post: caching of the dataset will
[] throw off exact speed ratio figures. Try sorting 20Mb of data if all
[] that fits into memory is 10Mb. The speed loss will be immense, compared
[] to execution time for sorting 8Mb. Far worse than an optimistic 2.66/1
[] ratio.
[]
[] O(f(n)) theory simply ignores this.
No it doesn't. Again, you take a dataset that's too small and draw the
wrong conclusions from it.
Abigail
--
perl -MTime::JulianDay -lwe'@r=reverse(M=>(0)x99=>CM=>(0)x399=>D=>(0)x99=>CD=>(
0)x299=>C=>(0)x9=>XC=>(0)x39=>L=>(0)x9=>XL=>(0)x29=>X=>IX=>0=>0=>0=>V=>IV=>0=>0
=>I=>$r=-2449231+gm_julian_day+time);do{until($r<$#r){$_.=$r[$#r];$r-=$#r}for(;
!$r[--$#r];){}}while$r;$,="\x20";print+$_=>September=>MCMXCIII=>()'
------------------------------
Date: 5 Jul 1999 15:04:53 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7o23uh.h6v.abigail@alexandra.delanet.com>
Bart Lateur (bart.lateur@skynet.be) wrote on MMCXXXIV September MCMXCIII
in <URL:news:3780f7e4.2571588@news.skynet.be>:
%% Uri Guttman wrote:
%%
%% >you can
%% >only compare O() functions if the assumptions about the various
%% >algorithms are the same.
%%
%% Exactly.
%%
%% Cache issues are strange. Execution speed depends not only on the data
%% size, but also on how close together subsequent accesses are. It's like
%% estimating a processor's execution time by adding up times per opcode.
%% That no longer works when caching is involved. Jumping around can have
%% serious consequences on your execution time.
%%
%% Same thing with data: data access time per byte depends strongly on how
%% close together subsequent accesses are. Life is no longer simple. Or:
%% O(...) no longer depends just on N.
Clueless.
Abigail
--
perl -wle '$, = " "; print grep {(1 x $_) !~ /^(11+)\1+$/} 2 .. shift'
------------------------------
Date: Mon, 05 Jul 1999 20:22:46 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <378213f8.321091@news.skynet.be>
Abigail wrote:
>But this only shows you've no clue how to interpret "O(f(n))" information.
>I bet you don't even know what it means.
I bet you invented the stuff.
>Due to your silly cache manager,
>your binary search algorithm had a large constant C,
C is not a constant.
Bart.
------------------------------
Date: 05 Jul 1999 17:32:19 -0400
From: Uri Guttman <uri@sysarch.com>
Subject: Re: hiring perl programmers
Message-Id: <x7908uye8s.fsf@home.sysarch.com>
>>>>> "BL" == Bart Lateur <bart.lateur@skynet.be> writes:
BL> Abigail wrote:
>> But this only shows you've no clue how to interpret "O(f(n))" information.
>> I bet you don't even know what it means.
BL> I bet you invented the stuff.
>> Due to your silly cache manager,
>> your binary search algorithm had a large constant C,
BL> C is not a constant.
huh??
i agree with abigail that you don't seem to have a firm grasp on O()
theory. constant factors are not an issue. the order of the growth of
the function is critical. no linear search O(N) can be faster than a
binary search O(logN) given the same comparison overhead for large
enough N. the extra work to set up a binary tree may be slower in reality
than a linear search for small N, but O() theory is not about small N
but scaling of N.
as for caches, there is no algorithm theory about a particular algorithm
and its behavior under a cache. that is too unpredictable. you can
analyze cache behavior in a general case but you can't derive O()
functions for a specific algorithm using a cache. in reality you can
hack code to optimize the use of the cache (not having the array size be
a multiple of the cache size, having all the critical code fit in the
cache, etc.). many bogus benchmarks are done using those tricks (intel
is famous for them). you can make linpack and others scream by hand
coding tricks to take advantage of your particular cache. but they are
not O() comparable then. O() functions are pure theory, which is then
applied to reality.
as an example, if you do a very slow O(N) pre/post process stage to a
O(N * logN) algorithm with a small enough data set, you can see O(N)
growth rather than O(N * logN) growth. though it still may be faster
than some algorithm with a fast or no pre/post process stage and a slow
element operation in the main part of the algorithm. in the long run,
the lower O() function will always win, which is the space O() theory
works in.
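uri's pre/post-stage point can be sketched with made-up constants
(hypothetical: K per element for the slow O(N) stage, c per comparison in
the O(N log N) core). For small N the linear term dominates, so measured
growth looks linear even though the asymptotic order is N log N:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub log2 { log($_[0]) / log(2) }

# Hypothetical costs: a slow O(N) pre/post stage plus an O(N log N) core.
my ($K, $c) = (10, 1);

for my $n (10, 1_024, 1_000_000) {
    my $pre  = $K * $n;                 # O(N) pre/post stage
    my $core = $c * $n * log2($n);      # O(N log N) core
    printf "N=%8d  O(N) stage=%.1e  core=%.1e  core share=%2.0f%%\n",
        $n, $pre, $core, 100 * $core / ($pre + $core);
}
```

With these constants the two stages break even around N = 1024; below that
the O(N) stage masks the N log N growth, above it the core takes over.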
so bringing in cache issues to O() theory is very bogus in general.
uri
--
Uri Guttman ----------------- SYStems ARCHitecture and Software Engineering
uri@sysarch.com --------------------------- Perl, Internet, UNIX Consulting
Have Perl, Will Travel ----------------------------- http://www.sysarch.com
The Best Search Engine on the Net ------------- http://www.northernlight.com
"F**king Windows 98", said the general in South Park before shooting Bill.
------------------------------
Date: 5 Jul 1999 17:07:05 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: hiring perl programmers
Message-Id: <slrn7o2b3l.h6v.abigail@alexandra.delanet.com>
Bart Lateur (bart.lateur@skynet.be) wrote on MMCXXXIV September MCMXCIII
in <URL:news:378213f8.321091@news.skynet.be>:
// Abigail wrote:
//
// >Due to your silly cache manager,
// >your binary search algorithm had a large constant C,
//
// C is not a constant.
You're cluefree.
Abigail
--
sub _'_{$_'_=~s/$a/$_/}map{$$_=$Z++}Y,a..z,A..X;*{($_::_=sprintf+q=%X==>"$A$Y".
"$b$r$T$u")=~s~0~O~g;map+_::_,U=>T=>L=>$Z;$_::_}=*_;sub _{print+/.*::(.*)/s}
*_'_=*{chr($b*$e)};*__=*{chr(1<<$e)};
_::_(r(e(k(c(a(H(__(l(r(e(P(__(r(e(h(t(o(n(a(__(t(us(J())))))))))))))))))))))))
------------------------------
Date: 6 Jul 1999 18:11:55 GMT
From: gbacon@itsc.uah.edu (Greg Bacon)
Subject: Re: hiring perl programmers
Message-Id: <7ltgtb$ksc$1@info2.uah.edu>
In article <377e8351.499707@news.skynet.be>,
bart.lateur@skynet.be (Bart Lateur) writes:
: That's why I'm a programmer and not a mathematician.
I don't believe in the existence of such a beast. Programming is
mathematics in action.
Greg
--
Christ died for our sins. Dare we make his martyrdom meaningless by not
committing them?
-- Jules Feiffer
------------------------------
Date: Tue, 06 Jul 1999 20:18:36 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: hiring perl programmers
Message-Id: <3782625f.405840@news.skynet.be>
Greg Bacon wrote:
>: That's why I'm a programmer and not a mathematician.
>
>I don't believe in the existence of such a beast. Programming is
>mathematics in action.
It's a matter of rigour. The mathematicians I used to know had a
tendency to think everything through, down to the very last detail. I
think things through until I have something solid enough for a prototype.
Only if the prototype actually works do I add features, such as making
it work for border cases. Those usually end up in the original code as
"if" statements. Example:
$next->{PREVIOUS} = $previous; # ordinary case
vs.
$next->{PREVIOUS} = $previous if $next; # last one in list
Getting a feel for it first is what matters. Prototypes definitely
help. If it turns out that the prototype is absolute crap, there's no
point in wasting time making the border cases work.
Another example:
open(HANDLE,$file) or die "Cannot open file: $!";
The "ordinary case" is when opening the file succeeds. The bordercase is
where it failed.
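The linked-list snippet above can be fleshed out into a minimal sketch
(hypothetical list code, not from the post): removing a node from a
doubly-linked list, where the "if" guards handle the border cases of the
first and last nodes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unlink a node from a doubly-linked list of hash-based nodes.
sub remove_node {
    my ($node) = @_;
    my ($previous, $next) = @{$node}{qw(PREVIOUS NEXT)};
    $previous->{NEXT} = $next     if $previous;    # border case: first node
    $next->{PREVIOUS} = $previous if $next;        # border case: last node
    return $node;
}

# Build a three-node list: n1 <-> n2 <-> n3.
my ($n1, $n2, $n3) = map { { VALUE => $_ } } qw(a b c);
($n1->{NEXT}, $n2->{PREVIOUS}) = ($n2, $n1);
($n2->{NEXT}, $n3->{PREVIOUS}) = ($n3, $n2);

remove_node($n2);    # list is now n1 <-> n3
print $n1->{NEXT}{VALUE}, " ", $n3->{PREVIOUS}{VALUE}, "\n";
```

Without the two guards, removing the head or tail node would dereference
undef; the prototype-first approach adds those guards once the ordinary
case works.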
Bart.
------------------------------
Date: 1 Jul 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 1 Jul 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V9 Issue 30
************************************