
Perl-Users Digest, Issue: 6899 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Aug 23 11:06:14 2004

Date: Mon, 23 Aug 2004 08:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 23 Aug 2004     Volume: 10 Number: 6899

Today's topics:
    Re: Breaking reference chains <Joe.Smith@inwap.com>
    Re: Breaking reference chains <someone@example.com>
    Re: Earthquake forecasting program   Aug. 16, 2004 <1usa@llenroc.ude.invalid>
    Re: Earthquake forecasting program   Aug. 16, 2004 <1usa@llenroc.ude.invalid>
        finding directory sizes <zebee@zip.com.au>
    Re: finding directory sizes (Damian James)
    Re: finding directory sizes <zebee@zip.com.au>
    Re: finding directory sizes <Joe.Smith@inwap.com>
    Re: finding directory sizes <zebee@zip.com.au>
    Re: finding directory sizes <nobull@mail.com>
    Re: finding directory sizes (Jim Keenan)
    Re: finding directory sizes <jurgenex@hotmail.com>
    Re: how to install Sybase::DBlib??? <mpeppler@peppler.org>
    Re: How to randomize foreach (jdog)
    Re: How to randomize foreach <abigail@abigail.nl>
    Re: How to randomize foreach <abigail@abigail.nl>
    Re: How to randomize foreach <noreply@gunnar.cc>
    Re: How to randomize foreach <sholden@flexal.cs.usyd.edu.au>
    Re: How to randomize foreach <tassilo.von.parseval@rwth-aachen.de>
    Re: How to randomize foreach (Anno Siegel)
    Re: How to randomize foreach <noreply@gunnar.cc>
    Re: perl interpreter automatically exit windows so how  <Joe.Smith@inwap.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 23 Aug 2004 01:31:29 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: Breaking reference chains
Message-Id: <NNbWc.54371$mD.902@attbi_s02>

John W. Krahn wrote:

>     s/\s+\Z//; # used this instead of chomp because of Windows newline

That comment is misleading.  The chomp() function works in win32 perl
for win32 files.  If you're talking about a win32 file being processed
in a *nix environment, you should say so.
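A minimal sketch of the distinction (assuming a CRLF-terminated line read
on a *nix system, where $/ is "\n" so chomp strips only the final LF):

```perl
# A win32 (CRLF-terminated) line as read on a *nix system
my $line = "key=value\r\n";

my $chomped = $line;
chomp $chomped;            # removes only the "\n", leaving a trailing "\r"

my $stripped = $line;
$stripped =~ s/\s+\Z//;    # removes all trailing whitespace, CR included
```

On win32 perl reading a win32 file, the CRLF translation layer makes
plain chomp sufficient, which is the point above.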
	-Joe


------------------------------

Date: Mon, 23 Aug 2004 03:34:26 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Breaking reference chains
Message-Id: <6BdWc.47807$X12.10111@edtnps84>

Joe Smith wrote:
> John W. Krahn wrote:
> 
>>     s/\s+\Z//; # used this instead of chomp because of Windows newline
> 
> That comment is misleading.  The chomp() function works in win32 perl
> for win32 files.  If you're talking about a win32 file being processed
> in a *nix environment, you should say so.

Maybe you noticed where I opened the file as:

my $file = '/mnt/windows/WINDOWS/WIN.INI';

:-)

John
-- 
use Perl;
program
fulfillment


------------------------------

Date: 23 Aug 2004 14:33:25 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Earthquake forecasting program   Aug. 16, 2004
Message-Id: <Xns954E6B64DC897asu1cornelledu@132.236.56.8>

"edgrsprj" <edgrsprj@ix.netcom.com> wrote in
news:lXlVc.29585$nx2.4309@newsread2.news.atl.earthlink.net: 

> Why argue over public resources when there are other options 
> available?

Why argue over pollution in the oceans when everyone can build a pool in 
his own backyard?

-- 
A. Sinan Unur
1usa@llenroc.ude.invalid 
(remove '.invalid' and reverse each component for email address)



------------------------------

Date: 23 Aug 2004 14:41:19 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Earthquake forecasting program   Aug. 16, 2004
Message-Id: <Xns954E6CBBA3EF5asu1cornelledu@132.236.56.8>

"edgrsprj" <edgrsprj@ix.netcom.com> wrote in
news:IZ0Wc.7711$2L3.6453@newsread3.news.atl.earthlink.net: 

> "Tad McClellan" <tadmc@augustmail.com> wrote in message
> news:slrncieja1.48o.tadmc@magna.augustmail.com...
>> edgrsprj <edgrsprj@ix.netcom.com> wrote:
> 
>>
>> > ***  Is Perl the best language to use for this application?
>>
>> No.
>>
>> Python would be much much better for your task.
>>
> 
> Pascal has been mentioned before as a more appropriate language.  So
> has Ruby.  I looked at both briefly and did not see where anything
> would be gained by switching.  However if a formal effort gets started
> then that would be the time to make a decision on that.

We are all praying that you switch to some other language than Perl.

>> > I am going to state the following once more.
> 
>> WTF for?
> 
> The reason for repeating this is the following:

WTF for?

> Right now this effort needs to have an organized group get the
> computer programming part moving along.  There are individual
> programmers who are interested in helping.  But I don't presently have
> time to direct their efforts.  I have been saying this so that other
> people would be aware that this is what is needed.  And if they hear
> about any groups interested in such a project then they might remember
> what I said and let that group know about this.

http://www.sydes.net/jokes/pictures/t/the_end_is_near.jpg

> And so, my posts are recommending to people around the world, "Make
> progress now if that is possible.  Don't wait until the ground is
> starting to shake." 

Progress comes not from trying to pinpoint when an earthquake is going to 
occur, but from taking precautions before, way before, it occurs. 

-- 
A. Sinan Unur
1usa@llenroc.ude.invalid 
(remove '.invalid' and reverse each component for email address)



------------------------------

Date: Mon, 23 Aug 2004 06:48:43 GMT
From: Zebee Johnstone <zebee@zip.com.au>
Subject: finding directory sizes
Message-Id: <slrncij4g1.pvv.zebee@zeus.zipworld.com.au>

I want to archive directories to CD.  I have many of them in 
various places, I don't care if one from /data/web is on the
same CD as one from /home as long as the specified directory is
not split any further.

The important point is that there are things I need to exclude, such as
log files.  

I'm currently getting the size by using du in an open, and munging
the result, is there a better way?

open (DU,"find $snapshot -type d -maxdepth 1 -exec du -sk --exclude=access_log* --exclude=error_log* {} \\;|") || die "can't do find for $snapshot $!\n";

I did think of using stat to add up every file, but if I'm talking 
a few hundred per directory, is that wise?  And how would I exclude
files, considering that each main directory set has more than one file
pattern to exclude?  (this has 2, others have 3 or 4)

Zebee


------------------------------

Date: 23 Aug 2004 07:07:30 GMT
From: damian@qimr.edu.au (Damian James)
Subject: Re: finding directory sizes
Message-Id: <slrncij5ti.fbv.damian@puma.qimr.edu.au>

On Mon, 23 Aug 2004 06:48:43 GMT, Zebee Johnstone said:
>...
>open (DU,"find $snapshot -type d -maxdepth 1 -exec du -sk --exclude=access_log* --exclude=error_log* {} \\;|") || die "can't do find for $snapshot $!\n";
>
>I did think of using stat to add up every file, but if I'm talking 
>a few hundred per directory, is that wise?  And how would I exclude
>files, considering that each main directory set has more than one file
>pattern to exclude?  (this has 2, others have 3 or 4)

I'd suggest using File::Find with an appropriate callback sub. It's in
the standard distribution, and the docs have a few recipes. 

Cheers,
Damian


------------------------------

Date: Mon, 23 Aug 2004 07:27:57 GMT
From: Zebee Johnstone <zebee@zip.com.au>
Subject: Re: finding directory sizes
Message-Id: <slrncij6pj.ren.zebee@zeus.zipworld.com.au>

In comp.lang.perl.misc on 23 Aug 2004 07:07:30 GMT
Damian James <damian@qimr.edu.au> wrote:
> On Mon, 23 Aug 2004 06:48:43 GMT, Zebee Johnstone said:
>>...
>>open (DU,"find $snapshot -type d -maxdepth 1 -exec du -sk --exclude=access_log* --exclude=error_log* {} \\;|") || die "can't do find for $snapshot $!\n";
>>
>>I did think of using stat to add up every file, but if I'm talking 
>>a few hundred per directory, is that wise?  And how would I exclude
>>files, considering that each main directory set has more than one file
>>pattern to exclude?  (this has 2, others have 3 or 4)
> 
> I'd suggest using File::Find with an appropriate callback sub. It's in
> the standard distribution, and the docs have a few recipes. 

I'm not sure what you mean by 'appropriate callback sub'.

Do you mean use File::Find recursively to run stat on every file?

As far as I know, if you do that, you can't pass parameters to
the sub that's processing the files, so suddenly everything's global?

and as I say, is running stat on every file  in dirs that have hundreds
of files the right way to go?  and how to exclude ones you don't want?
I know the patterns I want to exclude, how do I pass those to the
File::Find subroutine?

Zebee


------------------------------

Date: Mon, 23 Aug 2004 07:44:31 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: finding directory sizes
Message-Id: <zfhWc.73268$TI1.21608@attbi_s52>

Zebee Johnstone wrote:

> and as I say, is running stat on every file  in dirs that have hundreds
> of files the right way to go?

If you run `du` on a directory with hundreds of files, it is going
to stat() every file in the directory and all its subdirectories.

> and how to exclude ones you don't want?

Use the global $prune variable.

> I know the patterns I want to exclude, how do I pass those to the
> File::Find subroutine?

sub wanted {
   return($File::Find::prune = 1) if /unwanted|directory/;
   ...
}

	-Joe


------------------------------

Date: Mon, 23 Aug 2004 08:14:24 GMT
From: Zebee Johnstone <zebee@zip.com.au>
Subject: Re: finding directory sizes
Message-Id: <slrncij9gm.t2h.zebee@zeus.zipworld.com.au>

In comp.lang.perl.misc on Mon, 23 Aug 2004 07:44:31 GMT
Joe Smith <Joe.Smith@inwap.com> wrote:
> Zebee Johnstone wrote:
> 
> 
>> and how to exclude ones you don't want?
> 
> Use the global $prune variable.

Where is that documented?  I saw it in the File::Find perldoc but only
in passing, and it doesn't really explain what it is or does.  perldoc -f
and perltoc don't mention it.

Does it exclude files or just directories, or whatever's matched by 
the regexp?

Zebee


------------------------------

Date: Mon, 23 Aug 2004 09:08:14 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: finding directory sizes
Message-Id: <cgc8dl$bte$1@sun3.bham.ac.uk>



Zebee Johnstone wrote:
> In comp.lang.perl.misc on 23 Aug 2004 07:07:30 GMT
> Damian James <damian@qimr.edu.au> wrote:
> 
>>On Mon, 23 Aug 2004 06:48:43 GMT, Zebee Johnstone said:
>>
>>>...
>>>open (DU,"find $snapshot -type d -maxdepth 1 -exec du -sk --exclude=access_log* --exclude=error_log* {} \\;|") || die "can't do find for $snapshot $!\n";
>>>
>>>I did think of using stat to add up every file, but if I'm talking 
>>>a few hundred per directory, is that wise?  And how would I exclude
>>>files, considering that each main directory set has more than one file
>>>pattern to exclude?  (this has 2, others have 3 or 4)
>>
>>I'd suggest using File::Find with an appropriate callback sub. It's in
>>the standard distribution, and the docs have a few recipes. 
> 
> 
> I'm not sure what you mean by 'appropriate callback sub'.
> 
> Do you mean use File::Find recursively to run stat on every file?
> 
> As far as I know, if you do that, you can't pass parameters to
> the sub that's processing the files, so suddenly everything's global?

Do not have an irrational fear of using package variables and local(). 
(Have only a rational fear.)  Some time ago someone motivated by 
irrational fear actually modified File::Find itself not to use package 
variables for its global variables but instead to use file-scoped 
lexicals (still global in the programming sense).  Because local() 
doesn't work on lexicals, this person just unthinkingly removed all the 
local()s.  In so doing they, of course, broke the re-entrancy of File::Find.

However, that said, you only need to use global variables (meaning 
file-scoped lexicals or package-scoped variables) if you want the 
callback to be a named subroutine.  If you use an anonymous subroutine 
then it acts as a closure, meaning it can see lexically scoped variables 
that were in scope where the anonymous sub was defined.

sub do_find {
   my $foo = 'something';
   my $wanted = sub {
      # do stuff with $foo
   };
   find($wanted, '/foo', '/bar');
}

> and as I say, is running stat on every file  in dirs that have hundreds
> of files the right way to go?

Well obviously you have to do this in some way - but on Win32 IIRC the 
implementation of stat() is (was?) pathological.  If speed is of the 
essence on Win32 then spawn a native windows recursive directory lister 
and parse the output.

> and how to exclude ones you don't want?

return if ....

Or to exclude whole directories

$File::Find::prune = 1 if ...

> I know the patterns I want to exclude, how do I pass those to the
> File::Find subroutine?

Shared variables (either global or via closures).
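Putting that advice together, a sketch of how the closure approach might
look for the original directory-size problem (the name dir_size and the
qr// patterns are illustrative, not from the thread):

```perl
use File::Find;

# Sum the sizes of plain files under $dir, skipping any file whose name
# matches one of the exclusion patterns.  The patterns and the running
# total are visible to the anonymous wanted sub via the closure.
sub dir_size {
    my ($dir, @exclude) = @_;
    my $total = 0;
    my $wanted = sub {
        my $name = $_;                      # basename; cwd is the file's dir
        return if grep { $name =~ $_ } @exclude;
        $total += -s $name if -f $name;     # -s gives the size without an explicit stat()
    };
    find($wanted, $dir);
    return $total;
}

# e.g.: my $kb = dir_size('/data/web', qr/^access_log/, qr/^error_log/) / 1024;
```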

-- 
      \\   ( )
   .  _\\__[oo
  .__/  \\ /\@
  .  l___\\
   # ll  l\\
  ###LL  LL\\



------------------------------

Date: 23 Aug 2004 04:46:12 -0700
From: jkeen_via_google@yahoo.com (Jim Keenan)
Subject: Re: finding directory sizes
Message-Id: <196cb7af.0408230346.7c6512f9@posting.google.com>

Zebee Johnstone <zebee@zip.com.au> wrote in message news:<slrncij4g1.pvv.zebee@zeus.zipworld.com.au>...
> I want to archive directories to CD.  I have many of them in 
> various places, I don't care if one from /data/web is on the
> same CD as one from /home as long as the specified directory is
> not split any further.
> 
> The important point is that there are things I need to exclude, such as
> log files.  
> 
> I'm currently getting the size by using du in an open, and munging
> the result, is there a better way?
> 

Unless you can demonstrate through benchmarking that this is a faster
approach than another such as using 'stat', I don't see why you need
to open a filehandle connection to read a file when you are simply
interested in the file's name and size.

jimk


------------------------------

Date: Mon, 23 Aug 2004 14:21:13 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: finding directory sizes
Message-Id: <t3nWc.13450$LV1.12228@nwrddc02.gnilink.net>

Zebee Johnstone wrote:
> Damian James <damian@qimr.edu.au> wrote:
>> I'd suggest using File::Find with an appropriate callback sub. It's
>> in the standard distribution, and the docs have a few recipes.
>
> I'm not sure what you mean by 'appropriate callback sub".

The "wanted()" function that _you_ need to provide, such that File::Find
knows what to do with each file.

> Do you mean use File::Find recursively

No need to. That is the beauty of File::Find: it recurses
automatically without _you_ doing all the leg work.

> to run stat on every file?

Try "-s" instead.

> and as I say, is running stat on every file  in dirs that have
> hundreds of files the right way to go?  and how to exclude ones you
> don't want? I know the patterns I want to exclude, how do I pass
> those to the File::Find subroutine?

Did you look at the documentation and examples that come with File::Find?

jue




------------------------------

Date: Mon, 23 Aug 2004 08:38:23 +0200
From: Michael Peppler <mpeppler@peppler.org>
Subject: Re: how to install Sybase::DBlib???
Message-Id: <pan.2004.08.23.06.38.23.941931@peppler.org>

On Sat, 21 Aug 2004 23:59:01 -0400, yubo he wrote:

> hey, I am using ActivePerl in windows, I used ppm to install
> Sybase::DBlib
> error: could not locate a PPD file for package Sybase-DBlib

There is an ActiveState binary for sybperl at
http://www.peppler.org/downloads/ActiveState

Note that I don't use Windows myself, so I have no personal knowledge on
how to install this, or how well it works.

Michael
-- 
Michael Peppler                              Data Migrations, Inc.
mpeppler@peppler.org                       http://www.peppler.org/
Sybase T-SQL/OpenClient/OpenServer/C/Perl developer available for short or 
long term contract positions - http://www.peppler.org/resume.html



------------------------------

Date: 22 Aug 2004 19:44:42 -0700
From: josh.lincoln@gmail.com (jdog)
Subject: Re: How to randomize foreach
Message-Id: <f59ab99c.0408221844.779322eb@posting.google.com>

Thanks for the feedback. Right now the array is fairly small so I'm
using the most convenient option (List::Util::shuffle).  If the script
becomes slow when the array grows I'll do some benchmarking.

Josh


------------------------------

Date: 23 Aug 2004 07:27:36 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: How to randomize foreach
Message-Id: <slrncij738.q3p.abigail@alexandra.abigail.nl>

Matthew Braid (mb@uq.net.au.invalid) wrote on MMMMX September MCMXCIII in
<URL:news:cgbd5q$j40$1@bunyip.cc.uq.edu.au>:
''  
''  Odd. I just tried the same thing (well, I added a third option - sort 
''  {int(rand(3)) - 1} @arr - to double check:

I've seen variations of the third option before, being presented as a
way to shuffle a list. But I've yet to see any proof that this gives a
fair shuffle (fair being every possible permutation of the list has the
same chance of being produced, given a truly random 'rand()' function).


Abigail
-- 
perl -we '$_ = q ;4a75737420616e6f74686572205065726c204861636b65720as;;
          for (s;s;s;s;s;s;s;s;s;s;s;s)
              {s;(..)s?;qq qprint chr 0x$1 and \161 ssq;excess;}'


------------------------------

Date: 23 Aug 2004 07:28:26 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: How to randomize foreach
Message-Id: <slrncij74q.q3p.abigail@alexandra.abigail.nl>

jdog (josh.lincoln@gmail.com) wrote on MMMMX September MCMXCIII in
<URL:news:f59ab99c.0408221844.779322eb@posting.google.com>:
^^  Thanks for the feedback. Right now the array is fairly small so I'm
^^  using the most convenient option (List::Util::Shuffle).  If the script
^^  becomes slow when the array grows I'll do some benchmarking.


Given that's also one of the most efficient ways of doing it, there's
no need for a change if the array grows.
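For reference, the List::Util usage in question is a one-liner; shuffle()
is implemented in C and is effectively a linear-time Fisher-Yates shuffle:

```perl
use List::Util qw(shuffle);

my @deck = 1 .. 52;
my @shuffled = shuffle(@deck);   # fair, linear-time shuffle of the list
```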


Abigail
-- 
perl -we 'print split /(?=(.*))/s => "Just another Perl Hacker\n";'


------------------------------

Date: Mon, 23 Aug 2004 10:17:35 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: How to randomize foreach
Message-Id: <2otnq3Fd89ujU1@uni-berlin.de>

Matthew Braid wrote:
> Gunnar Hjalmarsson wrote:
>> I measured with the List::Util::shuffle() function, too, and it
>> ran about 70 times per second. However, when I included loading
>> of the module in the benchmark, the overall speed was reduced to
>> 12 times per second, i.e. slightly slower than the slice()
>> method...

<snip>

>> Output including module loading:
>> 
>>             Rate shuffle()  splice()
>> shuffle() 12.0/s        --      -23%
>> splice()  15.5/s       30%        --
> 
> Odd. I just tried the same thing (well, I added a third option -
> sort {int(rand(3)) - 1} @arr - to double check:

<snip>

> and I got:
> 
>             Rate    rand()  splice() shuffle()
> rand()    23.1/s        --      -47%      -88%
> splice()  43.3/s       87%        --      -77%
> shuffle()  189/s      719%      338%        --
> 
> Where shuffle was the clear winner by a mile.... Normally I'd say
> YMMV, but this seems like a slightly over the top difference. I'm
> using an slightly old perl (5.8.1) - maybe something is different
> between our versions?

Well, I ran it using 5.8.0 on Windows 98. Just tried it with 5.8.1 on
Linux, and then the results were similar to yours. Can the explanation
possibly be that loading a compiled module is much slower on Windows?

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: 23 Aug 2004 08:32:55 GMT
From: Sam Holden <sholden@flexal.cs.usyd.edu.au>
Subject: Re: How to randomize foreach
Message-Id: <slrncijatn.poa.sholden@flexal.cs.usyd.edu.au>

On 23 Aug 2004 07:27:36 GMT, Abigail <abigail@abigail.nl> wrote:
> Matthew Braid (mb@uq.net.au.invalid) wrote on MMMMX September MCMXCIII in
><URL:news:cgbd5q$j40$1@bunyip.cc.uq.edu.au>:
> ''  
> ''  Odd. I just tried the same thing (well, I added a third option - sort 
> ''  {int(rand(3)) - 1} @arr - to double check:
>
> I've seen variations of the third option before, being presented as a
> way to shuffle a list. But I've yet to see any proof that this gives a
> fair shuffle (fair being every possible permutation of the list has the
> same chance of being produced, given a truly random 'rand()' function).

Clearly it doesn't. Consider the simple case of a two element list (a,b).
There are two possible shuffle results: (a,b) and (b,a) each of which will
have equal probability of occurring in a fair shuffle.

Any sane implementation of sort will when asked to sort a two element list
do a single comparison, so for the above there are three cases:

	(int(rand(3)) -1)==-1 -> (a,b)
	(int(rand(3)) -1)==0  -> (a,b)
	(int(rand(3)) -1)==1  -> (b,a)

Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
it might improve with a larger list, but it doesn't bode well when the
algorithm fails for such a simple case.

The middle case, it could be argued, might result in (b,a) if the
sort were not stable - but if it did, it always would, and hence
(b,a) would be favoured and the problem remains.  Well, for a sane
sort implementation anyway.
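That 2:1 prediction is easy to check empirically; a quick sketch
(trial count arbitrary, assuming perl's default stable mergesort):

```perl
# Count the two possible outcomes of the random-comparator "shuffle"
# on a two-element list.  A fair shuffle would give roughly 50/50;
# the analysis above predicts (a,b) about twice as often as (b,a).
my %seen;
for (1 .. 30_000) {
    my @s = sort { int(rand(3)) - 1 } ('a', 'b');
    $seen{ join '', @s }++;
}
printf "ab: %d  ba: %d\n", $seen{ab} || 0, $seen{ba} || 0;
```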

-- 
Sam Holden


------------------------------

Date: Mon, 23 Aug 2004 12:45:52 +0200
From: "Tassilo v. Parseval" <tassilo.von.parseval@rwth-aachen.de>
Subject: Re: How to randomize foreach
Message-Id: <2ou072FesdipU1@uni-berlin.de>

Also sprach Sam Holden:

> On 23 Aug 2004 07:27:36 GMT, Abigail <abigail@abigail.nl> wrote:
>> Matthew Braid (mb@uq.net.au.invalid) wrote on MMMMX September MCMXCIII in
>><URL:news:cgbd5q$j40$1@bunyip.cc.uq.edu.au>:
>> ''  
>> ''  Odd. I just tried the same thing (well, I added a third option - sort 
>> ''  {int(rand(3)) - 1} @arr - to double check:
>>
>> I've seen variations of the third option before, being presented as a
>> way to shuffle a list. But I've yet to see any proof that this gives a
>> fair shuffle (fair being every possible permutation of the list has the
>> same chance of being produced, given a truly random 'rand()' function).
> 
> Clearly it doesn't. Consider the simple case of a two element list (a,b).
> There are two possible shuffle results: (a,b) and (b,a) each of which will
> have equal probability of occuring in a fair shuffle.
> 
> Any sane implementation of sort will when asked to sort a two element list
> do a single comparison, so for the above there are three cases:
> 
> 	(int(rand(3)) -1)==-1 -> (a,b)
> 	(int(rand(3)) -1)==0  -> (a,b)
> 	(int(rand(3)) -1)==1  -> (b,a)
> 
> Clearly the shuffle isn't fair since (a,b) is twice as likely. Of course
> it might improve with a larger list, but it doesn't bode well when the
> algorithm fails for such a simple case.

The naive solution to that would be disallowing '0' so that pairs are
always swapped.

However, this won't make the shuffling fair either. One problem (which
can be verified with some simple empirical tests) is that the fairness
depends on the algorithm used.

Taking this little program:

    my @list = (1 .. 4);
    my %perms;

    shuffle() for 1 .. 10000;

    while (my ($key, $val) = each %perms) {
	print "$key => $val\n";
    }

    sub shuffle {
	my @shuffled = sort { int(rand(2)) == 0 ? -1 : 1 } @list;
	$perms{ join "-", @shuffled }++;
    }

On perl5.8.4 with merge-sort, this yields a distribution resembling:

2-1-4-3 => 620
1-2-4-3 => 606
3-2-1-4 => 313
1-4-2-3 => 331
2-4-3-1 => 300
1-3-4-2 => 330
2-1-3-4 => 583
4-1-2-3 => 320
1-2-3-4 => 678
3-4-2-1 => 619
2-3-4-1 => 313
3-1-4-2 => 301
4-3-1-2 => 646
2-3-1-4 => 325
4-3-2-1 => 627
1-3-2-4 => 317
4-2-1-3 => 296
1-4-3-2 => 323
2-4-1-3 => 289
3-2-4-1 => 303
3-4-1-2 => 609
4-1-3-2 => 332
4-2-3-1 => 303
3-1-2-4 => 316

You consistently get certain permutations that show up only around 300
times where others happen at least 600 times (2-1-4-3 is one of them).

Now running the same with 'use sort "_qsort"' gives a very different
picture. It's even more biased:

2-1-4-3 => 621
1-2-4-3 => 584
3-2-1-4 => 651
1-4-2-3 => 321
2-4-3-1 => 155
1-3-4-2 => 323
2-1-3-4 => 1242
4-1-2-3 => 299
1-2-3-4 => 1199
3-4-2-1 => 142
2-3-4-1 => 344
3-1-4-2 => 286
4-3-1-2 => 169
4-3-2-1 => 155
2-3-1-4 => 647
1-3-2-4 => 629
4-2-1-3 => 328
1-4-3-2 => 154
2-4-1-3 => 290
3-2-4-1 => 316
4-1-3-2 => 174
3-4-1-2 => 147
3-1-2-4 => 659
4-2-3-1 => 165

Permutations 2-1-3-4 and 1-2-3-4 always show up around 1200 times.

In a strict sense, this doesn't prove anything and the randomized sort
could still be fair. But really, it's not.

It might be possible to write a sort algorithm in a way that it could be
used for shuffling. Perl's implementations however weren't designed that
way. Especially perl's quicksort has been optimized to death and I
wouldn't be surprised if those optimizations contribute to this strong
bias towards certain permutations.

Having said that, I tried it with these two sort-routines:

    sub bubble (&@) {
	my ($cmp, @copy) = @_;
	for my $i (reverse 1 .. $#copy) {
	    for my $j (1 .. $i) {
		local ($a, $b) = @copy[$j-1, $j];
		@copy[$j-1, $j] = @copy[$j, $j-1] if &$cmp == 1;
	    }
	}
	return @copy;
    }

    sub selection (&@) {
	my ($cmp, @copy) = @_;
	my $min;
	for my $i (0 .. $#copy-1) {
	    $min = $i;
	    for my $j ($i+1 .. $#copy) {
		local ($a, $b) = @copy[$j, $min];
		$min = $j if &$cmp == -1;
	    }
	    @copy[$min, $i] = @copy[$i, $min];
	}
	return @copy;
    }

in the hope that maybe those naive sort algorithms could be used (as
they tend to compare each element with all remaining ones). But I was
wrong. Selection-sort has a bias similar to quicksort's (this time
favoring 4-1-2-3 and 4-1-3-2). Bubble-sort is slightly better, but still
more biased than perl's built-in mergesort.

There are other interesting things to note. When allowing the compare
function to return -1, 1 and 0 (as opposed to only -1 and 1), selection
sort gets less biased, whereas bubble sort gets more biased (4-3-2-1
only shows up around ten times and 1-2-3-4 1800 times on average).

Without being a formal proof for anything, these observations show that
the shuffle-by-sort approach is probably very rotten and should best be
left alone.

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: 23 Aug 2004 13:08:43 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: How to randomize foreach
Message-Id: <cgcq8r$h5e$1@mamenchi.zrz.TU-Berlin.DE>

Abigail  <abigail@abigail.nl> wrote in comp.lang.perl.misc:
> Anno Siegel (anno4000@lublin.zrz.tu-berlin.de) wrote on MMMMIX September
> MCMXCIII in <URL:news:cgapo7$82q$1@mamenchi.zrz.TU-Berlin.DE>:
> %%  Gunnar Hjalmarsson  <noreply@gunnar.cc> wrote in comp.lang.perl.misc:
> %% > Josh wrote:
> %% > > I have an array that I need to cycle through once, but in a random 
> %% > > order. For now I'm dumping the variables into a hash, pulling a
> %% > > random number, retrieving that value, using it, deleting it, and
> %% > > repeating the process until there are no more members in the hash.
> %% > > This obviously won't be efficient with a large set. Seems there
> %% > > must be a better way.
> %% > 
> %% > This is at least significantly less code:
> %% > 
> %% >      my @arr = qw/x t m v/;
> %% >      print splice(@arr, rand $_, 1) for reverse 1 .. @arr;
> %%  
> %%  Or even
> %%  
> %%      print splice(@arr, rand @arr, 1) while @arr;
> 
> Let's not. Splicing out elements one-by-one of an array is very
> inefficient.  Shuffling can be done in linear time, while the splice
> method takes time quadratic in the length of the array.

True, though the quadratic term has a small factor.  One time-linear
shuffler swaps list elements instead of extracting them.  That saves
the time splice needs to shift the array tight:

    sub swapping {
        for ( reverse 1 .. $#_ ) {
            my $r = rand( 1 + $_);
            @_[ $r, $_] = @_[ $_, $r];
        }
        @_;
    }

Benchmarking this against

    sub splicing {
        map splice( @_, rand $_, 1), reverse 1 .. @_;
    }

shows splicing() about twice as fast for arrays shorter than 1000.
The break-even point is with lists of length 12_000; from then on
swapping() wins out by an increasing margin. At 30_000, swapping
is twice as fast as splicing.  On my machine. 
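The comparison can be reproduced with the standard Benchmark module; a
sketch using the two subs above verbatim (the cmpthese() duration and
array size are illustrative, and the numbers will vary by machine):

```perl
use Benchmark qw(cmpthese);

# Fisher-Yates style in-place swapper (operates on @_ by aliasing)
sub swapping {
    for ( reverse 1 .. $#_ ) {
        my $r = rand( 1 + $_ );
        @_[ $r, $_ ] = @_[ $_, $r ];
    }
    @_;
}

# Splice-based shuffler (extracts one random element per step)
sub splicing {
    map splice( @_, rand $_, 1 ), reverse 1 .. @_;
}

my @data = 1 .. 1000;
cmpthese( -1, {
    swapping => sub { my @copy = @data; () = swapping(@copy) },
    splicing => sub { my @copy = @data; () = splicing(@copy) },
} );
```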

That leaves quite a few applications (such as card shuffling) where
the splice solution is practical.

> It's even worse than suggesting to use Bubblesort to sort an array.

Even Bubblesort has found respectable applications.  It is fast for small
lists that are almost in order and has been used to speed up another
sort algorithm.  I think the example is documented in the _Programming
Pearls_ series, though the example I find [1] uses insertion sort (another
n**2 sorting algorithm).

Anno

[1] Bentley, _Programming Pearls_, p. 112f


------------------------------

Date: Mon, 23 Aug 2004 16:57:13 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: How to randomize foreach
Message-Id: <2oufa6Ff0adaU1@uni-berlin.de>

Anno Siegel wrote:
> One time-linear shuffler swaps list elements instead of extracting
> them.  That saves the time splice needs to shift the array tight:
> 
>     sub swapping {
>         for ( reverse 1 .. $#_ ) {
>             my $r = rand( 1 + $_);
>             @_[ $r, $_] = @_[ $_, $r];
>         }
>         @_;
>     }
> 
> Benchmarking this against
> 
>     sub splicing {
>         map splice( @_, rand $_, 1), reverse 1 .. @_;
>     }
> 
> shows splicing() about twice as fast for arrays shorter than 1000.
> The break-even point is with lists of length 12_000; from then on
> swapping() wins out by an increasing margin. At 30_000, swapping is
> twice as fast as splicing.  On my machine.

The benchmark I posted showed that the execution time of
List::Util::shuffle() is much faster than that. Is the explanation,
then, that I was actually comparing apples and oranges, since
List::Util is a compiled module?

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Mon, 23 Aug 2004 07:13:44 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: perl interpreter automatically exit windows so how I can saw the result of script?
Message-Id: <IOgWc.42761$Fg5.37438@attbi_s53>

Jeff 'japhy' Pinyan wrote:

> On Thu, 19 Aug 2004, Greg Bacon wrote:
> 
>>I like to use something similar to the following:
>>
>>   END { system "pause" unless $ENV{PROMPT} }
>>
>>That way, I don't see the pause in a command window.
>>
>>I'd like to know of a workaround that would cause the window to
>>linger when there are syntax errors, a use fails, etc.
> 
> Easy:  make that END { ... } the first line of your code.

Nope, that doesn't work.  END only works after INIT time, and that
does not happen on compile errors.

Putting CHECK{...} as the first line of code does handle
missing 'use' statements and compile errors, but may not show
all the error messages in the right order.
	-Joe

C:\jms>perl -e "CHECK{system 'pause'}; use strict; print $bar"
Global symbol "$bar" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.
Press any key to continue . . .
Can't use string ("bar") as a SCALAR ref while "strict refs" in use at -e line 1.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6899
***************************************

