[24761] in Perl-Users-Digest
Perl-Users Digest, Issue: 6914 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 26 09:11:09 2004
Date: Thu, 26 Aug 2004 06:10:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 26 Aug 2004 Volume: 10 Number: 6914
Today's topics:
Re: Performance Improvement of complex data structure (hash of hashes of hashes) (Anno Siegel)
Re: performance surprise -- why? <haltingNOSPAM@comcast.net>
Re: Resizing JPG images with Perl? <tore@aursand.no>
Re: Resizing JPG images with Perl? <nospam@thanksanyway.org>
Re: Resizing JPG images with Perl? <nospam@thanksanyway.org>
Re: Resizing JPG images with Perl? <mgjv@tradingpost.com.au>
Re: Resizing JPG images with Perl? <ithinkiam@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 26 Aug 2004 12:29:09 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgkl2l$mdl$1@mamenchi.zrz.TU-Berlin.DE>
Scott Gilpin <sgilpin@gmail.com> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:
> > Scott Gilpin <sgilpin@gmail.com> wrote in comp.lang.perl.misc:
[...]
> Thanks for your reply. I apologize for not being more clear in my
> original post. I've included the entire code to produce the desired
> output. The first 10 lines of the input file are:
>
> 928219|7|6|MI|2
> 928219|9|5|MO|1
> 928219|11|5|CA|41
> 928219|8|6|MA|1
> 928219|5|5|WY|3
> 701396|10|7|QC|8
> 701396|17|1|MI|1
> 928219|0|3|CA|2
> 701396|13|1|CA|2
> 928219|1|1|CA|2
>
> The header is:
>
> col_name|matrix1_rows|matrix2_rows|matrix3_rows|cell_values
>
> The source code is:
Yeah, that's much better.
I have added some critical remarks to your code, but to summarize,
I still don't see why it is slow, and I agree that it shouldn't take
6 hours on a modern machine.
To pinpoint the problem, exclude parts of the code from processing.
How long does it take just to read the file and do nothing with
the lines? What if you don't split the lines but use a fixed array
instead of the line data? What if you *do* split, but don't make
the matrix entries? Etc... See my suggestion for timing code below
to facilitate the evaluation.
So:
> #!/usr/local/bin/perl5.8.3
>
> use strict;
No warnings? You should always switch warnings on, especially when
juggling nested data structures.
> ## The list of matrices actually varies between invocations
> ## of the program - anywhere from 3 - 15
> my @matrixList = ("matrix1", "matrix2", "matrix3");
>
> ## Position hash has the row positions of the values for each matrix
> my %positionHash = (matrix1 => 1, matrix2 => 2, matrix3 => 3);
>
> ## Keep track of the columns we've seen so far
> my %columns_seen = ();
>
> ## hash of hashes of hashes - matrix => rows => columns
> my %matrix_values = ();
>
> open (INDATA, "data.txt") ||
> die "can't open data.txt";
>
To facilitate run-time evaluation, add this:
my $linecount = 0;
my $time = - times;
> while(<INDATA>) {
$linecount ++;
Code in {} should be indented!
>
> chomp($_);
Just
chomp;
does the same thing. The way things are you don't need to chomp
at all (because the last field is numeric and doesn't suffer from
a trailing "\n"), but I'd leave it in.
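A small sketch of why the chomp is optional here, assuming the first sample line is fed in with its newline still attached: the last field comes out as "2\n", and using it in numeric context neither changes the sum nor triggers a warning. The variable @caught is only there for illustration, to collect any warnings.

```perl
use strict;
use warnings;

# Collect warnings so we can see whether the un-chomped field produces any.
my @caught;
local $SIG{__WARN__} = sub { push @caught, @_ };

my $line       = "928219|7|6|MI|2\n";        # first sample line, newline intact
my $cell_value = (split /\|/, $line)[4];     # "2\n" because we did not chomp

my $total = 0;
$total += $cell_value;                       # numeric context ignores the trailing "\n"

print "total = $total, warnings = ", scalar @caught, "\n";
```

This only holds because the last field is used as a number. If it were used as a hash key or printed, the "\n" would matter, which is one reason to leave the chomp in.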
>
> ## Each row is variable width, delimited with |
> my @original_row = split(/\|/,$_);
>
> ## The cell value and the column name are always in the same
> ## position
> my $cell_value = $original_row[4];
> my $col_name = $original_row[0];
Shorter, but essentially no different:
my ( $col_name, $cell_value) = @original_row[ 0, 4];
Also, the "magic number" 4 is one more than the elements in
@matrixList, so:
my ( $col_name, $cell_value) = @original_row[ 0, 1 + @matrixList];
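To check that the computed index really lands on the last field, here is a minimal sketch using the first sample line from the post:

```perl
use strict;
use warnings;

my @matrixList   = ("matrix1", "matrix2", "matrix3");
my @original_row = split /\|/, "928219|7|6|MI|2";

# In the addition, @matrixList is evaluated in numeric context and yields
# its element count (3), so the second index is 4.
my ( $col_name, $cell_value) = @original_row[ 0, 1 + @matrixList];

print "col_name=$col_name cell_value=$cell_value\n";   # col_name=928219 cell_value=2
```

The index then follows the length of @matrixList, which matters since the list varies between 3 and 15 entries per invocation.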
> ## Add this column name to the list of ones we've seen
> $columns_seen{$col_name}=1;
>
> ## For each matrix, loop through and increment the
> ## row/column value
> foreach my $matrix (@matrixList) {
Next level of indentation, please!
> my $row_name = $original_row[$positionHash{$matrix}];
> $matrix_values{$matrix}{$row_name}{$col_name} += $cell_value;
> }
>
> } ## end while
Now show the number of lines and the time used:
$time += times;
warn "$linecount lines read in $time s cpu\n";
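The same times() bracketing can be wrapped in a helper to compare the stripped-down variants suggested earlier (loop only, loop plus split, loop plus split plus store). This is only a sketch: cpu_time_of is a hypothetical helper, the input lines are synthetic, and only one matrix is filled in.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): run a code ref and
# return the user CPU seconds it consumed.
sub cpu_time_of {
    my ($code) = @_;
    my $start = (times)[0];
    $code->();
    return (times)[0] - $start;
}

# Synthetic lines in the same layout as the sample input.
my @lines = map { join '|', 928219 + $_ % 5, $_ % 20, $_ % 7, 'CA', 1 } 1 .. 50_000;

my %variant = (
    '1 loop only'  => sub { my $n = 0; $n++ for @lines },
    '2 plus split' => sub { for (@lines) { my @f = split /\|/ } },
    '3 plus store' => sub {
        my %matrix_values;
        for (@lines) {
            my @f = split /\|/;
            $matrix_values{matrix1}{ $f[1] }{ $f[0] } += $f[4];
        }
    },
);

for my $name (sort keys %variant) {
    printf "%-13s %.2f s cpu\n", $name, cpu_time_of( $variant{$name} );
}
```

Whichever step makes the time jump out of proportion is the one to look at more closely. On the real data the variants would read from the file instead of an in-memory array.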
> ## The following code runs very quickly compared to the
> ## while loop above (2 mins vs. 6 hrs)
> ## I'm only including it to produce the desired output
Including it is good. This way we can generate valid results for
comparison.
> ## Create a header row with column names that is the same
> ## for all matrices
> my $header = "";
>
>
> foreach my $col_name (sort keys %columns_seen) {
> $header = $header . "," . "$col_name";
> }
A shorter and more standard way, though without the leading "," is
my $header = join ',', sort keys %columns_seen;
> ## Create a file for each separate matrix
> foreach my $matrix (@matrixList) {
>
> ## Open output file
> my $OUT_FILE = $matrix . ".csv";
> open (OUTFILE, ">$OUT_FILE") || die "can't open $OUT_FILE";
>
> ## Now we create the first line of file.
> ## Starting with the matrix name and a comma.
> ## Then printing out the column names.
>
> my $firstline = $matrix . "$header";
Originally there was no need to quote "$header". Now there is,
because we must insert a ",":
my $firstline = $matrix . ",$header";
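A quick sketch to confirm that both ways of building the first line agree, using two column names from the sample data:

```perl
use strict;
use warnings;

my %columns_seen = ( 928219 => 1, 701396 => 1 );
my $matrix       = "matrix1";

# Original approach: the header carries its own leading ","
my $old_header = "";
$old_header = $old_header . "," . $_ for sort keys %columns_seen;
my $old_firstline = $matrix . $old_header;

# join approach: no leading ",", so it is added when building the first line
my $new_header    = join ',', sort keys %columns_seen;
my $new_firstline = $matrix . ",$new_header";

print "$old_firstline\n$new_firstline\n";   # both: matrix1,701396,928219
```

The two differ only in the degenerate case of no columns at all, where the join version leaves a trailing "," after the matrix name.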
> print OUTFILE "$firstline\n";
>
> ## Loop for each row in the matrix
> foreach my $row_name (keys(%{$matrix_values{$matrix} } )) {
>
> my $line = $row_name;
>
> ## Loop for each column in the matrix
> foreach my $col_name (sort keys %columns_seen) {
Sorting again for each line that is printed is wasteful. Sort them
once and for all before you enter any loop.
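A minimal sketch of hoisting the sort, with a tiny made-up %matrix_values. The numeric sort on row names is my addition, only to make the output order predictable; the original prints rows in hash order.

```perl
use strict;
use warnings;

my %columns_seen  = ( 928219 => 1, 701396 => 1 );
my %matrix_values = (
    matrix1 => {
        7  => { 928219 => 2 },
        10 => { 701396 => 8 },
    },
);

# Sorted once, then reused for every row of every matrix.
my @sorted_cols = sort keys %columns_seen;

my @lines;
for my $matrix (sort keys %matrix_values) {
    my $rows = $matrix_values{$matrix};
    for my $row_name (sort { $a <=> $b } keys %$rows) {
        my $cells = $rows->{$row_name};
        push @lines, join ',', $row_name,
            map { defined $cells->{$_} ? $cells->{$_} : '.' } @sorted_cols;
    }
}

print "$_\n" for @lines;
```

With many rows and many columns this saves one sort per output line, though it is unlikely to explain a six-hour run on its own.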
> my $cell_value;
> if ($matrix_values{$matrix}{$row_name}{$col_name}) {
> $cell_value =
> $matrix_values{$matrix}{$row_name}{$col_name};
> } else {
> $cell_value = ".";
> }
This would more commonly be written
my $cell_value = $matrix_values{$matrix}{$row_name}{$col_name} || '.';
but that is arguably wrong. What if the cell value is defined, but
happens to be 0? Should 0 be replaced by "."? If not,
my $cell_value = $matrix_values{$matrix}{$row_name}{$col_name};
defined or $_ = '.' for $cell_value;
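To see the difference between the two forms, here is a small sketch with a made-up %cell hash holding a zero, a non-zero value and an undefined entry:

```perl
use strict;
use warnings;

my %cell = ( zero => 0, five => 5, missing => undef );

my %shown;
for my $name (sort keys %cell) {
    # Truth test: 0 is false, so it is replaced as well.
    my $with_or = $cell{$name} || '.';

    # Definedness test: only undef is replaced.
    my $with_defined = $cell{$name};
    defined or $_ = '.' for $with_defined;

    $shown{$name} = [ $with_or, $with_defined ];
    print "$name: || gives '$with_or', defined test gives '$with_defined'\n";
}
```

Only the zero entry comes out differently: the || form prints "." for it, the defined form prints 0.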
> $line = $line . ",$cell_value";
> }
> print OUTFILE "$line\n";
>
> }
> close OUTFILE;
> }
Anno
------------------------------
Date: Thu, 26 Aug 2004 02:13:45 GMT
From: Joe Davison <haltingNOSPAM@comcast.net>
Subject: Re: performance surprise -- why?
Message-Id: <m2zn4iwt62.fsf@Jupiter.local>
On 26 Aug 2004, ctcgag@hotmail.com wrote:
> Joe Davison <haltingNOSPAM@comcast.net> wrote:
> > On 25 Aug 2004, Anno Siegel wrote:
> > >
> > > That is unexpected. The substitution method must move parts of
> > > the string for every match, so I'd expect it to be slower than
> > > global matching.
> > >
> > > I benchmarked both, and also a method based on the index()
> > > function. The results show indexing and global matching in the
> > > same ballpark, both almost twice as fast as substitution (code
> > > appended below):
> > >
> > > substitute 13.6/s -- -41% -45%
> > > indexing 23.1/s 69% -- -7%
> > > globmatch 24.8/s 82% 7% --
> > >
> > > This was on a slow machine with much smaller (200 K) samples,
> > > but the dependence on size should be largely linear. Where your
> > > quadratic (well, more-than-linear) behavior comes from is
> > > anybody's guess.
> > >
> > > Anno
> > >
> >
> > Thanks. I took your program and modified it so that instead of
> > generating the string you're matching, it read it from a file, and
> > took the file name from the command line -- so I could feed it the
> > same files I'd used before.
>
> Let us crawl before we walk. Run the code exactly like Anno posted
> it.
>
> If globmatch is dogging it when you do that, we know that the problem
> lies in your machine or perl installation, not in your Perl program.
>
> (I tested it up to SIZE of 400_000_000, and globmatch always
> came out on top)
>
> Xho
I did that before I modified it. Here's the result -- notice that
globmatch is fastest in this case.
Rate substitute indexing globmatch
substitute 104/s -- -21% -49%
indexing 131/s 26% -- -36%
globmatch 206/s 98% 57% --
time: 4.58s user 1.04s system 94% cpu 5.970 total
The changes I made to his code were trivial, here's my version:
----------------------
#!/usr/bin/perl
use strict; use warnings; $| = 1;
#use Vi::QuickFix;
use English;
use Benchmark ':all';
use constant SIZE => 400_000;
my ($sequence,$ifile,$whatItWas);
my $str;
$sequence = shift @ARGV;
$ifile = shift @ARGV;
unless (open INPUT, $ifile) {
# there is no enclosing FILE loop to "next" out of, so just die
die "Can't open $ifile: $! \n";
}
# Suck the whole ifile in for processing
$str=<INPUT>; #toss the first record (the >CHR1v03212003 line in one case...)
$whatItWas = $INPUT_RECORD_SEPARATOR;
undef $INPUT_RECORD_SEPARATOR;
$str = <INPUT>;
$INPUT_RECORD_SEPARATOR = $whatItWas;
close INPUT;
# Now do what we came for
$str =~ s/\n//gs; # delete newlines
goto bench;
print "globmatch: ", globmatch(), "\n";
print "indexing: ", indexing(), "\n";
print "substitute: ", substitute(), "\n";
exit;
bench:
cmpthese( -1, {
globmatch => 'globmatch',
substitute => 'substitute',
indexing => 'indexing',
});
######################################################################
sub globmatch {
my @indices;
$_ = $str;
push @indices, pos while /$sequence/g;
scalar @indices;
}
sub substitute {
$_ = $str;
s/$sequence/\n$sequence/g;
}
sub indexing {
$_ = $str;
my @indices;
my $pos = 0;
while ( 1 ) {
last if ( $pos = index( $_, $sequence, $pos)) < 0;
push @indices, $pos;
$pos += length $sequence;
}
scalar @indices;
}
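The save/undef/restore sequence around $INPUT_RECORD_SEPARATOR can also be written with local, which restores the old value automatically at the end of the block. A minimal sketch, assuming a hypothetical helper slurp_sequence and a temporary file standing in for the real sequence file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: skip the first record, slurp the rest, drop newlines.
sub slurp_sequence {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    my $header = <$in>;                  # toss the first record
    my $body   = do { local $/; <$in> }; # $/ is undef only inside this block
    close $in;
    $body = '' unless defined $body;
    $body =~ s/\n//g;
    return $body;
}

# Stand-in input: a header line followed by two lines of sequence.
my ($fh, $name) = tempfile( UNLINK => 1 );
print {$fh} ">CHR1 example header\nACGT\nTTGA\n";
close $fh;

print slurp_sequence($name), "\n";       # ACGTTTGA
```

This does not change what gets benchmarked; it only shortens the setup and keeps the change to $/ from leaking into later reads.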
------------------------------
Date: Thu, 26 Aug 2004 03:07:48 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: Resizing JPG images with Perl?
Message-Id: <pan.2004.08.26.01.07.48.593972@aursand.no>
On Wed, 25 Aug 2004 18:03:39 -0700, Mark wrote:
>> Did you even _try_ to search for a module at CPAN?
>> <http://www.cpan.org/>
> Yes. That's always the first place I look.
You should have said so. You also should have said which modules you've
downloaded and why they didn't work for you (or didn't fit your needs).
I only use Perl under Linux, but I remember there was some sort of "this
works on the following platforms"-tests somewhere? I think it was at
CPAN, but my memory might fool me.
--
Tore Aursand <tore@aursand.no>
"The most likely way for the world to be destroyed, most experts agree,
is by accident. That's where we come in; we're computer professionals.
We cause accidents." (Nathaniel Borenstein)
------------------------------
Date: Wed, 25 Aug 2004 19:52:01 -0700
From: "Mark" <nospam@thanksanyway.org>
Subject: Re: Resizing JPG images with Perl?
Message-Id: <BKudnZYbnozOzbDcRVn-rQ@speakeasy.net>
"Mark" <nospam@thanksanyway.org> wrote:
> I downloaded two different modules, both of which
> failed with compiler errors.
For example, I just downloaded PerlMagick and several ImageMagick modules,
and I am attempting to build PerlMagick using Microsoft C++.
Once again, it fails (compiler output below.) Any suggestions?
Is there a freeware C/C++ compiler that I could try that would
build this project successfully?
The environment variables for my compiler installation all use
short filenames, so that doesn't appear to be the problem.
I also tried the "Digital Mars" compiler, and it choked on one of the
compiler command-line switches.
Help!
Thanks
-Mark
C:\temp\PerlMagick-6.02>nmake
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
cl -c -I../ -I.. -I/usr/include/freetype2 -I/usr/X11R6/include
-I/usr/X11R6/include/X11 -I/usr/include/libxml2 -nologo -Gf -W3 -MD -Zi
-DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX
-g -O2 -Wall -pthread -MD -Zi -DNDEBUG -O1 -DVERSION=\"6.0.2\"
-DXS_VERSION=\"6.0.2\" "-IC:\Perl\lib\CORE" -D_FILE_OFFSET_BITS=64
-DHAVE_CONFIG_H Magick.c
Command line warning D4025 : overriding '/O1' with '/O2'
Command line error D2021 : invalid numeric argument '/Wall'
NMAKE : fatal error U1077: 'C:\WINDOWS\system32\cmd.exe' : return code '0x2'
Stop.
------------------------------
Date: Wed, 25 Aug 2004 19:56:07 -0700
From: "Mark" <nospam@thanksanyway.org>
Subject: Re: Resizing JPG images with Perl?
Message-Id: <j8SdnWFp8p3VzLDcRVn-oA@speakeasy.net>
FYI, I'm using (attempting to use) VC6.
Thanks
-Mark
------------------------------
Date: 26 Aug 2004 06:41:01 GMT
From: Martien Verbruggen <mgjv@tradingpost.com.au>
Subject: Re: Resizing JPG images with Perl?
Message-Id: <slrncir1fs.47i.mgjv@verbruggen.comdyn.com.au>
On Wed, 25 Aug 2004 15:21:05 -0700,
Mark <nospam@thanksanyway.org> wrote:
> I need to write some Perl code that will resize every JPG
> image in a directory. Can someone here suggest a Perl
> module that can do this, which isn't too difficult to learn
With a little code around it, GD, Imager and Image::Magick all can do
that (probably in reverse order of preference, depending on what else
you need to do).
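As a rough idea of the "little code around it", here is a sketch using Image::Magick, assuming the module is installed. The 800x600 limit, the "_small" output naming and the helper fit_within are all my assumptions, not anything from the original question. The script does nothing unless a directory is given on the command line and the module loads.

```perl
use strict;
use warnings;

# Hypothetical helper: scale (w, h) to fit inside max_w x max_h, keeping
# the aspect ratio and never enlarging.
sub fit_within {
    my ($w, $h, $max_w, $max_h) = @_;
    my $scale = $max_w / $w;
    $scale = $max_h / $h if $max_h / $h < $scale;
    $scale = 1 if $scale > 1;
    return ( int($w * $scale + 0.5), int($h * $scale + 0.5) );
}

my ($dir, $max_w, $max_h) = (shift @ARGV, 800, 600);   # sizes are assumptions

# Only attempt the resize when a directory is given and Image::Magick loads.
if (defined $dir and eval { require Image::Magick; 1 }) {
    opendir my $dh, $dir or die "Can't open $dir: $!\n";
    for my $name (grep { /\.jpe?g$/i && !/_small\.jpe?g$/i } readdir $dh) {
        my $img = Image::Magick->new;
        my $err = $img->Read("$dir/$name");
        if ("$err") { warn "$err"; next }

        my ($w, $h) = fit_within( $img->Get('width', 'height'), $max_w, $max_h );
        $img->Resize( width => $w, height => $h );

        (my $out = $name) =~ s/(\.jpe?g)$/_small$1/i;   # e.g. a.jpg -> a_small.jpg
        $err = $img->Write("$dir/$out");
        warn "$err" if "$err";
    }
    closedir $dh;
}
```

Run it as, say, "perl resize.pl C:\pictures" (the script name is made up). GD and Imager would need different calls for the read, scale and write steps, but the surrounding directory loop stays the same.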
> and which won't give me fits trying to build it on my
> Windows XP system?
That depends on the environment on your Windows XP system. The above
can all be compiled and built under Cygwin on XP, although I'd look
for precompiled modules first. If you're not using Cygwin, give up,
and just get ActiveState perl with their pre-compiled modules.
Why are you insisting on compiling them yourself?
Martien
--
|
Martien Verbruggen | We are born naked, wet and hungry. Then
Trading Post Australia | things get worse.
|
------------------------------
Date: Thu, 26 Aug 2004 11:44:43 +0100
From: Chris Cole <ithinkiam@gmail.com>
Subject: Re: Resizing JPG images with Perl?
Message-Id: <pan.2004.08.26.10.44.41.101125@gmail.com>
On Wed, 25 Aug 2004 19:52:01 -0700, Mark wrote:
> "Mark" <nospam@thanksanyway.org> wrote:
>> I downloaded two different modules, both of which failed with compiler
>> errors.
>
> For example, I just downloaded PerlMagick and several ImageMagick
> modules, and I am attempting to build PerlMagick using Microsoft C++.
> Once again, it fails (compiler output below.) Any suggestions? Is there
> a freeware C/C++ compiler that I could try that would build this project
> successfully?
[snip]
GNU gcc is the default C compiler that most 'freeware' software is
written for. However, ImageMagick is available precompiled for Windows and
should work with the Image::Magick module.
http://www.imagemagick.org/www/download.html
I use it all the time for shrinking jpegs, but on a linux platform so
YMMV.
HTH
Chris.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6914
***************************************