[24763] in Perl-Users-Digest


Perl-Users Digest, Issue: 6916 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 26 14:11:09 2004

Date: Thu, 26 Aug 2004 11:10:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 26 Aug 2004     Volume: 10 Number: 6916

Today's topics:
    Re: Parsing FileName for upload (Tony McGuire)
    Re: Parsing FileName for upload (Tony McGuire)
    Re: Parsing FileName for upload <noreply@gunnar.cc>
    Re: Performance Improvement of complex data structure ( <sgilpin@gmail.com>
    Re: Performance Improvement of complex data structure ( <sgilpin@gmail.com>
    Re: Performance Improvement of complex data structure ( ctcgag@hotmail.com
    Re: Performance Improvement of complex data structure ( ctcgag@hotmail.com
    Re: Performance Improvement of complex data structure ( (Anno Siegel)
    Re: Performance Improvement of complex data structure ( (Hae Jin)
    Re: Performance Improvement of complex data structure ( (Anno Siegel)
        printing code references (Stuart Kendrick)
    Re: printing code references (Anno Siegel)
    Re: Resizing JPG images with Perl? (Randal L. Schwartz)
        Substitution (regex?) <>
    Re: Substitution (regex?) <mritty@gmail.com>
        There IS a way to test for a file lock (Sara)
    Re: There IS a way to test for a file lock <ron.parker@povray.org>
        using the result of a variable regular expression leifwessman@hotmail.com
    Re: using the result of a variable regular expression <nobull@mail.com>
    Re: using the result of a variable regular expression <leifwessman@hotmail.com>
    Re: using the result of a variable regular expression <noreply@gunnar.cc>
    Re: using the result of a variable regular expression <mritty@gmail.com>
    Re: using the result of a variable regular expression <leifwessman@hotmail.com>
    Re: using the result of a variable regular expression (Anno Siegel)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 26 Aug 2004 09:26:27 -0700
From: tony@paradoxcommunity.com (Tony McGuire)
Subject: Re: Parsing FileName for upload
Message-Id: <f896a829.0408260826.11e7e00c@posting.google.com>

Tore Aursand <tore@aursand.no> wrote in message news:<pan.2004.08.25.22.56.05.213181@aursand.no>...
> On Wed, 25 Aug 2004 14:08:40 -0700, Tony McGuire wrote:
> >> BTW: Why do you need to know _if_ there is a full path present?
>  
> > The examples I've found expect to see either '/' or '\'.  Then they grab
> > the last portion as the file name. [...]
> 
> Still, I don't understand.  Why can't you use File::Basename?  Am I
> misunderstanding something here?

I've not tried this yet.

I'll have to investigate to see if I have what is needed; otherwise
I'll get it...I really need this to work reliably.

By the way, my 'server' is on W2k Pro.  And I'm running Apache
2.something.

Thanks Tore, and everyone else taking the time to try to help me.  As
a newbie to Perl, it is comforting to know there is a place to ask for
and get help.


------------------------------

Date: 26 Aug 2004 10:41:42 -0700
From: tony@paradoxcommunity.com (Tony McGuire)
Subject: Re: Parsing FileName for upload
Message-Id: <f896a829.0408260941.401067d9@posting.google.com>

Tore Aursand <tore@aursand.no> wrote in message news:<pan.2004.08.25.22.56.05.213181@aursand.no>...
> Still, I don't understand.  Why can't you use File::Basename?  Am I
> misunderstanding something here?

Tore,

THANK YOU for pointing this out.

Preliminary results are *perfect*!

The filename is correctly parsed both when sent from an IE/Windows
machine as well as when sent from the Opera/Linux machine.

I never wanted the remote path on the user's machine, but IE sends it
anyway.

File::Basename and fileparse() get the correct information every
time.

Again, thanks for responding as well as for your patience.


------------------------------

Date: Thu, 26 Aug 2004 19:39:35 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Parsing FileName for upload
Message-Id: <2p6m30Fhgpn2U1@uni-berlin.de>

Tony McGuire wrote:
> Tore Aursand wrote:
>> Still, I don't understand.  Why can't you use File::Basename?  Am
>> I misunderstanding something here?
> 
> THANK YOU for pointing this out.
> 
> Preliminary results are *perfect*!
> 
> The filename is correctly parsed both when sent from an IE/Windows 
> machine as well as when sent from the Opera/Linux machine.
> 
> I never wanted the remote path on the user's machine, but IE sends
> it anyway.
> 
> File::Basename and fileparse() get the correct information every
> time.

Please read the rest of the thread to find out why File::Basename is
*not* an adequate solution to your problem.
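
For illustration, a minimal sketch of taking the last path component
without relying on the server's own path syntax. It assumes the objection
is that File::Basename parses according to the operating system the script
runs on, so a Windows-style path sent by IE would not be split on a Unix
server. The sub name client_basename is hypothetical, not something from
the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last component of a client-supplied
# file name, treating both '/' and '\' as separators regardless of the
# server's operating system.
sub client_basename {
    my ($path) = @_;
    my ($name) = $path =~ m{([^/\\]+)\z};
    return $name;    # undef if the path is empty or ends in a separator
}

print client_basename('C:\\Documents\\photo.jpg'), "\n";
print client_basename('/home/tony/photo.jpg'), "\n";
```

The trade-off is that a Unix file name which legitimately contains a
backslash would be cut short, which is probably acceptable for upload
names.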

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: 26 Aug 2004 06:19:11 -0700
From: "Scott  Gilpin" <sgilpin@gmail.com>
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgko0f$8ao@odak26.prod.google.com>




ctcgag@hotmail.com wrote:
> "Scott  Gilpin" <sgilpin@gmail.com> wrote:
>
> > Here is the code that I'm using to build up this data structure.  I'm
> > running perl version 5.8.3 on solaris 8 (sparc processor).  The system
> > is not memory bound or cpu bound -
>
> If it is not memory or cpu bound, then it must be I/O bound.  That is
> pretty much the only other choice, right?  Are you sure it isn't CPU bound?
> The fact that it is the only thing that runs on that machine doesn't mean
> it isn't CPU bound.

I don't believe it's CPU bound, because when I monitor the process
using top, the CPU is always about 75% idle.  Also, vmstat doesn't seem
to reveal anything useful - there is no swapping occurring, and there
isn't an abnormal number of context switches.

>
>
> > ## loop to process each row of the original data
> > while(<INDATA>)
> > {
> > chomp($_);
> >
> > ## Each row is delimited with |
> > my @original_row = split(/\|/o,$_);
>
> } #Let us say you end the loop right there.
>   #How long does it take to run now?

Just doing a split and chomp takes about 1/2 the time as the full
processing.  (3 hrs for 100 million rows)

>
> Xho
>
> --
> -------------------- http://NewsReader.Com/ --------------------
> Usenet Newsgroup Service                        $9.95/Month 30GB



------------------------------

Date: 26 Aug 2004 06:26:01 -0700
From: "Scott  Gilpin" <sgilpin@gmail.com>
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgkod9$943@odak26.prod.google.com>

Thanks again for your help.  I'll definitely make these improvements
and see how things go.

btw - I'm using groups-beta.google.com to post messages, and it seems
to remove any indentation from messages that I post (hence the poorly
indented code).

-Scott

Anno Siegel wrote:
> Scott Gilpin <sgilpin@gmail.com> wrote in comp.lang.perl.misc:
> > Anno Siegel wrote:
> > > Scott Gilpin <sgilpin@gmail.com> wrote in comp.lang.perl.misc:
>
> [...]
>
> > Thanks for your reply.  I apologize for not being more clear in my
> > original post. I've included the entire code to produce the desired
> > output.  The first 10 lines of the input file are:
> >
> > 928219|7|6|MI|2
> > 928219|9|5|MO|1
> > 928219|11|5|CA|41
> > 928219|8|6|MA|1
> > 928219|5|5|WY|3
> > 701396|10|7|QC|8
> > 701396|17|1|MI|1
> > 928219|0|3|CA|2
> > 701396|13|1|CA|2
> > 928219|1|1|CA|2
> >
> > The header is:
> >
> > col_name|matrix1_rows|matrix2_rows|matrix3_rows|cell_values
> >
> > The source code is:
>
> Yeah, that's much better.
>
> I have added some critical remarks to your code, but to summarize,
> I still don't see why it is slow, and I agree that it shouldn't take
> 6 hours on a modern machine.
>
> To pinpoint the problem, exclude parts of the code from processing.
> How long does it take just to read the file and do nothing with
> the lines?  What if you don't split the lines but use a fixed array
> instead of the line data?  What if you *do* split, but don't make
> the matrix entries?  Etc...  See my suggestion for timing code below
> to facilitate the evaluation.
>
> So:
>
> > #!/usr/local/bin/perl5.8.3
> >
> > use strict;
>
> No warnings?  You should always switch warnings on, especially when
> juggling nested data structures.
>
> > ## The list of matrices actually varies between invocations
> > ## of the program - anywhere from 3 - 15
> > my @matrixList = ("matrix1", "matrix2", "matrix3");
> >
> > ## Position hash has the row positions of the values for each matrix
> > my %positionHash = (matrix1 => 1, matrix2 => 2, matrix3 => 3);
> >
> > ## Keep track of the columns we've seen so far
> > my %columns_seen = ();
> >
> > ## hash of hashes of hashes - matrix => rows => columns
> > my %matrix_values = ();
> >
> > open (INDATA, "data.txt") ||
> > die "can't open data.txt";
> >
>
> To facilitate run-time evaluation, add this:
>
>     my $linecount = 0;
>     my $time = - times;
>
> > while(<INDATA>) {
>
>     $linecount ++;
>
> Code in {} should be indented!
>
> >
> > chomp($_);
>
> Just
>
>     chomp;
>
> does the same thing.  The way things are you don't need to chomp
> at all (because the last field is numeric and doesn't suffer from
> a trailing "\n"), but I'd leave it in.
>
> >
> > ## Each row is variable width, delimited with |
> > my @original_row = split(/\|/,$_);
> >
> > ## The cell value and the column name are always in the same
> > ## position
> > my $cell_value = $original_row[4];
> > my $col_name = $original_row[0];
>
> Shorter, but essentially no different:
>
>    my ( $col_name, $cell_value) = @original_row[ 0, 4];
>
> Also, the "magic number" 4 is one more than the elements in
> @matrixList, so:
>
>     my ( $col_name, $cell_value) = @original_row[ 0, 1 + @matrixList];
>
> > ## Add this column name to the list of ones we've seen
> > $columns_seen{$col_name}=1;
> >
> > ##  For each matrix, loop through and increment the
> > ##  row/column value
> > foreach my  $matrix (@matrixList) {
>
> Next level of indentation, please!
>
> > my $row_name = $original_row[$positionHash{$matrix}];
> > $matrix_values{$matrix}{$row_name}{$col_name} += $cell_value;
> > }
> >
> > }   ## end while
>
> Now show the number of lines and the time used:
>
> $time += times;
> warn "$linecount lines read in $time s cpu\n";
>
> > ## The following code runs very quickly compared to the
> > ##  while loop above (2 mins vs. 6 hrs)
> > ## I'm only including it to produce the desired output
>
> Including it is good.  This way we can generate valid results for
> comparison.
>
> > ## Create a header row with column names that is the same
> > ## for all matrices
> > my $header = "";
> >
> >
> > foreach my $col_name (sort keys %columns_seen) {
> > $header = $header . "," . "$col_name";
> > }
>
> A shorter and more standard way, though without the leading "," is
>
>     my $header = join ',', sort keys %columns_seen;
>
> > ## Create a file for each separate matrix
> > foreach my $matrix (@matrixList) {
> >
> > ## Open output file
> > my $OUT_FILE = $matrix . ".csv";
> > open (OUTFILE, ">$OUT_FILE") || die "can't open $OUT_FILE";
> >
> > ## Now we create the first line of file.
> > ## Starting with the matrix name and a comma.
> > ## Then printing out the column names.
> >
> > my $firstline = $matrix . "$header";
>
> Originally there was no need to quote "$header".  Now there is,
> because we must insert a ",":
>
>     my $firstline = $matrix . ",$header";
>
> > print OUTFILE "$firstline\n";
> >
> > ## Loop for each row in the matrix
> > foreach my $row_name (keys(%{$matrix_values{$matrix} } )) {
> >
> > my $line = $row_name;
> >
> > ## Loop for each column in the matrix
> > foreach my $col_name (sort keys %columns_seen) {
>
> Sorting again for each line that is printed is wasteful.  Sort them
> once and for all before you enter any loop.
>
> > my $cell_value;
> > if ($matrix_values{$matrix}{$row_name}{$col_name}) {
> > $cell_value =
> > $matrix_values{$matrix}{$row_name}{$col_name};
> > } else {
> > $cell_value = ".";
> > }
>
> This would more commonly be written
>
>    my $cell_value = $matrix_values{$matrix}{$row_name}{$col_name} || '.';
>
> but that is arguably wrong.  What if the cell value is defined, but
> happens to be 0?  Should 0 be replaced by "."?  If not,
>
>     my $cell_value = $matrix_values{$matrix}{$row_name}{$col_name};
>     defined or $_ = '.' for $cell_value;
>
> > $line = $line . ",$cell_value";
> > }
> > print OUTFILE "$line\n";
> > 
> > }
> > close OUTFILE;
> > }
> 
> Anno
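
Pulling the review comments together, the loop might look roughly like the
sketch below. It is not the code that was actually run: it reads the ten
sample rows from an in-memory filehandle instead of data.txt, prints the
matrices to STDOUT instead of writing .csv files, and sorts the row names
so the output is repeatable. Those three choices are assumptions made to
keep the example self-contained.

```perl
use strict;
use warnings;

my @matrixList   = ('matrix1', 'matrix2', 'matrix3');
my %positionHash = (matrix1 => 1, matrix2 => 2, matrix3 => 3);

# The ten sample rows from the post, served through an in-memory handle.
my $sample = join '', map { "$_\n" } (
    '928219|7|6|MI|2',  '928219|9|5|MO|1',  '928219|11|5|CA|41',
    '928219|8|6|MA|1',  '928219|5|5|WY|3',  '701396|10|7|QC|8',
    '701396|17|1|MI|1', '928219|0|3|CA|2',  '701396|13|1|CA|2',
    '928219|1|1|CA|2',
);
open my $in, '<', \$sample or die "can't open in-memory data: $!";

my (%columns_seen, %matrix_values);
my $linecount = 0;
my $time      = -times;    # user CPU time so far, negated

while (my $line = <$in>) {
    $linecount++;
    chomp $line;
    my @row = split /\|/, $line;

    # Column name is first; the cell value follows the matrix fields.
    my ($col_name, $cell_value) = @row[0, 1 + @matrixList];
    $columns_seen{$col_name} = 1;

    for my $matrix (@matrixList) {
        my $row_name = $row[ $positionHash{$matrix} ];
        $matrix_values{$matrix}{$row_name}{$col_name} += $cell_value;
    }
}
close $in;

$time += times;
warn "$linecount lines read in $time s cpu\n";

# Sort the column names once, before entering any output loop.
my @columns = sort keys %columns_seen;

for my $matrix (@matrixList) {
    print join(',', $matrix, @columns), "\n";
    for my $row_name (sort keys %{ $matrix_values{$matrix} }) {
        my @cells = map {
            my $v = $matrix_values{$matrix}{$row_name}{$_};
            defined $v ? $v : '.';    # keep a genuine 0, replace only undef
        } @columns;
        print join(',', $row_name, @cells), "\n";
    }
}
```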



------------------------------

Date: 26 Aug 2004 15:42:09 GMT
From: ctcgag@hotmail.com
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <20040826114209.291$cg@newsreader.com>

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
>
> Yeah, that's much better.
>
> I have added some critical remarks to your code, but to summarize,
> I still don't see why it is slow, and I agree that it shouldn't take
> 6 hours on a modern machine.

The sample data he posted had [4] as the last index per line, while
the original code (presumably, which is the one that took 6 hours)
had [24] as the index.  Taking that change of size into account, I would
say that 6 hours is a reasonable time for it to take.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB


------------------------------

Date: 26 Aug 2004 15:51:57 GMT
From: ctcgag@hotmail.com
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <20040826115157.009$DL@newsreader.com>

"Scott  Gilpin" <sgilpin@gmail.com> wrote:
> ctcgag@hotmail.com wrote:
> > "Scott  Gilpin" <sgilpin@gmail.com> wrote:
> >
> > > Here is the code that I'm using to build up this data structure.  I'm
> > > running perl version 5.8.3 on solaris 8 (sparc processor).  The system
> > > is not memory bound or cpu bound -
> >
> > If it is not memory or cpu bound, then it must be I/O bound.  That is
> > pretty much the only other choice, right?  Are you sure it isn't CPU bound?
> > The fact that it is the only thing that runs on that machine doesn't mean
> > it isn't CPU bound.
>
> I don't believe it's CPU bound, because when I monitor the process
> using top, the CPU is always about 75% idle.

Is this a 4 processor machine?  If so, then you could probably parallelize
the program to make it about 4 times faster.  If not, then that is strange
and perhaps you really are IO bound, although that is hard to believe.


> >
> >
> > > ## loop to process each row of the original data
> > > while(<INDATA>)
> > > {
> > > chomp($_);
> > >
> > > ## Each row is delimited with |
> > > my @original_row = split(/\|/o,$_);
> >
> > } #Let us say you end the loop right there.
> >   #How long does it take to run now?
>
> Just doing a split and chomp takes about 1/2 the time as the full
> processing.  (3 hrs for 100 million rows)

When just reading and splitting the data takes half the time, that doesn't
bode well for optimization potential.  At that point, I'd recommend rearing
back and looking at the big picture.  How do you get these 100 million row
files in the first place?  Presumably you get a lot of these (because a 6
hour run-time probably isn't a big problem for a one-time deal), so could
you tap directly into the data source that is used to create the files,
rather than using the intermediate files?


Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB


------------------------------

Date: 26 Aug 2004 17:09:28 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgl5g8$3kc$1@mamenchi.zrz.TU-Berlin.DE>

 <ctcgag@hotmail.com> wrote in comp.lang.perl.misc:
> anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
> >
> > Yeah, that's much better.
> >
> > I have added some critical remarks to your code, but to summarize,
> > I still don't see why it is slow, and I agree that it shouldn't take
> > 6 hours on a modern machine.
> 
> The sample data he posted had [4] as the last index per line, while
> the original code (presumably, which is the one that took 6 hours)
> had [24] as the index.  Taking that change of size into account, I would
> say that 6 hours is a reasonable time for it to take.

You may be right.  After I read that just splitting the data takes
half of the time I had second thoughts of my own.

It also means that there is no gross inefficiency in the other half
of the program, so there won't be a dramatic improvement through
some super algorithm.

You mentioned parallelization... that looks easy.  Split the data
in as many parts as you have CPUs and run them in parallel (on one or
more machines).  A program that adds up the resulting matrices wouldn't
be hard (though not trivial).  More importantly, it wouldn't run long.
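
As a rough sketch of that adding-up step, assuming each parallel run
leaves behind a hash of the same matrix => row => column shape (how the
partial results get from one process to another, for example via
Storable, is left out here, and the sub name add_matrices is
hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: add every cell of $part into $total, in place.
# Both arguments are references to matrix => row => column => value hashes.
sub add_matrices {
    my ($total, $part) = @_;
    for my $matrix (keys %$part) {
        for my $row (keys %{ $part->{$matrix} }) {
            for my $col (keys %{ $part->{$matrix}{$row} }) {
                $total->{$matrix}{$row}{$col} += $part->{$matrix}{$row}{$col};
            }
        }
    }
    return $total;
}

# Two partial results, as two workers might produce them.
my %part_a = (matrix3 => { CA => { 928219 => 41 }, MI => { 928219 => 2 } });
my %part_b = (matrix3 => { CA => { 928219 => 4, 701396 => 2 } });

my %total;
add_matrices(\%total, $_) for \%part_a, \%part_b;
print "CA/928219 = $total{matrix3}{CA}{928219}\n";
```

The run time of this step depends on the number of distinct cells, not on
the number of input lines.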

Then there's C, but the problem seems to involve some essential string
processing, so that's not particularly attractive.  If only those lines
and columns had numbers instead of names.

Anno


------------------------------

Date: Thu, 26 Aug 2004 08:49:32 -0700
From: "Austin P. So (Hae Jin)" <who@what.where>
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgl0r9$rov$1@nntp.itservices.ubc.ca>

Scott Gilpin wrote:
> ctcgag@hotmail.com wrote:
> 
>>"Scott  Gilpin" <sgilpin@gmail.com> wrote:

>>>Here is the code that I'm using to build up this data structure.  I'm
>>>running perl version 5.8.3 on solaris 8 (sparc processor).  The system
>>>is not memory bound or cpu bound -

Hmmm...you have 100 million lines of text that you are reading into a 
big honking hash, and you say that you *aren't* memory bound? If I'm not 
mistaken, UNIX is not technically "memory bound" because it is able to 
utilize the hard drive as virtual RAM, but when that spillover happens, 
it is more demanding on the CPU and more importantly the speed of your 
hard drive. Plus the fact that you are creating a big honking hash that 
I think would crash most systems anyway...

Just out of curiosity, do you know the hardware specs of the machine 
that you are running this on? Do you also know the "niceness" of your 
account that you are using to run this script?

Austin

P.S.



------------------------------

Date: 26 Aug 2004 16:11:35 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgl23n$12q$1@mamenchi.zrz.TU-Berlin.DE>

Austin P. So (Hae Jin) <who@what.where> wrote in comp.lang.perl.misc:
> Scott Gilpin wrote:
> > ctcgag@hotmail.com wrote:
> > 
> >>"Scott  Gilpin" <sgilpin@gmail.com> wrote:
> 
> >>>Here is the code that I'm using to build up this data structure.  I'm
> >>>running perl version 5.8.3 on solaris 8 (sparc processor).  The system
> >>>is not memory bound or cpu bound -
> 
> Hmmm...you have 100 million lines of text that you are reading into a 
> big honking hash, and you say that you *aren't* memory bound? If I'm not 

Look closer.  The hash summarizes the info, it doesn't grow all that
big.

Anno


------------------------------

Date: 26 Aug 2004 10:10:58 -0700
From: skendric@fhcrc.org (Stuart Kendrick)
Subject: printing code references
Message-Id: <62dbf7f1.0408260910.41dca2a8@posting.google.com>

how do i print the name of the subroutine to which a code ref points?

#!/opt/vdops/bin/perl
use strict;
use warnings;

my $ref;
$ref = \&foo;
print "&$ref\n";

sub foo {
  # Do nothing
}

guru% ./test
&CODE(0x815bed0)
guru%


I would like to see "&foo" instead of "&CODE(0x815bed0)".

--sk

stuart kendrick
fhcrc


------------------------------

Date: 26 Aug 2004 17:55:26 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: printing code references
Message-Id: <cgl86e$4lj$2@mamenchi.zrz.TU-Berlin.DE>

Stuart Kendrick <skendric@fhcrc.org> wrote in comp.lang.perl.misc:
> how do i print the name of the subroutine to which a code ref points?
> 
> #!/opt/vdops/bin/perl
> use strict;
> use warnings;
> 
> my $ref;
> $ref = \&foo;
> print "&$ref\n";
> 
> sub foo {
>   # Do nothing
> }
> 
> guru% ./test
> &CODE(0x815bed0)
> guru%
> 
> 
> I would like to see "&foo" instead of "&CODE(0x815bed0)".

You can't.  A sub can be anonymous, it can also be reachable through
more than one name.  In the first case, there simply is no name, in
the second, which one should it print?

When a sub is called, the name through which it has been called is
available through the caller() function, but that doesn't help
with a static coderef.  Short of a comprehensive search through
all packages, there is no way to determine if a coderef has names.
(It can be done, and since it can be done, there must be a module...)
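
One partial route, offered as a sketch rather than a full answer, is the
core B module, which can report the name a sub was compiled under. It
does not discover other names the same sub may be reachable through, and
an anonymous sub comes back as __ANON__. The helper name sub_name is
hypothetical.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Hypothetical helper: package-qualified name the sub was compiled under.
sub sub_name {
    my ($coderef) = @_;
    my $gv = svref_2object($coderef)->GV;
    return $gv->STASH->NAME . '::' . $gv->NAME;
}

sub foo {
    # Do nothing
}

my $ref = \&foo;
print '&', sub_name($ref), "\n";
print '&', sub_name(sub { 1 }), "\n";
```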

Anno


------------------------------

Date: 26 Aug 2004 09:04:11 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Resizing JPG images with Perl?
Message-Id: <86d61drimc.fsf@blue.stonehenge.com>


>>>>> "Mark" == Mark  <nospam@thanksanyway.org> writes:

Mark> For example, I just downloaded PerlMagick and several ImageMagick modules,
Mark> and I am attempting to build PerlMagick using Microsoft C++.
Mark> Once again, it fails (compiler output below.) Any suggestions?

Oww.  See my rant about ImageMagick and PerlMagick at
<http://www.stonehenge.com/merlyn/LinuxMag/col33.html>:

  GD's font capabilities never really impressed me, so I turned (with
  some great hesitation) to the ImageMagick library, and its
  PerlMagick binding. I believe I know where the ``Magick'' part of
  the library gets its name: when you finally figure out how to do
  what you want, it appears to be ``magick'', since the documentation,
  well, is mostly absent.

  So, after invoking convert some 200 to 300 times, slowly varying the
  parameters, and trying new things, and then trying to figure out how
  to convert that to the Perl bindings, again invoking it some 200
  times or so (I'm not joking about these numbers, and I wish I was),
  I've come up with [listing one, below].

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!




------------------------------

Date: Thu, 26 Aug 2004 13:51:33 -0400
From: Lou Moran <>
Subject: Substitution (regex?)
Message-Id: <m46si05g0inuvkbv9mde7iq5eihs4dvef1@4ax.com>

I have an iTunes XML file that contains the following line(s):

<key>Album</key><string>1963 - The Lightning Fingers of Roy
Clark</string>

the goal is to make the line read:

<key>Album</key><string>The Lightning Fingers of Roy Clark</string>

I have been trying to do this with s/// in the vein of:

s/<key>Album</key><string>(\d+)(\s)-(\s)/<key>Album</key><string>/

But of course that is hopelessly broken and chokes on:

syntax error at C:\Lou\Code\yeardel.pl line 6, near "key>"

I'm not exactly sure how to tackle this... in pseudo-code I imagine
something like

s/<key>Album</key><string>nnnn - /<key>Album</key><string>/

Which I envision s/// simply deleting "1963 - "

Better ideas, man/web pages to check, flames?  


------------------------------

Date: Thu, 26 Aug 2004 18:04:47 GMT
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Substitution (regex?)
Message-Id: <3DpXc.63127$3O2.31999@trndny07>

<Lou Moran> wrote in message news:m46si05g0inuvkbv9mde7iq5eihs4dvef1@4ax.com...
> I have an iTunes XML file that contains the following line(s):
>
> <key>Album</key><string>1963 - The Lightning Fingers of Roy
> Clark</string>
>
> the goal is to make the line read:
>
> <key>Album</key><string>The Lightning Fingers of Roy Clark</string>
>
> I have been trying to do this with s/// in the vein of:
>
> s/<key>Album</key><string>(\d+)(\s)-(\s)/<key>Album</key><string>/
Why are you putting ( ) in the regexp if you're not using the sub-matches later?
They do nothing but add clutter.

>
> But of course that is hopelessly broken and chokes on:
>
> syntax error at C:\Lou\Code\yeardel.pl line 6, near "key>"

This error message is important.  You have a syntax error next to "key>".  You
have that syntax error because you're using the pattern match's delimiter within
the pattern match.  If you actually have to do that, you need to backslash the
delimiter:

s/<key>Album<\/key> ... / ... /;

Alternatively, just use a different delimiter.  You can choose any
non-alpha-numeric character to delimit the pattern match:

s!<key>Album</key> ... !  ... !;

This is all still pretty messy though. I would suggest simply getting rid of the
part of the string you don't want, instead of trying to match the entire string
word for word:

s/\d+\s*-\s*//;

This may or may not work for you, if that pattern could occasionally match
something you don't want to eliminate.  Even if you need to specify the pattern
completely, you certainly don't need to specify it twice:

s!(<key>Album</key><string>)\d+\s*-\s*!$1!;
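
A runnable sketch of that capture-and-reuse form, with ! as the delimiter
throughout so the / in </key> needs no escaping. The sample line is the
one from the question, joined onto one line; the sub name strip_year is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a leading "NNNN - " from an Album string.
sub strip_year {
    my ($line) = @_;
    $line =~ s!(<key>Album</key><string>)\d+\s*-\s*!$1!;
    return $line;
}

my $line = '<key>Album</key><string>1963 - The Lightning Fingers of Roy Clark</string>';
print strip_year($line), "\n";
```

Lines that are not Album entries, or Album entries that do not start with
digits and a dash, pass through unchanged.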

Hope this helps,
Paul Lalli




------------------------------

Date: 26 Aug 2004 07:28:49 -0700
From: genericax@hotmail.com (Sara)
Subject: There IS a way to test for a file lock
Message-Id: <776e0325.0408260628.607bc4ab@posting.google.com>

After querying this group, perldoc, Camel, and my local Perl
associates, I came up empty on how to determine if a file is locked
(with flock()). Most of the posts in CLPM suggest there is no way to
do it. Happily though, there is. Perhaps this post will prevent others
from spending 2-3 hours going down the same road.

Looking at the Camel v3 doc for "flock", it states it returns 1 for
success 0 otherwise. If you're using Camel V2 however, there is no
discussion about returned value from flock(), so upgrade :)

The PROBLEM is, if one uses:

  flock(LOCK, LOCK_EX)

on a locked file, it's blocked until it's released. So it's not really
a test since we can't evaluate (blocked indefinitely), particularly
when the lock may be perpetual. A third param, an optional timeout,
would be

The trick is use this:

 flock(LOCK, LOCK_EX|LOCK_NB)

which produces exactly what we're after- 0 for a locked file, and 1
for a non-locked file (with the side-effect of actually LOCKING the
file, so if you ONLY wanted to test, you'll need to release the lock).
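
A sketch of that test wrapped up as a sub, assuming the file already
exists and is writable by the caller. The name is_locked and the use of a
temporary file are illustrative only. The answer is a snapshot: another
process can take or drop the lock right after the sub returns, and a
flock() failure for any other reason is also reported as "locked" here.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical helper: true if a non-blocking exclusive lock on $path
# could not be obtained, i.e. someone else appears to hold a lock.
sub is_locked {
    my ($path) = @_;
    open my $fh, '+<', $path or die "can't open $path: $!";
    my $got_lock = flock($fh, LOCK_EX | LOCK_NB);
    flock($fh, LOCK_UN) if $got_lock;    # we only wanted to test
    close $fh;
    return $got_lock ? 0 : 1;
}

# Try it on a scratch file that nobody has locked.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;
print "$tmp_name is ", (is_locked($tmp_name) ? 'locked' : 'not locked'), "\n";
```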

I hope this assists some with a test for file locking. It would be
nice if stat returned a boolean for file lock, but it doesn't.

All append the usual comments here to save the naysayers the trouble.
Good day!


*****************************************************************************


This is stupid - why post it?

Read Camel idiot!

Did you check perldoc?

This will never work.

There is an easier way using a simple 10 line program with eval and
alarms.



*******************************************************************************


G


------------------------------

Date: Thu, 26 Aug 2004 10:07:28 -0500
From: Ron Parker <ron.parker@povray.org>
Subject: Re: There IS a way to test for a file lock
Message-Id: <slrncirv5g.7cm.ron.parker@mail.parkrrrr.com>

On 26 Aug 2004 07:28:49 -0700, Sara wrote:
> All append the usual comments here to save the naysayers the trouble.

You missed one: "'All' and 'I'll' may be pronounced the same way in some
dialects, but they're never spelled the same."

-- 
#macro R(L P)sphere{L F}cylinder{L P F}#end#macro P(V)merge{R(z+a z)R(-z a-z)R(a
-z-z-z a+z)torus{1F clipped_by{plane{a 0}}}translate V}#end#macro Z(a F T)merge{
P(z+a)P(z-a)R(-z-z-x a)pigment{rgbt 1}hollow interior{media{emission T}}finish{
reflection.1}}#end Z(-x-x.2y)Z(-x-x.4x)camera{location z*-10rotate x*90}


------------------------------

Date: 26 Aug 2004 09:19:34 -0700
From: leifwessman@hotmail.com
Subject: using the result of a variable regular expression
Message-Id: <cgl2im$tru@odak26.prod.google.com>


Hi!

I need to extract a certain value from a text. But the result isn't
always in the variable $1 - it might be in $2, $3, $4 or some other
predefined variable.

Some code to illustrate my problem:

$regexp = "(\d)(\w)(\d)";
$numb   = 3;                # Means the result I'm looking for is in $3
# I don't know this number, it's submitted by user
# and may differ

if ($data =~ /$regexp/) {

print $numb; # does not work, prints "3"

# alternative solution that works
# but it's UGLY
if ($numb == 1) {
print $1;
} elsif ($numb == 2) {
print $2;
} elsif ($numb == 3) {
print $3;
}

   # is there another way?
}

Thanks for any input!

Leif



------------------------------

Date: Thu, 26 Aug 2004 17:47:28 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: using the result of a variable regular expression
Message-Id: <cgl3v3$khn$1@sun3.bham.ac.uk>



leifwessman@hotmail.com wrote:

> I need to extract a certain value from a text. But the result isn't
> always in the variable $1 - it might be in $2, $3, $4 or some other
> predefined variable.
> 
> Some code to illustrate my problem:
> 
> $regexp = "(\d)(\w)(\d)";
> $numb   = 3;                # Means the result I'm looking for is in $3
> # I don't know this number, it's submitted by user
> # and may differ
> 
> if ($data =~ /$regexp/) {
> 
> print $numb; # does not work, prints "3"

What you are trying to do is use something called a symbolic ref:

   print $$numb; # works - print value of $3

But you have to be careful using symrefs...

{
   # Untaint and check $numb - don't clobber $1 etc
   die 'Not a number' unless do { ($numb) = $numb =~ /(^\d+$)/ };
   no strict 'refs';
   print $$numb;
}

That said I wouldn't use one myself because I never use $1 etc (other 
than in the RHS of s/// or in while(//g)).

if (my @captures = $data =~ /$regexp/) {
   print $captures[$numb-1];
}



------------------------------

Date: 26 Aug 2004 09:56:00 -0700
From: "leifwessman@hotmail.com" <leifwessman@hotmail.com>
Subject: Re: using the result of a variable regular expression
Message-Id: <cgl4n0$2hd@odak26.prod.google.com>


is it possible to use eval to solve my problem?

eval "print \$$numb";

(or something like that?)


leifwessman@hotmail.com wrote:
> Hi!
>
> I need to extract a certain value from a text. But the result isn't
> always in the variable $1 - it might be in $2, $3, $4 or some other
> predefined variable.
>
> Some code to illustrate my problem:
>
> $regexp = "(\d)(\w)(\d)";
> $numb   = 3;                # Means the result I'm looking for is in $3
> # I don't know this number, it's submitted by user
> # and may differ
>
> if ($data =~ /$regexp/) {
>
> print $numb; # does not work, prints "3"
>
> # alternative solution that works
> # but it's UGLY
> if ($numb == 1) {
> print $1;
> } elsif ($numb == 2) {
> print $2;
> } elsif ($numb == 3) {
> print $3;
> }
>
>    # is there another way?
> }
> 
> Thanks for any input!
> 
> Leif



------------------------------

Date: Thu, 26 Aug 2004 18:59:47 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: using the result of a variable regular expression
Message-Id: <2p6jo5Fho1kiU1@uni-berlin.de>

leifwessman@hotmail.com wrote:
> I need to extract a certain value from a text. But the result isn't
> always in the variable $1 - it might be in $2, $3, $4 or some other
> predefined variable.
> 
> Some code to illustrate my problem:

Your problem starts before that code: You have not enabled strictures 
and warnings!

     use strict;
     use warnings;

> $regexp = "(\d)(\w)(\d)";

There is your second problem: $regexp gets the value '(d)(w)(d)', which 
is not what you want.

     my $regexp = '(\d)(\w)(\d)';
------------------^------------^

1) Please copy and paste code that you post, do not retype it!

2) Warnings would have told you that something was wrong.
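
A quick way to see the quoting problem described above. This is only a
sketch: the variable names $lost, $kept and $compiled are made up for the
demonstration, and the warning is silenced just so the broken value can
be printed.

```perl
use strict;
use warnings;

# Double quotes: "\d" is not a known string escape, so Perl drops the
# backslash (and normally warns; silenced here only for the demo).
my $lost = do { no warnings 'misc'; "(\d)(\w)(\d)" };

# Single quotes keep the backslashes; qr// builds a regex object directly.
my $kept     = '(\d)(\w)(\d)';
my $compiled = qr/(\d)(\w)(\d)/;

print "double-quoted: $lost\n";
print "single-quoted: $kept\n";
print "matches '1a2': ", ( '1a2' =~ /$kept/ ? "yes" : "no" ), "\n";
```

With warnings enabled and no "no warnings" block, the double-quoted
version should produce an "Unrecognized escape" warning at compile time.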

> $numb   = 3;                # Means the result I'm looking for is in $3
> # I don't know this number, it's submitted by user
> # and may differ
> 
> if ($data =~ /$regexp/) {
> 
> print $numb; # does not work, prints "3"
> 
> # alternative solution that works
> # but it's UGLY
> if ($numb == 1) {
> print $1;
> } elsif ($numb == 2) {
> print $2;
> } elsif ($numb == 3) {
> print $3;
> }
> 
>    # is there another way?
> }

You can do:

     if ( my @capt = $data =~ /$regexp/ ) {
         print $capt[$numb-1];
     }

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Thu, 26 Aug 2004 17:13:13 GMT
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: using the result of a variable regular expression
Message-Id: <JSoXc.20191$rT1.1196@trndny02>

<leifwessman@hotmail.com> wrote in message
news:cgl4n0$2hd@odak26.prod.google.com...
> leifwessman@hotmail.com wrote:
> > I need to extract a certain value from a text. But the result isn't
> > always in the variable $1 - it might be in $2, $3, $4 or some other
> > predefined variable.
> is it possible to use eval to solve my problem?
>
> eval "print \$$numb";
>
> (or something like that?)

It's possible, yes, but it's a drastically bad idea.  What you're doing is using
Symbolic References, or SymRefs.  Search this group's archives for those terms
for a host of messages telling you why you shouldn't do this.  Or read:
perldoc -q 'variable name'
for the more official explanation.

Instead, I would suggest you take advantage of the fact that a pattern match
called in a list context returns a list of all submatches (ie, all the $1, $2,
$3, etc variables):

chomp (my $num = <STDIN>);
my $regexp = '(\d)(\w)(\d)';
if (my @matches = $data =~ /$regexp/){
    print "Your submatch is: $matches[$num-1]\n";
}

When that script executes, all of the $1, $2, $3, etc variables will be stored
in @matches.  You then pull from @matches whichever one you wanted (the -1 is to
account for 0-based array indices, of course).
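
Since the group number comes from the user, it is probably worth checking
it before using it as an index. A minimal sketch of the list-context
approach with that check added; the helper name nth_capture and the
sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return capture group $n (counting from 1) of the
# first match of $regexp in $data, or undef if there is none.
sub nth_capture {
    my ($data, $regexp, $n) = @_;

    # Accept only positive integers, so 0 or -1 cannot index from the end.
    return unless defined $n && $n =~ /\A[1-9]\d*\z/;

    # In list context the match returns ($1, $2, $3, ...).
    my @captures = $data =~ /$regexp/ or return;

    # undef if the pattern has fewer than $n groups.
    return $captures[$n - 1];
}

my $hit = nth_capture('ab1c2', '(\d)(\w)(\d)', 3);
print "Your submatch is: $hit\n" if defined $hit;
```

Note that a pattern with no capture groups at all returns the list (1) on
success, so this sketch assumes the pattern has at least one group.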

Hope this helps,
Paul Lalli




------------------------------

Date: 26 Aug 2004 10:53:29 -0700
From: "leifwessman@hotmail.com" <leifwessman@hotmail.com>
Subject: Re: using the result of a variable regular expression
Message-Id: <cgl82p$9e7@odak26.prod.google.com>


Thanks a lot! I didn't know that. It should say in the perlfaq.

However, I ended up in a new problem. I want to match in a loop:

my $regexp = '(\d)(\w)(\d)';
while ($data =~ /$regexp/gs){
print "Your submatch is: $1\n";
}

this works fine. However...

chomp (my $num = <STDIN>);
my $regexp = '(\d)(\w)(\d)';
while (my @matches = $data =~ /$regexp/gs){
print "Your submatch is: $matches[$num-1]\n";
}

...becomes an infinite loop. I can't figure out why...

Leif
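
On the infinite loop above: in list context, a match with /g returns the
captures of every match at once instead of stepping through them, so the
while condition re-runs the whole match from the start each time and is
always true. One way to keep the scalar-context loop and still pick a
group by number is to read the offset arrays @- and @+. A minimal sketch,
assuming sample values for $data and $num (both hypothetical here, as is
the @found array):

```perl
use strict;
use warnings;

my $data   = '1a2 3b4 5c6';
my $regexp = '(\d)(\w)(\d)';
my $num    = 3;              # assumed to be a validated positive integer

my @found;
# Scalar context: each pass through the condition finds the next match.
while ($data =~ /$regexp/gs) {
    # $-[$num] and $+[$num] are the start and end offsets of group $num
    # in the last successful match; skip if that group does not exist.
    next unless defined $-[$num];
    push @found, substr($data, $-[$num], $+[$num] - $-[$num]);
}

print "Your submatch is: $_\n" for @found;
```

This avoids both symbolic references and eval, at the cost of slightly
less obvious code.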

Paul Lalli wrote:
> <leifwessman@hotmail.com> wrote in message
> news:cgl4n0$2hd@odak26.prod.google.com...
> > leifwessman@hotmail.com wrote:
> > > I need to extract a certain value from a text. But the result isn't
> > > always in the variable $1 - it might be in $2, $3, $4 or some other
> > > predefined variable.
> > is it possible to use eval to solve my problem?
> >
> > eval "print \$$numb";
> >
> > (or something like that?)
>
> It's possible, yes, but it's a drastically bad idea.  What you're doing is using
> Symbolic References, or SymRefs.  Search this group's archives for those terms
> for a host of messages telling you why you shouldn't do this.  Or read:
> perldoc -q 'variable name'
> for the more official explanation.
>
> Instead, I would suggest you take advantage of the fact that a pattern match
> called in a list context returns a list of all submatches (ie, all the $1, $2,
> $3, etc variables):
>
> chomp (my $num = <STDIN>);
> my $regexp = '(\d)(\w)(\d)';
> if (my @matches = $data =~ /$regexp/){
>     print "Your submatch is: $matches[$num-1]\n";
> }
>
> When that script executes, all of the $1, $2, $3, etc variables will be stored
> in @matches.  You then pull from @matches whichever one you wanted (the -1 is to
> account for 0-based array indices, of course).
> 
> Hope this helps,
> Paul Lalli



------------------------------

Date: 26 Aug 2004 17:57:21 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: using the result of a variable regular expression
Message-Id: <cgl8a1$4lj$3@mamenchi.zrz.TU-Berlin.DE>

Gunnar Hjalmarsson  <noreply@gunnar.cc> wrote in comp.lang.perl.misc:
> leifwessman@hotmail.com wrote:

> > I need to extract a certain value from a text. But the result isn't
> > always in the variable $1 - it might be in $2, $3, $4 or some other
> > predefined variable.

> You can do:
> 
>      if ( my @capt = $data =~ /$regexp/ ) {
>          print $capt[$numb-1];
>      }

Or, without an auxiliary variable:

    defined and print for ( $data =~ /$regexp/ )[ $numb - 1 ];
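
Spelled out, the one-liner slices the list that the match returns; on a
failed match that list is empty, so the slice is empty too and nothing is
printed. A minimal sketch with made-up sample data (the arrays @picked
and @none exist only for this illustration):

```perl
use strict;
use warnings;

my $regexp = '(\d)(\w)(\d)';
my $numb   = 3;

# Successful match: the slice holds the one requested capture.
my @picked = ( 'x1a2y' =~ /$regexp/ )[ $numb - 1 ];

# Failed match: slicing an empty list yields an empty list.
my @none = ( 'no digits here' =~ /$regexp/ )[ $numb - 1 ];

# Same guard as the one-liner: skip anything undefined.
defined and print "$_\n" for @picked, @none;
```

The "defined and" guard covers the case where the requested group did
not take part in the match.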

Anno


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6916
***************************************
