[30118] in Perl-Users-Digest
Perl-Users Digest, Issue: 1361 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Mar 14 00:09:42 2008
Date: Thu, 13 Mar 2008 21:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 13 Mar 2008 Volume: 11 Number: 1361
Today's topics:
Re: command output <tadmc@seesig.invalid>
Re: comparing a 2D array <rose@russ.org>
Re: comparing a 2D array <tadmc@seesig.invalid>
Re: comparing a 2D array <rose@russ.org>
Re: Faster file iteration <vijay@iavian.com>
Re: m// on very long lines leaks memory <sjackman@gmail.com>
Re: m// on very long lines leaks memory <uri@stemsystems.com>
Re: Matching multiple subexpressions in a regular expre <sjackman@gmail.com>
Re: Matching multiple subexpressions in a regular expre <uri@stemsystems.com>
Re: My editfile.pl script <tadmc@seesig.invalid>
Re: My editfile.pl script <keeling@nucleus.com>
Re: My editfile.pl script <uri@stemsystems.com>
Re: My editfile.pl script <tadmc@seesig.invalid>
Re: My editfile.pl script <ignoramus17007@NOSPAM.17007.invalid>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 13 Mar 2008 22:49:40 GMT
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: command output
Message-Id: <slrnftjacf.78e.tadmc@tadmc30.sbcglobal.net>
jammer <jameslockie@mail.com> wrote:
> There is no output from this:
> `/usr/bin/ls -l "$backupDir/$backupName"`;
That is because there are no output statements there.
> This shows correct values:
> print "/usr/bin/ls -l $backupDir/$backupName";
That's nice.
Did you mean to ask a question?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Fri, 14 Mar 2008 09:41:22 +0800
From: "Rose" <rose@russ.org>
Subject: Re: comparing a 2D array
Message-Id: <frcl46$50q$1@ijustice.itsc.cuhk.edu.hk>
After taking your advice, the codes to compare 2 files have been modified
(the followings), the problem is that I am unable to attach a tag to a
particular row. I once try using
for $aref2 (@row2) {
$f1m[$aref2] = 0;
}
instead of $f1m[$i++] = 0; but this does not help because I find have to
label rows from both files in order to print out two lists: one for the same
and the other for the difference. Is there any good way to achieve that?
#!/usr/bin/perl
#use warnings;
$usage='prog outname1.xls outname2.xls';
die "Usage: $usage\n" if $#ARGV < 1;
$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];
$cmpsout = $outname . ".cmpofcmpsame.xls";
$cmpdout = $outname . ".cmpofcmpdiff.xls";
open(FP1, $file1);
open(FP2, $file2);
open(CMPS, ">$cmpsout");
open(CMPD, ">$cmpdout");
$line = <FP1>; #get header
while ($line ne "") {
$line = <FP1>;
push @row1, [ split(/[\t ]+/, $line) ];
}
$line = <FP2>; #get header
while ($line ne "") {
$line = <FP2>;
push @row2, [ split(/[\t ]+/, $line) ];
}
$i=0;
for $aref2 (@row2) {
$f1m[$i++] = 0;
}
print CMPD "$file1 statistics\n";
for $aref1 (@row1) {
for $aref2 (@row2) {
$match = 1;
for ($i = 0; $i < @$aref1; $i++) {
if ($aref1->[$i] ne $aref2->[$i]) {
$match = 0;
last;
}
}
if ($match == 1) {
print CMPS "@$aref1";
$flm[$aref2]=1;
} else {
print CMPD "@$aref1";
}
}
}
print CMPD "\n$file2 statistics\n";
$i=0;
for $aref2 (@row2) {
if ($f1m[$i] == 0) {
print CMPD "@$aref2";
}
$i++;
}
#########
File 1:
a1 a2 a3
1 AAAAA 99
0 TCCT 88
99 what -888
File 2:
a1 a2 a3
1 AAAAA 99
0 TCCT 88
-10 TT 88
99 what 888
------------------------------
Date: Thu, 13 Mar 2008 21:55:56 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: comparing a 2D array
Message-Id: <slrnftjq9s.8u9.tadmc@tadmc30.sbcglobal.net>
Rose <rose@russ.org> wrote:
> After taking your advice, the codes to compare 2 files have been modified
> (the followings), the problem is that I am unable to attach a tag to a
> particular row. I once try using
>
> for $aref2 (@row2) {
> $f1m[$aref2] = 0;
> }
>
> instead of $f1m[$i++] = 0; but this does not help because I find have to
> label rows from both files in order to print out two lists: one for the same
> and the other for the difference. Is there any good way to achieve that?
>
>
> #!/usr/bin/perl
>
> #use warnings;
You lose the benfits of warnings when you comment out the
use warnings pragma.
You should also be using the
use strict;
pragma as well.
> $usage='prog outname1.xls outname2.xls';
> die "Usage: $usage\n" if $#ARGV < 1;
If you want to require exactly two command line arguments, then
die "Usage: $usage\n" unless @ARGV == 2;
says it much more clearly.
> $outname = $ARGV[2];
> $file1 = $ARGV[0];
> $file2 = $ARGV[1];
Err, your Usage message says that it should be called with 2 arguments,
so why are you expecting 3 arguments?
If your usage message is wrong, and you really do want those 3
variables loaded from the command line arguments, then
my($file1, $file2, $outname) = @ARGV;
will do that.
> $cmpsout = $outname . ".cmpofcmpsame.xls";
> $cmpdout = $outname . ".cmpofcmpdiff.xls";
Using variable names that differ by a single interior character is
a very bad idea.
Interpolation is nearly always easier to read and understand
than explicit concatenations.
my $same_out = "$outname.cmpofcmpsame.xls";
my $diff_out = "$outname.cmpofcmpdiff.xls";
> open(FP1, $file1);
You should always, yes *always*, check the return value from open():
open(FP1, $file1) or die "could not open '$file1' $!";
Even better, use a lexical filehandle and the 3-arg form of open():
open( my $FP1, '<', $file1) or die "could not open '$file1' $!";
> open(CMPS, ">$cmpsout");
open my $SAME, '>', $same_out or die "could not open '$same_out' $!";
> $line = <FP1>; #get header
If you plan to discard the line anyway, then there is not much
point in saving it in a variable.
<FP1>; # discard header
> while ($line ne "") {
> $line = <FP1>;
That is a truly horrid way of reading lines from a file.
while ( $line = <FP1> ) }
even better, with a lexical filehandle as above:
while ( $line = <$FP1> ) }
> for ($i = 0; $i < @$aref1; $i++) {
foreach my $i ( 0 .. $#$aref1 ) { # less error-prone, does the same thing
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Fri, 14 Mar 2008 11:22:35 +0800
From: "Rose" <rose@russ.org>
Subject: Re: comparing a 2D array
Message-Id: <frcr1v$7id$1@ijustice.itsc.cuhk.edu.hk>
"Tad J McClellan" <tadmc@seesig.invalid> wrote in message
news:slrnftjq9s.8u9.tadmc@tadmc30.sbcglobal.net...
> Rose <rose@russ.org> wrote:
>> After taking your advice, the codes to compare 2 files have been modified
>> (the followings), the problem is that I am unable to attach a tag to a
>> particular row. I once try using
>>
>> for $aref2 (@row2) {
>> $f1m[$aref2] = 0;
>> }
>>
>> instead of $f1m[$i++] = 0; but this does not help because I find have to
>> label rows from both files in order to print out two lists: one for the
>> same
>> and the other for the difference. Is there any good way to achieve that?
>>
>>
>> #!/usr/bin/perl
>>
>> #use warnings;
>
>
> You lose the benfits of warnings when you comment out the
> use warnings pragma.
>
> You should also be using the
>
> use strict;
>
> pragma as well.
>
>
>> $usage='prog outname1.xls outname2.xls';
>> die "Usage: $usage\n" if $#ARGV < 1;
>
>
> If you want to require exactly two command line arguments, then
>
> die "Usage: $usage\n" unless @ARGV == 2;
>
> says it much more clearly.
>
>
>> $outname = $ARGV[2];
>> $file1 = $ARGV[0];
>> $file2 = $ARGV[1];
>
>
> Err, your Usage message says that it should be called with 2 arguments,
> so why are you expecting 3 arguments?
>
> If your usage message is wrong, and you really do want those 3
> variables loaded from the command line arguments, then
>
> my($file1, $file2, $outname) = @ARGV;
>
> will do that.
>
>
>> $cmpsout = $outname . ".cmpofcmpsame.xls";
>> $cmpdout = $outname . ".cmpofcmpdiff.xls";
>
>
> Using variable names that differ by a single interior character is
> a very bad idea.
>
> Interpolation is nearly always easier to read and understand
> than explicit concatenations.
>
> my $same_out = "$outname.cmpofcmpsame.xls";
> my $diff_out = "$outname.cmpofcmpdiff.xls";
>
>
>> open(FP1, $file1);
>
>
> You should always, yes *always*, check the return value from open():
>
> open(FP1, $file1) or die "could not open '$file1' $!";
>
> Even better, use a lexical filehandle and the 3-arg form of open():
>
> open( my $FP1, '<', $file1) or die "could not open '$file1' $!";
>
>
>> open(CMPS, ">$cmpsout");
>
> open my $SAME, '>', $same_out or die "could not open '$same_out' $!";
>
>
>> $line = <FP1>; #get header
>
> If you plan to discard the line anyway, then there is not much
> point in saving it in a variable.
>
> <FP1>; # discard header
>
>
>> while ($line ne "") {
>> $line = <FP1>;
>
>
> That is a truly horrid way of reading lines from a file.
>
> while ( $line = <FP1> ) }
>
> even better, with a lexical filehandle as above:
>
> while ( $line = <$FP1> ) }
>
>
>> for ($i = 0; $i < @$aref1; $i++) {
>
>
> foreach my $i ( 0 .. $#$aref1 ) { # less error-prone, does the same
> thing
>
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Dear Mr. Tad McClellan,
Thanks a lot for pointing out all the problems I have! ;)
------------------------------
Date: Thu, 13 Mar 2008 19:43:30 -0700 (PDT)
From: "vijay@iavian.com" <vijay@iavian.com>
Subject: Re: Faster file iteration
Message-Id: <db30a3bd-87d8-4fdc-a064-315a18041646@e25g2000prg.googlegroups.com>
On Mar 13, 9:23 pm, J=FCrgen Exner <jurge...@hotmail.com> wrote:
> Give us a spec and maybe someone will be able to come up with a better
> algorithm.
the specs
We have two files. The first file,say 'one.txt', has data arranged in
three columns, separated by semicolon. something like this:
1234567;7654321;20080225
1234765;5464354;19821111
342312A;5464354;19990101
ABC12;9876544;0
I002222;ACD222;19991130
=2E........
Note that the three columns are not of fixed length. The first two
columns are of a maximum length 7 and can contain alpha-numerals.
The third column is the date column (in YYYYMMDD format). It can also
contain '0' or can be empty too.
The second file,say 'two.txt', also has three columns separated by
spaces, something like:
serialno fileno date
123 1234567 20080315
2 2233442 20081130
311 1232231 20031221
44 1232123 19990831
23 2131312 20000101
132 5464354 19811111
=2E.....
The enitre file contains only numerals, from second line onwards. The
first column length ranges from 1-3 numbers. Second column strictly is
of 7 number length. Third column is the date column strictly in
YYYYMMDD format.
Now, the requirement would be to add two additional columns in
'two.txt'. The fourth and fifth columns will be tab separated and
labeled 'label4' and 'label5' respectively.
The values to be populated under 'label4' should be computed as
follows:
Read the 7-digit number present in the second column (under fileno) of
'two.txt'. Compare the number with the alpha-numeric value present in
the second column of the 'one.txt' file. on finding a perfect match,
trigger a counter. Repeat the previous procedure for subsequent lines
and increment the counter each time you find a match. The fourth
column should then be populated with the final value in teh counter
against the fileno, which is the number of exact matches you've found.
If you've found no match, then just populate the entry with a
'0' (zero). But, there is one condition which you need to take care of
before populating-the date difference in each row should be less than
or equal to 5yrs. to do this, you need to pick up the corresponding
date from next to that fileno in 'two.txt'and also pick up the date
next from the thrid column in 'one.txt', and take a diff. If the
difference is more than 5 yrs, do not increment the counter. *NOTE:
the date in file 'one.txt' is always greater(or later) than the
corresponding date in 'two.txt'. The date ranges from 19900101 to
20041231 in file 'two.txt' and from 19750101 to 20011225in file
'one.txt'
In the above example, the new 'two.txt' will look something like
serialnumber fileno date label4
123 1234567 20080315 0
2 2233442 20081130 0
311 1232231 20031221 0
44 1232123 19990831 0
23 2131312 20000101 0
132 5464354 19811111 1
*Label5: We know that the date in 'one.txt' ends on 12/25/2001. For
every matched file number in 'two.txt', pls do the following:
1. if the date in 'two.txt' is less than 12/25/2001, by 5 yrs or more,
mark as 5 yrs.
2. if its between 12/25/2001 and 12/25/1996 mark the exact number in
terms of number of years,months and days.
3. if its more than 12/25/2001 and till 31/12/2004, mark the exact
number of years,months and days, but put a '-' (minus sign) in front
of it.
------------------------------
Date: Thu, 13 Mar 2008 15:51:50 -0700 (PDT)
From: ShaunJ <sjackman@gmail.com>
Subject: Re: m// on very long lines leaks memory
Message-Id: <7416318f-11a7-4f0f-8f59-e4251c0f0bc3@i12g2000prf.googlegroups.com>
On Mar 13, 2:53 pm, "John W. Krahn" <some...@example.com> wrote:
...
> > I tested it and if I remove the English module it works fine.
> > (So don't use English.pm!)
>
> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
>
> use English qw( -no_match_vars );
Wow, thanks! If I use either English.pm or $& (even without
English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
10.4.11). If I use neither English.pm or $& it works fine.
If I use Perl 5.10.0 built from source it works for every case.
Cheers,
Shaun
------------------------------
Date: Thu, 13 Mar 2008 23:24:05 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: m// on very long lines leaks memory
Message-Id: <x71w6exl62.fsf@mail.sysarch.com>
>>>>> "S" == ShaunJ <sjackman@gmail.com> writes:
S> On Mar 13, 2:53 pm, "John W. Krahn" <some...@example.com> wrote:
S> ...
>> > I tested it and if I remove the English module it works fine.
>> > (So don't use English.pm!)
>>
>> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
>>
>> use English qw( -no_match_vars );
S> Wow, thanks! If I use either English.pm or $& (even without
S> English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
S> 10.4.11). If I use neither English.pm or $& it works fine.
i was going to mention that but didn't want to get into this thread. $&
(which is used in english.pm without that option) is a known memory hog
(not a leak). since $& is global it must copy the entire match string
for each regex in case it might be used later anywhere in the
program. this is a well known issue and you should google for more about
it or find the points in perldoc perlvar.
S> If I use Perl 5.10.0 built from source it works for every case.
they seem to have fixed this problem (partially from what i heard but i
could be wrong) in 5.10. i still recommend never using $& and no one who
knows perl uses english.pm.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Architecture, Development, Training, Support, Code Review ------
----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Thu, 13 Mar 2008 15:57:19 -0700 (PDT)
From: ShaunJ <sjackman@gmail.com>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <750606d7-c18f-40f1-b93a-403ce684c68e@e23g2000prf.googlegroups.com>
On Mar 13, 2:28 pm, xhos...@gmail.com wrote:
> ShaunJ <sjack...@gmail.com> wrote:
> > Hi John,
>
> > If I structure my program as in the example, using many small regex
> > instead of one big regex, Perl 5.8.6 runs out of memory and dies:
> > vm_allocate failed, Out of memory! I have 400'000 regex of exactly 27
> > characters each, and the input string is one line 100 kB long. The
> > machine has 2 GB of memory and free disk space, which should be
> > enough, so I presume the code is somehow leeking memory. It's only a
> > dozen or so lines long, so I've posted my code below. Can you see an
> > obvious leak?
>
> > Thanks,
> > Shaun
>
> > my @restrings = <REFILE>;
> > my @re = map { qr/$_/x } @restrings;
> > while (<>) {
>
> ...
>
> Can you produce a version that we can run? (I.e. that doesn't
> depend on REFILE or STDIN, which we don't have access to?)
Yes, see the recent thread 'm// on very long lines leaks memory',
where I gave a small test case. As it turns out, there is a bug in
Perl 5.8.6 (which is shipped with MacOSX 10.4.11 incidentally). Using
either English or $& causes the memory leak. This bug is fixed in
5.10.0.
Cheers,
Shaun
------------------------------
Date: Thu, 13 Mar 2008 23:25:58 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <x7wso6w6ig.fsf@mail.sysarch.com>
>>>>> "S" == ShaunJ <sjackman@gmail.com> writes:
S> On Mar 13, 2:28 pm, xhos...@gmail.com wrote:
>> ShaunJ <sjack...@gmail.com> wrote:
>> > Hi John,
>>
>> > If I structure my program as in the example, using many small regex
>> > instead of one big regex, Perl 5.8.6 runs out of memory and dies:
>> > vm_allocate failed, Out of memory! I have 400'000 regex of exactly 27
>> > characters each, and the input string is one line 100 kB long. The
>> > machine has 2 GB of memory and free disk space, which should be
>> > enough, so I presume the code is somehow leeking memory. It's only a
>> > dozen or so lines long, so I've posted my code below. Can you see an
>> > obvious leak?
>>
>> > Thanks,
>> > Shaun
>>
>> > my @restrings = <REFILE>;
>> > my @re = map { qr/$_/x } @restrings;
>> > while (<>) {
>>
>> ...
>>
>> Can you produce a version that we can run? (I.e. that doesn't
>> depend on REFILE or STDIN, which we don't have access to?)
S> Yes, see the recent thread 'm// on very long lines leaks memory',
S> where I gave a small test case. As it turns out, there is a bug in
S> Perl 5.8.6 (which is shipped with MacOSX 10.4.11 incidentally). Using
S> either English or $& causes the memory leak. This bug is fixed in
S> 5.10.0.
it is not a leak (as someone else proved in another post). so don't go
blabbing that it is a leak. it is a well known ram suck but it doesn't
lose the ram like a true leak does.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Architecture, Development, Training, Support, Code Review ------
----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Thu, 13 Mar 2008 22:49:41 GMT
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: My editfile.pl script
Message-Id: <slrnftjao9.78e.tadmc@tadmc30.sbcglobal.net>
["Followup-To:" header set to comp.lang.perl.misc.]
Ignoramus17007 <ignoramus17007@NOSPAM.17007.invalid> wrote:
> open( IN, $file );
You should always, yes *always*, check the return value from open():
open( IN, $file ) or die "could not open '$file' $!";
Even better, use a lexical filehandle and the 3-arg form of open():
open( my $IN, '<', $file ) or die "could not open '$file' $!";
or
open my $IN, '<', $file or die "could not open '$file' $!";
> while( my $l = <IN> ) {
while( my $l = <$IN> ) {
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Fri, 14 Mar 2008 02:08:37 +0100 (CET)
From: "s. keeling" <keeling@nucleus.com>
Subject: Re: My editfile.pl script
Message-Id: <slrnftjjvd.6n9.keeling@phreaque.nucleus.com>
Tad J McClellan <tadmc@seesig.invalid>:
>
> ["Followup-To:" header set to comp.lang.perl.misc.]
>
>
> Ignoramus17007 <ignoramus17007@NOSPAM.17007.invalid> wrote:
>
>
> > open( IN, $file );
>
>
> You should always, yes *always*, check the return value from open():
>
> open( IN, $file ) or die "could not open '$file' $!";
>
> Even better, use a lexical filehandle and the 3-arg form of open():
>
> open( my $IN, '<', $file ) or die "could not open '$file' $!";
open( IN, ... # no $ needed. But you knew that.
--
Any technology distinguishable from magic is insufficiently advanced.
(*) http://blinkynet.net/comp/uip5.html Linux Counter #80292
- - http://www.faqs.org/rfcs/rfc1855.html Please, don't Cc: me.
------------------------------
Date: Fri, 14 Mar 2008 01:24:43 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: My editfile.pl script
Message-Id: <x77ig6umg4.fsf@mail.sysarch.com>
>>>>> "sk" == s keeling <keeling@nucleus.com> writes:
sk> Tad J McClellan <tadmc@seesig.invalid>:
>>
>> ["Followup-To:" header set to comp.lang.perl.misc.]
>>
>>
>> Ignoramus17007 <ignoramus17007@NOSPAM.17007.invalid> wrote:
>>
>>
>> > open( IN, $file );
>>
>>
>> You should always, yes *always*, check the return value from open():
>>
>> open( IN, $file ) or die "could not open '$file' $!";
>>
>> Even better, use a lexical filehandle and the 3-arg form of open():
>>
>> open( my $IN, '<', $file ) or die "could not open '$file' $!";
sk> open( IN, ... # no $ needed. But you knew that.
no, you are wrong. tad used a lexical filehandle (notice the my!) which
is better than a global glob handle that you and the OP coded. also note
that tad said this in his comment. so yes, the $ IS needed as $IN is not
the same as IN.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Architecture, Development, Training, Support, Code Review ------
----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Thu, 13 Mar 2008 21:39:23 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: My editfile.pl script
Message-Id: <slrnftjpar.8u9.tadmc@tadmc30.sbcglobal.net>
s. keeling <keeling@nucleus.com> wrote:
> Tad J McClellan <tadmc@seesig.invalid>:
>>
>> ["Followup-To:" header set to comp.lang.perl.misc.]
>>
>>
>> Ignoramus17007 <ignoramus17007@NOSPAM.17007.invalid> wrote:
>>
>>
>> > open( IN, $file );
>>
>>
>> You should always, yes *always*, check the return value from open():
>>
>> open( IN, $file ) or die "could not open '$file' $!";
>>
>> Even better, use a lexical filehandle and the 3-arg form of open():
^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^
>> open( my $IN, '<', $file ) or die "could not open '$file' $!";
>
> open( IN, ... # no $ needed. But you knew that.
If you don't know what a lexical filehandle is, then ask what it is.
If you don't know why it is better, then ask why it is better.
If you think that you have corrected a mistake that I made,
then you have made a mistake. :-)
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Thu, 13 Mar 2008 22:18:21 -0500
From: Ignoramus17007 <ignoramus17007@NOSPAM.17007.invalid>
Subject: Re: My editfile.pl script
Message-Id: <s7mdnUf4eKjgc0TanZ2dnUVZ_h7inZ2d@giganews.com>
On 2008-03-14, Tad J McClellan <tadmc@seesig.invalid> wrote:
> s. keeling <keeling@nucleus.com> wrote:
>> Tad J McClellan <tadmc@seesig.invalid>:
>>>
>>> ["Followup-To:" header set to comp.lang.perl.misc.]
>>>
>>>
>>> Ignoramus17007 <ignoramus17007@NOSPAM.17007.invalid> wrote:
>>>
>>>
>>> > open( IN, $file );
>>>
>>>
>>> You should always, yes *always*, check the return value from open():
>>>
>>> open( IN, $file ) or die "could not open '$file' $!";
>>>
>>> Even better, use a lexical filehandle and the 3-arg form of open():
> ^^^^^^^^^^^^^^^^^^
> ^^^^^^^^^^^^^^^^^^
>>> open( my $IN, '<', $file ) or die "could not open '$file' $!";
>>
>> open( IN, ... # no $ needed. But you knew that.
>
>
> If you don't know what a lexical filehandle is, then ask what it is.
>
> If you don't know why it is better, then ask why it is better.
>
> If you think that you have corrected a mistake that I made,
> then you have made a mistake. :-)
>
>
I agree that they are better, but this is a no-function script, so
they are not much better. Now if I had several functions, libraries,
etc, that would certainly make it more imperative.
Anyway, I changed my script to use lexical filehandles to improve my
practices.
i
#!/usr/bin/perl
#
# Copyright (C) Igor Chudov, 2008.
# Released to the public under the GNU Public License V3.
#
use strict;
use warnings;
use Getopt::Long;
use POSIX 'SEEK_SET';
my $before = undef;
my $after = undef;
my $atend = undef;
my $replace = undef;
my $text = undef;
my $line = undef;
my $usage ="USAGE: $0 [--after after_line|--atend|--replace replace_line] --line line {files...}";
die $usage unless
GetOptions(
"before=s" => \$before,
"after=s" => \$after,
"atend!" => \$atend,
"replace=s" => \$replace,
"line=s" => \$line,
);
my @files = @ARGV;
sub Cleanup {
my $x = shift;
return $x unless $x;
# $L and $LINE are used for comparisons to take
# spaces and tabs into account, like diff -w
$x =~ s/^\s+//;
$x =~ s/\s+$//;
$x =~ s/\s+/ /g;
return $x;
}
################################################## Syntax check
my $count = 0;
$count++ if $before;
$count++ if $after;
$count++ if $atend;
$count++ if $replace;
die $usage unless $count == 1;
die $usage unless $line;
die $usage unless @files;
my $BEFORE = Cleanup( $before );
my $AFTER = Cleanup( $after );
my $REPLACE = Cleanup( $replace );
my $LINE = Cleanup( $line );
############################## Now we have good command line arguments.
foreach my $file (@files) {
my $good = undef;
open( my $IN, $file ) || die "Cannot read from $file";
open( my $OUT, ">$file.new" ) || die "Cannot write to $file.new";
# Check if file has this line already!
my $has = undef;
while( my $l = <$IN> ) {
chomp $l;
my $L = $l;
$L =~ s/^\s+//;
$L =~ s/\s+$//;
$L =~ s/\s+/ /g;
if( $L eq $LINE ) {
$has = 1;
last;
}
}
unless( $has ) {
seek( $IN, 0, &POSIX::SEEK_SET ) || die "Cannot seek 0, WTF";
while( my $l = <$IN> ) {
chomp $l;
my $L = $l;
$L =~ s/^\s+//;
$L =~ s/\s+$//;
$L =~ s/\s+/ /g;
if( $before ) {
#print "Before: '$L' ? '$before'\n";
if( $L eq $BEFORE ) {
print $OUT "$line\n";
$good = 1;
}
print $OUT "$l\n";
} elsif( $after ) {
#print "After? '$L' ? '$after'\n";
print $OUT "$l\n";
if( $L eq $AFTER ) {
print $OUT "$line\n";
$good = 1;
}
} elsif( $replace ) {
if( $L eq $REPLACE ) {
print $OUT "$line\n";
$good = 1;
} else {
print $OUT "$l\n";
}
} else {
print $OUT "$l\n";
}
}
if( $atend ) {
print $OUT "$line\n";
$good = 1;
}
}
close( $IN );
close( $OUT );
if( $good ) {
rename( "$file.new", $file ) || die "Cannot rename $file.new to $file";
} else {
unlink( "$file.new" ) || die "Cannot delete $file.new";
}
unless( $good || $has ) {
die "$0 Failed to perform requested operation on $file.";
}
}
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1361
***************************************