
Perl-Users Digest, Issue: 3856 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Jan 8 18:09:51 2013

Date: Tue, 8 Jan 2013 15:09:16 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 8 Jan 2013     Volume: 11 Number: 3856

Today's topics:
        best way to make a few changes in a large data file <cartercc@gmail.com>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
    Re: best way to make a few changes in a large data file <bugbear@trim_papermule.co.uk_trim>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
    Re: best way to make a few changes in a large data file <hansmu@xs4all.nl>
    Re: best way to make a few changes in a large data file <cartercc@gmail.com>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
    Re: best way to make a few changes in a large data file <ben@morrow.me.uk>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
        CGI / postprocess of values <cwpbl@rf.oohay>
    Re: CGI / postprocess of values <news@lawshouse.org>
    Re: CGI / postprocess of values <ben@morrow.me.uk>
    Re: Date in CSV/TSV question <cartercc@gmail.com>
        Rehabilitating bad Perl code <news@lawshouse.org>
    Re: Rehabilitating bad Perl code <ben@morrow.me.uk>
    Re: Rehabilitating bad Perl code <news@lawshouse.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 8 Jan 2013 08:25:36 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: best way to make a few changes in a large data file
Message-Id: <ba5c2cb5-6cfc-48e3-8246-b8d538bd22ae@googlegroups.com>

My big data file looks like this:
1,al
2,becky
3,carl
4,debbie
5,ed
6,frieda
 ... for perhaps 200K or 300k lines

My change file looks like this:
5, edward
 ... for perhaps ten or twelve lines

My script looks like this (SKIPPING THE DETAILS):
my %big_data_hash;
while (<BIG>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
while (<CHANGE>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
foreach my $id (keys %big_data_hash) 
  { print OUT qq($id,$big_data_hash{$id}\n); }

This seems wasteful to me, loading several hundred thousand lines of data in memory just to make a few changes. Is there any way to tie the data file to a hash and make the changes directly?

Does anyone have any better ideas?

Thanks, CC.


------------------------------

Date: Tue, 08 Jan 2013 16:38:02 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87vcb7ljf9.fsf@sapphire.mobileactivedefense.com>

ccc31807 <cartercc@gmail.com> writes:
> My big data file looks like this:
> 1,al
> 2,becky
> 3,carl
> 4,debbie
> 5,ed
> 6,frieda
> ... for perhaps 200K or 300k lines
>
> My change file looks like this:
> 5, edward
> ... for perhaps ten or twelve lines
>
> My script looks like this (SKIPPING THE DETAILS):
> my %big_data_hash;
> while (<BIG>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
> while (<CHANGE>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
> foreach my $id (keys %big_data_hash) 
>   { print OUT qq($id,$big_data_hash{$id}\n); }
>
> This seems wasteful to me, loading several hundred thousand lines of
> data in memory just to make a few changes. Is there any way to tie
> the data file to a hash and make the changes directly?

For a text file, no, since insertion or removal of characters affects
the relative positions of all characters after the place where the
change occurred. But your algorithm could be improved: instead of
reading the data file and the changes file into memory completely,
changing the 'data hash' and looping over all keys of that to generate
the modified output, you could read the change file (which is
presumably much smaller) into memory and then process the data file
line by line, applying changes 'on the go' where necessary, i.e.,
(uncompiled)

my %change_hash;
my ($id, $name);	# don't create new variables for every iteration

while (<CHANGE>) {
    chomp;
    ($id, $name) = split /,/;
    $change_hash{$id} = $name;
}

while (<BIG>) {
    chomp;
    ($id, $name) = split /,/;
    $name = $change_hash{$id} if exists($change_hash{$id});

    print OUT qq($id,$name\n);
}


------------------------------

Date: Tue, 08 Jan 2013 16:40:28 +0000
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <soadnYR9_MVh1nHNnZ2dnUVZ8nudnZ2d@brightview.co.uk>

ccc31807 wrote:
> My big data file looks like this:
> 1,al
> 2,becky
> 3,carl
> 4,debbie
> 5,ed
> 6,frieda
> ... for perhaps 200K or 300k lines
>
> My change file looks like this:
> 5, edward
> ... for perhaps ten or twelve lines
>
> My script looks like this (SKIPPING THE DETAILS):
> my %big_data_hash;
> while (<BIG>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
> while (<CHANGE>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
> foreach my $id (keys %big_data_hash)
>    { print OUT qq($id,$big_data_hash{$id}\n); }
>
> This seems wasteful to me, loading several hundred thousand lines of data in memory just to make a few changes. Is there any way to tie the data file to a hash and make the changes directly?
>
> Does anyone have any better ideas?

Any improvement would need a change in the file structure - the big win would come from NOT having to modify
a significant number of the disc blocks that represent the file.

This would involve techniques such as indexing, trees of blocks, fixed size padding of the data, or having "pad" areas
to avoid always having to shuffle the data up and down on edits.
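
For instance, a minimal sketch of the fixed-size-record idea (entirely
hypothetical: it assumes records padded to exactly 32 bytes and
contiguous IDs starting at 1):

----------------
use strict;
use warnings;

my $RECLEN = 32;    # every record occupies exactly 32 bytes, "\n" included

# Overwrite the record for $id in place; no other disc block is touched.
sub update_record {
    my ($file, $id, $name) = @_;
    open my $fh, '+<', $file or die "open $file: $!";
    seek $fh, ($id - 1) * $RECLEN, 0 or die "seek: $!";
    printf $fh "%-*s\n", $RECLEN - 1, "$id,$name";
    close $fh or die "close: $!";
}

update_record('data.fixed', 5, 'edward');
----------------

The win only materialises if the file is generated in this fixed-width
form to begin with, which is exactly the "change in the file structure"
referred to above.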

I believe Oracle make a piece of software that does this :-)

  BugBear



------------------------------

Date: Tue, 08 Jan 2013 16:43:46 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87r4lvlj5p.fsf@sapphire.mobileactivedefense.com>

bugbear <bugbear@trim_papermule.co.uk_trim> writes:
> ccc31807 wrote:
>> My big data file looks like this:
>> 1,al
>> 2,becky
>> 3,carl
>> 4,debbie
>> 5,ed
>> 6,frieda
>> ... for perhaps 200K or 300k lines
>>
>> My change file looks like this:
>> 5, edward
>> ... for perhaps ten or twelve lines

[...]

> Any improvement would need a change in the file structure - the big win would come from NOT having to modify
> a significant number of the disc blocks that represent the file.
>
> This would involve techniques such as indexing, trees of blocks, fixed size padding of the data, or having "pad" areas
> to avoid always having to shuffle the data up and down on edits.
>
> I believe Oracle make a piece of software that does

s/make/bought/. It is called BerkeleyDB. Any of the other freely
available 'hashed database' packages (e.g., GDBM) should be usable as
well.
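
As a minimal sketch (assuming the data has already been migrated once
into a BerkeleyDB file; 'big.db' and 'changes.txt' are made-up names):

----------------
use strict;
use warnings;
use Fcntl;
use DB_File;    # BerkeleyDB; GDBM_File works much the same way

# The one-off migration is left out; this only applies the changes.
tie my %big, 'DB_File', 'big.db', O_RDWR|O_CREAT, 0644
    or die "cannot tie big.db: $!";

open my $change, '<', 'changes.txt' or die "cannot open changes.txt: $!";
while (<$change>) {
    chomp;
    my ($id, $name) = split /,/;
    $big{$id} = $name;    # written in place; nothing else is rewritten
}
close $change;
untie %big;
----------------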


------------------------------

Date: Tue, 08 Jan 2013 18:21:43 +0100
From: Hans Mulder <hansmu@xs4all.nl>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <50ec55a9$0$6936$e4fe514c@news2.news.xs4all.nl>

On 8/01/13 17:43:46, Rainer Weikusat wrote:
> bugbear <bugbear@trim_papermule.co.uk_trim> writes:
>> ccc31807 wrote:
>>> My big data file looks like this:
>>> 1,al
>>> 2,becky
>>> 3,carl
>>> 4,debbie
>>> 5,ed
>>> 6,frieda
>>> ... for perhaps 200K or 300k lines
>>>
>>> My change file looks like this:
>>> 5, edward
>>> ... for perhaps ten or twelve lines
> 
> [...]
> 
>> Any improvement would need a change in the file structure - the big win would come from NOT having to modify
>> a significant number of the disc blocks that represent the file.
>>
>> This would involve techniques such as indexing, trees of blocks, fixed size padding of the data, or having "pad" areas
>> to avoid always having to shuffle the data up and down on edits.
>>
>> I believe Oracle make a piece of software that does
> 
> s/make/bought/. It is called BerkeleyDB. Any of the other freely
> available 'hashed database' packages (e.g., GDBM) should be usable as
> well.

Oracle have also bought a product called MySQL, which may or may not be
what bugbear was thinking of.

-- HansM



------------------------------

Date: Tue, 8 Jan 2013 10:51:11 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <881835a8-ace9-453b-bc8e-20f0d391a6f0@googlegroups.com>

On Tuesday, January 8, 2013 11:38:02 AM UTC-5, Rainer Weikusat wrote:
> change occurred. But your algorithm could be improved: instead of
> reading the data file and the changes file into memory completely,
> changing the 'data hash' and looping over all keys of that to generate
> the modified output, you could read the change file (which is
> presumably much smaller) into memory and then process the data file
> line by line, applying changes 'on the go' where necessary, i.e.,

You would think so, anyway. This was the first thing I tried, and it
turns out (on my setup at least) that printing the outfile line by line
takes a lot longer than dumping the whole thing into memory and then
printing the data structure once.

I also thought of using the ID as an index to an array and tying the
disk file to an array, but to be honest I was just too lazy to try it.
The array would be very sparse (several 100k rows out of a potential
10m array; IDs can go as high as 99999999), and it seemed more wasteful
than using a hash with only the number of keys that I actually have.

It's not a big deal; it wouldn't matter if it took 5 seconds to run or
5 minutes to run, as long as it produces the correct results.

CC.

> (uncompiled)
>
> my %change_hash;
> my ($id, $name);	# don't create new variables for every iteration
>
> while (<CHANGE>) {
>     chomp;
>     ($id, $name) = split /,/;
>     $change_hash{$id} = $name;
> }
>
> while (<BIG>) {
>     chomp;
>     ($id, $name) = split /,/;
>     $name = $change_hash{$id} if exists($change_hash{$id});
>
>     print OUT qq($id,$name\n);
> }



------------------------------

Date: Tue, 08 Jan 2013 19:34:37 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87lic3lb8y.fsf@sapphire.mobileactivedefense.com>

ccc31807 <cartercc@gmail.com> writes:
> On Tuesday, January 8, 2013 11:38:02 AM UTC-5, Rainer Weikusat wrote:
>> change occurred. But your algorithm could be improved: instead of 
>> reading the data file and the changes file into memory completely, 
>> changing the 'data hash' and looping over all keys of that to generate 
>> the modified output, you could read the change file (which is 
>> presumably much smaller) into memory and then process the data file 
>> line by line, applying changes 'on the go' where necessary, ie,
>
> You would think so, anyway. This was the first thing I tried, and it
> turns out (on my setup at least) that printing the outfile line by
> line takes a lot longer than dumping the whole thing into memory
> then printing the DS once.

You are both reading and printing the output file 'line by line', at
least insofar as the (pseudo-)code you posted accurately represents
the code you are actually using. Consequently, your statement above
doesn't make sense, except insofar as it communicates that you tried
something which didn't work as you expected and that (judging from the
text above) you don't really have an idea why.

As can be determined by some experiments, constructing a big hash and
doing a few lookups on that is less expensive than constructing a
small hash and doing a lot of lookups on that.

Data file (d0) was created with

perl -e 'for ("a" .. "z", "A" .. "Z") { print $n++ . ",$_\n" }'

and concatenating the output of that with itself multiple times (a
total of 468 lines); the 'changes file' (d1) was

--------
17,X
41,y
22,W
--------

Provided the relatively few replacements all occur 'early on' in the
data file, the basic idea of using a hash of changes is indeed faster
than using a data hash. Otherwise, things aren't that simple.

----------------
use Benchmark;

open($out, '>', '/dev/null');

timethese(-5, 
	  { ccc => sub 
	    {
		my ($fh, %h, $id, $d);

		open($fh, '<', 'd0');
		while (<$fh>) {
		    ($id, $d) = split /,/;
		    $h{$id} = $d;
		}
		$fh = undef;
    
		open($fh, '<', 'd1');
		while (<$fh>) {
		    ($id, $d) = split /,/;
		    $h{$id} = $d;
		}

		for (keys(%h)) {
		    print $out ($_, ',', $h{$_});
		}
	    },

	    sane => sub 
	    {
		my ($fh, %h, $id, $d, $v);

		open($fh, '<', 'd1');
		while (<$fh>) {
		    ($id, $d) = split /,/;
		    $h{$id} = $d;
		}
		$fh = undef;

		open($fh, '<', 'd0');
		while (<$fh>) {
		    ($id, $d) = split /,/;

		    $v = $h{$id};
		    print $out ($id, ',', $d), next unless defined($v);

		    print $out ($id, ',', $v);
		    delete($h{$id});	# remove the applied change
		    last unless %h;	# all changes applied
		}

		print $out ($_) while <$fh>;	# copy the remainder verbatim
	    }});


------------------------------

Date: Tue, 08 Jan 2013 19:49:52 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87hamrlajj.fsf@sapphire.mobileactivedefense.com>


[...]

> As can be determined by some experiments, constructing a big hash and
> doing a few lookups on that is less expensive than constructing a
> small hash and doing a lot of lookups on that.

As a quick addition to that: this is also too simplistic, because the
original code does exactly as many hash lookups, except that most of
them are successful. Judging from a few more tests, using 'small
integers' as hash keys doesn't seem to be something the perl hashing
algorithm likes very much, e.g.,

$h{17} = 'X';
$h{22} = 'Y';

will put (for 5.10.1) both data items in the same slot.
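
For the curious: on perls before 5.26, a hash evaluated in scalar
context reports its used/allocated buckets, so this is easy to check
(a quick sketch; the exact numbers will vary by version):

my %h;
$h{17} = 'X';
$h{22} = 'Y';
print scalar(%h), "\n";    # prints e.g. "1/8": two keys, one bucket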


------------------------------

Date: Tue, 8 Jan 2013 22:00:46 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <ehhur9-aa62.ln1@anubis.morrow.me.uk>


Quoth ccc31807 <cartercc@gmail.com>:
> My big data file looks like this:
> 1,al
> 2,becky
> 3,carl
> 4,debbie
> 5,ed
> 6,frieda
> ... for perhaps 200K or 300k lines
> 
> My change file looks like this:
> 5, edward
> ... for perhaps ten or twelve lines
> 
> My script looks like this (SKIPPING THE DETAILS):
> my %big_data_hash;
> while (<BIG>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
> while (<CHANGE>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
> foreach my $id (keys %big_data_hash) 
>   { print OUT qq($id,$big_data_hash{$id}\n); }
> 
> This seems wasteful to me, loading several hundred thousand lines of
> data in memory just to make a few changes. Is there any way to tie the
> data file to a hash and make the changes directly?

If the numbers in the file are consecutive and contiguous, you could use
Tie::File. Otherwise you would be better off using some sort of database
in place of the large file.
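
A minimal sketch of the Tie::File variant (assuming line N of the file
holds ID N, and made-up file names):

----------------
use strict;
use warnings;
use Tie::File;

# Present the big file as an array without slurping it into memory.
tie my @record, 'Tie::File', 'big.txt' or die "cannot tie big.txt: $!";

open my $change, '<', 'changes.txt' or die "cannot open changes.txt: $!";
while (<$change>) {
    chomp;
    my ($id, $name) = split /,/;
    $record[$id - 1] = "$id,$name";    # ID N lives on line N
}
close $change;
untie @record;
----------------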

Ben



------------------------------

Date: Tue, 08 Jan 2013 22:59:17 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <876237l1ru.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth ccc31807 <cartercc@gmail.com>:
>> My big data file looks like this:
>> 1,al
>> 2,becky
>> 3,carl
>> 4,debbie
>> 5,ed
>> 6,frieda
>> ... for perhaps 200K or 300k lines
>> 
>> My change file looks like this:
>> 5, edward
>> ... for perhaps ten or twelve lines
>> 
>> My script looks like this (SKIPPING THE DETAILS):
>> my %big_data_hash;
>> while (<BIG>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
>> while (<CHANGE>) { chomp; my ($id, $name) = split /,/; $big_data_hash{$id} = $name; }
>> foreach my $id (keys %big_data_hash) 
>>   { print OUT qq($id,$big_data_hash{$id}\n); }
>> 
>> This seems wasteful to me, loading several hundred thousand lines of
>> data in memory just to make a few changes. Is there any way to tie the
>> data file to a hash and make the changes directly?
>
> If the numbers in the file are consecutive and contiguous, you could use
> Tie::File. Otherwise you would be better off using some sort of database
> in place of the large file.

That's going to suffer from the same problem as the 'put the changes
into a hash' idea I posted earlier: searching for a particular key a
large number of times in a small hash, with most of these searches
being unsuccessful, is going to be slower than building a large hash
and (successfully) searching for a small number of keys in that. And
since there's no way to determine the key of a particular 'big file'
line except by reading that line (which implies reading everything up
to it) and parsing it, and no way to generate the output stream except
by writing out all 'new' lines in the order they are supposed to
appear, it won't be possible to save any I/O in this way.

There are a number of possibilities here, but without knowing more
about the problem it is not really possible to make sensible
suggestions (e.g., what is supposed to be saved, memory or execution
time? Is it possible to change the process generating the 'big files'?
If not, how often is a file created and how often processed?).


------------------------------

Date: Tue, 08 Jan 2013 20:58:33 +0100
From: cwpbl <cwpbl@rf.oohay>
Subject: CGI / postprocess of values
Message-Id: <50ec7a6c$0$21924$426a74cc@news.free.fr>

Hello,

I have a very simple test.cgi program, using CGI.pm, which displays a
textfield.
After the form has been submitted, I get a new URL, saying:
http://host/cgi-bin/test.cgi?my_textfield_content=azertyu

OK.
Now, I want the value of my_textfield_content to appear as an encoded
string in the URL, something like
http://host/cgi-bin/test.cgi?my_textfield_content=VgHo98km==

I have already written the (de/en)coding functions.
But how can I call this encoding function to have the value of
my_textfield_content modified?


------------------------------

Date: Tue, 08 Jan 2013 21:40:10 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: CGI / postprocess of values
Message-Id: <CYSdnSqMU8SnD3HNnZ2dnUVZ8iadnZ2d@giganews.com>

On 08/01/13 19:58, cwpbl wrote:
> After the form has been submitted, I have a new url, saying :
> http://host/cgi-bin/test.cgi?my_textfield_content=azertyu

This is intrinsically what the "submit" function does; it sends to the 
server program the value of the field in the form.

> Now, I want the value of my_textfield_content appear as an encoded
> string, in the url, something like
> http://host/cgi-bin/test.cgi?my_textfield_content=VgHo98km==

Unless you write some local code (javascript or whatever) to encode the 
contents of "textfield" /before/ submission, it will be sent off 
unencoded.

I think I have understood your question.  But it would be better if you 
post test.cgi, so we can see what you're doing.

-- 

Henry Law            Manchester, England


------------------------------

Date: Tue, 8 Jan 2013 22:24:07 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: CGI / postprocess of values
Message-Id: <7tiur9-9i62.ln1@anubis.morrow.me.uk>


Quoth cwpbl <cwpbl@rf.oohay>:
> 
> I have a very simple test.cgi program, using CGI.pm, which displays a 
> textfield.
> After the form has been submitted, I get a new URL, saying:
> http://host/cgi-bin/test.cgi?my_textfield_content=azertyu
> 
> OK.
> Now, I want the value of my_textfield_content to appear as an encoded 
> string in the URL, something like
> http://host/cgi-bin/test.cgi?my_textfield_content=VgHo98km==

Why?

(This is not a stupid question. What are you trying to do that makes you
think you need this?)

> I have already written the (de/en)coding functions.

I hope you mean 'I have already found MIME::Base64, and I know how to
use it'.
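
For reference, a minimal sketch of round-tripping a value with
MIME::Base64 (the parameter name comes from your post; the rest is
made up):

----------------
use strict;
use warnings;
use CGI;
use MIME::Base64 qw(encode_base64 decode_base64);

my $q     = CGI->new;
my $value = $q->param('my_textfield_content') // '';

# Encode for embedding in a URL; '' suppresses the trailing newline.
my $encoded = encode_base64($value, '');

# And the reverse, for when the parameter arrives already encoded.
my $decoded = decode_base64($encoded);
----------------

Note that Base64 output can contain '+', '/' and '=', which have their
own meaning in URLs, so the encoded value still needs URI-escaping
(URI::Escape) before it goes into a query string.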

Ben



------------------------------

Date: Tue, 8 Jan 2013 08:35:23 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Date in CSV/TSV question
Message-Id: <6a9f5d72-cacd-492a-a2a9-d1873ebea342@googlegroups.com>

On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
> "07 Jan 2011"   "TFR" 
> "05 Jan 2011"   "DR"> 
> 
> I need change the first field to look like> 
> 
> 2011-01-07   "TFR" 
> 2011-01-05   "DR"

For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
1. my ($day, $month, $year) = split(/ /, $date);
2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{uc $month}, $day);

Line 1 splits your date string into the three components: day, month, year.
Line 2 reassembles those three components and assigns the result back to $date.
The hash table %mo2num looks like this:
my %mo2num = (
  JAN => 1,
  FEB => 2,
  MAR => 3,
  # ... and so on through DEC => 12
);
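
Put together as a complete filter, that might look like this sketch
(assuming the quoted "DD Mon YYYY" layout shown above):

----------------
use strict;
use warnings;

my %mo2num;
@mo2num{qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC)} = 1 .. 12;

while (<>) {
    # rewrite "07 Jan 2011" (with quotes) as 2011-01-07 (without)
    s{"(\d{1,2}) (\w{3}) (\d{4})"}
     {sprintf '%04d-%02d-%02d', $3, $mo2num{uc $2}, $1}e;
    print;
}
----------------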

CC.


------------------------------

Date: Tue, 08 Jan 2013 19:37:57 +0000
From: Henry Law <news@lawshouse.org>
Subject: Rehabilitating bad Perl code
Message-Id: <KtSdnaGzKM8I6HHNnZ2dnUVZ8nWdnZ2d@giganews.com>

I have inherited some code which uses several horrible constructs, 
such as splicing together completely unrelated assignments and 
statements with commas, like

   $foo = 1, $bar = 2, print "This is horrid\n";

and into the bargain has been written without strictures, and no "my" 
declarations.

It would be quite nice to find a majority of these automatically; in the 
past I've written bits of code (some Perl, also awk and so forth) to 
find things like this but it didn't work terribly well, and there's 
inches thick of this stuff.  Does anyone have any experience I could 
draw on?

I'm thinking about parsing the output from B::Xref; that would at least 
give me a list of the variables, and I could then generate a bunch of 
"my" statements up near the top.  Any other avenues to explore?

-- 

Henry Law            Manchester, England


------------------------------

Date: Tue, 8 Jan 2013 22:21:41 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Rehabilitating bad Perl code
Message-Id: <loiur9-9i62.ln1@anubis.morrow.me.uk>


Quoth Henry Law <news@lawshouse.org>:
> I have inherited some code which uses several horrible constructs, 
> such as splicing together completely unrelated assignments and 
> statements with commas, like
> 
>    $foo = 1, $bar = 2, print "This is horrid\n";
> 
> and into the bargain has been written without strictures, and no "my" 
> declarations.
> 
> It would be quite nice to find a majority of these automatically; in the 
> past I've written bits of code (some Perl, also awk and so forth) to 
> find things like this but it didn't work terribly well, and there's 
> inches thick of this stuff.  Does anyone have any experience I could 
> draw on?

I would recommend attacking it in pieces: every time you find you need
to change a section of the code, clean up that section, pull the
important functionality out into subs, and move those subs into either a
separate file or a separate section of the same file which is under
'strict' and 'warnings'. Trying to rewrite the whole lot in one go is
likely to produce subtle errors, especially since I don't imagine you
have any sort of test suite.

Once you get to the point where a decent proportion of the code is
clean, you can start working the other way round: take a nasty section,
put it in a sub, and turn off 'strict' and 'warnings' inside that sub.
Once you've got that working you can come back and clean up the subs one
by one, fixing strict errors and other nastiness and replacing globals
with passed-in parameters.
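
As a sketch of that intermediate state (the sub name and its one
statement are invented for illustration):

----------------
use strict;
use warnings;

# New and cleaned-up code lives out here, under strict and warnings...

sub legacy_report {
    # ...while an as-yet-untouched nasty section is quarantined here.
    no strict;
    no warnings;
    $foo = 1, $bar = 2, print "This is horrid\n";
}

legacy_report();
----------------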

> I'm thinking about parsing the output from B::Xref; that would at least 
> give me a list of the variables, and I could then generate a bunch of 
> "my" statements up near the top.  Any other avenues to explore?

That sounds like a lot of work for only rather marginal benefit. While
file-scoped lexicals are a little safer than undeclared globals, the
real problem with globals of any kind is that they make data-flow
analysis extremely difficult. In a single-file program there is no
difference between a file-scoped lexical and a package global from this
point of view.

Also, to get any benefit from this, you would have to turn on 'strict';
this does more than just requiring you to declare your variables. You
would have to go through the whole program fixing strict errors, and
doing it in a single pass like that you are much more likely to make a
mistake than if you do it a piece at a time, making sure you've properly
understood each piece before you rewrite it.

Basically, if it ain't broke, don't fix it; if it is broke, take the
opportunity to fix the clarity problems at the same time.

Ben



------------------------------

Date: Tue, 08 Jan 2013 22:56:42 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: Rehabilitating bad Perl code
Message-Id: <kr2dncKA3pm2OXHNnZ2dnUVZ8u6dnZ2d@giganews.com>

On 08/01/13 22:21, Ben Morrow wrote:
> I would recommend attacking it in pieces: every time you find you need
> to change a section of the code, clean up that section, pull the
> important functionality out into subs, and move those subs into either a
> separate file or a separate section of the same file which is under
> 'strict' and 'warnings'. Trying to rewrite the whole lot in one go is
> likely to produce subtle errors, especially since I don't imagine you
> have any sort of test suite.

No, you're right.  I have to fight against the impulse to say "Aaaagh; 
this is terrible -- I cannot leave this as it is!"

The major problem is that the code does what it's supposed to do right 
now (at least I believe it does), and regression tests aren't easy to do.

>
> Once you get to the point where a decent proportion of the code is
> clean, you can start working the other way round: take a nasty section,
> put it in a sub, and turn off 'strict' and 'warnings' inside that sub.
> Once you've got that working you can come back and clean up the subs one
> by one, fixing strict errors and other nastiness and replacing globals
> with passed-in parameters.

Sheesh; yes, I see what you mean.  One does begin to wonder whether it 
mightn't be easier to rewrite the damn thing from scratch .. (don't take 
that seriously)

>
> Basically, if it ain't broke, don't fix it; if it is broke, take the
> opportunity to fix the clarity problems at the same time.

Sage, if unwelcome, advice.  Thank you for a very full reply.

-- 

Henry Law            Manchester, England


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3856
***************************************

