[32585] in Perl-Users-Digest
Perl-Users Digest, Issue: 3857 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 10 14:09:20 2013
Date: Thu, 10 Jan 2013 11:09:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 10 Jan 2013 Volume: 11 Number: 3857
Today's topics:
Re: best way to make a few changes in a large data file <derykus@gmail.com>
Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
Re: best way to make a few changes in a large data file <tzz@lifelogs.com>
Re: CGI / postprocess of values <cwpbl@rf.oohay>
Re: CGI / postprocess of values <justin.1211@purestblue.com>
Re: CGI / postprocess of values <cwpbl@rf.oohay>
Re: CGI / postprocess of values <news@lawshouse.org>
Re: CGI / postprocess of values <ben@morrow.me.uk>
Re: Date in CSV/TSV question <nospam@lisse.NA>
Re: Date in CSV/TSV question <rweikusat@mssgmbh.com>
Re: Rehabilitating bad Perl code <cartercc@gmail.com>
Re: Rehabilitating bad Perl code <troffasky@hotmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 9 Jan 2013 06:10:35 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <eb2303f8-5e04-43da-8a6f-eabb9c737c88@googlegroups.com>
On Tuesday, January 8, 2013 10:51:11 AM UTC-8, ccc31807 wrote:
> On Tuesday, January 8, 2013 11:38:02 AM UTC-5, Rainer Weikusat wrote:
> > change occurred. But your algorithm could be improved: Instead of
> > reading the data file and the changes file into memory completely,
> > changing the 'data hash' and looping over all keys of that to generate
> > the modified output, you could read the change file (which is
> > presumably much smaller) into memory and then process the data file
> > line by line, applying changes 'on the go' where necessary, ie,
>
> You would think so, anyway. This was the first thing I tried, and it turns out (on my setup at least) that printing the outfile line by line takes a lot longer than dumping the whole thing into memory then printing the DS once.
>
> I also thought of using the ID as an index to an array and tying the disk file to an array, but to be honest I was just too lazy to try it. The array would be very sparse (several 100k rows out of a potential 10m array, IDs can go as high as 99999999) and it seemed more wasteful than using a hash with only the number of keys that I actually have.
>
> ...
> It's not a big deal, it wouldn't matter if it took 5 seconds to run or 5 minutes to run, as long as it produces the correct results.
> ...
>
Since speed isn't critical, the Tie::File suggestion
would simplify the code considerably. Since the whole
file isn't loaded, big files won't be problematic, and
any changes to the tied array will update the file
at once. However, IDs will map directly to line numbers
in the file (id N is line N), and Tie::File will
automatically create empty lines in the file if the
array is sparse.
eg:
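use strict;
use warnings;
use Tie::File;
# Each element of @array is one line of the data file d0.
tie my @array, 'Tie::File', 'd0' or die "Cannot tie d0: $!";
# Each line of the change file d1 is "id,value"; record N is line N.
open(my $fh, '<', 'd1') or die "Cannot open d1: $!";
while (<$fh>) {
    chomp;
    my ($id, $value) = split /,/, $_, 2;
    $array[$id-1] = "$id,$value";
}
close $fh;
untie @array;    # flush any pending writes and release d0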
--
Charles DeRykus
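A slightly fuller sketch of the Tie::File route, using the deferred-writing methods (defer and flush) that the Tie::File manpage describes, so that all changes are written in one batch instead of the file being rewritten per change. It assumes, as the post does, that the record with id N lives on line N of the data file; the sub name apply_changes_tied is hypothetical.

```perl
use strict;
use warnings;
use Tie::File;

# Apply "id,value" lines from $changes_file to $data_file in place.
# Assumes the record with id N is stored on line N of $data_file.
sub apply_changes_tied {
    my ($data_file, $changes_file) = @_;

    my $tied = tie my @lines, 'Tie::File', $data_file
        or die "Cannot tie $data_file: $!";
    $tied->defer;    # hold writes in memory instead of rewriting per change

    open my $fh, '<', $changes_file or die "Cannot open $changes_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($id, $value) = split /,/, $line, 2;
        $lines[$id - 1] = "$id,$value";    # Tie::File adds the record separator
    }
    close $fh;

    $tied->flush;    # write all deferred changes to disk
    undef $tied;     # drop the extra reference so untie is clean
    untie @lines;
    return;
}
```

If the IDs do not line up with line numbers, this approach writes records to the wrong lines (and pads with empty lines), so the streaming approach discussed in the follow-ups is the safer default.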
------------------------------
Date: Wed, 09 Jan 2013 15:27:30 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87fw2aqsv1.fsf@sapphire.mobileactivedefense.com>
"C.DeRykus" <derykus@gmail.com> writes:
[...]
> Since speed isn't critical, the Tie::File suggestion
> would simplify the code considerably.
[...]
> use Tie::File;
>
> tie my @array, 'Tie::File', 'd0' or die $!;
>
> open(my $fh, '<', 'd1') or die $!;
> while (<$fh>) {
> chomp;
> my($id, $value) = split /,/;
> $array[$id-1] = "$id,$value";
> }
Including 1361 lines of code stored in another file does not
'simplify the code'. It makes it a hell of a lot more complicated.
Assuming that speed doesn't matter, a simple implementation could
look like this:
sub small
{
my ($fh, %chgs);
open($fh, '<', 'd1');
%chgs = map { split /,/ } <$fh>;
open($fh, '<', 'd0');
/(.*),(.*)/s, print ($1, ',', $chgs{$1} // $2) while <$fh>;
}
It can be argued that 'using Tie::File', provided the semantics match,
makes the task of the programmer easier but not the code, and even this
isn't necessarily true --- Tie::File surely makes it easier to shoot
oneself in the foot here, see the 'Deferred Writing' section in the
manpage --- because reading through more than six pages of technical
documentation then also becomes part of the problem. But this shifts
the issue to a different plane: The problem is no longer a technical
one but a personal one -- how can $person with $skills get this done
with as little (intellectual) effort as possible. And while this may
well be a valid concern (eg, if someone has to solve the problem
quickly for his own use), it doesn't translate into a universal
recommendation.
------------------------------
Date: Wed, 09 Jan 2013 11:12:31 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <871udubaj4.fsf@lifelogs.com>
On Tue, 8 Jan 2013 10:51:11 -0800 (PST) ccc31807 <cartercc@gmail.com> wrote:
c> You would think so, anyway. This was the first thing I tried, and it
c> turns out (on my setup at least) that printing the outfile line by
c> line takes a lot longer than dumping the whole thing into memory then
c> printing the DS once.
I have never experienced this. Could you, for instance, be reopening
the change file repeatedly? Any chance you could post that slow version
of the code?
If you are stuck with the text-based data files, I would recommend
using perl -p -e 'BEGIN { ...load your change file... } ...process each line...'
This doesn't have to be a one-liner, but it's a good way to quickly test
the "slow performance" issue, e.g.
perl -p -e 's/^5,.*/5,edward/' myfile > myrewrite
If that's slow, something's up.
Ted
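A sketch of the streaming approach suggested in this thread: load the small change file into a hash, then copy the large data file line by line, substituting changed values. It assumes comma-separated "id,value" lines with the id in the first field; load_changes and apply_changes_streaming are hypothetical names. Taking filehandles rather than file names keeps it usable in a pipeline, in the spirit of the perl -p suggestion above.

```perl
use strict;
use warnings;

# Read the (small) change file into a hash: id => new value.
sub load_changes {
    my ($fh) = @_;
    my %change;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($id, $value) = split /,/, $line, 2;
        $change{$id} = $value;
    }
    return \%change;
}

# Copy the (large) data file line by line, substituting changed values.
# Lines without a comma are passed through unchanged.
sub apply_changes_streaming {
    my ($change, $in_fh, $out_fh) = @_;
    while (my $line = <$in_fh>) {
        chomp $line;
        my ($id, $value) = split /,/, $line, 2;
        if (defined $value && exists $change->{$id}) {
            $line = "$id,$change->{$id}";
        }
        print {$out_fh} "$line\n";
    }
    return;
}
```

For example, apply_changes_streaming(load_changes($change_fh), \*STDIN, \*STDOUT) filters standard input to standard output; only the change hash is held in memory.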
------------------------------
Date: Wed, 09 Jan 2013 07:39:31 +0100
From: cwpbl <cwpbl@rf.oohay>
Subject: Re: CGI / postprocess of values
Message-Id: <50ed10a7$0$1947$426a74cc@news.free.fr>
On 08/01/2013 23:24, Ben Morrow wrote:
>
> Quoth cwpbl <cwpbl@rf.oohay>:
>>
>> I have a very simple test.cgi program, using CGI.pm, which displays a
>> textfield.
>> After the form has been submitted, I have a new url, saying:
>> http://host/cgi-bin/test.cgi?my_textfield_content=azertyu
>>
>> OK.
>> Now, I want the value of my_textfield_content to appear as an encoded
>> string in the url, something like
>> http://host/cgi-bin/test.cgi?my_textfield_content=VgHo98km==
>
> Why?
>
> (This is not a stupid question. What are you trying to do that makes you
> think you need this?)
1. I need a URL that does not contain certain characters (like |), because
such URLs, when entered in a wiki (DokuWiki), are not interpreted correctly.
2. The URLs are sent to some users, and I do not want them to modify these
URLs (which is easy to do if they are in clear text).
>
>> I have already written the (de)(en)coding function.
>
> I hope you mean 'I have already found MIME::Base64, and I know how to
> use it'.
>
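I pack 8-bit characters into 7 bits, then apply encode_base64:
use MIME::Base64;
# Drop the high bit of every byte and repack the remaining 7-bit groups.
sub cps {
    my $bstr = unpack('B*', $_[0]);
    $bstr =~ s/.(.{7})/$1/g;
    return pack('B*', $bstr);
}
# Compress, then Base64-encode with no trailing newline.
sub ec {
    return encode_base64(cps($_[0]), "");
}
The problem is: where should I call ec?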
> Ben
>
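For reference, a sketch of what the posted encoder does, together with a matching decoder, dc (a hypothetical name, not from the post). cps keeps only the low 7 bits of each byte, so the round trip only works for 7-bit ASCII input; the decoder also assumes the original text contains no NUL bytes, because pack('B*') pads the last byte with zero bits, which can yield one spurious trailing "\0".

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# cwpbl's encoder: keep the low 7 bits of each byte, pack them, Base64-encode.
sub cps {
    my $bstr = unpack('B*', $_[0]);
    $bstr =~ s/.(.{7})/$1/g;
    return pack('B*', $bstr);
}
sub ec { return encode_base64(cps($_[0]), "") }

# Hypothetical inverse: split the bits into 7-bit groups, restore the
# leading zero bit of each, and drop a padding-induced trailing NUL.
sub dc {
    my $bits = unpack('B*', decode_base64($_[0]));
    my $str  = join '', map { pack('B8', "0$_") } $bits =~ /(.{7})/g;
    $str =~ s/\0\z//;
    return $str;
}
```

Note that Base64 output can contain '+', '/' and '=' (ec('a|b') yields "w/MQ"), which have to be percent-encoded when the value is placed in a query string.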
------------------------------
Date: Wed, 9 Jan 2013 10:38:29 +0000
From: Justin C <justin.1211@purestblue.com>
Subject: Re: CGI / postprocess of values
Message-Id: <5utvr9-qr3.ln1@zem.masonsmusic.co.uk>
On 2013-01-09, cwpbl <cwpbl@rf.oohay> wrote:
> On 08/01/2013 23:24, Ben Morrow wrote:
>>
>> Quoth cwpbl <cwpbl@rf.oohay>:
>>>
>>> I have a very simple test.cgi program, using CGI.pm, which displays a
>>> textfield.
>>> After the form has been submitted, I have a new url, saying:
>>> http://host/cgi-bin/test.cgi?my_textfield_content=azertyu
>>>
>>> OK.
>>> Now, I want the value of my_textfield_content to appear as an encoded
>>> string in the url, something like
>>> http://host/cgi-bin/test.cgi?my_textfield_content=VgHo98km==
>>
>> Why?
>>
>> (This is not a stupid question. What are you trying to do that makes you
>> think you need this?)
>
> 1. I need a URL that does not contain certain characters (like |), because
> such URLs, when entered in a wiki (DokuWiki), are not interpreted correctly.
I think you have two options. Either you accept whatever characters the
user submits and tidy the information the way you want when you receive
it, or you write some JavaScript that makes the change when the user
clicks submit and then performs the submission with the modified string.
The problem with the second option is that any user with half a clue
will see what your code does and, if they so desire, will still find a
way of submitting what they want.
> 2. The URLs are sent to some users, and I do not want them to modify these
> URLs (which is easy to do if they are in clear text).
Security through obscurity is no security at all.
<URL:http://en.wikipedia.org/wiki/Security_through_obscurity>
>>> I have already written the (de)(en)coding function.
>>
>> I hope you mean 'I have already found MIME::Base64, and I know how to
>> use it'.
>>
> I pack 8-bit characters into 7 bits, then apply encode_base64:
>
> sub cps {
> my $bstr = unpack('B*', $_[0]);
> $bstr =~ s/.(.{7})/$1/g;
> return pack('B*', $bstr);
> }
>
> sub ec{
> return encode_base64(cps($_[0]),"");
> }
>
> The problem is: where should I call ec?
Call it when you receive the form back from the user.
Justin.
--
Justin C, by the sea.
------------------------------
Date: Wed, 09 Jan 2013 21:15:56 +0100
From: cwpbl <cwpbl@rf.oohay>
Subject: Re: CGI / postprocess of values
Message-Id: <50edcfff$0$1928$426a34cc@news.free.fr>
On 08/01/2013 22:40, Henry Law wrote:
> On 08/01/13 19:58, cwpbl wrote:
>> After the form has been submitted, I have a new url, saying :
>> http://host/cgi-bin/test.cgi?my_textfield_content=azertyu
>
> This is intrinsically what the "submit" function does; it sends to the
> server program the value of the field in the form.
>
>> Now, I want the value of my_textfield_content appear as an encoded
>> string, in the url, something like
>> http://host/cgi-bin/test.cgi?my_textfield_content=VgHo98km==
>
> Unless you write some local code (javascript or whatever) to encode the
> contents of "textfield" /before/ submission then it will be sent off
> unencoded.
>
> I think I have understood your question. But it would be better if you
> post test.cgi, so we can see what you're doing.
>
Thank you.
Here are the interesting parts of the script:
use CGI;
use MIME::Base64;
...
my $q = CGI->new;
my $filter = $q->param('filter') || ".";
...
sub cps {
    my $bstr = unpack('B*', $_[0]);
    $bstr =~ s/.(.{7})/$1/g;
    return pack('B*', $bstr);
}
sub ec {
    return encode_base64(cps($_[0]), "");
}
...
print
$q->header( "text/html" ),
$q->start_html( ... );
...
print
$q->start_form(-action=>"", -method=>'GET', name=>'filter_form' ),
$q->textfield(-name=>'filter',-default=>"", -size=>20),
$q->submit(-name=> "filter_btn", -value=>"OK"),
$q->end_form();
[...]
print
$q->table( ... ),
$q->end_html();
I want to have the value of the filter param modified (with the ec function)
in the new url. How do I "insert" this processing into the script?
------------------------------
Date: Wed, 09 Jan 2013 21:22:00 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: CGI / postprocess of values
Message-Id: <rM2dnRKtTLTkQnDNnZ2dnUVZ8r2dnZ2d@giganews.com>
On 09/01/13 20:15, cwpbl wrote:
> print
> $q->start_form(-action=>"", -method=>'GET', name=>'filter_form' ),
> $q->textfield(-name=>'filter',-default=>"", -size=>20),
> $q->submit(-name=> "filter_btn", -value=>"OK"),
> $q->end_form();
> [...]
>
> I want to have the value of the filter param modified (with the ec function)
> in the new url. How do I "insert" this processing into the script?
Both Justin C and I have already answered this question: have you missed
the articles?
When the user clicks that button, the contents of all fields in the
form will be transmitted to the server *exactly as they were typed in*;
if you want the contents of the "filter" field to be encoded *before*
sending, then it has to be done in the browser, and you will have to do
it by means of JavaScript, invoked by the button's "onClick" handler.
Whether or not that's actually a good idea is a different matter.
--
Henry Law Manchester, England
------------------------------
Date: Wed, 9 Jan 2013 23:26:20 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: CGI / postprocess of values
Message-Id: <sta1s9-ssh2.ln1@anubis.morrow.me.uk>
Quoth Henry Law <news@lawshouse.org>:
> On 09/01/13 20:15, cwpbl wrote:
> > print
> > $q->start_form(-action=>"", -method=>'GET', name=>'filter_form' ),
> > $q->textfield(-name=>'filter',-default=>"", -size=>20),
> > $q->submit(-name=> "filter_btn", -value=>"OK"),
> > $q->end_form();
> > [...]
>
> >
> > I want to have the value of the filter param modified (with the ec function)
> > in the new url. How do I "insert" this processing into the script?
>
> Both Justin C and I have already answered this question: have you missed
> the articles?
>
> When the user clicks that button, the contents of all fields in the
> form will be transmitted to the server *exactly as they were typed in*;
> if you want the contents of the "filter" field to be encoded *before*
> sending, then it has to be done in the browser, and you will have to do
> it by means of JavaScript, invoked by the button's "onClick" handler.
This is not strictly true. The other way to do this is to issue a
redirect to a URL with the encoded parameters.
I suspect the OP is trying to have the form submit directly to some
other piece of software (a wiki), so writing a CGI which takes the
direct submission and redirects the browser to the URL for the wiki
(with appropriate parameters) is probably the way to go. However, it's
not very clear.
Ben
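A sketch of the redirect approach Ben describes: build the URL with the encoded value and send the browser there. It reuses cwpbl's cps/ec as posted; escape_query_value and encoded_redirect_url are hypothetical helpers, and the base URL is the one from the original question.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# cwpbl's encoder, as posted in the thread.
sub cps {
    my $bstr = unpack('B*', $_[0]);
    $bstr =~ s/.(.{7})/$1/g;
    return pack('B*', $bstr);
}
sub ec { return encode_base64(cps($_[0]), "") }

# Percent-encode every byte outside the RFC 3986 "unreserved" set, so the
# '+', '/' and '=' that Base64 can produce survive inside a query string.
sub escape_query_value {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build the URL the browser should be redirected to.
sub encoded_redirect_url {
    my ($base, $name, $raw_value) = @_;
    return sprintf '%s?%s=%s', $base, escape_query_value($name),
        escape_query_value(ec($raw_value));
}
```

In the CGI script this might be used as print $q->redirect(encoded_redirect_url($q->url, 'filter', $raw)); where $raw is the submitted value. Whatever receives the redirect has to be able to tell an already-encoded value from a raw one (for example by using a different parameter name), otherwise the script will keep redirecting to itself.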
------------------------------
Date: Wed, 09 Jan 2013 11:57:52 +0200
From: Dr Eberhard Lisse <nospam@lisse.NA>
Subject: Re: Date in CSV/TSV question
Message-Id: <50ED3F20.1040200@lisse.NA>
Thanks,
el
on 2013-01-08 18:35 ccc31807 said the following:
> On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
>> "07 Jan 2011" "TFR"
>> "05 Jan 2011" "DR"
>>
>> I need to change the first field to look like:
>>
>> 2011-01-07 "TFR"
>> 2011-01-05 "DR"
>
> For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
> 1. my ($day, $month, $year) = split(/ /, $date);
> 2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);
>
> Line 1 splits your date string into the three components: day, month, year.
> Line 2 reassembles those three components and assigns the result back to $date.
> The hash table %mo2num looks like this:
> my %mo2num = (
> Jan => 1,
> Feb => 2,
> Mar => 3,
> etc.
> );
>
> CC.
>
------------------------------
Date: Wed, 09 Jan 2013 12:09:49 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Date in CSV/TSV question
Message-Id: <8738yasgky.fsf@sapphire.mobileactivedefense.com>
Dr Eberhard Lisse <nospam@lisse.NA> writes:
> Thanks,
>
> el
>
> on 2013-01-08 18:35 ccc31807 said the following:
>> On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
>>> "07 Jan 2011" "TFR"
>>> "05 Jan 2011" "DR"
>>>
>>> I need to change the first field to look like:
>>>
>>> 2011-01-07 "TFR"
>>> 2011-01-05 "DR"
>>
>> For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
>> 1. my ($day, $month, $year) = split(/ /, $date);
>> 2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);
>>
>> Line 1 splits your date string into the three components: day, month, year.
>> Line 2 reassembles those three components and assigns the result back to $date.
>> The hash table %mo2num looks like this:
>> my %mo2num = (
>> Jan => 1,
>> Feb => 2,
>> Mar => 3,
>> etc.
>> );
And assuming the hash exists (I posted a command generating it twice),
the format can be transformed with a substitution expression (which I
also posted twice), namely
s/"(\d+)\s+(\S+)\s+(\d+)"/$3-$mo2num{$2}-$1/
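A self-contained sketch of the same transformation. Depending on how %mo2num is built, the substitution above inserts the month number as-is and so may produce "2011-1-07"; this version zero-pads with sprintf, keys the hash in lower case so "Jan" and "JAN" both match, and leaves lines with an unrecognised month unchanged. iso_date_line is a hypothetical name.

```perl
use strict;
use warnings;

# Month abbreviations keyed in lower case, so "Jan", "JAN" and "jan" all match.
my %mo2num;
@mo2num{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = (1 .. 12);

# Rewrite a leading quoted "DD Mon YYYY" field as an unquoted YYYY-MM-DD date.
# Lines whose month abbreviation is not recognised are returned unchanged.
sub iso_date_line {
    my ($line) = @_;
    $line =~ s{^"(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})"}
              { exists $mo2num{lc $2}
                    ? sprintf('%04d-%02d-%02d', $3, $mo2num{lc $2}, $1)
                    : $& }e;
    return $line;
}
```

Applied to each line of the file, e.g. print iso_date_line($_) while <$fh>; only the first field is touched, so it works the same for tab- or space-separated input.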
------------------------------
Date: Tue, 8 Jan 2013 18:50:54 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Rehabilitating bad Perl code
Message-Id: <40368ab6-b6b0-4401-9c06-92390a2153c3@googlegroups.com>
On Tuesday, January 8, 2013 2:37:57 PM UTC-5, Henry Law wrote:
> I have inherited some code which uses several horrible constructs,
> such as splicing together completely unrelated assignments and
> statements with commas, like
>
> inches thick of this stuff. Does anyone have any experience I could
> draw on?
Indeed! I inherited several inches of Perl code (literally, the author died very suddenly) that two of us spent several weeks reading, to little avail.
My partner took another position, and I rewrote the whole damn thing (and yes, the profanity is justified).
Benefits:
1. We rewrote the network part in Java, which turned out to be a wise decision, as the applications we interfaced with were in Java and we benefited from the available classes.
2. We modularized the code, my first big experience in reducing a large code base to modules, which has benefited me since.
3. I intentionally attempted to rewrite the code in a functional style, which has paid for itself in decreased maintenance time over the years.
4. Since I wrote the code, I know it inside out, which I would not have achieved nearly as well simply by reading code someone else had written.
5. Having a negative example, I tried my best to use best practices, which taught me some good lessons.
My approach, at least in the beginning, was to write functions that accepted parameters and returned results, using lexical variables as much as possible, and to comment out the existing code. Often this exposed global variables, some of them literally a couple of KLOC away from where the global was used. Getting a handle on the globals really allowed me to increase the pace of my rewriting effort.
My best advice is not to look on this as a negative, but as a positive. You will be much better off seeing this as an opportunity to do good rather than as a burden to carry.
CC.
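A miniature of the refactoring approach described above: a sub that works through package globals is rewritten to take parameters and return its result, so its dependencies become visible at the call site. The tax calculation and all names are invented purely for illustration.

```perl
use strict;
use warnings;

# Before (hypothetical): the sub silently reads and writes package globals,
# so nothing at the call site shows what it depends on or changes.
our ($tax_rate, $total) = (0.25, 100);
sub apply_tax_global { $total = $total * (1 + $tax_rate); return }

# After: inputs are parameters, the result is returned, and all working
# variables are lexical, so the dependency on the rate is explicit.
sub apply_tax {
    my ($amount, $rate) = @_;
    return $amount * (1 + $rate);
}
```

Once every caller uses the parameterised version, the globals can be removed, and strict will then flag any remaining hidden use of them.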
------------------------------
Date: Wed, 09 Jan 2013 10:56:23 +0000
From: alexd <troffasky@hotmail.com>
Subject: Re: Rehabilitating bad Perl code
Message-Id: <kcjicn$2td$1@dont-email.me>
Ben Morrow (for it is he) wrote:
> Also, to get any benefit from this, you would have to turn on 'strict';
> this does more than just requiring you to declare your variables.
There's always
use strict "vars";
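To make Ben's point concrete: strict "vars" on its own only catches undeclared variables, while plain "use strict" also enables "refs" and "subs". The sketch below compares the two using string evals; $target and the other variable names are illustrative only.

```perl
use strict;
use warnings;

our $target = 'original';

# strict "vars" alone: an undeclared global is a compile-time error ...
my $vars_error = eval(q{ no strict; use strict 'vars'; $undeclared = 1; 1 }) ? '' : $@;

# ... but a symbolic reference is still allowed, because "refs" is off.
my $symref_ok = eval q{
    no strict; use strict 'vars';
    my $name = __PACKAGE__ . '::target';
    ${$name} = 'changed';
    1;
};

# Full "use strict" (vars, refs and subs) rejects the same symbolic
# reference at run time.
my $refs_error = eval(q{
    use strict;
    my $name = __PACKAGE__ . '::target';
    ${$name} = 'changed again';
    1;
}) ? '' : $@;
```

So strict "vars" is a reasonable first step on inherited code, but symbolic references would still slip through until "refs" is enabled as well.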
--
<http://ale.cx/> (AIM:troffasky) (UnSoEsNpEaTm@ale.cx)
10:55:13 up 23 days, 13:27, 9 users, load average: 0.96, 0.77, 0.73
Qua illic est reprehendit, illic est a vindicatum
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3857
***************************************