[31813] in Perl-Users-Digest


Perl-Users Digest, Issue: 3076 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Aug 14 18:09:26 2010

Date: Sat, 14 Aug 2010 15:09:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 14 Aug 2010     Volume: 11 Number: 3076

Today's topics:
    Re: Appropriate technique for altering a text file? <uri@StemSystems.com>
    Re: Appropriate technique for altering a text file? <uri@StemSystems.com>
    Re: Appropriate technique for altering a text file? <xhoster@gmail.com>
    Re: Appropriate technique for altering a text file? <m@rtij.nl.invlalid>
        code snippet to convolve 2 vectors <toralf.foerster@gmx.de>
    Re: code snippet to convolve 2 vectors <willem@turtle.stack.nl>
    Re: Does this match any number or just single digit ones? <rvtol+usenet@xs4all.nl>
    Re: Why this warning? <xhoster@gmail.com>
    Re: Why this warning? <kst-u@mib.org>
    Re: Why this warning? <ben@morrow.me.uk>
    Re: Why this warning? <here@softcom.net>
    Re: Why this warning? <uri@StemSystems.com>
    Re: Why this warning? <uri@StemSystems.com>
    Re: Why this warning? <here@softcom.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 13 Aug 2010 22:30:16 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Appropriate technique for altering a text file?
Message-Id: <87eie253k7.fsf@quad.sysarch.com>

>>>>> "TM" == Tad McClellan <tadmc@seesig.invalid> writes:

  TM> Uri Guttman <uri@StemSystems.com> wrote:
  >>>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:
  >> 
  PJH> On 2010-08-13 18:14, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
  >> >> Uri would probably tell you [...]
  >> 
  PJH> I didn't see Uri's answer before I posted this. I swear!  :-)
  >> 
  >> great minds. :)

  TM> yes, but why were you and Peter both thinking the same thing?

  TM> :-)  :-)

oh, your mother is a python coder, and your father smells of java!

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Fri, 13 Aug 2010 23:34:49 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Appropriate technique for altering a text file?
Message-Id: <8739uh6f52.fsf@quad.sysarch.com>

>>>>> "c" == ccc31807  <cartercc@gmail.com> writes:

  c> On Aug 13, 1:29 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
  >> parsing and text munging is much easier when the entire file is in
  >> ram. there is no need to mix i/o with logic, the i/o is much faster, you
  >> can send/receive whole documents to servers (which could format things
  >> or whatever), etc. slurping whole files makes a lot of sense in many
  >> areas.

  c> Most of what I do requires me to treat each record as a separate
  c> 'document.' In many cases, this even extends to the output, where
  c> one input document results in hundreds of separate output
  c> documents, each of which must be opened, written to, and closed.

it doesn't make a difference what you mostly do. it matters how to best
solve this problem. don't use the same technique to solve all
problems. hammers don't work well with screws.

  c> I'm not being difficult (or maybe I am) but I'm having a hard time
  c> seeing how this kind of logic which treats each record separately:

  c> while (<IN>)
  c> {
  c>   chomp;
  c>   my ($var1, $var2, ... $varn) = split;
  c>   #do stuff
  c>   print OUT qq("$field1","$field2",..."$fieldn"\n);
  c> }

if that is fine, then use it. speed can be an issue, state of line to
line data can be an issue, parsing multiline things can be an issue.

  c> or this:

  c> foreach my $key (sort keys %{$hashref})
  c> {
  c>   #do stuff using $hashref{$key}{var1}, $hashref{$key}{var2}, etc.
  c>   print OUT qq("$field1","$field2",..."$fieldn"\n);
  c> }

that has nothing to do with line by line vs slurping. also why are you
quoting scalars even in pseudo code? it ends with a newline outside a
string too.

  c> could be made easier by dealing with the entire file at once.

again it depends on the problem. try to parse a multiline structure line
by line vs slurping. it is much easier to do a single regex on the whole
file (in /g mode usually) and grab the structure then parse that. the
line by line method needs state (possibly using the .. op which i have
done plenty in this style), a variable to hold the stuff, a more complex
loop, etc. slurp style is just so much cleaner.
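A minimal sketch of the contrast described above, assuming a hypothetical input format in which each record runs from a `BEGIN` line to an `END` line. `blocks_by_slurp` reads the whole handle with `local $/` and pulls out every record with one `/msg` regex. `blocks_by_line` does the same job a line at a time and needs the `..` flip-flop plus a buffer to carry state. The sub names and data are illustrative only, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical input: records span several lines between BEGIN and END.
my $data = <<'TEXT';
header junk
BEGIN
name: alpha
size: 1
END
noise
BEGIN
name: beta
size: 2
END
TEXT

# Slurp style: read the whole handle at once, then let one /g regex
# pick out every block. /m makes ^ and $ match at line boundaries,
# /s lets . cross newlines.
sub blocks_by_slurp {
    my ($fh) = @_;
    my $text = do { local $/; <$fh> };    # undef $/ => read everything
    my @blocks;
    while ($text =~ /^BEGIN\n(.*?)^END$/msg) {
        push @blocks, $1;
    }
    return @blocks;
}

# Line-by-line style: the same job needs explicit state. The scalar ..
# (flip-flop) operator tracks whether we are inside a block, and a
# buffer accumulates the body lines.
sub blocks_by_line {
    my ($fh) = @_;
    my (@blocks, $buf);
    while (my $line = <$fh>) {
        my $seq = ($line =~ /^BEGIN$/ .. $line =~ /^END$/);
        next unless $seq;                                  # outside any block
        if    ($seq == 1)     { $buf = '' }                # the BEGIN line
        elsif ($seq =~ /E0$/) { push @blocks, $buf }       # the END line
        else                  { $buf .= $line }            # a body line
    }
    return @blocks;
}

open my $in1, '<', \$data or die "cannot open in-memory file: $!";
my @slurped = blocks_by_slurp($in1);
open my $in2, '<', \$data or die "cannot open in-memory file: $!";
my @by_line = blocks_by_line($in2);
printf "%d blocks either way\n", scalar @slurped;    # 2 blocks either way
```

Both subs return the same records; the slurp version simply has no state to manage.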

  c> Okay, this is the first time I have had to treat a single file as a
  c> unit, and to be honest the experience was positive. Still, my
  c> worldview consists of record oriented datasets, so I put this in my
  c> nice-to-know-but-not-particularly-useful category.

i would file it as a known tool that is very useful when needed. it is
not about how you think; you just don't have experience seeing problems
that are better slurped. many things work fine line by line but just as
many work better slurped.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Fri, 13 Aug 2010 20:12:44 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: Appropriate technique for altering a text file?
Message-Id: <4c660e4c$0$23882$ed362ca5@nr5-q3a.newsreader.com>

ccc31807 wrote:
 ...

> All of the work I have done in the past has munged the lines one by
> one, as in the first example. Occasionally, I have had to use the
> second style (e.g., where the formatting of each line depends on the
> content of the preceding line.) I've never used the third style at
> all.
> 
> I liked the third way a lot. It seemed quick, easy, and worked
> perfectly. I was actually able to open the resulting document in Word,
> fancify it a little, and print a nice finished copy. However, I can't
> think of any actual uses of the third style in my day to day work.
> 
> My question is this: Is the third attempt, slurping the entire
> document into memory and transforming the text by regexs, very common,

I use the first method, line by line, if the lines are logically 
independent (the most common case), or usually if the dependence is 
simple and entirely backwards.  I use method 3, slurping (either into a 
scalar or an array) otherwise.  I only use method 2, keeping a look-back 
or ring buffer, if the file were so large or had the potential to become 
so large that slurping could threaten my memory.
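For the look-back style reserved here for very large files, a minimal sketch: stream the input and keep only the last `$n` lines in a small window, so memory stays bounded however big the file grows. The task (report each matching line with a couple of lines of preceding context, roughly what `grep -B` does) and the name `matches_with_context` are hypothetical. The window is a push/shift array rather than a true index-wrapping ring buffer, which is simpler and adequate for small `$n`.

```perl
use strict;
use warnings;

# Returns each line matching $re, preceded by up to $n lines of context.
# Only the last $n lines are ever held in memory.
sub matches_with_context {
    my ($fh, $re, $n) = @_;
    my (@window, @hits);
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, [ @window, $line ] if $line =~ $re;
        push @window, $line;
        shift @window while @window > $n;    # drop the oldest line(s)
    }
    return @hits;
}

my $data = "a\nb\nERROR one\nc\nd\ne\nERROR two\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
for my $hit (matches_with_context($fh, qr/ERROR/, 2)) {
    print join(' | ', @$hit), "\n";
}
# prints:
# a | b | ERROR one
# d | e | ERROR two
```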

> or is it considered a last resort when nothing else would work?

No, it is the middle method that I consider a last resort.

Xho


------------------------------

Date: Sat, 14 Aug 2010 09:10:39 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Appropriate technique for altering a text file?
Message-Id: <f4shj7-slo.ln1@news.rtij.nl>

On Fri, 13 Aug 2010 17:56:08 -0700, sln wrote:

> On Fri, 13 Aug 2010 10:08:48 -0700 (PDT), ccc31807 <cartercc@gmail.com>
> wrote:
> 
>>My question is this: Is the third attempt, slurping the entire document
>>into memory and transforming the text by regexs, very common, or is it
>>considered a last resort when nothing else would work?
>>
>>
> The answer is no to slurping, and no to using regexes on large documents
> that don't need to be all in memory.
> 
> There is usually a single drive (say raid). Only one i/o operation is
> performed at a time. If hogged, the other processes will wait until the
> hog is done and their i/o is dequeued, done and returned.
> Modern sata2, raid-configured drives work well when reading/writing
> incremental data, so they should always be used this way on large data
> that can be worked on incrementally. The default buffer on read between
> the api and the device is usually small, so as not to clog up device i/o
> and spin locks. So it's still going to be incremental.

Utter BS. Doing incremental reads under load will result in a lot of 
seeking and so leads to a degradation of performance. Slurping the file is 
much more efficient. 

> A complex regex will do more backtracking on large data than on
> smaller data. So it depends on the type and complexity.

True, but with modern fast machines the trade-off between programmer time 
and computer time more often falls in favor of using more machine time. 
Only when it proves too slow should you optimize.
 
> The third reason is always memory. Sure, there is a lot of memory, but
> hogging it all bogs down background file caching and other processing.

Also true, but text files are often much smaller than memory. However, 
this is the only thing you really have to think about up front.
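Since memory is the one thing to think about up front, the file size can be checked before committing to a slurp. A minimal sketch, assuming a hypothetical 50 MB cut-off and hypothetical caller-supplied callbacks (`process_file` is an illustrative name): files under the limit are read whole with `local $/`, larger ones fall back to line-by-line processing.

```perl
use strict;
use warnings;

# Hypothetical cut-off; pick a value that suits the machine and workload.
my $MAX_SLURP_BYTES = 50 * 1024 * 1024;

# Calls $whole->($text) for small files, or $per_line->($line) for each
# line of larger ones. Both callbacks are supplied by the caller.
sub process_file {
    my ($path, $whole, $per_line) = @_;
    my @st = stat $path or die "cannot stat $path: $!\n";
    my $size = $st[7];    # element 7 of stat() is the size in bytes
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    if ($size <= $MAX_SLURP_BYTES) {
        my $text = do { local $/; <$fh> };    # undef $/ => whole file
        $whole->($text);
    }
    else {
        while (my $line = <$fh>) {
            $per_line->($line);
        }
    }
    close $fh;
    return;
}
```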

M4


------------------------------

Date: Sat, 14 Aug 2010 15:52:52 +0200
From: Toralf Förster <toralf.foerster@gmx.de>
Subject: code snippet to convolve 2 vectors
Message-Id: <i4673k$pq8$1@news.eternal-september.org>

I'm looking for perl code examples to get the convolution of 2 arrays, e.g. 
like this :

[1, 2] x [3, 4] = [ 3, 10, 8]
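One straightforward way to compute the full discrete convolution in the example above (a sketch, not the only approach): every product `$x->[$i] * $y->[$j]` is added into output slot `$i + $j`, so inputs of lengths m and n give m + n - 1 outputs. The sub name `convolve` is illustrative. The double loop costs m*n multiplications, which is fine for short vectors; very long signals are usually handled with FFT-based methods instead.

```perl
use strict;
use warnings;

# Full discrete convolution of two array refs; returns an array ref
# with @$x + @$y - 1 elements.
sub convolve {
    my ($x, $y) = @_;
    return [] unless @$x && @$y;
    my @out = (0) x (@$x + @$y - 1);
    for my $i (0 .. $#$x) {
        for my $j (0 .. $#$y) {
            # each pair contributes to the slot at offset $i + $j
            $out[$i + $j] += $x->[$i] * $y->[$j];
        }
    }
    return \@out;
}

print join(', ', @{ convolve([1, 2], [3, 4]) }), "\n";    # prints 3, 10, 8
```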

-- 
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3


------------------------------

Date: Sat, 14 Aug 2010 14:13:34 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: code snippet to convolve 2 vectors
Message-Id: <slrni6d94e.8rh.willem@turtle.stack.nl>

Toralf Förster wrote:
) I'm looking for perl code examples to get the convolution of 2 arrays, e.g. 
) like this :
)
) [1, 2] x [3, 4] = [ 3, 10, 8]

What have you got so far, and what are
the bits that you're having trouble with ?


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Sat, 14 Aug 2010 12:18:35 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Does this match any number or just single digit ones?
Message-Id: <4c666d7b$0$22935$e4fe514c@news.xs4all.nl>

Thomas Andersson wrote:

> Will this line update $page with a number if it's double digit (i.e. 10+) or 
> does the code need to change?
> 
> my ($page) = $page_number_txt =~ /PAGE (\d) >/; 

Also, be aware that \d is not the same as [0-9].


There are over 200 digit code points, see the output of:

perl -CO -we '$| = 1;
   my $i;
   $SIG{__WARN__} = sub{ die "$i:$_[0]" };
   eval{ chr() || 1 }
     and chr() =~ /\d/
     and printf qq{%3s: %c [%x]\n}, ++$i, ($_)x2
       for 0 .. 0x1FFFD
' 2>&1 |less
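A short sketch tying this back to the original question: `(\d)` captures exactly one character, so a multi-digit page number needs a quantifier, and `[0-9]` keeps the match to ASCII digits. The sample string `'PAGE 12 >'` is an assumption about the input format. The last lines show U+0663 (ARABIC-INDIC DIGIT THREE) matching `\d` but not `[0-9]`. Perl 5.14 and later also offer the `/a` modifier, which restricts `\d` to ASCII.

```perl
use strict;
use warnings;

my $page_number_txt = 'PAGE 12 >';    # hypothetical sample input

# (\d) captures exactly one digit, so this fails on "PAGE 12 >".
my ($one) = $page_number_txt =~ /PAGE (\d) >/;

# A + quantifier captures the whole run of digits; [0-9] keeps it ASCII-only.
my ($page) = $page_number_txt =~ /PAGE ([0-9]+) >/;

print defined $one ? "single: $one\n" : "single: no match\n";    # single: no match
print "page: $page\n";                                             # page: 12

# \d also matches other scripts' decimal digits; [0-9] does not.
my $arabic_three = "\x{0663}";
printf "\\d: %s, [0-9]: %s\n",
    ($arabic_three =~ /\d/    ? 'match' : 'no match'),
    ($arabic_three =~ /[0-9]/ ? 'match' : 'no match');    # \d: match, [0-9]: no match
```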

-- 
Ruud


------------------------------

Date: Fri, 13 Aug 2010 18:18:46 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: Why this warning?
Message-Id: <4c65ed7f$0$23877$ed362ca5@nr5-q3a.newsreader.com>

Ben Morrow wrote:
> Quoth Sal <here@softcom.net>:
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> my %sum = {};
> 
> This line is wrong. {} creates a new anonymous hashref; this is then
> stringified and inserted into the hash as a key with no value.

And my version of perl even generates the warning "Reference found where 
even-sized list expected".

Since Sal seems to be using a newer Perl than mine (5.8.8), I would 
think he would have gotten a similar warning.

Xho


------------------------------

Date: Fri, 13 Aug 2010 18:53:54 -0700
From: Keith Thompson <kst-u@mib.org>
Subject: Re: Why this warning?
Message-Id: <lnr5i2ufgt.fsf@nuthaus.mib.org>

Sal <here@softcom.net> writes:
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my %sum = {};
> for (my $i = 1; $i <= 6; $i++) {
>   for (my $j = 1; $j <= 6; $j++) {
>     for (my $k = 1; $k <= 6; $k++) {
>       my $tot = $i+$j+$k;
>       my $key = "$i " . "$j " . "$k ";
>       $sum{$key} = $tot;
>       print "$i " . "$j " . "$k " . "  $tot\n";
>     }
>   }
> }
>
> foreach my $key (sort keys %sum) {
>   print "$key => $sum{$key}\n";
> }
>
> When the above is executed it first prints the entire hash, then
> returns the error:
>
> Use of uninitialized value $sum{"HASH(0x95fe818)"} in concatenation
> (.) or string at ./3dice.pl line 19.
> HASH(0x95fe818) =>
>
> Why is the last hash value blank?

I get:

Reference found where even-sized list expected at ./tmp.pl line 6.
[... contents of hash...]
Use of uninitialized value $sum{"HASH(0x9c6b880)"} in concatenation (.)
or string at ./tmp.pl line 19.

Probably the first warning scrolled off your screen before you were
able to see it.  Changing 6 to 2 would have avoided that problem; so
would directing the output to a file or sending stdout to /dev/null
(or your system's equivalent) so you can see the warnings on stderr.

The problem is that {} isn't an empty list; it's a reference to
an empty hash.  As a reference, it's a single scalar, so you're
assigning an odd number of elements (namely 1) to your hash.

(What's happening, I think, is that you get a hash with a single
key/value pair; the key is a stringized hash reference and the
value is undef.  But the details don't really matter much, since
it's wrong in the first place.)

Change {} to () and you should be ok.
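A small sketch that makes the explanation above visible: assigning `{}` to a hash leaves exactly one key, the stringified reference, with an undefined value, while `()` (or no assignment at all) leaves the hash empty. With warnings enabled, running it should also print the "Reference found where even-sized list expected" warning on STDERR.

```perl
use strict;
use warnings;

# Assigning a single hash reference to a hash: the reference is stringified
# into one key (e.g. "HASH(0x...)") and, with no partner element, its value
# is undef.
my %bad = {};
my ($key) = keys %bad;
printf "%d key: %s, value %s\n",
    scalar(keys %bad), $key, (defined $bad{$key} ? 'defined' : 'undef');

# The fix: a fresh my-hash is already empty; () is the way to empty one.
my %good = ();
my %also_good;
printf "%d and %d keys\n", scalar(keys %good), scalar(keys %also_good);    # 0 and 0 keys
```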

Incidentally, rather than:
    my $key = "$i " . "$j " . "$k ";
I would have written:
    my $key = "$i $j $k ";
(and probably left off the trailing blank).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Nokia
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


------------------------------

Date: Sat, 14 Aug 2010 03:13:09 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Why this warning?
Message-Id: <lmahj7-ie72.ln1@osiris.mauzo.dyndns.org>


Quoth Keith Thompson <kst-u@mib.org>:
> Sal <here@softcom.net> writes:
> >       my $tot = $i+$j+$k;
> >       my $key = "$i " . "$j " . "$k ";
> >       $sum{$key} = $tot;
> 
> Incidentally, rather than:
>     my $key = "$i " . "$j " . "$k ";
> I would have written:
>     my $key = "$i $j $k ";
> (and probably left off the trailing blank).

Oh, but why do that when you can use an obscure Perl 4 feature instead?

    $sum{$i, $j, $k} = $i + $j + $k;

(Set $; to taste...)

:)
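For readers who haven't met the Perl 4 feature mentioned above: a comma-separated list inside hash braces is joined with `$;` (default `"\034"`, the SUBSEP character) into a single string key. A minimal sketch, assuming the default `$;`, that fills the dice hash this way and then splits one key back into its parts:

```perl
use strict;
use warnings;

my %sum;
for my $i (1 .. 6) {
    for my $j (1 .. 6) {
        for my $k (1 .. 6) {
            # A list inside hash braces is joined with $; so this is
            # the same as $sum{ join($;, $i, $j, $k) }.
            $sum{$i, $j, $k} = $i + $j + $k;
        }
    }
}

print scalar(keys %sum), " keys\n";    # 216 keys
printf "%d\n", $sum{2, 3, 4};          # 9

# Split on $; (quoted, since it is a control character) to get the parts back.
my $sep = quotemeta($;);
my ($first) = sort keys %sum;
print join(' ', split /$sep/, $first), "\n";    # 1 1 1
```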

Ben


------------------------------

Date: Fri, 13 Aug 2010 20:00:24 -0700 (PDT)
From: Sal <here@softcom.net>
Subject: Re: Why this warning?
Message-Id: <b0e2dfb8-e930-41ea-96a2-ca097510ee7d@a4g2000prm.googlegroups.com>

Thanks guys.


------------------------------

Date: Fri, 13 Aug 2010 23:37:59 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Why this warning?
Message-Id: <87y6c950fc.fsf@quad.sysarch.com>

>>>>> "S" == Sal  <here@softcom.net> writes:

  S> #!/usr/bin/perl
  S> use strict;
  S> use warnings;

good.

  S> my %sum = {};

very bad and wrong and makes a warning. {} is an anon hash reference,
not how you clear a hash. first off you don't need to assign to a my
hash as it will be empty when declared. if you do need to clear it you
assign the empty list () to it.

  S> for (my $i = 1; $i <= 6; $i++) {

for my $i ( 1 .. 6 ) {

much clearer and also faster.

  S>   for (my $j = 1; $j <= 6; $j++) {

ditto

  S>     for (my $k = 1; $k <= 6; $k++) {

ditto

  S>       my $tot = $i+$j+$k;

ever heard of white space?

  S>       my $key = "$i " . "$j " . "$k ";

ever heard of interpolation? "$i $j $k ";

  S>       $sum{$key} = $tot;
  S>       print "$i " . "$j " . "$k " . "  $tot\n";

ditto
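Folding the review comments in this thread back into Sal's script gives something like the following (a sketch: the hash is simply declared, the C-style loops become ranges, and the key uses interpolation with the trailing blank dropped, as Keith suggested).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %sum;    # a new lexical hash starts out empty

for my $i (1 .. 6) {
    for my $j (1 .. 6) {
        for my $k (1 .. 6) {
            my $tot = $i + $j + $k;
            $sum{"$i $j $k"} = $tot;    # interpolate instead of concatenating
            print "$i $j $k   $tot\n";
        }
    }
}

foreach my $key (sort keys %sum) {
    print "$key => $sum{$key}\n";
}
```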

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Fri, 13 Aug 2010 23:41:12 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Why this warning?
Message-Id: <87tymx509z.fsf@quad.sysarch.com>

>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:

  BM> Quoth Keith Thompson <kst-u@mib.org>:
  >> Sal <here@softcom.net> writes:
  >> >       my $tot = $i+$j+$k;
  >> >       my $key = "$i " . "$j " . "$k ";
  >> >       $sum{$key} = $tot;
  >> 
  >> Incidentally, rather than:
  >> my $key = "$i " . "$j " . "$k ";
  >> I would have written:
  >> my $key = "$i $j $k ";
  >> (and probably left off the trailing blank).

  BM> Oh, but why do that when you can use an obscure Perl 4 feature instead?

  BM>     $sum{$i, $j, $k} = $i + $j + $k;

the hell with obscure! i lived on using pseudo multilevel hashes in
perl4. i did a major project where i did tons of that, used globs and
symrefs too to manage a massive tree, scan it and print stuff. i even
gave a talk about it at yapc::montreal. i still have the code from 17
years ago. not too useful now but nostalgic for sure.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sat, 14 Aug 2010 09:02:34 -0700 (PDT)
From: Sal <here@softcom.net>
Subject: Re: Why this warning?
Message-Id: <3af2f098-6178-4083-ac35-5337014350cf@l25g2000prn.googlegroups.com>


>   S> for (my $i = 1; $i <= 6; $i++) {
>
> for my $i ( 1 .. 6 ) {
>
> much clearer and also faster.
>
####### Excellent idea!!!!

>   S>   for (my $j = 1; $j <= 6; $j++) {
>

>   S>       my $key = "$i " . "$j " . "$k ";
>
> ever heard of interpolation? "$i $j $k ";
>
######## Made that change too.

Thanks to all on this forum who have taken the time to assist me. You
guys are great! After C/C++, Java, PHP, and dabbling a little in
Python, Perl is becoming my favorite language. I honestly don't
understand all the hype about Python.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3076
***************************************

