[22522] in Perl-Users-Digest


Perl-Users Digest, Issue: 4743 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Mar 22 06:06:07 2003

Date: Sat, 22 Mar 2003 03:05:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 22 Mar 2003     Volume: 10 Number: 4743

Today's topics:
    Re: array namen dynamisch generieren (Ulrich Gödert)
    Re: Data 'thinning' with Perl MySQL <goldbb2@earthlink.net>
    Re: doubts on \n (Xavier Noria)
        Idea for holding a C++ pointer in a Perl scalar with C+ <im_not_giving_it_here@i_hate_spam.com>
    Re: Memory used by hashes <kp@iwerx.com>
    Re: Memory used by hashes <goldbb2@earthlink.net>
    Re: Memory used by hashes (Anno Siegel)
    Re: Perl querystring encoding  question <mbudash@sonic.net>
    Re: Perl querystring encoding  question <NoSpamPleaseButThisIsValid2@gmx.net>
    Re: print to file safley <peakpeek@purethought.com>
    Re: print to file safley <tassilo.parseval@rwth-aachen.de>
    Re: Problem using CGI.pm and SSL. <peakpeek@purethought.com>
    Re: Problem with Getopt::Std and getopts().... <bik.mido@tiscalinet.it>
        Putting/accessing an object from an array - help! <im_not_giving_it_here@i_hate_spam.com>
    Re: regexp question <grazz@nyc.rr.com>
    Re: regexp question <kp@iwerx.com>
    Re: regexp question <mbudash@sonic.net>
    Re: regexp question <mike@luusac.co.uk>
    Re: regexp question <bcaligari@fubar.fireforged.com>
        regexp's to rip html file <tunmaster@hotmail.com>
    Re: regexp's to rip html file (Tony L. Svanstrom)
    Re: regexp's to rip html file <res1uzbe@verizon.net>
    Re: regexp's to rip html file <tunmaster@hotmail.com>
    Re: Text::ParseWords or Text::CSV (david)
    Re: Text::ParseWords or Text::CSV <goldbb2@earthlink.net>
    Re: Working with Objects in Perl. <kp@iwerx.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 22 Mar 2003 00:48:51 -0800
From: google_groops@goedert.org (Ulrich Gödert)
Subject: Re: array namen dynamisch generieren
Message-Id: <fcb5da4a.0303220048.11322c75@posting.google.com>

Okay, okay, I picked the wrong group (because of the language).
The problem I am brooding over is the following.

I plan to generate several lists (arrays) out of another
list/database.
My idea was to put the data (e.g. the results of a select) in an
array. If I needed only one array, it would be no problem. But I need
three or more arrays (and I don't know how many elements will be in
each array).

The problem now is to create the names of the arrays. So I got the
idea to try something like $t1=substr($var,1,1) (e.g. $var could be the
name of a table, row or column - or something else) and a constant part
like "foo". So I want to get the arrays @afoo, @bfoo, ...

Later on I want to do something like

[...]
while(a_condition){ 
        push (@afoo, $new_varable);
}
while(b_condition){ 
        push (@bfoo, $new_varable);
}
[...]
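
A common alternative to generating array names at run time is a hash
of arrays, where the generated name becomes a hash key. The following
is only a sketch: the sample rows and the %lists name are made up for
illustration, and it takes the first character with substr($var, 0, 1)
because Perl's substr counts from zero.

```perl
use strict;
use warnings;

# Hypothetical input: pairs of (name, value), e.g. rows from a SELECT.
my @rows = ( [ apple => 1 ], [ banana => 2 ], [ avocado => 3 ] );

my %lists;    # $lists{afoo}, $lists{bfoo}, ... instead of @afoo, @bfoo

for my $row (@rows) {
    my ( $var, $value ) = @$row;
    my $name = substr( $var, 0, 1 ) . 'foo';    # "afoo", "bfoo", ...
    push @{ $lists{$name} }, $value;            # the array springs into existence
}

for my $name ( sort keys %lists ) {
    print "$name: @{ $lists{$name} }\n";
}
```

Each list is then reached as @{ $lists{afoo} }, and the set of lists
can grow without any new variable names.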

greetings.


------------------------------

Date: Fri, 21 Mar 2003 23:27:41 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Data 'thinning' with Perl MySQL
Message-Id: <3E7BE63D.9769612F@earthlink.net>



Steve wrote:
> 
> Benjamin Goldberg <goldbb2@earthlink.net> wrote in message news:<3E78FDEB.114F0C95@earthlink.net>...
> > Steve wrote:
> > >
> > > Dear All
> > >
> > > I have a problem which might be of interest:
> > >
> > > I have an application based on perl and MySQL that logs data from a
> > > series of instruments, logging interval ranges from minutes to hours
> > > on different instruments. I don't need to keep it all forever but I
> > > need to 'thin' it in a defined way. For example, on data over 1 week
> > > old reduce observations to a max of one record per hour per
> > > instrument, over two weeks old reduce to 1 per 2 hours etc etc. This
> > > would allow me to see recent performance in great detail, provide a
> > > historical record and provide an indefinite archive of sample values.
> > >
> > > I cannot come-up with an algorithm that does this in a remotely
> > > elegant way. Either I read everything into an array, thin that and
> > > then rewrite it - I don't like that because I worry about the size of
> > > the array in future developments, or I repeatedly read the table,
> > > delete records, reread the table in a very ugly way - which I presume
> > > to have huge overheads. In particularly I don't want a solution that
> > > depends on being run only once a day or once a week!
> >
> > The following code assumes that the data contains a 'date' column, which
> > is the time that the record was created, and which when fetched produces
> > an integer in seconds since the epoch.
> >
> >    $dbh->{FetchHashKeyName} = 'NAME_lc';
> >
> >    # Sort data chronologically.
> >    my $get = $dbh->prepare( q[SELECT * FROM table ORDER BY date] );
> >    $get->execute;
> >    my @cols = @{ $get->{NAME_lc} };
> >
> >    my $del = $dbh->prepare( q[DELETE FROM table WHERE ]
> >       . join(" AND ", map "$_ = ?", @cols ) );
> >
> >    # Last row examined for a particular instrument type.
> >    my %coalesce;
> >
> >    # Current time, in seconds since the epoch.
> >    my $today = time();
> >
> >    while( my $rec = $get->fetchrow_hashref ) {
> >       use integer; # avoid need for int() all over the place.
> >
> >       # how many weeks ago was this record?
> >       my $recweek = ($today - $rec->{date}) / (60 * 60 * 24 * 7);
> >
> >       # no thinning for records less than a week old.
> >       next if $recweek < 1;
> >
> >       my $other = $coalesce{$rec->{instrument}} ||= {};
> >       if( !%$other ) {
> >          %$other = %$rec;
> >          next;
> >       }
> >
> >       # one hour for week-old data, two hours for two-week-old data, ...
> >       my $interval = 60 * 60 * 2 ** ($recweek - 1);
> >
> >       my $o_inter = $other->{date} / $interval;
> >       my $r_inter = $rec->{date} / $interval;
> >       if( $o_inter == $r_inter ) {
> >          # delete the more recent of the two.
> >          $del->execute( @{$rec}{@cols} );
> >          # or, instead, do:
> >          # $del->execute( @{$other}{@cols} );
> >          # %$other = %$rec;
> >          # , to delete the older of the two.
> >       } else {
> >          %$other = %$rec;
> >       }
> >    }
> >
> > [untested]
[snip]
> 
> The approach above is certainly neater than anything I have come-up
> with, but I will have to think about just how it will behave !

If the algorithm were written for only one type of instrument, then
pseudocode for it might be:

   $that = undef;
   foreach $this (all records, sorted by time) {
      if( not defined $that ) {
         $that = $this;
         next;
      }
      $interval = interval_length($this);
      if( in_same_interval( $this, $that, $interval ) ) {
         delete_record($this);
      } else {
         $that = $this;
      }
   }

The algorithm is a *tad* more complicated than this for a few reasons:
  1/ There are multiple instrument types, and each one has to have its
     own personal $that.
  2/ The $sth->fetchrow_hashref method will always return the *same*
     hashref, but with different contents each time, so I can't simply
     store it into [the equivalent of] $that directly.

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$@[$a%6
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}


------------------------------

Date: 22 Mar 2003 01:07:52 -0800
From: fxn@hashref.com (Xavier Noria)
Subject: Re: doubts on \n
Message-Id: <31a13074.0303220107.47214a84@posting.google.com>

Benjamin Goldberg <goldbb2@earthlink.net> wrote in message news:<3E79FBF7.188C2A85@earthlink.net>...
> Xavier Noria wrote:
> > * Is (default) $/ eq "\n" true in all platforms?
> 
> This is true regardless of binmode.

Just wanted to note that it seems Stein's book on network programming
with Perl is wrong on that point, since page 13 says "The global
variable $/ contains the current character, or sequence of characters,
used to signal the end of line. By default, it is set to [...]
\015\012 on Windows and DOS systems."

I have not found a place where newlines are as well explained as in
this thread; maybe I should summarize everything and upload it
somewhere.
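
A small check of the point discussed above, plus the usual way to read
CRLF-terminated data regardless of platform. This is a sketch: the
sample string and the in-memory filehandle are stand-ins for a real
socket or file.

```perl
use strict;
use warnings;

printf "default \$/ eq \"\\n\": %s\n", ( $/ eq "\n" ? 'yes' : 'no' );

# Hypothetical wire data with explicit CRLF line endings.
my $wire = "HELO example\015\012QUIT\015\012";
open my $fh, '<', \$wire or die "Cannot open in-memory handle: $!";

my @lines;
{
    local $/ = "\015\012";    # record separator for this block only
    @lines = <$fh>;
    chomp @lines;             # chomp strips the current value of $/
}
close $fh;

print scalar(@lines), " lines: @lines\n";
```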

-- fxn


------------------------------

Date: Sat, 22 Mar 2003 09:07:01 +0000
From: Asfand Yar Qazi <im_not_giving_it_here@i_hate_spam.com>
Subject: Idea for holding a C++ pointer in a Perl scalar with C++ type information
Message-Id: <b5h917$7h8$1@news8.svr.pol.co.uk>

Hello

I'm new to perlguts, so forgive my mistakes.

I'll reiterate the subject: Idea for holding a C++ pointer in a Perl 
scalar with C++ type information

The basic concept is to use a double typed SV: the UV holds the pointer 
value, and the PV holds the type name (as obtained from type_info).

The SV is then blessed into the 'CXXPtr' class, so that not just any 
double-typed SV can be taken to represent a C++ pointer.

Then, when a perl sub that wraps a C++ function is called, it checks 
that the type of the CXXPtr given matches the one the sub expects, and 
then calls the C++ function with the pointer value contained in the UV 
part of the argument.

Thoughts?
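
The idea can be mocked up in pure Perl with Scalar::Util's dualvar,
which builds a scalar carrying both a number and a string. This is
only a sketch of the concept, not XS code: make_cxxptr and
address_if_type are hypothetical names, and the address is a made-up
number rather than a real pointer.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar blessed);

# Hypothetical constructor: pair an address with a C++ type name.
sub make_cxxptr {
    my ( $address, $type_name ) = @_;
    my $sv = dualvar( $address, $type_name );
    return bless \$sv, 'CXXPtr';    # bless a reference to the dual-typed scalar
}

# Hypothetical check a wrapper sub would run before calling into C++.
sub address_if_type {
    my ( $ptr, $expected ) = @_;
    return unless blessed($ptr) && $ptr->isa('CXXPtr');
    return unless "$$ptr" eq $expected;    # string slot: type name
    return $$ptr + 0;                      # numeric slot: address
}

my $p = make_cxxptr( 0x1000, 'Widget' );
print address_if_type( $p, 'Widget' ), "\n";
print defined( scalar address_if_type( $p, 'Gadget' ) )
    ? "match\n"
    : "type mismatch\n";
```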




------------------------------

Date: Sat, 22 Mar 2003 05:04:21 +0100
From: Kostis P <kp@iwerx.com>
Subject: Re: Memory used by hashes
Message-Id: <3e7be160_3@news.bluewin.ch>

Hi Fred.

Passing shot...

If you need to have huge amounts of data 'ready to cook', memory is an
issue but speed comes second, then consider using tied hashes.

If I remember the syntax this should work

use DB_File;

my $DB_HASH = new DB_File::HASHINFO;
$DB_HASH->{'cachesize'} = 10 * 1000000; #adjust
$DB_HASH->{'nelem'} = 20 * 1000; #adjust

tie %hash, "DB_File", 'filename.db', O_CREAT|O_RDWR, 0666, $DB_HASH
    or die "error: can't tie hash: $!\n";

You need to have the Berkeley DB installed on your system which a lot of
linux dists have by default anyway.

Regards...


Fr€d wrote:
> Darren Dunham wrote:
> 
>>Fr€d <nospam@euro.com> wrote:
>>
>>>I am running a simple perl program that sums 5 variables over 100Gb of
>>>data.  I have 5 hashes that use the same 12 character key.  There are
>>>about 4 million keys.  Basically the program is:
>>
>>>while (<INPUT>) {
>>>  $key =substr($_,0,12);
>>>  extract 5 numbers from $_;
>>>  $sum1{$key} ++;
>>>  $sum2{$key} += $val;
>>>  ... (for all 5 sums)
>>>}
>>
>>That's not 5 sums, that's 5 * 4 million sums.  Yes?  Is that what you
>>expect?
> 
> 
>   yes
>  
> 
>>The "extract 5 numbers from $_" isn't perl, and I have no idea what you
>>intend by that.  $val is never initialized, and I can't really guess
>>what really happens in the ...(for all 5 sums) section.
> 
> 
> 	well, it was more pseudo code; I am summing 5 variable per key
>  	and tried to make the example short
> 
> 
>>Could you...
>>
>>1) show some sample data?
>>2) show the actual code within that loop?
>>
>>
>>>Can anyone explain why this is using so much memory?  Even given that
>>>all variables are double precision, I can only come up with about 20% of
>>>what's being used.  I thought all hashes that used the same key were
>>>stored in one place; this seems to indicate that this isn't the case.
>>
>>Where did you hear that?  I don't know what it would mean anyway.  If
>>you have multiple hashes, then you have to store multiple values, and
>>that takes space.
> 
> 
> $ perldoc perltoot
>   (Arrays as Objects subsection):
> 
>    A hash representation takes up more memory than an array
> representation because you have to allocate memory for the keys as well
> as for the values.  However, it really isn't that bad, especially since
> as of version 5.004,  memory is only allocated once for a given hash
> key, no matter how many hashes have that key.
>  
> or:
> http://www.cs.berkeley.edu/~loretta/perl/nmanual/pod/perltoot/Arays_as_Objects.html
> 
> 
>>Basically native perl variables are very easy to use, very flexible, but
>>not very space efficient.
> 
> 
> I see that now, thanks
> 
> 
>>I can guess some alternatives that might work (use one hash instead of
>>five with array refs or something even more space efficient), but I'm
>>not convinced I have any idea about what you actually want yet.
> 
> 
> 	Well, I don't have any problems running it, I was just curious about
> the memory usage.
> 
> thanks



------------------------------

Date: Fri, 21 Mar 2003 23:51:21 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Memory used by hashes
Message-Id: <3E7BEBC9.57D384BA@earthlink.net>

"Fr€d" wrote:
> 
> I am running a simple perl program that sums 5 variables over 100Gb of
> data.  I have 5 hashes that use the same 12 character key.  There are
> about 4 million keys.  Basically the program is:
> 
> while (<INPUT>) {
>   $key =substr($_,0,12);
>   extract 5 numbers from $_;
>   $sum1{$key} ++;
>   $sum2{$key} += $val;
>   ... (for all 5 sums)
> }
[snip]

For reducing memory usage, consider the following:

my %allsums;
while( <> ) {
   my $key = substr($_, 0, 12);
   my $sums = \$allsums{$key};
   my @vals = split ' ', substr $_, 12; # or whatever.
   vec( $$sums, $_, 32 ) += $vals[$_] for 0..4;
   vec( $$sums, 5 , 32 ) += 5;
}
while( my ($key, $vals) = each %allsums ) {
   my @vals = unpack 'N*', $vals;
   my $count = pop @vals;
   print "For key $key [$count instances], the sums were @vals\n";
}
__END__
[untested]

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$@[$a%6
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}


------------------------------

Date: 22 Mar 2003 10:02:06 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Memory used by hashes
Message-Id: <b5hcau$bsv$1@mamenchi.zrz.TU-Berlin.DE>

Benjamin Goldberg  <goldbb2@earthlink.net> wrote in comp.lang.perl.misc:
> "Fr€d" wrote:
> > 
> > I am running a simple perl program that sums 5 variables over 100Gb of
> > data.  I have 5 hashes that use the same 12 character key.  There are
> > about 4 million keys.  Basically the program is:
> > 
> > while (<INPUT>) {
> >   $key =substr($_,0,12);
> >   extract 5 numbers from $_;
> >   $sum1{$key} ++;
> >   $sum2{$key} += $val;
> >   ... (for all 5 sums)
> > }
> [snip]
> 
> For reducing memory usage, consider the following:
> 
> my %allsums;
> while( <> ) {
>    my $key = substr($_, 0, 12);
>    my $sums = \$allsums{$key};
>    my @vals = split ' ', substr $_, 12; # or whatever.
>    vec( $$sums, $_, 32 ) += $vals[$_] for 0..4;
>    vec( $$sums, 5 , 32 ) += 5;
                              ^
s/5/1/, if I remember the OP right.

> }

Yes, combining scalars is the way to go here.  However, the OP
says "...all variables are double precision".  If so, vec() won't
do, something like "(un)pack 'd*', ..." will have to go in its place.
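
A sketch of that variant, keeping five double-precision sums and a
count in one packed string per key. The add_record helper and the
sample values are assumptions for illustration; the real program would
feed it the key and numbers parsed from each input line.

```perl
use strict;
use warnings;

my %allsums;    # key => packed string holding 5 sums plus a count

# Hypothetical helper: add five numbers to the running sums for $key.
sub add_record {
    my ( $key, @vals ) = @_;
    my @sums = exists $allsums{$key}
        ? unpack( 'd*', $allsums{$key} )
        : (0) x 6;
    $sums[$_] += $vals[$_] for 0 .. 4;
    $sums[5]++;                           # number of records seen for this key
    $allsums{$key} = pack 'd*', @sums;    # 6 doubles, 48 bytes per key
}

add_record( 'AAAAAAAAAAAA', 1.5, 2, 3, 4, 5 );
add_record( 'AAAAAAAAAAAA', 0.5, 1, 1, 1, 1 );

for my $key ( sort keys %allsums ) {
    my @sums  = unpack 'd*', $allsums{$key};
    my $count = pop @sums;
    print "For key $key [$count instances], the sums were @sums\n";
}
```

The trade-off is an unpack and a pack per input line in exchange for
one scalar per key instead of six.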

[...]

Anno


------------------------------

Date: Sat, 22 Mar 2003 02:38:20 GMT
From: Michael Budash <mbudash@sonic.net>
Subject: Re: Perl querystring encoding  question
Message-Id: <mbudash-835F91.18381921032003@typhoon.sonic.net>

In article <vjfn7vg7tvh6qo6al03pq1ieunu8g8evbk@4ax.com>,
 Sharon Grant <peakpeek@purethought.com> wrote:

> On Fri, 21 Mar 2003 00:30:30 GMT, in comp.lang.perl.misc, Michael Budash 
> <mbudash@sonic.net> wrote:
> 
> >In article <907e039c.0303201234.1de824c4@posting.google.com>,
> > rdodd@xltech.net (Robert Dodd) wrote:
> >
> >> I am trying to pass values in a querystring that come from a flat
> >> file db extracted using Perl (see code below). Can anyone tell me how
> >> to encode the querystring variable $job_title?
> >> Right now it gets truncated because it contains a /. The code is below
> >
> >[snip unnecessary code example]
> >
> >> application.shtml?email=$contact_email&state=$state&position=$job_title
> >
> >[snip rest of unnecessary code example]
> >
> >if you're using the CGI perl module:
> >
> >my $href = 'application.shtml?email=' . 
> >            CGI::escape($contact_email) .
> >            '&state=' .
> >            CGI::escape($state) .
> >            '&position=' .
> >            CGI::escape($job_title);
> 
> 
> '&' is not valid HTML here

who's talking about html?? and even if we were, '&' is perfectly valid...

> 
> CGI.pm has a query_string() method, which calls its escape() 
> method, and then constructs the query string ...
> 
> 
> #!/usr/bin/perl -T
> 
> use strict;
> use warnings;
> 
> use CGI;
> 
> my ($contact_email, $state, $job_title)
>         = ('rdodd@xltech.net', 'Florida', 'Analyst/Programmer');
> 
> my $query = new CGI ({'email'=>$contact_email,
>         'state'=>$state,
>         'position'=>$job_title});
> 
> print 'application.shtml?', $query->query_string, "\n";
> 

ok - another way, certainly valid and for some, preferable. yes, it 
produces a url that uses ';' to separate the name/value pairs, but '&' 
is also valid, AFAIK...

or am i missing something here?


------------------------------

Date: Sat, 22 Mar 2003 11:07:23 +0100
From: Wolf Behrenhoff <NoSpamPleaseButThisIsValid2@gmx.net>
Subject: Re: Perl querystring encoding  question
Message-Id: <3E7C35DB.E3219B27@gmx.net>

Michael Budash wrote:
> 
> In article <vjfn7vg7tvh6qo6al03pq1ieunu8g8evbk@4ax.com>,
>  Sharon Grant <peakpeek@purethought.com> wrote:
> 
> > On Fri, 21 Mar 2003 00:30:30 GMT, in comp.lang.perl.misc, Michael Budash
> > <mbudash@sonic.net> wrote:
> > >if you're using the CGI perl module:
> > >
> > >my $href = 'application.shtml?email=' .
> > >            CGI::escape($contact_email) .
> > >            '&state=' .
> > >            CGI::escape($state) .
> > >            '&position=' .
> > >            CGI::escape($job_title);
> >
> > '&' is not valid HTML here
> 
> who's talking about html?? and even if we were, '&' is perfectly valid...

No. Try validator.w3.org!
You will get something like
> Line 10, column 24: cannot generate system identifier for general entity "b"
> 
>   <a href="test.pl?a=test&b=test"></a>
>                           ^

> > CGI.pm has a query_string() method, which calls its escape()
> > method, and then constructs the query string ...
> 
> ok - another way, certainly valid and for some, preferable. yes, it
> produces a url that uses ';' to separate the name/value pairs, but '&'
> is also valid, AFAIK...
> 
> or am i missing something here?

You have to use &amp; to validate your page. Or simply trust the CGI
module.
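
To make the two layers of escaping concrete, here is a sketch that
first percent-encodes the values for the URL and then escapes the '&'
separators for use inside an HTML attribute. The url_escape sub is a
hand-rolled stand-in so that the example needs no modules; with CGI.pm
loaded, its escape function would normally do that part.

```perl
use strict;
use warnings;

# Stand-in for CGI::escape: percent-encode everything except
# letters, digits and the characters - . _ ~ (ASCII input assumed).
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

my %param = (
    email    => 'rdodd@xltech.net',
    state    => 'Florida',
    position => 'Analyst/Programmer',
);

# Layer 1: build the query string for the URL itself.
my $query = join '&',
    map { "$_=" . url_escape( $param{$_} ) } qw(email state position);
my $url = "application.shtml?$query";

# Layer 2: inside an HTML attribute, '&' has to be written as '&amp;'.
( my $href = $url ) =~ s/&/&amp;/g;

print "$url\n";
print qq{<a href="$href">Apply</a>\n};
```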

This should be discussed in a html newsgroup (hm... there are lots of
groups with 'html' in the name - I don't know which group fits best for
a f'up)

Wolf



------------------------------

Date: Sat, 22 Mar 2003 05:22:23 +0000
From: Sharon Grant <peakpeek@purethought.com>
Subject: Re: print to file safley
Message-Id: <52rn7v8pg48ru0uo6hhhu2rtcuvec2pu41@4ax.com>

On Fri, 21 Mar 2003 12:43:06 GMT, in comp.lang.perl.misc, "Blnukem" <blnukem@hotmail.com> wrote:

>I'm working on a msg board for my site and I was just wondering if this is
>an acceptable (safe) way to add to the data base without the chance of losing
>any data. One of my other concerns is what if two users hit submit at
>the same time and @new_post is changed to the other user's data? Any ideas?
>
>snipped:
>
>open (REPLY, "<data/msgboard/$FORM{'forum'}/$FORM{'file'}.dat");
>my @new_post = <REPLY> ;
>close(REPLY);
>
>my $posted =
>"$FORM{'file'}|FORM{'poster'}|FORM{'email'}|$date|$time|$FORM{'reply_msg'}\r
>";

Why the "\r" here?

If any of the input fields contains '|' or "\n" your text file 
will break


>push( @new_post, $posted);
>
>open (REPLY, ">data/msgboard/$FORM{'forum'}/$FORM{'file'}.dat");

This is a security risk


>flock(REPLY, 2);
>
>print REPLY @new_post;

I think this will break your text file


>flock(REPLY, 8);
>close(REPLY);

Use the Tie::File module and use its flock() method ...

use Tie::File;
my @new_post;
my $o = tie @new_post, 'Tie::File', '/msgboardpath/msgboardfile'
    or die "Cannot tie message file: $!";
$o->flock;
push (@new_post, $posted);
undef $o;
untie @new_post;


You still need to:
- escape any '|' and "\n" characters in your input data
- not allow user-entered data in your file name
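
If only appending is needed, the same two points can be covered
without Tie::File. The following is a sketch under assumptions: the
append_post name and the escaping scheme are invented for
illustration, and a temporary file stands in for the real message
file, whose path should come from the program and not from form data.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempfile);

# Append one '|'-separated record while holding an exclusive lock.
# The escaping scheme (\\, \p, \n) is an arbitrary choice for this sketch.
sub append_post {
    my ( $path, @fields ) = @_;
    for (@fields) {
        s/\\/\\\\/g;      # backslash      -> \\
        s/\|/\\p/g;       # separator '|'  -> \p
        s/\r?\n/\\n/g;    # line break     -> \n (two characters)
    }
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    flock $fh, LOCK_EX      or die "Cannot lock $path: $!";
    seek $fh, 0, SEEK_END   or die "Cannot seek in $path: $!";
    print {$fh} join( '|', @fields ), "\n";
    close $fh or die "Cannot close $path: $!";    # closing releases the lock
}

my ( undef, $path ) = tempfile();    # stand-in for the real message file
append_post( $path, 'file1', 'joe', 'a|b', "two\nlines" );
append_post( $path, 'file1', 'ann', 'c',   'one line' );
```

Because the file is opened once and locked before writing, there is no
window between reading and rewriting in which another process can
change it.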

-- 
Sharon


------------------------------

Date: 22 Mar 2003 07:40:44 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: print to file safley
Message-Id: <b5h41s$kue$1@nets3.rz.RWTH-Aachen.DE>

Also sprach Tina Mueller:

> Tassilo v. Parseval <tassilo.parseval@rwth-aachen.de> wrote:
>> Also sprach Tina Mueller:
> 
>>> sure, but the OP is reading the file, closing it and opening it again.
>>> even if you do both reading and writing with flock(), in that
>>> time in between the two open()/close() there is *no* lock at all.
> 
>> Well, true, but there is no I/O happening in this program either in
>> these moments. After releasing the lock for a few moments, another
>> process could acquire a fresh lock. So when the program opens the file
>> again it won't get the lock so it will block.
> 
> but "these moments" are relative:

[...]

Sorry, you are of course right. I only focused on the flock()s and not
on the program logic and so overlooked that the OP's program contains a
race-condition by first slurping the file in, closing and reopening it
and after that writing out the new content. Anything could happen in
between.

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Sat, 22 Mar 2003 04:40:50 +0000
From: Sharon Grant <peakpeek@purethought.com>
Subject: Re: Problem using CGI.pm and SSL.
Message-Id: <a9pn7v0vk8rfuiegiuep33nh0do09n9pj2@4ax.com>

On Fri, 21 Mar 2003 15:42:41 +0000 (UTC), in comp.lang.perl.misc, Hemant Shah <shah@typhoon.xnet.com> wrote:

>While stranded on information super highway Hemant Shah wrote:
>
>If I use http://www.mysite.com/tst.pl it displays the header "This is a test"
>If I use https://www.mysite.com/tst.pl I get "Premature end of script headers".
>error.

On Fri, 21 Mar 2003 15:22:04 +0000 (UTC), in comp.lang.perl.misc, Hemant Shah <shah@typhoon.xnet.com> wrote:

>  For testing purposes I soft linked https directory to http directory, so 
>  the file is same

>Did the sys admin miss something while configuring Apache or perl CGI.pm?

It is possible that you can not run CGI scripts from a symlinked 
directory. Is the server Apache? If so, has it been configured to 
use suexec? You might need to check the suexec log

Does the script work if you remove the symlink and put the script 
in the https directory?

It might be simpler to use the same directory for both http and 
https

-- 
Sharon


------------------------------

Date: Sat, 22 Mar 2003 10:15:19 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Problem with Getopt::Std and getopts()....
Message-Id: <ccem7v8dkmvh2c15taul2hb3osjhr007tg@4ax.com>

On 20 Mar 2003 16:36:36 -0800, clearguy02@yahoo.com (John Smith)
wrote:

>Now when I run the following command at CMD,
>
>C:\>test.pl 1

BTW: can you really run it like this (on Win* - as it seems)?!?


Michele
-- 
>It's because the universe was programmed in C++.
No, no, it was programmed in Forth.  See Genesis 1:12:
"And the earth brought Forth ..."
- Robert Israel on sci.math, thread "Why numbers?"


------------------------------

Date: Sat, 22 Mar 2003 10:29:43 +0000
From: Asfand Yar Qazi <im_not_giving_it_here@i_hate_spam.com>
Subject: Putting/accessing an object from an array - help!
Message-Id: <b5hds9$19g$1@news7.svr.pol.co.uk>

Here's the problem:

#!/usr/local/bin/perl -w


package Person;
use strict;

sub new
{
	my $proto = shift();
	my $class = ref($proto) || $proto;
	my $self = {};
	$self->{NAME} = undef;
	$self->{AGE} = undef;
	$self->{PEERS} = [];
	bless($self, $class);
	return $self;
}

sub name
{
	my $self = shift;
	if (@_) { $self->{NAME} = shift; }
	return $self->{NAME};
}

sub age
{
	my $self = shift;
	if (@_) { $self->{AGE} = shift; }
	return $self->{AGE};
}

sub peers
{
	my $self = shift;
	if (@_) { @{ $self->{PEERS} } = @_; }
	return @{ $self->{PEERS} };
}

1;  # package Person;

package main;
use strict;

my @all_recs = [];

my $me = Person->new();
$me->name("Asfand Yar");
$me->age(22);
$me->peers("Naz", "Amjad");

push(@all_recs, $me);

foreach my $i (@all_recs) {
	printf("%s is %d years old.\n", $i->name(), $i->age());
	print("His peers are: ", join(", ", $i->peers()), "\n");
}



And sorry for being so ignorant.

Asfand Yar

asfand at email.com
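
One thing worth checking in the program above is the line
"my @all_recs = [];". The sketch below shows the difference between
initialising an array with square brackets and with parentheses; the
variable names are made up for the illustration.

```perl
use strict;
use warnings;

my @with_brackets = [];    # one element: a reference to an empty array
my @with_parens   = ();    # no elements at all

print scalar(@with_brackets), "\n";      # number of elements
print ref( $with_brackets[0] ), "\n";    # kind of reference stored
print scalar(@with_parens), "\n";
```

With the bracket form, a loop over the array first meets a plain array
reference, on which method calls such as ->name() cannot succeed.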



------------------------------

Date: Sat, 22 Mar 2003 03:18:44 GMT
From: Steve Grazzini <grazz@nyc.rr.com>
Subject: Re: regexp question
Message-Id: <oCQea.1383$iE4.1116926@twister.nyc.rr.com>

Michael Budash <mbudash@sonic.net> wrote:
>"Mike" <mike@luusac.co.uk> wrote:
>> I am trying to read in a file line by line and then 
>> print / assign to an array the line following one matched 
>> by a regexp [...]
>
> while (<FILE>) {
>    if (/#$/) {
>       <FILE>;
>       print;

That probably ought to be

  if (/#$/) {
    print scalar <FILE>;
  }

The angle-bracket operator only sets $_ in a 'while' 
condition.
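
A self-contained sketch of the same technique with an explicit
variable, so nothing depends on when $_ is set. The sample text and
the in-memory filehandle are stand-ins for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data in place of the real file.
my $data = "<start>Hello World#\nThis is what I want\nblah\n<stop>\n"
         . "<start>Hello Again#\nI want this too !\nblah\n<stop>\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @wanted;
while ( my $line = <$fh> ) {
    next unless $line =~ /#$/;    # line ends in '#'
    my $next = <$fh>;             # read the following line explicitly
    last unless defined $next;    # '#' line was the last line of the file
    push @wanted, $next;
}
close $fh;

print @wanted;
```

Note that the line read inside the loop is not itself tested for a
trailing '#', which is fine for the format described in the question.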

-- 
Steve


------------------------------

Date: Sat, 22 Mar 2003 04:32:47 +0100
From: Kostis P <kp@iwerx.com>
Subject: Re: regexp question
Message-Id: <3e7bd9f9$1_1@news.bluewin.ch>

Hi Mike.
This one reads from STDIN but you can modify it to open a file and read 
from there instead.

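my $prev = '';
while (my $line = <>) {
     if ($prev eq '#') {
        print "I got it: $line";
     }
     # remember the last character before the newline for the next pass
     $prev = $line =~ /(.)\n$/ ? $1 : '';
}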

This script assumes that you are on a unix system since the end of each 
line is the "new line" special character.

It will not work unless the last character is really a new line but you 
can modify it to suit your needs if your requirements are different.

Regards...

Mike wrote:
> Hi,
> 
> I am trying to read in a file line by line and then print / assign to an
> array the line following one matched by a regexp, ie
> 
> if I have
> 
> Line 1:<start>Hello World#
> Line 2:This is what I want
> Line 3: blah
> Line 4: blah blah
> Line 5:<stop>
> Line 6:<start>Hello Again#
> Line 7: I want this too !
> Line 8: blah
> Line 9: blah blah
> Line 10:<stop>
> 
> etc
> 
> So that I know the content I want will always be preceded by a line which
> ends in a '#'.  Is it possible to do it this way, or will I have to read in
> / slurp the whole file and then parse it ?
> 
> thanks
> 
> Mike
> 
> 



------------------------------

Date: Sat, 22 Mar 2003 03:50:02 GMT
From: Michael Budash <mbudash@sonic.net>
Subject: Re: regexp question
Message-Id: <mbudash-F4444F.19500121032003@typhoon.sonic.net>

In article <oCQea.1383$iE4.1116926@twister.nyc.rr.com>,
 Steve Grazzini <grazz@nyc.rr.com> wrote:

> Michael Budash <mbudash@sonic.net> wrote:
> >"Mike" <mike@luusac.co.uk> wrote:
> >> I am trying to read in a file line by line and then 
> >> print / assign to an array the line following one matched 
> >> by a regexp [...]
> >
> > while (<FILE>) {
> >    if (/#$/) {
> >       <FILE>;
> >       print;
> 
> That probably ought to be
> 
>   if (/#$/) {
>     print scalar <FILE>;
>   }
> 
> The angle-bracket operator only sets $_ in a 'while' 
> condition.

you are completely correct... and to think i really *did* test my reply 
before posting - i just didn't *see* that the answer was wrong!!


------------------------------

Date: Sat, 22 Mar 2003 10:08:31 -0000
From: "Mike" <mike@luusac.co.uk>
Subject: Re: regexp question
Message-Id: <eHWea.237$842.22@newsfep4-winn.server.ntli.net>

Many thanks for the replies ...

Mike

"Mike" <mike@luusac.co.uk> wrote in message
news:oUOea.405$N73.4421@newsfep4-glfd.server.ntli.net...
> Hi,
>
> I am trying to read in a file line by line and then print / assign to an
> array the line following one matched by a regexp, ie
>
> if I have
>
> Line 1:<start>Hello World#
> Line 2:This is what I want
> Line 3: blah
> Line 4: blah blah
> Line 5:<stop>
> Line 6:<start>Hello Again#
> Line 7: I want this too !
> Line 8: blah
> Line 9: blah blah
> Line 10:<stop>
>
> etc
>
> So that I know the content I want will always be preceded by a line which
> ends in a '#'.  Is it possible to do it this way, or will I have to read in
> / slurp the whole file and then parse it ?
>
> thanks
>
> Mike
>
>




------------------------------

Date: Sat, 22 Mar 2003 12:02:44 +0100
From: "Brendon Caligari" <bcaligari@fubar.fireforged.com>
Subject: Re: regexp question
Message-Id: <3e7c42e7$0$66675$bed64819@news.gradwell.net>


"Mike" <mike@luusac.co.uk> wrote in message
news:oUOea.405$N73.4421@newsfep4-glfd.server.ntli.net...
> Hi,
>
> I am trying to read in a file line by line and then print / assign to an
> array the line following one matched by a regexp, ie
>
> if I have
>
> Line 1:<start>Hello World#
> Line 2:This is what I want
> Line 3: blah
> Line 4: blah blah
> Line 5:<stop>
> Line 6:<start>Hello Again#
> Line 7: I want this too !
> Line 8: blah
> Line 9: blah blah
> Line 10:<stop>
>
> etc
>
> So that I know the content I want will always be preceded by a line which
> ends in a '#'.  Is it possible to do it this way, or will I have to read in
> / slurp the whole file and then parse it ?
>
> thanks
>
> Mike
>

If you are just after extracting some lines out of a log file, doing it
straight from command line may be handy
    perl -n -e 'print $_=<> if /#$/' <filename>
or
    sed -ne '/#$/{n;p;}' <filename>

B




------------------------------

Date: Sat, 22 Mar 2003 09:18:22 GMT
From: "joe" <tunmaster@hotmail.com>
Subject: regexp's to rip html file
Message-Id: <yTVea.933375$Wr.34268576@Flipper>

Hi there!

I was wondering if anyone could help me out here. I'm trying to make my
own sitesearch-script, but I've encountered some problems ripping an html-
file to plain text.

All I want is to get rid of all of the html-tags. That's no problem, but
what about tags like the 'script' one? I don't only want to eliminate that
tag, but also the lines between its opening and closing tags.

This is what I have so far, considering that $joe is the string containing
the html input:

$joe =~ s/<script[^(<\/script>)]*<\/script>//ig;
$joe =~ s/<[^>]*>//ig;
$joe =~ s/\&#?\w{2,6};//ig;
$joe =~ s/\n{2,}/\n/ig;

Is there some easy way to solve this problem or does anyone know of a
module that does the trick for me.

TIA,

Joe




------------------------------

Date: Sat, 22 Mar 2003 09:25:29 GMT
From: tony@svanstrom.com (Tony L. Svanstrom)
Subject: Re: regexp's to rip html file
Message-Id: <1fs7wvf.v217u11fmphp6N%tony@svanstrom.com>

joe <tunmaster@hotmail.com> wrote:

> All I want is to get rid of all of the html-tags. That's no problem,

 Not if you control the format of the incoming HTML it isn't, otherwise
it's impossible to do with a regexp.

> This is what I have sofar,

        [...]

> Is there some easy way to solve this problem or does anyone know of a
> module that does the trick for me.

 <URL: http://search.cpan.org/ >

-- 
# Per scientiam ad libertatem! // Through knowledge towards freedom! #
# Genom kunskap mot frihet! =*= (c) 1999-2002 tony@svanstrom.com =*= #

    perl -e'print$_{$_} for sort%_=`lynx -source svanstrom.com/t`'


------------------------------

Date: Sat, 22 Mar 2003 10:11:06 GMT
From: emcee <res1uzbe@verizon.net>
Subject: Re: regexp's to rip html file
Message-Id: <_EWea.13034$tO3.5311@nwrddc04.gnilink.net>

joe wrote:
> Hi there!
> 
> I was wondering if anyone could help me out here. I'm trying to make my
> own sitesearch-script, but I've encountered some problems ripping an html-
> file to plain text.
> 
> All I want is to get rid of all of the html-tags. That's no problem, but
> what about tags like the 'script' one. I don't only want to eliminate that
> tag, but also the lines between this opening and closing tag.
> 
> This is what I have sofar, considdering that $joe is the string containing
> the html input:
> 
> $joe =~ s/<script[^(<\/script>)]*<\/script>//ig;
> $joe =~ s/<[^>]*>//ig;
> $joe =~ s/\&#?\w{2,6};//ig;
> $joe =~ s/\n{2,}/\n/ig;
> 
> Is there some easy way to solve this problem or does anyone know of a
> module that does the trick for me.
> 
> TIA,
> 
> Joe
> 
> 

how about this: (assuming the html is in $_)

s|<script[^>]*.*?</script[^>]*||g;



------------------------------

Date: Sat, 22 Mar 2003 11:00:35 GMT
From: "joe" <tunmaster@hotmail.com>
Subject: Re: regexp's to rip html file
Message-Id: <nnXea.937397$Wr.34412851@Flipper>

"emcee" <res1uzbe@verizon.net> wrote in message
news:_EWea.13034$tO3.5311@nwrddc04.gnilink.net...
> joe wrote:
> > Hi there!
> >
> > I was wondering if anyone could help me out here. I'm trying to make my
> > own sitesearch-script, but I've encountered some problems ripping an
> > html-file to plain text.
> >
> > All I want is to get rid of all of the html-tags. That's no problem, but
> > what about tags like the 'script' one. I don't only want to eliminate
> > that tag, but also the lines between this opening and closing tag.
> >
> > This is what I have so far, considering that $joe is the string
> > containing the html input:
> >
> > $joe =~ s/<script[^(<\/script>)]*<\/script>//ig;
> > $joe =~ s/<[^>]*>//ig;
> > $joe =~ s/\&#?\w{2,6};//ig;
> > $joe =~ s/\n{2,}/\n/ig;
> >
> > Is there some easy way to solve this problem or does anyone know of a
> > module that does the trick for me.
> >
> > TIA,
> >
> > Joe
> >
> >
>
> how about this: (assuming the html is in $_)
>
> s|<script[^>]*.*?</script[^>]*||g;
>

s|<script[^>]*.*?</script[^>]*>||g;   (I added a greater-than character at
the end of the regexp)

Your solution works great, thanks a lot! But can you explain why the
question mark in the middle makes such a big difference? I always thought
Perl was greedy and that it would eliminate all scripts and other content
between the very first opening tag and the very last closing tag, but this
question mark prevents it from committing this sin.
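
On the greedy question, a minimal sketch: a quantifier such as .* is greedy
by default and takes as much as it can while still letting the rest of the
pattern match, whereas .*? takes as little as it can. The sample string and
variable names below are made up for illustration. The /s and /i flags and
the ">" after the first [^>]* are additions not in the thread, assumed here
so that the pattern spans newlines and also matches <SCRIPT>.

```perl
use strict;
use warnings;

# Hypothetical input with two script blocks and wanted text between them.
my $html = "<p>a</p><script>one()</script><p>b</p>"
         . "<SCRIPT type=\"x\">\ntwo()\n</SCRIPT><p>c</p>";

# Greedy: .* runs to the end of the string and backtracks to the LAST
# closing tag, so "<p>b</p>" between the two blocks is deleted as well.
(my $greedy = $html) =~ s|<script[^>]*>.*</script[^>]*>||gis;

# Non-greedy: .*? stops at the FIRST closing tag after each opening tag,
# so each script block is removed on its own.
(my $lazy = $html) =~ s|<script[^>]*>.*?</script[^>]*>||gis;

print "greedy: $greedy\n";   # greedy: <p>a</p><p>c</p>
print "lazy:   $lazy\n";     # lazy:   <p>a</p><p>b</p><p>c</p>
```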





------------------------------

Date: 21 Mar 2003 21:22:30 -0800
From: dwlepage@yahoo.com (david)
Subject: Re: Text::ParseWords or Text::CSV
Message-Id: <b09a22ae.0303212122.24d21002@posting.google.com>

Benjamin Goldberg <goldbb2@earthlink.net> wrote in message news:<3E79F041.1952E365@earthlink.net>...
> david wrote:
> > 
> > Does anyone have experience with either of these two modules? Im
> > trying to figure out how to parse a comma delimited file, but the
> > problem is certain fields also have commas so when the file is
> > exported to text format, of course it now looks like it has more
> > fields. I want to remove the comma's from the fields that have them:
> > 
> > dlepage,engineer,mn,"infrastrucuture",6-5-02
> > aidan,support,va,"infrastructure,engineering", 3-1-96
> > 
> > I put quotes around the actual 'field 3', but the actual file I want
> > to work on does NOT have the quotes:
> > 
> > dlepage,engineer,mn,infrastrucuture,6-5-02
> > aidan,support,va,infrastructure,engineering, 3-1-96
> > 
> > i.e.
> > field0=aidan
> > field1=support
> > field2=va
> > field3=infrastructure,engineering
> > field4=3-1-96
> 
> while( <IN> ) {
>    my ($field0, $field1, $field2, @fields34) = split /,/, $_, -1;
>    my $field4 = pop @fields34;
>    my $field3 = join(",", @fields34);
> }
> 
> [untested]

Everyone, thanks for the ideas. I have it down for the most part, but
there is one thing I'm stuck on. Please see what I have:

#!/bin/perl -w
use strict;

my (@fields, $fields, $field3,@test, @out, $field5);
open (IN, "<parsetext.txt") || die "Cannot open $!\n";
open (OUT, ">parseout.txt") || die "Cannot open $!\n";

while ( <IN> ) {
        my ($field0, $field1, $field2, @fields34) = split /,/, $_, -1;
        my $field4 = @fields34;
        my $field3 = join("+", @fields34);
        @test = ($field0, $field1, $field2, "$field3\n");
        push (@out, join(",", @test));
}


print @out;

close(OUT);
close(IN);

Now if my data looks like:
dlepage,engineer,mn,infrastrucuture
aidan,support,va,infrastructure,engineering, 3-1-96

I get:

dlepage,engineer,mn,infrastrucuture

aidan,support,va,infrastructure+engineering+3-1-96

when I really want:

dlepage,engineer,mn,infrastrucuture
aidan,support,va,infrastructure+engineering,3-1-96

where + could be any delimiter, except ,

It is obviously because I am loading the remaining fields into an
array. Is there a better way to get around this?


------------------------------

Date: Sat, 22 Mar 2003 03:38:35 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Text::ParseWords or Text::CSV
Message-Id: <3E7C210B.EED370DA@earthlink.net>

david wrote:
> Benjamin Goldberg wrote:
[snip]
> >    my ($field0, $field1, $field2, @fields34) = split /,/, $_, -1;
> >    my $field4 = pop @fields34;
> >    my $field3 = join(",", @fields34);
[snip]
>      my ($field0, $field1, $field2, @fields34) = split /,/, $_, -1;
>      my $field4 = @fields34;
>      my $field3 = join("+", @fields34);
[snip]

Notice something missing?
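
For readers following along, a minimal sketch of the difference: the first
version uses pop, which removes the last element of @fields34 and returns
it, while the second assigns the array in scalar context, which yields the
element count and leaves the date in the array to be joined. The sketch
assumes every line ends with the date field, uses the two sample lines from
the earlier post in place of the input file, and adds a chomp and a
whitespace trim that are not in the thread.

```perl
use strict;
use warnings;

# Sample lines from the earlier post, standing in for the input file.
my @lines = (
    "dlepage,engineer,mn,infrastrucuture,6-5-02\n",
    "aidan,support,va,infrastructure,engineering, 3-1-96\n",
);

my @out;
for my $line (@lines) {
    chomp $line;                       # drop the newline before splitting
    my ($field0, $field1, $field2, @fields34) = split /,/, $line, -1;

    # pop takes the date off the end; "my $field4 = @fields34" would
    # store the number of elements and leave the date in the array.
    my $field4 = pop @fields34;
    $field4 =~ s/^\s+//;               # trim the space before " 3-1-96"

    my $field3 = join '+', @fields34;  # what remains is field 3
    push @out, join ',', $field0, $field1, $field2, $field3, $field4;
}

print "$_\n" for @out;
# dlepage,engineer,mn,infrastrucuture,6-5-02
# aidan,support,va,infrastructure+engineering,3-1-96
```

A line with only four fields would put its last field into $field4 and leave
$field3 empty, so such input needs a separate rule.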

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$@[$a%6
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}


------------------------------

Date: Sat, 22 Mar 2003 04:37:10 +0100
From: Kostis P <kp@iwerx.com>
Subject: Re: Working with Objects in Perl.
Message-Id: <3e7bdb00$1_4@news.bluewin.ch>

Hmm.
Could it be that this is actually a read-only property you are trying to set?

Read-only properties are very common in Windows objects.

Regards...

Lax wrote:
> I'm a Perl newbie working on object-oriented Perl. 
> Could somebody please help with this. 
> 
> I'm trying to set a property of an object. 
> I'm able to get the value of an object's property, but am unable to
> set it.
> I get the errors : "Can't modify non-lvalue subroutine call at
> <line-number>"
> 
> 
> 
> ============================================
> use Win32::OLE; 
> $getMSI = $ARGV[0] ; 
> 
> Win32::OLE::CreateObject("WindowsInstaller.Installer",$ins) ||
>     die "Cant create object : $! !!\n"  ;
> $database = $ins->OpenDatabase($getMSI, 0) ||
>     die "Cant open connection : $!\n" ;
> 
> $summaryInfo = $database->SummaryInformation(17) ; 
> $getProperty = $summaryInfo->Property(4) ; 
> printf "Property: $getProperty\n" ; 
> 
> # I need to set $summaryInfo's Property(4) to a new value here.
> 
> $getDB->Persist() ; 
> 
> =============================================
> 
> "SummaryInformation" is a property of "database" object. 
> "$summaryInfo" is the "SummaryInfo" object. 
> I access the "Property" property of this($summaryInfo) object and am
> able to get its value.
> But am unable to set its value. 
> 
> Thanks in advance,
> Lax



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4743
***************************************

