[15697] in Perl-Users-Digest


Perl-Users Digest, Issue: 3110 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun May 21 06:05:26 2000

Date: Sun, 21 May 2000 03:05:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <958903511-v9-i3110@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Sun, 21 May 2000     Volume: 9 Number: 3110

Today's topics:
    Re: Comparing variables in a conditional "if" statement <godzilla@stomp.stomp.tokyo>
    Re: Comparing variables in a conditional "if" statement <brian@bluecoat93.org>
    Re: Comparing variables in a conditional "if" statement <lr@hpl.hp.com>
    Re: Comparing variables in a conditional "if" statement <anmcguire@ce.mediaone.net>
        Is this the right behaviour for Net::SMTP/Cmd? <sweeheng@usa.net>
        PPM Packages - Documentation Issues <evilbeaver.picksoft@NOSPAMzext.net>
    Re: PPMFIX Doesn't Fix Anything (Steve A. Taylor)
    Re: printing all variables! <bwalton@rochester.rr.com>
    Re: regexes *sigh* damn I hate these things <uri@sysarch.com>
    Re: regexes *sigh* damn I hate these things <anmcguire@ce.mediaone.net>
    Re: regexes *sigh* damn I hate these things <nospam@devnull.com>
    Re: regexes *sigh* damn I hate these things <uri@sysarch.com>
    Re: regexes *sigh* damn I hate these things <nospam@devnull.com>
    Re: regexes *sigh* damn I hate these things <uri@sysarch.com>
    Re: regexes *sigh* damn I hate these things <Tbone@pimpdaddy.com>
    Re: regexes *sigh* damn I hate these things <nospam@devnull.com>
        SETUID problem (maybe lame) <maciek@treko.net.au>
        SSI in Perl Script ? <lancelotboyle@hotmail.com>
    Re: ugly mysql call <makarand_kulkarni@My-Deja.com>
        Untaint URL character class bbfrancis@networld.com
        updated : Re: regexes *sigh* damn I hate these things <nospam@devnull.com>
        valid email address <webmaster@momsathome.on.ca>
    Re: valid email address <evilbeaver.picksoft@NOSPAMzext.net>
    Re: valid email address <webmaster@momsathome.on.ca>
    Re: valid email address <uri@sysarch.com>
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 20 May 2000 22:21:29 -0700
From: "Godzilla!" <godzilla@stomp.stomp.tokyo>
Subject: Re: Comparing variables in a conditional "if" statement..PLEASE HELP!!
Message-Id: <39277259.EEB9666@stomp.stomp.tokyo>

Pasquale wrote:
 
> Can anyone tell me what is wrong with this?  

(snip)

> Any suggestions would be GREATLY appreciated!!

> open(ADR, "< adr.log");
> flock(ADR, 2);
> @adrs = <ADR>;
> flock(ADR, 8);
> close(ADR);
 
> foreach $addr(@adrs) {
> if ($address eq $addr) {
> &existerr;
> }
> }

Hi Pasquale,

With not being able to look at your data
base for addresses, an educated guess is
you need to 'chomp' off the newline symbol
\n for each element to get a match.

Quite often, if a data base is multi-line,
there is an invisible \n at the end of each
line we tend to forget. This is very common
and nothing to fret over.

I have a very simple bit of code for you
to play with, to modify, do whatever you
like if you so choose. Don't be offended,
I have changed a few minor things here
and there to help you stay within normal
standards for Perl, in 'looks' only.

Your lock and unlock have been left out
to keep this simple. Chances are good you
don't need those unless your site is very
busy, like thousands of hits per day. Try
your script without for testing, add your
lock and unlock back in later.

As for array names, and this is strictly a
personal choice, they usually start with a
capital letter to make the name stand out
better, for clarity. The same 'style' applies
to a sub-routine: most people capitalize
the first letter. No biggie. Do it your
way for sure. It's your program!

The < in your open is not really needed
for just a 'read'. It can be left out
if you like.

This simple script will show you a few
methods which might help you. I am betting
you need to 'chomp' to get a match! Look
this over, upload it, give it a try. Play
with it for sure. Experimentation will
teach you a lot.

Hmm... let's see. I added 'last;' in your
loop to kick you out of the loop once a
match is found. Remove this if you need
to find multiple matches for some reason.
I made your sub-routine Existerr a simple
print so you can see all is working.

Pretty simple stuff. Script is first, contents
of 'adr.log' are next and printed results are
last. Compare this code to your own, make a few
changes, see what happens! Don't forget to change
the Perl location on your first line.

I wish you luck,

Godzilla!



Test script:
_____________


#!/usr/local/bin/perl

print "Content-Type: text/plain\n\n";

open (ADR, "adr.log");
@Adrs = <ADR>;
close(ADR);

# $address is pretend input

$address = "123 Main St.";

foreach $addr (@Adrs) 
 {
  chomp ($addr);
  if ($address =~ /$addr/) 
   {
    &Existerr ($addr);
    last;
   }
 }

sub Existerr
 { print "Input address matches $addr"; }


Contents of adr.log:
________________________

45 Smith Wesson St.
350 Mako Shark Ave.
123 Main St.
911 Porsche Blvd.
345 My St.
789 Six Afraid Of Seven Drive.


Printed results:
_________________________

Input address matches 123 Main St.


------------------------------

Date: Sun, 21 May 2000 02:01:07 -0400
From: "Brian Landers" <brian@bluecoat93.org>
Subject: Re: Comparing variables in a conditional "if" statement..PLEASE HELP!!
Message-Id: <0YKV4.3148$ct.40407@news4.atl>

"Godzilla!" <godzilla@stomp.stomp.tokyo> wrote in message
news:39277259.EEB9666@stomp.stomp.tokyo...

>Your lock and unlock have been left out
>to keep this simple. Chances are good you
>don't need those unless your site is very
>busy, like thousands of hits per day. Try
>your script without for testing, add your
>lock and unlock back in later.

"site"? what site? the OP said nothing about running a CGI script! You seem
to have difficulty with this concept. Taking the flock calls out for
simplicity in an example is one thing. Making a misleading statement like
the one above is irresponsible. The fact that the OP used flock to begin
with shows he/she is aware of concurrency issues, where you apparently are
not.

> #!/usr/local/bin/perl

no -w or use strict

> print "Content-Type: text/plain\n\n";

Nothing in what the OP wrote indicates he/she was looking for a CGI script.

> open (ADR, "adr.log");

didn't bother to test whether the open succeeded or not.

[snip]

>   if ($address =~ /$addr/)

what happens if $addr contains meta-characters?

>    {
>     &Existerr ($addr);
>     last;
>    }
>  }
> sub Existerr
>  { print "Input address matches $addr"; }

you pass $addr as a parameter to the sub, then access it as a global var?
sloppy
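
A minimal sketch of those last two points, assuming the intent is to treat
$addr as literal text. The sub names are made up for illustration; \Q...\E
quotes any meta-characters, and the sub reads its argument from @_ rather
than from a global:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $input contains $addr as literal text.
sub contains_literally {
    my ($input, $addr) = @_;
    return $input =~ /\Q$addr\E/ ? 1 : 0;
}

# The sub uses the parameter it was handed, not a global variable.
sub exist_err {
    my ($addr) = @_;
    return "Input address matches $addr";
}

my $addr = '123 Main St.';

# Unquoted, the trailing '.' is a wildcard, so "Stx" matches as well.
print "unquoted pattern matches\n" if '123 Main Stx' =~ /$addr/;
print "quoted pattern matches\n"   if contains_literally('123 Main Stx', $addr);
print exist_err($addr), "\n"       if contains_literally('123 Main St.', $addr);
```

Quoting also keeps an entry such as 'Apt (2' from being compiled as a
broken pattern.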

All in all, a typical sloppy response.

Cheers,
Brian





------------------------------

Date: Sat, 20 May 2000 23:25:20 -0700
From: Larry Rosler <lr@hpl.hp.com>
Subject: Re: Comparing variables in a conditional "if" statement..PLEASE HELP!!
Message-Id: <MPG.13912123392ef31f98aaab@nntp.hpl.hp.com>

In article <39275AD5.DF472643@geocities.com>, destiny30@telusplanet.net 
says...
> Can anyone tell me what is wrong with this?  How can I get the script to
> check if the incoming address already exists in the log file?  I've also
> tried matching with regular expressions instead of comparing the
> variables, but that doesn't work.  The closest I've come to this working
> is when I use the "= =" instead of the "eq".  It will tell me the
> address already exists if I try to add it again, but it will also tell
> me it exists if I input an address with the same house number but
> different street.  Any suggestions would be GREATLY appreciated!!

You can use lots of help!  I hope you are reading a good introductory 
book or tutorial.

First of all, ALWAYS start your programs this way:

    #!/usr/local/bin/perl -w
    use strict;

Had you done so, you would have discovered the serious error of comparing 
strings by using '==' instead of 'eq'.

> open(ADR, "< adr.log");

ALWAYS check the results of a request to the operating system, such as 
opening a file.

  open ADR, 'adr.log' or die "Couldn't open 'adr.log'. $!\n";

> flock(ADR, 2);

ALWAYS check the results of a request to the operating system, such as 
locking a file.  But in most operating systems you cannot lock a file 
opened only for reading, nor would you want to.

> @adrs = <ADR>;

Reading an entire file into an array when all you are doing is line-at-a-
time processing is wasteful of memory.

> flock(ADR, 8);

Don't unlock a locked file; let the close() do it.  But this file isn't 
locked anyway.

> close(ADR);
> 
> foreach $addr(@adrs) {

The problem is here.  Every element of the array has a terminating 
newline, which isn't in the comparison string.  Read this for more 
information:  perldoc -f chomp

> if ($address eq $addr) {
> &existerr;
> }

Indent code within a loop; indent code within a block.  Visual program 
structure is an important aid to understanding.

> }

Untested:

    #!/usr/local/bin/perl -w
    use strict;

    my $address = 'what we are looking for';

    open ADR, 'adr.log' or die "Couldn't open 'adr.log'. $!\n";
    while (<ADR>) {
        chomp;
        if ($address eq $_) {
            &existerr;
            last; # Why keep reading the file?
        }
    }
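
A variant of the same idea, also only a sketch: the search is wrapped in a
sub that takes an already-open handle, so it can be tried without a real
adr.log. The sub name and the in-memory file are assumptions for
illustration, and a Perl recent enough for lexical and in-memory
filehandles (5.8 or later) is assumed:

```perl
use strict;
use warnings;

# Hypothetical helper: scan an open handle one line at a time and report
# whether any line equals $address exactly.
sub address_in_log {
    my ($fh, $address) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        return 1 if $line eq $address;   # stop reading at the first hit
    }
    return 0;
}

# Stand-in for adr.log: an in-memory file, so the sketch runs anywhere.
my $log = "45 Smith Wesson St.\n123 Main St.\n345 My St.\n";
open my $fh, '<', \$log or die "Couldn't open in-memory log: $!\n";
print "already on file\n" if address_in_log($fh, '123 Main St.');
close $fh;
```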

-- 
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: Sun, 21 May 2000 01:26:26 -0500
From: "Andrew N. McGuire " <anmcguire@ce.mediaone.net>
Subject: Re: Comparing variables in a conditional "if" statement..PLEASE HELP!!
Message-Id: <Pine.LNX.4.20.0005210120560.16683-100000@hawk.ce.mediaone.net>

On Sun, 21 May 2000, Brian Landers wrote:

+"Godzilla!" <godzilla@stomp.stomp.tokyo> wrote in message
+news:39277259.EEB9666@stomp.stomp.tokyo...

[ snip ]

+>   if ($address =~ /$addr/)
+
+what happens if $addr contains meta-characters?

Even worse, what happens if:

$address eq '4123 Main St.' && $addr eq '123 Main St.'

You get a pattern match, and erroneous output from
your program, as one address is a substring of the other,
and her pattern match is not anchored on anything.
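
A small sketch of the difference, with a made-up helper name. Even with the
meta-characters quoted, an unanchored match still accepts a substring,
while a plain eq after chomp does not:

```perl
use strict;
use warnings;

# Hypothetical helper: exact comparison after stripping the newline.
sub same_address {
    my ($address, $line) = @_;
    chomp $line;
    return $address eq $line ? 1 : 0;
}

my $address = '4123 Main St.';
my $line    = "123 Main St.\n";

(my $addr = $line) =~ s/\n\z//;
print "unanchored regex: false positive\n" if $address =~ /\Q$addr\E/;
print "eq: no match, as intended\n"        unless same_address($address, $line);
```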

Best Wishes,

anm
-- 
/*-------------------------------------------------------.
| Andrew N. McGuire                                      |
| anmcguire@ce.mediaone.net                              |
`-------------------------------------------------------*/



------------------------------

Date: Sun, 21 May 2000 17:16:00 +0800
From: "Swee Heng" <sweeheng@usa.net>
Subject: Is this the right behaviour for Net::SMTP/Cmd?
Message-Id: <8g88vl$7v7$1@clematis.singnet.com.sg>

Consider the following code extract:
  use Net::SMTP;
  ...
  $smtp = Net::SMTP->new('mailhost');
  ...
  $smtp->mail('me');
  $smtp->to('you');
  $smtp->data;
  $smtp->datasend($CONTENT);
  $smtp->dataend;
  ...
where $CONTENT is a string. Note: $smtp->quit is not called yet.

A problem occurs if $CONTENT = ''. In Net::Cmd::datasend(), the "return 1
unless length($line)" makes datasend() exit without defining
${*$cmd}{'net_cmd_lastch'}. (Which seems right since there is no last
character to speak of.)

Consequently the "return 1 unless(exists ${*$cmd}{'net_cmd_lastch'})"
statement in Net::Cmd::dataend() returns without sending the SMTP server a
"<CR><LF>.<CR><LF>". The server is left waiting...
QUESTION: Is this the right behaviour for Net::SMTP/Cmd to implement?

Arguably, one should perhaps not send envelopes with empty content. However,
from what I recall of RFC 821/822, there seems to be no such requirement.
Indeed, my mail daemon (Postfix) accepts via "telnet localhost 25" the
following transaction:

< MAIL FROM: me
> 250 Ok
< RCPT TO: you
> 250 Ok
< DATA
> 354 End data with <CR><LF>.<CR><LF>
< .
> 250 Ok: queued as B1F1140AE

I believe the same is true of sendmail. Which is strange, since the dot to
end DATA has no preceding <CR><LF> and is thus not RFC compliant.
QUESTION: Are Postfix and sendmail not strictly enforcing RFC compliance
here?

Perhaps the telnet program magically pads <CR><LF> before and after the dot.
QUESTION: If that's the case, why doesn't Net::SMTP do so when $CONTENT=''?
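
To make the question concrete, here is a sketch of what I would expect a
client to write after the 354 reply, whatever the content. This is not
Net::Cmd's code; smtp_data_payload is a made-up helper. It assumes one
reading of the RFC 821 description of DATA (data ends with "a line
containing only a period"), under which the <CR><LF> that ends the DATA
command line can serve as the leading <CR><LF> when there is no content:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::SMTP or Net::Cmd): build the bytes
# a client would write after the server's 354 reply.
sub smtp_data_payload {
    my ($body) = @_;
    my $wire = '';
    if (length $body) {
        $body =~ s/\r?\n/\r\n/g;                    # normalise line ends to CRLF
        $body .= "\r\n" unless $body =~ /\r\n\z/;   # last line must end in CRLF
        $body =~ s/^\./../mg;                       # dot-stuffing for leading dots
        $wire = $body;
    }
    return $wire . ".\r\n";   # the terminator is written even for empty content
}

for my $body ('', "Hello\n.hidden dot\n") {
    (my $shown = smtp_data_payload($body)) =~ s/\r\n/<CR><LF>/g;
    print "$shown\n";
}
```

With empty content the helper still emits the dot line, which is the step
that appears to be skipped in the case described above.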

Either Net::SMTP is wrong, the MTAs are wrong or I am beyond redemption. :)
Thanks in advance for answering my 3 questions.

Swee Heng




------------------------------

Date: Sun, 21 May 2000 05:18:52 GMT
From: "The Evil Beaver" <evilbeaver.picksoft@NOSPAMzext.net>
Subject: PPM Packages - Documentation Issues
Message-Id: <0lKV4.217696$Kv2.314610@quark.idirect.com>

I am currently creating a PPM package for a Perl module. I've copied the
instructions of the PPM FAQ on making packages, but for the life of me
cannot get the documentation for the module to show up in the menu. I can't
find anything on making packages outside the PPM FAQ, so if anyone has any
experience putting together these packages, HELP!

Thank you.

--
Christopher S. Charabaruk (EvilBeaver) <evilbeaver.picksoft@zext.net>
BlakLight Software <http://picksoft.zext.net/>





------------------------------

Date: Sun, 21 May 2000 04:58:33 GMT
From: an400@freenet.carleton.ca (Steve A. Taylor)
Subject: Re: PPMFIX Doesn't Fix Anything
Message-Id: <39276ba7.36131426@news.ncf.carleton.ca>

On Sat, 20 May 2000 10:52:32 GMT, bart.lateur@skynet.be (Bart Lateur)
wrote:

>mschore@mindspring.com wrote:
>
>>Does anyone understand the following behavior:
>>J:\Perl>ppmfix
>>
>>J:\Perl>ppm verify --upgrade --location=. PPM
>>Failed to load PPM_DAT file

Start reading (for buried treasure). The ppm.html file explains that
PPM_DAT is an env-var holding the location of ppm.xml, defaulting to
current directory. 

So, with Win95 (not Win2k), that error stopped when I copied/moved
the file into the ppm.bat directory. Things really got better when I
just 'verified' my ppm module to get the latest version.

Try that.

>
>Eh? This file doesn't even exist.
>
>>Error verifying PPM: Package 'PPM' has not been installed by PPM
>>
>>I unzipped the fix file directly to the Perl directory.
>
>Oops. You shouldn't. You'd better unzip it into a temporary directory,
>while maintaining the directory structure in the ZIP file.
>
>>I just did a fresh install of Perl on a Win 2K machine and I am seeing
>>behavior not seen before. PPM is totally inoperative as well. I was
>hoping that the "fix" would solve this problem.
>
>No, PPM ought to work without any fix. It's not *that* broken.
>
>At least I hope you did a reboot after installing Perl? The only reason
>is that the bin directory of your Perl installation should be in the
>PATH environment variable. It is in AUTOEXEC.BAT, but not yet in PATH
>without a reboot.
>
>If you add that directory to PATH in a DOS window, chdir to the
>directory containing PPMFIX.BAT, and run it from there, then it ought to
>work, too.
>
>I haven't tried it on Win2k, though. I don't have a copy of that
>monster.
>
>-- 
>	Bart.



------------------------------

Date: Sun, 21 May 2000 04:38:19 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: printing all variables!
Message-Id: <39276744.58ECB8C6@rochester.rr.com>

Baris wrote:
> 
> Hello,
> it is a cgi program (not written by me). Is it possible to apply a similar
> strategy to it?

Sure.  Just fake out the environment variables and standard input coming
into the program from the server, and debug away.  If the program uses
the CGI module, it is even easier, since you can specify the parameters
on the command line.
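
As a rough sketch of the first approach: the sub names and the query string
below are made up for illustration, and the small parser merely stands in
for whatever the real program does with its input.

```perl
use strict;
use warnings;

# Stand-in for the part of a CGI program that reads its GET parameters.
sub query_params {
    my %param;
    for my $pair (split /[&;]/, $ENV{QUERY_STRING} // '') {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # decode %XX escapes
        }
        $param{$key} = $value;
    }
    return %param;
}

# Fake what the web server would have provided, for this call only.
sub run_with_fake_request {
    my ($query) = @_;
    local $ENV{REQUEST_METHOD} = 'GET';
    local $ENV{QUERY_STRING}   = $query;
    return query_params();
}

my %p = run_with_fake_request('name=Baris&city=New+York');
print "$_ = $p{$_}\n" for sort keys %p;
```

The same variables can be set in the shell before starting the script
under the debugger.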
--
Bob Walton

 ...


------------------------------

Date: Sun, 21 May 2000 04:33:59 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <x7zopked3e.fsf@home.sysarch.com>

>>>>> "G" == Godzilla!  <godzilla@stomp.stomp.tokyo> writes:

  G> #!/usr/local/bin/perl

  G> print "Content-Type: text/plain\n\n";

one day you may actually need to run perl on something other than a web
server. you do realize that most perl programs are NOT cgi? and always
wrapping your code in a cgi is misleading and harder to debug.

  G> $input = "
  G> <p>Name:  <a href=\"http://www.planetunreal.com/dl/nc.asp?nalicity/
                       ^
  G> utdm/dm-distinctive.zip\">DM-Distinctive</a><br>
                            ^
  G> Author:<a href=\"mailto:bastiaan_frank\@hotmail.com\">Bastiaan
                    ^                                   ^
  G> Frank</a><br>
  G> Rating: (1-10) 9.5</p><!-- add correct image name below here -->
  G> <img align=\"right\" border=\"0\" hspace=\"10\" vspace=\"10\"
                ^      ^         ^  ^         ^   ^         ^   ^
  G> width=\"231\" height=\"173\" 
           ^    ^         ^    ^
  G> src=\"dm-distinctive.jpg\">";
         ^                   ^

learn to use a here doc already. long quoted strings with plenty of
escaped quotes are ugly and hard to read. maybe you should try coding
perl for comprehension?

  G> if ($input =~ / ([0-9\.]+)/)

0-9 is better said as \d. and . is not special in a char class.
so that should be:

	if ($input =~ / ([\d.]+)/)

and you didn't 

  G>  { $input = $1; }

and why are you assigning the number back to $input? even in a short
example that is a mighty dumb choice and in production it would probably
be even worse.

  G> exit;

try doing that already.
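
something like this, as a rough sketch. the html is trimmed from the
original post and first_number is just a name i made up:

```perl
use strict;
use warnings;

# Single-quoted here-doc: no interpolation, so quotes and @ need no escaping.
my $input = <<'END_HTML';
Author:<a href="mailto:bastiaan_frank@hotmail.com">Bastiaan Frank</a><br>
Rating: (1-10) 9.5</p><!-- add correct image name below here -->
<img align="right" border="0" src="dm-distinctive.jpg">
END_HTML

# Return the first space-preceded run of digits and dots, or undef.
sub first_number {
    my ($text) = @_;
    return $text =~ / ([\d.]+)/ ? $1 : undef;
}

# Keep the result in its own variable instead of overwriting $input.
my $rating = first_number($input);
print "rating: $rating\n" if defined $rating;
```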

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: Sun, 21 May 2000 00:42:42 -0500
From: "Andrew N. McGuire " <anmcguire@ce.mediaone.net>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <Pine.LNX.4.20.0005210033560.16471-100000@hawk.ce.mediaone.net>

On 21 May 2000, The WebDragon wrote:

+ok, I need help creating a regex to extract a number from a bit of html

[ snip ]

+I need to extract the number AFTER the Rating: (1-10) and before the </p>, which 
+can be any number from 0 to 10 in .5 increments 
+
+there may be a 0, or an 8.5 or a 10 or a 5 or a 5.5 etc. 
+
+I need ONLY that number. 
+
+Anyone up to the task?

#!/usr/bin/perl -w

use strict;

while (<DATA>)
{
    print "$1\n"
        if m|^Rating: \(1-10\) (\d*?\.*[05]*)</p><!--.*?-->|;
}

__DATA__
<p>Name:  <a href="http://www.planetunreal.com/dl/nc.asp?nalicity/              
utdm/dm-distinctive.zip">DM-Distinctive</a><br>
Author:<a href="mailto:bastiaan_frank@hotmail.com">Bastiaan
Frank</a><br>
Rating: (1-10) 19.6</p><!-- add correct image name below here -->

<img
align="right" border="0" hspace="10" vspace="10" width="231" height="173"
src="dm-distinctive.jpg">

The above works for me, it will cover cases 0, 0.0, 0.5, and .5.
It will not cover 0.6.  I am not a regex guru, so although it works
there is probably a better way.  Oh yeah, it will match 19.5, although
this is easy to change, didn't know if you were limited at 10 or not.
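
A tighter pattern, as a sketch only. extract_rating is a made-up name, and
I am assuming ratings really do run from 0 to 10 in .5 steps as described:

```perl
use strict;
use warnings;

# Hypothetical helper: return the rating text, or undef if the line does
# not hold a value from 0 to 10 in .5 steps.
sub extract_rating {
    my ($line) = @_;
    return $1
        if $line =~ m{Rating:\s*\(1-10\)\s*(10(?:\.0)?|\d(?:\.[05])?|\.5)\s*<};
    return;
}

for my $line ('Rating: (1-10) 9.5</p>',
              'Rating: (1-10) 7.5 <p></p>',
              'Rating: (1-10) 19.6</p>') {
    my $rating = extract_rating($line);
    print defined $rating ? "$rating\n" : "no match\n";
}
```

The caller tests with defined, since a rating of 0 is a valid result that
happens to be false.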

HTH,

anm
-- 
/*-------------------------------------------------------.
| Andrew N. McGuire                                      |
| anmcguire@ce.mediaone.net                              |
`-------------------------------------------------------*/



------------------------------

Date: 21 May 2000 06:09:25 GMT
From: The WebDragon <nospam@devnull.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <8g7uil$6t6$0@216.155.32.125>

In article <8g7gtn$1pl$1@wauna.tc.fluke.com>, wjones@tc.fluke.COM (Warren 
Jones) wrote:

 | The WebDragon <nospam@devnull.com> writes:
 | 
 | > ok, I need help creating a regex to extract a number from a bit of html
 | > 
 | > here's a snip of the html
 | > 
 | > <p>Name:  <a href="http://www.planetunreal.com/dl/nc.asp?nalicity/
 | > utdm/dm-distinctive.zip">DM-Distinctive</a><br>
 | > Author:<a href="mailto:bastiaan_frank@hotmail.com">Bastiaan 
 | > Frank</a><br>
 | > Rating: (1-10) 9.5</p><!-- add correct image name below here -->
 | > <img align="right" border="0" hspace="10" vspace="10" width="231" 
 | > height="173" 
 | > src="dm-distinctive.jpg">
 | > 
 | > I need to extract the number AFTER the Rating: (1-10) and before the 
 | > </p>, which 
 | > can be any number from 0 to 10 in .5 increments 
 | > 
 | > there may be a 0, or an 8.5 or a 10 or a 5 or a 5.5 etc. 
 | > 
 | > I need ONLY that number. 
 | 
 | Don't hate regular expressions.
 | Regular expressions are your friend.
 | 
 |      print "$1\n" if /Rating: \(1-10\)\s+(1?\d(\.[05])?)\s*<\/p>/;
 | 
 | Hmmm ... this will accept 10.5, but it won't accept .5 (no leading 0).
 | If that's a problem, hope this still gives you enough of a hint
 | at how to complete the solution.

I solved it like this :

FILE: foreach my $filez (sort @fileslist) {
    my @filedata = ();
    my $inputfile = File::Spec->catfile( $inputDir, $filez);
    open(IN, "<$inputfile");
        chomp(@filedata = <IN>);
    close(IN);
    foreach (@filedata) {
        #search the lines and look for the "Rating:" phrase
        if (/Rating:*/) {
            #grab the line that contains it
            my $rating = $_;
            #slurp out the rating 
            my $var = $1 if ($rating =~ /\)\s*(\d?\d\.??\d?)\s*\</);
            #if it's just coincidence and there isn't a rating, or it can't
            #extract it, move on to the next file
            if (!$var) {next FILE};
            # otherwise dump it into the hash, and skip the rest of the file,
            # since we have what we need, and move on to the next one.
            $listing{$filez} = $var; 
            next FILE
        }
    }
}

I had to account for the possibility of whitespace before and after the rating, 
making sure I wasn't just pulling ANY old number out of the html file, and also 
making sure I wasn't accidentally grabbing some whitespace after the digits..

-- 
send mail to mactech (at) webdragon (dot) net instead of the above address. 
this is to prevent spamming. e-mail reply-to's have been altered 
to prevent scan software from extracting my address for the purpose 
of spamming me, which I hate with a passion bordering on obsession.  


------------------------------

Date: Sun, 21 May 2000 06:56:50 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <x7wvkoe6hf.fsf@home.sysarch.com>

>>>>> "TW" == The WebDragon <nospam@devnull.com> writes:


  TW> FILE: foreach my $filez (sort @fileslist) {
  TW>     my @filedata = ();
  TW>     my $inputfile = File::Spec->catfile( $inputDir, $filez);
  TW>     open(IN, "<$inputfile");
  TW>         chomp(@filedata = <IN>);
  TW>     close(IN);


  TW>     foreach (@filedata) {

use a while loop as it is usually more efficient than slurping in the whole
file. in particular in your case you can stop reading when you find the
match.

  TW>         #search the lines and look for the "Rating:" phrase
  TW>         if (/Rating:*/) {
  TW>             #grab the line that contains it
  TW>             my $rating = $_;

no need to copy the line. just leave it in $_ and match.

  TW>             #slurp out the rating 
  TW>             my $var = $1 if ($rating =~ /\)\s*(\d?\d\.??\d?)\s*\</);
  TW>             #if it's just coincidence and there isn't a rating, or it can't 
  TW> extract it, move on to the next file
  TW>             if (!$var) {next FILE};

not good. what if the number was 0? better to test the success of the
match

		if ( /\)\s*(\d+\.\d*)</ ) {

there are many variations of the regex itself. in fact you don't need
the outer Rating match either. given your string that one will match
there just fine.

			$listing{$filez} = $1;
			last ;
		}

the last will exit the while loop and cause the next file to be
opened. no need for the label then.
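
put together, a rough sketch of the loop. first_rating is a made-up name,
the in-memory file only stands in for one of your html files, and i made
the fraction optional here on the assumption that plain 9 or 10 can occur:

```perl
use strict;
use warnings;

# Hypothetical helper: read one line at a time and stop at the first
# rating. Returns the captured text, or undef if no line matched.
sub first_rating {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ( $line =~ /Rating:.*\)\s*(\d+(?:\.\d*)?)\s*</ ) {
            return $1;   # the match succeeded, so $1 is safe, even when "0"
        }
    }
    return undef;
}

my $html = "<p>Name: DM-Nitro</p>\nRating: (1-10) 0</p>\n<p>more html</p>\n";
open my $in, '<', \$html or die "can't open in-memory file: $!\n";
my $rating = first_rating($in);
close $in;
print "rating: $rating\n" if defined $rating;
```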

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: 21 May 2000 07:28:17 GMT
From: The WebDragon <nospam@devnull.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <8g836h$jkf$0@216.155.32.125>

In article <x7wvkoe6hf.fsf@home.sysarch.com>, Uri Guttman <uri@sysarch.com> 
wrote:

 | >>>>> "TW" == The WebDragon <nospam@devnull.com> writes:
 | 
 | 
 |   TW> FILE: foreach my $filez (sort @fileslist) {
 |   TW>     my @filedata = ();
 |   TW>     my $inputfile = File::Spec->catfile( $inputDir, $filez);
 |   TW>     open(IN, "<$inputfile");
 |   TW>         chomp(@filedata = <IN>);
 |   TW>     close(IN);
 | 
 | 
 |   TW>     foreach (@filedata) {
 | 
 | use a while loop as it is usually more efficient than slurping in the whole
 | file. in particular in your case you can stop reading when you find the
 | match.
 | 
 |   TW>         #search the lines and look for the "Rating:" phrase
 |   TW>         if (/Rating:*/) {
 |   TW>             #grab the line that contains it
 |   TW>             my $rating = $_;
 | 
 | no need to copy the line. just leave it in $_ and match.

ok I'll try that. 

so 

    my $var = $1 if (/\)\s*(\d?\d\.??\d?)\s*\</);

will work? 

 |   TW>             #slurp out the rating 
 |   TW>             my $var = $1 if ($rating =~ /\)\s*(\d?\d\.??\d?)\s*\</);
 |   TW>             #if it's just coincidence and there isn't a rating, or 
 |   it can't 
 |   TW> extract it, move on to the next file
 |   TW>             if (!$var) {next FILE};
 | 
 | not good. what if the number was 0? better to test the success of the
 | match

ok you're losing me again.

plus the problem just got more complex.. see my next posting. 



 | 		if ( /\)\s*(\d+\.\d*)</ ) {
 | 
 | there are many variations of the regex itself. in fact you don't need
 | the outer Rating match either. given your string that one will match
 | there just fine.

problem. there may be other instances of numbers IN that html file, and I need 
to be absolutely certain that it matches the number from the ratings line.

 | 			$listing{$filez} = $1;
 | 			last ;
 | 		}
 | 
 | the last will exit the while loop and cause the next file to be
 | opened. no need for the label then.
 | 
 | uri

-- 
send mail to mactech (at) webdragon (dot) net instead of the above address. 
this is to prevent spamming. e-mail reply-to's have been altered 
to prevent scan software from extracting my address for the purpose 
of spamming me, which I hate with a passion bordering on obsession.  


------------------------------

Date: Sun, 21 May 2000 07:42:56 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <x7u2fse4ch.fsf@home.sysarch.com>

>>>>> "TW" == The WebDragon <nospam@devnull.com> writes:

  TW>     my $var = $1 if (/\)\s*(\d?\d\.??\d?)\s*\</);

  TW> will work? 

  TW>  | not good. what if the number was 0? better to test the success of the
  TW>  | match

like i said above.

  TW> ok you're losing me again.

what if the number you matched is 0.0? then testing $var for truth will
fail and you lose. you test the actual match operation to see if you
matched. then you can assign $1 knowing it has something. in fact i have
never seen code like your line above. it is not clean and can break.

your regex looks ok but it is busy. mine would work but is much
simpler. in the regex world simpler is usually better.
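
a sketch of what i mean, with a made-up sub name and an optional fraction
added as my own assumption. perlsyn says the behaviour of my combined with
a statement modifier is undefined, so the capture is taken in list context
and the condition tests the match itself:

```perl
use strict;
use warnings;

# Hypothetical helper: the rating text from one line, or undef.
sub rating_of {
    my ($line) = @_;
    # In list context a successful match returns its captures, so the
    # condition is true whenever the match succeeds, even for "0.0".
    if ( my ($rating) = $line =~ /\)\s*(\d+(?:\.\d*)?)\s*</ ) {
        return $rating;
    }
    return;
}

my %listing;
for my $line ('Rating: (1-10) 0.0</p>',
              'Rating: (1-10) 7.5 <p></p>',
              'no rating here') {
    my $rating = rating_of($line);
    $listing{$line} = $rating if defined $rating;
}
print "$_ => $listing{$_}\n" for sort keys %listing;
```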

  TW> plus the problem just got more complex.. see my next posting. 



  TW>  | 		if ( /\)\s*(\d+\.\d*)</ ) {
  TW>  | 
  TW>  | there are many variations of the regex itself. in fact you don't need
  TW>  | the outer Rating match either. given your string that one will match
  TW>  | there just fine.

  TW> problem. there may be other instances of numbers IN that html file, and I need 
  TW> to be absolutely certain that it matches the number from the ratings line.

then make it one longer match:

		if ( /Rating:.*\)\s*(\d+\.\d*)</ ) {

that will grab a number following a close paren which has Rating:
somewhere before it. no need to do 2 matches when one will do. simplify.

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: 21 May 2000 08:19:17 GMT
From: Intergalactic Denizen of Mystery <Tbone@pimpdaddy.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <8g8665$288s$1@news.enteract.com>

godzilla@stomp.stomp.tokyo writes:
>Why bother with a possibly error prone fancy regex when

Not your best effort. Think of something new or shut the fuck up.



------------------------------

Date: 21 May 2000 09:13:58 GMT
From: The WebDragon <nospam@devnull.com>
Subject: Re: regexes *sigh* damn I hate these things
Message-Id: <8g89cm$5ub$0@216.155.32.125>

In article <8g8665$288s$1@news.enteract.com>, Tbone@pimpdaddy.com wrote:

 | godzilla@stomp.stomp.tokyo writes:
 | >Why bother with a possibly error prone fancy regex when
 | 
 | Not your best effort. Think of something new or shut the fuck up.
 | 

yo, jackass.. say hello to my little friend --> *plonk*

-- 
send mail to mactech (at) webdragon (dot) net instead of the above address. 
this is to prevent spamming. e-mail reply-to's have been altered 
to prevent scan software from extracting my address for the purpose 
of spamming me, which I hate with a passion bordering on obsession.  


------------------------------

Date: Sun, 21 May 2000 17:41:15 +0800
From: Maciej Mastalarczuk <maciek@treko.net.au>
Subject: SETUID problem (maybe lame)
Message-Id: <3927AF3B.2FCB8CE2@treko.net.au>

Hi All,

The problem is as follows:
Non-root user has to perform root-only action. The owner of the script
is root:root and the mode is:
-rwsr-x--x (4751). Whenever the user invokes the script he gets the
message:

Insecure $ENV{PATH} while running setuid at ./test line (something).

How to overcome this? Of course when root invokes the script it works
fine.

Thanks a lot in advance & regards,

PS. The system is RedHat Linux 6.0 (Intel), perl is 5.005_03
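
For reference, the remedy described in perlsec is to set PATH to a known
value near the top of the script. A minimal sketch, assuming every external
command the script runs lives in /bin or /usr/bin (clean_environment is a
made-up name):

```perl
use strict;
use warnings;

# Assumption: all external commands used by the script are in these two
# directories. Adjust the list to the real locations.
sub clean_environment {
    $ENV{PATH} = '/bin:/usr/bin';
    # Other variables that can influence how a shell runs commands.
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
    return;
}

clean_environment();
print "PATH is now $ENV{PATH}\n";
```

This has to run before the first system(), exec(), backtick or piped open
in the script.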

--
Maciej Mastalarczuk
email: maciek@treko.net.au




------------------------------

Date: Sun, 21 May 2000 08:11:00 +0100
From: "Lance Boyle" <lancelotboyle@hotmail.com>
Subject: SSI in Perl Script ?
Message-Id: <8g82ba$am0$1@uranium.btinternet.com>

Is it possible to have more than one file type that reads SSI ?

At present, I use the default file extension ".shtml" and I know that I
could change this to another using the .htaccess file but I need that still
working.

I want to use a <!--#config timefmt="%A %e %b %Y"--><!--#echo
var="date_local"--> in the HTML header part of a perl script. This header is
on every page generated by the script and fits in nicely with the rest of
the website.

Any replies would be most welcome.
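
One fallback I can think of, if the server does not run SSI over the
script's output (an assumption, it depends on the server setup), is to have
the script print the date itself. A sketch using the same format string,
with header_date as a made-up name:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Same format string as the SSI timefmt directive above.
sub header_date {
    my ($epoch) = @_;
    return strftime('%A %e %b %Y', localtime($epoch));
}

print header_date(time), "\n";
```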










------------------------------

Date: Sun, 21 May 2000 01:20:03 -0700
From: Makarand Kulkarni <makarand_kulkarni@My-Deja.com>
Subject: Re: ugly mysql call
Message-Id: <39279C32.4820558D@My-Deja.com>

> sub insert_row {
>     my $param_str = (join ",", ("?") x @_);

wow! I am going to start using this technique and make
my code pretty.
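
For anyone who missed it, a small self-contained sketch of the trick. The
sub and table names are made up; the resulting string would then go to
DBI's prepare, with the values passed to execute:

```perl
use strict;
use warnings;

# Hypothetical helper: build an INSERT with one "?" placeholder per value.
sub insert_sql {
    my ($table, @values) = @_;
    # ("?") x @values repeats the one-element list once per value.
    my $param_str = join ",", ("?") x @values;
    return "INSERT INTO $table VALUES ($param_str)";
}

print insert_sql('people', 'Ann', 42, 'ann@example.com'), "\n";
```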
--



------------------------------

Date: Sun, 21 May 2000 04:24:30 GMT
From: bbfrancis@networld.com
Subject: Untaint URL character class
Message-Id: <8g7odq$n1m$1@nnrp1.deja.com>

I'm trying to untaint URL data for a 'send a link' CGI script, but I'm
unsure of all the legal characters that can be used in a URL.
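
What I have so far is only a sketch. The character class is my
understanding of the unreserved and reserved characters of the generic URI
syntax plus %, limiting it to http and https is my own assumption, and
untaint_url is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL if it consists only of characters
# the URI syntax allows, else undef. Capturing through a regex is what
# clears the taint flag.
sub untaint_url {
    my ($url) = @_;
    return $1
        if $url =~ m{\A(https?://[A-Za-z0-9._~:/?#\[\]\@!\$&'()*+,;=%-]+)\z};
    return;
}

for my $candidate ('http://www.deja.com/', 'http://example.com/a b') {
    my $clean = untaint_url($candidate);
    print defined $clean ? "ok: $clean\n" : "rejected\n";
}
```

This checks characters only, not whether the pieces form a sensible URL.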

Thanks in advance


Sent via Deja.com http://www.deja.com/
Before you buy.


------------------------------

Date: 21 May 2000 08:05:03 GMT
From: The WebDragon <nospam@devnull.com>
Subject: updated : Re: regexes *sigh* damn I hate these things
Message-Id: <8g85bf$ohc$0@216.155.32.125>

the problem has become more complex than it was. 

I now need to make sure to extract TWO values from this file (would have been 
nice if he'd said so in the beginning)


Given that the files may contain lines like this: 

Name:  <a href="http://www.planetunreal.com/dl/nc.asp?nalicity/utdm/dm-cyberwar.zip">DM-Cyberwar</a><br>\n
\n
Author:  <a href="mailto:666deadman666@email.msn.com">deadman</a><br>\n
\n
Rating: (1-10) 7.5 <p></p>\n

or like this:

<p>Name:  <a href="http://www.planetunreal.com/dl/nc.asp?nalicity/\n
utdm/dm-nitro.zip">DM-Nitro</a><br>\n
Author:<a href="mailto:ebolt@planetunreal.com">Eric 'Ebolt' Boltjes</a><br>\n
Rating: (1-10) 9</p><!-- add correct image name below here -->\n

(they are being generated by two different sources and the html differs in some 
of them)

I need to ALSO extract the 'utdm' portion of the source file from the lines like 
"utdm/dm-cyberwar.zip" and "utdm/dm-nitro.zip"

there MAY or may not be a / or a \n before the 'word' containing the gametype. 
(which may be anything from unrealdm, utdm, utctf, utassault, utother, or 
utdomination)

I don't quite understand how to tell the regex that - is a valid 'word' 
character, so that it will find the utother/fire-logo-map.zip as well as 
utdm/DM-Halls_of_redemption.zip

I've been playing with the regex, but it's really painfully obvious to me that I 
don't 'grok' them yet. 

here's the script in present form with the main piece commented out 
Apologies if the b0rked linewrapping confuses anyone :/ 

#!perl -w
use strict;

#leave this on while testing, otherwise comment out as it slows things down
#use diagnostics -verbose;

# set up some compatibility and formatting ease.
use File::Spec;
use CGI::Carp; 
use CGI qw( :standard :html3 );
if ($CGI::VERSION < 2.66) {
    confess ("You need to install CGI.pm version 2.66 or later to run this script.\n",
             'The most recent version information and revision history is at ',
             'http://stein.cshl.org/WWW/software/CGI/index.html#new')
};
use CGI::Pretty qw( :html3 );

my @fileslist = ();
my %listing = ();

#simple date function
my $date = sprintf "%04d-%02d-%02d",
            sub { $_[5]+1900, $_[4]+1, $_[3] }->(localtime);

my $inputDir = File::Spec->catfile(File::Spec->curdir());
my $outfile = File::Spec->catfile( $inputDir, 'maprev.html');

# delete previous file so the parsing routine doesn't scan it, as it will
# create a new one from the other files anyway.
unlink $outfile; 

opendir(DIR, $inputDir) || die "can't opendir $inputDir: $!";
my @filestart = readdir(DIR);
closedir DIR;

# if the file doesn't end in .htm or .html, skip and go to the next file
# let's get this outta the way first off. save time later.
foreach my $filetest (@filestart) {
    # require a literal dot before the extension; ignore case
    if ($filetest =~ /\.html?$/i) {
        push @fileslist, $filetest
    }
}

# Loop through all the .htm files, using a loop handle to exit the loop for
# that file, once the rating is either extracted or invalid

FILE: foreach my $filez (sort @fileslist) {
    my @filedata = ();
    my $inputfile = File::Spec->catfile( $inputDir, $filez);
    open(IN, "<$inputfile") or die "can't open $inputfile: $!";
        chomp(@filedata = <IN>);
    close(IN);

    my $section;
    #search the lines and look for the "Rating:" phrase
    foreach (@filedata) {
#       if (m!(.*)/.*zip*!) {
#           print "$1\n";
#           $section = $1 if (/nalicity\/(\w+){1}\//i);
#           print "$section\n";
#       }
        if (/Rating:/) {
            #slurp out the rating
            if (/\)\s*(\d?\d\.??\d?)\s*\</) {
                # if it's just coincidence and there isn't a rating, or it
                # can't extract it, move on;
                # otherwise dump it into the hash, and skip the rest of the
                # file, since we have what we need, and move on to the next one.
                $listing{$filez} = $1;
                next FILE
            }
        }
    }
}
#begin output section and create it as html, but as an external file rather
#than running this as a CGI script.
open(OUT, ">$outfile") or die("Cannot open $outfile, $!");

#this is as yet, very simple, and doesn't make pretty output, but it works
#and is functional
#can prettify it later.
 
print OUT start_html();
print OUT h2({-align=>'center'},"The Week In Review...");
print OUT p("Sweeps week of $date has produced a new batch of ratings .. (drum roll please)");
print OUT '<p>';

#declare and empty the data array in advance
my @data = ();

print "Diagnostic output: \n";

foreach my $review (sort (keys %listing)) {
    # diagnostic output, local to perl only, not sent to output file.
    print "$review, $listing{$review}\n";
    push @data, "$review " . '(' . '<a href="' . $review . '" target="_new">' .
        $listing{$review} . '</a>), ' . "\n";
}

foreach (sort @data) {
    print OUT $_;
}
print OUT '</p>';
print OUT end_html();

close OUT;

__END__

what I want to do with this is have it extract both the gametype from earlier in 
the file, and also the filename of the HTML file and the contained rating, 

and then break up the map output into the sections

so I have a HOH like 

$listing{$section}{$filename} = $rating

then I can loop thru the (sorted by) sections, and print the maps for that 
section under separate headings. 

any assistance greatly appreciated.. my brain is SO fried. :/
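
A rough sketch of one way this could go, not a drop-in replacement for the
script above. It assumes each file is read into a single string, so a link
wrapped across lines still matches, and it uses the character class [\w-] to
treat "-" as a word character. The helper name extract_review and the sample
file names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the full text of one review file and returns
# (gametype, rating), or an empty list if either one cannot be found.
sub extract_review {
    my ($html) = @_;

    # The gametype is the path component after "nalicity/". \s* allows for
    # a line break there, and [\w-] accepts "-" inside the zip file name.
    my ($section) = $html =~ m{nalicity/\s*(\w+)/\s*[\w-]+\.zip}i
        or return;

    # The rating is the number that follows "(1-10)".
    my ($rating) = $html =~ m{Rating:\s*\(1-10\)\s*(\d+(?:\.\d+)?)}
        or return;

    return (lc $section, $rating);
}

# Sample data modelled on the two layouts shown earlier in this post.
my %samples = (
    'dm-cyberwar.html' => <<'END',
Name:  <a
href="http://www.planetunreal.com/dl/nc.asp?nalicity/utdm/dm-cyberwar.zip">DM-Cyberwar</a><br>

Rating: (1-10) 7.5 <p></p>
END
    'fire-logo-map.html' => <<'END',
<p>Name:  <a href="http://www.planetunreal.com/dl/nc.asp?nalicity/
utother/fire-logo-map.zip">Fire Logo</a><br>
Rating: (1-10) 9</p><!-- add correct image name below here -->
END
);

# Build the hash of hashes: $listing{$section}{$filename} = $rating
my %listing;
for my $file (sort keys %samples) {
    my ($section, $rating) = extract_review($samples{$file}) or next;
    $listing{$section}{$file} = $rating;
}

# Print the maps grouped under their section headings.
for my $section (sort keys %listing) {
    print "$section\n";
    for my $file (sort keys %{ $listing{$section} }) {
        print "  $file ($listing{$section}{$file})\n";
    }
}
```

In the real script the sample hash would be replaced by reading each file
with something like join '', <IN>.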

-- 
send mail to mactech (at) webdragon (dot) net instead of the above address. 
this is to prevent spamming. e-mail reply-to's have been altered 
to prevent scan software from extracting my address for the purpose 
of spamming me, which I hate with a passion bordering on obsession.  


------------------------------

Date: Sun, 21 May 2000 03:11:34 -0400
From: Jennifer <webmaster@momsathome.on.ca>
Subject: valid email address
Message-Id: <39278C26.ECE3C6F3@momsathome.on.ca>

Please don't yell at me.  [cringing]

I've done searches on deja.com and I've read perlfaq9.  I know
that you can't test for a valid address with a regexp nor can you
really check for valid syntax, but what I want to know is it ok
to check for something that is definitely invalid syntax?

I'm thinking that if it isn't any_char@any_two_char.any_two_char
that it isn't valid syntax.  I know I have filled out forms and
forgot the .com.  I just want to catch the stupidest of mistakes
and hopefully narrow down the bad addresses that make it through.

If this is correct, can someone offer a regexp to check for it?

Jennifer


------------------------------

Date: Sun, 21 May 2000 07:28:38 GMT
From: "The Evil Beaver" <evilbeaver.picksoft@NOSPAMzext.net>
Subject: Re: valid email address
Message-Id: <GeMV4.217713$Kv2.315543@quark.idirect.com>

/(\w+)\@(\w)(\w+).(\w)(\w+)/
or something similar.

--
The Evil Beaver <evilbeaver.picksoft@NOSPAMzext.net>
-- Remove NOSPAM to e-mail me.
This message ROT-13 encrypted twice for extra security.

Jennifer <webmaster@momsathome.on.ca> wrote...
> Please don't yell at me.  [cringing]
>
> I've done searches on deja.com and I've read perlfaq9.  I know
> that you can't test for a valid address with a regexp nor can you
> really check for valid syntax, but what I want to know is it ok
> to check for something that is definitely invalid syntax?
>
> I'm thinking that if it isn't any_char@any_two_char.any_two_char
> that it isn't valid syntax.  I know I have filled out forms and
> forgot the .com.  I just want to catch the stupidest of mistakes
> and hopefully narrow down the bad addresses that make it through.
>
> If this is correct, can someone offer a regexp to check for it?
>
> Jennifer
>




------------------------------

Date: Sun, 21 May 2000 03:39:35 -0400
From: Jennifer <webmaster@momsathome.on.ca>
Subject: Re: valid email address
Message-Id: <392792B7.884AC4FC@momsathome.on.ca>



Jennifer wrote:
> 
> Please don't yell at me.  [cringing]
> 
> I've done searches on deja.com and I've read perlfaq9.  I know
> that you can't test for a valid address with a regexp nor can you
> really check for valid syntax, but what I want to know is it ok
> to check for something that is definitely invalid syntax?
> 
> I'm thinking that if it isn't any_char@any_two_char.any_two_char
> that it isn't valid syntax.  I know I have filled out forms and
> forgot the .com.  I just want to catch the stupidest of mistakes
> and hopefully narrow down the bad addresses that make it through.
> 
> If this is correct, can someone offer a regexp to check for it?


I'm still very weak on regexp, but this is what I came up with.
Anything wrong with this? Remember, I don't care if bad addresses
get through, I just want to stop some of them at least, but I
don't want to stop good ones. I don't care if people are
deliberately trying to trick it, since a human will be receiving
the addresses anyway.  I just hope to catch some stupid mistakes
before the visitor leaves the page.

/.+@[a-zA-Z0-9\.-]{2,}\.[a-zA-Z]{2,}$/
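
A small sketch for trying the pattern against a few sample addresses. The
helper name looks_plausible and the test addresses are made up, and the
pattern is anchored at both ends with \A and \z. One thing the test shows:
the {2,} on the host part rejects an address whose domain starts with a
one-character label, such as a@x.org, which may be stricter than intended.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the address passes the loose check.
sub looks_plausible {
    my ($addr) = @_;
    return $addr =~ /\A.+\@[a-zA-Z0-9.-]{2,}\.[a-zA-Z]{2,}\z/ ? 1 : 0;
}

for my $addr ('user@example.com', 'user@mail.example.co.uk',
              'user@example', 'a@x.org') {
    printf "%-26s %s\n", $addr, looks_plausible($addr) ? 'ok' : 'rejected';
}
```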

Jennifer


------------------------------

Date: Sun, 21 May 2000 07:44:31 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: valid email address
Message-Id: <x7r9awe49u.fsf@home.sysarch.com>

>>>>> "TEB" == The Evil Beaver <evilbeaver.picksoft@NOSPAMzext.net> writes:

  TEB> /(\w+)\@(\w)(\w+).(\w)(\w+)/
  TEB> or something similar.

or something wrong. that fails in so many ways such as 3 level domain names.

read the faq on this. email addresses cannot be fully verified. even
partially verified is difficult.

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 3110
**************************************

