[25094] in Perl-Users-Digest

Perl-Users Digest, Issue: 7345 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Oct 31 00:10:31 2004

Date: Sat, 30 Oct 2004 21:10:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 30 Oct 2004     Volume: 10 Number: 7345

Today's topics:
        Regular Expression confusion <c.taylor@no.spam>
    Re: Regular Expression confusion <postmaster@castleamber.com>
    Re: Regular Expression confusion <postmaster@castleamber.com>
    Re: Regular Expression confusion <matthew.garrish@sympatico.ca>
    Re: Regular Expression confusion <c.taylor@no.spam>
    Re: Regular Expression confusion <c.taylor@no.spam>
    Re: Regular Expression confusion <see@sig.invalid>
    Re: Regular Expression confusion <mritty@gmail.com>
    Re: Regular Expression confusion <c.taylor@no.spam>
    Re: Regular Expression confusion <see@sig.invalid>
    Re: Regular Expression confusion <see@sig.invalid>
    Re: Regular Expression confusion <c.taylor@no.spam>
        RFC: generator_generator 1.01 <newspost@coppit.org>
    Re: web hoster won't secure CGI <noreply@gunnar.cc>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 30 Oct 2004 21:05:51 -0400
From: Christie Taylor <c.taylor@no.spam>
Subject: Regular Expression confusion
Message-Id: <41843A6F.F188BB26@no.spam>

I'm new to regular expressions and am trying to create a fairly simple
one to validate input.  The goal is to accept an optional word followed
by zero or one spaces, then a required number of 1 to 8 digits.

I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
part) but this is not working, it's matching on way too much. What am I
missing?


Thanks!



------------------------------

Date: 31 Oct 2004 01:22:48 GMT
From: John Bokma <postmaster@castleamber.com>
Subject: Re: Regular Expression confusion
Message-Id: <Xns9592CF50F9AF7castleamber@130.133.1.4>

Christie Taylor wrote:

> I'm new to regular expressions and am trying to create a fairly simple
> one to validate input.  The goal is to accept an optional word followed
> by zero or one spaces, then a required number of 1 to 8 digits.
> 
> I tried m/^word\s{0,1}\d{1,7}/i

Your word is not optional
? is a shortcut for zero or one
{1,7} allows at most 7 digits, not 8.

> to get started (ignoring my  optional
> part) but this is not working, it's matching on way too much. What am I
> missing?

*Always* post real code and real examples. Perl is shorter and clearer 
than (your) English.

-- 
John                               MexIT: http://johnbokma.com/mexit/
                           personal page:       http://johnbokma.com/
        Experienced programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html


------------------------------

Date: 31 Oct 2004 01:24:33 GMT
From: John Bokma <postmaster@castleamber.com>
Subject: Re: Regular Expression confusion
Message-Id: <Xns9592CF9CB7BC2castleamber@130.133.1.4>

Christie Taylor wrote:

> I'm new to regular expressions and am trying to create a fairly simple
> one to validate input.  The goal is to accept an optional word followed
> by zero or one spaces, then a required number of 1 to 8 digits.
> 
> I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
> part) but this is not working, it's matching on way too much. What am I
> missing?

Also note that your digit match is not anchored at the end. You could 
replace \d{1,7} with a plain \d, since the pattern as written also matches 
strings with 8, 9, ... digits.
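
A minimal sketch of the anchoring point, using the pattern from the
original post. The sample strings and the sub names are made up for
illustration:

```perl
use strict;
use warnings;

# The pattern as originally posted: nothing is required after the digits.
sub matches_unanchored { return $_[0] =~ /^word\s{0,1}\d{1,7}/i ? 1 : 0 }

# The same pattern with an end anchor added after the digits.
sub matches_anchored { return $_[0] =~ /^word\s{0,1}\d{1,7}$/i ? 1 : 0 }

for my $s ('word 1234567', 'word 123456789') {
    printf "%-16s unanchored=%d anchored=%d\n",
        $s, matches_unanchored($s), matches_anchored($s);
}
```

The nine-digit string passes the unanchored pattern because the first
seven digits satisfy \d{1,7} and the rest is simply ignored.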

-- 
John                               MexIT: http://johnbokma.com/mexit/
                           personal page:       http://johnbokma.com/
        Experienced programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html


------------------------------

Date: Sat, 30 Oct 2004 21:20:03 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: Regular Expression confusion
Message-Id: <15Xgd.43817$rs5.1386983@news20.bellglobal.com>

"Christie Taylor" <c.taylor@no.spam> wrote in message 
news:41843A6F.F188BB26@no.spam...
> I'm new to regular expressions and am trying to create a fairly simple
> one to validate input.  The goal is to accept an optional word followed
> by zero or one spaces, then a required number of 1 to 8 digits.
>
> I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
> part) but this is not working, it's matching on way too much. What am I
> missing?
>

"matching on way too much" is not a very good description of your problem. 
The only thing I see that is obviously wrong compared to your description is 
the \s{0,1}. This will match on any whitespace, not just spaces (e.g., tabs 
and newlines). You might want to try writing it as:

/^word ?\d{1,7}/i

That or provide examples of what it is matching compared to what you were 
expecting.
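
A small sketch of the difference between \s and a literal space. The
sample inputs and sub names are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# \s? accepts any single whitespace character (space, tab, newline, ...).
sub any_whitespace { return $_[0] =~ /^word\s?\d{1,7}/i ? 1 : 0 }

# A literal space followed by ? accepts only a space.
sub literal_space { return $_[0] =~ /^word ?\d{1,7}/i ? 1 : 0 }

my @samples = (
    [ 'space',   "word 42"  ],
    [ 'tab',     "word\t42" ],
    [ 'newline', "word\n42" ],
);

for my $pair (@samples) {
    my ($label, $s) = @$pair;
    printf "%-8s any_whitespace=%d literal_space=%d\n",
        $label, any_whitespace($s), literal_space($s);
}
```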

Matt 




------------------------------

Date: Sat, 30 Oct 2004 21:46:29 -0400
From: Christie Taylor <c.taylor@no.spam>
Subject: Re: Regular Expression confusion
Message-Id: <418443F5.496D5170@no.spam>

John Bokma wrote:

> Christie Taylor wrote:
>
> > I'm new to regular expressions and am trying to create a fairly simple
> > one to validate input.  The goal is to accept an optional word followed
> > by zero or one spaces, then a required number of 1 to 8 digits.
> >
> > I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
> > part) but this is not working, it's matching on way too much. What am I
> > missing?
>
> Also note that your digit match is not anchored, you can replace it with \d
> since the pattern also matches 8, 9, .... digits.

Ok, here's what I have.

if ($mystring =~ m/(^word)?\s?\d{1,8}$/i) {
   print "it matches!\n";
}

word should be optional, but must be first if present.
a number between 1 & 8 digits must be present.
a space may or may not be between the word and the number.

Unfortunately it's matching on a lot more than I want it to :(
Thanks!




------------------------------

Date: Sat, 30 Oct 2004 22:02:27 -0400
From: Christie Taylor <c.taylor@no.spam>
Subject: Re: Regular Expression confusion
Message-Id: <418447B2.8156312A@no.spam>

Christie Taylor wrote:

> John Bokma wrote:
>
> > Christie Taylor wrote:
> >
> > > I'm new to regular expressions and am trying to create a fairly simple
> > > one to validate input.  The goal is to accept an optional word followed
> > > by zero or one spaces, then a required number of 1 to 8 digits.
> > >
> > > I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
> > > part) but this is not working, it's matching on way too much. What am I
> > > missing?
> >
> > Also note that your digit match is not anchored, you can replace it with \d
> > since the pattern also matches 8, 9, .... digits.
>
> Ok, here's what I have.
>
> if ($mystring =~ m/(^word)?\s?\d{1,8}$/i) {
>    print "it matches!\n";
> }
>
> word should be optional, but must be first if present.
> a number between 1 & 8 digits must be present.
> a space may or may not be between the word and the number.
>
> Unfortunately it's matching on a lot more than I want it to :(
> Thanks!

To be more specific it is matching on any number of digits and any letters
instead of just _word_. :(




------------------------------

Date: Sat, 30 Oct 2004 22:12:18 -0400
From: Bob Walton <see@sig.invalid>
Subject: Re: Regular Expression confusion
Message-Id: <418447cc_1@127.0.0.1>

Christie Taylor wrote:

> I'm new to regular expressions and am trying to create a fairly simple
> one to validate input.  The goal is to accept an optional word followed
> by zero or one spaces, then a required number of 1 to 8 digits.
> 
> I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
> part) but this is not working, it's matching on way too much. What am I
> missing?

Well, one thing you are missing is following the posting guidelines for 
this newsgroup where it says something along the lines of "include a 
short but complete (with data) program that anyone can copy/paste/run 
which illustrates your difficulty".  Also "this is not working, it's 
matching on way too much" is as vague as "it doesn't work" -- it tells 
us nothing -- be specific -- what *exactly* does it match that you don't 
think it should?

The regexp you included doesn't appear to match your stated criteria 
very well:  the "word" isn't optional; the digits matched are one 
through seven, not eight -- and, since the trailing end is not anchored, 
it will also match on a string with 8, 9, 10, or 100000 digits at that 
location.  Going from your stated criteria, I would think:

    m/^(?:word)?\s?\d{1,8}(?:\D|$)/i

might be close.  Example (of the regexp and of the sort of test code 
expected on this newsgroup):

use strict;
use warnings;
while(<DATA>){
    chomp;
    print '>',$_,'<',
       m/^(?:word)?\s?\d{1,8}(?:\D|$)/i ?
          ' matched':
          ' did not match',"\n";
}
__END__
word123
xxx234
WoRd 345
WORD 2398723498723492387
word 234xxx
word  567

word 12345678blahblahblah
1234xxx
  1234xxx
  9
9

 ...
-- 
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl


------------------------------

Date: Sat, 30 Oct 2004 22:28:44 -0400
From: Paul Lalli <mritty@gmail.com>
Subject: Re: Regular Expression confusion
Message-Id: <cm1im3$gpq$1@misc-cct.server.rpi.edu>

Christie Taylor wrote:
> Christie Taylor wrote:
>>
>>Ok, here's what I have.
>>
>>if ($mystring =~ m/(^word)?\s?\d{1,8}$/i) {
>>   print "it matches!\n";
>>}
>>
>>word should be optional, but must be first if present.
>>a number between 1 & 8 digits must be present.
>>a space may or may not be between the word and the number.
>>
>>Unfortunately it's matching on a lot more than I want it to :(
>>Thanks!
> 
> 
> To be more specific it is matching on any number of digits and any letters
> instead of just _word_. :(

Most likely this is because you've included the 'start of string' anchor 
within the optional match.  So there's no requirement for the match to 
be anchored to the beginning of the string.  Try putting the ^ at the 
beginning of the pattern, outside the parentheses.
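
A minimal sketch of that change, comparing the posted pattern with one
that has the ^ moved outside the parentheses. The test strings and sub
names are made up for illustration:

```perl
use strict;
use warnings;

# ^ inside the optional group: when the group is skipped, so is the anchor.
sub anchor_inside_group { return $_[0] =~ /(^word)?\s?\d{1,8}$/i ? 1 : 0 }

# ^ outside the group: the match must always begin at the start of the string.
sub anchor_outside_group { return $_[0] =~ /^(word)?\s?\d{1,8}$/i ? 1 : 0 }

for my $s ('word 123', '1234', 'xxx234', 'foo 123456789012') {
    printf "%-18s inside=%d outside=%d\n",
        $s, anchor_inside_group($s), anchor_outside_group($s);
}
```

With the anchor inside the group, 'xxx234' and 'foo 123456789012' both
match, because the regex engine is free to start the match at any digit
near the end of the string.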

Paul Lalli


------------------------------

Date: Sat, 30 Oct 2004 22:42:30 -0400
From: Christie Taylor <c.taylor@no.spam>
Subject: Re: Regular Expression confusion
Message-Id: <41845116.89180075@no.spam>

Bob Walton wrote:

> Christie Taylor wrote:
>
> > I'm new to regular expressions and am trying to create a fairly simple
> > one to validate input.  The goal is to accept an optional word followed
> > by zero or one spaces, then a required number of 1 to 8 digits.
> >
> > I tried m/^word\s{0,1}\d{1,7}/i  to get started (ignoring my  optional
> > part) but this is not working, it's matching on way too much. What am I
> > missing?
>
> Well, one thing you are missing is following the posting guidelines for
> this newsgroup where is says something along the lines of "include a
> short but complete (with data) program that anyone can copy/paste/run
> which illustrates your difficulty".  Also "this is not working, it's
> matching on way too much" is as vague as "it doesn't work" -- it tells
> us nothing -- be specific -- what *exactly* does it match that you don't
> think it should?
>
> The regexp you included doesn't appear to match your stated criteria
> very well:  the "word" isn't optional; the digits matched are one
> through seven, not eight -- and, since the trailing end is not anchored,
> it will also match on a string with 8, 9, 10, or 100000 digits at that
> location.  Going from your stated criteria, I would think:
>
>     m/^(?:word)?\s?\d{1,8}(?:\D|$)/i

That looks like what I'm trying to do!  I'm trying to understand how
everything works.  It looks like the ?: inside the first parenthesis makes
word optional.  I'm not sure how the final parenthesis works though.  Why is
\D (non-digit) used?

Thanks!



------------------------------

Date: Sat, 30 Oct 2004 22:51:21 -0400
From: Bob Walton <see@sig.invalid>
Subject: Re: Regular Expression confusion
Message-Id: <418450f2$1_1@127.0.0.1>

Christie Taylor wrote:
> Christie Taylor wrote:
> 
> 
>>John Bokma wrote:
>>
>>
>>>Christie Taylor wrote:
 ...

>>Ok, here's what I have.
>>
>>if ($mystring =~ m/(^word)?\s?\d{1,8}$/i) {
>>   print "it matches!\n";
>>}

That regexp (a *very* different one than your initial posting) has 
["word" starting at the beginning of the string] as the optional entity. 
  Thus (since the whitespace is also optional) anything with a digit on 
the end of the string will match, regardless of what is at the beginning 
of the string.  I have an idea that might be why "it's matching on a lot 
more" than you want it to.  It is equivalent to:

     m/\d$/

Also, you are using capturing parentheses.  Did you really intend that, 
since you're not making use of the captured values (or if you are, 
you're doing it wrong, since the capture isn't inside the if block)?  If 
you don't intend a capture, use (?:regexp) instead, which is identical 
to (regexp) except it doesn't capture.  That makes your intent clearer.
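
A short sketch of the capturing versus non-capturing distinction. The
helper name first_capture and the sample input are assumptions made for
illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: report what $1 holds after matching $s against $re.
sub first_capture {
    my ($re, $s) = @_;
    return 'no match' unless $s =~ $re;
    return defined $1 ? $1 : 'no capture';
}

# (word) captures the text it matched into $1.
print first_capture(qr/^(word)?\s?\d{1,8}$/i, 'Word 42'), "\n";

# (?:word) groups in the same way but leaves $1 unset.
print first_capture(qr/^(?:word)?\s?\d{1,8}$/i, 'Word 42'), "\n";
```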

>>
>>word should be optional, but must be first if present.
>>a number between 1 & 8 digits must be present.
>>a space may or may not be between the word and the number.
>>
>>Unfortunately it's matching on a lot more than I want it to :(
>>Thanks!
> 
> 
> To be more specific it is matching on any number of digits and any letters
> instead of just _word_. :(

That's getting closer :-).  How about an *example* of an *actual string* 
that matches that you don't think should have matched.  Actually, how 
about a bunch of very different ones (like a dozen, not thousands) with 
brief statements of *why* you think they should or should not match.

And how about putting them after an __END__ of a program that reads 
them, attempts the match with your regexp, and prints out if it succeeds 
or fails, which anyone could copy/paste/run?  That would help tremendously.

It also seems that you don't have a clear idea of what you want to match 
and what you don't.  You keep saying "word" is optional, and then you 
complain about it matching strings that don't start with "word"???  A 
starting point for working with regexp is a *clear* statement of what is 
to be matched and not matched.  Sets of example strings are usually 
helpful, especially for testing your regexp.

-- 
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl


------------------------------

Date: Sat, 30 Oct 2004 23:18:07 -0400
From: Bob Walton <see@sig.invalid>
Subject: Re: Regular Expression confusion
Message-Id: <41845738$1_4@127.0.0.1>

Christie Taylor wrote:

> Bob Walton wrote:
> 
> 
>>Christie Taylor wrote:
>>
 ...
>>    m/^(?:word)?\s?\d{1,8}(?:\D|$)/i
> 
> 
> That looks like what I'm trying to do!  I'm trying to understand how
> everything works.  It looks like the ?: inside the first parenthesis makes
> word optional.  I'm not sure how the final parenthesis works though.  Why is

No, it is the "?" after the parens that makes "word" optional.  The ?: 
inside the parens makes the parens non-capturing.  Don't guess at 
syntax, refer to the truly wonderful reference material present in:

    perldoc perlre

> \D (non-digit) used?

Well, in brief explanation:

     ^  <--causes the rest of the regexp to start
           matching at the beginning of the string.
     (?:word)  <--is just like (word) except it doesn't
                   capture -- it looks like you're not
                   capturing.
     (?:word)?  <--the ? makes the presence of "word"
                   optional (zero or one occurrences).
     \s?  <--matches zero or one whitespace characters
     \d{1,8}  <--matches one to eight digits
     (?:\D|$)  <--matches without capturing either a
                  non-digit or the end of the string.
                  Same as (\D|$) except it doesn't capture.

The non-digit is used to permit trailing non-digit characters after the 
last of the digits, so something like:

word 123 blah blah blah

will match (with \D matching a space character).  I still don't know if 
that is part of what you desire, but if optional content starting with a 
non-digit is to be permitted after the digits, this is one of I'm sure 
many ways of making that happen.
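
As a rough sketch of what the trailing (?:\D|$) buys you, here is the
regexp above next to a stricter variant that ends in a plain $ and so
permits nothing after the digits. The stricter variant and the sub names
are my own assumptions, not something the original poster asked for:

```perl
use strict;
use warnings;

# The regexp discussed above: the digits may be followed by a non-digit.
sub matches_loose { return $_[0] =~ /^(?:word)?\s?\d{1,8}(?:\D|$)/i ? 1 : 0 }

# Assumed stricter variant: the string must end right after the digits.
sub matches_exact { return $_[0] =~ /^(?:word)?\s?\d{1,8}$/i ? 1 : 0 }

my @samples = (
    'word 123',
    'word 123 blah blah blah',
    '1234xxx',
    'WORD 2398723498723492387',
);

for my $s (@samples) {
    printf "%-26s loose=%d exact=%d\n",
        $s, matches_loose($s), matches_exact($s);
}
```

Both variants reject the long run of digits, since whatever follows the
first one to eight digits is still a digit.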

For additional detail, please refer to:

    perldoc perlre
    perldoc perlretut
    perldoc perlreref

etc etc.  perlre is the "bible" of regular expressions -- you'll need to 
master it.  A good book on the subject, like "Mastering Regular 
Expressions", would probably help a lot too.
-- 
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl


------------------------------

Date: Sat, 30 Oct 2004 23:34:03 -0400
From: Christie Taylor <c.taylor@no.spam>
Subject: Re: Regular Expression confusion
Message-Id: <41845D2B.A664EBA0@no.spam>

Bob Walton wrote:

> Christie Taylor wrote:
>
> > Bob Walton wrote:
> >
> >
> >>Christie Taylor wrote:
> >>
> ...
> >>    m/^(?:word)?\s?\d{1,8}(?:\D|$)/i
> >
> >
> > That looks like what I'm trying to do!  I'm trying to understand how
> > everything works.  It looks like the ?: inside the first parenthesis makes
> > word optional.  I'm not sure how the final parenthesis works though.  Why is
>
> No, it is the "?" after the parens that makes "word" optional.  The ?:
> inside the parens makes the parens non-capturing.  Don't guess at
> syntax, refer to the truly wonderful reference material present in:
>
>     perldoc perlre
>
> > \D (non-digit) used?
>
> Well, in brief explanation:
>
>      ^  <--causes the rest of the regexp to start
>            matching at the beginning of the string.
>      (?:word)  <--is just like (word) except it doesn't
>                    capture -- it looks like you're not
>                    capturing.
>      (?:word)?  <--the ? makes the presence of "word"
>                    optional (zero or one occurrences).
>      \s?  <--matches zero or one whitespace characters
>      \d{1,8}  <--matches one to eight digits
>      (?:\D|$)  <--matches without capturing either a
>                   non-digit or the end of the string.
>                   Same as (\D|$) except it doesn't capture.
>
> The non-digit is used to permit trailing non-digit characters after the
> last of the digits, so something like:
>
> word 123 blah blah blah
>
> will match (with \D matching a space character).  I still don't know if
> that is part of what you desire, but if optional content starting with a
> non-digit is to be permitted after the digits, this is one of I'm sure
> many ways of making that happen.
>
> For additional detail, please refer to:
>
>     perldoc perlre
>     perldoc perlretut
>     perldoc perlreref
>
> etc etc.  perlre is the "bible" of regular expressions -- you'll need to
> master it.  A good book on the subject, like "Mastering Regular
> Expressions", would probably help a lot too.

Ok, thanks so much!  I have been reading "Mastering Regular Expressions" by
O'Reilly although I've been overwhelmed by it so far.  I'll try experimenting with
some more practice regexes.






------------------------------

Date: 30 Oct 2004 21:54:28 EDT
From: David Coppit <newspost@coppit.org>
Subject: RFC: generator_generator 1.01
Message-Id: <Pine.BSF.4.61.0410302153430.39113@www.provisio.net>

Hello everyone,

I'm seeking comments about a script I'll be releasing on CPAN soon. You
can think of it as YACC for generators instead of parsers.

   generator_generator - given a grammar, generates a C++ string generator

   generator_generator generates a C++ program that generates all strings of
   the format specified by the input grammar. The resulting generator
   generates strings of the user-specified length. The input file format is
   very similar to the normal YACC and LEX input files.  This program is
   designed to create a fast program for computing the "exhaustive initial
   segment" of inputs for testing of another program.

I've tried very hard to make it easy to use, but it's still fragile due to
the complexity and system-dependent nature of it. I hope to test on a
wider set of platforms to make it more robust.  You can get it here if you
want to take a look:

http://www.coppit.org/temp/generator_generator-1.01.tar.gz

The TODO file contains the beginnings of a tutorial, which I need to
write. I also need to update the documentation. I'll finish all that
before releasing the script publicly. If you want to try it out
out-of-the-box (after running "perl Makefile.PL"):

   ./generator_generator -fm -u in*tree grammars/f*.yg grammars/f*.lg
   ./output/progs/generate 12

to generate fault trees of length 13, and:

   ./generator_generator -fm -u in*ion grammars/l*.yg grammars/l*.lg
   ./output/progs/generate 4

to generate logical expressions of length 8.

I need comments on the following:

- I really don't like the name. Anyone have a better one?
- I'm not really sure how to explain it succinctly. The above blurb is
   quite bad. Can anyone suggest a better description?
- I need to install templates and other non-module code on the system.
   Should I use the generator_generator:: namespace? (Anyone know how to do
   installation of non-module files with Module::Install?)
- Is it best to write the tutorial up in POD, and embed it in an otherwise
   empty module? (I think Inline does this.)
- I need to distribute some example files. Is there a standard directory
   like "eg" or "ex"? If I choose "examples" will that break anything?
- Should I install example files on the system?

If you happen to try it out, please let me know.

Regards,
David


------------------------------

Date: Sat, 30 Oct 2004 15:19:41 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: web hoster won't secure CGI
Message-Id: <2uhjlsF2a8a4hU1@uni-berlin.de>

wana wrote:
> Thanks, I read the CGI docs and the book chapter and it does explain
> how to protect scripts globally (change CGI.pm file) or on a
> script-by-script basis.  The book words it a little differently and
> emphasizes the changing of the CGI.pm file more.

If that's the case, Lincoln presumably did not have shared web hosting
environments in mind when he wrote it.

> It seems to me that a web hosting service would probably want to set
> the defaults in such a way to prevent attacks:
> 
>                $CGI::POST_MAX=1024 * 100;  # max 100K posts
>                $CGI::DISABLE_UPLOADS = 1;  # no uploads

I'd say that it would be most ill-advised to do so, for precisely the
same reason why Lincoln Stein does not change the default behaviour of
the module.

> They just have a policy of not modifying any modules, which makes
> sense.

Indeed. Actually, if they had agreed to change it after a request from
one of their customers, *that* would have been a reason to consider
putting them down. ;-)

> By the way, the book 'MySQL and Perl for the Web' by Paul Dubois
> clued me in on the importance of security in CGI scripts and the
> complete lack of security that html forms provide.

That's great. Making DoS attacks more difficult is one thing you can
do. Validating the data and enabling tainted mode are two other
important steps to reduce the inherent risks with CGI.
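
As a minimal sketch of the validation step, assuming a form field that
should hold a number of 1 to 8 digits (the sub name validate_id and the
field format are hypothetical). Under tainted mode, a value extracted
through a regex capture like this is also considered untainted:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only 1 to 8 digits and return the captured
# value; return undef for anything else, including missing input.
sub validate_id {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /\A(\d{1,8})\z/ ? $1 : undef;
}

for my $input ('12345', '12; rm -rf /', '') {
    my $id = validate_id($input);
    printf "%-14s => %s\n", "'$input'", defined $id ? "ok ($id)" : 'rejected';
}
```

The \A and \z anchors are used here so that nothing, not even a trailing
newline, is accepted around the digits.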

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 7345
***************************************

