[17205] in Perl-Users-Digest
Perl-Users Digest, Issue: 4617 Volume: 9
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Oct 14 14:10:21 2000
Date: Sat, 14 Oct 2000 11:10:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <971547010-v9-i4617@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Sat, 14 Oct 2000 Volume: 9 Number: 4617
Today's topics:
Re: Regex for matching e-mail addresses <james@NOSPAM.demon.co.uk>
Re: Regex for matching e-mail addresses (Randal L. Schwartz)
Re: Regex for matching e-mail addresses <calle@lysator.liu.se>
Re: Regex for matching e-mail addresses (Randal L. Schwartz)
Re: Regex for matching e-mail addresses <james@NOSPAM.demon.co.uk>
Re: Regex for matching e-mail addresses (Tony L. Svanstrom)
Re: Regex for matching e-mail addresses <nospam.newton@gmx.li>
Re: Regex for matching e-mail addresses <calle@lysator.liu.se>
split problem <stphw@ihug.com.au>
Re: split problem <godzilla@stomp.stomp.tokyo>
Re: split problem (Mark-Jason Dominus)
Re: Why these 2 simple errors? <uri@sysarch.com>
Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 14 Oct 2000 16:06:23 +0100
From: James Taylor <james@NOSPAM.demon.co.uk>
Subject: Re: Regex for matching e-mail addresses
Message-Id: <ant141523927fNdQ@oakseed.demon.co.uk>
In article <slrn8ugneg.ue5.tjla@thislove.dyndns.org>, Gwyn Judd
<URL:mailto:tjla@guvfybir.qlaqaf.bet> wrote:
>
> Is there a better way to check for a syntactically valid email address?
I'm fairly new to this problem area, but I've read recipe 6.19 in the
Perl Cookbook, the section entitled "Matching an Email Address" in Mastering
Regular Expressions, and appendix D of RFC822. From all that, I've come
to the conclusion that a regex that matches a *valid* address is not
particularly useful because it is too lenient. There is a module
available called Email::Valid by Maurice Aubrey that will check for
RFC822 compliance, but surely what you really want is something that
checks whether an email address is of the form actually used on
today's Internet.
I would like to see some examples of *real* email addresses that
someone would be likely to enter into a web form, for example.
That probably means that the domain part will match something like
([\w-]+\.)+[\w-]{2,4} but the local part has greater flexibility.
For the local part I might add the possibility of a '+' and a '='
with something like ([\w+=-]+\.)*[\w+=-]+ but I'm not confident.
Putting that together we have:
^([\w+=-]+\.)*[\w+=-]+\@([\w-]+\.)+[\w-]{2,4}$
Please can people who know of any large class of *real* Internet
addresses that would not match the above regex let us know. Also,
can you think of any addresses which would match that but which
would be obviously invalid to the human eye? Thanks.
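Both questions can be tried directly by running the proposed pattern over a few sample addresses. The Perl sketch below does that; the list mixes real addresses quoted in replies in this thread (fred&barney@stonehenge.com, *@qz.to, hostmaster@dk) with two illustrative ones that are assumptions: user@example.com and the deliberately bogus a@-.com.

```perl
use strict;
use warnings;

# The pattern proposed above, unchanged.
my $re = qr/^([\w+=-]+\.)*[\w+=-]+\@([\w-]+\.)+[\w-]{2,4}$/;

# Sample input: an ordinary address, three real addresses mentioned
# in this thread, and one deliberately bogus address.
my @candidates = (
    'user@example.com',
    'fred&barney@stonehenge.com',
    '*@qz.to',
    'hostmaster@dk',
    'a@-.com',
);

for my $addr (@candidates) {
    printf "%-28s %s\n", $addr, ($addr =~ $re ? 'match' : 'no match');
}
```

Run as written, the pattern rejects all three addresses taken from the thread yet accepts a@-.com, which is close to the opposite of what a narrow filter is meant to achieve.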
--
James Taylor <james (at) oakseed demon co uk>
PGP key available ID: 3FBE1BF9
Fingerprint: F19D803624ED6FE8 370045159F66FD02
------------------------------
Date: 14 Oct 2000 08:26:39 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Regex for matching e-mail addresses
Message-Id: <m1snpz78b4.fsf@halfdome.holdit.com>
>>>>> "James" == James Taylor <james@NOSPAM.demon.co.uk> writes:
James> I would like to see some examples of *real* email addresses that
James> someone would be likely to enter into a web form, for example.
James> That probably means that the domain part will match something like
James> ([\w-]+\.)+[\w-]{2,4} but the local part has greater flexibility.
James> For the local part I might add the possibility of a '+' and a '='
James> with something like ([\w+=-]+\.)*[\w+=-]+ but I'm not confident.
James> Putting that together we have:
James> ^([\w+=-]+\.)*[\w+=-]+\@([\w-]+\.)+[\w-]{2,4}$
James> Please can people who know of any large class of *real* Internet
James> addresses that would not match the above regex let us know. Also,
James> can you think of any addresses which would match that but which
James> would be obviously invalid to the human eye. Thanks.
Yes. <fred&barney@stonehenge.com> is a *real* Internet address. It
*really* exists. If you send mail to it, you'll get an autoresponder.
Now stop looking at it from "how narrow can we make this". If it's
narrower than RFC822, it's *needlessly* *too* *narrow*.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
------------------------------
Date: 14 Oct 2000 17:29:48 +0200
From: Calle Dybedahl <calle@lysator.liu.se>
Subject: Re: Regex for matching e-mail addresses
Message-Id: <86snpz785v.fsf@tezcatlipoca.algonet.se>
>>>>> "James" == James Taylor <james@NOSPAM.demon.co.uk> writes:
> Please can people who know of any large class of *real* Internet
> addresses that would not match the above regex let us know.
Just about anything that's meant to go into another mail system (such
as Notes, cc:Mail, X.400 or MEMO) is likely to not match your regexp.
Why do you want to forbid certain local parts anyway?
--
Calle Dybedahl, Vasav. 82, S-177 52 Jaerfaella,SWEDEN | calle@lysator.liu.se
"My Body Is A Temple...To Bacchus" -- Penny Dreadful
------------------------------
Date: 14 Oct 2000 08:35:53 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Regex for matching e-mail addresses
Message-Id: <m1lmvr77vq.fsf@halfdome.holdit.com>
>>>>> "Calle" == Calle Dybedahl <calle@lysator.liu.se> writes:
>>>>> "James" == James Taylor <james@NOSPAM.demon.co.uk> writes:
>> Please can people who know of any large class of *real* Internet
>> addresses that would not match the above regex let us know.
Calle> Just about anything that's meant to go into another mail system (such
Calle> as Notes, cc:Mail, X.400 or MEMO) is likely to not match your regexp.
Calle> Why do you want to forbid certain local parts anyway?
Perhaps because James is not aware of that large class of users. :)
Anyway, I also am reminded of my friend, Eli-the-bearded, who
frequently posts here under the email address of <*@qz.to>. So
please make sure your form works for him too. {grin}
print "Just another Perl hacker,";
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
------------------------------
Date: Sat, 14 Oct 2000 17:17:47 +0100
From: James Taylor <james@NOSPAM.demon.co.uk>
Subject: Re: Regex for matching e-mail addresses
Message-Id: <ant141647480fNdQ@oakseed.demon.co.uk>
In article <m1snpz78b4.fsf@halfdome.holdit.com>, Randal L. Schwartz
<URL:mailto:merlyn@stonehenge.com> wrote:
> >>>>> "James" == James Taylor writes:
>
> Yes. <fred&barney@stonehenge.com> is a *real* Internet address. It
> *really* exists. If you send mail to it, you'll get an autoresponder.
>
> Now stop looking at it from "how narrow can we make this". If it's
> narrower than RFC822, it's *needlessly* *too* *narrow*.
You've sussed me. I *am* trying to see "how narrow can we make this"
because for practical purposes I don't think RFC822 is useful. In my
particular web form I want to match a restrictive set of email addresses
and for the 0.01% of visitors who cannot supply a suitable address I
will return an error page that grovels with apology for being unable
to accept their address and asks that they contact us by email instead.
I feel it is better to inconvenience a very small number of people
than to risk a security breach due to me untainting and using
addresses containing potentially dangerous escape sequences, etc.
In article <m1lmvr77vq.fsf@halfdome.holdit.com>, Randal L. Schwartz
<URL:mailto:merlyn@stonehenge.com> wrote:
> >>>>> "Calle" == Calle Dybedahl <calle@lysator.liu.se> writes:
>
> >>>>> "James" == James Taylor writes:
> >> Please can people who know of any large class of *real* Internet
> >> addresses that would not match the above regex let us know.
>
> Calle> Just about anything that's meant to go into another mail system
> Calle> (such as Notes, cc:Mail, X.400 or MEMO) is likely to not match
> Calle> your regexp. Why do you want to forbid certain local parts anyway?
>
> Perhaps because James is not aware of that large class of users. :)
You're correct, I wasn't, but I *was* aware that such things might
exist, which is why I asked for examples in the hope that people would
tell me what format such addresses actually took. This is because I've
still got my thinking stuck in the "how narrow can we make this" rut.
Can someone give me an example of local parts in Notes, cc:Mail, X.400
and MEMO format? Please, Pretty-please.
What extra characters would I need to add to the character classes in
my regex to allow addresses in these formats?
Thanks for your help...
> Anyway, I also am reminded of my friend, Eli-the-bearded, who
> frequently posts here under the email address of <*@qz.to>. So
> please make sure your form works for him too. {grin}
Hmmm, interesting. Perhaps I should expand * to postmaster just for him. :-)
--
James Taylor <james (at) oakseed demon co uk>
PGP key available ID: 3FBE1BF9
Fingerprint: F19D803624ED6FE8 370045159F66FD02
------------------------------
Date: Sat, 14 Oct 2000 17:12:59 GMT
From: tony@svanstrom.com (Tony L. Svanstrom)
Subject: Re: Regex for matching e-mail addresses
Message-Id: <1eiiah0.1j9344x14djz62N%tony@svanstrom.com>
James Taylor <james@NOSPAM.demon.co.uk> wrote:
> In article <m1snpz78b4.fsf@halfdome.holdit.com>, Randal L. Schwartz
> <URL:mailto:merlyn@stonehenge.com> wrote:
> > >>>>> "James" == James Taylor writes:
> >
> > Yes. <fred&barney@stonehenge.com> is a *real* Internet address. It
> > *really* exists. If you send mail to it, you'll get an autoresponder.
> >
> > Now stop looking at it from "how narrow can we make this". If it's
> > narrower than RFC822, it's *needlessly* *too* *narrow*.
>
> You've sussed me. I *am* trying to see "how narrow can we make this"
> because for practical purposes I don't think RFC822 is useful. In my
> particular web form I want to match a restrictive set of email addresses
> and for the 0.01% of visitors who cannot supply a suitable address I
> will return an error page that grovels with apology for being unable
> to accept their address and asks that they contact us by email instead.
And you know it's only 0.01% because...?
> I feel it is better to inconvenience a very small number of people
> than to risk a security breach due to me untainting and using
> addresses containing potentially dangerous escape sequences, etc.
Don't remove everything because of paranoia; instead look at your code
and see how you handle the text and look at the external programs that
you send the text to... If you haven't been too lazy or unaware of
security-related problems, you'll find that a lot of what you think might
be dangerous stuff actually won't do any harm.
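One way to act on that advice is to decide what is actually dangerous for the place the address ends up, and reject only that. A minimal sketch, assuming the address is destined for a mail header: untaint_address is a hypothetical helper, and the rule it applies (no control characters, an '@' followed by a non-empty domain containing no '@') is an illustrative choice, not a standard.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): untaint an address for use
# in a mail header by rejecting only what is dangerous there, namely
# control characters such as CR and LF that could inject extra header
# lines, instead of whitelisting a narrow set of "safe" characters.
sub untaint_address {
    my ($addr) = @_;
    return unless defined $addr;
    # Some non-control text, an '@', then a domain with no '@'.
    # Under taint mode, the captured $1 is treated as clean.
    if ($addr =~ /\A([^\x00-\x1f\x7f]+\@[^\x00-\x1f\x7f\@]+)\z/) {
        return $1;
    }
    return;
}
```

The cleaned value could then go into a To: header of a message piped to sendmail with the list form of open, for example open(my $mail, '|-', '/usr/sbin/sendmail', '-oi', '-t'), so that no shell ever parses the address; the sendmail path is an assumption about the local system.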
/Tony
--
/\___/\ Who would you like to read your messages today? /\___/\
\_@ @_/ Protect your privacy: <http://www.pgpi.com/> \_@ @_/
--oOO-(_)-OOo---------------------------------------------oOO-(_)-OOo--
on the verge of frenzy - i think my mask of sanity is about to slip
---ôôô---ôôô-----------------------------------------------ôôô---ôôô---
\O/ \O/ ©99-00 <http://www.svanstrom.com/?ref=news> \O/ \O/
------------------------------
Date: Sat, 14 Oct 2000 19:32:43 +0200
From: "Philip 'Yes, that's my address' Newton" <nospam.newton@gmx.li>
Subject: Re: Regex for matching e-mail addresses
Message-Id: <5g1husolm343bjivu8u9fbttive46cjt4b@4ax.com>
On Sat, 14 Oct 2000 16:06:23 +0100, James Taylor <james@NOSPAM.demon.co.uk>
wrote:
> That probably means that the domain part will match something like
> ([\w-]+\.)+[\w-]{2,4}
Not necessarily. For example, I believe someone once mentioned having the email
address hostmaster@dk, and someone else mentioned someusername@cx. These
show that the dot is not necessary.
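If such dotless addresses exist, James's pattern needs only one change to accept them: the dotted-label group in the domain goes from + (one or more) to * (zero or more). A minimal sketch; everything else, including the {2,4} limit on the last label, is left exactly as James wrote it.

```perl
use strict;
use warnings;

# James's pattern with one change: the "label." group in the domain is
# now optional (* instead of +), so a bare top-level domain is allowed.
my $re = qr/^([\w+=-]+\.)*[\w+=-]+\@([\w-]+\.)*[\w-]{2,4}$/;

for my $addr ('hostmaster@dk', 'someusername@cx', 'user@example.com') {
    print "$addr: ", ($addr =~ $re ? "match\n" : "no match\n");
}
```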
Cheers,
Philip
------------------------------
Date: 14 Oct 2000 19:27:12 +0200
From: Calle Dybedahl <calle@lysator.liu.se>
Subject: Re: Regex for matching e-mail addresses
Message-Id: <86hf6f72q7.fsf@tezcatlipoca.algonet.se>
>>>>> "James" == James Taylor <james@NOSPAM.demon.co.uk> writes:
> Can someone give me an example of local parts in Notes, cc:Mail, X.400
> and MEMO format? Please, Pretty-please.
No. Why bother? You've decided you're not going to accept
Internet-standard mail. That's enough reason for me not to help you.
Whether you block 1%, 10% or 99% is irrelevant; it's still wrong.
Your security argument is bogus. The security does not come from what
data you accept as input, but from what you do with the data once you
have it.
--
Calle Dybedahl, Vasav. 82, S-177 52 Jaerfaella,SWEDEN | calle@lysator.liu.se
Try again. Try harder. -*- Fail again. Fail better.
------------------------------
Date: Sun, 15 Oct 2000 01:44:30 +1000
From: Neo James Crum <stphw@ihug.com.au>
Subject: split problem
Message-Id: <39E87F5E.36461E36@ihug.com.au>
Hello,
I have a problem using split that I can't figure out how to solve. In a
CSV file supplied to me, I have a line like the following:
234,tree,"Smith, John",6834
The problem I face is how to get split to ignore the comma inside the
quotation marks.
Should I use a regex to remove the quoted contents and store them along
with their locations, then use split and somehow put the stored contents
back where they're supposed to go? It's the best solution I can think of.
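The idea can work if each quoted field is swapped for a numbered placeholder before the split and restored afterwards. A minimal sketch of that approach; split_csv_line is a hypothetical name, and it assumes quoted fields contain no embedded double quotes and the input contains no NUL bytes, which are used as placeholder markers.

```perl
use strict;
use warnings;

# Pull each "..." field out, leave a numbered placeholder in its place,
# split on commas, then put the saved text back.
sub split_csv_line {
    my ($line) = @_;
    my @saved;
    # push returns the new array length, so the placeholder holds index + 1.
    $line =~ s/("[^"]*")/"\0" . push(@saved, $1) . "\0"/ge;
    # A limit of -1 keeps trailing empty fields.
    my @fields = split /,/, $line, -1;
    # Replace each placeholder with the quoted text it stands for.
    s/\0(\d+)\0/$saved[$1 - 1]/ge for @fields;
    return @fields;
}

print "$_\n" for split_csv_line('234,tree,"Smith, John",6834');
```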
Thanks, Neo.
--
We wouldn't care so much what people thought of us if we knew how seldom
they did.
SJW: http://www.ozemail.com.au/~stphw
------------------------------
Date: Sat, 14 Oct 2000 10:04:25 -0700
From: "Godzilla!" <godzilla@stomp.stomp.tokyo>
Subject: Re: split problem
Message-Id: <39E89219.11893556@stomp.stomp.tokyo>
Neo James Crum wrote:
(snipped)
> ...I have a line like follows:
> 234,tree,"Smith, John",6834
> ...how to get split to ignore the comma inside the
> quotation marks.
Do not respond with changed parameters and say,
"This does not work."
Godzilla!
--
TEST SCRIPT:
------------
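#!/usr/local/bin/perl
use strict;
use warnings;
print "Content-type: text/plain\n\n";
my $in = '234,tree,"Smith, John",6834';
print "Input:\n $in\n\n";
# Hide the quoted comma behind a placeholder (\x{A9}, the © sign).
# Only the first ", " is replaced, so this suits this one line, not
# lines with several quoted fields.
$in =~ s/, /\x{A9} /;
my @fields = split /,/, $in;
print "Output:\n";
foreach my $element (@fields) {
    # Turn the placeholder back into a comma.
    $element =~ s/\x{A9}/,/;
    print " $element\n";
}
exit;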
PRINTED RESULTS:
----------------
Input:
234,tree,"Smith, John",6834
Output:
234
tree
"Smith, John"
6834
------------------------------
Date: Sat, 14 Oct 2000 18:00:52 GMT
From: mjd@plover.com (Mark-Jason Dominus)
Subject: Re: split problem
Message-Id: <39e89f53.44d$16b@news.op.net>
Keywords: Hemingway, allotropic, auxiliary, sapiens
In article <39E87F5E.36461E36@ihug.com.au>,
Neo James Crum <stphw@ihug.com.au> wrote:
>The problem I face is how to get split to ignore the comma inside the
>quotation marks.
That's actually a Frequently Asked Question (lots of people deal with
CSV data) and the usual answer is that you should get a copy of the
Text::CSV module (which you can find at search.cpan.org) and use that;
it'll probably be more robust than whatever you put together yourself.
The FAQ list probably has more information; take a look.
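If installing Text::CSV is not an option, the core module Text::ParseWords, whose parse_line function splits on a delimiter pattern while respecting quotes, is enough for a line like this one. A sketch only: parse_line is not a full CSV parser (for instance, it treats a backslash as an escape character), so Text::CSV remains the more robust choice for real CSV files.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = '234,tree,"Smith, John",6834';

# parse_line(DELIMITER, KEEP, LINE): with KEEP false, the surrounding
# double quotes are removed from each field.
my @fields = parse_line(',', 0, $line);

print "$_\n" for @fields;
```

With KEEP set to a true value, the quotes are left in the fields instead.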
------------------------------
Date: Sat, 14 Oct 2000 15:43:50 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Why these 2 simple errors?
Message-Id: <x7zok7v361.fsf@home.sysarch.com>
>>>>> "BN" == BUCK NAKED1 <dennis100@webtv.net> writes:
>> uri@sysarch.com (Uri Guttman) wrote:
>> so you are using the wrong class
>> name which is why new is not found.
BN> Hmmm... I thought you could just call out the module name to use a
BN> module in a local directory. When I change use Tar;
BN> $tar = Tar->new(); to use Archive::Tar;
BN> $tar = Archive::Tar->new(); ...then my BEGIN fails, and it doesn't
BN> find the module.
read up on use. there are two issues you are dealing with here and you
keep breaking one or the other. first when you use a file, the package
name has :: converted to / and then searched for. you are setting the
lib path to the dir where Tar.pm is, instead of where Archive/Tar.pm
is. you can still just use Tar.pm but that is dirty as when you properly
install the whole module (why didn't you?) it will need to be found by
the full name.
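The name-to-file mapping described here can be seen directly. A sketch; module_to_path is a hypothetical helper that mimics the mapping use and require perform, and the %INC check uses the core Data::Dumper module only because it is always available.

```perl
use strict;
use warnings;

# Hypothetical helper: '::' becomes '/', and '.pm' is appended. use and
# require look the result up relative to each directory in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

print module_to_path('Archive::Tar'), "\n";

# Perl records loaded files in %INC under the same relative name.
require Data::Dumper;
print "Data::Dumper was loaded from $INC{'Data/Dumper.pm'}\n";
```

So if the file is at, say, /home/you/lib/Archive/Tar.pm (a hypothetical path), the directory to add with use lib is /home/you/lib, not /home/you/lib/Archive.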
secondly, inside the module is the statement package Archive::Tar. so
the new sub is under that namespace and you MUST call it with that
name no matter how you load the module.
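That the method is found through the package statement rather than the file name can be shown without any files at all. A sketch; Demo::Tar is a hypothetical package standing in for Archive::Tar.

```perl
use strict;
use warnings;

# Stand-in for a module file: what matters is the package statement,
# not the name of the file it lives in.
package Demo::Tar;
sub new { my ($class) = @_; return bless {}, $class }

package main;

# Works: new() lives in the Demo::Tar namespace.
my $tar = Demo::Tar->new;
print ref($tar), "\n";

# Fails: there is no package called Tar, whatever the file is named.
my $ok = eval { Tar->new; 1 };
print "Tar->new failed: $@" unless $ok;
```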
>> and you haven't installed
>> Compress::Zlib which is why it
>> says compression not available.
BN> That's what I thought, and you're right, Compress::Zlib is not
BN> installed... BUT the docs say it isn't necessary unless you're using
BN> compression. I'm extracting, not compressing.
gwyn answered that already.
uri
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V9 Issue 4617
**************************************