[30209] in Perl-Users-Digest
Perl-Users Digest, Issue: 1452 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Apr 16 00:09:56 2008
Date: Tue, 15 Apr 2008 21:09:22 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 15 Apr 2008 Volume: 11 Number: 1452
Today's topics:
Re: Accessing a hash whose name is constructed <smallpond@juno.com>
Re: Accessing a hash whose name is constructed <syscjm@sumire.gwu.edu>
Re: Accessing a hash whose name is constructed <tadmc@seesig.invalid>
Re: Can "perldoc" take input from a pipe? jomarbueyes@hotmail.com
Re: Creating PDF documents from a Perl Program... <gerry@nowhere.ford>
Re: CSV to quasi-XML <travis.bowers@gmail.com>
Re: How to know if data is piped into my script <asolkar@gmail.com>
Matching URLs with REs (was "Some questions about q{} a <see.my.signature@for.my.email.address>
Re: Matching URLs with REs (was "Some questions about q <abigail@abigail.be>
Re: Matching URLs with REs (was "Some questions about q <1usa@llenroc.ude.invalid>
Re: Matching URLs with REs (was "Some questions about q <see.my.signature@for.my.email.address>
Re: Matching URLs with REs (was "Some questions about q <szrRE@szromanMO.comVE>
Re: Matching URLs with REs (was "Some questions about q <1usa@llenroc.ude.invalid>
Re: Matching URLs with REs (was "Some questions about q <benkasminbullock@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 15 Apr 2008 13:09:16 -0700 (PDT)
From: smallpond <smallpond@juno.com>
Subject: Re: Accessing a hash whose name is constructed
Message-Id: <5fc63f12-277e-4ca0-ad8c-1686fb8edb20@m71g2000hse.googlegroups.com>
On Apr 15, 3:46 pm, Gibbering <robl...@gmail.com> wrote:
> I want to access some data from a hash, but want to build that hash's
> name on the fly... here's the code:
>
> %hello = (a => 'doggy');
> print ${'hell' . lc 'O'}{a};
>
> does the trick as does:
> print %{'hell' . lc 'O'}->{a};
>
> however, under "use strict", this fails, since %hello isn't declared
> with "my".
>
> If I do put a my in front of the %hello declaration, the print
> statement gives me nothing.
> I have a sinking suspicion that the above code is wrong, dangerous,
> and error prone since I'm not even sure why it works.
>
> What is the proper way to do such a thing?
This should answer some of your questions
perldoc -q 'How can I use a variable as a variable name?'
But why name this hash at all? Why not use an anonymous hash?
To be useful, you need a reference to it anyway.
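A minimal sketch of the anonymous-hash suggestion above. The variable name $pets is illustrative and does not come from the original post:

```perl
use strict;
use warnings;

# An anonymous hash has no name of its own; $pets only holds a reference to it.
my $pets = { a => 'doggy' };

# The arrow operator dereferences the reference to reach a value.
my $value = $pets->{a};
print "$value\n";
```

Because the hash is reached only through a reference, there is no variable name to construct, and the code runs cleanly under "use strict".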
--S
------------------------------
Date: Tue, 15 Apr 2008 16:47:38 -0500
From: Chris Mattern <syscjm@sumire.gwu.edu>
Subject: Re: Accessing a hash whose name is constructed
Message-Id: <slrng0a8jo.bjp.syscjm@sumire.gwu.edu>
On 2008-04-15, Gibbering <roblund@gmail.com> wrote:
> I want to access some data from a hash, but want to build that hash's
> name on the fly... here's the code:
For God's sake, *why*?
>
> %hello = (a => 'doggy');
> print ${'hell' . lc 'O'}{a};
>
> does the trick as does:
> print %{'hell' . lc 'O'}->{a};
>
> however, under "use strict", this fails, since %hello isn't declared
> with "my".
>
> If I do put a my in front of the %hello declaration, the print
> statement gives me nothing.
> I have a sinking suspicion that the above code is wrong, dangerous,
> and error prone since I'm not even sure why it works.
>
> What is the proper way to do such a thing?
By not doing it and using a hash of hashes instead.
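One way the hash-of-hashes approach might look, assuming the constructed name is only needed as a lookup key. The names %data and $name, and the extra "world" entry, are illustrative:

```perl
use strict;
use warnings;

# One lexical hash holds every "named" hash as a nested hash.
my %data = (
    hello => { a => 'doggy' },
    world => { a => 'kitty' },   # illustrative extra entry
);

# Build the key at run time, as the original post built the variable name.
my $name = 'hell' . lc 'O';

# exists() avoids autovivifying an entry for an unknown name.
my $result = exists $data{$name} ? $data{$name}{a} : undef;
print "$result\n" if defined $result;
```

The constructed string is now ordinary data, so a misspelled name yields undef instead of silently touching an unrelated global variable.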
--
Christopher Mattern
NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities
------------------------------
Date: Wed, 16 Apr 2008 01:47:21 GMT
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Accessing a hash whose name is constructed
Message-Id: <slrng0apjg.1t1.tadmc@tadmc30.sbcglobal.net>
Gibbering <roblund@gmail.com> wrote:
> I want to access some data from a hash, but want to build that hash's
> name on the fly...
I suggest that it would be of great benefit to you to
stop wanting that...
> here's the code:
>
> %hello = (a => 'doggy');
> print ${'hell' . lc 'O'}{a};
>
> does the trick as does:
> print %{'hell' . lc 'O'}->{a};
>
> however, under "use strict", this fails, since %hello isn't declared
> with "my".
>
> If I do put a my in front of the %hello declaration, the print
> statement gives me nothing.
> I have a sinking suspicion that the above code is wrong, dangerous,
> and error prone since I'm not even sure why it works.
Because Perl has two separate systems of variables.
Your code as shown uses "package variables" while my()
declares "lexical variables" instead.
See:
"Coping with Scoping":
http://perl.plover.com/FAQs/Namespaces.html
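A small sketch of the distinction described above, assuming the usual rule that symbolic references are resolved through the package symbol table and therefore never see lexicals. The name %hello_lex is hypothetical:

```perl
use strict;
use warnings;

our %hello     = (a => 'package doggy');   # package variable, visible in the symbol table
my  %hello_lex = (a => 'lexical doggy');   # lexical variable, not in the symbol table

my $name = 'hell' . lc 'O';

my ($via_symref, $lex_lookup);
{
    no strict 'refs';                   # symbolic references are an error under strict
    $via_symref = ${$name}{a};          # resolves to the package variable %hello
    $lex_lookup = ${'hello_lex'}{a};    # looks for a package %hello_lex, which is empty
}

print "package variable via symbolic reference: $via_symref\n";
print "lexical via symbolic reference: ",
    defined $lex_lookup ? $lex_lookup : '(nothing)', "\n";
```

This is why adding "my" made the original print statement produce nothing: the lookup by name went to a package variable that was never populated.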
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Tue, 15 Apr 2008 14:35:48 -0700 (PDT)
From: jomarbueyes@hotmail.com
Subject: Re: Can "perldoc" take input from a pipe?
Message-Id: <047a0723-c410-4158-a5d0-4ed2c72cdfe2@y21g2000hsf.googlegroups.com>
On Apr 15, 1:29 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth jomarbue...@hotmail.com:
>
>
>
> > Is there a way to make perldoc take its input from stdin? I tried the
> > obvious:
> > command | perldoc
> > but it didn't work.
>
> > My idea is to embed "pod" documentation in source code written in
> > Fortran and C++, then write a perl script that will search for the pod
> > documentation, remove the comment marks, and pipe the resulting output
> > to something that can format perlpod.
>
> perldoc already has that functionality. If I put
>
> #include <stdio.h>
>
> /*
>
> =head1 NAME
>
> fubar - a program for making mistakes
>
> =cut
>
> */
>
> int
> main (int argc, char **argv)
> {
> return 1;
> }
>
> into fubar.c (note that all the blank lines inside the comment are
> required, to make it valid POD) then `perldoc fubar.c` quite happily
> formats and displays the POD.
>
> The real trick, of course, is to work out how to embed it into the
> resulting executable... :)
>
> Ben
Hi Ben,
Thank you for your suggestion. It does work with C programs, and that
simplifies things a lot! However, Fortran has no block comments. Each
comment line must carry its own mark: a "C" in the first column in
Fortran 77, or a bang (!) before the comment in Fortran 90/95/2003.
Still, it is trivial to remove the bang or "C" with a Perl script, and
the output can be piped to pod2man as Frank suggested.
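A rough sketch of such a filter, under the assumption that POD lives only in full-line comments marked by "C" or "c" in column one, or by optional whitespace and a bang. The sub name extract_pod and the sample source lines are illustrative. A real script would probably pick one comment style per file, since free-form code lines can legitimately start with the letter "c":

```perl
use strict;
use warnings;

# Return the POD text hidden in Fortran comment lines.
sub extract_pod {
    my @lines = @_;
    my @pod;
    my $in_pod = 0;
    for my $line (@lines) {
        # Strip the comment mark (and one following space); skip non-comment lines.
        next unless $line =~ s/^(?:[Cc]|\s*!) ?//;
        $in_pod = 1 if $line =~ /^=[a-zA-Z]/;   # a POD directive opens a block
        push @pod, $line if $in_pod;
        $in_pod = 0 if $line =~ /^=cut\b/;      # =cut closes the block
    }
    return @pod;
}

my @source = (
    "C =head1 NAME\n",
    "C\n",
    "C fubar - a program for making mistakes\n",
    "C\n",
    "C =cut\n",
    "      PROGRAM FUBAR\n",
    "      END\n",
);
print extract_pod(@source);
```

The printed text is plain POD, so it could be sent on to a formatter such as pod2man.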
Thanks again for your suggestion,
Jomar
------------------------------
Date: Tue, 15 Apr 2008 23:05:09 -0500
From: "Gerry Ford" <gerry@nowhere.ford>
Subject: Re: Creating PDF documents from a Perl Program...
Message-Id: <1208318273_22@news.newsgroups.com>
"Danish" <nigel@bouteyres.com> wrote in message
news:11fc2e51-1881-431b-97dc-8bf25c5c522f@m71g2000hse.googlegroups.com...
> Hi there,
>
> I'm working on a database driven website and I need to output data in
> PDF format. The database handling is all written in Perl so I'd prefer
> to stick to Perl if possible, hence my question in this newsgroup!
>
> I've done some research and come up with a couple of ideas: PDF on the
> Fly from Nottingham University and a module listed on CPAN called PDF-
> Create-0.08 by Markus Baertschi.
>
> What I'd like to know is if anyone out there has experience of doing
> this kind of thing. What method they used and what problems they hit,
> if any.
Nigel,
I thought I might try the same task. My ppm shows 2 different .08 versions
available, one by Markus and another by Fabian.
I can't tell for certain which one I installed, but right off the bat I
tried to open their sample.pdf in \lib\PDF\ and Acrobat can't read it. I
think this bodes poorly for the process that created it.
I wish I could tell you which one I had. With ppm, I feel like a toddler
with a fire hose.:-0
--
"Shopping for toilets isn't the most fascinating way to spend a Saturday
afternoon. But it beats watching cable news."
~~ Booman
------------------------------
Date: Tue, 15 Apr 2008 16:23:54 -0700 (PDT)
From: Travis <travis.bowers@gmail.com>
Subject: Re: CSV to quasi-XML
Message-Id: <14e4a41a-85de-4f08-bedc-a788fdd4de1d@c19g2000prf.googlegroups.com>
Thanks all for the wonderful help.
------------------------------
Date: Tue, 15 Apr 2008 14:08:03 -0700 (PDT)
From: Mahesh Asolkar <asolkar@gmail.com>
Subject: Re: How to know if data is piped into my script
Message-Id: <68acea68-d018-49e8-953f-295d11bc1b8b@h1g2000prh.googlegroups.com>
On Apr 15, 12:49 pm, Frank Seitz <devnull4...@web.de> wrote:
> Mahesh Asolkar wrote:
>
> > It should use '`$ENV{'SHELL'} -c history`'. However if it is called
> > like:
>
> >   % history | myscript.pl
>
> > It should use the piped in data instead.
>
> unless (-t) {
>     # STDIN is not opened to a tty but a file or pipe
>
> }
>
Thanks Frank and Xho. -t and -p do exactly what I was looking for.
Xho, I had tried the first way you suggested too. It works in the case
of 'history | myscript.pl', but when 'myscript.pl' is run on its own it
blocks, waiting for <> to return.
Here's the script with the changes (I picked -p):
---------
#!/usr/bin/perl
use strict;
use warnings;
my %hist = ();
my @hist;
$hist{(split /\s+/)[3]}++
foreach ((-p STDIN) ? <STDIN> : `$ENV{'SHELL'} -c history`);
print map {"$hist{$_} : $_\n"}
sort {$hist{$b} <=> $hist{$a}}
keys %hist;
---------
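For reference, a small sketch of how the two file tests mentioned above differ. The sub name input_kind is hypothetical; -t and -p are the standard Perl file-test operators:

```perl
use strict;
use warnings;

# Classify a filehandle using Perl's file-test operators.
sub input_kind {
    my ($fh) = @_;
    return 'terminal' if -t $fh;   # attached to a tty
    return 'pipe'     if -p $fh;   # a pipe or FIFO, as in "cmd | script"
    return 'other';                # redirected file, /dev/null, and so on
}

print "STDIN is: ", input_kind(\*STDIN), "\n";
```

The choice matters for input redirected from a file: "myscript.pl < saved_history" is neither a terminal nor a pipe, so a test on -p falls back to running the shell's history command, while "unless (-t)" would read the file.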
/Mahesh
------------------------------
Date: Tue, 15 Apr 2008 13:35:27 -0700
From: "Robbie Hatley" <see.my.signature@for.my.email.address>
Subject: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <8O-dnRQyuZYOjJjVnZ2dnUVZ_u-unZ2d@giganews.com>
"Ben Bullock" wrote:
> Well OK but if I was going to do this for real, I would use something like
> /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i
> or similar (I haven't checked this regex with the machine yet but
> hopefully you get the picture).
The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there are hundreds
or thousands of such valid domain suffixes. You're forgetting "es" (Spain),
"ru" (Russia), "uk" (Ukraine), "us" (USA), not to mention "mil", "gov", "edu", "biz",
"info", etc, etc, etc. That's part of why my URL-matching regex was so vague.
> I just wanted to make the point that the &$% stuff is not valid as part of the
> web address.
Those characters all appear in web addresses. For instance, "&" is used as
a field separator between the parameters passed to server-side scripts (PHP,
Perl, etc.) in URLs. Similarly, "?" marks the start of the query string, which
carries the parameters for the script named before it. If you reject such
characters, you reject many valid URLs. Just look at any YouTube URL. This one,
for example:
http://uk.youtube.com/watch?v=I9ciR9qR1dU&feature=bz303
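To make the roles of "?" and "&" concrete, here is a minimal sketch that splits the query string of that URL into name/value pairs. It deliberately ignores percent-decoding, repeated names, and fragments, and the variable names are illustrative:

```perl
use strict;
use warnings;

my $url = 'http://uk.youtube.com/watch?v=I9ciR9qR1dU&feature=bz303';

# Everything after the first "?" is the query string.
my (undef, $query) = split /\?/, $url, 2;

# "&" separates the fields; "=" separates each name from its value.
my %param = map { split /=/, $_, 2 } split /&/, $query;

print "$_ = $param{$_}\n" for sort keys %param;
```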
Maybe what you meant is that such characters are invalid in domain names;
but I was trying to capture and linkify document URLs, not domain names or
domain-level URLs such as "http://www.acme.com/". Trying to concoct a
foolproof RE that captures every valid URL and rejects every invalid one
is a real piece of work. And any such "perfect" URL-matching RE would
quickly become obsolete anyway as the Internet changes over time.
Hence I tend to go for a vague RE that I believe captures every valid
document URL, at the cost of occasionally capturing a few invalid ones.
Unless someone knows a better approach.
--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant
------------------------------
Date: 15 Apr 2008 20:54:30 GMT
From: Abigail <abigail@abigail.be>
Subject: Re: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <slrng0a5g5.uuv.abigail@alexandra.abigail.be>
_
Robbie Hatley (see.my.signature@for.my.email.address) wrote on VCCCXLI
September MCMXCIII in <URL:news:8O-dnRQyuZYOjJjVnZ2dnUVZ_u-unZ2d@giganews.com>:
`'
`' Maybe what you meant is that such characters are invalid in domain names;
`' but I was trying to capture and linkify document URLs, not domain names or
`' domain-level URLs such as "http://www.acme.com/". Trying to concoct a
`' foolproof RE that captures every valid URL and rejects every invalid one
`' is a real piece of work. And any such "perfect" URL-matching RE would
`' quickly become obsolete anyway as the Internet changes over time.
`' Hence I tend to go for a vague RE that I believe captures every valid
`' document URL, at the cost of occasionally capturing a few invalid ones.
`' Unless someone knows a better approach.
You mean, something like:
(?:(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)|(?:(?:nntp)://(?:(?:(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?)/(?:(?:[a-zA-Z][-A-Za-z0-9.+_]*))(?:/(?:[0-9]+))?))|(?:(?:file)://(?:(?:(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+))|localhost)?)(?:/(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(?:[-a-zA-Z0-9$_.+!*'(),:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)))))|(?:(?:ftp)://(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'();:&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?:)@)?(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))(?:;type=(?:[AIai]))?))?)|(?:(?:tel):(?:(?:(?:[+](?:[0-9\-.()]+)(?:;isub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+)?(?:(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*)|(?:[0-9\-.()*#ABCDwp]+(?:;isub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+)?(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))(?:(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*))))|(?:(?:fax):(?:(?:(?:[+](?:[0-9\-.()]+)(?:;isub=[0-9\-.()]+)?(?:;tsub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+)?(?:(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*)|(?:[0-9\-.()*#ABCDwp]+(?:;isub=[0-9\-.()]+)?(?:;tsub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+)?(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))(?:(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*))))|(?:(?:prospero)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?/(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),?:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(?:[-a-zA-Z0-9$_.+!*'(),?:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))(?:(?:;(?:(?:[-a-zA-Z0-9$_.+!*'(),?:@&]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)=(?:(?:[-a-zA-Z0-9$_.+!*'(),?:@&]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))|(?:(?:tv):(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?))?)|(?:(?:telnet)://(?:(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),;?&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?::(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),;?&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?)@)?(?:(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?)(?:/)?)|(?:(?:news):(?:(?:[*]|(?:(?:[-a-zA-Z0-9$_.+!*'(),;/?:&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+@(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))|(?:[a-zA-Z][-A-Za-z0-9.+_]*))))|(?:(?:wais)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?/(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?:[?](?:(?:(?:[-a-zA-Z0-9$_.+!*'(),;:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))|/(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))/(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))|(?:(?:gopher)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?/(?:(?:(?:[0-9+IgT]))(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))))|(?:(?:pop)://(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+))(?:;AUTH=(?:[*]|(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+)|(?:[+](?:APOP|(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+))))))?@)?(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?))
I don't believe in capturing a few invalid ones - nor in rejecting valid ones.
Abigail
--
$_ = "\x3C\x3C\x45\x4F\x54"; s/<<EOT/<<EOT/e; print;
Just another Perl Hacker
EOT
------------------------------
Date: Tue, 15 Apr 2008 23:44:53 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <Xns9A81C8E2B3456asu1cornelledu@127.0.0.1>
Abigail <abigail@abigail.be> wrote in
news:slrng0a5g5.uuv.abigail@alexandra.abigail.be:
> _
> Robbie Hatley (see.my.signature@for.my.email.address) wrote on
> VCCCXLI September MCMXCIII in
> <URL:news:8O-dnRQyuZYOjJjVnZ2dnUVZ_u-unZ2d@giganews.com>: `'
...
> `' Unless someone knows a better approach.
>
>
> You mean, something like:
>
> (?:(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a
<snipped for brevity>
OK, now you are just showing off ;-)
Joking aside, that giant block shows the utility of building regular
expressions from small building blocks.
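As an illustration of that building-block style, here is a deliberately simplified sketch for http URLs only. The piece names and the character classes are assumptions made for this example; they are not the definitions used inside Regexp::Common:

```perl
use strict;
use warnings;

# Small named pieces, each readable on its own.
my $label = qr/[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?/;
my $host  = qr/$label(?:\.$label)*/;
my $port  = qr/:[0-9]+/;
my $pchar = qr/[-a-zA-Z0-9_.!~*'():\@&=+\$,]|%[a-fA-F0-9]{2}/;
my $path  = qr{(?:/(?:$pchar)*)+};
my $query = qr/\?(?:$pchar|[;\/?])*/;

# The full pattern is just the pieces in order.
my $http_url = qr{http://$host(?:$port)?(?:$path)?(?:$query)?};

my $text = 'See http://uk.youtube.com/watch?v=I9ciR9qR1dU&feature=bz303 for details.';
my ($found) = $text =~ /($http_url)/;
print "$found\n";
```

Each qr// object can be tested separately, and the long final pattern is generated by interpolation instead of being typed by hand.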
In any case, I would like to take this opportunity to thank you
for Regexp::Common. It has saved me a lot of work over time.
The OP would benefit from using
http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm
as opposed to resigning himself to second or third or nth rate
'solutions'.
Thank you.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
------------------------------
Date: Tue, 15 Apr 2008 17:28:58 -0700
From: "Robbie Hatley" <see.my.signature@for.my.email.address>
Subject: Re: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <ntednUYddefT1ZjVnZ2dnUVZ_uCinZ2d@giganews.com>
"Abigail" put forth into the annals of Usenet:
> (a rather large URL-capturing regex)
Hmmm... I'm curious, did you write that manually, or generate it
programmatically? If generated, using what software?
And how many decaseconds does it take a regex compiler to process
that?
> I don't believe in capturing a few invalid ones - nor in
> rejecting valid ones.
I believe in simplicity over perfection. Given these choices:
A. Make a 100% perfect program taking 284 man-hours
B. Make a 97% perfect program taking 5 man-hours
I usually take B.
--
perl -le 'print "\122\157b\142\151e\40\110\141t\154\145y";'
perl -le 'print "\124\165s\164\151n\54\40\103A\54\40\125\123A";'
perl -le 'print "\154one\167olf\100\167ell\56\143om\n";'
perl -le 'print scalar reverse "/flowenol~/moc.llew.www//\72ptth";'
------------------------------
Date: Tue, 15 Apr 2008 17:47:03 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <fu3ia802q7b@news2.newsguy.com>
Ben Bullock wrote:
> On Tue, 15 Apr 2008 13:35:27 -0700, Robbie Hatley wrote:
>
>> "Ben Bullock" wrote:
>>
>>> Well OK but if I was going to do this for real, I would use
>>> something like
>>> /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i or similar (I
>>> haven't checked this regex with the machine yet but hopefully you
>>> get the picture).
>>
>> The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there
>> are hundreds or thousands of such valid domain suffixes.
>
> I think there are only about 200 or so, most of which are rare.
>
>> You're
>> forgetting "es" (Spain), "ru" (Russia), "uk" (Ukraine), "us" (USA),
>> not to mention "mil", "gov", "edu", "biz", "info", etc, etc, etc.
>
> Um, I have both "us" and "uk" there. I didn't know that uk was Ukraine
> though.
According to http://www.iana.org/domains/root/db/, ".uk" is United
Kingdom, and ".ua" is Ukraine (".gb" is also reserved and labeled for
the United Kingdom, though ".uk" was used instead.)
--
szr
------------------------------
Date: Wed, 16 Apr 2008 00:50:55 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <Xns9A81D4142C69Basu1cornelledu@127.0.0.1>
"Robbie Hatley" <see.my.signature@for.my.email.address> wrote in
news:ntednUYddefT1ZjVnZ2dnUVZ_uCinZ2d@giganews.com:
>
> "Abigail" put forth into the annals of Usenet:
>
>> (a rather large URL-capturing regex)
>
> Hmmm... I'm curious, did you write that manually, or generate it
> programmatically? If generated, using what software?
You can read how it is done by looking at the sources of
Regexp::Common modules. It is very elegant.
> And how many decaseconds does it take a regex compiler to process
> that?
Not so much that it matters. You might want to measure performance if
you care so much.
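A minimal way to take such a measurement, using the core Time::HiRes module. The pattern in $source and the host name www.example.com are stand-ins chosen for this sketch; substitute the pattern actually under discussion:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# A stand-in pattern held as a string so that compilation happens at run time.
my $source = '(?:[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}';

my $start   = [gettimeofday];
my $re      = qr/$source/;           # the regex is compiled here
my $elapsed = tv_interval($start);   # elapsed seconds as a floating-point number

printf "compiled in %.6f seconds\n", $elapsed;
print "matches\n" if 'www.example.com' =~ /^$re$/;
```

A single timing like this is noisy. For a fairer figure, the compilation could be repeated many times, for example with the core Benchmark module.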
>> I don't believe in capturing a few invalid ones - nor in
>> rejecting valid ones.
>
> I believe in simplicity over perfection. Given these choices:
> A. Make a 100% perfect program taking 284 man-hours
> B. Make a 97% perfect program taking 5 man-hours
> I usually take B.
But, of course, using Regexp::Common would have cut that down to 2
minutes for a perfect program.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
------------------------------
Date: Tue, 15 Apr 2008 22:46:16 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: Matching URLs with REs (was "Some questions about q{} and qr{}").
Message-Id: <fu3b7o$pl5$2@ml.accsnet.ne.jp>
On Tue, 15 Apr 2008 13:35:27 -0700, Robbie Hatley wrote:
> "Ben Bullock" wrote:
>
>> Well OK but if I was going to do this for real, I would use something
>> like /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i or similar
>> (I haven't checked this regex with the machine yet but hopefully you
>> get the picture).
>
> The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there
> are hundreds or thousands of such valid domain suffixes.
I think there are only about 200 or so, most of which are rare.
> You're
> forgetting "es" (Spain), "ru" (Russia), "uk" (Ukraine), "us" (USA), not
> to mention "mil", "gov", "edu", "biz", "info", etc, etc, etc.
Um, I have both "us" and "uk" there. I didn't know that uk was Ukraine
though.
> That's
> part of why my URL-matching regex was so vague.
>> I just wanted to make the point that the &$% stuff is not valid as part
>> of the web address.
>
> Those characters all appear in web addresses.
Did you really not understand my point?
> Hence I tend to go for a vague RE that I believe
> captures every valid document URL, at the cost of occasionally
> capturing a few invalid ones. Unless someone knows a better approach.
Well, even if they do know a better approach, they might not have the
energy to discuss it with you.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1452
***************************************