[30205] in Perl-Users-Digest


Perl-Users Digest, Issue: 1448 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Apr 15 16:20:47 2008

Date: Tue, 15 Apr 2008 13:20:39 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 15 Apr 2008     Volume: 11 Number: 1448

Today's topics:
        Sendmail <deepan.17@gmail.com>
    Re: Sendmail <devnull4711@web.de>
    Re: Sendmail <deepan.17@gmail.com>
    Re: Sendmail xhoster@gmail.com
    Re: Sendmail <devnull4711@web.de>
        Server Socket programming with Perl <foran.paul@gmail.com>
    Re: Server Socket programming with Perl <nobull67@gmail.com>
    Re: Server Socket programming with Perl <foran.paul@gmail.com>
        Some questions about q{} and qr{}. <lonewolf@well.com>
    Re: Some questions about q{} and qr{}. <kenslaterpa@hotmail.com>
    Re: Some questions about q{} and qr{}. <someone@example.com>
    Re: Some questions about q{} and qr{}. <benkasminbullock@gmail.com>
    Re: Some questions about q{} and qr{}. <lonewolf@well.com>
    Re: Some questions about q{} and qr{}. <lonewolf@well.com>
    Re: Some questions about q{} and qr{}. <rvtol+news@isolution.nl>
    Re: Some questions about q{} and qr{}. <benkasminbullock@gmail.com>
    Re: somewhat unusual way to define a sub <groditi@gmail.com>
    Re: somewhat unusual way to define a sub <ro.naldfi.scher@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 9 Apr 2008 22:20:12 -0700 (PDT)
From: Deepan Perl XML Parser <deepan.17@gmail.com>
Subject: Sendmail
Message-Id: <dbdeff55-d0a4-45d5-97a9-633327661e4e@1g2000prf.googlegroups.com>

my $email = populateEmail("test");
print $email;

open(SENDMAIL, "|/usr/lib/sendmail -oi -t -i") or print "cannot open
SENDMAIL: $!";
print SENDMAIL <<"EOF";
From: <deepan\@juniper.net>
To: ${email}
Subject: [SUSTAINING TICKET] Case
Content-type: text/html
	testing
EOF
	close(SENDMAIL);

The above code results in "No recipient address found in header",
but when I replace ${email} with a literal address, say
deepan.17\@gmail.com, it works fine.

populateEmail function's code is available below:

sub populateEmail {
              if(lc($_[0]) eq "test") {
				return ("deepan.17\@gmail.com");
	      }
}

Can anyone spot the bug?


------------------------------

Date: Thu, 10 Apr 2008 07:30:31 +0200
From: Frank Seitz <devnull4711@web.de>
Subject: Re: Sendmail
Message-Id: <665mvsF2ier2oU1@mid.individual.net>

Deepan Perl XML Parser wrote:
> 
> Can anyone spot the bug?

No. The code is ugly but it works.

Frank
-- 
Dipl.-Inform. Frank Seitz; http://www.fseitz.de/
Anwendungen für Ihr Internet und Intranet
Tel: 04103/180301; Fax: -02; Industriestr. 31, 22880 Wedel


------------------------------

Date: Wed, 9 Apr 2008 22:35:11 -0700 (PDT)
From: Deepan Perl XML Parser <deepan.17@gmail.com>
Subject: Re: Sendmail
Message-Id: <954a813d-721a-40a5-86db-7d194531ed67@z24g2000prf.googlegroups.com>

On Apr 10, 10:30 am, Frank Seitz <devnull4...@web.de> wrote:
> Deepan Perl XML Parser wrote:
>
>
>
> > Can anyone spot the bug?
>
> No. The code is ugly but it works.

which one works for you and which part of the code sounds ugly to you.
I haven't provided you the complete code but still you are able to
tell it ugly. good!


------------------------------

Date: 10 Apr 2008 05:51:44 GMT
From: xhoster@gmail.com
Subject: Re: Sendmail
Message-Id: <20080410015148.432$R8@newsreader.com>

Deepan Perl XML Parser <deepan.17@gmail.com> wrote:
> On Apr 10, 10:30 am, Frank Seitz <devnull4...@web.de> wrote:
> > Deepan Perl XML Parser wrote:
> >
> >
> >
> > > Can anyone spot the bug?
> >
> > No. The code is ugly but it works.
>
> which one works for you

Either way worked for me. (I took the liberty of changing the email
address to point to me rather than you, however.)

> and which part of the code sounds ugly to you.

The weird indentation of the close.  The bad wrapping.  The use of
"\@" when '@' would do instead.  The unnecessary use of
brackets in ${email}.  I also found the use of non-lexical file
handles somewhat ugly.  Some of it is petty stuff, but you did ask.
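
As a rough illustration of the points above, here is a minimal sketch, not a
verified fix for the reported error. The names populate_email and
build_message are hypothetical, and the blank line separating the headers
from the body is an addition that is not in the posted code. The message is
only built and printed; the pipe to sendmail is left commented out and is
untested.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's populateEmail function.
# Single quotes mean the '@' needs no backslash.
sub populate_email {
    my ($key) = @_;
    return lc($key) eq 'test' ? 'deepan.17@gmail.com' : undef;
}

# Build the whole message as one string: headers, a blank line, then body.
# Inside the interpolating here-document the '@' must still be escaped.
sub build_message {
    my ($to) = @_;
    return <<"EOF";
From: <deepan\@juniper.net>
To: $to
Subject: [SUSTAINING TICKET] Case
Content-type: text/html

testing
EOF
}

my $email = populate_email('test');
die "no recipient found for 'test'\n" unless defined $email;

my $message = build_message($email);
print $message;

# To actually send it, a lexical handle could replace the bareword SENDMAIL
# (path and flags taken from the original post; assumption, not run here):
# open(my $sendmail, '|-', '/usr/lib/sendmail', '-oi', '-t')
#     or die "cannot open sendmail: $!";
# print {$sendmail} $message;
# close($sendmail) or warn "sendmail exited with status $?";
```

Printing the message before piping it anywhere makes it easy to see whether
the To: line really contains an address.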


> I haven't provided you the complete code but still you are able to
> tell it ugly. good!

You seem to have it backwards.  We don't need to see all of the code
to know it is ugly.  But we maybe do need to see all of the code to debug
it, if the bug is not in the part of the code you choose to show us.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Thu, 10 Apr 2008 08:32:52 +0200
From: Frank Seitz <devnull4711@web.de>
Subject: Re: Sendmail
Message-Id: <665qkpF2ier2oU2@mid.individual.net>

Deepan Perl XML Parser wrote:
> On Apr 10, 10:30 am, Frank Seitz <devnull4...@web.de> wrote:
>>Deepan Perl XML Parser wrote:
>>>
>>>Can anyone spot the bug?
>>
>>No. The code is ugly but it works.
> 
> which one works for you and which part of the code sounds ugly to you.
> I haven't provided you the complete code but still you are able to
> tell it ugly. good!

I don't believe that you post real code and real questions.
It looks like a joke to me.

Frank
-- 
Dipl.-Inform. Frank Seitz; http://www.fseitz.de/
Anwendungen für Ihr Internet und Intranet
Tel: 04103/180301; Fax: -02; Industriestr. 31, 22880 Wedel


------------------------------

Date: Fri, 11 Apr 2008 02:21:23 -0700 (PDT)
From: paul f <foran.paul@gmail.com>
Subject: Server Socket programming with Perl
Message-Id: <1a12b351-f968-4952-900d-1d3f4d63adbe@8g2000hse.googlegroups.com>

Hi there,
I have a GPRS device that sends its IMEI data after a successful TCP
socket is established with a Unix server.
It waits for a success or failure (1 or 0) from the server. The GPRS
device then sends its packet of data back to the server (approx. 100
bytes).

Can somebody help me find out how I can rearrange the code below to read
the IMEI data, send back a 1, read the 100-byte data, and save the incoming
data to a logfile (say log.txt)?


Source code below:


#!/usr/bin/perl -w
# server0.pl
#--------------------
 use strict;
 use Socket;

 # use port 7879 as default
 my $port = shift || 7879;
 my $proto = getprotobyname('tcp');

 # create a socket, make it reusable
 socket(SERVER, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
 setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1) or die "setsock: $!";

 # grab a port on this machine
 my $paddr = sockaddr_in($port, INADDR_ANY);

 # bind to a port, then listen
 bind(SERVER, $paddr) or die "bind: $!";
 listen(SERVER, SOMAXCONN) or die "listen: $!";
 print "SERVER started on port $port\n";

 # accepting a connection
 my $client_addr;
 while ($client_addr = accept(CLIENT, SERVER))
 {
     # find out who connected
     my ($client_port, $client_ip) = sockaddr_in($client_addr);
     my $client_ipnum = inet_ntoa($client_ip);
     my $client_host = gethostbyaddr($client_ip, AF_INET);
     # print who has connected
     print "got a connection from: $client_host", "[$client_ipnum]\n";
     # send them a message, close connection
     print CLIENT "Smile from the server";
     close CLIENT;
 }




------------------------------

Date: Sat, 12 Apr 2008 02:04:29 -0700 (PDT)
From: Brian McCauley <nobull67@gmail.com>
Subject: Re: Server Socket programming with Perl
Message-Id: <d5be56c2-c46b-478e-bacc-7d82f6304624@b1g2000hsg.googlegroups.com>

On Apr 11, 10:21 am, paul f <foran.p...@gmail.com> wrote:
> Hi there,
> I have a GPRS device that sends its IMEI data after a successful TCP
> socket is established with a Unix server.
> It waits for a success or failure (1 or 0) from the server. The GPRS
> device then sends its packet of data back to the server (approx. 100
> bytes).
>
> Can somebody help me find out how I can rearrange the code below to read
> the IMEI data, send back a 1, read the 100-byte data, and save the incoming
> data to a logfile (say log.txt)?

No you can't just rearrange it - you need to add the statements that
read and write. Oh yes and you need to open the log file too.

You need first to be 100% clear about what you mean by glib phrases like
"sends its IMEI" (fixed length record? Terminated record? (What
terminator?) String representation? Binary representation? (Which
one?)).

Same questions again for "send back a 1".

To send a byte 1.

print CLIENT "\x01";

To read fixed length records use read() (you can also use readline()
by setting the read terminator $/ to a reference to a scalar
containing the record length in bytes, if you prefer).

read ( CLIENT, my $record, 100) or whatever...

As for logging a 100 byte binary record to a textual log file unpack/
pack/sprintf are your friends..

print LOG unpack 'H*', $record;

What format would you like it in?

Can you be more precise about where you are getting stuck?
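
Putting the pieces above together, a minimal sketch might look like the
following. It rests on assumptions the thread does not confirm: that the IMEI
arrives as a fixed 15-byte record, that the reply is the single byte 0x01,
and that the payload is exactly 100 bytes. The names read_exactly and
handle_client are hypothetical. In-memory filehandles stand in for the
CLIENT socket and the log file so the sketch runs without a network.

```perl
use strict;
use warnings;

my $IMEI_LEN   = 15;    # assumption: fixed-length ASCII IMEI
my $RECORD_LEN = 100;   # assumption: payload is exactly 100 bytes

# Read exactly $len bytes from $fh; return undef if EOF comes first.
sub read_exactly {
    my ($fh, $len) = @_;
    my $buf = '';
    while (length($buf) < $len) {
        my $got = read($fh, $buf, $len - length($buf), length($buf));
        die "read failed: $!" unless defined $got;
        return if $got == 0;    # EOF before a full record arrived
    }
    return $buf;
}

# One conversation: read IMEI, acknowledge with byte 1, read record, log it.
sub handle_client {
    my ($in, $out, $log) = @_;

    my $imei = read_exactly($in, $IMEI_LEN);
    return 0 unless defined $imei;

    # On a real socket, turn on autoflush first so this is sent immediately.
    print {$out} "\x01";

    my $record = read_exactly($in, $RECORD_LEN);
    return 0 unless defined $record;

    # Log the binary record as hex text, one line per connection.
    print {$log} $imei, ' ', unpack('H*', $record), "\n";
    return 1;
}

# Demonstration with in-memory handles in place of the socket and log.txt.
my $input = '123456789012345' . join('', map { chr($_) } 1 .. 100);
my ($reply, $logtext) = ('', '');
open(my $in,  '<', \$input)   or die "open input: $!";
open(my $out, '>', \$reply)   or die "open reply: $!";
open(my $log, '>', \$logtext) or die "open log: $!";

my $ok = handle_client($in, $out, $log);
close($out);
close($log);

print $ok ? "handled one client\n" : "client hung up early\n";
print $logtext;
```

In the server loop the same sub could be called as
handle_client(\*CLIENT, \*CLIENT, $log_fh) once the record formats are known.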


------------------------------

Date: Mon, 14 Apr 2008 14:04:35 -0700 (PDT)
From: paul f <foran.paul@gmail.com>
Subject: Re: Server Socket programming with Perl
Message-Id: <41b4b948-0e16-448c-a371-ae5c6e4666fd@k10g2000prm.googlegroups.com>

On Apr 12, 10:04 am, Brian McCauley <nobul...@gmail.com> wrote:
> On Apr 11, 10:21 am, paul f <foran.p...@gmail.com> wrote:
>
> > Hi there,
> > I have a GPRS device that sends its IMEI data after a successful TCP
> > socket is established with a Unix server.
> > It waits for a success or failure (1 or 0) from the server. The GPRS
> > device then sends its packet of data back to the server (approx. 100
> > bytes).
>
> > Can somebody help me find out how I can rearrange the code below to read
> > the IMEI data, send back a 1, read the 100-byte data, and save the incoming
> > data to a logfile (say log.txt)?
>
> No you can't just rearrange it - you need to add the statements that
> read and write. Oh yes and you need to open the log file too.
>
> You need first to be 100% clear about what you mean by glib phrases like
> "sends its IMEI" (fixed length record? Terminated record? (What
> terminator?) String representation? Binary representation? (Which
> one?)).
>
> Same questions again for "send back a 1".
>
> To send a byte 1.
>
> print CLIENT "\x01";
>
> To read fixed length records use read() (you can also use readline()
> by setting the read terminator $/ to a reference to a scalar
> containing the record length in bytes, if you prefer).
>
> read ( CLIENT, my $record, 100) or whatever...
>
> As for logging a 100 byte binary record to a textual log file unpack/
> pack/sprintf are your friends..
>
> print LOG unpack 'H*', $record;
>
> What format would you like it in?
>
> Can you be more precise about where you are getting stuck?

How do I send two bytes to Client zero then 1 (01 hex)?
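
One possible reading of this question is "send a byte with value 0 followed
by a byte with value 1". Under that assumption, a minimal sketch is shown
below; an in-memory filehandle stands in for CLIENT so the result can be
inspected. If the device actually expects the ASCII characters "0" and "1",
the string "01" would be sent instead.

```perl
use strict;
use warnings;

# Two raw bytes, 0x00 then 0x01; pack('C2', 0, 1) builds the same string
# as the literal "\x00\x01".
my $two_bytes = pack('C2', 0, 1);

# In-memory handle standing in for the CLIENT socket.
my $sent = '';
open(my $client, '>', \$sent) or die "open: $!";
binmode($client);               # make sure no newline translation happens
print {$client} $two_bytes;
close($client);

# Show what went out, as hex digits.
print unpack('H*', $sent), "\n";
```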


------------------------------

Date: Sun, 13 Apr 2008 18:07:59 -0700
From: "Robbie Hatley" <lonewolf@well.com>
Subject: Some questions about q{} and qr{}.
Message-Id: <huCdndsLUcnTM5_VnZ2dnUVZ_t2inZ2d@giganews.com>


Today I was editing a URL-linkifying program I wrote several
weeks ago, and I ran across some issues with q{} and qr{}
which are puzzling me.

Here's an edited-for-brevity version of the program:

my $Legal   =  q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}
my $Regex2  = qr{(s?https?://$Legal+)};
while (<>)
{
   s{$Regex1}{http://$1}g;
   s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;
   print ($_);
}

(As an afterthought, I also tacked the entire program on the
end of this post, for anyone who's interested.)

I have two questions:

1. I had a "\" before the "$" to prevent "$_" from being
   interpolated.  But when I took the "\" out, the regexes
   still worked fine!  Seems to me they should break, because
   $_ is now a variable rather than just "dollar sign followed
   by underscore".  But $_ seems not to be interpolated.
   So, is variable interpolation always strictly "one pass"?

2. I've read that qr{} "compiles" the regex; I'm hoping that
   means that the s/// operators in the while loop will not
   recompile $Regex1 and $Regex2 each iteration, even though
   I didn't use a /o flag?  (No sense wasting CPU time
   recompiling, because the patterns are fixed.)

Thanks in advance for your input!


===============================================================
IF YOU'RE PRESSED FOR TIME, FEEL FREE TO STOP READING HERE.
THE REMAINDER OF THIS POST IS THE WHOLE PROGRAM, FOR REFERENCE.
===============================================================


#!/usr/bin/perl

# linkify.perl

# Converts any text document into an HTML document with all of the contents of
# the original, but with any HTTP URLs converted to clickable hyperlinks.

# First print the standard opening lines of an HTML file.
# The title will be "Linkifyed HTML Document",
# the body text is in a "div" element,
# and the paragraphs will have 5-pixel margins on all 4 sides:

use strict;
use warnings;

# Print standard opening boilerplate crap for an HTML file:
print ("<html>\n");
print ("<head>\n");
print ("<title>Linkifyed HTML Document</title>\n");
print ("<style>p{margin:5px;}</style>\n");
print ("</head>\n");
print ("<body>\n");
print ("<div>\n");

# A valid URL must consist solely of the following 82 characters
#
#    alphanumeric:       [:alnum:]       62
#    reserved:           ;/?:@=&          7
#    anchor-id:          #                1
#    encoding:           %                1
#    special:            $_.+!*'(),-     11
#                                 Total: 82
#

# Make a non-interpolated string version of a character class
# consisting of the above 82 URL-legal characters:
my $Legal   =  q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};

# This regex says "find a string which is probably a URL minus the 'http://'
# part; save any such found string as a backreference":
my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}

# This regex says "http or shttp or https or shttps, followed by '://',
# followed by a cluster of URL-legal characters; save any such found string
# as a backreference":
my $Regex2  = qr{(s?https?://$Legal+)};

# Now loop through all lines of text in the original file, wrapping all URLS
# found in "a" and "p" elements, with the URL used as both the text and the
# "href" attribute of the "a" element:

while (<>)
{
   # Linkify all http URLs, including the less-common "shttp" and "https" ones.

   # This substitution says "tack 'http://' onto the beginning of any strings
   # which are probably URLs sans 'http://'":
   s{$Regex1}{http://$1}g;

   # This substitution says "replace each found URL with an html anchor element
   # with the found URL used both as the "href" attribute and as the text,
   # insert the anchor element into a paragraph element,
   # and bracket the paragraph element with newlines":
   s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;

   # Print the edited line.  If the line did not contain a URL, it will be
   # printed unexpurgated.  To redirect output to a file, use ">" on the
   # command line.
   print ($_);
}

# Print element-closure tags for div, body, html:
print ("</div>\n");
print ("</body>\n");
print ("</html>\n");

-- 
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant




------------------------------

Date: Sun, 13 Apr 2008 18:31:14 -0700 (PDT)
From: kens <kenslaterpa@hotmail.com>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <d459f438-e06b-4211-8164-d0ae10e49a4d@f36g2000hsa.googlegroups.com>

On Apr 13, 9:07 pm, "Robbie Hatley" <lonew...@well.com> wrote:
> Today I was editing a URL-linkifying program I wrote several
> weeks ago, and I ran across some issues with q{} and qr{}
> which are puzzling me.
>
> Here's an edited-for-brevity version of the program:
>
> my $Legal   =  q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
> my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}
> my $Regex2  = qr{(s?https?://$Legal+)};
> while (<>)
> {
>    s{$Regex1}{http://$1}g;
>    s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;
>    print ($_);
>
> }
>
> (As an afterthought, I also tacked the entire program on the
> end of this post, for anyone who's interested.)
>
> I have two questions:
>
> 1. I had a "\" before the "$" to prevent "$_" from being
>    interpolated.  But when I took the "\" out, the regexes
>    still worked fine!  Seems to me they should break, because
>    $_ is now a variable rather than just "dollar sign followed
>    by underscore".  But $_ seems not to be interpolated.
>    So, is variable interpolation always strictly "one pass"?

q{} is equivalent to the single-quote operator. Strings inside single
quotes do not get interpolated (as opposed to double quotes, "" or
qq{}).

>
> 2. I've read that qr{} "compiles" the regex; I'm hoping that
>    means that the s/// operators in the while loop will not
>    recompile $Regex1 and $Regex2 each iteration, even though
>    I didn't use a /o flag?  (No sense wasting CPU time
>    recompiling, because the patterns are fixed.)
>
Based on the documentation (perldoc perlop), qr may invoke a
precompilation of the pattern. To me that implies that it is
implementation specific, but there are others with more expertise in
this area than me.

HTH, Ken

>  lines deleted

>
> --
> Cheers,
> Robbie Hatley
> lonewolf aatt well dott com
> www dott well dott com slant user slant lonewolf slant



------------------------------

Date: Mon, 14 Apr 2008 03:01:17 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <1OzMj.22847$KP5.20132@edtnps89>

Robbie Hatley wrote:
> Today I was editing a URL-linkifying program I wrote several
> weeks ago, and I ran across some issues with q{} and qr{}
> which are puzzling me.
> 
> Here's an edited-for-brevity version of the program:
> 
> my $Legal   =  q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
> my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}
> my $Regex2  = qr{(s?https?://$Legal+)};
> while (<>)
> {
>    s{$Regex1}{http://$1}g;
>    s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;
>    print ($_);
> }
> 
> (As an afterthought, I also tacked the entire program on the
> end of this post, for anyone who's interested.)
> 
> I have two questions:
> 
> 1. I had a "\" before the "$" to prevent "$_" from being
>    interpolated.

That just adds a '\' character to your character class:

$ perl -le'$x = q{[$_]}; print qr{$x}'
(?-xism:[$_])
$ perl -le'$x = q{[\$_]}; print qr{$x}'
(?-xism:[\$_])

Which it doesn't look like you intended to include.

>    But when I took the "\" out, the regexes
>    still worked fine!  Seems to me they should break, because
>    $_ is now a variable rather than just "dollar sign followed
>    by underscore".  But $_ seems not to be interpolated.
>    So, is variable interpolation always strictly "one pass"?

Read the "Gory details of parsing quoted constructs" section of:

perldoc perlop

> 2. I've read that qr{} "compiles" the regex; I'm hoping that
>    means that the s/// operators in the while loop will not
>    recompile $Regex1 and $Regex2 each iteration,

That is correct.

>    even though
>    I didn't use a /o flag?  (No sense wasting CPU time
>    recompiling, because the patterns are fixed.)

perldoc -q /o
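
A small sketch of the usage being discussed: the pattern is built once with
qr{} outside the loop and then used on its own inside s{}{}. As far as I
understand perlop, a pattern that consists solely of a qr object can be used
as already compiled, so no /o is needed. The input lines are made up for
illustration.

```perl
use strict;
use warnings;

# Character class kept in a non-interpolating q{} string, as in the post.
my $Legal  = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};

# Built once, before the loop.
my $Regex2 = qr{(s?https?://$Legal+)};

# qr{} returns a regex object rather than a plain string.
print ref($Regex2), "\n";

# Made-up sample input in place of <>.
my @lines = (
    "see http://www.asdf.com/qwer.html now\n",
    "no link here\n",
);

my @out;
for my $line (@lines) {
    # The qr object is the whole pattern here.
    (my $copy = $line) =~ s{$Regex2}{<a href="$1">$1</a>}g;
    push @out, $copy;
}
print @out;
```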



John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Sun, 13 Apr 2008 21:29:15 -0700 (PDT)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <4a10e138-4637-404d-879e-8551cc2a26f0@q1g2000prf.googlegroups.com>

On Apr 14, 10:07 am, "Robbie Hatley" <lonew...@well.com> wrote:

> my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}

;


> # This regex says "find a string which is probably a URL minus the 'http://'
> # part; save any such found string as a backreference":
> my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}

;

Also, here [a-z0-9-]{3,63} (ignoring case) is enough. Your regex will
get things which aren't valid URLs. The following catches anything
valid:

my $validdns = '[0-9a-z-]{2,63}';
m/\b(($validdns\.){1,62}$validdns)\b/i # Catches any valid thing.

>    s{$Regex1}{http://$1}g;

>    print ($_);

You can just say

print;

here if you like.


------------------------------

Date: Mon, 14 Apr 2008 01:35:11 -0700
From: "Robbie Hatley" <lonewolf@well.com>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <8fGdnT14nIqoip7VnZ2dnUVZ_r6rnZ2d@giganews.com>


"John W. Krahn" wrote:

> Robbie Hatley wrote:
>
> > is variable interpolation always strictly "one pass"?
>
> Read the "Gory details of parsing quoted constructs" section of:
> perldoc perlop

Thanks for the tip, but that section doesn't actually say
whether Perl variable interpolation is single-pass or
multi-pass (recursive).

However, when I scrolled up from that section, I noticed
that one of the sections above that:
http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators
*does* specify what I was looking for.  It says:

   "Perl does not expand multiple levels of interpolation."

Bingo.  That's what I was wondering.  That explains why "$_"
wasn't being interpolated in my program.

perl -le 'my $Cat=q/Fifi/; my $Dog=q/$Cat/; print qq/$Dog/;'

Prints "$Cat", not "Fifi" as I had expected.  Now that I
understand why, I can avoid being surprised by that.

-- 
Cheers,
Robbie Hatley
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
perl -le 'print "\150ttp\72//\167ww.\167ell.\143om/~\154onewolf/"'




------------------------------

Date: Mon, 14 Apr 2008 10:40:57 -0700
From: "Robbie Hatley" <lonewolf@well.com>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <Eq6dnQ4uR9iTCp7VnZ2dnUVZ_oOnnZ2d@giganews.com>


"Ben Bullock" wrote:

> On Apr 14, 10:07 am, "Robbie Hatley" <lonew...@well.com> wrote:
>
> > # This regex says "find a string which is probably a URL minus the 'http://'
> > # part; save any such found string as a backreference":
> > my $Regex1  = qr{($Legal+\.$Legal+/$Legal+)}
>
> ... [a-z0-9-]{3,63} (ignoring case) is enough. Your regex will
> get things which aren't valid URLs. The following catches anything
> valid:
>
> my $validdns = '[0-9a-z-]{2,63}';
> m/\b(($validdns\.){1,62}$validdns)\b/i # Catches any valid thing.

I can see that your pattern looks for just the dns part
of the url, which has fewer valid characters; but since it
doesn't look for "/", it will convert this string:

   references in Sec 35.74 paragraph B

to

   references in Sec http://35.74 paragraph B

I believe you're right in that it will find most valid dns
strings; but it also catches things that aren't part of URLs
at all (such as numbers with decimal points), and it rejects
certain well-formed domain strings (such as "j.qbc.net.ca",
which fails the "{2,63}" assertion).

My pattern at least insists on "stuff.stuff/stuff", so it
rejects "35.74".  It rejects domain-level URLs and only
linkifys document-level URLs.  That may be a blessing or
a curse, depending on your expectations.

Also, both your pattern and mine are broken in that they match
http://www.asdf.com/qwer.html, and indeed convert it to
http://http://www.asdf.com/qwer.html .

Oops!  What was really intended was to find "bare" URLs
(without "http://") and tack "http://" on the beginning.
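
The double-prefix problem is easy to reproduce with the first version of
$Regex1. The sketch below also shows one possible guard, using a lookbehind
and a lookahead; this guard is only an illustration and is not the approach
used in the revised program in this post. The names $Naive, $Guarded and
add_prefix are hypothetical.

```perl
use strict;
use warnings;

my $Legal = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};

# The first version of $Regex1 from this thread.
my $Naive = qr{($Legal+\.$Legal+/$Legal+)};

# Illustrative guard: the match must not start in the middle of a run of
# URL-legal characters, and must not already begin with an http(s) header.
my $Guarded = qr{(?<!$Legal)(?!s?https?://)($Legal+\.$Legal+/$Legal+)};

# Apply the "tack on http://" substitution with the given pattern.
sub add_prefix {
    my ($re, $text) = @_;
    $text =~ s{$re}{http://$1}g;
    return $text;
}

for my $text ('http://www.asdf.com/qwer.html', 'see www.asdf.com/qwer.html') {
    print "input:   $text\n";
    print "naive:   ", add_prefix($Naive,   $text), "\n";
    print "guarded: ", add_prefix($Guarded, $text), "\n";
}
```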


Ok, this should do the trick; it blends features from your
approach and mine, and solves the bugs I just mentioned,
as well as some other bugs I've noticed:


#!/usr/bin/perl

# linkify.perl

# Converts any text document into an HTML document with all of the contents of
# the original, but with any HTTP URLs converted to clickable hyperlinks.

# First print the standard opening lines of an HTML file.
# The title will be "Linkifyed HTML Document",
# the body text is in a "div" element,
# and the paragraphs will have 5-pixel margins on all 4 sides:

use strict;
use warnings;

# Print initial tags for HTML file:
print ("<html>\n");
print ("<head>\n");
print ("<title>Linkifyed HTML Document</title>\n");
print ("<style>p{margin:5px;}</style>\n");
print ("</head>\n");
print ("<body>\n");
print ("<div>\n");
print ("<pre>\n");

# A valid URL must consist solely of the following 82 characters
#
#    alphanumeric:       [:alnum:]       62
#    reserved:           ;/?:@=&          7
#    anchor-id:          #                1
#    encoding:           %                1
#    special:            $_.+!*'(),-     11
#                                 Total: 82
#

# Make a non-interpolated string version of a character class
# consisting of the above 82 URL-legal characters:
my $Legal = q<[[:alnum:];/?:@=&#%$_.+!*'(),-]>;

# Make a non-interpolated string version of a regex specifying
# a cluster of 1-63 DNS-valid characters:
my $Dns = q<[0-9A-Za-z-]{1,63}>;

# Make a non-interpolated string version of a regex specifying
# a URL header:
my $Header = q<s?https?://>;

# Make a non-interpolated string version of a regex specifying
# a URL suffix:
my $Suffix = qq<(?:$Dns\\.){1,62}$Dns/$Legal+>;

# This regex says "find a string which is probably a URL suffix,
# at start of line, and save any such found suffix as a backreference":
my $Regex1 = qr{^($Suffix)};

# This regex says "find a string which is probably a URL suffix,
# preceded by some space, and save any such found suffix as a backreference":
my $Regex2 = qr{(\s+)($Suffix)};

# This regex says "find a string which is probably a URL with header,
# and save any such found URL as a backreference":
my $Regex3 = qr{($Header$Suffix)};

# Now loop through all lines of text in the original file.  First add http:// to
# any URLs that need it; then wrap all URLS in "a" and "p" elements, with the
# URL used as both the text and the "href" attribute of the "a" element:

#print $Regex1,"\n";
#print $Regex2,"\n";
#print $Regex3,"\n";

while (<>)
{
   # Tack 'http://' onto the beginning of any strings which are
   # probably URLs but lack 'http://':

   $_ =~ s{$Regex1}{http://$1};    # No sense using g here (beginning of line only).
   #print ("Regex1 matched ", $&, "\n");

   $_ =~ s{$Regex2}{$1http://$2}g; # This one could be anywhere on the line.
   #print ("Regex2 matched ", $&, "\n");

   # Wrap each found URL in an html anchor element with the found URL used both
   # as the "href" attribute and as the text:
   $_ =~ s{$Regex3}{<a href="$1">$1</a>}g;
   #print ("Regex3 matched ", $&, "\n");

   # Print the edited line.  If the line did not contain a URL, it will be
   # printed unexpurgated.  To redirect output to a file, use ">" on the
   # command line.
   print;
}

# Print element-closure tags for pre, div, body, html:
print ("</pre>\n");
print ("</div>\n");
print ("</body>\n");
print ("</html>\n");




------------------------------

Date: Tue, 15 Apr 2008 00:33:30 +0200
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <fu0t59.1lk.1@news.isolution.nl>

John W. Krahn schreef:
> Robbie Hatley wrote:

>> 1. I had a "\" before the "$" to prevent "$_" from being
>>    interpolated.
>
> That just adds a '\' character to your character class:
>
> $ perl -le'$x = q{[$_]}; print qr{$x}'
> (?-xism:[$_])
> $ perl -le'$x = q{[\$_]}; print qr{$x}'
> (?-xism:[\$_])

But it won't match a '\':

$ perl -wle'$x = q{[\$_]}; print $x; print length($x); print q{\\} =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\$_]}; print $x; print length($x); print q{\\} =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\\$_]}; print $x; print length($x); print q{\\} =~
/$x/ ? 1 : 0'
[\\$_]
6
1

$ perl -wle'$x = q{[\$_]}; print $x; print length($x); print chr(92) =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\$_]}; print $x; print length($x); print chr(92) =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\\$_]}; print $x; print length($x); print chr(92)
=~ /$x/ ? 1 : 0'
[\\$_]
6
1

(was run with a perl 5.8.5)

-- 
Affijn, Ruud

"Gewoon is een tijger."



------------------------------

Date: Mon, 14 Apr 2008 23:34:30 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: Some questions about q{} and qr{}.
Message-Id: <fu0pm6$7qc$1@ml.accsnet.ne.jp>

On Mon, 14 Apr 2008 10:40:57 -0700, Robbie Hatley wrote:

> "Ben Bullock" wrote:

>> ... [a-z0-9-]{3,63} (ignoring case) is enough. Your regex will get
>> things which aren't valid URLs. The following catches anything valid:
>>
>> my $validdns = '[0-9a-z-]{2,63}';
>> m/\b(($validdns\.){1,62}$validdns)\b/i # Catches any valid thing.
> 
> I can see that your pattern looks for just the dns part of the url,
> which has fewer valid characters; but since it doesn't look for "/", it
> will convert this string:
> 
>    references in Sec 35.74 paragraph B
> 
> to
> 
>    references in Sec http://35.74 paragraph B
> 
> I believe you're right in that it will find most valid dns strings; but
> it also catches things that aren't part of URLs at all (such as numbers
> with decimal points), and it rejects certain well-formed domain strings
> (such as "j.qbc.net.ca", which fails the "{2,63}" assertion).

Well OK but if I was going to do this for real, I would use something like

/\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i

or similar (I haven't checked this regex with the machine yet but 
hopefully you get the picture).
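
For anyone who wants to try the pattern above, here is a small harness. The
regex is copied as posted; the helper name find_domain and the three sample
strings (taken from earlier in this thread) are only for illustration.

```perl
use strict;
use warnings;

my $validdns = '[0-9a-z-]{2,63}';

# Pattern as posted, with a fixed list of top-level domains at the end.
my $re = qr/\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i;

# Return the first domain-like match in $text, or undef if there is none.
sub find_domain {
    my ($text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $text (
    'references in Sec 35.74 paragraph B',
    'see www.asdf.com/qwer.html',
    'j.qbc.net.ca',
) {
    my $hit = find_domain($text);
    printf "%-40s => %s\n", $text, defined $hit ? $hit : '(no match)';
}
```

With the top-level domain list in place, "35.74" is no longer picked up, but
the one-letter label in "j.qbc.net.ca" is still dropped because of the
{2,63} quantifier.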
 
> My pattern at least insists on "stuff.stuff/stuff", so it rejects
> "35.74".  It rejects domain-level URLs and only linkifys document-level
> URLs.  That may be a blessing or a curse, depending on your
> expectations.

I hadn't really thought this through carefully, I just wanted to make the 
point that the &$% stuff is not valid as part of the web address.
 
> Also, both your pattern and mine are broken in that they match
> http://www.asdf.com/qwer.html, and indeed convert it to
> http://http://www.asdf.com/qwer.html .

Mine doesn't do anything at all, I'm not sure it even compiles!


------------------------------

Date: Wed, 9 Apr 2008 19:03:51 -0700 (PDT)
From: "groditi@gmail.com" <groditi@gmail.com>
Subject: Re: somewhat unusual way to define a sub
Message-Id: <baea9847-ad6e-4182-ab59-fee44ac7ff43@59g2000hsb.googlegroups.com>


> Well, your case *does* make sense, because $method is a different
> string on every execution. My original example, however, was that
> the method name was "constant" (i.e. set to the same string on every
> execution), so I agree with Uri that this could have been done more
> naturally in the conventional way of defining a sub.
>
 Could the original author have done this in an attempt to evade sub
naming? perldoc Sub::Name for more info


------------------------------

Date: Thu, 10 Apr 2008 01:45:21 -0700 (PDT)
From: Ronny <ro.naldfi.scher@gmail.com>
Subject: Re: somewhat unusual way to define a sub
Message-Id: <345b85a1-764f-4a88-bb73-bee17e01d761@d45g2000hsc.googlegroups.com>

On Apr 10, 4:03 am, "grod...@gmail.com" <grod...@gmail.com> wrote:
>  Could the original author have done this in an attempt to evade sub
> naming? perldoc Sub::Name for more info


No, I have now found that it's mainly an alternate way to make a closure.
I think this was not necessary here, because the definitions were
at file scope (I hope this is the correct term?), and this could also
have been done by just defining the sub in the conventional way.

Anyway, I checked for Sub::Name, but there is no perldoc for it (I'm
using ActiveState Perl on Windows, release 5.10.0). I guess it isn't
in the standard distribution, but I found it on CPAN. Interesting
module, though I wonder whether I will ever find some application for
it...

Ronald





------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1448
***************************************

