[31448] in Perl-Users-Digest


Perl-Users Digest, Issue: 2700 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Nov 30 03:09:48 2009

Date: Mon, 30 Nov 2009 00:09:16 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 30 Nov 2009     Volume: 11 Number: 2700

Today's topics:
    Re: DLL unload question for embedded Perl on Windows <nospam-abuse@ilyaz.org>
    Re: DLL unload question for embedded Perl on Windows <ben@morrow.me.uk>
    Re: DLL unload question for embedded Perl on Windows <nospam-abuse@ilyaz.org>
    Re: DLL unload question for embedded Perl on Windows <ben@morrow.me.uk>
    Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <OJZGSRPBZVCX@spammotel.com>
    Re: Faster way to get PHP script than LWP::Simple <smallpond@juno.com>
    Re: Faster way to get PHP script than LWP::Simple <hjp-usenet2@hjp.at>
    Re: Good Golly Miss Molly Perl. Been so long. (Seymour J.)
    Re: Good Golly Miss Molly Perl. Been so long. (Seymour J.)
    Re: Good Golly Miss Molly Perl. Been so long. (Randal L. Schwartz)
    Re: perlio vs. sysread speed (was: Quick CGI question ( <nospam-abuse@ilyaz.org>
    Re: perlio vs. sysread speed (was: Quick CGI question ( <ben@morrow.me.uk>
    Re: perlio vs. sysread speed (was: Quick CGI question ( <nospam-abuse@ilyaz.org>
    Re: perlio vs. sysread speed (was: Quick CGI question ( <ben@morrow.me.uk>
    Re: perlio vs. sysread speed (was: Quick CGI question ( <ben@morrow.me.uk>
        URI::Find to ignore images <jwcarlton@gmail.com>
    Re: URI::Find to ignore images <tadmc@seesig.invalid>
    Re: URI::Find to ignore images <ben@morrow.me.uk>
    Re: URI::Find to ignore images <jurgenex@hotmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 29 Nov 2009 08:38:25 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: DLL unload question for embedded Perl on Windows
Message-Id: <slrnhh4co1.vc1.nospam-abuse@powdermilk.math.berkeley.edu>

On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
>> AFAIK, the design of Perl DLLs was always that they are not unloadable.

> That's more-or-less what I thought, but perldoc DynaLoader says

>|    dl_unload_file()
>|        Syntax:
>|
>|            $status = dl_unload_file($libref)
>|
>|        Dynamically unload $libref, which must be an opaque 'library
>|        reference' as returned from dl_load_file.  Returns one on success
>|        and zero on failure.

This is just an interface to the OS's implementation; it has practically
nothing to do with Perl's DLLs.

One may keep in mind a simple analogy: in C, one can free(); but if
you have an array of (pointers to) structures, it is not enough to
free() the arena where the array content is situated.  One must know
the semantics of array elements, and do recursive free()ing (or
decrement of refcounts; or whatever is needed).

dl_unload_file() is just a dumb free().  It knows nothing about what
one should do with dangling pointers inside the arena, or pointers
which were stored in the area.

Hope this helps,
Ilya


------------------------------

Date: Sun, 29 Nov 2009 10:07:48 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: DLL unload question for embedded Perl on Windows
Message-Id: <kot9u6-blu1.ln1@osiris.mauzo.dyndns.org>


Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
> >> AFAIK, the design of Perl DLLs was always that they are not unloadable.
> 
> > That's more-or-less what I thought, but perldoc DynaLoader says
> 
> >|    dl_unload_file()
> >|        Syntax:
> >|
> >|            $status = dl_unload_file($libref)
> >|
> >|        Dynamically unload $libref, which must be an opaque 'library
> >|        reference' as returned from dl_load_file.  Returns one on success
> >|        and zero on failure.
> 
> This is just an interface to the OS's implementation; it has practically
> nothing to do with Perl's DLLs.

You snipped the important bit. DynaLoader will in fact call
dl_unload_file (if defined) on all previously-loaded extension dlls
during global destruction.

> One may keep in mind a simple analogy: in C, one can free(); but if
> you have an array of (pointers to) structures, it is not enough to
> free() the arena where the array content is situated.  One must know
> the semantics of array elements, and do recursive free()ing (or
> decrement of refcounts; or whatever is needed).
> 
> dl_unload_file() is just a dumb free().  It knows nothing about what
> one should do with dangling pointers inside the arena, or pointers
> which were stored in the area.

Quite so, which is why this can't (easily) be done *before* global
destruction.

Ben



------------------------------

Date: Sun, 29 Nov 2009 18:59:03 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: DLL unload question for embedded Perl on Windows
Message-Id: <slrnhh5h3n.80a.nospam-abuse@powdermilk.math.berkeley.edu>

On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
>> This is just an interface to the OS's implementation; it has practically
>> nothing to do with Perl's DLLs.
>
> You snipped the important bit. DynaLoader will in fact call
> dl_unload_file (if defined) on all previously-loaded extension dlls
> during global destruction.

This must be a somewhat new development; if so, I have no idea why
this is supposed to work (definitely, in my Perl DLLs, I have no
provision for unloading...).

> Quite so, which is why this can't (easily) be done *before* global
> destruction.

 ... and, AFAIU, *after* global destruction too.  Perl DLLs have BOOT:
sections; they do not have UNBOOT: ones to release resources they
allocated.

Ilya


------------------------------

Date: Sun, 29 Nov 2009 20:47:48 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: DLL unload question for embedded Perl on Windows
Message-Id: <k83bu6-f222.ln1@osiris.mauzo.dyndns.org>


Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
> >> This is just an interface to the OS's implementation; it has practically
> >> nothing to do with Perl's DLLs.
> >
> > You snipped the important bit. DynaLoader will in fact call
> > dl_unload_file (if defined) on all previously-loaded extension dlls
> > during global destruction.
> 
> This must be a somewhat new development; if so, I have no idea why
> this is supposed to work (definitely, in my Perl DLLs, I have no
> provision for unloading...).

OK... checking the source, it appears the functionality was added in
2000, but disabled (by default) in 2001. So, you are correct that perl
does not in fact currently ever unload extension DLLs. (I wonder what
happened to the original bug this was supposed to solve, that of
unloading and reloading libperl.so causing all loaded extensions to hold
pointers into an unloaded library?)

Ben
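To check the points above on a particular perl, here is a sketch that only inspects state: it reports whether that perl's DynaLoader defines dl_unload_file() at all, and which extensions are recorded in @DynaLoader::dl_modules and @DynaLoader::dl_librefs (the bookkeeping arrays documented in perldoc DynaLoader, which XSLoader also updates). It deliberately unloads nothing, since doing so while perl is running leaves the dangling pointers discussed above. Loading List::Util first is an arbitrary way to make sure at least one XS extension is usually present; the sub name unload_report is made up for this example.

```perl
use strict;
use warnings;
use DynaLoader ();    # make sure DynaLoader and its globals are loaded
use List::Util ();    # an XS module, so usually at least one extension is loaded

# Report whether dl_unload_file() exists in this build, how many extension
# libraries have been loaded, and the names of the modules that loaded them.
sub unload_report {
    my $has_unload = defined &DynaLoader::dl_unload_file ? 1 : 0;
    my $librefs    = scalar @DynaLoader::dl_librefs;
    my @modules    = @DynaLoader::dl_modules;
    return ($has_unload, $librefs, @modules);
}

my ($has_unload, $librefs, @modules) = unload_report();
printf "dl_unload_file defined: %s\n", $has_unload ? 'yes' : 'no';
printf "%d extension librar%s loaded: %s\n",
    $librefs, ($librefs == 1 ? 'y' : 'ies'), join(', ', @modules);
```

Note that dl_unload_file() being defined says nothing about whether perl will ever call it; per the discussion above, the unload-at-exit behaviour is disabled by default.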



------------------------------

Date: Sun, 29 Nov 2009 09:38:33 +0100
From: "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <op.u35eajqemk9oye@frodo>

On Sun, 29 Nov 2009 00:12:36 +0100, David Canzi  
<dmcanzi@remulak.uwaterloo.ca> wrote:

> Around Y2K, many programs that had used a "%02d" format for the
> year were changed to use "%04d" or "%4d".  This will cause trouble
> in the future -- only "%d" is Y10K compliant.

What causes trouble is that in January 2038 the signed 32-bit time_t
(seconds since 1970) used by most software, including the C libraries
perl relies on, runs out. Later dates simply cannot be expressed.

There is a Perl module Time::y2038 which fixes that.
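Where the 2038 limit comes from can be shown in a few lines: a signed 32-bit time_t counts seconds since 1970-01-01 00:00:00 UTC and tops out at 2**31 - 1. The sketch below asks perl's own gmtime for that last second, then forces the next value through a signed 32-bit integer with pack/unpack to show the wrap-around. It illustrates the arithmetic; it does not test the platform's C library.

```perl
use strict;
use warnings;

# Largest value a signed 32-bit time_t can hold.
my $max_time_t  = 2**31 - 1;
my $last_second = gmtime($max_time_t);    # Tue Jan 19 03:14:07 2038
print "Last second of a signed 32-bit time_t: $last_second UTC\n";

# One second later, squeezed into 32 signed bits, the counter goes negative,
# which corresponds to a date in December 1901.
my $wrapped = unpack('l', pack('L', $max_time_t + 1));
print "After overflow the counter reads: $wrapped\n";
```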


------------------------------

Date: Sun, 29 Nov 2009 10:28:43 -0800 (PST)
From: smallpond <smallpond@juno.com>
Subject: Re: Faster way to get PHP script than LWP::Simple
Message-Id: <272516c5-bd14-44c5-8100-099d3903cd18@d10g2000yqh.googlegroups.com>

On Nov 28, 9:50 pm, Jason Carlton <jwcarl...@gmail.com> wrote:
> I'm using a PHP script as a heading for the site, and it has several
> functions in it. If I want to include this header in a Perl script, is
> there a better / faster way than using LWP::Simple?
>
> I ask because I've noticed that my Perl-based pages load a little
> slower than other pages on the site (even a plain "Hello World" script
> that includes this header), so I'm pretty sure that this is the
> bottleneck.

time perl -e 'use LWP::Simple; print "Hello, World"'
Hello, World
real    0m0.033s
user    0m0.026s
sys     0m0.007s

You must be very perceptive.  Most people
wouldn't notice that delay.


------------------------------

Date: Sun, 29 Nov 2009 20:06:51 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Faster way to get PHP script than LWP::Simple
Message-Id: <slrnhh5hic.qjm.hjp-usenet2@hrunkner.hjp.at>

On 2009-11-29 18:28, smallpond <smallpond@juno.com> wrote:
> On Nov 28, 9:50 pm, Jason Carlton <jwcarl...@gmail.com> wrote:
>> I'm using a PHP script as a heading for the site, and it has several
>> functions in it. If I want to include this header in a Perl script, is
>> there a better / faster way than using LWP::Simple?
>>
>> I ask because I've noticed that my Perl-based pages load a little
>> slower

I assume "a little slower" means a few tenths of a second?

>> than other pages on the site (even a plain "Hello World" script that
>> includes this header), so I'm pretty sure that this is the
>> bottleneck.
>
> time perl -e 'use LWP::Simple; print "Hello, World"'
> Hello, World
> real    0m0.033s
> user    0m0.026s
> sys     0m0.007s
>
> You must be very perceptive.  Most people
> wouldn't notice that delay.

I'm quite sure that he doesn't just load LWP::Simple without using it.

From the description he uses LWP::Simple to get a PHP-generated page,
then extracts the header from it and includes it in the output of his
Perl script.

So the total time is:

 1) startup of the perl script (if this is CGI, this includes loading the
   perl interpreter and all modules used by the script)
 2) plus the time for fetching the PHP page
 3) plus the time for extracting the header (almost certainly negligible)
 4) plus the time the script spends on doing "real work".

So loading a perl page always takes as long as loading a PHP page
(because loading a perl page *does* load a PHP page, too!) plus some
extra time. 

Obvious optimizations are:

 * If you have to load a PHP page every time you load a perl page, then
   at least load one which is short and loads fast! Don't load your
   start page which searches for your last n blog entries, does a google
   search for your name and aggregates 52 atom feeds just to throw all
   that information away immediately.
 * Cache the result of the query. If you use FastCGI or mod_perl, you
   can simply keep the header in a variable. If you don't, you can put it
   in a file or stuff it into memcached.
 * Use FastCGI or mod_perl. The time to load the perl interpreter may be
   negligible these days, but some other actions aren't. For example
   opening a database connection is still rather slow, and if you can do
   that only once instead of for each request you win.
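A minimal sketch of the file-cache variant of the second suggestion, assuming the header is fetched with LWP::Simple::get from a URL such as the hypothetical http://www.example.com/header.php. The fetch is passed in as a code reference so the caching logic can be tried without a network; the sub name cached_header, the cache path and the five-minute lifetime are all arbitrary choices for illustration.

```perl
use strict;
use warnings;

# Return the header HTML, fetching it at most once every $ttl seconds and
# otherwise serving the copy cached in $cache_file.
#
# $fetch is a code ref doing the real work, e.g. (hypothetical URL):
#   use LWP::Simple ();
#   my $fetch = sub { LWP::Simple::get('http://www.example.com/header.php') };
sub cached_header {
    my ($fetch, $cache_file, $ttl) = @_;

    # Fresh enough? Compare the cache file's mtime with the current time.
    my @st = stat $cache_file;
    if (@st && time - $st[9] < $ttl) {
        open my $in, '<', $cache_file or die "Cannot read $cache_file: $!";
        local $/;                          # slurp the whole file
        return scalar <$in>;
    }

    # Stale or missing: fetch a new copy.
    my $html = $fetch->();
    die "Could not fetch header\n" unless defined $html;

    # Write a per-process temp file, then rename() it into place so a
    # concurrent request never reads a half-written cache file.
    my $tmp = "$cache_file.$$";
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} $html;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $cache_file or die "Cannot rename $tmp: $!";
    return $html;
}

# In a CGI script (hypothetical path): print cached_header($fetch, '/tmp/site-header.html', 300);
```

Under FastCGI or mod_perl the same idea shrinks to keeping $html in a variable that outlives the request; memcached would replace the file reads and writes.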

	hp


------------------------------

Date: Sun, 29 Nov 2009 00:02:30 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Good Golly Miss Molly Perl. Been so long.
Message-Id: <4b120066$2$fuzhry+tra$mr2ice@news.patriot.net>

In <4b0f9f6c$0$2534$da0feed9@news.zen.co.uk>, on 11/27/2009
   at 09:44 AM, RedGrittyBrick <RedGrittyBrick@spamweary.invalid> said:

>I've never found a situation where I thought it would be useful to use 
>goto.

I have, although I've certainly written programs that contained no goto
and have never written one in Perl.

>When modifying other people's code that contains gotos it has  almost
>invariably made my job harder. Much harder.

Was that an aberration, or were those portions of the code not using goto
also hard to modify? I've never seen serious misuse of goto occur in
isolation; there have always been other instances of poor style in the same
programs.

>I'm sure gotos can be used well,

Read Knuth's article "Structured Programming with go to Statements."

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Sat, 28 Nov 2009 23:56:40 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Good Golly Miss Molly Perl. Been so long.
Message-Id: <4b11ff08$1$fuzhry+tra$mr2ice@news.patriot.net>

In <86ocmnudol.fsf@blue.stonehenge.com>, on 11/27/2009
   at 08:37 AM, merlyn@stonehenge.com (Randal L. Schwartz) said:

>The amount of flack we took for that far exceeded anything else we had
>done poorly on the book. :)

Suggesting that the issue was that it wasn't PC rather than some reasoned
objection to the specific use.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Sun, 29 Nov 2009 09:05:53 -0800
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Good Golly Miss Molly Perl. Been so long.
Message-Id: <86my25qn0u.fsf@blue.stonehenge.com>

>>>>> "Shmuel" == Shmuel (Seymour J ) Metz <spamtrap@library.lspace.org.invalid> writes:

Shmuel> Suggesting that the issue was that it wasn't PC rather than some
Shmuel> reasoned objection to the specific use.

No, suggesting that we really should have known better than to include it, a
mistake I didn't repeat on the second Camel.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion


------------------------------

Date: Sun, 29 Nov 2009 08:54:02 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))
Message-Id: <slrnhh4dl9.vc1.nospam-abuse@powdermilk.math.berkeley.edu>

On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
>> On 2009-11-28, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> >  * Reading line by line is significantly slower than reading by blocks.
>> 
>> Remember that when reading line-by-line (with 80char line), you
>> actually read 80 times char-by-char.
>
> Not under normal circumstances. When perl is using buffered IO, it reads
> a bufferful and then goes grovelling through it for line endings.

But "grovelling" happens char-by-char [*]; then one must re-seek() to the
position in question.  Inspect how

  perl  -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle "print qq([$_]) while <STDIN>"

behaves when reading from a file and from a pipe...

Yours,
Ilya

[*] Last time I checked, every PerlIO operation would go a dozen
    levels deep in subroutine calls - even when a simple macro
        count--, c = *buf++ if count > 0
    would suffice.  PerlIO was written without any regard to
    maintainability and efficiency...


------------------------------

Date: Sun, 29 Nov 2009 21:06:29 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))
Message-Id: <lb4bu6-ha22.ln1@osiris.mauzo.dyndns.org>


Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
> > Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> >> On 2009-11-28, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> >> >  * Reading line by line is significantly slower than reading by blocks.
> >> 
> >> Remember that when reading line-by-line (with 80char line), you
> >> actually read 80 times char-by-char.
> >
> > Not under normal circumstances. When perl is using buffered IO, it reads
> > a bufferful and then goes grovelling through it for line endings.
> 
> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
> position in question.

If I run 

    ~% perl -E'say for 1..1000' >foo
    ~% ktrace perl -pe1 foo >/dev/null

then the only syscalls I see for fd 3 are

 67709 perl     CALL  open(0x81020d4,O_RDONLY,<unused>0x1b6)
 67709 perl     RET   open 3
 67709 perl     CALL  ioctl(0x3,TIOCGETA,0xbfbfe0e8)
 67709 perl     RET   ioctl -1 errno 25 Inappropriate ioctl for device
 67709 perl     CALL  lseek(0x3,0,SEEK_SET,0x1)
 67709 perl     RET   lseek 0
 67709 perl     CALL  fstat(0x3,0x281a7a20)
 67709 perl     RET   fstat 0
 67709 perl     CALL  fcntl(0x3,F_SETFD,FD_CLOEXEC)
 67709 perl     RET   fcntl 0
 67709 perl     CALL  read(0x3,0x811c804,0x1000)
 67709 perl     RET   read 3893/0xf35
 67709 perl     CALL  read(0x3,0x811c804,0x1000)
 67709 perl     RET   read 0
 67709 perl     CALL  close(0x3)
 67709 perl     RET   close 0

so once the file has been opened and examined perl calls read(2) exactly
twice, and lseek(2) not at all.

> Inspect how
> 
>   perl  -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle "print qq([$_]) while <STDIN>"
> 
> behaves when reading from a file and from a pipe...

ktrace says (AFAICT) that perl does a single lseek to where perl thinks
the file pointer should be just before calling fork(2).

Ben
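To reproduce the readline-versus-block comparison that started this thread, here is a sketch of the two approaches side by side. The sub names and the 64 KiB block size are arbitrary. The two agree only for files whose last line ends in a newline, because tr/// counts newline characters while readline also returns a final unterminated line.

```perl
use strict;
use warnings;

# Line by line: perl's buffered readline fills its buffer with read(2)
# and scans that buffer for "\n" itself.
sub count_lines_readline {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

# Block by block: unbuffered sysread in large chunks, counting the
# newline characters in each chunk with tr///.
sub count_lines_sysread {
    my ($path, $blocksize) = @_;
    $blocksize ||= 64 * 1024;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($n, $buf) = (0, '');
    while (1) {
        my $got = sysread $fh, $buf, $blocksize;
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;             # end of file
        $n += ($buf =~ tr/\n//);
    }
    close $fh;
    return $n;
}
```

To time them, the core Benchmark module's cmpthese(-3, { readline => sub { count_lines_readline($file) }, sysread => sub { count_lines_sysread($file) } }) is one option.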



------------------------------

Date: Mon, 30 Nov 2009 04:31:45 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))
Message-Id: <slrnhh6ilh.99p.nospam-abuse@powdermilk.math.berkeley.edu>

On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
>> >> Remember that when reading line-by-line (with 80char line), you
>> >> actually read 80 times char-by-char.

>> > Not under normal circumstances. When perl is using buffered IO, it reads
>> > a bufferful and then goes grovelling through it for line endings.

>> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
>> position in question.

> If I run 
>
>     ~% perl -E'say for 1..1000' >foo
>     ~% ktrace perl -pe1 foo >/dev/null
>
> then the only syscalls I see for fd 3 are

First, I have no idea what `say' would do.  But, judging by the name,
it probably would not do anything with line-oriented read?

>> Inspect how
>> 
>>   perl  -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle "print qq([$_]) while <STDIN>"
>> 
>> behaves when reading from a file and from a pipe...
>
> ktrace says (AFAICT) that perl does a single lseek to where perl thinks
> the file pointer should be just before calling fork(2).

This is even better than how it was before PerlIO was introduced!

Compare this with how it was quite recently: IIRC, about 5-7 years
after PerlIO was introduced, when I reported a spurious seek() per
character read (!), everybody behaved as if it was a surprise to them...

Thanks for clarifications,
Ilya


------------------------------

Date: Mon, 30 Nov 2009 05:15:19 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))
Message-Id: <701cu6-ivj2.ln1@osiris.mauzo.dyndns.org>


Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
> >> >> Remember that when reading line-by-line (with 80char line), you
> >> >> actually read 80 times char-by-char.
> 
> >> > Not under normal circumstances. When perl is using buffered IO, it reads
> >> > a bufferful and then goes grovelling through it for line endings.
> 
> >> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
> >> position in question.
> 
> > If I run 
> >
> >     ~% perl -E'say for 1..1000' >foo
> >     ~% ktrace perl -pe1 foo >/dev/null
> >
> > then the only syscalls I see for fd 3 are
> 
> First, I have no idea what `say' would do.  But, judging by the name,
> it probably would not do anything with line-oriented read?

The first command is just to create a data file. It is equivalent to

    perl -le "print for 1..1000" >foo

'say' was introduced with perl 5.10, and the -E option is equivalent to
-e but allows the new 5.10 features.

Ben
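The equivalence can be spelled out as a script: say appends a newline to what it prints, while -l gets the same effect for print by setting the output record separator $\ to "\n". (For -n/-p scripts -l also chomps input lines, which does not matter here.) The sub names are made up for this sketch.

```perl
use strict;
use warnings;
use feature 'say';    # one of the features that -E enables on perl 5.10+

# Like perl -E'say for 1..1000' >foo
sub write_with_say {
    my ($path, $n) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    say {$fh} $_ for 1 .. $n;
    close $fh or die "Cannot close $path: $!";
}

# Like perl -le "print for 1..1000" >foo: -l sets $\ = "\n" for print
sub write_with_print {
    my ($path, $n) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    local $\ = "\n";
    print {$fh} $_ for 1 .. $n;
    close $fh or die "Cannot close $path: $!";
}
```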



------------------------------

Date: Mon, 30 Nov 2009 06:03:16 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))
Message-Id: <4q3cu6-77k2.ln1@osiris.mauzo.dyndns.org>


Quoth Ben Morrow <ben@morrow.me.uk>:
> 
> Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> > On 2009-11-29, Ben Morrow <ben@morrow.me.uk> wrote:
> > 
> > > If I run 
> > >
> > >     ~% perl -E'say for 1..1000' >foo
> > >     ~% ktrace perl -pe1 foo >/dev/null
> > >
> > > then the only syscalls I see for fd 3 are
> > 
> > First, I have no idea what `say' would do.  But, judging by the name,
> > it probably would not do anything with line-oriented read?
> 
> The first command is just to create a data file. It is equivalent to

Sorry, I realise I may still have been unclear. The important command is
the second,

    ktrace perl -pe1 foo >/dev/null

which runs

    perl -pe1 foo >/dev/null

and records all the syscalls it makes. (Obviously, that is doing
line-oriented read; I also wanted the data file to be larger than one
bufferful.)

Ben



------------------------------

Date: Sun, 29 Nov 2009 18:33:01 -0800 (PST)
From: Jason Carlton <jwcarlton@gmail.com>
Subject: URI::Find to ignore images
Message-Id: <e5916493-7702-4b43-bd66-451e97ec4c68@p35g2000yqh.googlegroups.com>

I'm using URI::Find to convert addresses to links, like so:

$finder = URI::Find -> new(
  sub {
    ($uri, $orig_uri) = @_;
    return "<a href='$uri'>orig_uri</a>";
  }
);

$finder -> find(\$text);


Is there a way to make this ignore images, so that it doesn't create:

<img src="<a href="http://www.whatever.com/image.jpg">http://www.whatever.com/image.jpg</a>">

The images on my server will have several different path
possibilities, so there's nothing constant inside of $uri to scan for.

I thought about letting it do this, then manually manipulating $text,
but there has to be a better way! Something like:

while ($text =~ /(<img[^>]+?>)/sgxi) {
  $mod_text = $1;
  $mod_text =~ s/<a href="//gi;
  $mod_text =~ s/<\/a>//gi;
}

I haven't tried that, but it seems like it would work, although it
leaves a LOT of room for error.

TIA,

Jason


------------------------------

Date: Sun, 29 Nov 2009 20:45:16 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: URI::Find to ignore images
Message-Id: <slrnhh6c82.lgj.tadmc@tadbox.sbcglobal.net>

Jason Carlton <jwcarlton@gmail.com> wrote:

>     ($uri, $orig_uri) = @_;
>     return "<a href='$uri'>orig_uri</a>";
                            ^^
                            ^^ ?

You should copy/paste code rather than attempt to retype it.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Mon, 30 Nov 2009 03:22:10 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: URI::Find to ignore images
Message-Id: <2cqbu6-2a52.ln1@osiris.mauzo.dyndns.org>


Quoth Jason Carlton <jwcarlton@gmail.com>:
> I'm using URI::Find to convert addresses to links, like so:
> 
> $finder = URI::Find -> new(
>   sub {
>     ($uri, $orig_uri) = @_;
>     return "<a href='$uri'>orig_uri</a>";
>   }
> );
> 
> $finder -> find(\$text);
> 
> 
> Is there a way to make this ignore images, so that it doesn't create:
> 
> <img src="<a href="http://www.whatever.com/image.jpg">http://www.whatever.com/image.jpg</a>">

I presume from this that your input text is HTML? The correct answer is
then to use an HTML parser to separate the tagged content from the text,
and only perform the substitutions on the text. I would probably start
with HTML::TokeParser.

Ben
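A sketch of that approach, assuming HTML::TokeParser (from the HTML::Parser distribution) and URI::Find are installed. Only text tokens are handed to URI::Find; tags, including <img src="...">, are copied through verbatim using each token's original text. Skipping text that is already inside an <a> element, and script/style content, are extra assumptions made here to avoid nested links and broken JavaScript; the sub name linkify_html is made up.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI::Find;

# Turn bare URIs in the *text* of an HTML fragment into links, copying
# every tag (e.g. <img src="...">) through exactly as it was written.
sub linkify_html {
    my ($html) = @_;

    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create HTML::TokeParser\n";
    $parser->unbroken_text(1);    # don't split a text run (and a URI) in two

    my $finder = URI::Find->new(sub {
        my ($uri, $orig_uri) = @_;
        return qq{<a href="$uri">$orig_uri</a>};
    });

    my $out       = '';
    my $in_anchor = 0;            # how many <a> elements we are inside

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'T') {                    # [T, text, is_data]
            my $text = $token->[1];
            $finder->find(\$text) unless $in_anchor || $token->[2];
            $out .= $text;
        }
        elsif ($type eq 'S') {                 # [S, tag, attr, attrseq, text]
            $in_anchor++ if $token->[1] eq 'a';
            $out .= $token->[4];
        }
        elsif ($type eq 'E') {                 # [E, tag, text]
            $in_anchor-- if $token->[1] eq 'a' && $in_anchor > 0;
            $out .= $token->[2];
        }
        elsif ($type eq 'PI') {                # [PI, token0, text]
            $out .= $token->[2];
        }
        else {                                 # [C, text] or [D, text]
            $out .= $token->[1];
        }
    }
    return $out;
}
```

Usage would be along the lines of $text = linkify_html($text); before printing the page.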



------------------------------

Date: Sun, 29 Nov 2009 20:19:15 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: URI::Find to ignore images
Message-Id: <4nh6h559740hv6t942odkp8hco54ssv891@4ax.com>

Jason Carlton <jwcarlton@gmail.com> wrote:
>I'm using URI::Find to convert addresses to links, like so:

In your terminology what do you call an address and what do you call a
link?

>$finder = URI::Find -> new(
>  sub {
>    ($uri, $orig_uri) = @_;
>    return "<a href='$uri'>orig_uri</a>";

In Perl terms the return value is a string.
In HTML terms the return value is an anchor tag. 

Now, where are the address and the link?

>Is there a way to make this ignore images, so that it doesn't create:

I guess you could try to match against the set of jpg, gif, bmp, .....

jue
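If the extension-matching route is good enough, the callback itself can make the decision: URI::Find replaces each match with whatever the callback returns, so returning the original text leaves that URI untouched. In this sketch the list of extensions is an assumption; note that it also leaves image URLs in plain text unlinked and does nothing for other URLs inside attributes, which is why the parser-based approach is more robust. It interpolates $orig_uri, with the missing sigil restored.

```perl
use strict;
use warnings;
use URI::Find;

# Link every URI found in the text except ones that look like image files.
my $finder = URI::Find->new(sub {
    my ($uri, $orig_uri) = @_;

    # Judge by extension only, allowing a following ?query or #fragment,
    # and leave such URIs exactly as they appeared in the input.
    return $orig_uri if "$uri" =~ m{\.(?:jpe?g|gif|png|bmp)(?:[?#]|\z)}i;

    return qq{<a href="$uri">$orig_uri</a>};
});

my $text = 'See http://www.example.com/page.html and '
         . '<img src="http://www.example.com/image.jpg">';
$finder->find(\$text);
print "$text\n";
```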


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2700
***************************************
