[32456] in Perl-Users-Digest


Perl-Users Digest, Issue: 3723 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jun 22 16:09:22 2012

Date: Fri, 22 Jun 2012 13:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 22 Jun 2012     Volume: 11 Number: 3723

Today's topics:
    Re: an effective script for grabbing and putting images <cal@example.invalid>
    Re: an effective script for grabbing and putting images <cal@example.invalid>
        RE: Error Handling in Net::SSH::Perl <dago_makoa@hotmail.fr>
        File::glob pattern matching <bjlockie@lockie.ca>
    Re: File::glob pattern matching <rweikusat@mssgmbh.com>
    Re: Perl Protoypes <cwilbur@chromatico.net>
    Re: Perl Protoypes (Randal L. Schwartz)
        Question about a variable in list-context <markus.hutmacher@web.de>
    Re: question concerning pipes and large strings <rweikusat@mssgmbh.com>
    Re: question concerning pipes and large strings <m@rtij.nl.invlalid>
    Re: question concerning pipes and large strings <m@rtij.nl.invlalid>
    Re: question concerning pipes and large strings <rweikusat@mssgmbh.com>
    Re: question concerning pipes and large strings <rweikusat@mssgmbh.com>
        Regex losing <br> (different from the earlier topic abo <jwcarlton@gmail.com>
    Re: Regex losing <br> (different from the earlier topic <ulmai@gmx.de>
    Re: Regex losing <br> (different from the earlier topic <jwcarlton@gmail.com>
    Re: Regex losing <br> (different from the earlier topic <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 21 Jun 2012 22:48:31 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <JICdnbhT8pi9ZH7SnZ2dnUVZ_hmdnZ2d@supernews.com>

On 06/20/2012 11:10 AM, Jim Gibson wrote:
> In article <TKydncSD_q2Z_XzSnZ2dnUVZ_uidnZ2d@supernews.com>, Cal
> Dershowitz <cal@example.invalid> wrote:

>> @matching = sort @matching;
>
> By default, sort will do an alphabetical sort, so '11' will come before
> '2'. This is likely the cause of your algorithm breaking down at 11.
> Try this:
>
> @matching = sort { $a <=> $b } @matching;
>
> which will do a numerical sort.
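
The difference between the two sorts can be seen in a minimal sketch. The file names below are made up for illustration, modelled on the lh_N.html names that appear in this thread; they are not taken from the real server.

```perl
use strict;
use warnings;

# Hypothetical remote listing (illustration only).
my @remote_files = qw(lh_1.html lh_2.html lh_9.html lh_10.html lh_11.html index.html);

# In list context the match returns its capture, or nothing on failure.
my @numbers = map { /^lh_(\d+)\.html$/ } @remote_files;

my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # explicit numeric comparison

print "string sort:  @as_strings\n";
print "numeric sort: @as_numbers\n";

# After a numeric sort the highest number is the last element.
my $highest = $as_numbers[-1];
print "next file would be lh_", $highest + 1, ".html\n";
```

With the default sort, '9' ends up last and would be picked as the highest number, which matches the breakdown described above once the count passes 10.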

Thanks so much, Jim, that seems to have solved the whole problem:

$ perl upload15.pl
Net::FTP>>> Net::FTP(2.77)
[lots of boring output]
Net::FTP=GLOB(0x9a28154)<<< 226 Transfer complete
 . .. logs index.html wsb6121022001 energy green2.m4v Video 151.wmv 
false.wmv vids images zen rev1.html luther1.html lh2.pl lh_1.html 
lh_2.html lh_3.html lh_4.html lh_5.html lh_6.html lh_7.html lh_8.html 
lh_9.html lh_10.html lh_11.html lh_12.html lh_13.html lh_14.html 
lh_15.html lh_16.html lh_17.html lh_18.html lh_19.html



$ cat upload15.pl
#!/usr/bin/perl -w
use strict;
use 5.010;
use Net::FTP;
my $domain   = '';
my $username = '';
my $password = '';
my $ftp      = Net::FTP->new( $domain, Debug => 1, Passive => 1 )
   or die "Can't connect: $@\n";
$ftp->login( $username, $password ) or die "Couldn't login: ", $ftp->message;
$ftp->binary();

# get files from remote root that end in html:
my @remote_files = $ftp->ls();

# print "remote files are: @remote_files\n";
my @matching = map /lh_(\d+)\.html/, @remote_files;
print "matching is @matching\n";
push( @matching, 1 );
@matching = sort { $a <=> $b } @matching;
my $winner    = pop @matching;
my $newnum1   = $winner + 1;
my $html_file = "lh_$newnum1.html";
print "html file is  $html_file\n";

# create file for html stubouts
open( my $fh, '>', $html_file )
   or die("Can't open $html_file for writing: $!");
print $fh "<html>\n";
print $fh "<head>\n";
print $fh "<title>Lutherhaven Renovation</title>\n";
print $fh "</head>\n";
print $fh "<body bgcolor=white>\n";
print $fh "<h1>My First Heading</h1>\n";

# get files from Desktop/images/
my $path  = '/home/dan/Desktop/upload_luther/';
my @files = <$path*>;

# get ls from remote image directory
$ftp->cwd('/images/') or die "cwd failed: ", $ftp->message;
my @list = $ftp->ls();

# main control
for my $name (@files) {
     print "name is $name\n";
     my ($ext) = $name =~ /([^.]*)$/;
     print "ext is $ext\n";

     @matching = map /image_(\d+)\.\Q$ext\E$/, @list;
     print "matching is @matching\n";
     push( @matching, 1 );
     @matching = sort { $a <=> $b } @matching;
     $winner = pop @matching;
     my $newnum    = $winner + 1;
     my $new_file2 = "image_$newnum.$ext";
     print "newfile is $new_file2\n";
     $ftp->put( $name, $new_file2 ) or die "put failed: ", $ftp->message;
     push( @list, $new_file2 );
     unlink($name) or warn "Couldn't delete $name: $!\n";

     print $fh "<img src=\"/images/$new_file2\"/>\n\n";
     print $fh "<p>caption for $new_file2 <\/p>\n";

}

close $fh;
$ftp->cdup() or die "cdup failed: ", $ftp->message;
$ftp->put($html_file) or die "put failed: ", $ftp->message;
my @r = $ftp->ls();
print "@r\n";
$


> Obligatory advice:
>
> 1. You should use lexical variables instead of globals for file handles.
> 2. You should use the three-argument version of open.
> 3. You should always check the return value of system calls.
>
> open( my $fh, '>', $html_file ) or
> die("Can't open $html_file for writing: $!");
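
The three points of advice above can be sketched together as follows. The temporary directory and the file name are assumptions made only so the example can run anywhere; they are not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical output location (illustration only).
my $dir       = tempdir( CLEANUP => 1 );
my $html_file = "$dir/example.html";

# Lexical file handle, three-argument open, return value checked.
open( my $fh, '>', $html_file )
    or die "Can't open $html_file for writing: $!";
print {$fh} "<html>\n</html>\n"
    or die "Can't write to $html_file: $!";

# close can fail too (for example on a full disk), so check it as well.
close($fh)
    or die "Can't close $html_file: $!";

# Read the file back to confirm what was written.
open( my $in, '<', $html_file )
    or die "Can't open $html_file for reading: $!";
my @lines = <$in>;
close($in);
print "wrote ", scalar(@lines), " lines\n";
```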

Ok, got that.

I realize that my fledgling attempts here are nothing to crow about, but 
I'm happy with what it does now.  It uploads the images, creates an html 
page that has the stub-outs for the images, and then deletes the files 
in the upload directory.

http://www.merrillpjensen.com/lh_17.html is an example.

Still, of course, accepting criticisms.
-- 
Cal


------------------------------

Date: Thu, 21 Jun 2012 22:57:02 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <5YudnVSPF66DZn7SnZ2dnUVZ_t-dnZ2d@supernews.com>

On 06/20/2012 12:40 PM, Ben Morrow wrote:
>
> Quoth Cal Dershowitz<cal@example.invalid>:

>> Also, I think the appropriate HTML for an image includes its height and
>> width.  I know that's a trick that ImageMagick does, but does someone
>> know a slick way to get such data using Perl syntax?
>
> I would use Image::Size, which you will need to install from CPAN. If
> you haven't got set up with CPAN yet, I would recommend using cpanminus
> rather than CPAN.pm; download the file http://cpanmin.us, save it
> somewhere as 'cpanm', and then run
>
>      perl cpanm App::cpanminus
>
> You can then delete the downloaded copy. Assuming you are using your
> system perl, you will need to do this as root, so it can write to the
> system perl library. Then run (also as root)
>
>      cpanm Image::Size
>
> which will install Image::Size and its dependencies.
>
> (Obviously you need to think *VERY* *CAREFULLY* before running commands
> suggested by some random person on Usenet as root. You may want to read
> http://search.cpan.org/~miyagawa/App-cpanminus-1.5014/lib/App/cpanminus.pm
> before you start. I take no responsibility if it breaks your system,
> kills your cat, &c. &c.)
>
> Alternatively, if you are using an OS with a package management system,
> it may be better to install perl modules using that where possible. In
> principle cpanm knows how to use local::lib to install modules under
> your home directory, but I've never used that feature so I don't know
> how well it works.

Ok, thx, ben, I think I've got it downloaded, but I'm 110% exhausted 
with perl for the day now.
-- 
Cal



------------------------------

Date: Fri, 22 Jun 2012 14:25:36 -0500
From: dagomakoa <dago_makoa@hotmail.fr>
Subject: RE: Error Handling in Net::SSH::Perl
Message-Id: <YL-dnc6yBuItW3nSnZ2dnUVZ_vOdnZ2d@giganews.com>

my $ssh;
eval {
   $ssh = Net::SSH::Perl->connect( ... );
};
if ($@) {
  warn "Connect failed: $@\n";
}

This does not work for me. Should I remove the -w option from the shebang line, or remove use warnings / use strict?
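
For reference, neither -w nor use strict / use warnings changes how eval traps an exception, so removing them should not be needed. A minimal sketch of the pattern follows; connect_or_die is a hypothetical stand-in for the real connection call, which is elided in the snippet above, and the host names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a connection attempt that dies on failure.
sub connect_or_die {
    my ($host) = @_;
    die "connection to $host refused\n" if $host eq 'bad.example';
    return { host => $host };
}

sub try_connect {
    my ($host) = @_;
    my $conn;

    # eval returns undef if the block dies; the trailing 1 marks success,
    # so the check does not depend on $@ alone.
    my $ok = eval { $conn = connect_or_die($host); 1 };
    if ( !$ok ) {
        my $err = $@ || "unknown error\n";
        warn "Connect failed: $err";
        return;
    }
    return $conn;
}

my $good = try_connect('good.example');
my $bad  = try_connect('bad.example');
print defined $good ? "good.example: connected\n" : "good.example: failed\n";
print defined $bad  ? "bad.example: connected\n"  : "bad.example: failed\n";
```

If the eval never sees an error, one possibility worth checking is that the call being wrapped reports failure in some other way than die.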




------------------------------

Date: Fri, 22 Jun 2012 12:36:14 -0700 (PDT)
From: bjlockie <bjlockie@lockie.ca>
Subject: File::glob pattern matching
Message-Id: <ad3ef586-54e5-4507-9d25-be49db4134da@googlegroups.com>

I tried [[:digit:]], which is POSIX, but it doesn't work.
I know ? matches any single character, but I only want to match exactly 2 digits.


------------------------------

Date: Fri, 22 Jun 2012 20:54:54 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: File::glob pattern matching
Message-Id: <87k3yz86lt.fsf@sapphire.mobileactivedefense.com>

bjlockie <bjlockie@lockie.ca> writes:
> I tried [[:digit:]], which is POSIX, but it doesn't work.
> I know ? matches any single character, but I only want to match
> exactly 2 digits.

[0-9][0-9]

should do the trick (if you're happy with standard 'Western/Arabic'
numbers).
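
A small sketch of that pattern with File::Glob's bsd_glob. The directory and the log_NN.txt file names are invented for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical test files in a scratch directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(log_07.txt log_42.txt log_7.txt log_123.txt log_ab.txt)) {
    open( my $fh, '>', "$dir/$name" ) or die "Can't create $dir/$name: $!";
    close($fh);
}

# Two bracket expressions match exactly two digits.
my @two_digit = bsd_glob("$dir/log_[0-9][0-9].txt");

# Strip the directory part for display.
s{.*/}{} for @two_digit;
print "$_\n" for @two_digit;
```

Only log_07.txt and log_42.txt have exactly two digits in that position, so the one-digit, three-digit and non-digit names are skipped.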



------------------------------

Date: Fri, 22 Jun 2012 12:30:13 -0400
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: Perl Protoypes
Message-Id: <874nq3xqay.fsf@new.chromatico.net>

>>>>> "W" == Willem  <willem@toad.stack.nl> writes:

    W> That's the same black-and-white reasoning that inflicted
    W> 'everything is an object' Java upon the world.

Except that one of the failures of Java is that everything *isn't* an
object.  Look to Ruby or Python or Smalltalk for "everything is an
object" done correctly.

Charlton


-- 
Charlton Wilbur
cwilbur@chromatico.net


------------------------------

Date: Fri, 22 Jun 2012 10:42:47 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Perl Protoypes
Message-Id: <864nq38cq0.fsf@red.stonehenge.com>

>>>>> "Charlton" == Charlton Wilbur <cwilbur@chromatico.net> writes:

Charlton> Except that one of the failures of Java is that everything *isn't* an
Charlton> object.  Look to Ruby or Python or Smalltalk for "everything is an
Charlton> object" done correctly.

Smalltalk, yes.  Ruby or Python, no.

Unless you're asserting that the presence of sealed "system" classes
does not detract from the notion of object-ness.  I believe it does.

If you can't subclass it, or extend it in place, it's not a user class.
And if it's not a user class, its instances are not user instances, but
rather system instances with a second-class (heh) status.

In particular, as I recall, you can't subclass "String" in Ruby,
although you can add methods to it.  Python won't even let you add
methods to the String class.

But in Smalltalk, all such things are possible.  Not necessarily smart,
but it's there.

print "Just another Perl hacker,"; # the original

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


------------------------------

Date: 22 Jun 2012 19:53:18 GMT
From: Markus Hutmacher <markus.hutmacher@web.de>
Subject: Question about a variable in list-context
Message-Id: <a4k0pdFp7lU1@mid.individual.net>

Hello,

in a recent thread in this group I found the following line of code:

($id) = $input =~ /^([^\t]+)\t/;

I understand that ($id) means that $id is used in list context, which 
means that the part of $input which matches [^\t]+ is assigned to $id. 
I understand as well that the same line of code without the list context 
will assign a true or false value (1 or an empty string) to $id, depending 
on whether the pattern matches.
But I don't understand to which part of the code the list context refers.

In other words: what is the list in the expression 
$input =~ /^([^\t]+)\t/ 

Thanks in advance

-- 

Markus
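
The two behaviours described in the question can be compared side by side. This is only a sketch with made-up sample input: the parentheses on the left make the assignment a list assignment, which puts the match on the right into list context, and in list context a match returns the list of its captured groups.

```perl
use strict;
use warnings;

my $input = "id42\tpayload\n";    # hypothetical sample line

# List assignment: the match runs in list context and returns its captures.
my ($id) = $input =~ /^([^\t]+)\t/;

# Scalar assignment: the match only reports success or failure.
my $matched = $input =~ /^([^\t]+)\t/;

# With several groups, list context returns one element per group.
my ( $first, $second ) = "a\tb\t" =~ /^([^\t]+)\t([^\t]+)\t/;

print "id      = $id\n";
print "matched = $matched\n";
print "pair    = $first, $second\n";
```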


------------------------------

Date: Thu, 21 Jun 2012 23:11:55 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: question concerning pipes and large strings
Message-Id: <87sjdofh78.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>> Martijn Lievaart <m@rtij.nl.invlalid> writes:
>>> On Wed, 20 Jun 2012 16:29:56 +0100, Rainer Weikusat wrote:
>>
>> [...]
>>
>>>>>> man sort
>>>>>> 
>>>>> Indeed, I tried sort first, it works, it is more of a scalability
>>>>> question really.
>>>> 
>>>> This is a really bad idea because sort will reorder the complete input
>>>> lines, including the data part, possibly/probably multiple times for
>>>> each input line, and this means a lot of copying of data which doesn't
>>>> need to be copied since only the IDs are supposed to be sorted.
>>>
>>> As GNU sort is rather optimized, I would benchmark this before making 
>>> blanket statements like this.
>>
>> 'Rather optimized' usually means the code is seriously convoluted
>> because it used to run faster on some piece of ancient hardware in
>> 1997 for a single test case because of that. And no matter how
>> 'optimized', a sort program needs to sort its input. Which involves
>> reordering it. Completely. In case of files which are too large for
>> the memory of a modern computer, this involves a real lot of copying
>> data around.
>>
>> I suggest that you make some benchmarks before making blanket
>> statements like the one above.
>
> On some random computer I just used for that, sorting a 1080M file
> (4000000 lines)

Since I'm a curious person, I also tried this with the 'complete'
algorithm, namely, sort the lines, remove the IDs and concatenate the
results and something like

sort -k1 -S 50% mob-4 | perl -pe 'chop; s/^[^\t]+\t//;' >out

is actually drastically faster than any 'pure Perl' solution. But this
requires keeping the whole file in memory. As soon as sort can't do
that anymore, its performance becomes relatively abysmal while the
code which keeps only the IDs works decently on a larger dataset.
But this is nevertheless sort-of a ghost discussion: Something
'complete' which has been written in C will doubtlessly outperform the
pipeline easily.
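
The code that "keeps only the IDs" is not shown in this digest, so the following is only a guess at the general idea, not the code that was benchmarked: index each record by ID and file offset, sort the small index, and copy every data part exactly once. The sample records are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample input: one "ID<TAB>data" record per line.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "b\tBBBB\n", "c\tCC\n", "a\tAAAAAA\n";
close($tmp) or die "Can't close $path: $!";

open( my $in, '<', $path ) or die "Can't open $path: $!";

# Pass 1: remember only each ID and where its data part lives in the file.
my @index;
my $offset = 0;
while ( my $line = <$in> ) {
    my $tab = index( $line, "\t" );
    die "no tab in line at offset $offset\n" if $tab < 0;
    my $id       = substr( $line, 0, $tab );
    my $data_len = length($line) - $tab - 1;
    $data_len-- if substr( $line, -1 ) eq "\n";    # drop the newline
    push @index, [ $id, $offset + $tab + 1, $data_len ];
    $offset += length($line);
}

# Pass 2: sort the index by ID, then copy each data part once.
my $out = '';
for my $entry ( sort { $a->[0] cmp $b->[0] } @index ) {
    my ( undef, $pos, $len ) = @$entry;
    seek( $in, $pos, 0 ) or die "seek failed: $!";
    my $got = read( $in, my $buf, $len );
    die "short read at $pos\n" unless defined $got && $got == $len;
    $out .= $buf;
}
close($in);
print "$out\n";
```

In this sketch the output is collected in a string for brevity; for a file that does not fit in memory the data parts would be written to an output file handle instead.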



------------------------------

Date: Fri, 22 Jun 2012 17:00:01 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: question concerning pipes and large strings
Message-Id: <hsdeb9-r16.ln1@news.rtij.nl>

On Thu, 21 Jun 2012 23:11:55 +0100, Rainer Weikusat wrote:

> But this is nevertheless sort-of a ghost discussion: Something
> 'complete' which has been written in C will doubtlessly outperform the
> pipeline easily.

What is more valuable, your time or computer time?

M4


------------------------------

Date: Fri, 22 Jun 2012 17:15:05 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: question concerning pipes and large strings
Message-Id: <poeeb9-r16.ln1@news.rtij.nl>

On Thu, 21 Jun 2012 15:34:34 +0100, Rainer Weikusat wrote:

> Martijn Lievaart <m@rtij.nl.invlalid> writes:
>> On Wed, 20 Jun 2012 16:29:56 +0100, Rainer Weikusat wrote:
> 
> [...]
> 
>>>>> man sort
>>>>> 
>>>> Indeed, I tried sort first, it works, it is more of a scalability
>>>> question really.
>>> 
>>> This is a really bad idea because sort will reorder the complete input
>>> lines, including the data part, possibly/probably multiple times for
>>> each input line, and this means a lot of copying of data which doesn't
>>> need to be copied since only the IDs are supposed to be sorted.
>>
>> As GNU sort is rather optimized, I would benchmark this before making
>> blanket statements like this.
> 
> 'Rather optimized' usually means the code is seriously convoluted
> because it used to run faster on some piece of ancient hardware in 1997
> for a single test case because of that. And no matter how 'optimized',

You have a rather dim view of GNU software, which I think is rather far 
from the truth.

> a sort program needs to sort its input. Which involves reordering it.
> Completely. In case of files which are too large for the memory of a
> modern computer, this involves a real lot of copying data around.

True.

> 
> I suggest that you make some benchmarks before making blanket statements
> like the one above.

I wasn't the one stating something is a "bad idea". On rereading I see I 
was not completely clear and said something different than what I was 
trying to say.

What I tried to say was: "GNU sort is a very optimized program, maybe a 
solution with GNU sort is fast enough".

> 
>> Also, we don't know if efficiency is relevant. If it runs only once a
>> month, at night, the OP probably does not care if it takes a few hours
>> as opposed to a few minutes.
> 
> Efficiency is always relevant except in a single case: The guy who has
> to write the code is so busy with getting it to work at all that the
> mere thought of having to try to make it work sensibly scares the shit
> out of him and he tries to pass this competence-deficit as 'secret
> advantage' when posing for others. Usually, this will also always
> involve a dedicated computer for testing and often, the people who are
> going to use the code are not in the position to complain to the person
> who wrote it, IOW, run-time efficiency doesn't matter because it is
> someone else's problem.

You must live on a different planet than I do. There is always a "good 
enough" efficiency. Sometimes you cannot achieve even good enough 
efficiency, sometimes it truly does not matter and sometimes it's not 
worth spending 1000 hours to make something run a little faster.

It's a trade-off between use-time and programming time. And as I'm the 
primary user of my own software, I make sure it is efficient enough for 
my needs.

I query routers with SNMP. It used to take ages, so I parallelized it. It 
now runs 10 times faster and I doubt it can be made faster (it's now I/O 
bound).

I run reports nightly. I don't care if they take 1 minute or 5 minutes to 
run.

> 
> Congratulate yourself to the happy situation you happen to be in. Stop
> assuming that it is 'the universal situation'. Things might look rather
> different if code is written for in-house use and supposed to run on a
> computer which also provides VPN services for customers coming from
> fifty different companies.

So maybe efficiency is relevant in that case. One example does not make 
your statements true.

Although a machine running VPNs for 50 customers should probably be 
dedicated. The impact is too big if something goes wrong, e.g. a file 
with multi-megabyte lines must be sorted and brings the machine to its 
knees :-). But maybe it has to run on that machine, e.g. gathering 
statistics. That's my whole point, it depends.
M4


------------------------------

Date: Fri, 22 Jun 2012 18:29:07 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: question concerning pipes and large strings
Message-Id: <87ipej9rx8.fsf@sapphire.mobileactivedefense.com>

Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Thu, 21 Jun 2012 23:11:55 +0100, Rainer Weikusat wrote:
>
>> But this is nevertheless sort-of a ghost discussion: Something
>> 'complete' which has been written in C will doubtlessly outperform the
>> pipeline easily.
>
> What is more valuable, your time or computer time?

Depending on how you look at it, my time is either infinitely more
valuable than computer time (every second of my finite life which has
passed is gone forever and cannot be replaced) or infinitely less
valuable (People will pay in order to use computer time to solve
problems. Nobody will pay me for my lifetime just because it is
theoretically valuable to me and if I just starved to death tomorrow,
the world at large wouldn't take note of this loss).

Seems your question doesn't really make sense ...


------------------------------

Date: Fri, 22 Jun 2012 20:52:17 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: question concerning pipes and large strings
Message-Id: <87obob86q6.fsf@sapphire.mobileactivedefense.com>

Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Thu, 21 Jun 2012 15:34:34 +0100, Rainer Weikusat wrote:
>> Martijn Lievaart <m@rtij.nl.invlalid> writes:
>>> On Wed, 20 Jun 2012 16:29:56 +0100, Rainer Weikusat wrote:
>> 
>> [...]
>> 
>>>>>> man sort
>>>>>> 
>>>>> Indeed, I tried sort first, it works, it is more of a scalability
>>>>> question really.
>>>> 
>>>> This is a really bad idea because sort will reorder the complete input
>>>> lines, including the data part, possibly/probably multiple times for
>>>> each input line, and this means a lot of copying of data which doesn't
>>>> need to be copied since only the IDs are supposed to be sorted.
>>>
>>> As GNU sort is rather optimized, I would benchmark this before making
>>> blanket statements like this.
>> 
>> 'Rather optimized' usually means the code is seriously convoluted
>> because it used to run faster on some piece of ancient hardware in 1997
>> for a single test case because of that. And no matter how 'optimized',
>
> You have a rather dim view of GNU software, which I think is rather far 
> from the truth.

I have a rather realistic expectation of what 'heavily optimized code'
means in practice. This is not at all specifically related to 'GNU
software': It means that the code contains special-case treatment for
all kinds of special cases which, in turn, strongly suggests that it is
based on an ill-thought-out 'hasty generalization' of lots of
different situations which should rather be handled differently
('malloc' would be the 'classical' example for that).

[...]

>> I suggest that you make some benchmarks before making blanket statements
>> like the one above.
>
> I wasn't the one stating something is a "bad idea". On rereading I see I 
> was not completely clear and said something different than what I was 
> trying to say.
>
> What I tried to say was: "GNU sort is a very optimized program, maybe a 
> solution with GNU sort is fast enough".

Actually, it isn't (which is supposed to be a real compliment to the
people who wrote the code, who understood that the maintenance cost of
'an untangible mess' by far outweighs any runtime efficiency
advantages this mess might have at the moment): The program reads
some amount of data into a fixed-size buffer, sorts these lines via
mergesort, writes them to a temp file, reads the next amount of data
and so forth until all input data has been processed once. Afterwards,
it creates 'complete output' by merging the temporary files. For the
situation the OP described (a 'line of text' composed of an ID, a tab
and then 'many millions' of data bytes) this is very wasteful, at
least for the sorting step, because these 'many lines with many
millions of data bytes' will needlessly be moved around during the
tempfile merge cycle.
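
The chunk-and-merge scheme described above can be illustrated with a toy sketch in Perl. This is not the sort program's source: the chunk size counted in lines, the sub name external_sort and the sample words are assumptions for illustration, and Perl's built-in sort stands in for the in-memory step.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sort the lines of $in_fh into $out_fh while holding at most
# $chunk_lines lines in memory at a time (hypothetical tuning knob).
sub external_sort {
    my ( $in_fh, $out_fh, $chunk_lines ) = @_;
    my @runs;

    # Phase 1: sort each chunk in memory and write it out as a sorted run.
    my @chunk;
    my $flush = sub {
        return unless @chunk;
        my $run = tempfile();    # scalar context: anonymous temp file handle
        print {$run} sort @chunk;
        seek( $run, 0, 0 ) or die "seek failed: $!";
        push @runs, $run;
        @chunk = ();
    };
    while ( my $line = <$in_fh> ) {
        push @chunk, $line;
        $flush->() if @chunk >= $chunk_lines;
    }
    $flush->();

    # Phase 2: repeatedly emit the smallest head line among all runs.
    my @heads = map { scalar readline($_) } @runs;
    while (1) {
        my $min;
        for my $i ( 0 .. $#heads ) {
            next unless defined $heads[$i];
            $min = $i if !defined $min || $heads[$i] lt $heads[$min];
        }
        last unless defined $min;
        print {$out_fh} $heads[$min];
        $heads[$min] = readline( $runs[$min] );
    }
    return scalar @runs;
}

# Usage with in-memory file handles and made-up input.
my $input = join '', map { $_ . "\n" } qw(pear fig apple kiwi date cherry banana);
open( my $in, '<', \$input ) or die "Can't open in-memory input: $!";
my $output = '';
open( my $out, '>', \$output ) or die "Can't open in-memory output: $!";
my $runs = external_sort( $in, $out, 3 );
close($out);
print "merged $runs runs:\n$output";
```

Every line is written once to a run and once more to the final output, which is the copying cost the post refers to when lines carry millions of data bytes.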

>>> Also, we don't know if efficiency is relevant. If it runs only once a
>>> month, at night, the OP probably does not care if it takes a few hours
>>> as opposed to a few minutes.
>> 
>> Efficiency is always relevant except in a single case: The guy who has
>> to write the code is so busy with getting it to work at all that the
>> mere thought of having to try to make it work sensibly scares the shit
>> out of him and he tries to pass this competence-deficit as 'secret
>> advantage' when posing for others. Usually, this will also always
>> involve a dedicated computer for testing and often, the people who are
>> going to use the code are not in the position to complain to the person
>> who wrote it, IOW, run-time efficiency doesn't matter because it is
>> someone else's problem.
>
> You must live on a different planet than I do. There is always a "good 
> enough" efficiency.

The state of certain sciences such as 'physics' or 'astronomy' has
been 'good enough' for any even remotely practical purpose for
something like a century or so. Yet people build 'large hadron
colliders' for the joy of letting infinitesimally small things crash
into infinitesimally small other things which - ha ha ha - causes them
to fragment into yet smaller things (and no one will ever know if any
of these 'particles' actually exist outside of 'large hadron
colliders' and it wouldn't matter to know).

Compared to that, the case at hand is (also) a seriously practical
research problem and a really cheap one as well (among other things, I
have now learnt a sensible way for sorting linked lists and I will
certainly be able to make use of that).



------------------------------

Date: Thu, 21 Jun 2012 21:54:16 -0700 (PDT)
From: Jason C <jwcarlton@gmail.com>
Subject: Regex losing <br> (different from the earlier topic about losing $1)
Message-Id: <ece727d9-f695-48b6-a5fc-70d6fae08393@googlegroups.com>

I'm building a profanity filter, and I'm using the following subroutine to replace matched words with XXXX:

while (($original, $converted) = @profanityArr) {
  if (!$converted) {
    $len = length($original);
    $converted = "X" x $len;
  }

  $original = quotemeta($original);

  $text =~ s/(\r|\n|\r\n|<br>|\s)*$original(\r|\n|\r\n|<br>|\s)*/$1$converted$2/i;
}


# When I feed:
$original = "daym";

$text = "<br><br>daym<br><br>";
###

I'm getting "<br>XXXX<br>". Meaning, it loses the matched <br> in both $1 and $2.

# When I feed:
$original = "jason";
$converted = "brainfried";

$text = "<br><br>jason<br><br>";
###

I'm getting "<br>brainfried<br>". Again, it loses the matched <br> in both $1 and $2.


# When I feed:
$original = "dammit";
$converted = "XXXXit";

$text = "<br><br>dammit<br><br>";
###

I'm getting "<br>XXXXit<br><br>". Meaning, it loses the matched <br> in $1, but keeps it in $2.

It's the same if I change $1 and $2 to \1 and \2.

Any suggestions on how to correct the sub to keep the matched <br>?


------------------------------

Date: Fri, 22 Jun 2012 07:37:51 +0200
From: "Jan Pluntke" <ulmai@gmx.de>
Subject: Re: Regex losing <br> (different from the earlier topic about losing $1)
Message-Id: <js10bo$hls$1@news.sap-ag.de>

"Jason C" <jwcarlton@gmail.com> wrote:

>  $text =~ 
> s/(\r|\n|\r\n|<br>|\s)*$original(\r|\n|\r\n|<br>|\s)*/$1$converted$2/i;
[...]
> # When I feed:
> $original = "daym";
>
> $text = "<br><br>daym<br><br>";
> ###
>
> I'm getting "<br>XXXX<br>". Meaning, it loses the matched <br> in both $1 
> and $2.

You will want to capture the * also, otherwise $1 and $2 will
contain only one (the last) match for that part of the string:

  ((?:\r|\n|\r\n|<br>|\s)*)

The ?: will make the inner () non-capturing.

I think (but might be wrong - did not test) that \s contains
\r and \n, so you can remove them:

  ((?:<br>|\s)*)

Regards,
Jan 
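
The effect of moving the quantifier inside an outer capturing group can be checked with a short sketch. The text is the first sample from the question, and the shorter alternation suggested above is used.

```perl
use strict;
use warnings;

my $text = "<br><br>daym<br><br>";

# Quantifier outside the group: each repetition overwrites the capture,
# so only the last <br> on each side survives.
my ( $last_before, $last_after ) = $text =~ /(<br>|\s)*daym(<br>|\s)*/;

# Quantifier inside an outer group: the whole run is captured.
my ( $all_before, $all_after ) = $text =~ /((?:<br>|\s)*)daym((?:<br>|\s)*)/;

print "outside: '$last_before' and '$last_after'\n";
print "inside:  '$all_before' and '$all_after'\n";

# The corrected pattern applied to the substitution from the question.
( my $filtered = $text ) =~ s/((?:<br>|\s)*)daym((?:<br>|\s)*)/$1XXXX$2/i;
print "$filtered\n";
```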



------------------------------

Date: Fri, 22 Jun 2012 02:01:28 -0700 (PDT)
From: Jason C <jwcarlton@gmail.com>
Subject: Re: Regex losing <br> (different from the earlier topic about losing $1)
Message-Id: <c0737d44-3023-423f-84bc-91629695344c@googlegroups.com>

On Friday, June 22, 2012 1:37:51 AM UTC-4, Jan Pluntke wrote:
> You will want to capture the * also, otherwise $1 and $2 will
> contain only one (the last) match for that part of the string:
> 
>   ((?:\r|\n|\r\n|<br>|\s)*)
> 
> The ?: will make the inner () non-capturing.

Excellent! I was not familiar with the ?:, so I'll have to make a note of that for future reference.


> I think (but might be wrong - did not test) that \s contains
> \r and \n, so you can remove them:
> 
>   ((?:<br>|\s)*)
> 
> Regards,
> Jan

Correct again! I thought that \s just captured the space, and didn't realize that it includes line breaks (and apparently tabs, too). I can modify all of my scripts for that, now, and save a little bandwidth :-)

Thanks for the help!


------------------------------

Date: Fri, 22 Jun 2012 10:07:40 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Regex losing <br> (different from the earlier topic about losing $1)
Message-Id: <s7pdb9-23b2.ln1@anubis.morrow.me.uk>


Quoth "Jan Pluntke" <ulmai@gmx.de>:
> "Jason C" <jwcarlton@gmail.com> wrote:
> 
> >  $text =~ 
> > s/(\r|\n|\r\n|<br>|\s)*$original(\r|\n|\r\n|<br>|\s)*/$1$converted$2/i;

Unless $original is supposed to be a regex, you want \Q\E around it.

You don't really need the final capture, you can just use lookahead.
Similarly you don't need to capture more than one \s just to put it back
again:

    s/(\s|<br>) \Q$original\E (?= \s|<br>)/$1$converted/ix;

Turning the initial capture into lookbehind is harder, since Perl
doesn't support variable-length lookbehind and the two branches of the
alternation are different lengths. However, if you have at least 5.10
(which you do, I hope), you can use \K like this:

    s/ (?:\s|<br>) \K \Q$original\E (?=\s|<br>) /$converted/ix;
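
Both forms can be tried on the sample from the question. The second occurrence of the word in the test string below is an addition for illustration: it has no whitespace or <br> next to it, so it would not match either pattern. The \K form needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # \K requires 5.10

my $original  = 'daym';    # sample values from the question
my $converted = 'XXXX';
my $text      = "<br><br>daym<br><br> a.daym.b";

# Capture the delimiter in front and put it back; look ahead for the one behind.
( my $with_capture = $text )
    =~ s/(\s|<br>) \Q$original\E (?= \s|<br>)/$1$converted/ix;

# \K discards what has matched so far, so only the word itself is replaced.
( my $with_keep = $text )
    =~ s/ (?:\s|<br>) \K \Q$original\E (?=\s|<br>) /$converted/ix;

print "$with_capture\n";
print "$with_keep\n";
```

Note that both patterns require a delimiter on each side, so in this sketch a word at the very start or end of the string is left alone.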

> I think (but might be wrong - did not test) that \s contains
> \r and \n, so you can remove them:
> 
>   ((?:<br>|\s)*)

That is correct.
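
A quick way to confirm this is to test the characters directly; the list of characters below is chosen for illustration.

```perl
use strict;
use warnings;

# Space, tab, newline, carriage return and form feed.
my @whitespace = ( " ", "\t", "\n", "\r", "\f" );
my @matched    = grep { /\s/ } @whitespace;

printf "%d of %d characters match \\s\n", scalar @matched, scalar @whitespace;

# The <br> tag contains no whitespace, so it still needs its own branch.
my $br_has_space = "<br>" =~ /\s/ ? 1 : 0;
print "<br> matches \\s: $br_has_space\n";
```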

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3723
***************************************
