[32450] in Perl-Users-Digest
Perl-Users Digest, Issue: 3717 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 18 00:09:23 2012
Date: Sun, 17 Jun 2012 21:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 17 Jun 2012 Volume: 11 Number: 3717
Today's topics:
Re: an effective script for grabbing and putting images <ben@morrow.me.uk>
Re: an effective script for grabbing and putting images <rweikusat@mssgmbh.com>
Re: an effective script for grabbing and putting images <cal@example.invalid>
Re: an effective script for grabbing and putting images <cal@example.invalid>
Re: an effective script for grabbing and putting images (Alan Curry)
Re: an effective script for grabbing and putting images <cal@example.invalid>
RE: modifying a PDF using PDF::API2? <samedkonak@gmail.com>
Re: new topic: I call length($<string>) and get number <uri@stemsystems.com>
Re: new topic: I call length($<string>) and get number (Tim McDaniel)
Re: new topic: I call length($<string>) and get number (Tim McDaniel)
Re: Perl Protoypes (Seymour J.)
Regular Expression (WAS: an effective script for grabbi <jurgenex@hotmail.com>
Re: Regular Expression (WAS: an effective script for gr <cal@example.invalid>
Re: Regular Expression <jurgenex@hotmail.com>
Re: Regular Expression <cal@example.invalid>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 16 Jun 2012 23:03:57 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <dfcva9-bf5.ln1@anubis.morrow.me.uk>
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> >> Ben Morrow <ben@morrow.me.uk> writes:
> >> >
> >> > ...and, you *also* need to check the match succeeded before you look at
> >> > $1:
> >> >
> >> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
> >> >
> >> > otherwise you run the risk of picking up $1 from some entirely other
> >> > pattern match. The $N variables are slightly strangely scoped, so this
> >> > is a little less likely than it might be, but it can and does happen and
> >> > causes *very* strange bugs when it does.
<snip>
>
> You didn't write anything regarding how the $n actually behave, just
> asserted they would be 'strangely scoped' and that this could cause
> 'very strange bugs' in rarely occurring situations. That's about as
> scary as it can get and nothing except scary.
You appear to be extremely easily scared. BOO!
I suggest you reread the paragraph quoted above, carefully. The strange
scoping of $N is a mitigating condition, not an aggravating one (that
is, it makes it less likely, rather than more likely, that a programmer
will fall into a bug as a result of failing to check if a match
succeeded or failed).
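
A minimal sketch of the failure mode under discussion, using the pattern
from the quoted example. The sub names are made up for illustration. A
failed match leaves $1 untouched, so an unchecked read can return the
capture from an earlier successful match in the same block.

```perl
use strict;
use warnings;

# Hypothetical helper: runs two matches in one block and reads $1
# without checking whether the second match succeeded.
sub unchecked_ext {
    my ($first, $second) = @_;
    $first  =~ m/.*\.(\w+)/;    # assumed to succeed and set $1
    $second =~ m/.*\.(\w+)/;    # if this fails, $1 keeps its old value
    my $ext = $1;
    return $ext;
}

# Hypothetical helper: the checked form. In list context the match
# returns its captures, or the empty list on failure.
sub checked_ext {
    my ($name) = @_;
    my ($ext) = $name =~ m/.*\.(\w+)/;
    return $ext;    # undef when there is no extension
}

print unchecked_ext('photo.jpg', 'README'), "\n";    # prints "jpg"
print defined(checked_ext('README')) ? "has ext\n" : "no ext\n";
```

The first call reports "jpg" for a name that has no extension at all,
which is the kind of bug the check is meant to prevent.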
> >> Practically, this means the simple way to use $1 etc correctly is to
> >> avoid using them except if the match supposed to set them was
> >> successful,
> >
> > ...as I said...
>
> You didn't. You wrote that it would be necessary to 'check the
> success of the match', suggested to use die for that,
No, I suggested to use 'or' for that. The 'die' was the means of
diverting control away from the use of $1 if the match failed; I was
careful to mention that in this situation it may not be the most
appropriate way of doing so.
> and that -
> subject to the nameless but surely grave dangers - this feature
> shouldn't be used at all.
The dangers were named ('otherwise you run the risk of picking up $1
from some entirely other pattern match'), and I did not say the feature
should not be used but that there exists another feature which is
usually more convenient.
> >> in this case (assuming that $ext is a hitherto untouched
> >> my variable)
> >>
> >> $name =~ m/.*\.(\w+)/ and $ext = $1;
> >
> > Ugly and crude. Assigning the return value of the match is much
> > cleaner.
>
> Chances are that our aesthetic preferences also differ in other
> respects. In this case, however, a nice side effect of this construct
> is that nothing is assigned if there wasn't anything to assign.
How is that useful, except under rather rare circumstances?
> And it
> isn't necessary to hack around the fact that the match only returns
> the intended value in list context.
Context is an integral feature of Perl. If you don't like it, don't use
Perl.
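
For readers following along, a small sketch of the context difference
being argued about. The sub names are illustrative only. The same match
returns a true/false value in scalar context and the list of captures in
list context.

```perl
use strict;
use warnings;

# Scalar context: the match returns true or false, not the capture.
sub match_in_scalar_context {
    my ($name) = @_;
    my $result = $name =~ m/.*\.(\w+)/;
    return $result;
}

# List context (note the parentheses around $ext): the match returns
# the list of captures, so $ext receives what (\w+) matched.
sub match_in_list_context {
    my ($name) = @_;
    my ($ext) = $name =~ m/.*\.(\w+)/;
    return $ext;
}

printf "scalar: %s, list: %s\n",
    match_in_scalar_context('photo.jpg'),
    match_in_list_context('photo.jpg');    # scalar: 1, list: jpg
```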
> Actually, I would write this as
>
> $name =~ /.*\.(\w+)/ and $ext = $1;
>
> but I purposely kept the m.
You will notice that I did the same.
> >> In more complicated cases,
> >>
> >> if (<some re match>) {
> >> # $1 etc valid here
> >> }
> >
> > That is a useful construction sometimes, but it's usually clearer for
> > exceptional flow control ('match failed') to be the branch which diverts
> > from the normal flow.
>
> A failed match may well be not 'exceptional' at all. This will usually
> be the case for parsing something where multiple kinds of 'input
> strings' need to be recognized and analyzed.
That was not the case in the piece of code being discussed. If it had
been, a different choice of construction would have been appropriate.
<snip vaguely offensive nonsense>
Ben
------------------------------
Date: Sun, 17 Jun 2012 16:11:38 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <87wr36as7p.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> >> Ben Morrow <ben@morrow.me.uk> writes:
>> >> >
>> >> > ...and, you *also* need to check the match succeeded before you look at
>> >> > $1:
>> >> >
>> >> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>> >> >
>> >> > otherwise you run the risk of picking up $1 from some entirely other
>> >> > pattern match. The $N variables are slightly strangely scoped, so this
>> >> > is a little less likely than it might be, but it can and does happen and
>> >> > causes *very* strange bugs when it does.
> <snip>
>>
>> You didn't write anything regarding how the $n actually behave, just
>> asserted they would be 'strangely scoped' and that this could cause
>> 'very strange bugs' in rarely occurring situations. That's about as
>> scary as it can get and nothing except scary.
>
> You appear to be extremely easily scared. BOO!
Writing about 'strange scoping' which may cause or prevent 'very
strange bugs' makes the matter appear by far more serious and arcane
than it actually is. That's why I referred to it as
'scaremongering'. I didn't write that my assessment of the situation
would be similar to your assessment and especially not that it was
based on your text.
[...]
>> >> Practically, this means the simple way to use $1 etc correctly is to
>> >> avoid using them except if the match supposed to set them was
>> >> successful,
>> >
>> > ...as I said...
>>
>> You didn't. You wrote that it would be necessary to 'check the
>> success of the match', suggested to use die for that,
>
> No, I suggested to use 'or' for that. The 'die' was the means of
> diverting control away from the use of $1 if the match failed; I was
> careful to mention that in this situation it may not be the most
> appropriate way of doing so.
This is a completely pointless attempt at confusing the issue by
playing with words and abusing semantic ambiguities inherent in the
way humans use language. It, however, enables me to ask a rhetorical
question:
Assuming that
$name =~ /deepfried (whole) elephant roll/ or die('Salatschrecke!')
is 'sensible language use', according to your opinion, how come that
the almost identical
$name =~ /deepfried (whole) elephant roll/ and $quantity = $1;
is 'crude and ugly'?
>> and that - subject to the nameless but surely grave dangers - this
>> feature shouldn't be used at all.
>
> The dangers were named
Indeed. Their names were 'strange scopes' and 'very strange bugs'.
But I'm tired of this weasel-wording exercise.
[...]
> <snip vaguely offensive nonsense>
You could have snipped the 'vaguely offensive nonsense' from your
original text and in this case, you wouldn't have provoked a reply which
pointed out that you're condemning something without any reasons given
because it was different from what you're accustomed to.
In this case, I was actually thinking of something like
test -n "$parameter" && {
# do something with it
.
.
.
}
a construction I often use in shell scripts. Short-circuiting boolean
operators can be used for flow control and often, this results in more
concise code. The example above isn't anyhow enhanced by using
if test -n "$parameter";
then
.
.
.
fi
instead. That just makes it more verbose. And the closest analog in
Perl for that is something like
/.\.(\w+)$/ && do {
.
.
.
};
You're free to dislike the style. I also dislike (quite a few) things
about your coding style, I just usually refrain from posting this even
though I have more reasons for my opinion than "this is ugly":
Different people do different things in different ways and
BTNTWHIWHDI! (But that's not the way how I would have done it) is - at
best - a very weak metric for 'code'.
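
To make the styles in this exchange concrete, here is a sketch of three
ways to act only when a match succeeded. The sub names and the anchored
pattern are illustrative assumptions, not code from either poster. All
three are intended to behave the same; which reads best is the matter of
taste being debated.

```perl
use strict;
use warnings;

# Plain if block.
sub ext_with_if {
    my ($name) = @_;
    my $ext;
    if ($name =~ /.*\.(\w+)$/) {
        $ext = $1;
    }
    return $ext;
}

# Short-circuit 'and': the assignment runs only if the match is true.
sub ext_with_and {
    my ($name) = @_;
    my $ext;
    $name =~ /.*\.(\w+)$/ and $ext = $1;
    return $ext;
}

# Short-circuit '&&' with a do block, close to the shell idiom.
sub ext_with_do_block {
    my ($name) = @_;
    my $ext;
    $name =~ /.*\.(\w+)$/ && do {
        $ext = $1;    # more statements could follow here
    };
    return $ext;
}

for my $name ('wis.jpg', 'README') {
    my @results = map { defined $_ ? $_ : 'undef' }
        ext_with_if($name), ext_with_and($name), ext_with_do_block($name);
    print "$name: @results\n";
}
```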
------------------------------
Date: Sat, 16 Jun 2012 17:02:28 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <Y8GdncU7M7oYjUDSnZ2dnUVZ_tudnZ2d@supernews.com>
On 06/16/2012 09:54 AM, Ben Morrow wrote:
>
> Quoth Cal Dershowitz<cal@example.invalid>:
>>
>> $ perl upload9.pl
>> syntax error at upload9.pl line 23, near "/$ext/;"
>> Execution of upload9.pl aborted due to compilation errors.
>> $ cat upload9.pl
>> ...
>>
>> my @list = $ftp->dir();
>> my $big_int = 1;
>> for my $name (@files) {
>> print "name is $name\n";
>
> You will (or, at least, we will) find your code much easier to read if
> you indent it properly.
I knocked the rust off of perltidy.
>
>> my ($ext) = $name =~ /([^.]*)$/;
>> for my $image (@list){
>> if ( $image =~ m/$ext/;)
>
> There are two syntax errors in that line, neither of which has anything
> to do with the pattern match.
>
>> print "image is $image\n";
>> }
This process is laden with the types of errors that beset the
less-experienced, and sometimes not all related to what I'm struggling
with centrally, which, I think, is best addressed by using grep. But
all of a sudden, I can't use syntax for printing an array that I've used
a hundred times before.
So, I'm stuck, because I can't see whether I'm actually grepping
something or not. While it is true that I'm posting a script with a
syntax error, it's far and away not the first one I hit today, and I've
tried to work through most of them by googling "perl ...."
Anyways:
$ perltidy -b upload10.pl
$ perl upload10.pl
Possible unintended interpolation of @array in string at upload10.pl
line 26.
Global symbol "@array" requires explicit package name at upload10.pl
line 26.
Execution of upload10.pl aborted due to compilation errors.
$ cat upload10.pl
#!/usr/bin/perl -w
use strict;
use feature ':5.10';
use Net::FTP;
my $domain = '';
my $username = '';
my $password = '';
my $ftp = Net::FTP->new( $domain, Debug => 1, Passive => 1 )
  or die "Can't connect: $@\n";
$ftp->login( $username, $password ) or die "Couldn't login\n";
$ftp->binary();
$ftp->cwd('/images/') or die "cwd failed $@\n";
my $path = '/home/dan/Desktop/upload_luther/';
my @files = <$path*>;
my @list = $ftp->dir();
for my $name (@files) {
    print "name is $name\n";
    my ($ext) = $name =~ /([^.]*)$/;
    for my $image (@list) {
        print "image is $image\n";
        my ($ext2) = $image =~ /([^.]*)$/;
        my @array = grep ( /$ext2/, @list );
    }
    print "sub_list is @array\n";
}
$
It's a good time for me to take out the router and complete a task that
I won't fail at. Wood is much-more forgiving.
--
Cal
------------------------------
Date: Sat, 16 Jun 2012 21:59:57 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <G76dnZNFSM6hy0DSnZ2dnUVZ_qGdnZ2d@supernews.com>
On 06/13/2012 05:44 AM, Peter J. Holzer wrote:
> On 2012-06-13 10:46, Ben Morrow<ben@morrow.me.uk> wrote:
>> Quoth Martijn Lievaart<m@rtij.nl.invlalid>:
>>> On Tue, 12 Jun 2012 17:52:43 -0600, Cal Dershowitz wrote:
>>>> ok. So if I'm writing a perl script to grab an image, I do not make
>>>> decisions about it based first on the extension, but what the html says
>>>> about the image, right?
>>>
>>> Nitpick, what (the) HTTP (Content-Type header) says about the image.
>>
>> This is more than a nitpick, it's actually quite important. The header
>> which tells you what type of file the image is is not available until
>> *after* you've downloaded it. This means that if you want to save it in
>> a file with an appropriate extension, you need to put it in a temporary
>> file first and then rename it.
>
> No. The header is sent before the body, so the natural way to save a
> file downloaded via HTTP is:
>
> 1) send request
> 2) read response header (now we know the content-type)
> 3) create/open file
> 4) read body and write to file.
>
> Even if you can't for some reason do anything between reading the header
> and body, we are talking about images here which are small compared to
> RAM, so you can just:
>
> 1) send request
> 2) read entire response into memory
> 3) create/open file
> 4) save body to file
>
> The important (and sometimes annoying) part isn't about saving the
> resource. It's that you have to actually send a request! In general,
> HTML doesn't tell you whether the target of a link is an HTML or PDF
> document, a JPEG image or a ZIP archive. So even if you know that you
> are only interested in specific content-types, you still have to request
> all the resources and throw those away which you don't want (you can
> trade-off number of requests vs. band-width by using HEAD requests).
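
A hedged sketch of the HEAD-request idea described above, using
HTTP::Tiny from the Perl core. The media-type table and both sub names
are assumptions made for illustration, and a server is free to omit or
misreport Content-Type, so treat the result as a hint rather than a
guarantee.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # in the Perl core since 5.14

# Pick an extension from a Content-Type header value such as
# "image/jpeg" or "image/png; charset=binary".
sub ext_for_content_type {
    my ($content_type) = @_;
    # Hypothetical mapping from media type to extension; extend as needed.
    my %ext_for = (
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
    );
    return undef unless defined $content_type;
    my ($media_type) = $content_type =~ m{^\s*([^;\s]+)};
    return undef unless defined $media_type;
    return $ext_for{ lc $media_type };
}

# Ask the server for the headers only (a HEAD request) and return the
# extension suggested by Content-Type, or undef if unknown.
sub ext_for_url {
    my ($url) = @_;
    my $response = HTTP::Tiny->new->head($url);
    return undef unless $response->{success};
    return ext_for_content_type( $response->{headers}{'content-type'} );
}

print ext_for_content_type('image/png; charset=binary'), "\n";   # prints "png"
```

Calling ext_for_url with an image URL would issue one HEAD request per
resource, which is the requests-versus-bandwidth trade-off mentioned in
the quoted text.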
Well, Peter,
Good to be talking with you again. I don't like to dwell on the things I
don't understand, for example, the reason why ben's one-liner for the
extension works. I've seen huge edifices to this same question, where I
don't understand concepts like look-back.
my ($ext) = $name =~ /([^.]*)$/;
Can (anyone) talk me through why this captures an extension? The caret
anchors the regex at the beginning. $ at the end. parens return the
match. The asterisk is to quantify what's in brackets, but what's going
on with the brackets?
While I understand the perl perspective to the above, this user's needs
differ in this regard, because I have a Moses Malone approach to
programming: "I might not get it this time, but I'll get my own
rebound, and it's going in." That's not to say that I don't beat people
because they don't think I can sink it from 30. (oh my, I've got hoops
on the brain: go thunder)
Anyways, I'm simply trying to communicate what I do to the people who
pay me to go somewhere and do it. As you can see, I don't know a lot of
html:
http://www.merrillpjensen.com/luther1.html
Let's say, for argument's sake that the encoding promised an image of
one extension yet linked to another. What would one do?
--
Cal
------------------------------
Date: Sun, 17 Jun 2012 22:39:19 +0000 (UTC)
From: pacman@kosh.dhis.org (Alan Curry)
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <jrlman$u6h$1@speranza.aioe.org>
In article <Y8GdncU7M7oYjUDSnZ2dnUVZ_tudnZ2d@supernews.com>,
Cal Dershowitz <cal@example.invalid> wrote:
>for my $name (@files) {
> print "name is $name\n";
> my ($ext) = $name =~ /([^.]*)$/;
> for my $image (@list) {
> print "image is $image\n";
> my ($ext2) = $image =~ /([^.]*)$/;
> my @array = grep ( /$ext2/, @list );
> }
>
> print "sub_list is @array\n";
>}
@array only exists within the body of the for loop, because that's where it
was my'ed.
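
A minimal sketch of the usual fix, assuming stand-in data in place of
the FTP listing: declare the array before the loop so it is still in
scope afterwards. The sub name and sample names are illustrative.

```perl
use strict;
use warnings;

# Stand-in data; in the original script this comes from $ftp->dir().
my @list = ('lh1.jpg', 'notes.txt', 'wis.jpg');

sub matching_entries {
    my ($ext, @entries) = @_;
    my @array;                       # declared outside the loop body
    for my $entry (@entries) {
        push @array, $entry if $entry =~ /\.\Q$ext\E$/;
    }
    return @array;                   # still in scope here
}

my @sub_list = matching_entries('jpg', @list);
print "sub_list is @sub_list\n";     # prints "sub_list is lh1.jpg wis.jpg"
```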
--
Alan Curry
------------------------------
Date: Sun, 17 Jun 2012 19:16:55 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: an effective script for grabbing and putting images from or to a website
Message-Id: <v9ydnVgkksgaHEPSnZ2dnUVZ_sGdnZ2d@supernews.com>
On 06/17/2012 04:39 PM, Alan Curry wrote:
> In article<Y8GdncU7M7oYjUDSnZ2dnUVZ_tudnZ2d@supernews.com>,
> Cal Dershowitz<cal@example.invalid> wrote:
>> for my $name (@files) {
>> print "name is $name\n";
>> my ($ext) = $name =~ /([^.]*)$/;
>> for my $image (@list) {
>> print "image is $image\n";
>> my ($ext2) = $image =~ /([^.]*)$/;
>> my @array = grep ( /$ext2/, @list );
>> }
>>
>> print "sub_list is @array\n";
>> }
>
> @array only exists within the body of the for loop, because that's where it
> was my'ed.
>
Uff, ok, thanks. The good news is that the grepping seems to work:
image is -rw-r--r-- 1 u61210220 ftpusers 143036 Jun 10 19:56 wis.jpg
sub_list is -rw-r--r-- 1 u61210220 ftpusers 11448 Jun 10 19:24
fimage_1.jpg -rw-r--r-- 1 u61210220 ftpusers 20744 Jun 13 01:20
image_2.jpg -rw-r--r-- 1 u61210220 ftpusers 32331 Jun 15 17:16
lh1.jpg -rw-r--r-- 1 u61210220 ftpusers 48891 Jun 15 17:16 lh2.jpg
-rw-r--r-- 1 u61210220 ftpusers 22649 Jun 15 17:16 lh3.jpg
-rw-r--r-- 1 u61210220 ftpusers 78191 Jun 10 19:56 romney1.jpg
-rw-r--r-- 1 u61210220 ftpusers 143036 Jun 10 19:56 wis.jpg
$
Looks like I want to have a different method call than ftp->dir(). I'll
look at that for tonight's reading. I really like that if you're doing
simple things like this, then in the Perl World, it's been done
thousands of times.
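
Net::FTP also provides an ls method, which returns names rather than
long-format lines. If only dir output is at hand, the name can be pulled
from each line. A sketch, assuming a Unix-style nine-column listing like
the one shown above; the sub name is made up and other server formats
will need different handling.

```perl
use strict;
use warnings;

# Split a long-format listing line into at most nine fields so that a
# file name containing spaces stays in one piece in the last field.
sub name_from_dir_line {
    my ($line) = @_;
    my @fields = split ' ', $line, 9;
    return $fields[8];    # undef if the line has fewer than nine fields
}

my $line = '-rw-r--r-- 1 u61210220 ftpusers 143036 Jun 10 19:56 wis.jpg';
print name_from_dir_line($line), "\n";    # prints "wis.jpg"
```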
--
Cal
------------------------------
Date: Sun, 17 Jun 2012 17:28:09 -0500
From: samkon <samedkonak@gmail.com>
Subject: RE: modifying a PDF using PDF::API2?
Message-Id: <UvWdncfvgZ1kxEPSnZ2dnUVZ_qKdnZ2d@giganews.com>
For API2 you can just use:
$page = $pdf->importpage($source_pdf, $source_page, $path_page);
--http://compgroups.net/comp.lang.perl.misc/modifying-a-pdf-using-pdf-api2/372479
------------------------------
Date: Sun, 17 Jun 2012 15:38:00 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: new topic: I call length($<string>) and get number of lines - code frag below - on MAC OS X 10.7
Message-Id: <87y5nlivaf.fsf@stemsystems.com>
>>>>> "k" == kquirici <kquirici@yahoo.com> writes:
k> if( ! open JOURNALTXT, $inputfilename) {
k> die "cannot open ".$inputfilename;
k> }
why didn't you listen to what ben said about that code? also you can
interpolate $inputfilename into the die string which is a much better
style.
k> while (<JOURNALTXT>){
k> $journalin = $journalin . $_;
k> }
that is one of the slowest and clumsiest ways to read a whole file
in. first off learn about the .= assignment op (see perldoc perlop). but
even then why are you just reading each line to append it to the buffer?
ben showed you a much better way to read a whole file and File::Slurp is
the perl standard way these days.
k> close JOURNALTXT;
k> #print $journalin;
k> my $journalen = length($journalin);
k> print "\njournal length: " . $journalen;
k> prints the correct number of chars.
if all you wanted was the size of the file, -s will do that in one
call.
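
A sketch of the points above using only core Perl (File::Slurp is a
CPAN module, so it is left out here): a lexical filehandle, the filename
interpolated into the die message, and the whole file read in one go
with an undefined $/. The sub name and the temporary file are
illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one string.
sub slurp_file {
    my ($inputfilename) = @_;
    open my $fh, '<', $inputfilename
        or die "cannot open '$inputfilename': $!";
    local $/;                 # undefined record separator: read to EOF
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Usage with a throwaway file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh;

my $journalin = slurp_file($tmp_name);
printf "length %d, size on disk %d\n", length($journalin), -s $tmp_name;
```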
uri
------------------------------
Date: Sun, 17 Jun 2012 20:51:14 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: new topic: I call length($<string>) and get number of lines - code frag below - on MAC OS X 10.7
Message-Id: <jrlg02$6rv$1@reader1.panix.com>
Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>Code is not documentation.
A source file
- must express to the compiler the instructions to be followed
- also ideally ought to communicate to a human reader, expressing the
purpose and function of the overall file and of the various parts
I believe that most code, just by being code, communicates its purpose
to humans, at least at the lowest tactical levels and often even
higher. I believe that there is such a thing as "self-documenting
code" -- that code can also be documentation.
For example, a variable doesn't just have a use as a distinct thing,
but its name can be documentation: @cmd, $i, MAX_WIDTH, @global_flags,
$subdirectory. I've sometimes assigned intermediate values in an
expression to appropriately named variables -- sometimes because I
want to see the intermediates during debugging, but sometimes just to
clarify a large computation.
That is, I disagree with the premise of
>Code is not documentation.
I define "documentation" as "everything that communicates to a human
reader". So I say that all code is documentation. Further, since
comments so easily get out of date, code is really the primary
documentation. If you insist that "documentation" must be "everything
explanatory that isn't code", then I think your definition has no
practical use.
When I have more than one way to write code to perform an action,
I consider efficiency, but I also consider a reader of the code
(my future self as well) and how well they will grasp it at a glance.
Where efficiency matters little (a small array, a computation
dominated by a larger process), I might use something easy to
understand. Or I might use something trickier and obscure, but
comment it copiously.
The immediate case was someone writing a line that localized $/ and
also expressed that they wanted it to be undefined. I'm not the one
who expressed the notion that the undefined value is important to
emphasize. Maybe you think that
local $/;
expresses the concept well enough. But I've hit enough problems of
looking at a block of code and trying to figure out what the values
are supposed to be at the end, and enough cases of assignments being
forgotten, that I prefer to be explicit.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Sun, 17 Jun 2012 20:24:58 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: new topic: I call length($<string>) and get number of lines - code frag below - on MAC OS X 10.7
Message-Id: <jrleeq$lu9$1@reader1.panix.com>
In article <ee7sa9-ac6.ln1@anubis.morrow.me.uk>,
Ben Morrow <ben@morrow.me.uk> wrote:
>Quoth tmcd@panix.com:
>> undef(local $/);
>>
>> (That latter appears to actually work.)
>
>Is that surprising? local returns an lvalue, just like my; this is
>very similar in form to the standard
>
> (my $new = $old) =~ s/.../.../;
It's not the returning an lvalue part that I find hard to grasp --
though I had not realized that, so thank you.
It just still seems fundamentally weird to me that you can bury a
variable declaration within an expression, and further that it doesn't
end with the enclosing parentheses or expression. Yes, yes, the
perlfunc docco for "my" has "A 'my' declares the listed variables to
be local (lexically) to the enclosing block, file, or 'eval'." and
similarly for "local". And nobody else is responsible for it feeling
weird to me. Still. Weird.
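
A small sketch of the idiom in question, with a made-up sub name. The
declaration sits inside the parenthesised expression, yet the variable
remains visible for the rest of the enclosing block.

```perl
use strict;
use warnings;

sub strip_extension {
    my ($old) = @_;
    # Copy $old into a fresh $new, then run s/// on the copy.
    # $old itself is left unchanged.
    (my $new = $old) =~ s/\.[^.]*$//;
    return $new;    # $new is still in scope after the statement above
}

print strip_extension('wis.jpg'), "\n";    # prints "wis"
```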
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Fri, 15 Jun 2012 09:18:32 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Perl Protoypes
Message-Id: <4fdb3628$8$fuzhry+tra$mr2ice@news.patriot.net>
In <vilain-5E32E8.10403114062012@news.individual.net>, on 06/14/2012
at 10:40 AM, Michael Vilain <vilain@NOspamcop.net> said:
>I've always felt that Pascal was upside down in that the
>subroutines and functions had to be before the main
>code block.
Pascal is a classic example of crippling a language in order to allow
for a one-pass compiler.
>Maybe declaring a prototype was something that C did to keep from
>having to parse the code more than once.
Or to allow for a one-pass compilation. But even in a language where
an internal subroutine definition defines the arguments, you still
need to determine how to handle the parameters to an external
subroutine.
>I'm used to a language where the person writing the code is
>responsible for doing it right or wrong.
So am I, and I've seen a lot of cases where someone broke things by
getting it wrong.
>I document the hell out of blocks and subroutines describing each
>argument and type before any actual code.
Don't spend a lot of time explaining things that are clear from the
code; concentrate on why you are doing what you are doing and on
encodings that may not be obvious.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Sat, 16 Jun 2012 21:26:17 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Regular Expression (WAS: an effective script for grabbing and putting images from or to a website)
Message-Id: <6omqt717kkuo6u1po1a1v2b2j5h9dhbt2l@4ax.com>
Cal Dershowitz <cal@example.invalid> wrote:
> my ($ext) = $name =~ /([^.]*)$/;
>
>Can (anyone) talk me through why this captures an extension? The caret
>anchors the regex at the beginning.
No because see below!
> $ at the end. parens return the
>match. The asterisk is to quantify what's in brackets, but what's going
>on with the brackets?
The square brackets define a character class, and the leading caret
negates this class. In other words this class captures anything that is
not a literal dot. Together with the asterisk and the dollar anchor this
becomes: as many characters from the end of the string until the first
dot appears (from the end). Which pretty much describes what some people
call a file name extension.
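
To see the explanation in action, here is a short sketch that runs the
same pattern over a few sample names. The sub name and the sample names
are illustrative. Note that a name without any dot is captured whole,
and a name ending in a dot yields the empty string.

```perl
use strict;
use warnings;

sub ext_of {
    my ($name) = @_;
    # [^.]* : as many non-dot characters as possible
    # $     : ... which must run up to the end of the string
    my ($ext) = $name =~ /([^.]*)$/;
    return $ext;
}

for my $name ('wis.jpg', 'archive.tar.gz', 'README', 'trailing.') {
    printf "%-15s => '%s'\n", $name, ext_of($name);
}
```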
jue
------------------------------
Date: Sun, 17 Jun 2012 18:59:19 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: Regular Expression (WAS: an effective script for grabbing and putting images from or to a website)
Message-Id: <69-dnWqdl5X64EPSnZ2dnUVZ_jqdnZ2d@supernews.com>
On 06/16/2012 10:26 PM, Jürgen Exner wrote:
> Cal Dershowitz<cal@example.invalid> wrote:
>> my ($ext) = $name =~ /([^.]*)$/;
> The square brackets define a character class, and the leading caret
> negates this class. In other words this class captures anything that is
> not a literal dot. Together with the asterisk and the dollar anchor this
> becomes: as many characters from the end of the string until the first
> dot appears (from the end). Which pretty much describes what some people
> call a file name extension.
That one was super hard, with the caret not meaning what it usually does
and the . as well. Good reading here:
http://work.lauralemay.com/samples/perl.html
Negated Character Classes
Brackets define a class of characters to match in a pattern. You can
also define a set of characters not to match using negated character
classes—just make sure the first character in your character class is a
caret (^). So, for example, to match anything that isn't an A or a B, use:
/[^AB]/
Note that the caret inside a character class is not the same as the
caret outside one. The former is used to create a negated character
class, and the latter is used to mean the beginning of a line.
If you want to actually search for the caret character inside a
character class, you're welcome to—just make sure it’s not the first
character or escape it (it might be best just to escape it either way to
cut down on the rules you have to keep track of):
/[\^?.%]/ # search for ^, ?, ., %
You most likely end up using a lot of negated character classes in your
regular expressions, so keep this syntax in mind. Note one subtlety:
negated characters classes don't negate the entire value of the pattern.
If /[12]/ means "return true if the data contains 1 or 2", /[^12]/ does
not mean "return true if the data doesn't contain 1 or 2." If that were
the case, you'd get a match even if the string in question was empty.
What negated character classes really mean is "match any character
that's not these characters." There must be at least one actual
character to match for a negated character class to work.
## end excerpt
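
The subtlety in that last paragraph can be checked with a short sketch.
The sub names are made up. /[^12]/ asks "is there at least one character
that is neither 1 nor 2?", which is a different question from "does the
string contain no 1 and no 2?".

```perl
use strict;
use warnings;

# True if the string contains at least one character other than 1 or 2.
sub has_char_other_than_1_or_2 {
    my ($data) = @_;
    return $data =~ /[^12]/ ? 1 : 0;
}

# True if the string contains neither 1 nor 2 (the empty string qualifies).
sub lacks_1_and_2 {
    my ($data) = @_;
    return $data !~ /[12]/ ? 1 : 0;
}

for my $data ('', '12', '123', 'abc') {
    printf "'%s': [^12] matches=%d, no 1 or 2=%d\n",
        $data, has_char_other_than_1_or_2($data), lacks_1_and_2($data);
}
```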
Ok, but why isn't the negated part returned along with the rest of what
is inside the parenthesis?
--
Cal
------------------------------
Date: Sun, 17 Jun 2012 18:53:13 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Regular Expression
Message-Id: <vb2tt7lu2sd8g2nigu0paa5i460t8o7706@4ax.com>
Cal Dershowitz <cal@example.invalid> wrote:
>On 06/16/2012 10:26 PM, Jürgen Exner wrote:
>> Cal Dershowitz<cal@example.invalid> wrote:
>>> my ($ext) = $name =~ /([^.]*)$/;
>
>> The square brackets define a character class, and the leading caret
>> negates this class. In other words this class captures anything that is
>> not a literal dot. Together with the asterisk and the dollar anchor this
>> becomes: as many characters from the end of the string until the first
>> dot appears (from the end). Which pretty much describes what some people
>> call a file name extension.
>
>That one was super hard, with the caret not meaning what it usually does
>and the . as well.
Context matters!
> Good reading here:
Much better reading here:
perldoc perlre
>Ok, but why isn't the negated part returned along with the rest of what
>is inside the parenthesis?
But it is. Everything that is not a dot is captured.
jue
------------------------------
Date: Sun, 17 Jun 2012 21:19:39 -0600
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: Regular Expression
Message-Id: <sYqdnZxuP43WA0PSnZ2dnUVZ_qGdnZ2d@supernews.com>
On 06/17/2012 07:53 PM, Jürgen Exner wrote:
> Cal Dershowitz<cal@example.invalid> wrote:
>> On 06/16/2012 10:26 PM, Jürgen Exner wrote:
>>> Cal Dershowitz<cal@example.invalid> wrote:
>>>> my ($ext) = $name =~ /([^.]*)$/;
>>
>>> The square brackets define a character class, and the leading caret
>>> negates this class. In other words this class captures anything that is
>>> not a literal dot. Together with the asterisk and the dollar anchor this
>>> becomes: as many characters from the end of the string until the first
>>> dot appears (from the end). Which pretty much describes what some people
>>> call a file name extension.
>>
>> That one was super hard, with the caret not meaning what it usually does
>> and the . as well.
>
> Context matters!
>
>> Good reading here:
>
> Much better reading here:
>
> perldoc perlre
$ perldoc perlre
$ perldoc perlre
$ perldoc perlre | grep [^
grep: Invalid regular expression
$
It's too hard for me, jue. At the risk of sounding glib about valuable
information, if I really don't get the ? character in regex'es, seeing
almost every example of what I don't understand yet just makes this a
huge ball of "didn't find what I was looking for."
Do (you) see what I was trying to achieve with that ultimate command?
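
One reading of the failed command: grep took [^ as the start of a
bracket expression that never closes, hence "Invalid regular
expression". To search for those two literal characters the pattern has
to be escaped. A sketch of the same idea in Perl, assuming the text is
already held as a list of lines; the sub name and sample lines are
illustrative, and \Q...\E quotes the regex metacharacters.

```perl
use strict;
use warnings;

# Return the lines that contain $literal as plain text, not as a regex.
sub lines_containing_literal {
    my ($literal, @lines) = @_;
    return grep { /\Q$literal\E/ } @lines;
}

my @sample = (
    'a negated class looks like [^abc]',
    'a plain class looks like [abc]',
);
print "$_\n" for lines_containing_literal('[^', @sample);
```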
>
>> Ok, but why isn't the negated part returned along with the rest of what
>> is inside the parenthesis?
>
> But it is. Everything that is not a dot is captured.
>
> jue
That I see as output. Why it happens: little bit of a mystery to OP
still. Cheers,
--
Cal
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3717
***************************************