[32645] in Perl-Users-Digest
Perl-Users Digest, Issue: 3921 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Apr 10 14:09:24 2013
Date: Wed, 10 Apr 2013 11:09:03 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 10 Apr 2013 Volume: 11 Number: 3921
Today's topics:
Re: newbie: Problem with $ and \ in strings <jurgenex@hotmail.com>
Re: newbie: Problem with $ and \ in strings <ben@morrow.me.uk>
Re: newbie: Problem with $ and \ in strings <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 09 Apr 2013 15:48:38 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: newbie: Problem with $ and \ in strings
Message-Id: <2h69m8pfh7na0cpmid1ldb1j0uafft58v4@4ax.com>
vivek_12315 <vivekchaurasiya@gmail.com> wrote:
>I'm parsing a line like:
>
>line = [feature-tributary/access_db.wxs:35: <File Name="EmptyDB.mdb" Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb" KeyPath="yes" DiskId="2" Checksum="yes" Id="a621e7596dfcc45ffaec5fe2bb a84a6f1" />]
>
>Contents are in square brackets.
>
>I just want to extract the file name which is in the value of the Source attribute.
>i.e. FILE.mdb
Try
print $1 if $line =~ m/Source="(.*?)"/;
The additional '?' in the RE changes '.*' from trying to match the
longest possible substring to matching the shortest possible substring,
AKA non-greedy matching.
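A minimal sketch of the difference, using an abridged copy of the line from the original post (the Id attribute is left out for brevity, and the variable names are only illustrative):

```perl
use strict;
use warnings;

# Abridged sample line; single quotes keep the $ and \ characters literal.
my $line = 'feature-tributary/access_db.wxs:35: <File Name="EmptyDB.mdb" '
         . 'Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb" '
         . 'KeyPath="yes" DiskId="2" Checksum="yes" />';

# Greedy: .* runs on to the LAST double quote on the line.
my ($greedy) = $line =~ m/Source="(.*)"/;

# Non-greedy: .*? stops at the FIRST double quote after Source=".
my ($lazy) = $line =~ m/Source="(.*?)"/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Either way the capture is the whole Source value, so a second step is still needed to get from there to FILE.mdb.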
jue
------------------------------
Date: Wed, 10 Apr 2013 00:44:24 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: newbie: Problem with $ and \ in strings
Message-Id: <onle3a-o4h.ln1@anubis.morrow.me.uk>
Quoth Jürgen Exner <jurgenex@hotmail.com>:
> vivek_12315 <vivekchaurasiya@gmail.com> wrote:
> >I'm parsing a line like:
> >
> >line = [feature-tributary/access_db.wxs:35: <File
> Name="EmptyDB.mdb" Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS
> Apps\Template\Database\FILE.mdb" KeyPath="yes" DiskId="2" Checksum="yes"
> Id="a621e7596dfcc45ffaec5fe2bb a84a6f1" />]
> >
> >Contents are in square brackets.
> >
> >I just want to extract the file name which is in the value of the Source attribute.
> >i.e. FILE.mdb
>
> Try
> print $1 if $line =~ m/Source="(.*?)"/;
>
> The additional '?' in the RE changes '.*' from trying to match the
> longest possible substring to matching the shortest possible substring,
> AKA non-greedy matching.
It's worth being careful with this: /".*?"/ will match over an
intervening quote if it has to, so generalising the pattern above
slightly to
/Source="(.*?)" DiskId="(.*?)"/
will give an unexpected result. IMHO it's safer to stick to negated
character classes, even though it's sometimes a bit of a pain to work
out what they should be.
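To make the failure mode concrete, here is a small sketch run against an abridged copy of the sample line (the variable names are illustrative only). The non-greedy pattern matches, but its first capture swallows the KeyPath attribute; the negated-class version simply refuses to match, which is usually the easier problem to notice:

```perl
use strict;
use warnings;

my $line = '<File Name="EmptyDB.mdb" '
         . 'Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb" '
         . 'KeyPath="yes" DiskId="2" Checksum="yes" />';

# Non-greedy: .*? is allowed to cross a quote if that is the only way to
# reach the literal '" DiskId="', so $lazy_source ends in: " KeyPath="yes
my ($lazy_source, $lazy_disk) = $line =~ /Source="(.*?)" DiskId="(.*?)"/;

# Negated classes: [^"]* cannot cross a quote, and DiskId does not directly
# follow Source in this line, so the match fails and the list is empty.
my @strict = $line =~ /Source="([^"]*)" DiskId="([^"]*)"/;

print "lazy source: $lazy_source\n";
print "lazy disk:   $lazy_disk\n";
print "strict match count: ", scalar(@strict), "\n";
```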
Ben
------------------------------
Date: Wed, 10 Apr 2013 00:36:08 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: newbie: Problem with $ and \ in strings
Message-Id: <88le3a-o4h.ln1@anubis.morrow.me.uk>
Quoth vivek_12315 <vivekchaurasiya@gmail.com>:
> I'm parsing a line like:
>
> line = [feature-tributary/access_db.wxs:35: <File
> Name="EmptyDB.mdb" Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS
> Apps\Template\Database\FILE.mdb" KeyPath="yes" DiskId="2" Checksum="yes"
> Id="a621e7596dfcc45ffaec5fe2bb a84a6f1" />]
>
> Contents are in square brackets.
Is this actually all on one line, or are there intervening newlines?
This makes a difference to the /./ regex character: if you don't use /s,
it won't match a newline. (My patterns below don't use /./, so are
unaffected by this.)
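A short sketch of that difference, assuming a hypothetical tag whose Source value has been wrapped across two lines:

```perl
use strict;
use warnings;

# Hypothetical input: the attribute value contains a newline.
my $tag = qq{Source="first\nsecond" KeyPath="yes"};

# Without /s, . refuses to match the newline, so no closing quote is reached.
my $dot_plain  = $tag =~ /Source="(.*?)"/   ? 1 : 0;

# With /s, . matches any character, including newline.
my $dot_single = $tag =~ /Source="(.*?)"/s  ? 1 : 0;

# A negated class matches newline regardless of /s.
my $negated    = $tag =~ /Source="([^"]*)"/ ? 1 : 0;

print "plain dot: $dot_plain, dot with /s: $dot_single, negated class: $negated\n";
```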
> I just want to extract the file name which is in the value of the Source attribute.
> i.e. FILE.mdb
>
> I tried doing
>
> 1. if ($line =~ m/(.*)Source="(.*)"\sKeyPath(.*)/) {
>
> 2. if ($line =~ m/(.*)Source="(.*)\.(.*)"(.*)/o) {
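You don't ever want to use /o. Since perl 5.6 (a very long time ago)
perl has only recompiled a pattern when the variables interpolated into
it have changed, so /o will do no good and may do some harm: it makes
perl ignore any later change to those variables.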
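As an illustration of the kind of harm /o can do (a contrived sketch, not taken from the original program): with /o the pattern is built from the first value of the interpolated variable and then frozen, so later values are silently ignored.

```perl
use strict;
use warnings;

my (@with_o, @without_o);
for my $word (qw(foo bar)) {
    # /o compiles the pattern once, while $word is 'foo', and never
    # looks at $word again on later iterations.
    push @with_o, $word if 'bar' =~ /^$word\z/o;

    # Without /o the current value of $word is used every time.
    push @without_o, $word if 'bar' =~ /^$word\z/;
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

None of the patterns in the original post interpolate a variable, so there /o is simply a no-op.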
> 3. if ($line =~ m/(.*)Source="(.*)"(.*)/o) {
>
> but none of them is giving me what is required. Even the $ sign in $env
> is messing up the output when I print on console.
Can you give an example of this? In general, when posting a program that
doesn't do what you want it to, you need to post the output you actually
get as well as the output you want, so people can see the difference.
The basic pattern match you want is
if (my ($source) = $line =~ m! Source=" ([^"]*) " !x) {
followed by pulling the filename out of the captured part. Notice that
I'm using the /x flag: it's always worth doing this when working on a
complicated pattern.
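For reference, a self-contained sketch of that match with the /x layout spread over several lines. The helper name source_value is made up for this example, and the sample line is an abridged copy of the one in the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of the Source attribute,
# or undef if the line has no Source="..." in it.
sub source_value {
    my ($line) = @_;
    my ($source) = $line =~ m!
        Source="      # attribute name and opening quote
        ( [^"]* )     # capture everything up to the next quote
        "             # closing quote
    !x;
    return $source;
}

my $line = 'feature-tributary/access_db.wxs:35: <File Name="EmptyDB.mdb" '
         . 'Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb" '
         . 'KeyPath="yes" DiskId="2" Checksum="yes" />';

my $source = source_value($line);
print defined $source ? "Source value: $source\n" : "no Source attribute\n";
```

Under /x the whitespace and the # comments inside the pattern are ignored; the one thing to watch is that a comment must not contain the pattern delimiter (here, !).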
The most important difference from your patterns is that instead of
using /.*/ I'm using /[^"]*/. This stops the captured part from running
past the end of the quotes: normally * will match as much text as it
can, so something like
qq/ foo="bar" baz="quux" / =~ /foo="(.*)"/
will capture everything from the first quote to the last, that is,
qq/bar" baz="quux/. /[^"]/ matches any single character except a quote, so the * will
run as far as the closing quote and then stop. Note that this solution
only works if there's no way to escape a quote: if you had a C-type
string like
foo="bar\"baz"
you would need a more complicated pattern to allow for the escaping. In
that case you would be better off using a module like Text::Balanced or
Regexp::Common.
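If pulling in a module is not an option, one common hand-rolled alternative (a sketch only, assuming that a backslash escapes whatever single character follows it) is to alternate between "anything but quote or backslash" and "backslash plus one character":

```perl
use strict;
use warnings;

# Hypothetical input using C-style backslash escapes inside the quotes.
my $text = 'foo="bar\"baz" next="x"';

# The plain negated class stops at the escaped quote, which is too early.
my ($naive) = $text =~ / foo=" ( [^"]* ) " /x;

# Allowing backslash-escaped characters carries the match past \" .
my ($escaped) = $text =~ / foo=" ( (?: [^"\\] | \\. )* ) " /x;

print "naive:   $naive\n";
print "escaped: $escaped\n";
```

The captured text still contains the backslash; unescaping it is a separate step.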
For parsing out the filename part, you will need to tell us what the
format is. The example you gave looks like a Win32-style \-separated
path with some sort of variable expansion, so assuming the expansions
won't ever affect the filename part, you want something like
my ($file) = $source =~ /([^\\]*)$/;
This just pulls off the last section without any backslashes in, and is
actually so simple I might include it in the main match:
if (my ($file) = $line =~
m! Source=" [^"]* \\ ([^"\\]*) " !x
) {
You have to be careful when doing this sort of match that you do
actually force the second *-ed part to match something: something like
/Source=" [^"]* ([^"\\]*) "/x
would not work properly, because there's nothing to stop the uncaptured
part taking the whole value. The pattern above has an explicit,
non-optional /\\/ to force the division between the parts to come at the
right place.
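A quick sketch of the two patterns side by side, run on an abridged copy of the sample line (variable names are illustrative):

```perl
use strict;
use warnings;

my $line = '<File Name="EmptyDB.mdb" '
         . 'Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb" '
         . 'KeyPath="yes" />';

# Nothing forces a split: the first [^"]* takes the whole value and the
# capture group is left matching the empty string.
my ($unforced) = $line =~ m! Source=" [^"]* ([^"\\]*) " !x;

# The literal backslash pins the split to the last \ in the value.
my ($forced) = $line =~ m! Source=" [^"]* \\ ([^"\\]*) " !x;

print "unforced: '$unforced'\n";
print "forced:   '$forced'\n";
```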
In fact, because of this property, we can simplify the pattern a little
to
m! Source=" [^"]* \\ ([^"]*) " !x
since perl will always give the earlier * as much text to match as it
can. I'm not sure I'd recommend writing it like this, though: it's
probably clearer to be explicit about the second half.
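The following sketch compares the two spellings on a few made-up Source values. It also shows a limitation both share: a value with no backslash at all does not match, because the literal \ is not optional.

```perl
use strict;
use warnings;

my $explicit = qr! Source=" [^"]* \\ ([^"\\]*) " !x;
my $short    = qr! Source=" [^"]* \\ ([^"]*)   " !x;

# Hypothetical values: a deep path, a one-level path, and a bare file name.
my @tags = (
    'Source="$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb"',
    'Source="dir\FILE.mdb"',
    'Source="FILE.mdb"',
);

my @results;
for my $tag (@tags) {
    my ($from_explicit) = $tag =~ $explicit;
    my ($from_short)    = $tag =~ $short;
    push @results, [ $from_explicit // 'no match', $from_short // 'no match' ];
}

printf "%-10s %-10s\n", @$_ for @results;
```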
You should also seriously consider using an existing module rather than
spending time fighting with patterns. This looks to me like an XML tag
with a Win32 filename in it, so the obvious solution to me would be
something like
use File::Basename;
use XML::LibXML;
my @parts = split ":", $line, 3;
my $tag = $parts[2];
my $dom = XML::LibXML->load_xml(string => $tag);
my $source = $dom->documentElement->getAttribute("Source");
my $file = basename $source;
This might look like more code, but it's less work, and it gets all the
corner cases right like &entities; in the XML. (This code assumes you're
running on Win32. If not you'd need to use File::Spec::Win32 instead of
File::Basename, or use the icky global interface F::B provides,
fileparse_set_fstype, to set the type of filename to parse.)
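For the non-Win32 case, a sketch of both routes using only core modules. The $source value is copied from the sample line; restoring the previous filename type afterwards is my own precaution rather than anything the original code does.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse_set_fstype);
use File::Spec::Win32;

my $source = '$(env.ARCHIVE_DIRECTORY)\access_db\DS Apps\Template\Database\FILE.mdb';

# Route 1: ask for Win32 path rules explicitly, whatever the host OS is.
# splitpath returns (volume, directories, file).
my (undef, undef, $file_spec) = File::Spec::Win32->splitpath($source);

# Route 2: File::Basename's global switch. It affects every later call in
# the process, so put the previous setting back when done.
my $previous_type = fileparse_set_fstype('MSWin32');
my $file_basename = basename($source);
fileparse_set_fstype($previous_type);

print "File::Spec::Win32: $file_spec\n";
print "File::Basename:    $file_basename\n";
```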
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3921
***************************************