[23692] in Perl-Users-Digest
Perl-Users Digest, Issue: 5899 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Dec 5 14:10:44 2003
Date: Fri, 5 Dec 2003 11:10:17 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 5 Dec 2003 Volume: 10 Number: 5899
Today's topics:
Regex Question <mikeflan@earthlink.net>
Re: Regex Question (Tad McClellan)
SGML/HTML syntax trivia (was Re: any idea how to optimi (Tad McClellan)
Re: SGML/HTML syntax trivia (was Re: any idea how to op <usenet@morrow.me.uk>
Re: SGML/HTML syntax trivia (was Re: any idea how to op <flavell@ph.gla.ac.uk>
Re: sorting file names <pinyaj@rpi.edu>
Re: sorting file names (Sara)
Re: sorting file names <jurgenex@hotmail.com>
Re: sorting file names (David)
Re: sorting file names (James E Keenan)
subroutine parameter with regex <chatasos@yahoo.com>
Re: subroutine parameter with regex <usenet@morrow.me.uk>
Using a hash for the post data in LWP::useragent? (Andrew)
Re: Using a hash for the post data in LWP::useragent? <noreply@gunnar.cc>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 05 Dec 2003 14:18:48 GMT
From: Mike Flannigan <mikeflan@earthlink.net>
Subject: Regex Question
Message-Id: <3FD09489.F05594E@earthlink.net>
I have a regex question here.
I have lines that look like:
condition=" OR (subject,contains,can we meet) OR (subject,contains,size
does matter) OR (subject,contains,aging pill) OR (subject,contains,gain
inch) OR (subject,contains,failure announc)"
condition=" OR (subject,contains,Email Storage) OR (from,contains,Net
Delivery) OR (from,contains,Network) OR (subject,contains,bug ) OR
(body,contains,Undeliverable mail to )"
etc.
I'd like to get all the subject('s) printed out together,
and all the from('s) printed out together, and all the
body('s) printed out together.
So it would print for the subject case:
can we meet, size does matter, aging pill, gain inch, failure announc,
Email Storage, Network, bug
I could split each line on the OR and do a simple match.
But I was wondering if there is a simple way of doing
it on the whole line at once with a regex.
The acquired field (size does matter) can be any number
of words, and each line can have 0 - 5 subject(s),
0 - 5 from(s), 0 - 5 body(s); but will have 5
or fewer in total.
Here is what I tried, but it only returned the
last subject field in each line, and gave a warning
message if there were no subject(s).
use strict;
use warnings;
my $line;
open SPAMF, "<rules.dat" or die "$0: open rules.dat: $!";
foreach my $line ( <SPAMF> ) {
next if $line !~ m/^condition=/;
$line =~ s/OR \((\w+?),contains,(.*?)\)/OR \($1,contains,$2\)/g;
print "$1\n";
print "$2\n";
# print "$3\n";
# print "$4\n";
# print "$5\n";
}
close SPAMF;
__END__
my input file looks like:
name="General Spam29"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (subject,contains,stretch mark) OR
(subject,contains,printer driver) OR (body,contains,low mortgage) OR
(subject,contains,paypal) OR (subject,contains,orgasm)"
name="General Spam28"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (subject,contains,can we meet) OR (subject,contains,size
does matter) OR (subject,contains,aging pill) OR (subject,contains,gain
inch) OR (subject,contains,failure announc)"
name="General Spam27"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (subject,contains,failure advice) OR
(from,contains,failure) OR (subject,contains,failure notice) OR
(subject,contains,pharmacy) OR (subject,contains,lose inch)"
name="General Spam26"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (from,contains,GT Distributors) OR
(subject,contains,healthcare costs) OR (body,contains,wholesale
prescription) OR (body,contains,former military ruler) OR
(body,contains,medical consultations)"
name="General Spam25"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (from,contains,service) OR (from,contains,bulletin) OR
(subject,contains,security pa) OR (body,contains,inform you that the
message returned below) OR (body,contains,your road to financial
freedom)"
name="General Spam24"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (subject,contains,net service) OR
(subject,contains,network update) OR (from,contains,mail service) OR
(subject,contains,abort letter) OR (to or CC,contains,recipient@)"
name="General Spam23"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (subject,contains,you forgot to write) OR
(subject,contains,ploma) OR (subject,contains,security up) OR
(from,contains,net message) OR (from,contains,message)"
name="General Spam22"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (from,contains,Postmaster) OR (from,contains,virus) OR
(from,contains,soft update) OR (from,contains,technical bulletin) OR
(from,contains,support)"
name="General Spam21"
enabled="yes"
description=""
type="1"
action="Move to folder"
actionValue="ASpam"
condition=" OR (from,contains,net mail) OR (from,contains,email system)
OR (from,contains,storage) OR (from,contains,Customer) OR
(subject,contains,network upgrade)"
------------------------------
Date: Fri, 5 Dec 2003 09:25:25 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Regex Question
Message-Id: <slrnbt18r5.7vq.tadmc@magna.augustmail.com>
Mike Flannigan <mikeflan@earthlink.net> wrote:
>
> I have a regex question here.
>
> I have lines that look like:
Please:
Speak Perl rather than English, when possible
Perl is much more precise than natural language. Saying it in Perl
instead will avoid misunderstanding your question or problem.
[snip free-form data recast into the Perl code below]
> I'd like to get all the subject('s) printed out together,
> So it would print for the subject case:
> can we meet, size does matter, aging pill, gain inch, failure announc,
> Email Storage, Network, bug
Why should the 1st one end with a comma while the 2nd one doesn't?
> But I was wondering if there is a simple way of doing
> it on the whole line at once with a regex.
You get a list of all memories when you do a m//g in list context.
----------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
my @data = (
'condition=" OR (subject,contains,can we meet) OR '
. '(subject,contains,size does matter) OR '
. '(subject,contains,aging pill) OR '
. '(subject,contains,gain inch) OR '
. '(subject,contains,failure announc)"',
'condition=" OR (subject,contains,Email Storage) OR '
. '(from,contains,Net Delivery) OR '
. '(from,contains,Network) OR '
. '(subject,contains,bug ) OR '
. '(body,contains,Undeliverable mail to )"'
);
foreach ( @data ) {
my @parts = /\(subject,[^,]+,([^)]+)/g; # m//g in a list context
print join(', ', @parts), "\n";
}
----------------------------------------------------------
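The original attempt printed only the last subject because, after s///g,
$1 and $2 hold the captures of the final match only. A minimal sketch of
one way to extend the list-context m//g idea to all three fields in a
single pass, assuming each condition= entry sits on one line (the names
group_conditions and %by_field are made up for illustration):

```perl
use strict;
use warnings;

# Group every "(field,contains,value)" clause by its field name.
sub group_conditions {
    my @lines = @_;
    my %by_field;
    for my $line (@lines) {
        # m//g in list context returns (field, value, field, value, ...)
        my @pairs = $line =~ /\(([^,]+),contains,([^)]+)\)/g;
        while ( my ( $field, $value ) = splice @pairs, 0, 2 ) {
            push @{ $by_field{$field} }, $value;
        }
    }
    return %by_field;
}

my %found = group_conditions(
    'condition=" OR (subject,contains,can we meet) OR (from,contains,Network)"',
    'condition=" OR (subject,contains,bug ) OR (body,contains,Undeliverable mail to )"',
);
print "$_: ", join( ', ', @{ $found{$_} } ), "\n" for sort keys %found;
```

Trailing spaces inside a value (as in "bug ") are kept as they appear in
the data; trim them separately if they are not wanted.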
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Fri, 5 Dec 2003 08:52:43 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: SGML/HTML syntax trivia (was Re: any idea how to optimize this regex?)
Message-Id: <slrnbt16tr.7qc.tadmc@magna.augustmail.com>
Matt Garrish <matthew.garrish@sympatico.ca> wrote:
> "Ben Morrow" <usenet@morrow.me.uk> wrote in message
> news:bqoq0c$gms$1@wisteria.csv.warwick.ac.uk...
>> "Matt Garrish" <matthew.garrish@sympatico.ca> wrote:
>> > Html comments allow whitespace between the -- and > when you close a
>> > comment, so you'd have to write that as:
>> >
>> > <!--.*?--\s*>
>>
>> HTML (SGML) comments also allow whitespace after the '!', and anything
I believe that you are mistaken with that part.
>> matching /--\s*--/ to appear within the body of the comment. What
>> browsers will accept is another matter... ;)
But that part is true enough.
> I thought no whitespace at the start of a comment was one of the few things
> that html did enforce?
You thought correctly. The grammar[1], reformatted, is:
comment declaration =
MDO,
( comment,
( s |
comment
)*
)?
MDC
comment =
COM
SGML character*
COM
Where:
MDO (<!) Markup Declaration Open
MDC (>) Markup Declaration Close
COM (--) Comment Delimiter
s Separator ( roughly /\s/ )
So, if you have any "comment"s in the "comment declaration",
then there must be no spaces before that first one.
Note also that <!> is a "comment declaration" as well.
This is but one of the "strange corners" of SGML syntax. There
are several dozen of these. Your choices are:
1. Research the bazillion syntax oddities and code for
*all of them* in your program.
or
2. Use a module.
[1] "The SGML Handbook" Charles Goldfarb, p391
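
The grammar above can be transcribed into a regex fairly directly. This is
only a sketch of this one corner of the syntax, not a replacement for a
parsing module. It assumes "SGML character*" means "anything up to the
next --", and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# comment = COM, SGML character*, COM  (the body may not itself contain "--")
my $comment = qr/--(?:(?!--).)*--/s;

# comment declaration = MDO, ( comment, ( s | comment )* )?, MDC
my $comment_decl = qr/<!(?:$comment(?:\s|$comment)*)?>/;

for my $text ( '<!>', '<!-- a -->', '<!-- a -- -- b -- >', '<! -- a -->' ) {
    printf "%-22s %s\n", $text,
        $text =~ /\A$comment_decl\z/ ? 'matches' : 'does not match';
}
```

The first three strings match; the last does not, because of the space
between <! and the first comment.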
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Fri, 5 Dec 2003 15:40:21 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: SGML/HTML syntax trivia (was Re: any idea how to optimize this regex?)
Message-Id: <bqq8t5$bqh$2@wisteria.csv.warwick.ac.uk>
tadmc@augustmail.com wrote:
> > I thought no whitespace at the start of a comment was one of the few things
> > that html did enforce?
>
> You thought correctly. The grammar[1], reformatted, is:
>
<snip>
>
> So, if you have any "comment"s in the "comment declaration",
> then there must be no spaces before that first one.
>
> Note also that <!> is a "comment declaration" as well.
Bleech. SGML syntax is too obscure for words :).
> 2. Use a module.
I couldn't agree more...
Ben
--
Joy and Woe are woven fine,
A Clothing for the Soul divine William Blake
Under every grief and pine 'Auguries of Innocence'
Runs a joy with silken twine. ben@morrow.me.uk
------------------------------
Date: Fri, 5 Dec 2003 15:58:22 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: SGML/HTML syntax trivia (was Re: any idea how to optimize this regex?)
Message-Id: <Pine.LNX.4.53.0312051554390.25622@ppepc56.ph.gla.ac.uk>
On Fri, 5 Dec 2003, Tad McClellan wrote:
> Note also that <!> is a "comment declaration" as well.
And, currently, a sure-fire indicator of spam in HTML-formatted emails
- they evidently intend it to disrupt content scanners. It won't
last, of course - as soon as they realise that we're rating it for
rejection, rather than letting ourselves be fooled by the obfuscation.
> 2. Use a module.
And hope the module author has read the book too ;-)
------------------------------
Date: Fri, 5 Dec 2003 09:12:13 -0500
From: Jeff 'japhy' Pinyan <pinyaj@rpi.edu>
Subject: Re: sorting file names
Message-Id: <Pine.SGI.3.96.1031205091112.1095986A-100000@vcmr-64.server.rpi.edu>
On 5 Dec 2003, Anno Siegel wrote:
>Ingo Menger <quetzalcotl@consultant.com> wrote in comp.lang.perl.misc:
>>
>> @allfiles = map { "t-$_" }
>> sort { $a <=> $b }
>> map { s/^t-// } @allfiles;
>
>You forgot that s/// returns success or failure of the replacement, not
>the changed string. Change the last line to
>
> map { s/^t-//; $_ } @allfiles;
I'd use grep(). It saves the effort of having to return $_, and it makes
more sense because this code is designed to sort those SPECIFIC files:
@allfiles =
map "t-$_",
sort { $a <=> $b }
grep s/^t-//,
@allfiles;
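
Note that grep s/^t-// edits the elements of @allfiles in place and drops
any name without the prefix (such as . and .. from readdir). If the
original list should stay untouched, one alternative is a Schwartzian
transform. A minimal sketch, with the sub name sort_t_names made up for
illustration:

```perl
use strict;
use warnings;

# Sort names of the form t-<number> numerically without modifying the input.
sub sort_t_names {
    my @names = @_;
    return
        map  { $_->[0] }                         # 3. recover the original name
        sort { $a->[1] <=> $b->[1] }             # 2. compare the extracted numbers
        map  { /^t-(\d+)$/ ? [ $_, $1 ] : () }   # 1. pair name with number, skip others
        @names;
}

my @allfiles = qw(t-10 . t-2 .. t-1 t-21 t-3);
print join( ' ', sort_t_names(@allfiles) ), "\n";   # prints: t-1 t-2 t-3 t-10 t-21
```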
--
Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
------------------------------
Date: 5 Dec 2003 06:52:51 -0800
From: genericax@hotmail.com (Sara)
Subject: Re: sorting file names
Message-Id: <776e0325.0312050652.700b3be4@posting.google.com>
mixo <mixo@beth.uniforum.org.za> wrote in message news:<bqpdjd$86k$1@ctb-nnrp2.saix.net>...
> I have file names that have a prefix "t-", which
> is followed by a number (integer), like
> t-1
> t-2
> t-3
> t-10
> t-21
> and so on.
>
> How can I sort this in numeric order? This would avoid a situation
> where I get:
> t-1
> t-10
> t-2
> t-21
> t-3
> and so on.
>
> So far I have the following, which suffers from the above symptom:
> ++++++++++++
> #!/usr/bin/perl -w
> opendir THISDIR, "." or die "serious dainbramage: $!";
> @allfiles = readdir THISDIR;
> @allfiles = sort @allfiles;
> #@allfiles = reverse @allfiles;
>
>
> closedir THISDIR;
> print "@allfiles\n";
> ++++++++++++
The statement
@allfiles = sort @allfiles;
is NOT a numeric sort. It's an alphanumeric sort. You'll need to use
the spaceship operator <=> to do a numeric sort. However, you don't
have numbers.
Since you APPEAR to have a constant 'T-' in front of the filenames,
let's strip that off, sort numerically, then put it back. If it's not a
true constant, you will have to store the prefix then put it back.
OK let's get busy. Remember, loops like for {} BAD! Let's use some Perl
instead:
#!/usr/bin/perl -wd
my @a = qw(T-1 T-12 T-5 T-3 T-6);
map s/^\D+//,@a; # strip off the prefix; don't really need the caret, but it's OK
@a = sort { $a <=> $b } @a; # sort numerically using spaceship
map s/^/T-/,@a; # put the prefix back on
print "$_\n" for @a; # report success!
DB<1> c
T-1
T-3
T-5
T-6
T-12
Debugged program terminated. Use q to quit or R to restart,
Happy Holidays.
G
------------------------------
Date: Fri, 05 Dec 2003 15:25:52 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: sorting file names
Message-Id: <4s1Ab.3$ic4.2@nwrddc03.gnilink.net>
mixo wrote:
> I have file names that have a prefix "t-", which
> is followed by a number (integer), like
> t-1
> t-2
> t-3
> t-10
> t-21
> and so on.
>
> How can I sort this in numeric order?
Oh come on, it's not that difficult.
> ++++++++++++
> #!/usr/bin/perl -w
> opendir THISDIR, "." or die "serious dainbramage: $!";
> @allfiles = readdir THISDIR;
> @allfiles = sort @allfiles;
Did you read the documentation of sort()?
It clearly says:
[...]. If SUBNAME or
BLOCK is omitted, "sort"s in standard string comparison order.
And further down in the examples:
# sort numerically ascending
@articles = sort {$a <=> $b} @files;
Now, obviously your file names are not numbers, so the code block needs to
extract the number part of $a and $b before calling <=>. Because you have
such a well-defined format you could simply use substr to remove the
unwanted "t-":
sort {substr($a,2)<=>substr($b,2)} @a;
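
Applied to the directory listing from the original script, that might look
like the sketch below. The grep is an addition: readdir also returns .
and .. (and anything else in the directory), and those would trigger
"isn't numeric" warnings in the comparison. The sub name sorted_t_files is
made up for illustration:

```perl
use strict;
use warnings;

# Return the t-<number> entries of $dir in numeric order.
sub sorted_t_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    # Keep only names of the form t-<digits>; this drops '.' and '..' too.
    my @names = grep { /^t-\d+$/ } readdir $dh;
    closedir $dh;
    # substr($name, 2) is everything after the two-character "t-" prefix.
    return sort { substr( $a, 2 ) <=> substr( $b, 2 ) } @names;
}

print "$_\n" for sorted_t_files('.');
```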
> closedir THISDIR;
> print "@allfiles\n";
> ++++++++++++
jue
------------------------------
Date: 5 Dec 2003 09:44:00 -0800
From: diberri@yahoo.com (David)
Subject: Re: sorting file names
Message-Id: <31b26f4.0312050944.35586f77@posting.google.com>
mixo <mixo@beth.uniforum.org.za> wrote:
> How can I sort this in numeric order? This would avoid a situation
> where I get:
> t-1
> t-10
> t-2
> t-21
> t-3
> and so on.
Sounds like a job for Sort::Naturally, available at your local CPAN mirror.
- David
------------------------------
Date: 5 Dec 2003 10:27:18 -0800
From: jkeen@concentric.net (James E Keenan)
Subject: Re: sorting file names
Message-Id: <b955da04.0312051027.6852a7f3@posting.google.com>
mixo <mixo@beth.uniforum.org.za> wrote in message news:<bqpdjd$86k$1@ctb-nnrp2.saix.net>...
> I have file names that have a prefix "t-", which
> is followed by a number (integer), like
> t-1
> t-2
> t-3
> t-10
> t-21
> and so on.
>
> How can I sort this in numeric order? This would avoid a situation
> where I get:
> t-1
> t-10
> t-2
> t-21
> t-3
> and so on.
>
> So far I have the following, which suffers from the above symptom:
> ++++++++++++
> #!/usr/bin/perl -w
> opendir THISDIR, "." or die "serious dainbramage: $!";
> @allfiles = readdir THISDIR;
> @allfiles = sort @allfiles;
> #@allfiles = reverse @allfiles;
>
>
> closedir THISDIR;
> print "@allfiles\n";
> ++++++++++++
my @list = qw(t-1
t-2
t-3
t-10
t-21);
my (@files);
push (@files, (split/-/, $_)[1]) for (@list);
for (sort {$a <=> $b } @files) {
print "t-$_\n";
# do appropriate processing
}
HTH
jimk
------------------------------
Date: Fri, 05 Dec 2003 20:10:48 +0200
From: Tassos <chatasos@yahoo.com>
Subject: subroutine parameter with regex
Message-Id: <1070647958.87041@athprx02>
$var1 = "Se2:5";
if ( $var1 =~ /^Se\d/ ) {
$var1 =~ s/Se/Serial/;
&routine1($var1);
$var1 =~ s/Serial/Se/;
}
Is there a way I can call routine1 using the previous $var1 regex as its parameter, so I
don't have to double regex?
------------------------------
Date: Fri, 5 Dec 2003 18:57:22 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: subroutine parameter with regex
Message-Id: <bqqkei$lfm$1@wisteria.csv.warwick.ac.uk>
Tassos <chatasos@yahoo.com> wrote:
> $var1 = "Se2:5";
> if ( $var1 =~ /^Se\d/ ) {
> $var1 =~ s/Se/Serial/;
> &routine1($var1);
Don't call subs with & unless you know why you do it.
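
With an explicit argument list, as in &routine1($var1), the & only turns
off prototype checking. The surprise comes when the parentheses are left
off: the called sub then sees the caller's @_. A small sketch (all sub
names here are made up for illustration):

```perl
use strict;
use warnings;

sub count_args  { return scalar @_ }

sub with_amp    { return &count_args }    # no parens: count_args sees with_amp's @_
sub with_parens { return count_args() }   # ordinary call: count_args gets an empty @_

print with_amp( 1, 2, 3 ),    "\n";   # prints 3
print with_parens( 1, 2, 3 ), "\n";   # prints 0
```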
> $var1 =~ s/Serial/Se/;
> }
>
> Is there a way I can call routine1 using the previous $var1 regex as
> its parameter, so I don't have to double regex?
Err... I understand your Perl better than your English. I think what
you are asking for is
if( $var1 =~ /^Se\d/ ) {
(my $var2 = $var1) =~ s/Se/Serial/;
routine1($var2);
}
, but I'm not entirely sure.
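
A self-contained sketch of that copy-then-substitute idiom, using a
stand-in routine1 (hypothetical; the real one was not posted). On Perl
5.14 or later the /r modifier does the same thing in one expression,
returning the changed copy and leaving $var1 alone:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's routine1.
sub routine1 { my ($name) = @_; return "got $name" }

my $var1 = "Se2:5";
my ( $result, $result_r );

if ( $var1 =~ /^Se\d/ ) {
    # Copy first, then substitute on the copy; $var1 stays "Se2:5".
    ( my $var2 = $var1 ) =~ s/^Se/Serial/;
    $result = routine1($var2);

    # Perl 5.14+: s///r returns the modified string instead of changing $var1.
    $result_r = routine1( $var1 =~ s/^Se/Serial/r );
}

print "$result\n";      # prints: got Serial2:5
print "$result_r\n";    # prints: got Serial2:5
print "$var1\n";        # prints: Se2:5
```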
Ben
--
Heracles: Vulture! Here's a titbit for you / A few dried molecules of the gall
From the liver of a friend of yours. / Excuse the arrow but I have no spoon.
(Ted Hughes, [ Heracles shoots Vulture with arrow. Vulture bursts into ]
/Alcestis/) [ flame, and falls out of sight. ] ben@morrow.me.uk
------------------------------
Date: 5 Dec 2003 10:29:48 -0800
From: awilhite@cableone.net (Andrew)
Subject: Using a hash for the post data in LWP::useragent?
Message-Id: <9b7dcecd.0312051029.1f66fa4c@posting.google.com>
Does anyone know of a way to send the post data as reference to a hash
instead of hardcoding the post data into the useragent request? I
have tried using HTTP::Request::Common and Useragent request.
for example:
$ua->request(POST 'www.someurl.com',
[ username => 'bob',
host => 'localhost',
other => '1',
other2 => '2',
other3 => '3',
]);
I would like to be able to use the same subroutine for multiple
post data hashes by calling
my %posthash = [ username => 'bob',host => 'localhost', other => '1',
other2 => '2', other3 => '3']
$ua->request(POST $url, \%posthash); # something like this
anyone have any ideas on how to do this?
------------------------------
Date: Fri, 05 Dec 2003 19:55:41 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Using a hash for the post data in LWP::useragent?
Message-Id: <bqqkqg$265gab$1@ID-184292.news.uni-berlin.de>
Andrew wrote:
> Does anyone know of a way to send the post data as reference to a
> hash instead of hardcoding the post data into the useragent
> request? I have tried using HTTP::Request::Common and Useragent
> request.
>
> for example:
>
> $ua->request(POST 'www.someurl.com',
> [ username => 'bob',
> host => 'localhost',
> other => '1',
> other2 => '2',
> other3 => '3',
> ]);
Assuming that is correct syntax, you are passing an array reference,
not a hash reference.
> I would like to be able to use the same subroutine for multiple
> post data hashes by calling
>
> my %posthash = [ username => 'bob',host => 'localhost',
> other => '1', other2 => '2', other3 => '3']
>
> $ua->request(POST $url, \%posthash); # something like this
Try this:
my $postarrayref = [ username => 'bob',host => 'localhost',
other => '1', other2 => '2', other3 => '3' ];
$ua->request(POST $url, $postarrayref);
(untested)
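
If the data really has to live in a hash, note that a hash is initialised
with parentheses, not square brackets; my %posthash = [ ... ] stores a
single array reference as the only key. A minimal sketch of building the
array reference from a hash, with the LWP call left as a comment because
it needs the modules and a real URL (the sub name form_from_hash is made
up for illustration):

```perl
use strict;
use warnings;

# Turn a hash reference into an array reference of key/value pairs.
# Keys are sorted only to make the order predictable; a hash has no order.
sub form_from_hash {
    my ($hashref) = @_;
    return [ map { $_ => $hashref->{$_} } sort keys %$hashref ];
}

my %posthash = ( username => 'bob', host => 'localhost', other => '1' );
my $form     = form_from_hash( \%posthash );

print join( ', ', @$form ), "\n";

# With LWP and HTTP::Request::Common loaded, as in the post above:
#   $ua->request(POST $url, $form);
```

The HTTP::Request::Common documentation describes the form argument as a
reference to an array or a hash, so passing \%posthash directly may also
work; check that against the installed version.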
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5899
***************************************