[32607] in Perl-Users-Digest
Perl-Users Digest, Issue: 3880 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 15 09:09:22 2013
Date: Fri, 15 Feb 2013 06:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 15 Feb 2013 Volume: 11 Number: 3880
Today's topics:
Re: A little help with Perl & Email Messages (Seymour J.)
Re: A little help with Perl & Email Messages <*@eli.users.panix.com>
Re: A little help with Perl & Email Messages <ben@morrow.me.uk>
Re: A little help with Perl & Email Messages <*@eli.users.panix.com>
Re: Date in CSV/TSV question <ben@morrow.me.uk>
Re: Date in CSV/TSV question <rweikusat@mssgmbh.com>
Re: Date in CSV/TSV question <ben@morrow.me.uk>
Re: Date in CSV/TSV question <uri@stemsystems.com>
Pattern matching [newbie] <vivekchaurasiya@gmail.com>
Re: Pattern matching [newbie] <jurgenex@hotmail.com>
Re: Pattern matching [newbie] <news@lawshouse.org>
Re: Pattern matching [newbie] <brian.d.foy@gmail.com>
Re: Pattern matching [newbie] <jurgenex@hotmail.com>
size per day of a directory <nospam.gravitalsun@hotmail.com.nospam>
Re: size per day of a directory <nospam.gravitalsun@hotmail.com.nospam>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Mon, 11 Feb 2013 09:28:20 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: A little help with Perl & Email Messages
Message-Id: <51190004$22$fuzhry+tra$mr2ice@news.patriot.net>
In <T8mdnTQC8Jfy1IjMnZ2dnUVZ8l2dnZ2d@giganews.com>, on 02/08/2013
at 06:52 PM, Henry Law <news@lawshouse.org> said:
>When you constrain Outlook to send in plain text there IS no MIME
>content (that's what plain text means),
Not quite; in fact, text/plain is a MIME type; well, text is a type
and plain is a subtype. I would hope that if outhose is set to plain
text then it will send a single MIME part with text/plain, with
appropriate charset in Content-Type and encoding in
Content-Transfer-Encoding. Certainly if the body is to contain
anything but ASCII then it must be encoded in accordance with RFC
2045.
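As a rough illustration of that point, here is a minimal sketch, using only the core modules Encode and MIME::QuotedPrint, of a single-part text/plain message whose body contains one non-ASCII character. The sample text, the UTF-8 charset and the choice of quoted-printable are assumptions made for the example; nothing here is claimed to be what any particular mail client emits.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical body text; the e-acute is the only non-ASCII character.
my $text = "Caf\x{e9} au lait\n";

# Characters -> UTF-8 octets -> quoted-printable, which is 7bit-safe.
my $encoded = encode_qp(encode('UTF-8', $text));

# Headers name the charset and the transfer encoding used for the body.
my $message = join('',
    "MIME-Version: 1.0\n",
    "Content-Type: text/plain; charset=UTF-8\n",
    "Content-Transfer-Encoding: quoted-printable\n",
    "\n",
    $encoded,
);

print $message;
```

The body line comes out as "Caf=C3=A9 au lait", so the message itself stays pure ASCII while the headers tell the reader how to undo the encoding.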
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Thu, 14 Feb 2013 20:32:08 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: A little help with Perl & Email Messages
Message-Id: <eli$1302141522@qz.little-neck.ny.us>
In comp.lang.perl.misc, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth Henry Law <news@lawshouse.org>:
> > So you need to use the parts() method of your "first" message to
> > retrieve the individual sub-parts (they are returned as an array of
> > Email::MIME objects). Once you have the sub-parts you can find the one
> > that has the body you want (hint: look at the content disposition),
>
> Content-Disposition was not part of the original MIME spec, so body
> parts might not have that header. The spec for multipart/related
> (usually used for attachments) explicitly says that the rules for that
> type override Content-Disposition handling, where they conflict. The
> only way to find 'the body' of a message is by a careful parsing of the
> various multipart types involved, checking Content-Type and -Disposition
> where necessary.
I provide, as a counter example for that working, this skeleton of a
message:
Content-Type: multipart/mixed; boundary=Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (1.0)
X-Mailer: iPhone Mail (9B206)

--Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
    charset=us-ascii

(something the sender typed)

--Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
Content-Disposition: inline;
    filename=photo.JPG
Content-Type: image/jpeg;
    name=photo.JPG
Content-Transfer-Encoding: base64

9j/4S/+RXhpZgAA ...

--Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
    charset=us-ascii

Sent from my iPhone
--Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C--
Notice that there are two text/plain segments, the significance of each
is completely unintuitive from the MIME part headers. It is very fair
to say that this message (and the hundreds of similar ones I've
received) does not have a part that can be called "the body" but has
multiple parts which must be concatenated to make a body. In all the
examples I have from this sender, the second part is only the "Sent"
line, but I believe that is merely a factor of the sender always
describing the photo above the image instead of below it.
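To make the "concatenate the parts" idea concrete, here is a minimal sketch in core Perl that collects every top-level text/plain part of a message shaped like the skeleton above and joins them in order. The short boundary XYZ and the helper name text_parts are made up for the example. It assumes LF line endings, unfolded headers, no nested multiparts and no transfer decoding, so it is an illustration only; real mail is better handled with a proper parser such as the Email::MIME parts() method mentioned upthread.

```perl
use strict;
use warnings;

# Cut-down copy of the skeleton, with a short hypothetical boundary.
my $raw = <<'END';
Content-Type: multipart/mixed; boundary=XYZ
Mime-Version: 1.0

--XYZ
Content-Type: text/plain; charset=us-ascii

(something the sender typed)

--XYZ
Content-Disposition: inline; filename=photo.JPG
Content-Type: image/jpeg; name=photo.JPG
Content-Transfer-Encoding: base64

9j/4S/+RXhpZgAA

--XYZ
Content-Type: text/plain; charset=us-ascii

Sent from my iPhone
--XYZ--
END

# Return the bodies of all top-level text/plain parts, in message order.
sub text_parts {
    my ($message) = @_;
    my ($head, $body) = split /\n\n/, $message, 2;
    my ($boundary) = $head =~ /boundary="?([^";\s]+)/i
        or return;
    # Drop the closing delimiter and anything after it (the epilogue).
    $body =~ s/^--\Q$boundary\E--.*//ms;
    my @chunks = split /^--\Q$boundary\E[ \t]*\n/m, $body;
    shift @chunks;    # whatever precedes the first delimiter (the preamble)
    my @texts;
    for my $chunk (@chunks) {
        my ($part_head, $part_body) = split /\n\n/, $chunk, 2;
        next unless defined $part_body;
        next unless $part_head =~ m{^Content-Type:\s*text/plain\b}mi;
        $part_body =~ s/\n+\z//;    # trim trailing blank lines
        push @texts, $part_body;
    }
    return @texts;
}

my $whole_body = join "\n", text_parts($raw);
print "$whole_body\n";
```

Both text segments end up in the result and the image part is skipped, which is the behaviour a "find the one body part" approach would miss.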
Elijah
------
has written mime parsers for personal use
------------------------------
Date: Thu, 14 Feb 2013 22:04:15 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: A little help with Perl & Email Messages
Message-Id: <vj30v9-bjh2.ln1@anubis.morrow.me.uk>
Quoth Eli the Bearded <*@eli.users.panix.com>:
> In comp.lang.perl.misc, Ben Morrow <ben@morrow.me.uk> wrote:
> > Quoth Henry Law <news@lawshouse.org>:
> > > So you need to use the parts() method of your "first" message to
> > > retrieve the individual sub-parts (they are returned as an array of
> > > Email::MIME objects). Once you have the sub-parts you can find the one
> > > that has the body you want (hint: look at the content disposition),
> >
> > Content-Disposition was not part of the original MIME spec, so body
> > parts might not have that header. The spec for multipart/related
> > (usually used for attachments) explicitly says that the rules for that
> > type override Content-Disposition handling, where they conflict. The
> > only way to find 'the body' of a message is by a careful parsing of the
> > various multipart types involved, checking Content-Type and -Disposition
> > where necessary.
>
> I provide, as a counter example for that working,
Who are you countering, me or Henry?
> this skeleton of a message:
>
<headers trimmed>
> Content-Type: multipart/mixed;
>
> --Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
> Content-Type: text/plain; charset=us-ascii
>
> (something the sender typed)
>
> --Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
> Content-Disposition: inline; filename=photo.JPG
> Content-Type: image/jpeg; name=photo.JPG
>
> --Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C
> Content-Type: text/plain; charset=us-ascii
>
> Sent from my iPhone
> --Apple-Mail-4EBCAD0C-D24C-41E0-8E4A-6773E9087E6C--
>
> Notice that there are two text/plain segments, the significance of each
> is completely unintuitive from the MIME part headers. It is very fair
> to say that this message (and the hundreds of similar ones I've
> received) does not have a part that can be called "the body" but has
> multiple parts which must be concatenated to make a body. In all the
> examples I have from this sender, the second part is only the "Sent"
> line, but I believe that is merely a factor of the sender always
> describing the photo above the image instead of below it.
Yes, I have noticed this behaviour from Apple Mail before. Strictly it
conforms to the definition of multipart/mixed:
5.1.3. Mixed Subtype
The "mixed" subtype of "multipart" is intended for use when the
body parts are independent and need to be bundled in a particular
order. Any "multipart" subtypes that an implementation does not
recognize must be treated as being of subtype "mixed".
though IME some MIME readers don't handle it very well.
Ben
------------------------------
Date: Thu, 14 Feb 2013 22:43:56 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: A little help with Perl & Email Messages
Message-Id: <eli$1302141743@qz.little-neck.ny.us>
In comp.lang.perl.misc, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth Eli the Bearded <*@eli.users.panix.com>:
> > I provide, as a counter example for that working,
> Who are you countering, me or Henry?
Henry talked of finding "the body" and you talked of using the
part headers to do so. But "the body" implies strongly that
there is one part that can be considered the entire message
body, which is not true.
> Yes, I have noticed this behaviour from Apple Mail before. Strictly it
> conforms to the definition of multipart/mixed:
I never meant to imply that it was non-conformant, but just that
it is a nasty real-world edge case.
Elijah
------
nasty theoretical edge cases exist, too
------------------------------
Date: Tue, 12 Feb 2013 06:09:00 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Date in CSV/TSV question
Message-Id: <ss2pu9-4ut.ln1@anubis.morrow.me.uk>
Quoth Ben Goldberg <ben-goldberg@hotmail.com>:
> On Wednesday, January 2, 2013 10:37:02 AM UTC-5, Rainer Weikusat wrote:
> >
> > %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May
> >     Jun Jul Aug Sep Oct Nov Dec);
> >
> >
> >
> > while (<>) {
> >
> > s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> >
> > print;
> >
> > }
> >
> > -----------
>
> Don't forget that you can use perl's "command line" switches even when
> you put your program in a file.
> #!/usr/bin/perl -pi.bak
> BEGIN {
> %months = map {;$_, sprintf('%02d', ++$n)}
> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
> }
> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> __END__
There's no need to muck about with the #! line and BEGIN blocks, both of
which would make it impossible to turn this into a subroutine later:
my %months = ...;
local $^I = ".bak";
while (<>) { ... }
The edit-in-place handling, including renaming the old file and opening
and selecting ARGVOUT, is done by the no-filehandle <> operator (or an
explicit <ARGV> or readline(ARGV)) whenever $^I is set. If you want to
in-place edit a custom list of files, you can also localise @ARGV.
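A fuller sketch of that outline, wrapped in a subroutine, might look like the following. The name fix_dates is hypothetical; the substitution is the one from earlier in the thread, and the month table is built the same way.

```perl
use strict;
use warnings;

my $n = 0;
my %months = map { $_ => sprintf('%02d', ++$n) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Rewrite leading "DD Mon YYYY" fields as "YYYY-MM-DD" in each named file,
# keeping the original as FILE.bak. fix_dates is a made-up name.
sub fix_dates {
    local @ARGV = @_;        # files to edit, instead of the command line
    local $^I   = '.bak';    # makes <> edit in place and keep a backup
    while (<>) {
        s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
        print;               # goes to ARGVOUT, i.e. the replacement file
    }
    return;
}
```

Called as fix_dates('dates.csv') (file name assumed), it rewrites the file and leaves the old contents in dates.csv.bak. Because both @ARGV and $^I are localised, the caller's command-line arguments and in-place setting are untouched afterwards.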
Ben
------------------------------
Date: Tue, 12 Feb 2013 13:10:55 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Date in CSV/TSV question
Message-Id: <87y5etvfs0.fsf@sapphire.mobileactivedefense.com>
Ben Goldberg <ben-goldberg@hotmail.com> writes:
> On Wednesday, January 2, 2013 10:37:02 AM UTC-5, Rainer Weikusat wrote:
[...]
>> -----------
>>
>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
>>
>> while (<>) {
>> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
>> print;
>> }
>>
>> -----------
>
> Don't forget that you can use perl's "command line" switches even when you put your program in a file.
> #!/usr/bin/perl -pi.bak
> BEGIN {
> %months = map {;$_, sprintf('%02d', ++$n)}
> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
> }
> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> __END__
The 'BEGIN' serves no useful purpose here: %months needs to be
initialized before the while-loop uses it. Since statements in a file
are executed consecutively (anything else would probably be 'a little
confusing' :-), this will be the case with either variant.
As I wrote in another posting: If perl hadn't been told to destroy the
input file, also telling it to make a backup of that before doing so
wasn't necessary. While this probably doesn't matter much for a
trivial example like this, 'not using -i' also means that the code can
be debugged and fixed without constantly renaming files or losing the
original input file altogether in case the 'backup request' was
accidentally forgotten. This also enables use of the script(let) as
'another filter' in a more complicated pipeline.
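For comparison, a minimal sketch of the filter style: the same substitution reading from one filehandle and writing to another, with nothing on disk modified. The subroutine name filter_dates and the sample rows are made up for the example; in a pipeline the call would presumably be filter_dates(\*STDIN, \*STDOUT), while the demo below uses in-memory filehandles so it can be run as is.

```perl
use strict;
use warnings;

my $n = 0;
my %months = map { $_ => sprintf('%02d', ++$n) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Copy lines from $in to $out, rewriting a leading "DD Mon YYYY" field.
# filter_dates is a made-up name; no file is renamed or overwritten.
sub filter_dates {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
        print {$out} $line;
    }
    return;
}

# Demo on hypothetical rows held in memory.
my $input  = qq{"15 Feb 2013",42\n"02 Jan 2013",7\n};
my $result = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$result or die "cannot open output: $!";
filter_dates($in, $out);
close $out;

print $result;
```

Since the input is never touched, a wrong pattern costs nothing more than running the filter again.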
------------------------------
Date: Tue, 12 Feb 2013 18:41:08 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Date in CSV/TSV question
Message-Id: <4vequ9-pc71.ln1@anubis.morrow.me.uk>
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Goldberg <ben-goldberg@hotmail.com> writes:
> >
> > Don't forget that you can use perl's "command line" switches even when
> > you put your program in a file.
> > #!/usr/bin/perl -pi.bak
> > BEGIN {
> > %months = map {;$_, sprintf('%02d', ++$n)}
> > qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
> > }
> > s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
> > __END__
>
> The 'BEGIN' serves no useful purpose here: %months needs to be
> initialized before the while-loop uses it. Since statements in a file
> are executed consecutively (anything else would probably be 'a little
> confusing' :-), this will be the case with either variant.
You missed the -p. Ben's code is equivalent to
    $^I = ".bak";
    while (<>) {
        BEGIN {
            %months = ...;
        }
        s/.../.../;
    }
    continue { print }
which is certainly a rather confusing way to write that loop, but does
require the BEGIN to work properly since with -p there is no other way
to move code outside the while loop.
(This is where BEGIN came from originally: perl -p will treat BEGIN and
END the same way awk does. The later expansion into a general 'run code
at compile time' feature was something of an accident.)
Ben
(a different Ben)
------------------------------
Date: Thu, 14 Feb 2013 01:38:29 -0500
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Date in CSV/TSV question
Message-Id: <87txpfiemy.fsf@stemsystems.com>
>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
BM> There's no need to muck about with the #! line and BEGIN blocks, both of
BM> which would make it impossible to turn this into a subroutine later:
BM> my %months = ...;
BM> local $^I = ".bak";
BM> while (<>) { ... }
BM> The edit-in-place handling, including renaming the old file and opening
BM> and selecting ARGVOUT, is done by the no-filehandle <> operator (or an
BM> explicit <ARGV> or readline(ARGV)) whenever $^I is set. If you want to
BM> in-place edit a custom list of files, you can also localise @ARGV.
and File::Slurp has edit_file and edit_file_lines which are even easier
to use.
i do need to add a backup file option to those.
uri
------------------------------
Date: Tue, 12 Feb 2013 16:16:58 -0800 (PST)
From: vivek_12315 <vivekchaurasiya@gmail.com>
Subject: Pattern matching [newbie]
Message-Id: <678ed33b-3479-46bb-b6d3-42550a85c68c@googlegroups.com>
I'm working on my Perl regex code, where I have to parse an HTML line like:
<a href="/question?id=15422849"><p>MY text here 1</p><p>MY text here 2</p><p>MY text here 3</p></a>
I am doing something like:
$string =~ m/(.*)href(.*)/;
But this is not helping me in what I want. I want something closer to following text:
"MY text here 1 MY text here 2 MY text here 3"
Can someone give me some ideas?
------------------------------
Date: Tue, 12 Feb 2013 16:28:06 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Pattern matching [newbie]
Message-Id: <6cnlh8dglstg9q2km94ac4jk90gjalnm3o@4ax.com>
vivek_12315 <vivekchaurasiya@gmail.com> wrote:
>I m working on my perl regex code, where I have to parse a html line like :
>
> <a href="/question?id=15422849"><p>MY text here 1</p><p>MY text here 2</p><p>MY text here 3</p></a>
>
>I am doing something like:
>$string =~ m/(.*)href(.*)/;
>
>But this is not helping me in what I want. I want something closer to following text:
>"MY text here 1 MY text here 2 MY text here 3"
>
>Can some give some ideas ?
Your Question used to be Asked Frequently. Please see
perldoc -q "remove html"
jue
------------------------------
Date: Wed, 13 Feb 2013 18:28:32 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: Pattern matching [newbie]
Message-Id: <77Wdnezt4s7NRobMnZ2dnUVZ8mCdnZ2d@giganews.com>
On 13/02/13 00:16, vivek_12315 wrote:
> I m working on my perl regex code, where I have to parse a html line like :
>
> <a href="/question?id=15422849"><p>MY text here 1</p><p>MY text here 2</p><p>MY text here 3</p></a>
I appreciate that you call yourself a newbie, and to you what I'm about
to suggest may seem complicated and difficult; but that's the way we all
learn ...
Have you thought of parsing the HTML properly, using a module like
HTML::Tree or HTML::TreeBuilder? The hardest part is choosing the
module; after that you should find it moderately easy to use it to do what
you want, since it's pretty simple. And once you've done it it will
probably be a lot better than hand-cranked parsing code.
Note to all concerned: I'm not joining in the "you can't parse HTML with
regexes" thread. In this case, at least, I'm sure that's perfectly
possible. I'm suggesting a way by which a wise newbie can get the job
done and learn something forbye.
--
Henry Law Manchester, England
------------------------------
Date: Wed, 13 Feb 2013 15:27:36 -0500
From: brian d foy <brian.d.foy@gmail.com>
Subject: Re: Pattern matching [newbie]
Message-Id: <130220131527366393%brian.d.foy@gmail.com>
In article <678ed33b-3479-46bb-b6d3-42550a85c68c@googlegroups.com>,
vivek_12315 <vivekchaurasiya@gmail.com> wrote:
> I m working on my perl regex code, where I have to parse a html line like :
>
> <a href="/question?id=15422849"><p>MY text here 1</p><p>MY text here
> 2</p><p>MY text here 3</p></a>
>
> I am doing something like:
> $string =~ m/(.*)href(.*)/;
>
> But this is not helping me in what I want. I want something closer to
> following text:
>
> "MY text here 1 MY text here 2 MY text here 3"
http://search.cpan.org/dist/HTML-Strip/Strip.pm
------------------------------
Date: Wed, 13 Feb 2013 12:54:31 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Pattern matching [newbie]
Message-Id: <43vnh8drnpdj104pvpc3vq1jvt6d9b1sev@4ax.com>
Henry Law <news@lawshouse.org> wrote:
>On 13/02/13 00:16, vivek_12315 wrote:
>> I m working on my perl regex code, where I have to parse a html line like :
>>
>> <a href="/question?id=15422849"><p>MY text here 1</p><p>MY text here 2</p><p>MY text here 3</p></a>
>
>I appreciate that you call yourself a newbie, and to you what I'm about
>to suggest may seem complicated and difficult; but that's the way we all
>learn ...
>
>Have you thought of parsing the HTML properly, using a module like
>HTML::Tree or HTML::TreeBuilder? The hardest part is choosing the
>module; after that you should find it moderately easy to use it to do what
>you want, since it's pretty simple. And once you've done it it will
>probably be a lot better than hand-cranked parsing code.
>
>Note to all concerned: I'm not joining in the "you can't parse HTML with
>regexes" thread. In this case, at least, I'm sure that's perfectly
>possible.
Actually for this particular example it is almost trivial(*):
s/<.*?>//g;
Of course this is going to fail as soon as the HTML code becomes a tiny
bit more complex.
*: almost because it doesn't add the space characters between the
individual paragraph elements.
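A small sketch of one way to get the spaces as well: replace each tag with a space instead of nothing, then tidy the whitespace. It is only meant for the single sample line from the original question and shares the limitation noted above; it makes no attempt to cope with comments, scripts, entities or a ">" inside an attribute value.

```perl
use strict;
use warnings;

# The sample line from the original question.
my $html = '<a href="/question?id=15422849"><p>MY text here 1</p>'
         . '<p>MY text here 2</p><p>MY text here 3</p></a>';

(my $text = $html) =~ s/<.*?>/ /g;   # every tag becomes a single space
$text =~ s/\s+/ /g;                  # collapse runs of whitespace
$text =~ s/^ | $//g;                 # trim the leading and trailing space

print "$text\n";
```

For the sample line this prints "MY text here 1 MY text here 2 MY text here 3".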
jue
------------------------------
Date: Thu, 14 Feb 2013 12:57:01 +0200
From: "George Bouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: size per day of a directory
Message-Id: <op.wshspbi7r9cqtf@pc10759.unisystems.gr>
it is simple but someone may find it useful
http://www.easybytez.com/73uhlur5mvgv
--
Using Opera's mail client: http://www.opera.com/mail/
------------------------------
Date: Thu, 14 Feb 2013 12:59:54 +0200
From: "George Bouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: Re: size per day of a directory
Message-Id: <op.wshst4vqr9cqtf@pc10759.unisystems.gr>
here it is
#!/usr/bin/perl
# Show the data size per date
# data_per_day.pl /data/cache
use strict;
use warnings;
use File::Find;
my $dir = exists $ARGV[0]
    ? ( -d $ARGV[0] ? $ARGV[0] : die "Not existing directory: $ARGV[0]\n" )
    : die "Specify an existing directory as argument\n";
my %Info = ();
my @Month = qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/;
File::Find::find({wanted=>sub {
    return unless -f $File::Find::name;
    my $size = -s _;
    my @Time = localtime( (stat _)[9] ); $Time[5]+=1900;
    my $key = sprintf "%04d%02d%02d", @Time[5,4,3];
    $Info{$key}->{'total files'}++ ;
    $Info{$key}->{'total size'} += $size;
    },no_chdir=>1, bydepth=>0, follow=>0}, $dir);
foreach my $date (sort {$a <=> $b} keys %Info) {
    my ($y,$m,$d) = $date =~/^(\d{4})(\d\d)(\d\d)/;
    print "$d $Month[$m] $y\n";
    print "\ttotal files : $Info{$date}->{'total files'}\n";
    print "\ttotal size : ", __Human_size( $Info{$date}->{'total size'} ), "\n";
}
sub __Human_size {
    if    ( $_[0] < 1024 )    { return "$_[0] Bytes" }
    elsif ( $_[0] < 1024**2 ) { return sprintf "%.1f Kb" , ($_[0]/(1024**1)) }
    elsif ( $_[0] < 1024**3 ) { return sprintf "%.1f Mb" , ($_[0]/(1024**2)) }
    elsif ( $_[0] < 1024**4 ) { return sprintf "%.2f Gb" , ($_[0]/(1024**3)) }
    elsif ( $_[0] < 1024**5 ) { return sprintf "%.2f Tb" , ($_[0]/(1024**4)) }
    elsif ( $_[0] < 1024**6 ) { return sprintf "%.2f Pb" , ($_[0]/(1024**5)) }
    elsif ( $_[0] < 1024**7 ) { return sprintf "%.2f Eb" , ($_[0]/(1024**6)) }
    else                      { return $_[0] }
}
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3880
***************************************