[31576] in Perl-Users-Digest
Perl-Users Digest, Issue: 2835 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Feb 25 18:09:25 2010
Date: Thu, 25 Feb 2010 15:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 25 Feb 2010 Volume: 11 Number: 2835
Today's topics:
Re: eval exit/exec (was: macros: return or exit) <marc.girod@gmail.com>
Re: eval exit/exec (was: macros: return or exit) <ben@morrow.me.uk>
MS Word metadata removal (David Griffith)
Re: OT, blowing off steam <kst-u@mib.org>
Re: OT, blowing off steam <ben@morrow.me.uk>
Parsing many emails (text files) into a CSV -type file <glenmillard@gmail.com>
Please excuse the NOOB question - go easy on me please~ <glenmillard@gmail.com>
Re: Please excuse the NOOB question - go easy on me ple <tadmc@seesig.invalid>
Re: Please excuse the NOOB question - go easy on me ple <cho.seung-hui@vt.edu>
Re: Please excuse the NOOB question - go easy on me ple <uri@StemSystems.com>
Re: Please excuse the NOOB question - go easy on me ple <smallpond@juno.com>
Re: Please excuse the NOOB question - go easy on me ple <someone@example.com>
Re: Please excuse the NOOB question - go easy on me ple <tadmc@seesig.invalid>
Would like some words of wisdom - convert text files to <glenmillard@gmail.com>
Re: Would like some words of wisdom - convert text file <uri@StemSystems.com>
Re: Would like some words of wisdom - convert text file <jurgenex@hotmail.com>
Re: Would like some words of wisdom - convert text file <ben@morrow.me.uk>
Re: Would like some words of wisdom - convert text file <glenmillard@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 25 Feb 2010 10:04:00 -0800 (PST)
From: Marc Girod <marc.girod@gmail.com>
Subject: Re: eval exit/exec (was: macros: return or exit)
Message-Id: <f11c04b8-e106-4763-af5c-83ded3250ede@c16g2000yqd.googlegroups.com>
On Feb 24, 1:44 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Err... no. You can't take a ref to a builtin. The \die expression just
> calls die then and there. Also, you can only override a builtin from
> within a different package, and the override must happen at compile
> time.
>
>     BEGIN { package Foo; *main::exit = sub { die } }
Thanks...
For both remarks. I guess I would soon have noticed the first, but I
didn't yet.
> Note that an override for &main::exit only applies to code compiled in
> package main. Other code will still see CORE::exit.
But it may be inherited?
Marc
------------------------------
Date: Thu, 25 Feb 2010 20:12:27 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: eval exit/exec (was: macros: return or exit)
Message-Id: <b62j57-n121.ln1@osiris.mauzo.dyndns.org>
Quoth Marc Girod <marc.girod@gmail.com>:
> On Feb 24, 1:44 pm, Ben Morrow <b...@morrow.me.uk> wrote:
>
> > Note that an override for &main::exit only applies to code compiled in
> > package main. Other code will still see CORE::exit.
>
> But it may be inherited?
No... not unless you call ->exit as a method on a subclass. The
builtin-override logic just looks in the current package, and
CORE::GLOBAL::.
Ben
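
A minimal sketch of the lookup rule described above, not code from the thread. The package name Other and the sub names try_main and would_really_exit are made up for illustration, and the override dies with a fixed message so the effect is visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Install the override at compile time, from a different package,
# as in the BEGIN block quoted earlier in the thread.
BEGIN { package Foo; *main::exit = sub { die "exit overridden\n" } }

package main;

# Compiled in package main: 'exit' resolves to the override,
# so the eval catches the die and $@ holds its message.
sub try_main { eval { exit 0 }; return $@ }

package Other;

# Compiled in package Other: 'exit' here is still CORE::exit,
# so this sub is defined but deliberately never called.
sub would_really_exit { exit 0 }

package main;

print "main saw: ", try_main();
```

To replace exit for code compiled in every package, the same assignment would target *CORE::GLOBAL::exit instead of *main::exit.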
------------------------------
Date: Thu, 25 Feb 2010 17:13:59 +0000 (UTC)
From: davidmylastname@acm.org (David Griffith)
Subject: MS Word metadata removal
Message-Id: <hm6b4n$4cp$1@frotz.eternal-september.org>
Would someone please point me towards something on removing metadata
from an MS Word .doc file? I'm trying to come up with a CGI-based
solution for the task of metadata scrubbing.
--
David Griffith
davidmylastname@acm.org <--- Put my last name where it belongs
------------------------------
Date: Thu, 25 Feb 2010 12:33:01 -0800
From: Keith Thompson <kst-u@mib.org>
Subject: Re: OT, blowing off steam
Message-Id: <lnr5o9vywy.fsf@nuthaus.mib.org>
ccc31807 <cartercc@gmail.com> writes:
[...]
> First, how to insert a non-printable character in a (vim) regular
> expression? Let's say you want to replace form feeds (^@) with
> nothing. You can see the codes for characters by giving the
> command :digraphs, and then insert the character in the regular
> expression by doing Cntl-k, two letter code.
[...]
^@ is the null character, not form feed.
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
------------------------------
Date: Thu, 25 Feb 2010 22:44:35 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: OT, blowing off steam
Message-Id: <j3bj57-3u21.ln1@osiris.mauzo.dyndns.org>
Quoth Keith Thompson <kst-u@mib.org>:
> ccc31807 <cartercc@gmail.com> writes:
> [...]
> > First, how to insert a non-printable character in a (vim) regular
> > expression? Let's say you want to replace form feeds (^@) with
> > nothing. You can see the codes for characters by giving the
> > command :digraphs, and then insert the character in the regular
> > expression by doing Cntl-k, two letter code.
> [...]
>
> ^@ is the null character, not form feed.
However, vim internally does tr!\0\n!\n\0! on every line in the file
(I'm not sure why, but it's probably to do with C null-terminated
strings). This may be the source of the confusion.
Ben
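
For reference, a small Perl sketch of the two characters being discussed and of the transliteration mentioned above. The sample string is invented, and the sketch only shows what the tr expression does to a string; it says nothing about how vim is implemented.

```perl
use strict;
use warnings;

# ^@ is NUL (code 0); form feed is ^L (code 12).
my $nul = "\0";
my $ff  = "\f";
printf "NUL is %d, form feed is %d\n", ord $nul, ord $ff;

# Delete form feeds from a string, leaving everything else alone.
my $page = "one\ftwo\0three\n";
(my $clean = $page) =~ tr/\f//d;

# The swap written as a Perl transliteration:
# NUL and newline trade places.
(my $swapped = $page) =~ tr/\0\n/\n\0/;
```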
------------------------------
Date: Thu, 25 Feb 2010 11:19:20 -0800 (PST)
From: GlenM <glenmillard@gmail.com>
Subject: Parsing many emails (text files) into a CSV -type file - would like some words of wisdom please.
Message-Id: <0f2eaadc-b3e2-4a62-bd99-28c1690ecd29@o30g2000yqb.googlegroups.com>
On Feb 25, 2:14 pm, Tad McClellan <ta...@seesig.invalid> wrote:
> GlenM <glenmill...@gmail.com> wrote:
> > Subject: Please excuse the NOOB question - go easy on me please~!
>
> Please put the subject of your article in the Subject of your article.
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
------------------------------
Date: Thu, 25 Feb 2010 11:05:34 -0800 (PST)
From: GlenM <glenmillard@gmail.com>
Subject: Please excuse the NOOB question - go easy on me please~!
Message-Id: <113cb00f-6eb2-469f-987a-5aa8261c0a5c@u9g2000yqb.googlegroups.com>
Okay;
I am sure that someone out there has done this before - I *think* I am
on the right track.
I have a directory full of emails. What I would like to do is read
each file in, then parse them into a CSV style file.
Example:
#!/usr/bin/perl
use warnings;
use strict;
open FILE , "/home/gmillard/SentMail/YourSatSetup.txt" or die $!;
my $linenum =1;
while (<FILE>) {
        print "|", $linenum++;
        print"$_" ;
}
Produces the following.
|1From - Sun Feb 21 11:40:01 2010
|2X-Mozilla-Status: 0001
|3X-Mozilla-Status2: 00000000
|4X-Gmail-Received: 58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
|5Received: by 10.48.212.6 with HTTP; Fri, 17 Nov 2006 12:52:26 -0800
(PST)
|6Message-ID:
<234ff75a0611171252x3ea2facdw55cd81ec3a185926@mail.gmail.com>
|7Date: Fri, 17 Nov 2006 15:52:26 -0500
|8From: "xxxxxxxxxxxxxxxxxxxxxxxx>
|9To: xxxxxx@bell.blackberry.net
|10Subject: Your satellite set up. . From an article that i read.
|11MIME-Version: 1.0
|12Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|13Content-Transfer-Encoding: 7bit
|14Content-Disposition: inline
|15Delivered-To: xxxxxxxxxxxxxxxxxxxxxx
|16
|17Hi Andrew;
|18I read an article about you a while back about your MythTV and VOip
|19setup. Would you mind if i asked you some tech questions ? I am
very
|20intrigued.
|21Thanks
|22Glen xxxxxxxxxx
|23xxxxxxxxxxxxx
I have hundreds of emails in this directory. I would like to parse
them into a single file where each comma separated/tab separated field
is a line from the email.
So, the first line of the CSV file is
|1From - Sun Feb 21 11:40:01 2010|2X-Mozilla-Status: 0001|3X-Mozilla-
Status2: 00000000|4X-Gmail-Received:
58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
<truncated>
and each subsequent line is the next email and so forth.
Any words of wisdom?
Thanks much.
Glen
------------------------------
Date: Thu, 25 Feb 2010 13:14:23 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Please excuse the NOOB question - go easy on me please~!
Message-Id: <slrnhodipe.lru.tadmc@tadbox.sbcglobal.net>
GlenM <glenmillard@gmail.com> wrote:
> Subject: Please excuse the NOOB question - go easy on me please~!
Please put the subject of your article in the Subject of your article.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
------------------------------
Date: Thu, 25 Feb 2010 14:29:03 -0500
From: Richard McBeef <cho.seung-hui@vt.edu>
Subject: Re: Please excuse the NOOB question - go easy on me please~!
Message-Id: <hm6j20$lrl$1@speranza.aioe.org>
Tad McClellan wrote:
> GlenM <glenmillard@gmail.com> wrote:
>
>> Subject: Please excuse the NOOB question - go easy on me please~!
>
>
> Please put the subject of your article in the Subject of your article.
Being mean to newbies is not a good way to
promote the use of perl.
Got it!?!?
------------------------------
Date: Thu, 25 Feb 2010 14:57:45 -0500
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Please excuse the NOOB question - go easy on me please~!
Message-Id: <87635lytom.fsf@quad.sysarch.com>
>>>>> "RM" == Richard McBeef <cho.seung-hui@vt.edu> writes:
RM> Tad McClellan wrote:
>> GlenM <glenmillard@gmail.com> wrote:
>>
>>> Subject: Please excuse the NOOB question - go easy on me please~!
>>
>>
>> Please put the subject of your article in the Subject of your article.
RM> Being mean to newbies is not a good way to
RM> promote the use of perl.
RM> Got it!?!?
no, teaching a newbie how to best ask a question is helping him. your
flaming a regular here and not addressing the newbie question is less
helpful. so please flame yourself for that. got it?!?!
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Thu, 25 Feb 2010 15:20:32 -0500
From: Steve C <smallpond@juno.com>
Subject: Re: Please excuse the NOOB question - go easy on me please~!
Message-Id: <hm6m30$hbr$1@news.eternal-september.org>
Richard McBeef wrote:
> Tad McClellan wrote:
>> GlenM <glenmillard@gmail.com> wrote:
>>
>>> Subject: Please excuse the NOOB question - go easy on me please~!
>>
>>
>> Please put the subject of your article in the Subject of your article.
> Being mean to newbies is not a good way to
> promote the use of perl.
> Got it!?!?
>
Since when is saying please "Being mean"? It's a valid correction.
The subject of this thread is NOOB question, not promoting the use of perl.
------------------------------
Date: Thu, 25 Feb 2010 13:21:03 -0800
From: "John W. Krahn" <someone@example.com>
Subject: Re: Please excuse the NOOB question - go easy on me please~!
Message-Id: <3RBhn.4$NH1.2@newsfe14.iad>
GlenM wrote:
> Okay;
>
> I am sure that someone out there has done this before - I *think* I am
> on the right track.
>
> I have a directory full of emails. What I would like to do is read
> each file in, then parse them into a CSV style file.
>
> Example:
>
>
> #!/usr/bin/perl
>
> use warnings;
> use strict;
>
> open FILE , "/home/gmillard/SentMail/YourSatSetup.txt" or die $!;
> my $linenum =1;
>
> while (<FILE>) {
>         print "|", $linenum++;
>         print"$_" ;
> }
>
> Produces the following.
>
> |1From - Sun Feb 21 11:40:01 2010
> |2X-Mozilla-Status: 0001
> |3X-Mozilla-Status2: 00000000
> |4X-Gmail-Received: 58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
> |5Received: by 10.48.212.6 with HTTP; Fri, 17 Nov 2006 12:52:26 -0800
> (PST)
> |6Message-ID:
> <234ff75a0611171252x3ea2facdw55cd81ec3a185926@mail.gmail.com>
> |7Date: Fri, 17 Nov 2006 15:52:26 -0500
> |8From: "xxxxxxxxxxxxxxxxxxxxxxxx>
> |9To: xxxxxx@bell.blackberry.net
> |10Subject: Your satellite set up. . From an article that i read.
> |11MIME-Version: 1.0
> |12Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> |13Content-Transfer-Encoding: 7bit
> |14Content-Disposition: inline
> |15Delivered-To: xxxxxxxxxxxxxxxxxxxxxx
> |16
> |17Hi Andrew;
> |18I read an article about you a while back about your MythTV and VOip
> |19setup. Would you mind if i asked you some tech questions ? I am
> very
> |20intrigued.
> |21Thanks
> |22Glen xxxxxxxxxx
> |23xxxxxxxxxxxxx
>
>
> I have hundreds of emails in this directory. I would like to parse
> them into a single file where each comma separated/tab separated field
> is a line from the email.
>
> So, the first line of the CSV file is
> |1From - Sun Feb 21 11:40:01 2010|2X-Mozilla-Status: 0001|3X-Mozilla-
> Status2: 00000000|4X-Gmail-Received:
> 58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
> <truncated>
>
> and each subsequent line is the next email and so forth.
>
> Any words of wisdom?
UNTESTED:
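#!/usr/bin/perl
use warnings;
use strict;
# Feed every .txt file in the directory to the <> operator.
local @ARGV = glob "/home/gmillard/SentMail/*.txt";
while ( <> ) {
    chomp;
    # $. is the line number within the current file.
    print "|$.$_";
    if ( eof ) {
        # Closing ARGV resets $. before the next file starts.
        close ARGV;
        print "\n";
    }
}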
John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway
------------------------------
Date: Thu, 25 Feb 2010 15:28:43 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Please excuse the NOOB question - go easy on me please~!
Message-Id: <slrnhodqlb.m5p.tadmc@tadbox.sbcglobal.net>
Richard McBeef <cho.seung-hui@vt.edu> wrote:
> Tad McClellan wrote:
>> GlenM <glenmillard@gmail.com> wrote:
>>
>>> Subject: Please excuse the NOOB question - go easy on me please~!
>>
>>
>> Please put the subject of your article in the Subject of your article.
> Being mean to newbies is not a good way to
> promote the use of perl.
Being mean to every participant, present and future, of this
newsgroup by hiding what its articles are about is not a good
way to promote the use of this newsgroup.
> Got it!?!?
Got that?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
------------------------------
Date: Thu, 25 Feb 2010 11:31:42 -0800 (PST)
From: GlenM <glenmillard@gmail.com>
Subject: Would like some words of wisdom - convert text files to CSV
Message-Id: <1363dcb9-4d74-46ac-82a1-589e94ee97c0@t23g2000yqt.googlegroups.com>
Okay;
I am sure that someone out there has done this before - I *think* I am
on the right track.
I have a directory full of emails. What I would like to do is read
each file in, then parse them into a CSV style file.
Example:
#!/usr/bin/perl
use warnings;
use strict;
open FILE , "/home/gmillard/SentMail/YourSatSetup.txt" or die $!;
my $linenum =1;
while (<FILE>) {
        print "|", $linenum++;
        print"$_" ;
}
Produces the following.
|1From - Sun Feb 21 11:40:01 2010
|2X-Mozilla-Status: 0001
|3X-Mozilla-Status2: 00000000
|4X-Gmail-Received: 58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
|5Received: by 10.48.212.6 with HTTP; Fri, 17 Nov 2006 12:52:26 -0800
(PST)
|6Message-ID:
<234ff75a0611171252x3ea2facdw55cd81ec3a185...@mail.gmail.com>
|7Date: Fri, 17 Nov 2006 15:52:26 -0500
|8From: "xxxxxxxxxxxxxxxxxxxxxxxx>
|9To: xxx...@bell.blackberry.net
|10Subject: Your satellite set up. . From an article that i read.
|11MIME-Version: 1.0
|12Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|13Content-Transfer-Encoding: 7bit
|14Content-Disposition: inline
|15Delivered-To: xxxxxxxxxxxxxxxxxxxxxx
|16
|17Hi Andrew;
|18I read an article about you a while back about your MythTV and VOip
|19setup. Would you mind if i asked you some tech questions ? I am
very
|20intrigued.
|21Thanks
|22Glen xxxxxxxxxx
|23xxxxxxxxxxxxx
I have hundreds of emails in this directory. I would like to parse
them into a single file where each comma separated/tab separated field
is a line from the email.
So, the first line of the CSV file is
|1From - Sun Feb 21 11:40:01 2010|2X-Mozilla-Status: 0001|3X-Mozilla-
Status2: 00000000|4X-Gmail-Received:
58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
<truncated>
and each subsequent line is the next email and so forth.
Any words of wisdom?
Thanks much.
Glen
------------------------------
Date: Thu, 25 Feb 2010 15:05:59 -0500
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Would like some words of wisdom - convert text files to CSV
Message-Id: <871vg9ytaw.fsf@quad.sysarch.com>
>>>>> "G" == GlenM <glenmillard@gmail.com> writes:
G> I have a directory full of emails. What I would like to do is read
G> each file in, then parse them into a CSV style file.
you need to be much clearer. is each mail file to be written out as a
single csv file? will the file names stay the same?
G> use warnings;
G> use strict;
good.
G> open FILE , "/home/gmillard/SentMail/YourSatSetup.txt" or die $!;
G> my $linenum =1;
be more consistent with spacing there.
my $linenum = 1;
G> while (<FILE>) {
G> print "|", $linenum++;
G> print"$_" ;
you don't need the quotes around $_ and it even can be an error in some
cases. don't unnecessarily quote scalar vars.
also that prints to stdout. if you want to do this per file and keep the
results you need to open an output file. and you will need an outer loop
to scan all the files. will they all be in a directory? passed in on the
command line into @ARGV? you need to ask and answer these questions.
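
One way the outer loop and the output file could fit together, as a rough sketch. The helper name mails_to_records and the output file name are assumptions, not anything from the original script; the input paths are passed in as a list so the caller decides where they come from.

```perl
use strict;
use warnings;

# Hypothetical helper: join every line of each input file into one
# "|"-separated record and write the records to $out_path.
sub mails_to_records {
    my ($out_path, @mail_paths) = @_;

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    for my $path (@mail_paths) {
        open my $in, '<', $path or die "Cannot read $path: $!";
        chomp(my @lines = <$in>);
        close $in;
        # No quotes needed around the variables being printed.
        print {$out} join('|', @lines), "\n";
    }
    close $out or die "Cannot close $out_path: $!";
    return scalar @mail_paths;
}

# Usage, with the directory from the original post as an assumption:
# mails_to_records('all_mail.txt', glob '/home/gmillard/SentMail/*.txt');
```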
G> Produces the following.
G> |1From - Sun Feb 21 11:40:01 2010
that isn't a csv format or anything but what you have printed.
G> I have hundreds of emails in this directory. I would like to parse
G> them into a single file where each comma separated/tab separated field
G> is a line from the email.
you aren't doing any parsing. reading line by line isn't
parsing. splitting on lines is what it would be called.
G> So, the first line of the CSV file is
G> |1From - Sun Feb 21 11:40:01 2010|2X-Mozilla-Status: 0001|3X-Mozilla-
G> Status2: 00000000|4X-Gmail-Received:
G> 58fa0ec68ca9975c1d187ceadc0ad3aeb1026134
G> <truncated>
well, think about your current output. why does it put each field (line)
on its own line? i will let you answer that first and then you can
easily fix it.
G> and each subsequent line is the next email and so forth.
G> Any words of wisdom?
that is a very strange format and will make for extremely long csv lines
(not a problem but just odd). also you are putting the line number in
front of each line. why? you can count the fields (lines). what happens
if a text line in an email starts with a number? then it will be next to
your line number making it hard to parse out the line number. also your
format starts with | so it means there is a leading empty field in the
csv. not a big problem but something to be aware of.
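
A short sketch of both points, using two invented sample lines. It shows the counter running into a line that starts with a digit, and then the same lines joined on the delimiter alone, where the position in the list serves as the line number.

```perl
use strict;
use warnings;

# Two sample lines; the second starts with a digit.
my @lines = ('Hi Andrew;', '2 quick questions');

# Prefixing a counter as in the original script: the counter and
# the text run together in the second field.
my $n = 0;
my $numbered = join '', map { ++$n; "|$n$_" } @lines;

# Joining on the delimiter alone: no leading empty field, and the
# position in the list is the line number.
my $record = join '|', @lines;
my @fields = split /\|/, $record;

print "$numbered\n$record\n";
print "field 2 is: $fields[1]\n";
```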
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Thu, 25 Feb 2010 12:06:22 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Would like some words of wisdom - convert text files to CSV
Message-Id: <kdldo5dt3djp06j5v0f5djclhi9ajp9kcq@4ax.com>
GlenM <glenmillard@gmail.com> wrote:
>I am sure that someone out there has done this before - I *think* I am
>on the right track.
>
>I have a directory full of emails. What I would like to do is read
>each file in, then parse them into a CSV style file.
>
>Example:
>
>#!/usr/bin/perl
>
>use warnings;
>use strict;
Good, thank you.
>open FILE , "/home/gmillard/SentMail/YourSatSetup.txt" or die $!;
>my $linenum =1;
Perl already maintains a current input line counter for you, see $. in
'perldoc perlvar'
Obviously you need an additional outer loop to loop through all the
files. To get the file names please see 'perldoc -f opendir' and
'perldoc -f readdir' and then just foreach(...){...} over those file names.
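
A sketch of those two suggestions under some assumptions: the helper names mail_files and number_lines are invented, and the .txt filter is a guess based on the file name in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $dir whose names
# end in .txt, with the directory prepended, sorted by name.
sub mail_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @paths;
    # readdir returns bare names, including "." and "..".
    for my $name (sort readdir $dh) {
        next unless $name =~ /\.txt\z/;
        my $path = "$dir/$name";
        push @paths, $path if -f $path;   # skip directories
    }
    closedir $dh;
    return @paths;
}

# $. holds the line number of the filehandle most recently read,
# so no separate counter is needed.
sub number_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @out;
    while (my $line = <$in>) {
        chomp $line;
        push @out, sprintf '%d:%s', $., $line;
    }
    close $in;
    return @out;
}
```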
>while (<FILE>) {
> print "|", $linenum++;
> print"$_" ;
$_ already contains a string, therefore there is no need to stringify it
again. Actually there are situations where stringifying a variable
causes unintentional effects, therefore you should not do it unless you
want those effects. Please see "perldoc -q quoting".
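
One such situation, as a small sketch with invented variable names: quoting a variable that holds a reference turns it into a plain string.

```perl
use strict;
use warnings;

my $fields = [ 'From', 'To', 'Subject' ];

# Passing the variable itself keeps the reference usable.
my $kept = $fields;

# Quoting it stringifies the reference to something like
# "ARRAY(0x...)", which can no longer be dereferenced.
my $quoted = "$fields";

print "kept has ", scalar @$kept, " fields\n";
print "quoted is now the plain string $quoted\n";
```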
>Produces the following.
Missing: how does this output fail to match your expectations, i.e. what
is wrong with it?
>|1From - Sun Feb 21 11:40:01 2010
>|2X-Mozilla-Status: 0001
>|3X-Mozilla-Status2: 00000000
I am guessing (but I may be totally wrong) one issue might be that you
want all those lines merged into one line? If that is the case then
please see "perldoc -f chomp".
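
A small sketch of the difference, assuming that guess is right. The three sample lines are invented stand-ins for what <FILE> would return.

```perl
use strict;
use warnings;

# Three lines as they would come from <FILE>, newlines included.
my @raw = ("From: a\n", "To: b\n", "Subject: c\n");

# Without chomp each field still ends in "\n", so every field
# lands on its own output line.
my $unchomped = join '|', @raw;

# chomp strips the trailing newline from each element in place.
chomp(my @clean = @raw);
my $merged = join '|', @clean;

print "$merged\n";
```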
If there are any other issues please let us know. My crystal ball is out for
repairs.
jue
------------------------------
Date: Thu, 25 Feb 2010 22:41:38 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Would like some words of wisdom - convert text files to CSV
Message-Id: <2uaj57-3u21.ln1@osiris.mauzo.dyndns.org>
Quoth Jürgen Exner <jurgenex@hotmail.com>:
>
> Obviously you need an additional outer loop to loop through all the
> files. To get the file names please see 'perldoc opendir' and 'perldoc
> readdir' and then just foreach(...){...} over those file names.
It's nearly always easier to use File::Slurp::read_dir. Apart from
anything else, it will remove "." and ".." for you.
Ben
------------------------------
Date: Thu, 25 Feb 2010 14:54:48 -0800 (PST)
From: GlenM <glenmillard@gmail.com>
Subject: Re: Would like some words of wisdom - convert text files to CSV
Message-Id: <f6dbdaaf-05d7-4c73-a890-000096b179ab@k41g2000yqm.googlegroups.com>
Thanks for stopping the bullies - appreciate it!
Okay, First question:
> you need to be much clearer. is each mail file to be written out as a
> single csv file? will the file names stay the same?
well, it doesn't really matter, but if it was one big CSV file, it
would probably be too big. So, I can break them up into groups
manually. So, for now, one big CSV file for all emails.
Second:
> also that prints to stdout. if you want to do this per file and keep the
> results you need to open an output file. and you will need an outer loop
> to scan all the files. will they all be in a directory? passed in on the
> command line into @ARGV? you need to ask and answer these questions.
I can redirect the output to a file - yes, I see that it goes to
STDOUT. Not really looking for bells and whistles.
I will scan in every file in the directory - like I said previously,
it is a sheet-load of data so, I will split up the files.
Third:
> well, think about your current output. why does it put each field (line)
> on its own line? i will let you answer that first and then you can
> easily fix it.
I would like to have each field in a different 'column' or be
separated by a "|" or a "," (hence CSV). So, that is the next hurdle.
Fourth:
> that is a very strange format and will make for extremely long csv lines
> (not a problem but just odd). also you are putting the line number in
> front of each line. why? you can count the fields (lines). what happens
> if a text line in an email starts with a number? then it will be next to
> your line number making it hard to parse out the line number. also your
> format starts with | so it means there is a leading empty field in the
> csv. not a big problem but something to be aware of.
Well, I just want to get all of the emails into a spreadsheet format,
then they are easier to work with. I can massage the data manually
afterward. Just want to get it into a CSV - maybe I can get fancy once
I get the fundamentals down.
Thank you for your response.
Glen
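
For the comma-separated case, a field that itself contains a comma, a double quote, or a line break has to be quoted or a spreadsheet will split it in the wrong place. Below is a minimal hand-rolled sketch following the RFC 4180 convention. The names csv_field and csv_row are hypothetical, and a CPAN module such as Text::CSV is likely the more robust choice for real data.

```perl
use strict;
use warnings;

# Quote one field the way RFC 4180 describes: wrap it in double
# quotes if it contains a comma, quote, CR or LF, and double any
# embedded quotes.
sub csv_field {
    my ($text) = @_;
    return $text unless $text =~ /[",\r\n]/;
    (my $escaped = $text) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Build one CSV row from a list of fields.
sub csv_row {
    return join(',', map { csv_field($_) } @_);
}

print csv_row('Date: Fri, 17 Nov 2006', 'Hi Andrew;', 'say "hi"'), "\n";
```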
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2835
***************************************