[24525] in Perl-Users-Digest
Perl-Users Digest, Issue: 6705 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jun 18 14:05:48 2004
Date: Fri, 18 Jun 2004 11:05:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 18 Jun 2004 Volume: 10 Number: 6705
Today's topics:
Re: $NF for perl (Anno Siegel)
Re: $NF for perl (Walter Roberson)
Re: $NF for perl (Anno Siegel)
Re: Can't locate package AutoLoader for @File::List::IS <dwall@fastmail.fm>
Re: Encoding question <scobloke2@infotop.co.uk>
Re: Encoding question <flavell@ph.gla.ac.uk>
Re: Encoding question <scobloke2@infotop.co.uk>
Re: Encoding question <flavell@ph.gla.ac.uk>
Help with a "Post" procedure. <jimsimpson@cox.net>
Re: I need help with an 'if statement' in perl (Sam)
Re: Keypress automation? (Walter Roberson)
Re: pattern match problem <nospam@peng.nl>
Re: Perl and Sun Grid Engine (SGE) ctcgag@hotmail.com
regex split on even count <nospam@nospam.net>
Re: regex split on even count <ittyspam@yahoo.com>
Re: regex split on even count <pinyaj@rpi.edu>
text parsing (Shalini Joshi)
while grep filehandle (incognito)
Re: while grep filehandle <ittyspam@yahoo.com>
Re: while grep filehandle <Juha.Laiho@iki.fi>
Re: while grep filehandle <ittyspam@yahoo.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 18 Jun 2004 15:39:18 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: $NF for perl
Message-Id: <cav2b6$m3n$1@mamenchi.zrz.TU-Berlin.DE>
Richard Morse <remorse@partners.org> wrote in comp.lang.perl.misc:
> In article <Pine.A41.4.58.0406051847200.7498@ginger.libs.uga.edu>,
> Brad Baxter <bmb@ginger.libs.uga.edu> wrote:
>
> > And the assignment to $n... provides the scalar context, so scalar() isn't
> > needed. There are good reasons not to use scalar() when it isn't needed.
>
> What would such reasons be? I often use it for clarity's sake even when
> it isn't strictly necessary -- what side effects does it have that might
> be invidious? `perldoc -f scalar` and `perldoc -q scalar` don't seem to
> note any deleterious side effects...
I am strongly opposed to adding stuff to code "for clarity's sake", even
if it provably doesn't hurt.
For one, it doesn't accomplish its purpose. The process of understanding
code is largely understanding why every bit is where it is. If some bits
are there just because the author liked it that way, that makes this
process harder, not easier. That goes for scalar() when context is already
scalar as well as for unnecessary quoting of variables, unnecessary escaping
in strings and regexes, unnecessary variable initializations and a host
of less frequent unnecessarities.
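To make the scalar() case concrete, here is a minimal sketch. The array @fields and the variable names are made up for illustration and do not come from the thread. Assignment to a scalar already imposes scalar context, so both counts come out the same:

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @fields = qw(alpha beta gamma);

# Assignment to a scalar variable supplies scalar context,
# so an array on the right-hand side yields its element count.
my $n_plain    = @fields;
my $n_explicit = scalar(@fields);   # same result; scalar() is redundant here

# scalar() earns its place only where the context would otherwise be list,
# for example in the argument list of print.
print "count: ", scalar(@fields), "\n";
```

In the print call the explicit scalar() does change the meaning, which is exactly why it tells the reader something there.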
Unnecessary code also tends to diminish the reader's trust in the author.
Your intention may have been clarity, but for all the reader knows you
have been hedging your bets because you weren't sure if a construct was
needed or not. Bad move. You want the reader to believe you know what
you're doing. Doing things you might as well not do is unconvincing in
that respect. Of course, it helps if you, in actual fact, know what
you're doing.
If a bit of code needs additional clarification, add a comment (or, better,
re-write it so that it doesn't). Using redundant code for the purpose
sends out mixed signals.
On a more general note, adding things for clarity means you are trying
to guess what constructs in your code will be difficult for the reader
to understand. That is incredibly hard to do. Programmers have wildly
different educational histories, and what you find hard (or once found
hard) may well be a matter of course for someone else, and vice versa.
Expect your readers to stumble over things you take for granted and apply
casually, not even thinking of clarification.
Anno
------------------------------
Date: 18 Jun 2004 15:55:08 GMT
From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)
Subject: Re: $NF for perl
Message-Id: <cav38s$44s$1@canopus.cc.umanitoba.ca>
In article <cav2b6$m3n$1@mamenchi.zrz.TU-Berlin.DE>,
Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote:
:I am strongly opposed to adding stuff to code "for clarity's sake", even
:if it provably doesn't hurt.
I disagree. But then I learned through the school of defensive
programming.
For example, initialize variables even if the algorithm will certainly
set the variable value -- because there might be a bug in the
algorithm, or you might end up being interrupted out of the code, or
some variable might assume an unexpected value due to real-time
interactions ('volatile' in C), or some variable might get clobbered
through an alias or stray or uninitialized pointer. Or because someone else
might come along later and add some code that allows the basic block
to be exited without the variable having been set according to the
algorithm.
Don't assume your code will be right: assume your code will be wrong
(or will *become* wrong over time... possibly because of changes to the
language spec), and protect yourself against the possibilities.
It's a lot harder to show "proof of correctness" when some variables
might have undefined or indeterminate values, so it's better to
overcode than to rely upon the boundary conditions of the current version
of the language.
--
Warhol's Second Law of Usenet: "In the future, everyone will troll
for 15 minutes."
------------------------------
Date: 18 Jun 2004 16:51:28 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: $NF for perl
Message-Id: <cav6ig$oki$1@mamenchi.zrz.TU-Berlin.DE>
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote in comp.lang.perl.misc:
> In article <cav2b6$m3n$1@mamenchi.zrz.TU-Berlin.DE>,
> Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote:
> :I am strongly opposed to adding stuff to code "for clarity's sake", even
> :if it provably doesn't hurt.
>
> I disagree. But then I learned through the school of defensive
> programming.
>
> For example, initialize variables even if the algorithm will certainly
> set the variable value -- because there might be a bug in the
> algorithm, or you might end up being interrupted out of the code, or
> some variable might assume an unexpected value due to real-time
> interactions ('volatile' in C), or some variable might get clobbered
> through an alias or stray or uninitialized pointer. Or because someone else
> might come along later and add some code that allows the basic block
> to be exited without the variable having been set according to the
> algorithm.
Let me first point out that I was talking about Perl and Perl variables,
not languages and variables in general. Your arguments above may pertain
to C, but not to Perl. Here, the "uninitialized value" is a clearly
recognizable state of a variable. The question is usually only if
the surrounding code can cope with an undef or not, often a design
decision.
So, in Perl, it makes sense to initialize variables only if undef
isn't good enough. That tends to be rather rare, so it tells the
reader something -- a good thing.
Note also, that I'm not advocating sloppy programming. When I say,
avoid redundant code, I mean code that can go without making a
difference. In Perl, variable initialization happens to be such
code more often than in other languages.
> Don't assume your code will be right: assume your code will be wrong
> (or will *become* wrong over time... possibly because of changes to the
> language spec), and protect yourself against the possibilities.
No disagreement here.
> It's a lot harder to show "proof of correctness" when some variables
> might have undefined or indeterminate values, so it's better to
> overcode than to rely upon the boundary conditions of the current version
> of the language.
Again, in Perl, undef is a perfectly respectable value with predictable
behavior, unlike a random value that is indistinguishable from a legitimate
one. If code and environment go out of sync wrt. undef, you usually see
warnings that weren't there before. Thus, defensive initialization can
mask bugs that would be obvious without it.
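A small sketch of that last point, under stated assumptions: find_total() is a hypothetical routine that fails and returns undef, and the warning handler only collects warnings so they can be counted. Without a default the failure produces a warning; with a defensive default it passes silently.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so the effect is visible.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical lookup that fails to find anything and returns undef.
sub find_total { return undef }

my $total  = find_total();       # no defensive initialization
my $report = "total: $total";    # warns about an uninitialized value

my $masked = 0;                  # "defensive" default
my $found  = find_total();
$masked = $found if defined $found;
my $report2 = "total: $masked";  # silent, and 0 looks like a real result

print scalar(@warnings), " warning(s) seen\n";
```

The second report is indistinguishable from one built on a genuine total of zero, which is the masking effect described above.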
Anno
------------------------------
Date: Fri, 18 Jun 2004 15:24:20 -0000
From: "David K. Wall" <dwall@fastmail.fm>
Subject: Re: Can't locate package AutoLoader for @File::List::ISA at...
Message-Id: <Xns950C7405B9F1Fdkwwashere@216.168.3.30>
Shahriar <shahriar_mokhtarzad@pacbell.net> wrote:
> Hi Folks,
>
> I just installed *FILE-LIST* from ASP. I am running ASP (see below
> for version information)
> does any one know about this error:
What error? Ah, ok, I see it now -- in the Subject header.
As Rob/Sisyphus asked, what code are you running? Posting a short,
self-contained program that demonstrates the problem will allow
others to help you.
I'm not sure why you're using File::List, anyway. From looking at
http://search.cpan.org/~dopacki/File-List-0.3.1/ , it appears that it
doesn't do anything that can't easily be handled by File::Find -- and
File::Find is a standard module.
> This is perl, v5.8.3 built for MSWin32-x86-multi-thread
> (with 8 registered patches, see perl -V for more detail)
>
> Copyright 1987-2003, Larry Wall
>
> Binary build 809 provided by ActiveState Corp.
> http://www.ActiveState.com ActiveState is a division of Sophos.
> Built Feb 3 2004 00:28:51
As Sisyphus said, the perl version shouldn't matter.
------------------------------
Date: Fri, 18 Jun 2004 15:32:58 +0000 (UTC)
From: Ian Wilson <scobloke2@infotop.co.uk>
Subject: Re: Encoding question
Message-Id: <cav1va$9gs$1@titan.btinternet.com>
Michael Krueger wrote:
> Hi,
> I have a text based application and want to draw some kind of frame, on
> the screen. OS is Debian/Linux using Perl 5.6
>
> I'm using this code:
>
> -- snip ---
> my $top = chr(201);
> my $bottom = chr(200);
> for (my $i = 0; $i < ($termCols-2); $i++)
> {
> $top .= chr(205);
> $bottom .= chr(205);
> }
> $top .= chr(187);
> $bottom .= chr(188);
>
> $term->Tgoto('cm', 0, 0, *STDOUT);
> print $top;
> for (my $i = 1; $i < ($termRows-1); $i++)
> {
> $term->Tgoto('cm', 0, $i, *STDOUT);
> print chr(186);
> $term->Tgoto('cm', $termCols-1, $i, *STDOUT);
> print chr(186);
> }
> $term->Tgoto('cm', 0, $termRows-2, *STDOUT);
> print $bottom;
> -- snip --
>
> Where $termCols and $termRows are the current terminal lines and columns.
>
> Problem:
> Due to the encoding to latin-1 charset I didn't get the expected
> frame-symbols but some other accentuated(?) chars.
>
> How can I change the encoding that I can use the extended ASCII set, which
> is referred often as the most common e.g. on www.asciitable.com, which
> contains these frame-symbols?
The code set is probably "Code Page 437" variously referred to as
"cp437", "IBM437", "437" etc. There are national variants too which have
some or all of the same line-draw characters but include a few accented
characters or national currency symbols in place of some US characters.
All those line-draw characters are also in Unicode - this and UTF-8 may
be a better option.
See http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
Vim supports editing of Unicode characters in UTF-8 files, e.g. ISTR
Control-K dr produces a top-left corner (mnemonic down, right) Control-K
vv produces a vertical-line and so on.
> I'm aware of 'use encoding "..";' but I just can't find the correct table. :(
>
Googling reveals snippets such as
binmode (STDOUT, ':encoding(cp437)');
You need to match encodings with your display device, on a Linux console
you probably need to check the "locale" settings (LANG etc) and some
other stuff.
If using a terminal emulator you need to choose an appropriate font. On
Windows that might be "Terminal" for IBM437 or "Courier New" for Unicode.
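As a rough sketch of the Unicode route, assuming the terminal really does display UTF-8: the core Encode module can translate the CP437 byte values from the original code into the matching Unicode characters. The helper cp437() and the fixed $width are made up for illustration; $width stands in for $termCols.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Tell perl the terminal expects UTF-8 (an assumption about the display).
binmode(STDOUT, ':encoding(UTF-8)');

# Decode one CP437 byte value into the corresponding Unicode character.
sub cp437 { return decode('cp437', chr(shift)) }

my $width  = 10;   # hypothetical frame width, stands in for $termCols
my $top    = cp437(201) . cp437(205) x ($width - 2) . cp437(187);
my $bottom = cp437(200) . cp437(205) x ($width - 2) . cp437(188);
my $side   = cp437(186);

print "$top\n";
print $side, ' ' x ($width - 2), $side, "\n";
print "$bottom\n";
```

If the display device itself speaks CP437, the binmode line quoted above (':encoding(cp437)') is the one to use instead, with the Unicode characters written directly.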
------------------------------
Date: Fri, 18 Jun 2004 16:58:56 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Encoding question
Message-Id: <Pine.LNX.4.53.0406181656130.15406@ppepc56.ph.gla.ac.uk>
On Fri, 18 Jun 2004, Ian Wilson wrote:
> The code set is probably "Code Page 437" variously referred to as
> "cp437", "IBM437", "437" etc. There are national variants too
Er, excuse me, but cp437 -is- the national (USA) variant. The Latin
multilingual codepage is cp850.
> All those line-draw characters are also in Unicode - this and UTF-8 may
> be a better option.
By now I'm sure that's the best advice, unless there are some special
factors involved.
------------------------------
Date: Fri, 18 Jun 2004 16:41:01 +0000 (UTC)
From: Ian Wilson <scobloke2@infotop.co.uk>
Subject: Re: Encoding question
Message-Id: <cav5ut$h3f$1@hercules.btinternet.com>
Alan J. Flavell wrote:
> On Fri, 18 Jun 2004, Ian Wilson wrote:
>
>
>>The code set is probably "Code Page 437" variously referred to as
>>"cp437", "IBM437", "437" etc. There are national variants too
>
>
> Er, excuse me, but cp437 -is- the national (USA) variant.
Picky, but also wrong :-)
in my post s/there are national/there are other national/
in your post s/the national variant/a national variant/
(at least from where I'm standing, YMMV)
> The Latin multilingual codepage is cp850.
Alright but the OP referred to http://www.asciitable.com/ which shows
CP437.
I haven't checked every codepoint in the bit described as "Extended
ASCII" but point 184 looks to me like 437 rather than 850. I can't say I
like that page much anyhow.
------------------------------
Date: Fri, 18 Jun 2004 18:25:52 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Encoding question
Message-Id: <Pine.LNX.4.53.0406181809430.15406@ppepc56.ph.gla.ac.uk>
On Fri, 18 Jun 2004, Ian Wilson wrote:
> >>The code set is probably "Code Page 437" variously referred to as
> >>"cp437", "IBM437", "437" etc. There are national variants too
> >
> > Er, excuse me, but cp437 -is- the national (USA) variant.
>
> Picky, but also wrong :-)
> in my post s/there are national/there are other national/
Fine, I'll go with that...
> in your post s/the national variant/a national variant/
Rather, s/the national (USA) variant/the USA national variant/
, to address your nitpick in the way that I had intended.
Way back (e.g this old MS-DOS 5 manual which I have on the shelf),
cp437 was advertised as the "English" code page; but already by the
time of the public release of Win95 (as opposed to the beta, where I
had chosen to change the codepage to 850 for myself, despite the dire
warnings in the covering notes), MS were setting the DOS codepage as
cp850 for Latin-based locales. As far as I know (though I could be
wrong) they were still setting cp437 in the USA, though.
> > The Latin multilingual codepage is cp850.
>
> Alright but the OP referred to http://www.asciitable.com/ which shows
> CP437.
Sure, I wasn't arguing about that part of the posting.
> I haven't checked every codepoint in the bit described as "Extended
> ASCII"
...a term which always sets off the bogosity alarms. There are
*numerous* 8-bit character codings which contain ASCII as their first
half.
> but point 184 looks to me like 437 rather than 850.
Indeed. The "Extended ASCII" bogon *does* usually refer to cp437 in
my experience.
> I can't say I like that page much anyhow.
Me too neither. For one thing, its claim that "it took a while to get
a single standard for these extra characters" is complete nonsense.
all the best
------------------------------
Date: Fri, 18 Jun 2004 13:04:06 -0400
From: "Jim Simpson" <jimsimpson@cox.net>
Subject: Help with a "Post" procedure.
Message-Id: <wgFAc.791$HN5.60@lakeread06>
I am trying to automate logging in to an HTTPS site which requires a "user
name" and "password". It appears to me that the following code should do the
job - but it does not do it. Can someone help me out on this.
I'm especially concerned about the "post" line. I do not understand what
should be in the places where I have used 'text', 'password' and 'submit'.
All help will be greatly appreciated.
Jim
#########################
#A program to log in to a secure site which requires a "user name" and
#"password".
#Load the source code of the site into a Microsoft Word document.
#Using Windows 98 and ActivePerl v5.8
#########################
use strict;
use Data::Dumper;
use LWP::UserAgent;
use HTTP::Cookies;
use Win32::OLE::Const 'Microsoft Word';
my $https_login = 'url of login sheet sought';
my $https_user = 'my user name';
my $https_pass = 'my password';
#get already active Word application or open new
my $Word = Win32::OLE->GetActiveObject('Word.Application')
||Win32::OLE->new('Word.Application', 'Quit');
my $book = $Word->Documents("PrintOut.doc");
# secure login
my $ua = LWP::UserAgent->new();
$ua->protocols_allowed( [ 'https'] );
$ua->cookie_jar(HTTP::Cookies->new(file => ".cookies.txt", autosave =>
1));
my $response = $ua->post($https_login, [ 'text' => "$https_user",
'password' => "$https_pass", 'submit' => "Log On" ] );
$book->words(1)->{'text'} = Dumper($response);
------------------------------
Date: 18 Jun 2004 08:49:29 -0700
From: samuelvange@cox.net (Sam)
Subject: Re: I need help with an 'if statement' in perl
Message-Id: <dae5ebbf.0406180749.4c78a836@posting.google.com>
I'm having trouble implementing this. Where do I put the if
statement? (To Ben: thanks for all the help. I named that one
'collection' because I need it to display as simply 'collection'
everywhere else, this really is the exception.)
Ben Morrow <usenet@morrow.me.uk> wrote in message news:<care6b$683$1@wisteria.csv.warwick.ac.uk>...
> [please quote properly]
>
> Quoth samuelvange@cox.net (Sam):
> > Ben Morrow <usenet@morrow.me.uk> wrote in message news:<cao96b$ra2$2@wisteria.csv.warwick.ac.uk>...
> > > Quoth samuelvange@cox.net (Sam):
> > > >
> > > > I have a list of items categorized by collection. One of these is
> > > > simply called 'collection' (where as the others have names like
> > > > 'ceramic collection', 'glassware collection', etc.).
> > > >
> > > > *What I need to do is have the drop down menu list all of these
> > > > categories, and for the group that is simply called 'collection', I'd
> > > > like it to display 'Ken Edwards collection'.
> > > >
> > > > @collections=&get_collections;
> > >
> > > You don't need the '&' on that sub call; in fact, it may do things
> > > you're not expecting.
> > >
> > > > foreach $collection (@collections) {print
> > > > "<option>$collection</option>";}
> > >
> > > print '<option>',
> > > (/^collection$/i ? 'Ken Edwards collection' : $_),
> > > '</option>'
> > > for @collections;
> > >
> > > Note that this is a Very Bad Idea: the right answer is to fix whatever
> > > is causing this collection to have the wrong name in the first place.
> >
> > This is the only time I will need to display the name of the
> > collection. Is this solution buggy. Why is it a very bad idea.
>
> It is a bad idea because it is not maintainable: if you're not careful
> you'll end up with hundreds of these special-case replacements all over
> the code, and you'll have no idea which values come from where. You say
> this is the only place you need the name; you cannot be certain there
> will not be others in the future. As I said, the right answer is to
> change get_collections to return the correct data.
>
> Ben
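One possible placement of the condition, as a sketch only: get_collections() below is a hypothetical stand-in that returns sample names, and display_name() is a made-up helper. Keeping the special case in one named sub at least confines it to a single place, though it does not replace the fix Ben recommends.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's get_collections().
sub get_collections {
    return ('ceramic collection', 'collection', 'glassware collection');
}

# The special case lives in one named place instead of inline in the print.
sub display_name {
    my ($name) = @_;
    return $name =~ /^collection$/i ? 'Ken Edwards collection' : $name;
}

my @options = map { '<option>' . display_name($_) . '</option>' } get_collections();
print "$_\n" for @options;
```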
------------------------------
Date: 18 Jun 2004 15:43:34 GMT
From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)
Subject: Re: Keypress automation?
Message-Id: <cav2j6$3c4$1@canopus.cc.umanitoba.ca>
In article <caup5r$942$2@wisteria.csv.warwick.ac.uk>,
Ben Morrow <usenet@morrow.me.uk> wrote:
:Quoth sap6210@rit.edu (Asmo):
:> Hi, I'm trying to write some scripts that open up other programs, then
:> send keypresses to them. I'm trying to use this report generation
:> software, and hopefully automate the report generation process, but it
:> prompts for start and end dates for the report, and if I could manage
:> to automate that, it would simplify my life immensely.
:What sort of program is it, and what OS?
:If it's a GUI program, you're out of luck. If it's a text-mode
:program, and you're on a Unixish OS, then you want Expect.pm.
If it is an X Windows based program, you could probably synthesize
some key presses, and the application wouldn't notice the difference
unless it specifically disabled synthetic keys. If it is X Windows and
the form will always be in the same place, then there are programs out
there to automate this sort of thing.
If it is MS Windows... I don't know. My little-enough-to-be-dangerous
reading would suggest that if you had a service which had been given
permission to interact with the desktop, that you could perhaps
accomplish what you want. It's not too different from what some of the
installers have to do sometimes.
--
When your posts are all alone / and a user's on the phone/
there's one place to check -- / Upstream!
When you're in a hurry / and propagation is a worry/
there's a place you can post -- / Upstream!
------------------------------
Date: Fri, 18 Jun 2004 18:06:57 +0100
From: "Lex" <nospam@peng.nl>
Subject: Re: pattern match problem
Message-Id: <1nFAc.281539$Rc.8308949@news-reader.eresmas.com>
"Matt Garrish" <matthew.garrish@sympatico.ca> wrote in message
news:F2CAc.39052$nY.1219018@news20.bellglobal.com...
>
>
> Ugh, that was just bad on my part, especially since I knew he wanted
> multiple passes to clear them out. I ran it with <br /> tags before
> modifying the expression, which is why it looked like it wasn't working (I
> was just going to make mention of the html formatting, because there's even
> less of a point in using a regex if you aren't going to make sure you
> capture as many oddities as you can).
>
Well, I know for sure they'll be just <br> and nothing else, they're put
there by my script earlier, replacing \n in plain text you see...
But I've got it working thanks to you all.
Lex
------------------------------
Date: 18 Jun 2004 17:09:25 GMT
From: ctcgag@hotmail.com
Subject: Re: Perl and Sun Grid Engine (SGE)
Message-Id: <20040618130925.575$cn@newsreader.com>
lxh4info@yahoo.com (Leo) wrote:
> Dear all,
>
> I'm writing code to run on a cluster machine. Basically I need to run
> a Perl program 200 times and each time generate a line of the output.
> I use a C-shell script to call the perl progam and distribute it
> through Sun Grid Engine(SGE) to the nodes of the cluster (because SGE
> only takes shell script). I have two implementations: (1) write the
> output into different files (2) append the output to a single file.
Use implementation 1.
> Obviously the first approach is not efficient.
What about it is not efficient? If you had used that method, you would
be done by now, rather than rooting around in here for help. I'd call
that *more* efficient. :)
> But for the second one,
> I've only got less than 200 results(120-180 output lines). My
> questions are: (1) is this the problem with the deadlock when writing
> files?
You haven't given us enough information to tell.
> if so, how perl handle this issue?
Show us the few lines of your code that pertain to this issue, and we
may be able to tell you what the problem is.
> (like the one in C, means I
> have to write my own code to deal with the situation?)
How do you deal with it in C? If you tell us, we may be able to
help you translate that into Perl. If you don't, we can't.
> (2) Or SGE has
> the ability to not allow the file-writing at the same time?
Assuming you handle the files in Perl, then it is really none of
SGE's business to allow or disallow. But I do notice when it is
SGE's business to handle I/O (i.e. the shell's stdout and stderr),
it does so by directing them into different files.
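A minimal sketch of implementation 1, with everything specific invented for illustration: the temporary directory stands in for a shared path on the cluster, and $task stands in for whatever identifier is unique to each run. Every run writes its own file, and a separate step merges them once all runs have finished, so no two processes ever write to the same file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical output directory; on a cluster this would be a shared path.
my $dir = tempdir(CLEANUP => 1);

# Each run writes its single line to a file of its own.
# $task stands in for an identifier unique to the run.
sub write_result {
    my ($task, $line) = @_;
    my $file = sprintf '%s/result.%03d', $dir, $task;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} "$line\n";
    close $out or die "Cannot close $file: $!";
}

# Simulate a few runs, then merge once everything has finished.
write_result($_, "output of run $_") for 1 .. 5;

opendir my $dh, $dir or die "Cannot read directory $dir: $!";
my @names = sort grep { /^result\.\d+$/ } readdir $dh;
closedir $dh;

my @merged;
for my $name (@names) {
    open my $in, '<', "$dir/$name" or die "Cannot read $dir/$name: $!";
    push @merged, <$in>;
    close $in;
}
print @merged;
```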
Xho
------------------------------
Date: Fri, 18 Jun 2004 16:17:02 GMT
From: "Jeff Thies" <nospam@nospam.net>
Subject: regex split on even count
Message-Id: <2AEAc.5184$bs4.2858@newsread3.news.atl.earthlink.net>
I'm just trying to pull rows from a csv file (I don't want to load a CSV
module). The trouble with this is that linebreaks may be embedded in the row
(in between pairs of double quotes).
So to find if a line break is the end of a row, or just embedded in a row
you need to count the double quotes so far in that row and see if they are
paired (odd or even).
How best to do this?
Jeff
------------------------------
Date: Fri, 18 Jun 2004 12:32:15 -0400
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: regex split on even count
Message-Id: <20040618123059.R23512@dishwasher.cs.rpi.edu>
On Fri, 18 Jun 2004, Jeff Thies wrote:
> I'm just trying to pull rows from a csv file (I don't want to load a CSV
> module). The trouble with this is that linebreaks may be embedded in the row
> (in between pairs of double quotes).
>
> So to find if a line break is the end of a row, or just embedded in a row
> you need to count the double quotes so far in that row and see if they are
> paired (odd or even).
>
> How best to do this?
How best to count a single character in a string is answered in the FAQ:
perldoc -q count
How best to actually accomplish your goal of reading rows is to use a CSV
module. What is your reasoning for not wanting to do this?
Paul Lalli
------------------------------
Date: Fri, 18 Jun 2004 14:02:43 -0400
From: Jeff 'japhy' Pinyan <pinyaj@rpi.edu>
Subject: Re: regex split on even count
Message-Id: <Pine.SGI.3.96.1040618135955.336102A-100000@vcmr-64.server.rpi.edu>
On Fri, 18 Jun 2004, Jeff Thies wrote:
>I'm just trying to pull rows from a csv file (I don't want to load a CSV
>module). The trouble with this is that linebreaks may be embedded in the row
>(in between pairs of double quotes).
If you were using a CSV module, you probably wouldn't have this problem.
>So to find if a line break is the end of a row, or just embedded in a row
>you need to count the double quotes so far in that row and see if they are
>paired (odd or even).
I'll help you solve this general case, though. Assuming that your data
has no BACKSLASHED "'s in it, you can do:
while (<CSV_FILE>) {
  my $q_count = tr/"//;
  # if it's odd, that means a quoted string was interrupted by
  # a newline, so append the next line to this one, and redo the loop
  $_ .= <CSV_FILE>, redo if $q_count % 2;
  my @fields = process($_);
}
That can be written more tightly:
$_ .= <CSV_FILE> while tr/"// % 2;
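For experimenting, the same idea can be run against an in-memory filehandle. This is a sketch, not a CSV parser: the sample data is invented, backslashed quotes are still assumed absent, and a check for end of file is added so that an unbalanced quote in the last record cannot loop forever.

```perl
use strict;
use warnings;

# Hypothetical sample data; the second record has a line break inside quotes.
my $data = qq{1,"plain",x\n2,"two\nlines",y\n3,"last",z\n};
open my $csv, '<', \$data or die "Cannot open in-memory file: $!";

my @rows;
while (my $row = <$csv>) {
    # An odd number of quotes means the record continues on the next line.
    while ($row =~ tr/"// % 2) {
        my $more = <$csv>;
        last unless defined $more;   # unbalanced quote at end of file
        $row .= $more;
    }
    push @rows, $row;
}
close $csv;

print scalar(@rows), " rows\n";
```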
--
Jeff Pinyan RPI Acacia Brother #734 RPI Acacia Corp Secretary
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
------------------------------
Date: 18 Jun 2004 10:43:40 -0700
From: shalinij1@yahoo.com (Shalini Joshi)
Subject: text parsing
Message-Id: <283d6b7e.0406180943.650179d2@posting.google.com>
Hi!
I am relatively new to perl and am looking to parse a non-delimited
text file. What I would like to do is out of this file of records
which always begin with 'FPR' and could span multiple lines, extract
only some relevant records.
The criterion is the number denoted by characters 7 through 18. I have
a vague idea how to go about parsing this, but with so much
information (on the website and in the other postings on the group) it's
kind of confusing what the best way to do it is. I am initially
interested in just getting the script to work.
I would just like to extract this info and create a new file where I
would store it in the same format. Is there any way I can do it? Or
would I necessarily have to parse the info into an array or something
before I dump it into the new file?
Thanks and looking forward to any kind of help and tips.
--Shalini
------------------------------
Date: 18 Jun 2004 09:35:12 -0700
From: jgilber@yahoo.com (incognito)
Subject: while grep filehandle
Message-Id: <b5d78542.0406180835.bf830e3@posting.google.com>
Unix command line:
grep Results File.txt
Returns lots of lines:
Results = 1
Results = 11
Results = 2
etc, etc, etc.
Perl script:
#!/usr/bin/perl
open (IN, "< File.txt");
while ( grep /Results/, <IN> ) {
print "$_\n";
}
close (IN);
Returns nothing. Why?
------------------------------
Date: Fri, 18 Jun 2004 12:49:38 -0400
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: while grep filehandle
Message-Id: <20040618124216.A23512@dishwasher.cs.rpi.edu>
On Fri, 18 Jun 2004, incognito wrote:
> Unix command line:
>
> grep Results File.txt
>
> Returns lots of lines:
>
> Results = 1
> Results = 11
> Results = 2
>
> etc, etc, etc.
>
> Perl script:
>
> #!/usr/bin/perl
>
> open (IN, "< File.txt");
>
> while ( grep /Results/, <IN> ) {
> print "$_\n";
> }
>
> close (IN);
>
> Returns nothing. Why?
>
Basically because you're not using grep correctly. You're using grep in
the condition to a while. That means you're using it in scalar context.
In a scalar context, grep returns the number of times its condition
statement (in this case: /Results/ ) returned true. $_ is never set
within the while loop.
What you want is to return the lines that grep matched correctly:
my @lines = grep /Results/, <IN>;
print @lines;
Or even just:
print grep /Results/, <IN>;
If you're going to use a while loop, you shouldn't be using grep:
while (<IN>){
print if /Results/;
}
Or you could just simplify this into a oneliner:
perl -ne 'print if /Results/' File.txt
By the way, I'm pretty sure if you had enabled warnings, you would have
gotten an uninitialized value warning for the $_ inside the while loop.
That would have given you a clue as to what was going wrong. Please
enable warnings in the future before posting.
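To see the two contexts side by side, here is a small sketch that reads invented sample data through an in-memory filehandle instead of File.txt:

```perl
use strict;
use warnings;

# Hypothetical file contents, read through an in-memory filehandle.
my $data = "Results = 1\nnoise\nResults = 11\nResults = 2\n";

open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my $count = grep { /Results/ } <$in>;   # scalar context: number of matches
close $in;

open $in, '<', \$data or die "Cannot open in-memory file: $!";
my @lines = grep { /Results/ } <$in>;   # list context: the matching lines
close $in;

print "scalar context gave: $count\n";
print "list context gave:\n", @lines;
```

The while condition in the original script only ever sees the first kind of result, a number.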
Paul Lalli
------------------------------
Date: Fri, 18 Jun 2004 17:27:03 GMT
From: Juha Laiho <Juha.Laiho@iki.fi>
Subject: Re: while grep filehandle
Message-Id: <cav8ev$h59$1@ichaos.ichaos-int>
jgilber@yahoo.com (incognito) said:
>#!/usr/bin/perl
>
>open (IN, "< File.txt");
>
> while ( grep /Results/, <IN> ) {
> print "$_\n";
> }
>
>close (IN);
>
>Returns nothing. Why?
My brain fails as to why the above doesn't work, but I tried the
following and it looks ok (though didn't test with a big input
file):
#!/usr/bin/perl
use warnings;
use strict;
open (IN, "< File.txt");
print grep /Results/, <IN>;
close (IN);
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
------------------------------
Date: Fri, 18 Jun 2004 13:38:47 -0400
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: while grep filehandle
Message-Id: <20040618133609.U23512@dishwasher.cs.rpi.edu>
On Fri, 18 Jun 2004, Juha Laiho wrote:
> jgilber@yahoo.com (incognito) said:
> >#!/usr/bin/perl
> >
> >open (IN, "< File.txt");
> >
> > while ( grep /Results/, <IN> ) {
> > print "$_\n";
> > }
> >
> >close (IN);
> >
> >Returns nothing. Why?
>
> My brain fails as to why the above doesn't work,
read perldoc -f grep to understand what grep does in a scalar context.
> but I tried the
> following and it looks ok (though didn't test with a big input
> file):
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> open (IN, "< File.txt");
> print grep /Results/, <IN>;
> close (IN);
This is correct, because print is a list operator, so grep
here is being called in a list context.
Note that this will be a memory hog for large files, because all lines
will be read and stored, rather than one at a time. If this is a concern,
it is better written as:
while (<IN>){
print if /Results/;
}
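A slightly fuller sketch of the line-at-a-time version, with two additions that are not in the code above: a lexical filehandle with three-argument open, and a check that the open succeeded. The temporary file only exists to make the example self-contained; in practice $path would be File.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file, created here so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "Results = 1\nnoise\nResults = 2\n";
close $tmp or die "Cannot close $path: $!";

# Three-argument open with a lexical handle, and a check that it worked.
open my $in, '<', $path or die "Cannot open $path: $!";

my $matches = 0;
while (my $line = <$in>) {       # one line in memory at a time
    next unless $line =~ /Results/;
    print $line;
    $matches++;
}
close $in;
```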
See my other post in this thread for the method of doing this in a perl
one-liner.
Paul Lalli
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6705
***************************************