[25089] in Perl-Users-Digest
Perl-Users Digest, Issue: 7339 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Oct 29 11:05:41 2004
Date: Fri, 29 Oct 2004 08:05:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 29 Oct 2004 Volume: 10 Number: 7339
Today's topics:
Re: Common file operations <bik.mido@tiscalinet.it>
Re: How to handle large variable <do-not-use@invalid.net>
Re: How to handle large variable (Anno Siegel)
Re: How to handle large variable <do-not-use@invalid.net>
Re: How to handle large variable <tadmc@augustmail.com>
Re: IDEs <jwillmore@adelphia.net>
Re: IDEs <mjcarman@mchsi.com>
Re: modifying hash key (dispatch table) <tadmc@augustmail.com>
Re: Net::POP3 Install (Anno Siegel)
Re: OT: perl errors <matthew.garrish@sympatico.ca>
Re: OT: perl errors <jeff@spamalanadingong.com>
Re: OT: perl errors <jurgenex@hotmail.com>
Parsing 'dirty/corrupt data'. Advice wanted burlo_stumproot@yahoo.se
Re: Parsing 'dirty/corrupt data'. Advice wanted (Anno Siegel)
Re: Parsing 'dirty/corrupt data'. Advice wanted <shamus@hushmail.com>
Re: Parsing 'dirty/corrupt data'. Advice wanted (Anno Siegel)
Re: Parsing 'dirty/corrupt data'. Advice wanted (replace z with h, spam protection)
Re: Parsing 'dirty/corrupt data'. Advice wanted <jwillmore@adelphia.net>
Re: Parsing 'dirty/corrupt data'. Advice wanted <tadmc@augustmail.com>
Re: Parsing 'dirty/corrupt data'. Advice wanted <flavell@ph.gla.ac.uk>
Re: Should I use BEGIN, CHECK, or INIT? <bik.mido@tiscalinet.it>
Re: Should I use BEGIN, CHECK, or INIT? <someone@example.com>
speeding up perl script execution under apache <_nospam_stigerikson@yahoo.se>
Re: speeding up perl script execution under apache <1usa@llenroc.ude.invalid>
Re: speeding up perl script execution under apache <noreply@gunnar.cc>
Re: speeding up perl script execution under apache <tim@vegeta.ath.cx>
Re: using Win32::ODBC - what's fast? <noemail@#$&&!.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 29 Oct 2004 13:13:33 +0200
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Common file operations
Message-Id: <o764o0dur8t32643jmjjnqqoccncp37k0a@4ax.com>
On Thu, 28 Oct 2004 22:07:54 +0100, Ben Morrow <usenet@morrow.me.uk>
wrote:
>> >Then I suggest you do something like
>> > s/^'//, s/'$// for $dir;
>>
>> Thanks. In this case efficiency is irrelevant, but if I need to do
>> something similar inside a loop, is the clear version as fast as the
>> other?
>
>Which are you calling the clear version? I (and most Perl programmers)
>would call Michele's clearer than yours.
I think it's clear enough he means "mine". Only he's concerned about
possible efficiency issues.
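When in doubt about a question like this, the Benchmark module can
settle it empirically. A minimal sketch (the one_sub body is only a
stand-in, since the original "clear version" isn't quoted here):

    use Benchmark qw(cmpthese);
    cmpthese -2, {    # negative count: run each for ~2 CPU seconds
        two_subs => sub { my $d = "'/tmp/foo'"; s/^'//, s/'$// for $d },
        one_sub  => sub { my $d = "'/tmp/foo'"; $d =~ s/^'(.*)'$/$1/ },
    };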
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
------------------------------
Date: 29 Oct 2004 14:26:48 +0200
From: Arndt Jonasson <do-not-use@invalid.net>
Subject: Re: How to handle large variable
Message-Id: <yzdwtx91zhz.fsf@invalid.net>
Ben Morrow <usenet@morrow.me.uk> writes:
> > #$file = join '',<IN>;
>
> The usual idiom for this is
>
> $file = do {local $/; <IN>};
Newbie question: in what way is the above better than
$file = <IN>;
?
------------------------------
Date: 29 Oct 2004 12:29:33 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: How to handle large variable
Message-Id: <cltd3d$c1i$4@mamenchi.zrz.TU-Berlin.DE>
Arndt Jonasson <do-not-use@invalid.net> wrote in comp.lang.perl.misc:
>
> Ben Morrow <usenet@morrow.me.uk> writes:
> > > #$file = join '',<IN>;
> >
> > The usual idiom for this is
> >
> > $file = do {local $/; <IN>};
>
> Newbie question: in what way is the above better than
> $file = <IN>;
It isn't better, it's entirely different. "$file = <IN>" reads one
line from the file, "$file = do {local $/; <IN>}" reads them all.
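A minimal demonstration (the filename is made up):

    open IN, 'data.txt' or die "open: $!";
    my $first = <IN>;                   # one record - by default, one line
    my $rest  = do { local $/; <IN> };  # $/ undef: rest of file, one chunk
    close IN;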
Anno
------------------------------
Date: 29 Oct 2004 14:50:06 +0200
From: Arndt Jonasson <do-not-use@invalid.net>
Subject: Re: How to handle large variable
Message-Id: <yzdsm7x1yf5.fsf@invalid.net>
anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) writes:
> Arndt Jonasson <do-not-use@invalid.net> wrote in comp.lang.perl.misc:
> >
> > Ben Morrow <usenet@morrow.me.uk> writes:
> > > > #$file = join '',<IN>;
> > >
> > > The usual idiom for this is
> > >
> > > $file = do {local $/; <IN>};
> >
> > Newbie question: in what way is the above better than
> > $file = <IN>;
>
> It isn't better, it's entirely different. "$file = <IN>" reads one
> line from the file, "$file = do {local $/; <IN>}" reads them all.
Silly me. 1) I didn't try it out, because I thought I knew what it would
do; 2) for some reason I thought that localized variables keep their
old values, instead of getting "undef". Thanks.
For those who still wonder: $/ (the input line separator) gets set
(locally) to nothing, so the whole file will be considered as one line
when doing <IN>.
------------------------------
Date: Fri, 29 Oct 2004 09:33:42 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: How to handle large variable
Message-Id: <slrnco4l66.153.tadmc@magna.augustmail.com>
Arndt Jonasson <do-not-use@invalid.net> wrote:
> For those who still wonder: $/ (the input line separator) gets set
^^^^
> (locally) to nothing, so the whole file will be considered as one line
^^^^^^^ ^^^^
> when doing <IN>.
Let's not be so loose with terminology; it can lead to confusion...
Calling a thing that is not necessarily a line "line" is a Bad Idea.
A "line" has one \n in it, and it is at the end.
The name of $/ is "input *record* separator", probably for
that very reason. :-)
$/ does not get set to nothing (whatever that means); it
gets set to undef.
... so the whole file will be considered as one *record* when doing <IN>.
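A small illustration of reading by records (file and separator
invented):

    open IN, 'records.dat' or die "open: $!";
    {
        local $/ = "%%\n";            # records end with a "%%" line
        while ( my $record = <IN> ) {
            chomp $record;            # chomp removes $/, not just "\n"
            # ... process one record ...
        }
    }                                 # old $/ restored here
    close IN;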
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Fri, 29 Oct 2004 08:05:25 -0400
From: James Willmore <jwillmore@adelphia.net>
Subject: Re: IDEs
Message-Id: <V_SdnarV_9B6rB_cRVn-gA@adelphia.com>
daniel kaplan wrote:
<snip>
> Oh, I don't know how or who keeps up the FAQ, but at least two of the links
> to IDEs found via perldoc -q "IDE" were links that were either dead or
> didn't take me to the IDEs they claimed they would...
There's a FAQ for that :-) 'perldoc perlfaq'
HTH
Jim
------------------------------
Date: Fri, 29 Oct 2004 08:43:12 -0500
From: Michael Carman <mjcarman@mchsi.com>
Subject: Re: IDEs
Message-Id: <clthdk$9422@onews.rockwellcollins.com>
daniel kaplan wrote:
> I gave up on Open-Perl-IDE and went back to the AS Komodo IDE. O.P.
> just seemed to give me so many headaches related to the modules (in
> my USE statements) before executing my first line of code.
How does your *editor* cause problems with 'use' statements? (rhetorical)
There are almost as many opinions as there are programmers, but I'll
toss in a recommendation anyway: UltraEdit. I tend to program in a few
different languages at any given time, so I'm not a fan of IDEs that
specialize in a particular language. I prefer a good text editor that's
customizable and extensible.
UltraEdit isn't free, but it's very inexpensive. It takes a little
tweaking, but it can be made into a very good environment. (tweak the
syntax highlighting, add in ctags, add some custom commands for checking
syntax, running, debugging, checking in/out of source control, etc.)
-mjc
------------------------------
Date: Fri, 29 Oct 2004 09:24:02 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: modifying hash key (dispatch table)
Message-Id: <slrnco4kk2.153.tadmc@magna.augustmail.com>
Arndt Jonasson <do-not-use@invalid.net> wrote:
>
> Tad McClellan <tadmc@augustmail.com> writes:
>> Jeff Thies <jeff@spamalanadingong.com> wrote:
>> >>>my $sub_ref=$DISPATCH{modify_hash_key};
>> >>>&$sub_ref($hashref);
>>
>> BTW: I would much prefer:
>>
>> $sub_ref->($hashref);
>>
>> for code dereferencing or, even "better":
>>
>> $DISPATCH{modify_hash_key}->($hashref); # look Ma! No temp variable!
>
> Why are the versions with "->" preferable? Is it purely a matter of style?
Yes.
My personal style is to never use ampersands on sub calls, whether
called directly or via a coderef; I always use parens on sub calls
instead (even when they take no args).
The 2nd one has the added benefit of eliminating a temporary variable.
What's preferable about eliminating temp vars IMO:
1) I can't type the wrong variable name
(ie. less chance for me to insert a bug)
2) I don't have to use up one of my (human) memory slots with
"I'm using $sub_ref for something, don't use it for anything else"
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 29 Oct 2004 12:26:16 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Net::POP3 Install
Message-Id: <cltct8$c1i$3@mamenchi.zrz.TU-Berlin.DE>
daniel kaplan <nospam@nospam.com> wrote in comp.lang.perl.misc:
No attribution (again). This was in reply to Jürgen Exner.
> > Who terminates whom? What do you mean by 'terminate'. Be precise in what
> you
> > are saying. We have no way of knowing what you were thinking when you
> wrote
> > this unless you tell us.
>
> ok, first off, enough....if you want to reply and insert a snap at the same
> time, save us both the headache. if no one answers, then fine as well....
No Sir. You don't get to say what kind of replies you prefer. As
long as you keep wasting everybody's time with badly posed and inane
questions, you're going to be criticized for it. By the few that still
read your stuff, that is...
Anno
------------------------------
Date: Fri, 29 Oct 2004 09:20:17 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: OT: perl errors
Message-Id: <isrgd.27224$rs5.1067139@news20.bellglobal.com>
"James Willmore" <jwillmore@adelphia.net> wrote in message
news:r46dnZdeIIXUXhzcRVn-hw@adelphia.com...
> Jeff Thies wrote:
>>
>> use CGI::Carp 'fatalsToBrowser';
>>
>
> #you forgot the 'qw' in the original script
> use CGI::Carp qw/fatalsToBrowser/;
>
Which isn't going to make any difference...
Matt
------------------------------
Date: Fri, 29 Oct 2004 13:42:17 GMT
From: Jeff Thies <jeff@spamalanadingong.com>
Subject: Re: OT: perl errors
Message-Id: <ZMrgd.13497$ta5.12572@newsread3.news.atl.earthlink.net>
Paul Lalli wrote:
> Laura wrote:
>
>> Paul Lalli wrote:
>>
>>> Laura wrote:
>>>
>>>> Jeff Thies wrote:
>>>>
>>>>
>>>>> I'm used to doing something like this when I'm debugging scripts:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> use CGI::Carp 'fatalsToBrowser';
>>>>>
>>>>> my $a
>>>>>
>>>>> use NO_MODULE_FOUND;
>>>>>
>>>>> I would get explicit error messages about the missing semicolon and
>>>>> that
>>>>> perl couldn't find the module in @INC.
>>>>>
>>>>> The web server (perl 5.8, Enterprise Redhat,Apache/2.0.46) I'm running
>>>>> now does not throw those nice explicit errors. What I get now is
>>>>> just a
>>>>> plain 500 server error.
>>>>
>>>>
>>>>
>>>> That means that the problem is probably in your HTML and not your Perl
>>>> script.
>>>
>>>
>>> Uhm. Huh? What HTML? The above program didn't generate any HTML at
>>> all. Obviously there is 'a problem' with the Perl script - that's the
>>> whole point. However, 'the problem' - that is, that the browser is not
>>> showing the compilation errors - has nothing to do with the Perl script,
>>> and certainly nothing to do with any non-existent HTML. It has to do
>>> with either his server configuration and/or the permissions set on the
>>> file or directory.
>>>
>>>
>>>
>>>> You could look at the server log or try this: Push your cgi
>>>> function statements into an array and then print them out all together.
>>>> This way you can also include a statement to save them to a file and
>>>> look
>>>> at the bad html that led to the 500 error.
>>>
>>>
>>> "Bad HTML" does not lead to a 500 error. This is nonsensical. A 500
>>> is, by definition "An Internal Server Error". There is no HTML
>>> involved. Indeed, if the program manages to correctly output HTML,
>>> chances are the server did not encounter errors executing the program.
>>> "Bad HTML" will simply cause the browser to not render the webpage
>>> correctly.
>>>
>>> All of this is irrelevant, of course, because again, the program listed
>>> above did not create any HTML.
>>>
>>> The only correct solution to this problem is to view the errors stored
>>> in the OP's server log.
>>>
>>
>> Oh, I assumed that maybe he cut out some CGI.pm generated HTML. I may be
>> wrong, but I think that if the header is not formed properly, you can get
>> the 500 internal server error. For example, if you forget:
>>
>> print header();
>>
>> and then you print anything at all in a cgi program, the server may
>> have a
>> problem with the header being absent or wrong. I am only proposing
>> this as
>> a possibility because I had the same exact problem when I tried to do my
>> header by hand and when I went with header(), the 500 error went away.
>
>
> Failing to print an HTTP header is indeed a possible cause of a HTTP 500
> error. However, 1) CGI::Carp prints its own header when fatalsToBrowser
> is invoked, to avoid such a scenario; 2) the HTTP header is not HTML;
> 3) The OP said he was getting this error with just the small example
> he posted above.
Well, I love fatalsToBrowser during development and I was just trying
to create some compilation errors for the example.
BTW, I had tried printing a header right after the shebang line, but it
made no difference, nor should it have.
Note my reply to Gunnar. There is a bug in CGI::Carp 1.24 where it works
with software errors but not compilation errors, at least not in my
environment. I've read the reason for that, but it makes little sense to me!
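One workaround along those lines (a sketch, untested; real_work.pl is
an invented name): keep the real code in a second file and pull it in
at run time, so a compile error there becomes a run-time error that
fatalsToBrowser can catch:

    #!/usr/bin/perl -w
    use CGI::Carp qw(fatalsToBrowser);
    # loaded at run time, after fatalsToBrowser is installed, so a
    # compile failure in real_work.pl is reported via CGI::Carp
    require './real_work.pl';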
Cheers,
Jeff
>
> Paul Lalli
------------------------------
Date: Fri, 29 Oct 2004 14:20:02 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: OT: perl errors
Message-Id: <mksgd.123$KL4.18@trnddc07>
Jeff Thies wrote:
> Note my reply to Gunnar. There is a bug in Carp 1.24 where it works
> with software errors but not compilation errors, at least not in my
> environment. I've read the reason for that, but it makes little sense
But who would deploy a CGI script to a web server if the script doesn't
even pass a simple "perl -c"?
I recognize that there are scenarios where it would be too difficult to
duplicate a complex web service environment with databases etc. locally and
where you have to test on the server. But there is no reason not to compile
the script locally before uploading it.
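That check is nothing more than:

    perl -c script.pl

which prints "script.pl syntax OK" when compilation succeeds (the
filename is a placeholder, of course).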
jue
------------------------------
Date: Fri, 29 Oct 2004 10:50:49 GMT
From: burlo_stumproot@yahoo.se
Subject: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <uacu5u7at.fsf@notvalid.se>
I'm finding myself in a position where I have to extract data from a
file possibly filled with a lot of other junk of unknown length and
format.
The data has a strict format, a header line followed by lines of data
that goes on for a fixed number of lines in some cases and in other
cases until the next header line.
My problem is that the data can at any point contain one or more lines
with junk/data I don't want. It looks like the data is collected from
an output device that listens to more than one application. (And I
can't do anything about that.) Some (or most) of the junk can be easily
identified as such and can be removed, but how to deal with the rest?
I'm not looking for code examples but rather advice on how to solve a
problem like this in a robust and secure way.
Currently I'm doing multiple passes over the data, removing the obvious
junk first. I then try to piece together the data by looking ahead in
the file (if I don't find what I expect), trying to find a line that
matches the line I want. It works most of the time, but I'm concerned
about the validity of the data and would of course want it to work all
the time.
Another problem is that I don't know how much data I will receive in
one file, so it's hard to know if I missed anything.
Some short data examples:
<example> # Can't know how many lines this block will contain
0000 TFS001
000 TERM 00000 0000001 00001 00000 0000043 00053 S
005 TERM 00000 0000000 00000 00000 0000000 00000
006 TDMF 00000 0000000 00000 00000 0000048 01305
007 CONF 00000 0000000 00000 00000 0000000 00000
009 TERM 00000 0000000 00000 00000 0000005 00006
PRI265 DCH: 9 DATA: Q+P NOXLAN 47000 99000 0
010 TERM 00000 0000001 00002 00000 0000107 00120
021 TDMF 00000 0000000 00000 00000 0000040 00797
022 CONF 00000 0000000 00000 00000 0000000 00004
TRK136 93 11
023 TERM 00000 0000001 00002 00000 0000041 00041 S CARR
024 TERM 00000 0000000 00000 00000 0000007 00006
</example>
<example> # Block is 9 lines, line nr of data added, the rest is junk
1: 030 RAN
2:
3: 00002 00002
BUG440
BUG440 : 00AC76B2 00001002 00008018 00004913 0000 19 0001 001
000 0 73168 000020A5 00006137 00000008 00000000 0000 0001 000
BUG440 + 0471C390 044C8418 044C5340 044C5016 04366226
<<<< Here there can be many more lines like these >>>>
BUG440 + 04365EB2 04365E10 0435E0A8 04B486AA 04B4837A
BUG440 + 04B48306
4:
5: 0000000 00000
6: 0000000 00003
7: 00000 00000
8: 00000
9: 0000000 00000
</example>
In one file I found what appears to be a login session complete with
commands and output. *sigh*
Any help, pointers, reading suggestions???
/PM
From address valid but rarely read.
------------------------------
Date: 29 Oct 2004 11:19:00 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <clt8v4$c1i$1@mamenchi.zrz.TU-Berlin.DE>
<burlo_stumproot@yahoo.se> wrote in comp.lang.perl.misc:
>
>
> I'm finding myself in a position where I have to extract data from a
> file possibly filled with a lot of other junk of unknown length and
> format.
>
> The data has a strict format, a header line followed by lines of data
> that goes on for a fixed number of lines in some cases and in other
> cases until the next header line.
>
> My problem is that the data can at any point contain one or more lines
> with junk/data I don't want. It looks like the data is collected from
> an output device that listens to more than one application. (And I
> can't do anything about that.) Some (or most) of the junk can be easily
> identified as such and can be removed, but how to deal with the rest?
Are you sure that only complete lines can intervene? In general,
one process can overwrite parts of what another process writes to
the same file.
> I'm not looking for code examples but rather advice on how to solve a
> problem like this in a robust and secure way.
That makes it a non-perl question.
There is no such way. The intervening junk could happen to look exactly
like a valid line of data. If you don't have means to check the validity
of a data block you (think you) received, you'll never know.
I snipped your example data below. Since you haven't explained how
to tell valid lines from intervening ones, there is nothing we can
learn from it.
If you can control the output of "good" data, you could add line counts
or checksums and other means of ensuring data integrity. That way
you would at least *know* if data is corrupted.
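A sketch of the checksum idea, with an invented trailer format (the
writer appends one after each block; the reader recomputes and
compares):

    use Digest::MD5 qw(md5_hex);
    my @block = ("000 TERM 00000 0000001\n",    # sample data lines
                 "005 TERM 00000 0000000\n");
    # md5_hex() concatenates its arguments before digesting
    my $trailer = sprintf "##%s %d\n", md5_hex(@block), scalar @block;
    print @block, $trailer;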
If you can't control the output, reasonable data processing is impossible
in that environment.
Anno
------------------------------
Date: Fri, 29 Oct 2004 21:34:18 +1000
From: "Lord Ireland" <shamus@hushmail.com>
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <41822abc_1@news.iprimus.com.au>
<burlo_stumproot@yahoo.se> wrote in message
news:uacu5u7at.fsf@notvalid.se...
>
>
> I'm finding myself in a position where I have to extract data from a
> file possibly filled with a lot of other junk of unknown length and
> format.
>
> [remainder of the article, including the data samples and signature,
> quoted in full -- snipped]
Look mate, using perl for this is an act of insanity. Visual Basic is much
better at this kind of stuff - here's a helpful article on how to purchase
the 2003 version.
http://msdn.microsoft.com/howtobuy/vbasic/default.aspx
------------------------------
Date: 29 Oct 2004 11:42:52 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <cltabs$c1i$2@mamenchi.zrz.TU-Berlin.DE>
Lord Ireland <shamus@hushmail.com> wrote in comp.lang.perl.misc:
>
> <burlo_stumproot@yahoo.se> wrote in message
> news:uacu5u7at.fsf@notvalid.se...
> >
> >
> > I'm finding myself in a position where I have to extract data from a
> > file possibly filled with a lot of other junk of unknown length and
> > format.
[...]
> Look mate, using perl for this is an act of insanity. Visual Basic is much
> better at this kind of stuff - here's a helpful article on how to purchase
> the 2003 version.
Care to explain how VB can solve the problem but Perl can't? And why
you had to quote the entire article just to add this nonsense?
Anno
------------------------------
Date: Fri, 29 Oct 2004 13:48:35 +0200
From: "D. Marxsen" <detlef.marxsen@tdds-gmbz.de (replace z with h, spam protection)>
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <cltb3e$k41$1@news1.transmedia.de>
<burlo_stumproot@yahoo.se> schrieb im Newsbeitrag
news:uacu5u7at.fsf@notvalid.se...
> The data has a strict format, a header line followed by lines of data
> that goes on for a fixed number of lines in some cases and in other
> cases until the next header line.
Maybe you can give some more precise info on how you recognise a valid
header or data line (x alphas here, x nums there, x blocks of y nums
here, etc.). This may help in finding a regexp which weeds out
non-matching lines.
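For instance, guessing wildly at the format from the posted sample
(three digits, a four-letter tag, then five-digit columns), though the
real rule may well differ:

    my $data_line = qr/^\d{3}\s+[A-Z]{4}\s+\d{5}\b/;
    while (<IN>) {                # IN: assumed already open on the file
        next unless $_ =~ $data_line;
        # ... looks like a data line; keep it ...
    }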
Cheers,
Detlef.
--
D. Marxsen, TD&DS GmbH
detlef.marxsen@tdds-gmbz.de (replace z with h, spam protection)
------------------------------
Date: Fri, 29 Oct 2004 08:01:59 -0400
From: James Willmore <jwillmore@adelphia.net>
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <V_SdnavV_9CBrB_cRVn-gA@adelphia.com>
burlo_stumproot@yahoo.se wrote:
<snip>
> <example> # Block is 9 lines, line nr of data added, the rest is junk
> 1: 030 RAN
> 2:
> 3: 00002 00002
> BUG440
> BUG440 : 00AC76B2 00001002 00008018 00004913 0000 19 0001 001
> 000 0 73168 000020A5 00006137 00000008 00000000 0000 0001 000
> BUG440 + 0471C390 044C8418 044C5340 044C5016 04366226
> <<<< Here there can be many more lines like these >>>>
> BUG440 + 04365EB2 04365E10 0435E0A8 04B486AA 04B4837A
> BUG440 + 04B48306
>
> 4:
> 5: 0000000 00000
> 6: 0000000 00003
> 7: 00000 00000
> 8: 00000
> 9: 0000000 00000
> </example>
>
>
> In one file I found what appears to be a login session complete with
> commands and output. *sigh*
>
>
>
> Any help, pointers, reading suggestions???
Know your data. Know why one line is valid and another isn't. The
data may appear to have no "logic" or "pattern" to it, but it's
there somewhere.
The first place I might start is to either split the line on
whitespace or use unpack to get at least the first column. Then start
testing for what is required for a valid line. That's at first glance,
and without having any clue as to what the data is supposed to
be/represent.
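Something like this, as a first pass (the tests are placeholders for
whatever the real rules turn out to be):

    while ( my $line = <IN> ) {     # IN: assumed open on the data file
        my @fields = split ' ', $line;           # split on whitespace
        next unless @fields and $fields[0] =~ /^\d{3}$/;
        # or, for fixed-width data: my ($first) = unpack 'A3', $line;
        # ... now test the remaining columns ...
    }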
HTH
Jim
------------------------------
Date: Fri, 29 Oct 2004 09:38:42 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <slrnco4lfi.153.tadmc@magna.augustmail.com>
Lord Ireland <shamus@hushmail.com> wrote:
><burlo_stumproot@yahoo.se> wrote in message
> news:uacu5u7at.fsf@notvalid.se...
>>
>>
>> I'm finding myself in a position where I have to extract data from a
>> file possibly filled with a lot of other junk of unknown length and
>> format.
> Look mate, using perl for this is an act of insanity.
Why is that?
> Visual Basic is much
> better at this kind of stuff - here's a helpful article on how to purchase
> the 2003 version.
You should put in a smiley when you make a joke.
Otherwise people might think you are being serious.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Fri, 29 Oct 2004 15:57:40 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Parsing 'dirty/corrupt data'. Advice wanted
Message-Id: <Pine.LNX.4.61.0410291554590.20620@ppepc56.ph.gla.ac.uk>
On Fri, 29 Oct 2004, Lord Ireland scribbled furiously:
[comprehensive quote of problem, including signature, now removed]
> Look mate, using perl for this is an act of insanity. Visual Basic
> is much better at this kind of stuff - here's a helpful article on
> how to purchase the 2003 version.
Damn! Now my irony detector is in ruins. Where am I supposed to
get a replacement, this late in the week? Have a care for your fellow
usenauts, please.
------------------------------
Date: Fri, 29 Oct 2004 13:13:38 +0200
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Should I use BEGIN, CHECK, or INIT?
Message-Id: <m964o0pqec6mb9b2shv84u3gbids18n6rf@4ax.com>
On 28 Oct 2004 20:26:23 GMT, Abigail <abigail@abigail.nl> wrote:
>It's not entirely clear to me when (or if) an INIT in a module is run.
>Is that just before runtime of the main program, or just before runtime
>of the module?
Wow! God is dead, Marx is dead and... there's something not entirely
clear to Abigail too!! (about Perl, that is!)
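For what it's worth, perlmod says INIT blocks queue up during
compilation and all run once, FIFO, just before the main program's run
time; one compiled after run time has begun (say, in a module loaded
with require) is skipped with a "Too late to run INIT block" warning.
A tiny demonstration:

    BEGIN { print "BEGIN - runs as soon as it is compiled\n" }
    INIT  { print "INIT - after compilation, before run time\n" }
    print "run time proper\n";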
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
------------------------------
Date: Fri, 29 Oct 2004 13:13:15 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Should I use BEGIN, CHECK, or INIT?
Message-Id: <Llrgd.28551$df2.23031@edtnps89>
Michele Dondi wrote:
> On 28 Oct 2004 20:26:23 GMT, Abigail <abigail@abigail.nl> wrote:
>
>>It's not entirely clear to me when (or if) an INIT in a module is run.
>>Is that just before runtime of the main program, or just before runtime
>>of the module?
>
> Wow! God is dead, Marx is dead and... there's something not entirely
> clear to Abigail too!! (about Perl, that is!)
Maybe he was stunned when he heard the news that the Red Sox won the World Series?
John
--
use Perl;
program
fulfillment
------------------------------
Date: Fri, 29 Oct 2004 16:29:35 +0200
From: stig erikson <_nospam_stigerikson@yahoo.se>
Subject: speeding up perl script execution under apache
Message-Id: <Ausgd.10125$1p.8365@nntpserver.swip.net>
Hello.
Even though this might be more of an apache question i will try here.
We have a few perl scripts in production, they run in a web service
environment. We use apache 2 with the mod_cgi module that came with
apache on redhat. We have performance problems.
It is run on a rather slow Pentium II 200Mhz server and it will be
change in some not too far future but we would like to do something
meanwhile.
The entire script (from first to last line) takes about 0.05-0.1 second
to run. When we run it on apache it will take around 2.5 seconds to
execute the entire script.
The overhead from executing perl and "compiling" the script is almost
2.5 seconds. This is the problem. (serving static pages is fast, so the
problem is execution of scripts).
The total execution time ir rather consistant no matter what script is
run. For all scripts we run the overhead time is approximately 2.5 seconds.
We would like to minimize the overhead to gain time.
Preferrably we dont want to change the scripts.
Is there something we can do cut down the overhead time?
Make perl start faster och make the compilation faster?
Even a cut of the overhead from 2.5 to 2.0 seconds would help.
Thank you
Stig
------------------------------
Date: 29 Oct 2004 14:41:07 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: speeding up perl script execution under apache
Message-Id: <Xns95916CB2B69ACasu1cornelledu@132.236.56.8>
stig erikson <_nospam_stigerikson@yahoo.se> wrote in
news:Ausgd.10125$1p.8365@nntpserver.swip.net:
> Hello.
> Even though this might be more of an apache question I will try here.
Rather, you asked it in the right place.
> The overhead from executing perl and "compiling" the script is almost
> 2.5 seconds. This is the problem. (serving static pages is fast, so
> the problem is execution of scripts).
Then you should look into using mod_perl:
http://perl.apache.org/
http://perl.apache.org/docs/1.0/guide/getwet.html#Porting_Existing_CGI_Scripts_to_run_under_mod_perl
> We would like to minimize the overhead to gain time.
> Preferrably we dont want to change the scripts.
There is no silver bullet. More memory ought to help.
There is really not that much else we can say here. This is a programming
group.
Sinan.
------------------------------
Date: Fri, 29 Oct 2004 16:43:54 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: speeding up perl script execution under apache
Message-Id: <2uf37lF2aaajsU1@uni-berlin.de>
stig erikson wrote:
> The overhead from executing perl and "compiling" the script is almost
> 2.5 seconds.
<snip>
> We would like to minimize the overhead to gain time. Preferably we
> don't want to change the scripts. Is there something we can do to cut
> down the overhead time?
You are screaming "mod_perl". Assuming that the scripts are well
written, there is at least not much you need to change to achieve a
significant performance improvement through mod_perl.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Fri, 29 Oct 2004 14:54:56 -0000
From: Tim Hammerquist <tim@vegeta.ath.cx>
Subject: Re: speeding up perl script execution under apache
Message-Id: <slrnco4m93.ugn.tim@vegeta.saiyix>
stig erikson <_nospam_stigerikson@yahoo.se> wrote:
> We have a few perl scripts in production; they run in a web service
> environment. We use apache 2 with the mod_cgi module that came with
> apache on redhat. We have performance problems.
> It is run on a rather slow Pentium II 200MHz server and it will be
> changed in the not too distant future but we would like to do something
> meanwhile.
You don't state how much RAM your server has, but this will definitely
affect startup time of new processes (CGI scripts). I configured
a Celeron 600MHz server with 32MB of RAM. It was blazing fast on static
pages, but really suffered through CGI benchmarks. The more CGI
processes or, indeed, dynamic pages *period*, the more important RAM
becomes.
> The entire script (from first to last line) takes about 0.05-0.1 second
> to run. When we run it on apache it will take around 2.5 seconds to
> execute the entire script.
> The overhead from executing perl and "compiling" the script is almost
> 2.5 seconds. This is the problem. (serving static pages is fast, so the
> problem is execution of scripts).
> The total execution time is rather consistent no matter what script is
> run. For all scripts we run the overhead time is approximately 2.5 seconds.
>
> We would like to minimize the overhead to gain time.
> Preferably we don't want to change the scripts.
> Is there something we can do to cut down the overhead time?
> Make perl start faster or make the compilation faster?
> Even a cut of the overhead from 2.5 to 2.0 seconds would help.
As mentioned elsewhere, look at mod_perl. If your scripts obey the
conventions for safe, stable CGI/Perl applications, the modifications
necessary to run them under mod_perl should be little to none at all.
mod_perl may have a bit of a delay on the first execution of each
script, but each execution thereafter should be much faster, as each
script is left pre-compiled, in-memory.
mod_perl was, after all, designed specifically for your predicament.
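For the record, a minimal mod_perl 1.x setup looks something like this
(paths invented; the mod_perl guide linked elsewhere in this thread has
the real details):

    # httpd.conf: serve /perl/* through Apache::Registry, which compiles
    # each CGI script once and keeps it in memory between requests
    Alias /perl/ /var/www/perl/
    <Location /perl/>
        SetHandler  perl-script
        PerlHandler Apache::Registry
        Options     +ExecCGI
        PerlSendHeader On
    </Location>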
HTH,
Tim Hammerquist
------------------------------
Date: Fri, 29 Oct 2004 07:02:43 -0400
From: Fred <noemail@#$&&!.net>
Subject: Re: using Win32::ODBC - what's fast?
Message-Id: <pan.2004.10.29.11.02.43.61753@#$&&!.net>
On Thu, 28 Oct 2004 22:30:14 -0400, Matt Garrish wrote:
> Are all your tables indexed on the select column to maximize for speed?
They are indexed on that column, but good point.
> Why are you reading all 70,000 unique ids into memory?
I thought it would be faster than doing it each time. Even with the index,
it just *seemed* expensive, but I see what you mean about the connection
(and the where clause!!).
>It would be faster
> just to execute a select statement each time you need to check if an sku
> already exists. I've never used the Win32::ODBC module, but if you were
> using the DBD driver you could just check whether the unit id exists
> like so: (untested)
>
> sub check_exists {
>
> my ($tid, $unitid) = @_;
>
> my $sel_sth = $dbh->prepare("SELECT sku from $tid WHERE sku = ?")
> or die $dbh->errstr();
>
> $sel_sth->execute($unitid) or die $dbh->errstr();
>
> if ($sel_sth->fetchrow_array) {
> $sel_sth->finish();
> return 1;
> }
>
> return 0;
>
> }
>
> Remember that most of the overhead involved in database access is in
> setting up the connection. Once you have that connection, querying the
> database should be very fast (assuming your data is well structured and
> indexed).
>
> Matt
I see now that including the where clause leverages the power of the
index... and individual selects start looking a lot better. Can't wait
to test, and thanks very much for your insight! Plus I'll try out the
driver you mention and test the diff. In the beginning, when I was
learning perl (circa 6 months ago), Win32::ODBC was the first thing I
found for accessing MS DBs... so I learned it and never looked back. Bad
practice in 'theory', but in real life people want things done
yesterday.... Thank you again!
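For the archives, the prepare-once/execute-many version of that with
DBI might look like the following (DSN, table and ids invented;
untested):

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:ODBC:MyDSN', 'user', 'pass',
                           { RaiseError => 1 });
    # prepared once, so the statement is parsed a single time ...
    my $sth = $dbh->prepare('SELECT sku FROM products WHERE sku = ?');
    for my $sku (qw(A100 B200 C300)) {    # ... and executed per id
        $sth->execute($sku);
        my ($found) = $sth->fetchrow_array;
        $sth->finish;
        print "$sku exists\n" if defined $found;
    }
    $dbh->disconnect;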
Fred
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 7339
***************************************