[23767] in Perl-Users-Digest


Perl-Users Digest, Issue: 5971 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Dec 23 00:05:44 2003

Date: Mon, 22 Dec 2003 21:05:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 22 Dec 2003     Volume: 10 Number: 5971

Today's topics:
        html-->text, keep line breaks, best strategy is? <henryn@zzzspacebbs.com>
    Re: Ordering arrays? <shondell@cis.ohio-state.edu>
    Re: Please critique this short script that scans a log  <jwngaa@att.net>
    Re: Please critique this short script that scans a log  <krahnj@acm.org>
    Re: Please critique this short script that scans a log  <dwall@fastmail.fm>
    Re: Q about a module containing more than one class <usenet@morrow.me.uk>
    Re: recursive closures? <usenet@morrow.me.uk>
    Re: What CPU/Memory does a Wait cost? <test@test.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 17 Dec 2003 19:35:41 GMT
From: Henry <henryn@zzzspacebbs.com>
Subject: html-->text, keep line breaks, best strategy is?
Message-Id: <BC05F00C.19A72%henryn@zzzspacebbs.com>

Folks:

Here's a problem I encountered and what I'm doing about it.  I'd welcome
suggestions about how best to solve issues like this.  Please do _not_ spend
time solving the specific problem -- just help me understand the  best
approaches to such issues.

The problem:  I'm sniffing the text in some HTML using Perl.   The only way
(believe me!) to do the job is scan from line breaks, so I need to preserve
these.  It looks like everything I need to sniff is preformatted (between
<pre> and </pre> tags) with \r (carriage returns) at just the right places.

Thus far, I've been prototyping using an executable binary filter called
"html2text".  I can certainly continue using this utility, but this
approach..... hmmm, lacks a certain elegance.  Let's do this _all_ in Perl!

In short, I need to extract text from HTML, preserving breaks.

1. First trip to CPAN.  Using HTML::TreeBuilder followed by HTML::FormatText
as documented at 

  http://search.cpan.org/~sburke/HTML-Format-2.03/lib/HTML/FormatText.pm

seems to do the trick, _except_ that all the output is one long byte stream
-- no breaks at all.   I fooled around with the HTML::FormatText parameters
"leftmargin" and "rightmargin".   No effect.

I briefly looked at the source for clues.  No other options, and it seems
capable of outputting newlines ("\n") --  I think -- but my Perl skills are
not sufficient to be sure, and under what conditions.

Hack a CPAN module?   Not at my skill level.

2. Second trip to CPAN.  HTML::Parser is different, but it looks
significantly more complex/advanced than the modules above.  I'm not good
at OO technology and I can't even tell if it will do the trick.

3. Googling, I found a classic Tom Christiansen script called "striphtml" at
 
   http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz

But it seems to be designed in early 1996 for HTML 2.0.   Instructive, for
sure, but doesn't look like a good bet.

4.  My colleague, A.D., suggests a real hack:  preprocess the html files,
jamming in a unique tag (maybe "xyzzy"?) at line breaks.  Easy to
reconstitute at a later pass.   Sure, this will work, but .... lacks a
certain elegance, and doesn't teach me much about Perl.

5. Googled up a 2000/07/02 post to this group that fully quotes a Perl
Journal article on HTML::Parse.  If I understand this article correctly, not
only can I do what I need, but I'm going to understand more perl subtleties
in the bargain.  Thanks to the author, Ken MacFarlane!

As it stands, HTML::Parse seems my best bet.  Comments?

Is it fairly typical to load a couple of different modules, trying them on
for size until the best fit is found?

Any particular penalty besides disk space used for leaving unused modules
lying around?

Thanks,

Henry

henryn@zzzspacebbs.com   remove 'zzz'



  



------------------------------

Date: 17 Dec 2003 17:01:44 -0500
From: Ryan Shondell <shondell@cis.ohio-state.edu>
Subject: Re: Ordering arrays?
Message-Id: <xcw7k0vdrxz.fsf@psi.cis.ohio-state.edu>

Gunnar Hjalmarsson <noreply@gunnar.cc> writes:

> A lexical sort of numbers? Not good advice, is it? (It happens to
> give the same result as a numerical sort applied on the above data,
> since all the values are integers with the same number of digits.)

You're right, of course. I imagine my brain saw the quotes around the
numbers, and went on "lexical autopilot". :-)

@number2 = sort {$b <=> $a} @number1;

is a much better solution.
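
To make the difference concrete, here is a small sketch. The sample values
are made up for illustration (they are not the data from the thread) and
deliberately have different digit counts, which is where a lexical sort
goes wrong:

```perl
use strict;
use warnings;

# Hypothetical data: differing digit counts expose the problem.
my @number1 = (9, 100, 25, 3);

# The default sort compares values as strings, so "100" sorts before "25".
my @lexical = sort @number1;

# Numeric comparison, descending, as in the line above.
my @number2 = sort { $b <=> $a } @number1;

print "lexical: @lexical\n";   # lexical: 100 25 3 9
print "numeric: @number2\n";   # numeric: 100 25 9 3
```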

Ryan
-- 
perl -e '$;=q,BllpZllla_nNanfc]^h_rpF,;@;=split//,
$;;$^R.=--$=*ord for split//,$~;sub _{for(1..4){$=
=shift;$=--if$=!=4;while($=){print chr(ord($;[$%])
+shift);$%++;$=--;}print " ";}}_(split//,$^R);q;;'


------------------------------

Date: Wed, 17 Dec 2003 13:55:11 -0600
From: J.W. <jwngaa@att.net>
Subject: Re: Please critique this short script that scans a log file
Message-Id: <3uc1uv0uscqhfbde1b0lcsa6h0snc8t9rp@4ax.com>

On Wed, 17 Dec 2003 18:45:52 GMT, Uri Guttman <uri@stemsystems.com>
wrote:

>>>>>> "JW" == J W <jwngaa@att.net> writes:
>
>  > #!/usr/bin/perl
>
>you said you ran it with -w but where is it now? you should keep it on
>even in production code

I developed and debugged the script in Windows and did "perl -w
myscript.pl" in that environment.  The "#!/usr/bin/perl" was for
running in AIX and it didn't occur to me to do "#!/usr/bin/perl -w".
Good idea.

>	if ( /$regex_ctime/ ) {
>
>  > 			$ts1_started = timegm($6, $5, $4, $3,
>  > $MonthIndex{$2}, $7);
>  > 			$ts1 = "$1 $2 $3 $4:$5:$6 $7";
>
>
>having that many grabbed things without names is scary. what if you
>change something? it might be better to assign them to a list of vars
>and use those:
>
>	if ( my( $var1, $var2, $foo, $blah ) = /$regex_ctime/ ) {
>
>		$ts1_started = timegm($var1, $var2, $foo, $blah, ....

This is a good idea.  Code that is self-documenting like this is
great.
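
For reference, a rough sketch of how that could look with all seven
captures named. The pattern in $regex_ctime and the contents of
%MonthIndex are assumptions here, since neither is shown in this thread,
and the variable names are only suggestions:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumed month lookup; the script's own %MonthIndex is not shown.
my @names = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %MonthIndex = map { $names[$_] => $_ } 0 .. $#names;

# Assumed pattern for ctime-style stamps like "Mon Dec 17 13:18:12 2001".
my $regex_ctime = qr/(\w{3}) (\w{3}) +(\d{1,2}) (\d\d):(\d\d):(\d\d) (\d{4})/;

my $line = 'Time Started:      Mon Dec 17 13:18:12 2001';

my ( $ts1, $ts1_started );
# The list assignment is false when the match fails, so it can drive the if.
if ( my ( $wday, $mon, $mday, $hour, $min, $sec, $year ) = $line =~ /$regex_ctime/ ) {
    $ts1_started = timegm( $sec, $min, $hour, $mday, $MonthIndex{$mon}, $year );
    $ts1 = "$wday $mon $mday $hour:$min:$sec $year";
}

print "$ts1 -> $ts1_started\n";
```

If a capture group is later added or moved in the pattern, only the one
list of names has to change, not every $N scattered through the body.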




------------------------------

Date: Wed, 17 Dec 2003 20:21:50 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: Please critique this short script that scans a log file
Message-Id: <3FE0BAD6.83DCDCB1@acm.org>

"J.W." wrote:
> 
> I'm new to Perl and wrote a simple script to scan a database log file
> with embedded timestamps.  This is similar to scanning other common
> text files with timestamped entries such as web server utilization logs,
> transaction logs, etc.
> 
> This script already works correctly and the -w doesn't report any
> warnings.  However, I'd like to get any feedback on making it better.
> My experience has been with C++ and ksh/awk, so I'm not sure my brain
> is solving problems the "Perl" way yet.
> 
> [snip code]
> 
> #example log file excerpt used as input
> ********************************************************************************
> Time Started:      Mon Dec 17 13:18:12 2001
> 
> Parameters Passed:
>   db2uext2 -OSAIX -RLSQL07010 -RQARCHIVE -DBKP1 -NNNODE0000
> -LP/db2/KP1/db2kp1/NODE0000/SQL00001/SQLOGDIR/ -LNS0007259.LOG
> System Action:     ARCHIVE
> Target:            DISK
> RC           :     0
> Time Completed:    Mon Dec 17 13:18:27 2001
> 
> ********************************************************************************
> Time Started:      Mon Dec 17 13:28:01 2001
> 
> Parameters Passed:
>   db2uext2 -OSAIX -RLSQL07010 -RQARCHIVE -DBKP1 -NNNODE0000
> -LP/db2/KP1/db2kp1/NODE0000/SQL00001/SQLOGDIR/ -LNS0007260.LOG
> System Action:     ARCHIVE
> Target:            DISK
> RC           :     654
> Time Completed:    Mon Dec 17 13:28:15 2001
> 
> ********************************************************************************
> Time Started:      Mon Dec 17 13:47:42 2001
> 
> Parameters Passed:
>   db2uext2 -OSAIX -RLSQL07010 -RQARCHIVE -DBKP1 -NNNODE0000
> -LP/db2/KP1/db2kp1/NODE0000/SQL00001/SQLOGDIR/ -LNS0007261.LOG
> System Action:     ARCHIVE
> Target:            DISK
> RC           :     0
> Time Completed:    Mon Dec 17 13:47:50 2001
> 
> ********************************************************************************
> 
> # example output from the Perl script:
> Mon Dec 17 13:18:12 2001  Mon Dec 17 13:18:27 2001    0    15
> Mon Dec 17 13:28:01 2001  Mon Dec 17 13:28:15 2001  654    14
> Mon Dec 17 13:47:42 2001  Mon Dec 17 13:47:50 2001    0     8

If each record is separated by a line of asterisks then you can use that
as the Input Record Separator.  Something like this (untested):

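# Each record ends with a line of asterisks; the tail of that line
# serves as the input record separator.
$/ = "**********\n";
while ( <> ) {
    my ( $start ) = /^Time Started:\s+(.+)/m;
    my ( $rc )    = /^RC\D+(\d+)/m;
    my ( $end )   = /^Time Completed:\s+(.+)/m;
    # Skip chunks with no complete entry, such as the leading separator line.
    next unless defined $start and defined $rc and defined $end;
    # convert_date() is not shown here; it must return epoch seconds.
    my $diff = convert_date( $end ) - convert_date( $start );
    printf "%s  %s %4d   %3d\n", $start, $end, $rc, $diff;
}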
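
A fuller, self-contained sketch of the same record-separator idea follows.
The convert_date() here is only one possible implementation (an assumption,
built on Time::Local and treating the stamps as UTC), and the log is an
abbreviated in-memory sample rather than the real file:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @names = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %month = map { $names[$_] => $_ } 0 .. $#names;

# Hypothetical helper: "Mon Dec 17 13:18:12 2001" -> epoch seconds (UTC).
sub convert_date {
    my ($stamp) = @_;
    my ( $mon, $mday, $h, $m, $s, $year ) =
        $stamp =~ /^\w{3} (\w{3}) +(\d+) (\d+):(\d+):(\d+) (\d{4})/
        or die "unrecognised timestamp: $stamp";
    return timegm( $s, $m, $h, $mday, $month{$mon}, $year );
}

# Abbreviated sample input, read through an in-memory filehandle.
my $log = <<'END';
********************
Time Started:      Mon Dec 17 13:18:12 2001
RC           :     0
Time Completed:    Mon Dec 17 13:18:27 2001

********************
Time Started:      Mon Dec 17 13:28:01 2001
RC           :     654
Time Completed:    Mon Dec 17 13:28:15 2001

********************
END

my @rows;
{
    local $/ = "**********\n";    # record separator, restored at block exit
    open my $fh, '<', \$log or die "cannot open in-memory log: $!";
    while ( my $record = <$fh> ) {
        my ($start) = $record =~ /^Time Started:\s+(.+)/m;
        my ($rc)    = $record =~ /^RC\D+(\d+)/m;
        my ($end)   = $record =~ /^Time Completed:\s+(.+)/m;
        # The first chunk is just the leading separator line; skip it.
        next unless defined $start && defined $rc && defined $end;
        push @rows,
            [ $start, $end, $rc, convert_date($end) - convert_date($start) ];
    }
}

printf "%s  %s %4d   %3d\n", @$_ for @rows;
```

Using local on $/ keeps the changed separator from leaking into any other
code that reads lines later in the program.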



John
-- 
use Perl;
program
fulfillment


------------------------------

Date: Wed, 17 Dec 2003 20:57:35 -0000
From: "David K. Wall" <dwall@fastmail.fm>
Subject: Re: Please critique this short script that scans a log file
Message-Id: <Xns9454A258057F2dkwwashere@216.168.3.30>

David K. Wall <dwall@fastmail.fm> wrote:

> J.W. <jwngaa@att.net> wrote:
> 
>> I'm new to Perl and wrote a simple script to scan a database log
>> file with embedded timestamps.  This is similar to scanning other
>> common text files with timestamped entries such as web server
>> utilization logs, transaction logs, etc.
>> 
> 
> I'm not an especially good teacher*, but I can show you how I
> might have approached it.
> 
> (* Or, I suppose, an especially good programmer -- but I have
> fun.) 

I should probably add:  give Brian McCauley's comments much more weight 
than mine.  :-)

-- 
David Wall


------------------------------

Date: Wed, 17 Dec 2003 19:24:31 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Q about a module containing more than one class
Message-Id: <brqahf$jao$2@wisteria.csv.warwick.ac.uk>


"Ala Qumsieh" <xxala_qumsiehxx@xxyahooxx.com> wrote:
> "Michele Dondi" <bik.mido@tiscalinet.it> wrote in message
> news:d583uvomvch5r79qlnpouoh5p3p8o7qfp6@4ax.com...
> 
> > I have a class that relies on other classes (as suggested in many
> > docs) that are to be used *exclusively* for this, i.e. not as
> > standalone classes.
> >
> > Now I want to put all these classes in the same (separate) module so
> > that only the "main" one is made (explicitly) available to the final
> > user, and my question is whether there's any convention/direction for
> > assigning a name to the "other" classes:

I would tend to name them under the namespace of the main class, so a
util class for My::Class would be called My::Class::Util. It is a
*very* bad idea to use a random top-level name: package names are
global, remember, so you open yourself up to clashes. This applies
whether you put all the classes into one module or split them up into
separate modules that your main module uses, which is often a good idea.
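
As a minimal illustration of that naming scheme, with both classes in one
file (the method names and the behaviour are invented for the example; a
real module file would also end with a true value such as "1;"):

```perl
use strict;
use warnings;

# Helper class, namespaced under the main class so it cannot clash
# with an unrelated top-level package.
package My::Class::Util;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub shout {
    my ( $self, $text ) = @_;
    return uc $text;
}

# Main class: the only one callers are expected to use directly.
package My::Class;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { name => $args{name} }, $class;
    $self->{util} = My::Class::Util->new;
    return $self;
}

sub greeting {
    my ($self) = @_;
    return $self->{util}->shout("hello, $self->{name}");
}

package main;

my $obj = My::Class->new( name => 'world' );
print $obj->greeting, "\n";
```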

> I guess different people would do it differently. I would make it such that
> each module is in a separate file, and would name them such that each '::'
> in the module name corresponds to a real directory hierarchy. You don't have
> to do that of course, but I find it easier to look for files, and understand
> what class uses what. Something like this (untested):

You seem to have missed the whole point of 'require'... anyone who
keeps their Perl modules organised any differently from this is either
a fool or knows something I don't.

-- 
For the last month, a large number of PSNs in the Arpa[Inter-]net have been
reporting symptoms of congestion ... These reports have been accompanied by an
increasing number of user complaints ... As of June,... the Arpanet contained
47 nodes and 63 links. [ftp://rtfm.mit.edu/pub/arpaprob.txt] * ben@morrow.me.uk


------------------------------

Date: Wed, 17 Dec 2003 19:17:46 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: recursive closures?
Message-Id: <brqa4q$jao$1@wisteria.csv.warwick.ac.uk>


tassilo.parseval@post.rwth-aachen.de wrote:
> Also sprach Anno Siegel:
> 
> > Brian McCauley  <nobull@mail.com> wrote in comp.lang.perl.misc:
> >> Now, of course, it is an annoying misfeature that local() can't be
> >> applied to lexical variables so if you want to use local() you are
> >> forced to use package variables.  ...
> > 
<snip>
> > I agree that the inability to use local() with lexicals is sometimes
> > annoying.  It wouldn't be good style to use it all the time, but
> > occasionally the dynamic restoration behavior of local() is just what
> > the doctor ordered.
> 
> Actually there is a nice solution available for the given problem. our()
> has very similar scoping rules to my(). As long as the self-referential
> closure isn't meant to span over several packages, it should work as
> desired because now we can in fact localize.

No, because this variable is now accessible as $Package::var from
anywhere in the dynamic scope of the local statement, which opens you
up to strange-action-at-a-distance effects again.
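
A small sketch of what I mean. The package and sub names are invented for
the example; the point is only that code in another package can see the
localized value through the qualified name while local is in effect:

```perl
use strict;
use warnings;

package Counter;

our $depth = 0;    # package variable, also reachable as $Counter::depth

sub peek {
    return $depth;
}

sub descend {
    my ($n) = @_;
    local $depth = $depth + 1;    # old value restored automatically on return
    return $n > 0 ? descend( $n - 1 ) : Other::snoop();
}

package Other;

sub snoop {
    # A different package, reaching the variable by its qualified name
    # while the local in Counter::descend is still in effect.
    return $Counter::depth;
}

package main;

my $seen  = Counter::descend(2);
my $after = Counter::peek();
print "seen inside: $seen, after: $after\n";    # seen inside: 3, after: 0
```

With a true lexical, Other::snoop would have no way to name the variable
at all, which is exactly the protection you give up here.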

I believe that in Perl6 you can temp() (local()'s new name) lexicals.

Ben

-- 
   If you put all the prophets,   |   You'd have so much more reason
   Mystics and saints             |   Than ever was born
   In one room together,          |   Out of all of the conflicts of time.
ben@morrow.me.uk |----------------+---------------| The Levellers, 'Believers'


------------------------------

Date: Wed, 17 Dec 2003 20:59:20 GMT
From: "Reportor" <test@test.com>
Subject: Re: What CPU/Memory does a Wait cost?
Message-Id: <Is3Eb.475462$0v4.21351096@bgtnsc04-news.ops.worldnet.att.net>

I must get the admin's review every time before they can place a cron job
for me.  I don't want to bother them every time I change something.

Yesterday, I wrote a Super Proxy Pac Script in Perl and JS. I will profit
from it. Haha.

> Why don't you want to use cron? It avoids all of the above problems.




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5971
***************************************

