[19604] in Perl-Users-Digest


Perl-Users Digest, Issue: 1799 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Sep 24 03:05:31 2001

Date: Mon, 24 Sep 2001 00:05:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <1001315111-v10-i1799@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Mon, 24 Sep 2001     Volume: 10 Number: 1799

Today's topics:
    Re: command line file operations <bcaligari@fireforged.com>
        CPAN:FirstTime <ashley@pcraft.com>
    Re: Cropping blank space in images. (Soren Andersen)
    Re: Cropping blank space in images. (Martien Verbruggen)
    Re: Getting "Subject" in (Net::POP3) <Tassilo.Parseval@post.rwth-aachen.de>
    Re: How to htmlize an email, for eg lynx? (David Combs)
    Re: Matching Strings Help Needed (Ralph Freshour)
    Re: Matching Strings Help Needed (Martien Verbruggen)
    Re: New To Perl Scripting <rob_13@excite.com>
    Re: Perl equiv of C argv[0] == program name? <christopher_j@keepurspamtoyerself.qwest.net>
    Re: Perl or not? <christopher_j@keepurspamtoyerself.qwest.net>
    Re: Perl or not? (Martien Verbruggen)
    Re: Perl or not? (Logan Shaw)
    Re: pretty printing a web page (Chris Fedde)
    Re: pretty printing a web page (Martien Verbruggen)
    Re: Regular Expression Problem <jeffplus@mediaone.net>
    Re: Regular Expression Problem <dtweed@acm.org>
    Re: Regular Expression Problem <bcaligari@fireforged.com>
    Re: search, replace, functions, text wrapping (hard que (Martien Verbruggen)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 24 Sep 2001 05:51:42 -0000
From: "B. Caligari" <bcaligari@fireforged.com>
Subject: Re: command line file operations
Message-Id: <9omh6702bf9@enews3.newsguy.com>


"B. Caligari" <bcaligari@fireforged.com> wrote in message
news:9oj75o01csu@enews4.newsguy.com...
>
> "Chucko" <cmerrifield@houston.rr.com> wrote in message
> news:z87r7.20$b55.36673@typhoon.austin.rr.com...
> > Does anyone know how to (from a command line) read in a file of text, keep
> > each word read, discard the duplicates then write the kept words to a file?
> >
>
> perl -nle '$x{$_}++ for split; END { print for keys %x }' filename
>
> cat filename | tr -s '[:space:]' "\n" | sort | uniq
>
perl -nle '$x{lc($_)}++ for (grep {m/^[[:alpha:]]+$/} split);
 END { print for sort keys %x }' filename

for a nice little sorted dictionary of words found.
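
For anything longer than a one-liner, the same idea can be written out as
a small script. This is only a sketch: the sub name unique_words and the
sample text are made up for illustration and are not part of the
one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted, de-duplicated, lower-cased
# purely alphabetic words found in the given lines of text.
sub unique_words {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace, like a bare split on $_
        for my $word (split ' ', $line) {
            next unless $word =~ /^[[:alpha:]]+$/;
            $seen{ lc $word }++;
        }
    }
    return sort keys %seen;
}

print "$_\n" for unique_words("The quick brown fox", "jumps over the lazy dog 42");
```

Tokens such as "42" or "dog." are dropped by the [[:alpha:]] test, which
matches the behaviour of the grep in the one-liner.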

B.




------------------------------

Date: Sat, 22 Sep 2001 13:38:02 -0600
From: "Ashley M. Kirchner" <ashley@pcraft.com>
Subject: CPAN:FirstTime
Message-Id: <3BACE89A.BF434C2F@pcraft.com>


    [ please CC my email as well ]

    This is odd.  I updated perl from CPAN, and it installed perl 5.6.1
all fine and dandy.  Now, when I run r(einstall recommendations) it
tells me CPAN::FirstTime is at version 1.50, and the latest is 1.5.3,
available in perl 5.6.1 - but, that's exactly what it installed.  So
what gives?

--
W | I haven't lost my mind; it's backed up on tape somewhere.
  +--------------------------------------------------------------------
  Ashley M. Kirchner <mailto:ashley@pcraft.com>   .   303.442.6410 x130
  IT Director / SysAdmin / WebSmith             .     800.441.3873 x130
  Photo Craft Laboratories, Inc.            .     3550 Arapahoe Ave. #6
  http://www.pcraft.com ..... .  .    .       Boulder, CO 80303, U.S.A.




------------------------------

Date: Mon, 24 Sep 2001 05:29:03 GMT
From: NoSpam@Nevermind.com (Soren Andersen)
Subject: Re: Cropping blank space in images.
Message-Id: <Xns9126F18CF21EpspnR@204.127.36.1>

James, we want you to know you can safely ignore another reply (Message-ID: 
<3B74A910.3207C519@stomp.stomp.tokyo>) given previously to this inquiry, 
from the irascible clpm troll "Godzilla!".

"James Roberge" <ten.stm@egreborj> wrote in
<Iy1d7.52$KoS1.2293815@tomcat.sk.sympatico.ca>: 

>but is there an easy way that i could get perl to check an image for a
>large black area (or a large area of all one or similar colour) and then
>crop that off?  What kinda cpu power are we talking about here cropping
>and resizing pictures?  Or, would i be better off just limiting the
>file size to about 100k jpegs? 

No, there is certainly no easy way, if what you mean by "easy" is having 
somebody on a ng point you towards (or write you some free) code. You'd 
actually be able to create a Perl application that could do this, I am very 
sure, if you were good enough. I am that good BTW. It isn't a trivial task, 
however. You would want to use either GD.pm or Perl (Image)-Magick. You'll 
often be advised to try and see if GD won't give you what you want, first.

This would definitely be a CPU-sucker, there is no way 'round that. If you 
needed performance *that* badly you'd need to pay someone to write you some 
custom C/C++ (or Java) programming.

  Best,
     Soren Andersen


------------------------------

Date: Mon, 24 Sep 2001 05:56:51 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: Cropping blank space in images.
Message-Id: <slrn9qtip3.eug.mgjv@verbruggen.comdyn.com.au>

On Mon, 24 Sep 2001 05:29:03 GMT,
	Soren Andersen <NoSpam@Nevermind.com> wrote:
> James, we want you to know you can safely ignore another reply (Message-ID: 
><3B74A910.3207C519@stomp.stomp.tokyo>) given previously to this inquiry, 
> from the irascible clpm troll "Godzilla!".
> 
> "James Roberge" <ten.stm@egreborj> wrote in
><Iy1d7.52$KoS1.2293815@tomcat.sk.sympatico.ca>: 
> 
>>but is there an easy way that i could get perl to check an image for a
>>large black area (or a large area of all one or similar colour) and then
>>crop that off?  What kinda cpu power are we talking about here cropping
>>and resizing pictures?  Or, would i be better off just limiting the
>>file size to about 100k jpegs? 

My server lost the original post that started this thread, so I'll
reply to this one.

If that black area appears on the outside of the image, then
Image::Magick has a builtin way to get rid of those, as long as you
know what the colour is. The -crop option to the convert tool with a
geometry specification of "0x0" will trim edges that are in the
background colour. This translates to the Crop() method in
Image::Magick with a geometry specification of "0x0".

Something like (untested):

my $im = Image::Magick->new();
my $rc = $im->Read($image_file_name);
die $rc if $rc;

# assume the wanted colour is at the upper-left pixel
my $bgclr = $im->Get("pixel[0,0]");
$rc = $im->Set("background", $bgclr);
warn $rc if $rc;

# crop the edges where they have the background colour
$rc = $im->Crop("0x0");
warn $rc if $rc;

$rc = $im->Write($output_file_name);
die $rc if $rc;

You might need to do some things with that $bgclr variable, it
depends a bit on the Image::Magick version how well all that works.
You may need to use the QueryColor() method.

> This would definitely be a CPU-sucker, there is no way 'round that. If you 
> needed performance *that* badly you'd need to pay someone to write you some 
> custom C/C++ (or Java) programming.

If you have to write this in Perl, then yes, it would be a CPU
cyclone. However, Image::Magick will do it all in compiled machine
code, which should be fairly efficient.

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | Think of the average person. Half of
Commercial Dynamics Pty. Ltd.   | the people out there are dumber.
NSW, Australia                  | 


------------------------------

Date: Mon, 24 Sep 2001 08:54:15 +0200
From: Tassilo von Parseval <Tassilo.Parseval@post.rwth-aachen.de>
Subject: Re: Getting "Subject" in (Net::POP3)
Message-Id: <3BAED897.4040404@post.rwth-aachen.de>


J.B. Moreno wrote:

> Tassilo von Parseval <Tassilo.Parseval@post.rwth-aachen.de> wrote:


>>The problem is that this is base64 encoded data. Can you read it? I can't.
>>
> 
> So what's the problem with that?  It probably means it's spam, if you
> don't want to deal with that, then check for the presence of encoded
> information and decode it.


It does not at all mean it's spam. One third of emails I get have 
encoded subject lines and yet none of those is spam.
As for decoding, I already said how to do that conveniently.


> See RFC 2047.


It's not extremely helpful to point the OP to an RFC if there is already 
a CPAN-module lying around for this very purpose.
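
As an illustration of what such decoding looks like, here is a minimal
sketch. It assumes a Perl recent enough to ship the Encode module (5.8 or
later, so newer than this thread) and uses its 'MIME-Header' encoding;
that is not necessarily the module referred to above, and the sub name
decode_subject is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode RFC 2047 "encoded-words" in a header value into a Perl string.
sub decode_subject {
    my ($raw) = @_;
    return decode('MIME-Header', $raw);
}

# A base64-encoded subject as it might appear in a raw header.
my $subject = decode_subject('=?UTF-8?B?SGVsbG8sIHdvcmxk?=');
print "$subject\n";
```

Header values without any encoded-words pass through unchanged, so the
same call can be applied to every Subject line.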


Tassilo 

-- 
$a=[(74,116)];$b=[($a->[1]-1,$a->[1]++,0x20)];$c=[(97,110)];$d=[($c->
[1]+1,$b->[1],"her")];for(@{[$a,$b,$c,$d]}){for(@{$_}){$_=~/\d+/?print
(chr($_)):print;}}$c=sub{$l=shift;[(0x20+$l-1,0x50,0x65,0x73-0x01,108
),(0x20,0x68,0x61,)]};print(map{chr($_)}@{($c->(1))});$h={a=>33*3,b=>
10**2+7,c=>"1"."0"."1",d=>0162};@h=sort(keys(%$h));for(@h){print(chr(
ord(chr($h->{$_}))))};



------------------------------

Date: 24 Sep 2001 05:03:38 GMT
From: dkcombs@panix.com (David Combs)
Subject: Re: How to htmlize an email, for eg lynx?
Message-Id: <9omera$5tj$1@news.panix.com>

In article <UE_p7.534$Owe.263000576@news.frii.net>,
Chris Fedde <cfedde@fedde.littleton.co.us> wrote:
>#!/usr/bin/perl
>#
># Please excuse the jeopardy style response
># This is not especially clever but it does the job
># BTW.  I prefer w3m to lynx.  You might too
>#
>
>print "<html><head></head><body><pre>\n";
>while (<DATA>) {
>    s|(http://\S+)|<a href="$1">$1</a>|g;
>    print;
>}
>print "</pre></body></html>\n";
>
>__END__
>


Thanks to everyone!!!

The one above turns out to be the easiest,
and lets me use lynx on the resulting file,
which is the "browser" I've been using ever
since I got onto the net (great for
"shell accounts", which is what I have and
prefer).

QUESTION: wrong group, maybe, but most people
who have responded to this post seem to be
pretty knowledgeable at mutt esoterics.

What do I have to add to the above perl
code to allow it to work *like* urlview --

 .  runnable from mutt (eg via ^B).

 .  reads (from somewhere?) the email that
   mutt (and I) are currently looking at.

 .  runs the above nifty perl filter to
   htmlize that email.

 .  runs lynx with the filtered result (file?)
   as its url arg (just like urlview does).

 .  lets me "q" out of lynx and get back into
   mutt (just like urlview does).

Any perl+mutt guru know how to do this?
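
For the filter step alone, a slightly more defensive variant of the
quoted code might look like the sketch below. It escapes &, < and > so
that mail containing those characters still renders, and it works on a
string rather than on DATA. The sub names are made up for illustration,
and the mutt key binding and the lynx invocation are not covered here.

```perl
use strict;
use warnings;

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical filter: wrap plain text in <pre>, escaping markup characters
# and turning http:// URLs into links in a single pass.
sub htmlize {
    my ($text) = @_;
    $text =~ s{(http://[^\s<>"]+)|([&<>"])}{
        my ($url, $char) = ($1, $2);
        defined $url
            ? sprintf('<a href="%s">%s</a>', escape_html($url), escape_html($url))
            : $entity{$char};
    }ge;
    return "<html><head></head><body><pre>\n$text</pre></body></html>\n";
}

print htmlize("See http://example.com/?a=1&b=2 for <details>.\n");
```

To use it as a filter on standard input, something like
print htmlize(do { local $/; <STDIN> }) would slurp the message and
print the result.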

In fact, maybe said guru or gurus can save
me (and all mutt users) an *immense* amount of
work (since I know not what I do :-), and
actually code it up?

That would be really great!

----

THEN, I could post the result on comp.mail.mutt,
AND get it added to the various faqs for mutt,
and to the manual as well.

This scheme is far superior to urlview in that
urlview (it seems) runs a new lynx for each
url you choose from the url-screen.

Also nice would be an option (The ^B could
ask you if you wanted it) to write the
htmlized output to a file (you choose the name),
and return to mutt -- at which point you
could ^Z and run lynx *yourself* -- meaning
with the usual set of options that *YOU* use
when running lynx.

(Am not sure that urlview gives you opportunity
to specify *your* options, to use when *it*
runs lynx.)

----

Again, the above perl solution beats urlview, in
that for an entire email's links, you run lynx
ONCE.  

Not that that saves any *time* -- but that it
allows you to benefit from using lynx's "history"
command (backspace), and especially its incredibly-
cool "V" command, that gives you an indented tree-
diagram (tree lying on its side, blown down in storm!)
of *everywhere* you've been in *this* lynx run.

That goes even more so for that option I mentioned,
of having the htmlized file written out -- so
that then you can ^Z and then either (1) run lynx
or (2) (even better!) "fg" into an already running
one!

THEN, that lynx history-list and "V" command
*really* win!

----

Note:  Trick:  That "V" display, of the sideways-lying
tree: you can do a "\", getting the html that lynx
generated so as to *show* that tree --

and SAVE that html to a file,

for use ANOTHER DAY!  

You just link to it as a localhost file, and
all those sites are added, as a subtree, to
the CURRENT tree kept within lynx.

SUPER COOL!

---

Hey, thanks so much for reading through this thing.

Any takers?

David



------------------------------

Date: Mon, 24 Sep 2001 05:05:06 GMT
From: ralph@primemail.com (Ralph Freshour)
Subject: Re: Matching Strings Help Needed
Message-Id: <3baebfe8.2643803@news-server>

No, no locale being used.

I get a syntax error when I tried to add your line:

print ord($_).',' for split //,$a;

syntax error at line 3030, near "',' for "

Ralph

On Sun, 23 Sep 2001 23:32:13 GMT, Bob Walton
<bwalton@rochester.rr.com> wrote:

>Ralph Freshour wrote:
>> 
>> I'm a bit confused here - I have two vars that should be comparing but
>> they are not:
>> 
>> if ($filtFilter eq $hAttachFileNames[$x])
>>    {
>>       # matches
>>    }
>> else
>>    {
>>      # no match
>>    }
>> 
>> I've stepped thru with the debugger and the string data are identical
>> and I even checked the string lengths - what am I missing here?  There
>> is no match when there should be as far as I can see.
>...
>> Ralph
>
>Well, I would advise you to check any whitespace in the strings
>carefully to make sure it is exactly the same kind of whitespace.  This
>includes any newlines which might be in or at the beginning/end of the
>strings.  The strings must be bit-for-bit identical to be true with the
>eq operator.  Try something like:
>
>    print ord($_).',' for split //,$a;
>
>on each variable and see if the outputs are identical.  Are you using
>"locale" in your script?  That could have an effect as well.
>-- 
>Bob Walton



------------------------------

Date: Mon, 24 Sep 2001 05:13:10 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: Matching Strings Help Needed
Message-Id: <slrn9qtg76.eug.mgjv@verbruggen.comdyn.com.au>

[Please, in the future, put your response _after_ the suitably trimmed
text you reply to. It's the commonly accepted quoting style on this
newsgroup, and Usenet in general]

On Mon, 24 Sep 2001 05:05:06 GMT,
	Ralph Freshour <ralph@primemail.com> wrote:

[post re-arranged and trimmed considerably]

> On Sun, 23 Sep 2001 23:32:13 GMT, Bob Walton
><bwalton@rochester.rr.com> wrote:
> 
>>Ralph Freshour wrote:
>>> 
>>> I'm a bit confused here - I have two vars that should be comparing but
>>> they are not:
>>> 
>>> if ($filtFilter eq $hAttachFileNames[$x])
>>
>>          The strings must be bit-for-bit identical to be true with the
>>eq operator.  Try something like:
>>
>>    print ord($_).',' for split //,$a;
> 
> I get a syntax error when I tried to add your line:
> 
> print ord($_).',' for split //,$a;
> 
> syntax error at line 3030, near "',' for "

I suspect you have a Perl version before 5.005, which introduced this
syntax. Try

for (split //, $a)
{
	print ord($_), ", ";
}
print "\n";

And consider upgrading. Your Perl is at least 2 1/2 years out of date.
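
To show what such a dump reveals, here is a small sketch with made-up
data: one string still carries the newline it was read with, the other
does not. The sub name char_codes and both values are hypothetical.

```perl
use strict;
use warnings;

# Return the character codes of a string as a comma-separated list,
# so that invisible differences (newline, tab, trailing space) show up.
sub char_codes {
    my ($str) = @_;
    return join ',', map { ord($_) } split //, $str;
}

my $from_file = "report.txt\n";   # hypothetical value read without chomp
my $expected  = "report.txt";

print char_codes($from_file), "\n";
print char_codes($expected),  "\n";

# After removing the trailing newline the two compare equal with eq.
my $chomped = $from_file;
chomp $chomped;
print "equal after chomp\n" if $chomped eq $expected;
```

The first line of output ends in an extra ",10", which is the newline
that makes eq fail.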

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | Hi, John here, what's the root
Commercial Dynamics Pty. Ltd.   | password?
NSW, Australia                  | 


------------------------------

Date: Mon, 24 Sep 2001 05:18:43 GMT
From: "Rob - Rock13.com" <rob_13@excite.com>
Subject: Re: New To Perl Scripting
Message-Id: <Xns9126D4C13A08rock13com@64.8.1.226>

Jeff <news:Zjxr7.6730$xG6.2136959@typhoon.ne.mediaone.net>:

> MN RAIDER FAN wrote:
>> I am a bit new to Perl Scripting, I have some experience in
>> C++ [...] suggestions from programmers that have worked with
>> Perl on what books to pick up that will help me pick up the
>> language quickly.

> A couple of O'Reilly books (Programming Perl and the Jeffrey
> Friedl Regular Expressions book) are definitely worth owning.

I'd second those choices. Learning Perl is likely too basic for you 
if you have some programming experience. Don't forget about all the 
documentation that is included in Perl either.

Perl Cookbook might be worth a look, though its version of Perl is 
5.5 or thereabouts I think. If you are getting into Perl for the 
CGI usage then 'Perl and CGI for the WWW' by Liz Castro is nice.

-- 
Rob - http://rock13.com/
Web Stuff: http://rock13.com/webhelp/


------------------------------

Date: Sun, 23 Sep 2001 21:22:43 -0700
From: "Christopher M. Jones" <christopher_j@keepurspamtoyerself.qwest.net>
Subject: Re: Perl equiv of C argv[0] == program name?
Message-Id: <Twyr7.5$0R3.14377@news.uswest.net>

"Richard Muller" <rlmuller(at)msn.(dot)(deletethis).com> wrote:
> Hi all,
>
> Can someone tell me how to print out the name of the current script being
> executed by the perl engine?

$0
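
A minimal sketch of how that might be used. File::Basename is a standard
module; using it here to strip the directory part is an addition for
illustration, not something the question asked for.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# $0 holds the name the script was invoked with (possibly with a path).
print "invoked as: $0\n";

# basename() strips any leading directories, leaving just the file name.
print "script name: ", basename($0), "\n";
```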


--
You are in a twisty little maze of newsgroups, all different.




------------------------------

Date: Sun, 23 Sep 2001 21:09:47 -0700
From: "Christopher M. Jones" <christopher_j@keepurspamtoyerself.qwest.net>
Subject: Re: Perl or not?
Message-Id: <Fkyr7.1$0R3.1169@news.uswest.net>

"Tom" <tom@zerofiveone.nosp@m.com> wrote:
> Perl is good, I love it, but it lacks the speed of compiled programs.
> Anyone got a good alternative?

And what do you base this on?  What exactly are you doing
for which Perl "lacks speed"?  Unless you are using
outdated hardware or are doing something really "crunchy"
then Perl is more than up to the job.


--
Turn your lantern on or the grue will eat you!




------------------------------

Date: Mon, 24 Sep 2001 05:27:17 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: Perl or not?
Message-Id: <slrn9qth1l.eug.mgjv@verbruggen.comdyn.com.au>

On Sun, 23 Sep 2001 21:05:21 -0700,
	Christopher M. Jones <christopher_j@keepurspamtoyerself.qwest.net> wrote:
> "Logan Shaw" <logan@cs.utexas.edu> wrote:
>> In article <m1u1xtvoyq.fsf@halfdome.holdit.com>,
>> Randal L. Schwartz <merlyn@stonehenge.com> wrote:
>> >>>>>> "Tom" == Tom  <tom@zerofiveone.nosp@m.com> writes:
>> >
>> >Tom> Perl is good, I love it, but it lacks the speed of compiled programs.
>> >
>> >Perl *is* compiled.  See the FAQ.  Are you just quoting rumors, or
>> >do you have a specific counterexample?
>>
>> Hmm.  I think he means "compiled into native machine code".  Which,
>> unless there is a JIT compiler for Perl that I don't know about, Perl
>> isn't.  Correct?
> 
> Incorrect.  Perl is not an interpreted language or a JIT-type
> language.  Perl programs are compiled at run time into
> executable code and then run just like a normal program.
> However, Perl does not (natively) support the creation of
> standalone executable program files.

Euhmmm.. Can you point me to the bit of the manual that explains that
Perl is compiled into machine code? The perlcompile man page, for
example states this:

[snip]
       Perl has always had a compiler: your source is compiled
       into an internal form (a parse tree) which is then
       optimized before being run.  Since version 5.005, Perl has
       shipped with a module capable of inspecting the optimized
       parse tree (`B'), and this has been used to write many
       useful utilities, including a module that lets you turn
       your Perl into C source code that can be compiled into an
       native executable.
[snip]

This mentions the parse tree that I knew about, but not the machine
code bit that you seem to know about. With the help of the Bytecode
backend and the ByteLoader module you can create executable bytecode,
but again, this is not native machine code.

The Perl FAQ, section 1, question 'Is it a Perl program or a Perl
script?' says:

[snip]
       Perl programs are (usually) neither strictly compiled nor
       strictly interpreted.  They can be compiled to a byte-code
       form (something of a Perl virtual machine) or to
       completely different languages, like C or assembly language.
       You can't tell just by looking at it whether the source is
       destined for a pure interpreter, a parse-tree interpreter,
       a byte-code interpreter, or a native-code compiler, so
       it's hard to give a definitive answer here.
[snip]

Again, no mention of Perl compiling sources internally to native
machine code.

Yes, Perl is compiled. No, Perl is not compiled into native machine
code, which is what Logan Shaw stated. If Perl sources were internally
compiled into native machine code, then it wouldn't have been as much
work as it is now to provide a decent perlcc.
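
One way to look at that internal form is sketched below. It assumes the
B::Deparse module that ships with Perl, which walks the compiled op tree
of a sub and regenerates Perl source from it. The sub itself is made up
for illustration.

```perl
use strict;
use warnings;
use B::Deparse;

# The sub below is compiled to an op tree when this file is compiled.
my $code = sub { return 2 + 3 };

# B::Deparse walks that compiled op tree and regenerates Perl source from it.
my $regenerated = B::Deparse->new->coderef2text($code);
print $regenerated, "\n";
```

The regenerated text should show the addition already folded into a
constant, which is the "optimized before being run" step from the
perlcompile quote. What gets inspected is an op tree, not machine code.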

>                                       This *is* a FAQ,
> perhaps you should check up on it before posting.

Hmm.

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | If at first you don't succeed, try
Commercial Dynamics Pty. Ltd.   | again. Then quit; there's no use
NSW, Australia                  | being a damn fool about it.


------------------------------

Date: 24 Sep 2001 00:51:18 -0500
From: logan@cs.utexas.edu (Logan Shaw)
Subject: Re: Perl or not?
Message-Id: <9omhkm$mov$1@charity.cs.utexas.edu>

In article <vgyr7.632$lI1.841349@news.uswest.net>,
Christopher M. Jones <christopher_j@keepurspamtoyerself.qwest.net> wrote:
>"Logan Shaw" <logan@cs.utexas.edu> wrote:
>> In article <m1u1xtvoyq.fsf@halfdome.holdit.com>,
>> Randal L. Schwartz <merlyn@stonehenge.com> wrote:
>> >>>>>> "Tom" == Tom  <tom@zerofiveone.nosp@m.com> writes:

>> >Tom> Perl is good, I love it, but it lacks the speed of compiled programs.

>> >Perl *is* compiled.

>> Hmm.  I think he means "compiled into native machine code".

>Incorrect.  Perl is not an interpreted language or a JIT-type
>language.  Perl programs are compiled at run time into
>executable code and then run just like a normal program.

Does "executable code" mean "native machine code" here?  I've always
been under the impression that while the execution of a Perl program
does include a compilation phase, the result of this compilation phase
is not machine code.  (If it were, I'd expect Perl to be less portable
than it is.)

Actually, what determines performance is not likely to be just whether
machine code is generated.  Even if it is generated, performance still
won't measure up to what most people mean by "speed of compiled
programs" if the generated machine code is very frequently making
expensive calls into a runtime system.  I haven't investigated
carefully, but my impression is that in Perl's case this would be
necessary unless the compiler were very clever, because Perl's
semantics are so complicated.  (For example, scalars can internally be
numbers or strings, and run-time conversion has to magically make
conversions happen sometimes, but whether this is necessary in a given
situation depends on the state of the program's execution, not just on
the program's text.)
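
A small sketch of that last point, with made-up values: the addition
below is a single op in the program text, but whether a string has to be
converted to a number first depends on what the caller passes in at run
time.

```perl
use strict;
use warnings;

# The same addition op has to cope with whatever the scalar holds at run time.
sub add_one {
    my ($value) = @_;
    return $value + 1;   # string operands are converted to numbers here
}

# A number, a numeric string, and a numeric string in exponent notation.
my @inputs = (41, "41", "4.1e1");
print add_one($_), "\n" for @inputs;
```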

I also want to point out that "the speed of compiled programs", the
phrase the original poster used, is not incredibly precise.  Lots of
language systems have some sort of compilation phase, and they're not
all equally fast.  So this discussion will reach a point where it's
hard to productively discuss whether Perl qualifies.

By the way, I'm not really sure I agree that Perl isn't a JIT-type
language.  I'd bet it would be possible to construct a JIT compiler for
Perl that would offer substantial performance benefits.  For one thing,
Perl has a lot of decisions to make about how to be magic, and one
could observe the running code to determine which decisions it is
usually making and then optimize the code to make those paths through
the code faster.

>However, Perl does not (natively) support the creation of
>standalone executable program files.

Well, I agree with that, but creating standalone executables isn't the
only way to get very good performance.

>This *is* a FAQ,
>perhaps you should check up on it before posting.

I did to some extent.  "perldoc -q compile" doesn't address the
question of whether native machine code can be produced.  It just says
that compiling into C doesn't help performance (or anything else) much,
and that it's experimental anyway.

Maybe there's some other info on compilation in the FAQ that I didn't
find, though.  I've found Perl's FAQ to be a tad difficult to search
unless you already know the magic keywords.  (This isn't intended to be
a slam against Perl.  It's hard to find things in most FAQs and it's
hard to make it easy to find things in FAQs.)

  - Logan
-- 
"Everybody
 Loves to see              
 Justice done
 On somebody else"     ( Bruce Cockburn, "Justice", 1981 )


------------------------------

Date: Mon, 24 Sep 2001 04:48:28 GMT
From: cfedde@fedde.littleton.co.us (Chris Fedde)
Subject: Re: pretty printing a web page
Message-Id: <wWyr7.631$Owe.311475712@news.frii.net>

In article <3BAB3F76.CF513632@americasm01.nt.com>,
Desrosiers, Benoit [CAR:9F53:EXCH] <benoitd@americasm01.nt.com> wrote:
>
>Thanks,
>
>but it's not enough. The main content on the page is a table. And some of
>its cells are multilines so I can't just print it sequentially.
>

Take a look at w3m. It's a text browser that is savvy to several of the
common formatting gimmicks like tables and frames.  It's not perl but it's
easy to drive from perl (at least from unix it is).

Good Luck
-- 
    This space intentionally left blank


------------------------------

Date: Mon, 24 Sep 2001 05:36:54 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: pretty printing a web page
Message-Id: <slrn9qthjm.eug.mgjv@verbruggen.comdyn.com.au>

On Sun, 23 Sep 2001 16:16:56 -0400,
	Benjamin Goldberg <goldbb2@earthlink.net> wrote:
> Desrosiers, Benoit [CAR:9F53:EXCH] wrote:
>> 
>> Thanks,
>> 
>> but it's not enough. The main content on the page is a table. And
>> some of its cells are multilines so I can't just print it sequentially.
>> 
>> If I use the print button on Netscape or on IE, I only get the left
>> part of the page.
>> What I would like is something that would reduce the size of what is
>> being printed and fit it on one page.
> 
> Sounds like you need some kind of html2ps converter... that way you
> could set some kind of parameter and get it the right size.

Yes, there is such a program, with exactly that name. We use it here
to generate nicely formatted faxes from a Perl program that generates
HTML. It does a reasonable job, but it doesn't automatically make
stuff fit on a page.  It wouldn't be trivial at all to do that sort of
thing. However, it can print in landscape, so maybe that is enough for
the OP.

> I don't know if there is such a program, and I'm not going to look...
> but I will say that you might possibly get better results if you modify
> the program which creates the page to output either postscript directly, or
> latex, which you can convert to postscript with various tools.

You'd have to work hard to ensure that the table size is reasonable,
still.

> Hmm... Here's a funky idea.  What if someone made a module with the same
> kind of interface as CGI.pm, but whose content generating functions
> produce latex (or postscript, or [nt]roff) analogs of the html markup...
> eg, start_html() would print a header for the beginning of a latex
> document, table() would produce a latex table, hr() would be whatever
> latex uses as a horizontal line, etc.
> 
> Any comments?

We actually started writing something like that here, as part of a
generic reporting interface. The project got canned for certain err...
business reasons. The design was more or less finished (but not for
publication, unfortunately). The target language was (shudder) Java,
but there is nothing to prevent it from being implemented in Perl.

I would not use the CGI interface, however. It's too web-oriented.
It would be easily possible though to use the CGI module as an output
driver for the interface.

Note that if you want to implement all of these things yourself, that
you have a serious, serious task at hand. We were thinking of using
XML and SGML/Docbook, with various second-tier convertors to
PostScript and/or HTML.

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | Make it idiot proof and someone will
Commercial Dynamics Pty. Ltd.   | make a better idiot.
NSW, Australia                  | 


------------------------------

Date: Mon, 24 Sep 2001 04:19:17 GMT
From: Jeff <jeffplus@mediaone.net>
Subject: Re: Regular Expression Problem
Message-Id: <9vyr7.6900$xG6.2171127@typhoon.ne.mediaone.net>

Good point.

To avoid copying the string, one might also care to do the whole thing at 
once:

my $data = ($my_string =~ /^#DATA(.*?)^#0/ms)[0];

Logan Shaw wrote:

>>Scott wrote:
>>> myString = " ....multi-lines of text i don't need...\n"
>>> myString += "#DATA"     // this flags my data section
>>> myString += "..multi-lines of data i DO need....\n"
>>> myString += "#0"        // this flags the end of my data.
>>> 
>>> 
>>> I am trying something like /#DATA([^#0]+)/
> 
> In article <Axxr7.6758$xG6.2143028@typhoon.ne.mediaone.net>,
> Jeff  <jeffplus@mediaone.net> wrote:
>>Perl is interpreting that to mean:
>>
>>"The text '#DATA' followed by one or more characters that are neither '#'
>>nor '0'"
>>
>>You want this:
>>
>>$_ = $the_string_with_the_data;
>>/^#DATA(.*)^#0/ms;
>>my $data = $1;
> 
> It might be a good idea to make the Kleene star non-greedy by adding a
> question mark, like this:
> 
> $string =~ /^#DATA(.*?)^#0/ms;
> 
> Whether that's necessary or not depends on whether "^#0" occurs only
> once after "^#DATA" occurs.  If it does occur more than once, the
> non-greedy version will match all the text up to the first one, whereas
> the (default) greedy version will match all of the text up to the last
> one.
> 
>   - Logan
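
The difference between the two versions can be seen in a small sketch.
The sample text is made up and deliberately contains two "#0" lines after
"#DATA".

```perl
use strict;
use warnings;

# Sample text with two "#0" lines after "#DATA" (made up for illustration).
my $text = "header\n#DATA\nfirst\n#0\nsecond\n#0\ntrailer\n";

my ($lazy)   = $text =~ /^#DATA(.*?)^#0/ms;   # stops at the first "#0" line
my ($greedy) = $text =~ /^#DATA(.*)^#0/ms;    # runs on to the last "#0" line

print "non-greedy: [$lazy]\n";
print "greedy:     [$greedy]\n";
```

With only one "#0" line in the input, both patterns capture the same
text.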



------------------------------

Date: Mon, 24 Sep 2001 05:12:30 GMT
From: Dave Tweed <dtweed@acm.org>
Subject: Re: Regular Expression Problem
Message-Id: <3BAEBF73.E166F93E@acm.org>

Andrew Cady wrote:
> Non-greediness is slow, though, because it has to check the rest of
> the regex for every character (although if the rest of the regex is
> just "#0" that won't be too bad).

A common misconception, but actually not true in most cases. If the
next thing after a non-greedy quantifier is a literal substring, a very
fast string search can be used to locate the next occurrence of it.
Of course, if there are additional fragments after that, they need
to be checked, and that can be slow.

On the other hand, a greedy quantifier is usually inefficient, because
it can gobble up all the rest of the string, and then the regex engine
has to backtrack one character at a time until the pattern succeeds.
There are optimizations that work pretty well for the most common
cases, but it isn't hard to write a regex with greedy quantifiers that
requires exponential time to execute. See "perldoc perlre".

If you really want a headache, take a look at regcomp.c and regexec.c.

-- Dave Tweed


------------------------------

Date: Mon, 24 Sep 2001 05:42:13 -0000
From: "B. Caligari" <bcaligari@fireforged.com>
Subject: Re: Regular Expression Problem
Message-Id: <9omgml02aom@enews3.newsguy.com>


"Scott" <scott_hill2@hotmail.com> wrote in message
news:7e0b3308.0109231823.5b7ea19e@posting.google.com...
> Hello, I'm just learning Regular Expressions, and could use some help.
>
> I have some bits of data in a long string of newline terminated text,
> and I need to pull it out. This string looks like this.
>
> (please forgive the Java style code...i am using a library)
>
> myString = " ....multi-lines of text i don't need...\n"
> myString += "#DATA"     // this flags my data section
> myString += "..multi-lines of data i DO need....\n"
> myString += "#0"        // this flags the end of my data.
>
>
> I am trying something like /#DATA([^#0]+)/
>
> but this isn't working. I am trying to use the ^ to mean 'not #0', but I
> think Perl is reading it as 'not #'.

Your expression matches #DATA followed by one or more characters that are
neither a '#' nor a '0'.

In your case it might be worth considering a non-greedy match such as
    m/#DATA(.*?)#0/s
where the /s modifier lets . match the newlines in your multi-line data.

my $string = "beginning#DATA#0garbage#DATAblablabla#0end";
print "Matched: >$_<\n" for $string =~ m/#DATA(.*?)#0/sg;

This would print out
    Matched: ><
    Matched: >blablabla<
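
For data that spans several lines, as in the original question, the
difference between the two patterns can be seen in a small sketch like
the one below. The input string is invented for illustration; only the
#DATA and #0 markers come from the question.

```perl
use strict;
use warnings;

# Hypothetical input modelled on the original question.
my $string = "text I don't need\n#DATAitem 10\nitem #2\n#0";

# [^#0]+ is a character class: it stops at the first '#' or '0',
# so it cuts the data short as soon as a zero appears.
my ($class) = $string =~ /#DATA([^#0]+)/;

# .*? stops at the first literal "#0"; /s lets . cross newlines.
my ($lazy) = $string =~ /#DATA(.*?)#0/s;

print "class: >$class<\n";
print "lazy:  >$lazy<\n";
```

The character class version gives up at the '0' in "item 10", while
the non-greedy version carries on until the "#0" end marker.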

B.





------------------------------

Date: Mon, 24 Sep 2001 05:07:38 GMT
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: search, replace, functions, text wrapping (hard question! I think!)
Message-Id: <slrn9qtfsq.eug.mgjv@verbruggen.comdyn.com.au>

On Sun, 23 Sep 2001 23:00:27 -0400,
	Benjamin Goldberg <goldbb2@earthlink.net> wrote:
> Martien Verbruggen wrote:
>> 
>> On Sun, 23 Sep 2001 19:29:07 -0400,
>>         Benjamin Goldberg <goldbb2@earthlink.net> wrote:
>> > Richard Lawrence wrote:
>> > [snip]
>> >> Now my regexp utterly breaks because there are two spaces in there
>> >> that have to be taken into consideration. What I'm looking for is
>> >> the input of
>> >
>> > (my $cheat = $URI::uric) =~ tr/://d;
>> 
>> Hmm. Aren't you getting a bit too chummy with the URI implementation
>> here? I mean, the $uric variable isn't exactly documented, so one
>> would need to assume that it may disappear at some time in the future.
> 
> Ehh, I took this from URI::Find, more or less.  To be honest, I haven't
> a clue as to what's actually in it.  Well, I assume that it's a string
> with all of the characters which are valid in the somethingorother part
> of the in scheme:somethingorother, but I didn't get that from looking
> into URI.pm.

Ah, I didn't know that. I just hadn't seen that variable before, so I
looked at the URI man page, and didn't see an entry there. Then I
looked at the URI source, and noticed that indeed it did exist. I
don't have URI::Find installed. Let me have a look at it..

Hmmm.. It also doesn't document $URI::uric, but it does use it. I'd
say _it_ is a bit too chummy with the URI module implementation as
well :). The author, Michael Schwern, does agree with me though:

$ cat lib/URI/Find.pm
[snip]

# XXX This is probably more than a little cozy with URI.pm.
require URI;
my($schemeRe) = $URI::scheme_re;
my($uricSet)  = $URI::uric;

# We need to avoid picking up 'HTTP::Request::Common' so we have a
# subset of uric without a colon ("I have no colon and yet I must poop")
[snip]

(Yes, I know that last comment has nothing to do with the $uric
variable, but I thought I'd include it anyway, for entertainment
value.)

Maybe Michael Schwern has talked to Gisle Aas to make sure
that $uric will always be there, and maybe I'm being overly paranoid,
but experience has taught me to never rely on undocumented
implementation details, not even the ones in my own code. 

>> Maybe a warning should be added..
> 
> You mean add a warning in my code?  Or in URI.pm?

Mainly your code, since it shows the use of an undocumented
implementation detail of URI. It is unlikely that $uric will change
(since I believe it reflects uric in RFC 2396), but the way it
lives in URI.pm might change. Not a biggie, just a warning from
someone who's had to rewrite much code that relied on undocumented
"features".
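
One way to hedge against that, sketched here under assumptions: check
that the variable is really there and fall back to a locally defined
set otherwise. The $fallback string below is this sketch's own reading
of the uric rule in RFC 2396 (reserved | unreserved | escaped), not
something copied from URI.pm, and the variable names are made up.

```perl
use strict;
use warnings;

# Fallback character set built from the uric rule of RFC 2396.
# This is an assumption of the sketch, not taken from URI.pm.
my $fallback = quotemeta(q{;/?:@&=+$,})
             . 'A-Za-z0-9'
             . quotemeta(q{-_.!~*'()})
             . '%';

# URI may not be installed, and $URI::uric is undocumented,
# so check for both before relying on it.
my $have_uri = eval { require URI; 1 };
my $uric = ($have_uri && defined $URI::uric && length $URI::uric)
         ? $URI::uric
         : $fallback;

print "using ", ($uric eq $fallback ? "fallback" : "URI.pm"), " set\n";
print "'a' is a uric character\n" if 'a' =~ /^[$uric]$/;
```

If the variable ever goes away, the code keeps working with the
fallback instead of silently building an empty character class.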

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | 
Commercial Dynamics Pty. Ltd.   | values of Beta will give rise to dom!
NSW, Australia                  | 


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 1799
***************************************

