[18674] in Perl-Users-Digest


Perl-Users Digest, Issue: 842 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun May 6 09:07:32 2001

Date: Sun, 6 May 2001 06:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <989154307-v10-i842@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Sun, 6 May 2001     Volume: 10 Number: 842

Today's topics:
    Re: Compression (to .zip/.gz) using system/backticks <goldbb2@earthlink.net>
    Re: DOS Perl convering LF to CR/LF <pne-news-20010506@newton.digitalspace.net>
    Re: DOS Perl convering LF to CR/LF <bart.lateur@skynet.be>
    Re: Good editor for perl <dt@area.com>
    Re: Help on optimization wanted (Abigail)
    Re: Help with double hash of arrays <goldbb2@earthlink.net>
    Re: How secure is this.... <goldbb2@earthlink.net>
    Re: How to determine file type of a filehandle? <bart.lateur@skynet.be>
    Re: How to execute a perl script from a perl script? -  <admin@nospam.m2n.co.uk>
        Match/replace of HTML text without loss of tag informat <jim1234@monetsgarden.net>
    Re: Match/replace of HTML text without loss of tag info <jfreeman@tassie.net.au>
    Re: Parsing <dave@dave.org.uk>
    Re: PERL Code Generator (Martien Verbruggen)
    Re: Poetry::Aum version 0 released <jfreeman@tassie.net.au>
    Re: re-sizing GIF images on the fly (Martien Verbruggen)
    Re: Recursing a directory tree <jfreeman@tassie.net.au>
    Re: regular expression (Eric Bohlman)
    Re: Test for integer? <jfreeman@tassie.net.au>
    Re: Where is my script <goldbb2@earthlink.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 06 May 2001 07:23:08 GMT
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Compression (to .zip/.gz) using system/backticks
Message-Id: <3AF4FD11.2D53C8FD@earthlink.net>

Scott R. Godin wrote:
> 
> I have two text files (one .html, and one plain .txt) generated by a
> Perl CGI script I'm working on. I was considering having the script
> e-mail the files to the end-user, but two things occurred to me.

Why email?  Why not archive them (using either zip or tar) into a single
file, and have your CGI return that archive?  If the Content-Type HTTP
header is set appropriately, then the browser will run the appropriate
program.

> 
>     1> there is a (slight to moderate) possibility of it being used
> for abuse, in that the user could enter an incorrect (i.e. someone
> else's) mail address

Then don't use email.

> 
>     2> I'd most likely want to use Mail::Mailer or MIME::Lite to send
> the files as attachments, and in either case, it would require the
> host to install those perl modules. Since in 5 months, I haven't even
> been able to get them to update some stock modules that I *know* have
> bugs in them, and really *require* newer versions... this doesn't seem
> likely.

What are you using to update?  Are you just downloading with ftp or
whatever, or are you using the CPAN module for updating?
	http://www.perldoc.com/cpan/CPAN.html

In any case, only the machine on which the cgi script resides should
have to install them, so you should only have to do it once.

> SO, I'm considering calling out to the shell via system or backticks
> to have it compress the two files into a .zip archive.

Just because you are having trouble getting those two modules?  Anyway,
even if you do get them, are you sure that they're what you need?

> Two things have me pausing.. one, I'm not 100% certain of the syntax
> needed to .zip (which I use more of between Mac and Windoze) a file
> instead of gzip (which I use more of between my Mac and my shell).

This is a non-Perl question, but the answer is pretty simple.
	zip newzipfile.zip file1 file2 file3

> I'm quite familiar with gzip from the shell but I've never used
> anything that created plain .zip files from unix, so I'm unsure which
> way to go.
> 
> Since I'm of certainty approaching 100% that the people using this
> script to generate the templates will be on Windows systems, (and some
> may be complete newbies) I'd rather give them a file that I'm
> confident can be "unzipped" with something like WinZip[1].
> 
> so it boils down to two questions
> A. (non-perl) does anyone offhand know whether WinZip can uncompress a
> unix .gz file?

Yup.  Further, if the .gz file is actually a .tar.gz file, it will even
ask if you want to open up the .tar after it's opened the .gz.

> B. (perl ! and thus saving my bacon :) would the proper command look
> something like
> 
>     `cat $file1 $file2 | gzip > ${fileprefix}.gz`
>         or die "$!";
> 
> .. or would it be $@ and not $! ?

As others have said, there are problems with this.  First, backticks
return the text output of the command (what it printed), not a return
value.  Use system to get the command's return value.  Second, the
command line you gave concatenates the two files, and compresses them,
with no way to separate the two after decompression.  Third, you should
be examining $? (the program's exit status) as well as $! (which is set
if system itself could not run the command).

For example:
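	if( system("zip", "$fileprefix.zip", $file1, $file2) != 0 ) {
		# $? == -1 means zip could not be started; only then is $! meaningful
		die "Couldn't run zip: $!" if $? == -1;
		# high byte of $? is the exit value, low 7 bits the signal number
		my ($val,$sig) = ($?>>8, $?&127);
		die "zip exited with signal $sig" if $sig;
		die "zip exited with value $val";
	}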

> Obviously I know the unix part -- cat filea fileb |gzip >foo.gz --
> that's not what I'm asking in B, above. *cough* :-)
> 
> Or would you do it a different way?
> 
> Thanks for any tips.[2]

Sure.  Try to do it without having to write any files to disk, and you
will much improve response time.  Try to do as much as you can entirely
within perl, with a minimum of external programs called.  Using tar.gz
will get better compression than zip, most of the time.  There does
exist a module for creating tar files, purely within perl.  There does
exist a module for making .gz files without calling the gzip program (it
uses the zlib library instead).  When using system, you might prefer to
pass the arguments as a list ($program, $arg, $arg), rather than a
string "$program $arg $arg", especially if the $program path may contain
spaces.  Have the cgi return the .zip or .tar.gz (or .tgz, if you want)
directly to the user (with an appropriate content-type), rather than
sending it via email.  Use strict.  Use warnings.  Brush your teeth.
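
As a rough illustration of the "no gzip program, no temporary files" tip,
here is a minimal sketch that compresses a string entirely in memory.  The
post does not name the module; IO::Compress::Gzip (a zlib-based module
shipped with current Perls) is an assumption chosen for the example, and
gzip_string is a hypothetical helper name.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Compress a string entirely in memory; returns the gzip-format bytes.
# (gzip_string is a hypothetical helper, not something from the post.)
sub gzip_string {
    my ($data) = @_;
    my $compressed;
    gzip(\$data => \$compressed)
        or die "gzip failed: $GzipError";
    return $compressed;
}

my $gz = gzip_string("some page content\n" x 100);
printf "compressed to %d bytes\n", length $gz;
```

A CGI script could print such a string straight to the browser after a
suitable Content-Type header, so nothing has to be written to disk.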

Hmm, I'm sure I can think of more tips, but I'm all out for now :)

> [1] I mean, ok, it's easy enough to just let them 'save as' from the
> browser preview of the files, but not all browsers preserve the
> original file name, plus as a .zip file it takes up less of our
> precious bandwidth while they download. I'd just like to idiot-proof
> the whole thing as much as possible, and prevent the users from even
> THINKING of re-naming the files "README.html" or "ReadMe.txt", as with
> 3700 files in the database, each 'readme.foo' overwrites the previous
> one as the end users unzip 'em. bleh. I'd love to prevent more of this
> insanity. :)

To really idiot-proof it, stick the things in a remote-mountable
directory, and leave them there, never to be explicitly downloaded.

After all, if the content is static, why do you need CGI?

-- 
Shift to the left, shift to the right, mask in, mask out, BYTE, BYTE,
BYTE !!!


------------------------------

Date: Sun, 06 May 2001 09:50:55 +0200
From: Philip Newton <pne-news-20010506@newton.digitalspace.net>
Subject: Re: DOS Perl convering LF to CR/LF
Message-Id: <nc0afto9t3l3anfplol65hcbn5p9omd0kb@4ax.com>

On Sat, 05 May 2001 23:03:18 GMT, David <dbmartin5@home.com> wrote:

> I am reading a Unix file on a DOS machines which contains lines ending
> in a LF.  The program reads in the line, but when I reprint them to
> another file, the LF is now a CR/LF!

That's because you're writing the file in text mode, which automatically
translates LF to CRLF.

>  Is there a way to prevent perl from this conversion

Yes. Use binary mode (binmode HANDLE where HANDLE is the name of your output
filehandle).

> Also, what is the control char(s) for LF only, eg. like the "\n" for
> CRLF?

Wrong question. "\n", on most machines (not the Mac, for example) stands for
plain LF. However, on DOSish machines, LF gets autotranslated to CRLF on output,
and CRLF gets autotranslated to LF on input.

So if you read a file containing CRLFs, they'll appear as simple LFs ("\n") to
the program; if you write out that data, they'll become CRLFs again.

Use binmode() to prevent LF turning into CRLF on output, or to keep CRLF as CRLF
on input. (On such platforms, reading a file in text mode will probably also
cause you to get EOF at the first occurrence of "\cZ" or "\x1a", so if your
input contains such characters as part of normal data, you will probably also
need binmode() on your input filehandle.)
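
A minimal sketch of the output side, assuming a hypothetical file name
(unix_endings.txt) chosen only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical output file name, used only for this illustration.
my $out_file = 'unix_endings.txt';

open(my $out, '>', $out_file) or die "Can't write $out_file: $!";
binmode($out);              # turn off LF -> CRLF translation on DOSish systems
print {$out} "line one\n";  # "\n" is written as a single LF byte
print {$out} "line two\n";
close($out) or die "Can't close $out_file: $!";

# Each line is 8 characters plus one LF byte, on any platform.
print "wrote ", -s $out_file, " bytes\n";
```

Without the binmode() call, the same script on a DOSish system would write
two extra bytes, one CR per line.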

Cheers,
Philip
-- 
Philip Newton <nospam.newton@gmx.li>
That really is my address; no need to remove anything to reply.
If you're not part of the solution, you're part of the precipitate.


------------------------------

Date: Sun, 06 May 2001 11:34:54 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: DOS Perl convering LF to CR/LF
Message-Id: <kldaft8qmicvt81nlo3v852egnt470bgbj@4ax.com>

David wrote:

>I am reading a Unix file on a DOS machines which contains lines ending
>in a LF.  The program reads in the line, but when I reprint them to
>another file, the LF is now a CR/LF!  Is there a way to prevent perl
>from this conversion or do I need to chop off the end and print ending
>with a LF?

binmode() on the output handle.

>Also, what is the control char(s) for LF only, eg. like the "\n" for
>CRLF?

As you noticed, "\n" is chr(10), just as in Unix. So it's just one
character, not two. Conversion from CRLF to "\n" happens on input, and
from "\n" to CRLF happens on output. binmode() prevents this conversion.

-- 
	Bart.


------------------------------

Date: Sun, 6 May 2001 00:10:39 -0700
From: dave turner <dt@area.com>
Subject: Re: Good editor for perl
Message-Id: <MPG.155e98c6f91645dd989696@news.area.com>

In article <9cpnd202p78@enews3.newsguy.com>, rcranberry@hotmail.com 
says...

You might also look at DZ Soft's Perl editor.

www.dzsoft.com


> www.editplus.com
> 
> Excellent editor!
> 
> "Super-Simon" <simon@super-simon.com> wrote in message
> news:9c1csm$loh$1@news1.xs4all.nl...
> > Hi all,
> >
> > I'm searching for a good, fast editor with syntax highlighting for perl
> > (CGI) for use under Windows 2000 / Windows 98 (I use windowz only for
> > editing scripts, scripts runs on Linux-server). It has to be free (I'm a
> > poor student ;-)
> >
> > Grtz,
> >
> > Super-Simon
> >
> >
> 
> 
> 


------------------------------

Date: Sun, 6 May 2001 12:11:39 +0000 (UTC)
From: abigail@foad.org (Abigail)
Subject: Re: Help on optimization wanted
Message-Id: <slrn9fafrr.der.abigail@tsathoggua.rlyeh.net>

Michael Ströck (michael@stroeck.com) wrote on MMDCCCIV September MCMXCIII
in <URL:news:3af482be$1@e-post.inode.at>:
~~  Hi to all !
~~  
~~  One of our CS teachers at school told me to write a
~~  script that finds all primes from 1 - n.
~~  
~~  Writing that script was easy, but as I'm very new to
~~  Perl, I'd really appreciate any comments on how to
~~  make the following script run faster.
~~  
~~  What I've done so far:
~~  - I only divide by known primes.
~~  - I increase $number_to_test by 2 at each pass, so
~~  I don't iterate over even numbers (with starting at 1
~~  and pushing 2 onto the array by hand).
~~  - I tried to minimize the loop to the smallest number
~~  of instructions.
~~  
~~  Any comments on style and speed are highly appreciated.


There's not much point in micro-optimizations if you are better
off scrapping the entire program and implementing a much better algorithm.
One based on sieves, for instance. If you are doing a CS program, you
should have math as well, and I would be surprised if sieves aren't
discussed there.
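
For readers who have not met sieves, here is a minimal sketch of the Sieve
of Eratosthenes.  It is only an illustration of the kind of algorithm
meant, not the original poster's script; the name primes_up_to is
hypothetical.

```perl
use strict;
use warnings;

# Sieve of Eratosthenes: return all primes from 2 to $n.
sub primes_up_to {
    my ($n) = @_;
    return () if $n < 2;
    my @is_composite = (0) x ($n + 1);
    for my $i (2 .. int(sqrt $n)) {
        next if $is_composite[$i];
        # cross out multiples of $i, starting at $i squared
        for (my $m = $i * $i; $m <= $n; $m += $i) {
            $is_composite[$m] = 1;
        }
    }
    return grep { !$is_composite[$_] } 2 .. $n;
}

print join(' ', primes_up_to(50)), "\n";
```

No division is performed at all: each composite is simply marked by
stepping through the multiples of the primes found so far.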



Abigail


------------------------------

Date: Sun, 06 May 2001 08:01:45 GMT
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Help with double hash of arrays
Message-Id: <3AF50632.5F366134@earthlink.net>

Tad McClellan wrote:
> 
> Jake Peters <jacobp@cubit.seas.upenn.edu> wrote:
> >
> >I am trying to build a double hash of arrays, such as the following:
> >
> >       $somehash{key1}{key2} = @somearray;
> 
>    perldoc perlreftut
> 
> >How can I actually get something where there is an array in the
> >double hash, so I can do the following:
> >
> >       $somehash{key1}{key2}[3] = 4;
> 
>    $somehash{key1}{key2} = \@somearray;      # take reference to array
> 
> or if you want a COPY instead:
> 
>    $somehash{key1}{key2} = [ @somearray ];    # make ref to anon array

or another way to make a copy:
	@{ $somehash{key1}{key2} } = @somearray;

Or maybe:
	push @{ $somehash{key1}{key2} }, $_ foreach(@somearray);

Ok, now I'm being silly :) but just remember, TINOTW.
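
Putting the pieces of this thread together, a small self-contained sketch
(the data values are made up for illustration):

```perl
use strict;
use warnings;

my %somehash;
my @somearray = (10, 20, 30, 40);   # made-up sample data

# Store a copy of the array under two keys (anonymous array reference).
$somehash{key1}{key2} = [ @somearray ];

# Element access through the reference; the arrow between subscripts
# is optional.
$somehash{key1}{key2}[3] = 4;

# Dereference the whole array with @{ ... }.
print "elements: @{ $somehash{key1}{key2} }\n";
print "original: @somearray\n";   # unchanged, because we stored a copy
```

Storing \@somearray instead of [ @somearray ] would make the last
assignment change the original array too.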

-- 
Shift to the left, shift to the right, mask in, mask out, BYTE, BYTE,
BYTE !!!


------------------------------

Date: Sun, 06 May 2001 09:07:12 GMT
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: How secure is this....
Message-Id: <3AF5157F.D3893E95@earthlink.net>

This is off-topic for comp.lang.perl.misc, so please send all followups
to sci.crypt.

Dodger wrote:
> "Super-Simon" <simon@super-simon.com> wrote in message
> news:9c4oiv$lmo$1@news1.xs4all.nl...
> > "Brett Foster" <onasc@remove.me@home.com> wrote in message
> > news:eLkF6.96228$61.20567500@news4.rdc1.on.home.com...
> > > "Super-Simon" <simon@super-simon.com> wrote in message
> > > news:9c2b12$sbn$1@news1.xs4all.nl...
> > > > Hi,
> > > >
> > > > In most security-scripts the following code is used for
> > > > encryption:
> > > >
> > > > print crypt($passwd,$salt);
> > > >
> > > > Is this safe, or at least difficult to crack, is there something
> > > > better???
> > >
> > > It is my information that once you crypt you can't go back. In
> > > > other words, you cannot determine what the value of $passwd was
> > > before calling crypt.
> >
> > You mean it's a hashing-routine....
> 
> No, he means it's an encryption routine.

An encryption routine transforms a plaintext into a ciphertext with the
help of a key, such that only with the key can the ciphertext be made
back into the plaintext.

A secure hashing routine transforms an input into an output, such that
it is not possible to determine the input from just the output.

The function crypt is a kind of hashing routine.  The fact that its guts
were mostly stolen from an algorithm which is/was commonly used for
encryption is largely irrelevant.

Since crypt's input is limited to 8 characters (which can easily be
brute-forced, due to its smallness), I would recommend against using it
in writing a new application.  Instead, use a modern secure hashing
algorithm, such as SHA1.  You still need salt, but it gets fed into
the hash in the same way as the key.
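
A minimal sketch of a salted SHA1 check might look like the following.
The Digest::SHA module, the helper names and the way the salt is generated
are all assumptions made for illustration; only the algorithm name comes
from the text above.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hash a password together with a salt; store both the salt and the digest.
# (salted_hash and check_password are hypothetical helper names.)
sub salted_hash {
    my ($password, $salt) = @_;
    return sha1_hex($salt . $password);
}

# Check a candidate password against a stored salt and digest.
sub check_password {
    my ($candidate, $salt, $stored) = @_;
    return salted_hash($candidate, $salt) eq $stored;
}

# Illustrative 8-character salt; rand() is not a cryptographic generator.
my $salt   = join '', map { ('a' .. 'z', 0 .. 9)[rand 36] } 1 .. 8;
my $stored = salted_hash('correct horse', $salt);
print check_password('correct horse', $salt, $stored) ? "match\n" : "no match\n";
```

Unlike crypt, the whole password contributes to the digest, however long
it is.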

I would also recommend that you try to give some advice to your users
regarding the strength of their passwords.  Diceware is one good way to
generate a strong password.  Measuring the randomness of the string and
rejecting 'not random enough' ones is another way.

Here's a function to measure order-0 entropy (a simple measurement of
randomness):

# This [untested] function calculates
# H = Sum(i) { Pi * ( - log2(Pi) ) }
# and returns H * length(string)
sub entropy {
	my $string = shift;
	my ($length, %letter) = length $string;
	return 0 unless $length;	# log(0) is an error
	for( -$length .. -1 ) {
		++$letter{substr($string, $_, 1)};
	}
	# Perl's log() is the natural logarithm; there is no ln()
	my ($x,$h) = ( log($length), 0 );
	$h += $_ * ( $x - log($_) ) for(values %letter);
	return $h / log(2);
	# some divisions omitted here and there, but they all cancel
	# out, so it's ok.
}

If I've coded this right, the function returns the number of bits needed
for a non-adaptive order-0 arithmetic coder to encode the string, not
including the statistics information itself.  It's a half-decent
estimate on the strength of a password/passphrase.  If every letter
occurs exactly once within a string, then the entropy should measure to
be N log2 N, where N is the length of the string.  Some examples are:
	ab = 2 * 1 = 2
	aabb = 4 * 1 = 4
	abcd = 4 * 2 = 8
	aabbccdd = 8 * 2 = 16
	abcdefgh = 8 * 3 = 24
	aaaabbbbccccdddd = 16 * 2 = 32
	aabbccddeeffgghh = 16 * 3 = 48
	abcdefghijklmnop = 16 * 4 = 64

As you can see, the estimator gives smaller numbers than the maximum
amount of real entropy you can have in a string of that particular length,
but it's better to be safe than sorry.  If you require the user to give
a password whose strength is at least 32 by this estimate, then it will
probably be strong enough for most purposes.  A strength 32 password is
going to be at least 10 letters, since 10 * log2(10) is about 33.2 while
9 * log2(9) is only about 28.5.

-- 
Shift to the left, shift to the right, mask in, mask out, BYTE, BYTE,
BYTE !!!


------------------------------

Date: Sun, 06 May 2001 10:27:20 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: How to determine file type of a filehandle?
Message-Id: <9l9aft8csgn86dc7gr4mnl88r58km5h5tv@4ax.com>

David J. Marcus wrote:

>Given a file handle, $fh, how can I determine what kind of file the handle
>is associated with. In particular, the $fh can come from a socket, a disk
>file, console, etc.
>
>The environment is AS Perl 5.6.1, running on W'98 and W2K.

open the docs for perlfunc, and search for "-X". In particular, check
out -t, -S, -p and -f.
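
A minimal sketch of how those tests could be combined.  The helper name
handle_kind and the order of the checks are assumptions for illustration,
and the results may differ between platforms (the Windows ports in
particular).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Classify a filehandle using Perl's -X file test operators.
# (handle_kind is a hypothetical helper name.)
sub handle_kind {
    my ($fh) = @_;
    return 'terminal'   if -t $fh;
    return 'socket'     if -S $fh;
    return 'pipe'       if -p $fh;
    return 'plain file' if -f $fh;
    return 'other';
}

print "STDIN: ", handle_kind(\*STDIN), "\n";

# A temporary disk file, removed again when the program exits.
my ($tmp) = tempfile(UNLINK => 1);
print "temp file: ", handle_kind($tmp), "\n";
```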

-- 
	Bart.


------------------------------

Date: Sun, 06 May 2001 12:06:54 +0100
From: HCCO admin <admin@nospam.m2n.co.uk>
Subject: Re: How to execute a perl script from a perl script? - nearly got it   right !
Message-Id: <2f7aftkot8of7m43rcf1d5d2nh6c1l6t63@4ax.com>

On Sun, 06 May 2001 03:22:06 GMT, "flash" <bop@mypad.com> wrote:

>I finally got home and re-read your message.
>
>you could do.
>
>system(perl /path/to/script ubb=whos_online_ssi');
>
>in the middle man cgi.
>

Ok, removed the trailing single quote in your text above and tried
that - internal server error.

Also tried 
system(/usr/bin/perl/perl /home/sites/site2/web/cgi-bin/ultimatebb.cgi
ubb=whos_online_ssi);
Internal server error.

Tried 
system ('/home/sites/site2/web/cgi-bin/ultimatebb.cgi'
'ubb=whos_online_ssi');
Internal server error.

Tried
system ('/home/sites/site2/web/cgi-bin/ultimatebb.cgi',
'ubb=whos_online_ssi'); 

'Perl' command not needed.  It runs the ultimatebb.cgi but ignores the
argument ubb=whos_online_ssi.

Without having to alter the apache server as suggested elsewhere, can
anyone suggest an alternative syntax for 'system' or 'exec' which WILL
accept the argument?

Thanks


------------------------------

Date: Sun, 06 May 2001 08:54:51 GMT
From: Jim Schaerer <jim1234@monetsgarden.net>
Subject: Match/replace of HTML text without loss of tag information
Message-Id: <1ip9ftc8na6leet1ip41113uan5ee33u2a@4ax.com>

Hello, all.

I'm attempting to write a Perl regular expression match/replace
statement to change text in a HTML document without compromising the
information in the HTML tags themselves.  I'm working with a small bit
of test code before I make a complete script (I figure this will save
debug/fix time). Thus far, I have not been successful.  My apologies
if I've missed something really simple.  Here is what I've tried so
far:

$HTML = "<b> b b b b </b>";
$HTML =~ s/>(.*)b(.*)</>$1test$2</g;
print $HTML;

The output of this was: <b> b b b test </b>

Next I tried inserting the previous statement into a while loop, to
see if that would be more helpful.

$HTML = "<b> b b b b </b>";
while($HTML =~ s/>(.*)b(.*)</>$1test$2</g) {
	$HTML =~ s/>(.*)b(.*)</>$1test$2</g;
}
print $HTML;

(I used the same statement in the while() to remove the chance of
typos, etc., which could serve to confuse myself.)

This time the results were correct: <b> test test test test </b>

 ... so, I made the match more complicated:

$HTML = "<b> b b b b </b><b> b b </b>";
(same code as above)

Result: <b> test test test test </test><test> test test </b>

It seems I have a case of greedy matching on my hands.  This is where
my confusion starts (unless it started earlier and I haven't realized
it yet. <g>)  I can't figure out a combination of ?'s that will curb
this matching.

$HTML = "<b> b b b b </b><b> b b </b>";
while($HTML =~ s/>(.*?)b(.*?)</>$1test$2</g) {
	$HTML =~ s/>(.*?)b(.*?)</>$1test$2</g;
}
print $HTML;

 ... results in the same as my last test: <b> test test test test
</test><test> test test </b>

So, I changed the position of the ?'s a bit:

$HTML = "<b> b b b b </b><b> b b </b>";
while($HTML =~ s/>?(.*)b(.*)<?/>$1test$2</g) {
	$HTML =~ s/>?(.*?)b(.*)<?/>$1test$2</g;
}
print $HTML;

This time I get: ><test> test test test test </test><test> test test
</test><<<<<<<<<<

So ... I am at a loss.  I've searched cpan and the perldocs, as well
as a general internet search, and haven't been able to turn anything
up that seems close to what I'm wanting.  If someone has any
suggestions on how I could accomplish what I'm trying to do (the way
I'm trying to do it, or a way better way I don't know about), or
pointers in the right direction, or names of books that might cover
this, or anything of the sort, I'd be very grateful.  Thank you all
very much for your time. :)

- Jim Schaerer


------------------------------

Date: Sun, 06 May 2001 21:26:39 +1000
From: James Freeman <jfreeman@tassie.net.au>
Subject: Re: Match/replace of HTML text without loss of tag information
Message-Id: <3AF534EF.56A8DE0A@tassie.net.au>

Jim Schaerer wrote:

> Hello, all.
>
> I'm attempting to write a Perl regular expression match/replace
> statement to change text in a HTML document without compromising the
> information in the HTML tags themselves.

Short answer: look at some of the modules that do this kind of thing.
HTML::TokeParser is a good start. Also, this type of request is common, so
check out the Usenet archive for answers. If you want to know why your
regexes do what they do, read on.

> I'm working with a small bit
> of test code before I make a complete script (I figure this will save
> debug/fix time). Thus far, I have not been successful.  My apologies
> if I've missed something really simple.  Here is what I've tried so
> far:
>
> $HTML = "<b> b b b b </b>";
> $HTML =~ s/>(.*)b(.*)</>$1test$2</g;
> print $HTML;
>
> The output of this was: <b> b b b test </b>

Naturally, it was. The first (.*) eats up as much of the string as it can.
It then gives back the last six characters 'b </b>' so that the rest of the
regex, 'b(.*)<', can match. At this stage $1 = ' b b b ' and $2 = ' ';
note that .* is happy with a very short match, even an empty one. The /g
search then continues on from the '<' it just matched and, finding no more
matches, exits.

If your intent was to substitute all the b chars between the bold tokens
you need

$HTML = "<b> b b b b </b>";
1 while $HTML =~ s/^(<b>.*?)b(.*<\/b>)$/$1test$2/;
print $HTML;

This prints:

<b> test test test test </b>

Note we lock the pattern to the ends of the string with ^ and $. This is
not actually necessary in this case, but when using .* you need to be
careful because sometimes it is quite happy to match nothing rather than
the everything you might be expecting.

>
>
> Next I tried inserting the previous statement into a while loop, to
> see if that would be more helpful.
>
> $HTML = "<b> b b b b </b>";
> while($HTML =~ s/>(.*)b(.*)</>$1test$2</g) {
>         $HTML =~ s/>(.*)b(.*)</>$1test$2</g;
> }
> print $HTML;
>
> (I used the same statement in the while() to remove the chance of
> typos, etc., which could serve to confuse myself.)
>
> This time the results were correct: <b> test test test test </b>
>
> ... so, I made the match more complicated:
>
> $HTML = "<b> b b b b </b><b> b b </b>";
> (same code as above)
>
> Result: <b> test test test test </test><test> test test </b>
>
> It seems I have a case of greedy matching on my hands.

No, your regex allows matching of any b at all so long as that b exists
between '>'  and  '<'

For example using :

$HTML = '> B-b-b-erroca gives you back your b-b-bounce <';

gives

> B-test-test-erroca gives you testack your test-test-testounce <

A slight variation on my first regex fixes the problem. What we do is use
a negative lookahead assertion to see if the character following the b
which we propose to change is a >. It will be for <b> and </b>. The
lookahead assertion is the (?!>) bit following the b. This reads: match a
'b' provided the next char is not a '>'. We have lookahead and lookbehind
assertions to insist a certain pattern is or is not present.

$HTML = "<b> b b b b </b><b> b b </b>";
1 while $HTML =~ s/^(<b>.*?)b(?!>)(.*<\/b>)$/$1test$2/;
  print $HTML;

this gives

<b> test test test test </b><b> test test </b>

> This is where
> my confusion starts (unless it started earlier and I haven't realized
> it yet. <g>)  I can't figure out a combination of ?'s that will curb
> this matching.

I'm afraid the problem is with your logic and understanding of how regexes
work. I strongly suggest getting a handle on lookaround assertions; they
are amazingly useful.
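
As one more illustration of what a lookahead can do, here is a sketch that
handles the two-tag test string in a single pass.  It is a variation for
demonstration only, not a general HTML solution: it assumes that '<' and
'>' never occur in the text or inside attribute values.

```perl
use strict;
use warnings;

my $html = "<b> b b b b </b><b> b b </b>";

# Replace a 'b' only when it is NOT followed by optional non-bracket
# characters and then '>', i.e. when it is not sitting inside a tag.
(my $changed = $html) =~ s/b(?![^<>]*>)/test/g;

print "$changed\n";   # <b> test test test test </b><b> test test </b>
```

For a 'b' in the text, the next angle bracket is a '<', so the lookahead
pattern cannot match and the replacement goes ahead; for a 'b' inside a
tag, the next angle bracket is the closing '>' and the 'b' is left alone.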

>
>
> $HTML = "<b> b b b b </b><b> b b </b>";
> while($HTML =~ s/>(.*?)b(.*?)</>$1test$2</g) {
>         $HTML =~ s/>(.*?)b(.*?)</>$1test$2</g;
> }
> print $HTML;
>
> ... results in the same as my last test: <b> test test test test
> </test><test> test test </b>
>
> So, I changed the position of the ?'s a bit:
>
> $HTML = "<b> b b b b </b><b> b b </b>";
> while($HTML =~ s/>?(.*)b(.*)<?/>$1test$2</g) {
>         $HTML =~ s/>?(.*?)b(.*)<?/>$1test$2</g;
> }
> print $HTML;
>
> This time I get: ><test> test test test test </test><test> test test
> </test><<<<<<<<<<
>
> So ... I am at a loss.  I've searched cpan and the perldocs, as well
> as a general internet search, and haven't been able to turn anything
> up that seems close to what I'm wanting.  If someone has any
> suggestions on how I could accomplish what I'm trying to do (the way
> I'm trying to do it, or a way better way I don't know about), or
> pointers in the right direction, or names of books that might cover
> this, or anything of the sort, I'd be very grateful.

#!/usr/bin/perl -w

use strict;

require HTML::TokeParser;
my $p = HTML::TokeParser->new("index.html") || die "Can't open: $!";
 while (my $token = $p->get_token) {
     #...
 }

Cheers

James

> Thank you all
> very much for your time. :)
>
> - Jim Schaerer



------------------------------

Date: Sun, 06 May 2001 12:57:53 +0100
From: Dave Cross <dave@dave.org.uk>
Subject: Re: Parsing
Message-Id: <vreaftcfncemeoa2kpko8r4e16csi57lpa@4ax.com>

On 03 May 2001 21:16:58 GMT, barryallwood@aol.com (Barry Allwood)
wrote:

>Hey,
>
>Ive Created A Program It works fine but the parsing has stopped working, The
>form has a hidden value called "Action" which when the form is submitted it
>will process the request for the subroutine (look at the top) but I always
>get a blank page for some reason, Im completely stumped and so is my Friend
>who Op's Undernet's Perl
>
>Here are the errors
>
>Name "main::action" used only once: possible typo at scenemail.cgi line 213.
>Name "main::USERS" used only once: possible typo at scenemail.cgi line 55.
>Name "main::version" used only once: possible typo at scenemail.cgi line 90.
>Content-type: text/html
>
>Use of uninitialized value at scenemail.cgi line 21.
>
>Here's the code:-
>
>#!/usr/bin/perl -w
>
>require "var_settings.cgi";
>require "var_users.cgi";
>
>$thiscgi = "SceneMail.cgi";
>$Action = $FORM{'ACTION'};
>$username = $FORM{'USERNAME'};
>$pass = $FORM{'PASSWORD'};
>
>print "Content-type: text/html\n\n";
>
>if ( &readparse ) {
>if ($Action eq "admin") {&AdminMenu
>} else {&DoStart}
>}
>
>#----------Form Parsing Start----------#
>sub readparse {
>read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>@pairs = split(/&/, $buffer);
>foreach $pair (@pairs) {
>($name, $value) = split(/=/, $pair);
>$value =~ tr/+/ /;
>$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; $FORM{$name} =
>$value;
>}
>}
>
>
>#----------Form Parsing End----------#

[snipped non-essential parts of the program]

You're trying to access parts of the %FORM hash before you've
populated it. It is populated in the &readparse subroutine. You should
move the call to this subroutine earlier in the program.

Your &readparse routine is very buggy. You'd be better off using the
&param function from CGI.pm.
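
To make the ordering point concrete, here is a minimal sketch in which the
hash is filled before anything reads from it.  The hand-rolled parse_form
helper and the sample query string are made up for illustration; in a real
script the param function from CGI.pm remains the better choice.

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string into a hash.
# (parse_form is a hypothetical helper for this illustration.)
sub parse_form {
    my ($buffer) = @_;
    my %form;
    for my $pair (split /[&;]/, $buffer) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                             # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/eg;   # %XX encodes one byte
        }
        $form{$name} = $value;
    }
    return %form;
}

# Populate the hash FIRST, then read values out of it.
my %FORM   = parse_form('ACTION=admin&USERNAME=barry&PASSWORD=s%21cret+1');
my $action = defined $FORM{ACTION} ? $FORM{ACTION} : '';
print "action is '$action'\n";
```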

hth,

Dave...

-- 
<http://www.dave.org.uk>  SMS: sms@dave.org.uk
<http://www.manning.com/cross/>


------------------------------

Date: Sun, 6 May 2001 20:52:54 +1000
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: PERL Code Generator
Message-Id: <slrn9fab86.v5e.mgjv@martien.heliotrope.home>

[Post left in silly order]

On Fri, 4 May 2001 14:59:27 +1000,
	Tom Beer <tom.beer@btfinancialgroup.spamfilter.com> wrote:
> What about me?  Going to kill-file me for replying to your post?

No. For repeating offense number 1.

[snip]

> "Randal L. Schwartz" <merlyn@stonehenge.com> wrote in message
> news:m1vgnmdh7c.fsf@halfdome.holdit.com...
>> >>>>> "Todd" == Todd Smith <todd@designsouth.net> writes:
>>
>> Todd> why would you blame a company for something they didn't do, and
> probably
>> Todd> don't know about?
>>
>> Strike 1 - top posting

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | 
Commercial Dynamics Pty. Ltd.   | What's another word for Thesaurus?
NSW, Australia                  | 


------------------------------

Date: Sun, 06 May 2001 22:32:00 +1000
From: James Freeman <jfreeman@tassie.net.au>
Subject: Re: Poetry::Aum version 0 released
Message-Id: <3AF54440.DCE7ECB7@tassie.net.au>

Lee wrote:

> package Poetry::Aum;
> use strict;
> use warnings;
> our $VERSION = 0;
>
> sub new {
>  my $self = new Abstract::Entity(rand);
>  my $ego  = new Abstract::Concept;
>  my $goal = new Abstract::Interest;
>  our $philosophy;
>
>  DARK_NIGHT:
>  while (defined $goal){
>   $philosophy = <STDIN>;
>   study $philosophy;
>   if ($ego->understands($philosophy)){
>    undef $goal; # implies last DARK_NIGHT;
>   }
>  }
>
>  $self->{$philosophy} = localtime;
>  bless $self; # Intentionally classless
>  return;   # Does this hand over my $self?
> }
>
> 1;

Nice

James




------------------------------

Date: Sun, 6 May 2001 20:58:33 +1000
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: re-sizing GIF images on the fly
Message-Id: <slrn9fabip.v5e.mgjv@martien.heliotrope.home>

On Sat, 05 May 2001 09:11:09 +1000,
	George Bailey <georgebailey@my-deja.com> wrote:
> I keep wanting to say "you know what's a good program for resizing 
> images? Photoshop, that's what."
[snip]
> Apologies, but it will be a much much better tool for the job than 
> anything a Perl script can do.

Nonsense. Photoshop is a good tool. So's the GIMP. Neither is very good
at automated tasks. A program with a well-written library behind it is
much better at this. ImageMagick is much better suited for automated
tasks.

Why would I even want to consider firing up a piece of bloatware like
photoshop to resize an image? Why would I even consider installing an OS
that it runs on? Why would I pay a few thousand dollars worth of
investment in hardware and software costs, just so I can resize an
image?

Get real.

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | The gene pool could use a little
Commercial Dynamics Pty. Ltd.   | chlorine.
NSW, Australia                  | 


------------------------------

Date: Sun, 06 May 2001 22:44:13 +1000
From: James Freeman <jfreeman@tassie.net.au>
Subject: Re: Recursing a directory tree
Message-Id: <3AF5471C.7BFDE4BB@tassie.net.au>

Bart Lateur wrote:

> Jfreeman wrote:
>
> >"Randal L. Schwartz" wrote:
> >
> >> >>>>> "Jfreeman" == Jfreeman  <jfreeman@tassie.net.au> writes:
> >>
> >> >> next if m/^\.{1,2}$/; # skip the dot files
> >>
> >> Jfreeman> So that you only skip the . and .. files fine.
> >>
> >> Actually, that's wrong because it matches "..\n" and ".\n", perfectly
> >> legal names, and a great way to skip by this code if it would be an
> >> advantage to do so.
> >
> >Correct as always!
>
> > next if $_ eq '.' or $_ eq '..'; # skip the dot files (only . and .. !!!!)
>
> Have you ever heard of the \z zero-width assertion?
>
>         next if /^\.\.?\z/;

Not in action. A new fact every day. Thanks
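
For anyone else who has not seen \z in action, here is a minimal sketch
comparing the three tests quoted in this thread. The helper sub names and
the sample names are made up for illustration; only the patterns come from
the posts above.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the three tests quoted in this thread.
sub skip_dollar { return $_[0] =~ /^\.{1,2}$/ ? 1 : 0 }   # $ also matches before a final "\n"
sub skip_z      { return $_[0] =~ /^\.\.?\z/  ? 1 : 0 }   # \z matches only at the very end
sub skip_eq     { return ($_[0] eq '.' || $_[0] eq '..') ? 1 : 0 }

# Made-up sample names, including the two that end in a newline.
my @names = ('.', '..', ".\n", "..\n", '.profile', '...');

for my $name (@names) {
    (my $shown = $name) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-8s  \$ skips: %d   \\z skips: %d   eq skips: %d\n",
        "'$shown'", skip_dollar($name), skip_z($name), skip_eq($name);
}
```

Only the $ version skips the two names that end in a newline; the \z and
eq versions keep them, which is the point Randal and Bart were making.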

James

>
>
> --
>         Bart.



------------------------------

Date: 6 May 2001 11:55:04 GMT
From: ebohlman@omsdev.com (Eric Bohlman)
Subject: Re: regular expression
Message-Id: <9d3e2o$ki1$1@bob.news.rcn.net>

Benjamin Goldberg <goldbb2@earthlink.net> wrote:
> With only one capture, this could also have been done
> 	m[(pattern here)$]; $Rate = $1;
> though it is probably silly to do so.

Not just silly, but dangerous.  If for some reason the pattern fails to 
match, $Rate will be set to whatever the last successful match had been.  
Not correct, and not easy to debug either.
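
A minimal sketch of the problem, assuming a made-up pattern and made-up
input strings (the real "pattern here" was not shown in the thread):

```perl
use strict;
use warnings;

# Hypothetical stand-in for "pattern here": a decimal number at end of string.
my $pattern = qr/(\d+\.\d+)$/;

my $first  = 'rate: 4.25';       # made-up input that matches
my $second = 'rate: unknown';    # made-up input that does not match

# Unguarded: $1 is copied whether or not the match succeeded.
$first =~ $pattern;
my $rate_a = $1;                 # captured from $first
$second =~ $pattern;             # fails, so $1 is left untouched
my $rate_b = $1;                 # stale value left over from $first

# Guarded, option 1: a list assignment yields undef when the match fails.
my ($rate_c) = $second =~ $pattern;

# Guarded, option 2: only read $1 after testing the match.
my $rate_d;
if ($second =~ $pattern) {
    $rate_d = $1;
}

for my $pair (['unguarded', $rate_b], ['list assign', $rate_c], ['if test', $rate_d]) {
    my ($label, $value) = @$pair;
    printf "%-12s %s\n", $label, defined $value ? $value : '(undef)';
}
```

Either guarded form leaves the variable undefined on a failed match, which
is far easier to notice than a plausible-looking stale number.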



------------------------------

Date: Sun, 06 May 2001 20:40:11 +1000
From: James Freeman <jfreeman@tassie.net.au>
Subject: Re: Test for integer?
Message-Id: <3AF52A0B.9265D8F4@tassie.net.au>



Rob wrote:

> Howdy!
>
> I came up with the same idea you are using here, but I was wondering if
> there was an inbuilt function to do it... anyway, I have a couple of
> questions about your code if you don't mind..
>
> > my $do_this_sub;
>
> > $do_this_sub = \&is_integer1;
> > &tests;
>
> What exactly are you doing here?
> I can see:
>  - creating a scalar called $do_this_sub and assigning it with a reference
> to a subroutine. (Assigning it the physical address of the subroutine?)
>  - calling the tests subroutine
>
> > sub tests {
> >     &$do_this_sub('2');
> >     &$do_this_sub('+2');
> >     &$do_this_sub('-2');
> >     &$do_this_sub('42');
> >     &$do_this_sub(' + 2 ');
> >     &$do_this_sub('  2000');
> >     &$do_this_sub('2,000');
> >     &$do_this_sub('+2.0');
> >     &$do_this_sub('2000.00');
> >     &$do_this_sub('2,000,000.00');
> >     &$do_this_sub(',2000.');
> >     &$do_this_sub('20  00');
> >     &$do_this_sub('foo');
> >     &$do_this_sub('foo+2');
> >     &$do_this_sub('255.255.255.0');
> > }
>
> The tests subroutine calls whichever subroutine is stored in $do_this_sub,
> sending it the listed parameter.
>
> Is that right?

You are correct. This was just to show you one use of references, just for the
hell of it really! Of course I could have defined a data set and then called
each sub sequentially with each data element but that would be boring.

What happens is that a reference to each sub is stored, one at a time, in the
effectively global $do_this_sub. Strictly we could (and should) pass this
reference to our sub each time, but in a short script using a global is fine.

When the tests sub is called, we make perl dereference the reference to the
sub using the '&$do_this_sub' syntax. As a result we execute whatever sub is
pointed to by the reference in '$do_this_sub' passing it the argument in
parentheses as usual. Thus:

$do_this_sub = \&is_integer1;
&$do_this_sub('2');

is exactly the same as calling:

&is_integer1('2');

or

is_integer1('2') # the & is optional

Another way we can dereference our subroutine reference is by using the ->
arrow operator.

$do_this_sub = \&is_integer1;
$do_this_sub->('2');

This has the same result. Note that the identifying &, which lets you know
explicitly that we are dereferencing a sub, is no longer required. In fact,
combining the two as &$do_this_sub->('2') will not do what you want.
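
Here is a minimal sketch of the tidier approach mentioned above, passing the
reference in as an argument instead of using a global. The body of
is_integer1 is only a guess at a simple check, and run_tests and the
shortened data list are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real is_integer1: optional sign, then digits.
sub is_integer1 {
    my ($value) = @_;
    return $value =~ /^[+-]?\d+\z/ ? 1 : 0;
}

# A shortened, made-up version of the test data.
my @data = ('2', '+2', '-2', ' + 2 ', '2,000', '+2.0', 'foo');

# The sub to test arrives as an argument, so no global is needed.
sub run_tests {
    my ($name, $check) = @_;
    for my $value (@data) {
        my $verdict = $check->($value) ? 'integer' : 'not an integer';
        printf "%-12s %-10s %s\n", $name, "'$value'", $verdict;
    }
    return;
}

run_tests('is_integer1', \&is_integer1);
```

To test a second sub you would just call run_tests again with another
reference, and nothing else has to change.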

Cheers

James



>
>
> Thanks
> Rob



------------------------------

Date: Sun, 06 May 2001 07:54:08 GMT
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Where is my script
Message-Id: <3AF50469.9EEF26BD@earthlink.net>

Rudolf Polzer wrote:
> BTW: is it possible to get the filename from a handle? Then one could
> perhaps examine __DATA__.

If the handle is to an AF_UNIX socket, then getsockname will get you the
filename, I think.  However, it won't work in the general case.

And for that matter, if it works, it will only get the filename the
socket was created with.

If you create a file (or pipe, or whatever), and then make a link (not a
symbolic link, a hard link, another name in the file system pointing to
the same inode), and then try to get the file name from a handle, which
one should it return?
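
A minimal sketch of that situation, assuming a Unix-like system and a
filesystem that supports hard links. The file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory that is removed when the script exits.
my $dir    = tempdir(CLEANUP => 1);
my $first  = "$dir/first.txt";     # made-up name
my $second = "$dir/second.txt";    # made-up name

open my $out, '>', $first or die "open $first: $!";
print {$out} "hello\n";
close $out or die "close $first: $!";

# Give the same file a second name (a hard link, not a symlink).
link $first, $second or die "link: $!";

# Open the file by one name and stat the handle itself.
open my $in, '<', $first or die "open $first: $!";
my ($dev, $ino, $nlink) = (stat $in)[0, 1, 3];

# Stat the other name and compare device and inode numbers.
my ($dev2, $ino2) = (stat $second)[0, 1];
my $same = ($dev == $dev2 && $ino == $ino2) ? 1 : 0;

print "names pointing at this inode: $nlink\n";
print "second name is the same file: ", ($same ? 'yes' : 'no'), "\n";
close $in;
```

The handle can tell you the device, the inode and how many names exist, but
nothing in it records which of those names was used to open it.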

-- 
Shift to the left, shift to the right, mask in, mask out, BYTE, BYTE,
BYTE !!!


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending Perl questions to the -request address; I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 842
**************************************
