[22678] in Perl-Users-Digest


Perl-Users Digest, Issue: 4899 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Apr 27 03:05:44 2003

Date: Sun, 27 Apr 2003 00:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 27 Apr 2003     Volume: 10 Number: 4899

Today's topics:
        anyone get nonblock io with 5.8 and win32? <smackdab1@hotmail.com>
    Re: Authentication with Unix username and password <bwalton@rochester.rr.com>
        compiling c pgms as part of perl mods for windows -- ne (R Solberg)
    Re: compiling c pgms as part of perl mods for windows - <kalinabears@hdc.com.au>
    Re: English --> Orkish text filter <goldbb2@earthlink.net>
        hashes in scalar context? <nobody@dev.null>
    Re: hashes in scalar context? <REMOVEsdnCAPS@comcast.net>
    Re: How to send and receive on IP PORT? <spam@thecouch.homeip.net>
    Re: How to send and receive on IP PORT? <Juha.Laiho@iki.fi>
    Re: Insecure Filehandle Dependencies <poncewattle@comcast.net>
    Re: Just curious about this- are REGEXes rigorously det <tassilo.parseval@rwth-aachen.de>
    Re: Just curous about this- are REGEXes rigorously dete <REMOVEsdnCAPS@comcast.net>
    Re: regex for word whitespace word <bwalton@rochester.rr.com>
    Re: regex for word whitespace word <REMOVEsdnCAPS@comcast.net>
    Re: regex for word whitespace word <johngros@bigpond.net.au>
    Re: regex for word whitespace word <bwalton@rochester.rr.com>
    Re: Regex greediness question (Kevin Shay)
        uploading photos in e-classified <chris_12003@yahoo.com>
    Re: uploading photos in e-classified <REMOVEsdnCAPS@comcast.net>
    Re: Won't let me use $[ !!! <goldbb2@earthlink.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 26 Apr 2003 23:24:17 -0700
From: "smackdab" <smackdab1@hotmail.com>
Subject: anyone get nonblock io with 5.8 and win32?
Message-Id: <IEKqa.5298$oI2.3368@fed1read01>

There was the helpful code to get this to work on 5.6.x,
which worked great, but I couldn't get it to work on 5.8.
(main part was this: "ioctl($self, 0x8004667e, $nonblocking);")

On 5.8 it isn't working. There was one post directly about this (title was
"Windows and nonblocking IO", back in Nov 2002) reporting that the
$sock->blocking(0) call returns undef instead of 1/0. Has anyone else
gotten around this, or does anyone know of a workaround???

thanks!!!




------------------------------

Date: Sun, 27 Apr 2003 02:50:04 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: Authentication with Unix username and password
Message-Id: <3EAB4541.1080204@rochester.rr.com>

SlimClity wrote:

> Is it possible to use the BSD username and password as authentication
> method?
> 
> I've tried to use the encrypt command and verify this with
> /etc/master.passwd but the encrypted string changes while using the
> same password.
> 

Check out:

    perldoc -f crypt

or, for an all-Perl implementation:

    use Crypt::UnixCrypt;

Some Unix systems have moved on to MD5 passwords or other schemes.  Look 
in CPAN for appropriate modules if that is the case with your system.
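
The reason the encrypted string changes is the salt: crypt() mixes a salt into its output, and the stored hash begins with that salt. So rather than encrypting the password afresh, pass the stored hash itself as the salt and compare, which is the idiom perldoc -f crypt describes. A minimal sketch; check_password is a hypothetical helper, and it assumes you have already read the user's hash from /etc/master.passwd (which normally requires root):

```perl
use strict;
use warnings;

# Returns true if $plain is the password that produced $stored_hash.
# The stored hash doubles as the salt: crypt() reads only the salt part
# of it, so the right password reproduces exactly the same string.
sub check_password {
    my ($plain, $stored_hash) = @_;
    my $candidate = crypt($plain, $stored_hash);
    return defined($candidate) && $candidate eq $stored_hash;
}

# Hypothetical usage: in real code $stored would be read from the
# password file rather than generated here.
my $stored = crypt('s3cret', 'ab');    # 'ab' is a traditional 2-char salt
print check_password('s3cret', $stored) ? "accepted\n" : "rejected\n";
print check_password('guess',  $stored) ? "accepted\n" : "rejected\n";
```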

You mention an "encrypt" function.  Is that part of some module?
-- 
Bob Walton



------------------------------

Date: 26 Apr 2003 19:34:16 -0700
From: flateyjarbok@yahoo.com (R Solberg)
Subject: compiling c pgms as part of perl mods for windows -- newbie
Message-Id: <386cc483.0304261834.5a75eeab@posting.google.com>

I am trying to install Math::CDF and there are 2 .c pgms I need to
compile.
I am getting an error message from my c compiler (borland command line
compiler).  I know that these pgms are being called by the perl so
there is no "main", but can anyone tell me how to compile these as
required for the perl calls?

The compiler call 
bcc32 ipmpar.c

gives the message

Error:  Unresolved external '_main' referenced from C:\Borland\.....


------------------------------

Date: Sun, 27 Apr 2003 16:31:02 +1000
From: "Sisyphus" <kalinabears@hdc.com.au>
Subject: Re: compiling c pgms as part of perl mods for windows -- newbie
Message-Id: <3eab7a20$0$19914@echo-01.iinet.net.au>


"R Solberg" <flateyjarbok@yahoo.com> wrote in message
news:386cc483.0304261834.5a75eeab@posting.google.com...
> I am trying to install Math::CDF and there are 2 .c pgms I need to
> compile.
> I am getting an error message from my c compiler (borland command line
> compiler).  I know that these pgms are being called by the perl so
> there is no "main", but can anyone tell me how to compile these as
> required for the perl calls?
>
> The compiler call
> bcc32 ipmpar.c
>
> gives the message
>
> Error:  Unresolved external '_main' referenced from C:\Borland\.....

Hi,

You would normally build this module by running 'perl Makefile.PL', 'make
test' and 'make install'.
Sounds to me that you're not following that procedure - apologies if I'm
wrong :-)

Check the output of 'perl -V:make'. You'll want to use whatever it reports
instead of "make".
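
The same sequence can be driven from Perl itself. This sketch is illustrative only: build_steps and build_module are hypothetical helpers, and they use $Config{make} from the core Config module, which holds the same value 'perl -V:make' prints, so the make program your perl was built with is used on Windows too:

```perl
use strict;
use warnings;
use Config;    # %Config holds the values that "perl -V:..." reports

# The usual build sequence for a CPAN module such as Math::CDF, using the
# make program this perl was configured with instead of a hard-coded "make".
sub build_steps {
    my $make = $Config{make};
    return (
        [ $^X,   'Makefile.PL' ],    # $^X is the perl binary running this script
        [ $make, 'test' ],
        [ $make, 'install' ],
    );
}

# Run each step from the unpacked module directory; stop at the first failure.
sub build_module {
    for my $step (build_steps()) {
        system(@$step) == 0
            or die "'@$step' failed with status $?\n";
    }
}
```

Calling build_module() from inside the unpacked module directory is roughly equivalent to typing the three commands by hand.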

Hth.

Cheers,
Rob




------------------------------

Date: Sun, 27 Apr 2003 02:56:39 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: English --> Orkish text filter
Message-Id: <3EAB7F27.D770D030@earthlink.net>

Arduin wrote:
> 
> Free text filter, written using Perl, that converts english to
> "orkish":
> 
>   http://hiddenway.tripod.com/waaagh/orklib.pm.txt
> 
> Nothing particularly clever about the source -- it's merely for
> amusement.
> (Similar to the old Swedish Chef and Elmer Fudd filters.)
> 
> Note: Too many access attempts to this site might exceed the download
> limit, so please be gentle. ;-)

Lots of the replacements are like

  s/([^aeiou])es$/$1z/;
  s/([^aeiou])ess$/$1ez/;

Where they should be like:

  s/([^aeiou])es$/${1}z/;
  s/([^aeiou])ess$/${1}ez/;

If you'd had "use strict", you'd have caught this.

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$@[$a%6
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}


------------------------------

Date: Sun, 27 Apr 2003 02:05:52 GMT
From: Andras Malatinszky <nobody@dev.null>
Subject: hashes in scalar context?
Message-Id: <3EAB3ACC.4000308@dev.null>

The perldata man page tells me this:

"If you evaluate a hash in scalar context, it returns false if the hash 
is empty. If there are any key/value pairs, it returns true; more 
precisely, the value returned is a string consisting of the number of 
used buckets and the number of allocated buckets, separated by a slash. 
This is pretty much useful only to find out whether Perl's internal 
hashing algorithm is performing poorly on your data set. For example, 
you stick 10,000 things in a hash, but evaluating %HASH in scalar 
context reveals "1/16", which means only one out of sixteen buckets has 
been touched, and presumably contains all 10,000 of your items. This 
isn't supposed to happen."
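
The emptiness test is the part of this that ordinary code relies on: a hash can be used directly as a condition. A small sketch with a hypothetical describe_hash helper; it counts keys explicitly rather than printing scalar(%h), because that string is version-dependent (perl 5.26 and later return the number of keys instead of the bucket ratio):

```perl
use strict;
use warnings;

# A hash in boolean context is false when empty and true otherwise,
# whatever string scalar(%h) happens to produce on this perl.
sub describe_hash {
    my ($h) = @_;
    return %$h ? scalar(keys %$h) . " key(s)" : "empty";
}

my %seen;
print describe_hash(\%seen), "\n";
$seen{$_}++ for qw(apple pear apple);
print describe_hash(\%seen), "\n";
```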

Is there a realistic situation when evaluating a hash in scalar context 
may be useful? Anyone ever used this feature?

On a slightly different track: what are these "buckets"? Are they 
documented somewhere? Is it useful to know about them?



------------------------------

Date: Sat, 26 Apr 2003 21:42:07 -0500
From: "Eric J. Roode" <REMOVEsdnCAPS@comcast.net>
Subject: Re: hashes in scalar context?
Message-Id: <Xns9369E6ED5600Bsdn.comcast@216.166.71.239>

Andras Malatinszky <nobody@dev.null> wrote in
news:3EAB3ACC.4000308@dev.null:

> Is there a realistic situation when evaluating a hash in scalar context
> may be useful? Anyone ever used this feature?

I've never seen or heard of a real-world use for it.  I've never used
it.


> On a slightly different track: what are these "buckets"? Are they 
> documented somewhere? Is it useful to know about them?

In order for a hash to operate efficiently, a "hashing function" is
applied to the key to produce an internally-used number which is used
as a sort of index into the hash's data area in order to find your
key's stored value.  That's not specific to Perl; that's longstanding
computer science theory.

Ideally, a hash function will distribute input keys fairly evenly and
randomly across however many buckets there are allocated in the hash.
You see, the hash function makes lookups faster, but there is the
possibility (and in most real-world hashes, the near-certainty) that
the hash function will return the same number for two or more input
keys.  This is called a hash collision.  I don't know what perl does
internally when that happens; generally what programs do is to branch
off sideways with a linked-list or other data structure -- less
efficient to search, which is why you want to minimize the number of
collisions.

Perl of course has the added complication that hashes will
automagically grow as more key/value pairs are added.  I don't know
how this happens internally, but at any given time, there are a
certain number of hash buckets into which the hash function maps
keys.

If you add a key/value pair to an empty hash, then one bucket is
filled (out of whatever the initial number is, 8 I think).  If you
add a second key/value pair, then either the hash function (performed
on the new key) puts the new data into the same bucket (inefficient,
but hey it happens), or it puts the new data into a different bucket.
So scalar(%this_hash) will either return "1/8" or "2/8".
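
The bucket idea can be shown with a toy table. This is a simplified sketch, not perl's actual implementation: the string hash function, the fixed 8 buckets with no resizing, and the helper names are all assumptions made for illustration. Collisions are handled by chaining, i.e. each bucket keeps a list of key/value pairs, and bucket_usage reports a used/total figure in the style described above:

```perl
use strict;
use warnings;

my $NBUCKETS = 8;    # fixed size; perl's real hashes grow as keys are added

# Toy string hash (multiply-by-33-and-add over the characters), reduced to
# a bucket index. Perl's own hash function is different.
sub bucket_for {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 33 + ord($_)) % 1_000_003 for split //, $key;
    return $h % $NBUCKETS;
}

# Each bucket holds an array of [key, value] pairs (chaining), so two keys
# that land in the same bucket can both be stored.
sub toy_store {
    my ($table, $key, $value) = @_;
    my $chain = $table->[ bucket_for($key) ] ||= [];
    for my $pair (@$chain) {
        if ($pair->[0] eq $key) { $pair->[1] = $value; return }
    }
    push @$chain, [ $key, $value ];
}

# Lookup: hash to a bucket, then walk that bucket's chain.
sub toy_fetch {
    my ($table, $key) = @_;
    my $chain = $table->[ bucket_for($key) ] or return;
    for my $pair (@$chain) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;
}

# "used/total": how many buckets hold at least one pair.
sub bucket_usage {
    my ($table) = @_;
    my $used = grep { defined($_) && @$_ } @$table[ 0 .. $NBUCKETS - 1 ];
    return "$used/$NBUCKETS";
}

my @table;
toy_store(\@table, $_, length $_) for qw(apple pear plum fig kiwi lime);
print "plum => ", toy_fetch(\@table, 'plum'), "\n";
print "buckets used: ", bucket_usage(\@table), "\n";
```

With more keys than buckets, collisions are guaranteed, yet every key is still found; the price is a longer chain to walk in the crowded buckets.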

-- 
Eric
print scalar reverse sort qw p ekca lre reh 
ts uJ p, $/.r, map $_.$", qw e p h tona e;


------------------------------

Date: Sun, 27 Apr 2003 01:17:41 -0400
From: Mina Naguib <spam@thecouch.homeip.net>
Subject: Re: How to send and receive on IP PORT?
Message-Id: <RJJqa.29162$U01.435093@weber.videotron.net>

Brad Walton wrote:
> Thank you Mina, I have been searching all day, looking at Socket() and
> IO::Socket::INET->new(), trying to find a fairly simple solution. Let me go
> into a little more detail, and see if you have any ideas for an easier
> solution, or may an already existing script.

For a beginner in a simple project such as the one you've outlined, 
IO::Socket::INET is probably the easiest way to go.

> 
> I have a program sitting on a server that already has the data formatted and
> ready to send.
> On the other machine, I simply need to sit there and listen for information
> coming from the server. Once that info is received, it grabs it, and puts it
> in an array or some sort of results param.
> 
> It should be real basic, and dumb... meaning neither side should care if the
> other side is there. I guess UDP is the way to go for this. Does that bring
> any other ideas to mind?

UDP does not guarantee delivery, or even sequence for that matter. 
Think of it like dropping a bunch of letters in the mailbox. That is it, 
end of story.

TCP on the other hand is like an important package with sign-on-receive 
verification.

UDP is usually used in scenarios where some data loss or mangling is 
acceptable, such as online gaming, multimedia streaming, etc.  TCP is 
used for almost everything else that requires reliability.

(That is a slight white lie: you *could* make UDP reliable by layering
over it most of the features that TCP already provides for you.)
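
To make the "letters in the mailbox" picture concrete, here is a minimal UDP sketch using IO::Socket::INET. For illustration only, both ends run in one process on the loopback address and the OS picks the port (LocalPort => 0); a real receiver would listen on a fixed, agreed port, and the message text is made up:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Receiver: bind a UDP socket on the loopback address. LocalPort => 0 asks
# the OS for any free port; a real listener would use a fixed, agreed port.
my $receiver = IO::Socket::INET->new(
    Proto     => 'udp',
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
) or die "receiver: $@";
my $port = $receiver->sockport;

# Sender: no handshake happens here; PeerAddr/PeerPort just set the
# default destination for send().
my $sender = IO::Socket::INET->new(
    Proto    => 'udp',
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
) or die "sender: $@";

# A successful send() only means the datagram was handed to the network
# stack; UDP gives the sender no confirmation that it was received.
defined $sender->send("results: 1 2 3") or die "send: $!";

# Receive one datagram (up to 1024 bytes) and split it into an array.
defined $receiver->recv(my $datagram, 1024) or die "recv: $!";
my @results = split ' ', $datagram;
print "got: @results\n";
```

Nothing in the sender's code checks that the datagram arrived; if the receiver were not running, the data would simply be lost.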

So look over the perlipc page and check the IO::Socket::INET
examples; it should be a simple copy-and-paste for your project.

Best of luck.





------------------------------

Date: Sun, 27 Apr 2003 06:02:00 GMT
From: Juha Laiho <Juha.Laiho@iki.fi>
Subject: Re: How to send and receive on IP PORT?
Message-Id: <b8frhq$h36$2@ichaos.ichaos-int>

"Brad Walton" <sammie@greatergreen.com> said:
[Please do not top post]
>"Mina Naguib" <spam@thecouch.homeip.net> wrote in message
>news:IUlqa.14305$_w.273717@wagner.videotron.net...
>>
>> Brad Walton wrote:
>> > I am looking for information on how to send information and
>> > receive (listen) for information on a port. For example, I want
>> > to have a perl program running on one PC, while another sits on a
>> > remote machine and listens for incoming data on a specified port.
>> > What would this process be called? And are their any examples or
>> > tutorials of how this is accomplished?
>>
>> This is simple IP traffic (Internet Protocol).  As long as both machines
>> are on the same IP network (for example, the internet) then you can
>> easily make them talk to each other.
>>
>> Start your quest by lots of reading. Here are some resources to get you
>> started:
>> http://www.perldoc.com/perl5.8.0/pod/perlipc.html
>> http://search.cpan.org/author/JHI/perl-5.8.0/ext/IO/lib/IO/Socket/INET.pm
>> http://search.cpan.org/author/JHI/perl-5.8.0/ext/IO/lib/IO/Socket.pm
>> http://search.cpan.org/author/JHI/perl-5.8.0/ext/Socket/Socket.pm
>> http://search.cpan.org/author/JWIED/Net-Daemon-0.37/lib/Net/Daemon.pm
>>
>> And my own:
>> http://search.cpan.org/author/MNAGUIB/EasyTCP-0.19/EasyTCP.pm
>
>Thank you Mina, I have been searching all day, looking at Socket() and
>IO::Socket::INET->new(), trying to find a fairly simple solution. Let
>me go into a little more detail, and see if you have any ideas for an
>easier solution, or may an already existing script.

The first of the references has sample programs for both client and
server end using TCP. By the way, the document is part of the Perl
distribution, so it probably is already installed on the machine
where you're running Perl.

>It should be real basic, and dumb... meaning neither side should care
>if the other side is there. I guess UDP is the way to go for this. Does
>that bring any other ideas to mind?

UDP doesn't guarantee delivery, data integrity, or order of received
data (in case you send multiple packets within a short interval). The
sender will not get any notification whether the data actually was
received. The recipient will have no guarantee that the data is complete
and not corrupt on reception - and if it has any meaning, the recipient
cannot be sure that the packets are received in the order they were
sent. So, if any of these considerations are of concern for you (and
you decide to use UDP), then you'll need to code the corresponding
logic into your own program.

So, in short, go with TCP. Esp. as you have the code already made for
you and presented in the Perl base documentation. Of course, you might
use some other way to handle errors than just having your program
die each time something unexpected happens - but at least even those
examples show where the possible points for problems are.
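
For comparison, here is a minimal TCP sketch in the spirit of the perlipc client/server examples. It is squeezed into one process on the loopback address only so it can run on its own; in practice the server and client halves would be separate programs on the two machines, and the port choice and message are illustrative assumptions:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Server half: listen on a TCP port (0 = let the OS choose, for this demo).
my $server = IO::Socket::INET->new(
    Proto     => 'tcp',
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    ReuseAddr => 1,
) or die "server: $@";
my $port = $server->sockport;

# Client half: new() connects, or fails with an error if nothing is
# listening; unlike UDP, the sender finds out.
my $client = IO::Socket::INET->new(
    Proto    => 'tcp',
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
) or die "client: $@";
print {$client} "line one\nline two\n";
close $client;    # closing tells the server there is no more data

# Server accepts the connection and reads lines until end-of-file;
# TCP delivers them complete and in the order they were sent.
my $conn = $server->accept or die "accept: $!";
my @received = <$conn>;
chomp @received;
close $conn;
print "got: $_\n" for @received;
```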
-- 
Wolf  a.k.a.  Juha Laiho     Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
         PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)


------------------------------

Date: Sat, 26 Apr 2003 23:14:08 -0500
From: Nigel Poncewattle <poncewattle@comcast.net>
Subject: Re: Insecure Filehandle Dependencies
Message-Id: <Xns936A265CB93poncewattle@206.127.4.11>

"Alan J. Flavell" <flavell@mail.cern.ch> wrote in
news:Pine.LNX.4.53.0302121710540.7803@lxplus073.cern.ch: 

> We had a collegial discussion of some relevant issues, and I thought
> we reached a reasonable agreement about the issues discussed, and
> might have helped some lurkers as a result (one can always hope).

For the record, you just helped this lurker. Thanks! (ducks back into
lurking mode, grateful for giganews' multi-month retention on text
newsgroups! :) 



------------------------------

Date: 27 Apr 2003 06:57:35 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: Just curious about this- are REGEXes rigorously deterministic?
Message-Id: <b8fv0v$aj9$1@nets3.rz.RWTH-Aachen.DE>

Also sprach Sara:

> "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de> wrote in message news:<b8euch$r0v$1@nets3.rz.RWTH-Aachen.DE>...
>> Also sprach Andras Malatinszky:

>> > my $number=666;
>> > $number=~s/6/int(rand(10))/ge;
>> > print $number;
>> 
>> The regex part is still deterministic here. Remember that in s/// only
>> the left part is a regex. And the pattern is just '6' here. The
>> replacement part is a string (optionally taken to be evaluated).
>> 
>> Perl regexes get indeterministic once you use some of the more advanced
>> features. "(??{ code })" or "(?(condition)yes-pattern|no-pattern)" come
>> to mind. Naturally, in such a case no one would expect them to be
>> deterministic either.

> Tassilo, I guess you mean that given an input string, and a regex,
> there are other EXTERNAL states that affect the outcome, such as in
> your example, (condition)? Yes I can see that. OK. Good point.
> 
> Let's say only the input string and the engine itself can affect the
> outcome. I realize that's a subset of the domain of regexes, but let's
> look at that since you make a good point that the domain of
> *everything outside the regex* is like saying "is Perl
> deterministic?". Perhaps an interesting question as well but not what
> I'm after.

I wasn't so much thinking of that. I had a case such as this in mind:

    s/(??{ int rand 10 })//;

As I said, this is an obvious case designed to be indeterministic.
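
The difference between the two cases can be sketched as follows (randomize_sixes and matches_random_digit are hypothetical names). In the s///e version the pattern is the fixed string '6', so the number and positions of the matches never change; only the evaluated replacement does. With (??{ code }) the pattern itself is computed during the match, so whether it matches at all can vary between runs:

```perl
use strict;
use warnings;

# s///ge: the pattern is just '6', so it matches the same three positions
# of "666" every time; only the evaluated replacement is random.
sub randomize_sixes {
    my ($number) = @_;
    my $count = ($number =~ s/6/int(rand(10))/ge);
    return ($count, $number);
}

# (??{ code }): the code runs while matching and its result is used as
# the pattern, so the regex itself can differ from one run to the next.
sub matches_random_digit {
    my ($string) = @_;
    return $string =~ /(??{ int rand 10 })/ ? 1 : 0;
}

for (1 .. 3) {
    my ($count, $result) = randomize_sixes(666);
    print "replaced $count sixes: $result\n";
}
print "'5' matched: ", matches_random_digit('5'), "\n";
```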

> Are we all using the "THEORY of regexes* or *the LAW of regexes*?
> 
> Now in my little mind, I think of it THIS WAY. The regex ENGINE is a
> state-machine, like a Turing machine, but ever-so-much more complex.
> Each time the crank turns the state changes. Can it be proven that (1)
> each turn of the crank in state A inalterably produces state B, and
> (2) each implementation begins in the same state? I guess if those
> proposals could be proven then they are a LAW. Otherwise not.

Some more detail is needed about what your concept of 'deterministic' is.
Should the outcome always be the same across different computer
platforms or should it only be identical in something like:

    for (0 .. 1_000_000_000) {
        $string =~ /$pattern/;
    }
     
In either case, a formal proof will be hard. Does anyone want to apply the
Hoare calculus to Perl's regex engine? ;-) I guess not.

> You gentlemen already came up with some very interesting ways that (1)
> and (2) are not satisfied, and I'm inclined to think that we're using
> the *theory of regexes* in our daily work. Surely a model that will
> satisfy 99.999% and more of our challenges we throw at it, but much
> like the monkey typing, how long does he have to type before he writes
> War & Peace? How many inputs do we have to throw at a simple regex to
> break it?

If you are lucky, you can prove that the whole matter is
indeterministic. Just write a program that does a match over and over
again. If the outcome is suddenly different from the previous run, you
have proven that it's indeterministic. Of course, if your program runs
and runs, you are out of luck, since then you have hit the "Halting
Problem": you can't tell whether it's ever going to stop or keep
running forever.

> I suppose in a sense this is all moot since as any seasoned programmer
> knows, even arithmetic operations are at best approximations. How many
> of us have been bit by the machine thinking 1 + 2 = 2.99999999999? I
> know I have.

However, getting unexpected output does not mean it's not deterministic.
This is only the case if you get two different approximations for two
different runs. Shouldn't happen unless your CPU is indeterministic (for
the above floating point issue at least).
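
The point that a surprising result is not a nondeterministic one can be checked directly. A small sketch using 0.1 + 0.2, the usual example since neither operand is exactly representable in binary floating point; the sum differs from 0.3, but it is the same value on every evaluation:

```perl
use strict;
use warnings;

# 0.1 and 0.2 have no exact binary representation, so their sum is not
# exactly 0.3. The rounding error, however, is the same every time.
my ($x, $y) = (0.1, 0.2);
my $sum = $x + $y;
printf "0.1 + 0.2 = %.17g\n", $sum;
print $sum == 0.3 ? "equal to 0.3\n" : "not equal to 0.3\n";

# Repeat the same addition many times and collect the distinct results.
my %distinct;
$distinct{ sprintf '%.17g', $x + $y }++ for 1 .. 1000;
print scalar(keys %distinct), " distinct result(s) in 1000 runs\n";
```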

> Anyhow- guess I just felt like digging a little bit past the practical
> aspects of regexes and into the theory side. As always, some
> interesting views pop up immediately here.

Indeed. But suppose there were a proof that Perl is deterministic: I
wonder whether it would mean anything. Such a proof would have to
hold under sanitized circumstances. For instance, the same program may
behave differently (and very probably will) if you have 120meg of free
RAM left one time and only 0.5meg the other time. This is out of perl's
hands: if a call to malloc(3) (or even brk(2)) fails, it need not be
the fault of the Perl interpreter. Perl tries to link tightly into the
operating system it runs on, so more considerations may need to be
taken into account than with, for instance, Java. But conceptually
the same problems occur there as well.

> Did anyone else ever notice how much better the discussions are on
> CLPM than other groups :)

I don't know; I don't follow other language groups very closely. But
glad to hear they are. ;-)

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Sat, 26 Apr 2003 21:23:00 -0500
From: "Eric J. Roode" <REMOVEsdnCAPS@comcast.net>
Subject: Re: Just curous about this- are REGEXes rigorously deterministic
Message-Id: <Xns9369E3AF096B1sdn.comcast@216.166.71.239>

roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson) wrote in
news:b8eitk$j2d$1@canopus.cc.umanitoba.ca:

> The reason this can happen for perlre is that perlre can include
> function calls, and Perl does not specify the order of function
> call processing if one has multiple function calls in the same
> statement. As I recall, Perl also does not completely specify the
> order of operations of argument evaluation.

Yes it does, and yes it does.  Perhaps you're thinking of C.

-- 
Eric
print scalar reverse sort qw p ekca lre reh 
ts uJ p, $/.r, map $_.$", qw e p h tona e;


------------------------------

Date: Sun, 27 Apr 2003 02:17:27 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: regex for word whitespace word
Message-Id: <3EAB3D9B.6090907@rochester.rr.com>

John Gros wrote:

> I have been trying to get a regex to pick up both single words and two word
> descriptions of an a href tag but it refuses to pickup two word
> descriptions. I have been going over perlre looking for ways to do this and
> have tried many variations. I often believe I have a regex that logically
> should work yet while $1 is set $2 is not. $1 being the link and $2 being
> the description, it does get set for a single word description but not set
> for two word descriptions.
> 
> Regexs I have tried.
> /([A-Z]{2}.HTM)(>.*?<)/;
> 
> /([A-Z]{2}.HTM)>(.*?)</;
> 
> /([A-Z]{2}.HTM)>(\w*\s\w*)</;
> 
> /([A-Z]{2}.HTM)>(\w*\s?\w{0,20})</;
> 
> /([A-Z]{2}.HTM)>(\w*.?\w{0,20})</;
> 
> /([A-Z]{2}.HTM)>(\w*.+\w+)</;
> 
> /([A-Z]{2}.HTM)>(\w*.+)</;
> 
> /([A-Z]{2}.HTM)>(\w*.?\w+)</;
> 

Hmmmm...a little test program with your eight regexes shows they all 
seem to behave like you want, with the exception of the first, which 
includes spurious > and < characters in $2:

$_='blah blah <a href=XXAB.HTM>blah1 blah2</a> blah blah blah';
print "1:1=$1,2=$2\n" if /([A-Z]{2}.HTM)(>.*?<)/;
print "2:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(.*?)</;
print "3:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(\w*\s\w*)</;
print "4:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(\w*\s?\w{0,20})</;
print "5:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(\w*.?\w{0,20})</;
print "6:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(\w*.+\w+)</;
print "7:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(\w*.+)</;
print "8:1=$1,2=$2\n" if /([A-Z]{2}.HTM)>(\w*.?\w+)</;

Could you show us a couple of examples of what it is you are trying to 
match?  I assume your data is HTML.  Typical href "tags" (properly 
called attributes, I believe) contain " characters around the href, as in:

    <a href="http://www.perl.com/CPAN-local/README.html">Click here for CPAN</a>

Such an href would not be matched by any of your regexes.  Also note 
that href's tend to be case insensitive (perhaps depending upon the 
server), so you might miss those that are lowercase.  Use the /i switch 
to make your entire regex case insensitive.  Also note that your regexes 
might also grab things which aren't href's, like perhaps just plain text 
like:

    AAZHTM>xxx<

Note that your unescaped . is a regex metacharacter which will match any
character (except a newline, unless you use /s).
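
A quick way to see the difference; the two sample strings are made up for illustration, the second being the plain-text case mentioned above. The unescaped dot lets the pattern match text that contains no period at all, while \. requires a literal one:

```perl
use strict;
use warnings;

my $loose  = qr/([A-Z]{2}.HTM)>/;     # unescaped . : any character but newline
my $strict = qr/([A-Z]{2}\.HTM)>/;    # \. : a literal period only

for my $text ('<a href=XX.HTM>blah</a>', 'AAZHTM>xxx<') {
    printf "%-24s loose: %-8s strict: %s\n", $text,
        ($text =~ $loose  ? 'match' : 'no match'),
        ($text =~ $strict ? 'match' : 'no match');
}
```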

Also, you mentioned that $1 gets set by your regex and $2 is not set by 
it.  By that, do you mean that $2 retained its previous value?  If the 
regex has two sets of parens and it successfully matches, that shouldn't 
happen -- $2 should be set to whatever the regex inside the second set 
of parens matched.  Did you check to see if your matches were successful 
(something like

    die "horribly" unless /regex/;

perhaps)?  If $2 is coming up empty, that might be because its regex 
matched the empty string.  That would be possible in some of your examples.
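
An alternative to checking the match separately is to match in list context, so the captured values only exist when the match succeeded. A sketch with a hypothetical parse_link helper, using the test line from above (note the escaped \. here):

```perl
use strict;
use warnings;

# Returns (link, description), or an empty list if the line has no link.
# A list-context match returns the captures only when it succeeds, so a
# failed match cannot leave a stale $1/$2 from an earlier match behind.
sub parse_link {
    my ($line) = @_;
    if (my ($link, $desc) = $line =~ /([A-Z]{2}\.HTM)>(.*?)</) {
        return ($link, $desc);
    }
    return;
}

my @found = parse_link('blah blah <a href=XXAB.HTM>blah1 blah2</a> blah');
print @found ? "link=$found[0] desc=$found[1]\n" : "no match\n";
```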

By far the best bet for any HTML manipulation is to parse the HTML with 
an HTML parser.  The HTML::Parser module is one possibility.  It will 
catch all the subtleties that roll-your-own regex coding will never get.
It is actually a fairly difficult job to parse HTML.

That might not seem very helpful, but it is good advice.
-- 
Bob Walton



------------------------------

Date: Sat, 26 Apr 2003 21:30:18 -0500
From: "Eric J. Roode" <REMOVEsdnCAPS@comcast.net>
Subject: Re: regex for word whitespace word
Message-Id: <Xns9369E4EBBF3D5sdn.comcast@216.166.71.239>

"John Gros" <johngros@bigpond.net.au> wrote in
news:nLFqa.1860$lD4.12990@news-server.bigpond.net.au: 

> 
> I have run out of ideas.

You might share with us what your input data looks like, and what you
expect to match of it.

-- 
Eric
print scalar reverse sort qw p ekca lre reh 
ts uJ p, $/.r, map $_.$", qw e p h tona e;


------------------------------

Date: Sun, 27 Apr 2003 03:32:22 GMT
From: "John Gros" <johngros@bigpond.net.au>
Subject: Re: regex for word whitespace word
Message-Id: <abIqa.2856$lD4.15965@news-server.bigpond.net.au>


"John Gros" <johngros@bigpond.net.au> wrote in message
news:nLFqa.1860$lD4.12990@news-server.bigpond.net.au...
> I have been trying to get a regex to pick up both single words and two word
> descriptions of an a href tag but it refuses to pickup two word
> descriptions. I have been going over perlre looking for ways to do this and
> have tried many variations. I often believe I have a regex that logically
> should work yet while $1 is set $2 is not. $1 being the link and $2 being
> the description, it does get set for a single word description but not set
> for two word descriptions.
>
> Regexs I have tried.
> /([A-Z]{2}.HTM)(>.*?<)/;
>
> /([A-Z]{2}.HTM)>(.*?)</;
>
> /([A-Z]{2}.HTM)>(\w*\s\w*)</;
>
> /([A-Z]{2}.HTM)>(\w*\s?\w{0,20})</;
>
> /([A-Z]{2}.HTM)>(\w*.?\w{0,20})</;
>
> /([A-Z]{2}.HTM)>(\w*.+\w+)</;
>
> /([A-Z]{2}.HTM)>(\w*.+)</;
>
> /([A-Z]{2}.HTM)>(\w*.?\w+)</;
>
> I have run out of ideas.

Ok Bob the actual code is,

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $doc = get 'http://www.nswtab.com.au/today/today.htm';
my @links;
my @lines = split( / /,$doc); # seperate  the html into lines
my @course;
my $temp;
my $counter = 0;
foreach(@lines){
 if (/([A-Z]{2}\.HTM)>\D/ && $_ !~ /DIVS/) # only the lines I want
 {
  /([A-Z]{2}\.HTM)>(\w*.?\w+?)</;
  $links[$counter] = $1; # the actual link
  $course[$counter++] = $2; # assigning the description
 }
}
$counter = 0;
foreach(@links){
 print "$_ : $course[$counter++]\n";
}

Only the regex line has changed for each regex. Sample data follows. With
this data only Tatura works; the two-word names after it fail to
match. I am at a loss to know why. I know my code works perfectly, it just
does not do what I was hoping, lol.

<TR><TD><font color="#3366FF" face="Arial" size="2"><B>MR</B> </font><font
face="Arial" size="2"><a href=MR.HTM>Tatura</FONT> <font face="Arial"
size="1">(VIC)</FONT></a></TD> <TD><font face="Arial" size="2"> shwry <B>
dead</B></font></TD> <TD><font face="arial" size="-2">xd 5,6 dd 7,8 ff 6,8
</font></TD> <TD align=left><font face="Arial" size="2">&nbsp;&nbsp;<a
href=MR01DIVS.HTM>1</A> at <B> 12:15pm</B> <font size=-2><a
href="https://betting.tabnsw.com.au/cgi-bin/login?IBFS=0&EVENT=01&STATE=2&MT
G=MR" target="blank"></A></font></TD></TR>
<TR><TD><font color="#3366FF" face="Arial" size="2"><B>BR</B> </font><font
face="Arial" size="2"><a href=BR.HTM>Sunshine Coast</FONT> <font
face="Arial" size="1">(QLD)</FONT></a></TD> <TD><font face="Arial" size="2">
ocast <B> heavy</B></font></TD> <TD><font face="arial" size="-2">xd 4,6 dd
7,8 ff 6,7 </font></TD> <TD align=left><font face="Arial"
size="2">&nbsp;&nbsp;<a href=BR01DIVS.HTM>1</A> at <B> 12:20pm</B> <font
size=-2><a
href="https://betting.tabnsw.com.au/cgi-bin/login?IBFS=0&EVENT=01&STATE=2&MT
G=BR" target="blank"></A></font></TD></TR>
<TR><TD><font color="#3366FF" face="Arial" size="2"><B>AR</B> </font><font
face="Arial" size="2"><a href=AR.HTM>Sha Tin (hk)</FONT> <font face="Arial"
size="1"></FONT></a></TD> <TD><font face="Arial" size="2"> ocast <B>
good</B></font></TD> <TD><font face="arial" size="-2">xd 7,8 dd 9,10
</font></TD> <TD align=left><font face="Arial" size="2">&nbsp;&nbsp;<a
href=AR05DIVS.HTM>5</A> at &nbsp;<B> 5:05pm</B> <font size=-2><a
href="https://betting.tabnsw.com.au/cgi-bin/login?IBFS=0&EVENT=05&STATE=2&MT
G=AR" target="blank"></A></font></TD></TR>
<TR><TD><font color="#3366FF" face="Arial" size="2"><B>PR</B> </font><font
face="Arial" size="2"><a href=PR.HTM>Yarra Glen</FONT> <font face="Arial"
size="1">(VIC)</FONT></a></TD> <TD><font face="Arial" size="2"> ocast <B>
dead</B></font></TD> <TD><font face="arial" size="-2">xd 4,5 dd 7,8 ff 7,8
</font></TD> <TD align=left><font face="Arial" size="2">&nbsp;&nbsp;<a
href=PR01DIVS.HTM>1</A> at <B> 12:35pm</B> <font size=-2><a
href="https://betting.tabnsw.com.au/cgi-bin/login?IBFS=0&EVENT=01&STATE=2&MT
G=PR" target="blank"></A></font></TD></TR>




------------------------------

Date: Sun, 27 Apr 2003 04:37:41 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: regex for word whitespace word
Message-Id: <3EAB5E6E.4050404@rochester.rr.com>

John Gros wrote:

> "John Gros" <johngros@bigpond.net.au> wrote in message
> news:nLFqa.1860$lD4.12990@news-server.bigpond.net.au...
> 
 ...
> Ok Bob the actual code is,
 ...
> my @lines = split( / /,$doc); # seperate  the html into lines


Your problem is on the above line.  It splits on every space character, 
including those in your description words.  So, for example, the string

    href=BR.HTM>Sunshine Coast</FONT

becomes two lines in @lines, neither of which matches your regex. The
first of the two lines will match the regex in the if statement, which
sets $1 as expected and leaves $2 undefined (that regex has only one set
of parens), so $2 prints as an empty string. The second regex will fail
to match, and, since you didn't test to see if it matched or not, it
looks as if it set $1 (actually set by the previous regex) and set $2 to
the empty string (actually left over from the previous regex).  To find
this sort of stuff, use the Perl debugger;
that's what I did.
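
The effect can be reproduced without the real page, using a shortened fragment of the sample data. The sketch prints each piece produced by split / / and whether John's regex matches it, testing the match result directly instead of relying on $1 and $2:

```perl
use strict;
use warnings;

my $html  = '<a href=BR.HTM>Sunshine Coast</FONT> <font face="Arial">';
my $regex = qr/([A-Z]{2}\.HTM)>(\w*.?\w+?)</;    # John's regex

# Splitting on single spaces cuts "Sunshine Coast" in two.
for my $piece (split / /, $html) {
    # Check the match itself rather than trusting whatever $1/$2 hold.
    if (my ($link, $desc) = $piece =~ $regex) {
        print "match:    [$piece] -> $link / $desc\n";
    }
    else {
        print "no match: [$piece]\n";
    }
}

# The unsplit string does match, with both words in the description.
if (my ($link, $desc) = $html =~ $regex) {
    print "whole string -> $link / $desc\n";
}
```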

Even if that is fixed (by splitting on /\n/), there is still a problem
with href=AR.HTM>Sha Tin (hk)</FONT, whose three words and parentheses
cannot be matched by (\w*.?\w+?).

Why don't you just take everything until the next < ?  Maybe like:

     /([A-Z]{2}\.HTM)>(.*?)</;
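Here is that pattern run over a few anchors taken from the data in this thread.  It is a minimal sketch that assumes each name is followed directly by a "<"; %name_for is just an illustrative hash.  A negated class, ([^<]*), would behave the same on these lines.

```perl
use strict;
use warnings;

# Sample anchors taken from the data quoted in the thread.
my @samples = (
    'href=BR.HTM>Sunshine Coast</FONT',
    'href=AR.HTM>Sha Tim (hk)</FONT',
    'href=PR.HTM>Yarra Glen</FONT',
);

my %name_for;
for my $line (@samples) {
    # (.*?) is non-greedy: it stops at the first "<" after the ">",
    # so spaces and parentheses in the name are kept.
    if ($line =~ /([A-Z]{2}\.HTM)>(.*?)</) {
        $name_for{$1} = $2;
    }
}

print "$_ => $name_for{$_}\n" for sort keys %name_for;
```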

Also, I note that the real data from the web page in your program has 
only \r characters between "lines".  Thus, splitting on /[\n\r]/ is 
needed to get the lines you want while still splitting the test data 
correctly (at least on Windoze).  I also note that your expression to 
skip lines containing DIVS throws away a lot of data, because other 
hrefs on the same line contain those characters.  Hopefully you intend 
that?
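A quick way to see how the split pattern treats the different line endings is below; the sample string is made up.  Note that with /[\n\r]/ a \r\n pair produces an empty field between its two characters.  The /[\r\n]+/ version is a variation (not what was suggested above) that treats any run of CR/LF characters as a single break, at the cost of also swallowing genuinely blank lines.

```perl
use strict;
use warnings;

# Made-up data mixing CR only, LF only, and CR+LF line endings.
my $doc = "line one\rline two\nline three\r\nline four";

my @single = split /[\n\r]/,  $doc;   # each CR or LF is its own delimiter
my @runs   = split /[\r\n]+/, $doc;   # a run of CR/LF counts as one break

print scalar(@single), " fields with /[\\n\\r]/\n";
print scalar(@runs),   " fields with /[\\r\\n]+/\n";
```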

Also, you are counting on the web site keeping its existing format.  
The same page display could be generated with radically different 
HTML, and with the sort of code you are writing, even minor changes to 
the page could completely break your program.  Of course, that will be 
true of any scheme for extracting useful information from HTML, since 
HTML is intended to display information, not to make it easy to 
retrieve.


 ...

HTH
-- 
Bob Walton



------------------------------

Date: 26 Apr 2003 23:37:27 -0700
From: kevin_shay@yahoo.com (Kevin Shay)
Subject: Re: Regex greediness question
Message-Id: <5550ef1e.0304262237.261e75e1@posting.google.com>

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in message news:<b89fe6$3bn$1@mamenchi.zrz.TU-Berlin.DE>...
> Well, that's a lot of critique occasioned by an innocent little bit of
> code (which actually works, to boot).  Anyway, to avoid the issues,
> I'd code it along these lines:
> 
>     while ( $html =~ m[(<table>.*?</table>)]sg ) {
>         my $table = $1;
>         if ( ... ) {}
>     }

What do you think of this approach?

$html =~ s[(<table>.*?</table>)][ deal_with_a_table($1) ]egis;

sub deal_with_a_table {
 ...
}

That way, the actual substitution statement is relatively uncluttered,
free of scoping problems, etc., and you can put as much code as you
need into the subroutine (which will have a clearly defined return
value).
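A runnable version of this approach might look like the following.  deal_with_a_table here is a hypothetical handler that just tags each table with its <tr> count; whatever it returns replaces the matched text.  The /s modifier lets "." match the newlines inside a multi-line table, as in Anno's m[...]sg.

```perl
use strict;
use warnings;

# Hypothetical handler: its return value replaces the matched table.
# Here it appends an HTML comment with the number of <tr> tags.
sub deal_with_a_table {
    my ($table) = @_;                        # copy $1 before matching again
    my $rows = () = $table =~ /<tr>/gi;      # count matches in list context
    return "$table<!-- $rows rows -->";
}

my $html = "<p>intro</p>\n<table>\n<tr><td>a</td></tr>\n"
         . "<tr><td>b</td></tr>\n</table>\n<p>end</p>\n";

# /e evaluates the replacement as code, /g handles every table,
# /i ignores case, /s lets . match newlines.
$html =~ s[(<table>.*?</table>)][ deal_with_a_table($1) ]egis;

print $html;
```

Copying the argument into a lexical first matters: @_ aliases $1, and any match inside the handler would otherwise change what the argument refers to.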

Hard to tell which is better without knowing the specific
circumstances, but in cases when you actually need to substitute
something into the original text, I don't think a while (m//) {} loop
always fits the bill...

Kevin
--
perl -MLWP::UserAgent -e '$u=new LWP::UserAgent;$u->agent("japh");
print join(" ",(split(/\s+/,(split/\n/,$u->request(HTTP::Request->
new(GET=>join("",split(/\n/,"http://groups.google.com/groups?selm=
4365%40omepd.UUCP&output=gplain"))))->content)[60]))[0..3]),",\n"'


------------------------------

Date: Sat, 26 Apr 2003 18:27:58 -0700
From: "Chris" <chris_12003@yahoo.com>
Subject: uploading photos in e-classified
Message-Id: <vamchfc2dof15f@corp.supernews.com>

I'm using the standard edition of e-classifieds and it's set to allow the
user to upload one photo but I have seen other sites using the same software
that allow you to upload 5 or more.  Does anyone know what modifications I
need to make to the program so my users can upload more than one photo?

Thanks
- Chris





------------------------------

Date: Sat, 26 Apr 2003 21:31:09 -0500
From: "Eric J. Roode" <REMOVEsdnCAPS@comcast.net>
Subject: Re: uploading photos in e-classified
Message-Id: <Xns9369E510C74B5sdn.comcast@216.166.71.239>

"Chris" <chris_12003@yahoo.com> wrote in
news:vamchfc2dof15f@corp.supernews.com: 

> I'm using the standard edition of e-classifieds and its set to allow
> the user to upload one photo but I have seen other sites using the
> same software that allow you to upload 5 or more.  Does anyone know
> what modifications I need to make to the program so my users can
> upload more than one photo? 

Did you happen to have a Perl question?

-- 
Eric
print scalar reverse sort qw p ekca lre reh 
ts uJ p, $/.r, map $_.$", qw e p h tona e;


------------------------------

Date: Sun, 27 Apr 2003 03:05:14 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Won't let me use $[ !!!
Message-Id: <3EAB812A.CF87649D@earthlink.net>

JS Bangs wrote:
> 
> Tad McClellan sikyal:
> 
> > JS Bangs <jaspax@u.washington.edu> wrote:
> > > Hmph.
> > >
> > > I have a great reason to use $[ in my script, and to set it to an
> > > arbitrary (and constantly changing) high value.
> >
> > Changing when?
> >
> > Between runs of the program?
> >
> > Or changing during the running of the program?
> >
> > If the latter, then I don't know how to help you.
> 
> It's the latter. I have the array indices being stored at one point in
> the program, and I wanted a way to shrink the array without changing
> the index value. So I tried to write:
>         shift array;
>         $[++;
> Which perl didn't like.

That's because

   shift array;

is a syntax error.  You need:

   shift @array;

And, of course, $[ can only be set to a constant.

> > > Specifically, when I write the line "$[ = $i;", I
> > > get a compiler error that says "That use of $[ is unsupported."
> >
> >
> > Did you look up that message in perldiag.pod?
> 
> Yes, after you pointed this out to me, and this basically explained it.
> Now, $[ can only be 0 or 1, and it is severely constrained in how you
> define it.

Oh, it can be things other than 0 or 1.  But you can't do anything that
would change its value while the program is running.

> > > I'm quite curious what the justification is for this--
> >
> > Where did you look to find out?
> 
> Here? perldiag.pod has the justification I was looking for, but I
> didn't know that that doc would have anything relevant.
> 
> Anyway, it looks like I'll have to find a different way of optimizing
> this program.

Use shift to remove values from the front, keep a count of them in a
variable such as $num_values_shifted_out, and subtract that count from
any saved index before using it on the array.
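A minimal sketch of that idea, assuming the indices were saved before any values were shifted off.  @array, %saved_index and element_at are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

my @array = qw(a b c d e);
my $num_values_shifted_out = 0;

# Hypothetical indices saved before the array was shrunk.
my %saved_index = (first => 0, third => 2, last => 4);

# Remove two values from the front, counting each one.
for (1 .. 2) {
    shift @array;
    $num_values_shifted_out++;
}

# Map an index saved earlier onto the shrunken array.
sub element_at {
    my ($index) = @_;
    my $pos = $index - $num_values_shifted_out;
    # A negative $pos would silently count from the END of the array,
    # so treat it as "already shifted out" instead.
    return $pos >= 0 ? $array[$pos] : undef;
}

my $third = element_at($saved_index{third});
my $last  = element_at($saved_index{last});
my $first = element_at($saved_index{first});
print "third=$third last=$last first=", (defined $first ? $first : 'gone'), "\n";
```

The guard on a negative position matters because Perl treats a negative subscript as counting back from the end of the array, so a saved index for an element that has already been shifted out would otherwise quietly return the wrong value.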

If you could show some of your code, I could be a bit clearer.

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$@[$a%6
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4899
***************************************

