[23702] in Perl-Users-Digest
Perl-Users Digest, Issue: 5908 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Dec 7 18:05:41 2003
Date: Sun, 7 Dec 2003 15:05:11 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 7 Dec 2003 Volume: 10 Number: 5908
Today's topics:
Re: can some one please explain this regex?! <geoff.cox@blueyonder.co.uk>
Re: can some one please explain this regex?! <matthew.garrish@sympatico.ca>
Re: can some one please explain this regex?! <invalid-email@rochester.rr.com>
Re: can some one please explain this regex?! <noreply@gunnar.cc>
Re: can some one please explain this regex?! <noreply@gunnar.cc>
Re: can some one please explain this regex?! <geoff.cox@blueyonder.co.uk>
Re: can some one please explain this regex?! <geoff.cox@blueyonder.co.uk>
Re: can some one please explain this regex?! <geoff.cox@blueyonder.co.uk>
Re: can some one please explain this regex?! <noreply@gunnar.cc>
Re: can some one please explain this regex?! <geoff.cox@blueyonder.co.uk>
Re: can some one please explain this regex?! <geoff.cox@blueyonder.co.uk>
Direct parallel port access under windows XP for DOS ap (Ran)
Re: Direct parallel port access under windows XP for DO <invalid-email@rochester.rr.com>
Re: How to write to drive A:\ from CGI Perl (Cle)
Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
Re: hwo to match more than 1 line? <noreply@gunnar.cc>
Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 07 Dec 2003 19:32:24 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: can some one please explain this regex?!
Message-Id: <dtv6tv4f45dq0b6k7gl7nftrhl5s22lh8v@4ax.com>
On Sun, 07 Dec 2003 18:02:07 GMT, Geoff Cox
<geoff.cox@blueyonder.co.uk> wrote:
I should have made things a bit clearer - so here is the whole code
and a sample of html which it is to work on .. can any one see why it
doesn't get the name and address info?!
Cheers
Geoff
My code is as follows but it does not work!
---------------------------
use strict;
print ("name of html file?\n");
my $namehtml = <STDIN>;
print ("name of email list file?\n");
my $newhtml = <STDIN>;
open(IN, "$namehtml");
open(OUT, ">>$newhtml");
my $line = <IN>;
while (defined($line=<IN>)) {
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print OUT ("Name: $1\nAddress: $2\n");
}
}
close (IN);
close (OUT);
-----------------------------
which is working on for example
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
Cheers
Geoff
------------------------------
Date: Sun, 7 Dec 2003 14:24:23 -0500
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: can some one please explain this regex?!
Message-Id: <M7LAb.2053$3y1.239778@news20.bellglobal.com>
"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote in message
news:0iq6tv0f7u1efrv1ge01eog6e89kpc4okj@4ax.com...
> Hello,
>
> this comes from my posting re how to match more than 1 line (from
> Gunnar) but would appreciate any one just explaining what is matched
> as the code does not work for me. If I could learn from this I could
> probably sort it out for myself ..
>
>
To break it down piece by piece:
/Head\s+Teacher.+?<TD[^>]+>([^<]+).+?Address.+?<TD[^>]+>([^<]+)/is
matches "head" (you have the /i switch on, so it will match any case)
followed by one or more whitespace characters, followed by "teacher",
followed by one or more characters up to an opening <td. You then have a
negated character class, so it will match all text up to the next closing >,
and then another negated character class will match and capture anything up
to the next opening <.
I imagine this might be where your problem is. None of your match patterns
allow for zero occurrences, which means that there has to be at least one
character between the <td and closing >. In other words, your pattern would
never match <td>, but only something like <td class="foo">.
Moving on, you then have two non-greedy matches (.+?). The first will match
anything up to "address" and the second will match anything up to the next
<td. The regex then repeats itself with the two negated classes: one looking
for the end of the <td> and the other capturing everything up to the next
opening <. And once again, your pattern will fail unless there is at least
one character between the <td and >.
(I removed the /x from your original posting because it just allows
whitespace and comments in your regex, which didn't help the readability of
it, in my opinion of course.)
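For readers who do prefer /x, here is a minimal sketch of the same pattern with each piece commented. The variable names and the here-document are illustrative assumptions, not part of the original program; the markup is the sample from the post.

```perl
use strict;
use warnings;

# Sample markup copied from the thread; variable names are illustrative.
my $html = <<'HTML';
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
HTML

my $pattern = qr{
    Head\s+Teacher   # literal label, any case because of the i switch
    .+?              # as little as possible, up to the next cell tag
    <TD[^>]+>        # opening tag with at least one attribute character
    ([^<]+)          # capture 1: text up to the next opening bracket
    .+?              # skip ahead, newlines included because of the s switch
    Address          # second label
    .+?
    <TD[^>]+>
    ([^<]+)          # capture 2: the address text
}isx;

my ($name, $address) = $html =~ $pattern;
print "Name: $name\nAddress: $address\n";
```

Note that the pattern only has a chance when the whole sample sits in one string, since the labels and values are on different lines.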
Matt
------------------------------
Date: Sun, 07 Dec 2003 19:53:03 GMT
From: Bob Walton <invalid-email@rochester.rr.com>
Subject: Re: can some one please explain this regex?!
Message-Id: <3FD3817C.3070406@rochester.rr.com>
Geoff Cox wrote:
> On Sun, 07 Dec 2003 18:02:07 GMT, Geoff Cox
> <geoff.cox@blueyonder.co.uk> wrote:
>
> I should have made things a bit clearer - so here is the whole code
> and a sample of html which it is to work on .. can any one see why it
> doesn't get the name and address info?!
>
> Cheers
>
> Geoff
>
>
> My code is as follows but it does not work!
-------------------------------^^^^^^^^^^^^^
A much more specific description of what your code does/doesn't do is
called for in a newsgroup posting. Please state exactly what it does
that it shouldn't do, or what it doesn't do that it should do. "Doesn't
work" is next to meaningless -- we can't read your mind.
>
> ---------------------------
> use strict;
use warnings;
>
> print ("name of html file?\n");
> my $namehtml = <STDIN>;
>
> print ("name of email list file?\n");
> my $newhtml = <STDIN>;
>
>
> open(IN, "$namehtml");
> open(OUT, ">>$newhtml");
>
> my $line = <IN>;
Since you didn't modify $/, this will read only one line. I think
that's your fundamental problem. Try:
my $line;
{local $/;$line=<IN>} #slurp the input
and see if that works better.
>
> while (defined($line=<IN>)) {
Here you are reading the rest of the lines of filehandle IN, but one at
a time. You will have skipped the first line (which was read above).
If you slurp the input, you should get rid of the while loop.
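To see the difference between the two ways of reading, here is a small sketch. It assumes an in-memory filehandle (Perl 5.8 or later) and a shortened two-line sample; both are stand-ins for the real file, not part of the original program.

```perl
use strict;
use warnings;

# Shortened sample: label on one line, value on the next.
my $sample = "<B>Head Teacher</B></TD>\n<TD width=\"80%\">Fred Green</TD>\n";

# Read line by line: no single line holds both the label and the name.
open my $by_line, '<', \$sample or die "Cannot open in-memory handle: $!";
my $line_hits = 0;
while (defined(my $line = <$by_line>)) {
    $line_hits++ if $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)/is;
}
close $by_line;

# Slurp: with $/ undefined, a single read returns the whole input.
open my $whole, '<', \$sample or die "Cannot open in-memory handle: $!";
my $text = do { local $/; <$whole> };
close $whole;
my ($name) = $text =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)/is;

print "line-by-line matches: $line_hits\n";
print "slurped match: $name\n";
```

The do block keeps the change to $/ local, so any later reads in the program still work line by line.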
> # if ($line =~ / (.*?)<\/H6>/i) {
> # print OUT ("$1 \n");
> # }
>
> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
> .+?
> Address.+?<TD[^>]+>([^<]+)
> /isx ) {
> print OUT ("Name: $1\nAddress: $2\n");
> }
>
> }
>
> close (IN);
> close (OUT);
>
> -----------------------------
>
> which is working on for example
>
>
> <TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
> <TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
> <TR>
> <TD align=left width="20%" colSpan=2><B>Address</B></TD>
> <TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
> London N88 5XX</TD></TR>
...
> Geoff
Yes: you read the first line of your file, and throw it away. That was
the line with Teacher etc in it. But even if you didn't do that, the
remainder of the lines are read one at a time, and no one line contains
enough stuff to match your pattern. Slurp it all, and your pattern
might match. Here is a slightly modified standalone copy/paste/execute
style copy of your program that looks like it might "work":
use strict;
use warnings;
#print ("name of html file?\n");
#my $namehtml = <STDIN>;
#print ("name of email list file?\n");
#my $newhtml = <STDIN>;
#open(IN, "$namehtml");
#open(OUT, ">>$newhtml");
my $line;
{local $/;$line = <DATA>} #slurp the file
#while (defined($line=<DATA>)) {
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print ("Name: $1\nAddress: $2\n");
}
#}
#close (IN);
#close (OUT);
__END__
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
HTH.
--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl
------------------------------
Date: Sun, 07 Dec 2003 21:01:48 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: can some one please explain this regex?!
Message-Id: <br017s$283m2h$1@ID-184292.news.uni-berlin.de>
Geoff Cox wrote:
> here is the whole code and a sample of html which it is to work on
And, as I suspected, the problem has nothing to do with the regex...
Read Bob's explanation carefully!
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 07 Dec 2003 21:11:46 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: can some one please explain this regex?!
Message-Id: <br01qk$255uoi$1@ID-184292.news.uni-berlin.de>
Matt Garrish wrote:
> Geoff Cox wrote:
>> this comes from my posting re how to match more than 1 line (from
>> Gunnar) but would appreciate any one just explaining what is
>> matched as the code does not work for me. If I could learn from
>> this I could probably sort it out for myself ..
>
> To break it down piece by piece:
>
> /Head\s+Teacher.+?<TD[^>]+>([^<]+).+?Address.+?<TD[^>]+>([^<]+)/is
<snip>
> I imagine this might be where your problem is. None of your match
> patterns allow for zero occurrences, which means that there has to
> be at least one character between the <td and closing >. In other
> words, your pattern would never match <td>, but only something like
> <td class="foo">.
Yeah, you are right, of course. Both the occurrences of
<TD[^>]+>
would be better written as
<TD[^>]*>
(But, as explained in other posts, that limitation was not the reason
why OP's code didn't "work".)
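A quick sketch of the difference, using two made-up cells (the strings and variable names are assumptions for illustration only):

```perl
use strict;
use warnings;

my $bare  = '<td>Fred Green</td>';
my $attrs = '<td class="foo">Fred Green</td>';

my $plus = qr/<TD[^>]+>([^<]+)/i;   # needs at least one character after "<TD"
my $star = qr/<TD[^>]*>([^<]+)/i;   # also accepts a bare <TD>

for my $cell ($bare, $attrs) {
    printf "%-32s plus: %-3s star: %s\n", $cell,
        ($cell =~ $plus ? 'yes' : 'no'),
        ($cell =~ $star ? 'yes' : 'no');
}
```

Only the bare cell tells the two patterns apart; any cell with attributes matches both.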
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 07 Dec 2003 20:58:19 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: can some one please explain this regex?!
Message-Id: <7m47tvsbnbut3r06qnntb10c8b0e0dp5b0@4ax.com>
On Sun, 07 Dec 2003 19:53:03 GMT, Bob Walton
<invalid-email@rochester.rr.com> wrote:
Bob,
many thanks for your thoughts - the following code gets the first set
of name/address data but stops at that point - 'afraid I haven't used
your form of slurp before and do not see how to move through the rest
of the file containing the name/address data?
Geoff
use strict;
use warnings;
print ("name of html file?\n");
my $namehtml = <STDIN>;
print ("name of email list file?\n");
my $newhtml = <STDIN>;
open(DATA, "$namehtml");
open(OUT, ">>$newhtml");
my $line;
{local $/;$line = <DATA>} #slurp the file
#while (defined($line=<DATA>)) {
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print OUT ("Name: $1\nAddress: $2\n");
}
#}
close (DATA);
close (OUT);
------------------------------
Date: Sun, 07 Dec 2003 21:04:19 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: can some one please explain this regex?!
Message-Id: <p757tv8lf7tgoc86g30oekivj3ghpfj5i1@4ax.com>
On Sun, 07 Dec 2003 21:01:48 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:
>Geoff Cox wrote:
>> here is the whole code and a sample of html which it is to work on
>
>And, as I suspected, the problem has nothing to do with the regex...
>Read Bob's explanation carefully!
Gunnar
must be almost there - I have posted my version based on Bob's code
... but it only gets the first name/address info - not clear how I
move through the rest of the file?
by the way - your code seems to work fine minus my suggestion re the
additional < ?!
Cheers
Geoff
------------------------------
Date: Sun, 07 Dec 2003 21:21:55 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: can some one please explain this regex?!
Message-Id: <4967tv41vra61gb8ceknddvofg91ifaril@4ax.com>
On Sun, 7 Dec 2003 14:24:23 -0500, "Matt Garrish"
<matthew.garrish@sympatico.ca> wrote:
>
>"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote in message
>news:0iq6tv0f7u1efrv1ge01eog6e89kpc4okj@4ax.com...
>> Hello,
>>
>> this comes from my posting re how to match more than 1 line (from
>> Gunnar) but would appreciate any one just explaining what is matched
>> as the code does not work for me. If I could learn from this I could
>> probably sort it out for myself ..
>>
>>
>
>To break it down piece by piece:
Matt,
many thanks - will read in a minute - but you might like to look at
following code - this works OK except that it only gets the first set
of name/address data - I do not see at the moment how to move along
the slurped input to get the other sets of name/address info ..? any
ideas?! Cheers Geoff
use strict;
use warnings;
print ("name of html file?\n");
my $namehtml = <STDIN>;
print ("name of email list file?\n");
my $newhtml = <STDIN>;
open(DATA, "$namehtml");
open(OUT, ">>$newhtml");
my $line;
{local $/;$line = <DATA>} #slurp the file
#while (defined($line=<DATA>)) {
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print OUT ("Name: $1\nAddress: $2\n");
}
#}
close (DATA);
close (OUT);
------------------------------
Date: Sun, 07 Dec 2003 22:48:43 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: can some one please explain this regex?!
Message-Id: <br07gk$25ou5e$1@ID-184292.news.uni-berlin.de>
Geoff Cox wrote:
> Bob,
>
> many thanks for your thoughts - the following code gets the first
> set of name/address data but stops at that point - 'afraid I
> haven't used your form of slurp before and do not see how to move
> through the rest of the file containing the name/address data?
Well, you haven't told us before that there is more than one
name/address pair.
> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
Try to change that to
while ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
----^^^^^
> /isx ) {
and that to
/gisx ) {
-------------------^
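Put together, the loop might look like the sketch below. The two records are made up in the same shape as the posted sample, and the @pairs array is an illustrative addition. With /g in scalar context, each successful match resumes where the previous one ended, so the while loop walks through the slurped string one record at a time.

```perl
use strict;
use warnings;

# Two made-up records in the same shape as the sample in the thread.
my $text = <<'HTML';
<TD width="20%"><B>Head Teacher</B></TD>
<TD width="80%">Fred Green</TD></TR>
<TR><TD width="20%"><B>Address</B></TD>
<TD width="80%">Park Road, London</TD></TR>
<TR><TD width="20%"><B>Head Teacher</B></TD>
<TD width="80%">Ann Brown</TD></TR>
<TR><TD width="20%"><B>Address</B></TD>
<TD width="80%">Mill Lane, Leeds</TD></TR>
HTML

my @pairs;
while ( $text =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
                  .+?
                  Address.+?<TD[^>]+>([^<]+)
                 /gisx ) {
    push @pairs, [ $1, $2 ];            # keep each name and address pair
    print "Name: $1\nAddress: $2\n";
}
```

Without the g the match would restart from the beginning of the string every time, and the while loop would never end.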
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 07 Dec 2003 22:15:43 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: can some one please explain this regex?!
Message-Id: <sf97tvc83ohp2oidbef0vfe6lht35bg7jh@4ax.com>
On Sun, 07 Dec 2003 20:58:19 GMT, Geoff Cox
<geoff.cox@blueyonder.co.uk> wrote:
>On Sun, 07 Dec 2003 19:53:03 GMT, Bob Walton
><invalid-email@rochester.rr.com> wrote:
>
>Bob,
>
>many thanks for your thoughts - the following code gets the first set
>of name/address data but stops at that point - 'afraid I haven't used
>your form of slurp before and do not see how to move through the rest
>of the file containing the name/address data?
Obvious really !! just need to use while instead of if and add the g
option ..
Thanks everyone for all the help!
Cheers
Geoff
------------------------------
Date: Sun, 07 Dec 2003 22:53:07 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: can some one please explain this regex?!
Message-Id: <2kb7tvsdornlpbsc4t9rrf3mg0kvg2i56d@4ax.com>
On Sun, 07 Dec 2003 22:48:43 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:
>Geoff Cox wrote:
>> Bob,
>>
>> many thanks for your thoughts - the following code gets the first
>> set of name/address data but stops at that point - 'afraid I
>> haven't used your form of slurp before and do not see how to move
>> through the rest of the file containing the name/address data?
>
>Well, you haven't told us before that there are more than one
>name/address pair.
Gunnar,
sorry - I thought I had made it clear that the text I'd given was just
a sample of the file ... any way - all's well that ends well !
Many thanks for all your help. I've learnt quite a bit tonight!
Cheers
Geoff
>
>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>
>Try to change that to
>
> while ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>----^^^^^
>
>> /isx ) {
>
>and that to
>
> /gisx ) {
>-------------------^
------------------------------
Date: 7 Dec 2003 11:42:24 -0800
From: ran_tao@netvision.net.il (Ran)
Subject: Direct parallel port access under windows XP for DOS applications
Message-Id: <2d73b65b.0312071142.1c7d82ad@posting.google.com>
Hello,
I will describe my problem, hoping that anyone can help with a
solution.
I am running an old DOS application. This application cannot be
altered due to many reasons.
This application uses an old HASP dongle. The HASP drivers for DOS
dongles do not work with this application. The only way I could make
this application work is by writing code similar to UserPort, which
changes the I/O permission map for the calling process.
My problem is that recent display adapter drivers, such as those for ATI
Radeon, will not allow me to change the TSS to enable direct port access
for the 16-bit process. When I disable the display adapter driver and
use the default Windows XP driver, this problem is "solved". Of
course, not having a display driver reduces performance, but I can live
with that.
The second problem is that when using UserPort on new HyperThreading
motherboards, the computer restarts when the 16-bit application tries
to access the port.
I tried using direct-io, disabling the printer port and enabling
378-37F for the EXE I am using, but with no success. It did not get
the correct response from the dongle.
I think it's because the EXE is using direct bios calls and a virtual
device driver cannot capture the calls and handle them correctly.
Does anyone have a solution to this problem? Any ideas?
I would appreciate any help you can offer...
Thanks for your time,
Ran.
------------------------------
Date: Sun, 07 Dec 2003 20:01:39 GMT
From: Bob Walton <invalid-email@rochester.rr.com>
Subject: Re: Direct parallel port access under windows XP for DOS applications
Message-Id: <3FD3837F.6030007@rochester.rr.com>
Ran wrote:
...
> I am running an old DOS application. This application cannot be
> altered due to many reasons.
>
> This application uses an old HASP dongle. The HASP drivers for DOS
> dongles do not work with this application. The only way I could make
> this application work is by writing code similar to UserPort, which
> changes the I/O permission map for the calling process.
>
> My problem is that recent display adapter drivers, such as those for ATI
> Radeon, will not allow me to change the TSS to enable direct port access
> for the 16-bit process. When I disable the display adapter driver and
> use the default Windows XP driver, this problem is "solved". Of
> course, not having a display driver reduces performance, but I can live
> with that.
>
> The second problem is that when using UserPort on new HyperThreading
> motherboards, the computer restarts when the 16-bit application tries
> to access the port.
>
> I tried using direct-io, disabling the printer port and enabling
> 378-37F for the EXE I am using, but with no success. It did not get
> the correct response from the dongle.
>
> I think it's because the EXE is using direct bios calls and a virtual
> device driver cannot capture the calls and handle them correctly.
...
> Ran.
>
Is there a Perl question buried in there somewhere? If not, please
don't post to comp.lang.perl.misc .
Sounds to me like your old software has finally come to the end of its
life. Amazing it lasted this long. Why not just keep an old 486 around
to run it on?
--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl
------------------------------
Date: 7 Dec 2003 13:21:23 -0800
From: canle@lecan.com (Cle)
Subject: Re: How to write to drive A:\ from CGI Perl
Message-Id: <8a21e493.0312071321.4d5de54e@posting.google.com>
Cle wrote for help: How to write to drive A:\ from CGI Perl .
jue: "...BS, not possible. ".
Cle: Jue, If you can't do it as you wrote, I thank you. Period.
I will look forward to buying your best. I will introduce my best Perl
script, which is better than the one that I am posting on my website,
then you may lose at least a customer and maybe more online buyers like
me, because you write Perl better than many others, and you will
never know that someday, one person out there can do it, as replied
by:
Sinan: Easy! ! : #! C:/Perl/bin/perl.exe ....
Tintin: open FILE, ">C:/some/path/output.txt" or die "Can not open
C:/some/path/output.txt $!\n";
My computer can't reboot with a floppy disk which is fully inserted
in drive A. I think it is safe to have a disk in there so PCAnywhere
can't remotely run my computer in future. This is why I asked such a
newbie question for help, if I don't want to copy output and paste it
to my computer's Notepad then "save as" a file in drive A:!
Thanks again for your help!
Cle
------------------------------
Date: Sun, 07 Dec 2003 19:06:54 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <80u6tvs03p92icfen7ld6vr7p0i1ntebpt@4ax.com>
On Sun, 07 Dec 2003 19:07:26 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:
>Geoff Cox wrote:
>> Gunnar Hjalmarsson wrote:
>>>
>>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>>> .+?
>>> Address.+?<TD[^>]+>([^<]+)
>>> /isx ) {
>>> print "Name: $1\nAddress: $2\n";
>>> }
>>
>> the above is not working for me at the moment - if you have the
>> time (and patience!) it would really help me if you could "talk" me
>> through it ...
>
>I'd prefer not to. Besides the character classes, which we now have
>explained, and a couple of modifiers, whose meaning you can read about
>in 'perldoc perlre', it doesn't include anything that was not included
>in the regex you posted yourself.
OK - will do - I follow above except I would have expected that
>>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
would need
>>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)<
and
>>> Address.+?<TD[^>]+>([^<]+)
would need
>>> Address.+?<TD[^>]+>([^<]+)<
ie the "<" to signify where the ([^<]+) ends - as you do have a "<" in
the .+?<TD[^>]+> section?! I must be missing something?
My code is as follows but it does not work!
---------------------------
use strict;
print ("name of html file?\n");
my $namehtml = <STDIN>;
print ("name of email list file?\n");
my $newhtml = <STDIN>;
open(IN, "$namehtml");
open(OUT, ">>$newhtml");
my $line = <IN>;
while (defined($line=<IN>)) {
# if ($line =~ / (.*?)<\/H6>/i) {
# print OUT ("$1 \n");
# }
if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
.+?
Address.+?<TD[^>]+>([^<]+)
/isx ) {
print OUT ("Name: $1\nAddress: $2\n");
}
}
close (IN);
close (OUT);
-----------------------------
which is working on for example
<TD align=left width="20%" colSpan=2><B>Head Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Green</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Park Road, Northgate,
London N88 5XX</TD></TR>
Cheers
Geoff
>
>I suggest that you post a minimal but complete program that others can
>run and that illustrates that the above regex fails in extracting the
>name and address.
------------------------------
Date: Sun, 07 Dec 2003 21:30:05 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: hwo to match more than 1 line?
Message-Id: <br02t1$26i5vh$1@ID-184292.news.uni-berlin.de>
Geoff Cox wrote:
> Gunnar Hjalmarsson wrote:
>>Geoff Cox wrote:
>>>Gunnar Hjalmarsson wrote:
>>>>
>>>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>>>> .+?
>>>> Address.+?<TD[^>]+>([^<]+)
>>>> /isx ) {
>>>> print "Name: $1\nAddress: $2\n";
>>>> }
>
> I follow above except I would have expected that
>
>>>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>
> would need
>
>>>> if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)<
>
> and
>
>>>> Address.+?<TD[^>]+>([^<]+)
>
> would need
>
>>>> Address.+?<TD[^>]+>([^<]+)<
>
> ie the "<" to signify where the ([^<]+) ends - as you do have a "<"
> in the .+?<TD[^>]+> section?! I must be missing something?
Since [^<]+ matches any character besides <, it stops matching as soon
as a < is reached. Consequently, adding those '<' characters as you
suggest does not make a difference.
If I had used .+? instead, it would have been necessary to do
(.+?)<
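A small sketch that shows all four variants side by side (the cell string and variable names are made up for illustration):

```perl
use strict;
use warnings;

my $cell = '<TD width="80%">Fred Green</TD>';

my ($without) = $cell =~ /<TD[^>]+>([^<]+)/;    # negated class stops at "<" by itself
my ($with)    = $cell =~ /<TD[^>]+>([^<]+)</;   # explicit "<" changes nothing
my ($lazy)    = $cell =~ /<TD[^>]+>(.+?)/;      # non-greedy, nothing after it: one character
my ($lazy_lt) = $cell =~ /<TD[^>]+>(.+?)</;     # non-greedy needs the "<" to know where to stop

print "without: $without\nwith: $with\nlazy: $lazy\nlazy_lt: $lazy_lt\n";
```

The first, second and fourth captures are the full cell text; the third stops after a single character because a non-greedy quantifier takes as little as the rest of the pattern allows.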
HTH
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 07 Dec 2003 21:24:14 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <sg67tv096003lfgknek8ketju3hn9a2o58@4ax.com>
On Sun, 07 Dec 2003 21:30:05 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:
>Since [^<]+ matches any character besides <, it stops matching as soon
>as a < is reached. Consequently, adding those '<' characters as you
>suggest does not make a difference.
>
>If I had used .+? instead, it would have been necessary to do
>
> (.+?)<
Gunnar - yes I follow that now!
Geoff
>
>HTH
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5908
***************************************