[23700] in Perl-Users-Digest


Perl-Users Digest, Issue: 5906 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Dec 7 11:05:39 2003

Date: Sun, 7 Dec 2003 08:05:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 7 Dec 2003     Volume: 10 Number: 5906

Today's topics:
        ! <geoff.cox@blueyonder.co.uk>
    Re: ! <kuujinbo@hotmail.com>
    Re: How to open a file from the end and read the last 1 <jwillmore@remove.adelphia.net>
    Re: How to write to drive A:\ from CGI Perl <jurgenex@hotmail.com>
        hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? (Jay Tilton)
    Re: hwo to match more than 1 line? <noreply@gunnar.cc>
    Re: hwo to match more than 1 line? <me@privacy.net>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? <noreply@gunnar.cc>
    Re: hwo to match more than 1 line? <jurgenex@hotmail.com>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? <noreply@gunnar.cc>
    Re: hwo to match more than 1 line? <jurgenex@hotmail.com>
    Re: hwo to match more than 1 line? <noreply@gunnar.cc>
    Re: hwo to match more than 1 line? (Tad McClellan)
    Re: hwo to match more than 1 line? (Tad McClellan)
    Re: hwo to match more than 1 line? <flavell@ph.gla.ac.uk>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
    Re: hwo to match more than 1 line? <geoff.cox@blueyonder.co.uk>
        Why can't I parse google search results? (bob)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 07 Dec 2003 10:59:18 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: !
Message-Id: <ks16tv420tod9vg5ligmjc48qruhtgnbce@4ax.com>

On Sun, 7 Dec 2003 23:10:48 +1300, "Tintin" <me@privacy.net> wrote:

>
>"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote in message
>news:1es5tv8ppfs34pua3uesreni0hlkjvsc7d@4ax.com...
>> Hello,
>>
>> How do I capture text that goes over 2 lines?
>>
>> The text could be say
>>
>>           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
>> London  N500 5JJJ</TD></TR>
>>
>> The following code only gets the text up to and including Northgate,
>>
>>  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>>        print OUT ("$1 \n");
>>        }
>>
>> Ideas please?!
>
>You've discovered that regexes aren't very robust/easy/flexible when it
>comes to parsing HTML.  Use one of the HTML parsers on CPAN.
>

There seem to be a large number of them! any recommendation?!

Cheers

Geoff



------------------------------

Date: Sun, 07 Dec 2003 20:33:27 +0900
From: ko <kuujinbo@hotmail.com>
Subject: Re: !
Message-Id: <bqv3dc$2p5$1@pin3.tky.plala.or.jp>

Geoff Cox wrote:
> On Sun, 7 Dec 2003 23:10:48 +1300, "Tintin" <me@privacy.net> wrote:
> 
> 
>>"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote in message
>>news:1es5tv8ppfs34pua3uesreni0hlkjvsc7d@4ax.com...

[snip]

>>>Ideas please?!
>>
>>You've discovered that regexes aren't very robust/easy/flexible when it
>>comes to parsing HTML.  Use one of the HTML parsers on CPAN.
> 
> There seem to be a large number of them! any recommendation?!

HTML::Parser. If you're only interested in extracting text, here's an 
example to get you started:

http://search.cpan.org/src/GAAS/HTML-Parser-3.34/eg/htext

There are other example scripts in the parent directory.
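Here is a minimal sketch of that approach. It assumes the table layout of the sample rows posted elsewhere in this thread, where the name and address sit in <TD vAlign=top> cells. HTML::Parser comes from CPAN and is not part of the core distribution. The start_h, end_h and text_h handlers and the 'tagname, attr' and 'dtext' argspecs are its documented version 3 interface. @cells and $in_cell are illustrative names only.

```perl
use strict;
use warnings;
use HTML::Parser 3;   # CPAN module, not core; uses the version 3 handler API

# Illustrative input: the sample table rows from this thread.
my $html = <<'END_HTML';
        <TR>
          <TD align=left width="20%" colSpan=2><B>Head
Teacher</B></TD>
          <TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
        <TR>
          <TD align=left width="20%" colSpan=2><B>Address</B></TD>
          <TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
Sussex  N777 5RJ</TD></TR>
END_HTML

my @cells;        # text of each <td valign="top"> cell, in document order
my $in_cell = 0;  # true while we are inside one of those cells

my $p = HTML::Parser->new(
    api_version => 3,
    # Tag and attribute names arrive lowercased by default.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        if ($tag eq 'td' && lc($attr->{valign} || '') eq 'top') {
            $in_cell = 1;
            push @cells, '';          # start collecting a new cell
        }
    }, 'tagname, attr' ],
    end_h  => [ sub { $in_cell = 0 if $_[0] eq 'td' }, 'tagname' ],
    # Text may arrive in several pieces, so append rather than assign.
    text_h => [ sub { $cells[-1] .= $_[0] if $in_cell }, 'dtext' ],
);
$p->parse($html);
$p->eof;          # flush any buffered text

for (@cells) {    # $_ aliases each element, so this tidies @cells in place
    s/\s+/ /g;            # fold the line break inside the address
    s/^\s+|\s+$//g;       # trim leading and trailing blanks
}
print "$_\n" for @cells;
```

Because the parser tracks tags rather than lines, the line break inside the address needs no regex modifier at all. Only the valign=top cells are kept, so the label cells ("Head Teacher", "Address") are skipped.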

HTH - keith



------------------------------

Date: Sun, 07 Dec 2003 14:53:54 GMT
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: How to open a file from the end and read the last 100 lines
Message-Id: <20031207095354.106b22f8.jwillmore@remove.adelphia.net>

On Sun, 07 Dec 2003 03:37:47 GMT
"Mihai N." <nmihai_year_2000@yahoo.com> wrote:
> Uri Guttman <uri@stemsystems.com> wrote in
> news:x7brqlsao7.fsf@mail.sysarch.com: 
> >>>>>> "AS" == Anno Siegel <anno4000@lublin.zrz.tu-berlin.de>
> >writes:
> >   AS> Sara <genericax@hotmail.com> wrote in comp.lang.perl.misc:
> >  >> anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in
> >  >message> news:<bqs8md$hfo$1@mamenchi.zrz.TU-Berlin.DE>...
<snip>
> I agree CPAN is great.
> But you have to put things in balance.
> 
> When the time and effort needed to search for what you need,
> evaluate the 20 possible modules, select one or two, compare them,
> and understand how they work outweigh the task, I would rather
> write my own two lines.
> 
> I would not do this for complex stuff, like parsing XML/HTML,
> sending emails, DB queries, etc.
> Where the limit for "complex" lies is for each person to decide.
> If I am not able to write a line of Perl to do my stuff, chances
> are I will not be able to use a CPAN module properly.
> CPAN is not going to think for you.

Problem is ... you have been *told* which module will work :-)  So,
evaluation of a module that *works* doesn't appear to be something you
*need* to do.

It would be a different story if you *didn't* know which module would
work - but the author (no less) of a working module has told you it
works and why you should use it.

So, what's the problem with using it?  Have you even *looked* at it? 
If you "evaluated" the module and found it doesn't work for you, have
you let the author know this?

-- 
Jim

Copyright notice: all code written by the author in this post is
 released under the GPL. http://www.gnu.org/licenses/gpl.txt 
for more information.

a fortune quote ...
Prof:    So the American government went to IBM to come up with a
data   encryption standard and they came up with ... Student:
EBCDIC!" 


------------------------------

Date: Sun, 07 Dec 2003 11:58:00 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: How to write to drive A:\ from CGI Perl
Message-Id: <cBEAb.1505$nz.180@nwrddc01.gnilink.net>

[Please don't top-post]
[Please don't blindly full quote]
[Please use proper quote marks; I tried to repair as much as possible]

Cle wrote:
> "A. Sinan Unur" <asu1@c-o-r-n-e-l-l.edu> wrote in message
>> I did
>> look at that page. What I saw was scary:
>>
>>  <title>http://lecan.com</title><meta http-equiv=Content-Type
>>  content=text/html;
>>  charset=UTF-8><font size=4 face=Verdana, helvetica><h5><body
>> text="blue"><br>Book List Database Sample <body text="blue"><br>
>>
>>  If that does not look weird to you, then please go back to basics.

> Which browser did you use to look at my homepage, to see only my HTML
> code? I could view its output on Windows 98 and XP with old Netscape 4.5
> through 7.1 or IE 6.0. This script runs OK on a Linux 7.1 server.

Why do you think he couldn't view the page in his browser? He didn't say
that.
But when looking at the source code of that page apparently he noticed so
many defects that any sane browser should have refused to display that page.
Assuming that the quoted text is an actual excerpt from the HTML
document, please go back to basics and learn HTML first before trying
anything else.
Just those four lines contain half a dozen mistakes.

> I am having a better version of this type of Perl, and will invite you
> to look at it again in the near future, if you have time.

You will have a better version of Perl? Really?

> I could read a file in drive A of my computer from cgi-bin Telnet by
> having this line in my CGI Perl script:
>
> print "<iframe src=\"file:///A:/FileName.txt\"></iframe>";

Who is "I"?
- I as in the user, i.e. the person in front of the computer? Well, no need
for HTML or Perl or CGI or anything. That "I" can just read any file on any
drive using Notepad or whatever editor he likes.
- I as in the web browser that is running on the user's computer? Well, a
program run by a user has the same access as the user himself. A no-brainer.
- I as in a CGI script on some web server? BS, not possible. There is no way
for a web server to read anything from a client's computer (let's ignore
security bugs for the sake of this discussion).

> The reason that I like to write to drive A: or another removable drive
> from CGI, as a select option: I like to keep my file private from
> SVCHOST.EXE, which has RPC calls to do remote things from Windows. They
> remotely turned off my microphone in an audio chatroom via RPC and
> network problems, among other commands from Paltalk.

Who is "they"? A wild guess: the admins of your organization, maybe? Then
you may want to check whether what you are trying to do violates the
organization's policies.

> So any hacker could
> read my file from another Windows OS, whether or not they learn my
> password from a camera or other methods; someday they might delete or
> copy my file from another PC.

Ok, there are two answers:
- If a hacker can access your computer (remotely), then you didn't do your
homework securing your computer. Period. That is solely your own fault and
has nothing to do whatsoever with SVCHOST, or CGI, or HTML, or Perl or any
other issue mentioned in this tedious thread.
- Some people may point out that part of your problem is your OS and that
other OSes have fewer security flaws.

> I am hoping that you will have time to view my other output in near
> future. I am learning about server as you suggested.

I strongly suggest you go back and take a few additional classes about the
basics of computing and how the WWW works before coming back. Maybe I am
wrong, but you seem to be missing so much very basic knowledge that it is
difficult to even guess where you got stuck. And a newsgroup about Perl is
definitely not the right place for trying to figure it out.

As far as I am concerned, I won't read your posts any more. Don't take it
personally, but it's just not worth my time.

jue




------------------------------

Date: Sun, 07 Dec 2003 09:31:41 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: hwo to match more than 1 line?
Message-Id: <1es5tv8ppfs34pua3uesreni0hlkjvsc7d@4ax.com>

Hello,

How do I capture text that goes over 2 lines?

The text could be say

          <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
London  N500 5JJJ</TD></TR>

The following code only gets the text up to and including Northgate,

 if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
       print OUT ("$1 \n");
       }

Ideas please?!

Thanks

Geoff


------------------------------

Date: Sun, 07 Dec 2003 09:39:24 GMT
From: tiltonj@erols.com (Jay Tilton)
Subject: Re: hwo to match more than 1 line?
Message-Id: <3fd2f4a6.17752751@news.erols.com>

Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:

: How do I capture text that goes over 2 lines?
: 
: The text could be say
: 
:           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
: London  N500 5JJJ</TD></TR>
: 
: The following code only gets the text up to and including Northgate,
: 
:  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
                                           ^
-------------------------------------------^
The /m switch affects only how ^ and $ match, and your regex contains
neither of those metacharacters.

You want the /s switch, which lets . match a newline character.

:        print OUT ("$1 \n");
:        }
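To see the difference, here is a small sketch that runs the original pattern once with /m and once with /s against the two-line sample held in a single string. It assumes both lines really are in $line. If the file is read one line at a time, no modifier can help, because the second line is never part of the string being matched.

```perl
use strict;
use warnings;

# The two-line sample from the original post, held in one string.
my $line = qq{<TD vAlign=top width="80%" colSpan=2>White Road, Northgate,\n}
         . qq{London  N500 5JJJ</TD></TR>\n};

# With /m, '.' still refuses to match "\n", so the match fails outright.
my $with_m = $line =~ /<TD vAlign=top(.*?)<\/TD>/m ? $1 : undef;

# With /s, '.' also matches "\n", so the capture spans both lines.
my $with_s = $line =~ /<TD vAlign=top(.*?)<\/TD>/s ? $1 : undef;

print defined $with_m ? "m: $with_m\n" : "m: no match\n";
print defined $with_s ? "s: $with_s\n" : "s: no match\n";
```

Note that the /s capture still begins right after "vAlign=top", so it includes the remaining attributes (width="80%" colSpan=2>) as well as the address text.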



------------------------------

Date: Sun, 07 Dec 2003 10:52:05 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: hwo to match more than 1 line?
Message-Id: <bqutdk$26nabf$1@ID-184292.news.uni-berlin.de>

Geoff Cox wrote:
> How do I capture text that goes over 2 lines?
> 
> The text could be say
> 
>           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
> London  N500 5JJJ</TD></TR>
> 
> The following code only gets the text up to and including
> Northgate,
> 
>  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>        print OUT ("$1 \n");
>        }
> 
> Ideas please?!

Use the right modifier. /m seems not to be what you want. Look up in

     perldoc perlre

what to use instead.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Sun, 7 Dec 2003 23:10:48 +1300
From: "Tintin" <me@privacy.net>
Subject: Re: hwo to match more than 1 line?
Message-Id: <bquud0$26rhgf$1@ID-172104.news.uni-berlin.de>


"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote in message
news:1es5tv8ppfs34pua3uesreni0hlkjvsc7d@4ax.com...
> Hello,
>
> How do I capture text that goes over 2 lines?
>
> The text could be say
>
>           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
> London  N500 5JJJ</TD></TR>
>
> The following code only gets the text up to and including Northgate,
>
>  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>        print OUT ("$1 \n");
>        }
>
> Ideas please?!

You've discovered that regexes aren't very robust/easy/flexible when it
comes to parsing HTML.  Use one of the HTML parsers on CPAN.




------------------------------

Date: Sun, 07 Dec 2003 10:30:46 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <s106tvscc94vmavii01atj9fbpolctcdoi@4ax.com>

On Sun, 07 Dec 2003 09:39:24 GMT, tiltonj@erols.com (Jay Tilton)
wrote:

>Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:
>
>: How do I capture text that goes over 2 lines?
>: 
>: The text could be say
>: 
>:           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
>: London  N500 5JJJ</TD></TR>
>: 
>: The following code only gets the text up to and including Northgate,
>: 
>:  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>                                           ^
>-------------------------------------------^
>The /m switch affects only how ^ and $ match, and your regex contains
>neither of those metacharacters.
>
>You want the /s switch, which lets . match a newline character.
>
>:        print OUT ("$1 \n");
>:        }


Jay,

thanks for that - I'm still not quite there. I am trying to get only the
name and address out of the following - how should I do this?  Geoff

  <TR>
          <TD vAlign=top align=left colSpan=4>
            <H6><IMG height=10 alt=bullet
src="barnet_files/blue_bullet2.gif" 
            width=7>&nbsp;&nbsp;The College</H6></TD></TR>
        <TR>
          <TD align=left width="20%" colSpan=2><B>Head
Teacher</B></TD>
          <TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
        <TR>
          <TD align=left width="20%" colSpan=2><B>Address</B></TD>
          <TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
Sussex  N777 5RJ</TD></TR>




------------------------------

Date: Sun, 07 Dec 2003 10:40:39 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <kp06tv8lstleourff1dq99qn05knr9h4t0@4ax.com>

On Sun, 07 Dec 2003 10:52:05 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:

>Geoff Cox wrote:
>> How do I capture text that goes over 2 lines?
>> 
>> The text could be say
>> 
>>           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
>> London  N500 5JJJ</TD></TR>
>> 
>> The following code only gets the text up to and including
>> Northgate,
>> 
>>  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>>        print OUT ("$1 \n");
>>        }
>> 
>> Ideas please?!
>
>Use the right modifier. /m seems not to be what you want. Look up in
>
>     perldoc perlre
>
>what to use instead.

Gunnar,

I think you are correct about the /m, but could you take a look at my
other email, which shows the text I am trying to use?

Thanks

Geoff



------------------------------

Date: Sun, 07 Dec 2003 12:06:07 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: hwo to match more than 1 line?
Message-Id: <bqv1op$27o73h$1@ID-184292.news.uni-berlin.de>

Geoff Cox wrote:
> I am trying to get the name and address only out of following - how
> should I do this?  Geoff
> 
>   <TR>
>           <TD vAlign=top align=left colSpan=4>
>             <H6><IMG height=10 alt=bullet
> src="barnet_files/blue_bullet2.gif" 
>             width=7>&nbsp;&nbsp;The College</H6></TD></TR>
>         <TR>
>           <TD align=left width="20%" colSpan=2><B>Head
> Teacher</B></TD>
>           <TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
>         <TR>
>           <TD align=left width="20%" colSpan=2><B>Address</B></TD>
>           <TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
> Sussex  N777 5RJ</TD></TR>

That was quite a different question. This might do what you want:

     if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
                    .+?
                    Address.+?<TD[^>]+>([^<]+)
                   /isx ) {
         print "Name: $1\nAddress: $2\n";
     }

But don't use it if you don't understand it. And even if you do 
understand it, you may want to use a module for parsing HTML instead.
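To follow the regex piece by piece, here is the same pattern with /x comments, run against a here-document that holds the sample rows. The post does not state one assumption: the whole chunk of HTML must be in a single string, for example slurped with do { local $/; <$fh> }, where $fh is a hypothetical open filehandle. If the file is read line by line, "Head Teacher" and the address never share a $line, so a match like this cannot succeed.

```perl
use strict;
use warnings;

# Stand-in for the whole file; in real code something like
#   my $line = do { local $/; <$fh> };   # $fh: a hypothetical open filehandle
my $line = <<'END_HTML';
        <TR>
          <TD align=left width="20%" colSpan=2><B>Head
Teacher</B></TD>
          <TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
        <TR>
          <TD align=left width="20%" colSpan=2><B>Address</B></TD>
          <TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
Sussex  N777 5RJ</TD></TR>
END_HTML

my ($name, $address);
if ( $line =~ /
        Head\s+Teacher   # the label; \s+ also covers the line break
        .+?              # skip as little as possible (s lets . cross newlines)
        <TD[^>]+>        # up to and including the next <TD ...> tag
        ([^<]+)          # capture 1: every character up to the next < (the name)
        .+?              # skip ahead ...
        Address          # ... to the Address label (i makes this case-insensitive)
        .+?<TD[^>]+>     # then past the next <TD ...> tag
        ([^<]+)          # capture 2: the address, line break included
     /isx ) {
    ($name, $address) = ($1, $2);
    print "Name: $name\nAddress: $address\n";
}
```

Inside an /x pattern, the comments must not contain the / delimiter or a $ or @ sigil, because Perl would end the pattern early or try to interpolate a variable.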

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Sun, 07 Dec 2003 11:26:13 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: hwo to match more than 1 line?
Message-Id: <p7EAb.7379$dA1.611@nwrddc03.gnilink.net>

Geoff Cox wrote:

You are asking the wrong question, but anyway...

> How do I capture text that goes over 2 lines?
>
> The text could be say
>
>           <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
> London  N500 5JJJ</TD></TR>
>
> The following code only gets the text up to and including Northgate,
>
>  if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {

To answer the question you did ask in the subject:
You are using the wrong modifier. Actually you are using exactly the
opposite one to the one you need.
Please see "perldoc perlre" for what 'm' and 's' do.

[...]
>
> Ideas please?!

The question you should have asked but didn't ask is: what is the right tool
to parse HTML?

And as has been answered a gazillion times: parsing HTML correctly is
rocket science and nobody with a sane mind would attempt to do it using REs.
See 'perldoc -q "remove HTML"' for why and how and what to do instead.

jue




------------------------------

Date: Sun, 07 Dec 2003 12:11:28 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <q266tv4jtv0tkngg6aotcc2fvj4sh6s33u@4ax.com>

On Sun, 07 Dec 2003 12:06:07 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:

>Geoff Cox wrote:
>> I am trying to get the name and address only out of following - how
>> should I do this?  Geoff
>> 
>>   <TR>
>>           <TD vAlign=top align=left colSpan=4>
>>             <H6><IMG height=10 alt=bullet
>> src="barnet_files/blue_bullet2.gif" 
>>             width=7>&nbsp;&nbsp;The College</H6></TD></TR>
>>         <TR>
>>           <TD align=left width="20%" colSpan=2><B>Head
>> Teacher</B></TD>
>>           <TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
>>         <TR>
>>           <TD align=left width="20%" colSpan=2><B>Address</B></TD>
>>           <TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
>> Sussex  N777 5RJ</TD></TR>
>
>That was quite a different question. This might do what you want:
>
>     if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>                    .+?
>                    Address.+?<TD[^>]+>([^<]+)
>                   /isx ) {
>         print "Name: $1\nAddress: $2\n";
>     }
>
>But don't use it if you don't understand it. And even if you do 
>understand it, you may want to use a module for parsing HTML instead.

Gunnar,

I have tried an HTML parser - I do get the text OK, but would like to
understand your regex ... what does the [^<] stand for? 

Geoff



------------------------------

Date: Sun, 07 Dec 2003 12:40:54 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <kr76tvk5kvp048c5dgiego3c6aare76th0@4ax.com>

On Sun, 07 Dec 2003 11:26:13 GMT, "Jürgen Exner"
<jurgenex@hotmail.com> wrote:


>And as has been answered a gazillion times: parsing HTML correctly is
>rocket science and nobody with a sane mind would attempt to do it using REs.
>See 'perldoc -q "remove HTML"' for why and how and what to do instead.

OK ! will go for the HTML parser!

Cheers

Geoff

>
>jue
>



------------------------------

Date: Sun, 07 Dec 2003 13:39:47 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: hwo to match more than 1 line?
Message-Id: <bqv792$26q364$1@ID-184292.news.uni-berlin.de>

Geoff Cox wrote:
> what does the [^<] stand for?

It's a character class representing any character but '<'.
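Here is a small sketch of what that does in the Head Teacher/Address regex. A negated class cannot run past the next tag, unlike a greedy .+. Also, unlike ., it matches a newline even without /s, which is why the address capture picks up both lines. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $row = '<TD vAlign=top>Fred Smith</TD><TD vAlign=top>Cedar Road</TD>';

# [^<]+ : one or more characters other than '<', so it stops at the first tag.
my ($negated) = $row =~ />([^<]+)/;

# .+ : greedy, so it runs to the end and backs off only to the LAST '<'.
my ($greedy) = $row =~ />(.+)</;

# A negated class also matches "\n", even without the s modifier.
my ($two_lines) = "<TD>Cedar Road, Northgate,\nSussex</TD>" =~ />([^<]+)/;

print "negated:   $negated\n";
print "greedy:    $greedy\n";
print "two lines: $two_lines\n";
```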

If you want to learn regular expressions, you need to study

     http://www.perldoc.com/perl5.8.0/pod/perlre.html

Not once, not twice, but over and over again. The answer to your 
question, and most other questions about Perl regular expressions, can 
be found there.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Sun, 07 Dec 2003 12:44:29 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: hwo to match more than 1 line?
Message-Id: <NgFAb.8117$%01.903@nwrddc02.gnilink.net>

Geoff Cox wrote:
> On Sun, 07 Dec 2003 12:06:07 +0100, Gunnar Hjalmarsson
> <noreply@gunnar.cc> wrote:
[...]
>>     if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
[...]
> understand your regex ... what does the [^<] stand for?

From "perldoc perlre":
    In particular the following metacharacters have their standard
    *egrep*-ish meanings:
[...]
        []  Character class

However for some unknown reason there is no explanation of the meaning of ^
in the docs.
Only for POSIX classes the docs mention

    You can negate the [::] character classes by prefixing the class name
    with a '^'.

From this you have to infer that you can negate a non-POSIX class, too.

To answer the original question:
[^<] stands for the character class which contains every character except
the less-than sign.
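The docs are terse on this point, so here is a short sketch of three ways to spell negation, using the postcode from the sample data. The first puts ^ as the first character of an ordinary class. The second is the POSIX form [[:^digit:]] that perlre does mention. The third places a POSIX class inside a negated class. The last line shows that a ^ anywhere other than first in the class is just a literal caret. Newer Perl releases document all of this in perldoc perlrecharclass.

```perl
use strict;
use warnings;

my $postcode = 'N777 5RJ';

# ^ as the FIRST character inside [...] negates the class: delete non-digits.
(my $plain = $postcode) =~ s/[^0-9]//g;

# The POSIX form puts ^ after the first colon; same result.
(my $posix = $postcode) =~ s/[[:^digit:]]//g;

# A POSIX class inside an ordinary negated class; same result again.
(my $mixed = $postcode) =~ s/[^[:digit:]]//g;

# Anywhere else in the class, ^ is a literal caret: count the carets.
my $carets = () = 'a^b^c' =~ /[x^]/g;

print "$plain $posix $mixed $carets\n";
```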

jue




------------------------------

Date: Sun, 07 Dec 2003 13:48:10 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: hwo to match more than 1 line?
Message-Id: <bqv7oq$26q9mp$1@ID-184292.news.uni-berlin.de>

Jürgen Exner wrote:
> Geoff Cox wrote:
>> what does the [^<] stand for?
> 
> From "perldoc perlre":
>     In particular the following metacharacters have their
>     standard *egrep*-ish meanings:
> [...]
>         []  Character class
> 
> However for some unknown reason there is no explanation of the
> meaning of ^ in the docs.

Hmm.. You are right. Shouldn't somebody better do something about
that? After all, it's one of the most common constructs in Perl
regular expressions.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Sun, 7 Dec 2003 06:40:54 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: hwo to match more than 1 line?
Message-Id: <slrnbt67um.kbp.tadmc@magna.augustmail.com>

Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:

> but could you take a look at my
> other email


This is not email.

This is a Usenet newsgroup.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sun, 7 Dec 2003 06:44:48 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: hwo to match more than 1 line?
Message-Id: <slrnbt6860.kbp.tadmc@magna.augustmail.com>

Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:

> what does the [^<] stand for? 


It doesn't "stand for" anything, it "matches" something though.

It matches any single character that is not the "<" character.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sun, 7 Dec 2003 13:45:16 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <Pine.LNX.4.53.0312071331410.18641@ppepc56.ph.gla.ac.uk>

On Sun, 7 Dec 2003, Gunnar Hjalmarsson wrote:

> > However for some unknown reason there is no explanation of the
> > meaning of ^ in the docs.
>
> Hmm.. You are right. Shouldn't somebody better do something about
> that?

FWIW: I hadn't gained a close acquaintance with regexes before I
started on Perl, and I recall also being a bit disappointed that the
Perl documentation seemed to be written for readers who already would
have a working acquaintance with regexes and were chiefly looking for
details of the specific Perl embodiment.

I noticed more recently that the Cambridge PCRE library (Perl
compatible regular expressions) has a general presentation of this
regular expression syntax, which (as the name implies) is deliberately
close to Perl.  It starts about halfway down the composite page
http://www.pcre.org/pcre.txt - below the heading:

 PCRE REGULAR EXPRESSION DETAILS

which some readers might find to be a useful adjunct to the Perl
documentation.  Hope this helps a bit.



------------------------------

Date: Sun, 07 Dec 2003 15:23:52 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <r9h6tvk2lrs2su2nubj45hdkfjr4u24d4g@4ax.com>

On Sun, 07 Dec 2003 13:39:47 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:

>Geoff Cox wrote:
>> what does the [^<] stand for?
>
>It's a character class representing any character but '<'.

Gunnar,

OK thanks for that - I have printed off the perlre pages!

I have tried the HTML::Parser module, but it gives me too much text ...
can I use it selectively?

Geoff




>If you want to learn regular expressions, you need to study
>
>     http://www.perldoc.com/perl5.8.0/pod/perlre.html
>
>Not once, not twice, but over and over again. The answer to your 
>question, and most other questions about Perl regular expressions, can 
>be found there.



------------------------------

Date: Sun, 07 Dec 2003 15:24:32 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <neh6tvofpd42jkf0g5uou3fils3auq4b7d@4ax.com>

On Sun, 07 Dec 2003 12:44:29 GMT, "Jürgen Exner"
<jurgenex@hotmail.com> wrote:

>Geoff Cox wrote:
>> On Sun, 07 Dec 2003 12:06:07 +0100, Gunnar Hjalmarsson
>> <noreply@gunnar.cc> wrote:
>[...]
>>>     if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>[...]
>> understand your regex ... what does the [^<] stand for?
>
>From "perldoc perlre":
>    In particular the following metacharacters have their standard
>    *egrep*-ish meanings:
>[...]
>        []  Character class
>
>However for some unknown reason there is no explanation of the meaning of ^
>in the docs.
>Only for POSIX classes the docs mention
>
>    You can negate the [::] character classes by prefixing the class name
>    with a '^'.
>
>From this you have to infer that you can negate a non-POSIX class, too.
>
>To answer the original question:
>[^<] stands for the character class which contains every character except
>the less-than sign.
>
>jue

Thanks Jue ...

Cheers

Geoff


>



------------------------------

Date: Sun, 07 Dec 2003 15:50:31 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: hwo to match more than 1 line?
Message-Id: <nmi6tvchjeh44uc4hraugsptpfvc61if33@4ax.com>

On Sun, 07 Dec 2003 12:06:07 +0100, Gunnar Hjalmarsson
<noreply@gunnar.cc> wrote:


>     if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
>                    .+?
>                    Address.+?<TD[^>]+>([^<]+)
>                   /isx ) {
>         print "Name: $1\nAddress: $2\n";
>     }

Gunnar,

the above is not working for me at the moment - if you have the time
(and patience!) it would really help me if you could "talk" me through
it ...

Cheers

Geoff




>
>But don't use it if you don't understand it. And even if you do 
>understand it, you may want to use a module for parsing HTML instead.



------------------------------

Date: 7 Dec 2003 08:01:16 -0800
From: utsuxs@hotmail.com (bob)
Subject: Why can't I parse google search results?
Message-Id: <51c3a5d3.0312070801.5093c8cf@posting.google.com>

I'm trying to extract data from the results pages of search engines
with these two modules, LWP::Simple and HTML::Parse, and the get command.

I can extract from yahoo and altavista but google is not cooperating.

I get this error message

Can't fetch HTML from http://www.google.com/search?q=smeghead at
parsing.pl line 13.



I'm obviously missing something, but I don't know what it is. Help would
be greatly appreciated. Thank you.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5906
***************************************

