[22675] in Perl-Users-Digest


Perl-Users Digest, Issue: 4896 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Apr 26 00:10:40 2003

Date: Fri, 25 Apr 2003 21:10:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 25 Apr 2003     Volume: 10 Number: 4896

Today's topics:
        Just curous about this- are REGEXes rigorously determin (Sara)
    Re: Just curous about this- are REGEXes rigorously dete (Walter Roberson)
        Parsing HTML file of tables <myicq@gmx_fjernmig_.net>
    Re: Parsing HTML file of tables <jurgenex@hotmail.com>
    Re: parsing perl code (audit, summarize, trace) <ericw@nospam.ku.edu>
    Re: Regex greediness question (Tad McClellan)
    Re: RegEx question <spam@thecouch.homeip.net>
    Re: Tough question for the guru's; Grep Once, Awk Twice (Agrapha)
    Re: Tough question for the guru's; Grep Once, Awk Twice (Agrapha)
    Re: Wildcard DNS <abigail@abigail.nl>
    Re: XS or SWIG <ericw@nospam.ku.edu>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 25 Apr 2003 17:08:09 -0700
From: genericax@hotmail.com (Sara)
Subject: Just curous about this- are REGEXes rigorously deterministic
Message-Id: <776e0325.0304251608.29ddb7c0@posting.google.com>

OK, probably just the "scientist" in me, but with such an enormously
large set of possibilities, I wonder if regex results
deterministically map 1:1 into a set of correct solutions (1
input, 1 regex = 1 result?).

I know *I* sure couldn't prove it. After being out of school this long
proving 2+2=4 would be a stretch. But it's always in the back of my
mind as I compose regexes that there may be sets of inputs & regexes
that have 2 or more valid solutions. I doubt that the trivial ones I
compose would ever cross that boundary, but who knows?

Are regexes *theory*, that is a proposal that cannot be proven, or are
they proven to map 1:1 from the input to the output domains?

Just a curiosity, sorry for the interruption.. Oh by the way I have
this COBOL program can someone write it in Perl for me?? *duck*

Have a good fin de semana everyone!
-Gx


------------------------------

Date: 26 Apr 2003 00:46:38 GMT
From: roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)
Subject: Re: Just curous about this- are REGEXes rigorously deterministic
Message-Id: <b8ckte$lbu$1@canopus.cc.umanitoba.ca>

In article <776e0325.0304251608.29ddb7c0@posting.google.com>,
Sara <genericax@hotmail.com> wrote:
:OK, probably just the "scientist" in me, but with such an enormously
:large set of possibilities, I wonder if regex results
:deterministically map 1:1 into a set of correct solutions (1
:input, 1 regex = 1 result?).

:I know *I* sure couldn't prove it. After being out of school this long
:proving 2+2=4 would be a stretch. But it's always in the back of my
:mind as I compose regexes that there may be sets of inputs & regexes
:that have 2 or more valid solutions. I doubt that the trivial ones I
:compose would ever cross that boundary, but who knows?

There is more than one valid representation for any regex in perl.
You can add indefinitely many levels of bracketing, and any * or + can
be rewritten in {m,n} notation. And of course,  x+  is the same
as  xx*  .
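
The  x+  versus  xx*  equivalence can be spot-checked with a minimal
sketch like the one below. The sample strings and variable names are
made up for illustration, and agreeing on a handful of inputs is of
course a sanity check, not a proof.

```perl
use strict;
use warnings;

# Two spellings of "one or more x", anchored so that the whole
# string has to match.
my $plus = qr/\Ax+\z/;
my $star = qr/\Axx*\z/;

# Sample inputs chosen for illustration only.
my @samples = ('', 'x', 'xx', 'xxxxx', 'xy', 'yx');

my $disagreements = 0;
for my $s (@samples) {
    my $m_plus = $s =~ $plus ? 1 : 0;
    my $m_star = $s =~ $star ? 1 : 0;
    $disagreements++ if $m_plus != $m_star;
    printf "%-8s x+ => %d   xx* => %d\n", "'$s'", $m_plus, $m_star;
}
print "disagreements: $disagreements\n";
```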

This doesn't even consider the possibilities of rewriting using
zero-width assertions or other extended Perl regex features.


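Any time that the regex involves alternation, it's generally valid to
switch the order of the alternatives as far as whether the regex
matches at all. (Perl tries alternatives left to right, so the order
can change which text ends up being matched.)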
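
On the original question of one input plus one regex giving one result:
Perl's engine tries alternatives left to right and takes the first that
lets the overall match succeed, so a given regex on a given input yields
one well-defined result. The small sketch below (input string and
variable names invented for illustration) shows that the order of the
alternatives is part of what defines that result.

```perl
use strict;
use warnings;

my $input = 'ab';

# The same two alternatives, written in opposite order.
my ($first)  = $input =~ /(a|ab)/;
my ($second) = $input =~ /(ab|a)/;

# Both patterns match, but they capture different text.
print "a|ab captured '$first'\n";
print "ab|a captured '$second'\n";
```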


Classical pure regular expressions allow only grouping (brackets),
alternation (usually written as '+') and indefinite numbers of repetitions
(usually written as '*'). Because of alternation, there is no one "right"
way of writing classical regular expressions. You could, however, impose
a canonical order for regular expressions by imposing a topological sort
amongst alternatives  (as long as you only have a finite number of
possible symbols ;-)  )


-- 
"Meme" is self-referential; memes exist if and only if the "meme" meme
exists. "Meme" is thus logically a meta-meme; but until the existence
of meta-memes is more widely recognized, "meta-meme" is not a meme.
   -- A Child's Garden Of Memes


------------------------------

Date: Fri, 25 Apr 2003 22:13:20 GMT
From: TDJ <myicq@gmx_fjernmig_.net>
Subject: Parsing HTML file of tables
Message-Id: <Xns936925EB28B6tdjtdjtdj@212.54.64.135>

Any hints on this one: I am trying to parse an HTML file of
tables (for a TV listing). Problem is that programs with and without
"long descriptions" are mixed in the HTML file.

I want to extract the data from both short and long entries,
and come to the following data structure for input into an
SQL server:

time|program|short description|long description

(on short entries, the long description is of course just empty)

How to parse the tables if I want to mix data from 2 _or_ 3 tables ?

Any hints ?


These two structures are mixed:
(----------- is added by me for visibility reasons)

Short entry:
    <table>
      <tr>
        <td valign="top" width="2%">
          <img ...>
        </td>
        <td>
          07.00 program title
        </td>
      </tr>
    </table>
    	------------------------
    <table>
      <tr>
        <td valign="top" width="2%">
          <img ..>
        </td>
        <td>
          Short description bla bal bal
        </td>
      </tr>
    </table>



Long entry


    <div id="ID99999999">
      <table>
        <tr>
          <td>
            <img>
          </td>
          <td width="98%" valign='top'>
            <a href="#">07.25 Program title</a> 
            (9) 
          </td>
        </tr>
      </table>
    </div>
    	--------------------
    <table>
      <tr>
        <td>
           
        </td>
        <td>
          Short description of the program
        </td>
      </tr>
    </table>
    	--------------------
    <div id="ID99999999t2z">
      <table>
        <tr>
          <td>
             
          </td>
          <td>
            Long description bla bla bla blab alb alb albla bla bla
            more bla bal bal
          </td>
        </tr>
      </table>
    </div>

====================================================================


------------------------------

Date: Fri, 25 Apr 2003 22:27:52 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Parsing HTML file of tables
Message-Id: <IDiqa.1663$B61.1652@nwrddc01.gnilink.net>

TDJ wrote:
> Any hints on this one: I am trying to parse an HTML file of
[...]
> How to parse the tables if I want to mix data from 2 _or_ 3 tables ?

Trivial. You would use HTML::Parser or one of its cousins.

jue




------------------------------

Date: Sat, 26 Apr 2003 03:04:30 GMT
From: Eric Wilhelm <ericw@nospam.ku.edu>
Subject: Re: parsing perl code (audit, summarize, trace)
Message-Id: <pan.2003.04.25.22.01.37.746939.5659@nospam.ku.edu>

On Thu, 24 Apr 2003 08:07:51 -0500, Winfried Koenig wrote:

> Eric Wilhelm wrote:
>> Is there a module which allows you to parse perl code?

> start with:
> 
> $ perl -MO=Xref some_script.pl
> 
> then read 'perldoc O' and 'perldoc B::Xref'.


Wow!  Thanks, this is great.

--Eric


------------------------------

Date: Fri, 25 Apr 2003 19:21:41 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Regex greediness question
Message-Id: <slrnbajk8l.78i.tadmc@magna.augustmail.com>

Sara <genericax@hotmail.com> wrote:
> "Tman" <nerdy1@snet.net> wrote in message news:<Z8Rpa.1172$Hg4.279561317@newssvr10.news.prodigy.com>...
>> Am I misunderstanding something here?
>> 
>> C:\Temp>perl -de 1
>> .....
>>   DB<1> p "aaaaaabaaaaa" =~ /a(.*?b.*?)a/
>> aaaaab
>>
> 
> in a regex, would 
> 
>   /(something)*?/ 
> 
> ever be any different than 
> 
>   /(something)*/

> Wouldn't both match zero-many somethings?


They will both match (return true) for *every* possible string
that you might put into $_.

Try it with a few strings of your own choosing...

Beware of matching the empty string!
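
A minimal sketch of the point, using "a" as a stand-in for (something)
and a few made-up test strings: both forms succeed on every string,
but what they actually consume can differ. The last lines replay the
pattern from the debugger session quoted above.

```perl
use strict;
use warnings;

my %matched;
for my $str ('', 'xyz', 'aaab') {
    # Zero repetitions is always acceptable, so both matches succeed
    # on every string, including the empty one.
    my ($greedy) = $str =~ /((?:a)*)/;
    my ($lazy)   = $str =~ /((?:a)*?)/;
    $matched{$str} = [$greedy, $lazy];
    printf "%-7s greedy='%s' lazy='%s'\n", "'$str'", $greedy, $lazy;
}

# The match starts at the leftmost possible position; non-greediness
# only affects how far each quantifier extends from there.
my ($capture) = "aaaaaabaaaaa" =~ /a(.*?b.*?)a/;
print "captured: $capture\n";
```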


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Fri, 25 Apr 2003 22:16:47 -0400
From: Mina Naguib <spam@thecouch.homeip.net>
Subject: Re: RegEx question
Message-Id: <m_lqa.14308$_w.275198@wagner.videotron.net>

-----BEGIN xxx SIGNED MESSAGE-----
Hash: SHA1

Yang Xiao wrote:
> Hi all,
> How to I simplify this?
> 
> if(/(NYWS)\d{1,3}\$?/ or /(LNWS)\d{1,3}\$?/ or /CHARTER/ or /LNFFTW/){
> do something here...
> }

if (/((LN|NY)WS\d{1,3})|CHARTER|LNFFTW/) {
do something here
}

-----BEGIN xxx SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE+qewSeS99pGMif6wRAm48AJ9B2utOMsOHgsMiFNJC1kzfZqN4vACfe2q2
dKz1asBknGJi7CnGzLuTC5k=
=00YY
-----END PGP SIGNATURE-----
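
A rough way to check that the combined pattern accepts the same strings
as the original four-way test is sketched below. The sub names and the
sample strings are invented for illustration. One caveat: the capture
groups differ, so code that relied on $1 being "NYWS" or "LNWS" would
need adjusting ($1 now holds the whole prefix-plus-digits match, and $2
holds "NY" or "LN").

```perl
use strict;
use warnings;

# The original four-way test, wrapped in a sub for comparison.
sub original_test {
    my ($s) = @_;
    my $hit = ($s =~ /(NYWS)\d{1,3}\$?/ || $s =~ /(LNWS)\d{1,3}\$?/
            || $s =~ /CHARTER/          || $s =~ /LNFFTW/);
    return $hit ? 1 : 0;
}

# The combined pattern from the reply.
sub combined_test {
    my ($s) = @_;
    return $s =~ /((LN|NY)WS\d{1,3})|CHARTER|LNFFTW/ ? 1 : 0;
}

# Sample inputs chosen for illustration only.
my @samples = ('NYWS12', 'LNWS999$', 'CHARTER flight', 'LNFFTW',
               'NYWS', 'XXWS12', '');

my $all_agree = 1;
for my $s (@samples) {
    my ($old, $new) = (original_test($s), combined_test($s));
    $all_agree = 0 if $old != $new;
    printf "%-16s original=%d combined=%d\n", "'$s'", $old, $new;
}
print $all_agree ? "patterns agree\n" : "patterns differ\n";
```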



------------------------------

Date: 25 Apr 2003 19:38:55 -0700
From: brian@box201.com (Agrapha)
Subject: Re: Tough question for the guru's; Grep Once, Awk Twice (or more)
Message-Id: <11aabb15.0304251838.414bb00b@posting.google.com>

brian@box201.com (Agrapha) wrote in message news:<11aabb15.0304241544.1eff4b6a@posting.google.com>...
> "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de> wrote in message news:<b88igv$rc2$1@nets3.rz.RWTH-Aachen.DE>...
> 
> > Could we please put an end to all this noise? I think it has been
> 
> Agreed. A few less flame throwers will be much appreciated. I will do
>

For those who have forgotten what my original breach of etiquette was,
what drew first blood was a combination of two statements:
 1) "First let me say I know nothing about perl..."
 2) "..can someone help me convert it to and efficient perl script?"

This was seen as asking for free code. I intended it as honest fact.

On a brighter note, I will post before-and-after code sometime this
weekend.


------------------------------

Date: 25 Apr 2003 20:01:45 -0700
From: brian@box201.com (Agrapha)
Subject: Re: Tough question for the guru's; Grep Once, Awk Twice (or more)
Message-Id: <11aabb15.0304251901.32ad5b6c@posting.google.com>

"Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de> wrote in message news:<b86v2g$dbh$1@nets3.rz.RWTH-Aachen.DE>...
> >Also sprach Agrapha:
> > 
> > Looking at the arrays now. Is it reasonable to load an array with
> > 100,000 lines (records)? Really I don't know. 
> 
> Not if you can avoid it. 
> Most problems require looking at each line of a file only once.
> In such a case you should not slurp a whole file into an
> array. Instead, iterate over the file line-wise. Perl makes that quite
> easy so it's a common idiom:
> 
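>     # parentheses matter: without them, "or" binds to "file", not to open
>     open(F, "<", "file") or die "Cannot open file: $!";
>     while (<F>) {
>         # each line is now in $_, including the terminating newline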
>         ...
>     }

I love the elegance of looking at a file only once. This script needs
to look at three different files. Is it possible to slurp 3 files into
F?
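
One way the line-wise idiom could be stretched over three files is to
loop over the file names and open each in turn, as in the sketch below.
It is only a sketch: the temporary files and their contents are made up
so that it runs anywhere, and in real use @files would simply hold the
names of the three existing input files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create three small sample files (hypothetical data for the demo).
my @files;
for my $n (1 .. 3) {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "file $n line $_\n" for 1 .. 2;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

# Visit every line of every file exactly once, one file at a time,
# without holding a whole file in memory.
my $count = 0;
for my $file (@files) {
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        $count++;
        print "$line\n";
    }
    close $in;
}
print "read $count lines from ", scalar(@files), " files\n";
```

If the three files need different handling, the body of the inner loop
can branch on $file; if they are all treated alike, putting the names
in @ARGV and reading with the <> operator is another common route.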


------------------------------

Date: 25 Apr 2003 22:47:38 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Wildcard DNS
Message-Id: <slrnbajeoa.fir.abigail@alexandra.abigail.nl>

Chumley the Walrus (springb2k@yahoo.com) wrote on MMMDXXIV September
MCMXCIII in <URL:news:1ef65641.0304251039.3fbd8eec@posting.google.com>:
%%  Does Wildcard DNS allow you to create domain aliases on your server?
%%  
%%  myalias.mydomain.com , for example?
%%  
%%  Is this something that a website operator (other than system
%%  administrator )can implement on unix and linux servers easily?


Do you have a Perl question?



Abigail
-- 
BEGIN {print "Just "   }
CHECK {print "another "}
INIT  {print "Perl "   }
END   {print "Hacker\n"}


------------------------------

Date: Fri, 25 Apr 2003 22:26:38 GMT
From: Eric Wilhelm <ericw@nospam.ku.edu>
Subject: Re: XS or SWIG
Message-Id: <pan.2003.04.25.17.23.41.640785.5659@nospam.ku.edu>

On Fri, 25 Apr 2003 13:07:43 -0500, Peter Wilson wrote:

>Does anyone know of a
> book / web site / set of examples of how to write XS or SWIG or have any
> advice on which is best to use. I have a header file (.h) and the
> library (.dll) and no source files.
> 
> Oh and my C is very bad thus I am trying to avoid it as much as I can.
> From an example in C the function I am trying to call simply calls
> 
> ad_textul->ad_name
> 
> returning the ad_name from the structured definition ad_textul or at
> least that's what I think it's doing.
> 
I have used SWIG to write a perl wrapper for just such a toolkit.  It has
support for "shadow classes" which (I think) would give you a read-write
interface to your variable.

You would probably be okay without writing extra C code, provided that
you aren't trying to get write-access to arrays (at which point you have
to get into perlguts and figure out how to translate from one to the
other).

I too am not a C programmer.  I looked at both systems and SWIG seemed to
be the one with the shorter learning curve and smaller interface code.  I
have found a couple of issues with it (not declaring variables, etc), but
have been able to work around them using short intermediate scripts and
includes between generating and compiling.

The manual is HUGE and very informative (www.swig.org).

--Eric


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4896
***************************************

