[23482] in Perl-Users-Digest


Perl-Users Digest, Issue: 5695 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Oct 22 11:10:45 2003

Date: Wed, 22 Oct 2003 08:10:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 22 Oct 2003     Volume: 10 Number: 5695

Today's topics:
        Regex to extract row data from text <timbenz@timbenz.com>
    Re: Regex to extract row data from text (Anno Siegel)
    Re: Regex to extract row data from text <djo@pacifier.com>
    Re: Regex to extract row data from text <tore@aursand.no>
    Re: Regex to extract row data from text <tassilo.parseval@rwth-aachen.de>
    Re: Regex to extract row data from text <bernard.el-haginDODGE_THIS@lido-tech.net>
    Re: Regex to extract row data from text <syscjm@gwu.edu>
        Rows not being returned ? (Sylvie Stone)
    Re: Stale data in DB_File? <admin@asarian-host.net>
    Re: Taint - having some real trouble here, taint/perl e <abigail@abigail.nl>
    Re: Taint differences between 5.8.0 and 5.8.1? (Rafael Garcia-Suarez)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 22 Oct 2003 07:25:26 GMT
From: TimBenz <timbenz@timbenz.com>
Subject: Regex to extract row data from text
Message-Id: <Xns941C44EA5FD7timbenztimbenzcom@66.75.162.196>

I need a RegEx that I can use to scroll through textual data to extract 
lines in a semi-regular format. The original data is a form something like 
this:

AAA AAAAA BBBB BB CCCCC DDDDD EEEEEE FFFFFFF

Note, there are zero or more spaces in the "A" entity and the "B" entity, 
and the rest of the entities have no spaces. Second, there is no fixed 
length for any of the entities. They can be any non-zero length. About the 
only point of consistency is that the "B" entity has a finite number of 
forms, about fifteen. So far my attempt has been like this:

(.*)(COM|COMMON SHARES|Domestic Common)\s{1,}(.*?)\s{1,}(.*?)\s{1,}(.*?)\s

From which I extract $1, $3, and $5. 

How do I spool through the whole text file and extract every line for which 
the above holds? Are there better ways of doing this without the arduous 
part where I have to detail all the variants of the B entity?

Thanks.


------------------------------

Date: 22 Oct 2003 08:16:03 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Regex to extract row data from text
Message-Id: <bn5ec3$ku3$1@mamenchi.zrz.TU-Berlin.DE>

TimBenz  <timbenz@timbenz.com> wrote in comp.lang.perl.misc:
> I need a RegEx that I can use to scroll through textual data to extract 
> lines in a semi-regular format. The original data is a form something like 
> this:
> 
> AAA AAAAA BBBB BB CCCCC DDDDD EEEEEE FFFFFFF
> 
> Note, there are zero or more spaces in the "A" entity and the "B" entity, 
> and the rest of the entities have no spaces. Second, there is no fixed 
> length for any of the entities. They can be any non-zero length. About the 
> only point of consistency is that the "B" entity has a finite number of 
> forms, about fifteen. So far my attempt has been like this:
> 
> (.*)(COM|COMMON SHARES|Domestic Common)\s{1,}(.*?)\s{1,}(.*?)\s{1,}(.*?)\s

Which is the part that is supposed to catch the "B" entry?  The one
starting "(COM..." has only three alternatives.

> From which I extract $1, $3, and $5. 

What about $2?

> How do I spool through the whole text file and extract every line for which 
> the above holds?

    my @extract;
    while ( <FILE> ){
        push @extract, $_ if /.../;
    }

>                   Are there better ways of doing this without the arduous 
> part where I have to detail all the variants of the B entity?

No.  From what you say, it is only possible to delimit the "A" record
after having identified the "B" record.
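The read loop above can be sketched with a concrete pattern filled in.
This is only an illustration: the sample rows are made up (no real data
was posted), the pattern uses just the three "B" forms quoted in the
question, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real data was never posted.
my $text = <<'END';
ACME WIDGET CO COM 12345 100 2000 SOLE
this line does not match
FOO BAR INC COMMON SHARES 67890 50 900 SOLE
END

# Stand-in pattern built from the three "B" forms quoted in the question.
my $pattern = qr/\s(?:COMMON SHARES|Domestic Common|COM)\s/;

# Read from an in-memory filehandle so the sketch runs without a data
# file; for a real file, pass its path as the third argument to open.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @extract;
while ( my $line = <$fh> ) {
    # Keep only the lines that contain one of the known "B" forms.
    push @extract, $line if $line =~ $pattern;
}
close $fh;

print @extract;
```

A lexical filehandle and the three-argument open are used here instead
of the bareword FILE; the loop itself is the same.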

Anno


------------------------------

Date: Wed, 22 Oct 2003 08:23:01 GMT
From: "David Oswald" <djo@pacifier.com>
Subject: Re: Regex to extract row data from text
Message-Id: <F7rlb.15325$YO5.7353588@news3.news.adelphia.net>


"TimBenz" <timbenz@timbenz.com> wrote in message

> I need a RegEx that I can use to scroll through textual data to extract
> lines in a semi-regular format. The original data is a form something like
> this:
>
> AAA AAAAA BBBB BB CCCCC DDDDD EEEEEE FFFFFFF
>
> Note, there are zero or more spaces in the "A" entity and the "B" entity,
> and the rest of the entities have no spaces. Second, there is no fixed
> length for any of the entities. They can be any non-zero length. About the
> only point of consistency is that the "B" entity has a finite number of
> forms, about fifteen. So far my attempt has been like this:
>
> (.*)(COM|COMMON SHARES|Domestic Common)\s{1,}(.*?)\s{1,}(.*?)\s{1,}(.*?)\s
>
> From which I extract $1, $3, and $5.

The biggest problem is, how are you planning on delimiting the A segment
from the B segment, if the A segment itself can contain any one-or-more
number of characters that include the space, and yet it's a space that
separates A from B?  The only way to solve that problem IS to enumerate
through alternation all the forms that B can take, so that you can use B
as an anchor-point.

Fortunately, you don't have to do it in quite so ugly a way.

Try something like this:

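my $re_alternates = join "|", @alternates_list;   # build the pattern once
while ( my $line = <DATA> ) {
    # List-context match: the captures land directly in named variables.
    # No trailing $ anchor, because a sixth field (F) follows the fifth.
    if ( my ($first, $third, $fifth) = $line =~
        m/^(.+?)(?:$re_alternates)\s+(\w+)\s+\w+\s+(\w+)\s/ ) {
            #do your stuff...
    }
}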

 ...to explain...
You said you only want to capture the first, third and fifth groupings.  So
I only used capturing parentheses on those portions of the match.  I used
non-capturing parens to confine the alternation.  And all of the alternates
are built up into $re_alternates.

Finally, instead of using $1, $2, $3, I just used the regexp in list context
so that the scalars $first, $third, and $fifth would be populated in case of
a match.
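A self-contained sketch of the approach described above, under stated
assumptions: the list of "B" forms holds only the three that appear in
the question, the sample row is invented, and parse_row is a
hypothetical helper name. Unlike the snippet above, it uses \S+ for the
space-free fields and puts \s+ before the alternation so that the first
capture carries no trailing space.

```perl
use strict;
use warnings;

# Hypothetical list of "B" forms; only these three appear in the question.
my @alternates_list = ( 'COM', 'COMMON SHARES', 'Domestic Common' );

# Escape each form so it is matched literally. Longest-first ordering is
# a defensive habit; here the \s+ after the group already forces a
# complete form to match.
my $re_alternates = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @alternates_list;

# Return the A, C and E fields of a line, or an empty list on no match.
sub parse_row {
    my ($line) = @_;
    return $line =~ m/^(.+?)\s+(?:$re_alternates)\s+(\S+)\s+\S+\s+(\S+)\s/;
}

# Made-up sample row in the A B C D E F layout from the question.
my ( $first, $third, $fifth ) =
    parse_row("ACME WIDGET CO COMMON SHARES 12345 100 2000 SOLE\n");
print "$first | $third | $fifth\n";
```

Building the alternation from a list keeps the fifteen or so forms in
one place, so adding a form means adding one array element.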

Good luck...







------------------------------

Date: Wed, 22 Oct 2003 11:43:10 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: Regex to extract row data from text
Message-Id: <pan.2003.10.22.07.53.10.729244@aursand.no>

On Wed, 22 Oct 2003 07:25:26 +0000, TimBenz wrote:
> The original data is a form something like this:
> [...]

Why don't you post a bit of the _exact_ data you're trying to parse, thus
making it a lot easier for us?

Chances are that you'll get a few answers to your original post, and then
you go "yeah, but the data could also include...blah...blah...".


-- 
Tore Aursand <tore@aursand.no>


------------------------------

Date: 22 Oct 2003 09:52:40 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: Regex to extract row data from text
Message-Id: <bn5k18$ruh$1@nets3.rz.RWTH-Aachen.DE>

Also sprach Tore Aursand:

> On Wed, 22 Oct 2003 07:25:26 +0000, TimBenz wrote:
>> The original data is a form something like this:
>> [...]
> 
> Why don't you post a bit of the _exact_ data you're trying to parse, thus
> making it a lot easier for us?
> 
> Chances are that you'll get a few answers to your original post, and then
> you go "yeah, but the data could also include...blah...blah...".

This chance is even higher when he posts a sample of exact data. 

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Wed, 22 Oct 2003 09:59:18 +0000 (UTC)
From: "Bernard El-Hagin" <bernard.el-haginDODGE_THIS@lido-tech.net>
Subject: Re: Regex to extract row data from text
Message-Id: <Xns941C79BFE9DCCelhber1lidotechnet@62.89.127.66>

"Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de> wrote: 

> Also sprach Tore Aursand:
> 
>> On Wed, 22 Oct 2003 07:25:26 +0000, TimBenz wrote:
>>> The original data is a form something like this:
>>> [...]
>> 
>> Why don't you post a bit of the _exact_ data you're trying to parse,
>> thus making it a lot easier for us?
>> 
>> Chances are that you'll get a few answers to your original post, and
>> then you go "yeah, but the data could also
>> include...blah...blah...". 
> 
> This chance is even higher when he posts a sample of exact data. 


Just for completeness - ahem - this chance is the *highest* when he posts no 
sample data at all.


:-)


Cheers,
Bernard


------------------------------

Date: Wed, 22 Oct 2003 06:25:55 -0400
From: Chris Mattern <syscjm@gwu.edu>
Subject: Re: Regex to extract row data from text
Message-Id: <3F965B33.2050506@gwu.edu>

Tassilo v. Parseval wrote:
> Also sprach Tore Aursand:
> 
> 
>>On Wed, 22 Oct 2003 07:25:26 +0000, TimBenz wrote:
>>
>>>The original data is a form something like this:
>>>[...]
>>
>>Why don't you post a bit of the _exact_ data you're trying to parse, thus
>>making it a lot easier for us?
>>
>>Chances are that you'll get a few answers to your original post, and then
>>you go "yeah, but the data could also include...blah...blah...".
> 
> 
> This chance is even higher when he posts a sample of exact data. 
> 
When you're parsing input data, what is necessary is a true understanding
of its syntax, not samples which will almost invariably fail to cover
certain cases.  "The data looks like such-and-so" or "The data is in
a form like this" is usually a red flag that the speaker doesn't understand
his input data well enough to parse it properly.

                  Chris Mattern



------------------------------

Date: 22 Oct 2003 06:27:00 -0700
From: sylviestone@canada.com (Sylvie Stone)
Subject: Rows not being returned ?
Message-Id: <181a24a8.0310220527.317f639b@posting.google.com>

Hi Group!

Can someone PLEASE tell me why this is not returning the $one variable?
I'm pulling results from a survey database table where the answers were
a range from 1 - 10. This is how I'm unsuccessfully printing the
results:

my @numbers = (["0"],["1"],
["2"],["3"],["4"],["5"],["6"],["7"],["8"],["9"],["10"]);
my $sth  = $dbh->prepare('select count(*) from $SURVEYTABLE where info
= ?');
my $i = 0;
foreach (@numbers) {
 my ($nums) = @$_;
 $sth->execute( $nums );
 ++$i;
 my $one = $sth->fetchrow_array();
 print "one is $one<br>num is $nums<br>\n";
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants did not
answer this question" if ($nums eq "0");;
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
1<br>" if ($nums eq "1");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
2<br>" if ($nums eq "2");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
3<br>" if ($nums eq "3");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
4<br>" if ($nums eq "4");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
5<br>" if ($nums eq "5");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
6<br>" if ($nums eq "6");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
7<br>" if ($nums eq "7");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
8<br>" if ($nums eq "8");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
9<br>" if ($nums eq "9");
 print "<font color=\"#0000FF\"><b>$one</b></font> respondants rated
10<br>" if ($nums eq "10");
}
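
One thing that may be worth checking in the code above: the SQL passed
to prepare is in single quotes, and Perl does not interpolate variables
in single-quoted strings. The sketch below shows only that difference;
the table name is a made-up value, and no database is involved.

```perl
use strict;
use warnings;

# Hypothetical table name; the real value of $SURVEYTABLE was not posted.
my $SURVEYTABLE = 'survey_results';

# Single quotes: no interpolation, so the SQL contains the literal text
# '$SURVEYTABLE' instead of the table name.
my $single = 'select count(*) from $SURVEYTABLE where info = ?';

# Double quotes: the variable is interpolated; the ? placeholder is left
# alone and is still bound later by execute().
my $double = "select count(*) from $SURVEYTABLE where info = ?";

print "$single\n$double\n";
```

If this is the cause, the statement would fail in the database rather
than in Perl. Setting DBI's RaiseError attribute when connecting would
likely make such a failure visible as an error instead of an empty
result.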


TIA!

Sylvie.


------------------------------

Date: Wed, 22 Oct 2003 13:24:52 +0200
From: "Mark" <admin@asarian-host.net>
Subject: Re: Stale data in DB_File?
Message-Id: <DOydnXeW6YGQ9AuiRVn-sg@giganews.com>

"Bob Walton" <invalid-email@rochester.rr.com> wrote in message
news:3F95F448.5080505@rochester.rr.com...

> So the basic operations are:  In a given process, establish a lock, then
> tie the DBM-type file, do whatever to it (read, write), untie the
> DBM-type file, and remove the lock. For best results, use a separate
> empty file (or a file whose contents don't matter) for locking purposes
> and only touch the locked DBM-type file when your process has a lock on
> the lock file.
>
> Note also that you risk a lot more than just loss of data synchronicity
> ("stale data") if you don't lock -- you can easily corrupt the file.


Thank you for your answer. It is perfectly clear now. :)

- Mark
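
The sequence quoted above (lock, tie, read or write, untie, unlock) can
be sketched as follows. This is a minimal illustration under
assumptions: it uses SDBM_File, which ships with Perl, as a stand-in for
DB_File, and the paths, the "hits" key and the bump_counter name are all
hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical paths in a scratch directory; a real program would use
# fixed locations shared by every cooperating process.
my $dir      = tempdir( CLEANUP => 1 );
my $lockfile = "$dir/counter.lock";
my $dbfile   = "$dir/counter";

# Run one read-modify-write cycle under an exclusive lock.
sub bump_counter {
    # 1. Take the lock on a separate file whose contents do not matter.
    open my $lock, '>>', $lockfile or die "Cannot open $lockfile: $!";
    flock( $lock, LOCK_EX ) or die "Cannot lock $lockfile: $!";

    # 2. Tie the DBM file only while the lock is held.
    tie( my %db, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0666 )
        or die "Cannot tie $dbfile: $!";

    # 3. Read and write.
    my $count = ++$db{hits};

    # 4. Untie first so the data is written out, then release the lock.
    untie %db;
    close $lock;    # closing the handle releases the flock

    return $count;
}

print "hits: ", bump_counter(), "\n";
```

The lock is advisory, so it only protects the file if every process
that touches it goes through the same lock file.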




------------------------------

Date: 22 Oct 2003 08:58:42 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Taint - having some real trouble here, taint/perl experts, please help
Message-Id: <slrnbpchm2.jqa.abigail@alexandra.abigail.nl>

Darren Dunham (ddunham@redwood.taos.com) wrote on MMMDCCIII September
MCMXCIII in <URL:news:V6hlb.2509$wY3.1498@newssvr25.news.prodigy.com>:
^^  
^^ > This also times out and kills the web page AND the running command
^^ > 'client':
^^  
^^ >   exec("/usr/src/client/client",$p1,$p2,$p3,$p4);
^^ >   exit;
^^  
^^  That exit line is never executed.  The perldoc on exec will tell you
^^  why.  If you want to do that, use a fork as above.


If you look up the perldoc on exec yourself, you see in the first
paragraph that it *is* possible that 'exec' returns.
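
A small sketch of the case where exec does return: the command cannot
be started. The path below is hypothetical and assumed not to exist,
and try_exec is an illustrative helper name.

```perl
use strict;
use warnings;

# Try to exec a command and report the error if exec returns.
# exec only returns when the command could not be started.
sub try_exec {
    my @cmd = @_;
    no warnings 'exec';    # suppress Perl's own "Can't exec" warning
    exec { $cmd[0] } @cmd or return "exec failed: $!";
}

# Hypothetical path, assumed not to exist on this system.
my $error = try_exec( '/nonexistent/client', 'p1', 'p2' );
print "$error\n";
```

So code placed after exec is reachable on failure, which is why
checking the result (for example with "or die") is worthwhile.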



Abigail
-- 
perl -le 's[$,][join$,,(split$,,($!=85))[(q[0006143730380126152532042307].
          q[41342211132019313505])=~m[..]g]]e and y[yIbp][HJkP] and print'


------------------------------

Date: 22 Oct 2003 12:09:37 GMT
From: rgarciasuarez@free.fr (Rafael Garcia-Suarez)
Subject: Re: Taint differences between 5.8.0 and 5.8.1?
Message-Id: <slrnbpcsgo.2ld.rgarciasuarez@rafael.serd.lyon.hexaflux.loc>

Matthew Braid wrote:
>I was just trolling through my messages file recently and noticed that 
>ever since I upgraded from 5.8.0 to 5.8.1 I've been getting a lot of 
>'Insecure dependency' (ie taint) errors from one of my daemon scripts.
>
>On closer inspection I narrowed it down to an exec call in MIME::Lite. 
>This chunk of code had not produced an error before while taint mode is 
>on (and in fact the comments around that chunk of code basically said 
>'Run sendmail in a taint-safe fashion').
>
>Has exec become more taint-aware between 5.8.0 and 5.8.1?

A few taint bugs have been corrected.
Does deleting $ENV{TERM} help ?

>I worked around it by untainting everything passed to exec, but it was a 
>little surprising and I haven't seen anything mentioned about the change 
>in documentation.

You can also use the -t command-line switch in place of -T when
debugging taint mode programs: it turns fatal taint errors into
warnings. (See perlrun.)
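
For reference, untainting values before exec usually looks something
like the sketch below. It is only an assumption about what such a
workaround might contain: the whitelist pattern, the sample address and
the untaint_address name are illustrative, and TERM is removed only
because of the suggestion above.

```perl
use strict;
use warnings;

# Assumption: a restrictive whitelist for an e-mail address that will be
# passed to an external program. Adjust the pattern to the real data.
sub untaint_address {
    my ($value) = @_;
    # Under -T (or -t), a regex capture is the usual way to untaint.
    my ($clean) = $value =~ /\A([\w.+-]+\@[\w.-]+)\z/
        or die "Refusing suspicious value: $value\n";
    return $clean;
}

# perlsec recommends cleaning the environment before running commands;
# TERM is added here per the suggestion above.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV TERM)};
$ENV{PATH} = '/bin:/usr/bin';

my $to = untaint_address('user@example.com');
print "OK: $to\n";
```

The code runs with or without -T; the capture only matters for taint
purposes when taint mode is on.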

-- 
Uniform is not *NIX


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5695
***************************************
