[22447] in Perl-Users-Digest
Perl-Users Digest, Issue: 4668 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Mar 5 21:06:02 2003
Date: Wed, 5 Mar 2003 18:05:14 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 5 Mar 2003 Volume: 10 Number: 4668
Today's topics:
Re: A problem about the time function <me@privacy.net>
Re: CGI query string help <wsegrave@mindspring.com>
Re: CGI query string help <wsegrave@mindspring.com>
Re: CGI query string help <flavell@mail.cern.ch>
Re: CGI query string help <wsegrave@mindspring.com>
cookie-lib.pl won't delete cookies <mdudley@execonn.com>
Re: cookie-lib.pl won't delete cookies <spamtrap@nowhere.com>
Re: Counting matches in a regular expression (Anno Siegel)
Re: Counting matches in a regular expression <goldbb2@earthlink.net>
Re: Counting matches in a regular expression (Anno Siegel)
Re: DBD and DBI on Solaris 64 bit <makbo@pacbell.net>
Re: DBD and DBI on Solaris 64 bit <rereidy@indra.com>
Re: File::Tail problem (Anno Siegel)
Re: Greedy regexps <asby@kinderen4kinderen.org>
Re: Greedy regexps (Tad McClellan)
Re: Greedy regexps <steven.smolinski@sympatico.ca>
Re: Greedy regexps <abigail@abigail.nl>
Re: Greedy regexps <abigail@abigail.nl>
Re: Greedy regexps <steven.smolinski@sympatico.ca>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 6 Mar 2003 08:35:39 +1100
From: "Tintin" <me@privacy.net>
Subject: Re: A problem about the time function
Message-Id: <b45qjd$1rq78i$1@ID-172104.news.dfncis.de>
"Ho Man Lam" <mlho@pc89132.cse.cuhk.edu.hk> wrote in message
news:b45dos$iel$1@eng-ser1.erg.cuhk.edu.hk...
> In Unix/Linux, I can get the time since the Epoch (00:00:00 UTC, January
> 1, 1970), measured in seconds by the command "date".
>
> For example:
>
> date -d "20030305 11:22:36" +%s
>
> the command gives:
> 1046834556
>
> I know there is a function "time" that gives the current time when it is
> called. But I want to get the seconds since the Epoch for a specific time.
$seconds=1046900056; # specific number of seconds since the Epoch
print scalar localtime($seconds);
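Going the other way, from a calendar time to seconds since the Epoch, is what
the question asks for. A minimal sketch using the core Time::Local module;
the helper name epoch_for and its argument order are made up for this
example. The local-time result depends on the TZ of the machine, which is
presumably why "date -d" printed 1046834556 for the poster.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal timegm);

# epoch_for is a hypothetical helper: calendar fields in, epoch seconds out.
# $mon is 1-based here; Time::Local itself wants 0-based months.
sub epoch_for {
    my ($year, $mon, $mday, $hour, $min, $sec, $utc) = @_;
    my @fields = ($sec, $min, $hour, $mday, $mon - 1, $year);
    return $utc ? timegm(@fields) : timelocal(@fields);
}

# 2003-03-05 11:22:36, first read as UTC, then as local time
print "UTC:   ", epoch_for(2003, 3, 5, 11, 22, 36, 1), "\n";
print "local: ", epoch_for(2003, 3, 5, 11, 22, 36, 0), "\n";
```

Feeding either result back into localtime() or gmtime() should return the
original fields, which is an easy sanity check.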
------------------------------
Date: Wed, 5 Mar 2003 13:37:17 -0600
From: "William Alexander Segraves" <wsegrave@mindspring.com>
Subject: Re: CGI query string help
Message-Id: <b45qr9$qhi$1@slb9.atl.mindspring.net>
"Mothra" <mothra@nowhereatall.com> wrote in message
news:3e662e32@usenet.ugs.com...
<snip>
> > my $user_name = param('name');
> > my $user_age = param('age');
> ^^^^^
> shouldn't that be url_param?
> The params are in the url not the posted form
CGI.pm's param method reads parameters from a POSTed form as well as from a
GET query string. This capability is one compelling reason to use CGI.pm for
parsing forms, as the scripts can easily be tested in a variety of ways. See
the CGI.pm documentation for details.
> > # removed undefined CGI object $query
> I missed that in my prev. response. Good catch :-)
Thanks & Cheers!
Bill Segraves
------------------------------
Date: Wed, 5 Mar 2003 15:42:58 -0600
From: "William Alexander Segraves" <wsegrave@mindspring.com>
Subject: Re: CGI query string help
Message-Id: <b45r9o$jf4$1@slb9.atl.mindspring.net>
"Eric J. Roode" <REMOVEsdnCAPS@comcast.net> wrote in message
news:Xns93357DF17AB6sdn.comcast@216.166.71.239...
<snip>
> url_param is only needed if parameters are given on the URL *and* the CGI
> is invoked via POST method, a questionable practice.
<snip>
Hmmm. WADR, why is it a questionable practice when the documentation is
quite explicit about it?
On the contrary, I still view the behaviour of CGI.pm's param method, when
presented with both GET params in the query string and POST as the submit
method of a form, as a useful part of the rich capabilities of CGI.pm.
My copy of the documentation is explicit about the behaviour in the section
"Mixing POST and URL Parameters".
In fact, the OP (and others interested in exploring the capabilities of
CGI.pm) might try the following little code snippet, in which it is not
required at all to know the param names in advance, i.e., when writing the
script. The params can be loaded at run time.
#!perl -w
# testloop.pl - can be initialized via query string,
# e.g., http://localhost/cgi-bin/testloop.pl?first=Bill;last=Segraves
use strict;
use CGI qw(-no_xhtml :standard);
print header, start_html, hr, start_form;
foreach (param()) {
print p($_, textfield($_, param($_)))
}
print submit(param('send')), end_form, hr, end_html;
Interested parties might try running this script without initialization via
the query string to see that the form that is generated is, well,
uninteresting. OTOH, when the script is initialized via the query string, a
useful form ensues if the user has chosen a useful set of params.
Examination of the HTML source that is generated will clearly show that,
once the form has been POSTed, CGI.pm's param method ignores the query
string, as stated in the documentation.
<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html lang="en-US"><head><title>Untitled Document</title>
</head><body><hr><form method="post"
action="/cgi-bin/testloop.pl?first=Bill;last=Segraves"
enctype="application/x-www-form-urlencoded">
<p>first <input type="text" name="first" value="Eric"></p><p>last <input
type="text" name="last" value="Roode"></p><input type="submit"
name=".submit"></form><hr></body></html>
In the above, I had initialized the form with my name and then substituted
Eric's name. Clearly, the initialization query string is present in the HTML
form that is generated, even after filling the form with another name and
submitting it.
In summary, Eric, I'll assert that what you view as a questionable practice
is not a questionable practice *here* at all. It is in fact a very *useful*
practice. Others, YMMV.
Cheers.
Bill Segraves
------------------------------
Date: Wed, 5 Mar 2003 23:20:24 +0100
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: CGI query string help
Message-Id: <Pine.LNX.4.53.0303052255010.15579@lxplus085.cern.ch>
On Wed, Mar 5, William Alexander Segraves inscribed on the eternal scroll:
> "Eric J. Roode" <REMOVEsdnCAPS@comcast.net> wrote in message
> > url_param is only needed if parameters are given on the URL *and* the CGI
> > is invoked via POST method, a questionable practice.
> <snip>
>
> Hmmm. WADR, why is it a questionable practice when the documentation is
> quite explicit about it?
Well, you're right that CGI.pm documents its _support_ for the
practice - but there has been concern as to whether the CGI
specification itself actually guarantees that a server will support
this behaviour.
Clearly, no amount of support in CGI.pm could force a web server to do
something which it didn't actually do.
> On the contrary, I still view the behaviour of CGI.pm's params method, when
> presented with both GET via params in the query string and POST for the
> submit method of a form as a useful part of the rich capabilities of CGI.pm.
I don't think anyone's arguing against its potential utility; but I
had in the past counseled against relying on it being supported by
servers, even though the servers that I know about (though they're all
Apache, so that doesn't help) do in fact support it.
However, you prompt me to re-read the draft RFC, and I find at
http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.8
it says of QUERY_STRING:
Servers MUST supply this value to scripts
without any prevarication as to whether the method is POST or GET.
See also the link to section 3.2 for definition of the parts of the
Script-URI
So that seems to be at least a basis for saying that any server which
didn't support it would not be conforming to the best-practice
specification. And I don't know a server that fails to support this;
and I suppose that L.Stein knows servers better than I do.
So I guess I should stop worrying about this.
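The point that QUERY_STRING is supplied whatever the request method can be
illustrated without CGI.pm. This is only a core-Perl sketch: parse_query is
a hypothetical helper, and the environment is faked in a hash because there
is no web server here. In a real script CGI.pm's param and url_param do this
work for you.

```perl
use strict;
use warnings;

# Split a query string on '&' or ';', then decode '+' and %XX escapes.
sub parse_query {
    my ($qs) = @_;
    $qs = '' unless defined $qs;
    my %param;
    for my $pair (split /[&;]/, $qs) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        push @{ $param{$name} }, $value;
    }
    return \%param;
}

# Faked environment: a POST request whose URL still carries a query string.
my %env = (REQUEST_METHOD => 'POST',
           QUERY_STRING   => 'first=Bill;last=Segraves');
my $url = parse_query($env{QUERY_STRING});
print "$env{REQUEST_METHOD} request, URL params: ",
    join(', ', map { $_ . '=' . join(' ', @{ $url->{$_} }) } sort keys %$url),
    "\n";
```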
cheers
------------------------------
Date: Wed, 5 Mar 2003 18:24:13 -0600
From: "William Alexander Segraves" <wsegrave@mindspring.com>
Subject: Re: CGI query string help
Message-Id: <b464q9$558$1@slb4.atl.mindspring.net>
"Alan J. Flavell" <flavell@mail.cern.ch> wrote in message
news:Pine.LNX.4.53.0303052255010.15579@lxplus085.cern.ch...
<snip>
> However, you prompt me to re-read the draft RFC, and I find at
>
> http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.8
>
> it says of QUERY_STRING:
>
> Servers MUST supply this value to scripts
>
> without any prevarication as to whether the method is POST or GET.
> See also the link to section 3.2 for definition of the parts of the
> Script-URI
>
> So that seems to be at least a basis for saying that any server which
> didn't support it would not be conforming to the best-practice
> specification. And I don't know a server that fails to support this;
> and I suppose that L.Stein knows servers better than I do.
>
> So I guess I should stop worrying about this.
Please let us know when it worries you. ;-)
Thanks for your response, Alan. I've saved it in *the* folder with your name
on it, for easy recall from among other pearls of your wisdom.
I am especially thankful for your expertise in standards, which I see you
contributing in more than this newsgroup.
In the meantime, i.e., while I'm not worrying about the potential problem,
I'll continue to enjoy the use of a capability that CGI.pm exploits very
nicely.
Thanks again and Cheers!
Bill Segraves
------------------------------
Date: Wed, 05 Mar 2003 17:07:05 -0500
From: Marshall Dudley <mdudley@execonn.com>
Subject: cookie-lib.pl won't delete cookies
Message-Id: <3E667509.16CBC8E5@execonn.com>
I am not sure if this is an IE problem, or a cookie-lib.pl problem. But
when I use the delete cookie subroutine of this library, it works fine
with Netscape, but is ignored by IE.
I have trapped the command cookie-lib.pl is sending and this is what is
being sent when I am trying to delete the cookie:
Set-Cookie: user-id=; expires=Thu, 01-Jan-1970 00:00:00 GMT; path=/
Set-Cookie: password=; expires=Thu, 01-Jan-1970 00:00:00 GMT; path=/
In these cases, the user-id and password still return the earlier
cookie, which was written by the same library.
Anyone have any ideas what the problem might be and how to get around
it?
Thanks,
Marshall
------------------------------
Date: Wed, 05 Mar 2003 23:07:06 GMT
From: Andrew Lee <spamtrap@nowhere.com>
Subject: Re: cookie-lib.pl won't delete cookies
Message-Id: <dpvc6vk3rm3ib2q9qacoonu0fo7qkep31i@4ax.com>
On Wed, 05 Mar 2003 17:07:05 -0500, Marshall Dudley
<mdudley@execonn.com> wrote:
>I am not sure if this is an IE problem, or a cookie-lib.pl problem. But
>when I use the delete cookie subroutine of this library, it works fine
>with Netscape, but is ignored by IE.
>
>I have trapped the command cookie-lib.pl is sending and this is what is
>being sent when I am trying to delete the cookie:
>
>Set-Cookie: user-id=; expires=Thu, 01-Jan-1970 00:00:00 GMT; path=/
>Set-Cookie: password=; expires=Thu, 01-Jan-1970 00:00:00 GMT; path=/
>
>In these cases, the user-id and password still return the earlier
>cookie, which was written by the same library.
>
>Anyone have any ideas what the problem might be and how to get around
>it?
Use CGI.pm
Example:
# create a new CGI object
use CGI;
my $cgi = CGI->new;
# see if there's a cookie called myCookie
my %cookieStuff = $cgi->cookie("myCookie");
# prefer the current params; fall back to the data stored in the cookie
my ($fname, $lname, $phone, $dept, $email) =
    map { $cookieStuff{$_} = $cgi->param($_) || $cookieStuff{$_} }
    qw/fname lname phone dept email/;
# create a new cookie called GimmeCookie w/ 30 day shelf life
my $new_cookie = $cgi->cookie( -name    => "GimmeCookie",
                               -value   => \%cookieStuff,
                               -path    => '/',
                               -domain  => ".mydomain.com",
                               -expires => "+30d"
                             );
# to delete a cookie, try -expires => 'now' instead
# the cookie only reaches the browser if it goes out with the headers
print $cgi->header(-cookie => $new_cookie);
And so forth ... everything you need is in CGI.pm -- this is code I use
across browsers.
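For comparison, here is roughly what a hand-built "delete" header looks like.
Browsers identify a cookie by name together with path and domain, so the
expiring header has to repeat the path and domain that were used when the
cookie was set. Whether that is what goes wrong in the IE case above is not
established in this thread. The sub name expire_cookie_header and the choice
of an expiry one day in the past are assumptions for the sketch.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a Set-Cookie header that expires a cookie.
# $domain is optional; $now is optional and exists only to make testing easy.
sub expire_cookie_header {
    my ($name, $path, $domain, $now) = @_;
    $now = time() unless defined $now;
    # one day in the past, in the same date layout as the headers shown above
    my $when = strftime('%a, %d-%b-%Y %H:%M:%S GMT', gmtime($now - 86_400));
    my $header = "Set-Cookie: $name=; expires=$when; path=$path";
    $header .= "; domain=$domain" if defined $domain && length $domain;
    return $header;
}

print expire_cookie_header('user-id', '/'), "\n";
print expire_cookie_header('password', '/', '.mydomain.com'), "\n";
```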
HTH
------------------------------
Date: 5 Mar 2003 20:06:41 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Counting matches in a regular expression
Message-Id: <b45lch$gc8$1@mamenchi.zrz.TU-Berlin.DE>
John W. Krahn <krahnj@acm.org> wrote in comp.lang.perl.misc:
> "John W. Krahn" wrote:
> >
> > Barty wrote:
> > >
> > > "John W. Krahn" <krahnj@acm.org> wrote in message
> > > news:3E605F22.71C3FB02@acm.org...
> > > > Barty wrote:
> > > > >
> > > > > I'm replying to myself, but mean this for those that answered...
> > > > >
> > > > > Thanks for your responses.. None ended up faster than my methods.
> > > > > Speed is
> > > > > a major concern in this.. I'm trying to find out how many 6/49 number
> > > > > combinations have never won a prize.. So for all of the 14 million
> > > > > combinations (or so),
> > > >
> > > > Just subtracting 1993 from 14 million (or so) will tell you how many
> > > > have NOT won.
> > >
> > > Actually I'm trying to find out how many combinations haven't even matched
> > > 3... There are 13,983,816 combinations. so far I'm at 5,160,000 and there
> > > hasn't been one.
> >
> > Here is a program that will find all three number combinations. It
> > takes about 15 minutes to run on my 366 MHz computer.
>
> Also, if I precompute $bitarray and store it on disk then the run time
> is about two minutes.
Pre-calculating all three-element subsets is certainly a good first step.
It remains to find the number of six-element sets that are not supersets
of one of them, or, equivalently, the number of six-element sets that
contain at least one. That seems to be non-trivial and I'd expect this
to be the real time (and/or memory-) hog.
There must be methods to estimate the value on statistical grounds only.
I'd try to do that before planning a program for an exact count, just
to know what to expect.
Anno
------------------------------
Date: Wed, 05 Mar 2003 17:20:22 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Counting matches in a regular expression
Message-Id: <3E667826.BB3D1422@earthlink.net>
Anno Siegel wrote:
>
> John W. Krahn <krahnj@acm.org> wrote in comp.lang.perl.misc:
>> "John W. Krahn" wrote:
>>>
>>> Barty wrote:
>>>>
>>>> "John W. Krahn" <krahnj@acm.org> wrote in message
>>>> news:3E605F22.71C3FB02@acm.org...
>>>>> Barty wrote:
>>>>>>
>>>>>> I'm replying to myself, but mean this for those that answered...
>>>>>>
>>>>>> Thanks for your responses.. None ended up faster than my
>>>>>> methods. Speed is a major concern in this.. I'm trying to find
>>>>>> out how many 6/49 number combinations have never won a prize..
>>>>>> So for all of the 14 million combinations (or so),
>>>>>
>>>>> Just subtracting 1993 from 14 million (or so) will tell you how
>>>>> many have NOT won.
>>>>
>>>> Actually I'm trying to find out how many combinations haven't even
>>>> matched 3... There are 13,983,816 combinations. so far I'm at
>>>> 5,160,000 and there hasn't been one.
>>>
>>> Here is a program that will find all three number combinations. It
>>> takes about 15 minutes to run on my 366 MHz computer.
>>
>> Also, if I precompute $bitarray and store it on disk then the run
>> time is about two minutes.
>
> Pre-calculating all three-element subsets is certainly a good first
> step.
Why? What does it gain you? I don't see what they're good for.
> It remains to find the number of six-element sets that are not
> supersets of one of them, or, equivalently, the number of six-element
> sets that contain at least one.
The OP's goal is to find the set of six-element sets which *don't*
contain at least one three-element match with a six-element winning
drawing.
What good does it do for us to know the three-element subsets which
aren't winners? How is this related to the set of six-element sets
which aren't winners?
> That seems to be non-trivial and I'd expect
> this to be the real time (and/or memory-) hog.
*My* program didn't hog much time and/or memory.
First, it calculated all three-element subsets of the winning drawings.
This did not take a huge amount of time or memory (4 seconds). Given
the simple operations needed to do this: read, split, decrement, sort,
then three for loops (assigning to $bitstring), I suspect that the time
was dominated by IO. I'd be more sure if Devel::DProf showed
line-by-line times.
Then, it looped over all possible six-element drawings, skipping those
which won and printing out the few which remained.
Since this was done with six nested for loops, without building up a
list, this took very little memory... just the 6 iterators.
My tests for having drawn a winning combo were partly done in the outer
loops: If we've just picked 1,2,3, and if that's enough to win, we don't
need to draw another 3 numbers, so we skip 15180 loops within that. If
we've picked 1,2,3,4, and 1,2,4 is a winner, then we don't need to draw
another two numbers, so we skip 946 loops, and so on. Because of the
early identification and skipping of winners (and because of there being
nearly two thousand winners, and thus a lot skipped early), it took less
than a second to print out all 24 losers.
> There must be methods to estimate the value on statistical grounds
> only. I'd try to do that before planning a program for an exact count,
> just to know what to expect.
Where would one start?
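One possible starting point, as a rough sketch only: treat every draw as an
independent, uniformly random choice of 6 out of 49, work out the chance that
a fixed ticket shares fewer than three numbers with one draw, and raise that
to the number of draws. The figure of 1993 draws is an assumption taken from
the number quoted earlier in the thread, bonus numbers are ignored, and the
sub names are invented for the example.

```perl
use strict;
use warnings;

# Binomial coefficient C(n, k); each intermediate value is itself an integer.
sub binomial {
    my ($n, $k) = @_;
    return 0 if $k < 0 || $k > $n;
    my $r = 1;
    $r = $r * ($n - $k + $_) / $_ for 1 .. $k;
    return $r;
}

# Probability that one ticket shares fewer than three numbers with one draw.
sub p_no_prize {
    my ($pool, $picks) = @_;
    my $ways = 0;
    $ways += binomial($picks, $_) * binomial($pool - $picks, $picks - $_)
        for 0 .. 2;
    return $ways / binomial($pool, $picks);
}

# Expected number of tickets that miss in every one of $draws draws.
sub expected_losers {
    my ($pool, $picks, $draws) = @_;
    return binomial($pool, $picks) * p_no_prize($pool, $picks) ** $draws;
}

my $draws = 1993;    # assumption: number of recorded draws
printf "P(fewer than 3 matches in one draw) = %.6f\n", p_no_prize(49, 6);
printf "Expected tickets never matching 3+  = %.3g\n",
    expected_losers(49, 6, $draws);
```

Under these assumptions the expectation comes out far below one ticket, which
fits the original poster's report of finding none among the first 5,160,000
combinations. Real draws need not behave like the model, so this says what to
expect rather than what the exact count is.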
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: 5 Mar 2003 23:21:56 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Counting matches in a regular expression
Message-Id: <b460qk$ns9$1@mamenchi.zrz.TU-Berlin.DE>
Benjamin Goldberg <goldbb2@earthlink.net> wrote in comp.lang.perl.misc:
> Barty wrote:
> [snip]
> >
> > It's running now.. Looks like it will take around two hours (on my
> > athlon 1400) which is better than I thought it would be! If anyone
> > would like to play with it, here's my code:
> > http://www.ianbishop.ca/649.zip
>
> I've attached my own code, and the output of one run.
>
> There couldn't *really* have only been 24 six-digit-combos which didn't
> get at least 3 digits in common with a winning draw, could there?
I'm getting a lot more.
> And the whole thing finishes in 4 seconds -- is that possible?
My run-time estimate: A bit above 5 hours.
> If my algorithm is wrong, does anyone have a guess as to the problem?
I think the algorithm is correct, if it is meant to be this:
1. Generate the set of all three-element subsets of all recorded
draws. (Each draw of six primarily generates 20 subsets of three,
but of course many duplicates will appear across the draws.)
Call these winners.
2. Build subsets of up to six elements of {1 .. 49} in sequence,
adding one element at a time. At each step, check if any three-
element subset of the current set is among the "winners" of step 1.
If so, don't check any supersets of the current set.
If we're in the last step (dealing with six-element subsets),
print the set, again after checking that no subset of three is
a "winner".
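The two steps can be sketched on a toy instance. This is not the code posted
by anyone in the thread: it stores the winning triples in a hash rather than
a bit string, recurses instead of nesting six loops, and counts the surviving
sets instead of printing them. The sub names are invented. The demo uses
numbers 1 .. 8, tickets of four and two "draws" so that the result can be
checked by hand.

```perl
use strict;
use warnings;

# Step 1: collect every three-element subset of every recorded draw.
sub winning_triples {
    my (@draws) = @_;
    my %win;
    for my $draw (@draws) {
        my @d = sort { $a <=> $b } @$draw;
        for my $i (0 .. $#d - 2) {
            for my $j ($i + 1 .. $#d - 1) {
                for my $k ($j + 1 .. $#d) {
                    $win{"$d[$i],$d[$j],$d[$k]"} = 1;
                }
            }
        }
    }
    return \%win;
}

# Step 2: grow sets in ascending order, one element at a time. A candidate
# is dropped, together with all its supersets, as soon as it completes a
# winning triple with two elements already picked.
sub count_losers {
    my ($n, $size, $win, @picked) = @_;
    return 1 if @picked == $size;
    my $count = 0;
    my $start = @picked ? $picked[-1] + 1 : 1;
    CANDIDATE: for my $x ($start .. $n) {
        for my $i (0 .. $#picked - 1) {
            for my $j ($i + 1 .. $#picked) {
                next CANDIDATE if $win->{"$picked[$i],$picked[$j],$x"};
            }
        }
        $count += count_losers($n, $size, $win, @picked, $x);
    }
    return $count;
}

my $win = winning_triples([1, 2, 3, 4], [5, 6, 7, 8]);
print count_losers(8, 4, $win), " of 70 four-element sets contain no winning triple\n";
```

In the toy case a set survives only if it takes exactly two numbers from each
draw, so the count can be verified independently. For the real 6/49 problem
the call would be count_losers(49, 6, $win) with the recorded draws loaded
into winning_triples.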
> And if it's right -- nyah nyah, my code's faster than yours is :P
Something must be wrong with your bit addressing scheme, is my guess.
The algorithm as such is sound, imo.
Anno
------------------------------
Date: Thu, 06 Mar 2003 00:59:54 GMT
From: makbo <makbo@pacbell.net>
Subject: Re: DBD and DBI on Solaris 64 bit
Message-Id: <3E669D89.9020903@pacbell.net>
Thanx for the suggestion. Profiling Perl is not something I've actually
ever done before, because in over 10 years I've never needed to! ;-)
My "perldoc -q profile" output recommends Devel::DProf or
Benchmark.pm, any advice on which one to try?
My hunch is that the problem is DBD::CSV, not DBD::Oracle. (I wasn't very
specific in the original post, but I am reading the data from Oracle
using DBD::Oracle.) I prepare, execute, and fetchrow_array the rows
using "SELECT * from table" from an Oracle source, then prepare and
execute an "INSERT into table values (?,?...)" statement to the CSV target.
As I "tail -f" the CSV output file, I can see the rows being spit out
painfully slowly compared to previously. My Oracle 9i instances in
general are equal to or faster than their comparable Oracle 8i instances
(since I'm trying to use the nifty new stuff like auto segment
management, etc).
I will also try the event 10046 trace ...
Ron Reidy wrote:
> I am going to go out on a limb here and guess the problem is with your
> Oracle 9i instance. Have you:
>
> 1. Profiled the Perl code?
> 2. Used event 10046 to trace waits in the database?
>
> I have run Perl 5.8.0 compiled to use 64 bit integers and DBI/DBD::Oracle
> on 64 bit Solaris against 8i (64 bit), 9iR1, and 9iR2 and have not seen
> these problems.
>
> --
> Ron Reidy
> Oracle DBA
>
> makbo wrote:
>
>> I have a related question. I am a long time Perl and DBD::Oracle user
>> under various Unix systems. (I still remember when oraperl was new...)
>>
>> I downloaded gcc (current version) and successfully compiled it under
>> Solaris 8 64-bit.
>>
>> I downloaded Perl 5.8 64-bit and successfully compiled it. I
>> installed Oracle 9i 64-bit and successfully compiled and installed DBI
>> and DBD::Oracle 64-bit.
>>
>> A script using DBD::CSV to write out flat files from the database that
>> worked reasonably well under 32-bit Perl 5.6, Solaris 5.6, Oracle 8i,
>> now runs slow as a lame dog (i.e. triple the time) under the newer
>> 64-bit environment (as stated above) -- all data items constant.
>>
>> Any ideas? Anyone with similar experience?
>>
>> Joe Smith wrote:
>>
>>> In article <3E5B774D.5020905@indra.com>, Ron Reidy
>>> <rereidy@indra.com> wrote:
>>>
>>>> Rich wrote:
>>>>
>>>>> I am running Solaris 9 64 bit which comes with Perl v5.6.1 built for
>>>>> sun4-solaris-64int. I want to build in the DBD and DBI modules for
>>>>
>>>>
>>>>
>>>> Since you don't have a 64bit C compiler, download the 64bit gcc C
>>>> compiler from http://www.sunfreeware.com/, and rebuild Perl and all
>>>> the other modules you have installed.
>>>
>>>
>>>
>>>
>>> An important point, worth repeating, is that you need to use the same
>>> compiler for modules as the main program.
>>>
>>> That is, if you are going to be using gcc to compile the modules, then
>>> you need to compile perl from its sources.
>>>
>>> -Joe
>>
>>
>>
>
------------------------------
Date: Wed, 05 Mar 2003 18:30:51 -0700
From: Ron Reidy <rereidy@indra.com>
Subject: Re: DBD and DBI on Solaris 64 bit
Message-Id: <3E66A4CB.6050700@indra.com>
makbo wrote:
> Thanx for the suggestion. Profiling Perl is not something I've actually
> ever done before, because in over 10 years I've never needed to! ;-)
>
> My "perldoc -q profile" output recommends Devel::DProf or Benchmark.pm,
> any advice on which one to try?
I have used both. I would start with Benchmark and then look at the
problem from there.
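Before reaching for a profiler, the hunch about which side is slow can be
checked by timing the read half and the write half of the loop separately.
The sketch below uses the core Time::HiRes module rather than Benchmark,
because accumulating wall-clock time per phase is simpler that way. The subs
fetch_row and insert_row are hypothetical stand-ins for the real
fetchrow_array and execute calls, and the data is invented.

```perl
use strict;
use warnings;
use Time::HiRes ();

my %elapsed;    # wall-clock seconds accumulated per phase

# Run $code, charge its wall-clock time to $label, pass its result through.
sub timed {
    my ($label, $code) = @_;
    my $t0     = Time::HiRes::time();
    my @result = $code->();
    $elapsed{$label} += Time::HiRes::time() - $t0;
    return @result;
}

# Hypothetical stand-ins for $select->fetchrow_array and $insert->execute(@row)
my @source = ([1, 'a'], [2, 'b'], [3, 'c']);
my @target;
sub fetch_row  { my $row = shift @source; return $row ? @$row : () }
sub insert_row { push @target, [@_]; return 1 }

while (my @row = timed(fetch => \&fetch_row)) {
    timed(insert => sub { insert_row(@row) });
}
printf "%-6s %.6f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

If nearly all the time lands in the insert phase, that would point at the CSV
side; if it lands in the fetch phase, the database trace is the next place to
look.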
>
> My hunch is that the problem is DBD::CSV, not DBD::Oracle. (I wasn't very
> specific in the original post, but I am reading the data from Oracle
> using DBD::Oracle.) I prepare, execute, and fetchrow_array the rows
> using "SELECT * from table" from an Oracle source, then prepare and
> execute an "INSERT into table values (?,?...)" statement to the CSV target.
>
> As I "tail -f" the CSV output file, I can see the rows being spit out
> painfully slowly compared to previously. My Oracle 9i instances in
> general are equal to or faster than their comparable Oracle 8i instances
> (since I'm trying to use the nifty new stuff like auto segment
> management, etc).
>
> I will also try the event 10046 trace ...
>
[ snip ]
------------------------------
Date: 5 Mar 2003 19:26:38 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: File::Tail problem
Message-Id: <b45j1e$erd$1@mamenchi.zrz.TU-Berlin.DE>
Nuno Branco <branco@markdata.pt> wrote in comp.lang.perl.misc:
>
> I am writing an application to monitor log files and I am using File::Tail
> to help me with that.
>
> My problem is that randomly (sometimes once a day, other times once a
> week) the whole file that I am "tailing" gets dumped on my screen, from
> the point where it was opened.
>
> I have nothing (or at least I think so) in my cron jobs that could be
> triggering this. I thought maybe logrotate could be doing it, but that's
> not the case.
>
> Has anyone else had similar problems with File::Tail? The manpage I read
> says nothing about it.
Well, it says this:
Obviously, if this happens and you have reset_tail set
to -1, you will suddenly get a whole bunch of lines
- lines you already saw. So in this case, reset_tail
should probably be set to a small positive number or
even 0.
That seems to describe the effect you are seeing. "If this happens"
refers to a particular way of log rotating, so I'd double-check that.
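To make the three cases concrete, here is a core-Perl illustration of what
the quoted passage describes. It is not File::Tail's own code: the sub
lines_after_reset is invented, and reading a positive value as "the last n
lines" is my interpretation, so check the module's documentation. With the
module itself the setting would presumably be passed as the reset_tail
argument to File::Tail->new.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Which lines would be shown again after the file has been reopened?
#   negative: the whole file (the "whole bunch of lines" effect)
#   zero:     nothing, carry on from the end
#   positive: only the last $reset_tail lines
sub lines_after_reset {
    my ($path, $reset_tail) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines if $reset_tail < 0;
    return ()     if $reset_tail == 0;
    my $n = $reset_tail < @lines ? $reset_tail : scalar @lines;
    return @lines[@lines - $n .. $#lines];
}

# Demo on a throwaway file standing in for a freshly rotated log.
my ($out, $log) = tempfile(UNLINK => 1);
print {$out} "line $_\n" for 1 .. 5;
close $out;
for my $setting (-1, 0, 2) {
    my @shown = lines_after_reset($log, $setting);
    printf "reset_tail=%2d shows %d line(s)\n", $setting, scalar @shown;
}
```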
Anno
------------------------------
Date: Wed, 5 Mar 2003 20:48:56 +0100
From: "Asby" <asby@kinderen4kinderen.org>
Subject: Re: Greedy regexps
Message-Id: <3e6654aa$0$49106$e4fe514c@news.xs4all.nl>
"Bigus" <somewhere@nowhere.com> wrote in message
news:b44p77$nio@newton.cc.rl.ac.uk...
> > $html =~ s!<td[^>]+class="?head1"?>([^<]+)</td>!###$1###!gi;
>
> That's an interesting way of doing a regexp that I wouldn't of thought
> of - ie: [^>]+ - matching anything except a >, thereby ensuring that while
> it matches any possible property of an element it doesn't go beyond the
> element :-)
Well, it's the easiest way to match HTML: you know which character ends the
tag and you want everything before it. It's a faster way of searching too.
The regexp engine will see the > and know that it doesn't match [^>]. If you
use .* the regexp engine is greedy and will look as far as it can, even
beyond the >, and when it reaches the end of the line it will backtrack to
make the rest of the regexp match.
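A small demonstration of the difference. The first sub uses the substitution
quoted above; the second is a hypothetical variant with .* swapped in for the
negated character classes, written only for comparison. The sample strings
are invented, apart from the last one, which is the cell from Tad
McClellan's reply in this thread.

```perl
use strict;
use warnings;

# The pattern from the thread: negated classes cannot run past the tag.
sub mark_tight {
    my ($html) = @_;
    $html =~ s!<td[^>]+class="?head1"?>([^<]+)</td>!###$1###!gi;
    return $html;
}

# Hypothetical greedy variant: .* runs to the end and backtracks from there.
sub mark_greedy {
    my ($html) = @_;
    $html =~ s!<td.*class="?head1"?>(.*)</td>!###$1###!gi;
    return $html;
}

my $row = '<td class="head1">Name</td><td class="head1">Age</td>';
print "tight:  ", mark_tight($row),  "\n";
print "greedy: ", mark_greedy($row), "\n";

# A bare < inside the cell stops [^<]+, so nothing is replaced at all.
print "tight:  ", mark_tight('<td class="head1"> Income < Expenses </td>'), "\n";
```

With two cells on one line the greedy variant swallows the first cell, while
the negated classes keep each match inside its own tag. The last line shows
the price: a bare < in the text defeats the tight pattern.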
--
ttfn,
Asby
$_="qdjb3H kqdP qdgsnmA srtJ";y/a-y3/b-za/;print scalar reverse
------------------------------
Date: Wed, 5 Mar 2003 16:01:57 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Greedy regexps
Message-Id: <slrnb6csul.4u0.tadmc@magna.augustmail.com>
Asby <asby@kinderen4kinderen.org> wrote:
> "Bigus" <somewhere@nowhere.com> wrote in message
> news:b44p77$nio@newton.cc.rl.ac.uk...
>> > $html =~ s!<td[^>]+class="?head1"?>([^<]+)</td>!###$1###!gi;
>>
>> That's an interesting way of doing a regexp that I wouldn't of thought
>> of -
> Well it's the easiest way to match HTML,
It is the easiest way to match *some* HTML.
It won't match this for instance:
<td class="head1"> Income < Expenses </td>
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 05 Mar 2003 22:17:42 GMT
From: Steven Smolinski <steven.smolinski@sympatico.ca>
Subject: Re: Greedy regexps
Message-Id: <aIu9a.6285$Or5.699162@news20.bellglobal.com>
Tad McClellan <tadmc@augustmail.com> wrote:
> Asby <asby@kinderen4kinderen.org> wrote:
>> "Bigus" <somewhere@nowhere.com> wrote in message
>> news:b44p77$nio@newton.cc.rl.ac.uk...
>
>>> > $html =~ s!<td[^>]+class="?head1"?>([^<]+)</td>!###$1###!gi;
>>>
>>> That's an interesting way of doing a regexp that I wouldn't of thought
>>> of -
>
>> Well it's the easiest way to match HTML,
>
>
> It is the easiest way to match *some* HTML.
>
> It won't match this for instance:
>
> <td class="head1"> Income < Expenses </td>
^
^
<
But that's not strictly HTML (please correct me if I'm wrong). It's
only HTML-ish. It's certainly not valid xhtml.
Isn't a strict parser supposed to bail because it's not valid? That
would make the regex's problem that it *does* match (at odd positions),
rather than rejecting the snippet as malformed.
Steve
--
Steven Smolinski => http://arbiter.ca/
GnuPG Public Key => http://arbiter.ca/steves_public_key.txt
=> or email me with 'auto-key' in the subject.
Key Fingerprint => 08C8 6481 3A7B 2A1C 7C26 A5FC 1A1B 66AB F637 495D
------------------------------
Date: 05 Mar 2003 22:39:57 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Greedy regexps
Message-Id: <slrnb6cv5t.55o.abigail@alexandra.abigail.nl>
Bigus (somewhere@nowhere.com) wrote on MMMCDLXXIII September MCMXCIII in
<URL:news:b44qon$11jq@newton.cc.rl.ac.uk>:
}} Abigail wrote:
}}
}} > The ? modifier doesn't stop the regexp from matching, and it doesn't
}} > stop the regexp from matching at the earliest point possible.
}} >
}} > If you insist on doing this with a regexp, you need to make it much
}} > tighter, to prevent it from matching too much. But it would be far,
}} > far better to parse HTML.
}} >
}} > People should stop thinking it's easy to parse HTML with a single
}} > regexp. It's not. Parsing is far, far easier.
}}
}} I do understand that parsing HTML isn't easy if you are to take into account
}} all the possible complications as identified in the CPAN FAQ, however, I
}} think for my purposes half a dozen lines of regexps will work nicely to
}} preprocess the HTML pages ready for keyword searching. Covering "all likely
}} possibilities" is enough.. if I do spot something at a later date that I've
}} not taken into account then I can make the necessary correction.
Yada, yada, yada. If it was all so simple and possible, why did you
come here with a problem?
Abigail
--
use lib sub {($\) = split /\./ => pop; print $"};
eval "use Just" || eval "use another" || eval "use Perl" || eval "use Hacker";
------------------------------
Date: 05 Mar 2003 22:44:44 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Greedy regexps
Message-Id: <slrnb6cves.55o.abigail@alexandra.abigail.nl>
Steven Smolinski (steven.smolinski@sympatico.ca) wrote on MMMCDLXXIII
September MCMXCIII in <URL:news:aIu9a.6285$Or5.699162@news20.bellglobal.com>:
.. Tad McClellan <tadmc@augustmail.com> wrote:
.. > Asby <asby@kinderen4kinderen.org> wrote:
.. >> "Bigus" <somewhere@nowhere.com> wrote in message
.. >> news:b44p77$nio@newton.cc.rl.ac.uk...
.. >
.. >>> > $html =~ s!<td[^>]+class="?head1"?>([^<]+)</td>!###$1###!gi;
.. >>>
.. >>> That's an interesting way of doing a regexp that I wouldn't of thought
.. >>> of -
.. >
.. >> Well it's the easiest way to match HTML,
.. >
.. >
.. > It is the easiest way to match *some* HTML.
.. >
.. > It won't match this for instance:
.. >
.. > <td class="head1"> Income < Expenses </td>
.. ^
.. ^
.. <
..
.. But that's not strictly HTML (please correct me if I'm wrong). It's
You are wrong. A < in HTML is only markup if it's followed by something
that could make it markup. A space is not such a thing.
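As a rough illustration of that rule, and not a substitute for a real SGML
parser, the following counts only those < characters that are followed by
something able to start markup. Taking "a letter, /, ! or ?" as that
something is my simplification, and the sub name is invented.

```perl
use strict;
use warnings;

# Offsets of every '<' that is followed by a letter, '/', '!' or '?'.
sub markup_starts {
    my ($text) = @_;
    my @offsets;
    push @offsets, $-[0] while $text =~ /<(?=[A-Za-z\/!?])/g;
    return @offsets;
}

my $cell   = '<td class="head1"> Income < Expenses </td>';
my @starts = markup_starts($cell);
printf "%d of the %d '<' characters start markup\n",
    scalar @starts, ($cell =~ tr/<//);
```

The < before the space is left alone, which is the behaviour described for
the example cell.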
.. only HTML-ish. It's certainly not valid xhtml.
XHTML uses XML, while HTML is an SGML application. XML is for wimps
who can't parse SGML, and for whom LISP isn't verbose enough.
.. Isn't a strict parser supposed to bail because it's not valid? That
.. would make the regex's problem that it *does* match (at odd positions),
.. rather than rejecting the snippet as malformed.
Throw it through any conforming SGML parser, and it *will* validate.
Abigail
--
# Perl 5.6.0 broke this.
%0=map{reverse+chop,$_}ABC,ACB,BAC,BCA,CAB,CBA;$_=shift().AC;1while+s/(\d+)((.)
(.))/($0=$1-1)?"$0$3$0{$2}1$2$0$0{$2}$4":"$3 => $4\n"/xeg;print#Towers of Hanoi
------------------------------
Date: Wed, 05 Mar 2003 23:18:59 GMT
From: Steven Smolinski <steven.smolinski@sympatico.ca>
Subject: Re: Greedy regexps
Message-Id: <DBv9a.3217$6z.678225@news20.bellglobal.com>
Abigail <abigail@abigail.nl> wrote:
> Steven Smolinski (steven.smolinski@sympatico.ca) wrote [...]:
> .. But that's not strictly HTML (please correct me if I'm wrong).
>
> You are wrong. A < in HTML is only markup if it's followed by
> something that could make it markup. A space is not such a thing.
Thanks. I'm not very good at SGML applications, because I've never
worked with it very closely.
> XML is for wimps who can't parse SGML, and for whom LISP isn't verbose
> enough.
I'll admit to the first, but not the second. I only use XML in spite of
its verbosity, not because of it; and only for formats I'll never have
to fully write manually; and only when I need an interface between apps
which is so low-effort even cow-orkers can code to it.
Steve
--
Steven Smolinski => http://arbiter.ca/
GnuPG Public Key => http://arbiter.ca/steves_public_key.txt
=> or email me with 'auto-key' in the subject.
Key Fingerprint => 08C8 6481 3A7B 2A1C 7C26 A5FC 1A1B 66AB F637 495D
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 4668
***************************************