[28944] in Perl-Users-Digest


Perl-Users Digest, Issue: 188 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Mar 2 14:09:59 2007

Date: Fri, 2 Mar 2007 11:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 2 Mar 2007     Volume: 11 Number: 188

Today's topics:
    Re: Disable warnings from specific module <ced@blv-sam-01.ca.boeing.com>
        display datestamp in HTML <jimmyc@trexagi.net>
    Re: display datestamp in HTML <bbbart@inGen.be>
    Re: display datestamp in HTML <tony_curtis32@yahoo.com>
    Re: display datestamp in HTML <jimmyc@trexagi.net>
    Re: display datestamp in HTML <jimmyc@trexagi.net>
    Re: display datestamp in HTML <jimmyc@trexagi.net>
    Re: display datestamp in HTML <jimmyc@trexagi.net>
    Re: Expressing AND, OR, and NOT in a Single Pattern <greg.ferguson@icrossing.com>
    Re: Expressing AND, OR, and NOT in a Single Pattern <greg.ferguson@icrossing.com>
    Re: FAQ 6.21 What's wrong with using grep in a void con <brian.d.foy@gmail.com>
    Re: format issue <greg.ferguson@icrossing.com>
    Re: how to find the "yesterday" logfile name? <greg.ferguson@icrossing.com>
        Match a regular expression <alex.habar.nam@gmail.com>
    Re: Match a regular expression <thepoet_nospam@arcor.de>
    Re: Match a regular expression <m@rtij.nl.invlalid>
    Re: mod_perl error:  (120000) exit was called at <noreply@gunnar.cc>
    Re: mod_perl errors <m@rtij.nl.invlalid>
    Re: Perl and MySQL (Jens Thoms Toerring)
    Re: Q on regex of LWP::Simple data <greg.ferguson@icrossing.com>
    Re: Q on regex of LWP::Simple data <len@philpot.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 2 Mar 2007 09:08:55 -0800
From: "comp.llang.perl.moderated" <ced@blv-sam-01.ca.boeing.com>
Subject: Re: Disable warnings from specific module
Message-Id: <1172855334.973863.220390@64g2000cwx.googlegroups.com>

On Mar 1, 10:15 pm, "comp.llang.perl.moderated" <c...@blv-
sam-01.ca.boeing.com> wrote:
> On Feb 27, 11:52 am, "Paul Lalli" <mri...@gmail.com> wrote:
>
>
>
> > On Feb 27, 11:38 am, snu...@gmail.com wrote:
>
> >....
>
> > You could try wrapping a call to any of those experimental features in
> > a block in which $SIG{__WARN__} has been changed to ignore warnings.
> > Or you could try locally setting $^W to 0 in a block around those
> > features (though I don't think that would handle the case where the
> > module's code is specifically generating a warning using the warn()
> > function).
>
> > {
> >   local $SIG{__WARN__} = sub { 1 };
> >   do_something_experimental();
>
> > }
>
> Ugly but you could filter stderr entirely...
> I think:
>
> open( SAVE, '>&STDERR') or die $!;
> {   open(STDERR, '>', File::Spec->devnull);
>     do_something_experimental();}
>
> open( STDERR, '>&SAVE' ) or die $!;
> ...
>

Sorry, my suggestion looks dumber the more I think about it.  You
might lose a fatal message from SOAP::Lite, for instance.  I'm not
sure why setting $SIG{__WARN__} or even $^W locally wouldn't do
what's needed.  The 'warnings' pragma would be better than the
global $^W, however.  A 'no warnings "blah"', for example, could
filter out just the specific nuisance warning(s) while leaving other
unexpected errors intact.  See perldoc warnings.
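
A minimal sketch of the $SIG{__WARN__} approach discussed above, written
so that only one known nuisance message is dropped and everything else
still reaches STDERR. The function do_something_experimental() and both
warning texts are hypothetical stand-ins, not SOAP::Lite output. One
caveat worth keeping in mind: 'no warnings' is lexically scoped, so it
only affects code compiled inside the block where it appears, which is
why a handler is used here for warnings raised inside another module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module call discussed in the thread.
sub do_something_experimental {
    warn "feature X is experimental\n";   # the nuisance warning
    warn "disk almost full\n";            # a warning we still want to see
    return 1;
}

my @passed;   # kept only so the result can be inspected afterwards

sub call_quietly {
    # local() restores the previous handler when this sub returns.
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        return if $msg =~ /is experimental/;   # drop only the known nuisance
        push @passed, $msg;
        print STDERR $msg;                     # everything else still shows
    };
    return do_something_experimental();
}

call_quietly();
```

Matching on the message text is fragile if the module changes its
wording, so the pattern is best kept as narrow as possible.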

--
Charles DeRykus



------------------------------

Date: Fri, 02 Mar 2007 11:40:32 -0500
From: Phil M <jimmyc@trexagi.net>
Subject: display datestamp in HTML
Message-Id: <hqjgu2pgjm90od7qg71lkbrh6f8gbampnf@4ax.com>

Hi all.  I have an HTML web page where I upload a file which is always
called document.pdf.  Users get there, click on the icon and download
the document.pdf.

Below the icon/link where users click to get document, is there any
way to display the datestamp of document.pdf?

ie:  the document you're about to download was updated: March 2 2007

My web page file name is called index.html (page where I want the
datestamp to be displayed)
The directory where I can store perl scripts (in .pl) is called /bin
The file from which I want the datestamp to be retrieved is called
document.pdf

I searched the Internet for such a simple script but couldn't find
any.  I'm not a programmer, so please bear with me.  Assume I find a
pl script which captures the datestamp of document.pdf, all I know is
that I have to modify the header of the .pl file to read:
#!/usr/bin/perl  then upload it in /bin

THEN, what code do I have to paste in the index.html to print the
actual datestamp?  The server is Apache 1.3.26

Thanks for any suggestions!


------------------------------

Date: Fri, 2 Mar 2007 21:59:47 +0500
From: Bart Van Loon <bbbart@inGen.be>
Subject: Re: display datestamp in HTML
Message-Id: <slrneugm03.29r.bbbart@charles.inGen.islamabad>

It was Fri, 02 Mar 2007 11:40:32 -0500, when Phil M wrote:
> Hi all.  I have an HTML web page where I upload a file which is always
> called document.pdf.  Users get there, click on the icon and download
> the document.pdf.
>
> Below the icon/link where users click to get document, is there any
> way to display the datestamp of document.pdf?
>
> ie:  the document you're about to download was updated: March 2 2007
>
> My web page file name is called index.html (page where I want the
> datestamp to be displayed)
> The directory where I can store perl scripts (in .pl) is called /bin
> The file from which I want the datestamp to be retrieved is called
> document.pdf
>
> I searched the Internet for such a simple script but couldn't find
> any.  I'm not a programmer, so please bear with me.  Assume I find a
> pl script which captures the datestamp of document.pdf, all I know is
> that I have to modify the header of the .pl file to read:
> #!/usr/bin/perl  then upload it in /bin
>
> THEN, what code do I have to paste in the index.html to print the
> actual datestamp?  The server is Apache 1.3.26

isn't there a Perl interface to xpdf's pdfinfo?

$ pdfinfo flyer.pdf
Title:          flyer.dvi
Creator:        dvips(k) 5.95b Copyright 2005 Radical Eye Software
Producer:       GPL Ghostscript 8.54
CreationDate:   Fri Mar  2 21:58:32 2007
ModDate:        Fri Mar  2 21:58:32 2007
Tagged:         no
Pages:          2
Encrypted:      no
Page size:      612 x 792 pts (letter)
File size:      12694 bytes
Optimized:      no
PDF version:    1.4

you want the ModDate, I believe.

-- 
regards,
BBBart

   The real fun of living wisely is that you get to be smug about it.
		  -- Calvin


------------------------------

Date: Fri, 02 Mar 2007 12:11:57 -0500
From: Tony Curtis <tony_curtis32@yahoo.com>
Subject: Re: display datestamp in HTML
Message-Id: <es9lsu$ab5$1@knot.queensu.ca>

Phil M wrote:
> Hi all.  I have an HTML web page where I upload a file which is always
> called document.pdf.  Users get there, click on the icon and download
> the document.pdf.
> 
> Below the icon/link where users click to get document, is there any
> way to display the datestamp of document.pdf?

If the modification date of the file is what you want as the actual 
document modification date, then SSI would be a much simpler solution, 
but nothing to do with perl.

hth
t


------------------------------

Date: Fri, 02 Mar 2007 12:36:21 -0500
From: Phil M <jimmyc@trexagi.net>
Subject: Re: display datestamp in HTML
Message-Id: <isngu2l2vl8gc5dluahlk8mufsp8vkcmdb@4ax.com>

Thanks for the quick reply.  It can be either. Modification date,
upload date or file date or creation date; whatever is simpler.

Either the file datestamp (as it appears next to the filename when I
log in to the ftp web server) or the date which it was uploaded.  it
doesn't matter since the date that I create the pdf file I also upload
it; so creation/modification/upload date is the same.

>If the modification date of the file is what you want as the actual 
>document modification date, then SSI would be a much simpler solution, 
>but nothing to do with perl.


------------------------------

Date: Fri, 02 Mar 2007 12:48:39 -0500
From: Phil M <jimmyc@trexagi.net>
Subject: Re: display datestamp in HTML
Message-Id: <7qogu25ndjvta9sof75it2bkcdui1r11hv@4ax.com>

> then SSI would be a much simpler solution

Thanks, I'll ask the same question on a SSI Javascript group.


------------------------------

Date: Fri, 02 Mar 2007 13:31:12 -0500
From: Phil M <jimmyc@trexagi.net>
Subject: Re: display datestamp in HTML
Message-Id: <94rgu2hrmkrs4161v5b03e5k6vr3ea4b2i@4ax.com>

I found a small perl script which claims to do the trick and it seems
that I can't get the code right:

Here's my web page: (note that the date doesn't appear)
http://www.greekradio.net/psa.html

This is the script: (stored in cgi/lastmodified.pl)


#!/usr/bin/perl

# Script name:
# @1 Last Modified Date and Time

# Purpose:
# This Perl script checks the "Last Modified" date and time
# of a file.

# Uses:
# Call the display via SSI using the tag below:
# <!--#include virtual="yourfolder1/yourfolder2/lastmodified.cgi" -->

# License Notice: 
# Copyright 2004 UPDI Network Enterprise, www.upoint.info/cgi
# You are free to use and distribute this script as long as
# you keep this license notice intact.

############################################################
# FULL PATH (not URL) to the file:
############################################################
$filename = "/hrnet/u/chrb/www/ekm_psa/document.pdf";

############################################################
# Turn on debug mode so that you can see the formats
############################################################
$debug = "0";               	# 1 = ON    0 = OFF
				# Set to "0" after testing
############################################################
# DO NOT EDIT BELOW THIS LINE
############################################################

use POSIX 'strftime';
my $time = (stat $filename)[9];
print "Content-type: text/html\n\n";

print strftime '%a %d.%b.%Y @ %I:%M %p', localtime $time;

if ($debug eq 1){
print "<p>";
print "a - ";
print strftime '%a', localtime $time;
print "<BR>";
print "A - ";
print strftime '%A', localtime $time;
print "<BR>";
print "b - ";
print strftime '%b', localtime $time;
print "<BR>";
print "B - ";
print strftime '%B', localtime $time;
print "<BR>";
print "c - ";
print strftime '%c', localtime $time;
print "<BR>";
print "d - ";
print strftime '%d', localtime $time;
print "<BR>";
print "H - ";
print strftime '%H', localtime $time;
print "<BR>";
print "I - ";
print strftime '%I', localtime $time;
print "<BR>";
print "j - ";
print strftime '%j', localtime $time;
print "<BR>";
print "m - ";
print strftime '%m', localtime $time;
print "<BR>";
print "M - ";
print strftime '%M', localtime $time;
print "<BR>";
print "p - ";
print strftime '%p', localtime $time;
print "<BR>";
print "s - ";
print strftime '%s', localtime $time;
print "<BR>";
print "U - ";
print strftime '%U', localtime $time;
print "<BR>";
print "w - ";
print strftime '%w', localtime $time;
print "<BR>";
print "W - ";
print strftime '%W', localtime $time;
print "<BR>";
print "x - ";
print strftime '%x', localtime $time;
print "<BR>";
print "X - ";
print strftime '%X', localtime $time;
print "<BR>";
print "y - ";
print strftime '%y', localtime $time;
print "<BR>";
print "Y - ";
print strftime '%Y', localtime $time;
print "<BR>";
print "Z - ";
print strftime '%Z', localtime $time;
}

Thanks for any suggestions on what I might be doing wrong.
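
For comparison, a stripped-down sketch of the same idea with strict,
warnings, and a check for a failed stat, since a wrong path would
otherwise print a date near the 1970 epoch without any error. The
function name last_modified and the default of using the script's own
path are assumptions made so the example runs anywhere; the real script
would use the full path to document.pdf. It may also be worth checking,
though this is only a guess from the post, that the file name in the SSI
tag (lastmodified.cgi) matches the name the script is stored under
(cgi/lastmodified.pl).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX 'strftime';

# Return a formatted modification date for $path, or undef if stat fails.
sub last_modified {
    my ($path) = @_;
    my @st = stat $path;
    return undef unless @st;                  # missing file or no permission
    return strftime('%B %d %Y', localtime $st[9]);   # element 9 is mtime
}

# Hypothetical default: this script's own path, so the sketch has a file
# to inspect. Pass a real path as the first argument to try another file.
my $filename = @ARGV ? $ARGV[0] : $0;

print "Content-type: text/html\n\n";
my $stamp = last_modified($filename);
print defined($stamp) ? $stamp : 'date unavailable';
print "\n";
```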


------------------------------

Date: Fri, 02 Mar 2007 13:46:00 -0500
From: Phil M <jimmyc@trexagi.net>
Subject: Re: display datestamp in HTML
Message-Id: <45sgu2p8g52akvqea1dp659vsuaunc1lj6@4ax.com>

I also saved the file as an shtml, same results:

http://www.greekradio.net/psa.shtml


------------------------------

Date: 2 Mar 2007 10:27:20 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: Expressing AND, OR, and NOT in a Single Pattern
Message-Id: <1172860040.202790.30350@z35g2000cwz.googlegroups.com>

On Mar 1, 3:18 pm, "h3xx" <amphetamach...@gmail.com> wrote:
> I like doing things in one line:
>
> print grep { /suspended/ && ! /Data_services/ } <DATA>;


I prefer this method too. For clarity and long-term maintenance it is
much better because the esoterica of regex can make the desired
results hard to figure out and the bugs in the pattern even harder to
find.

Also, speed wise, this is a lot faster. The regex engine has to do a
lot of work that can be short circuited by the booleans.

Sometimes it's better to break the search for matching patterns into
single lines too. It's kind of macho programmer-wise to string it all
together into one mondo regex pattern and have it work, but the logic
can get fragile.

The only thing I'd do differently to these patterns is add an anchor
to the 'Data_services' pattern, like so...

/^<Query id='Data_services/

Anchors speed up regex an incredible amount. I did benchmarks of index
vs various ways of using regex, and an anchored qr// that was
initialized outside a loop was the fastest at finding patterns inside
long strings, when the pattern was at the end of the string. At the
beginning of a string it should be equal to index(). Index() was
faster when finding a fixed string somewhere in the middle of another
string.
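
To make the comparison concrete, here is a small sketch that runs the
two-test form and an equivalent single pattern over the same input. The
sample lines are invented for illustration (the thread's real data is
not reproduced in this digest), and no timing claims are made here.

```perl
use strict;
use warnings;

# Hypothetical sample records.
my @lines = (
    "<Query id='Data_services' state='suspended'>\n",
    "<Query id='Billing' state='suspended'>\n",
    "<Query id='Billing' state='active'>\n",
);

# Two simple tests joined by boolean operators; the second is anchored.
my @two_step = grep { /suspended/ && !/^<Query id='Data_services/ } @lines;

# The same logic packed into one pattern with a negative lookahead.
my @one_regex = grep { /^(?!<Query id='Data_services).*suspended/ } @lines;

print @two_step;
```

Both forms select the same line; the first is the one a maintainer can
read at a glance.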



------------------------------

Date: 2 Mar 2007 10:44:31 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: Expressing AND, OR, and NOT in a Single Pattern
Message-Id: <1172861070.997760.108580@v33g2000cwv.googlegroups.com>

The Regexp::Assemble module on CPAN is way cool for building big
patterns with minimal fuss.

http://search.cpan.org/~dland/Regexp-Assemble-0.28/Assemble.pm

The resulting patterns are very efficient and pretty good when you
want to learn how to write complex regex.



------------------------------

Date: Fri, 02 Mar 2007 11:57:25 -0600
From: brian d  foy <brian.d.foy@gmail.com>
Subject: Re: FAQ 6.21 What's wrong with using grep in a void context?
Message-Id: <020320071157258592%brian.d.foy@gmail.com>

In article <1172789048.615029.15870@k78g2000cwa.googlegroups.com>, h3xx
<amphetamachine@gmail.com> wrote:

> On Mar 1, 2:03 pm, PerlFAQ Server <b...@stonehenge.com> wrote:
> >     In perls older than 5.8.1, map suffers from this problem as well. But
> >     since 5.8.1, this has been fixed, and map is context aware - in void
> >     context, no lists are constructed.

> Why isn't grep context-aware as well?!? I think of grep as the boring
> sister-function of map.

People weren't abusing grep like they do with map. :)

You show examples of modifying $_ in the blocks for these built-ins,
but the documentation warns against that. It's best to avoid those
practices.
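
A short sketch of the alternatives, assuming nothing beyond core Perl:
a for loop when only the side effect matters, map when a changed copy is
wanted, and grep only for its return value. The word list is made up
for the example.

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# Side effects only: a for loop states the intent and builds no list.
my $total_length = 0;
$total_length += length($_) for @words;

# A changed copy: map returns new values and leaves @words untouched.
my @upper = map { uc($_) } @words;

# A filter: grep used for its return value, without modifying $_.
my @long = grep { length($_) > 5 } @words;

print "$total_length @upper @long\n";
```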

-- 
Posted via a free Usenet account from http://www.teranews.com



------------------------------

Date: 2 Mar 2007 09:56:14 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: format issue
Message-Id: <1172858174.577668.272150@z35g2000cwz.googlegroups.com>

On Mar 1, 11:31 pm, "robertchen...@gmail.com"
<robertchen...@gmail.com> wrote:
> My perl use printf to print:
>
>  printf "Tasks number\tTask\tLibrary\tTargets\tBytes\n";
>  printf "%-6d\t%-s\t%-s\t%-5d\t%-10d\n", $task_hash{$task},$2, $1,
> $task_targets{$task}, $task_bytes{$task};
>
> Tasks number    Task    Library Targets Bytes
> 1050    cmd_tsk maint_tl        1050    7340950
> 1000    vcs     maint_tl        1000    953032
> 384     discover        maint_tl        384     27290
> 213     get_v2_build_tsk        relmgmt_tl      339     36951
> 136     runScriptUnix   relmgmt_tl      136     1438804
> 120     wilc_tsk        maint_tl        120     96432
> 73      v3sync_tsk      maint_tl        73      19355
>
> The format not alignment good...
>
> I also tried this way:
> printf "Number of tasks    Task      Library      Targets      Bytes
> \n";
> printf "%5d     %s          %s        %8d          %10d\n", $value,$2,
> $1, $task_targets{$task}, $task_bytes{$task};
>
> Anyone has good solution for print ?


Printf is very capable of giving you the results you want. The problem
is you aren't telling printf how to do it.

As is, your format statement is telling printf() what types of fields
it's outputting, but you're not defining the widths of ALL the fields.
Instead, you're telling it to output the full width of the string
variables, then add a tab, then output the full width of the string
variable, then another tab... but strings are varying lengths so your
columns are wandering.

What you should be doing is adding in the width to each of the string
format markers.

If you don't know what those are, you can preflight your data and find
the length of the longest strings for each of the two columns, then
supply those to printf as part of the format.

And, you can do that two different ways:

1. Build the format statement dynamically after figuring out the
lengths.
2. Supply the lengths using the '*' length marker in the statements,
with the actual precomputed lengths for the fields supplied as
parameters.

Or, if your string fields will ALWAYS be within a certain length, or
you intend to enforce that length, you can just put in the lengths in
the % markers and let printf add space as needed to fill to the end of
the field. Then you can use a single space instead of the tab, to mark
the gaps in the columns.

Printf() is a really powerful tool, but, just like so many other tools
in the Perl toolbox, you have to tell it what  you want it to do or
your results will be different than what you want. It can't read your
mind so you have to be very explicit when giving it directions.
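
The second option above (the '*' width marker with precomputed lengths)
might look like the sketch below. The rows are copied loosely from the
output in the question, and the array layout is an assumption since the
original hashes are not shown in full.

```perl
use strict;
use warnings;
use List::Util 'max';

# Hypothetical rows: [count, task, library, targets, bytes]
my @rows = (
    [1050, 'cmd_tsk',          'maint_tl',   1050, 7340950],
    [213,  'get_v2_build_tsk', 'relmgmt_tl', 339,  36951],
    [73,   'v3sync_tsk',       'maint_tl',   73,   19355],
);

# Preflight: longest string in each text column, headers included.
my $task_w = max(length('Task'),    map { length($_->[1]) } @rows);
my $lib_w  = max(length('Library'), map { length($_->[2]) } @rows);

# '*' takes the field width from the argument list; '-' left-justifies.
my @out = sprintf("%-6s %-*s %-*s %7s %10s\n",
    'Tasks', $task_w, 'Task', $lib_w, 'Library', 'Targets', 'Bytes');
for my $r (@rows) {
    push @out, sprintf("%-6d %-*s %-*s %7d %10d\n",
        $r->[0], $task_w, $r->[1], $lib_w, $r->[2], $r->[3], $r->[4]);
}
print @out;
```

Single spaces separate the columns, so the alignment no longer depends
on where the terminal places its tab stops.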



------------------------------

Date: 2 Mar 2007 09:42:17 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: how to find the "yesterday" logfile name?
Message-Id: <1172857337.491050.231980@z35g2000cwz.googlegroups.com>

On Mar 2, 2:04 am, "robertchen...@gmail.com" <robertchen...@gmail.com>
wrote:
> see if my directory has many logfiles like this:
>
> log022607.log
> log022707.log
> log022807.log
> log030107.log
>
> today is 030207(03/02/07), I want to find the "yesterday" log, how
> could I do in perl?
> When time is 04/01/07, to get the "yesterday' log, which is
> log033107.log maybe need some special handle, also when process for
> year end...?
>
> thanks very much!

Other ways to attack the same problems...

Precompute what the file's name should be, then use the "-M filename"
operator to  get the number of days since the file was modified based
on the start date of the script. If your script  is always running
this will give you bad results. If it's daily then quits (like a
scheduled job) then it'll be fine.

Again, precompute the file's name, then use "stat(filename)" to get
the file's particulars. You'll get its OS timestamp and can calculate
the delta in seconds from the current time to determine if the file
really has aged one day's worth of seconds.

If you are on Unix and don't know the file's name, then use the find
command ...

my $file_is = `find /path/to/files -type f -mtime 1`;

and you'll get the file's name back if one exists that is one day old.
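
The "precompute the file's name" step can be sketched with the core
POSIX module. The function name yesterday_log is made up for this
example, and the logMMDDYY.log pattern is taken from the file names in
the question. Going through mktime at noon on the previous calendar day
is one way to sidestep month ends, year ends, and daylight-saving
shifts that a plain "time - 86400" can trip over.

```perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

# Build the logMMDDYY.log name for the day before the given epoch time.
sub yesterday_log {
    my ($now) = @_;
    my @t = localtime $now;
    # Day-of-month minus one; mktime normalises day 0 to the last day
    # of the previous month (and December 31 of the previous year).
    my $noon_yesterday = mktime(0, 0, 12, $t[3] - 1, $t[4], $t[5]);
    return strftime('log%m%d%y.log', localtime $noon_yesterday);
}

my $name = yesterday_log(time);
print "$name ", (-e $name ? "exists" : "not found"), "\n";
```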



------------------------------

Date: 2 Mar 2007 08:57:04 -0800
From: "whiskey" <alex.habar.nam@gmail.com>
Subject: Match a regular expression
Message-Id: <1172854624.180714.197480@h3g2000cwc.googlegroups.com>

I couldn't find any help on the web (also tried on another group) and
I'm not sure if this is the right place to ask such things. However, I
think most Perl programmers know regular expressions well.

So, my question is: how do I match, using a regular expression,
another regular expression within a string ?

Example: given a string like "$foo =~ /regexp/"*, I want to split it
into tokens: TOK_VAR -> $foo, TOK_BINDOP -> =~, TOK_REGEXP -> /
regexp/. For this, I'm using regular expressions. So how do I match
the regular expression ? Sure, in this example it may be easy, but
what about a string like "$foo =~ /regexp/ && $bar =~ /pxeger/" ?

* No, I'm not writing a Perl interpreter, I just want to know if it is
possible to avoid parsing the string



------------------------------

Date: Fri, 02 Mar 2007 19:39:44 +0100
From: Christian Winter <thepoet_nospam@arcor.de>
Subject: Re: Match a regular expression
Message-Id: <45e86f63$0$15949$9b4e6d93@newsspool4.arcor-online.net>

whiskey wrote:
> I couldn't find any help on the web (also tried on another group) and
> I'm not sure if this is the right place to ask such things. However, I
> think most Perl programmers know regular expressions well.
> 
> So, my question is: how do I match, using a regular expression,
> another regular expression within a string ?
> 
> Example: given a string like "$foo =~ /regexp/"*, I want to split it
> into tokens: TOK_VAR -> $foo, TOK_BINDOP -> =~, TOK_REGEXP -> /
> regexp/. For this, I'm using regular expressions. So how do I match
> the regular expression ? Sure, in this example it may be easy, but
> what about a string like "$foo =~ /regexp/ && $bar =~ /pxeger/" ?
> 
> * No, I'm not writing a Perl interpreter, I just want to know if it is
> possible to avoid parsing the string

That depends on how complex your "string" will be. There's all
kinds of funky stuff possible that can make you shoot yourself in
the foot, e.g. quoted regex delimiters in the match pattern,
multiline regexes (/x modifier), regexes without match operator,
precompiled REs, etc, etc.

If they are only as complex as your example, you might get away
with splitting your string on a set of logical operators and
running something like

# captures, in order: variable, bind operator, delimiter, pattern, modifiers
my( $TOK_VAR, $TOK_BINDOP, $TOK_DELIM, $TOK_REGEXP, $TOK_MODIFIER ) =
	$string =~
	/(\$\S+)\s*([!=]~)\s*m?(.)(.*?)(?<!\\)\3([a-z]*)/;
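
As a rough illustration of "splitting your string on a set of logical
operators" first, the sketch below handles the two-clause example from
the question. The hash keys are invented names, and the pattern assumes
the same character opens and closes the regex (so m{...} is not
handled) and that no logical operator appears inside a pattern.

```perl
use strict;
use warnings;

# Single-quoted so that $foo and $bar stay literal text.
my $string = '$foo =~ /regexp/ && $bar !~ /pxeger/i';

my @tokens;
for my $clause (split /\s*(?:&&|\|\|)\s*/, $string) {
    # Captures: variable, binding operator, delimiter, pattern, modifiers.
    if ($clause =~ /(\$\S+)\s*([!=]~)\s*m?(.)(.*?)(?<!\\)\3([a-z]*)/) {
        push @tokens, { var => $1, bindop => $2, delim => $3,
                        regexp => $4, modifier => $5 };
    }
}

printf "%s %s %s%s%s%s\n", @{$_}{qw(var bindop delim regexp delim modifier)}
    for @tokens;
```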

-Chris


------------------------------

Date: Fri, 2 Mar 2007 19:46:25 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Match a regular expression
Message-Id: <118mb4-749.ln1@news.rtij.nl>

On Fri, 02 Mar 2007 08:57:04 -0800, whiskey wrote:

> I couldn't find any help on the web (also tried on another group) and
> I'm not sure if this is the right place to ask such things. However, I
> think most Perl programmers know regular expressions well.
> 
> So, my question is: how do I match, using a regular expression,
> another regular expression within a string ?
> 
> Example: given a string like "$foo =~ /regexp/"*, I want to split it
> into tokens: TOK_VAR -> $foo, TOK_BINDOP -> =~, TOK_REGEXP -> /
> regexp/. For this, I'm using regular expressions. So how do I match
> the regular expression ? Sure, in this example it may be easy, but
> what about a string like "$foo =~ /regexp/ && $bar =~ /pxeger/" ?
> 
> * No, I'm not writing a Perl interpreter, I just want to know if it is
> possible to avoid parsing the string

I'm pretty sure there is something on CPAN for this, but am too lazy to
look it up right now.

M4


------------------------------

Date: Fri, 02 Mar 2007 18:29:31 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: mod_perl error:  (120000) exit was called at
Message-Id: <54r54qF225peuU1@mid.individual.net>

Seansan wrote:
> Gunnar Hjalmarsson wrote:
>> I'm using this function to prevent exit() issues in mod_perl:
>>
>>   sub myexit {
>>     if ($ENV{MOD_PERL}) {
>>       if ($] < 5.006)    {
>>         require Apache;
>>         Apache::exit();
>>       }
>>     }
>>     exit;
>>   }
> 
> Thanks, I will try this. But I am also wondering what the correct way is 
> that mod_perl expects

What makes you think that's not correct? ;-)

Anyway, take a look at 
http://perl.apache.org/docs/1.0/guide/porting.html#Terminating_requests_and_processes__the_exit___and_child_terminate___functions

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Fri, 2 Mar 2007 19:49:01 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: mod_perl errors
Message-Id: <t58mb4-749.ln1@news.rtij.nl>

On Fri, 02 Mar 2007 11:11:17 +0100, Seansan wrote:

> How can I force mod_perl to show the errors that the script makes in 
> debug mode to the screen?
> 
> I would like some more information then :
> 
> Internal Server Error
> The server encountered an internal error or misconfiguration and was 
> unable to complete your request.

That is a webserver error, not a mod_perl error. I don't think you can do
much more than look at the servers error log. You could write a custom 500
page that does this though....

M4


------------------------------

Date: 2 Mar 2007 18:47:59 GMT
From: jt@toerring.de (Jens Thoms Toerring)
Subject: Re: Perl and MySQL
Message-Id: <54r9qvF21go61U1@mid.uni-berlin.de>

Brian Wakem <no@email.com> wrote:
> Charles A. Landemaine wrote:

> > I have a MySQL table that is used to store comments on my blog. I just
> > found out it's filled with spam. The table itself is 2 GB big, with
> > more than half a million spam backlinks. I haven't found much material
> > on how to interact between Perl and MySQL. What I'd like to do is do a
> > simple Perl script that opens the DB table, goes through all rows and
> > deletes all those which contain "<a href". I'll have to do that during
> > the night, not to disrupt the server. How could I do that?
> > Thanks,

> No need for Perl.

> DELETE FROM table WHERE comments LIKE '%<a href%';

Well, as long as the spammers don't get sneaky (and that's
what spammers typically are;-) and put a bit of extra white-
space between the '<' and 'a' or 'a' and the 'href' - in that
case a Perl solution might become necessary. But a better
long-term solution than throwing things out after they already
went into the database would be to modify the script that puts
the data in there to refuse to store data containing a link in
the first place.
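
A minimal sketch of such a check, assuming the submission script is
written in Perl and has the comment text in a scalar. The function name
looks_like_link_spam and the sample comments are hypothetical. The
pattern tolerates extra whitespace, mixed case, and other attributes
before href, which the SQL LIKE test above would miss.

```perl
use strict;
use warnings;

# True if the text appears to contain an HTML anchor tag with an href.
sub looks_like_link_spam {
    my ($comment) = @_;
    return $comment =~ /<\s*a\b[^>]*\bhref\b/i ? 1 : 0;
}

for my $c ('Nice post!',
           '<a href="http://x.example">buy</a>',
           "< A  HREF = 'http://x.example'>") {
    printf "%-6s %s\n", (looks_like_link_spam($c) ? 'reject' : 'ok'), $c;
}
```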
                               Regards, Jens
--
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de


------------------------------

Date: 2 Mar 2007 10:16:32 -0800
From: "gf" <greg.ferguson@icrossing.com>
Subject: Re: Q on regex of LWP::Simple data
Message-Id: <1172859392.462622.38490@v33g2000cwv.googlegroups.com>

On Mar 2, 6:15 am, Len Philpot <l...@philpot.org> wrote:

> At this point, I'm very low on the Perl learning cliff (oh, for the
> simplicity and clarity of C! :-), so I'll probably take an
> incrementally-complex approach to parsing it. This whole exercise is for
> my own use and edification, anyway.

Ok. I think you meant "curve" instead of "cliff"...

And "the simplicity and clarity of C"? Perl and C are so similar as
far as their allowing the programmer to write terse and cryptic code,
or very verbose code, and still maintain speed. It's the programmer's
choice and not something enforced by the language. That said...

The problem with finding strings or data in HTML pages is the
variability of the format of the pages. HTML is unstructured and relies
on the browser to turn the data into human-readable form. For our
purposes as programmers it makes our job more difficult because we
want to grab the easiest tool to do the job and regex seems to be the
tool to handle finding data in lines that change.

The problem is that HTML allows arbitrary line breaks in the file and
the browser will gobble them then parse the page then format it for
us. Perl doesn't do that. It's doing what you told it to (usually)
and, in this case, what you told it to do is not nearly as complex as
what the browser is doing.

You can get closer to what the browser is doing by stripping all the
line-end characters from the document, then applying your regex
pattern reiteratively to the resulting single line, OR you can tell
the regex engine to ignore line-ends for you. Check out the 'm' and
's' options to regex. Combined with 'g' you should be homing in on the
data you want. Usually.
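
A small sketch of what the 's', 'm' and 'g' options do, using an
invented HTML fragment in place of whatever LWP::Simple's get() would
return. The tag and class names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical fragment with a line break inside one of the cells.
my $html = <<'END';
<td class="temp">
   72 F
</td>
<td class="temp">68 F</td>
END

# /s lets '.' match a newline, so one match can span a line break;
# /g repeats the match over the whole document.
my @temps = $html =~ m{<td class="temp">\s*(.*?)\s*</td>}sg;

# /m makes ^ and $ match at every line start and end instead.
my @line_starts = $html =~ /^<(\w+)/mg;

print join(', ', @temps), "\n";
```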

Sometimes those are still going to fail so you have to dig out the big
guns and parse the document like a browser. There's HTML::Parser and
various derived modules. Of those I like HTML::TreeBuilder. Pass it
HTML using

my $t = HTML::TreeBuilder->new_from_content(get('your url'));

and it will parse it and build a tree. It'll lock the tree and turn it
into an HTML::Element object which you can search and extract info
using the methods of that object. Of those I like the 'look_down()'
method because it's so flexible. Give it the right parameters and
it'll let you loop through the page and find whatever you want. Of
course, as always you have to tell it correctly, and that can be a
tough thing to determine, but that's a different subject for a
different time and probably a different group.

Another way to attack the same problem is to use the various xpath
implementations for HTML in Perl. Search on CPAN and you'll find some.
xpath is a cool way of looking at HTML but, at least for me, it's not
as intuitive as how TreeBuilder and the parsers do it.



------------------------------

Date: Fri, 2 Mar 2007 12:50:39 -0600
From: Len Philpot <len@philpot.org>
Subject: Re: Q on regex of LWP::Simple data
Message-Id: <1xkrvdmpjfti9.lkon85ttn774$.dlg@40tude.net>

On 2 Mar 2007 10:16:32 -0800, gf wrote:

> On Mar 2, 6:15 am, Len Philpot <l...@philpot.org> wrote:
> 
>> At this point, I'm very low on the Perl learning cliff (oh, for the
>> simplicity and clarity of C! :-), so I'll probably take an
>> incrementally-complex approach to parsing it. This whole exercise is for
>> my own use and edification, anyway.
> 
> Ok. I think you meant "curve" instead of "cliff"...
> 
> And "the simplicity and clarity of C"? Perl and C are so similar as
> far as their allowing the programmer to write terse and cryptic code,
> or very verbose code, and still maintain speed. It's the programmer's
> choice and not something enforced by the language. That said...

Actually, 'cliff' was intentional, as was the C reference - A weak
attempt at humor, I guess. I'm just trying to come to terms with the
looseness that Perl allows (although doesn't require). It's purely my
preference : I like algorithmic flexibility, but with a tighter
syntactic regimen, i.e., for me TIMTOWTDI gets in the way of learning
"the best/right way to do X". However, I'm sure it's very different for
others (as is obviously the case). I really like the way C is not as
abstracted - "the machine prints through" - but once again that's my
preference. Lots of very knowledgeable people feel differently. :-)

 
> The problem with finding strings or data in HTML pages is the
> variability of the format of the pages. HTML is unstructured and relies
> on the browser to turn the data into human-readable form. For our
> purposes as programmers it makes our job more difficult because we
> want to grab the easiest tool to do the job and regex seems to be the
> tool to handle finding data in lines that change.

Fortunately in this case, what I'm looking for is (AFAICT) uniquely
labeled and fairly contained. However, newlines do occur and I'll have
to deal with that.

 
> Sometimes those are still going to fail so you have to dig out the big
> guns and parse the document like a browser. There's HTML::Parser and
> various derived modules. Of those I like HTML::TreeBuilder. Pass it
> HTML using
> 
> my $t = HTML::TreeBuilder->new_from_content(get('your url'));

Thanks for the suggestions - I'll take a look at them.
-- 

 ---- Len Philpot -------- l e n @ p h i l p o t . o r g  (no spaces)
 ------- ><> ------------- http://pages.suddenlink.net/lenphilpot/


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 188
**************************************
