[22193] in Perl-Users-Digest


Perl-Users Digest, Issue: 4414 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 16 14:05:46 2003

Date: Thu, 16 Jan 2003 11:05:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 16 Jan 2003     Volume: 10 Number: 4414

Today's topics:
    Re: Another HTML parsing question <bart.lateur@pandora.be>
        Day of the month redirection (Ralph)
    Re: Day of the month redirection <mbudash@sonic.net>
    Re: Day of the month redirection (Randal L. Schwartz)
    Re: Day of the month redirection <comdog@panix.com>
    Re: Day of the month redirection <flavell@mail.cern.ch>
    Re: Forced switch from PERL to ASP/VBSCRIPT. Where do I (Peter Scott)
    Re: howto specify maybe 1+ space regex SOLVED <lance@augustmail.com>
    Re: howto specify maybe 1+ space regex SOLVED <bongie@gmx.net>
        howto specify maybe 1+ space regexp <lance@augustmail.com>
    Re: Is logic in regular expression possible? (Michael J.)
    Re: Is logic in regular expression possible? <bernard.el-hagin@DODGE_THISlido-tech.net>
    Re: Is logic in regular expression possible? <bart.lateur@pandora.be>
    Re: Is logic in regular expression possible? (Tad McClellan)
    Re: Is logic in regular expression possible? <jurgenex@hotmail.com>
    Re: Is logic in regular expression possible? <pinyaj@rpi.edu>
    Re: Matching entries in lists (Bernd Schandl)
    Re: perl cgi redirect to variables instead of displayin (Tad McClellan)
    Re: Problem with DBI (Jason Singleton)
    Re: Question about high performance spidering in perl (Peter Scott)
    Re: Question about high performance spidering in perl <extendedpartition@NOSPAM.yahoo.com>
    Re: Question about high performance spidering in perl <extendedpartition@NOSPAM.yahoo.com>
    Re: Question about high performance spidering in perl <uri@stemsystems.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 16 Jan 2003 14:46:18 GMT
From: Bart Lateur <bart.lateur@pandora.be>
Subject: Re: Another HTML parsing question
Message-Id: <lofd2vgs3htrlvilol4ddt2fnb88edb5h1@4ax.com>

VisionHolderTech wrote:

>I have been reading the docs for HTML::Parser and ALL related modules
>and am still just as confused.  I have tried searching the usenet but
>everyone seems to be wanting to REMOVE HTML tags -- I want EXTRACT and
>PROCESS the stuff between them.

>Scenario:
>
>I am trying to extract data from tables in HTML documents and
>format them for input into a database.

I've used HTML::TokeParser for this kind of task before. This parser allows
you to process your file one token (start tag, end tag, plain text, ...)
at a time, pretty much like you can read an ordinary text file one line
at a time. With the range operator, "..", you can select sections
between "<td>" and "</td>", for example. It's a good thing common
browsers are/were very picky about tables, so it'll be extremely rare
to encounter such a start tag without an associated end tag.

As for your data: extracting the stuff between "<form>" and "</form>"
will likely do. The following shows how to filter out a record:

	use strict;
	use warnings;
	use HTML::TokeParser;

	my $file = 'test.html';
	my $p = HTML::TokeParser->new($file)
	  or die "Cannot read file '$file': $!";
	while (my $token = $p->get_token) {
	    if ((my $table_open  = $token->[0] eq 'S' && $token->[1] eq 'table')
	     .. (my $table_close = $token->[0] eq 'E' && $token->[1] eq 'table')) {
	        # $table_open is true on the start tag,
	        # $table_close on the end tag
	        # just in case you need it...
	        if ((my $form_open  = $token->[0] eq 'S' && $token->[1] eq 'form')
	         .. (my $form_close = $token->[0] eq 'E' && $token->[1] eq 'form')) {
	            # same for $form_open and $form_close
	            # in fact, we'll use them here:
	            print "New record:\n" if $form_open;
	            print $token->[1] if $token->[0] eq 'T';  # plain text
	            print "\nEnd of record:\n" if $form_close;
	        }
	    }
	}

This prints the text from the HTML wrapped in "form" tags, which in turn
must be wrapped in "table" tags. The beginning and end of each record are
clearly marked. In a similar way, you can use the other nested tags to
identify the various fields.

Do not rely on the text being contiguous: the parser may return two
consecutive "T" tokens, each holding only part of the text, depending on
the buffer size. So the safest approach is to append all text to a
common string, which you initialize at the opening tag and process as a
whole at the closing tag.
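A minimal sketch of that buffering approach, assuming the fields of interest are <td> cells that are not nested inside one another. The cell_texts name and the sample HTML are made up for illustration; HTML::TokeParser->new is given a reference to a string, which it accepts as the literal document to parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Collect the complete text of each <td> cell (hypothetical helper).
sub cell_texts {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";
    my @cells;
    my $buf;    # undef while we are outside a cell
    while (my $token = $p->get_token) {
        my $type = $token->[0];
        if ($type eq 'S' && $token->[1] eq 'td') {
            $buf = '';                    # opening tag: start a fresh buffer
        }
        elsif ($type eq 'T' && defined $buf) {
            $buf .= $token->[1];          # text may arrive in several pieces
        }
        elsif ($type eq 'E' && $token->[1] eq 'td' && defined $buf) {
            push @cells, $buf;            # closing tag: the whole text is here
            undef $buf;
        }
    }
    return @cells;
}

my @cells = cell_texts('<table><tr><td>Name</td><td>Bart</td></tr></table>');
print join('|', @cells), "\n";
```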

   HTH,
   Bart.


------------------------------

Date: 16 Jan 2003 08:30:32 -0800
From: ralph@beseenmg.com (Ralph)
Subject: Day of the month redirection
Message-Id: <58772576.0301160830.48dba579@posting.google.com>

Please help - I'm a rookie. I'm searching for a script that will
redirect visitors of my website to a specific page depending on the
day of the month and uses the clock on my server to determine the
date.

I found a free javascript that does this but it uses the clock on the
visitor's computer. I've been told javascripts can't be modified to
read from my server and I need a perl script. Can anyone help me
please?

Thank you

Ralph


------------------------------

Date: Thu, 16 Jan 2003 16:56:41 GMT
From: Michael Budash <mbudash@sonic.net>
Subject: Re: Day of the month redirection
Message-Id: <mbudash-E17F5A.08564016012003@typhoon.sonic.net>

In article <58772576.0301160830.48dba579@posting.google.com>,
 ralph@beseenmg.com (Ralph) wrote:

> Please help - I'm a rookie. I'm searching for a script that will
> redirect visitors of my website to a specific page depending on the
> day of the month and uses the clock on my server to determine the
> date.
> 
> I found a free javascript that does this but it uses the clock on the
> visitor's computer. I've been told javascripts can't be modified to
> read from my server and I need a perl script. Can anyone help me
> please?

perl's localtime function is what you want. from the command prompt type:

perldoc -f localtime

here's a minimal perl cgi script to get you started:

#!/path/to/perl

use CGI qw/:standard/;

print redirect('http://' . $ENV{HTTP_HOST} . '/page' . (localtime)[3] . ".html");

hth-


------------------------------

Date: 16 Jan 2003 09:27:46 -0800
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Day of the month redirection
Message-Id: <86adi1gg31.fsf@red.stonehenge.com>

>>>>> "Ralph" == Ralph  <ralph@beseenmg.com> writes:

Ralph> Please help - I'm a rookie. I'm searching for a script that will
Ralph> redirect visitors of my website to a specific page depending on the
Ralph> day of the month and uses the clock on my server to determine the
Ralph> date.

Ralph> I found a free javascript that does this but it uses the clock on the
Ralph> visitor's computer. I've been told javascripts can't be modified to
Ralph> read from my server and I need a perl script. Can anyone help me
Ralph> please?

    #!/usr/bin/perl

    use CGI qw(redirect);

    my ($day_of_month) = (localtime)[3];
    print redirect sprintf "/monthly/%02d.html", $day_of_month;

This redirects to /monthly/01.html on the first day of the month, etc.

print "Just another Perl hacker,";

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


------------------------------

Date: Thu, 16 Jan 2003 11:55:51 -0600
From: brian d foy <comdog@panix.com>
Subject: Re: Day of the month redirection
Message-Id: <160120031155516948%comdog@panix.com>

In article <58772576.0301160830.48dba579@posting.google.com>, Ralph
<ralph@beseenmg.com> wrote:

> Please help - I'm a rookie. I'm searching for a script that will
> redirect visitors of my website to a specific page depending on the
> day of the month and uses the clock on my server to determine the
> date.

i have always found it easier to have a static URL which returns
the right data rather than redirecting every request.  just update
the page (or link to the right page) depending on the day, using
your favorite scheduling utility.
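A minimal sketch of that suggestion, assuming a Unix-like server where the web server is allowed to follow symlinks (Apache's Options FollowSymLinks): a script run once a day that points a fixed current.html at that day's page. The file names, the directory layout and the update_link name are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Point $dir/current.html at $dir/monthly/DD.html for the given day.
# Paths are hypothetical; adjust them to your site layout.
sub update_link {
    my ($dir, $day) = @_;
    my $target = sprintf 'monthly/%02d.html', $day;   # relative to $dir
    my $link   = "$dir/current.html";
    my $tmp    = "$link.tmp.$$";
    symlink $target, $tmp or die "symlink $tmp: $!";
    rename $tmp, $link    or die "rename $tmp -> $link: $!";  # atomic swap
    return $target;
}

# Run from cron with the document root as the only argument.
update_link($ARGV[0], (localtime)[3]) if @ARGV;
```

For example, a crontab line such as 1 0 * * * /usr/bin/perl /path/to/update_link.pl /var/www/html (paths hypothetical) would run it just after midnight, server time. Creating the new link under a temporary name and renaming it over the old one means visitors never see a missing file.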

-- 
brian d foy, comdog@panix.com


------------------------------

Date: Thu, 16 Jan 2003 18:26:11 +0100
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Day of the month redirection
Message-Id: <Pine.LNX.4.40.0301161821350.18281-100000@lxplus069.cern.ch>

On Jan 16, Michael Budash inscribed on the eternal scroll:

>  ralph@beseenmg.com (Ralph) wrote:
>
> > Please help - I'm a rookie. I'm searching for a script that will
> > redirect visitors of my website to a specific page depending on the
> > day of the month and uses the clock on my server to determine the
> > date.

> perl's localtime function is what you want.

OK; but don't forget that it works in terms of the server's timezone -
which might or might not be the real timezone applicable at the
server's physical location, depending on how the server's admin has
set it up[1] - but has no inherent knowledge of what the timezone
might be at the client's location.

If this is what you want, then fine; I just wanted to mention it.


[1] I've often configured web servers to UTC, regardless of the real
local timezone.
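To see the difference described above, a small sketch comparing the day of the month from localtime (the server's configured timezone) with gmtime (UTC); element 3 of the returned list is the day of the month in both. The day_of_month helper is a made-up name. The two can only disagree in the hours around midnight, and neither knows anything about the visitor's timezone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Day of the month (1..31) for an epoch time, either in the server's
# configured timezone (localtime) or in UTC (gmtime).
sub day_of_month {
    my ($epoch, $utc) = @_;
    return $utc ? (gmtime $epoch)[3] : (localtime $epoch)[3];
}

my $now = time;
printf "server timezone: day %d, UTC: day %d\n",
    day_of_month($now, 0), day_of_month($now, 1);
```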




------------------------------

Date: Thu, 16 Jan 2003 16:52:01 GMT
From: peter@PSDT.com (Peter Scott)
Subject: Re: Forced switch from PERL to ASP/VBSCRIPT. Where do I begin?
Message-Id: <RqBV9.48831$Yo4.3499141@news1.calgary.shaw.ca>

In article <81iV9.12677$Vf3.133412@vixen.cso.uiuc.edu>,
 Dan <dharding@uiuc.edu> writes:
>Due to a merger of university departments, I am potentially being forced 
>to change the way I do web development. I've always used PERL for my CGI 
>development. I'm now being told by the new department head that "he
>doesn't want PERL running on any of his servers; it's too CPU-intensive"
>so I must now do all coding in VBScript/ASP. Where do I start? (other
>than finding a new job). Are there any "VB for PERL Aficionados" types
>of books or resources? What do you recommend for language references 
>(downloadable/printable preferred)? How about "teach yourself" books? 
>Assume I've never had any exposure to Visual Basic.
>
>Not only do I have to learn the language(s)/techniques, but I then have 
>to recode all existing applications. Yay.

In addition to the serious suggestions you have received... perhaps you
could write a VBScript wrapper that would execute perl on the remainder
of the script, which would not need to be changed.

Or conversely, how about this: we get Damian drunk enough to write
Lingua::Obsolete::VBScript.  You code some VBScript applications that
are actually run through Perl and then when your department head
wonders why they are running so much faster than the other ASPs you
reveal the truth and expose him for the minion of M$ that he really is.

-- 
Peter Scott
http://www.perldebugged.com


------------------------------

Date: Thu, 16 Jan 2003 11:27:14 -0600
From: Lance Hoffmeyer <lance@augustmail.com>
Subject: Re: howto specify maybe 1+ space regex SOLVED
Message-Id: <Xns930574810B74Dlanceaugustmailcom@216.166.71.236>

$text =~ s/($base1\s*?\=\s*?)[\d]{2,3}/$1$b4r4/g;


Lance Hoffmeyer <lance@augustmail.com> wrote in 
news:Xns930573506DC22lanceaugustmailcom@216.166.71.236:

> I am having problems specifying a regexp.
> I want to include 0 or more spaces.
> 
> Here is what works, but it is not as efficient as I want:
> 
> $text =~ s/($base1.*?\=.*?)[\d]{2,3}/$1$b4r4/g;
> 
> Here is what doesn't work but is what I want:
> 
> $text =~ s/($base1\s+?\=\s+?)[\d]{2,3}/$1$b4r4/g;
> 
> What have I misspecified?
> 
> Lance
> 



------------------------------

Date: Thu, 16 Jan 2003 18:50:10 +0100
From: "Harald H.-J. Bongartz" <bongie@gmx.net>
Subject: Re: howto specify maybe 1+ space regex SOLVED
Message-Id: <6689043.NonhoigJhi@nyoga.dubu.de>

Lance Hoffmeyer wrote:
> $text =~ s/($base1\s*?\=\s*?)[\d]{2,3}/$1$b4r4/g;

- No need to escape an '='.
- No need for the '?'s.  Do you know what they do?
  Why do you think you need them here?
- No need for brackets around '\d'.
  Why do you think you need them here?
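Applying those points, the substitution might look like the sketch below. The sample values of $base1, $b4r4 and $text are invented, and the \Q...\E quoting is an added assumption for the case where $base1 is meant as literal text rather than as a pattern. The key difference from the original attempt is \s* (zero or more) instead of \s+ (one or more), so "width=20", with no spaces at all, matches too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented values standing in for the original poster's variables.
my $base1 = 'width';
my $b4r4  = '640';
my $text  = 'width = 100; width=20; height = 300';

# \s* allows zero or more whitespace characters; '=' needs no backslash,
# \d needs no brackets, and \Q...\E makes $base1 match literally.
$text =~ s/(\Q$base1\E\s*=\s*)\d{2,3}/$1$b4r4/g;

print "$text\n";    # prints: width = 640; width=640; height = 300
```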

Ciao,
        Harald
-- 
Harald H.-J. Bongartz <bongie@gmx.net>
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
++$sheep while !$asleep;


------------------------------

Date: Thu, 16 Jan 2003 11:20:13 -0600
From: Lance Hoffmeyer <lance@augustmail.com>
Subject: howto specify maybe 1+ space regexp
Message-Id: <Xns930573506DC22lanceaugustmailcom@216.166.71.236>

I am having problems specifying a regexp.
I want to include 0 or more spaces.

Here is what works, but it is not as efficient as I want:

$text =~ s/($base1.*?\=.*?)[\d]{2,3}/$1$b4r4/g;

Here is what doesn't work but is what I want:

$text =~ s/($base1\s+?\=\s+?)[\d]{2,3}/$1$b4r4/g;

What have I misspecified?

Lance


------------------------------

Date: 16 Jan 2003 06:06:56 -0800
From: upro@gmx.net (Michael J.)
Subject: Re: Is logic in regular expression possible?
Message-Id: <bc3e2bd9.0301160606.5c041266@posting.google.com>

Bernard El-Hagin <bernard.el-hagin@DODGE_THISlido-tech.net> wrote in message news:<b065tm$fjq$1@korweta.task.gda.pl>...
> In article <bc3e2bd9.0301160338.1c8fb98b@posting.google.com>, Michael
> J. wrote:
> > Hi all!
> > 
> > I wonder if the following is possible:
> > 
> > I have a word, let's say "hello", and I want to write a regex that
> > replaces vowels in the following manner:
> > a -> e
> > e -> a
> > i -> ai
> > o -> u
> > u -> o
> > 
> > ... and I'd like to do it in one line, like this:
> > s/a,e,i,o,u/e,a,ai,u,o/
> 
> 
> my %conv = ( a => 'e',
>              e => 'a',
>              i => 'ai',
>              o => 'u',
>              u => 'o' );
> 
> 
> s/([aeiou])/$conv{$1}/g;
> 
> 
> 
> Cheers,
> Bernard

Thanks, I didn't find this...
Now, if I want to replace "i" only if it is preceded by a consonant,
would that be correct:
s/ae[qwrtzpsdfghjklyxcvbnm]iou/$conv{$1}/g

I modified the table like this:
my %conv = ( a => 'e',
             e => 'a',
             [qwrtzpsdfghjklyxcvbnm]i => 'ai',
             o => 'u',
             u => 'o');

I don't know if that's correct...

tia,
Upro


------------------------------

Date: Thu, 16 Jan 2003 14:24:29 +0000 (UTC)
From: Bernard El-Hagin <bernard.el-hagin@DODGE_THISlido-tech.net>
Subject: Re: Is logic in regular expression possible?
Message-Id: <b06fat$p19$2@korweta.task.gda.pl>

In article <bc3e2bd9.0301160606.5c041266@posting.google.com>, Michael
J. wrote:
> Bernard El-Hagin <bernard.el-hagin@DODGE_THISlido-tech.net> wrote in message news:<b065tm$fjq$1@korweta.task.gda.pl>...
>> In article <bc3e2bd9.0301160338.1c8fb98b@posting.google.com>, Michael
>> J. wrote:
>> > Hi all!
>> > 
>> > I wonder if the following is possible:
>> > 
>> > I have a word, let's say "hello", and I want to write a regex that
>> > replaces vowels in the following manner:
>> > a -> e
>> > e -> a
>> > i -> ai
>> > o -> u
>> > u -> o
>> > 
>> > ... and I'd like to do it in one line, like this:
>> > s/a,e,i,o,u/e,a,ai,u,o/
>> 
>> 
>> my %conv = ( a => 'e',
>>              e => 'a',
>>              i => 'ai',
>>              o => 'u',
>>              u => 'o' );
>> 
>> 
>> s/([aeiou])/$conv{$1}/g;
>> 
>> 
>> 
>> Cheers,
>> Bernard
> 
> Thanks, I didn't find this...
> Now, if I want to replace "i" only if it is preceded by a consonant,
> would that be correct:
> s/ae[qwrtzpsdfghjklyxcvbnm]iou/$conv{$1}/g


No. For one thing, $1 is not set because you have no capturing parens
in your regex. Besides that, the pattern will match a string which
consists of the letter 'a' followed by the letter 'e' followed
by *one* of the letters in the character class (the stuff between [
and ]) followed by 'i', then by 'o', then by 'u'. You don't seem to
have basic knowledge of regexes in Perl so I suggest looking at:


   perldoc perlretut
   perldoc perlre


> I modified the table like this:
> my %conv = ( a => 'e',
>              e => 'a',
>              [qwrtzpsdfghjklyxcvbnm]i => 'ai',


No, no, no. The keys of a hash are not regular expressions. Please
take a look at:


   perldoc perldata


Cheers,
Bernard
--
echo 42|perl -pe '$#="Just another Perl hacker,"'


------------------------------

Date: Thu, 16 Jan 2003 14:51:18 GMT
From: Bart Lateur <bart.lateur@pandora.be>
Subject: Re: Is logic in regular expression possible?
Message-Id: <aihd2vkh9k1o0fgsv8cg97q2ictbfdnouh@4ax.com>

Michael J. wrote:

>Now, if I want to replace "i" only if it is preceded by a consonant,
>would that be correct:
>s/ae[qwrtzpsdfghjklyxcvbnm]iou/$conv{$1}/g

No. First of all, a character class only matches single characters; when
one of the things you want to match is longer than one character, use
alternatives separated with "|". Second, use lookbehind to verify that
the "i" comes after a consonant. After all, you don't want the consonant
itself to be included in the match.

	s/([aeou]|(?<=[qwrtzpsdfghjklyxcvbnm])i)/$conv{$1}/g;

(untested)

-- 
	Bart.


------------------------------

Date: Thu, 16 Jan 2003 08:39:10 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Is logic in regular expression possible?
Message-Id: <slrnb2dh0e.ehj.tadmc@magna.augustmail.com>

Michael J. <upro@gmx.net> wrote:
> Bernard El-Hagin <bernard.el-hagin@DODGE_THISlido-tech.net> wrote in message news:<b065tm$fjq$1@korweta.task.gda.pl>...
>> In article <bc3e2bd9.0301160338.1c8fb98b@posting.google.com>, Michael
>> J. wrote:

>> > I want to write a regex that
>> > replaces vowel in the following manner:
>> > a -> e
>> > e -> a
>> > i -> ai
>> > o -> u
>> > u -> o

>> my %conv = ( a => 'e',
>>              e => 'a',
>>              i => 'ai',
>>              o => 'u',
>>              u => 'o' );
>> 
>> 
>> s/([aeiou])/$conv{$1}/g;


> Now, if I want to replace "i" only if it is preceded by a consonant,
> would that be correct:


What happened when you tried it?


> s/ae[qwrtzpsdfghjklyxcvbnm]iou/$conv{$1}/g


No. Your modified pattern matches strings that are 6 characters long,
there are no 6-character keys in %conv, and it does not capture the
matched characters into $1.


   tr/aeou/eauo/;
   s/(?<=[^aeiou])i/ai/g;  # matches only if i preceded by non-vowel

or, with %conv as above:

   s/([aeou]|(?<=[^aeiou])i)/$conv{$1}/g;

or, same thing only easier to read:

   s/([aeou]          # always replace these
     |                # or
     (?<=[^aeiou])i)  # replace this only if following a non-vowel char
    /$conv{$1}/gx;


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Thu, 16 Jan 2003 15:30:49 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Is logic in regular expression possible?
Message-Id: <JeAV9.14898$7O4.3351@nwrddc02.gnilink.net>

Michael J. wrote:
> I have a word, let's say "hello", and I want to write a regex that
> replaces vowels in the following manner:
> a -> e
> e -> a
> i -> ai
> o -> u
> u -> o

We discussed exactly the same problem just last week in the thread "Using
tr/// - Am I barking up the wrong tree?". You may want to re-read that
thread.

jue




------------------------------

Date: Thu, 16 Jan 2003 11:06:12 -0500
From: Jeff 'japhy' Pinyan <pinyaj@rpi.edu>
To: "Michael J." <upro@gmx.net>
Subject: Re: Is logic in regular expression possible?
Message-Id: <Pine.SGI.3.96.1030116110420.357386A-100000@vcmr-64.server.rpi.edu>

[posted & mailed]

On 16 Jan 2003, Michael J. wrote:

>Now, if I want to replace "i" only if it is preceded by a consonant,
>would that be correct:
>s/ae[qwrtzpsdfghjklyxcvbnm]iou/$conv{$1}/g

No.  You'd need to do something like:

  s/([aeou]|(?<=[b-df-hj-np-tv-z])i)/$conv{$1}/g;

And the hash would stay the same.  The regex matches either one of the
letters 'a', 'e', 'o', or 'u', OR an 'i' that is preceded by a consonant.
Lowercase, that is.
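A runnable sketch of the lookbehind approach from the replies above, using Jeff's consonant class and Bernard's %conv hash; convert and the sample words are only for illustration. Note that a word-initial "i" is left alone, because the lookbehind needs a preceding consonant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %conv = (a => 'e', e => 'a', i => 'ai', o => 'u', u => 'o');

# Swap a/e and o/u everywhere; turn i into ai only after a lowercase consonant.
sub convert {
    my ($word) = @_;
    $word =~ s/([aeou]|(?<=[b-df-hj-np-tv-z])i)/$conv{$1}/g;
    return $word;
}

print convert($_), "\n" for qw(hello fish ai it);   # hallu faish ei it
```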

-- 
Jeff Pinyan            RPI Acacia Brother #734            2003 Rush Chairman
"And I vos head of Gestapo for ten     | Michael Palin (as Heinrich Bimmler)
 years.  Ah!  Five years!  Nein!  No!  | in: The North Minehead Bye-Election
 Oh.  Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)



------------------------------

Date: 16 Jan 2003 08:28:06 -0800
From: schandl@gmx.net (Bernd Schandl)
Subject: Re: Matching entries in lists
Message-Id: <7b52b64e.0301160828.2b32a102@posting.google.com>

"John W. Krahn" <krahnj@acm.org> wrote in message news:<3E25FC54.5026EA5C@acm.org>...
> Bernd Schandl wrote:
> > 
> > schandl@gmx.net (Bernd Schandl) wrote in message news:<7b52b64e.0301100655.1a7035c3@posting.google.com>...
> > > [...]
> > 
> > Thanks a lot for the lively discussions. After considering what I read here
> > and in a German newsgroup and thinking a bit by myself (!), I settled on the
> > following solution. Assuming I have the first list in a hash %numcat,
> > I wrote the following routine, which gets a number from the second list as
> > an argument and returns a hash key (or undef if no fitting entry is found):
> > 
> > sub findkey {
> >   my ( $number ) = @_;
> >   for ( keys( %numcat ) ) { if ( $number =~ /^$_/ ) { return $_ } }
> 
> If $number is the partial phone number you are searching for and the
> keys from %numcat are the full phone numbers then this won't work.

Fortunately, it is the other way: $number is always the complete number and
the keys in %numcat are partial or complete numbers.

> [..]
> Or you could use index:
> [...] 
> Or substr:
> [...] 

When using one of those, wouldn't there be several comparisons (internally) for
each pair? That's how I would imagine a string "123" being identified as a
substring of "9878971239879"...
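A different technique from the ones quoted above, offered only as a sketch: since $number is always the complete number and the keys are prefixes, you can try each prefix of $number as a hash key, longest first. That costs at most length($number) hash lookups instead of one pattern match per key, and it returns the longest matching key when several match (the for (keys ...) loop returns whichever the hash happens to yield first). The sample data is invented, and this findkey takes the hash by reference, unlike the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data: keys are full or partial (prefix) numbers.
my %numcat = ('0049' => 'Germany', '004989' => 'Munich', '001' => 'USA');

# Return the longest key of %$cat that is a prefix of $number, or undef.
sub findkey {
    my ($cat, $number) = @_;
    for (my $len = length $number; $len > 0; $len--) {
        my $prefix = substr $number, 0, $len;
        return $prefix if exists $cat->{$prefix};
    }
    return undef;
}

print findkey(\%numcat, '00498912345'), "\n";   # prints 004989
```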

   Bernd


------------------------------

Date: Thu, 16 Jan 2003 08:11:28 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: perl cgi redirect to variables instead of displaying HTML question
Message-Id: <slrnb2dfcg.ee8.tadmc@magna.augustmail.com>

Entfred <entfred@hotmail.com> wrote:

> I have a CGI perl program which has a form.
> The form accepts input parameters that are passed to 
> a search engine.  The search engine is a CGI program
> that queries a database.  
> 
> When my CGI program calls the search engine 


Is it supposed to be a GET or a POST request?

I expect that the search engine's terms of service allow
you to do what you plan to do?


> via a form using
> hidden fields, 


How, exactly, are you invoking this second CGI program?


> Question - Is there any way for me to redirect the output of 
> the search engine results to variables inside my perl program
> rather than the data going to the screen?  This way, I could
> manipulate the returned data from the search engine.


The LWP::* modules are good for that sort of thing.

Here's a program that does a CPAN search and returns the
resulting HTML for further processing:

---------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use URI::Escape qw(uri_escape);

my $url = 'http://search.cpan.org/search';

my $searchterm = 'UserAgent';    # from CGI::param() in a real program

# URL-encode the term so characters such as spaces or '&' survive
my $result = get("$url?query=" . uri_escape($searchterm));
die "Could not fetch $url\n" unless defined $result;

print $result;   # here is the "further processing" :-)
---------------------------------
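If the search engine expects the hidden fields as a POST, a sketch using LWP::UserAgent might look like this; fetch_results, the URL and the field names are hypothetical placeholders. post() form-encodes the hash reference for you, and decoded_content returns the body as a string for further processing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# POST form fields to a CGI program and return the response body as a
# string, instead of letting it go to the browser.
sub fetch_results {
    my ($url, $fields) = @_;               # $fields: hash ref of name => value
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->post($url, $fields);    # application/x-www-form-urlencoded
    die 'Request failed: ', $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Hypothetical URL and field names; use the real search engine's.
# my $html = fetch_results('http://search.example.com/cgi-bin/search',
#                          { query => 'UserAgent' });
```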


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 16 Jan 2003 06:14:07 -0800
From: jsn@microlib.demon.co.uk (Jason Singleton)
Subject: Re: Problem with DBI
Message-Id: <5b7e0283.0301160614.3d7a9a3c@posting.google.com>

jsn@microlib.demon.co.uk (Jason Singleton) wrote in message news:<5b7e0283.0301160132.20a815d0@posting.google.com>...
> Jeff Zucker <jeff@vpservices.com> wrote in message news:<3E25A087.3010802@vpservices.com>...
> > Jason Singleton wrote:
> > 
> > > Can't locate auto/DBI/prepare.al in @INC (@INC contains: C:/Perl/lib
> > > C:/Perl/site/lib . C:/Apache2/cgi-bin/WebOPAC/packages) at
> > > drivers/text.pl line 27 Compilation failed in require at
> > > C:/Apache2/cgi-bin/WebOPAC/webopac.cgi line 69.
> > 
> > 
> > It looks like DBI is not installed, use PPM to install it if you are 
> > using ActiveState.
> 
> DBI works fine with an ADO connection; the problem is when I try to use
> AnyData to access a pipe-delimited file.

I have solved the problem. I'd put DBI->prepare instead of
$dbh->prepare; a daft mistake, but easily missed :)


------------------------------

Date: Thu, 16 Jan 2003 16:36:27 GMT
From: peter@PSDT.com (Peter Scott)
Subject: Re: Question about high performance spidering in perl
Message-Id: <fcBV9.47493$sV3.2637990@news3.calgary.shaw.ca>

In article <x77kd6l3qi.fsf@mail.sysarch.com>,
 Uri Guttman <uri@stemsystems.com> writes:
>>>>>> "EP" == Extended Partition <extendedpartition@NOSPAM.yahoo.com> writes:
>
>  EP> <snip>
>  >> Why don't you use a package that's already been written - in Perl - to
>  >> do what you want?  Use Harvest-NG, a powerful open source spidering 
>  >> and indexing system.  http://webharvest.sourceforge.net/ng/.
>
>  EP> I considered that. But my client has some specialized requirements
>  EP> that would require an almost total rewrite of an existing package. 

What specialized requirements does your client have that make you think
that it won't be easier to either use existing customization hooks in
Harvest-NG or to diddle its source?

>  EP> in all, it's easier to develop from scratch.

That's difficult to believe.  The onus is on you to prove this assertion.

>i have developed 2 major crawlers (one in c, the other in perl) and
>there are many issues to deal with and it will take you longer to do
>than you would think. 

What he said.  I haven't written a crawler, but I have spent several
years evaluating, tuning, customizing, integrating, and hacking a number
of different crawlers running in a large application.  Crawling is far 
more involved than most people suspect.

-- 
Peter Scott
http://www.perldebugged.com


------------------------------

Date: Thu, 16 Jan 2003 11:44:16 -0600
From: Extended Partition <extendedpartition@NOSPAM.yahoo.com>
Subject: Re: Question about high performance spidering in perl
Message-Id: <sfrd2v8j72sh47f0mujm7v0ovsn9o953v0@4ax.com>

<snip>
>>  EP> I considered that. But my client has some specialized requirements
>>  EP> that would require an almost total rewrite of an existing package. 
>
>What specialized requirements does your client have that make you think
>that it won't be easier to either use existing customization hooks in
>Harvest-NG or to diddle its source?

This particular client has a lot of "security" considerations and
using an already developed open source package just doesn't appeal to
them. I called them up this morning and brought it up and their answer
was a simple "absolutely not".  I might, however, use Harvest-NG as a
base for my own project to get ideas, algorithms, etc.

>That's difficult to believe.  The onus is on you to prove this assertion.

Might be difficult to believe but it's nonetheless true. This
particular client has already given me a list of requirements that's
26 pages long. After looking at the software, I estimate it would take me
more time to analyze the changes that need to be made, restructure the
code to my needs, and modify the existing code (while adding what I need
to it) than to write it from scratch. Going by the requirements, it
would basically entail an entire rewrite of the Harvest-NG code to
accomplish my goal. Besides that, the client has given me a firm
"no" on this idea.

>>i have developed 2 major crawlers (one in c, the other in perl) and
>>there are many issues to deal with and it will take you longer to do
>>than you would think. 
>
>What he said.  I haven't written a crawler, but I have spent several
>years evaluating, tuning, customizing, integrating, and hacking a number
>of different crawlers running in a large application.  Crawling is far 
>more involved than most people suspect.

I'm learning that. To be honest, when I accepted this project I was
under the assumption that I was facing a few weeks' worth of work. Now
I realize it'll be more like several months if I'm lucky. I am going
to download the available perl based crawlers and see how they do
things then see if I can use ANY of their code in my application. But
it looks like my team has a LONG road ahead of us.

Thanks for your input! You guys are helping this to become a LOT
clearer!

Extended


------------------------------

Date: Thu, 16 Jan 2003 11:45:57 -0600
From: Extended Partition <extendedpartition@NOSPAM.yahoo.com>
Subject: Re: Question about high performance spidering in perl
Message-Id: <0srd2vcfalu62ptd49pterbtkbu9n8874q@4ax.com>

<snip>
>i have developed 2 major crawlers (one in c, the other in perl) and
>there are many issues to deal with and it will take you longer to do
>than you would think. the biggest issue usually is scaling more than
>crawling logic. depending on how many sites you want to crawl in total
>and how often you want to crawl them, you can choose many different
>crawler architectures. also you say you need special processing which
>can be the hardest part (it was in the perl crawler i did. WAY too many
>special crunching rules). some of the processing steps may cause major
>design changes in how the crawler works (e.g. rules to extract new urls,
>revisit url frequency). i wouldn't recommend you tackle this without
>getting experienced help before you get in too deep. the perl crawler i
>did was a complete rewrite of a very bad program the client had done for
>them. unfortunately most of the crawling and processing rules were
>already encoded in that program and we had to painfully analyze it to
>keep the same rules. had we started from a fresh design, the whole thing
>would have been easier. so don't just code up this with the first perl
>hacker you can hire. realize that it is a critical part of your system
>and you should get it done correctly the first time. this is a
>professional and courteous piece of advice for you. real world class
>crawlers are not trivial.
>
>uri


Thanks for the advice and tips. Seems like this is going to be a major
job! I'm doing my research as you suggested and I'm trying different
algorithms, ideas, etc. So hopefully this will turn out well for both
me and my client.

Thanks Again,
Extended


------------------------------

Date: Thu, 16 Jan 2003 18:02:02 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Question about high performance spidering in perl
Message-Id: <x7ptqxdld1.fsf@mail.sysarch.com>

>>>>> "EP" == Extended Partition <extendedpartition@NOSPAM.yahoo.com> writes:


  >>> i have developed 2 major crawlers (one in c, the other in perl) and
  >>> there are many issues to deal with and it will take you longer to do
  >>> than you would think. 
  >> 
  >> What he said.  I haven't written a crawler, but I have spent several
  >> years evaluating, tuning, customizing, integrating, and hacking a number
  >> of different crawlers running in a large application.  Crawling is far 
  >> more involved than most people suspect.

  EP> I'm learning that. To be honest, when I accepted this project I was
  EP> under the assumption that I was facing a few weeks' worth of work. Now
  EP> I realize it'll be more like several months if I'm lucky. I am going
  EP> to download the available perl based crawlers and see how they do
  EP> things then see if I can use ANY of their code in my application. But
  EP> it looks like my team has a LONG road ahead of us.

just some more info for you. the crawler we did in c took 3 full time
people 6 months to do from scratch with no prototype or other code to
study. it was scaled for 2000 parallel fetches and full site crawls
around the world. full processing and indexing were done in another
system. it had a complete event loop system and message passing. it ran
on a single low end sparc and was used by northern light, which was the
top search engine a few years ago.

the perl crawler was based on stem and that made much of the previous
crawler's infrastructure go away as stem provides the event loop,
message passing and many other services. so this one took 1 person about
3 months to write. but we also did all the processing in the same system
and there were a myriad of painful heuristic details where we spent much
of our (2 people) 6 month total sentence. this only crawled a fixed
number of sites (20k) and not always to their full depth.

so that should give you a picture of how much work it takes to write
crawlers. two real projects written by good developers and the minimum
was 3 months for a crawler that didn't even cover the whole net. 

the scaling is always the key issue. that is not an easy problem and
everyone has different ways to handle it. a large scale crawler requires
some complex combination of events/threads/processes/boxes and all the
communication and control amongst them. there are so many design
possibilities that you have to know what you are doing lest you go down a
path to a bad architecture. you will know you did that when a year later
you are cursing the design because it doesn't scale well!
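As a toy illustration of some of the bookkeeping involved (not of a scalable architecture, which is the real point above), here is a minimal in-memory URL frontier that drops duplicate URLs and hands out URLs round-robin by host, so one site is not hit back to back. The Frontier package and its methods are invented for this sketch; a real crawler would add persistence, robots.txt handling, per-host delays and revisit scheduling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Frontier;    # hypothetical name, not a CPAN module

sub new { return bless { seen => {}, queues => {}, hosts => [] }, shift }

# Queue a URL unless it was seen before; returns 1 if it was queued.
sub add {
    my ($self, $url) = @_;
    return 0 if $self->{seen}{$url}++;
    my ($host) = $url =~ m{^https?://([^/:]+)}i or return 0;
    $host = lc $host;
    # Put the host into the rotation if it has no pending URLs yet.
    push @{ $self->{hosts} }, $host unless $self->{queues}{$host};
    push @{ $self->{queues}{$host} }, $url;
    return 1;
}

# Next URL to fetch, one URL per host in turn; undef when empty.
sub next_url {
    my ($self) = @_;
    my $host  = shift @{ $self->{hosts} } or return;
    my $queue = $self->{queues}{$host};
    my $url   = shift @$queue;
    if (@$queue) { push @{ $self->{hosts} }, $host }    # host still has work
    else         { delete $self->{queues}{$host} }      # host is drained
    return $url;
}

package main;

my $f = Frontier->new;
$f->add($_) for qw(http://a.example/1 http://a.example/2
                   http://b.example/1 http://a.example/1);
while (defined(my $url = $f->next_url)) { print "$url\n" }
```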

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
Damian Conway Perl Classes - January 2003 -- http://www.stemsystems.com/class


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4414
***************************************
