[12651] in Perl-Users-Digest


Perl-Users Digest, Issue: 60 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jul 7 14:47:21 1999

Date: Wed, 7 Jul 1999 11:36:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 7 Jul 1999     Volume: 9 Number: 60

Today's topics:
        Jet ODBC BUG. (was Re: Win32::ODBC bug. (was Re: Access webmuse@my-deja.com
    Re: Jet ODBC BUG. (was Re: Win32::ODBC bug. (was Re: Ac <kerih@nospam.sprintmail.com>
        Joining a string? (pedro)
    Re: Joining a string? <gellyfish@gellyfish.com>
        RE: Joining a string? <torcu99@teleline.es>
    Re: Learning Perl Books <gellyfish@gellyfish.com>
    Re: Learning Perl Books <uri@sysarch.com>
    Re: Learning Perl Books <JFedor@datacom-css.com>
    Re: Learning Perl Books (Graham Ashton)
    Re: Local CGI with ActivePerl (Grant D. Watson)
    Re: Local CGI with ActivePerl (Bart Lateur)
    Re: Local CGI with ActivePerl (elephant)
    Re: looking for HTML parser <gellyfish@gellyfish.com>
    Re: looking for HTML parser (Abigail)
        lwp and authentification <jdhunter@nitace.bsd.uchicago.edu>
    Re: lwp and authentification (Abigail)
    Re: lwp and authentification <mmo2@my-deja.com>
    Re: lwp and authentification <jdhunter@nitace.bsd.uchicago.edu>
    Re: META Tag Extraction Script <debot@xs4all.nl>
        Digest Administrivia (Last modified: 1 Jul 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 07 Jul 1999 14:17:36 GMT
From: webmuse@my-deja.com
Subject: Jet ODBC BUG. (was Re: Win32::ODBC bug. (was Re: Access returns rows, but not Win32::ODBC?)
Message-Id: <7lvnhj$dp5$1@nnrp1.deja.com>

(I'm cross-posting this to comp.lang.perl.misc because that is where
this thread originally started, though I now have found that the
problem is not specific to the Win32::ODBC perl module.)

Although all signs pointed towards a bug in Win32::ODBC, I thought of
something new to try this morning. I created a small Java application
that uses the JDBC-ODBC bridge to query our Access database (just as
my perl script used Win32::ODBC to query the database).

This java application was verified to work on several queries. Then
I attempted the queries that were [incorrectly] returning zero results
in my perl script. These returned zero results with the Java application
as well. When I run the same queries directly in the Access97 database,
they work fine.

The only thing the Java application and the Perl script have in common
is that they both use the Access (Jet) ODBC driver to communicate with
our database. This leads me to believe there is a bug in the Jet
drivers. We have the latest MDAC drivers: 2.1 SP2 (2.1.2.4202.3 (GA)).

More information and details about this problem can be read in my
first two posts included below...

I hope someone can shed some light on this for us. If anyone knows a
good contact at Microsoft for this kind of thing, please forward this
thread to them, or let me know their address so I can do it. They are
impossible to get ahold of for support, without jumping through a bunch
of hoops.

Regards,
Thomas

In article <7ludan$vp6$1@nnrp1.deja.com>,
  webmuse@my-deja.com wrote:
> I hate to use that subject without being able to confirm
> the bug is in Win32::ODBC, but we have exhausted every
> other channel I can think of, and all things point to a
> bug in Win32::ODBC (or in how it works with the new MDAC
> drivers).
>
> We have now updated everything to the latest MDAC drivers: 2.1 SP2.
>
> We have worked on this problem for 5 or 6 days now... it seems to
> be only related to checking for null fields -- but only in ODBC
> requests through Perl/Win32. The queries work fine when they are
> run directly in Access!
>
> We had a text field where we were requesting all records
> with (thefield = null OR thefield = ''). After installing
> SQL Server 7, and MDAC updates, the query no longer worked.
> We suspected a corrupted database, and spent most of the time
> working on this aspect of it. We have dismissed this notion
> after I completely exported all tables to CSV files, fixed
> several screwed up index fields, and re-imported the CSV
> files into a new database. Also, keep in mind the queries
> work fine in Access, just not when we try them in Perl CGI
> scripts through Win32::ODBC.
>
> Then today I tried a few new things. One thing was to change
> all the records that had null in that field to have empty string:
>
> UPDATE TheTable SET thefield = '' WHERE thefield = null;
>
> Then I changed our query to only look for records where
> thefield = '', and THIS WORKED! I thought we had a workaround
> to tide us over until our migration to SQL Server could be
> completed (I'm Assuming SQL server will not have this problem).
>
> Well then this evening we happened across another query we're
> using (one of many) that request records based on looking for
> date fields with null values. This no longer works correctly
> (again, ONLY THROUGH PERL/ODBC! It works fine within Access.)
>
> We are out of ideas on how to debug/workaround this problem.
> No one believes this could be a problem with Win32::ODBC because
> it has been available for so long, but I suspect it is a
> problem between the Win32::ODBC driver and the new MDAC drivers.
> I want to re-iterate that we have been using Win32::ODBC for
> several years now. The only thing that changed on our server
> was that we installed SQL Server 7 and the latest MDAC updates.
>
> Any suggestions would be appreciated so that I don't
> jump out my window. It's only one story up, but I could be
> badly bruised in the fall!
>
> -thomas
>
> In article <7lg1dp$ghg$1@nnrp1.deja.com>,
>   webmuse@my-deja.com wrote:
> > Hello,
> >
> > Windows NT 4, IIS 4, win32 Perl build 316, latest ver of
> > Win32::ODBC.
> >
> > I've been using Win32::ODBC for a long time, and I thought I
> > had figured out all of its tricks and traps. For example,
> > a few months ago I discovered why some of my select queries
> > weren't always working (large memo fields need the buffer
> > increased).
> >
> > We installed SQL Server 7 on our server yesterday, but haven't
> > fiddled with it yet. We just wanted to put it on and make sure
> > everything remains stable. For the time being, we're still using
> > Access97 for our database work.
> >
> > We use a ton of CGI scripts that rely on Win32::ODBC. As far
> > as I know (and have seen), they are all still working fine. But we
> > have this one CGI/ODBC script that has started behaving strangely. I
> > have tried all the debugging tricks I know with Win32::ODBC, using
> > Run() instead of Sql(), DumpData(), MoreResults(), RowCount(),
> > Error(), etc...
> >
> > Here's the problem. I have a SELECT ... INNER JOIN statement
> > that combines two tables, and spits out some records based on the
> > WHERE clause (which operates on only one table). There is rarely
> > a time when this particular query doesn't return at least one
> > record, so when it started to do that, we suspected a problem.
> > (This happened right after our upgrade to SQL Server 7.)
> >
> > I checked it with Run(), and sure enough it executes without
> > an error... just no results. I copied the SQL statement, and
> > pasted it into a query window in Access. I hit the execute
> > button, and it returns 4 or 5 results!
> >
> > The only thing I know of that can keep Win32::ODBC from displaying
> > records when they are displayed OK in Access is if the buffer is
> > too low. But none of these records have large memo fields!
> >
> > On a whim, I tried changing the buffer to 200KB, but it didn't
> > do any good. Any ideas? Do I have a corrupt database maybe? The
> > only thing that makes sense is some driver was changed when SQL
> > Server 7 was installed... ?
> >
> > Thanks,
> > Thomas


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.


------------------------------

Date: Wed, 7 Jul 1999 08:41:53 -0600
From: "Keri Hardwick" <kerih@nospam.sprintmail.com>
Subject: Re: Jet ODBC BUG. (was Re: Win32::ODBC bug. (was Re: Access returns rows, but not Win32::ODBC?)
Message-Id: <7lvotc$qil$1@ash.prod.itd.earthlink.net>

Well, you've given a lot of info, except perhaps one of the more important
pieces:  show us one of the SQL statements that works in Access and fails
via ODBC.
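
The one predicate the earlier posts do quote is (thefield = null OR
thefield = ''). In standard SQL a comparison written as "= null" never
evaluates to true, so an engine that follows the ANSI rules returns no
rows for it; IS NULL is the portable spelling. Whether that is what
changed with the MDAC update is an assumption here, nothing in this
thread confirms it. A minimal sketch that builds the portable form
(null_or_empty is a hypothetical helper; TheTable and thefield are the
names from the quoted UPDATE statement):

```perl
use strict;
use warnings;

# Hypothetical helper: build a predicate that matches rows where the
# column is NULL or holds an empty string, using IS NULL rather than "= null".
sub null_or_empty {
    my ($column) = @_;
    return "($column IS NULL OR $column = '')";
}

my $sql = "SELECT * FROM TheTable WHERE " . null_or_empty("thefield");
print "$sql\n";
```

The resulting string can be handed to Sql() exactly as the existing
queries are.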

Keri Hardwick
webmuse@my-deja.com wrote in message <7lvnhj$dp5$1@nnrp1.deja.com>...
>(I'm cross-posting this to comp.lang.perl.misc because that is where
>this thread originally started, though I now have found that the
>problem is not specific to the Win32::ODBC perl module.)
>
>Although all signs pointed towards a bug in Win32::ODBC, I thought of
>something new to try this morning. I created a small Java application
>that uses the JDBC-ODBC bridge to query our Access database (just as
>my perl script used Win32::ODBC to query the database).
>
>This java application was verified to work on several queries. Then
>I attempted the queries that were [incorrectly] returning zero results
>in my perl script. These returned zero results with the Java application
>as well. When I run the same queries directly in the Access97 database,
>they work fine.
>
>The only thing the Java application and the Perl script have in common
>is that they both use the Access (Jet) ODBC driver to communicate with
>our database. This leads me to believe there is a bug in the Jet
>drivers. We have the latest MDAC drivers: 2.1 SP2 (2.1.2.4202.3 (GA)).
>
>More information and details about this problem can be read in my
>first two posts included below...
>
>I hope someone can shed some light on this for us. If anyone knows a
>good contact at Microsoft for this kind of thing, please forward this
>thread to them, or let me know their address so I can do it. They are
>impossible to get ahold of for support, without jumping through a bunch
>of hoops.
>
>Regards,
>Thomas
>
>In article <7ludan$vp6$1@nnrp1.deja.com>,
>  webmuse@my-deja.com wrote:
>> I hate to use that subject without being able to confirm
>> the bug is in Win32::ODBC, but we have exhausted every
>> other channel I can think of, and all things point to a
>> bug in Win32::ODBC (or in how it works with the new MDAC
>> drivers).
>>
>> We have now updated everything to the latest MDAC drivers: 2.1 SP2.
>>
>> We have worked on this problem for 5 or 6 days now... it seems to
>> be only related to checking for null fields -- but only in ODBC
>> requests through Perl/Win32. The queries work fine when they are
>> run directly in Access!
>>
>> We had a text field where we were requesting all records
>> with (thefield = null OR thefield = ''). After installing
>> SQL Server 7, and MDAC updates, the query no longer worked.
>> We suspected a corrupted database, and spent most of the time
>> working on this aspect of it. We have dismissed this notion
>> after I completely exported all tables to CSV files, fixed
>> several screwed up index fields, and re-imported the CSV
>> files into a new database. Also, keep in mind the queries
>> work fine in Access, just not when we try them in Perl CGI
>> scripts through Win32::ODBC.
>>
>> Then today I tried a few new things. One thing was to change
>> all the records that had null in that field to have empty string:
>>
>> UPDATE TheTable SET thefield = '' WHERE thefield = null;
>>
>> Then I changed our query to only look for records where
>> thefield = '', and THIS WORKED! I thought we had a workaround
>> to tide us over until our migration to SQL Server could be
>> completed (I'm Assuming SQL server will not have this problem).
>>
>> Well then this evening we happened across another query we're
>> using (one of many) that request records based on looking for
>> date fields with null values. This no longer works correctly
>> (again, ONLY THROUGH PERL/ODBC! It works fine within Access.)
>>
>> We are out of ideas on how to debug/workaround this problem.
>> No one believes this could be a problem with Win32::ODBC because
>> it has been available for so long, but I suspect it is a
>> problem between the Win32::ODBC driver and the new MDAC drivers.
>> I want to re-iterate that we have been using Win32::ODBC for
>> several years now. The only thing that changed on our server
>> was that we installed SQL Server 7 and the latest MDAC updates.
>>
>> Any suggestions would be appreciated so that I don't
>> jump out my window. It's only one story up, but I could be
>> badly bruised in the fall!
>>
>> -thomas
>>
>> In article <7lg1dp$ghg$1@nnrp1.deja.com>,
>>   webmuse@my-deja.com wrote:
>> > Hello,
>> >
>> > Windows NT 4, IIS 4, win32 Perl build 316, latest ver of
>> > Win32::ODBC.
>> >
>> > I've been using Win32::ODBC for a long time, and I thought I
>> > had figured out all of its tricks and traps. For example,
>> > a few months ago I discovered why some of my select queries
>> > weren't always working (large memo fields need the buffer
>> > increased).
>> >
>> > We installed SQL Server 7 on our server yesterday, but haven't
>> > fiddled with it yet. We just wanted to put it on and make sure
>> > everything remains stable. For the time being, we're still using
>> > Access97 for our database work.
>> >
>> > We use a ton of CGI scripts that rely on Win32::ODBC. As far
>> > as I know (and have seen), they are all still working fine. But we
>> > have this one CGI/ODBC script that has started behaving strangely. I
>> > have tried all the debugging tricks I know with Win32::ODBC, using
>> > Run() instead of Sql(), DumpData(), MoreResults(), RowCount(),
>> > Error(), etc...
>> >
>> > Here's the problem. I have a SELECT ... INNER JOIN statement
>> > that combines two tables, and spits out some records based on the
>> > WHERE clause (which operates on only one table). There is rarely
>> > a time when this particular query doesn't return at least one
>> > record, so when it started to do that, we suspected a problem.
>> > (This happened right after our upgrade to SQL Server 7.)
>> >
>> > I checked it with Run(), and sure enough it executes without
>> > an error... just no results. I copied the SQL statement, and
>> > pasted it into a query window in Access. I hit the execute
>> > button, and it returns 4 or 5 results!
>> >
>> > The only thing I know of that can keep Win32::ODBC from displaying
>> > records when they are displayed OK in Access is if the buffer is
>> > too low. But none of these records have large memo fields!
>> >
>> > On a whim, I tried changing the buffer to 200KB, but it didn't
>> > do any good. Any ideas? Do I have a corrupt database maybe? The
>> > only thing that makes sense is some driver was changed when SQL
>> > Server 7 was installed... ?
>> >
>> > Thanks,
>> > Thomas
>
>
>Sent via Deja.com http://www.deja.com/
>Share what you know. Learn what you don't.





------------------------------

Date: Wed, 07 Jul 1999 09:43:48 GMT
From: pedro@nospam.co.uk (pedro)
Subject: Joining a string?
Message-Id: <37832041.5785584@news.freeuk.net>

Just a simple question that i couldn't find in the
'thick books'.

If you have one scalar say $a="blue" and
another variable as say $b="sky" how do
you add them in perl so that $c=$a+$b in other
words  $c="bluesky" ?

pedro


------------------------------

Date: 7 Jul 1999 10:57:03 +0100
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Joining a string?
Message-Id: <3783246f@newsread3.dircon.co.uk>

pedro <pedro@nospam.co.uk> wrote:
> Just a simple question that i couldn't find in the
> 'thick books'.
> 
> If you have one scalar say $a="blue" and
> another variable as say $b="sky" how do
> you add them in perl so that $c=$a+$b in other
> words  $c="bluesky" ?

With the concatenation operator - '.'

By interpolation into a single string $c = "$a$b";

By using join()

By using sprintf()

 ...

All of these methods are described in the fine manpages ...

BTW you don't want to use $a or $b as variables - read the perlfunc manpage
(under sort) to find out why.
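
A short sketch of the four approaches side by side. The names $first
and $second are only illustrative, chosen to stay clear of $a and $b:

```perl
use strict;
use warnings;

# $first/$second rather than $a/$b, which sort() uses as its own variables.
my ($first, $second) = ("blue", "sky");

my $dot    = $first . $second;                  # concatenation operator
my $interp = "$first$second";                   # interpolation into one string
my $joined = join "", $first, $second;          # join() with an empty separator
my $fmt    = sprintf "%s%s", $first, $second;   # sprintf() with two %s slots

# All four hold the same string.
print "$_\n" for $dot, $interp, $joined, $fmt;
```

The dot operator is the usual choice for two scalars; join() tends to
pay off once there is a whole list to glue together.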

/J\
-- 
"While they're pumping, you're soaking them" - Speed Loader TV Advert


------------------------------

Date: Tue, 6 Jul 1999 16:11:55 +0100
From: "Torcuato" <torcu99@teleline.es>
Subject: RE: Joining a string?
Message-Id: <7lvn2d$545@telerad.teleline.es>


pedro <pedro@nospam.co.uk> wrote in news message
37832041.5785584@news.freeuk.net...
> Just a simple question that i couldn't find in the
> 'thick books'.
>
> If you have one scalar say $a="blue" and
> another variable as say $b="sky" how do
> you add them in perl so that $c=$a+$b in other
> words  $c="bluesky" ?
>
> pedro

try $c=$a.$b;




------------------------------

Date: 6 Jul 1999 21:42:18 -0000
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Learning Perl Books
Message-Id: <7ltt7q$m6$1@gellyfish.btinternet.com>

On Tue, 06 Jul 1999 17:09:34 GMT Mind Logic wrote:
> Can anyone recommend some good books on learning Perl for the WWW? Currently I 
> have the Llama book and the Perl for Dummies book coming to me mailorder. I'm 
> looking for entry level books for beginners. What can you guys recommend?
> 

Plenty of reviews:

  <http://reference.perl.com/query.cgi?books>

/J\
-- 
Jonathan Stowe <jns@gellyfish.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>


------------------------------

Date: 06 Jul 1999 23:02:40 -0400
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Learning Perl Books
Message-Id: <x7lnctuppr.fsf@home.sysarch.com>

>>>>> "M" == Michboy832  <michboy832@aol.com> writes:

  M> I like Sam's Teach Yourself Perl in 21 Days by Laura Lemay...  It
  M> is written much like a textbook is (with quizzes and exercises at
  M> the end of each chapter) and is very organized.  It is for the true
  M> beginner.

there is a thread going on right now about that book's earlier edition
by till. i have not seen the latest edition by lemay but i wouldn't
trust a sam's book to hold up a chair. the publisher shows such little
concern for accuracy or any respect for the reader's intelligence.

uri

-- 
Uri Guttman  -----------------  SYStems ARCHitecture and Software Engineering
uri@sysarch.com  ---------------------------  Perl, Internet, UNIX Consulting
Have Perl, Will Travel  -----------------------------  http://www.sysarch.com
The Best Search Engine on the Net -------------  http://www.northernlight.com
"F**king Windows 98", said the general in South Park before shooting Bill.


------------------------------

Date: Wed, 7 Jul 1999 03:50:43 -0400
From: "Jody Fedor" <JFedor@datacom-css.com>
Subject: Re: Learning Perl Books
Message-Id: <7luv0u$bh2$1@plonk.apk.net>


Mind Logic wrote in message <37823883$0$229@nntp1.ba.best.com>...
>Can anyone recommend some good books on learning Perl for the WWW? Currently I
>have the Llama book and the Perl for Dummies book coming to me mailorder. I'm
>looking for entry level books for beginners. What can you guys recommend?

I currently use:

O'Reilly - Programming Perl, Perl Cookbook, Mastering Regular Expressions &
Advanced Perl Programming, Perl In A Nutshell

Sams.net - Teach Yourself Perl in 21 Days (Lemay), Web Programming with Perl
5

Wiley - CGI/Perl Cookbook


You can also read books on-line at Macmillan http://www.mcp.com/ .

Jody




------------------------------

Date: 7 Jul 1999 08:48:13 GMT
From: billynospam@mirror.bt.co.uk (Graham Ashton)
Subject: Re: Learning Perl Books
Message-Id: <slrn7o652e.f3o.billynospam@wing.mirror.bt.co.uk>

In article <x7lnctuppr.fsf@home.sysarch.com>, Uri Guttman wrote:
>>>>>> "M" == Michboy832  <michboy832@aol.com> writes:
>
>  M> I like Sam's Teach Yourself Perl in 21 Days by Laura Lemay...  
>
>there is a thread going on right now about that book's earlier edition
>by till. 

I bought it when I was a naive Perl newbie. it got me started, but it
set me back quite a long way. the code isn't idiomatic Perl. I keep
trying to give it to people in the office, but I can't bring myself to
do it without saying that it's crap. funnily enough, they seem much more
interested in borrowing llamas, camels, bighorned sheep and panthers.

>i wouldn't trust a sam's book to hold up a chair. 

oh dear. I use a Sam's CGI/HTML book for holding up my bed. if there was
room under the bed for both of them I'd use the Perl one too...

-- 
Graham

P.S. <billynospam@mirror.bt.co.uk> is a fully working address...


------------------------------

Date: 07 Jul 1999 00:47:44 GMT
From: vbasicboy@aol.com (Grant D. Watson)
Subject: Re: Local CGI with ActivePerl
Message-Id: <19990706204744.20250.00009073@ng-fu1.aol.com>

>>>Huh? Why not? It's *GIVEN* that MSIE calls the program. That was clearly
>>>stated in the question. Now, if MSIE calls the program, and doesn't
>>>capture its output, whose fault is that? Hmmm?
>> 
>> The bug in MSIE may well be that it launches the program at all.
>> 
>
>The bug is in the design of Windows if this is a bug ...

Microsoft is trying to merge Windows Explorer and Internet Explorer.  It is
logical for Explorer to launch a program, so Microsoft wanted Internet Explorer
to do the same thing.  It's not a bug, it's just a design decision.

Grant D. Watson
VBasicBoy@aol.com


------------------------------

Date: Wed, 07 Jul 1999 07:32:09 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: Local CGI with ActivePerl
Message-Id: <3784fcc0.2262031@news.skynet.be>

Grant D. Watson wrote:

>Microsoft is trying to merge Windows Explorer and Internet Explorer.  It is
>logical for Explorer to launch a program, so Microsoft wanted Internet Explorer
>to do the same thing.  It's not a bug, it's just a design decision.

But it breaks the unity. If a program resides on a server, you can only
launch it *on the server*. You'll probably get the result in the browser
window. If a program resides on the local machine, you launch it on the
local machine, in a different window.

For Perl scripts on Win machines, there's another way of looking at it:
the script is the "document", perl is the "viewer" (browser extension).
So you look at the script using perl. Which, huh, runs the script. Oops.

	Bart.


------------------------------

Date: Wed, 7 Jul 1999 19:29:05 +1000
From: e-lephant@b-igpond.com (elephant)
Subject: Re: Local CGI with ActivePerl
Message-Id: <MPG.11edc8f03f66f054989b10@news-server>

Grant D. Watson wrote:
>Microsoft is trying to merge Windows Explorer and Internet Explorer.  It is
>logical for Explorer to launch a program, so Microsoft wanted Internet Explorer
>to do the same thing.  It's not a bug, it's just a design decision.

just quietly .. it's got nothing to do with Internet Explorer and Windows 
Explorer merging (although since IE version 4.0 the subsystem used is 
obviously shared) .. nothing to do with any bugs in MSIE (no matter how 
desperate the anti-MS people are *8^)

it's got everything to do with MIME types and file extensions and the 
ways in which Internet Explorer (and Navigator for that matter) handle 
them

under MS-Windows MIME associations are determined from the filename alone 
(via the extension) .. and when the filename is typed into a browser 
using the 'file' URL specifier both IE and Navigator ask windows to 
handle it .. windows will tell them the associated default application 
and they hand it off to that application for processing (obviously they 
already know what file extensions they can handle themselves and do not 
call out to the OS for those)

so if you type file://c:/blah.pl into your browser window then Windows 
reports the default associated program and the file is handed to that 
program for handling

the only difference between Navigator and IE is that Navigator gives you 
an option of running the file or downloading it - whereas IE assumes that 
you already have the file on your filesystem so just runs it

why do the browsers do this ? .. why not is the better question .. you're 
asking for something with the 'file' specifier .. what makes more sense ? 
- to just display the printable characters in the file ? .. or to hand 
the file over to the associated application ?

sure .. in perl files it doesn't appear to be very useful - mainly 
because the viewing window closes automatically at the end of the script 
 .. but for any file with an interactive viewer (like wordprocessing files 
 .. spreadsheets .. image formats not supported by the browser .. etc.) it 
makes a lot more sense than just displaying the printable characters of 
the file you requested

-- 
 jason - remove all hyphens for email reply -


------------------------------

Date: 6 Jul 1999 19:38:50 -0000
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: looking for HTML parser
Message-Id: <7ltm0b$eq$1@gellyfish.btinternet.com>

On Tue, 06 Jul 1999 18:49:30 GMT ii4@hotmail.com wrote:
> 
> Does anyone know if a shareware is available to
> parse HTML in perl?
> 

The HTML::Parser family of modules available from CPAN are what you
want.
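
A minimal sketch of what using it can look like, here collecting META
tags from a page. It assumes the handler-style interface (api_version
3) found in newer HTML::Parser releases; the sample HTML and the %meta
hash are made up for illustration:

```perl
use strict;
use warnings;
use HTML::Parser;

my %meta;   # meta name => content, filled in by the start-tag handler

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tag, $attr) = @_;
            # Only <meta name="..." content="..."> tags are of interest here.
            return unless $tag eq 'meta' && defined $attr->{name};
            $meta{ lc $attr->{name} } = $attr->{content};
        },
        'tagname, attr',   # arguments the handler asks the parser to pass
    ],
);

# Illustrative input; a real script would feed in a fetched page instead.
$parser->parse('<html><head><meta name="Description" content="A test page">'
             . '<meta name="keywords" content="perl, html"></head></html>');
$parser->eof;

print "$_: $meta{$_}\n" for sort keys %meta;
```

The parser only reports tags and attributes as it meets them; deciding
what they mean is left to the handlers.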

/J\
-- 
Jonathan Stowe <jns@gellyfish.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>


------------------------------

Date: 6 Jul 1999 18:34:16 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: looking for HTML parser
Message-Id: <slrn7o54j3.tch.abigail@alexandra.delanet.com>

brian d foy (brian@pm.org) wrote on MMCXXXV September MCMXCIII in
<URL:news:brian-0607991551360001@sri.dialup.access.net>:
 .. In article <7ltj3m$lse$1@nnrp1.deja.com>, ii4@hotmail.com wrote:
 .. 
 .. > Does anyone know if a shareware is available to
 .. > parse HTML in perl?
 .. 
 .. HTML::Parser works nicely.


If you're satisfied with Netscape's level of "parsing", yes. Many
people are, just like many people are satisfied with M$ products.

But if you really want to parse HTML, use nsgmls, or one of the
other SGML parsers.

HTML::Parser has no HTML knowledge - which *is* necessary to parse HTML.



Abigail
-- 
perl -wlpe '}{$_=$.' file  # Count the number of lines.


  -----------== Posted via Newsfeeds.Com, Uncensored Usenet News ==----------
   http://www.newsfeeds.com       The Largest Usenet Servers in the World!
------== Over 73,000 Newsgroups - Including  Dedicated  Binaries Servers ==-----


------------------------------

Date: Wed, 7 Jul 1999 00:03:03 GMT
From: John Hunter <jdhunter@nitace.bsd.uchicago.edu>
Subject: lwp and authentification
Message-Id: <1raet9iax4.fsf@ace.bsd.uchicago.edu>


I have a couple of subroutines I use for downloading URLs which *work
fine* for everything I've tried to date, including sites with username
and password protection.  But now I'm stumped by 'Nature
Neuroscience'.

The authorization_basic method fails and I get a login page returned
when I request an article from their site (I have a correct username
and password for this site).  Apparently they have some other form of
authentification.  When you logon to their site and provide a user
name and password (with netscape say) all future downloads are
authorized.  I want to emulate this behavior in a perl script to
automatically download a series of articles whose URLs I have.
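
For reference, authorization_basic does nothing more than add an
Authorization header to the one request it is called on, so a site that
logs you in through a form will typically ignore it. A small sketch
that prints the header (the URL and the user/pass pair are
placeholders, not the real site's):

```perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder URL and credentials, for illustration only.
my $request = HTTP::Request->new(GET => 'http://www.example.com/protected/article.pdf');
$request->authorization_basic('user', 'pass');

# The credentials travel as "Basic " plus base64("user:pass") in a header.
my $header = $request->header('Authorization');
print "$header\n";
```

No network access is needed to run this; the request is only built,
never sent.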

How do I go about tracking down the proper way to do this
authentification?  Apparently I need to interact with some CGI script
and then store some information somewhere to reflect this (or does
the server store the info?).  I'm including a sample use of the
subroutines below.

This URL will bounce you to the login page.

http://library.neurosci.nature.com/server-java/Propub/neuro/nn1198_621.pdf

Advice please!
John Hunter


my $ua = new_user_agent($cookie_file);

foreach (keys %need_to_get_tot) {
  print BIBOUT "$bibout_tot{$_}\n" if $opt_b;
  if ($opt_l) {
    print "match: $need_to_get_tot{$_}\n\t key: $_.pdf\n";
  }
  else {
    print "fetching: $need_to_get_tot{$_}\n\t writing: $_.pdf\n";
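    my $article = get_url($ua, $need_to_get_tot{$_}, $user_name{$journal_key{$_}}, $password{$journal_key{$_}});
    next unless $article;   # get_url returns 0 on failure; don't write that to disk
    open(PDF_FILE,">$_.pdf") or die "can't write $_.pdf: $!\n";
    binmode PDF_FILE;       # PDF data is binary
    print PDF_FILE $article;
    close PDF_FILE;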
  }
}


sub new_user_agent {
  #optional arg $cookiefile; returns a user agent
  my $cookiefile= shift || "$HOME/.netscape/cookies";
  my $ua = new LWP::UserAgent;
  $ua->agent("Mozilla/4.0");
  $ua->cookie_jar(HTTP::Cookies::Netscape->new(File => $cookiefile, AutoSave => 1,ignore_discard=>1));
  return $ua;
}

sub get_url {
  #required args $ua and $url; 
  my $ua = shift || die "you must pass a user_agent object to get_url\n";
  my $url = shift || die "you must pass a url to get_url fool!\n";
  my $supplied_user = shift || $user;
  my $supplied_pass = shift || $pass;
#  print "user is $supplied_user\n";
#  print "pass is $supplied_pass\n";
  print "fetching $url\n" if $opt_v;
  print "my url is $url\n" if $opt_v;
  my $request = new HTTP::Request 'GET' , $url or die "$!";
  $request->authorization_basic("$supplied_user", "$supplied_pass") if ($supplied_user and $supplied_pass);
  my $res = $ua->request($request) or die "$!";
  if ($res->is_success) {
    print "got it\n" if $opt_v;
    my $content = $res->content;
    return $content;
  }
  else {
    print "ouch: I couldn't get it\n" if $opt_v;
    return 0
  }
}



------------------------------

Date: 6 Jul 1999 19:46:06 -0500
From: abigail@delanet.com (Abigail)
Subject: Re: lwp and authentification
Message-Id: <slrn7o58po.tch.abigail@alexandra.delanet.com>

John Hunter (jdhunter@nitace.bsd.uchicago.edu) wrote on MMCXXXVI
September MCMXCIII in <URL:news:1raet9iax4.fsf@ace.bsd.uchicago.edu>:
 .. 
 .. The authorization_basic method fails and I get a login page returned
 .. when I request an article from their site (I have a correct username
 .. and password for this site).  Apparently they have some other form of
 .. authentification.  When you logon to their site and provide a user
 .. name and password (with netscape say) all future downloads are
 .. authorized.  I want to emulate this behavior in a perl script to
 .. automatically download a series of articles whose URLs I have.

So, what exactly is your Perl question? This looks like an HTTP and/or
browser issue. Please ask in rec.cooking.bbq.penguin.



Abigail
-- 
perl -wleprint -eqq-@{[ -eqw\\- -eJust -eanother -ePerl -eHacker -e\\-]}-




------------------------------

Date: Wed, 07 Jul 1999 09:39:12 GMT
From: Marc Mosthav <mmo2@my-deja.com>
Subject: Re: lwp and authentification
Message-Id: <7lv77v$8cs$1@nnrp1.deja.com>

In article <slrn7o58po.tch.abigail@alexandra.delanet.com>,
  abigail@delanet.com wrote:
> John Hunter (jdhunter@nitace.bsd.uchicago.edu) wrote on MMCXXXVI
> September MCMXCIII in <URL:news:1raet9iax4.fsf@ace.bsd.uchicago.edu>:
> ..
> .. The authorization_basic method fails and I get a login page returned
> .. when I request an article from their site (I have a correct username
> .. and password for this site).  Apparently they have some other form of
> .. authentification.  When you logon to their site and provide a user
> .. name and password (with netscape say) all future downloads are
> .. authorized.  I want to emulate this behavior in a perl script to
> .. automatically download a series of articles whose URLs I have.
>
> So, what exactly is your Perl question? This looks like an HTTP and/or
> browser issue. Please ask in rec.cooking.bbq.penguin.
>
Actually it sounds to me like this page is using a cookie to store your
credentials. If that is the case then you could use the HTTP::Cookies
module.

Marc




------------------------------

Date: Wed, 7 Jul 1999 15:20:18 GMT
From: John Hunter <jdhunter@nitace.bsd.uchicago.edu>
Subject: Re: lwp and authentification
Message-Id: <1r7locij0t.fsf@ace.bsd.uchicago.edu>


As you can see in the new_user_agent routine from the code I posted,
I am already using cookies:

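use LWP::UserAgent;
use HTTP::Cookies;   # also provides HTTP::Cookies::Netscape

sub new_user_agent {
  # optional arg $cookiefile; returns a user agent with a cookie jar
  # (the home directory is read from the environment)
  my $cookiefile = shift || "$ENV{HOME}/.netscape/cookies";
  my $ua = LWP::UserAgent->new;
  $ua->agent("Mozilla/4.0");
  $ua->cookie_jar(HTTP::Cookies::Netscape->new(
    file => $cookiefile, autosave => 1, ignore_discard => 1));
  return $ua;
}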

What I need Perl to do for me is emulate what Netscape does when I
log in and provide my user name and password.  I guess I'm looking
for a Perl tool that would do the proper handshaking to fill out the
CGI form and store the proper state info so that subsequent requests
would be allowed.  I also need some HTTP info: what kinds of
authentication are there in general that I need to be able to
emulate?  My guess is that the authorization_basic method does
everything in the URL; another approach would be to store something
in the cookies file (which my method above should also handle); and
a third would be something stored on the server side.
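
A minimal sketch of the cookie route, assuming the site uses an
ordinary HTML login form that answers with a session cookie.  The
example.com URLs and the field names username and password are
hypothetical placeholders; the real ones would have to be read from
the login page's source.  POST and GET come from
HTTP::Request::Common.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(GET POST);

# Hypothetical placeholder; use the ACTION URL of the real login form.
my $login_url = 'http://www.example.com/cgi-bin/login';

# Build the request a browser would send when the form is submitted.
# The field names are assumptions, not taken from any real site.
sub login_request {
  my ($user, $pass) = @_;
  return POST $login_url, [ username => $user, password => $pass ];
}

# Log in once; any cookie in the reply is stored in the agent's jar.
sub login {
  my ($ua, $user, $pass) = @_;
  my $res = $ua->request(login_request($user, $pass));
  # Many login scripts answer with a redirect, so accept 3xx as well
  return $res->is_success || $res->is_redirect;
}

my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/4.0");
$ua->cookie_jar(HTTP::Cookies->new);   # in-memory jar for this run

# Only touch the network when credentials are given on the command line
if (@ARGV == 2) {
  login($ua, @ARGV) or die "login failed\n";
  # Later requests through the same agent send the stored cookie back
  my $res = $ua->request(GET 'http://www.example.com/articles/1.html');
  print $res->is_success ? $res->content : $res->status_line, "\n";
}
```

If the site sets its cookie some other way (for example from
JavaScript), this sketch would not be enough on its own.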

Thanks,
John

>>>>> "Marc" == Marc Mosthav <mmo2@my-deja.com> writes:

    Marc> Actually is sounds to me like this page is using a cookie to
    Marc> store your credentials. If that is the case then you could
    Marc> use the HTTP::Cookies module

    Marc> Marc





------------------------------

Date: Wed, 07 Jul 1999 00:07:57 +0200
From: Frank de Bot <debot@xs4all.nl>
Subject: Re: META Tag Extraction Script
Message-Id: <37827E3D.97A5794D@xs4all.nl>

I once wrote the script below, and it worked pretty well.
It reads a list of URLs from a file named list1.site, fetches each
page with the LWP::Simple module, and picks out the description,
keywords and title. Every page that yields a title plus a description
or keywords is appended to the file add.txt in this format:
title || url || description || keywords
This script should NOT be run in a browser; it will time out.

I hope you can do something with this script. I think it's better than
nothing at all.


use LWP::Simple;

open(LIST, "list1.site") or die "Cannot open list1.site: $!";
@list = <LIST>;
close (LIST);
chomp(@list);   # strip the newline from each URL up front
$done = 0;
foreach $url(@list) {
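 $string = get($url);
 $string = "" unless defined $string;   # get() returns undef on failure

 # reset per-URL state so values from the previous page cannot leak in
 $check = 0;
 $check2 = 0;
 ($titel, $desc, $key) = ("", "", "");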

 $string = lc($string);

 @split = split(/<|>/,$string);

 $count = 0;
 $count1 = 1;
 $count2 = 2;
 foreach $line(@split) {
  $line = lc($line);
  if ($split[($count)] eq "title") {
   if ($split[($count2)] eq "/title") {
    $titel = $split[($count1)];
    $check = "1";
   }
  }
  if ($line =~ /meta\s+name=description\s+content\=/) {
   $line =~ s/meta\s+name=description\s+content\=//g;
   $line =~ s/\"//g;
   $desc = $line;
   $check2 = "1";
  }
  if ($line =~ /meta\s+name=\"description\"\s+content\=/) {
   $line =~ s/meta\s+name=\"description\"\s+content\=//g;
   $line =~ s/\"//g;
   $check2 = "1";
   $desc = $line;
  }
  if ($line =~ /meta\s+name=keywords\s+content\=/) {
   $line =~ s/meta\s+name=keywords\s+content\=//g;
   $line =~ s/\"//g;
   $check2 = "1";
   $key = $line;
  }
  if ($line =~ /meta\s+name=\"keywords\"\s+content\=/) {
   $line =~ s/meta\s+name=\"keywords\"\s+content\=//g;
   $line =~ s/\"//g;
   $check2 = "1";
   $key = $line;
  }
  $count++;
  $count1++;
  $count2++;
 }
 $done++;
 chomp $url;
 if (($check eq "1") && ($check2 eq "1")) {
  $suc = "Successful...";
  open(ADD, ">>add.txt") or die "Cannot append to add.txt: $!";
  # strip carriage returns and newlines from every field
  s/[\r\n]//g for ($titel, $url, $desc, $key);

  print ADD "$titel || $url || $desc || $key\n";
  close (ADD);
 }
 else { $suc = "Failed"; }


 print "$done: $url -> $suc ($check, $check2)\n";
}
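
For comparison, a shorter sketch of the same extraction, assuming the
HTML::HeadParser module is installed (it ships with HTML::Parser,
which libwww-perl itself depends on).  The helper name head_fields
and the sample page are made up for illustration.  The parser copes
with upper-case tags, quoted or unquoted values, and attributes in
either order, and it leaves the case of the text alone.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Return (title, description, keywords) for a page's HTML source.
# Fields that are not present come back as empty strings.
sub head_fields {
  my ($html) = @_;
  my $p = HTML::HeadParser->new;
  $p->parse($html);
  # <meta name="X" content="..."> is exposed as the header "X-Meta-X";
  # scalar() keeps a missing header from collapsing the list
  return map { defined $_ ? $_ : "" }
    scalar $p->header('Title'),
    scalar $p->header('X-Meta-Description'),
    scalar $p->header('X-Meta-Keywords');
}

# Made-up sample page, used only to exercise the helper
my $html = <<'END';
<html><head><title>My World</title>
<META name="description" content="Hello world">
<meta content="perl, lwp" name=keywords>
</head><body>ignored</body></html>
END

my ($title, $desc, $key) = head_fields($html);
print "$title || $desc || $key\n";
```

In the loop above, head_fields(get($url)) could take the place of the
split and the four regex blocks.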


wired2000@my-deja.com wrote:

> Hi,
>
> I'm trying to figure out the best way to grab data inside a META tag.
> For example:
> <META name="description" content="Hello world">
> <META HTTP-EQUIV="Refresh" content="5;http://www.myworld.com">
>
> Would report back data in terms of an array or any basic way of extract
> data from META Tags. Ideally such that I can get something like this:
>
> I'm starting with an array @data which is the entire webpage using
> libwww and then I'd like to search through the array for any meta tags
> and extract them so that I can report what Meta tags are being used,
> particularly the keywords and description meta tag.
>
> Also, anyone know how to use libwww to retrieve X bytes of data? For
> example: If I only want to retrieve the first 10k of a webpage (rather
> than the entire document), anyone know how?
>
> Typical scenario is if someone has a 1meg webpage and I only want to
> grab the first 10k of data as to avoid wasting bandwidth...
>
> Thanks everyone!
> Pat
>
> Sent via Deja.com http://www.deja.com/
> Share what you know. Learn what you don't.
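
On fetching only the first 10k: a minimal sketch, assuming the
max_size setting of LWP::UserAgent is good enough for this.  The
helper name capped_agent is made up for illustration, and the URL is
taken from the command line.  The limit is approximate, since LWP
stops reading after the chunk that crosses it.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);

# Build a user agent that stops reading a response body once it has
# received about $limit bytes.
sub capped_agent {
  my ($limit) = @_;
  my $ua = LWP::UserAgent->new;
  $ua->max_size($limit);
  return $ua;
}

my $ua = capped_agent(10 * 1024);   # roughly the first 10k

# Fetch only when a URL is given on the command line
if (@ARGV) {
  my $res = $ua->request(GET $ARGV[0]);
  die $res->status_line, "\n" unless $res->is_success;
  # LWP marks a cut-off response with a Client-Aborted header
  my $cut = $res->header('Client-Aborted') ? " (truncated)" : "";
  printf "%d bytes read%s\n", length($res->content), $cut;
}
```

Sending a Range header is another possibility, but whether it is
honoured depends on the server.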

--
Contact Information:

                         \\\|///
                       \\  - -  //
                        (  @ @  )
----------------------oOOo-(_)-oOOo--------------------|
| General:                                             |
|                                                      |
| EMAIL: debot@xs4all.nl                               |
|------------------------------------------------------|
| Penpal International                                 |
|                                                      |
| URL: http://www.debot.nl/ppi/  or  http://fly.to/ppi |
| EMAIL: debot@xs4all.nl  or  ppi@debot.nl             |
-------------------------------Oooo---------------------
                        oooO    (   )
                       (   )    ) /
                       \ (     (_/
                        \_)




------------------------------

Date: 1 Jul 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 1 Jul 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.

The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them, even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 60
************************************

