[22536] in Perl-Users-Digest


Perl-Users Digest, Issue: 4757 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Mar 25 00:10:32 2003

Date: Mon, 24 Mar 2003 21:10:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 24 Mar 2003     Volume: 10 Number: 4757

Today's topics:
        Perl PHP and MySQL - Gather Images <patrick@eganconsulting.com>
    Re: Perl PHP and MySQL - Gather Images <patrick@eganconsulting.com>
        Perl WDB Sybase - DBGateway (vaayu)
    Re: perl's expat.so <No_4@dsl.pipex.com>
    Re: pointing stderr to a module <usenet@tinita.de>
    Re: Simple, but I can't figure it out. <flavell@mail.cern.ch>
    Re: Simple, but I can't figure it out. <tore@aursand.no>
        sort by columns <gproulx@tva.ca>
        WWW::Mechanize newbie question <kellygreer1@hell.rr.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 24 Mar 2003 23:05:20 -0500
From: "news.sympatico.ca" <patrick@eganconsulting.com>
Subject: Perl PHP and MySQL - Gather Images
Message-Id: <pCQfa.3062$062.367512@news20.bellglobal.com>

Hello All,

I am developing a small cgi/perl application to run serverside and grab data
and content from another web site.  The essence is

1.  Another company has real estate listings
2. I can search based on the agent ID
3. When I download a page, I can extract urls, call those urls and parse out
links to images.

I want to fetch those images and either save them in blobs in my MySQL DB or
save them to a folder on my server with a reference to the file in the
MySQL.

We can parse out the links no problem, it is getting the images from an HTTP
page to the other server that seems to be the problem.

There are lots of applications that pinch the content and redirect to the
browser, however, nothing that I can see that can be automated from one
server to another.

BTW, we have copyright permission from the provider of the listing data.


Any leads would be helpful.

This is a sample piece of code to grab the links...

#!/usr/local/bin/perl -w


# socket based hypertext grep URLs.  Given a URL, this
# prints out URLs of hyperlinks and images.

use strict;
use Socket;                   # include Socket module
require 'tcp.pl';             # file with open_TCP routine
require 'web.pl';             # file with parse_URL routine
use vars qw($opt_h $opt_i $opt_l);
use Getopt::Std;

# parse command line arguments
getopts('hil');

# print out usage if needed
if (defined $opt_h || $#ARGV<0) { help(); }

# if it wasn't an option, it was a URL
while($_ = shift @ARGV) {
  hgu($_, $opt_i, $opt_l);
}


# Subroutine to print out usage information


sub usage {
  print "usage: $0 -hil URL(s)\n";
  print "       -h           help\n";
  print "       -i           print out image URLs\n";
  print "       -l           print out hyperlink URLs\n";
  exit(-1);
}


# Subroutine to print out help text along with usage information


sub help {
  print "Hypertext grep URL help\n\n";
  print "This program prints out hyperlink and image links that\n";
  print "are referenced by a user supplied URL on a web server.\n\n";

  usage();
}


# hypertext grep url


sub hgu {

  # grab parameters
  my($full_url, $images, $hyperlinks)=@_;
  my $all = !($images || $hyperlinks);
  my @links;
  my @links2;

  # if the URL isn't a full URL, assume that it is a http request
  $full_url="http://$full_url" if ($full_url !~
                                 m/(\w+):\/\/([^\/:]+)(:\d*)?([^#]*)/);

  # break up URL into meaningful parts
  my @the_url = parse_URL($full_url);

  if (!@the_url) {    # an empty list means the URL could not be parsed
    print "Please use fully qualified valid URL\n";
    exit(-1);
  }

  # we're only interested in HTTP URL's
  return if ($the_url[0] !~ m/http/i);

  # connect to server specified in 1st parameter
  if (!defined open_TCP('F', $the_url[1], $the_url[2])) {
    print "Error connecting to web server: $the_url[1]\n";
    exit(-1);
  }

  # request the path of the document to get (HTTP lines end in CRLF)
    print F "GET $the_url[3] HTTP/1.0\r\n";
    print F "Accept: */*\r\n";
    print F "User-Agent: hgrepurl/1.0\r\n\r\n";

  # print out server's response.

  # get the HTTP response line
  my $the_response=<F>;

  # if not an "OK" response of 200, skip it
  if ($the_response !~ m@^HTTP/\d+\.\d+\s+200\s@) {return;}

  # get the header data
  while(<F>=~ m/^(\S+):\s+(.+)/) {
    # skip over the headers
  }

  my $data='';
  # get the entity body
  while (<F>) {$data.=$_};

  # close the network connection
  close(F);


  # fetch images and hyperlinks into arrays, print them out

  if (defined $images || $all) {
    @links=grab_urls($data, ('img', 'src', 'body', 'background'));
  }
  if (defined $hyperlinks || $all) {
    @links2= grab_urls($data, ('a', 'href'));
  }

  my $link;
  for $link (@links, @links2) { print "$link\n"; }

}


patrick@eganconsulting.com
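
The server-to-server fetch asked about above might be sketched as follows.
This is only an illustration under stated assumptions: it uses HTTP::Tiny
(a core module in Perl 5.14 and later, so newer than this post), and the
helper names local_name_for and fetch_image are made up for the example.
LWP::Simple's getstore() is a common alternative.

```perl
use strict;
use warnings;
use HTTP::Tiny;            # core module since Perl 5.14
use File::Spec;

# Turn an image URL into a safe local file name (hypothetical helper).
sub local_name_for {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;        # drop query string and fragment
    my ($name) = $path =~ m{([^/]+)$};       # last path segment, if any
    $name = 'index' unless defined $name;    # URL ended in a slash
    $name =~ s/[^A-Za-z0-9._-]/_/g;          # keep only harmless characters
    return $name;
}

# Download one image into $dir; returns the saved path, or undef on failure.
sub fetch_image {
    my ($url, $dir) = @_;
    my $file = File::Spec->catfile($dir, local_name_for($url));
    my $res  = HTTP::Tiny->new(agent => 'hgrepurl/1.0')->mirror($url, $file);
    return $res->{success} ? $file : undef;
}
```

Calling fetch_image() on each URL that grab_urls() returns would leave the
files in one folder, and the returned path is what could be stored in the
MySQL row. Two images with the same file name would overwrite each other in
this sketch.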





------------------------------

Date: Mon, 24 Mar 2003 23:09:07 -0500
From: "news.sympatico.ca" <patrick@eganconsulting.com>
Subject: Re: Perl PHP and MySQL - Gather Images
Message-Id: <XFQfa.3066$062.369035@news20.bellglobal.com>

BTW the PHP is just for displaying the data after it has been harvested.

Patrick




------------------------------

Date: 24 Mar 2003 20:58:59 -0800
From: vaayu@epatra.com (vaayu)
Subject: Perl WDB Sybase - DBGateway
Message-Id: <6f39192e.0303242058.3ad641ed@posting.google.com>

Hi,

I am a newbie to WDB and Perl. Can you please tell me if there is a
better method than WDB to access Sybase through a browser?

If WDB is still the best method, has anyone encountered this error:

Undefined subroutine &main::dblogin called at
/appdev/uipdev5/opt/iplanet/servers/https-wdb_server/wdb_conf/syb_dbi.pl
line 40,  line 109

I feel there is something missing in the Perl area of my setup, that
the syb_dbi.pl is not "reading" the sybperl.pl properly or something
in that direction.

Can you please point me in the correct direction?
thanks in advance,
Vaayu


------------------------------

Date: Mon, 24 Mar 2003 23:46:42 +0000
From: Big and Blue <No_4@dsl.pipex.com>
Subject: Re: perl's expat.so
Message-Id: <3e7f98e3$0$4851$cc9e4d1f@news.dial.pipex.com>

Xiaojun Ping wrote:
> 
> There are no errors when I set LD_LIBRARY_PATH for libexpat.so before I 
> run my perl program, but I always get the following errors when I set 
> $ENV{LD_LIBRARY_PATH} to the same value inside my perl program, any ideas?
> 
> ...perl5.6.0/sun4-solaris-thread-multi/auto/XML/Parser/Expat/Expat.so' 
> for module XML::Parser::Expat: ld.so.1: /usr/bin/perl: fatal: 
> libexpat.so.0: open failed: No such file or directory at 
> ...perl-5.6.0/SunOS5.6/lib/5.6.0/sun4-solaris-thread-multi/DynaLoader.pm 
> line 200, <F> line 5.

    LD_LIBRARY_PATH is an environment variable that is used by the 
dynamic loader which starts up the program.  Setting it as an 
environment variable within the program (even within a BEGIN block) will 
do you no good as it is already too late to be noticed.
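
    If rebuilding is not an option, one workaround that follows from this 
is to set the variable and then re-exec the script, so that the loader of 
the new process sees it.  A rough sketch, assuming a colon-separated path; 
the names path_has_dir and ensure_lib_path are made up for the example:

```perl
use strict;
use warnings;

# True if $dir is already one of the colon-separated entries in $path.
sub path_has_dir {
    my ($path, $dir) = @_;
    return scalar grep { $_ eq $dir } split /:/, (defined $path ? $path : '');
}

# Re-run the current script with $dir added to LD_LIBRARY_PATH, so that the
# dynamic loader of the *new* process sees it. Returns if nothing to do.
sub ensure_lib_path {
    my ($dir) = @_;
    return if path_has_dir($ENV{LD_LIBRARY_PATH}, $dir);
    $ENV{LD_LIBRARY_PATH} = join ':', $dir,
        grep { defined && length } $ENV{LD_LIBRARY_PATH};
    exec $^X, $0, @ARGV or die "cannot re-exec $0: $!";
}
```

    The call to ensure_lib_path() would have to come before XML::Parser is 
loaded, for example in a BEGIN block ahead of the "use" line.  The check in 
path_has_dir() is what stops the script from re-executing itself forever.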

    The correct thing to do is to use the -rpath option (-R with the 
Solaris linker) when linking the extension, to add an RPATH entry to the 
dynamic library. Relying on an environment variable is unreliable. 
You've built and installed libexpat - tell the linker where it is.

    Of course, the simplest way to do this is to put all of your built 
libraries into one directory and configure that directory into perl when 
you build it, assuming you build Perl yourself.

    On Linux you could look at ldconfig.


-- 
      -*-    Just because I've written it here doesn't    -*-
      -*-    mean that you should, or I do, believe it.   -*-



------------------------------

Date: 24 Mar 2003 23:15:35 GMT
From: Tina Mueller <usenet@tinita.de>
Subject: Re: pointing stderr to a module
Message-Id: <tinhca0ji$19b$tina@news01.tinita.de>

Scott <scott@scottsavarese.com> wrote:
> Pretty much I do a $mod = new LogModule( "file" ); $mod->do_log(
> "message" ); and it will log "message" to the file.

> The thing is, we get stderr messages that I also want to get logged. I
> can do an open( STDERR, "file" ); but if I do that the stderr messages
> come out unformatted without a time stamp...

maybe:

$SIG{__WARN__} = sub { $mod->do_log( @_ ) };
# untested
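
A slightly fuller sketch of the same idea, with a stand-in logger so that
it runs on its own. The do_log here is hypothetical and writes to an array
instead of a file; the real LogModule is assumed to add its own time stamp.

```perl
use strict;
use warnings;

my @log;   # stand-in for the log file, so the sketch is self-contained

# Hypothetical logger: prefixes each message with a time stamp.
sub do_log {
    my (@msg) = @_;
    chomp(my $text = join '', @msg);
    push @log, sprintf '[%s] %s', scalar(localtime), $text;
}

# Route warn() and Perl's own warnings through the logger
# instead of letting them go to STDERR unformatted.
$SIG{__WARN__} = sub { do_log(@_) };

warn "disk almost full\n";
```

This only catches what goes through warn(). Anything printed directly to
STDERR, or written to it by a child process, bypasses the handler.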

hth, tina
-- 
http://www.tinita.de/     \  enter__| |__the___ _ _ ___
http://Movies.tinita.de/   \     / _` / _ \/ _ \ '_(_-< of
http://www.perlquotes.de/   \    \ _,_\ __/\ __/_| /__/ perception
http://www.tinita.de/peace/link.html - Spread Peace


------------------------------

Date: Mon, 24 Mar 2003 23:56:14 +0100
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Simple, but I can't figure it out.
Message-Id: <Pine.LNX.4.53.0303242338200.22946@lxplus065.cern.ch>


On Mon, Mar 24, Corey Andrews inscribed on the eternal scroll:

Subject : Simple, but I can't figure it out.

According to the posting guidelines of this group, you have just
practically guaranteed yourself low-quality answers (of which this
might be one), because your subject header already shouted to the
treetops that you were asking a low-quality question.

> Okay guys, here's the problem. I'm a newbie and I'm not too familiar
> with all the functions/methods of perl,

There is ABSOLUTELY NOTHING WRONG WITH BEING A NEWBIE: everyone has to
start somewhere.  You just have to go about it in an appropriate way,
if you hope to get anything productive out of the experience.  (As the
actress said to the archbishop...)

Your real problem is that, to be candid, you don't understand your
real problem.

> so scripting cgi to something
> of my liking is quite difficult.

What's happening here is that you're tackling a relatively difficult
task for a newcomer (i.e. writing a server-side script) but you seem to
lack the relevant background in designing a program to do what you
want, quite apart from any issues about coding it in Perl (the topic
for this group) let alone the issues of writing a server-side script
for the Common Gateway Interface (which is properly the business of
the comp.infosystems.www.authoring.cgi group).

No need to get upset about that, and no offence meant, but it's better
to be able to walk a few steps before entering for a half-marathon.

> Very simple thing right? Well I can't figure out how to do it. I've
> been experimenting with the for loops and such, but I'm lost. Any
> help? Or just help on where to get started?

It might be a help if you'd reveal any background that you might have
in programming, but if this is your first encounter with the field,
then I'd seriously suggest buying a Perl-related introductory book and
playing around with the language before worrying about CGI scripts.

The "Llama book", i.e. "Learning Perl", would be a fine start, for example.
(No, I don't get anything out of recommending it, I just think it's
very good of its kind - get the newest edition though...)


------------------------------

Date: Tue, 25 Mar 2003 01:12:16 +0100
From: "Tore Aursand" <tore@aursand.no>
Subject: Re: Simple, but I can't figure it out.
Message-Id: <pan.2003.03.24.22.53.40.375388@aursand.no>

On Mon, 24 Mar 2003 14:17:56 -0800, Corey Andrews wrote:
> Subject: Simple, but I can't figure it out.

Please post the _subject_ of your post in the subject of your post, not
how _you_ feel about the post.

> I'm a newbie and I'm not too familiar with all the functions/methods
> of perl [...]

But you've read the documentation available, right?

> What I've been working on though, is to create a script that will read
> and print out the first 20 lines of a file.

There are many, many, many (ah, Police Academy) ways to do this.  I'm
quite surprised that you haven't been able to solve it yourself.  You say
that you've been experimenting with "for loops and such" -- have you
looked at the splice() function?

Two easy methods to get you going:

  my @lines = <FILE>;
  print splice( @lines, 0, 20 );

Or - memory friendly:

  while ( <FILE> ) {
      last if $. == 21;
      print;
  }


-- 
Tore Aursand


------------------------------

Date: Mon, 24 Mar 2003 22:34:07 -0500
From: Gilles <gproulx@tva.ca>
Subject: sort by columns
Message-Id: <S6Qfa.3046$062.355691@news20.bellglobal.com>

I need to sort a file that contains 13 columns in the following format:

PSPIU -5.10  80.60 2003 03 22 16.000  -999  285  3.0 -999.0 -999.00 700.0
CYMAI -5.80  81.20 2003 03 23 8.000  273  285  2.0 -999.0 -999.00 300.0
AAAAA -5.80  81.20 2003 03 20 12.000  273  285  2.0 400.0 -999.00 300.0
CYMAI -5.80  81.20 2003 03 20 12.000  273  285  2.0 -999.0 -999.00 300.0
AAAAA -5.80  81.20 2003 03 20 12.000  273  285  2.0 -999.0 -999.00 300.0

[etc ...]

I need to sort this file by the following columns:
    - the date (columns 4-7) must be in increasing order
    - for each date, the field in the first column must be in increasing
      order
    - for each date and each value of the first column, column 8 must also
      be in increasing order

For the previous example, I should get something like:

AAAAA -5.80  81.20 2003 03 20 12.000  273  285  2.0 -999.0 -999.00 300.0
AAAAA -5.80  81.20 2003 03 20 12.000  273  285  2.0 400.0 -999.00 300.0
CYMAI -5.80  81.20 2003 03 20 12.000  273  285  2.0 -999.0 -999.00 300.0
PSPIU -5.10  80.60 2003 03 22 16.000  -999  285  3.0 -999.0 -999.00 700.0
CYMAI -5.80  81.20 2003 03 23 8.000  273  285  2.0 -999.0 -999.00 300.0

I'm a novice in Perl scripting, so your help would be much appreciated.

Thank you for your help
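
One possible approach to the three sort keys above is a multi-key sort
over the split fields. This is a sketch, not a tested solution for the real
data: it assumes every line has at least 8 whitespace-separated fields, the
name sort_records is made up, and the final whole-line comparison is an
added tie-break so that otherwise equal rows come out in a fixed order.

```perl
use strict;
use warnings;

# Sort whitespace-separated records by date (columns 4-7), then by the
# name in column 1, then by column 8. Returns the lines in sorted order.
sub sort_records {
    my @lines = @_;
    return map  { $_->[0] }                       # 3. recover the original line
           sort {                                 # 2. compare the split fields
                  $a->[4] <=> $b->[4]             #    year
               || $a->[5] <=> $b->[5]             #    month
               || $a->[6] <=> $b->[6]             #    day
               || $a->[7] <=> $b->[7]             #    hour
               || $a->[1] cmp $b->[1]             #    name, column 1
               || $a->[8] <=> $b->[8]             #    column 8
               || $a->[0] cmp $b->[0]             #    assumed tie-break: whole line
           }
           map  { [ $_, split ' ', $_ ] } @lines; # 1. [line, col1, col2, ...]
}
```

Because the whole line sits at index 0, the array indices match the column
numbers used in the description. Something like "print sort_records(<>);"
would then sort a file named on the command line.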


------------------------------

Date: Tue, 25 Mar 2003 04:48:51 GMT
From: "Kelly Greer" <kellygreer1@hell.rr.com>
Subject: WWW::Mechanize newbie question
Message-Id: <TcRfa.11796$eM1.1694412@twister.southeast.rr.com>

hi newsgroup,

What do you have to do to be able to use the WWW::Mechanize Perl module on a
Windows machine?  The current version from www.activestate.com doesn't seem
to support this.  Will this work on Windows?  Where do I get the code?

Thanks,

Kelly Greer
kellygreer1@nospam.com
Change nospam to yahoo






------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4757
***************************************

