[23679] in Perl-Users-Digest


Perl-Users Digest, Issue: 5886 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Dec 3 03:10:52 2003

Date: Wed, 3 Dec 2003 00:10:20 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 3 Dec 2003     Volume: 10 Number: 5886

Today's topics:
        skipping files that are not present <provicon@earthlink.net>
    Re: skipping files that are not present <asu1@c-o-r-n-e-l-l.edu>
    Re: skipping files that are not present <usenet@morrow.me.uk>
    Re: skipping files that are not present <usenet@morrow.me.uk>
    Re: skipping files that are not present <asu1@c-o-r-n-e-l-l.edu>
    Re: skipping files that are not present <slaven@rezic.de>
    Re: Starting Perl Script at Bootup <bernard.el-haginDODGE_THIS@lido-tech.net>
    Re: Starting Perl Script at Bootup <josef.moellers@fujitsu-siemens.com>
    Re: using GD module <eddGallary2@hotmail.com>
    Re: using GD module <usenet@morrow.me.uk>
    Re: using GD module <eddGallary2@hotmail.com>
    Re: using GD module <usenet@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 03 Dec 2003 05:01:13 GMT
From: "John" <provicon@earthlink.net>
Subject: skipping files that are not present
Message-Id: <t6ezb.25447$sb4.14029@newsread2.news.pas.earthlink.net>

I have downloaded a bunch of web pages with information I need to process
using LWP.

The file names are:

1.html
2.html
3.html and so on..

However, for some reason, certain numbers went missing during the
download.

I now run the code below to process these files, some of which are missing
from the sequence:

use HTML::TokeParser;

$number = 1;
while ($number < 17000) {
my $ns="C:\\JapanUSA\\$number.html";
my $stream = HTML::TokeParser->new($ns);
open(OUT, '>>c:\\JapanUSA.txt');

while (my $token=$stream->get_token) {

if ($token->[0] eq 'S' and $token->[1] eq 'td' and $token->[4] eq '<td>') {

 $text_string=$stream->get_trimmed_text('/td');
 print OUT "$text_string;" ;}
if ($token->[0] eq 'E' and $token->[1] eq 'tr') {
 print OUT "\n";}
}

$number = $number+1;

}

When I run the code above and, for example, 110.html is missing, the program
gets stuck.  Is there any way to change the code so that if 110.html is
missing, it moves on to 111.html instead of getting stuck?

I have posted this question before, but my post and any replies from
Perl experts have disappeared.

Any help will be deeply appreciated.  I have over 17,000 HTML files with
sequential numbers, with some numbers missing at random in between, and
the program above currently gets stuck whenever a number in the sequence
is missing.

Thank you in advance for your help.





------------------------------

Date: 3 Dec 2003 05:47:26 GMT
From: "A. Sinan Unur" <asu1@c-o-r-n-e-l-l.edu>
Subject: Re: skipping files that are not present
Message-Id: <Xns944680C17F07asu1cornelledu@132.236.56.8>

"John" <provicon@earthlink.net> wrote in
news:t6ezb.25447$sb4.14029@newsread2.news.pas.earthlink.net: 

> I have downloaded a bunch of web pages with information I need to
> process using LWP.
> 
> The file names are:
> 
> 1.html
> 2.html
> 3.html and so on..
> 
> However, for some reason, certain numbers went missing during the
> download.
> 
> I now run the code below to process these files, some of which are
> missing from the sequence:
> 
> use HTML::TokeParser;
> 
> $number = 1;
> while ($number < 17000) {
> my $ns="C:\\JapanUSA\\$number.html";

To put it politely, this is utterly stupid. Just process the files that 
exist in the directory.

> I have posted this question before, but my post and any replies from
> Perl experts have disappeared.

Have you heard of Google?

-- 
A. Sinan Unur
asu1@c-o-r-n-e-l-l.edu
Remove dashes for address
Spam bait: mailto:uce@ftc.gov


------------------------------

Date: Wed, 3 Dec 2003 06:06:22 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: skipping files that are not present
Message-Id: <bqjugu$9l$1@wisteria.csv.warwick.ac.uk>


"John" <provicon@earthlink.net> wrote:

> use HTML::TokeParser;

use Errno;

> $number = 1;
> while ($number < 17000) {

This is a C-style for loop, so make it look like one.

  for(my $number = 1; $number < 17000; $number++) {

I would be tempted to say 'make this a Perl-style for loop', viz:

  for my $number (1..17000) {

except that I have a horrible suspicion that would actually create the
whole list and gobble memory: anyone?

> my $ns="C:\\JapanUSA\\$number.html";

Sort out your indentation.

You don't need to use '\' for pathnames in Perl, even under Win32, so
don't. (The exception to this is paths that will be given to
system().)

      my $ns = "C:/JapanUSA/$number.html";

> my $stream = HTML::TokeParser->new($ns);

  unless ($stream) {
      $!{ENOENT} or die "can't create parser for $ns: $!";
      next;
  }

If you don't fix your loop this will go into an infinite loop, so make
sure you do.

> open(OUT, '>>c:\\JapanUSA.txt');

 ... or die "can't open output file: $!";

*Always* check the return value of open.

You don't need to keep reopening this: move the open outside the
loop.

> while (my $token=$stream->get_token) {
> 
> if ($token->[0] eq 'S' and $token->[1] eq 'td' and $token->[4] eq '<td>') {
> 
>  $text_string=$stream->get_trimmed_text('/td');
>  print OUT "$text_string;" ;}

You shouldn't have a space before ';'.
You should have a newline before '}' if your BLOCK spans several
lines.
Sort out your indentation.

> if ($token->[0] eq 'E' and $token->[1] eq 'tr') {
>  print OUT "\n";}
> }
> 
> $number = $number+1;

Now you've made a proper for loop you don't need this.

> 
> }
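
Put together, the suggestions above might look something like the sketch
below. The sub name process_pages and its arguments are made up for
illustration and are not from the original post; the token handling is
unchanged from the quoted code. It assumes HTML::TokeParser->new returns
undef and leaves the reason in $! when a file cannot be opened.

```perl
use strict;
use warnings;
use Errno;              # populates %! so that $!{ENOENT} can be tested
use HTML::TokeParser;

# Hypothetical helper: append the <td> text of $dir/N.html, for N in
# $first..$last, to $out_path, skipping numbers whose file does not exist.
# Returns the number of files actually parsed.
sub process_pages {
    my ($dir, $out_path, $first, $last) = @_;

    # Open the output once, outside the loop, and check the result.
    open(my $out, '>>', $out_path)
        or die "can't open output file $out_path: $!";

    my $parsed = 0;
    for my $number ($first .. $last) {
        my $ns = "$dir/$number.html";

        my $stream = HTML::TokeParser->new($ns);
        unless ($stream) {
            # A missing file is expected here; anything else is fatal.
            $!{ENOENT} or die "can't create parser for $ns: $!";
            next;
        }

        while (my $token = $stream->get_token) {
            # Start-tag tokens look like ['S', $tag, $attr, $attrseq, $text].
            if ($token->[0] eq 'S' and $token->[1] eq 'td' and $token->[4] eq '<td>') {
                my $text_string = $stream->get_trimmed_text('/td');
                print $out "$text_string;";
            }
            # End-tag tokens look like ['E', $tag, $text].
            if ($token->[0] eq 'E' and $token->[1] eq 'tr') {
                print $out "\n";
            }
        }
        $parsed++;
    }

    close($out) or die "can't close output file $out_path: $!";
    return $parsed;
}
```

With the paths from the original post this would be called as
process_pages('C:/JapanUSA', 'C:/JapanUSA.txt', 1, 16999), which matches
the original "while ($number < 17000)" bound.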

Ben

-- 
If I were a butterfly I'd live for a day, / I would be free, just blowing away.
This cruel country has driven me down / Teased me and lied, teased me and lied.
I've only sad stories to tell to this town: / My dreams have withered and died.
  ben@morrow.me.uk   <=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>   (Kate Rusby)


------------------------------

Date: Wed, 3 Dec 2003 06:17:42 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: skipping files that are not present
Message-Id: <bqjv66$9l$3@wisteria.csv.warwick.ac.uk>


"A. Sinan Unur" <asu1@c-o-r-n-e-l-l.edu> wrote:
> "John" <provicon@earthlink.net> wrote in
> news:t6ezb.25447$sb4.14029@newsread2.news.pas.earthlink.net: 
> > my $ns="C:\\JapanUSA\\$number.html";
> 
> To put it politely, this is utterly stupid. Just process the files that 
> exist in the directory.

That ain't necessarily the right answer, if there are other files
there as well (as is not unlikely in this case).

I suppose you could do something like

  my @files = sort grep { /^\d+\.html$/ } readdir $DIR;

 .
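
A slightly fuller sketch of that idea follows. The sub name
numbered_html_files is hypothetical. Two details are assumptions about
what is wanted here rather than anything stated in the thread: the dot in
the pattern is escaped so that it matches only a literal '.', and the sort
is numeric so that 10.html comes after 2.html.

```perl
use strict;
use warnings;

# Return the names of all "<digits>.html" files in $dir, in numeric order.
sub numbered_html_files {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "can't open directory $dir: $!";
    my @files =
        map  { "$_.html" }                     # put the extension back
        sort { $a <=> $b }                     # numeric, not string, order
        map  { /^(\d+)\.html$/ ? $1 : () }     # keep only the number part
        readdir $dh;
    closedir $dh;

    return @files;
}
```

Any other files in the directory are simply ignored, which addresses the
objection to processing everything that exists there.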

Ben

-- 
  The cosmos, at best, is like a rubbish heap scattered at random.
                                                         - Heraclitus
  ben@morrow.me.uk


------------------------------

Date: 3 Dec 2003 06:48:53 GMT
From: "A. Sinan Unur" <asu1@c-o-r-n-e-l-l.edu>
Subject: Re: skipping files that are not present
Message-Id: <Xns94461276DE8E7asu1cornelledu@132.236.56.8>

Ben Morrow <usenet@morrow.me.uk> wrote in
news:bqjv66$9l$3@wisteria.csv.warwick.ac.uk:

> 
> "A. Sinan Unur" <asu1@c-o-r-n-e-l-l.edu> wrote:
>> "John" <provicon@earthlink.net> wrote in
>> news:t6ezb.25447$sb4.14029@newsread2.news.pas.earthlink.net: 
>> > my $ns="C:\\JapanUSA\\$number.html";
>> 
>> To put it politely, this is utterly stupid. Just process the files 
>> that exist in the directory.
> 
> That ain't necessarily the right answer, if there are other files
> there as well (as is not unlikely in this case).
>
> 
> I suppose you could do something like
> 
>   my @files = sort grep { /^\d+\.html$/ } readdir $DIR;

That is what I had meant in my original post. Sorry for the poor wording. 
However, this question had already been answered here. I do not see the 
point of reposting the same question again.

See:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&th=b2ac44d01efcd76&rnum=1

or

http://tinyurl.com/xhtg

-- 
A. Sinan Unur
asu1@c-o-r-n-e-l-l.edu
Remove dashes for address
Spam bait: mailto:uce@ftc.gov


------------------------------

Date: 03 Dec 2003 08:42:25 +0100
From: Slaven Rezic <slaven@rezic.de>
Subject: Re: skipping files that are not present
Message-Id: <87ekvm1h5q.fsf@vran.herceg.de>

Ben Morrow <usenet@morrow.me.uk> writes:

> "John" <provicon@earthlink.net> wrote:
> 
> > use HTML::TokeParser;
> 
> use Errno;
> 
> > $number = 1;
> > while ($number < 17000) {
> 
> This is a C-style for loop, so make it look like one.
> 
>   for(my $number = 1; $number < 17000; $number++) {
> 
> I would be tempted to say 'make this a Perl-style for loop', viz:
> 
>   for my $number (1..17000) {
> 
> except that I have a horrible suspicion that would actually create the
> whole list and gobble memory: anyone?

No, this is optimized since 5.005 or so.
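
Since the range in a foreach loop is walked without building the whole
list, the loop can cheaply test each candidate name before doing any
work. The sketch below uses the -f file test, which is a different
technique from the ENOENT check quoted further down and was not proposed
in this thread; the sub name existing_pages is hypothetical.

```perl
use strict;
use warnings;

# Return the paths "$dir/N.html", for N in $first..$last, that exist as
# plain files. Gaps in the sequence are skipped silently.
sub existing_pages {
    my ($dir, $first, $last) = @_;

    my @found;
    for my $number ($first .. $last) {     # no temporary 17000-element list
        my $path = "$dir/$number.html";
        next unless -f $path;              # missing number: move on
        push @found, $path;
    }
    return @found;
}
```

Checking first and opening later leaves a small window in which a file
could vanish, so the return value of the later open or parser
constructor still needs to be checked.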

> 
> > my $ns="C:\\JapanUSA\\$number.html";
> 
> Sort out your indentation.
> 
> You don't need to use '\' for pathnames in Perl, even under Win32, so
> don't. (The exception to this is paths that will be given to
> system().)
> 
>       my $ns = "C:/JapanUSA/$number.html";
> 
> > my $stream = HTML::TokeParser->new($ns);
> 
>   unless ($stream) {
>       $!{ENOENT} or die "can't create parser for $ns: $!";
>       next;
>   }
> 
> If you don't fix your loop this will go into an infinite loop, so make
> sure you do.
> 
> > open(OUT, '>>c:\\JapanUSA.txt');
> 
> ... or die "can't open output file: $!";
> 
> *Always* check the return value of open.
> 
> You don't need to keep reopening this: move the open outside the
> loop.
> 
> > while (my $token=$stream->get_token) {
> > 
> > if ($token->[0] eq 'S' and $token->[1] eq 'td' and $token->[4] eq '<td>') {
> > 
> >  $text_string=$stream->get_trimmed_text('/td');
> >  print OUT "$text_string;" ;}
> 
> You shouldn't have a space before ';'.
> You should have a newline before '}' if your BLOCK spans several
> lines.
> Sort out your indentation.
> 
> > if ($token->[0] eq 'E' and $token->[1] eq 'tr') {
> >  print OUT "\n";}
> > }
> > 
> > $number = $number+1;
> 
> Now you've made a proper for loop you don't need this.
> 
> > 
> > }
> 
> Ben
> 

-- 
Slaven Rezic - slaven@rezic.de

    Berlin Perl Mongers - http://berliner.pm.org


------------------------------

Date: Wed, 3 Dec 2003 06:51:34 +0000 (UTC)
From: "Bernard El-Hagin" <bernard.el-haginDODGE_THIS@lido-tech.net>
Subject: Re: Starting Perl Script at Bootup
Message-Id: <Xns94464FA21D578elhber1lidotechnet@62.89.127.66>

Abigail <abigail@abigail.nl> wrote in
news:slrnbsqbjj.eu.abigail@alexandra.abigail.nl: 

> Matt (nospam@yahoo.com) wrote on MMMDCCXLV September MCMXCIII in
> <URL:news:vsnvuj6bl44ge5@corp.supernews.com>:
> ()  I have a perl script I would like to have run every time my server boots up.
> ()  Its running Redhat Linux.  How would I do that?
> 
> 
> You can't. Only C programs can be run when the server boots up.


That's rather incomplete, since only *uncompiled* C programs can be run 
when the server boots up.


-- 
Cheers,
Bernard


------------------------------

Date: Wed, 03 Dec 2003 08:52:55 +0100
From: Josef Möllers <josef.moellers@fujitsu-siemens.com>
Subject: Re: Starting Perl Script at Bootup
Message-Id: <3FCD9657.4ED11E70@fujitsu-siemens.com>

Matt wrote:
> I have a perl script I would like to have run every time my server boots up.
> Its running Redhat Linux.  How would I do that?

It depends on at what stage of the boot process you want to run the
script and whether or not the directories needed are available (i.e.
mounted).

There are several (os-related, as Ted already said) issues regarding
where to put the call of the script, but in general, there's no reason
why it should not work.
You might want to look at the /etc/rc.d directory tree. Just try adding
your script somewhere in that tree and see what happens. There's little
chance your machine will blow up if something goes wrong (unless, of
course, you fiddle around with the connection to the nuclear power plant
your machine controls).

-- 
Josef Möllers (Pinguinpfleger bei FSC)
	If failure had no penalty success would not be a prize
						-- T.  Pratchett


------------------------------

Date: Wed, 03 Dec 2003 15:46:32 +1100
From: Edo <eddGallary2@hotmail.com>
Subject: Re: using GD module
Message-Id: <3FCD6AA8.7040502@hotmail.com>

#!/usr/local/bin/perl
use strict;
use warnings;
use GD::Graph::lines;

my @data = (
             ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
             [    1,    2,    5,    6,    3,  1.5,    1,     3,     4],
             [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ]
             );

my $graph = GD::Graph::lines->new(400, 300);
my $my_graph;
$graph->set(
             x_label           => 'X Label',
             y_label           => 'Y label',
             title             => 'Some simple graph',
             y_max_value       => 8,
             y_tick_number     => 8,
             y_label_skip      => 2
             ) or die $my_graph->error;

my $gd = $my_graph->plot(\@data) or die $my_graph->error;  <--- 23

Can't call method "plot" on an undefined value at prog/graph line 23.



------------------------------

Date: Wed, 3 Dec 2003 04:48:56 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: using GD module
Message-Id: <bqjpvo$q6o$2@wisteria.csv.warwick.ac.uk>


Edo <eddGallary2@hotmail.com> wrote:
<snip>
> my $graph = GD::Graph::lines->new(400, 300);
> my $my_graph;
> $graph->set(
>              x_label           => 'X Label',
>              y_label           => 'Y label',
>              title             => 'Some simple graph',
>              y_max_value       => 8,
>              y_tick_number     => 8,
>              y_label_skip      => 2
>              ) or die $my_graph->error;
> 
> my $gd = $my_graph->plot(\@data) or die $my_graph->error;  <--- 23
> 
> Can't call method "plot" on an undefined value at prog/graph line 23.

Have you actually read that code? Nowhere do you set $my_graph, so
it's hardly surprising it contains undef.

Why did you put the line 'my $my_graph;' in? Was it because you got a
'strict' error about it not being declared? 'strict' is there to help
you find typos and thinkos: don't shut it up by simply putting 'my
$var;' at the top until you have checked that you actually did mean to
have a separate variable.
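
For comparison, a minimal sketch of the same script with a single
variable, $graph, used throughout. The data and options are those from
the quoted post; the output file name graph.png and the final block that
writes the image are assumptions added for illustration, and PNG output
depends on how the local GD library was built.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use GD::Graph::lines;

my @data = (
    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
    [    1,    2,    5,    6,    3,  1.5,    1,     3,     4],
    [ sort { $a <=> $b } (1, 2, 5, 6, 3, 1.5, 1, 3, 4) ],
);

# One object, one name: every later call goes through $graph.
my $graph = GD::Graph::lines->new(400, 300);

$graph->set(
    x_label       => 'X Label',
    y_label       => 'Y label',
    title         => 'Some simple graph',
    y_max_value   => 8,
    y_tick_number => 8,
    y_label_skip  => 2,
) or die $graph->error;

my $gd = $graph->plot(\@data) or die $graph->error;

# Assumed output step: write the rendered image to graph.png.
open(my $img, '>', 'graph.png') or die "can't write graph.png: $!";
binmode $img;
print $img $gd->png;
close($img) or die "can't close graph.png: $!";
```

With 'my $my_graph;' removed, 'strict' would again complain about any
stray use of the other name, which is the check it is there to provide.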

Ben

-- 
Like all men in Babylon I have been a proconsul; like all, a slave ... During
one lunar year, I have been declared invisible; I shrieked and was not heard,
I stole my bread and was not decapitated.
~ ben@morrow.me.uk ~                   Jorge Luis Borges, 'The Babylon Lottery'


------------------------------

Date: Wed, 03 Dec 2003 16:30:34 +1100
From: Edo <eddGallary2@hotmail.com>
Subject: Re: using GD module
Message-Id: <3FCD74FA.4040905@hotmail.com>

Ben Morrow wrote:
> Edo <eddGallary2@hotmail.com> wrote:
> <snip>
> 
>>my $graph = GD::Graph::lines->new(400, 300);
>>my $my_graph;
>>$graph->set(
>>             x_label           => 'X Label',
>>             y_label           => 'Y label',
>>             title             => 'Some simple graph',
>>             y_max_value       => 8,
>>             y_tick_number     => 8,
>>             y_label_skip      => 2
>>             ) or die $my_graph->error;
>>
>>my $gd = $my_graph->plot(\@data) or die $my_graph->error;  <--- 23
>>
>>Can't call method "plot" on an undefined value at prog/graph line 23.
> 
> 
> Have you actually read that code? Nowhere do you set $my_graph, so
> it's hardly surprising it contains undef.
> 
> Why did you put the line 'my $my_graph;' in? Was it because you got a
> 'strict' error about it not being declared? 'strict' is there to help
> you find typos and thinkos: don't shut it up by simply putting 'my
> $var;' at the top until you have checked that you actually did mean to
> have a separate variable.
> 
> Ben
> 

All I am trying to do is follow the example in the GD::Graph man page
exactly as it is presented. It does not mention any declaration of
my_Graph, so I copied and pasted what was there. My understanding of how
modules work in general, and GD in particular, is not very strong, so I
was hoping to learn from the examples in the man pages, but that did not
work in this case. Any help with an explanation is appreciated.

thanks



------------------------------

Date: Wed, 3 Dec 2003 06:13:27 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: using GD module
Message-Id: <bqjuu7$9l$2@wisteria.csv.warwick.ac.uk>


Edo <eddGallary2@hotmail.com> wrote:
> All I am trying to do is follow the example in the GD::Graph man page
> exactly as it is presented. It does not mention any declaration of
> my_Graph, so I copied and pasted what was there.

Never copy/paste code without understanding how it works.

You seem to be rather prone to getting your cases mixed up: it is very
important in Perl that things have the correct case.

> with my not very strong
> understanding of how modules work in general and GD in particular

I think you probably need to learn a little more about Perl and about
programming in general before you try using GD. For instance, what did
you *expect* your line

> >> ...  $my_graph->plot(\@data) ...

to do, given that you hadn't assigned anything to $my_graph?

Ben

-- 
And if you wanna make sense / Whatcha looking at me for?          (Fiona Apple)
                            * ben@morrow.me.uk *


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5886
***************************************

