[30940] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 2185 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Feb 5 18:09:50 2009

Date: Thu, 5 Feb 2009 15:09:11 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 5 Feb 2009     Volume: 11 Number: 2185

Today's topics:
        closures and streams question <cdalten@gmail.com>
    Re: closures and streams question <jimsgibson@gmail.com>
    Re: closures and streams question <smallpond@juno.com>
        CSPEC issue: lossing scope (or incorrect scope) in cspe balldarrens@gmail.com
    Re: error printing page using LWP::Simple <larry@example.invalid>
    Re: error printing page using LWP::Simple <tim@burlyhost.com>
    Re: error printing page using LWP::Simple <brian.helterline@hp.com>
    Re: error printing page using LWP::Simple <tim@burlyhost.com>
    Re: Getting system info using perl <uri@stemsystems.com>
        HTML::TreeBuilder issue <dean.karres@gmail.com>
    Re: HTML::TreeBuilder issue <tadmc@seesig.invalid>
    Re: HTML::TreeBuilder issue <dean.karres@gmail.com>
    Re: HTML::TreeBuilder issue <smallpond@juno.com>
    Re: unable to delete a file <tadmc@seesig.invalid>
    Re: unable to delete a file <cartercc@gmail.com>
    Re: unable to delete a file <tim@burlyhost.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 5 Feb 2009 13:44:17 -0800 (PST)
From: grocery_stocker <cdalten@gmail.com>
Subject: closures and streams question
Message-Id: <eb1db4f1-8e2e-45ea-a459-48ff3dd33079@i20g2000prf.googlegroups.com>

The following code generates even numbers..

[cdalten@localhost oakland]$ more even.pl
#!/usr/bin/perl -w

sub even_number_printer_gen {
    # This function returns a reference to an anon. subroutine.
    # This anon. subroutine prints even numbers starting from $input.
    my($input) = @_;
    if ($input % 2) { $input++};  # Next even number, if the given
                                  # number is odd
    $rs = sub {
        print "$input ";  # Using $input,which is a my variable
                                  # declared in an outside scope
        $input  += 2;
    };
    return $rs;   # Return a reference to the subroutine above
}

my $iterator = even_number_printer_gen(30);
# $iterator now points to a closure.
# Every time you call it, it prints the next successive even number.
for ($i = 0; $i < 10; $i++) {
    &$iterator();
}
print "\n";

[cdalten@localhost oakland]$ ./even.pl
30 32 34 36 38 40 42 44 46 48
[cdalten@localhost oakland]$

The thing I'm not seeing why using clousres would be important in this
case. Can someone enlighten me?


------------------------------

Date: Thu, 05 Feb 2009 14:38:16 -0800
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: closures and streams question
Message-Id: <050220091438164300%jimsgibson@gmail.com>

In article
<eb1db4f1-8e2e-45ea-a459-48ff3dd33079@i20g2000prf.googlegroups.com>,
grocery_stocker <cdalten@gmail.com> wrote:

> The following code generates even numbers..
> 
> [cdalten@localhost oakland]$ more even.pl
> #!/usr/bin/perl -w
> 
> sub even_number_printer_gen {
>     # This function returns a reference to an anon. subroutine.
>     # This anon. subroutine prints even numbers starting from $input.
>     my($input) = @_;
>     if ($input % 2) { $input++};  # Next even number, if the given
>                                   # number is odd
>     $rs = sub {
>         print "$input ";  # Using $input,which is a my variable
>                                   # declared in an outside scope
>         $input  += 2;
>     };
>     return $rs;   # Return a reference to the subroutine above
> }
> 
> my $iterator = even_number_printer_gen(30);
> # $iterator now points to a closure.
> # Every time you call it, it prints the next successive even number.
> for ($i = 0; $i < 10; $i++) {
>     &$iterator();
> }
> print "\n";
> 
> [cdalten@localhost oakland]$ ./even.pl
> 30 32 34 36 38 40 42 44 46 48
> [cdalten@localhost oakland]$
> 
> The thing I'm not seeing why using clousres would be important in this
> case. Can someone enlighten me?

One reason would be so you could define two such generators. Try:

my $iter1 = even_number_printer_gen(30);
my $iter2 = even_number_printer_gen(100);
for (my $i = 0; $i < 10; $i++) {
    &$iter1();
    $iter2->();
}
print "\n";

__OUTPUT__

30 100 32 102 34 104 36 106 38 108 40 110 42 112 44 114 46 116 48 118 

Because the subroutine is defined as a closure, there are two values of
the lexical variable $input defined instead of a single, shared one.

Note the alternate method of calling the second generator function.

-- 
Jim Gibson


------------------------------

Date: Thu, 5 Feb 2009 14:46:12 -0800 (PST)
From: smallpond <smallpond@juno.com>
Subject: Re: closures and streams question
Message-Id: <82d6d0af-60b4-4018-8771-c589bec9cd63@e1g2000pra.googlegroups.com>

On Feb 5, 4:44=A0pm, grocery_stocker <cdal...@gmail.com> wrote:
> The following code generates even numbers..
>
> [cdalten@localhost oakland]$ more even.pl
> #!/usr/bin/perl -w
>
> sub even_number_printer_gen {
> =A0 =A0 # This function returns a reference to an anon. subroutine.
> =A0 =A0 # This anon. subroutine prints even numbers starting from $input.
> =A0 =A0 my($input) =3D @_;
> =A0 =A0 if ($input % 2) { $input++}; =A0# Next even number, if the given
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 # num=
ber is odd
> =A0 =A0 $rs =3D sub {
> =A0 =A0 =A0 =A0 print "$input "; =A0# Using $input,which is a my variable
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 # dec=
lared in an outside scope
> =A0 =A0 =A0 =A0 $input =A0+=3D 2;
> =A0 =A0 };
> =A0 =A0 return $rs; =A0 # Return a reference to the subroutine above
>
> }
>
> my $iterator =3D even_number_printer_gen(30);
> # $iterator now points to a closure.
> # Every time you call it, it prints the next successive even number.
> for ($i =3D 0; $i < 10; $i++) {
> =A0 =A0 &$iterator();}
>
> print "\n";
>
> [cdalten@localhost oakland]$ ./even.pl
> 30 32 34 36 38 40 42 44 46 48
> [cdalten@localhost oakland]$
>
> The thing I'm not seeing why using clousres would be important in this
> case. Can someone enlighten me?


perl 5.10 has static "state" variables, you only need the closure in
5.8.

Why is it important?  Can you be enlightened?  Those aren't perl
questions.  As an exercise, maybe try to access and print the value
of
$input inside your for loop.



------------------------------

Date: Thu, 5 Feb 2009 12:42:54 -0800 (PST)
From: balldarrens@gmail.com
Subject: CSPEC issue: lossing scope (or incorrect scope) in cspec subroutine.
Message-Id: <e23bc7cb-f7f1-46b2-b594-5ae96a46bf42@y1g2000pra.googlegroups.com>

I've attempted to add a global spec to my logging configuration, and
it will not see the scope of which I would like it to.
If possible can you take a quite look at this and let me know if there
is a way to do this.

The general premise of this is that I have a top level package that
starts up my environment (in this case DBALL.pm).
Inisde of this library, I initialize and watch a standardized
configuration file, in which any other libraries are added as
categories.

I have a resource type of factory in this example of which returns a
newly instantiated object (in this case DBALL::TESTOBJECT).
Inside of this object I have set attributes that I would like to be
able to add to every log statement (I have done this via a cspec - of
which is not acting like I would like it to).
The cspec override is stipulated inside of DBALL.pm, and I would like
to reference the calling object (in this case DBALL::TESTOBJECT), but
it sees self as something coming from
Log::Log4perl::Layout::PatternLayout.

I've added a if routine to dump out caller to let me know who's
calling the cspec, and it dumps that out for viewing.

If you are able, can you unzip this and run test.pl to see the output?

I will put them in line here as well, they are rudimentary, but they
can reproduce my problem easily.

Any help that you can provide would be great.

Thanks,
Darren


The log configuration file:

<<<< ./logger.conf >>>>
log4perl.rootLogger = INFO, ScreenApp
log4perl.appender.ScreenApp= Log::Log4perl::Appender::Screen
log4perl.appender.ScreenApp.stderr   = 0
log4perl.appender.ScreenApp.layout   = PatternLayout
log4perl.appender.ScreenApp.layout.ConversionPattern = %d %-5p %-4L %M
%U %m%n
log4perl.appender.ScreenApp.Threshold = TRACE
log4perl.logger.DBALL.RESOURCEFACTORY = INFO,ScreenApp
log4perl.logger.DBALL.TESTOBJECT = INFO, ScreenApp
log4perl.oneMessagePerAppender = 1
<<<< logger.conf >>>>


The test.pl file:

<<<< ./test.pl >>>>
#!/usr/bin/perl -w
use strict;
use DBALL;
my $object = DBALL::RESOURCEFACTORY->new();
print "DONE";
<<<<test.pl >>>>

<<<< ./DBALL.pm >>>>
package DBALL;

use lib "../DBALL";

use Module::Locate qw / locate /;
use File::Basename;
use Log::Log4perl;
use Data::Dumper;

BEGIN {
    use Module::Locate qw / locate /;
    my $location = locate __PACKAGE__;
    my ($name,$path,$suffix) = fileparse($location,"\.pm");
    my $logConf = $path . "logger.conf";
    Log::Log4perl::Layout::PatternLayout::add_global_cspec( 'U',
        sub {
            if(defined($self->{id}) && defined($self->{mgmt_ip})){
                return sprintf(" [%s:%s]", $self->{id}, $self->
{mgmt_ip});
            }else{
                print Dumper(caller(0));
                return " ERROR ";
            }

            } );
    Log::Log4perl->init_and_watch($logConf, 3);
}
require DBALL::RESOURCEFACTORY;
1;
<<<< ./DBALL.pm >>>>


The resource factory file:

<<<< ./DBALL/RESOURCEFACTORY.pm >>>>
package DBALL::RESOURCEFACTORY;
use strict;
use Log::Log4perl;

use lib "./DBALL";

require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw();


sub new {
    my($class, $args) = @_;
    my ($self, $factoryClass);
    $self = bless {}, $class;
    $self->{logger} = Log::Log4perl::get_logger($class);
    $self->{logger}->info("RESOURCE FACTORY INSTANTIATED");
    eval "require DBALL::TESTOBJECT";
    return new DBALL::TESTOBJECT->new($args);
}
<<<< ./DBALL/RESOURCEFACTORY.pm >>>>

and finally the TESTOBJECT

<<<< ./DBALL/TESTOBJECT.pm >>>>
package DBALL::TESTOBJECT;
use strict;
use Log::Log4perl;

sub new {
    my($class, $args) = @_;
    my $self = bless {
        attr1=>"hello",
        attr2=>"world",
        id => "TESTOBJECT",
        mgmt_ip => "1.1.1.1",
        logger => undef,
    },$class;
    $self->{logger} = Log::Log4perl::get_logger($class);
    $self->{logger}->info("REF: " . ref($self));
    $self->{logger}->info("CLASS: $class");
    return $self;
}
1;
<<<< ./DBALL/TESTOBJECT.pm >>>>


------------------------------

Date: Thu, 5 Feb 2009 12:44:04 -0700
From: Larry Gates <larry@example.invalid>
Subject: Re: error printing page using LWP::Simple
Message-Id: <xpbn03lwr03u.1pipjx09r77oy$.dlg@40tude.net>

On Thu, 05 Feb 2009 07:43:31 -0600, Chris Mattern wrote:

> On 2009-02-05, Larry Gates <larry@example.invalid> wrote:
>>
>>  # use strict;
> 
> Why?

That's an example of me trying things to make it work, and as I run out of
guesses, the thing I post is the ugliest version I had.:-(  This is a
proper post:

C:\MinGW\source>perl scraper2.pl
Use of uninitialized value in concatenation (.) or string at scraper2.pl
line 12
 .
t is
C:\MinGW\source>type scraper2.pl
  use strict;
  use warnings;
  use LWP::Simple;

  # load the complete content of the url in question
  # via LWP::Simple::get(...)

my $t = get 'http://www.fourmilab.ch/cgi-
bin/
Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';
# this is line 11
print "t is $t";
#  perl scraper2.pl

C:\MinGW\source>]

Is it not a problem that the url is folded over with newlines?
-- 
larry gates

Even if you aren't in doubt, consider the mental welfare of the person who
has to maintain the code after you, and who will probably put parens in
the wrong place.  -- Larry Wall in the perl man page


------------------------------

Date: Thu, 05 Feb 2009 12:07:06 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: error printing page using LWP::Simple
Message-Id: <XFHil.5256$V25.4058@newsfe24.iad>

Larry Gates wrote:

> On Thu, 05 Feb 2009 07:43:31 -0600, Chris Mattern wrote:
> 
>> On 2009-02-05, Larry Gates <larry@example.invalid> wrote:
>>>
>>>  # use strict;
>> 
>> Why?
> 
> That's an example of me trying things to make it work, and as I run
> out of
> guesses, the thing I post is the ugliest version I had.:-(  This is a
> proper post:
> 
> C:\MinGW\source>perl scraper2.pl
> Use of uninitialized value in concatenation (.) or string at
> scraper2.pl line 12
> .
> t is
> C:\MinGW\source>type scraper2.pl
>   use strict;
>   use warnings;
>   use LWP::Simple;
> 
>   # load the complete content of the url in question
>   # via LWP::Simple::get(...)
> 
> my $t = get 'http://www.fourmilab.ch/cgi-
> bin/
> Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';
> # this is line 11
> print "t is $t";
> #  perl scraper2.pl
> 
> C:\MinGW\source>]
> 
> Is it not a problem that the url is folded over with newlines?


I assumed the new lines were due to you posting with your news client
wrapping the text?  Anyway, as I mentioned before, assign a value to
the variable before you try and assign the data from get().

I.e.,

my $t = 'Something went wrong!';
$t = get 'http://domain-and-url-here';
print "t is $t\n";

Have you tried this yet?  Is your example domain and URL actually
wrapping in your script, or just when you post it here?
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Thu, 05 Feb 2009 12:46:39 -0800
From: Brian Helterline <brian.helterline@hp.com>
Subject: Re: error printing page using LWP::Simple
Message-Id: <gmfj7g$g4g$1@usenet01.boi.hp.com>

Tim Greer wrote:
> Larry Gates wrote:
> 
>> On Thu, 05 Feb 2009 07:43:31 -0600, Chris Mattern wrote:
>>
>>> On 2009-02-05, Larry Gates <larry@example.invalid> wrote:
>>>>  # use strict;
>>> Why?
>> That's an example of me trying things to make it work, and as I run
>> out of
>> guesses, the thing I post is the ugliest version I had.:-(  This is a
>> proper post:
>>
>> C:\MinGW\source>perl scraper2.pl
>> Use of uninitialized value in concatenation (.) or string at
>> scraper2.pl line 12
>> .
>> t is
>> C:\MinGW\source>type scraper2.pl
>>   use strict;
>>   use warnings;
>>   use LWP::Simple;
>>
>>   # load the complete content of the url in question
>>   # via LWP::Simple::get(...)
>>
>> my $t = get 'http://www.fourmilab.ch/cgi-
>> bin/
>> Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';
>> # this is line 11
>> print "t is $t";
>> #  perl scraper2.pl
>>
>> C:\MinGW\source>]
>>
>> Is it not a problem that the url is folded over with newlines?
> 
> 
> I assumed the new lines were due to you posting with your news client
> wrapping the text?  Anyway, as I mentioned before, assign a value to
> the variable before you try and assign the data from get().
> 
> I.e.,
> 
> my $t = 'Something went wrong!';
> $t = get 'http://domain-and-url-here';

according to the docs, get returns undef on error so $t will be assigned 
  for all cases.  I think you meant:

my $t = get 'http://domain-and-url-here' or 'Something went wrong!';

-- 
brian


------------------------------

Date: Thu, 05 Feb 2009 13:11:02 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: error printing page using LWP::Simple
Message-Id: <MBIil.7969$Tq3.1163@newsfe14.iad>

Brian Helterline wrote:

> Tim Greer wrote:
>> Larry Gates wrote:
>> 
>>> On Thu, 05 Feb 2009 07:43:31 -0600, Chris Mattern wrote:
>>>
>>>> On 2009-02-05, Larry Gates <larry@example.invalid> wrote:
>>>>>  # use strict;
>>>> Why?
>>> That's an example of me trying things to make it work, and as I run
>>> out of
>>> guesses, the thing I post is the ugliest version I had.:-(  This is
>>> a proper post:
>>>
>>> C:\MinGW\source>perl scraper2.pl
>>> Use of uninitialized value in concatenation (.) or string at
>>> scraper2.pl line 12
>>> .
>>> t is
>>> C:\MinGW\source>type scraper2.pl
>>>   use strict;
>>>   use warnings;
>>>   use LWP::Simple;
>>>
>>>   # load the complete content of the url in question
>>>   # via LWP::Simple::get(...)
>>>
>>> my $t = get 'http://www.fourmilab.ch/cgi-
>>> bin/
>>> Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';
>>> # this is line 11
>>> print "t is $t";
>>> #  perl scraper2.pl
>>>
>>> C:\MinGW\source>]
>>>
>>> Is it not a problem that the url is folded over with newlines?
>> 
>> 
>> I assumed the new lines were due to you posting with your news client
>> wrapping the text?  Anyway, as I mentioned before, assign a value to
>> the variable before you try and assign the data from get().
>> 
>> I.e.,
>> 
>> my $t = 'Something went wrong!';
>> $t = get 'http://domain-and-url-here';
> 
> according to the docs, get returns undef on error so $t will be
> assigned
>   for all cases.  I think you meant:
> 
> my $t = get 'http://domain-and-url-here' or 'Something went wrong!';
> 

This is why I suggested defining it and assigning a value first ($t =
"Someting went wrong!", so if it fails, it will show that, but I think
both methods would fail, actually.  You want my $t = (get 'http://url'
or 'failure'); by the looks of it.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Thu, 05 Feb 2009 16:22:29 -0500
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Getting system info using perl
Message-Id: <x7fxis37fu.fsf@mail.sysarch.com>

>>>>> "sh" == saurabh hirani <saurabh.hirani@gmail.com> writes:

  sh> Forking df and getting the time:

  sh> Total Elapsed Time = 42.04899 Seconds
  sh>   User+System Time = 5.668994 Seconds
  sh> Inclusive Times
  sh> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
  sh>  96.7   5.485  5.485   1000   0.0055 0.0055  main::get_freequeue


  sh> Using df from the module
  sh> Total Elapsed Time = 0.618107 Seconds
  sh>   User+System Time = 0.768107 Seconds
  sh> Inclusive Times
  sh> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
  sh>  92.1   0.013  0.708   1000   0.0000 0.0007  main::get_freequeue
  sh>  90.4   0.013  0.695   1000   0.0000 0.0007  Filesys::Df::df
  sh>  88.7   0.682  0.682   1000   0.0007 0.0007  Filesys::Df::_df

  sh> The difference is staggering.

this is not news. a single system call with some perl wrapping will
always blow away ANY fork. fork is a very slow system call and it uses
tons of resources (ram AND cpu). we always say to newbies to stop
calling backticks and such when modules or perl builtins can do it for
you. we see newbies doing `date ...` vs strftime so often.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Thu, 5 Feb 2009 12:24:25 -0800 (PST)
From: Dean Karres <dean.karres@gmail.com>
Subject: HTML::TreeBuilder issue
Message-Id: <adc061b9-32fc-4daa-80c0-af681828b08d@w39g2000prb.googlegroups.com>

Hi I posted something similar to this over in comp.lang.perl.modules
but 1) it looks like that group is not very active and 2) it does not
look like TreeBuilder is being actively maintained -- CPAN has open
bug reports on TreeBuilder that are over a year old.

So what i want to do is harvest some info from a lot of web pages on
my site.  I want to grab the first H1 tag in the file and then, if
there are H2 tags that follow the H1, I want the "sub"-H2s.  If there
are more H1s or if there are H2s that precede the first H1 or follow
any secondary H1 then they should be ignored.

There are some outstanding issues with the existence and location of
"A" tags either inside or surrounding the H2 tags but I don't think
that has anything to do with my problem.

My problem comes from the sample html file:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://
www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html;
charset=utf-8" />
    <title>thing</title>
    <link href="index.css" rel="stylesheet" type="text/css" />
  </head>

  <body>
    <h1>A Title</h1>

    <a name="foo"></a><h2>Nav topic 1</h2>

    <h2><a name="foo2"></a>Nav topic 2</h2>

    <h2><a name="foo3">Nav topic 3</a</h2>

    <h2><a name="foo4"><a href="http://www.harvard.edu">Nav topic 4</
a></a></h2>

    <h2><a href="http://www.mit.edu"><a name="foo5">Nav topic 5</a></
a></h2>

    <a name="foo"><h2>Nav topic 6</h2></a>

    <a href="http://www.usc.edu"> <a name="foo"> <h2>Nav topic 7</h2>
</a> </a>

  </body>
</html>


This was just a trivial test case to see if I was using TreeBuilder
correctly.  If I have 0-6 H2s then the scriot works as I expect it
to.  If there are 7+ H2s the script fails on the 7th H2 with the
message:

    Can't call method "look_down" without a package or object
reference at /my/script.cgi line 88

I have marked line 88 below in a comment.

A thing I just noticed is, if I insert a P tag and paragraph between
the 6th and 7th H2 then the whole thing works.

This sure feels like a bug.

Anyway, the code segment is below.  The evals were put in during a
frustrating few minutes yesterday:


eval { $body = $tree->look_down('_tag', 'body'); };
    die __LINE__ . ": " . $@ if $@;

die "$ARGV[0] is missing a BODY tag\n" if (! $body);

eval { @bodyElementList = $body->content_list(); };
    die __LINE__ . ": " . $@ if $@;

for (my $i = 0; $i <= $#bodyElementList; $i++)
{
    if (! $firstH1)
    {
        eval { $H1 = $bodyElementList[$i]->look_down('_tag', 'h1'); };
            die __LINE__ . ": " . $@ if $@;

        if ($H1)
        {
            print STDOUT "<li><a href=\"\">" . $H1->as_trimmed_text
() .
                         "</a></li>\n";

            $firstH1++;
        }
    }
    else
    {
        #
        # LINE 88 below
        #
        eval { $H2 = $bodyElementList[$i]->look_down('_tag', 'h2'); };
            die __LINE__ . ": " . $@ if $@;

        if ($H2)
        {
            if ($h2Count == 0)
            {
                print STDOUT "<ul>\n";
            }

            if ($H2->is_inside('a'))
            {
                if (($A = $H2->look_up('_tag', 'a', 'href', qr/.+/)))
                {
                    print STDOUT "  <li><a href=\"" . $A->attr
('href') .
                                 "\">" .  $H2->as_trimmed_text() .
                                 "</a></li>\n";
                }
                elsif (($A = $H2->look_up('_tag', 'a', 'name', qr/.
+/)))
                {
                    print STDOUT "  <li><a href=\"$ARGV[0]/#" .
                                 $A->attr('name') . "\">" .
                                 $H2->as_trimmed_text() . "</a></li>
\n";
                }
                else
                {
                    print STDOUT "  <li>" . $H2->as_trimmed_text() .
                                 "</li>\n";
                }
            }
            elsif ($H2->look_down('_tag', 'a'))
            {
                if (($A = $H2->look_down('_tag', 'a', 'href', qr/.
+/)))
                {
                    print STDOUT "  <li><a href=\"" . $A->attr
('href') .
                                 "\">" .  $H2->as_trimmed_text() .
                                 "</a></li>\n";
                }
                elsif (($A = $H2->look_down('_tag', 'a',
                                            'name', qr/.+/)))
                {
                    print STDOUT "  <li><a href=\"$ARGV[0]/#" .
                                 $A->attr('name') . "\">" .
                                 $H2->as_trimmed_text() . "</a></li>
\n";
                }
                else
                {
                    print STDOUT "  <li>" . $H2->as_trimmed_text() .
                                 "</li>\n";
                }
            }
            else
            {
                print STDOUT "  <li>" . $H2->as_trimmed_text() .
                             "</li>\n";
            }

            $h2Count++;
        }
    }
}

if ($h2Count)
{
    print STDOUT "</ul>\n";
}

$tree->delete();

exit(0);




------------------------------

Date: Thu, 5 Feb 2009 15:19:07 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: HTML::TreeBuilder issue
Message-Id: <slrngomlub.d4g.tadmc@tadmc30.sbcglobal.net>

Dean Karres <dean.karres@gmail.com> wrote:

> Hi I posted something similar to this over in comp.lang.perl.modules


I saw it there, but thought that you had made it too hard to help,
so I moved on.

Have you seen the Posting Guidelines that are posted here frequently?

They contain many tips that make it more likely that someone will
take the time to figure out what is going on, like make a short
and complete program that we can run, use __DATA__ to supply file
contents, use warnings, use strict, etc...

It is often a Good Idea to try and make the smallest program
possible that will still produce the problem you need help with.

I'll do that much for you at least:

---------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_file(*DATA);

my $body = eval { $tree->look_down('_tag', 'body'); };
    die __LINE__ . ": " . $@ if $@;

die "missing a BODY tag\n" unless $body;

my @bodyElementList = eval { $body->content_list(); };
    die __LINE__ . ": " . $@ if $@;


foreach my $i ( 0 .. $#bodyElementList )
{
    warn "i=$i\n";
    my $H2 = eval { $bodyElementList[$i]->look_down('_tag', 'h2'); };
            die __LINE__ . ": " . $@ if $@;

}

__DATA__
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://
www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html;
charset=utf-8" />
    <title>thing</title>
    <link href="index.css" rel="stylesheet" type="text/css" />
  </head>

  <body>
    <h1>A Title</h1>

    <a name="foo"></a><h2>Nav topic 1</h2>

    <h2><a name="foo2"></a>Nav topic 2</h2>

    <h2><a name="foo3">Nav topic 3</a</h2>

    <h2><a name="foo4"><a href="http://www.harvard.edu">Nav topic 4</
a></a></h2>

    <h2><a href="http://www.mit.edu"><a name="foo5">Nav topic 5</a></
a></h2>

    <a name="foo"><h2>Nav topic 6</h2></a>

    <a href="http://www.usc.edu"> <a name="foo"> <h2>Nav topic 7</h2>
</a> </a>

  </body>
</html>
---------------------------------



-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Thu, 5 Feb 2009 13:58:31 -0800 (PST)
From: Dean Karres <dean.karres@gmail.com>
Subject: Re: HTML::TreeBuilder issue
Message-Id: <0bf2d061-dd3a-4a95-b766-b29a590bb424@t26g2000prh.googlegroups.com>

Oops.  I apologize. I am rarely in these groups plus I have poor
vision so except for doing a general search for my issue before
posting I did not see the guidelines.  Thank you for reformatting the
question.

All the best,
Dean...K...


------------------------------

Date: Thu, 5 Feb 2009 14:55:05 -0800 (PST)
From: smallpond <smallpond@juno.com>
Subject: Re: HTML::TreeBuilder issue
Message-Id: <96e0a886-3837-4798-bcff-652c445cca85@p2g2000prn.googlegroups.com>

On Feb 5, 3:24=A0pm, Dean Karres <dean.kar...@gmail.com> wrote:
> Hi I posted something similar to this over in comp.lang.perl.modules
> but 1) it looks like that group is not very active and 2) it does not
> look like TreeBuilder is being actively maintained -- CPAN has open
> bug reports on TreeBuilder that are over a year old.
>
> So what i want to do is harvest some info from a lot of web pages on
> my site. =A0I want to grab the first H1 tag in the file and then, if
> there are H2 tags that follow the H1, I want the "sub"-H2s. =A0If there
> are more H1s or if there are H2s that precede the first H1 or follow
> any secondary H1 then they should be ignored.
>
> There are some outstanding issues with the existence and location of
> "A" tags either inside or surrounding the H2 tags but I don't think
> that has anything to do with my problem.
>
> My problem comes from the sample html file:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://ww=
w.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns=3D"http://www.w3.org/1999/xhtml" lang=3D"en" xml:lang=3D"en">
> =A0 <head>
> =A0 =A0 <meta http-equiv=3D"Content-Type" content=3D"text/html;
> charset=3Dutf-8" />
> =A0 =A0 <title>thing</title>
> =A0 =A0 <link href=3D"index.css" rel=3D"stylesheet" type=3D"text/css" />
> =A0 </head>
>
> =A0 <body>
> =A0 =A0 <h1>A Title</h1>
>
> =A0 =A0 <a name=3D"foo"></a><h2>Nav topic 1</h2>
>
> =A0 =A0 <h2><a name=3D"foo2"></a>Nav topic 2</h2>
>
> =A0 =A0 <h2><a name=3D"foo3">Nav topic 3</a</h2>
>
> =A0 =A0 <h2><a name=3D"foo4"><a href=3D"http://www.harvard.edu">Nav topic=
 4</
> a></a></h2>
>
> =A0 =A0 <h2><a href=3D"http://www.mit.edu"><a name=3D"foo5">Nav topic 5</=
a></
> a></h2>
>
> =A0 =A0 <a name=3D"foo"><h2>Nav topic 6</h2></a>
>
> =A0 =A0 <a href=3D"http://www.usc.edu"> <a name=3D"foo"> <h2>Nav topic 7<=
/h2>
> </a> </a>
>
> =A0 </body>
> </html>
>

Have you run this through validator?  Maybe the module has a problem
with
malformed html.


------------------------------

Date: Thu, 5 Feb 2009 13:08:57 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: unable to delete a file
Message-Id: <slrngomea9.bp1.tadmc@tadmc30.sbcglobal.net>

cartercc <cartercc@gmail.com> wrote:
> On Feb 5, 12:48 pm, Tad J McClellan <ta...@seesig.invalid> wrote:
>> cartercc <carte...@gmail.com> wrote:
>> > 5 unlink $file;                  # does not delete EAUSR.txt
>>
>> Ask it why it failed by checking the return value and including $!
>>
>>    unlink $file or die "could not delete '$file' $!";


> Funny. I just tried this, and unlink deleted the EAUSR file just like
> it was supposed to. Did not get any error message at all.
>
> Must be the McClellan hex -- die or else!


You have the order of the operands reversed. It's more like:

   Do what I ask or die!

:-)


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Thu, 5 Feb 2009 11:38:02 -0800 (PST)
From: cartercc <cartercc@gmail.com>
Subject: Re: unable to delete a file
Message-Id: <2a2893ba-e3b9-4c5a-8b02-caac6d02d6b0@r41g2000prr.googlegroups.com>

On Feb 5, 2:08=A0pm, Tad J McClellan <ta...@seesig.invalid> wrote:
> You have the order of the operands reversed. It's more like:
>
> =A0 =A0Do what I ask or die!
>
> :-)

Yeah, but it's funnier the way I wrote it, as if dying is the best
option, not the worst. What other options do you have if the first one
is die?

Your way is the same as 'your money or your life' which is more of a
dog-bites-man story rather than a man-bites-dog story.


------------------------------

Date: Thu, 05 Feb 2009 11:41:03 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: unable to delete a file
Message-Id: <jhHil.10170$1r4.3594@newsfe04.iad>

cartercc wrote:

> On Feb 5, 2:08 pm, Tad J McClellan <ta...@seesig.invalid> wrote:
>> You have the order of the operands reversed. It's more like:
>>
>> Do what I ask or die!
>>
>> :-)
> 
> Yeah, but it's funnier the way I wrote it, as if dying is the best
> option, not the worst. What other options do you have if the first one
> is die?

So, don't use die, use something else relevant and helpful to report
failures (or log them), or whatever it is you want to do in said
situation.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2185
***************************************


home help back first fref pref prev next nref lref last post