[16160] in Perl-Users-Digest


Perl-Users Digest, Issue: 3572 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jul 10 17:08:14 2000

Date: Mon, 10 Jul 2000 14:08:00 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <963263280-v9-i3572@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Mon, 10 Jul 2000     Volume: 9 Number: 3572

Today's topics:
        howto split at a "\" or "\\" in a string?? <ditlew@abk.auc.dk>
    Re: howto split at a "\" or "\\" in a string?? (Bernard El-Hagin)
    Re: howto split at a "\" or "\\" in a string?? <fabascal@gredos.cnb.uam.es>
    Re: howto split at a "\" or "\\" in a string?? (Abigail)
    Re: howto split at a "\" or "\\" in a string?? (Craig Berry)
    Re: howto split at a "\" or "\\" in a string?? (Bernard El-Hagin)
        HP-UX System architecture (H. Merijn Brand)
    Re: HP-UX System architecture <care227@attglobal.net>
        RE: HP-UX System architecture (H. Merijn Brand)
    Re: HP-UX System architecture <gellyfish@gellyfish.com>
    Re: HP-UX System architecture <care227@attglobal.net>
        HTML TokeParser question (sync24)
    Re: HTML TokeParser question (Bernard El-Hagin)
    Re: HTML TokeParser question (Bernard El-Hagin)
    Re: HTML TokeParser question <thunderbear@bigfoot.com>
        HTTP Last-Modified header not always returned <webmaster@archiTacTic.com>
    Re: HTTP Last-Modified header not always returned (Abigail)
    Re: HTTP Last-Modified header not always returned <webmaster@archiTacTic.com>
    Re: HTTP Last-Modified header not always returned (Decklin Foster)
    Re: HTTP Last-Modified header not always returned <flavell@mail.cern.ch>
    Re: HTTP Last-Modified header not always returned (brian d foy)
    Re: HTTP Last-Modified header not always returned (Decklin Foster)
    Re: HTTP Last-Modified header not always returned (Marcel Grunauer)
    Re: HTTP Last-Modified header not always returned <webmaster@archiTacTic.com>
    Re: HTTP Last-Modified header not always returned <uri@sysarch.com>
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 05 Jul 2000 13:02:37 GMT
From: "Daniel Ditlew" <ditlew@abk.auc.dk>
Subject: howto split at a "\" or "\\" in a string??
Message-Id: <NlG85.6065$Um.63443@twister.sunsite.auc.dk>

I have the following string:
"players\2\allplayers\2\\name\Nepzali\where\Channel
Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\"
I want to split it at every "\" and "\\", but I can't find a way to do this (I
am a newbie). I have looked for help on www.perl.com but couldn't find
anything useful.

- Ditlew, Denmark




------------------------------

Date: Wed, 05 Jul 2000 13:22:33 GMT
From: bernard.el-hagin@lido-tech.net (Bernard El-Hagin)
Subject: Re: howto split at a "\" or "\\" in a string??
Message-Id: <slrn8m6dcc.iha.bernard.el-hagin@gdndev25.lido-tech>

On Wed, 05 Jul 2000 13:02:37 GMT, Daniel Ditlew <ditlew@abk.auc.dk> wrote:
>I have the following string:
>"players\2\allplayers\2\\name\Nepzali\where\Channel
>Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\"
>I want to split it at every "\" and "\\", but I can't find a way to do this (I
>am a newbie). I have looked for help on www.perl.com but couldn't find
>anything useful.

#!/usr/bin/perl -w
use strict;

my $string = q!players\2\allplayers\2\\name\Nepzali\where\Channel
Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\\!;

my @result = split (/\\|\\\\/, $string);


Note the extra "\" at the end of $string (if left out, the single "\"
would escape the closing "!" and then you'd get a "Can't find string
terminator" error).

Bernard
--
perl -e '${qq=\x22=}=qq=\053=;$_="BeJUST_ANOTHERnaPERL_HACKERd\n";
${qq=\x2c=}=qq=\x72=;print split /[AC-Z_]$"/;'


------------------------------

Date: Wed, 05 Jul 2000 16:44:00 +0200
From: Federico Abascal <fabascal@gredos.cnb.uam.es>
Subject: Re: howto split at a "\" or "\\" in a string??
Message-Id: <396349AF.4768D8DF@gredos.cnb.uam.es>

Daniel Ditlew wrote:

> I have the following string:
> "players\2\allplayers\2\\name\Nepzali\where\Channel
> Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\"
> I want to split it at every "\" and "\\", but I can't find a way to do this

Hello, Try this:
# Single quotes: \\ yields one backslash, so \\\\ yields two.
$string =
'players\2\allplayers\2\\\\name\Nepzali\where\Channel Graveyard-1\\\\name\JayBurns\where\Channel Graveyard-1\\';
@array = split(/\\+/, $string);
I think this will work (when you put '+' it means one or more)

Greetings
Federico



------------------------------

Date: 05 Jul 2000 11:54:36 EDT
From: abigail@delanet.com (Abigail)
Subject: Re: howto split at a "\" or "\\" in a string??
Message-Id: <slrn8m6njh.ibb.abigail@alexandra.delanet.com>

Bernard El-Hagin (bernard.el-hagin@lido-tech.net) wrote on MMD September
MCMXCIII in <URL:news:slrn8m6dcc.iha.bernard.el-hagin@gdndev25.lido-tech>:
## On Wed, 05 Jul 2000 13:02:37 GMT, Daniel Ditlew <ditlew@abk.auc.dk> wrote:
## >I have the following string:
## >"players\2\allplayers\2\\name\Nepzali\where\Channel
## >Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\"
## >I want to split it at every "\" and "\\", but I can't find a way to do this (I
## >am a newbie). I have looked for help on www.perl.com but couldn't find
## >anything useful.
## 
## #!/usr/bin/perl -w
## use strict;
## 
## my $string = q!players\2\allplayers\2\\name\Nepzali\where\Channel
## Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\\!;

This does *not* contain a single occurrence of a double \'s! \\ inside
a single quoted string is a \. To get \\, you have to use \\\\.
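A minimal sketch of that quoting rule, using made-up strings rather than the
poster's data ($one, $two, $kept and count_backslashes are illustrative names
only, not anything from the thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the literal backslashes in a string.
sub count_backslashes {
    my ($s) = @_;
    return ($s =~ tr/\\//);
}

# Inside q!...! (or '...'), \\ collapses to a single backslash.
my $one  = q!a\\b!;      # a, one backslash, b
my $two  = q!a\\\\b!;    # a, two backslashes, b
# A backslash before an ordinary character is kept as it is.
my $kept = q!a\b!;       # a, one backslash, b

for my $pair ([ one => $one ], [ two => $two ], [ kept => $kept ]) {
    my ($name, $value) = @$pair;
    printf "%-5s length %d, backslashes %d\n",
        $name, length($value), count_backslashes($value);
}
```

So $one and $kept hold the same three characters, and only $two really
contains a double backslash.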

## my @result = split (/\\|\\\\/, $string);

That would only match on single \'s. Double \'s get split twice.
The part after | will never match, as the first clause is a prefix.

    split /\\\\?/ => $string

should do though.



Abigail
-- 
sub J::FETCH{Just   }$_.='print+"@{[map';sub J::TIESCALAR{bless\my$J,J}
sub A::FETCH{Another}$_.='{tie my($x),$';sub A::TIESCALAR{bless\my$A,A}
sub P::FETCH{Perl   }$_.='_;$x}qw/J A P';sub P::TIESCALAR{bless\my$P,P}
sub H::FETCH{Hacker }$_.=' H/]}\n"';eval;sub H::TIESCALAR{bless\my$H,H}


------------------------------

Date: Wed, 05 Jul 2000 23:14:15 GMT
From: cberry@cinenet.net (Craig Berry)
Subject: Re: howto split at a "\" or "\\" in a string??
Message-Id: <sm7ga7ogas87@corp.supernews.com>

Bernard El-Hagin (bernard.el-hagin@lido-tech.net) wrote:
: my @result = split (/\\|\\\\/, $string);

That won't work right as written; the first alternative will always match
anything the second will, and thus the second never gets used.  So you'll
end up with spurious empty fields generated each time \\ occurs.  If you
turn it around to /\\\\|\\/ this problem is solved.  Still, it seems more
direct and readable to do /\\{1,2}/.
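A small sketch comparing the patterns discussed in this thread. The sample
string and the labels are assumptions made for illustration; it holds one
single and one double backslash as separators:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: a, \, b, \\, c (q!! needs \\\\ to produce two backslashes).
my $string = q!a\b\\\\c!;

my @patterns = (
    [ 'single first', qr/\\|\\\\/  ],   # second alternative is never reached
    [ 'double first', qr/\\\\|\\/  ],   # longer alternative is tried first
    [ 'optional',     qr/\\\\?/    ],   # one backslash, optionally a second
    [ 'quantified',   qr/\\{1,2}/  ],   # one or two backslashes
);

for my $entry (@patterns) {
    my ($name, $re) = @$entry;
    my @fields = split $re, $string;
    printf "%-12s %d fields: %s\n",
        $name, scalar @fields, join '', map { "[$_]" } @fields;
}
```

Only the 'single first' pattern produces an empty field between "b" and "c";
the other three split the double backslash as one separator.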

-- 
   |   Craig Berry - http://www.cinenet.net/users/cberry/home.html
 --*--  "Beauty and strength, leaping laughter and delicious
   |   languor, force and fire, are of us." - Liber AL II:20


------------------------------

Date: Thu, 06 Jul 2000 06:46:23 GMT
From: bernard.el-hagin@lido-tech.net (Bernard El-Hagin)
Subject: Re: howto split at a "\" or "\\" in a string??
Message-Id: <slrn8m8ahh.iha.bernard.el-hagin@gdndev25.lido-tech>

On 05 Jul 2000 11:54:36 EDT, Abigail <abigail@delanet.com> wrote:
>Bernard El-Hagin (bernard.el-hagin@lido-tech.net) wrote on MMD September
>MCMXCIII in <URL:news:slrn8m6dcc.iha.bernard.el-hagin@gdndev25.lido-tech>:
>## On Wed, 05 Jul 2000 13:02:37 GMT, Daniel Ditlew <ditlew@abk.auc.dk> wrote:
>## >I have the following string:
>## >"players\2\allplayers\2\\name\Nepzali\where\Channel
>## >Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\"
>## >I want to split it at every "\" and "\\", but I can't find a way to do this (I
>## >am a newbie). I have looked for help on www.perl.com but couldn't find
>## >anything useful.
>## 
>## #!/usr/bin/perl -w
>## use strict;
>## 
>## my $string = q!players\2\allplayers\2\\name\Nepzali\where\Channel
>## Graveyard-1\\name\JayBurns\where\Channel Graveyard-1\\!;
>
>This does *not* contain a single occurrence of a double \'s! \\ inside
>a single quoted string is a \. To get \\, you have to use \\\\.
>
>## my @result = split (/\\|\\\\/, $string);
>
>That would only match on single \'s. Double \'s get split twice.
>The part after | will never match, as the first clause is a prefix.

A very interesting trap I fell into. First, I had only single \'s in my
input string but *thought* I also had double \'s, and second, I matched
only single \'s but *thought* I also matched double \'s. The resulting
output was correct, but achieved incorrectly. :-) 

>    split /\\\\?/ => $string
>
>should do though.

Yep.

Bernard
--
perl -e '${qq=\x22=}=qq=\053=;$_="BeJUST_ANOTHERnaPERL_HACKERd\n";
${qq=\x2c=}=qq=\x72=;print split /[AC-Z_]$"/;'


------------------------------

Date: Mon, 3 Jul 00 18:28:06 +0200
From: h.m.brand@hccnet.nl (H. Merijn Brand)
Subject: HP-UX System architecture
Message-Id: <8F66B12B4Merijn@192.0.1.90>

For those who wonder how their system is put together and what's in it, the 
following perl script might help. It's able to show the system's 
architecture in a similar way to how xstm would, but this is printable 
(change line #114) and saveable.

Prerequisites:

    	HP-UX 10.20 or HP-UX 11.00 :-)

    	perl 5    	    	(I use 5.005.03 and/or 5.6.0)
    	Tk    	    	(I use 800-022, cannot guarantee anything older)
    	Tk::TreeGraph     (I use 1.023)
    	Graph::Directed   (I use 0.201)
    	Heap    	    	(I use 0.50, used in Graph::Directed)

    	access to cstm (might require root permissions)

Script is * AS IS *. No guarantee it'll give you the information you 
expected to get. If supported by your version of DIAGNOSTICS software (cstm 
& xstm), you're able to see the system type, the amount of memory, the CPU 
type, all hardware including the size of the drives. HTH.

--8<---  stm.pl
#!/pro/bin/perl -w

use strict;

$ENV{LINES} = 99999;

my (@map, @inf, $host, $rc);
print STDERR "STM ...";
if (@ARGV && -f $ARGV[0]) {
    open STM, "< $ARGV[0]";
    $rc = 0;
    }
else {
    $host = $ENV{HOST};
    $rc = "/tmp/stm.$$.rc";

    open  TMP, "> $rc";
    print TMP "sall\ninfo\nwait\nil\ndone\nmap\nquit\n\n";
    close TMP;

    open STM, "cstm < $rc |";
    open INF, "> /tmp/$host.stm";
    }
while (<STM>) {
    m/^cstm>il$/  .. m/^cstm>map$/  and push @inf => $_;
    m/^cstm>map$/ .. m/^cstm>quit$/ and push @map => $_;
    $rc and print INF;
    }
close STM;
if ($rc) {
    close INF;
    unlink $rc;
    }
else {
    ($host) = ($map[1] =~ m/(\S+)/);
    }
print STDERR "\nMAP ...";

my %map;
my @idx;
foreach (@map) {
    my ($idx, $hwp, $desc) = (m/^\*?\s*(\d+)\s*(\S+)\s+(.*?)(?= Information|$)/) or next;
    $map{$hwp}{idx} = $idx;
    $map{$hwp}{dsc} = $desc;
    $map{$hwp}{inf} = "";
    $idx[$idx]      = $hwp;
    }

my ($hwp, $pid) = ("", "");
$idx[0] = $host;
@{$map{""}   }{qw(idx hwp dsc inf)} = (0, "", "",       "");
@{$map{$host}}{qw(idx hwp dsc inf)} = (0, "", "System", "System");
foreach (@inf) {
    m/^-- Information Tool Log for (.*) on path (\S+) --$/ and
	$map{$hwp = $2}{inf} = $pid = $1;
    m/^\s+HPUX Model Number.*:\s+(.*)/ and $map{$host}{inf} = $1;
    m/^\s+HPUX Model String.*:\s+(.*)/ and $map{"system"}{inf} = $1;

    m/^Vendor:\s*(\S+)/ and do {
	my $x = $1;
	$map{$hwp}{inf} =~ s/\n/\n$x\n/;
	};
    m/^Product ID:\s*(.*\S)\s*Hardware Path:\s*(.*\S)/ and
	$map{$hwp}{inf} .= "\n$1";
    m/^Product Id:\s*(.*\S)\s*Vendor:\s*(.*\S)/ and
	$map{$hwp}{inf} .= "\n$2\n$1";
    m/^Hardware Model:\s*(?:0x)?(\S+)/ && $pid ne "CPU" and
	$map{$hwp}{inf} .= "\n$1";
    m/^Capacity(?:\s+\((.*)\))?:\s+(?!N\/A)(\S+)/ and
	$map{$hwp}{inf} .= "\n$2 $1";
    m/^\s+Total Configured Memory\s*:\s*(\d.*)/ and
	$map{$hwp}{inf} .= "\n$1";
    m/^\s+(PA.*) CPU Module\s+(.*)/ and
	$map{$hwp}{inf} .= "\n$1\nRev $2";
    }
print STDERR "\nWIN ...";

use Tk;
use Tk::TreeGraph;
use Graph::Directed;

my $mw = MainWindow->new ();
$mw->geometry ("650x780");
my $f   = $mw->Frame ()->pack (-side => "top", -fill => "x", -expand => 0);
my $tg  = $mw->Scrolled ("TreeGraph")->pack (-expand => 1, -fill => "both");
my $tgf = $tg->Subwidget ("scrolled");
$tgf->configure (
    -shortcutColor => "GreenYellow",
    -shortcutStyle => "spline",
    );

my $save = sub {
    my $file = shift;
    my @a = $tg->bbox ("all");
    my ($w, $h) = map { $_ + 10 } @a[2,3];
    my @p = map { ($_ - 20)."m" } (210, 297);	# Add marges
    my $r = $w > $h ? 90 : 0;
    $tgf->postscript (
	-file       => $file,
	-pagewidth  => $r ? $p[1] : $p[0],
	-pageheight => $r ? $p[0] : $p[1],
	-width      => $w,
	-height     => $h,
	-rotate     => $r,
	);
    };
$f->Button (-text => "Exit", -command => \&exit)->pack (-side => "left");
$f->Button (-text => "Save", -command => sub {
    &$save ("appPath.ps");
    })->pack (-side => "left");
$f->Button (-text => "Print", -command => sub {
    my $file = "/tmp/u.$<";
    &$save ($file);
    system "lp", "-s", "-dxerox", "-ob", $file;
    unlink $file;
    })->pack (-side => "left");
$tg->addLabel (text => $host);
print STDERR "\nBLD ...";

my $g = Graph::Directed->new ();
foreach my $idx (0 .. $#idx) {
    my $hwp = $idx[$idx];
    if ($map{$hwp}{inf} eq "") {
	($map{$hwp}{inf} = $map{$hwp}{dsc}) =~ s/\s*\((.*)//;
	my $x = $1;
	$x =~ s/\).*// and $map{$hwp}{inf} .= "\n$x";
	}
    $map{$hwp}{inf} =~ s/^\s+//;
    $map{$hwp}{inf} =~ s/^(.{10}\S+)\s/$1\n/mg;
    printf "%3d %-16s %-25s %s\n", $idx, $hwp, $map{$hwp}{dsc},
	join "\n\t\t\t\t\t       " => split m/\n/ => $map{$hwp}{inf};
    (my $hhwp = $hwp) =~ s:[/.]?[^/.]+$::;
    $hhwp =~ tr/a-z]//d;
    while (!exists $map{$hhwp}) {
	$hhwp =~ s:[/.][^/.]+$::
	}
    my $hidx = $map{$hhwp}{idx};
    $hidx eq $idx and next;
    $g->add_edge ($hidx, $idx);
    $g->set_attribute ("weight", $hidx, $idx, 1);
    }

print STDERR "\nDIS ...";
my %seen = ();

sub disNode ($;$)
{
    my $idx = shift;
    $seen{$idx}++ and return;

    my $p  = @_ ? shift : "";

    my $hwp = $idx[$idx];
    my @inf = split m/\n/ => $map{$hwp}{inf};
    if ($p ne "") {
	$tg->addNode (nodeId => $idx, text => [ $hwp, @inf ], after => $p);
	}
    else {
	$tg->addNode (nodeId => $idx, text => [ $hwp, @inf ]);
	}
    if ($g->is_sink_vertex ($idx)) {
	# add hidden arrow down to force space
	return;
	}
    $p = 0; my %mapped = ();
    foreach my $r (
	    map  { my $s = $g->is_sink_vertex ($_) || 0;
		   $s or $p++;
		   [ $_, 1 - $s ];
		   }
	    sort { $a <=> $b }
	    grep { !$mapped{$_}++ && $_ != $idx; } # uniq
		$g->out_edges ($idx)) {
	my ($t, $top) = @$r;
	if ($seen{$t}) {
	    $tg->addShortcutInfo (from => $idx, to => $t);
	    next;
	    }
	$top and $p--;
	disNode ($t, $idx);
	}
    } # disNode

disNode (0);
foreach my $idx ($g->toposort) {
    disNode ($idx);
    }
print STDERR "\nSHW ...";
$tg->addAllShortcuts ();
my @a = $tg->bbox ("all");
$tg->configure (
    -scrollregion  => [ 0, 0, $a[2] + 50, $a[3] + 50 ],
    );
$tg->nodeBind (
    button  => "<2>",
    color   => "OrangeRed",
    command => sub {
	my %h = @_;
	print STDERR "Node $h{nodeId}\n";
	foreach my $k (sort keys %h) {
	    printf STDERR "     %-10s => %s\n", $k, $h{$k};
	    }
	printf STDERR "\t%s\n", $tg->getNodeRectangle (nodeId => $h{nodeId});
	});
$tg->update ();
print STDERR "\nDONE\n";
MainLoop;

-->8---

-- 
H.Merijn Brand
using perl5.005.03 and 5.6.0 on HP-UX 10.20, HP-UX 11.00, AIX 4.2, AIX 4.3,
  DEC OSF/1 4.0 and WinNT 4.0 SP-6a,  often with Tk800.022 and/or DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/
Member of Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)


------------------------------

Date: Mon, 03 Jul 2000 12:39:33 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: HP-UX System architecture
Message-Id: <3960C1C5.55A1AEF8@attglobal.net>

"H. Merijn Brand" wrote:
> 
> For those who wonder how their system is put together and what's in it, the
> following perl script might help. It's able to show the system's
> architecture in a similar way to how xstm would, but this is printable
> (change line #114) and saveable.
> 

This post is a bit long.  Maybe next time you could just post a link
to the source, eh?


------------------------------

Date: Mon, 3 Jul 00 20:56:14 +0200
From: h.m.brand@hccnet.nl (H. Merijn Brand)
Subject: RE: HP-UX System architecture
Message-Id: <8F66D4B84Merijn@192.0.1.90>

care227@attglobal.net (Drew Simonis) wrote in
<3960C1C5.55A1AEF8@attglobal.net>: 

>"H. Merijn Brand" wrote:
>> 
>> For those who wonder how their system is put together and what's in it,
>> the following perl script might help. It's able to show the system's
>> architecture in a similar way to how xstm would, but this is printable
>> (change line #114) and saveable.
>> 
>
>This post is a bit long.  Maybe next time you could just post a link
>to the source, eh?

You may have easy access to some kind of home page. I don't :-(

I've tried to make the Subject header clear enough that people who don't
have HP-UX, or aren't interested in it, can skip the message body.

I'll try harder next time ;-) I was thrown out 5 times. Hope to find a 
better way to get some free net space where I can post things not meant
for CPAN.

If you're a HP-UX user, I hope it helped anyway...

-- 
H.Merijn Brand
using perl5.005.03 and 5.6.0 on HP-UX 10.20, HP-UX 11.00, AIX 4.2, AIX 4.3,
  DEC OSF/1 4.0 and WinNT 4.0 SP-6a,  often with Tk800.022 and/or DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/
Member of Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)


------------------------------

Date: 6 Jul 2000 10:08:01 +0100
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: HP-UX System architecture
Message-Id: <8k1i9h$t00$1@orpheus.gellyfish.com>

On Mon, 03 Jul 2000 12:39:33 -0400 Drew Simonis wrote:
> "H. Merijn Brand" wrote:
>> 
>> For those who wonder how their system is put together and what's in it, the
>> following perl script might help. It's able to show the system's
>> architecture in a similar way to how xstm would, but this is printable
>> (change line #114) and saveable.
>> 
> 
> This post is a bit long.  Maybe next time you could just post a link
> to the source, eh?

I've seen longer ... In fact I think I've posted longer.

/J\
-- 
yapc::Europe in association with the Institute Of Contemporary Arts
   <http://www.yapc.org/Europe/>   <http://www.ica.org.uk>


------------------------------

Date: Thu, 06 Jul 2000 09:39:40 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: HP-UX System architecture
Message-Id: <39648C1C.67197DDE@attglobal.net>

Jonathan Stowe wrote:
> 
> >
> > This post is a bit long.  Maybe next time you could just post a link
> > to the source, eh?
> 
> I've seen longer ... In fact I think I've posted longer.
> 
> /J\

But does anyone actually _read_ your posts?  =)

/duck


------------------------------

Date: Thu, 06 Jul 2000 11:19:53 GMT
From: Kev Smith (sync24) <kevin.m.smith@gecapital.com>
Subject: HTML TokeParser question
Message-Id: <8k1q0k$ppe$1@nnrp1.deja.com>

All,

I hope someone out there can assist!

In a nutshell, I am working on a script to extract the 'Page titles'
and the related 'Page numbers' from an index file generated from an
intranet web server. I need the titles and page numbers from the HTML
in a perl hash array so that data can be manipulated and displayed.

For example (the HTML source):

<br><b>s2k_sys_env_everysec</b><br>
&#160;&#160;&#160;<a href="Page/7/Memory/?Refresh=30">Memory</a><br>
&#160;&#160;&#160;<a href="Page/7/Main/?Refresh=30">Main</a><br>
&#160;&#160;&#160;<a href="Page/7/IO/?Refresh=30">IO</a><br>
&#160;&#160;&#160;<a href="Page/7/CPU/?Refresh=30">CPU</a><br>

I need to extract "s2k_sys_env_everysec" and "Page/7" into a perl
hash array (and for ALL subsequent differing titles & page numbers)
until the end of the document.

It's worth noting that I only need to read the title (IE
s2k_sys_env_everysec) and the first <a href> following and that the
format of the HTML file is pretty much static.

Now I do not claim to be a perl expert and I am getting somewhere (very
slowly) with HTML Tokeparser.

Could one of you experts out there assist or point me in the right
direction. (or even write the code for me!!!!!!)

Thanks in advance.

Kev Smith


Sent via Deja.com http://www.deja.com/
Before you buy.


------------------------------

Date: Thu, 06 Jul 2000 12:05:51 GMT
From: bernard.el-hagin@lido-tech.net (Bernard El-Hagin)
Subject: Re: HTML TokeParser question
Message-Id: <slrn8m8t8h.iha.bernard.el-hagin@gdndev25.lido-tech>

Kev Smith (sync24) <kevin.m.smith@gecapital.com> wrote:
>All,
>
>I hope someone out there can assist!
>
>In a nutshell, I am working on a script to extract the 'Page titles'
>and the related 'Page numbers' from an index file generated from an
>intranet web server. I need the titles and page numbers from the HTML
>in a perl hash array so that data can be manipulated and displayed.
>
>For example (the HTML source):
>
><br><b>s2k_sys_env_everysec</b><br>
>&#160;&#160;&#160;<a href="Page/7/Memory/?Refresh=30">Memory</a><br>
>&#160;&#160;&#160;<a href="Page/7/Main/?Refresh=30">Main</a><br>
>&#160;&#160;&#160;<a href="Page/7/IO/?Refresh=30">IO</a><br>
>&#160;&#160;&#160;<a href="Page/7/CPU/?Refresh=30">CPU</a><br>
>
>I need to extract "s2k_sys_env_everysec" and "Page/7" into a perl
>hash array (and for ALL subsequent differing titles & page numbers)
>until the end of the document.

If the format of the data is always the same you could use the following
regexes:

1. For the s2k_sys_env_everysec part:

m#^<br><b>(.*)</b><br>$#;

And now $1 has the page title. If you're sure the title can't contain
the "<" character you should probably use:

m/^<br><b>([^<]*)</b><br>$/;

2. For the page number part:

m#"(Page/\d+?)/#;

$1 contains "Page/x" where x signifies any digits up until the closest "/".

Now just loop over the whole file and using those regexes define your
hash's keys and values.

>It's worth noting that I only need to read the title (IE
>s2k_sys_env_everysec) and the first <a href> following and that the
                               ^^^^^^^^^^^^^^
In that case you can do a "next" as long as you're between the first finding of
a Page/x string and the following finding of a page title.
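Put together, the loop could look roughly like this. It is only a sketch: it
assumes the input is line-oriented exactly as in the sample above, and
%page_for and $title are illustrative names, not anything from the original
script. A real script would read the index file instead of the inline list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input copied from the question.
my @lines = (
    '<br><b>s2k_sys_env_everysec</b><br>',
    '&#160;&#160;&#160;<a href="Page/7/Memory/?Refresh=30">Memory</a><br>',
    '&#160;&#160;&#160;<a href="Page/7/Main/?Refresh=30">Main</a><br>',
    '&#160;&#160;&#160;<a href="Page/7/IO/?Refresh=30">IO</a><br>',
    '&#160;&#160;&#160;<a href="Page/7/CPU/?Refresh=30">CPU</a><br>',
);

my %page_for;    # title => "Page/N"
my $title;       # title still waiting for its first link, if any

for (@lines) {
    if (m#^<br><b>([^<]*)</b><br>$#) {
        $title = $1;                 # new title seen
        next;
    }
    next unless defined $title;      # title already has its page: skip
    if (m#"(Page/\d+?)/#) {
        $page_for{$title} = $1;      # first link after the title
        undef $title;
    }
}

print "$_ => $page_for{$_}\n" for sort keys %page_for;
```

Only the first link after each title is recorded; the remaining links are
skipped until the next title line turns up.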

>format of the HTML file is pretty much static.

Good. The above regexes depend on that. If you want something more
flexible you should use an HTML parser. A regex won't do.

>Now I do not claim to be a perl expert and I am getting somewhere (very
>slowly) with HTML Tokeparser.
>
>Could one of you experts out there assist or point me in the right
>direction. (or even write the code for me!!!!!!)

Oops! Take cover!


Bernard
--
perl -e '${qq=\x22=}=qq=\053=;$_="BeJUST_ANOTHERnaPERL_HACKERd\n";
${qq=\x2c=}=qq=\x72=;print split /[AC-Z_]$"/;'


------------------------------

Date: Thu, 06 Jul 2000 12:09:44 GMT
From: bernard.el-hagin@lido-tech.net (Bernard El-Hagin)
Subject: Re: HTML TokeParser question
Message-Id: <slrn8m8tfq.iha.bernard.el-hagin@gdndev25.lido-tech>

I inadvertently wrote:
>Kev Smith (sync24) <kevin.m.smith@gecapital.com> wrote:
>>All,
>>
>>I hope someone out there can assist!
>>
>>In a nutshell, I am working on a script to extract the 'Page titles'
>>and the related 'Page numbers' from an index file generated from an
>>intranet web server. I need the titles and page numbers from the HTML
>>in a perl hash array so that data can be manipulated and displayed.
>>
>>For example (the HTML source):
>>
>><br><b>s2k_sys_env_everysec</b><br>
>>&#160;&#160;&#160;<a href="Page/7/Memory/?Refresh=30">Memory</a><br>
>>&#160;&#160;&#160;<a href="Page/7/Main/?Refresh=30">Main</a><br>
>>&#160;&#160;&#160;<a href="Page/7/IO/?Refresh=30">IO</a><br>
>>&#160;&#160;&#160;<a href="Page/7/CPU/?Refresh=30">CPU</a><br>
>>
>>I need to extract "s2k_sys_env_everysec" and "Page/7" into a perl
>>hash array (and for ALL subsequent differing titles & page numbers)
>>until the end of the document.
>
>If the format of the data is always the same you could use the following
>regexes:
>
>1. For the s2k_sys_env_everysec part:
>
>m#^<br><b>(.*)</b><br>$#;
>
>And now $1 has the page title. If you're sure the title can't contain
>the "<" character you should probably use:
>
>m/^<br><b>([^<]*)</b><br>$/;

There's a mistake there. Either I need to escape the / in the regex
yielding:

m/^<br><b>([^<]*)<\/b><br>$/;

Or use a different match operator delimiter:

m#^<br><b>([^<]*)</b><br>$#;

Sorry.

Bernard
--
perl -e '${qq=\x22=}=qq=\053=;$_="BeJUST_ANOTHERnaPERL_HACKERd\n";
${qq=\x2c=}=qq=\x72=;print split /[AC-Z_]$"/;'


------------------------------

Date: Thu, 06 Jul 2000 14:16:47 +0200
From: =?iso-8859-1?Q?Thorbj=F8rn?= Ravn Andersen <thunderbear@bigfoot.com>
Subject: Re: HTML TokeParser question
Message-Id: <396478AF.52142F3A@bigfoot.com>

"Kev Smith (sync24)" wrote:

> In a nutshell, I am working on a script to extract the 'Page titles'
> and the related 'Page numbers' from an index file generated from an
> intranet web server. I need the titles and page numbers from the HTML
> in a perl hash array so that data can be manipulated and displayed.

You might find this much easier if you run the pages in
question through the W3C utility TIDY first.

-- 
  Thorbjørn Ravn Andersen         "...plus...Tubular Bells!"
  http://bigfoot.com/~thunderbear


------------------------------

Date: Sun, 9 Jul 2000 11:02:04 +0200
From: "nicolas" <webmaster@archiTacTic.com>
Subject: HTTP Last-Modified header not always returned
Message-Id: <scX95.15287$DL.63169@nnrp1.none.net>

Hi everyone,
I don't know if this is the right place for asking about HTTP, but here it
goes :

I made a Perl script to get HTTP headers for URLs, and I need the
Last-Modified time of the page.
While most servers return this header correctly, others don't return
that data. Why ? could it be :
-a misspelling? (i spelled it "(/^Last-Modified:\s*(.+)/i)" in the RegEx)
-wrong syntax?
-maybe some type of servers don't return this info? in this case is there a
work-around?
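A minimal sketch of that header check, with the "not returned" case handled
explicitly. The two response strings are invented for illustration and
last_modified is a hypothetical helper name. In HTTP/1.1 the Last-Modified
header is optional, and dynamically generated pages often come without it, so
a missing value does not by itself mean the regex is wrong.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the Last-Modified value from a raw header block,
# or undef if the server did not send that header.
sub last_modified {
    my ($headers) = @_;
    for my $line (split /\r?\n/, $headers) {
        return $1 if $line =~ /^Last-Modified:\s*(.*\S)/i;
    }
    return;
}

# Hypothetical responses: a static file and a dynamically generated page.
my $static  = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
            . "Last-Modified: Sat, 08 Jul 2000 10:00:00 GMT\r\n\r\n";
my $dynamic = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

for my $response ($static, $dynamic) {
    my $stamp = last_modified($response);
    print defined $stamp
        ? "Last-Modified: $stamp\n"
        : "no Last-Modified header\n";
}
```

The calling code then has to decide what to do when the value is undef, for
example treating the page as always changed.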

Anybody have an idea?

Thanks,
Nicolas








------------------------------

Date: 09 Jul 2000 10:38:44 EDT
From: abigail@delanet.com (Abigail)
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <slrn8mh4l5.tts.abigail@alexandra.delanet.com>

nicolas (webmaster@archiTacTic.com) wrote on MMDIV September MCMXCIII in
<URL:news:scX95.15287$DL.63169@nnrp1.none.net>:
"" Hi everyone,
"" I don't know if this is the right place for asking about HTTP, but here it
"" goes :

No, it isn't. What makes you think so? Would you ask about FTP in a 
FORTRAN group, about NNTP in a Java group and about SMTP in an Ada
group?

"" I made a Perl script to get HTTP headers for URLs, and I need the
"" Last-Modified time of the page.
"" While most servers return this header correctly, others don't return
"" that data. Why ? could it be :
"" -a misspelling? (i spelled it "(/^Last-Modified:\s*(.+)/i)" in the RegEx)
"" -wrong syntax?
"" -maybe some type of servers don't return this info? in this case is there a
"" work-around?


The RFC will have an answer.


Your question is not a Perl question. Please ask in a more appropriate group.



Abigail
-- 
sub _'_{$_'_=~s/$a/$_/}map{$$_=$Z++}Y,a..z,A..X;*{($_::_=sprintf+q=%X==>"$A$Y".
"$b$r$T$u")=~s~0~O~g;map+_::_,U=>T=>L=>$Z;$_::_}=*_;sub _{print+/.*::(.*)/s};;;
*_'_=*{chr($b*$e)};*__=*{chr(1<<$e)};                # Perl 5.6.0 broke this...
_::_(r(e(k(c(a(H(__(l(r(e(P(__(r(e(h(t(o(n(a(__(t(us(J())))))))))))))))))))))))


------------------------------

Date: Sun, 9 Jul 2000 17:20:46 +0200
From: "nicolas" <webmaster@archiTacTic.com>
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <rL0a5.15324$DL.63646@nnrp1.none.net>

Abigail <abigail@delanet.com> a écrit dans le message :
slrn8mh4l5.tts.abigail@alexandra.delanet.com...
> nicolas (webmaster@archiTacTic.com) wrote on MMDIV September MCMXCIII in
> <URL:news:scX95.15287$DL.63169@nnrp1.none.net>:

> No, it isn't. What makes you think so? Would you ask about FTP in a
> FORTRAN group, about NNTP in a Java group and about SMTP in an Ada
> group?
>
> Your question is not a Perl question. Please ask in a more appropriate
> group.
>
Why are you so angry?
Where are there HTTP groups?
Isn't Perl meant to deal with HTTP?





------------------------------

Date: 9 Jul 2000 15:32:06 GMT
From: fosterd@hartwick.edu (Decklin Foster)
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <8ka5tl$1sq7o$1@ID-10059.news.cis.dfn.de>

nicolas <webmaster@archiTacTic.com> writes:

> Why are you so angry?

If you have to ask, you'll never know.

> Where are there HTTP groups?

Mainly in the comp.infosystems.www.* hierarchy. Particularly
ciw.servers.*. It would have been a good idea to look that up first.

> Isn't Perl meant to deal with HTTP?

No.

-- 
There is no TRUTH. There is no REALITY. There is no CONSISTENCY. There
are no ABSOLUTE STATEMENTS. I'm very probably wrong. -- BSD fortune(6)


------------------------------

Date: Sun, 9 Jul 2000 18:56:12 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <Pine.GHP.4.21.0007091851360.13220-100000@hpplus03.cern.ch>

On Sun, 9 Jul 2000, nicolas wrote:

> Isn't Perl meant to deal with HTTP?

Perl can "deal with" pretty-much anything that can be programmed.
Programming languages are like that.

Perl can be used to compile a bus timetable, but that doesn't make bus
operations on-topic for the Perl group.  Get a grasp.

Still, you evidently are more interested in asserting your right to
post on off-topic groups than in getting any useful help.  So be it.
Bye.




------------------------------

Date: Sun, 09 Jul 2000 14:04:48 -0400
From: brian@smithrenaud.com (brian d foy)
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <brian-ya02408000R0907001404480001@news.panix.com>

In article <8ka5tl$1sq7o$1@ID-10059.news.cis.dfn.de>, fosterd@hartwick.edu (Decklin Foster) posted:

> nicolas <webmaster@archiTacTic.com> writes:

> > Isn't Perl meant to deal with HTTP?
> 
> No.

sure it is.  why not?

-- 
brian d foy                    
CGI Meta FAQ <URL:http://www.smithrenaud.com/public/CGI_MetaFAQ.html>
Perl Mongers <URL:http://www.perl.org/>


------------------------------

Date: 9 Jul 2000 18:42:21 GMT
From: fosterd@hartwick.edu (Decklin Foster)
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <8kah2d$22p1g$1@ID-10059.news.cis.dfn.de>

brian d foy <brian@smithrenaud.com> writes:

> sure it is.  why not?

Perl was `meant' to do a great number of things, which are not limited
to HTTP. It's a general-purpose language.

Of course, I'm sure you knew what the OP was thinking.

-- 
There is no TRUTH. There is no REALITY. There is no CONSISTENCY. There
are no ABSOLUTE STATEMENTS. I'm very probably wrong. -- BSD fortune(6)


------------------------------

Date: Sun, 09 Jul 2000 22:52:49 GMT
From: marcel@codewerk.com (Marcel Grunauer)
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <slrn8mi0sr.51o.marcel@gandalf.local>

On Sun, 09 Jul 2000 14:04:48 -0400, brian d foy <brian@smithrenaud.com> wrote:

>In article <8ka5tl$1sq7o$1@ID-10059.news.cis.dfn.de>,
>fosterd@hartwick.edu (Decklin Foster) posted:

>> nicolas <webmaster@archiTacTic.com> writes:
>
>> > Isn't Perl meant to deal with HTTP?
>> 
>> No.
>
>sure it is.  why not?


The "is meant to deal with" relation is not symmetric. The OP's question
would only be appropriate in this group if Perl is the only language
that can deal with HTTP.

alt.fan.ethernet might be as appropriate as this group.


-- 
Marcel
sub AUTOLOAD{($_=$AUTOLOAD)=~s;^.*::;;;y;_; ;;print} Just_Another_Perl_Hacker();


------------------------------

Date: Mon, 10 Jul 2000 19:37:44 +0200
From: "nicolas" <webmaster@archiTacTic.com>
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <PRna5.15607$DL.65487@nnrp1.none.net>

I don't know if anybody noticed but in this group you get more reactions
when asking inappropriate questions than appropriate ones...




------------------------------

Date: Mon, 10 Jul 2000 17:47:52 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: HTTP Last-Modified header not always returned
Message-Id: <x7wvitnagq.fsf@home.sysarch.com>

>>>>> "n" == nicolas  <webmaster@archiTacTic.com> writes:

  n> I don't know if anybody noticed but in this group you get more
  n> reactions when asking inappropriate questions than appropriate
  n> ones...

i don't know if you noticed, that there are 3 groups listed in the
headers so 'this group' is meaningless.

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page  -----------  http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net  ----------  http://www.northernlight.com


------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 3572
**************************************

