[24249] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 6440 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Apr 21 09:10:40 2004

Date: Wed, 21 Apr 2004 06:10:13 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 21 Apr 2004     Volume: 10 Number: 6440

Today's topics:
    Re: slurp not working? ideas please! <geoffacox@dontspamblueyonder.co.uk>
    Re: slurp not working? ideas please! <tassilo.parseval@rwth-aachen.de>
    Re: slurp not working? ideas please! <geoffacox@dontspamblueyonder.co.uk>
    Re: slurp not working? ideas please! <geoffacox@dontspamblueyonder.co.uk>
    Re: sort numeric lists (OT comment) <1usa@llenroc.ude>
    Re: sort numeric lists <Joe.Smith@inwap.com>
    Re: sort numeric lists <rene.larsen@spamfilter.dk>
    Re: sort numeric lists <uri@stemsystems.com>
        Sorting on multiple fields <nospam@bigpond.com>
    Re: Thanks: Re: need Regular Expression to remove all n <Joe.Smith@inwap.com>
    Re: What about this script?  HTML::Parser <Joe.Smith@inwap.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 21 Apr 2004 08:40:11 GMT
From: Geoff Cox <geoffacox@dontspamblueyonder.co.uk>
Subject: Re: slurp not working? ideas please!
Message-Id: <j6cc801da86cich9376mn0ebjhab1pen4c@4ax.com>

On 21 Apr 2004 06:11:42 GMT, "Tassilo v. Parseval"
<tassilo.parseval@rwth-aachen.de> wrote:

Tassilo,

I am working through all your latest comments but have the error
message "not a GLOB reference at line 53", ie on following line,

   print $fh "<h2>$origtext</h2> \n" if $in_heading;

in sub text.

Also, the first attempt to open $name is failing,

find sub {
    my $name = $_;
    open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
     || die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

so in order to move on I have to either # out the die line or use

unless ($name eq ".") {open(OUT etc ...

Just a bit of lateral thinking when I found that print $name gave ".".
I know this is ugly and ignores the cause .... which I am unable to
see at the moment..

Geoff


package MyParser;
use base qw(HTML::Parser);
use strict;
use diagnostics;

my ($in_heading,$in_p, $fh);

sub register_fh {

$fh = shift;

}

sub reset { ($in_heading,$in_p)=(0,0)}

sub start {

    my ( $self, $tagname, $attr, undef, $origtext ) = @_;

    if ( $tagname eq 'h2' ) {
        $in_heading = 1;
        return;
    }

    if ( $tagname eq 'p' ) {
        $in_p = 1;
        return;
    }

    if ( $tagname eq 'option' ) {

        main::choice( $attr->{ value } );

    }

}

sub end {
    my ( $self, $tagname, $origtext ) = @_;
    if ( $tagname eq 'h2' ) {
        $in_heading = 0;
        return;
    }

    if ( $tagname eq 'p' ) {
        $in_p = 0;
        return;
    }
}

sub text {
    my ( $self, $origtext ) = @_;
    print $fh "<h2>$origtext</h2> \n" if $in_heading;
    print $fh "<p>$origtext</p> \n" if $in_p;

}

package main;

use File::Find;

my $dir = "d:/a-keep9/short-nondb/oldshort2";
my $parser = MyParser->new;

find sub {

    my $name = $_;
    open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
     || die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

    print OUT ("<html><head><title>test</title></head><body> \n");
    print OUT ("<table width='100%' border='1'> \n");
 }
    


    
#    undef $/;

    $parser->register_fh(\*OUT);
    $parser->parse_file($_);
    $parser->reset;


}, $dir;

etc etc


------------------------------

Date: 21 Apr 2004 09:26:17 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: slurp not working? ideas please!
Message-Id: <c65enp$8cdgo$1@ID-231055.news.uni-berlin.de>

Also sprach Geoff Cox:

> On 21 Apr 2004 06:11:42 GMT, "Tassilo v. Parseval"
><tassilo.parseval@rwth-aachen.de> wrote:
> 
> Tassilo,
> 
> I am working through all your latest comments but have the error
> message "not a GLOB reference at line 53", ie on following line,
> 
>    print $fh "<h2>$origtext</h2> \n" if $in_heading;
> 
> in sub text.

Sorry, my mistake. register_fh() must read:

    sub register_fh {
	# $_[0] contains the parser object
        $fh = $_[1];
    }

> Also, the first attempt to open $name is failing,
> 
> find sub {
>     my $name = $_;
>     open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
>      || die "can't open d:/a-keep9/short-nondb/short/members2/$name:
> $!";
> 
> so in order to move on I have to either # out the die line or use
> 
> unless ($name eq ".") {open(OUT etc ...
> 
> Just a bit of lateral thinking when I found that print $name gave ".".
> I know this is ugly and ignores the cause .... which I am unable to
> see at the moment..

The reason is that File::File::find() also returns directory names, most
notably "." and ".." as well. Have it return early when the current file
is not one that you want to process:
    
    find sub {
	return if /^\.\.?/; # catches "." and ".."
	my $name = $_;
	...
    }, $dir;

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Wed, 21 Apr 2004 09:47:20 GMT
From: Geoff Cox <geoffacox@dontspamblueyonder.co.uk>
Subject: Re: slurp not working? ideas please!
Message-Id: <ejgc80dvme3aijlmo7ius6g4k8rbogmcml@4ax.com>

On 21 Apr 2004 09:26:17 GMT, "Tassilo v. Parseval"
<tassilo.parseval@rwth-aachen.de> wrote:

>The reason is that File::File::find() also returns directory names, most

Tassilo,

Ah! I should have thought of that ...

Anyway - one BIG HURRAY !!!! The code now runs with -w and not a
single error or warning! Many thanks for your help - would not have
got so far without it.

Will now sort out the choice sub etc.

Have a good day!

Cheers

Geoff




>notably "." and ".." as well. Have it return early when the current file
>is not one that you want to process:
>    
>    find sub {
>	return if /^\.\.?/; # catches "." and ".."
>	my $name = $_;
>	...
>    }, $dir;
>
>Tassilo



------------------------------

Date: Wed, 21 Apr 2004 11:09:52 GMT
From: Geoff Cox <geoffacox@dontspamblueyonder.co.uk>
Subject: Re: slurp not working? ideas please!
Message-Id: <15lc801vakirmjh2csbrl81etsheto4h6d@4ax.com>

On 21 Apr 2004 09:26:17 GMT, "Tassilo v. Parseval"
<tassilo.parseval@rwth-aachen.de> wrote:

Tassilo

just one more question!?

In one of the html files I have for example

<h2> jkaskdjkla  </h2>

<option values etc

<p> hsajdj ka </p>

The code is giving the data in a different order, ie

<h2> jkaskdjkla  </h2>

<p> hsajdj ka </p>

<option values etc

Am I able to change the order?   Cheers Geoff

----------------------------------------

sub start {

    my ( $self, $tagname, $attr, undef, $origtext ) = @_;

    if ( $tagname eq 'h2' ) {
        $in_heading = 1;
        return;
    }

    if ( $tagname eq 'p' ) {
        $in_p = 1;
        return;
    }

    if ( $tagname eq 'option' ) {

        main::choice( $attr->{ value } );

    }

}

sub end {
    my ( $self, $tagname, $origtext ) = @_;
    if ( $tagname eq 'h2' ) {
        $in_heading = 0;
        return;
    }

    if ( $tagname eq 'p' ) {
        $in_p = 0;
        return;
    }
}

sub text {
    my ( $self, $origtext ) = @_;
    print $fh "<h2>$origtext</h2> \n" if $in_heading;
    print $fh "<p>$origtext</p> \n" if $in_p;

}






------------------------------

Date: 21 Apr 2004 13:03:43 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude>
Subject: Re: sort numeric lists (OT comment)
Message-Id: <Xns94D25C303F7BFasu1cornelledu@132.236.56.8>

Uri Guttman <uri@stemsystems.com> wrote in
news:x78ygpo5si.fsf@mail.sysarch.com: 

>>>>>> "RL" == René Larsen <rene.larsen@spamfilter.dk> writes:

>   RL> But, IMHO it would be better to sort it only once:
> 
>   RL> @sorted = sort { 
>   RL>   $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=>
>   RL>     $b->[2] 
>   RL>   } @rows;
> 
> this isn't better, it is correct. see my other post where i divined
> his problem without even seeing any code. :)

Which actually amazed me ... Your ESP is better than mine :)

I had not realized people actually thought sorting the OP's way was a 
reasonable thing to do until I saw my stats students this semester sorting 
each column in a data set independently.

-- 
A. Sinan Unur
1usa@llenroc.ude (reverse each component for email address)


------------------------------

Date: Wed, 21 Apr 2004 07:16:55 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: sort numeric lists
Message-Id: <Hdphc.9223$GR.1173642@attbi_s01>

The King of Pots and Pans wrote:

> Okay some of you asked for the code. Here it is. As you can tell I am
> new to perl and am trying my best. I appreciate your kind help.
>
> @sorted = sort {$a->[1] <=> $b->[1]} @rows;
> @sorted = sort {$a->[2] <=> $b->[2]} @rows;

There's your problem.
You need to make a single two-level sort, not two separate sorts.

   @sorted = sort {$a->[1] <=> $b->[1] or $a->[2] <=> $b->[2]} @rows;

	-Joe


------------------------------

Date: Wed, 21 Apr 2004 10:31:53 +0200
From: René Larsen <rene.larsen@spamfilter.dk>
Subject: Re: sort numeric lists
Message-Id: <VA.0000001c.00828ded@spamfilter.dk>

In article <_%ohc.65863$U83.21047@fed1read03>, The King of Pots and Pans 
wrote:
> 
> Okay some of you asked for the code. Here it is. As you can tell I am
> new to perl and am trying my best. I appreciate your kind help.
> 
> I cannot use any modules that does not come with the standard perl 5.6
> install, as this program will be used on a number of different
> machines with just the standard install.
> 
> --- code ---
> #!/usr/bin/perl -w
> 
> use strict;
> 
> my @rows = ();
> my @sorted = ();
> 
> open(DATA,"sort.txt") or die "cannot open file\n";
> 
> while(<DATA>)
> {
>     my @fields = split /\s+/;
>     push @rows, \@fields;  # list of refs to lists
> }
> 
> @sorted = sort {$a->[1] <=> $b->[1]} @rows;
> @sorted = sort {$a->[2] <=> $b->[2]} @rows;

This is where you are making mistakes. You sort the data twice *but* you 
only keep the last. You probably meant:

@sorted = sort {$a->[1] <=> $b->[1]} @rows;
@sorted = sort {$a->[2] <=> $b->[2]} @sorted;

But normally when sorting by more than one key, one sorts the *last* key 
first:

@sorted = sort {$a->[2] <=> $b->[2]} @rows;
@sorted = sort {$a->[1] <=> $b->[1]} @sorted;

Try sorting your data on paper, and see what happens if you sort it from 
left to right or right to left.

But, IMHO it would be better to sort it only once:

@sorted = sort { 
  $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] 
  } @rows;

This cuts down on the comparisons, as you only test keys 2 if keys 1 are 
equal, and so on.

Anorther thing: Your test data is not good. You should at least use 
something like:

6 4 9
5 7 9

Regards, René



------------------------------

Date: Wed, 21 Apr 2004 12:48:45 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: sort numeric lists
Message-Id: <x78ygpo5si.fsf@mail.sysarch.com>

>>>>> "RL" == René Larsen <rene.larsen@spamfilter.dk> writes:

  >> @sorted = sort {$a->[1] <=> $b->[1]} @rows;
  >> @sorted = sort {$a->[2] <=> $b->[2]} @rows;

  RL> This is where you are making mistakes. You sort the data twice *but* you 
  RL> only keep the last. You probably meant:

  RL> @sorted = sort {$a->[1] <=> $b->[1]} @rows;
  RL> @sorted = sort {$a->[2] <=> $b->[2]} @sorted;

  RL> But normally when sorting by more than one key, one sorts the *last* key 
  RL> first:

  RL> @sorted = sort {$a->[2] <=> $b->[2]} @rows;
  RL> @sorted = sort {$a->[1] <=> $b->[1]} @sorted;

  RL> Try sorting your data on paper, and see what happens if you sort it from 
  RL> left to right or right to left.

that will not be any better than his code. 2 separate sorts will not do it.

  RL> But, IMHO it would be better to sort it only once:

  RL> @sorted = sort { 
  RL>   $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] 
  RL>   } @rows;

this isn't better, it is correct. see my other post where i divined his
problem without even seeing any code. :)

  RL> This cuts down on the comparisons, as you only test keys 2 if keys 1 are 
  RL> equal, and so on.

that has NOTHING to do with it. multikey sorts are not a speed up but a
CORRECY way to do sorting and subsorting. your first code and the OP's
code are both wrong, not slow.

  RL> Anorther thing: Your test data is not good. You should at least use 
  RL> something like:

  RL> 6 4 9
  RL> 5 7 9

huh? his data was fine as it showed the problem. and your data isn't
good either as with three keys to sort on you should have even more
rows for testing.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: Wed, 21 Apr 2004 20:25:19 +1000
From: Gregory Toomey <nospam@bigpond.com>
Subject: Sorting on multiple fields
Message-Id: <2763802.dS1Cy4TfDT@GMT-hosting-and-pickle-farming>

The King of Pots and Pans wrote:

> Okay some of you asked for the code. Here it is. As you can tell I am
> new to perl and am trying my best. I appreciate your kind help.
> 
> I cannot use any modules that does not come with the standard perl 5.6
> install, as this program will be used on a number of different
> machines with just the standard install.

What you should have asked is how to sort on two fields.

The basic algorithm is to compare the two fields, and if they are equal,
compare the second fields. The gives a single boolean result.

You original post looked like weird matrix algebra & confused people.

gtoomey


------------------------------

Date: Wed, 21 Apr 2004 08:19:28 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: Thanks: Re: need Regular Expression to remove all non-numerical 
Message-Id: <k8qhc.179581$K91.447891@attbi_s02>

Robert wrote:

> Thanks folks,
> I was completly off the track - I was doing a search for the specific characters,
> $word = "+1-767-123456";
> $word =~ s/\+- //g;
> print "Word: $word\n";

Go back to the docs (perldoc perlretut), looking for "[" and "]".
   s/-\+ //g;
is not the same as
   s/[-+ ]//g;

There's a big difference between a string of characters in order
and a character class (where order is commonly not relevent).
	-Joe


------------------------------

Date: Wed, 21 Apr 2004 07:43:43 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: What about this script?  HTML::Parser
Message-Id: <PCphc.38418$ru4.39174@attbi_s52>

Danny wrote:

> "Tad McClellan" <tadmc@augustmail.com> wrote in message
> news:slrnc7kded.3l0.tadmc@magna.augustmail.com...
>>Show us what you tried and we will help you fix it.
> 
> My previous post has the script to try out.  I ran this and it works fine on
> any html but I would like to extract out the tags and such. What do you
> think?

Your previous example has a handler for 'text' and prints {TEXT}.
There is no handler for 'input' and does not print {INPUT}.

Show us where an example where you actually try to process <INPUT ...>.
	-Joe


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6440
***************************************


home help back first fref pref prev next nref lref last post