[23374] in Perl-Users-Digest


Perl-Users Digest, Issue: 5593 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Sep 30 18:05:45 2003

Date: Tue, 30 Sep 2003 15:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 30 Sep 2003     Volume: 10 Number: 5593

Today's topics:
        code not working..ideas please? <geoff.cox@blueyonder.co.uk>
    Re: code not working..ideas please? <shondell@cis.ohio-state.edu>
    Re: code not working..ideas please? <xx087@freenet.carleton.ca>
    Re: code not working..ideas please? <geoff.cox@blueyonder.co.uk>
    Re: code not working..ideas please? <geoff.cox@blueyonder.co.uk>
    Re: code not working..ideas please? <geoff.cox@blueyonder.co.uk>
    Re: code not working..ideas please? <shondell@cis.ohio-state.edu>
    Re: code not working..ideas please? (Tad McClellan)
    Re: code not working..ideas please? <xx087@freenet.carleton.ca>
    Re: Endless Loop (trying for...) <krahnj@acm.org>
        GD::Graph: "mixed" graph doesn't recognize "area" graph (Emilio Mayorga)
    Re: Help with RegEx <postmaster@castleamber.com>
    Re: Help with RegEx (Tad McClellan)
    Re: newbie regxp question <krahnj@acm.org>
    Re: Obtain Connection Duration <mikeflan@earthlink.net>
        Order of attachments in Multipart POST <dwake.no.spam@alumni.stanford.org>
    Re: Perl command to copy one file into another file? <emschwar@pobox.com>
    Re: Regexp - optimisation  <abigail@abigail.nl>
    Re: Regexp - optimisation  <stephen.adam@ntlworld.com>
    Re: use strict with an undeclared variable <emschwar@pobox.com>
    Re: use strict with an undeclared variable (ko)
    Re:  <bwalton@rochester.rr.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 30 Sep 2003 18:44:05 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: code not working..ideas please?
Message-Id: <jajjnvkc8qeq0qeu7lo5mg166dfrus9iv1@4ax.com>

Hello,

In the following code minorlist contains a list of names of schools.
The major list contains these names as part of school email addresses
with other school email addresses.

I am trying to take the first word on each line of the minorlist, find
the associated email address in the majorlist and print the email
address out to the emails file.

For some reason the emails list contains all the emails in the
majorlist and I cannot see why?!

Cheers

Geoff

open (IN, "d:/minorlist");
open (INN, "d:/majorlist");
open (OUT, ">>d:/emails");

while (defined (my $line =<IN>)) {
                          $line =~ /^(.*?)\s/i;
                          &email($1);
                                                 }

sub email {

while (defined (my $line2 = <INN>)) {
                                  if ($line2 =~ /$1/i) {
                                  print OUT ($line2);
                                                              }
                                                       }
}



------------------------------

Date: 30 Sep 2003 15:41:32 -0400
From: Ryan Shondell <shondell@cis.ohio-state.edu>
Subject: Re: code not working..ideas please?
Message-Id: <xcwy8w612tv.fsf@psi.cis.ohio-state.edu>

Geoff Cox <geoff.cox@blueyonder.co.uk> writes:

> Hello,
> 
> In the following code minorlist contains a list of names of schools.
> The major list contains these names as part of school email addresses
> with other school email addresses.
> 
> I am trying to take the first word on each line of the minorlist, find
> the associated email address in the majorlist and print the email
> address out to the emails file.
> 
> For some reason the emails list contains all the emails in the
> majorlist and I cannot see why?!
> 
> Cheers
> 
> Geoff

You forgot...

use warnings;
use strict;

Never leave home without them. :-)

> open (IN, "d:/minorlist");
> open (INN, "d:/majorlist");
> open (OUT, ">>d:/emails");

You should _always_ check the return values of your open statements.

open (IN, 'foo') or die "Can't open foo: $!";

> while (defined (my $line =<IN>)) {
>                           $line =~ /^(.*?)\s/i;

So you want to match 0 or more of any character, non-greedily,
followed by whitespace? You realize that even just a newline, or a
space will match that test? And there doesn't appear to be a benefit
to using /i for case-insensitive matching, since you're not using any
letters in your regex. If you want the first word, do

        $line =~ /(\w+)/;

>                           &email($1);

Uh oh. What happens if your regex doesn't match, and so doesn't put a
new value into $1? Answer...you have a stale value in $1 (whatever you
matched previously). Probably not what you wanted. You could throw an
'if' in there...

        if ($line =~ /(\w+)/) {
          email($1);
        }

Also, don't use the & version of calling subs, unless you specifically
need the properties you get by doing so. Do you? Do you need to
circumvent prototypes?

Use the syntax, email($1);

> sub email {
> 
> while (defined (my $line2 = <INN>)) {
>                                   if ($line2 =~ /$1/i) {

Eww, I can see all sorts of problems happening here. You're passing a
value to this sub, but not using it (well, kinda sorta you are
sometimes...in a round-about-ish way). Use the value you just passed
in to the sub.

        my $word = shift;
        if ($line2 =~ /$word/i)

>                                   print OUT ($line2);
>                                                               }
>                                                        }
> }
> 

Hopefully this gets you started in the right direction. I suggest
perusing some of the perldocs, specifically perlsub, perlop, and
perlre for information on subs, the m// operator and its options, and
regular expressions, respectively.
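
Pulling the points above together, a minimal sketch might look like
the following. The helper name first_word and the sample lines are
made up for illustration; they are not taken from Geoff's files.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word of a line, or undef when
# the line has none. Returning the capture explicitly avoids relying
# on a possibly stale $1 from an earlier successful match.
sub first_word {
    my $line = shift;
    return $line =~ /(\w+)/ ? $1 : undef;
}

# Made-up sample data standing in for lines of minorlist.
my @lines = ("Truro School\n", "\n", "  Penryn College\n");
my @words = grep { defined $_ } map { first_word($_) } @lines;
print "@words\n";
```

The blank line yields undef rather than reusing the previous match, so
the caller can skip it instead of searching for the wrong word.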

Ryan
-- 
perl -e '$;=q,BllpZllla_nNanfc]^h_rpF,;@;=split//,
$;;$^R.=--$=*ord for split//,$~;sub _{for(1..4){$=
=shift;$=--if$=!=4;while($=){print chr(ord($;[$%])
+shift);$%++;$=--;}print " ";}}_(split//,$^R);q;;'


------------------------------

Date: 30 Sep 2003 19:55:07 GMT
From: Glenn Jackman <xx087@freenet.carleton.ca>
Subject: Re: code not working..ideas please?
Message-Id: <slrnbnjntp.7er.xx087@smeagol.ncf.ca>

Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:
>  open (IN, "d:/minorlist");
>  open (INN, "d:/majorlist");
>  open (OUT, ">>d:/emails");
>  
>  while (defined (my $line =<IN>)) {
>      $line =~ /^(.*?)\s/i;
>      &email($1);
>  }
>  
>  sub email {
>      while (defined (my $line2 = <INN>)) {
>          if ($line2 =~ /$1/i) {
>              print OUT ($line2);
>          }
>      }
>  }


Don't read the majorlist file for each line of minorlist.

    open MINOR, 'd:/minorlist' or die "can't open minorlist: $!\n";
    while (my $line = <MINOR>) {
        push @schools, $1 if $line =~ /^(\S+)/;
        # note the regex must match at least one character.
        # you were probably passing an empty string to your subroutine,
        # which of course matches any string.
    }
    close MINOR;
    # depending on the format of your files, this may suffice in place
    # of the while loop:
    #    chomp( my @schools = <MINOR> );

    open MAJOR, 'd:/majorlist' or die "can't open majorlist: $!\n";
    open OUT, '>>d:/emails' or die "can't open emails for writing: $!\n";
    while (my $line = <MAJOR>) {
        print OUT $line if grep { $line =~ /\Q$_/i } @schools;
    }
    close MAJOR;
    close OUT;

-- 
Glenn Jackman
NCF Sysadmin
glennj@ncf.ca


------------------------------

Date: Tue, 30 Sep 2003 20:40:21 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: code not working..ideas please?
Message-Id: <gcqjnv4c5sb10pvbadtg32orotvkdncic4@4ax.com>

Ryan

Thanks for your comments...I have changed to code below but it only
gives me one email address in the emails file ! I must be missing
something obvious?

Could you explain the

$word = shift line of yours?

Cheers

Geoff

use warnings;
use strict;

open (IN, "secondary") or die "Can't open secondary: $!";
open (INN, "cornwall") or die "Can't open cornwall: $!";
open (OUT, ">>emails") or die "Can't open emails: $!";

while (defined (my $line =<IN>)) {
                          if ($line =~ /(\w+)/) {
                                       email($1);
                                                }
                                 }

sub email {

my $word = $1;
$word =~ tr/A-Z/a-z/;
  
while (defined (my $line2 = <INN>)) {
                          if ($line2 =~ /$word/) {
                          print OUT ($line2);
                                               }
                                    }
}




------------------------------

Date: Tue, 30 Sep 2003 20:50:28 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: code not working..ideas please?
Message-Id: <8tqjnvs988j41eikkfrubv22chic590t41@4ax.com>

On 30 Sep 2003 19:55:07 GMT, Glenn Jackman <xx087@freenet.carleton.ca>
wrote:

Glenn,

Thanks for the code. It certainly works - only problem now is
understanding how?!

>Don't read the majorlist file for each line of minorlist.
>
>    open MINOR, 'd:/minorlist' or die "can't open minorlist: $!\n";
>    while (my $line = <MINOR>) {
>        push @schools, $1 if $line =~ /^(\S+)/;

how does line above work?

>        # note the regex must match at least one character.
>        # you were probably passing an empty string to your subroutine,
>        # which of course matches any string.
>    }
>    close MINOR;
>    # depending on the format of your files, this may suffice in place
>    # of the while loop:
>    #    chomp( my @schools = <MINOR> );
>
>    open MAJOR, 'd:/majorlist' or die "can't open majorlist: $!\n";
>    open OUT, '>>d:/emails' or die "can't open emails for writing: $!\n";
>    while (my $line = <MAJOR>) {
>        print OUT $line if grep { $line =~ /\Q$_/i } @schools;

could you say what the line above is doing?

Geoff


>    }
>    close MAJOR;
>    close OUT;



------------------------------

Date: Tue, 30 Sep 2003 20:54:56 GMT
From: Geoff Cox <geoff.cox@blueyonder.co.uk>
Subject: Re: code not working..ideas please?
Message-Id: <o8rjnvg7mm2jlit2do3fhl66tteo9u12nq@4ax.com>

On 30 Sep 2003 19:55:07 GMT, Glenn Jackman <xx087@freenet.carleton.ca>
wrote:

Glenn,

>    open MINOR, 'd:/minorlist' or die "can't open minorlist: $!\n";
>    while (my $line = <MINOR>) {
>        push @schools, $1 if $line =~ /^(\S+)/;

I assume this matches the first group of non white space characters?

>        # note the regex must match at least one character.
>        # you were probably passing an empty string to your subroutine,
>        # which of course matches any string.
>    }
>    close MINOR;
>    # depending on the format of your files, this may suffice in place
>    # of the while loop:
>    #    chomp( my @schools = <MINOR> );
>
>    open MAJOR, 'd:/majorlist' or die "can't open majorlist: $!\n";
>    open OUT, '>>d:/emails' or die "can't open emails for writing: $!\n";
>    while (my $line = <MAJOR>) {
>        print OUT $line if grep { $line =~ /\Q$_/i } @schools;

not clear on above line - how does grep work? 

Geoff


>    }
>    close MAJOR;
>    close OUT;



------------------------------

Date: 30 Sep 2003 17:07:58 -0400
From: Ryan Shondell <shondell@cis.ohio-state.edu>
Subject: Re: code not working..ideas please?
Message-Id: <xcwpthi0ytt.fsf@psi.cis.ohio-state.edu>

Geoff Cox <geoff.cox@blueyonder.co.uk> writes:

> Ryan
> 
> Thanks for your comments...I have changed to code below but it only
> gives me one email address in the emails file ! I must be missing
> something obvious?
> 
> Could you explain the
> 
> $word = shift line of yours?

When you pass arguments to a sub, they are passed in the @_ array. The
shift function removes the first value from an array and returns
it. If you don't specify the name of the array, it uses @_.

So by doing

        my $word = shift;

I'm setting $word to the first value in @_. In other words, the first
arg passed into the sub.
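
A small sketch of the same idea, with a made-up sub name and
arguments, in case it helps to see it run:

```perl
use strict;
use warnings;

# Hypothetical sub: the arguments arrive in @_, and shift with no
# array named removes and returns the first element of @_.
sub greet {
    my $name = shift;        # same as: my $name = shift @_;
    my $rest = scalar @_;    # how many arguments are left after the shift
    return "hello $name ($rest more)";
}

print greet('truro', 'extra', 'args'), "\n";
```

After the shift, @_ holds only the remaining arguments, which is why
$rest counts two rather than three.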


> Cheers
> 
> Geoff
> 
> use warnings;
> use strict;
> 
> open (IN, "secondary") or die "Can't open secondary: $!";
> open (INN, "cornwall") or die "Can't open cornwall: $!";
> open (OUT, ">>emails") or die "Can't open emails: $!";
> 
> while (defined (my $line =<IN>)) {
>                           if ($line =~ /(\w+)/) {
>                                        email($1);
>                                                 }
>                                  }
> 
> sub email {
> 
> my $word = $1;
> $word =~ tr/A-Z/a-z/;

Instead of this, you should probably use the /i option to m// below.

> while (defined (my $line2 = <INN>)) {
>                           if ($line2 =~ /$word/) {

        if ($line2 =~ /$word/i) {

>                           print OUT ($line2);
>                                                }
>                                     }
> }

You should definitely follow the example posted elsewhere in this
thread on how to solve your problem. One of the problems you have of
doing it this way is that if you have already pulled the lines from
INN that contain what you want to match, you won't match.

Okay, that came out a bit confusing... :-)

For example, say that your files look like this...

__IN__
some
things
here
__IN__

__INN__
things
some
foo
__INN__

Your program will read in the word "some", and then search through INN
all the way to end-of-file (the while loop in email() never stops
early). Along the way it will grab and discard the word "things".

When your program processes the word "things" from your first file, it
will then start looking through the second file _at the place where
you stopped_ the first time, which is now the end of the file. So it
will have missed a word it should have gotten.
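
The effect can be reproduced with an in-memory filehandle standing in
for INN. This is only a sketch: the helper name matches() and its
$rewind flag are made up, and rewinding with seek is shown just to
make the cause visible, not as the recommended fix.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for the second file (INN).
my $data = "things\nsome\nfoo\n";
open(my $inn, '<', \$data) or die "Can't open in-memory file: $!";

# Hypothetical helper: collect the lines of the handle that contain
# $word. With $rewind set, it seeks back to the start before reading.
sub matches {
    my ($fh, $word, $rewind) = @_;
    seek($fh, 0, 0) if $rewind;
    my @hits;
    while (defined(my $line = <$fh>)) {
        push @hits, $line if $line =~ /\Q$word/i;
    }
    return @hits;
}

my @first  = matches($inn, 'some');         # reads to end-of-file
my @second = matches($inn, 'things');       # handle is already at EOF
my @third  = matches($inn, 'things', 1);    # rewound, so it is found

printf "%d %d %d\n", scalar @first, scalar @second, scalar @third;
```

The second call finds nothing because the first call left the handle
at end-of-file; only the rewound third call sees "things" again.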

In trying to solve some of your smaller problems, I completely missed
the bigger one; the implementation. Sorry bout that... :-(

Ryan
-- 
perl -e '$;=q,BllpZllla_nNanfc]^h_rpF,;@;=split//,
$;;$^R.=--$=*ord for split//,$~;sub _{for(1..4){$=
=shift;$=--if$=!=4;while($=){print chr(ord($;[$%])
+shift);$%++;$=--;}print " ";}}_(split//,$^R);q;;'


------------------------------

Date: Tue, 30 Sep 2003 16:04:07 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: code not working..ideas please?
Message-Id: <slrnbnjru7.o47.tadmc@magna.augustmail.com>

Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:

> $word =~ tr/A-Z/a-z/;


You should use the function for lowercasing when you want to
lowercase. Your method does not respect locales:

   $word = lc($word);
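
A rough illustration of the difference. Locale setup varies from
machine to machine, so this sketch swaps in a Unicode character
outside A-Z instead of a locale-specific one; that substitution and
the sample word are assumptions, not part of the original advice.

```perl
use strict;
use warnings;

# "\x{160}" is LATIN CAPITAL LETTER S WITH CARON; "\x{161}" is its
# lowercase form. The sample word is made up for illustration.
my $word = "\x{160}KOLA";

(my $with_tr = $word) =~ tr/A-Z/a-z/;    # only touches ASCII A-Z
my $with_lc  = lc $word;                 # uses Perl's case mapping

print "tr left the first character alone\n"
    if substr($with_tr, 0, 1) eq "\x{160}";
print "lc lowercased it\n"
    if substr($with_lc, 0, 1) eq "\x{161}";
```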


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 30 Sep 2003 21:14:57 GMT
From: Glenn Jackman <xx087@freenet.carleton.ca>
Subject: Re: code not working..ideas please?
Message-Id: <slrnbnjsjf.7er.xx087@smeagol.ncf.ca>

Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:
>  On 30 Sep 2003 19:55:07 GMT, Glenn Jackman <xx087@freenet.carleton.ca>
>  wrote:
> >        push @schools, $1 if $line =~ /^(\S+)/;
>  
>  I assume this matches the first group of non white space characters?

It matches the first group of non white space characters, only if they
are at the beginning of the line (note the ^ anchor).

> >        print OUT $line if grep { $line =~ /\Q$_/i } @schools;
>  
>  not clear on above line - how does grep work? 

http://www.perldoc.com/perl5.6/pod/func/grep.html

@schools is an array of patterns.  I want to check each of these
patterns against the current line.  grep is a concise function to
iterate over an array and perform some action.  In this case, the return
value of grep is the number of elements in @schools that "match" $line.

The regex /\Q$_/i contains this magic:
    $_  (the current placeholder to an element in the @schools array), 
    \Q  which means any regex metacharacters in $_ (such as * or +)
        should be escaped of their special meaning, and 
    /i  to ignore case.

Another way to code that line could be:
    foreach my $element (@schools) {
        if ($line =~ /\Q$element/i) {
            print OUT $line;
        }
    }
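
Here is a self-contained sketch of the grep line at work. The school
names and addresses are made up; "st.ives" is included only to show
why \Q matters.

```perl
use strict;
use warnings;

# Made-up sample data. Without \Q, the "." in "st.ives" would match
# any character, so "stXives" would wrongly count as a hit.
my @schools = ('truro', 'st.ives');
my @lines   = (
    "head\@TRURO.cornwall.example\n",
    "office\@stXives.cornwall.example\n",
    "admin\@st.ives.cornwall.example\n",
);

# In scalar context the inner grep returns how many patterns matched
# the line, so any non-zero count keeps the line.
my @kept = grep {
    my $line = $_;
    scalar grep { $line =~ /\Q$_/i } @schools;
} @lines;

print @kept;
```

The first line is kept because of /i, the second is dropped because
the dot is taken literally, and the third is kept.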

I highly recommend these O'Reilly books:  "Learning Perl", "Programming
Perl" and "Mastering Regular Expressions".

-- 
Glenn Jackman
NCF Sysadmin
glennj@ncf.ca


------------------------------

Date: Tue, 30 Sep 2003 19:11:54 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: Endless Loop (trying for...)
Message-Id: <3F79D54A.6D2881A7@acm.org>

Brian wrote:
> 
> Shouldn't -
> 
> my @a;
> my $y = 1;
> push (@a, $y);
> foreach my $x (0..$#a) {
>         print "array = @a\n";
>         $y++;
>         push (@a, $y);
>         }
> 
> result in an endless loop?  And if not (I can't get it to) how do I
> create an endless loop like that above?

An endless loop is usually written as:

while ( 1 ) { ... }

Or:

for ( ;; ) { ... }

Or:

{ ...; redo }
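
As for why the quoted foreach stops: the list 0..$#a is worked out
once, before the first pass, so growing @a inside the loop does not
add iterations. A short sketch (the limit of 5 in the second loop is
there only so the example terminates):

```perl
use strict;
use warnings;

my @a = (1);
my $iterations = 0;

# The range 0 .. $#a is evaluated once, while @a has one element, so
# the push inside the body does not extend the loop.
foreach my $x (0 .. $#a) {
    push @a, scalar(@a) + 1;
    $iterations++;
}
print "foreach ran $iterations time(s), array is now @a\n";

# A loop that really would run forever needs an explicit exit.
my $count = 0;
while (1) {
    $count++;
    last if $count >= 5;
}
print "while (1) ran $count times before last\n";
```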



John
-- 
use Perl;
program
fulfillment


------------------------------

Date: 30 Sep 2003 11:55:49 -0700
From: e.mayorga@co.snohomish.wa.us (Emilio Mayorga)
Subject: GD::Graph: "mixed" graph doesn't recognize "area" graph type
Message-Id: <faa70e85.0309301055.4202eb7a@posting.google.com>

Hi,

I'm using the GD::Graph module to create some graphs (great module,
BTW!). Everything works fine. But in a "mixed" graph, "area" graphs
are not recognized. I get this error message:
[snip] unknown type area, assuming lines at [snip]
I am trying to create a chart with two lines and three area graphs.
The lines work fine. Area graphs work fine when used by themselves
(not in a mixed graph); I wrote a small program to test this.

I am using ActiveState Perl 5.6.1 on Win2000; GD 1.27, GD::Graph 1.31.
I've read in the documentation that support for "mixed" is fairly
limited, but area is included in the sample "types" assignment in the
documentation.

The X-axis on my graph is numerical. I don't know if that makes a
difference.

Thanks!

-Emilio


------------------------------

Date: Tue, 30 Sep 2003 20:19:45 +0200
From: John Bokma <postmaster@castleamber.com>
Subject: Re: Help with RegEx
Message-Id: <1064946119.101828@halkan.kabelfoon.nl>

vector wrote:

> my search will return, properly, the files "�FFFF.EA", "�.FFFF.TIF", 
> and "�9999.EA".  What I need is to extend the regex to refine those
> results, so that the file named "�9999.EA" is excluded from the
> results set, as it has no matching .TIF file.

^\d{16}[\dA-F]{4}\.EA$

and check with -e if the TIF exists:

$file =~ s/EA$/TIF/;
die "no associated TIF" unless -e $file;

-- 
Kind regards,                                          prachtige ideeen
John                                          aan het einde van een dal
                                               stromen dagelijks
http://johnbokma.com/                         gedachtenwaterval



------------------------------

Date: Tue, 30 Sep 2003 13:53:02 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Help with RegEx
Message-Id: <slrnbnjk8e.nus.tadmc@magna.augustmail.com>

vector <isbat1@yahoo.com> wrote:

> Given a list of files in a directory, I want to return sets of file
> pairs.  


Using the word "pair" in the problem description often implies
"use a hash" somewhere.


> First, the target directory should be scanned for files with
> 20-character names and extensions of either .TIF or .EA.  The
> 20-character name begins with 16 numbers, and the last four characters
> can be any hexadecimal character, 0-9 or A-F.

> The next step should be
> to take the resulting set of filenames, and extract file pairs of TIFs
> and EAs that have the same name.


> so that the file named "�9999.EA" is excluded from the
> results set, as it has no matching .TIF file.


--------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

my $dir = '.';
opendir DIR, $dir or die "could not open '$dir' directory  $!";

my %pairs;
foreach my $ea ( grep /^\d{16}[\dA-F]{4}\.EA$/, readdir DIR ) {
   (my $tif = $ea) =~ s/EA$/TIF/;
   $pairs{$ea} = $tif if -e "$dir/$tif";
}
closedir DIR;

print "$_  ==>  $pairs{$_}\n" for sort keys %pairs;
--------------------------------------


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Tue, 30 Sep 2003 19:39:12 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: newbie regxp question
Message-Id: <3F79DBAF.38D7E31B@acm.org>

Jesper wrote:
> 
> Ok, fair enough - I just relate the =~ operator to regxp (newbie :)).

The binding operators (=~ and !~) are not strictly related to regular
expressions, they can also be used with tr/// which does not use regular
expressions at all.
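
For instance, a small sketch with a made-up string:

```perl
use strict;
use warnings;

my $text = 'Perl-Users Digest';

# tr/// bound with =~ involves no regex. With an empty replacement
# list and no /d, it counts the matching characters and leaves the
# string unchanged.
my $lower = ($text =~ tr/a-z//);

# tr/// bound to a copy, translating lowercase to uppercase.
(my $upper = $text) =~ tr/a-z/A-Z/;

print "$lower lowercase letters; uppercased: $upper\n";
```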


John
-- 
use Perl;
program
fulfillment


------------------------------

Date: Tue, 30 Sep 2003 21:52:19 GMT
From: Mike Flannigan <mikeflan@earthlink.net>
Subject: Re: Obtain Connection Duration
Message-Id: <3F79FBBB.1177AB00@earthlink.net>


Great idea.  It worked great.  All I had to do was get
dumpel.exe and write the code.  Thanks for all your help.

Note:  There is no rule against talking to yourself on
these groups  :-)


Mike


Mike Flannigan wrote:

> I got a new idea (by myself).  Maybe if I search the
> System log and calculate it myself . . .
>
> I'll let you know.
>
> Mike



------------------------------

Date: 30 Sep 2003 12:29:57 -0700
From: David Wake <dwake.no.spam@alumni.stanford.org>
Subject: Order of attachments in Multipart POST
Message-Id: <9noex23wi2.fsf@Turing.Stanford.EDU>

Does anyone know of a way to control the order of file attachments
sent in a multipart POST statement?  They seem to be sent in a random
order each time.


use HTTP::Request::Common;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;

my $response = $ua->request(POST 'http://www.foo.com/forminput.cgi',
                            Content_Type => 'form-data',
                            Content      => [
                                file1 => ["/home/dwake/file1"],
                                file2 => ["/home/dwake/file2"],
                            ]);

The order in which file1 and file2 are attached seems to vary each
time this script is run.  I would like to always send them in the same
order if possible.

Thanks,

David


------------------------------

Date: Tue, 30 Sep 2003 15:47:56 -0600
From: Eric Schwartz <emschwar@pobox.com>
Subject: Re: Perl command to copy one file into another file?
Message-Id: <etohe2u54oj.fsf@wormtongue.emschwar>

"Bill" <WA5JUL@hotmail.com> writes:
> The objective is to create a sum of the two files without any duplications.
> I included "use" into the script, but all it does now is make the file copy
> and loops without making any comparisons with output6.txt.

Here's the problem: we're not psychic.  You asked for a solution that
copied one file to another, and you got that.  Now you appear to want
to roll your own version of diff(1), instead.  We can't help you
unless we know what you actually want-- if you ask for something else,
don't be surprised if you get that instead.

> # To use this script, type the following at the command line prompt:
> # bbb.pl file1 file2 output_file_name
>
> open (INPUT1, "$ARGV[0]") or die "Cannot open $ARGV[0]";

Useless use of quotes, and you didn't use $! in your error string:

open (INPUT1, $ARGV[0]) or die "Cannot open $ARGV[0]: $!";


Also, every Perl program you write should begin with:

#!/path/to/perl   # please put the REAL path to perl here!!!
use warnings;
use strict;

> open (OUTPUT, "> $ARGV[2]");

Why didn't you die() on error here, too?

open (OUTPUT, "> $ARGV[2]") or die "Cannot open $ARGV[2]: $!";

Only that's wrong, you don't want to do that here.  Wait until after
you've done the copy, and THEN open it for append, not write.

> #print "First  input  file is: $ARGV[0]\n";
> #print "Second input  file is: $ARGV[1]\n";
> #print "The    output file is: $ARGV[2]\n\n";
>
> # This half of the code dumps everything in 'file1' to the output 'file3'
>
> #while ( <INPUT1> ) {
> #    print "Reading $ARGV[0]\n";
> #    print OUTPUT $_;
> #}
>
> use File::Copy;
> copy("$ARGV[0]","$ARGV[2]") or die "Copy failed: $!";

copy($ARGV[0], $ARGV[2]) or die "Copy failed: $!";

There's no need to quote variables by themselves like that.  It can
even cause problems in some cases.  Stop doing that.
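
One case where the quotes do cause trouble is a reference. A minimal
sketch with made-up data:

```perl
use strict;
use warnings;

# Made-up data: an array reference held in a scalar.
my $list   = [ 'file1', 'file2' ];
my $quoted = "$list";    # stringified into something like ARRAY(0x...)
my $plain  = $list;      # still a reference

print "quoted copy is ",
    (ref($quoted) ? "a reference" : "just a string"), "\n";
print "plain copy has ", scalar @$plain, " elements\n";
```

The quoted copy can no longer be dereferenced, while the unquoted one
still points at the original array.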

Also, everything that happened before happens now-- i.e., after the
copy() call is through, file3 is an exact copy of file1.

> # This half of the code compares 'file2' to 'file1' and writes out
> # any line that doesn't match to the output 'file3'
>
> open (INPUT1, "$ARGV[1]") or die "Cannot open $ARGV[1]";

open(INPUT1, $ARGV[1]) or die "Cannot open $ARGV[1]: $!";

Why are you re-opening INPUT1 here?  You're using INPUT2 below, but
you never open it.  Are you meaning to do that here?

You also want to open $ARGV[2] for output here:

open(OUTPUT, '>>', $ARGV[2]) or die "couldn't open $ARGV[2] for append: $!";

> while ( <INPUT1> ) {
>   $match = 0;
>
>   $a = $_;

while(my $a = <INPUT1>) {
   $match = 0;

>   open (INPUT2, "$ARGV[0]") or die "Cannot open $ARGV[0]";

   open (INPUT2, $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

>   while ( <INPUT2> ) {
>     $b = $_;
>     if ($a eq $b) {
>       $match = 1;
>       last;
>     }
>   }
>
>   if ($match == 0) {
>     print OUTPUT $a;
>   }

Ick.  Try this (untested)

   print OUTPUT $a unless grep /^\Q$a\E$/, <INPUT2>;

> }

This is a mess, and I'm fairly sure none of it does either what you
think it does, or what you want it to.  First, you read a line from
the second input file.  Then you assign it to $a.  Remember, INPUT1 is
a filehandle for the contents of $ARGV[1].

Then you read from INPUT2, which you never opened, so naturally
nothing is read.  If you'd enabled warnings, perl would have said
something like: "readline() on unopened filehandle INPUT2 at - line
<foo>."  Anyway, the logic is completely reversed-- after the copy,
file3 is the same as file1, so you should instead read file2, and then
print out whatever's in that file that is NOT in file1.

First off, this is a generally horrible solution, as you have to read
through the entire contents of file1 EACH BLOODY TIME, which means
horrid runtimes.  A much better solution is to read both files, once,
into a hash, and let that handle uniqueness for you:

#!/usr/bin/perl
use warnings;
use strict;

my $output = pop @ARGV;
my %lines = ();
while(<>) {
  chomp;
  $lines{$_}++;
}

open(OUTPUT, '>', $output) or die "couldn't open $output: $!";
print OUTPUT join("\n", keys %lines), "\n";

There's probably even better solutions; this is just off the top of my
head.
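
One possible variation, sketched under the assumption that the
original line order is worth keeping (keys returns lines in no
particular order). The helper name unique_lines and the sample
contents are made up, and in-memory filehandles stand in for file1
and file2.

```perl
use strict;
use warnings;

# Hypothetical helper: return each distinct line from the given
# filehandles once, in first-seen order. %seen does the duplicate
# check; @unique remembers the order.
sub unique_lines {
    my @handles = @_;
    my (%seen, @unique);
    for my $fh (@handles) {
        while (defined(my $line = <$fh>)) {
            chomp $line;
            push @unique, $line unless $seen{$line}++;
        }
    }
    return @unique;
}

# Made-up contents for the two input files.
my ($one, $two) = ("alpha\nbeta\n", "beta\ngamma\nalpha\n");
open(my $fh1, '<', \$one) or die "Can't open first sample: $!";
open(my $fh2, '<', \$two) or die "Can't open second sample: $!";

my @merged = unique_lines($fh1, $fh2);
print join("\n", @merged), "\n";
```

Each file is still read only once; the extra array is the only cost
of preserving order.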

-=Eric
-- 
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
		-- Blair Houghton.


------------------------------

Date: 30 Sep 2003 19:22:24 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Regexp - optimisation 
Message-Id: <slrnbnjlvg.kb6.abigail@alexandra.abigail.nl>

Stephen Adam (stephen.adam@ntlworld.com) wrote on MMMDCLXXXII September
MCMXCIII in <URL:news:sjjeb.7967$4D.5237236@newsfep2-win.server.ntli.net>:
-:  Hi guys and girls,
-:  
-:  I've just written a (crude) program to extract all the e-mail addresses from
-:  a file,
-:  the regexp takes ages. I was just wondering if anyone had any suggestions on
-:  improving
-:  it or any aspects of the program. I know it won't pick up addresses with
-:  comments in them but thats OK.
-:  
-:  
-:  @origlist = $string =~ m{[\w]+@[\w]+[.]+[\w.]*}g;


While I'm not going to comment on the inability of this regex to
recognize email addresses, it's clear why this is slow: too much
backtracking. You should see some speedup by changing the regex
to:

    m{(?>\w+)\@\w+[.][\w.]*}g;
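
To see the pattern run, here is a minimal sketch on a made-up string;
the addresses are placeholders and no timing claim is made.

```perl
use strict;
use warnings;

# Made-up sample text.
my $string = 'Write to bob@example.com or amy@test.org today.';

# (?>\w+) is an atomic group: once it has matched a run of word
# characters it will not give any of them back, so a run that is not
# followed by "@" fails at that starting point without retrying
# shorter versions of the same run.
my @origlist = $string =~ m{(?>\w+)\@\w+[.][\w.]*}g;

print "$_\n" for @origlist;
```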


Abigail
-- 
A perl rose:  perl -e '@}>-`-,-`-%-'


------------------------------

Date: Tue, 30 Sep 2003 22:24:44 +0100
From: "Stephen Adam" <stephen.adam@ntlworld.com>
Subject: Re: Regexp - optimisation 
Message-Id: <Ewmeb.1436$z43.729@newsfep1-gui.server.ntli.net>

Thanks for the help Abigail,

It seems to recognise all the e-mail addresses i've tested it with though I
know its pretty limited.

Take Care


Steve






------------------------------

Date: Tue, 30 Sep 2003 12:19:19 -0600
From: Eric Schwartz <emschwar@pobox.com>
Subject: Re: use strict with an undeclared variable
Message-Id: <eton0cm6swo.fsf@wormtongue.emschwar>

kuujinbo@hotmail.com (ko) writes:
> And he *finally* sees the light! For some reason I had it stuck in my
> head that $a and $b were only special within a sort routine, and have
> always associated special variables with uppercased alphabetic or
> punctuation characters.

Personally, I recommend avoiding them except in sort routines anyway.
I find it confusing to see $a and $b in otherwise strict-safe code,
and usually spend a minute or two looking for their definition until I
remember that they don't need one.
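
A short sketch of the distinction, with made-up values and variable
names:

```perl
use strict;
use warnings;

# $a and $b are package variables that sort sets for each comparison,
# so strict does not ask for a declaration here.
my @sorted = sort { $a <=> $b } (10, 2, 33);
print "@sorted\n";

# Outside sort, clearly named lexicals are easier to follow.
my ($width, $height) = (3, 4);
my $area = $width * $height;
print "$area\n";
```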

-=Eric
-- 
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
		-- Blair Houghton.


------------------------------

Date: 30 Sep 2003 14:36:48 -0700
From: kuujinbo@hotmail.com (ko)
Subject: Re: use strict with an undeclared variable
Message-Id: <92d64088.0309301336.5c6c3479@posting.google.com>

Eric Schwartz <emschwar@pobox.com> wrote in message news:<eton0cm6swo.fsf@wormtongue.emschwar>...
> kuujinbo@hotmail.com (ko) writes:
> > And he *finally* sees the light! For some reason I had it stuck in my
> > head that $a and $b were only special within a sort routine, and have
> > always associated special variables with uppercased alphabetic or
> > punctuation characters.
> 
> Personally, I recommend avoiding them except in sort routines anyway.
> I find it confusing to see $a and $b in otherwise strict-safe code,
> and usually spend a minute or two looking for their definition until I
> remember that they don't need one.
> 
> -=Eric

Actually I make it a point to not use variables that don't somehow
relate (in my mind at least) to what I'm trying to do - more
specifically, I only use $a and $b in sort routines.

My lame excuse for why it took me so long to figure this one out :)


------------------------------

Date: Sat, 19 Jul 2003 01:59:56 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: 
Message-Id: <3F18A600.3040306@rochester.rr.com>

Ron wrote:

> Tried this code get a server 500 error.
> 
> Anyone know what's wrong with it?
> 
> if $DayName eq "Select a Day" or $RouteName eq "Select A Route") {

(---^


>     dienice("Please use the back button on your browser to fill out the Day
> & Route fields.");
> }
 ...
> Ron

 ...
-- 
Bob Walton



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5593
***************************************
