[16142] in Perl-Users-Digest

Perl-Users Digest, Issue: 3554 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jul 10 15:37:37 2000

Date: Mon, 10 Jul 2000 12:37:23 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <963257843-v9-i3554@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Mon, 10 Jul 2000     Volume: 9 Number: 3554

Today's topics:
        Checking the size of an E-mail... <raphaelp@nr1webresource.com>
    Re: Checking the size of an E-mail... <care227@attglobal.net>
    Re: Checking the size of an E-mail... <lauren_smith13@hotmail.com>
    Re: Checking the size of an E-mail... <care227@attglobal.net>
    Re: Checking the size of an E-mail... <raphaelp@nr1webresource.com>
    Re: Checking the size of an E-mail... <care227@attglobal.net>
    Re: Checking the size of an E-mail... <raphaelp@nr1webresource.com>
    Re: Checking the size of an E-mail... <gellyfish@gellyfish.com>
    Re: Checking the size of an E-mail... <gellyfish@gellyfish.com>
    Re: Checking the size of an E-mail... (Philip 'Yes, that's my address' Newton)
        cheer up Tad <diane_lj_lee@yahoo.co.uk>
    Re: cheer up Tad (Tad McClellan)
    Re: cheer up Tad <gellyfish@gellyfish.com>
        Closing a socket with LWP <gilbert.bruyas@free.fr>
        Closing files connecting a parent to a child salim@cygnos.com
        Comparing fields of two files <JohnCasey_member@newsguy.com>
    Re: Comparing fields of two files (Eric Bohlman)
    Re: Comparing fields of two files (Abigail)
    Re: Comparing fields of two files (jason)
    Re: Comparing fields of two files <bwalton@rochester.rr.com>
    Re: Comparing fields of two files (Keith Calvert Ivey)
    Re: Comparing fields of two files (jason)
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 7 Jul 2000 17:36:00 +0200
From: "Raphael Pirker" <raphaelp@nr1webresource.com>
Subject: Checking the size of an E-mail...
Message-Id: <8k4tmj$72o$16$1@news.t-online.com>

Hi all,

I've asked this question already, but I've been pointed to the perldoc -f
length which shows only:

=item length EXPR

=item length

Returns the length in bytes of the value of EXPR.  If EXPR is
omitted, returns length of C<$_>.

and I really don't get how I could check whether an e-mail is over 50k in
size using this information!

Could anyone please help with maybe a sample-code?

Thanks in advance,

Raphael




------------------------------

Date: Fri, 07 Jul 2000 14:27:10 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: Checking the size of an E-mail...
Message-Id: <396620FE.3DB57597@attglobal.net>

Raphael Pirker wrote:
> 
> length EXPR
> 
> length
> 
> Returns the length in bytes of the value of EXPR.  If EXPR is
> omitted, returns length of $_.
> 
> and I really don't get how I could check whether an e-mail is over 50k in
> size using this information!

Did you even try?
 
> Could anyone please help with maybe a sample-code?  

What you should be posting is _your_ code that you tried to make work,
but need help with.  Using length(), you'd do something like:

----------------8<----------------------------------8<--------------
#!/usr/bin/perl -w

use strict;
undef $/; # undefines the input record separator, so I can get the
          # whole file into the one variable.  Generally you'd do
          # this local to a block, but since this is a one off...

my $path_to_message_file = shift @ARGV; # assumption: path given on the command line
open LENGTH, "$path_to_message_file" or die "barf: $! \n";

# double quotes aren't required with an open() function call when 
# specifying a scalar variable, but I like to keep in the habit of 
# always quoting.   

my $slurp = <LENGTH>; # read the file into a scalar (if we hadn't
                      # undef'd $/, we'd have only gotten the
                      # first line of the file).


my $length = length $slurp; #length now has the number of bytes.

if ("$length" > '50000')
{
	#its pretty big
}
else
{
	#do your thing
}

----------------8<----------------------------------8<--------------

and another way would be to use stat.

$ perldoc -f stat  (a much better way, IMHO)
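
A minimal sketch of the stat approach, assuming the message has already
been saved to a file. The helper name file_over_limit and the 50000-byte
limit are illustrative, not part of the original post. It asks the
filesystem for the size instead of reading the whole file into memory:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file at $path is larger than $limit bytes.
sub file_over_limit {
    my ($path, $limit) = @_;
    my @info = stat($path)
        or die "Cannot stat $path: $!\n";
    return $info[7] > $limit ? 1 : 0;   # element 7 of stat's list is the size in bytes
}

# Usage (assumed): perl checksize.pl /path/to/saved/message
if (@ARGV) {
    my $verdict = file_over_limit($ARGV[0], 50_000) ? "too big" : "ok";
    print "$ARGV[0]: $verdict\n";
}
```

The -s file test returns the same number, so (-s $path) > $limit is an
equivalent shorthand when only the size is needed.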


------------------------------

Date: Fri, 7 Jul 2000 12:05:56 -0700
From: "Lauren Smith" <lauren_smith13@hotmail.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k59ne$ilf$1@brokaw.wa.com>


Drew Simonis <care227@attglobal.net> wrote in message
news:396620FE.3DB57597@attglobal.net...
> Raphael Pirker wrote:
> >
> > length EXPR
> >
> > length
> >
> > Returns the length in bytes of the value of EXPR.  If EXPR is
> > omitted, returns length of $_.
> >
> > and I really don't get how I could check whether an e-mail is over 50k
> > in size using this information!
>
> ----------------8<----------------------------------8<--------------
> #!/usr/bin/perl -w
>
> use strict;
> undef $/; # undefines the input record seperator, so I can get the
>           # whole file into the one variable.  Generally you'd do
>           # this local to a block, but since this is a one off...
>
> open LENGTH, "$path_to_message_file" or die "barf: $! \n";
>
> # double quotes aren't required with an open() function call when
> # specifying a scalar variable, but I like to keep in the habit of
> # always quoting.
>
> my $slurp = <LENGTH>; # read the file into a scalar (if we wouldn't
>                       # have undef'd $/, we'd have only gotten the
>                       # first line of the file.
>
> my $length = length $slurp; #length now has the number of bytes.
>
> if ("$length" > '50000')

Is there a reason you are quoting those things?  Wouldn't this work just as
well?

if ($length > 50000)

And if we are only interested in the size of the file in bytes, wouldn't
this work also, without needing to actually read in the file?

if ((-s $path_to_message_file) > 50000)
> ----------------8<----------------------------------8<--------------
>
> and another way would be to use stat.
>
> $ perldoc -f stat  (a much better way, IMHO)

Heh, nevermind...  :-)

Lauren





------------------------------

Date: Fri, 07 Jul 2000 15:38:01 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: Checking the size of an E-mail...
Message-Id: <39663199.2CB6CC63@attglobal.net>

Lauren Smith wrote:
> 
> >
> > if ("$length" > '50000')
> 
> Is there a reason you are quoting those things?  Wouldn't this work just as
> well?
> 

I'm a quote-a-holic.  I'd rather overquote than one day forget to 
quote something and mess it up =)


------------------------------

Date: Fri, 7 Jul 2000 21:45:51 +0200
From: "Raphael Pirker" <raphaelp@nr1webresource.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k5cb2$bms$16$1@news.t-online.com>

Thanks! I did get about that far. But the problem is that the e-mail is not
a file! I can read a file, but I haven't figured out how to read in a "couple
of written bytes". Here's my sendmail code:

open(MAIL,"|$mailprog -t");

print MAIL "To: \"$yourname\" <$admin_recipient>\n";
print MAIL "From: \"$visitor_name\" <$visitor_email>\n";
print MAIL "Subject: $admin_subject\n";
if ($admin_cc ne "false") {
print MAIL "Cc: $admin_cc\n";
};
print MAIL "\n";
if ($include_from eq "1") {
 print MAIL "E-Mail sent by $visitor_email\n";
}
print MAIL "$admin_content\n";
print MAIL "$spam_feature\n";
print MAIL "$credits\n";
close(MAIL);




------------------------------

Date: Fri, 07 Jul 2000 16:14:27 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: Checking the size of an E-mail...
Message-Id: <39663A23.B1E389B4@attglobal.net>

Raphael Pirker wrote:

> print MAIL "To: \"$yourname\" <$admin_recipient>\n";
> print MAIL "From: \"$visitor_name\" <$visitor_email>\n";
> print MAIL "Subject: $admin_subject\n";
> if ($admin_cc ne "false") {
> print MAIL "Cc: $admin_cc\n";
> };
> print MAIL "\n";
> if ($include_from eq "1") {
>  print MAIL "E-Mail sent by $visitor_email\n";
> }
> print MAIL "$admin_content\n";
> print MAIL "$spam_feature\n";
> print MAIL "$credits\n";
> close(MAIL);


See if I understand this...  you want to instead measure the amount 
of data written to the MAIL filehandle?


------------------------------

Date: Sat, 8 Jul 2000 06:31:44 +0200
From: "Raphael Pirker" <raphaelp@nr1webresource.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k6bgo$ojk$18$1@news.t-online.com>

correct!




------------------------------

Date: 8 Jul 2000 13:58:31 +0100
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k78hn$it2$1@orpheus.gellyfish.com>

On Fri, 07 Jul 2000 15:38:01 -0400 Drew Simonis wrote:
> Lauren Smith wrote:
>> 
>> >
>> > if ("$length" > '50000')
>> 
>> Is there a reason you are quoting those things?  Wouldn't this work just as
>> well?
>> 
> 
> I'm a quote-a-holic.  I'd rather overquote than one day forget to 
> quote something and mess it up =)

But that particular quoting is doing something that you don't want.  You are
doing a numeric comparison, so you don't want to stringify the number in
$length.  For all I know perl might optimise this away, but I would be
willing to stick my neck out and suggest that it is causing unnecessary
work.

A simple rule of thumb is that if what you are dealing with is a number then
you don't need or want the quotes.  In most *shells* an empty variable will
cause a fatal error if it is not quoted, but the worst that will happen
in Perl is that (if you asked for warnings, of course) you will get the
'Use of uninitialized value' warning.  So on the whole you can skip the
quotes around variables everywhere, unless you want to explicitly
stringify one for some reason.
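
A small sketch of the point, with an assumed nine-byte message: the
operator decides whether the comparison is numeric or string-wise, so
the quotes add a stringification step without changing the outcome of
a ">" test.

```perl
use strict;
use warnings;

my $length = 9;    # a small message, nine bytes long (made-up value)

# ">" compares numerically, whether or not the operands are quoted:
my $numeric = ("$length" > '50000') ? 1 : 0;   # 9 > 50000 is false
my $plain   = ($length > 50000)     ? 1 : 0;   # same test without the quotes

# "gt" compares as strings, character by character: "9" sorts after "5"
my $stringy = ($length gt 50000)    ? 1 : 0;

print "quoted >: $numeric, plain >: $plain, gt: $stringy\n";
```

The case that actually bites is using a string operator such as gt on
numbers, which is a separate mistake from over-quoting.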

/J\
-- 
yapc::Europe in association with the Institute Of Contemporary Arts
   <http://www.yapc.org/Europe/>   <http://www.ica.org.uk>


------------------------------

Date: 8 Jul 2000 14:23:37 +0100
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k7a0p$nmn$1@orpheus.gellyfish.com>

On Fri, 7 Jul 2000 21:45:51 +0200 Raphael Pirker wrote:
> Thanks! I did come about that far. But the problem is that the e-mail is no
> file! I can read a file, but I didn't figure out how to read in a "couple of
> written bytes". Here's my sendmail code:
> 
> open(MAIL,"|$mailprog -t");
> 
> print MAIL "To: \"$yourname\" <$admin_recipient>\n";
> print MAIL "From: \"$visitor_name\" <$visitor_email>\n";
> print MAIL "Subject: $admin_subject\n";
> if ($admin_cc ne "false") {
> print MAIL "Cc: $admin_cc\n";
> };
> print MAIL "\n";
> if ($include_from eq "1") {
>  print MAIL "E-Mail sent by $visitor_email\n";
> }
> print MAIL "$admin_content\n";
> print MAIL "$spam_feature\n";
> print MAIL "$credits\n";
> close(MAIL);
> 
Well if you want to know how much you have output to MAIL then perhaps
you should put it all in a string first and then use length on that:

my $mail =<<EEEBAGUM;
To: "$yourname" <$admin_recipient>
From: "$visitor_name" <$visitor_email>
Subject: $admin_subject
EEEBAGUM

$mail .= "Cc: $admin_cc\n" if ($admin_cc ne 'false');
$mail .= "\n";
$mail .= "E-Mail sent by $visitor_email\n" if ($include_from);

$mail .=<<EWARWOOWAR;
$admin_content
$spam_feature
$credits
EWARWOOWAR

my $length = length $mail;

print MAIL, $mail ;

# etc

/J\
-- 
yapc::Europe in association with the Institute Of Contemporary Arts
   <http://www.yapc.org/Europe/>   <http://www.ica.org.uk>


------------------------------

Date: Sun, 09 Jul 2000 07:35:17 GMT
From: nospam.newton@gmx.li (Philip 'Yes, that's my address' Newton)
Subject: Re: Checking the size of an E-mail...
Message-Id: <39682741.73699153@news.nikoma.de>

On 8 Jul 2000 14:23:37 +0100, Jonathan Stowe <gellyfish@gellyfish.com>
wrote:

> print MAIL, $mail ;
            ^
oops. No?

Cheers,
Philip
-- 
Philip Newton <nospam.newton@gmx.li>
If you're not part of the solution, you're part of the precipitate.
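
Putting the thread together, a sketch of the build-then-measure idea
might look like the following. The sub names build_message and
send_if_small and the hash keys are made up for illustration, and the
three-argument pipe open assumes a reasonably recent Perl. Note the
print without a comma after the filehandle.

```perl
use strict;
use warnings;

# Build the whole message in one string so it can be measured before sending.
# The hash keys are illustrative; they mirror the variables used in the thread.
sub build_message {
    my (%f) = @_;
    my $mail = <<"HEADERS";
To: "$f{yourname}" <$f{admin_recipient}>
From: "$f{visitor_name}" <$f{visitor_email}>
Subject: $f{admin_subject}
HEADERS
    $mail .= "Cc: $f{admin_cc}\n" if $f{admin_cc} ne 'false';
    $mail .= "\n";                                   # blank line ends the headers
    $mail .= "E-Mail sent by $f{visitor_email}\n" if $f{include_from};
    $mail .= "$f{admin_content}\n$f{spam_feature}\n$f{credits}\n";
    return $mail;
}

# Send only if the message fits; returns 0 without sending otherwise.
sub send_if_small {
    my ($mailprog, $mail, $limit) = @_;
    return 0 if length($mail) > $limit;
    open(my $pipe, '|-', "$mailprog -t") or die "Cannot run $mailprog: $!\n";
    print {$pipe} $mail;          # no comma after the filehandle
    close($pipe) or die "Error closing pipe to $mailprog: $! (status $?)\n";
    return 1;
}
```

length counts characters, which equals bytes for plain ASCII text; the
size on the wire can differ slightly once the mail system rewrites line
endings or adds headers.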


------------------------------

Date: Thu, 6 Jul 2000 08:49:00 +0100
From: "diane lee" <diane_lj_lee@yahoo.co.uk>
Subject: cheer up Tad
Message-Id: <8k1dlb$q7c$1@supernews.com>

Grumpy get it was only a question!




------------------------------

Date: Thu, 6 Jul 2000 07:35:58 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: cheer up Tad
Message-Id: <slrn8m8rou.ip7.tadmc@magna.metronet.com>

On Thu, 6 Jul 2000 08:49:00 +0100, diane lee <diane_lj_lee@yahoo.co.uk> wrote:

>Grumpy get it was only a question!


It was only an *off-topic* question!


( Where "it" is undefined, since there are no References: )


-- 
    Tad McClellan                          SGML Consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: Thu, 06 Jul 2000 16:08:25 GMT
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: cheer up Tad
Message-Id: <Z9295.1497$iP2.143190@news.dircon.co.uk>

On Thu, 6 Jul 2000 07:35:58 -0400, Tad McClellan Wrote:
> On Thu, 6 Jul 2000 08:49:00 +0100, diane lee <diane_lj_lee@yahoo.co.uk> wrote:
> 
>>Grumpy get it was only a question!
> 
> 
> It was only an *off-topic* question!
> 
> 
> ( Where "it" is undefined, since there are no References: )
> 

I didn't see 'it' at all so 'it' didn't exist ....


/J\


------------------------------

Date: Thu, 06 Jul 2000 10:34:22 GMT
From: "gilbert bruyas" <gilbert.bruyas@free.fr>
Subject: Closing a socket with LWP
Message-Id: <OgZ85.2491$L83.5546024@nnrp5.proxad.net>

Hello Everyone
I use this Perl script with ActivePerl 5.6.0.613:

#!c:\perl\bin\perl.exe -w
use LWP::UserAgent;
$browser = new LWP::UserAgent;
$browser->agent("WatchDog");
$browser->timeout(20);
$flag=0;
while () {
$request = new
HTTP::Request('GET',"http://$ARGV[0]/myCgi/wdcgi.pl?host=$ARGV[1]");
$response = $browser->request($request);
 ......
sleep 30;
}

The connections don't close; we must wait for the timeout.
I go through a firewall with a limited number of connections.
Example of netstat output:
0   TCP    nt215303:80            PCKUDRTKUH:3245        TIME_WAIT
1   TCP    nt215303:80            PCKUDRTKUH:3247        TIME_WAIT
2   TCP    nt215303:80            PCKUDRTKUH:3249        TIME_WAIT
3   TCP    nt215303:80            PCKUDRTKUH:3251        TIME_WAIT

The last frame is :
TCP: Flags = 0x11 : .A...F
TCP: ..0..... = No urgent data
TCP: ...1.... = Acknowledgement field significant
TCP: ....0... = No Push function
TCP: .....0.. = No Reset
TCP: ......0. = No Synchronize
TCP: .......1 = No more data from sender

With a browser the last frame is :
TCP: Flags = 0x04 : ...R..
TCP: ..0..... = No urgent data
TCP: ...0.... = Acknowledgement field not significant
TCP: ....0... = No Push function
TCP: .....1.. = Reset the connection
TCP: ......0. = No Synchronize
TCP: .......0 = No Fin
In netstat the connection goes from ESTABLISHED to closed.
Can I force a reset when LWP closes the socket?
Thank you for your help.






------------------------------

Date: Thu, 06 Jul 2000 01:52:16 GMT
From: salim@cygnos.com
Subject: Closing files connecting a parent to a child
Message-Id: <kDR85.21643$ZI2.1070966@news1.rdc1.on.wave.home.com>

Hello,
I am working on a Perl program that forks child processes. This code
segment is what I use to accomplish this:

use IO::File;
use POSIX ":sys_wait_h"; # for WNOHANG and WIFEXITED
my %files=();
while (1) {
  my $fh = new IO::File;
  my $pid = open($fh, "|-");
  $numOfFiles++;
   until ($numOfFiles < 11) { # number of open files connecting parent to children cannot exceed 10
    	sleep 2;
    	&REAPER();
  }

  if ($pid){ # parent
    	$files{$pid} = $fh;
    	$fh->autoflush(1);
    	print $fh "...parameters to pass to child process for processing";
      } else {
    	exec "aProgram";
  }

  &REAPER();
}

sub REAPER {
	while( my $pid= waitpid(-1, &WNOHANG) ){
		if ( $pid == -1 ){
			#Ignore it
			} elsif(WIFEXITED($?)) {
				 if( defined $files{$pid} ){
					close $files{$pid} or warn "REAPER: Can't close file $files{$pid}: $!\n";
					delete $files{$pid};
					$numOfFiles--;
				}
			} else {
				Log "False Alarm on process $pid\n";
			}
	}
}


What I want to achieve is to close the files that were opened to
connect the parent to the children. Whenever the number of files reaches 10,
the REAPER subroutine is called to clean up and close the file handles
pertaining to dead children. When this happens, I get the error message
"Can't close file: no child process". Could you please shed some light on
what you think I am doing wrong? Why isn't Perl able to close a file handle
that once connected a parent to a child after the latter dies or exits
gracefully? And finally, how can I remedy the situation?

Thanks in advance

Salim

salim@cygnos.com
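
One possible reading, offered as a sketch rather than a diagnosis:
close() on a handle opened with "|-" waits for the child itself, so if
waitpid(-1, ...) has already reaped that child, the wait inside close()
has nothing left to collect and can fail with "No child processes". The
example below lets close() do the reaping. The sub name feed_child is
made up, and the child body is a stand-in for exec "aProgram".

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child via open "|-", feed it one line, then let close() reap it.
# Returns (close succeeded?, child exit status).
sub feed_child {
    my ($payload) = @_;
    my $pid = open(my $fh, '|-');
    die "Cannot fork: $!\n" unless defined $pid;

    if ($pid == 0) {                    # child: the parent's handle is our STDIN
        my @input = <STDIN>;
        POSIX::_exit(@input ? 0 : 1);   # stand-in for: exec "aProgram"
    }

    print {$fh} $payload;
    my $ok = close($fh);                # flushes, then waits for this child
    return ($ok, $? >> 8);
}

my ($ok, $status) = feed_child("parameters for the child\n");
my $report = $ok ? "child exited with status $status\n"
                 : "close failed: $! (status $status)\n";
print $report;
```

With this arrangement the parent decides when each pipe is closed, and
a separate waitpid loop is not needed for children started this way.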


------------------------------

Date: 8 Jul 2000 18:17:36 -0700
From: JohnCasey <JohnCasey_member@newsguy.com>
Subject: Comparing fields of two files
Message-Id: <8k8jrg$12uq@drn.newsguy.com>

Hi,

   There are two files:

fileA  is as follows:                               fileB is as follows:
718#203#NY#                                          7036123456
901#257#AL#                                          2034567892
703#612#KY#                                          9012348956
 ...........                                          ............
 over  13000 entries

**********************************************************************

Basically, file A contains the state code corresponding to the first six
digits of a phone number.  Whereas fileB contains the phone number.
I need to find the corresponding state code for each phone number in
FileB  by searching fileA and then create a new File C with the phone number and
the state code.
For eg:   7036123456  MA
          2034567892  CT
          ........  



I wrote the following script and it seems to work fine, the only
problem being, it takes a long time to process the files . It took
over 45 minutes to complete and also took a lot of CPU cycles !.
*************
#!/usr/sbin/perl
open(BTN,"btnid") || die "Cannot open btnid file";
open(NPX,"npx") || die "Cannot open npx file" ;
open(BTNNPX,">btnnpx") || die "Cannot open btnnpx file";
while (<BTN>)
{
   chomp;
   $id = $_ ;
   $major = substr("$id",0,3);
   $minor = substr("$id",3,3);
   $search_id = join("#",$major,$minor);
   while (<NPX>)
   {
      chomp;
      if (/$search_id/)
      {
         ($id1,$id2,$state,$junk) = split(/#/,$_) ;
         print $search_id,"===",$state,"\n";
         close(NPX) || die "Cannot close npx file";  
         last;
      }
   }
   print BTNNPX "$id   $state \n";
   open(NPX,"npx") || die "Cannot open npx file";
}
close(BTN);
close(BTNNPX);
close(NPX);             
******************


I also tried the "grep" method (using grep command) rather than
the  "Open File, pattern match, Closefile"  method used above.
The change in code was this:

*******
$search_id = join("#",$major,$minor);
$result = `grep $search_id npx` ;
if ( $? != 0 )
   {  @arrresult = `grep ^$major npx` ;
      $result = $arrresult[0];
   }
($id1,$id2,$state,$junk) = split(/#/,$result) ;
print " $search_id = $result","\n";   
********

   But even this code seems to take over 30 minutes (slightly better than
the method used previously).   Is there an elegant and quicker way
of doing this??

Thanks in advance
John



------------------------------

Date: 9 Jul 2000 03:02:08 GMT
From: ebohlman@netcom.com (Eric Bohlman)
Subject: Re: Comparing fields of two files
Message-Id: <8k8pvg$1hl$3@slb1.atl.mindspring.net>

JohnCasey (JohnCasey_member@newsguy.com) wrote:
: Basically, file A contains the state code corresponding to the first six
: digits of a phone number.  Whereas fileB contains the phone number.
: I need to find the corresponding state code for each phone number in
: FileB  by searching fileA and then create a new File C with the phone number and
: the state code.

The technical term for this is "joining" the two files (it's sometimes 
called a "relational join" and this operation is in fact at the heart of 
a relational database system).

: I wrote the following script and it seems to work fine, the only
: problem being, it takes a long time to process the files . It took
: over 45 minutes to complete and also took a lot of CPU cycles !.

Yes, the naive brute-force solution to doing a join is quite time 
consuming because its run time is proportional to the product of the 
numbers of entries in the two files (it didn't help that in your 
implementation one of the files had to be re-opened for every line in the 
other one, but that just multiplies the run time by a constant).

What you need to do is read file A into a hash keyed on the phone-number 
prefix, and then loop through file B, looking up each number's prefix in 
the hash.  That will give you a run time proportional to the *sum* of the 
numbers of entries in the two files, which should be *much* better.
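
A minimal sketch of that hash join, using the sample lines from the
question held in memory rather than read from files. The sub names
build_prefix_map and join_numbers are illustrative only.

```perl
use strict;
use warnings;

# Build a lookup table from lines such as "703#612#KY#":
# key = six-digit prefix, value = state code.
sub build_prefix_map {
    my @lines = @_;
    my %state_for;
    for my $line (@lines) {
        my ($area, $exchange, $state) = split /#/, $line;
        next unless defined $state;          # skip malformed lines
        $state_for{"$area$exchange"} = $state;
    }
    return \%state_for;
}

# Join each phone number against the table with one hash lookup per number.
sub join_numbers {
    my ($state_for, @numbers) = @_;
    my @rows;
    for my $number (@numbers) {
        my $prefix = substr($number, 0, 6);
        push @rows, [ $number, $state_for->{$prefix} ];   # undef if no match
    }
    return @rows;
}

my $map  = build_prefix_map('718#203#NY#', '901#257#AL#', '703#612#KY#');
my @rows = join_numbers($map, '7036123456', '2034567892', '9012348956');
for my $row (@rows) {
    my ($number, $state) = @$row;
    printf "%s  %s\n", $number, defined($state) ? $state : '??';
}
```

Each file is walked once, so the work grows with the sum of the two
line counts rather than their product.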


------------------------------

Date: 08 Jul 2000 23:07:04 EDT
From: abigail@delanet.com (Abigail)
Subject: Re: Comparing fields of two files
Message-Id: <slrn8mfs49.t4j.abigail@alexandra.delanet.com>

JohnCasey (JohnCasey_member@newsguy.com) wrote on MMDIV September
MCMXCIII in <URL:news:8k8jrg$12uq@drn.newsguy.com>:
-- Hi,
--
--    There are two files:
--
-- fileA  is as follows:                               fileB is as follows:
-- 718#203#NY#                                          7036123456
-- 901#257#AL#                                          2034567892
-- 703#612#KY#                                          9012348956
-- ...........                                          ............
--  over  13000 entries
--
-- **********************************************************************
--
-- Basically, file A contains the state code corresponding to the first six
-- digits of a phone number.  Whereas fileB contains the phone number.
-- I need to find the corresponding state code for each phone number in
-- FileB  by searching fileA and then create a new File C with the phone number 
-- the state code.
-- For eg:   7036123456  MA
--           2034567892  CT
--           ........  

Could you describe *how* exactly this is supposed to work? In your 
examples, fileA contains '703#612#KY#' and fileB contains '7036123456'.
The output file has '7036123456  MA'.

Where is the MA coming from, and what happened to KY?

Please, people, if you have a problem and want a solution, give a proper
specification of the problem and certainly don't define a problem with
vague or unclear examples. Often, people try to describe problems with
examples that don't even match the problem.


Your problem might be a variant of what's answered in the FAQ. However,
the problem is poorly phrased so who can tell?



Abigail
-- 
$_ = "\x3C\x3C\x45\x4F\x54";
print if s/<<EOT/<<EOT/e;
Just another Perl Hacker
EOT


------------------------------

Date: Sun, 09 Jul 2000 03:20:42 GMT
From: elephant@squirrelgroup.com (jason)
Subject: Re: Comparing fields of two files
Message-Id: <MPG.13d28cce515bc1d98977e@news>

JohnCasey writes ..
>   There are two files:
>
>fileA  is as follows:                               fileB is as follows:
>718#203#NY#                                          7036123456
>901#257#AL#                                          2034567892
>703#612#KY#                                          9012348956
>...........                                          ............
> over  13000 entries
>
>**********************************************************************
>
>Basically, file A contains the state code corresponding to the first six
>digits of a phone number.  Whereas fileB contains the phone number.
>I need to find the corresponding state code for each phone number in
>FileB  by searching fileA and then create a new File C with the phone number and
>the state code.
>For eg:   7036123456  MA
>          2034567892  CT
>          ........  
>
>
>
>I wrote the following script and it seems to work fine, the only
>problem being, it takes a long time to process the files . It took
>over 45 minutes to complete and also took a lot of CPU cycles !.

the large time is taken because you're doing a regex (a string 
operation) for every entry in fileA .. and you're doing it without any 
anchors .. so the regex doesn't just check the start of the string - but 
a number of characters in the string - each time .. for every line in 
fileA times the number of lines in fileB .. very inefficient

so .. first step would be to anchor the regex to the beginning of the 
string .. ie.

  if (/^$search_id/)

however you can do better than that .. you're not actually using any 
regular expression .. and there's a better way to do a string comparison 
than that

  if( $search_id eq substr( $_, 0, 7))

but I think you're still going about it all wrong .. because you're 
still processing every single line of your target file - when 
essentially you're dealing with numerical figures

you haven't said how big fileB is .. but it's only a few bytes per line 
 .. so it'd have to be pretty huge to take up any significant amount of 
RAM

so I'd pre-process fileB before you started - storing each value into an 
array and then sorting it into numerical order .. then I'd take each 
string from fileA and make it into a 10 digit number

  $num = (substr( $fileALine, 0, 3). substr( $fileALine, 4, 3)) * 10000;

then search for it in your array .. craft your own search routine .. 
assuming you've got an even spread of phone numbers then doing a binary 
search will work fairly ideally here .. it will be slightly different 
from a normal one because you'll probably want to find a bunch of phone 
numbers .. eg. 1234560001, 1234560002 etc. .. so your test will need to 
take that into account by testing whether values are at least 10000 more 
than your search value

you'll get to the point where what you're left with is two indexes .. 
which encompass all values between 1234560000 and 1234569999 inclusive 
 .. obviously then you'd output all those values with the current state 
code

without your input files it's hard to give you metrics on the above 
method .. but it should be fairly quick (especially in comparison with 
the original) .. you were searching through each entry in fileA for each 
entry in fileB .. ie. you were doing

  ( # lines in fileB ) times ( # lines in fileA ) string comparisons

instead - you're now doing one numerical sort of all the items in fileB 
 .. then you're doing

  ( # lines in fileA ) binary searches

which are doing numeric comparisons .. and will end up performing MUCH 
less than ( # lines in fileB ) numbers of tests

the trade off is memory .. you will need a chunk of memory to store all 
the fileB values in - but it shouldn't be too large (depending on how 
large fileB is and how Perl stores the values in an array)
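
A sketch of the range search described above, assuming the phone
numbers fit in memory and have been sorted numerically. The sub names
lower_bound and numbers_with_prefix are made up for illustration, and
the sample numbers and the KY code come from the question.

```perl
use strict;
use warnings;

# Index of the first element in the sorted array @$nums that is >= $target
# (returns scalar(@$nums) if there is none).
sub lower_bound {
    my ($nums, $target) = @_;
    my ($lo, $hi) = (0, scalar @$nums);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($nums->[$mid] < $target) { $lo = $mid + 1 }
        else                         { $hi = $mid }
    }
    return $lo;
}

# All numbers in the sorted list that start with the six-digit $prefix,
# i.e. those between prefix0000 and prefix9999 inclusive.
sub numbers_with_prefix {
    my ($nums, $prefix) = @_;
    my $first = lower_bound($nums, $prefix * 10_000);
    my $past  = lower_bound($nums, ($prefix + 1) * 10_000);
    return @{$nums}[ $first .. $past - 1 ];
}

my @sorted = sort { $a <=> $b } qw(7036123456 2034567892 9012348956 7036120001);
my @hits   = numbers_with_prefix(\@sorted, '703612');
print "$_  KY\n" for @hits;
```

Two lower-bound searches per prefix give the two indexes mentioned
above, so no per-element test for "at least 10000 more" is needed.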

-- 
 jason - elephant@squirrelgroup.com -


------------------------------

Date: Sun, 09 Jul 2000 03:29:49 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: Comparing fields of two files
Message-Id: <3967F1EF.49E1D965@rochester.rr.com>

JohnCasey wrote:
> 
> Hi,
> 
>    There are two files:
> 
> fileA  is as follows:                               fileB is as follows:
> 718#203#NY#                                          7036123456
> 901#257#AL#                                          2034567892
> 703#612#KY#                                          9012348956
> ...........                                          ............
>  over  13000 entries
> 
> **********************************************************************
> 
> Basically, file A contains the state code corresponding to the first six
> digits of a phone number.  Whereas fileB contains the phone number.
> I need to find the corresponding state code for each phone number in
> FileB  by searching fileA and then create a new File C with the phone number and
> the state code.
> For eg:   7036123456  MA
>           2034567892  CT
>           ........
> 
> I wrote the following script and it seems to work fine, the only
> problem being, it takes a long time to process the files . It took
> over 45 minutes to complete and also took a lot of CPU cycles !.
 ...

>    But, even this code seems to take over 30 minutes (slightly better than
> the method used previously).   Is there a  elegant and quicker way
> of doing this??
> 
> Thanks in advance
> John

Wow.  You must have a pretty fast system if your algorithm ran that
fast!  What you need to do is to make a hash out of the first six digits
and the state code, then run through the phone number list.  Something
like:

   use strict;
   my %state=();
   open FILEA,"filea" or die "Oops, couldn't open filea, $!\n";
   while(<FILEA>){
      if(/^(\d{3})#(\d{3})#(\w{2})/){
         $state{"$1$2"}=$3;
      }
   }
   close FILEA or die "Oops, couldn't close filea, $!\n";
   open FILEB,"fileb" or die "Oops, couldn't open fileb, $!\n";
   while(<FILEB>){
      chomp;
      if(/^(\d{6})/){
         print "$_  $state{$1}\n";
      }
   }
   close FILEB or die "Oops, couldn't close fileb, $!\n";

That should run in a few seconds for 13000 records.  You might want to
fancy it up with errors for any records that don't match, or phone
numbers that don't have a matching entry in the state hash.
-- 
Bob Walton


------------------------------

Date: Sun, 09 Jul 2000 03:50:42 GMT
From: kcivey@cpcug.org (Keith Calvert Ivey)
Subject: Re: Comparing fields of two files
Message-Id: <3968f5be.53167226@nntp.idsonline.com>

elephant@squirrelgroup.com (jason) wrote:

>so I'd pre-process fileB before you started - storing each value into an 
>array and then sorting it into numerical order .. then I'd take each 
>string from fileA and make it into a 10 digit number
>
>  $num = substr( $fileALine, 0, 3). substr( $fileALine, 4, 3) * 10000;
>
>then search for it in your array .. craft your own search routine .. 
>assuming you've got an even spread of phone numbers then doing a binary 
>search will work fairly ideally here .. it will be slightly different 
>from a normal one because you'll probably want to find a bunch of phone 
>numbers .. eg. 12345460001, 1234560002 etc. .. so your test will need to 
>take that into account by testing whether values are at least 10000 more 
>than your search value

What's the point of all that?  Just use a hash, as others have
suggested, mapping each combination of six digits to a state.
There's no need to write your own search routines when Perl has
them built in.

-- 
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC




------------------------------

Date: Sun, 09 Jul 2000 05:27:07 GMT
From: elephant@squirrelgroup.com (jason)
Subject: Re: Comparing fields of two files
Message-Id: <MPG.13d2aa707491474a989780@news>

Keith Calvert Ivey writes ..
>elephant@squirrelgroup.com (jason) wrote:
>
>>so I'd pre-process fileB before you started - storing each value into an 
>>array and then sorting it into numerical order .. then I'd take each 
>>string from fileA and make it into a 10 digit number
>>
>>  $num = substr( $fileALine, 0, 3). substr( $fileALine, 4, 3) * 10000;
>>
>>then search for it in your array .. craft your own search routine .. 
>>assuming you've got an even spread of phone numbers then doing a binary 
>>search will work fairly ideally here .. it will be slightly different 
>>from a normal one because you'll probably want to find a bunch of phone 
>>numbers .. eg. 12345460001, 1234560002 etc. .. so your test will need to 
>>take that into account by testing whether values are at least 10000 more 
>>than your search value
>
>What's the point of all that?  Just use a hash, as other have
>suggested, mapping each combination of six digits to a state.

the point ? .. speed .. and it's not really 'all that' .. you probably 
spent longer reading it than it would take to code it

>There's no need to write your own search routines when Perl has
>them built in.

there's no 'need' to make any amendments to the code - it does work .. 
the problem as stated was a performance issue .. I just posted what I 
would do as a suggested improvement to the efficiency of the process

without the input data to test I can't say for certain .. and I don't 
have a good enough understanding of the hash internals to make a 
worthwhile comparison .. but a binary search through ordered numeric 
data is pretty fast - and may be faster than a string lookup in a 13000 
member hash
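
The question can be settled by measurement rather than guesswork. A
sketch using the core Benchmark module on synthetic data follows; the
13000 prefixes and the query list are made up, not the poster's files,
and the sub name find_sorted is illustrative. No claim is made here
about which side wins, since the result depends on the Perl build and
the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data (an assumption): 13000 distinct six-digit prefixes,
# each mapped to a dummy state code, plus 5000 lookups that all hit.
my @prefixes  = sort { $a <=> $b } map { 200_000 + 7 * $_ } 0 .. 12_999;
my %state_for = map { ($_ => 'XX') } @prefixes;
my @queries   = map { $prefixes[ ($_ * 37) % @prefixes ] } 0 .. 4_999;

# Binary search over the sorted prefix list; returns the index or -1.
sub find_sorted {
    my ($target) = @_;
    my ($lo, $hi) = (0, $#prefixes);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if    ($prefixes[$mid] < $target) { $lo = $mid + 1 }
        elsif ($prefixes[$mid] > $target) { $hi = $mid - 1 }
        else                              { return $mid }
    }
    return -1;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    hash_lookup   => sub { my $n = grep { exists $state_for{$_} } @queries },
    binary_search => sub { my $n = grep { find_sorted($_) >= 0 } @queries },
});
```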

-- 
 jason - elephant@squirrelgroup.com -


------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 3554
**************************************

