[16142] in Perl-Users-Digest
Perl-Users Digest, Issue: 3554 Volume: 9
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jul 10 15:37:37 2000
Date: Mon, 10 Jul 2000 12:37:23 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <963257843-v9-i3554@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Mon, 10 Jul 2000 Volume: 9 Number: 3554
Today's topics:
Checking the size of an E-mail... <raphaelp@nr1webresource.com>
Re: Checking the size of an E-mail... <care227@attglobal.net>
Re: Checking the size of an E-mail... <lauren_smith13@hotmail.com>
Re: Checking the size of an E-mail... <care227@attglobal.net>
Re: Checking the size of an E-mail... <raphaelp@nr1webresource.com>
Re: Checking the size of an E-mail... <care227@attglobal.net>
Re: Checking the size of an E-mail... <raphaelp@nr1webresource.com>
Re: Checking the size of an E-mail... <gellyfish@gellyfish.com>
Re: Checking the size of an E-mail... <gellyfish@gellyfish.com>
Re: Checking the size of an E-mail... (Philip 'Yes, that's my address' Newton)
cheer up Tad <diane_lj_lee@yahoo.co.uk>
Re: cheer up Tad (Tad McClellan)
Re: cheer up Tad <gellyfish@gellyfish.com>
Closing a socket with LWP <gilbert.bruyas@free.fr>
Closing files connecting a parent to a child salim@cygnos.com
Comparing fields of two files <JohnCasey_member@newsguy.com>
Re: Comparing fields of two files (Eric Bohlman)
Re: Comparing fields of two files (Abigail)
Re: Comparing fields of two files (jason)
Re: Comparing fields of two files <bwalton@rochester.rr.com>
Re: Comparing fields of two files (Keith Calvert Ivey)
Re: Comparing fields of two files (jason)
Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 7 Jul 2000 17:36:00 +0200
From: "Raphael Pirker" <raphaelp@nr1webresource.com>
Subject: Checking the size of an E-mail...
Message-Id: <8k4tmj$72o$16$1@news.t-online.com>
Hi all,
I've asked this question already, but I've been pointed to the perldoc -f
length which shows only:
=item length EXPR
=item length
Returns the length in bytes of the value of EXPR. If EXPR is
omitted, returns length of C<$_>.
and I really don't get how I could check whether an e-mail is over 50k in
size using this information!
Could anyone please help with maybe a sample-code?
Thanks in advance,
Raphael
------------------------------
Date: Fri, 07 Jul 2000 14:27:10 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: Checking the size of an E-mail...
Message-Id: <396620FE.3DB57597@attglobal.net>
Raphael Pirker wrote:
>
> length EXPR
>
> length
>
> Returns the length in bytes of the value of EXPR. If EXPR is
> omitted, returns length of $_.
>
> and I really don't get how I could check whether an e-mail is over 50k in
> size using this information!
Did you even try?
> Could anyone please help with maybe a sample-code?
What you should be posting is _your_ code that you tried to make work,
but need help with. Using length(), you'd do something like:
----------------8<----------------------------------8<--------------
#!/usr/bin/perl -w
use strict;
my $path_to_message_file = shift @ARGV; # assumed: path given on the command line
undef $/; # undefines the input record separator, so I can get the
# whole file into the one variable. Generally you'd do
# this local to a block, but since this is a one off...
open LENGTH, "$path_to_message_file" or die "barf: $! \n";
# double quotes aren't required with an open() function call when
# specifying a scalar variable, but I like to keep in the habit of
# always quoting.
my $slurp = <LENGTH>; # read the file into a scalar (if we wouldn't
# have undef'd $/, we'd have only gotten the
# first line of the file).
my $length = length $slurp; #length now has the number of bytes.
if ("$length" > '50000')
{
#its pretty big
}
else
{
#do your thing
}
----------------8<----------------------------------8<--------------
and another way would be to use stat.
$ perldoc -f stat (a much better way, IMHO)
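
A minimal sketch of the stat approach, assuming the message already sits in a file on disk. The helper name file_size, the temporary file, and the 50,000-byte reading of "50k" are illustrative assumptions, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $LIMIT = 50_000;    # assumed threshold: "50k" read as 50,000 bytes

# Return the size of a file in bytes without reading its contents.
# Element 7 of the list returned by stat() is the size in bytes.
sub file_size {
    my ($path) = @_;
    my @info = stat($path) or die "Cannot stat $path: $!\n";
    return $info[7];
}

# Demo: a throwaway 60,000-byte file stands in for the message file.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} 'x' x 60_000;
close $fh or die "Cannot close $path: $!\n";

my $size = file_size($path);
print "stat reports $size bytes, -s reports ", (-s $path), " bytes\n";
print $size > $LIMIT ? "its pretty big\n" : "do your thing\n";
```

Neither stat nor the -s file test opens the file, so the cost does not grow with the size of the message.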
------------------------------
Date: Fri, 7 Jul 2000 12:05:56 -0700
From: "Lauren Smith" <lauren_smith13@hotmail.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k59ne$ilf$1@brokaw.wa.com>
Drew Simonis <care227@attglobal.net> wrote in message
news:396620FE.3DB57597@attglobal.net...
> Raphael Pirker wrote:
> >
> > length EXPR
> >
> > length
> >
> > Returns the length in bytes of the value of EXPR. If EXPR is
> > omitted, returns length of $_.
> >
> > and I really don't get how I could check whether an e-mail is over 50k in
> > size using this information!
>
> ----------------8<----------------------------------8<--------------
> #!/usr/bin/perl -w
>
> use strict;
> undef $/; # undefines the input record separator, so I can get the
> # whole file into the one variable. Generally you'd do
> # this local to a block, but since this is a one off...
>
> open LENGTH, "$path_to_message_file" or die "barf: $! \n";
>
> # double quotes aren't required with an open() function call when
> # specifying a scalar variable, but I like to keep in the habit of
> # always quoting.
>
> my $slurp = <LENGTH>; # read the file into a scalar (if we wouldn't
> # have undef'd $/, we'd have only gotten the
> # first line of the file.
>
> my $length = length $slurp; #length now has the number of bytes.
>
> if ("$length" > '50000')
Is there a reason you are quoting those things? Wouldn't this work just as
well?
if ($length > 50000)
And if we are only interested in the size of the file in bytes, wouldn't
this work also, without needing to actually read in the file?
if ((-s $path_to_message_file) > 50000)
> ----------------8<----------------------------------8<--------------
>
> and another way would be to use stat.
>
> $ perldoc -f stat (a much better way, IMHO)
Heh, nevermind... :-)
Lauren
------------------------------
Date: Fri, 07 Jul 2000 15:38:01 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: Checking the size of an E-mail...
Message-Id: <39663199.2CB6CC63@attglobal.net>
Lauren Smith wrote:
>
> >
> > if ("$length" > '50000')
>
> Is there a reason you are quoting those things? Wouldn't this work just as
> well?
>
I'm a quote-a-holic. I'd rather overquote than one day forget to
quote something and mess it up =)
------------------------------
Date: Fri, 7 Jul 2000 21:45:51 +0200
From: "Raphael Pirker" <raphaelp@nr1webresource.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k5cb2$bms$16$1@news.t-online.com>
Thanks! I got about that far. But the problem is that the e-mail is not a
file! I can read a file, but I couldn't figure out how to measure a "couple of
written bytes". Here's my sendmail code:
open(MAIL,"|$mailprog -t") or die "Cannot run $mailprog: $!\n";
print MAIL "To: \"$yourname\" <$admin_recipient>\n";
print MAIL "From: \"$visitor_name\" <$visitor_email>\n";
print MAIL "Subject: $admin_subject\n";
if ($admin_cc ne "false") {
print MAIL "Cc: $admin_cc\n";
};
print MAIL "\n";
if ($include_from eq "1") {
print MAIL "E-Mail sent by $visitor_email\n";
}
print MAIL "$admin_content\n";
print MAIL "$spam_feature\n";
print MAIL "$credits\n";
close(MAIL);
------------------------------
Date: Fri, 07 Jul 2000 16:14:27 -0400
From: Drew Simonis <care227@attglobal.net>
Subject: Re: Checking the size of an E-mail...
Message-Id: <39663A23.B1E389B4@attglobal.net>
Raphael Pirker wrote:
> print MAIL "To: \"$yourname\" <$admin_recipient>\n";
> print MAIL "From: \"$visitor_name\" <$visitor_email>\n";
> print MAIL "Subject: $admin_subject\n";
> if ($admin_cc ne "false") {
> print MAIL "Cc: $admin_cc\n";
> };
> print MAIL "\n";
> if ($include_from eq "1") {
> print MAIL "E-Mail sent by $visitor_email\n";
> }
> print MAIL "$admin_content\n";
> print MAIL "$spam_feature\n";
> print MAIL "$credits\n";
> close(MAIL);
See if I understand this... you want to instead measure the amount
of data written to the MAIL filehandle?
------------------------------
Date: Sat, 8 Jul 2000 06:31:44 +0200
From: "Raphael Pirker" <raphaelp@nr1webresource.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k6bgo$ojk$18$1@news.t-online.com>
correct!
------------------------------
Date: 8 Jul 2000 13:58:31 +0100
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k78hn$it2$1@orpheus.gellyfish.com>
On Fri, 07 Jul 2000 15:38:01 -0400 Drew Simonis wrote:
> Lauren Smith wrote:
>>
>> >
>> > if ("$length" > '50000')
>>
>> Is there a reason you are quoting those things? Wouldn't this work just as
>> well?
>>
>
> I'm a quote-a-holic. I'd rather overquote than one day forget to
> quote something and mess it up =)
But that particular quoting is doing something that you don't want. You are
doing a numeric comparison, so you don't want to stringify the number in
$length. For all I know perl might optimise this away, but I would be
willing to stick my neck out and suggest that it is causing unnecessary
work.
A simple rule of thumb is that if what you are dealing with is a number then
you don't need or want the quotes. In most *shells* an empty variable will
cause a fatal error if it is not quoted, but the worst that will happen
in Perl is that (if you asked for warnings, of course) you will get the
'Use of uninitialized value' warning. So on the whole you can skip the
quotes around variables everywhere, unless you want to explicitly
stringify one for some reason.
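
A small sketch of the difference, with made-up values. The variable names are illustrative only; the point is that quoting the operands of a numeric comparison changes nothing, while a string comparison operator gives a different answer:

```perl
use strict;
use warnings;

my $length = 9_000;

# Plain numeric comparison: 9000 is not greater than 50000.
my $numeric = $length > 50_000 ? 1 : 0;

# Quoted operands are converted back to numbers by ">", so the result
# is the same and the quotes are only extra work.
my $quoted = "$length" > '50000' ? 1 : 0;

# "gt" compares as strings: "9000" sorts after "50000" because the
# character "9" is greater than the character "5".
my $stringy = $length gt 50_000 ? 1 : 0;

print "numeric: $numeric, quoted: $quoted, string: $stringy\n";
# prints: numeric: 0, quoted: 0, string: 1

# An undefined value in a numeric comparison is not fatal; with
# warnings enabled it only produces a warning, captured here.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $empty;
    my $result = $empty > 50_000 ? 1 : 0;
}
print "warning seen: $warnings[0]" if @warnings;
```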
/J\
--
yapc::Europe in assocation with the Institute Of Contemporary Arts
<http://www.yapc.org/Europe/> <http://www.ica.org.uk>
------------------------------
Date: 8 Jul 2000 14:23:37 +0100
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Checking the size of an E-mail...
Message-Id: <8k7a0p$nmn$1@orpheus.gellyfish.com>
On Fri, 7 Jul 2000 21:45:51 +0200 Raphael Pirker wrote:
> Thanks! I did come about that far. But the problem is that the e-mail is no
> file! I can read a file, but I didn't figure out how to read in a "couple of
> written bytes". Here's my sendmail code:
>
> open(MAIL,"|$mailprog -t");
>
> print MAIL "To: \"$yourname\" <$admin_recipient>\n";
> print MAIL "From: \"$visitor_name\" <$visitor_email>\n";
> print MAIL "Subject: $admin_subject\n";
> if ($admin_cc ne "false") {
> print MAIL "Cc: $admin_cc\n";
> };
> print MAIL "\n";
> if ($include_from eq "1") {
> print MAIL "E-Mail sent by $visitor_email\n";
> }
> print MAIL "$admin_content\n";
> print MAIL "$spam_feature\n";
> print MAIL "$credits\n";
> close(MAIL);
>
Well if you want to know how much you have output to MAIL then perhaps
you should put it all in a string first and then use length on that :
my $mail =<<EEEBAGUM;
To: "$yourname" <$admin_recipient>
From: "$visitor_name" <$visitor_email>
Subject: $admin_subject
EEEBAGUM
$mail .= "Cc: $admin_cc\n" if ($admin_cc ne 'false');
$mail .= "\n";
$mail .= "E-Mail sent by $visitor_email\n" if ($include_from);
$mail .=<<EWARWOOWAR;
$admin_content
$spam_feature
$credits
EWARWOOWAR
my $length = length $mail;
print MAIL, $mail ;
# etc
/J\
--
yapc::Europe in assocation with the Institute Of Contemporary Arts
<http://www.yapc.org/Europe/> <http://www.ica.org.uk>
------------------------------
Date: Sun, 09 Jul 2000 07:35:17 GMT
From: nospam.newton@gmx.li (Philip 'Yes, that's my address' Newton)
Subject: Re: Checking the size of an E-mail...
Message-Id: <39682741.73699153@news.nikoma.de>
On 8 Jul 2000 14:23:37 +0100, Jonathan Stowe <gellyfish@gellyfish.com>
wrote:
> print MAIL, $mail ;
            ^
oops. No?
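
For reference, a self-contained sketch of the build-then-measure idea with the comma removed. The %form values, the build_mail helper, and the in-memory filehandle standing in for the pipe to the mail program are all assumptions made so the example can run on its own:

```perl
use strict;
use warnings;

my $LIMIT = 50_000;   # assumed limit in bytes

# Hypothetical values standing in for the form variables in the thread.
my %form = (
    to      => 'admin@example.com',
    from    => 'visitor@example.com',
    subject => 'Feedback',
    body    => 'Hello' x 10,
);

# Assemble headers, the blank separator line, and the body in one string.
sub build_mail {
    my (%f) = @_;
    return "To: $f{to}\n"
         . "From: $f{from}\n"
         . "Subject: $f{subject}\n"
         . "\n"
         . "$f{body}\n";
}

my $mail   = build_mail(%form);
my $length = length $mail;

# An in-memory filehandle stands in for open(MAIL, "|$mailprog -t").
my $sent = '';
open my $out, '>', \$sent or die "Cannot open in-memory handle: $!\n";
if ($length > $LIMIT) {
    warn "Message is $length bytes, over the $LIMIT byte limit; not sent\n";
}
else {
    print {$out} $mail;   # no comma between the filehandle and the list
}
close $out or die "Cannot close handle: $!\n";

print "sent $length bytes\n" if length $sent;
```

Note that length counts characters, so for text that has been decoded from a multi-byte encoding the byte count on the wire may be larger.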
Cheers,
Philip
--
Philip Newton <nospam.newton@gmx.li>
If you're not part of the solution, you're part of the precipitate.
------------------------------
Date: Thu, 6 Jul 2000 08:49:00 +0100
From: "diane lee" <diane_lj_lee@yahoo.co.uk>
Subject: cheer up Tad
Message-Id: <8k1dlb$q7c$1@supernews.com>
Grumpy get it was only a question!
------------------------------
Date: Thu, 6 Jul 2000 07:35:58 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: cheer up Tad
Message-Id: <slrn8m8rou.ip7.tadmc@magna.metronet.com>
On Thu, 6 Jul 2000 08:49:00 +0100, diane lee <diane_lj_lee@yahoo.co.uk> wrote:
>Grumpy get it was only a question!
It was only an *off-topic* question!
( Where "it" is undefined, since there are no References: )
--
Tad McClellan SGML Consulting
tadmc@metronet.com Perl programming
Fort Worth, Texas
------------------------------
Date: Thu, 06 Jul 2000 16:08:25 GMT
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: cheer up Tad
Message-Id: <Z9295.1497$iP2.143190@news.dircon.co.uk>
On Thu, 6 Jul 2000 07:35:58 -0400, Tad McClellan Wrote:
> On Thu, 6 Jul 2000 08:49:00 +0100, diane lee <diane_lj_lee@yahoo.co.uk> wrote:
>
>>Grumpy get it was only a question!
>
>
> It was only an *off-topic* question!
>
>
> ( Where "it" is undefined, since there are no References: )
>
I didn't see 'it' at all so 'it' didn't exist ....
/J\
------------------------------
Date: Thu, 06 Jul 2000 10:34:22 GMT
From: "gilbert bruyas" <gilbert.bruyas@free.fr>
Subject: Closing a socket with LWP
Message-Id: <OgZ85.2491$L83.5546024@nnrp5.proxad.net>
Hello Everyone
I use this Perl script with ActivePerl 5.6.0.613:
#!c:\perl\bin\perl.exe -w
use LWP::UserAgent;
$browser = new LWP::UserAgent;
$browser->agent("WatchDog");
$browser->timeout(20);
$flag=0;
while () {
$request = new
HTTP::Request('GET',"http://$ARGV[0]/myCgi/wdcgi.pl?host=$ARGV[1]");
$response = $browser->request($request);
......
sleep 30;
}
The connections don't close; we must wait for the timeout.
I go through a firewall with limited number of connections.
Ex of netstat :
0 TCP nt215303:80 PCKUDRTKUH:3245 TIME_WAIT
1 TCP nt215303:80 PCKUDRTKUH:3247 TIME_WAIT
2 TCP nt215303:80 PCKUDRTKUH:3249 TIME_WAIT
3 TCP nt215303:80 PCKUDRTKUH:3251 TIME_WAIT
The last frame is :
TCP: Flags = 0x11 : .A...F
TCP: ..0..... = No urgent data
TCP: ...1.... = Acknowledgement field significant
TCP: ....0... = No Push function
TCP: .....0.. = No Reset
TCP: ......0. = No Synchronize
TCP: .......1 = No more data from sender
With a browser the last frame is :
TCP: Flags = 0x04 : ...R..
TCP: ..0..... = No urgent data
TCP: ...0.... = Acknowledgement field not significant
TCP: ....0... = No Push function
TCP: .....1.. = Reset the connection
TCP: ......0. = No Synchronize
TCP: .......0 = No Fin
In the netstat output the connection goes from ESTABLISHED to closed.
Can I force a reset when LWP closes the socket?
Thank you for your help.
------------------------------
Date: Thu, 06 Jul 2000 01:52:16 GMT
From: salim@cygnos.com
Subject: Closing files connecting a parent to a child
Message-Id: <kDR85.21643$ZI2.1070966@news1.rdc1.on.wave.home.com>
Hello,
I am working on a Perl program that forks child processes. This code
segment is what I use to accomplish this:
use IO::File;
use POSIX ":sys_wait_h"; # provides WNOHANG and WIFEXITED
my %files=();
while (1) {
my $fh = new IO::File;
my $pid = open($fh, "|-");
$numOfFiles++;
# number of open files connecting parent to children cannot exceed 10
until ($numOfFiles < 11) {
sleep 2;
&REAPER();
}
if ($pid){ # parent
$files{$pid} = $fh;
$fh->autoflush(1);
print $fh "...parameters to pass to child process for processing";
} else {
exec "aProgram";
}
&REAPER();
}
sub REAPER {
while( my $pid= waitpid(-1, &WNOHANG) ){
if ( $pid == -1 ){
last; # no child processes left to wait for
} elsif(WIFEXITED($?)) {
if( defined $files{$pid} ){
close $files{$pid} or warn "REAPER: Can't close file $files{$pid}: $!\n";
delete $files{$pid};
$numOfFiles--;
}
} else {
Log "False Alarm on process $pid\n";
}
}
}
What I want to achieve is to close the files that were opened to
connect the parent to the children. Whenever the number of files reaches 10,
the REAPER subroutine is called to clean up and close file handles pertaining to
dead children. When this happens, I get the error message "Cant close file:
no child process". Could you please shed some light on what you think I am
doing wrong? Why isn't perl able to close file handles that once connected a
parent to a child should the latter die or exit gracefully? And finally, how
can I remedy the situation?
Thanks in advance
Salim
salim@cygnos.com
------------------------------
Date: 8 Jul 2000 18:17:36 -0700
From: JohnCasey <JohnCasey_member@newsguy.com>
Subject: Comparing fields of two files
Message-Id: <8k8jrg$12uq@drn.newsguy.com>
Hi,
There are two files:
fileA is as follows:      fileB is as follows:
718#203#NY#               7036123456
901#257#AL#               2034567892
703#612#KY#               9012348956
...........               ............
over 13000 entries
**********************************************************************
Basically, file A contains the state code corresponding to the first six
digits of a phone number. Whereas fileB contains the phone number.
I need to find the corresponding state code for each phone number in
FileB by searching fileA and then create a new File C with the phone number and
the state code.
For eg: 7036123456 MA
2034567892 CT
........
I wrote the following script and it seems to work fine, the only
problem being, it takes a long time to process the files . It took
over 45 minutes to complete and also took a lot of CPU cycles !.
*************
#!/usr/sbin/perl
open(BTN,"btnid") || die "Cannot open btnid file";
open(NPX,"npx") || die "Cannot open npx file" ;
open(BTNNPX,">btnnpx") || die "Cannot open btnnpx file";
while (<BTN>)
{
chomp;
$id = $_ ;
$major = substr("$id",0,3);
$minor = substr("$id",3,3);
$search_id = join("#",$major,$minor);
while (<NPX>)
{
chomp;
if (/$search_id/)
{
($id1,$id2,$state,$junk) = split(/#/,$_) ;
print $search_id,"===",$state,"\n";
close(NPX) || die "Cannot close npx file";
last;
}
}
print BTNNPX "$id $state \n";
open(NPX,"npx") || die "Cannot open npx file";
}
close(BTN);
close(BTNNPX);
close(NPX);
******************
I also tried the "grep" method (using grep command) rather than
the "Open File, pattern match, Closefile" method used above.
The change in code was this:
*******
$search_id = join("#",$major,$minor);
$result = `grep $search_id npx` ;
if ( $? != 0 )
{ @arrresult = `grep ^$major npx` ;
$result = $arrresult[0];
}
($id1,$id2,$state,$junk) = split(/#/,$result) ;
print " $search_id = $result","\n";
********
But, even this code seems to take over 30 minutes (slightly better than
the method used previously). Is there an elegant and quicker way
of doing this??
Thanks in advance
John
------------------------------
Date: 9 Jul 2000 03:02:08 GMT
From: ebohlman@netcom.com (Eric Bohlman)
Subject: Re: Comparing fields of two files
Message-Id: <8k8pvg$1hl$3@slb1.atl.mindspring.net>
JohnCasey (JohnCasey_member@newsguy.com) wrote:
: Basically, file A contains the state code corresponding to the first six
: digits of a phone number. Whereas fileB contains the phone number.
: I need to find the corresponding state code for each phone number in
: FileB by searching fileA and then create a new File C with the phone number and
: the state code.
The technical term for this is "joining" the two files (it's sometimes
called a "relational join" and this operation is in fact at the heart of
a relational database system).
: I wrote the following script and it seems to work fine, the only
: problem being, it takes a long time to process the files . It took
: over 45 minutes to complete and also took a lot of CPU cycles !.
Yes, the naive brute-force solution to doing a join is quite time
consuming because its run time is proportional to the product of the
numbers of entries in the two files (it didn't help that in your
implementation one of the files had to be re-opened for every line in the
other one, but that just multiplies the run time by a constant).
What you need to do is read file A into a hash keyed on the phone-number
prefix, and then loop through file B, looking up each number's prefix in
the hash. That will give you a run time proportional to the *sum* of the
numbers of entries in the two files, which should be *much* better.
------------------------------
Date: 08 Jul 2000 23:07:04 EDT
From: abigail@delanet.com (Abigail)
Subject: Re: Comparing fields of two files
Message-Id: <slrn8mfs49.t4j.abigail@alexandra.delanet.com>
JohnCasey (JohnCasey_member@newsguy.com) wrote on MMDIV September
MCMXCIII in <URL:news:8k8jrg$12uq@drn.newsguy.com>:
-- Hi,
--
-- There are two files:
--
-- fileA is as follows: fileB is as follows:
-- 718#203#NY# 7036123456
-- 901#257#AL# 2034567892
-- 703#612#KY# 9012348956
-- ........... ............
-- over 13000 entries
--
-- **********************************************************************
--
-- Basically, file A contains the state code corresponding to the first six
-- digits of a phone number. Whereas fileB contains the phone number.
-- I need to find the corresponding state code for each phone number in
-- FileB by searching fileA and then create a new File C with the phone number
-- and the state code.
-- For eg: 7036123456 MA
-- 2034567892 CT
-- ........
Could you describe *how* exactly this is supposed to work? In your
examples, fileA contains '703#612#KY#' and fileB contains '7036123456'.
The output file has '7036123456 MA'.
Where is the MA coming from, and what happened to KY?
Please, people, if you have a problem and want a solution, give a proper
specification of the problem and certainly do not define a problem with
vague or unclear examples. Often, people try to describe problems with
examples that don't even match the problem.
Your problem might be a variant of what's answered in the FAQ. However,
the problem is poorly phrased so who can tell?
Abigail
--
$_ = "\x3C\x3C\x45\x4F\x54";
print if s/<<EOT/<<EOT/e;
Just another Perl Hacker
EOT
------------------------------
Date: Sun, 09 Jul 2000 03:20:42 GMT
From: elephant@squirrelgroup.com (jason)
Subject: Re: Comparing fields of two files
Message-Id: <MPG.13d28cce515bc1d98977e@news>
JohnCasey writes ..
> There are two files:
>
>fileA is as follows: fileB is as follows:
>718#203#NY# 7036123456
>901#257#AL# 2034567892
>703#612#KY# 9012348956
>........... ............
> over 13000 entries
>
>**********************************************************************
>
>Basically, file A contains the state code corresponding to the first six
>digits of a phone number. Whereas fileB contains the phone number.
>I need to find the corresponding state code for each phone number in
>FileB by searching fileA and then create a new File C with the phone number and
>the state code.
>For eg: 7036123456 MA
> 2034567892 CT
> ........
>
>
>
>I wrote the following script and it seems to work fine, the only
>problem being, it takes a long time to process the files . It took
>over 45 minutes to complete and also took a lot of CPU cycles !.
the large time is taken because you're doing a regex (a string
operation) for every entry in fileA .. and you're doing it without any
anchors .. so the regex doesn't just check the start of the string - but
a number of characters in the string - each time .. for every line in
fileA times the number of lines in fileB .. very inefficient
so .. first step would be to anchor the regex to the beginning of the
string .. ie.
if (/^$search_id/)
however you can do better than that .. you're not actually using any
regular expression .. and there's a better way to do a string comparison
than that
if( $search_id eq substr( $_, 0, 6))
but I think you're still going about it all wrong .. because you're
still processing every single line of your target file - when
essentially you're dealing with numerical figures
you haven't said how big fileB is .. but it's only a few bytes per line
.. so it'd have to be pretty huge to take up any significant amount of
RAM
so I'd pre-process fileB before you started - storing each value into an
array and then sorting it into numerical order .. then I'd take each
string from fileA and make it into a 10 digit number
$num = (substr( $fileALine, 0, 3) . substr( $fileALine, 4, 3)) * 10000;
then search for it in your array .. craft your own search routine ..
assuming you've got an even spread of phone numbers then doing a binary
search will work fairly ideally here .. it will be slightly different
from a normal one because you'll probably want to find a bunch of phone
numbers .. eg. 1234560001, 1234560002 etc. .. so your test will need to
take that into account by testing whether values are at least 10000 more
than your search value
you'll get to the point where what you're left with is two indexes ..
which encompass all values between 1234560000 and 1234569999 inclusive
.. obviously then you'd output all those values with the current state
code
without your input files it's hard to give you metrics on the above
method .. but it should be fairly quick (especially in comparison with
the original) .. you were searching through each entry in fileA for each
entry in fileB .. ie. you were doing
( # lines in fileB ) times ( # lines in fileA ) string comparisons
instead - you're now doing one numerical sort of all the items in fileB
.. then you're doing
( # lines in fileA ) binary searches
which are doing numeric comparisons .. and will end up performing MUCH
less than ( # lines in fileB ) numbers of tests
the trade off is memory .. you will need a chunk of memory to store all
the fileB values in - but it shouldn't be too large (depending on how
large fileB is and how Perl stores the values in an array)
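
a rough sketch of the range search described above .. the sub names lower_bound and numbers_for_prefix and the extra sample numbers are made up for illustration, and this is only one possible way to code the idea:

```perl
use strict;
use warnings;

# Return the index of the first element >= $target in a numerically
# sorted array (the array length if every element is smaller).
sub lower_bound {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] < $target) { $lo = $mid + 1 }
        else                           { $hi = $mid }
    }
    return $lo;
}

# Return every number in the sorted list that starts with the
# six-digit prefix, using two binary searches to find the block.
sub numbers_for_prefix {
    my ($sorted, $prefix) = @_;
    my $first = $prefix * 10_000;                        # e.g. 7036120000
    my $start = lower_bound($sorted, $first);
    my $end   = lower_bound($sorted, $first + 10_000);   # one past the block
    return @{$sorted}[$start .. $end - 1];
}

# The first three numbers come from the thread; the rest are invented
# to exercise the edges of the block.
my @numbers = sort { $a <=> $b }
    qw(7036123456 2034567892 9012348956 7036120001 7036129999 7036130000);

my @hits = numbers_for_prefix(\@numbers, '703612');
print "@hits\n";
# prints: 7036120001 7036123456 7036129999
```

each prefix costs two binary searches over the sorted fileB values .. plus the one-off sort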
--
jason - elephant@squirrelgroup.com -
------------------------------
Date: Sun, 09 Jul 2000 03:29:49 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: Comparing fields of two files
Message-Id: <3967F1EF.49E1D965@rochester.rr.com>
JohnCasey wrote:
>
> Hi,
>
> There are two files:
>
> fileA is as follows: fileB is as follows:
> 718#203#NY# 7036123456
> 901#257#AL# 2034567892
> 703#612#KY# 9012348956
> ........... ............
> over 13000 entries
>
> **********************************************************************
>
> Basically, file A contains the state code corresponding to the first six
> digits of a phone number. Whereas fileB contains the phone number.
> I need to find the corresponding state code for each phone number in
> FileB by searching fileA and then create a new File C with the phone number and
> the state code.
> For eg: 7036123456 MA
> 2034567892 CT
> ........
>
> I wrote the following script and it seems to work fine, the only
> problem being, it takes a long time to process the files . It took
> over 45 minutes to complete and also took a lot of CPU cycles !.
...
> But, even this code seems to take over 30 minutes (slightly better than
> the method used previously). Is there a elegant and quicker way
> of doing this??
>
> Thanks in advance
> John
Wow. You must have a pretty fast system if your algorithm ran that
fast! What you need to do is to make a hash out of the first six digits
and the state code, then run through the phone number list. Something
like:
use strict;
my %state=();
open FILEA,"filea" or die "Oops, couldn't open filea, $!\n";
while(<FILEA>){
if(/^(\d{3})#(\d{3})#(\w{2})/){
$state{"$1$2"}=$3;
}
}
close FILEA or die "Oops, couldn't close filea, $!\n";
open FILEB,"fileb" or die "Oops, couldn't open fileb, $!\n";
while(<FILEB>){
chomp;
if(/^(\d{6})/){
print "$_ $state{$1}\n";
}
}
close FILEB or die "Oops, couldn't close fileb, $!\n";
That should run in a few seconds for 13000 records. You might want to
fancy it up with errors for any records that don't match, or phone
numbers that don't have a matching entry in the state hash.
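
A sketch of that fancier version, using in-memory arrays in place of the two files so that it runs on its own. The prefix records and phone numbers are the ones from the original post; the two malformed entries and the array names are made up for illustration:

```perl
use strict;
use warnings;

# Sample data standing in for filea and fileb.
my @filea = ('718#203#NY#', '901#257#AL#', '703#612#KY#', 'garbage line');
my @fileb = ('7036123456', '2034567892', '9012348956', 'abc');

# Build the lookup table: six-digit prefix => state code.
my (%state, @bad_a);
for my $line (@filea) {
    if ($line =~ /^(\d{3})#(\d{3})#(\w{2})/) {
        $state{"$1$2"} = $3;
    }
    else {
        push @bad_a, $line;          # malformed prefix record
    }
}

# Look up each phone number's prefix, keeping the failures separate.
my (@joined, @bad_b, @unmatched);
for my $number (@fileb) {
    if ($number !~ /^(\d{6})\d{4}$/) {
        push @bad_b, $number;        # not a ten-digit number
    }
    elsif (exists $state{$1}) {
        push @joined, "$number $state{$1}";
    }
    else {
        push @unmatched, $number;    # well formed, but no prefix entry
    }
}

print "$_\n" for @joined;
warn "No state for $_\n"           for @unmatched;
warn "Bad phone number: $_\n"      for @bad_b;
warn "Bad prefix record: $_\n"     for @bad_a;
```

With the sample lines from the original post only 7036123456 finds a prefix (KY); the other two numbers are reported as unmatched.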
--
Bob Walton
------------------------------
Date: Sun, 09 Jul 2000 03:50:42 GMT
From: kcivey@cpcug.org (Keith Calvert Ivey)
Subject: Re: Comparing fields of two files
Message-Id: <3968f5be.53167226@nntp.idsonline.com>
elephant@squirrelgroup.com (jason) wrote:
>so I'd pre-process fileB before you started - storing each value into an
>array and then sorting it into numerical order .. then I'd take each
>string from fileA and make it into a 10 digit number
>
> $num = substr( $fileALine, 0, 3). substr( $fileALine, 4, 3) * 10000;
>
>then search for it in your array .. craft your own search routine ..
>assuming you've got an even spread of phone numbers then doing a binary
>search will work fairly ideally here .. it will be slightly different
>from a normal one because you'll probably want to find a bunch of phone
>numbers .. eg. 12345460001, 1234560002 etc. .. so your test will need to
>take that into account by testing whether values are at least 10000 more
>than your search value
What's the point of all that? Just use a hash, as others have
suggested, mapping each combination of six digits to a state.
There's no need to write your own search routines when Perl has
them built in.
--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC
------------------------------
Date: Sun, 09 Jul 2000 05:27:07 GMT
From: elephant@squirrelgroup.com (jason)
Subject: Re: Comparing fields of two files
Message-Id: <MPG.13d2aa707491474a989780@news>
Keith Calvert Ivey writes ..
>elephant@squirrelgroup.com (jason) wrote:
>
>>so I'd pre-process fileB before you started - storing each value into an
>>array and then sorting it into numerical order .. then I'd take each
>>string from fileA and make it into a 10 digit number
>>
>> $num = substr( $fileALine, 0, 3). substr( $fileALine, 4, 3) * 10000;
>>
>>then search for it in your array .. craft your own search routine ..
>>assuming you've got an even spread of phone numbers then doing a binary
>>search will work fairly ideally here .. it will be slightly different
>>from a normal one because you'll probably want to find a bunch of phone
>>numbers .. eg. 12345460001, 1234560002 etc. .. so your test will need to
>>take that into account by testing whether values are at least 10000 more
>>than your search value
>
>What's the point of all that? Just use a hash, as other have
>suggested, mapping each combination of six digits to a state.
the point ? .. speed .. and it's not really 'all that' .. you probably
spent longer reading it than it would take to code it
>There's no need to write your own search routines when Perl has
>them built in.
there's no 'need' to make any amendments to the code - it does work ..
the problem as stated was a performance issue .. I just posted what I
would do as a suggested improvement to the efficiency of the process
without the input data to test I can't say for certain .. and I don't
have a good enough understanding of the hash internals to make a
worthwhile comparison .. but a binary search through ordered numeric
data is pretty fast - and may be faster than a string lookup in a 13000
member hash
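
one way to settle it would be to time both with the standard Benchmark module .. a sketch only, using synthetic prefixes rather than the real data (the data, the binary_find sub and the probe list are all assumptions), so the numbers it prints say nothing certain about the real files:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data: 13,000 distinct six-digit prefixes (not the poster's data).
my @prefixes = map { 200_000 + $_ * 7 } 0 .. 12_999;
my %state    = map { $_ => 'XX' } @prefixes;      # hash lookup table
my @sorted   = sort { $a <=> $b } @prefixes;      # array for binary search

# Classic binary search: index of $target in @sorted, or -1 if absent.
sub binary_find {
    my ($target) = @_;
    my ($lo, $hi) = (0, $#sorted);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if    ($sorted[$mid] < $target) { $lo = $mid + 1 }
        elsif ($sorted[$mid] > $target) { $hi = $mid - 1 }
        else                            { return $mid }
    }
    return -1;
}

# 1,000 keys that are known to be present.
my @probes = @prefixes[ map { $_ * 13 } 0 .. 999 ];

# Run each lookup style for at least one CPU second and compare rates.
cmpthese(-1, {
    hash   => sub { my $n = 0; exists $state{$_} and $n++ for @probes; $n },
    binary => sub { my $n = 0; binary_find($_) >= 0 and $n++ for @probes; $n },
});
```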
--
jason - elephant@squirrelgroup.com -
------------------------------
Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V9 Issue 3554
**************************************