[29905] in Perl-Users-Digest


Perl-Users Digest, Issue: 1148 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 3 03:15:14 2008

Date: Thu, 3 Jan 2008 00:15:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 3 Jan 2008     Volume: 11 Number: 1148

Today's topics:
        compare 2 data files and extract fields for matched lin <srigowrisn@hotmail.com>
    Re: compare 2 data files and extract fields for matched <dn.perl@gmail.com>
    Re: compare 2 data files and extract fields for matched <simon.chao@fmr.com>
    Re: compare 2 data files and extract fields for matched <jurgenex@hotmail.com>
    Re: compare 2 data files and extract fields for matched <someone@example.com>
    Re: compare 2 data files and extract fields for matched <srigowrisn@hotmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 27 Dec 2007 08:25:13 -0800 (PST)
From: shree <srigowrisn@hotmail.com>
Subject: compare 2 data files and extract fields for matched lines
Message-Id: <ed7d3156-ac64-4056-a9c7-b887983dbc9a@p69g2000hsa.googlegroups.com>

Hello Friends,

The best way to describe what I'm trying to do is through an example.
I have 2 pipe delimited input files and want to extract a field from
file 2 and append it to file 1.  Note I would like Output file to have
the same number of rows as Input File 1, with an additional field
whose value if present in file 2, should be inserted in this new
field. If it's not present, then insert '0000'.

Input File 1 (zipcode, city, state, county)
36003|Autaugaville|AL|AUTAUGA
36006|Billingsley|AL|AUTAUGA
72314|Birdeye|AR|CROSS
72324|Cherry Valley|AR|CROSS
57437|Eureka|SD|MCPHERSON
67460|Mc Pherson|KS|MCPHERSON
67464|Marquette|KS|MCPHERSON
69167|Tryon|NE|MCPHERSON
 ..
 ..
Input File 2 (county, state, county population)
AUTAUGA|AL|49730
CROSS|AR|19056
MCPHERSON|KS|29380
 ..

Desired Output (zipcode, city, state, county, county population)
36003|Autaugaville|AL|AUTAUGA|49730
36006|Billingsley|AL|AUTAUGA|49730
72314|Birdeye|AR|CROSS|19056
72324|Cherry Valley|AR|CROSS|19056
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|29380
67464|Marquette|KS|MCPHERSON|29380
69167|Tryon|NE|MCPHERSON|0000

---
I wrote the program below but it has a logic error. Instead of getting
the above, I get the following.

Any guidance with fixing the code or perhaps a better way to do this
is really appreciated. The above is just a few lines from my real
input files, which are considerably larger.

Thank you and best wishes,
Shree


36003|Autaugaville|AL|AUTAUGA|49730
36006|Billingsley|AL|AUTAUGA|49730
72314|Birdeye|AR|CROSS|0000
72324|Cherry Valley|AR|CROSS|0000
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|0000
67464|Marquette|KS|MCPHERSON|0000
69167|Tryon|NE|MCPHERSON|0000
36003|Autaugaville|AL|AUTAUGA|0000
36006|Billingsley|AL|AUTAUGA|0000
72314|Birdeye|AR|CROSS|19056
72324|Cherry Valley|AR|CROSS|19056
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|0000
67464|Marquette|KS|MCPHERSON|0000
69167|Tryon|NE|MCPHERSON|0000
36003|Autaugaville|AL|AUTAUGA|0000
36006|Billingsley|AL|AUTAUGA|0000
72314|Birdeye|AR|CROSS|0000
72324|Cherry Valley|AR|CROSS|0000
57437|Eureka|SD|MCPHERSON|0000
67460|Mc Pherson|KS|MCPHERSON|29380
67464|Marquette|KS|MCPHERSON|29380
69167|Tryon|NE|MCPHERSON|0000

---------------------------------------------------------------------
#!/usr/bin/perl

use strict;
my $File_In1 = "dat1.txt";
my $File_In2 = "dat2.txt";
my (@array1, @array2) = ();
my ($line1, $line2) = "";
my ($zip, $city, $state, $county) = "";
my ($county2, $state2, $pop) = "";

open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
@array1 = <FILE_IN1>;
close (FILE_IN1);

open (FILE_IN2, $File_In2) or die "cannot open file in FILE_IN2 $!";
@array2 = <FILE_IN2>;
close (FILE_IN2);

foreach $line2 (@array2) {
	chomp ($line2);
	($county2, $state2, $pop) = split (/\|/, $line2);
	foreach $line1 (@array1) {
	chomp ($line1);
	($zip, $city, $state, $county) = split (/\|/, $line1);
		if  (($county2 eq $county)  && ($state2 eq $state)) {
			print "$zip|$city|$state|$county|$pop\n";
		} else {
			print "$zip|$city|$state|$county|0000\n";
		}
	}
}


------------------------------

Date: Thu, 27 Dec 2007 08:50:33 -0800 (PST)
From: "dn.perl@gmail.com" <dn.perl@gmail.com>
Subject: Re: compare 2 data files and extract fields for matched lines
Message-Id: <a237a0a1-1e49-462f-862c-d678381f05ea@e10g2000prf.googlegroups.com>

On Dec 27, shree <srigowr...@hotmail.com> wrote:
>
>
> open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
> @array1 = <FILE_IN1>;
> close (FILE_IN1);
>

Start with two sample files of 2-3 lines each.
Check that reading the entire file into an array
works as you expect by printing @array1, and by
printing $line1 inside the foreach $line1 (@array1)
loop. Better still, read the files line by line
instead of reading them all at once, with something like:
while ($line1 = <FILE_IN1>)  { statements }
While debugging, print out the value of $line1
to see for yourself what strings or arrays
your code is dealing with.
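
A minimal sketch of that line-by-line loop with a debugging print.
The in-memory sample string and the @seen array are assumptions made
so the snippet runs without any files on disk; with real data you
would open dat1.txt instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data standing in for dat1.txt (an assumption for this sketch).
my $sample = "36003|Autaugaville|AL|AUTAUGA\n72314|Birdeye|AR|CROSS\n";

# A reference to a scalar lets open() read the string like a file.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";

my @seen;
while ( my $line1 = <$fh> ) {
    chomp $line1;

    # Debugging aid: show exactly which string the loop is working on.
    print STDERR "line $.: [$line1]\n";

    my ( $zip, $city, $state, $county ) = split /\|/, $line1;
    push @seen, "$state|$county";
}
close $fh;

print "$_\n" for @seen;
```

The square brackets around $line1 make a stray newline or trailing
space visible, which is often what such a print is meant to catch.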



------------------------------

Date: Thu, 27 Dec 2007 09:03:36 -0800 (PST)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: compare 2 data files and extract fields for matched lines
Message-Id: <cd78ca24-a8ab-4139-8bb7-a1e937d4f884@q77g2000hsh.googlegroups.com>

On Dec 27, 11:25 am, shree <srigowr...@hotmail.com> wrote:
> Hello Friends,
>
> The best way to describe what I'm trying to do is through an example.
> I have 2 pipe delimited input files and want to extract a field from
> file 2 and append it to file 1.  Note I would like Output file to have
> the same number of rows as Input File 1, with an additional field
> whose value if present in file 2, should be inserted in this new
> field. If it's not present, then insert '0000'.
>
> Input File 1 (zipcode, city, state, county)
> 36003|Autaugaville|AL|AUTAUGA
> 36006|Billingsley|AL|AUTAUGA
> 72314|Birdeye|AR|CROSS
> 72324|Cherry Valley|AR|CROSS
> 57437|Eureka|SD|MCPHERSON
> 67460|Mc Pherson|KS|MCPHERSON
> 67464|Marquette|KS|MCPHERSON
> 69167|Tryon|NE|MCPHERSON
> ..
> ..
> Input File 2 (county, state, county population)
> AUTAUGA|AL|49730
> CROSS|AR|19056
> MCPHERSON|KS|29380
> ..

stick this info in a hash.

so, start your script:

#!/usr/bin/perl

use strict; use warnings;

# file names taken from the original post
my $file1 = 'dat1.txt';
my $file2 = 'dat2.txt';

# build a lookup of county population keyed on "state|county"
my %pop;
open my $fh2, '<', $file2 or die "can't open '$file2': $!\n";
while ( <$fh2> ) {
    chomp;
    my ( $county, $state, $population ) = split /\|/;
    $pop{"$state|$county"} = $population;
}
close $fh2;

# now you just need to open the other file, and do a lookup on the
# "$state|$county" key
# if it's not there, append your 0000;

open my $fh1, '<', $file1 or die "can't open '$file1': $!\n";
while ( <$fh1> ) {
    chomp;
    my ( $zip, $city, $state, $county ) = split /\|/;
    if ( $pop{"$state|$county"} ) {
        print "$_|".$pop{"$state|$county"}."\n";
    }
    else {
        print "$_|0000\n";
    }
}
close $fh1;


__END__

* Note: untested




------------------------------

Date: Thu, 27 Dec 2007 17:29:40 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: compare 2 data files and extract fields for matched lines
Message-Id: <sln7n3t26stn7augdb7t1290pheb830kns@4ax.com>

shree <srigowrisn@hotmail.com> wrote:
>The best way to describe what I'm trying to do is through an example.

That helps somewhat.

>I have 2 pipe delimited input files and want to extract a field from
>file 2 and append it to file 1.  Note I would like Output file to have
>the same number of rows as Input File 1, with an additional field
>whose value if present in file 2, should be inserted in this new
>field. If it's not present, then insert '0000'.

You forgot to mention, and it is not clear from your example, _which field_
is the link between those 2 files.

>Input File 1 (zipcode, city, state, county)
>36003|Autaugaville|AL|AUTAUGA
>36006|Billingsley|AL|AUTAUGA
>..
>..
>Input File 2 (county, state, county population)
>AUTAUGA|AL|49730
>CROSS|AR|19056
>MCPHERSON|KS|29380
>..
>
>Desired Output (zipcode, city, state, county, county population)
>36003|Autaugaville|AL|AUTAUGA|49730
>36006|Billingsley|AL|AUTAUGA|49730
>72314|Birdeye|AR|CROSS|19056
>
>---
>I wrote the program below but it has a logic error. Instead of getting
>the above, I get the following.
>
>Any guidance with fixing the code or perhaps a better way to do this
>is really appreciated. The above is just a few lines from my real

[attempt with 2 arrays and nested loops snipped]

There is a much easier approach:
- read file 2 into a hash, using the link between the 2 files as the key and
the desired number as the value in each hash entry.
- then read file 1 line by line and if the key exists then write the line
with the hash value to the new file, otherwise write the line with 0000
appended to the new file.

Not only is this much easier to comprehend, it is also much faster:
O(n+m) instead of O(n*m), where n and m are the line counts of the two files.
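
A minimal sketch of that comparison, assuming the sample lines from
the original post are held in arrays rather than read from files. The
step counters and the names @file1, @file2, %pop and @out are only
there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines from the original post, standing in for the two files.
my @file1 = (
    '36003|Autaugaville|AL|AUTAUGA', '36006|Billingsley|AL|AUTAUGA',
    '72314|Birdeye|AR|CROSS',        '72324|Cherry Valley|AR|CROSS',
    '57437|Eureka|SD|MCPHERSON',     '67460|Mc Pherson|KS|MCPHERSON',
    '67464|Marquette|KS|MCPHERSON',  '69167|Tryon|NE|MCPHERSON',
);
my @file2 = ( 'AUTAUGA|AL|49730', 'CROSS|AR|19056', 'MCPHERSON|KS|29380' );

# Nested loops: every line of file 2 meets every line of file 1.
my $nested_steps = 0;
for my $line2 (@file2) {
    for my $line1 (@file1) {
        $nested_steps++;
    }
}

# Hash approach: one pass to build the lookup, one pass to use it.
my %pop;
my $hash_steps = 0;
for my $line2 (@file2) {
    my ( $county, $state, $population ) = split /\|/, $line2;
    $pop{"$state|$county"} = $population;
    $hash_steps++;
}

my @out;
for my $line1 (@file1) {
    my ( $zip, $city, $state, $county ) = split /\|/, $line1;
    my $key = "$state|$county";
    push @out, "$line1|" . ( exists $pop{$key} ? $pop{$key} : '0000' );
    $hash_steps++;
}

print "nested: $nested_steps steps, hash: $hash_steps steps\n";
print "$_\n" for @out;
```

With 3 and 8 lines this comes to 24 steps against 11. The 24 is also
the number of lines the nested-loop program printed, because it emits
one line per inner iteration instead of one per line of file 1.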

jue


------------------------------

Date: Fri, 28 Dec 2007 05:43:00 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: compare 2 data files and extract fields for matched lines
Message-Id: <E10dj.42582$UZ4.14301@edtnps89>

shree wrote:
> 
> The best way to describe what I'm trying to do is through an example.
> I have 2 pipe delimited input files and want to extract a field from
> file 2 and append it to file 1.  Note I would like Output file to have
> the same number of rows as Input File 1, with an additional field
> whose value if present in file 2, should be inserted in this new
> field. If it's not present, then insert '0000'.
> 
> Input File 1 (zipcode, city, state, county)
> 36003|Autaugaville|AL|AUTAUGA
> 36006|Billingsley|AL|AUTAUGA
> 72314|Birdeye|AR|CROSS
> 72324|Cherry Valley|AR|CROSS
> 57437|Eureka|SD|MCPHERSON
> 67460|Mc Pherson|KS|MCPHERSON
> 67464|Marquette|KS|MCPHERSON
> 69167|Tryon|NE|MCPHERSON
> ..
> ..
> Input File 2 (county, state, county population)
> AUTAUGA|AL|49730
> CROSS|AR|19056
> MCPHERSON|KS|29380
> ..
> 
> Desired Output (zipcode, city, state, county, county population)
> 36003|Autaugaville|AL|AUTAUGA|49730
> 36006|Billingsley|AL|AUTAUGA|49730
> 72314|Birdeye|AR|CROSS|19056
> 72324|Cherry Valley|AR|CROSS|19056
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|29380
> 67464|Marquette|KS|MCPHERSON|29380
> 69167|Tryon|NE|MCPHERSON|0000
> 
> ---
> I wrote the program below but it has a logic error. Instead of getting
> the above, I get the following.
> 
> Any guidance with fixing the code or perhaps a better way to do this
> is really appreciated. The above is just a few lines from my real
> input files, which are considerably larger.
> 
> Thank you and best wishes,
> Shree
> 
> 
> 36003|Autaugaville|AL|AUTAUGA|49730
> 36006|Billingsley|AL|AUTAUGA|49730
> 72314|Birdeye|AR|CROSS|0000
> 72324|Cherry Valley|AR|CROSS|0000
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|0000
> 67464|Marquette|KS|MCPHERSON|0000
> 69167|Tryon|NE|MCPHERSON|0000
> 36003|Autaugaville|AL|AUTAUGA|0000
> 36006|Billingsley|AL|AUTAUGA|0000
> 72314|Birdeye|AR|CROSS|19056
> 72324|Cherry Valley|AR|CROSS|19056
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|0000
> 67464|Marquette|KS|MCPHERSON|0000
> 69167|Tryon|NE|MCPHERSON|0000
> 36003|Autaugaville|AL|AUTAUGA|0000
> 36006|Billingsley|AL|AUTAUGA|0000
> 72314|Birdeye|AR|CROSS|0000
> 72324|Cherry Valley|AR|CROSS|0000
> 57437|Eureka|SD|MCPHERSON|0000
> 67460|Mc Pherson|KS|MCPHERSON|29380
> 67464|Marquette|KS|MCPHERSON|29380
> 69167|Tryon|NE|MCPHERSON|0000
> 
> ---------------------------------------------------------------------
> #!/usr/bin/perl
> 
> use strict;
> my $File_In1 = "dat1.txt";
> my $File_In2 = "dat2.txt";
> my (@array1, @array2) = ();

That is the same as:

my @array1 = ();
my @array2;

> my ($line1, $line2) = "";

That is the same as:

my $line1 = "";
my $line2;

> my ($zip, $city, $state, $county) = "";

That is the same as:

my $zip = "";
my ($city, $state, $county);

> my ($county2, $state2, $pop) = "";

That is the same as:

my $county2 = "";
my ($state2, $pop);

Anyway, you should declare your variables in the smallest possible scope 
instead of all at the top of the file.
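
A short sketch of both points. The variable names $first, $second and
$record are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List assignment pairs values with variables from left to right;
# a variable with no matching value on the right stays undefined.
my ( $line1, $line2 ) = "";
print defined $line1 ? "line1 is defined\n" : "line1 is undef\n";
print defined $line2 ? "line2 is defined\n" : "line2 is undef\n";

# To start every variable as an empty string, supply one value each.
my ( $first, $second ) = ( "", "" );

# Smallest possible scope: declare the variables inside the loop that
# uses them, so each iteration gets fresh ones.
for my $record ( 'AUTAUGA|AL|49730', 'CROSS|AR|19056' ) {
    my ( $county, $state, $pop ) = split /\|/, $record;
    print "$county, $state: $pop\n";
}
```

Declared this way, $county, $state and $pop cannot carry a stale value
over from a previous iteration or from another part of the program.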


> open (FILE_IN1, $File_In1) or die "cannot open file in FILE_IN1 $!";
> @array1 = <FILE_IN1>;
> close (FILE_IN1);
> 
> open (FILE_IN2, $File_In2) or die "cannot open file in FILE_IN2 $!";
> @array2 = <FILE_IN2>;
> close (FILE_IN2);
> 
> foreach $line2 (@array2) {
> 	chomp ($line2);
> 	($county2, $state2, $pop) = split (/\|/, $line2);
> 	foreach $line1 (@array1) {
> 	chomp ($line1);
> 	($zip, $city, $state, $county) = split (/\|/, $line1);
> 		if  (($county2 eq $county)  && ($state2 eq $state)) {
> 			print "$zip|$city|$state|$county|$pop\n";
> 		} else {
> 			print "$zip|$city|$state|$county|0000\n";
> 		}
> 	}
> }

Try it like this:

#!/usr/bin/perl
use warnings;
use strict;

my $File_In1 = 'dat1.txt';
my $File_In2 = 'dat2.txt';

open FILE_IN2, '<', $File_In2 or die "cannot open '$File_In2' $!";

my %population;
while ( <FILE_IN2> ) {
     my ( $county, $state, $pop ) = /\A([^|]+)\|([^|]+)\|(\d+)\Z/;
     $population{ "$state|$county" } = $pop;
     }

close FILE_IN2;

open FILE_IN1, '<', $File_In1 or die "cannot open '$File_In1' $!";

while ( my $line = <FILE_IN1> ) {
     chomp $line;
     my ( $key ) = $line =~ /\|([^|]+\|[^|]+)\z/;
     print "$line|", $population{ $key } || '0000', "\n";
     }

close FILE_IN1;

__END__



John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Sat, 29 Dec 2007 13:45:53 -0800 (PST)
From: shree <srigowrisn@hotmail.com>
Subject: Re: compare 2 data files and extract fields for matched lines
Message-Id: <8189b706-72f3-4f3d-aa18-e17325bdca60@v4g2000hsf.googlegroups.com>


>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $File_In1 = 'dat1.txt';
> my $File_In2 = 'dat2.txt';
>
> open FILE_IN2, '<', $File_In2 or die "cannot open '$File_In2' $!";
>
> my %population;
> while ( <FILE_IN2> ) {
>      my ( $county, $state, $pop ) = /\A([^|]+)\|([^|]+)\|(\d+)\Z/;
>      $population{ "$state|$county" } = $pop;
>      }
>
> close FILE_IN2;
>
> open FILE_IN1, '<', $File_In1 or die "cannot open '$File_In1' $!";
>
> while ( my $line = <FILE_IN1> ) {
>      chomp $line;
>      my ( $key ) = $line =~ /\|([^|]+\|[^|]+)\z/;
>      print "$line|", $population{ $key } || '0000', "\n";
>      }
>
> close FILE_IN1;
>
> __END__
>
> John

Dear all,

Thanks for showing me how to do this. And an added thanks to John for
teaching good programming techniques in Perl.

I was able to use the above exactly as written and it worked like a charm.

Shree




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1148
***************************************

