[33083] in Perl-Users-Digest

Perl-Users Digest, Issue: 4359 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 29 09:09:18 2015

Date: Thu, 29 Jan 2015 06:09:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 29 Jan 2015     Volume: 11 Number: 4359

Today's topics:
    Re: Compare 2 Hash of Hashes using sub-hash values <fmassion@web.de>
    Re: Compare 2 Hash of Hashes using sub-hash values <fmassion@web.de>
    Re: Compare 2 Hash of Hashes using sub-hash values <fmassion@web.de>
    Re: Compare 2 Hash of Hashes using sub-hash values <gravitalsun@hotmail.foo>
    Re: Compare 2 Hash of Hashes using sub-hash values <ben.usenet@bsb.me.uk>
    Re: Compare 2 Hash of Hashes using sub-hash values <fmassion@web.de>
    Re: Compare 2 Hash of Hashes using sub-hash values <ben.usenet@bsb.me.uk>
    Re: Global, Local, Static <see.my.sig@for.my.address>
    Re: Why are these brackets necessary? <see.my.sig@for.my.address>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 28 Jan 2015 22:47:59 -0800 (PST)
From: F Massion <fmassion@web.de>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <e3365e5f-34ee-41d0-98df-ce154dafc33f@googlegroups.com>

On Wednesday, 28 January 2015 16:12:02 UTC+1, George Mpouras wrote:
> On 28/1/2015 11:56, F Massion wrote:
> > I am not particularly good at Perl and do not get any further with the following:
> > I want to compare 2 relatively large tables with tab-separated columns which I have imported each in a Hash of Hashes.
> >
> 
> 
> # done !
> 
> 
> use Data::Dumper;$Data::Dumper::Sortkeys=1;
> 
> # Fill up the hashes
> my %hash1;
> my %hash2;
> for my $k1 (1..100) {for('a'..'c') {$hash1{$k1}->{$_}=1}}
> for my $k1 (1..100) {for('a'..'c') {$hash2{$k1}->{$_}=1}}
> 
> print "Hashes are ",
> Dumper(\%hash1) eq Dumper(\%hash2) ? 'same':'different'

I ran your version and got the message "Hashes are same", although the tables are different.
But I am not sure that Dumper is useful here, because I need a table as output (in the same format as the original tab-delimited files).


------------------------------

Date: Wed, 28 Jan 2015 22:59:04 -0800 (PST)
From: F Massion <fmassion@web.de>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <6d0b9902-3deb-4196-8947-2b3cb813d3dd@googlegroups.com>

On Thursday, 29 January 2015 02:20:33 UTC+1, Ben Bacarisse wrote:
> F Massion <fmassion@web.de> writes:
>
> > I am not particularly good at Perl and do not get any further with the
> > following: I want to compare 2 relatively large tables with
> > tab-separated columns which I have imported each in a Hash of Hashes.
> >
> > File #1:
> > RecordID	Name	Language	Status
> > 715	surface technology	English	 
> > 763	mineral abrasive	English	validated
> > 796	Zinkstaubgrundierung	German	validated
> >
> > File #2:
> > RecordID	Name	Language	Status
> > 763	mineral abrasive	English	not validated
> > 813	2-component prime coat	English	not validated
> > 815	finishing coat	English	 
> >
> > I would like to use the value of certain columns to find out what are
> > the differences or the common entries between these 2 files.
> <snip>
> > This is my code:
> > #!/usr/bin/perl -w
> > use warnings; use strict;
> > open(FIRSTFILE, $ARGV[0]) || die("file $ARGV[0] cannot be opened!\n");
> > my %Firstfile = ();
> > while (<FIRSTFILE>)
> > {
> >     chomp;
> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
> >     $Firstfile{$RecordID}{'Name'} = $Name;
> >     $Firstfile{$RecordID}{'Language'} = $Language;
> >     $Firstfile{$RecordID}{'Status'} = $Status;
> >     print "$RecordID -- $Firstfile{$RecordID}{'Name'} :: $Firstfile{$RecordID}{'Language'}\n" ;
> > }
> > open(SECONDFILE, $ARGV[1]) || die("file $ARGV[1] cannot be opened!\n");
> > my %Secondfile = ();
> > while (<SECONDFILE>)
> > {
> >     chomp;
> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
> >     $Secondfile{$RecordID}{'Name'} = $Name;
> >     $Secondfile{$RecordID}{'Language'} = $Language;
> >     $Secondfile{$RecordID}{'Status'} = $Status;
> >     print "$RecordID -- $Secondfile{$RecordID}{'Name'} :: $Secondfile{$RecordID}{'Status'}\n";
> > }
> > #QUESTIONS
> > #(1) Why do I lose the variables from the 2 loops above? I.e. I cannot
> > re-use $RecordID or $Name...
>
> These are declared as local variables inside the body of the loop.  You
> do that because you don't want the variables to be available outside.
> You could move the declaration outside the loop, but why?  What do
> you want them for?
>
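
To illustrate the scoping point quoted above, here is a minimal sketch. The sample lines and the names %seen and $last_id are made up for illustration and are not part of the posted script. Variables declared with "my" inside the loop body vanish when the body ends, while anything stored in a hash declared outside the loop stays available.

```perl
use strict;
use warnings;

my %seen;        # declared outside the loop, so it outlives the loop
my $last_id;     # also declared outside: keeps the last value assigned

for my $line ("715\tsurface technology", "763\tmineral abrasive") {
    my ($id, $name) = split /\t/, $line;   # $id and $name exist only in this block
    $seen{$id} = $name;                    # the data itself is kept in the hash
    $last_id = $id;
}

# $id and $name are out of scope here; under "use strict", referring to
# them would be a compile-time error. The hash and $last_id are still usable.
print "$_ => $seen{$_}\n" for sort keys %seen;
print "last id: $last_id\n";
```

So the loop variables are not needed after the loop: every value read is still reachable through the hash by its key.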

My aim is to fine-tune the comparison of these two tables. In real
life these are exports from terminology databases with thousands of
records and attribute values. Thus I would like to ask, for example:
"Which records in File #2 have the same RecordID as in File #1 but a
different Status? Please output only the RecordID, the Name and the
Status as a tab-delimited file". This file can then be processed by
the terminology database. For that type of operation I need a finer
granularity than $inkey / $outkey + $compound_outvalue.


> > #(2) How can I access only certain variables from my table (to compare
> > or print), e.g. $Language or $Status
>
> I don't understand "variables from my table".  The table contains data
> which you access using keys, and the code shows you know how to do this.
> Can you give an example of what you want to do and how you tried to
> do it?
>

See above. It is a granularity issue. The value of the outerkey seems
to contain all the fields in my table together (Name, Language and
Status in my example). But maybe I do not properly understand how a
HoH works.

> > my $innerkey;
> > my $outerkey;
> >    foreach $innerkey ( keys %Firstfile ) {
> >     if ($Firstfile{$innerkey}{'Status'} ne $Secondfile{$innerkey}{'Status'}){
> >    print "$innerkey:\:" ;
> >    foreach $outerkey ( keys %{$Firstfile{$innerkey}} ) {
> >    print "$Firstfile{$innerkey}{$outerkey}::" ;
> >    }
> >    print "\n" ;
> >  }}
> > close FIRSTFILE;
> > close SECONDFILE;
>
> <snip>
> -- 
> Ben.



------------------------------

Date: Wed, 28 Jan 2015 22:59:23 -0800 (PST)
From: F Massion <fmassion@web.de>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <d84e408b-ab2f-49b1-890d-d25830fdd5c4@googlegroups.com>

On Wednesday, 28 January 2015 10:56:06 UTC+1, F Massion wrote:
> I am not particularly good at Perl and do not get any further with the following:
> I want to compare 2 relatively large tables with tab-separated columns which I have imported each in a Hash of Hashes.
> 
> File #1:
> RecordID	Name	Language	Status
> 715	surface technology	English	 
> 763	mineral abrasive	English	validated
> 796	Zinkstaubgrundierung	German	validated
> 
> File #2:
> RecordID	Name	Language	Status
> 763	mineral abrasive	English	not validated
> 813	2-component prime coat	English	not validated
> 815	finishing coat	English	 
> 
> I would like to use the value of certain columns to find out what are the differences or the common entries between these 2 files.
> 
> My questions are (see comment in code):
> (1) Why do I lose the variables from the 2 HoHs after I have read them (i.e. I cannot re-use $RecordID or $Name...)
> (2) How can I access only certain variables from my table (to compare or print), e.g. $Language or $Status
> 
> This is my code:
> #!/usr/bin/perl -w
> use warnings; use strict;
> open(FIRSTFILE, $ARGV[0]) || die("file $ARGV[0] cannot be opened!\n");
> my %Firstfile = ();
> while (<FIRSTFILE>) 
> {
>     chomp;
>     my ($RecordID, $Name, $Language, $Status) = split ('\t');
>     $Firstfile{$RecordID}{'Name'} = $Name;
>     $Firstfile{$RecordID}{'Language'} = $Language;
>     $Firstfile{$RecordID}{'Status'} = $Status;
>     print "$RecordID -- $Firstfile{$RecordID}{'Name'} :: $Firstfile{$RecordID}{'Language'}\n" ;
> }
> open(SECONDFILE, $ARGV[1]) || die("file $ARGV[1] cannot be opened!\n");
> my %Secondfile = ();
> while (<SECONDFILE>) 
> {
>     chomp;
>     my ($RecordID, $Name, $Language, $Status) = split ('\t');
>     $Secondfile{$RecordID}{'Name'} = $Name;
>     $Secondfile{$RecordID}{'Language'} = $Language;
>     $Secondfile{$RecordID}{'Status'} = $Status;
>     print "$RecordID -- $Secondfile{$RecordID}{'Name'} :: $Secondfile{$RecordID}{'Status'}\n";
> }
> #QUESTIONS
> #(1) Why do I lose the variables from the 2 loops above? I.e. I cannot re-use $RecordID or $Name...
> #(2) How can I access only certain variables from my table (to compare or print), e.g. $Language or $Status
> my $innerkey;
> my $outerkey;
>    foreach $innerkey ( keys %Firstfile ) {
>    	if ($Firstfile{$innerkey}{'Status'} ne $Secondfile{$innerkey}{'Status'}){ 
>    print "$innerkey:\:" ;
>    foreach $outerkey ( keys %{$Firstfile{$innerkey}} ) {
>    print "$Firstfile{$innerkey}{$outerkey}::" ;
>    }
>    print "\n" ;
>  }}
> close FIRSTFILE;
> close SECONDFILE;
> 
> __END__
> CURRENT OUTPUT:
> ?RecordID -- Name :: Language
> 
> 715 -- surface technology :: English
> 763 -- mineral abrasive :: English
> 796 -- Zinkstaubgrundierung :: German
> RecordID -- Name :: Status
> 763 -- mineral abrasive :: not validated
> 813 -- 2-component prime coat :: not validated
> 815 -- finishing coat ::
> Use of uninitialized value in string ne at D:\Perl\loeschen_2.pl line 31, <SECONDFILE> line 4.
> 796::validated::German::Zinkstaubgrundierung::
> 763::validated::English::mineral abrasive::
> Use of uninitialized value in string ne at D:\Perl\loeschen_2.pl line 31, <SECONDFILE> line 4.
> 715:: ::English::surface technology::
> Use of uninitialized value in string ne at D:\Perl\loeschen_2.pl line 31, <SECONDFILE> line 4.
> ?RecordID::Status::Language::Name::


------------------------------

Date: Thu, 29 Jan 2015 10:13:39 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <macq2k$q39$1@news.ntua.gr>

On 29/1/2015 08:47, F Massion wrote:
> On Wednesday, 28 January 2015 16:12:02 UTC+1, George Mpouras wrote:
>> On 28/1/2015 11:56, F Massion wrote:
>>> I am not particularly good at Perl and do not get any further with the following:
>>> I want to compare 2 relatively large tables with tab-separated columns which I have imported each in a Hash of Hashes.
>>>
>>
>>
>> # done !
>>
>>
>> use Data::Dumper;$Data::Dumper::Sortkeys=1;
>>
>> # Fill up the hashes
>> my %hash1;
>> my %hash2;
>> for my $k1 (1..100) {for('a'..'c') {$hash1{$k1}->{$_}=1}}
>> for my $k1 (1..100) {for('a'..'c') {$hash2{$k1}->{$_}=1}}
>>
>> print "Hashes are ",
>> Dumper(\%hash1) eq Dumper(\%hash2) ? 'same':'different'
>
> I ran your version and got the message "Hashes are same", although the tables are different.
> But I am not sure that Dumper is useful here, because I need a table as output (in the same format as the original tab-delimited files).
>

This small piece of code always works. You probably have something
different in mind, which I do not currently have time to follow.
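
A minimal sketch of the same Dumper-based test applied to two records that do differ. The sub name same_structure and the sample data are assumptions made for illustration. The snippet quoted above fills both hashes with identical values, so it can only ever print "same". Note that this approach answers only "same or different" and does not say which records differ.

```perl
use strict;
use warnings;
use Data::Dumper;

# Compare two nested structures by their serialized form.
# Sortkeys makes the key order deterministic, so equal data gives equal text.
sub same_structure {
    my ($left, $right) = @_;
    local $Data::Dumper::Sortkeys = 1;
    return Dumper($left) eq Dumper($right);
}

my %file1 = ( 763 => { Name => 'mineral abrasive', Status => 'validated' } );
my %file2 = ( 763 => { Name => 'mineral abrasive', Status => 'not validated' } );

print "Hashes are ",
    ( same_structure(\%file1, \%file2) ? 'same' : 'different' ), "\n";
```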


------------------------------

Date: Thu, 29 Jan 2015 12:15:17 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <87fvat7pgq.fsf@bsb.me.uk>

F Massion <fmassion@web.de> writes:

> On Thursday, 29 January 2015 02:20:33 UTC+1, Ben Bacarisse wrote:
>> F Massion <fmassion@web.de> writes:
>> 
>> > I am not particularly good at Perl and do not get any further with the
>> > following: I want to compare 2 relatively large tables with
>> > tab-separated columns which I have imported each in a Hash of Hashes.
>> >
>> > File #1:
>> > RecordID	Name	Language	Status
>> > 715	surface technology	English	 
>> > 763	mineral abrasive	English	validated
>> > 796	Zinkstaubgrundierung	German	validated
>> >
>> > File #2:
>> > RecordID	Name	Language	Status
>> > 763	mineral abrasive	English	not validated
>> > 813	2-component prime coat	English	not validated
>> > 815	finishing coat	English	 
>> >
>> > I would like to use the value of certain columns to find out what are
>> > the differences or the common entries between these 2 files.
>> <snip>
>> > This is my code:
>> > #!/usr/bin/perl -w
>> > use warnings; use strict;
>> > open(FIRSTFILE, $ARGV[0]) || die("file $ARGV[0] cannot be opened!\n");
>> > my %Firstfile = ();
>> > while (<FIRSTFILE>) 
>> > {
>> >     chomp;
>> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
>> >     $Firstfile{$RecordID}{'Name'} = $Name;
>> >     $Firstfile{$RecordID}{'Language'} = $Language;
>> >     $Firstfile{$RecordID}{'Status'} = $Status;
>> >     print "$RecordID -- $Firstfile{$RecordID}{'Name'} ::
>> > $Firstfile{$RecordID}{'Language'}\n" ;
>> > }
>> > open(SECONDFILE, $ARGV[1]) || die("file $ARGV[1] cannot be opened!\n");
>> > my %Secondfile = ();
>> > while (<SECONDFILE>) 
>> > {
>> >     chomp;
>> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
>> >     $Secondfile{$RecordID}{'Name'} = $Name;
>> >     $Secondfile{$RecordID}{'Language'} = $Language;
>> >     $Secondfile{$RecordID}{'Status'} = $Status;
>> >     print "$RecordID -- $Secondfile{$RecordID}{'Name'} ::
>> > $Secondfile{$RecordID}{'Status'}\n";
>> > }
>> > #QUESTIONS
>> > #(1) Why do I lose the variables from the 2 loops above? I.e. I cannot
>> > re-use $RecordID or $Name...
>> 
>> These are declared as local variables inside the body of the loop.  You
>> do that because you don't want the variables to be available outside.
>> You could move the declaration outside the loop, but why?  What do
>> you want them for?
>> 
>
> My aim is to fine-tune the comparison of these two tables. In real
> life these are exports from terminology databases with thousands of
> records and attribute values. Thus I would like to ask, for example:
> "Which records in File #2 have the same RecordID as in File #1 but a
> different Status? Please output only the RecordID, the Name and the
> Status as a tab-delimited file". This file can then be processed by
> the terminology database. For that type of operation I need a finer
> granularity than $inkey / $outkey + $compound_outvalue.

Sorry, I'm lost.  The problem I am having is that your code is largely
correct.  You do this (slightly reformatted for line length) which tests
exactly what you want to test:

  foreach $innerkey ( keys %Firstfile ) {
    if ($Firstfile{$innerkey}{'Status'} ne
        $Secondfile{$innerkey}{'Status'}) {
           ...
    }
  }

but the code then tries to print more than you apparently need.  Is the
problem just that you don't know how to write

  print "$innerkey\t$Secondfile{$innerkey}{'Status'}\n";

?  (You don't say which of the two different statuses you want but
that's trivial to change.) 

>> > #(2) How can I access only certain variables from my table (to compare
>> > or print), e.g. $Language or $Status
>> 
>> I don't understand "variables from my table".  The table contains data
>> which you access using keys, and the code shows you know how to do this.
>> Can you give an example of what you want to do and how you tried to
>> do it?
>> 
>
> See above. It is a granularity issue. The value of the outerkey seems
> to contain all the fields in my table together (Name, Language and
> Status in my example). But maybe I do not properly understand how a
> HoH works.

There's a problem with the terminology again.  $outerkey is just a
string.  It does not contain anything else.  You set it to each of the
keys in a record one at a time:

>> >    foreach $outerkey ( keys %{$Firstfile{$innerkey}} ) {
>> >       print "$Firstfile{$innerkey}{$outerkey}::" ;
>> >    }

but to print just the two values in the case you mention, you don't need
it.
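
One way to pick out just the wanted fields without looping over every key is a hash slice on the inner hash. The following is a minimal sketch. The sub name record_line and the sample record are assumptions made for illustration and do not come from the posted script.

```perl
use strict;
use warnings;

# Return the chosen fields of one record as a tab-delimited line.
# $table is a hash of hashes keyed by RecordID; @fields names the columns wanted.
sub record_line {
    my ($table, $id, @fields) = @_;
    my $record = $table->{$id};
    # Hash slice: pulls several values out of the inner hash in one step.
    return join "\t", $id, @{$record}{@fields};
}

my %Secondfile = (
    763 => { Name => 'mineral abrasive', Language => 'English', Status => 'not validated' },
);

print record_line(\%Secondfile, 763, qw(Name Status)), "\n";
```

Changing the list passed as @fields changes which columns are printed, so the same sub serves queries that need different subsets of columns.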

-- 
Ben.


------------------------------

Date: Thu, 29 Jan 2015 05:01:35 -0800 (PST)
From: F Massion <fmassion@web.de>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <1f88b936-5d8b-40de-a2e3-ecd0c0d9fc74@googlegroups.com>

On Thursday, 29 January 2015 13:15:23 UTC+1, Ben Bacarisse wrote:
> F Massion <fmassion@web.de> writes:
>
> > On Thursday, 29 January 2015 02:20:33 UTC+1, Ben Bacarisse wrote:
> >> F Massion <fmassion@web.de> writes:
> >>
> >> > I am not particularly good at Perl and do not get any further with the
> >> > following: I want to compare 2 relatively large tables with
> >> > tab-separated columns which I have imported each in a Hash of Hashes.
> >> >
> >> > File #1:
> >> > RecordID	Name	Language	Status
> >> > 715	surface technology	English	 
> >> > 763	mineral abrasive	English	validated
> >> > 796	Zinkstaubgrundierung	German	validated
> >> >
> >> > File #2:
> >> > RecordID	Name	Language	Status
> >> > 763	mineral abrasive	English	not validated
> >> > 813	2-component prime coat	English	not validated
> >> > 815	finishing coat	English	 
> >> >
> >> > I would like to use the value of certain columns to find out what are
> >> > the differences or the common entries between these 2 files.
> >> <snip>
> >> > This is my code:
> >> > #!/usr/bin/perl -w
> >> > use warnings; use strict;
> >> > open(FIRSTFILE, $ARGV[0]) || die("file $ARGV[0] cannot be opened!\n");
> >> > my %Firstfile = ();
> >> > while (<FIRSTFILE>)
> >> > {
> >> >     chomp;
> >> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
> >> >     $Firstfile{$RecordID}{'Name'} = $Name;
> >> >     $Firstfile{$RecordID}{'Language'} = $Language;
> >> >     $Firstfile{$RecordID}{'Status'} = $Status;
> >> >     print "$RecordID -- $Firstfile{$RecordID}{'Name'} ::
> >> > $Firstfile{$RecordID}{'Language'}\n" ;
> >> > }
> >> > open(SECONDFILE, $ARGV[1]) || die("file $ARGV[1] cannot be opened!\n");
> >> > my %Secondfile = ();
> >> > while (<SECONDFILE>)
> >> > {
> >> >     chomp;
> >> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
> >> >     $Secondfile{$RecordID}{'Name'} = $Name;
> >> >     $Secondfile{$RecordID}{'Language'} = $Language;
> >> >     $Secondfile{$RecordID}{'Status'} = $Status;
> >> >     print "$RecordID -- $Secondfile{$RecordID}{'Name'} ::
> >> > $Secondfile{$RecordID}{'Status'}\n";
> >> > }
> >> > #QUESTIONS
> >> > #(1) Why do I lose the variables from the 2 loops above? I.e. I cannot
> >> > re-use $RecordID or $Name...
> >>
> >> These are declared as local variables inside the body of the loop.  You
> >> do that because you don't want the variables to be available outside.
> >> You could move the declaration outside the loop, but why?  What do
> >> you want them for?
> >>
> >
> > My aim is to fine-tune the comparison of these two tables. In real
> > life these are exports from terminology databases with thousands of
> > records and attribute values. Thus I would like to ask, for example:
> > "Which records in File #2 have the same RecordID as in File #1 but a
> > different Status? Please output only the RecordID, the Name and the
> > Status as a tab-delimited file". This file can then be processed by
> > the terminology database. For that type of operation I need a finer
> > granularity than $inkey / $outkey + $compound_outvalue.
>
> Sorry, I'm lost.  The problem I am having is that your code is largely
> correct.  You do this (slightly reformatted for line length) which tests
> exactly what you want to test:
>
>   foreach $innerkey ( keys %Firstfile ) {
>     if ($Firstfile{$innerkey}{'Status'} ne
>         $Secondfile{$innerkey}{'Status'}) {
>            ...
>     }
>   }
>
> but the code then tries to print more than you apparently need.  Is the
> problem just that you don't know how to write
>
>   print "$innerkey\t$Secondfile{$innerkey}{'Status'}\n";
>
> ?  (You don't say which of the two different statuses you want but
> that's trivial to change.)

This is the amended loop:
   foreach $innerkey ( keys %Firstfile ) {
       if ($Firstfile{$innerkey}{'Status'} ne $Secondfile{$innerkey}{'Status'}){
   print "$innerkey:\:" ;
   foreach $outerkey ( keys %{$Firstfile{$innerkey}} ) {
#   print "$Firstfile{$innerkey}{$outerkey}::" ;  = my "old Version"
   print "$innerkey\t$Secondfile{$innerkey}{'Status'}\n"; # = your suggestion
   }
   print "\n" ;
 }}

I get error messages about uninitialized values. I guess this value is $Secondfile{$innerkey}

   Use of uninitialized value in string ne  line 31, <SECONDFILE> line 4.
   Use of uninitialized value in concatenation (.) or string  line 35, <SECONDFILE> line 4.

Maybe I want to do something impossible. My idea was:

(1) I read each table separately into a hash of hashes. In the main
hash I have a key which is in both tables (here the RecordID). In the
sub-hashes I use the column header as a key (Status, Language etc.) and
associate with the keys of the sub-hash the values of each column cell.
(2) After closing the 2 loops which populate my 2 HoH I make whatever
comparison I need (it is not always the same parameter I need to use)
and output the records matching my query. As there may be a lot of
columns in some tables, I do not always need to get all the columns in
the output. Sometimes it is enough to get only 2 or 3 columns (e.g.
Status, Name and RecordID) because these are the fields I want to
update in my database (using this data with other tools).
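
Step (1) of this plan can be sketched as follows. This is a sketch under stated assumptions rather than a definitive implementation. The sub name read_table is hypothetical, the input is assumed to carry its column names in the first line, and the byte-order-mark handling is only a guess prompted by the stray "?" before "RecordID" in the posted output.

```perl
use strict;
use warnings;

# Read a tab-delimited table from an open filehandle into a hash of hashes.
# The first line supplies the column names; the first column is the record key.
sub read_table {
    my ($fh) = @_;
    my $header = <$fh>;
    return {} unless defined $header;
    $header =~ s/^\x{FEFF}//;        # drop a decoded byte-order mark, if any
    $header =~ s/^\xEF\xBB\xBF//;    # drop raw UTF-8 byte-order-mark bytes, if any
    $header =~ s/\r?\n\z//;
    my ($key_name, @columns) = split /\t/, $header;

    my %table;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless length $line;
        # A limit of -1 keeps trailing empty fields such as a blank Status.
        my ($id, @values) = split /\t/, $line, -1;
        @{ $table{$id} }{@columns} = @values;   # slice assignment fills the sub-hash
    }
    return \%table;
}

# Usage with an in-memory file standing in for the real export.
my $data = "RecordID\tName\tLanguage\tStatus\n"
         . "763\tmineral abrasive\tEnglish\tvalidated\n"
         . "796\tZinkstaubgrundierung\tGerman\tvalidated\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $first = read_table($fh);
close $fh;

print "$first->{763}{Name}\t$first->{763}{Status}\n";
```

Because the header line is consumed before the data loop, it no longer ends up as a record keyed "RecordID", and tables with more columns need no code changes.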

>
> >> > #(2) How can I access only certain variables from my table (to compare
> >> > or print), e.g. $Language or $Status
> >>
> >> I don't understand "variables from my table".  The table contains data
> >> which you access using keys, and the code shows you know how to do this.
> >> Can you give an example of what you want to do and how you tried to
> >> do it?
> >>
> >
> > See above. It is a granularity issue. The value of the outerkey seems
> > to contain all the fields in my table together (Name, Language and
> > Status in my example). But maybe I do not properly understand how a
> > HoH works.
>
> There's a problem with the terminology again.  $outerkey is just a
> string.  It does not contain anything else.  You set it to each of the
> keys in a record one at a time:
>
> >> >    foreach $outerkey ( keys %{$Firstfile{$innerkey}} ) {
> >> >       print "$Firstfile{$innerkey}{$outerkey}::" ;
> >> >    }
>
> but to print just the two values in the case you mention, you don't need
> it.
>
> -- 
> Ben.



------------------------------

Date: Thu, 29 Jan 2015 13:50:41 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: Compare 2 Hash of Hashes using sub-hash values
Message-Id: <87a9117l1q.fsf@bsb.me.uk>

F Massion <fmassion@web.de> writes:

> On Thursday, 29 January 2015 13:15:23 UTC+1, Ben Bacarisse wrote:
>> F Massion <fmassion@web.de> writes:
>> 
>> > On Thursday, 29 January 2015 02:20:33 UTC+1, Ben Bacarisse wrote:
>> >> F Massion <fmassion@web.de> writes:
>> >> 
>> >> > I am not particularly good at Perl and do not get any further with the
>> >> > following: I want to compare 2 relatively large tables with
>> >> > tab-separated columns which I have imported each in a Hash of Hashes.
>> >> >
>> >> > File #1:
>> >> > RecordID	Name	Language	Status
>> >> > 715	surface technology	English	 
>> >> > 763	mineral abrasive	English	validated
>> >> > 796	Zinkstaubgrundierung	German	validated
>> >> >
>> >> > File #2:
>> >> > RecordID	Name	Language	Status
>> >> > 763	mineral abrasive	English	not validated
>> >> > 813	2-component prime coat	English	not validated
>> >> > 815	finishing coat	English	 
>> >> >
>> >> > I would like to use the value of certain columns to find out what are
>> >> > the differences or the common entries between these 2 files.
>> >> <snip>
>> >> > This is my code:
>> >> > #!/usr/bin/perl -w
>> >> > use warnings; use strict;
>> >> > open(FIRSTFILE, $ARGV[0]) || die("file $ARGV[0] cannot be opened!\n");
>> >> > my %Firstfile = ();
>> >> > while (<FIRSTFILE>) 
>> >> > {
>> >> >     chomp;
>> >> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
>> >> >     $Firstfile{$RecordID}{'Name'} = $Name;
>> >> >     $Firstfile{$RecordID}{'Language'} = $Language;
>> >> >     $Firstfile{$RecordID}{'Status'} = $Status;
>> >> >     print "$RecordID -- $Firstfile{$RecordID}{'Name'} ::
>> >> > $Firstfile{$RecordID}{'Language'}\n" ;
>> >> > }
>> >> > open(SECONDFILE, $ARGV[1]) || die("file $ARGV[1] cannot be opened!\n");
>> >> > my %Secondfile = ();
>> >> > while (<SECONDFILE>) 
>> >> > {
>> >> >     chomp;
>> >> >     my ($RecordID, $Name, $Language, $Status) = split ('\t');
>> >> >     $Secondfile{$RecordID}{'Name'} = $Name;
>> >> >     $Secondfile{$RecordID}{'Language'} = $Language;
>> >> >     $Secondfile{$RecordID}{'Status'} = $Status;
>> >> >     print "$RecordID -- $Secondfile{$RecordID}{'Name'} ::
>> >> > $Secondfile{$RecordID}{'Status'}\n";
>> >> > }
>> >> > #QUESTIONS
>> >> > #(1) Why do I lose the variables from the 2 loops above? I.e. I cannot
>> >> > re-use $RecordID or $Name...
>> >> 
>> >> These are declared as local variables inside the body of the loop.  You
>> >> do that because you don't want the variables to be available outside.
>> >> You could move the declaration outside the loop, but why?  What do
>> >> you want them for?
>> >> 
>> >
>> > My aim is to fine-tune the comparison of these two tables. In real
>> > life these are exports from terminology databases with thousands of
>> > records and attribute values. Thus I would like to ask, for example:
>> > "Which records in File #2 have the same RecordID as in File #1 but a
>> > different Status? Please output only the RecordID, the Name and the
>> > Status as a tab-delimited file". This file can then be processed by
>> > the terminology database. For that type of operation I need a finer
>> > granularity than $inkey / $outkey + $compound_outvalue.
>> 
>> Sorry, I'm lost.  The problem I am having is that your code is largely
>> correct.  You do this (slightly reformatted for line length) which tests
>> exactly what you want to test:
>> 
>>   foreach $innerkey ( keys %Firstfile ) {
>>     if ($Firstfile{$innerkey}{'Status'} ne
>>         $Secondfile{$innerkey}{'Status'}) {
>>            ...
>>     }
>>   }
>> 
>> but the code then tries to print more than you apparently need.  Is the
>> problem just that you don't know how to write
>> 
>>   print "$innerkey\t$Secondfile{$innerkey}{'Status'}\n";
>> 
>> ?  (You don't say which of the two different statuses you want but
>> that's trivial to change.) 
>
> This is the amended loop:
>    foreach $innerkey ( keys %Firstfile ) {
>    	if ($Firstfile{$innerkey}{'Status'} ne $Secondfile{$innerkey}{'Status'}){ 
>    print "$innerkey:\:" ;
>    foreach $outerkey ( keys %{$Firstfile{$innerkey}} ) {
> #   print "$Firstfile{$innerkey}{$outerkey}::" ;  = my "old Version"
>    print "$innerkey\t$Secondfile{$innerkey}{'Status'}\n"; # = your suggestion
>    }
>    print "\n" ;
>  }}
>
> I get error messages about uninitialized values. I guess this value is $Secondfile{$innerkey}
>
>    Use of uninitialized value in string ne  line 31, <SECONDFILE> line 4.
>    Use of uninitialized value in concatenation (.) or string  line 35, <SECONDFILE> line 4.
>
> Maybe I want to do something impossible.

No.  The code is fine, but there are some rough edges.  The warnings are
just that -- warnings.  You can turn them off (remove -w and "use
warnings" from the script) or, better, you can fix them.  They refer to
the fact that some bits of the data are missing.

What do you want to do, for example, when the second file has no
corresponding record for an ID in the first?  Is that considered to be a
different status or something else?  Do you want to ignore it, or report
it?  To ignore it, you might do this:

foreach $innerkey ( keys %Firstfile ) {
    next unless exists $Secondfile{$innerkey};
    if ($Firstfile{$innerkey}{'Status'} ne $Secondfile{$innerkey}{'Status'}) { 
        print "$innerkey\t$Secondfile{$innerkey}{'Status'}\n" ;
    }
}

The key test being "exists $Secondfile{$innerkey}".  You can alter the
logic to report a missing record or do whatever else your situation
demands.
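
As one possible variation on that logic, the sketch below reports a missing record instead of skipping it and prints the Name as well. The sub name compare_field, the 'missing' marker and the report layout are assumptions made for illustration. The "//" operator used to default undefined values needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Compare one field across two hashes of hashes keyed by the same ID.
# Returns tab-delimited lines: ID, Name, old value, new value for changed
# records, and ID, Name, 'missing' for records absent from the second table.
sub compare_field {
    my ($first, $second, $field) = @_;
    my @report;
    for my $id (sort keys %$first) {
        if (!exists $second->{$id}) {
            push @report, join "\t", $id, $first->{$id}{Name}, 'missing';
            next;
        }
        my $old = $first->{$id}{$field}  // '';   # treat an undefined field as empty
        my $new = $second->{$id}{$field} // '';
        push @report, join "\t", $id, $first->{$id}{Name}, $old, $new
            if $old ne $new;
    }
    return @report;
}

my %Firstfile = (
    715 => { Name => 'surface technology',   Language => 'English', Status => ' ' },
    763 => { Name => 'mineral abrasive',     Language => 'English', Status => 'validated' },
    796 => { Name => 'Zinkstaubgrundierung', Language => 'German',  Status => 'validated' },
);
my %Secondfile = (
    763 => { Name => 'mineral abrasive',       Language => 'English', Status => 'not validated' },
    813 => { Name => '2-component prime coat', Language => 'English', Status => 'not validated' },
    815 => { Name => 'finishing coat',         Language => 'English', Status => ' ' },
);

print "$_\n" for compare_field(\%Firstfile, \%Secondfile, 'Status');
```

With the sample data from the thread, records 715 and 796 are reported as missing and record 763 is reported with both statuses. No "uninitialized value" warnings appear because the second table is only read after the exists test.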

<snip>
-- 
Ben.


------------------------------

Date: Thu, 29 Jan 2015 03:03:23 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Global, Local, Static
Message-Id: <7OKdncxnEarsjlfJnZ2dnUVZ572dnZ2d@giganews.com>


On 1/27/2015 5:59 AM, Rainer Weikusat wrote:

> > So....  any way of doing that [retaining variable values between
> > calls] in Perl subroutines?
>
> Two, actually. A 'traditional' one, create a my variable in a scope only
> the subroutine can access,
>
> {
>      my $fred = 37;
>
>      sub retain
>      {
> 	print("$fred\n");
> 	++$fred;
>      }
> }
>
> retain();
> retain();
> retain();

Fascinating.  Let me play with that. Yes, quite useful I see.
Check out this amusing script I just cooked up:

#! /usr/bin/perl
use v5.14;
use strict;
use warnings;
use bignum;
BEGIN {
    my $i = 1;
    printf("%80s\n", $i);
    my $j = 1;
    printf("%80s\n", $j);
    sub Fibonacci {
       my $k = $i + $j;
       $i = $j;
       $j = $k;
       printf("%80s\n", $k);
    }
}
for (1..350) {
    Fibonacci;
}


> or use a 'state' variable (if available),
>
> use feature 'state';
>
> sub retain
> {
>      state $fred = 37;
>      print("$fred\n");
>      ++$fred;
> }
>
> retain();
> retain();
> retain();

Ah, ok, pretty much identical to C's "static" variables. Cool!
Thanks for the info!

I looked up "state" in perldoc on the bus on the way to work today,
along with "local" and "our" (none of which I'd used or studied
yet). I'm now getting a handle on what these various variable
types are doing:

state    =>  saves state
local    =>  temporary value; revert when leave scope
my       =>  lexical
our      =>  global

I like "our" as a way to declare global variables that doesn't use
the ugly $::VariableName syntax, for when I need data to be visible
throughout a program.
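
The four declarations in the table above can be seen side by side in a minimal sketch. All sub and variable names here are made up for illustration, and "state" needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use feature 'state';

our $level = 'global';             # package variable, also reachable as $main::level

sub show_level { return $level }   # reads the package variable at call time

sub with_local {
    local $level = 'temporary';    # temporary value until this sub returns
    return show_level();           # called subs see the temporary value
}

sub with_my {
    my $level = 'lexical';         # a separate variable, visible only in this block
    return show_level();           # called subs still see the package variable
}

sub counter {
    state $calls = 0;              # initialized once, kept between calls
    return ++$calls;
}

print with_local(), "\n";          # temporary
print with_my(), "\n";             # global
print show_level(), "\n";          # global (the local value was reverted)
print counter(), counter(), "\n";  # 12
```

The difference between "local" and "my" shows up only in what a called sub sees: "local" changes the package variable for the duration of the call, while "my" creates a new variable that other subs cannot reach.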



-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Thu, 29 Jan 2015 02:34:06 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Why are these brackets necessary?
Message-Id: <E42dnRr-bvsJkVfJnZ2dnUVZ57ydnZ2d@giganews.com>


On 1/28/2015 12:27 PM, jurgenex@hotmail.com wrote:

> A little while ago I had to take a test where one question after a long
> detour at the end came down to how many feet are 5/12 of one statute
> mile.
> How on earth would I possibly know how many feet are in one statute
> mile? Well, at least I knew that one foot is close to 30cm and one mile
> close to 1600m.
> So a quick computation 1mile * 1600m/mile * 5/12 * (1/0.3)feet/m got me
> a number that was close enough to select the correct multiple choice
> answer.

Not that this has much to do with Perl, but......

The history of the "statute mile" is all tied up with farming and plowed
fields and acres.  Around the time the "mile" was invented, plowed fields
were measured using "Gunter's chains", which were chains exactly 66 feet
long, or as close as they could make them. A typical plot of plowed or
"furrowed" land was 1 chain x 10 chains. Since 10 chains was
"one furrow long" it was called "1 furlong" = 10 chains = 660 feet.

From that they defined "1 mile" = 8 furlongs = 8 x 660 = 5280 feet.

Area of (1chain x 10chains) was called "1 acre" = 66 x 660 = 43560 sq feet.


THEREFORE:
5/12 mile = (5/12)(8)(660)  feet  (1 mile = 8 furlongs = 8 x 660 feet)
           = (5/12)(24)(220) feet
           = (10)(220)       feet
           = 2200 feet
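
Since this is a Perl group after all, the arithmetic above can be checked with a short script. This is only a sketch, with variable names chosen for illustration. The last line repeats the metric estimate from the quoted post for comparison.

```perl
use strict;
use warnings;

my $feet_per_chain     = 66;
my $chains_per_furlong = 10;
my $furlongs_per_mile  = 8;

# 66 x 10 x 8 = 5280 feet in one statute mile
my $feet_per_mile = $feet_per_chain * $chains_per_furlong * $furlongs_per_mile;

my $exact  = $feet_per_mile * 5 / 12;   # 5/12 of a mile in feet
my $approx = 1600 * 5 / 12 / 0.3;       # estimate using 1600 m per mile, 0.3 m per foot

printf "feet per mile: %d\n", $feet_per_mile;
printf "5/12 mile:     %d feet\n", $exact;
printf "estimate:      %.0f feet\n", $approx;
```

The estimate comes out at about 2222 feet, roughly 1 percent above the exact 2200 feet.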


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4359
***************************************

