[32172] in Perl-Users-Digest


Perl-Users Digest, Issue: 3437 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jul 7 18:14:24 2011

Date: Thu, 7 Jul 2011 15:14:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 7 Jul 2011     Volume: 11 Number: 3437

Today's topics:
        test two hash(refs) for equality <rweikusat@mssgmbh.com>
    Re: test two hash(refs) for equality <uri@StemSystems.com>
    Re: test two hash(refs) for equality <rweikusat@mssgmbh.com>
    Re: test two hash(refs) for equality <jurgenex@hotmail.com>
    Re: test two hash(refs) for equality <uri@StemSystems.com>
    Re: test two hash(refs) for equality <rweikusat@mssgmbh.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 07 Jul 2011 20:09:14 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: test two hash(refs) for equality
Message-Id: <87mxgpc3hh.fsf@sapphire.mobileactivedefense.com>

I'm somewhat surprised that there is no answer for this in the FAQ
(besides 'turn the content of both into a string and compare that').
Assuming that hash values can be compared with string comparisons and
that a value of undef does not need to be distinguished from an empty
string, the following subroutine seems to accomplish that:

sub cmp_href($$)
{
    my ($a, $b) = @_;
    my ($ka, $va, $kb, $vb, $rc);

 OUTER: {
	while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
	    last OUTER unless defined($ka) && defined($kb);

	    last OUTER unless
		$a->{$kb} eq $vb && $b->{$ka} eq $va
		    && exists($a->{$kb}) && exists($b->{$ka});
	}

	$rc = 1;
    }

    values(%$a);
    values(%$b);
    return $rc;
}

Any comments except references to CPAN modules and general "I don't
care about that [and neither should you]" statements would be very
much appreciated.


------------------------------

Date: Thu, 07 Jul 2011 15:49:43 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: test two hash(refs) for equality
Message-Id: <87aacpan1k.fsf@quad.sysarch.com>

>>>>> "RW" == Rainer Weikusat <rweikusat@mssgmbh.com> writes:

  RW> I'm somewhat surprised that there is no answer for this in the FAQ
  RW> (besides 'turn the content of both into a string and compare that').
  RW> Assuming that hash values can be compared with string comparisons and
  RW> that a value of undef does not need to be distinguished from an empty
  RW> string, the following subroutine seems to accomplish that:

  RW> sub cmp_href($$)
  RW> {
  RW>     my ($a, $b) = @_;

don't use $a and $b for vars. they are reserved for use by sort. even
lexically declared it is bad style. of course you won't listen to me.

  RW>     my ($ka, $va, $kb, $vb, $rc);

why not a quick test to see if the key counts are the same?

  RW>  OUTER: {
  RW> 	while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
  RW> 	    last OUTER unless defined($ka) && defined($kb);

keys are always defined so that test makes no sense. values can be
undef. the order of keys will likely be different so that won't check
key matching. at best it may check if the number of keys is the same but
that is a slow way to do it.

  RW> 	    last OUTER unless
  RW> 		$a->{$kb} eq $vb && $b->{$ka} eq $va

that will generate warnings if any value is undef. oh, you don't
care. but then an undef value will eq ''. also you use eq and that will
fail for number values in some cases. and the same issue applies to 0
and undef if you used ==.
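A minimal sketch of the eq versus == point, using made-up values that do not come from the thread. It only shows how the two operators disagree on numeric-looking strings and how undef compares under each:

```perl
use strict;
use warnings;

# Illustrative values only; none of these appear in the thread.
my $eq_on_numbers  = ("1.0" eq "1") ? 1 : 0;   # string comparison: differ
my $num_on_numbers = ("1.0" == "1") ? 1 : 0;   # numeric comparison: same

my $nothing;                                   # deliberately left undef
my ($eq_undef_empty, $num_undef_zero);
{
    no warnings 'uninitialized';               # undef is used on purpose here
    $eq_undef_empty = ($nothing eq '') ? 1 : 0;    # undef stringifies to ''
    $num_undef_zero = ($nothing == 0)  ? 1 : 0;    # undef numifies to 0
}

print "$eq_on_numbers $num_on_numbers $eq_undef_empty $num_undef_zero\n";
```

Which operator is right depends on what the hash values are meant to hold, so this is an illustration rather than a recommendation.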

  RW> 		    && exists($a->{$kb}) && exists($b->{$ka});

why test these after you test for equality? if the equality passes, then
exists will pass except for the undef issue i brought up.

  RW> Any comments except references to CPAN modules and general "I don't
  RW> care about that [and neither should you]" statements would be very
  RW> much appreciated.

just bad code. and it has been solved in several places. look in the
Test:: modules for some solutions.

uri

-- 
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


------------------------------

Date: Thu, 07 Jul 2011 21:24:50 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: test two hash(refs) for equality
Message-Id: <87box5bzzh.fsf@sapphire.mobileactivedefense.com>

"Uri Guttman" <uri@StemSystems.com> writes:
>>>>>> "RW" == Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>
>   RW> I'm somewhat surprised that there is no answer for this in the FAQ
>   RW> (besides 'turn the content of both into a string and compare that').
>   RW> Assuming that hash values can be compared with string comparisons and
>   RW> that a value of undef does not need to be distinguished from an empty
>   RW> string, the following subroutine seems to accomplish that:
>
>   RW> sub cmp_href($$)
>   RW> {
>   RW>     my ($a, $b) = @_;
>
> don't use $a and $b for vars. they are reserved for use by sort.

They are not reserved. The sort routine uses two variables with names
$a and $b in the symbol table of the module sort is invoked in (as far
as I understand the documentation). These $a and $b therefore don't
collide with lexical variables and they also don't collide with other
'package global' variables because sort localizes them (as it should do).

> even lexically declared it is bad style. of course you won't listen
> to me.

In my opinion, you are wrong.

>   RW>     my ($ka, $va, $kb, $vb, $rc);
>
> why not a quick test to see if the key counts are the same?

Because this test wouldn't be 'quick': It requires two additional
traversals of both hashes just to determine the key lists. I've done a
few benchmarks on this and the routine included in this posting was
the fastest implementation I could come up with (for my very limited
set of test cases, admittedly).

>
>   RW>  OUTER: {
>   RW> 	while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
>   RW> 	    last OUTER unless defined($ka) && defined($kb);
>
> keys are always defined so that test makes no sense.

It does make sense: Provided that one of the hashes contains fewer
key-value pairs than the other, one of the each invocations will
return an empty list and in this case, either $ka or $kb will be undef
after the list assignment.
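A small sketch of that behaviour, assuming two made-up hashes (%small and %large are not from the thread). The second round is the interesting one: the exhausted hash contributes nothing to the list, so only two elements are assigned:

```perl
use strict;
use warnings;

# Hypothetical data: %small has one pair fewer than %large.
my %small = (x => 1);
my %large = (x => 1, y => 2);

my ($ka, $va, $kb, $vb);

# Round 1: both each calls return a (key, value) pair, so all four
# variables are filled.
($ka, $va, $kb, $vb) = (each(%small), each(%large));
my $round1_all_defined = (defined($ka) && defined($kb)) ? 1 : 0;

# Round 2: each(%small) returns an empty list. The remaining pair from
# %large therefore lands in ($ka, $va), and $kb, $vb are left undef.
($ka, $va, $kb, $vb) = (each(%small), each(%large));
my $round2_kb_defined = defined($kb) ? 1 : 0;

print "round 1: $round1_all_defined, round 2 kb defined: $round2_kb_defined\n";

# Reset both iterators so later each calls start from the beginning.
keys(%small);
keys(%large);
```

In this run it is $kb that ends up undef, whichever hash is the shorter one, because the surviving pair always shifts to the front of the list.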

[...]

>   RW> 	    last OUTER unless
>   RW> 		$a->{$kb} eq $vb && $b->{$ka} eq $va
>
> that will generate warnings if any value is undef. oh, you don't
> care.

Indeed. Hash key exists but maps to undef is a perfectly possible
situation.

> but then an undef value will eq ''. also you use eq and that will
> fail for number values in some cases.

I specifically wrote

,----
| Assuming that hash values can be compared with string comparisons and
| that a value of undef does not need to be distinguished from an empty
| string,
`----

meaning, while I would like to know about these cases just to know
about them, I meant to exclude anything which cannot be compared with
eq for this comparison routine from the start: It is not supposed to
do that.

>   RW> 		    && exists($a->{$kb}) && exists($b->{$ka});
>
> why test these after you test for equality? if the equality passes, then
> exists will pass except for the undef issue i brought up.

Precisely: Provided that one of the hashes contained a key whose
value was either undef or the empty string and the other hash didn't
contain this key, the eq comparison will have returned 'they are
equal' and the exists check is supposed to cope with that.
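A minimal sketch of that case, with made-up hashes (%with_key and %without_key are illustrative names). The key is present with an empty value in one hash and absent from the other:

```perl
use strict;
use warnings;

# Hypothetical data: 'k' maps to the empty string in one hash and is
# missing from the other.
my %with_key    = (k => '');
my %without_key = ();

my $eq_says_equal;
{
    no warnings 'uninitialized';   # the missing key yields undef on purpose
    $eq_says_equal = ($with_key{k} eq $without_key{k}) ? 1 : 0;
}

# Reading $without_key{k} above does not create the key, so exists still
# reports it as absent.
my $key_exists = exists($without_key{k}) ? 1 : 0;

print "eq says equal: $eq_says_equal, key exists: $key_exists\n";
```

So eq alone reports a match here, and only the exists test tells the two hashes apart.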

>   RW> Any comments except references to CPAN modules and general "I don't
>   RW> care about that [and neither should you]" statements would be very
>   RW> much appreciated.
>
> just bad code.

You failed to provide any reasons for this summary judgement except
'I' (meaning, you) 'want to treat undef values specially'. That's your
prerogative, but I don't.

> and it has been solved in several places.

So what? I would be interested in other algorithms for solving this
problem (except the two other I used for testing). I'm not so much
interested in 'can be downloaded for free from the internet'
'solutions', especially if these aren't even detailed enough to actually
download them.


------------------------------

Date: Thu, 07 Jul 2011 13:28:10 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: test two hash(refs) for equality
Message-Id: <4j4c171c84oeh2o6438jnd7saatlgfke6a@4ax.com>

Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>I'm somewhat surprised that there is no answer for this in the FAQ
>(besides 'turn the content of both into a string and compare that').
>Assuming that hash values can be compared with string comparisons and
>that a value of undef does not need to be distinguished from an empty
>string, the following subroutine seems to accomplish that:
>
>sub cmp_href($$)
>{
>    my ($a, $b) = @_;
>    my ($ka, $va, $kb, $vb, $rc);
>
> OUTER: {
>	while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
>	    last OUTER unless defined($ka) && defined($kb);
>
>	    last OUTER unless
>		$a->{$kb} eq $vb && $b->{$ka} eq $va
>		    && exists($a->{$kb}) && exists($b->{$ka});
>	}
>
>	$rc = 1;
>    }
>
>    values(%$a);
>    values(%$b);
>    return $rc;
>}
>
>Any comments except references to CPAN modules and general "I don't
>care about that [and neither should you]" statements would be very
>much appreciated.

IMO your approach is way too complicated. And as Uri pointed out already
it has several logical flaws, too.

As a first step I would compare the size of the two hashes and then
check the value for each key (untested, algorithmic sketch only):

	my ($h1, $h2) = @_;
	return 0 unless scalar(keys(%$h1)) == scalar(keys(%$h2));
	#yes, scalar() is redundant, but this makes it very explicit
	foreach my $elem (keys %$h1) {
	  return 0 unless exists $h2->{$elem} # see note 1
	                 and $h1->{$elem} == $h2->{$elem};  # see note 2
	}
	return 1;

1: This not only checks if each key from h1 exists in h2, too, (i.e.
keys(h1) is subset of keys(h2)), but because h1 and h2 also have the
same number of elements then the two sets of keys are identical.

2: You may have to adapt this comparison somewhat to accommodate your
special undef is equal to empty string equality.
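One way the two notes might be combined, as a sketch only: it assumes flat hashes with string values, and the name same_flat_hash is made up for illustration. It uses eq instead of == and silences the undef warning, so undef and the empty string compare as equal:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count check first, then one
# pass over the keys of the first hash.
sub same_flat_hash {
    my ($h1, $h2) = @_;

    # == imposes scalar context, so keys returns the number of keys.
    return 0 unless keys(%$h1) == keys(%$h2);

    no warnings 'uninitialized';   # undef is deliberately treated like ''
    for my $key (keys %$h1) {
        return 0 unless exists $h2->{$key};          # note 1
        return 0 unless $h1->{$key} eq $h2->{$key};  # note 2, stringwise
    }
    return 1;
}

print same_flat_hash({ a => 1, b => 'x' }, { b => 'x', a => 1 }), "\n";
print same_flat_hash({ a => undef },       { a => '' }),          "\n";
print same_flat_hash({ a => '' },          { b => '' }),          "\n";
```

The last call is the case where the exists test matters: the counts match and the values compare equal, but the keys differ.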

jue


------------------------------

Date: Thu, 07 Jul 2011 16:32:50 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: test two hash(refs) for equality
Message-Id: <87liw996h9.fsf@quad.sysarch.com>

>>>>> "RW" == Rainer Weikusat <rweikusat@mssgmbh.com> writes:

  RW> "Uri Guttman" <uri@StemSystems.com> writes:
  >>>>>>> "RW" == Rainer Weikusat <rweikusat@mssgmbh.com> writes:
  >> 
  RW> I'm somewhat surprised that there is no answer for this in the FAQ
  RW> (besides 'turn the content of both into a string and compare that').
  RW> Assuming that hash values can be compared with string comparisons and
  RW> that a value of undef does not need to be distinguished from an empty
  RW> string, the following subroutine seems to accomplish that:
  >> 
  RW> sub cmp_href($$)
  RW> {
  RW> my ($a, $b) = @_;
  >> 
  >> don't use $a and $b for vars. they are reserved for use by sort.

  RW> They are not reserved. The sort routine uses two variables with names
  RW> $a and $b in the symbol table of the module sort is invoked in (as far
  RW> as I understand the documentation). These $a and $b therefore don't
  RW> collide with lexical variables and they also don't collide with other
  RW> 'package global' variables because sort localizes them (as it should do).

it is a convention. do you even care what other coders do or care about?
it is just a bad idea. don't use $a and $b outside of sort. can you even
allow this into your head?

  >> even lexically declared it is bad style. of course you won't listen
  >> to me.

  RW> In my opinion, you are wrong.

you are very off here. too bad as it is your loss. listening to others
is a useful skill.

  RW> my ($ka, $va, $kb, $vb, $rc);
  >> 
  >> why not a quick test to see if the key counts are the same?

  RW> Because this test wouldn't be 'quick': It requires two additional
  RW> traversals of both hashes just to determine the key lists. I've done a
  RW> few benchmarks on this and the routine included in this posting was
  RW> the fastest implementation I could come up with (for my very limited
  RW> set of test cases, admittedly).

and your test cases didn't cover all the bases as i pointed out.

  >> 
  RW> OUTER: {
  RW> while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
  RW> last OUTER unless defined($ka) && defined($kb);
  >> 
  >> keys are always defined so that test makes no sense.

  RW> It does make sense: Provided that one of the hashes contains fewer
  RW> key-value pairs than the other, one of the each invocations will
  RW> return an empty list and in this case, either $ka or $kb will be undef
  RW> after the list assignment.

and i covered that point below. it is still a silly test. you can just
as easily scan one hash and check exists in the other and not need extra
defined tests.

  RW> [...]

  RW> last OUTER unless
  RW> $a->{$kb} eq $vb && $b->{$ka} eq $va
  >> 
  >> that will generate warnings if any value is undef. oh, you don't
  >> care.

  RW> Indeed. Hash key exists but maps to undef is a perfectly possible
  RW> situation.

and broken in other situations. then you are not looking for hash
equality as most people would define it but your limited string only, no
undef values hash similarity. you should state that in your
specification.

  RW> && exists($a->{$kb}) && exists($b->{$ka});
  >> 
  >> why test these after you test for equality? if the equality passes, then
  >> exists will pass except for the undef issue i brought up.

  RW> Precisely: Provided that one of the hashes contained a key whose
  RW> value was either undef or the empty string and the other hash didn't
  RW> contain this key, the eq comparison will have returned 'they are
  RW> equal' and the exists check is supposed to cope with that.

and if you reverse the order it would be clearer. but clarity and you
don't mix well it seems.


  RW> Any comments except references to CPAN modules and general "I don't
  RW> care about that [and neither should you]" statements would be very
  RW> much appreciated.
  >> 
  >> just bad code.

  RW> You failed to provide any reasons for this summary judgement except
  RW> 'I' (meaning, you) 'want to treat undef values specially'. That's your
  RW> prerogative, but I don't.

bad code is bad code. you just don't know how to recognize it yet. live
and learn.

  >> and it has been solved in several places.

  RW> So what? I would be interested in other algorithms for solving this
  RW> problem (except the two other I used for testing). I'm not so much
  RW> interested in 'can be downloaded for free from the internet'
  RW> 'solutions', especially if these aren't even detailed enough to actually
  RW> download them.

huh?? you asked for cpan modules and then you deny wanting them?
detailed to download them? several of the test modules COME with
perl. if you lifted a finger you could find the subs in question in a
few seconds. wow.

uri

-- 
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


------------------------------

Date: Thu, 07 Jul 2011 21:59:50 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: test two hash(refs) for equality
Message-Id: <874o2xbyd5.fsf@sapphire.mobileactivedefense.com>

Jürgen Exner <jurgenex@hotmail.com> writes:
> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>>I'm somewhat surprised that there is no answer for this in the FAQ
>>(besides 'turn the content of both into a string and compare that').
>>Assuming that hash values can be compared with string comparisons and
>>that a value of undef does not need to be distinguished from an empty
>>string, the following subroutine seems to accomplish that:
>>
>>sub cmp_href($$)
>>{
>>    my ($a, $b) = @_;
>>    my ($ka, $va, $kb, $vb, $rc);
>>
>> OUTER: {
>>	while (($ka, $va, $kb, $vb) = (each(%$a), each(%$b))) {
>>	    last OUTER unless defined($ka) && defined($kb);
>>
>>	    last OUTER unless
>>		$a->{$kb} eq $vb && $b->{$ka} eq $va
>>		    && exists($a->{$kb}) && exists($b->{$ka});
>>	}
>>
>>	$rc = 1;
>>    }
>>
>>    values(%$a);
>>    values(%$b);
>>    return $rc;
>>}
>>
>>Any comments except references to CPAN modules and general "I don't
>>care about that [and neither should you]" statements would be very
>>much appreciated.
>
> IMO your approach is way too complicated. And as Uri pointed out already
> it has several logical flaws, too.

Uri has mostly 'pointed out' that he didn't understand the code, as
exemplified in his 'the keys are always defined so this test does
nothing' and 'why the exist check after the comparison' remarks.

> As a first step I would compare the size of the two hashes and then
> check the value for each key (untested, algorithmic sketch only):
>
> 	my ($h1, $h2) = @_;
> 	return 0 unless scalar(keys(%$h1)) == scalar(keys(%$h2));
> 	#yes, scalar() is redundant, but this makes it very explicit
> 	foreach my $elem (keys %$h1) {
> 	  return 0 unless exists $h2->{$elem} # see note 1
> 	                 and $h1->{$elem} == $h2->{$elem};  # see note 2
> 	}
> 	return 1;
>
> 1: This not only checks if each key from h1 exists in h2, too, (i.e.
> keys(h1) is subset of keys(h2)), but because h1 and h2 also have the
> same number of elements then the two sets of keys are identical.

As a complete subroutine:

sub cmp_href_4($$)
{
 	my ($h1, $h2) = @_;
	my @k;

	@k = keys(%$h1);
 	return 0 unless @k == keys(%$h2);

	foreach my $elem  (@k) {
	    return 0
		unless exists($h2->{$elem}) # see note 1
		    and $h1->{$elem}  == $h2->{$elem};  # see note 2
	}

	return 1;
}

That's similar to my 'naive' first implementation. Provided the hashes
are small and they are different rather than identical, it is not bad.

> 2: You may have to adapt this comparison somewhat to accommodate your
> special undef is equal to empty string equality.

[rw@sapphire]/tmp $perl -e 'print undef eq "", "\n"'
1

Even in absence of that, it is not 'my special undef is equal to empty
string equality', cf

	The following code works for single-level arrays.  It uses a
	stringwise comparison, and does not distinguish defined versus
	undefined empty strings.  Modify if you have other needs.
        
               $are_equal = compare_arrays(\@frogs, \@toads);

               sub compare_arrays {
                       my ($first, $second) = @_;
                       no warnings;  # silence spurious -w undef complaints
                       return 0 unless @$first == @$second;
                       for (my $i = 0; $i < @$first; $i++) {
                               return 0 if $first->[$i] ne $second->[$i];
                               }
                       return 1;
              }

(this text is part of the perlfaq4 document on the computer I was
using).


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3437
***************************************

