[32941] in Perl-Users-Digest


Perl-Users Digest, Issue: 4217 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon May 19 14:14:18 2014

Date: Mon, 19 May 2014 11:14:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 19 May 2014     Volume: 11 Number: 4217

Today's topics:
        Push regex search result into hash with multiple values fmassion@web.de
    Re: Push regex search result into hash with multiple va <rvtol+usenet@xs4all.nl>
    Re: Push regex search result into hash with multiple va <gamo@telecable.es>
    Re: Push regex search result into hash with multiple va <PointedEars@web.de>
    Re: Push regex search result into hash with multiple va fmassion@web.de
    Re: Push regex search result into hash with multiple va <ben.usenet@bsb.me.uk>
    Re: Push regex search result into hash with multiple va <PointedEars@web.de>
    Re: Push regex search result into hash with multiple va <rvtol+usenet@xs4all.nl>
    Re: Push regex search result into hash with multiple va fmassion@web.de
    Re: Push regex search result into hash with multiple va <PointedEars@web.de>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 18 May 2014 23:40:56 -0700 (PDT)
From: fmassion@web.de
Subject: Push regex search result into hash with multiple values
Message-Id: <e927e980-5ae8-4c9f-a80b-ae283134e0b0@googlegroups.com>

I have 2 lists:

List of words:
cat
dog
List of phrases:
This is a cat
This is another cat
This is a dog
This is a cat and not a dog

I want to have a hash with all phrases (=values) matching the word "cat" (key) or "dog" (other key).

In my code I only get the last value of each search. Obviously I am doing something wrong here. Any suggestions?

Here is my code:

#!/usr/bin/perl -w
# Open words file
open(WORDLIST,$ARGV[0]) || die("Cannot open $ARGV[0]!\n");
@words = <WORDLIST>;
# Close words file
close(WORDLIST);
# Open phrases file
open(PHRASELIST,$ARGV[1]) || die("Cannot open $ARGV[1])!\n");
@phrase = <PHRASELIST>;
# Close phrases file
close(PHRASELIST);
# Create empty hash for results
%phrasefound = ();
foreach $word (@words) {
	for($phrasecount=0 ; $phrasecount <= $#phrase ; $phrasecount++) {  # Counts from 0 to last array entry
	$phrase = $phrase[$phrasecount];
	chomp $word;
	chomp $phrase;
	if ($phrase =~ m/$word/i) {
		# push into hash
		$phrasefound{$word} = $phrase;
		print $word."-->".$phrasefound{$word}."\n"; #this is to check if it works. I get here all values
	}}}
	# output hash
	print "Hash result:\n----------\n";
	foreach $word (keys %phrasefound) {
		print "$word --> $phrasefound{$word}\n"; #I get only the last match
	}


------------------------------

Date: Mon, 19 May 2014 09:00:46 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
To: fmassion@web.de
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <5379AC1E.9020908@xs4all.nl>

On 2014-05-19 08:40, fmassion@web.de wrote:

> I have 2 lists:
>
> List of words:
> cat
> dog
> List of phrases:
> This is a cat
> This is another cat
> This is a dog
> This is a cat and not a dog
>
> I want to have a hash with all phrases (=values) matching the word "cat" (key) or "dog" (other key)


catdog.pl
- - - - - - - - - - - - - - -
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my @keys = qw(
     cat
     dog
);

my %found;

for my $phrase (<DATA>) {
     chomp $phrase;

     for my $key (@keys) {
         $key = quotemeta($key);

         push @{ $found{ $key } }, $phrase
           if $phrase =~ /\b$key\b/;
     }
}

print Dumper( \%found );

__DATA__
This is a cat
This is another cat
This is a dog
This is a cat and not a dog
- - - - - - - - - - - - - - -


$VAR1 = {
           'dog' => [
                      'This is a dog',
                      'This is a cat and not a dog'
                    ],
           'cat' => [
                      'This is a cat',
                      'This is another cat',
                      'This is a cat and not a dog'
                    ]
         };

-- 
Ruud



------------------------------

Date: Mon, 19 May 2014 09:01:06 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <llca84$96q$1@speranza.aioe.org>

El 19/05/14 08:40, fmassion@web.de escribió:
> I have 2 lists:
>
> List of words:
> cat
> dog
> List of phrases:
> This is a cat
> This is another cat
> This is a dog
> This is a cat and not a dog
>
> I want to have a hash with all phrases (=values) matching the word "cat" (key) or "dog" (other key)
>
> In my code I only get the last value of each search. Obviously I am doing something wrong here. Any suggestions?
>
> Here is my code:
>
> #!/usr/bin/perl -w
> # Open words file
> open(WORDLIST,$ARGV[0]) || die("Cannot open $ARGV[0]!\n");
> @words = <WORDLIST>;
> # Close words file
> close(WORDLIST);
> # Open phrases file
> open(PHRASELIST,$ARGV[1]) || die("Cannot open $ARGV[1])!\n");
> @phrase = <PHRASELIST>;
> # Close phrases file
> close(PHRASELIST);
> # Create empty hash for results
> %phrasefound = ();
> foreach $word (@words) {
> 	for($phrasecount=0 ; $phrasecount <= $#phrase ; $phrasecount++) {  # Counts from 0 to last array entry
> 	$phrase = $phrase[$phrasecount];
> 	chomp $word;
> 	chomp $phrase;
> 	if ($phrase =~ m/$word/i) {
> 		# push into hash
> 		$phrasefound{$word} = $phrase;
> 		print $word."-->".$phrasefound{$word}."\n"; #this is to check if it works. I get here all values
> 	}}}
> 	# output hash
> 	print "Hash result:\n----------\n";
> 	foreach $word (keys %phrasefound) {
> 		print "$word --> $phrasefound{$word}\n"; #I get only the last match
> 	}
>

That's because you overwrite $phrasefound{$word} with $phrase.
Maybe you want $phrasefound{$phrase} = $word;
if you want to store each phrase.
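A minimal sketch of the difference gamo points out, using two hard-coded sample phrases for illustration: plain assignment to a hash element replaces the previous value, while pushing onto an array stored in that element keeps every match.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my @phrases = ('This is a cat', 'This is another cat');

# Plain assignment: each match overwrites the previous value.
my %last;
$last{cat} = $_ for grep { /\bcat\b/ } @phrases;

# Push onto an array reference: every match is kept.
my %all;
push @{ $all{cat} }, $_ for grep { /\bcat\b/ } @phrases;

print "assignment: $last{cat}\n";        # only the last phrase
print "push:       @{ $all{cat} }\n";    # both phrases, space-separated
```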


HTH

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Mon, 19 May 2014 11:42:24 +0200
From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <23353611.VtiZOVDWJD@PointedEars.de>

Dr.Ruud wrote:
^^^^^^^
Please fix.

> On 2014-05-19 08:40, fmassion@web.de wrote:
>> I have 2 lists:
>>
>> List of words:
>> cat
>> dog
>> List of phrases:
>> This is a cat
>> This is another cat
>> This is a dog
>> This is a cat and not a dog
>>
>> I want to have a hash with all phrases (=values) matching the word "cat"
>> (key) or "dog" (other key)
> 
> catdog.pl
> - - - - - - - - - - - - - - -
> #!/usr/bin/env perl
> use strict;
> use warnings;
> use Data::Dumper;
> 
> my @keys = qw(
>      cat
>      dog
> );
> 
> my %found;
> 
> for my $phrase (<DATA>) {
>      chomp $phrase;
> 
>      for my $key (@keys) {
              ^^^^
>          $key = quotemeta($key);
           ^^^^
Be extra careful with “for(each)” loops in Perl.

This for-each-loop will make $key an lvalue-iterator so that the assignment 
operation *modifies* the elements of the array @keys for each $phrase; not 
what one wants here.  For example, suppose “$keys[0] eq "foo.bar"”; then
after the assignment operation it will be foo\.bar for the first phrase,
foo\\\.bar for the second, foo\\\\\\\.bar for the third, and so on:

$ perl -e 'use strict; use warnings; my @keys = ("foo.bar", "bar.baz"); for 
my $i ((1, 2, 3)) { for my $key (@keys) { $key = quotemeta($key); CORE::say 
join(", ", @keys); }}'
foo\.bar, bar.baz
foo\.bar, bar\.baz
foo\\\.bar, bar\.baz
foo\\\.bar, bar\\\.baz
foo\\\\\\\.bar, bar\\\.baz
foo\\\\\\\.bar, bar\\\\\\\.baz

This can be avoided with

  for (@keys) {
    my $key = quotemeta($_);

    # …
  }

equivalent to

  foreach (@keys) {
    my $key = quotemeta($_);

    # …
  }

so that $key would be a block-scoped variable.  See perlsyn(1).

(I find it useful to use “for” for C-style “for” loops and “foreach” for 
for-each loops; YMMV.)
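A small sketch of the fix described above, reusing the keys from the one-liner: because quotemeta() is applied to a lexical copy rather than to the loop alias, @keys is never modified, no matter how many times the inner loop runs.

```perl
use strict;
use warnings;

my @keys = ('foo.bar', 'bar.baz');

# Run the inner loop several times, as would happen once per phrase.
for my $pass (1 .. 3) {
    for (@keys) {
        # $_ is an alias to the array element, but $key is a fresh copy,
        # so quoting it leaves @keys untouched.
        my $key = quotemeta($_);
        print "pass $pass: $key\n";
    }
}

print "keys after loop: @keys\n";   # still foo.bar bar.baz
```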
 
>          push @{ $found{ $key } }, $phrase

But I see no reason to quotemeta($key) for this operation.  You would want 
the keys of %found to be the original keys, not the (RE-)quoted ones.

>            if $phrase =~ /\b$key\b/;

Only for this operation $key needs to be quoted (unless one either *wants*
regular expression matching or is certain *not* to have words containing
ASCII characters other than those matched by /[A-Za-z_0-9]/; see perlfunc(1)):

           # see perlre(1)
           if $phrase =~ /\b\Q$key\E\b/;
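To see why the quoting matters for the match itself, here is a sketch with a hypothetical key containing a dot: interpolated as-is, the dot matches any character; inside \Q...\E it matches only a literal dot.

```perl
use strict;
use warnings;

my $key = 'foo.bar';

# Without quoting, "." is a metacharacter and matches any character.
my $unquoted = ('fooXbar' =~ /\b$key\b/) ? 1 : 0;

# With \Q...\E, "." only matches a literal dot.
my $quoted_x   = ('fooXbar' =~ /\b\Q$key\E\b/) ? 1 : 0;
my $quoted_dot = ('foo.bar' =~ /\b\Q$key\E\b/) ? 1 : 0;

print "unquoted vs fooXbar: $unquoted\n";     # 1
print "quoted vs fooXbar:   $quoted_x\n";     # 0
print "quoted vs foo.bar:   $quoted_dot\n";   # 1
```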

However, in that case $key only needs to be quoted *once*.  Therefore, it 
appears to be prudent to create a hash whose keys are the original keys, and 
the corresponding values are the quoted keys:

  # see perlfunc(1)
  my %keys = map { $_ => quotemeta($_) } @keys;

or

  my %keys = map { $_ => quotemeta($_) } qw(
    cat
    dog
  );

in the first place, and then

  my %found;
  
  for my $phrase (<DATA>) {
    chomp $phrase;
 
    while (my ($key, $quoted_key) = each %keys) {
      push @{ $found{ $key } }, $phrase
        if $phrase =~ /\b$quoted_key\b/;
    }
  }

$phrase may not need to be chomp()ed here.  See perlfunc(1) again.
In summary:

$ perl -e '

use strict;
use warnings;
use Data::Dumper;

my %keys = map { $_ => quotemeta($_) } qw(foo.bar  bar.baz);
my @phrases = qw(foo.bar baz  foo bar.baz  foo.bar.baz);
my %found;

foreach my $phrase (@phrases) {
  while (my ($key, $quoted_key) = each %keys) {
    push @{ $found{ $key } }, $phrase
      if $phrase =~ /\b$quoted_key\b/;
  }
}

print Dumper(\%found);

'
$VAR1 = {
          'foo.bar' => [
                         'foo.bar baz',
                         'foo.bar.baz'
                       ],
          'bar.baz' => [
                         'foo bar.baz',
                         'foo.bar.baz'
                       ]
        };

An interesting experiment to test this approach is to reduce the number of 
spaces between the elements of @phrases in the declaration to one.
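For anyone trying that experiment, it helps to know how qw() tokenizes its contents: it splits on whitespace, and a run of several spaces counts as a single separator. A quick sketch that checks this with the same items as the declaration above; phrases that contain spaces have to be written as quoted strings instead.

```perl
use strict;
use warnings;

# qw() splits on any run of whitespace, so the number of spaces between
# items does not matter.
my @double = qw(foo.bar baz  foo bar.baz  foo.bar.baz);
my @single = qw(foo.bar baz foo bar.baz foo.bar.baz);

print scalar(@double), " ", scalar(@single), "\n";   # 5 5

# Phrases that contain spaces have to be written as quoted strings.
my @phrases = ('foo.bar baz', 'foo bar.baz', 'foo.bar.baz');
print scalar(@phrases), "\n";                        # 3
```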

-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.


------------------------------

Date: Mon, 19 May 2014 03:29:08 -0700 (PDT)
From: fmassion@web.de
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <2eafcbac-d4b8-4308-9014-73aba5c392a9@googlegroups.com>

Thanks for all the suggestions. I have tried to understand everything, but I am not a programmer....

I could use several of the suggestions; however, my aim is to end up with a plain list of matches, i.e. either to process $VAR1 in the examples above or to try another approach.

I have tried to push the values into an array and split it at the end. This seems to work fine. Here is my code:

#!/usr/bin/perl -w
# Open words file
open(WORDLIST,$ARGV[0]) || die("Cannot open $ARGV[0]!\n");
@words = <WORDLIST>;
# # Close words file
close(WORDLIST);
# # Open phrases file
open(PHRASELIST,$ARGV[1]) || die("Cannot open $ARGV[1])!\n");
@phrase = <PHRASELIST>;
# Close phrases file
close(PHRASELIST);
# Create empty hash for results
for my $phrase (@phrase) {
	chomp $phrase;
for my $word (@words) {
	chomp $word;
	if ($phrase =~ m/$word/i) {
	$found = $word."\t".$phrase;
	# push into hash
	push (@result, $found);
}}}
foreach (@result){  # go through the result array
($word, $phrase) = split(/\t/,$_); 
 print "$word --> $phrase\n";
}
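For the "plain list" goal, the hash of arrays from Dr.Ruud's reply does not need to be flattened through a separator string; it can be walked directly. A sketch, with the data hard-coded to match his Data::Dumper output (sorting the keys is a choice made here only to get a stable order):

```perl
use strict;
use warnings;

# A hash of arrays shaped like the Data::Dumper output earlier in the
# thread (hard-coded here for illustration).
my %found = (
    cat => [ 'This is a cat', 'This is another cat', 'This is a cat and not a dog' ],
    dog => [ 'This is a dog', 'This is a cat and not a dog' ],
);

# Walk the keys in sorted order and build one line per match.
my @lines;
for my $word (sort keys %found) {
    for my $phrase (@{ $found{$word} }) {
        push @lines, "$word --> $phrase";
    }
}
print "$_\n" for @lines;
```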


------------------------------

Date: Mon, 19 May 2014 12:08:14 +0100
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <0.c5a4ec26b5359cb45112.20140519120814BST.877g5i10wx.fsf@bsb.me.uk>

fmassion@web.de writes:

> Thanks for all the suggestions. I have tried to understand everything,
> but I am not a programmer....
>
> I could use several suggestions,

One that you may not have spotted (or considered as a suggestion if you
did spot it) is to include

  use strict;
  use warnings;

at the top of your program.  Experience shows that these catch many
mistakes and it's a good habit to get into.

> however my aim is to end up with a
> plain list of matches, i.e. either to process $VAR1 in the examples
> above or try another approach.
>
> I have tried to push the values into an array and split it at the
> end. This seems to work fine. Here my code:

If the posted code works, then your original post misrepresented what
you want.  You said you wanted a hash, and the suggestion was that it
should be used to collect together all the matching phrases.  The code
below does not do that.

Dr Ruud presented a solution (with a small error, but that won't affect
you unless your words have peculiar characters in them).  The solution
is the right way to do this in Perl, so your best plan is to ask about
it until you follow it.  The key line is this:

  push @{ $found{ $key } }, $phrase;

The magic is in the outer @{...}.  It treats the scalar value
$found{$key} as a reference to an array; if there is no such entry yet,
Perl creates a new, empty array for it (this is called
autovivification), onto which a new phrase can be pushed.
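A minimal illustration of what happens behind that line, using a hypothetical %found with a single key: the hash entry does not exist before the first push, and afterwards it holds a reference to an array.

```perl
use strict;
use warnings;

my %found;
my $before = exists $found{cat} ? 1 : 0;
print "entry exists before push: $before\n";   # 0

# Dereferencing the missing element as an array makes Perl create an
# anonymous array and store a reference to it (autovivification).
push @{ $found{cat} }, 'This is a cat';
push @{ $found{cat} }, 'This is another cat';

print ref($found{cat}), "\n";            # ARRAY
print scalar(@{ $found{cat} }), "\n";    # 2
```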

Some details:

> #!/usr/bin/perl -w

Add:

  use strict;
  use warnings;

You will then have to declare all the undeclared globals (it's not
many, but it's really worthwhile doing).

> # Open words file
> open(WORDLIST,$ARGV[0]) || die("Cannot open $ARGV[0]!\n");
> @words = <WORDLIST>;
> # # Close words file
> close(WORDLIST);
> # # Open phrases file
> open(PHRASELIST,$ARGV[1]) || die("Cannot open $ARGV[1])!\n");
> @phrase = <PHRASELIST>;
> # Close phrases file
> close(PHRASELIST);

These comments just create noise.  If the reader does not know what the
commented line does, the comment won't really help.

> # Create empty hash for results

And this one seems to be wrong.  I see no hash being created.

> for my $phrase (@phrase) {
> 	chomp $phrase;
> for my $word (@words) {
> 	chomp $word;
> 	if ($phrase =~ m/$word/i) {

Dr Ruud's code had two things related to this.  First, he added \b at
each end.  This matches only at a word boundary.  Do you want to match
"cathedral" against the word cat?  You don't say, but probably not.
Second, it quotes the special characters in the word, so that ., * and
so on don't have their technical meanings anymore.
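Both points can be seen in a short sketch (the two sample phrases are made up for illustration): without \b the word "cat" also matches inside "cathedral", while \b on both sides restricts the match to the whole word; \Q...\E handles the quoting.

```perl
use strict;
use warnings;

my $word = 'cat';
my (%plain, %bounded);

for my $phrase ('This is a cat', 'This is a cathedral') {
    # Without \b, "cat" also matches inside "cathedral".
    $plain{$phrase}   = $phrase =~ /\Q$word\E/i ? 'yes' : 'no';
    # With \b on both sides, only the whole word "cat" matches.
    $bounded{$phrase} = $phrase =~ /\b\Q$word\E\b/i ? 'yes' : 'no';
    print "$phrase: plain=$plain{$phrase}, bounded=$bounded{$phrase}\n";
}
```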

> 	$found = $word."\t".$phrase;

What if the phrase has a tab in it?  I wish I had a pound for every line
of code I've fixed that had a comment like "there'll never be a Ctrl-A
in this data...".
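One way around the separator problem is not to flatten the pair into a string at all. A sketch, using hypothetical sample data (one phrase deliberately contains a tab), that keeps each (word, phrase) pair as a small array reference:

```perl
use strict;
use warnings;

my @words   = ('cat', 'dog');
my @phrases = ("This is a cat", "A dog\twith a tab");

# Store each match as a two-element array reference instead of
# joining with a separator, so no character in the data is "reserved".
my @result;
for my $phrase (@phrases) {
    for my $word (@words) {
        push @result, [ $word, $phrase ] if $phrase =~ /\b\Q$word\E\b/i;
    }
}

for my $match (@result) {
    my ($word, $phrase) = @$match;
    print "$word --> $phrase\n";
}
```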

> 	# push into hash

It pushes onto a plain array, not a hash.

> 	push (@result, $found);
> }}}
> foreach (@result){  # go through the result array
> ($word, $phrase) = split(/\t/,$_); 
>  print "$word --> $phrase\n";
> }

-- 
Ben.


------------------------------

Date: Mon, 19 May 2014 13:40:58 +0200
From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <3513420.FWcAz8jbVk@PointedEars.de>

Ben Bacarisse wrote:

> fmassion@web.de writes:
> Dr Ruud presented a solution (with an small error, but that won't affect
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> you unless your words have peculiar characters in them). 
> 
>> for my $phrase (@phrase) {
>> chomp $phrase;
>> for my $word (@words) {
>> chomp $word;
>> if ($phrase =~ m/$word/i) {
> 
> Dr Ruud's code had two things related to this.  […]
> Second, it quotes the special characters in the word, so that ., * and
> so on don't have their technical meanings anymore.

And it does that not only *repeatedly* but also for the *keys*, which I do 
not consider a “small error”, but a logically flawed algorithm.  The quoting 
is either unnecessary or disastrous (as in “not a solution”) this way.  
Hence my correction.
 
-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.


------------------------------

Date: Mon, 19 May 2014 17:13:59 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
To: Thomas 'PointedEars' Lahn <usenet@PointedEars.de>
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <537A1FB7.5010608@xs4all.nl>

On 2014-05-19 11:42, Thomas 'PointedEars' Lahn wrote:

>> for my $phrase (<DATA>) {
>>       chomp $phrase;
>>
>>       for my $key (@keys) {
>                ^^^^
>>           $key = quotemeta($key);
>             ^^^^
> Be extra careful with “for(each)” loops in Perl.
>
> This for-each-loop will make $key an lvalue-iterator so that the assignment
> operation *modifies* the elements of the array @keys for each $phrase; not
> what one wants here.

Thanks Thomas, good catch. I started limping because I wanted to put 
several things in the same example. So I considered quotemeta(), 
index(), word boundaries, etc.


   -       $key = quotemeta($key);
   -
           push @{ $found{ $key } }, $phrase
   -         if $phrase =~ /\b$key\b/;
   +         if $phrase =~ /\b\Q$key\E\b/;

I'll leave it to the OP to decide on the usefulness of the word boundaries.


 > it appears to be prudent to create a hash whose keys are the
 > original keys, and the corresponding values are the quoted keys

Or use the compiled regular expressions as values. Also because Perl 
keeps moving forward with them.
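One possible reading of this suggestion (an assumption, since no code was posted) is to store qr// objects as the hash values, so each key is quoted and compiled once. A sketch using the sample phrases from the thread:

```perl
use strict;
use warnings;

# Precompile one pattern per key with qr//; \Q...\E quotes the key
# once, when the pattern is built.
my %re = map { $_ => qr/\b\Q$_\E\b/i } qw(cat dog);

my %found;
for my $phrase ('This is a cat', 'This is a dog', 'This is a cat and not a dog') {
    for my $key (sort keys %re) {
        push @{ $found{$key} }, $phrase if $phrase =~ $re{$key};
    }
}

print "$_: ", join(' | ', @{ $found{$_} }), "\n" for sort keys %found;
```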

-- 
Ruud




------------------------------

Date: Mon, 19 May 2014 10:20:08 -0700 (PDT)
From: fmassion@web.de
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <95302fa2-dd26-4f84-b199-4e951cc39cf7@googlegroups.com>

Thank you, Ben

> These comments just create noise.  If the reader does not know what the
> commented line does, the comment won't really help.
> 
You're right.

>   use strict;
>   use warnings;
> 
I've now done so until all the messages stating that a global symbol requires an explicit package name have disappeared. However, I no longer get any results. If I use my previous code (without "my" and without "use strict"), I get the results.

Here is the new code:
#!/usr/bin/perl -w
use strict; 
use warnings; 
open(WORDLIST,$ARGV[0]) || die("Cannot open $ARGV[0]!\n");
my @words = <WORDLIST>;
close(WORDLIST);
open(PHRASELIST,$ARGV[1]) || die("Cannot open $ARGV[1])!\n");
my @phrase = <PHRASELIST>;
close(PHRASELIST);
for my $phrase (@phrase) {
    chomp $phrase;
for my $word (@words) {
    chomp $word;
    if ($phrase =~ m/\b$word\b/i) {
    my $found = $word."\t".$phrase;
    push (my @result, $found);
    print @result."\n";
}}}
foreach (my @result){  # go through the result array
my ($word, $phrase) = split(/\t/,$_); 
 print "$word --> $phrase\n";
}
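A likely explanation, offered as a sketch rather than a definitive diagnosis: each `my @result` in the code above declares a brand-new lexical array, so the array that receives the push is discarded right away, and the final loop iterates over yet another empty array. The version below declares @result once before the loops and otherwise keeps the same logic (sample data hard-coded for illustration):

```perl
use strict;
use warnings;

my @phrases = ('This is a cat', 'This is a dog');
my @words   = ('cat', 'dog');

# Declare the result array once, outside the loops. Writing
# "push (my @result, ...)" inside the loop creates a new, empty array
# on every match, and "foreach (my @result)" loops over yet another
# new, empty array.
my @result;
for my $phrase (@phrases) {
    for my $word (@words) {
        push @result, "$word\t$phrase" if $phrase =~ /\b\Q$word\E\b/i;
    }
}

for (@result) {
    my ($word, $phrase) = split /\t/, $_;
    print "$word --> $phrase\n";
}
```

Incidentally, `print @result."\n"` prints the number of elements rather than the elements themselves, because the `.` operator puts @result in scalar context.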
  


------------------------------

Date: Mon, 19 May 2014 19:22:28 +0200
From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Subject: Re: Push regex search result into hash with multiple values
Message-Id: <4131348.xHUG06BlPz@PointedEars.de>

Dr.Ruud wrote:

> On 2014-05-19 11:42, Thomas 'PointedEars' Lahn wrote:
>> Dr.Ruud wrote:
>>> for my $phrase (<DATA>) {
>>>       […]
>>>       for my $key (@keys) {
>>                ^^^^
>>>           $key = quotemeta($key);
>>             ^^^^
>> […]
>> This for-each-loop will make $key an lvalue-iterator so that the
>> assignment operation *modifies* the elements of the array @keys for each
>> $phrase; not what one wants here.
> 
> Thanks Thomas, good catch. […]

You're welcome.

> […]
>            push @{ $found{ $key } }, $phrase
>    -         if $phrase =~ /\b$key\b/;
>    +         if $phrase =~ /\b\Q$key\E\b/;
> […]
> 
> > it appears to be prudent to create a hash whose keys are the
> > original keys, and the corresponding values are the quoted keys
> 
> Or use the compiled regular expressions as values. Also because Perl
> keeps moving forward with them.

What do you mean by that?

-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4217
***************************************

