[32897] in Perl-Users-Digest

Perl-Users Digest, Issue: 4175 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Mar 17 18:09:28 2014

Date: Mon, 17 Mar 2014 15:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 17 Mar 2014     Volume: 11 Number: 4175

Today's topics:
    Re: ethereum <maus@mail.com>
    Re: ethereum <jurgenex@hotmail.com>
    Re: ethereum <gravitalsun@hotmail.foo>
    Re: ethereum <maus@mail.com>
    Re: ethereum <jurgenex@hotmail.com>
        HOLY SH*T! HUMANS ORIGINATED IN THE DEVONIAN <troll@bitch.invalid>
        Removing lines containing same first string boundaries? <tuxedo@mailinator.com>
    Re: Removing lines containing same first string boundar <jurgenex@hotmail.com>
    Re: Removing lines containing same first string boundar <tuxedo@mailinator.com>
    Re: Removing lines containing same first string boundar <johnblack@nospam.com>
    Re: Removing lines containing same first string boundar <news@lawshouse.org>
    Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
    Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
    Re: Removing lines containing same first string boundar <tuxedo@mailinator.com>
    Re: Removing lines containing same first string boundar <jurgenex@hotmail.com>
    Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
    Re: Removing lines containing same first string boundar <johnblack@nospam.com>
    Re: Removing lines containing same first string boundar <Vorzakir@invalid.invalid>
    Re: Removing lines containing same first string boundar <jurgenex@hotmail.com>
    Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
    Re: Removing lines containing same first string boundar <tuxedo@mailinator.com>
    Re: Removing lines containing same first string boundar <cwilbur@chromatico.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 17 Mar 2014 09:55:04 GMT
From: greymausg <maus@mail.com>
Subject: Re: ethereum
Message-Id: <slrnlidha1.4el.maus@gmaus.org>

On 2014-03-12, George Mpouras <gravitalsun@hotmail.foo> wrote:
> Has anyone here experimented with ethereum ( www.ethereum.org ) or tried to
> write a prototype Perl interface? It looks like a very interesting project.


Interesting?... Puzzling.. fuzzy picture of mountains a la Japan

<html>
 <head>
  <style>
   body{
    background-image: url(background.gif);
    background-attachment: fixed;
    background-repeat: no-repeat;
    background-position: 50% 50%;
   }
  </style>
 </head>
 <body>
 </body>
</html>

Presumably we will know more..
PS, are you involved?

-- 
maus
 .
  .
 ...


------------------------------

Date: Mon, 17 Mar 2014 03:13:23 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: ethereum
Message-Id: <gkidi9pfrenaoujrc4kf16e7r96jf9c6ou@4ax.com>

greymausg <maus@mail.com> wrote:
>On 2014-03-12, George Mpouras <gravitalsun@hotmail.foo> wrote:
>> Has anyone here experimented with ethereum ( www.ethereum.org ) or tried to
>> write a prototype Perl interface? It looks like a very interesting project.
>
>
>Interesting?... Puzzling.. fuzzy picture of mountains a la Japan
>
><html>
> <head>
>  <style>
>   body{
>    background-image: url(background.gif);
>    background-attachment: fixed;
>    background-repeat: no-repeat;
>    background-position: 50% 50%;
>   }
>  </style>
> </head>
> <body>
> </body>
></html>
>
>Presumably we will know more..

That is easy. You are using the wrong interpreter. The code above is
HTML, not Perl.

jue


------------------------------

Date: Mon, 17 Mar 2014 13:52:14 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: ethereum
Message-Id: <lg6nk8$2qoq$1@news.ntua.gr>

>
> Presumably we will know more..
> PS, are you involved?
>

Sorry, I am not involved, but I will try to write an interface if I find
the time.


------------------------------

Date: 17 Mar 2014 13:55:04 GMT
From: greymausg <maus@mail.com>
Subject: Re: ethereum
Message-Id: <slrnlidujl.6h4.maus@gmaus.org>

On 2014-03-17, Jürgen Exner <jurgenex@hotmail.com> wrote:
> greymausg <maus@mail.com> wrote:
>>On 2014-03-12, George Mpouras <gravitalsun@hotmail.foo> wrote:
>>> Has anyone here experimented with ethereum ( www.ethereum.org ) or tried to
>>> write a prototype Perl interface? It looks like a very interesting project.
>>
>>
>>Interesting?... Puzzling.. fuzzy picture of mountains a la Japan
>>
>><html>
>> <head>
>>  <style>
>>   body{
>>    background-image: url(background.gif);
>>    background-attachment: fixed;
>>    background-repeat: no-repeat;
>>    background-position: 50% 50%;
>>   }
>>  </style>
>> </head>
>> <body>
>> </body>
>></html>
>>
>>Presumably we will know more..
>
> That is easy. You are using the wrong interpreter. The code above is
> HTML, not Perl.
>
> jue

I know that; it's very basic HTML. I was wondering how that
would be dealt with in Perl. I presume it's a site under construction.


-- 
maus
 .
  .
 ...


------------------------------

Date: Mon, 17 Mar 2014 07:40:39 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: ethereum
Message-Id: <d92ei91bjb64jnd6arrm6mq9buofm4ahmc@4ax.com>

greymausg <maus@mail.com> wrote:
>On 2014-03-17, Jürgen Exner <jurgenex@hotmail.com> wrote:
>> greymausg <maus@mail.com> wrote:
>>>On 2014-03-12, George Mpouras <gravitalsun@hotmail.foo> wrote:
>>>> Has anyone here experimented with ethereum ( www.ethereum.org ) or tried to
>>>> write a prototype Perl interface? It looks like a very interesting project.
>>>
>>>
>>>Interesting?... Puzzling.. fuzzy picture of mountains a la Japan
>>>
>>><html>
>>> <head>
>>>  <style>
>>>   body{
>>>    background-image: url(background.gif);
>>>    background-attachment: fixed;
>>>    background-repeat: no-repeat;
>>>    background-position: 50% 50%;
>>>   }
>>>  </style>
>>> </head>
>>> <body>
>>> </body>
>>></html>
>>>
>>>Presumably we will know more..
>>
>> That is easy. You are using the wrong interpreter. The code above is
>> HTML, not Perl.
>>
>> jue
>
>I know that; it's very basic HTML. I was wondering how that
>would be dealt with in Perl.

You would typically use HTML::Parser and then go from there.
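A minimal sketch of that approach, assuming the HTML::Parser module from CPAN is installed: it collects the text inside the <style> element of the page quoted above (abbreviated here) and pulls out the background-image URL. The regex applied to the CSS is a quick illustration, not a real CSS parser.

```perl
use strict;
use warnings;
use HTML::Parser;    # CPAN module, assumed to be installed

# Abbreviated copy of the page quoted above.
my $html = <<'END_HTML';
<html>
 <head>
  <style>
   body{
    background-image: url(background.gif);
    background-attachment: fixed;
   }
  </style>
 </head>
 <body>
 </body>
</html>
END_HTML

my $in_style = 0;    # true while the parser is inside <style>...</style>
my $css      = '';   # accumulated text of the <style> element

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { $in_style = 1 if $_[0] eq 'style' }, 'tagname' ],
    end_h   => [ sub { $in_style = 0 if $_[0] eq 'style' }, 'tagname' ],
    text_h  => [ sub { $css .= $_[0] if $in_style }, 'dtext' ],
);
$parser->parse($html);
$parser->eof;        # flush any buffered text

# Naive extraction of the url(...) value; not a full CSS parser.
my ($url) = $css =~ /background-image:\s*url\(([^)]+)\)/;
print defined $url ? "background image: $url\n" : "no background image found\n";
```

The handlers only record state and text; any further processing (fetching the image, inspecting other tags) would go after the parse.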

jue


------------------------------

Date: Mon, 17 Mar 2014 04:50:19 -0400
From: ASSODON <troll@bitch.invalid>
Subject: HOLY SH*T! HUMANS ORIGINATED IN THE DEVONIAN
Message-Id: <MPG.2d90815744d89924989689@news.eternal-september.org>

=======================
>BREAKING NEWSSSSSSSSS
=======================
>
RICHARD LEAKEY JUST DIED DUE TO HEART FAILURE!
>
THE REASONS DESCRIBED BY THE MEDICAL TEAM IS THAT HIS WORK WAS 
DISPROVEN, BY NONE OTHER THAN YOUR OWN BASTARD, THRINAXODON.
>
THIS CAUSED LEAKEY'S HEART TO EXPLODE!
>
THRINAXODON DANCED WITH JOY AS HE WAS GRANTED $600,000,000,000.000!
>
TO WASTE YOUR TIME EVEN FURTHER, CHECK OUT THESE LINKS BELOW.
===========================
EVIDENCE THAT HUMANS LIVED IN THE DEVONIAN:

https://groups.google.com/group/sci.bio.paleontology/browse_thread/threa
d/6f501c469c7af24f#


https://groups.google.com/group/sci.bio.paleontology/browse_thread/threa
d/3aad75c16afb0b82#


====================================

http://thrinaxodon.wordpress.com/

===================================

THRINAXODON ONLY HAD THIS TO SAY:

"I..I...I...Can't believe it. This completely disproved Darwinian
orthodoxy."

===================================

THE BASTARDS AT THE SMITHSONIAN, AND THE LEAKEY FOUNDATION ARE ERODING
WITH FEAR.

===========================
THESE ASSHOLES ARE GOING TO DIE:
THOMAS AQUINAS;
ALDOUS HUXLEY;
BOB CASANVOVA;
SkyEyes;
DAVID IAIN GRIEG;
MARK ISAAK;
JOHN HARSHAM;
RICHARD NORMAN;
DR. DOOLITTLE;
CHARLES DARWIN;
MARK HORTON;
ERIK SIMPSON;
HYPATIAB7;
PAUL J. GANS;
JILLERY;
WIKI TRIK;
THRINAXODON;
PETER NYIKOS;
RON OKIMOTO;
JOHN S. WILKINS
===========================

THRINAXODON WAS SCOURING ANOTHER DEVONIAN FOSSIL BED, AND FOUND A
HUMAN SKULL, AND A HUMAN FEMUR. HE ANALYSED THE FINDS, AND SAW THAT
THEY WERE NOT NORMAL ROCKS. THESE WERE FOSSILIZED BONES. THEY EVEN HAD
TOOTH MARKS ON THEM. SO, THRINAXODON BROUGHT THEM TO THE LEAKEY
FOUNDATION, THEY UTTERLY DISMISSED IT, AND SAID, "We want to keep
people thinking that humans evolved 2 Ma." THRINAXODON BROUGHT HIS
SWORD, AND SAID, "SCIENCE CORRECTS ITSELF." RICHARD LEAKEY SAID, "That
is a myth, for people to believe in science." THRINAXODON PLANS TO
BRING DOOM TO SCIENCE, ITSELF.

============================

THRINAXODON IS NOW ON REDDIT 



------------------------------

Date: Mon, 17 Mar 2014 19:26:07 +0100
From: Tuxedo <tuxedo@mailinator.com>
Subject: Removing lines containing same first string boundaries?
Message-Id: <lg7eo7$7kc$1@news.albasani.net>

I have a plain text file with each line in the format:

Start of line followed immediately by a string of character(s), a 
whitespace, another string, a newline.

-------- file.txt -------

SOMESTRING XXX 
SOMESTRING ZZZ 
SOMEOTHERSTRING YYYZZ23 
DIFFERENTSTRING HELLO

-----------

I would like to output each line that contains a string of a first 
character sequence but not repeat any line(s) with the same string as a 
first character sequence that appear further down the file. The output of 
running a perl procedure against the above file would then be:

SOMESTRING XXX 
SOMEOTHERSTRING YYYZZ23 
DIFFERENTSTRING HELLO

In other words, no repetition should occur of any first word boundary on 
each line in case the sequence happens to reappear on other line(s) as a 
first character boundary before each line's first whitespace.

Alternatively, if given a parameter such as '^SOMESTRING' the output 
against the file would be narrowed down to:

SOMESTRING XXX 

The second character string boundary (XXX) after the whitespace is 
arbitrary and should not affect the result but can be included in the 
output even if it happens to match SOMESTRING. So the output becomes the 
first occurrence of ^SOMESTRING plus the remaining characters on the same
line up until newline.

Or if for example '^SOME' is passed as a parameter, the result would be:

SOMESTRING XXX 
SOMEOTHERSTRING YYYZZ23 

In which ways can this be done efficiently in Perl?

Many thanks for any ideas.

Tuxedo


------------------------------

Date: Mon, 17 Mar 2014 11:36:13 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <90gei9lrvhhu01e6hn4h8nbcpqv2mt8le5@4ax.com>

Tuxedo <tuxedo@mailinator.com> wrote:
>I have a plain text file with each line in the format:
>
>Start of line followed immediately by a string of character(s), a 
>whitespace, another string, a newline.
>
>-------- file.txt -------
>
>SOMESTRING XXX 
>SOMESTRING ZZZ 
>SOMEOTHERSTRING YYYZZ23 
>DIFFERENTSTRING HELLO
>
>-----------
>
>I would like to output each line that contains a string of a first 
>character sequence but not repeat any line(s) with the same string as a 
>first character sequence that appear further down the file. The output of 
>running a perl procedure against the above file would then be:
>
>SOMESTRING XXX 
>SOMEOTHERSTRING YYYZZ23 
>DIFFERENTSTRING HELLO
>
>In other words, no repetition should occur of any first word boundary on 
>each line in case the sequence happens to reappear on other line(s) as a 
>first character boundary before each line's first whitespace.

What have you tried so far? Where are you stuck? What doesn't work as
expected?

I would simply use split() to isolate the leading word and then use a
hash to track which words have already been printed.

>Alternatively, if given a parameter such as '^SOMESTRING' the output 
>against the file would be narrowed down to:
>
>SOMESTRING XXX 

perldoc -f grep
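A minimal sketch combining both hints: split() isolates the leading word, a %printed hash remembers which words were already output, and grep applies the optional pattern. The sub name select_lines() and the in-memory sample data are illustrative assumptions; a real script would read the lines from the file. The pattern is matched against the leading word only, so the second string on a line cannot affect the result.

```perl
use strict;
use warnings;

# select_lines() is a hypothetical helper name. $pattern is a regular
# expression such as '^SOME' and is matched against the leading word only.
sub select_lines {
    my ($pattern, @lines) = @_;
    my $filter = qr/$pattern/;
    my %printed;                                # leading words already kept
    return grep {
        my ($first) = split ' ', $_;            # ' ' splits on runs of whitespace
        defined($first)                         # skip blank lines
            && $first =~ $filter                # optional pattern, e.g. '^SOME'
            && !$printed{$first}++;             # true only on the first occurrence
    } @lines;
}

# Sample data from the question, held in memory for the example.
my @lines = (
    "SOMESTRING XXX\n",
    "SOMESTRING ZZZ\n",
    "SOMEOTHERSTRING YYYZZ23\n",
    "DIFFERENTSTRING HELLO\n",
);

my @selected = select_lines('^SOME', @lines);
print @selected;
```

With '^SOME' this keeps the SOMESTRING XXX and SOMEOTHERSTRING YYYZZ23 lines; with '^SOMESTRING' only the first line; with '.' every leading word is kept once.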


jue


------------------------------

Date: Mon, 17 Mar 2014 19:50:03 +0100
From: Tuxedo <tuxedo@mailinator.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <lg7g4v$ah6$1@news.albasani.net>

Jürgen Exner wrote:

[...]

> What have you tried so far? Where are you stuck? What doesn't work as
> expected?

Nothing so far; I haven't tried anything yet.

> I would simply use split() to isolate the leading word and then use a
> hash to track which words have already been printed.

[...]

> perldoc -f grep

Thanks for the above pointers. I will look into these.

Tuxedo



------------------------------

Date: Mon, 17 Mar 2014 14:03:08 -0500
From: John Black <johnblack@nospam.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <MPG.2d9102d5d16769439897cd@news.eternal-september.org>

In article <lg7eo7$7kc$1@news.albasani.net>, tuxedo@mailinator.com says...
> 
> I have a plain text file with each line in the format:
> 
> Start of line followed immediately by a string of character(s), a 
> whitespace, another string, a newline.
> 
> -------- file.txt -------
> 
> SOMESTRING XXX 
> SOMESTRING ZZZ 
> SOMEOTHERSTRING YYYZZ23 
> DIFFERENTSTRING HELLO
> 
> -----------
> 
> I would like to output each line that contains a string of a first 
> character sequence but not repeat any line(s) with the same string as a 
> first character sequence that appear further down the file. The output of 
> running a perl procedure against the above file would then be:
> 
> SOMESTRING XXX 
> SOMEOTHERSTRING YYYZZ23 
> DIFFERENTSTRING HELLO
> 
> In other words, no repetition should occur of any first word boundary on 
> each line in case the sequence happens to reappear on other line(s) as a 
> first character boundary before each line's first whitespace.
> 
> Alternatively, if given a parameter such as '^SOMESTRING' the output 
> against the file would be narrowed down to:
> 
> SOMESTRING XXX 
> 
> The second character string boundary (XXX) after the whitespace is 
> arbitrary and should not affect the result but can be included in the 
> output even if it happens to match SOMESTRING. So the output becomes the 
> first occurrence of ^SOMESTRING plus the remaining characters on the same
> line up until newline.
> 
> Or if for example '^SOME' is passed as a parameter, the result would be:
> 
> SOMESTRING XXX 
> SOMEOTHERSTRING YYYZZ23 
> 
> In which ways can this be done efficiently in Perl?
> 
> Many thanks for any ideas.
> 
> Tuxedo

Just use a regex to match each line to your input parameter.

   if ($line =~ /($param\w*)\s+.*/) {  # If the first string matches input parameter
      $string1 = $1;                   # Extract string1 from the current line
   
      if (!grep {$string1 eq $_} @matched_array) {  # If string1 has not been seen before
         print $line;                             # Print the line
         push (@matched_array, $string1);         # Put the string in the matched array
      }
   }      

I have not tested this for errors but that is what I would start with.  Not shown is the loop 
to grab each line of the file into $line.

John Black


------------------------------

Date: Mon, 17 Mar 2014 19:42:25 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <X7udnewigKK8zbrOnZ2dnUVZ8uadnZ2d@giganews.com>

On 17/03/14 18:26, Tuxedo wrote:
> I would like to output each line that contains

Strong smell of homework assignment here ...

-- 

Henry Law            Manchester, England


------------------------------

Date: Mon, 17 Mar 2014 19:42:47 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <871ty08vd4.fsf@sable.mobileactivedefense.com>

Tuxedo <tuxedo@mailinator.com> writes:
> I have a plain text file with each line in the format:
>
> Start of line followed immediately by a string of character(s), a 
> whitespace, another string, a newline.
>
> -------- file.txt -------
>
> SOMESTRING XXX 
> SOMESTRING ZZZ 
> SOMEOTHERSTRING YYYZZ23 
> DIFFERENTSTRING HELLO
>
> -----------
>
> I would like to output each line that contains a string of a first 
> character sequence but not repeat any line(s) with the same string as a 
> first character sequence that appear further down the file.

[...]

> Alternatively, if given a parameter such as '^SOMESTRING' the output 
> against the file would be narrowed down to:
>
> SOMESTRING XXX 
>
> The second character string boundary (XXX) after the whitespace is 
> arbitrary and should not affect the result

[...]

-----------
my (%seen, $filter, $tag);

$filter = $ARGV[0] // '.';

while (<STDIN>) {
    ($tag) = /^(\S+)/ or next;

    next if $seen{$tag} || !/$filter/o;

    $seen{$tag} = 1;
    print;
}
-----------

NB: This will silently ignore lines which don't start with a sequence of
non-whitespace characters.


------------------------------

Date: Mon, 17 Mar 2014 19:45:51 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87wqfs7gnk.fsf@sable.mobileactivedefense.com>

John Black <johnblack@nospam.com> writes:
> In article <lg7eo7$7kc$1@news.albasani.net>, tuxedo@mailinator.com says...
>> I have a plain text file with each line in the format:
>> 
>> Start of line followed immediately by a string of character(s), a 
>> whitespace, another string, a newline.
>> 
>> -------- file.txt -------
>> 
>> SOMESTRING XXX 
>> SOMESTRING ZZZ 
>> SOMEOTHERSTRING YYYZZ23 
>> DIFFERENTSTRING HELLO
>> 
>> -----------
>> 
>> I would like to output each line that contains a string of a first 
>> character sequence but not repeat any line(s) with the same string as a 
>> first character sequence

[...]

>> Alternatively, if given a parameter such as '^SOMESTRING' the output 
>> against the file would be narrowed down to:
>> 
>> SOMESTRING XXX 

[...]

> Just use a regex to match each line to your input parameter.
>
>    if ($line =~ /($param\w*)\s+.*/) {  # If the first string matches input parameter
>       $string1 = $1;                   # Extract string1 from the current line
>    
>       if (!grep {$string1 eq $_} @matched_array) {  # If string1 has not been seen before
>          print $line;                             # Print the line
>          push (@matched_array, $string1);         # Put the string in the matched array
>       }
>    }

This is atrociously inefficient/ unscalable as the running time of the
algorithm is proportional to the square of the number of lines which are
printed.
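To make the quadratic growth concrete, here is a small sketch that counts the string comparisons the array-scan approach performs in the worst case, where every leading word is distinct and each new word is compared with all words printed before it. The tag names are made up for the illustration. The total is 0 + 1 + ... + (n-1) = n(n-1)/2, whereas a %seen hash needs one lookup per line.

```perl
use strict;
use warnings;

# Counts the string comparisons made by the array-scan approach in the worst
# case, where every leading word is distinct. The tag names are made up.
sub array_scan_comparisons {
    my ($n) = @_;
    my @seen;
    my $comparisons = 0;
    TAG: for my $i (1 .. $n) {
        my $tag = "TAG$i";
        for my $s (@seen) {
            $comparisons++;
            next TAG if $s eq $tag;    # already printed: skip this line
        }
        push @seen, $tag;              # first occurrence: remember it
    }
    return $comparisons;
}

for my $n (10, 100, 1000) {
    # A %seen hash would need exactly one lookup per line, i.e. $n lookups.
    printf "%5d distinct tags: %7d comparisons with an array, %5d hash lookups\n",
        $n, array_scan_comparisons($n), $n;
}
```

For 10, 100 and 1000 distinct tags this reports 45, 4950 and 499500 comparisons: ten times the input, roughly a hundred times the work.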


------------------------------

Date: Mon, 17 Mar 2014 21:08:47 +0100
From: Tuxedo <tuxedo@mailinator.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <lg7kog$jmo$1@news.albasani.net>

Rainer Weikusat wrote:

[...]

> -----------
> my (%seen, $filter, $tag);
> 
> $filter = $ARGV[0] // '.';
> 
> while (<STDIN>) {
>     ($tag) = /^(\S+)/ or next;
> 
>     next if $seen{$tag} || !/$filter/o;
> 
>     $seen{$tag} = 1;
>     print;
> }
> -----------
> 
> NB: This will silently ignore lines which don't start with a sequence of
> non-whitespace characters.

Many thanks for the above procedure.

How exactly should it be run against a file and keyword parameter?

Tuxedo


------------------------------

Date: Mon, 17 Mar 2014 13:08:43 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <fdlei9571i7crrgff0gmnk7c5l4t6g527v@4ax.com>

John Black <johnblack@nospam.com> wrote:
   
>      if (!grep {$string1 eq $_} @matched_array) {  # If string1 has not been seen before
>         print $line;                             # Print the line
>         push (@matched_array, $string1);         # Put the string in the matched array
>      }

Why are you using an O(n^2) algorithm when the same can be achieved with
an O(n) algorithm using a hash, which on top of everything else is even
simpler to write?

jue


------------------------------

Date: Mon, 17 Mar 2014 20:32:18 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87siqg7ei4.fsf@sable.mobileactivedefense.com>

Tuxedo <tuxedo@mailinator.com> writes:
> Rainer Weikusat wrote:
>
> [...]
>
>> -----------
>> my (%seen, $filter, $tag);
>> 
>> $filter = $ARGV[0] // '.';
>> 
>> while (<STDIN>) {
>>     ($tag) = /^(\S+)/ or next;
>> 
>>     next if $seen{$tag} || !/$filter/o;
>> 
>>     $seen{$tag} = 1;
>>     print;
>> }
>> -----------
>> 
>> NB: This will silently ignore lines which don't start with a sequence of
>> non-whitespace characters.
>
> Many thanks for the above procedure.
>
> How exactly should it be run against a file and keyword parameter?

It expects data to process on stdin and a keyword, if any, as first
argument, eg, assuming that the script text is in a file named a.pl and
the test text you posted in a file named text,

[rw@sable]/tmp#perl a.pl SOME <text 
SOMESTRING XXX 
SOMEOTHERSTRING YYYZZ23 

NB: while (<>) doesn't localize $_ implicitly. If this is supposed to be
part of a larger program, a

local $_;

might need to be added in a suitable scope to avoid overwriting someone
else's $_.
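A small sketch of the problem, assuming the reading loop lives in a helper sub (leading_words() is a hypothetical name) that is called from inside a foreach loop, where $_ is an alias for the elements of the caller's array. With the local $_ line in place, @names is left untouched; without it, the while loop would assign each input line, and finally undef, through the alias into @names.

```perl
use strict;
use warnings;

# leading_words() is a hypothetical helper that reads a filehandle with the
# same kind of while (<...>) loop as the script above.
sub leading_words {
    my ($fh) = @_;
    local $_;                         # protect the caller's $_
    my @words;
    while (<$fh>) {                   # implicitly assigns each line to $_
        push @words, $1 if /^(\S+)/;
    }
    return @words;
}

my @names = qw(alpha beta);
for (@names) {                        # $_ is an alias for each element of @names
    my $text = "SOMESTRING XXX\nSOMEOTHERSTRING YYYZZ23\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @words = leading_words($fh);
    close $fh;
}
print "@names\n";                     # still "alpha beta" because of local $_
```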





------------------------------

Date: Mon, 17 Mar 2014 15:39:26 -0500
From: John Black <johnblack@nospam.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <MPG.2d91197798ca7fd09897ce@news.eternal-september.org>

In article <fdlei9571i7crrgff0gmnk7c5l4t6g527v@4ax.com>, jurgenex@hotmail.com says...
> 
> John Black <johnblack@nospam.com> wrote:
>    
> >      if (!grep {$string1 eq $_} @matched_array) {  # If string1 has not been seen before
> >         print $line;                             # Print the line
> >         push (@matched_array, $string1);         # Put the string in the matched array
> >      }
> 
> Why are you using an O(n^2) algorithm when the same can be achieved with
> an O(n) algorithm using a hash, which on top of everything else is even
> simpler to write?
> 
> jue

Are you saying that a lookup for a key in a hash array does not have to search the hash for a 
key match in a similar way to how grep is searching an array?

John Black


------------------------------

Date: Mon, 17 Mar 2014 20:46:56 +0000 (UTC)
From: Randy Westlund <Vorzakir@invalid.invalid>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <lg7n00$8q7$1@dont-email.me>

On 2014-03-17, John Black wrote:
> In article <fdlei9571i7crrgff0gmnk7c5l4t6g527v@4ax.com>, jurgenex@hotmail.com says...
>> 
>> John Black <johnblack@nospam.com> wrote:
>>    
>> >      if (!grep {$string1 eq $_} @matched_array) {  # If string1 has not been seen before
>> >         print $line;                             # Print the line
>> >         push (@matched_array, $string1);         # Put the string in the matched array
>> >      }
>> 
>> Why are you using an O(n^2) algorithm when the same can be achieved with
>> an O(n) algorithm using a hash, which on top of everything else is even
>> simpler to write?
>> 
>> jue
>
> Are you saying that a lookup for a key in a hash array does not have to search the hash for a 
> key match in a similar way to how grep is searching an array?
>
> John Black

Yes.  The idea of a hash is that it can transform (aka hash) the key
(an O(1) operation) into something like an array index, which allows
an index-based lookup (another O(1) operation).

The downside of using a hash is memory overhead in setting up the
hash table.


------------------------------

Date: Mon, 17 Mar 2014 13:50:25 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <jknei9lahorv2mdectpu29aih4s7ql543n@4ax.com>

John Black <johnblack@nospam.com> wrote:

>Are you saying that a lookup for a key in a hash array does not have to search the hash for a 
>key match in a similar way to how grep is searching an array?

Yes. That's why it's called a hash in the first place. Accessing a hash
element is typically O(1).
Occasionally it may be a little bit worse, and in very extreme cases it
could even be O(n). But those extreme cases are usually artificial and
don't happen in real life, at least not with a reasonable implementation
of the hashing algorithm.

jue


------------------------------

Date: Mon, 17 Mar 2014 20:54:16 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87ob147dhj.fsf@sable.mobileactivedefense.com>

John Black <johnblack@nospam.com> writes:
> In article <fdlei9571i7crrgff0gmnk7c5l4t6g527v@4ax.com>, jurgenex@hotmail.com says...
>> 
>> John Black <johnblack@nospam.com> wrote:
>>    
>> >      if (!grep {$string1 eq $_} @matched_array) {  # If string1 has not been seen before
>> >         print $line;                             # Print the line
>> >         push (@matched_array, $string1);         # Put the string in the matched array
>> >      }
>> 
>> Why are you using an O(n^2) algorithm when the same can be achieved with
>> an O(n) algorithm using a hash, which on top of everything else is even
>> simpler to write?
>> 
>> jue
>
> Are you saying that a lookup for a key in a hash array does not have
> to search the hash for a key match in a similar way to how grep is
> searching an array?

Of course not. A 'hash' is not an array. It's an associative array
implemented as a hash table with separate chaining. This means it is an
array of pointers to linked lists containing the actual entries. When
searching for a key, the key is transformed to a number by using it as
input for the so-called 'hash function' and 'truncating' the resulting
number down to the current size of the list pointer array via
mod. Assuming that H(key) is the hash function, an array index is
calculated as

ndx = H(key) % @hash_array

Nowadays, using hash arrays whose sizes are powers of two is common
because the modulo operation can then be performed with a bitwise
AND. The list $hash_array[ndx] points to is then searched for the actual
key. In ideal conditions (when the table is large enough), all list
sizes will be 1 which means the lookup requires only a single string
comparison. Even if the list has a number of entries (because of
so-called 'hash collisions'), it will still be much shorter than a list
containing all key/value-pairs in the hash if a 'reasonable' hash
function is used (it is supposed to produce a 'uniform' distribution of
values).
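A toy sketch of the scheme described above: separate chaining, a power-of-two number of buckets, and the bucket index computed with a bitwise AND. The hash function toy_hash() is a simple djb2-style function chosen for illustration; it is not the function perl itself uses, and perl's real implementation also grows the bucket array as keys are added, which this sketch omits.

```perl
use strict;
use warnings;

# Toy hash table with separate chaining, for illustration only.
# toy_hash() is a djb2-style string hash, NOT the function perl uses internally.
sub toy_hash {
    my ($key) = @_;
    my $h = 5381;
    $h = ($h * 33 + ord($_)) & 0xFFFFFFFF for split //, $key;
    return $h;
}

my $nbuckets = 8;                    # a power of two (2**3)
my @buckets;                         # each bucket is a chain of [key, value] pairs
push @buckets, [] for 1 .. $nbuckets;

sub bucket_index {
    my ($key) = @_;
    # Same as toy_hash($key) % $nbuckets, because $nbuckets is a power of two
    return toy_hash($key) & ($nbuckets - 1);
}

sub toy_store {
    my ($key, $value) = @_;
    my $chain = $buckets[ bucket_index($key) ];
    for my $pair (@$chain) {
        if ($pair->[0] eq $key) { $pair->[1] = $value; return }   # update
    }
    push @$chain, [ $key, $value ];                               # insert
}

sub toy_fetch {
    my ($key) = @_;
    # Only the chain in one bucket is searched, not every stored key
    for my $pair (@{ $buckets[ bucket_index($key) ] }) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;                          # not found
}

toy_store($_, 1) for qw(SOMESTRING SOMEOTHERSTRING DIFFERENTSTRING);
for my $key (qw(SOMESTRING NEWSTRING)) {
    my $found = toy_fetch($key);
    print "$key: ", (defined $found ? "seen" : "not seen"), "\n";
}
```

A lookup costs one hash computation plus a scan of one short chain, which is why the %seen approach stays fast as the number of distinct words grows.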





------------------------------

Date: Mon, 17 Mar 2014 21:54:53 +0100
From: Tuxedo <tuxedo@mailinator.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <lg7nf8$p0h$1@news.albasani.net>

Rainer Weikusat wrote:

[...]

> It expects data to process on stdin and a keyword, if any, as first
> argument, eg, assuming that the script text is in a file named a.pl and
> the test text you posted in a file named text,
> 
> [rw@sable]/tmp#perl a.pl SOME <text
> SOMESTRING XXX
> SOMEOTHERSTRING YYYZZ23
> 
> NB: while (<>) doesn't localize $_ implicitly. If this is supposed to be
> part of a larger program, a
> 
> local $_;
> 
> might need to be added in a suitable scope to avoid overwriting someone
> else's $_.

Many thanks for the detailed how-to! I will test.

Tuxedo



------------------------------

Date: Mon, 17 Mar 2014 16:18:50 -0400
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87lhw8lgt1.fsf@new.chromatico.net>

>>>>> "HL" == Henry Law <news@lawshouse.org> writes:

    HL> On 17/03/14 18:26, Tuxedo wrote:
    >> I would like to output each line that contains

    HL> Strong smell of homework assignment here ...

Indeed.  Highly detailed and precise specification combined with "I
haven't tried anything yet...."

Tuxedo, go and try.  Come back when you have a specific, concrete
question that isn't "Solve my problem for me."

Charlton

-- 
Charlton Wilbur
cwilbur@chromatico.net


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4175
***************************************

