[32898] in Perl-Users-Digest
Perl-Users Digest, Issue: 4176 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Mar 19 09:09:30 2014
Date: Wed, 19 Mar 2014 06:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 19 Mar 2014 Volume: 11 Number: 4176
Today's topics:
Re: HOLY SH*T! HUMANS ORIGINATED IN THE DEVONIAN <nospam@thanks.invalid>
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar <m@rtij.nl.invlalid>
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar <kaz@kylheku.com>
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar (Tim McDaniel)
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar <kaz@kylheku.com>
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar <john@castleamber.com>
Re: Removing lines containing same first string boundar <ben.usenet@bsb.me.uk>
Re: Removing lines containing same first string boundar <rweikusat@mobileactivedefense.com>
Re: Removing lines containing same first string boundar <johnblack@nospam.com>
Re: Removing lines containing same first string boundar (Tim McDaniel)
Re: Removing lines containing same first string boundar <m@rtij.nl.invlalid>
Re: Removing lines containing same first string boundar <m@rtij.nl.invlalid>
Re: Removing lines containing same first string boundar <m@rtij.nl.invlalid>
Re: Removing lines containing same first string boundar <whynot@pozharski.name>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 18 Mar 2014 16:17:36 +0000 (UTC)
From: Juha Nieminen <nospam@thanks.invalid>
Subject: Re: HOLY SH*T! HUMANS ORIGINATED IN THE DEVONIAN
Message-Id: <lg9rit$80i$1@adenine.netfront.net>
In comp.lang.c++ ASSODON <troll@bitch.invalid> wrote:
> THRINAXODON DANCED WITH JOY AS HE WAS GRANTED $600,000,000,000.000!
I find it interesting, from a psychological perspective, that you are
not even *pretending* that you are not lying and making stuff up.
You pretty much imply it as clearly as it possibly can be, and clearly
don't care. Yet, nevertheless, you accuse others of lying.
I don't think you are simply a troll who does this for his own amusement,
because even trolls get tired of the same old joke, and move to other
things. I get a feeling of this being more obsessive in nature.
There's something probably very wrong inside your head. I really think
you should seek professional help for your mental problems.
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
------------------------------
Date: Mon, 17 Mar 2014 22:18:10 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <874n2w79lp.fsf@sable.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
[...]
> -----------
> my (%seen, $filter, $tag);
>
> $filter = $ARGV[0] // '.';
>
> while (<STDIN>) {
> ($tag) = /^(\S+)/ or next;
>
> next if $seen{$tag} || !/$filter/o;
>
> $seen{$tag} = 1;
> print;
> }
> -----------
>
> NB: This will silently ignore lines which don't start with a sequence of
> non-whitespace characters.
There are a couple more possible gotchas with this, in particular:
- The /o flag means the regex will be compiled exactly once per program
  run. This becomes an issue wherever the value of $filter can change,
  e.g., if this code was put into a subroutine.
	$filter = qr/$filter/;
  could be used to pre-compile the current $filter in such cases
  (and the /o dropped).
- The filter match is performed on the whole input line, not only on the
  tag, meaning it will also match in the 2nd and subsequent fields unless
  it is anchored to the beginning of the line.
	$tag !~ /$filter/
  could be used instead of !/$filter/o.
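Both changes can be sketched together as follows. This is only an illustration: the subroutine name first_per_tag and the sample input are made up here, and the lines are passed as a list rather than read from STDIN so the behaviour is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): keep the first line
# seen for each tag, considering only tags that match $filter.
sub first_per_tag {
    my ($filter, @lines) = @_;
    my $re = qr/$filter/;    # compiled on every call, so $filter may change
    my (%seen, @kept);

    for my $line (@lines) {
        my ($tag) = $line =~ /^(\S+)/ or next;   # skip lines without a tag
        next if $seen{$tag} || $tag !~ $re;      # match the tag only
        $seen{$tag} = 1;
        push @kept, $line;
    }
    return @kept;
}

my @input = ("foo 1\n", "foo 2\n", "bar foo\n", "baz 3\n");
print first_per_tag('foo', @input);   # "bar foo" is not kept: tag is "bar"
print first_per_tag('^ba', @input);   # a different filter on the next call
```

With the whole-line match of the original loop, the unanchored filter 'foo' would also have let the "bar foo" line through.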
------------------------------
Date: Tue, 18 Mar 2014 11:02:29 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <l6ilva-55n.ln1@news.rtij.nl>
On Mon, 17 Mar 2014 13:50:25 -0700, Jürgen Exner wrote:
> John Black <johnblack@nospam.com> wrote:
>
>>Are you saying that a lookup for a key in a hash array does not have to
>>search the hash for a key match in a similar way to how grep is
>>searching an array?
>
> Yes. That's why it's called a hash in the first place. Accessing a hash
> element is typically O(1).
> Occasionally it may be a little bit worse, and in very extreme cases it
> could even be O(n). But those extreme cases are usually artificial and
> don't happen in real life, at least not with a reasonable implementation
> of the hashing algorithm.
And Perl has a very "reasonable" implementation; much effort has been put
into that (if only to defeat DoS attacks).
M4
------------------------------
Date: Tue, 18 Mar 2014 15:33:34 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87zjknwmgh.fsf@sable.mobileactivedefense.com>
Jürgen Exner <jurgenex@hotmail.com> writes:
> John Black <johnblack@nospam.com> wrote:
>
>>Are you saying that a lookup for a key in a hash array does not have to search the hash for a
>>key match in a similar way to how grep is searching an array?
>
> Yes. That's why it's called a hash in the first place. Accessing a hash
> element is typically O(1).
This is a little too simplistic. The amount of work necessary to
locate a random key which is stored in the hash is proportional to the
average length of each used hash chain divided by 2, and the time to
determine that some key is not in the hash is proportional to the
average length of a hash chain. Assuming a 'good' hash function is being
used (which also covers the case of 'degenerate' key sets, better known
as an 'algorithmic complexity attack'), the average length of a hash chain
should be the number of keys in the hash divided by the number of entries
in the array holding the chain pointers. That is still proportional to the
number of entries in the hash, aka O(n), except that it is possible to
make the table sufficiently large to ensure that the average length of a
hash chain will be small.
As an example:
[rw@sable]~#perl -MDevel::Peek -e '%a = 'aaaa' .. 'zzzz'; Dump(\%a)'
SV = RV(0x817b824) at 0x817b818
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x8195d28
SV = PVHV(0x8180984) at 0x8195d28
REFCNT = 2
FLAGS = (SHAREKEYS)
ARRAY = 0xb6807008 (0:110767, 1:94524, 2:40904, 3:12427, 4:2864, 5:545, 6:99, 7:12, 8:2)
hash quality = 98.6%
KEYS = 228488
FILL = 151377
MAX = 262143
RITER = -1
EITER = 0x0
Elt "utei" HASH = 0x8ccc0001
SV = PV(0xaffa700) at 0xb70b0bf8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xb000720 "utej"\0
CUR = 4
LEN = 8
Elt "vqmw" HASH = 0xe2100002
SV = PV(0xb07d430) at 0xb70cf8c8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xb0843d0 "vqmx"\0
CUR = 4
LEN = 8
Elt "yehi" HASH = 0x8d680003
SV = PV(0xb22c828) at 0xb230360
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xb233fc0 "yehj"\0
CUR = 4
LEN = 8
There are 228,488 keys in the hash and the chain pointer table has
262,144 slots (MAX is the highest slot index, 262,143), which means the
average length of a chain should be about 0.87. However, only 151,377
table slots are in use, so the 'average chain length' for the 'locate
key which exists' case is about 1.5, and about 42% of the available
table slots are unused (the 'ARRAY =' line shows the actual distribution
of chain lengths).
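The figures above can be recomputed from the 'ARRAY =' line alone. The following is a small sketch of that arithmetic; the variable and subroutine names are made up for illustration, and the numbers are copied from the dump above.

```perl
use strict;
use warnings;

# Chain-length distribution copied from the ARRAY = line of the dump:
# chain length => number of table slots holding a chain of that length.
my %slots_by_length = (
    0 => 110767, 1 => 94524, 2 => 40904, 3 => 12427, 4 => 2864,
    5 => 545,    6 => 99,    7 => 12,    8 => 2,
);

my ($slots, $keys) = (0, 0);
for my $len (keys %slots_by_length) {
    $slots += $slots_by_length{$len};          # every slot, empty or not
    $keys  += $len * $slots_by_length{$len};   # keys stored in those slots
}
my $used = $slots - $slots_by_length{0};       # slots with at least one key

sub load_factor     { $keys / $slots }               # keys per table slot
sub avg_used_chain  { $keys / $used }                # keys per used slot
sub unused_fraction { $slots_by_length{0} / $slots } # share of empty slots

printf "slots=%d keys=%d used=%d\n", $slots, $keys, $used;
printf "keys per slot:      %.2f\n", load_factor();
printf "keys per used slot: %.2f\n", avg_used_chain();
printf "unused slots:       %.0f%%\n", 100 * unused_fraction();
```

The totals reproduce KEYS = 228488 and FILL = 151377 from the dump, and the slot count comes out as MAX + 1.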
Assuming the key set is small, there's also the problem that calculating
the hash value is usually more expensive than comparing two keys and
locating an item on a linked-list is more expensive than locating one in
an array.
The general idea behind 'using a hash' is that determining
exactly when this becomes better than doing linear searches in an array
isn't worth the effort for a small number of entries and that hash
lookups will very likely perform better than most linear searches as the
key set gets larger (at the expense of wasting a possibly 'large' amount
of memory).
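To make the two kinds of lookup being compared concrete, here is a minimal sketch. The key set and the names in_array and in_hash are assumptions chosen for illustration, and no timing claims are attached to it.

```perl
use strict;
use warnings;

my @words = ('aaaa' .. 'aazz');          # 676 strings as a small key set
my %index = map { $_ => 1 } @words;      # built once, in a single pass

# Linear search: grep inspects every element of the array on every call.
sub in_array {
    my ($key) = @_;
    return scalar grep { $_ eq $key } @words;
}

# Hash lookup: hash the key, then inspect one (usually short) chain.
sub in_hash {
    my ($key) = @_;
    return exists $index{$key} ? 1 : 0;
}

for my $key (qw(aamz zzzz)) {
    printf "%s: array=%d hash=%d\n", $key, in_array($key), in_hash($key);
}
```

Both functions give the same answers; the difference is only in how much of the data each one has to look at per call.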
------------------------------
Date: Tue, 18 Mar 2014 15:36:40 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87vbvbwmbb.fsf@sable.mobileactivedefense.com>
Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Mon, 17 Mar 2014 13:50:25 -0700, Jürgen Exner wrote:
>
>> John Black <johnblack@nospam.com> wrote:
>>
>>>Are you saying that a lookup for a key in a hash array does not have to
>>>search the hash for a key match in a similar way to how grep is
>>>searching an array?
>>
>> Yes. That's why it's called a hash in the first place. Accessing a hash
>> element is typically O(1).
>> Occasionally it may be a little bit worse, and in very extreme cases it
>> could even be O(n). But those extreme cases are usually artificial and
>> don't happen in real life, at least not with a reasonable implementation
>> of the hashing algorithm.
>
> And Perl has a very "reasonable" implementation; much effort has been put
> into that (if only to defeat DoS attacks).
This is similar to the rotating hash, but it actually mixes the
internal state. It takes 9n+9 instructions and produces a full
4-byte result. Preliminary analysis suggests there are no
funnels.
This hash was not in the original Dr. Dobb's article. I
implemented it to fill a set of requirements posed by Colin
Plumb. Colin ended up using an even simpler (and weaker) hash
that was sufficient for his purpose.
http://burtleburtle.net/bob/hash/doobs.html
[SCNR]
------------------------------
Date: Tue, 18 Mar 2014 17:32:46 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <20140318102446.784@kylheku.com>
On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
> I have a plain text file with each line in the format:
>
> Start of line followed immediately by a string of character(s), a
> whitespace, another string, a newline.
>
> -------- file.txt -------
>
> SOMESTRING XXX
> SOMESTRING ZZZ
> SOMEOTHERSTRING YYYZZ23
> DIFFERENTSTRING HELLO
This can be implemented using a very simple, clear one-liner in awk, right from
your shell prompt.
The lines marked <- are my tty input; the others are awk output:
$ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
A B <-
A B
A C <-
A D <-
D 1 <-
D 1
D 2 <-
D 3 <-
A Z <-
------------------------------
Date: Tue, 18 Mar 2014 17:38:08 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87iorbwgov.fsf@sable.mobileactivedefense.com>
Kaz Kylheku <kaz@kylheku.com> writes:
> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>> I have a plain text file with each line in the format:
>>
>> Start of line followed immediately by a string of character(s), a
>> whitespace, another string, a newline.
>>
>> -------- file.txt -------
>>
>> SOMESTRING XXX
>> SOMESTRING ZZZ
>> SOMEOTHERSTRING YYYZZ23
>> DIFFERENTSTRING HELLO
>
> This can be implemented using a very simple, clear on-liner in awk, right from
> your shell prompt.
>
> The lines marked <- are my tty input; the others are awk output:
>
> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'
------------------------------
Date: Tue, 18 Mar 2014 17:57:48 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <lga1es$39q$1@reader1.panix.com>
In article <87iorbwgov.fsf@sable.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>Kaz Kylheku <kaz@kylheku.com> writes:
>> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>>> I have a plain text file with each line in the format:
>>>
>>> Start of line followed immediately by a string of character(s), a
>>> whitespace, another string, a newline.
>>>
>>> -------- file.txt -------
>>>
>>> SOMESTRING XXX
>>> SOMESTRING ZZZ
>>> SOMEOTHERSTRING YYYZZ23
>>> DIFFERENTSTRING HELLO
>>
>> This can be implemented using a very simple, clear on-liner in awk, right from
>> your shell prompt.
>>
>> The lines marked <- are my tty input; the others are awk output:
>>
>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>
>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'
perl -ane 'print if !$seen{$F[0]}++'
perl -ane '$seen{$F[0]}++ || print'
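For anyone decoding these, the first variant can be written out in long form roughly as below. This is a sketch: first_by_field is a made-up name, the lines come from a list instead of STDIN, and the blank-line guard is an addition that the one-liner does not have.

```perl
use strict;
use warnings;

# Hypothetical long form of:  perl -ane 'print if !$seen{$F[0]}++'
sub first_by_field {
    my @lines = @_;
    my (%seen, @out);
    for (@lines) {                 # -n: loop over the input lines in $_
        my @F = split ' ';         # -a: split $_ on whitespace into @F
        next unless @F;            # guard against blank lines (added here)
        # Post-increment yields the old count: undef (false) on first sight.
        push @out, $_ if !$seen{$F[0]}++;
    }
    return @out;
}

print first_by_field("A B\n", "A C\n", "D 1\n", "D 2\n", "A Z\n");
```

The second variant relies on the same post-increment, using || so that print only runs when the old count was false.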
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Tue, 18 Mar 2014 18:12:24 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87eh1zwf3r.fsf@sable.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <87iorbwgov.fsf@sable.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>Kaz Kylheku <kaz@kylheku.com> writes:
>>> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>>>> I have a plain text file with each line in the format:
>>>>
>>>> Start of line followed immediately by a string of character(s), a
>>>> whitespace, another string, a newline.
>>>>
>>>> -------- file.txt -------
>>>>
>>>> SOMESTRING XXX
>>>> SOMESTRING ZZZ
>>>> SOMEOTHERSTRING YYYZZ23
>>>> DIFFERENTSTRING HELLO
>>>
>>> This can be implemented using a very simple, clear on-liner in awk, right from
>>> your shell prompt.
>>>
>>> The lines marked <- are my tty input; the others are awk output:
>>>
>>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>>
>>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'
>
> perl -ane 'print if !$seen{$F[0]}++'
>
> perl -ane '$seen{$F[0]}++ || print'
Well, there are quite a few more more-or-less bizarre variants, eg,
perl -ape '$_=$seen{$F[0]}++?"":$_'
or
perl -ape '$seen{$F[0]}++&&undef$_'
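Both variants rely on -p printing whatever is left in $_ after the code has run. A rough emulation of that loop, with made-up helper names and list input instead of STDIN, could look like this:

```perl
use strict;
use warnings;

# Hypothetical emulation of the implicit -p loop (with -a): run $body on
# each line, then append whatever is left in $_ to the output.
sub run_p_loop {
    my ($body, @lines) = @_;
    my $out = '';
    for my $line (@lines) {
        local $_ = $line;
        my @F = split ' ';     # the fields -a would put into @F
        $body->(@F);
        $out .= $_ // '';      # an undefined $_ contributes nothing
    }
    return $out;
}

# Variant 1: replace a repeated line by the empty string.
sub ternary_variant {
    my %seen;
    return sub { $_ = $seen{ $_[0] }++ ? "" : $_ };
}

# Variant 2: undefine $_ for a repeated line.
sub undef_variant {
    my %seen;
    return sub { $seen{ $_[0] }++ && undef $_ };
}

my @input = ("A B\n", "A C\n", "D 1\n", "D 2\n", "A Z\n");
print run_p_loop(ternary_variant(), @input);
print run_p_loop(undef_variant(),   @input);
```

In the sketch the first field arrives as $_[0] rather than $F[0]; otherwise the bodies are the same expressions as in the one-liners.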
------------------------------
Date: Tue, 18 Mar 2014 18:17:20 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <20140318105901.927@kylheku.com>
On 2014-03-18, Tim McDaniel <tmcd@panix.com> wrote:
> In article <87iorbwgov.fsf@sable.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>Kaz Kylheku <kaz@kylheku.com> writes:
>>> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>>>> I have a plain text file with each line in the format:
>>>>
>>>> Start of line followed immediately by a string of character(s), a
>>>> whitespace, another string, a newline.
>>>>
>>>> -------- file.txt -------
>>>>
>>>> SOMESTRING XXX
>>>> SOMESTRING ZZZ
>>>> SOMEOTHERSTRING YYYZZ23
>>>> DIFFERENTSTRING HELLO
>>>
>>> This can be implemented using a very simple, clear on-liner in awk, right from
>>> your shell prompt.
>>>
>>> The lines marked <- are my tty input; the others are awk output:
>>>
>>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>>
>>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'
>
> perl -ane 'print if !$seen{$F[0]}++'
>
> perl -ane '$seen{$F[0]}++ || print'
gawk '{if(!seen[$1]++)print}'
(GNU Awk has bignums, which takes care of the overflow.)
------------------------------
Date: Tue, 18 Mar 2014 18:39:09 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87a9cnwdv6.fsf@sable.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> tmcd@panix.com (Tim McDaniel) writes:
>> In article <87iorbwgov.fsf@sable.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>>Kaz Kylheku <kaz@kylheku.com> writes:
>>>> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>>>>> I have a plain text file with each line in the format:
>>>>>
>>>>> Start of line followed immediately by a string of character(s), a
>>>>> whitespace, another string, a newline.
>>>>>
>>>>> -------- file.txt -------
>>>>>
>>>>> SOMESTRING XXX
>>>>> SOMESTRING ZZZ
>>>>> SOMEOTHERSTRING YYYZZ23
>>>>> DIFFERENTSTRING HELLO
>>>>
>>>> This can be implemented using a very simple, clear on-liner in awk, right from
>>>> your shell prompt.
>>>>
>>>> The lines marked <- are my tty input; the others are awk output:
>>>>
>>>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>>>
>>>perl -ane 'print, $seen{$F[0]} = 1 unless $seen{$F[0]}'
>>
>> perl -ane 'print if !$seen{$F[0]}++'
>>
>> perl -ane '$seen{$F[0]}++ || print'
>
> Well, there are quite a few more more-or-less bizarre variants, eg,
>
> perl -ape '$_=$seen{$F[0]}++?"":$_'
>
> or
>
> perl -ape '$seen{$F[0]}++&&undef$_'
Come to think of it, no self-respecting quibbler would ever use a
hash named %seen. And this auto-split thing is also much too
straight-forward. So what about
perl -pe '$£{(/(\S+)/)[0]}++&&undef$_'
?
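The (/(\S+)/)[0] part is a list slice on a match in list context. A small sketch of just that piece, with first_token as a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: the first run of non-whitespace characters in a
# line, obtained by matching in list context and taking element [0].
sub first_token {
    my ($line) = @_;
    my $token = ($line =~ /(\S+)/)[0];   # undef when nothing matches
    return $token;
}

printf "[%s]\n", first_token("SOMESTRING XXX\n");
printf "[%s]\n", first_token("   DIFFERENTSTRING HELLO\n");
```

Because the pattern is not anchored, leading whitespace is skipped, which matches what the -a auto-split would have put into $F[0].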
------------------------------
Date: Tue, 18 Mar 2014 14:11:37 -0600
From: John Bokma <john@castleamber.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87k3brjmh2.fsf@castleamber.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> Come to think of it, no self-respecting quibbler would ever use a
> hash named %seen. And this auto-split thing is also much too
> straight-forward. So what about
>
> perl -pe '$£{(/(\S+)/)[0]}++&&undef$_'
Still too easy to read ;-)
--
John Bokma j3b
Blog: http://johnbokma.com/ Perl Consultancy: http://castleamber.com/
Perl for books: http://johnbokma.com/perl/help-in-exchange-for-books.html
------------------------------
Date: Tue, 18 Mar 2014 20:29:09 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <0.5e210075d963f7db3641.20140318202909GMT.87bnx345ey.fsf@bsb.me.uk>
Kaz Kylheku <kaz@kylheku.com> writes:
> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>> I have a plain text file with each line in the format:
>>
>> Start of line followed immediately by a string of character(s), a
>> whitespace, another string, a newline.
>>
>> -------- file.txt -------
>>
>> SOMESTRING XXX
>> SOMESTRING ZZZ
>> SOMEOTHERSTRING YYYZZ23
>> DIFFERENTSTRING HELLO
>
> This can be implemented using a very simple, clear on-liner in awk, right from
> your shell prompt.
>
> The lines marked <- are my tty input; the others are awk output:
>
> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
There are one-line Perl versions as well of course. Maybe
perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
<snip>
--
Ben.
------------------------------
Date: Tue, 18 Mar 2014 20:46:00 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <87wqfrutfb.fsf@sable.mobileactivedefense.com>
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Kaz Kylheku <kaz@kylheku.com> writes:
>
>> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>>> I have a plain text file with each line in the format:
>>>
>>> Start of line followed immediately by a string of character(s), a
>>> whitespace, another string, a newline.
>>>
>>> -------- file.txt -------
>>>
>>> SOMESTRING XXX
>>> SOMESTRING ZZZ
>>> SOMEOTHERSTRING YYYZZ23
>>> DIFFERENTSTRING HELLO
>>
>> This can be implemented using a very simple, clear on-liner in awk, right from
>> your shell prompt.
>>
>> The lines marked <- are my tty input; the others are awk output:
>>
>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>
> There are one-line Perl versions as well of course. Maybe
>
> perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
That's a neat idea. Obvious extension of that:
perl -ane '$seen{$F[0]} //= print'
------------------------------
Date: Tue, 18 Mar 2014 20:54:59 -0500
From: John Black <johnblack@nospam.com>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <MPG.2d92b4ea5a9c92b09897cf@news.eternal-september.org>
In article <87wqfrutfb.fsf@sable.mobileactivedefense.com>, rweikusat@mobileactivedefense.com
says...
>
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> > Kaz Kylheku <kaz@kylheku.com> writes:
> >
> >> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
> >>> I have a plain text file with each line in the format:
> >>>
> >>> Start of line followed immediately by a string of character(s), a
> >>> whitespace, another string, a newline.
> >>>
> >>> -------- file.txt -------
> >>>
> >>> SOMESTRING XXX
> >>> SOMESTRING ZZZ
> >>> SOMEOTHERSTRING YYYZZ23
> >>> DIFFERENTSTRING HELLO
> >>
> >> This can be implemented using a very simple, clear on-liner in awk, right from
> >> your shell prompt.
> >>
> >> The lines marked <- are my tty input; the others are awk output:
> >>
> >> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
> >
> > There are one-line Perl versions as well of course. Maybe
> >
> > perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
>
> That's a neat idea. Obvious extension of that:
>
> perl -ane '$seen{$F[0]} //= print'
I've written many utilities and tools in Perl and I don't understand these one-liners at
all...
John Black
------------------------------
Date: Wed, 19 Mar 2014 04:54:30 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <lgb7u6$r41$1@reader1.panix.com>
In article <MPG.2d92b4ea5a9c92b09897cf@news.eternal-september.org>,
John Black <johnblack@nospam.com> wrote:
>In article <87wqfrutfb.fsf@sable.mobileactivedefense.com>,
>rweikusat@mobileactivedefense.com
>says...
>>
>> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> > Kaz Kylheku <kaz@kylheku.com> writes:
>> >
>> >> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>> >>> I have a plain text file with each line in the format:
>> >>>
>> >>> Start of line followed immediately by a string of character(s), a
>> >>> whitespace, another string, a newline.
>> >>>
>> >>> -------- file.txt -------
>> >>>
>> >>> SOMESTRING XXX
>> >>> SOMESTRING ZZZ
>> >>> SOMEOTHERSTRING YYYZZ23
>> >>> DIFFERENTSTRING HELLO
>> >>
>> >> This can be implemented using a very simple, clear on-liner in
>> >> awk, right from your shell prompt.
>> >>
>> >> The lines marked <- are my tty input; the others are awk output:
>> >>
>> >> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>> >
>> > There are one-line Perl versions as well of course. Maybe
>> >
>> > perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
>>
>> That's a neat idea. Obvious extension of that:
>>
>> perl -ane '$seen{$F[0]} //= print'
>
>I've written many untilities and tools in Perl and I don't understand
>these one liners at all...
I can't speak for other people's motives, but for me, I tend to see
Perl one-liners as humor, in most cases. Occasionally there's a
clever technique that's useful and maintainable, and of course I'm
excluding simple education in powerful features that I didn't know
about (e.g., s///r is relatively new). But in most cases, I consider
it to be humor, and of course showing off one's Perl l33t skyllZ (or
however the hep cats express it to-day).
But I suggest that you try to decrypt them, just as a learning
exercise. If there are specific points that still confuse you, please
ask about them here.
"man perlrun" on most systems explains the command line. "perl -p -e"
is something I use occasionally; "perl -pi -e" even less often; I've
never had to use "perl -a".
This subthread has used the fact that "print" is a function that
returns true if the printing succeeded, which it really ought to do.
"perldoc -f print" should give you its docco.
"//" is a newish operator: "man perlop". "||" would have worked just
as well in this case, I think -- the return values of print on my
system appear to be 1 and undef.
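As a sketch of how the "//= print" trick works: //= assigns only when the left side is undefined, whereas ||= assigns whenever it is false. The helper name and the in-memory filehandle below are assumptions made so the output can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: apply  $seen{key} //= print ...  to a list of lines,
# printing to an in-memory filehandle so the result can be examined.
sub dedupe_with_defined_or {
    my @lines = @_;
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    my %seen;
    for my $line (@lines) {
        my ($key) = split ' ', $line;
        # print runs only while $seen{$key} is still undefined; its true
        # return value then marks the key as seen.
        $seen{$key} //= print {$fh} $line;
    }
    close $fh;
    return ($out, \%seen);
}

my ($text, $seen) = dedupe_with_defined_or("A B\n", "A C\n", "D 1\n");
print $text;
```

Since a successful print returns a true value, ||= behaves the same way here; the two would differ only if the stored value were defined but false.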
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Wed, 19 Mar 2014 08:04:26 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <q4snva-loh.ln1@news.rtij.nl>
On Tue, 18 Mar 2014 15:36:40 +0000, Rainer Weikusat wrote:
> This is similar to the rotating hash, but it actually mixes the
> internal state. It takes 9n+9 instructions and produces a full 4-byte
> result. Preliminary analysis suggests there are no funnels.
>
> This hash was not in the original Dr. Dobb's article. I implemented it
> to fill a set of requirements posed by Colin Plumb. Colin ended up
> using an even simpler (and weaker) hash that was sufficient for his
> purpose.
>
> http://burtleburtle.net/bob/hash/doobs.html
Interesting read. Thx.
M4
------------------------------
Date: Wed, 19 Mar 2014 08:21:22 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <i4tnva-loh.ln1@news.rtij.nl>
On Tue, 18 Mar 2014 20:46:00 +0000, Rainer Weikusat wrote:
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Kaz Kylheku <kaz@kylheku.com> writes:
>>
>>> On 2014-03-17, Tuxedo <tuxedo@mailinator.com> wrote:
>>>> I have a plain text file with each line in the format:
>>>>
>>>> Start of line followed immediately by a string of character(s), a
>>>> whitespace, another string, a newline.
>>>>
>>>> -------- file.txt -------
>>>>
>>>> SOMESTRING XXX
>>>> SOMESTRING ZZZ
>>>> SOMEOTHERSTRING YYYZZ23
>>>> DIFFERENTSTRING HELLO
>>>
>>> This can be implemented using a very simple, clear one-liner in awk,
>>> right from your shell prompt.
>>>
>>> The lines marked <- are my tty input; the others are awk output:
>>>
>>> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>>
>> There are one-line Perl versions as well of course. Maybe
>>
>> perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
>
> That's a neat idea. Obvious extension of that:
>
> perl -ane '$seen{$F[0]} //= print'
As someone already said, %seen is boring.
perl -ane '$_{$F[0]} //= print'
M4
------------------------------
Date: Wed, 19 Mar 2014 08:26:02 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <adtnva-loh.ln1@news.rtij.nl>
On Wed, 19 Mar 2014 04:54:30 +0000, Tim McDaniel wrote:
> I can't speak for other people's motives, but for me, I tend to see Perl
> one-liners as humor, in most cases. Occasionally there's a clever
> technique that's useful and maintainable, and of course I'm excluding
> simple education in powerful features that I didn't know about (e.g.,
> s///r is relatively new). But in most cases, I consider it to be humor,
> and of course showing off one's Perl l33t skyllZ (or however the hep
> cats express it to-day).
In this group, yes. A way to show off your skill in a humorous way.
But I use Perl one-liners all the time to get stuff done and this last
one-liner really is a neat way to achieve the asked result -- a
relatively common requirement.
Truth to tell, I would be more likely to do
awk '{print$1}' | sort -u
but that is beside the point. One-liners are often very useful and I use
them all the time. Especially -i (in-place editing) can be very useful to
do the same operation on many files at once.
>
> "man perlrun" on most systems explains the command line. "perl -p -e"
> is something I use occasionally; "perl -pi -e" even less often; I've never
> had to use "perl -a".
I use it often, but even more often awk is better if you use -a.
M4
------------------------------
Date: Wed, 19 Mar 2014 09:20:45 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Removing lines containing same first string boundaries?
Message-Id: <slrnliihad.mns.whynot@orphan.zombinet>
with <MPG.2d92b4ea5a9c92b09897cf@news.eternal-september.org> John Black wrote:
> In article <87wqfrutfb.fsf@sable.mobileactivedefense.com>,
> rweikusat@mobileactivedefense.com says...
>> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> > Kaz Kylheku <kaz@kylheku.com> writes:
*SKIP*
>> >> $ awk '{ if (! ($1 in seen)) { print $0 ; seen[$1] } }'
>> > There are one-line Perl versions as well of course. Maybe
>> > perl -ane '$seen{$F[0]} = print unless $seen{$F[0]}'
>> That's a neat idea. Obvious extension of that:
>> perl -ane '$seen{$F[0]} //= print'
> I've written many untilities and tools in Perl and I don't understand
> these one liners at all...
I didn't either. Then I learned that I have to practice the skills
before I can understand them.
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4176
***************************************