
Perl-Users Digest, Issue: 2928 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun May 2 00:09:29 2010

Date: Sat, 1 May 2010 21:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 1 May 2010     Volume: 11 Number: 2928

Today's topics:
        gui tk rich text box how can I design one? <robin1@cnsp.com>
    Re: gui tk rich text box how can I design one? <sreservoir@gmail.com>
    Re: gui tk rich text box how can I design one? <ben@morrow.me.uk>
        One liner to remove duplicate records <nickli2000@gmail.com>
    Re: One liner to remove duplicate records sln@netherlands.com
    Re: One liner to remove duplicate records <yankeeinexile@gmail.com>
    Re: One liner to remove duplicate records <john@castleamber.com>
    Re: One liner to remove duplicate records sln@netherlands.com
    Re: One liner to remove duplicate records <tadmc@seesig.invalid>
    Re: One liner to remove duplicate records <rvtol+usenet@xs4all.nl>
    Re: One liner to remove duplicate records <jurgenex@hotmail.com>
    Re: Overriding a require'd module's subroutine <rodent@gmail.com>
    Re: Overriding a require'd module's subroutine <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 1 May 2010 14:20:51 -0700 (PDT)
From: Robin <robin1@cnsp.com>
Subject: gui tk rich text box how can I design one?
Message-Id: <eb142422-b2fe-4abc-bcfd-c003c78a93df@i9g2000yqi.googlegroups.com>

Does anyone have suggestions about how I could design my own tk rich
text box widget? Is this hard.... I imagine it is? Thanks,
-Robin


------------------------------

Date: Sat, 01 May 2010 20:23:35 -0400
From: sreservoir <sreservoir@gmail.com>
Subject: Re: gui tk rich text box how can I design one?
Message-Id: <hrigmj$pma$2@speranza.aioe.org>

On 5/1/2010 5:20 PM, Robin wrote:
> Does anyone have suggestions about how I could design my own tk rich
> text box widget? Is this hard.... I imagine it is? Thanks,

use Tk;

-- 

  "Six by nine. Forty two."
  "That's it. That's all there is."
  "I always thought something was fundamentally wrong with the universe."


------------------------------

Date: Sun, 2 May 2010 02:05:54 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: gui tk rich text box how can I design one?
Message-Id: <iovua7-or4.ln1@osiris.mauzo.dyndns.org>


Quoth Robin <robin1@cnsp.com>:
> Does anyone have suggestions about how I could design my own tk rich
> text box widget? Is this hard.... I imagine it is? Thanks,

It's probably not as hard as you might think. Start by reading the
sections "TAGS" and "THE SELECTION" in Tk::Text. (You will also need a
decent understanding of the X selection model, since Tk uses this model
even under other window systems:
http://www.jwz.org/doc/x-cut-and-paste.html explains it all pretty
well.)
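
To give a concrete flavour of the tag mechanism those sections describe, here is a minimal sketch (the 'bold' tag name and the button layout are illustrative choices, not anything Tk::Text prescribes):

```perl
use strict;
use warnings;
use Tk;

my $mw   = MainWindow->new;
my $text = $mw->Text(-wrap => 'word')->pack(-fill => 'both', -expand => 1);

# A tag bundles display options; applying it to an index range styles
# just that range, which is the core of a rich text widget.
$text->tagConfigure('bold', -font => [qw/Helvetica 12 bold/]);

$mw->Button(
    -text    => 'Bold',
    -command => sub {
        # tagRanges('sel') returns the selection's start/end indices,
        # or an empty list if nothing is selected.
        my @sel = $text->tagRanges('sel');
        $text->tagAdd('bold', @sel) if @sel;
    },
)->pack;

MainLoop;
```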

Ben



------------------------------

Date: Fri, 30 Apr 2010 08:55:12 -0700 (PDT)
From: Ninja Li <nickli2000@gmail.com>
Subject: One liner to remove duplicate records
Message-Id: <dee3aec6-1ec4-4fb3-9d35-c24887a3ce60@j27g2000vbp.googlegroups.com>

Hi,

I have a file with the following sample data delimited by "|" with
duplicate records:

20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|Ashley Cole|1.09|1.08|
20100430|20100429|Bill Thompson|0.76|0.78|
20100429|20100428|Time Apache|2.10|2.24|

The first three fields "date_1", "date_2" and "name" are unique
identifiers of a record.

Is there a simple way, like a one liner to remove the duplicates such
as with "John Smith"?

Thanks in advance.

Nick Li


------------------------------

Date: Fri, 30 Apr 2010 09:06:52 -0700
From: sln@netherlands.com
Subject: Re: One liner to remove duplicate records
Message-Id: <nuvlt599fnqsoa9rjri3mjip47santrhkr@4ax.com>

On Fri, 30 Apr 2010 08:55:12 -0700 (PDT), Ninja Li <nickli2000@gmail.com> wrote:

>Hi,
>
>I have a file with the following sample data delimited by "|" with
>duplicate records:
>
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|Ashley Cole|1.09|1.08|
>20100430|20100429|Bill Thompson|0.76|0.78|
>20100429|20100428|Time Apache|2.10|2.24|
>
>The first three fields "date_1", "date_2" and "name" are unique
>identifiers of a record.
>
>Is there a simple way, like a one liner to remove the duplicates such
>as with "John Smith"?
>
>Thanks in advance.
>
>Nick Li

I could think of a way, but it takes 2 lines, sorry.
-sln


------------------------------

Date: Fri, 30 Apr 2010 11:23:12 -0500
From: Lawrence Statton/XE1-N1GAK <yankeeinexile@gmail.com>
Subject: Re: One liner to remove duplicate records
Message-Id: <yankeeinexile-D103CA.11231230042010@eastus.newsfusion.net>

In article 
<dee3aec6-1ec4-4fb3-9d35-c24887a3ce60@j27g2000vbp.googlegroups.com>,
 Ninja Li <nickli2000@gmail.com> wrote:

> Hi,
> 
> I have a file with the following sample data delimited by "|" with
> duplicate records:
> 
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|Ashley Cole|1.09|1.08|
> 20100430|20100429|Bill Thompson|0.76|0.78|
> 20100429|20100428|Time Apache|2.10|2.24|
> 
> The first three fields "date_1", "date_2" and "name" are unique
> identifiers of a record.
> 
> Is there a simple way, like a one liner to remove the duplicates such
> as with "John Smith"?
> 
> Thanks in advance.
> 
> Nick Li

Yes.  I've split the one line into many just so it fits into
72 columns.

You didn't state what to do if the lines with duplicate primary keys 
were not exact duplicates (e.g. what would you like to happen if there 
were another line at the end of the file 

20100430|20100429|John Smith|99.99|99.99

?)

#!/usr/bin/perl
use strict;
use warnings;

my %seen;

print for
  map { join '|',( @$_, "\n") } 
  grep !$seen{$_->[0].$_->[1].$_->[2]}++,  
  map +[split /\|/,$_],
  map {chomp; $_}
  <DATA>;



__DATA__
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|Ashley Cole|1.09|1.08|
20100430|20100429|Bill Thompson|0.76|0.78|
20100429|20100428|Time Apache|2.10|2.24|


------------------------------

Date: Fri, 30 Apr 2010 11:28:37 -0500
From: John Bokma <john@castleamber.com>
Subject: Re: One liner to remove duplicate records
Message-Id: <877hnonbuy.fsf@castleamber.com>

Ninja Li <nickli2000@gmail.com> writes:

> Hi,
>
> I have a file with the following sample data delimited by "|" with
> duplicate records:
>
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|Ashley Cole|1.09|1.08|
> 20100430|20100429|Bill Thompson|0.76|0.78|
> 20100429|20100428|Time Apache|2.10|2.24|
>
> The first three fields "date_1", "date_2" and "name" are unique
> identifiers of a record.
>
> Is there a simple way, like a one liner to remove the duplicates such
> as with "John Smith"?

Yes.

But have you tried to write a multi-line Perl program first? Moving from
a working Perl program to a one-liner might be easier than starting
straight with the one-liner.

Also read up on what the various options of perl do.
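
For instance, a multi-line starting point for the de-duplication problem might look like this (a sketch; the input file is whatever you pass on the command line):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the first record seen for each (date_1, date_2, name) key.
my %seen;
while (my $line = <>) {
    my @fields = split /\|/, $line;
    my $key    = join '|', @fields[0 .. 2];
    print $line unless $seen{$key}++;
}
```

Once this works, perl's -n and -a switches (see perlrun) let you collapse
the loop and the field splitting into a one-liner.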

-- 
John Bokma                                                               j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development


------------------------------

Date: Fri, 30 Apr 2010 10:01:41 -0700
From: sln@netherlands.com
Subject: Re: One liner to remove duplicate records
Message-Id: <r53mt5p1amsu1cn4jk8bjhchsssje31vme@4ax.com>

On Fri, 30 Apr 2010 09:06:52 -0700, sln@netherlands.com wrote:

>On Fri, 30 Apr 2010 08:55:12 -0700 (PDT), Ninja Li <nickli2000@gmail.com> wrote:
>
>>Hi,
>>
>>I have a file with the following sample data delimited by "|" with
>>duplicate records:
>>
>>20100430|20100429|John Smith|-0.07|-0.08|
>>20100430|20100429|John Smith|-0.07|-0.08|
>>20100430|20100429|Ashley Cole|1.09|1.08|
>>20100430|20100429|Bill Thompson|0.76|0.78|
>>20100429|20100428|Time Apache|2.10|2.24|
>>
>>The first three fields "date_1", "date_2" and "name" are unique
>>identifiers of a record.
>>
>>Is there a simple way, like a one liner to remove the duplicates such
>>as with "John Smith"?
>>
>>Thanks in advance.
>>
>>Nick Li
>
>I could think of a way, but it takes 2 lines, sorry.

Wait, this might work.

c:\temp>perl -a -F"\|" -n -e "/^$/ and next or !exists $hash{$key = join '',@F[0..2]} and ++$hash{$key} and print" file.txt
20100430|20100429|John Smith|-0.07|-0.08|
20100430|20100429|Ashley Cole|1.09|1.08|
20100430|20100429|Bill Thompson|0.76|0.78|
20100429|20100428|Time Apache|2.10|2.24|

c:\temp>

-sln


------------------------------

Date: Fri, 30 Apr 2010 12:11:34 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: One liner to remove duplicate records
Message-Id: <slrnhtm3hk.bhl.tadmc@tadbox.sbcglobal.net>

Ninja Li <nickli2000@gmail.com> wrote:
> Hi,
>
> I have a file with the following sample data delimited by "|" with
> duplicate records:
>
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|Ashley Cole|1.09|1.08|
> 20100430|20100429|Bill Thompson|0.76|0.78|
> 20100429|20100428|Time Apache|2.10|2.24|
>
> The first three fields "date_1", "date_2" and "name" are unique
> identifiers of a record.
>
> Is there a simple way, like a one liner to remove the duplicates such
> as with "John Smith"?


No, but there is a complex way like a one liner:

    perl -ni -e 'print unless @seen{/([^|]*\|[^|]*\|[^|]*\|)/}++' file

One-liners are very often hideous things that no Real Programmer
would actually use, but you asked for it...


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.


------------------------------

Date: Fri, 30 Apr 2010 21:14:10 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: One liner to remove duplicate records
Message-Id: <4bdb2c02$0$22919$e4fe514c@news.xs4all.nl>

Ninja Li wrote:

> I have a file with the following sample data delimited by "|" with
> duplicate records:
> 
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|John Smith|-0.07|-0.08|
> 20100430|20100429|Ashley Cole|1.09|1.08|
> 20100430|20100429|Bill Thompson|0.76|0.78|
> 20100429|20100428|Time Apache|2.10|2.24|
> 
> The first three fields "date_1", "date_2" and "name" are unique
> identifiers of a record.
> 
> Is there a simple way, like a one liner to remove the duplicates such
> as with "John Smith"?

If the data is as strict as presented, you can use

     sort -u <input

     sort <input |uniq

or simply use the whole line as a hash key:

     perl -wne'$_{$_}++ or print' <input

(the first underscore is not really necessary)

-- 
Ruud


------------------------------

Date: Fri, 30 Apr 2010 18:56:16 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: One liner to remove duplicate records
Message-Id: <lf2nt5lltmul0j54iivhdcv3qqp5egp31u@4ax.com>

Ninja Li <nickli2000@gmail.com> wrote:
>I have a file with the following sample data delimited by "|" with
>duplicate records:
>
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|John Smith|-0.07|-0.08|
>20100430|20100429|Ashley Cole|1.09|1.08|
>20100430|20100429|Bill Thompson|0.76|0.78|
>20100429|20100428|Time Apache|2.10|2.24|
>
>The first three fields "date_1", "date_2" and "name" are unique
>identifiers of a record.
>
>Is there a simple way, like a one liner to remove the duplicates such
>as with "John Smith"?

Your data is sorted already, so a simple call to 'uniq' will do the job:
http://en.wikipedia.org/wiki/Uniq

jue


------------------------------

Date: Fri, 30 Apr 2010 16:05:32 -0700 (PDT)
From: Ratty <rodent@gmail.com>
Subject: Re: Overriding a require'd module's subroutine
Message-Id: <d1e06d8d-cb77-41b0-a9a1-5c9031f13b1b@a2g2000prd.googlegroups.com>

On Apr 29, 10:23 am, "Mumia W." <paduille.4061.mumia.w
+nos...@earthlink.net> wrote:
> On 04/27/2010 03:47 PM, Ratty wrote:
>
>
>
> > I'm using the MARC::Batch module. It refuses to process records with
> > character encoding issues. It dies with a warning about line 166 in
> > Encode.pm. I can use eval to make it skip bad records instead but I
> > don't want that either. I want it to do the best it can. What I need
> > to do is modify Encode::decode so it does not die when it can't decode
> > a string. This works if I add an eval to Encode.pm sitting in my perl/
> > lib directory. But I don't want to do it this way. I want everything
> > restricted to my one calling program. I seem to remember doing
> > something similar years ago by simply copying the subroutine, with
> > package name, into my program and my program would use that instead.
> > But it doesn't work for me this time. No errors, it simply ignores it.
> > Perhaps because I'm not calling the module directly this time, but
> > rather it is being called somewhere in the bowels of MARC::Record,
> > which I'm also not calling directly.
>
> > use MARC::Batch;
>
> > ## Programming here
>
> > ## Attempt to override
> > sub Encode::decode($$;$)
> > {
> >     my ($name,$octets,$check) = @_;
> >     return undef unless defined $octets;
> >     $octets .= '' if ref $octets;
> >     $check ||=0;
> >     my $enc = find_encoding($name);
> >     unless(defined $enc){
> >    require Carp;
> >    Carp::croak("Unknown encoding '$name'");
> >     }
> >     ## Add eval heres
> >     my $string;
> >     eval { $string = $enc->decode($octets,$check); };
> >     $_[1] = $octets if $check and !($check & LEAVE_SRC());
> >     return $string;
> > }
>
> > What's the most elegant way to redefine somebody else's subroutine?
> > BTW, I also tried:
>
> > {
> > local *Encode::decode = \&myDecode;
> > }
>
> > Doesn't work either
>
> I can't see why that doesn't work, but I suggest placing your new
> Encode::decode before the use statement for MARC::Batch. Perhaps
> MARC::Batch stores a reference to the subroutine.

OK, got it to work - don't understand why though.

I tried just pasting the entire contents of Encode.pm on to the end
of my program and that worked. Then I started removing things until it
stopped. Found out it requires a use statement in there. As in:

package Encode;

use Encode::Alias;

sub decode($$;$)
{
    my ($name,$octets,$check) = @_;
    my $altstring = $octets;
    return undef unless defined $octets;
    $octets .= '' if ref $octets;
    $check ||=0;
    my $enc = find_encoding($name);
    unless(defined $enc){
    require Carp;
    Carp::croak("Unknown encoding '$name'");
    }
    my $string;
    eval { $string = $enc->decode($octets,$check); };
    $_[1] = $octets if $check and !($check & LEAVE_SRC());
    if ($@) {
        return $altstring;
    } else {
        return $string;
    }
}

Doesn't work without it. I wonder why.



------------------------------

Date: Sat, 1 May 2010 01:05:25 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Overriding a require'd module's subroutine
Message-Id: <5r7sa7-19s2.ln1@osiris.mauzo.dyndns.org>


Quoth Ratty <rodent@gmail.com>:
> On Apr 29, 10:23 am, "Mumia W." <paduille.4061.mumia.w
> +nos...@earthlink.net> wrote:
> > On 04/27/2010 03:47 PM, Ratty wrote:
> >
> > > I'm using the MARC::Batch module. It refuses to process records with
> > > character encoding issues. It dies with a warning about line 166 in
> > > Encode.pm. I can use eval to make it skip bad records instead but I
> > > don't want that either. I want it to do the best it can. What I need
> > > to do is modify Encode::decode so it does not die when it can't decode
> > > a string.

I presume MARC::Batch is calling Encode::decode with a CHECK argument
(the optional third argument) of FB_CROAK? Is there any way to tell it
not to do that?

> > > This works if I add an eval to Encode.pm sitting in my perl/
> > > lib directory. But I don't want to do it this way.

Good. That would be an extremely bad idea.

> > > I want everything
> > > restricted to my one calling program. I seem to remember doing
> > > something similar years ago by simply copying the subroutine, with
> > > package name, into my program and my program would use that instead.
> > > But it doesn't work for me this time. No errors, it simply ignores it.
> > > Perhaps because I'm not calling the module directly this time, but
> > > rather it is being called somewhere in the bowels of MARC::Record,
> > > which I'm also not calling directly.
> >
> > > use MARC::Batch;
> >
> > > ## Programming here
> >
> > > ## Attempt to override
> > > sub Encode::decode($$;$)
> > > {
> > >     my ($name,$octets,$check) = @_;
> > >     return undef unless defined $octets;
> > >     $octets .= '' if ref $octets;
> > >     $check ||=0;
> > >     my $enc = find_encoding($name);
> > >     unless(defined $enc){
> > >    require Carp;
> > >    Carp::croak("Unknown encoding '$name'");
> > >     }
> > >     ## Add eval heres
> > >     my $string;
> > >     eval { $string = $enc->decode($octets,$check); };
> > >     $_[1] = $octets if $check and !($check & LEAVE_SRC());
> > >     return $string;
> > > }
> >
> > > What's the most elegant way to redefine somebody else's subroutine?
> > > BTW, I also tried:
> >
> > > {
> > > local *Encode::decode = \&myDecode;
> > > }
> >
> > > Doesn't work either
> >
> > I can't see why that doesn't work, but I suggest placing your new
> > Encode::decode before the use statement for MARC::Batch. Perhaps
> > MARC::Batch stores a reference to the subroutine.

If MARC::Batch imports Encode::decode into its own namespace, and calls
it unqualified, then the imported sub will be a ref to the sub that
happened to be in *Encode::decode at the time of the import. There are
two ways to handle this: either temporarily redefine &Encode::decode
while MARC::Batch is loaded, like this:

    use Encode;

    BEGIN {
        local *Encode::decode = \&my_decode;
        require MARC::Batch;
        MARC::Batch->import(...);
    }

(pass the same arguments to 'import' as you previously passed to 'use');
or redefine &MARC::Batch::decode *after* it's loaded, like this:

    use MARC::Batch ...;

    # later

    {
        local *MARC::Batch::decode = \&my_decode;

        # call functions that might die
    }

which has the advantage of only overriding MARC::Batch's behaviour when
you need to.

> OK, got it to work - don't understand why though.
> 
> I tried just pasting the entire contents of Encode.pm on to the end
> of my program and that worked. Then I started removing things until it
> stopped. Found out it requires a use statement in there. As in:
> 
> package Encode;
> 
> use Encode::Alias;
> 
> sub decode($$;$)
> {
>     my ($name,$octets,$check) = @_;
>     my $altstring = $octets;
>     return undef unless defined $octets;
>     $octets .= '' if ref $octets;
>     $check ||=0;
>     my $enc = find_encoding($name);

This function is defined in Encode.pm. If you don't load Encode.pm, it
won't be defined, so you need to 'use Encode' first.

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2928
***************************************

