[30255] in Perl-Users-Digest


Perl-Users Digest, Issue: 1498 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu May 1 14:09:50 2008

Date: Thu, 1 May 2008 11:09:16 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 1 May 2008     Volume: 11 Number: 1498

Today's topics:
    Re: cperl-mode.el <nospam-abuse@ilyaz.org>
    Re: Frequency in large datasets <simon.chao@fmr.com>
    Re: Frequency in large datasets <1usa@llenroc.ude.invalid>
    Re: Frequency in large datasets <syscjm@sumire.gwu.edu>
    Re: Frequency in large datasets <syscjm@sumire.gwu.edu>
    Re: Frequency in large datasets <syscjm@sumire.gwu.edu>
    Re: Help: Replace Help <RedGrittyBrick@SpamWeary.foo>
    Re: Help: Replace Help <jurgenex@hotmail.com>
    Re: Help: Replace Help <1usa@llenroc.ude.invalid>
    Re: Help: Replace Help <jo@nosp.invalid>
    Re: pop langs website ranking <jon@ffconsultancy.com>
    Re: Read 20 lines when pressing n for next <spamtrap@dot-app.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 1 May 2008 16:23:48 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: cperl-mode.el
Message-Id: <fvcqqk$27u1$1@agate.berkeley.edu>

[A complimentary Cc of this posting was sent to
Ben Bullock
<benkasminbullock@gmail.com>], who wrote in article <fvc4uf$pa5$1@ml.accsnet.ne.jp>:
> > Since when does updating a copyright message count as maintenance?

> Obviously it doesn't, but the file at least appears to be newer than the one 
> in the Perl source code.

As I said, the in-Perl version is more obsolete.  On the other hand, it
causes far fewer problems.

> >>, which works very well - it isn't broken.
> >
> > Says who?  Do you monitor for bug reports, and how they are "fixed"?

> I'm only describing my own experience. It isn't "broken" in the sense of 
> being unusable. I'm writing Perl code almost every day & using the 
> emacs-source cperl-mode.el, and have yet to notice any serious bugs.

Given the bugs people report with Emacs' version (ALL of which are
fixed by switching to the "genuine" one), you must have a pretty high
tolerance for bugs...

> The only thing which I've noticed is that it has an annoying habit
> of instantly complaining if it can't find the end of a here string
> or regular expression, which is kind of silly since usually one
> doesn't write the end of the thing immediately after writing the
> beginning of it.

True.  Do you have any idea how I would be able to detect this
situation?  Point after start of construct and modified, or what?

Can one detect that the ELisp code is called from an async handler?
 
> >> What it looks like is a fork. But the above comment about
> >> cperl-mode.el in the Emacs tree being out of date and broken is
> >> itself now out of date.
> >
> > I have no idea what you want to say here...

> What I want to say I thought was clear, but apparently not. I raised the 
> issue of cperl-mode.el due to the FAQ question apparently being out of date. 
> If you have a different story, perhaps you can spell it out for us ignorant 
> people.

I think I did.

> Anyway, you're the author of cperl-mode.el, something probably not many 
> people noticed from reading the discussion, so thank you for making this 
> mode, which is useful to me.

Noted, and very appreciated.

Yours,
Ilya


------------------------------

Date: Thu, 1 May 2008 08:54:14 -0700 (PDT)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: Frequency in large datasets
Message-Id: <1aa7f96f-7458-4d3a-9c1c-ff437dfeec41@b64g2000hsa.googlegroups.com>

On May 1, 7:26 am, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
> benkasminbull...@gmail.com (Ben Bullock) wrote in news:fvbj3s$l7u$1@ml.accsnet.ne.jp:
>
> > A. Sinan Unur <1...@llenroc.ude.invalid> wrote:
>
> Hope this helps you become more comfortable with the notion that reading
> a 47 GB file is a boneheaded move. It is boneheaded if I do it, if Larry
> Wall does it, if Superman does it ... you get the picture I hope.
>

I don't think it would be boneheaded if Superman did it...I mean, he's
SUPERMAN.


------------------------------

Date: Thu, 01 May 2008 15:57:45 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Frequency in large datasets
Message-Id: <Xns9A9179AE0803Dasu1cornelledu@127.0.0.1>

nolo contendere <simon.chao@fmr.com> wrote in
news:1aa7f96f-7458-4d3a-9c1c-ff437dfeec41@b64g2000hsa.googlegroups.com: 

> On May 1, 7:26 am, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>> benkasminbull...@gmail.com (Ben Bullock) wrote
>> in news:fvbj3s$l7u$1@ml.accsnet.ne.jp:
>>
>> > A. Sinan Unur <1...@llenroc.ude.invalid> wrote:
>>
>> Hope this helps you become more comfortable with the notion that
>> reading a 47 GB file is a boneheaded move. It is boneheaded if I do
>> it, if Larry Wall does it, if Superman does it ... you get the
>> picture I hope. 
>>
> 
> I don't think it would be boneheaded if Superman did it...I mean, he's
> SUPERMAN.

But attempting to slurp a 47 GB file is the equivalent of having a 
kryptonite slurpee in the morning.

Not good.

;-)

Sinan

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: Thu, 01 May 2008 11:40:54 -0500
From: Chris Mattern <syscjm@sumire.gwu.edu>
Subject: Re: Frequency in large datasets
Message-Id: <slrng1jskm.ss8.syscjm@sumire.gwu.edu>

On 2008-05-01, Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
> Cosmic Cruizer wrote:
>> I've been able to reduce my dataset by 75%, but it still leaves me with a 
>> file of 47 gigs. I'm trying to find the frequency of each line using:
>> 
>>   open(TEMP, "< $tempfile") || die "cannot open file $tempfile: $!";
>>   foreach (<TEMP>) {
>>     $seen{$_}++;
>>   }
>>   close(TEMP) || die "cannot close file $tempfile: $!";
>> 
>> My program keeps aborting after a few minutes because the computer runs out 
>> of memory.
>
> This line:
>
>>     foreach (<TEMP>) {
>
> reads the whole file into memory. You should read the file line by line 
> instead by replacing it with:
>
>      while (<TEMP>) {
>
Which still leaves him with a hash that keeps each unique line in the file
as a separate key.  Betcha it doesn't fit.  Basic UNIX utilities can do
this, though I will admit I can't guarantee that sort can handle something
this big:

sort tempfile | uniq -c
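
For comparison, the line-by-line counting discussed above might look
roughly like the sketch below. The sub name line_frequency and the
in-memory sample data are made up for illustration; a real run would
open the actual file instead. Note that the hash still holds one key per
distinct line, so this only helps when the file contains many duplicates.

```perl
use strict;
use warnings;

# Count how often each line occurs, reading one line at a time.
# Memory use grows with the number of distinct lines, not with file size.
sub line_frequency {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line}++;
    }
    return \%seen;
}

# Illustration only: an in-memory filehandle stands in for the real file.
my $data = "a\nb\na\nc\na\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my $count = line_frequency($fh);
close($fh) or die "cannot close in-memory file: $!";

# Print counts in a layout similar to "uniq -c".
printf "%7d %s\n", $count->{$_}, $_ for sort keys %$count;
```

The lexical filehandle and three-argument open are a style choice here,
not something the original code needs in order to work.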

-- 
             Christopher Mattern

NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities


------------------------------

Date: Thu, 01 May 2008 11:42:16 -0500
From: Chris Mattern <syscjm@sumire.gwu.edu>
Subject: Re: Frequency in large datasets
Message-Id: <slrng1jsn7.ss8.syscjm@sumire.gwu.edu>

On 2008-05-01, Cosmic Cruizer <XXjbhuntxx@white-star.com> wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in
> news:67so01F2nertiU1@mid.individual.net: 
>
>> Cosmic Cruizer wrote:
>>> I've been able to reduce my dataset by 75%, but it still leaves me
>>> with a file of 47 gigs. I'm trying to find the frequency of each line
>>> using: 
>>> 
>>>   open(TEMP, "< $tempfile") || die "cannot open file $tempfile: $!";
>>>   foreach (<TEMP>) {
>>>     $seen{$_}++;
>>>   }
>>>   close(TEMP) || die "cannot close file $tempfile: $!";
>>> 
>>> My program keeps aborting after a few minutes because the computer
>>> runs out of memory.
>> 
>> This line:
>> 
>>>     foreach (<TEMP>) {
>> 
>> reads the whole file into memory. You should read the file line by
>> line instead by replacing it with:
>> 
>>      while (<TEMP>) {
>> 
>
> <sigh> As both you and Sinan pointed out... I'm using foreach. Everywhere 
> else I used the while statement to get me to this point. This solves the 
> problem.
>
> Thank you.

Didn't realize your file had so many duplicates (and thus such a small
set of unique lines).  If it works, that's great!


-- 
             Christopher Mattern

NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities


------------------------------

Date: Thu, 01 May 2008 11:43:22 -0500
From: Chris Mattern <syscjm@sumire.gwu.edu>
Subject: Re: Frequency in large datasets
Message-Id: <slrng1jspa.ss8.syscjm@sumire.gwu.edu>

On 2008-05-01, nolo contendere <simon.chao@fmr.com> wrote:
> On May 1, 7:26 am, "A. Sinan Unur" <1...@llenroc.ude.invalid> wrote:
>> benkasminbull...@gmail.com (Ben Bullock) wrote in news:fvbj3s$l7u$1@ml.accsnet.ne.jp:
>>
>> > A. Sinan Unur <1...@llenroc.ude.invalid> wrote:
>>
>> Hope this helps you become more comfortable with the notion that reading
>> a 47 GB file is a boneheaded move. It is boneheaded if I do it, if Larry
>> Wall does it, if Superman does it ... you get the picture I hope.
>>
>
> I don't think it would be boneheaded if Superman did it...I mean, he's
> SUPERMAN.

Hey, Superman can do boneheaded things.  It's not like he's Chuck Norris.


-- 
             Christopher Mattern

NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities


------------------------------

Date: Thu, 01 May 2008 14:28:24 +0100
From: RedGrittyBrick <RedGrittyBrick@SpamWeary.foo>
Subject: Re: Help: Replace Help
Message-Id: <4819c57b$0$10629$fa0fcedb@news.zen.co.uk>

Amy Lee wrote:
> On Thu, 01 May 2008 12:50:48 +0000, Jürgen Exner wrote:
> 
>> Amy Lee <openlinuxsource@gmail.com> wrote:
>>
>>> I wanna replace A to C, C to A, G to U, U to G.
>>
>> 	tr {ACGU}{CAUG};
>>
> could you tell me what {} stands for? 
> 

{} stands for {}

They are just used to group the characters to be replaced and their 
replacements.

The following are all equivalent

  tr/ACGU/CAUG/;
  tr!ACGU!CAUG!;
  tr-ACGU-CAUG-;
  tr.ACGU.CAUG.;

  tr{ACGU}{CAUG};
  tr(ACGU)(CAUG);
  tr[ACGU][CAUG];
  tr<ACGU><CAUG>;

Perl lets you use almost any character as a delimiter/separator for the 
two groups of characters. Alternatively, you can use any of a few types 
of bracket- or brace-like characters to group the two sets of characters.

Choose whatever characters make the code clearest to readers. The oldest 
form is the first shown above but people can use one of the other forms 
for greater clarity if, for example, they need to translate '/' to 
something else.
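
A small sketch of the point above, assuming nothing beyond core Perl.
The sample strings and variable names are invented for illustration. It
applies the same translation with two delimiter styles, then uses braces
to translate '/' without any escaping:

```perl
use strict;
use warnings;

my $input = 'AACGUU';

# Copy the input, then translate the copy; both forms do the same thing.
(my $slashes = $input) =~ tr/ACGU/CAUG/;
(my $braces  = $input) =~ tr{ACGU}{CAUG};

print "same result\n" if $slashes eq $braces;

# With brace delimiters, '/' needs no backslash in the search list.
(my $path = 'usr/local/bin') =~ tr{/}{:};
print "$path\n";
```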

-- 
RGB


------------------------------

Date: Thu, 01 May 2008 13:36:22 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Help: Replace Help
Message-Id: <lfhj145tifkppp2jptnigk96mhrq95q202@4ax.com>

Amy Lee <openlinuxsource@gmail.com> wrote:
>On Thu, 01 May 2008 12:50:48 +0000, Jürgen Exner wrote:
>> Much better option: use tr{}{}
>> 
>> 	tr {ACGU}{CAUG};
>> 
> And could you tell me what {} stands for? 

Hmmmm, what do you mean? It's just curly brackets or braces, see
http://en.wikipedia.org/wiki/Brackets#Uses_of_.E2.80.9C.7B.E2.80.9D_and_.E2.80.9C.7D.E2.80.9D

And maybe 'perldoc perlop', section 'Quotes and quote-like Operators'.

jue


------------------------------

Date: Thu, 01 May 2008 13:46:27 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Help: Replace Help
Message-Id: <Xns9A91636B6B641asu1cornelledu@127.0.0.1>

Ben Bullock <benkasminbullock@gmail.com> wrote in news:fvcf0t$pa6$1
@ml.accsnet.ne.jp:

> On Thu, 01 May 2008 20:29:38 +0800, Amy Lee wrote:
> 
>> So how to solve this kind of order problem? I suppose that the
>> replacement must process at the same time.
> 
> For single letters you can use
> 
> tr/ACGU/CAUG/;
> 
> If the strings to swap are longer than a single character,
> 
> s/A/unlikely/g;
> s/C/A/g;
> s/unlikely/C/g;
> s/G/unlikely/g;
> s/U/G/g;
> s/unlikely/U/g;
> 
> where "unlikely" is a string which is unlikely to occur in your data.

A simple lookup table driven solution would obviate the need to make 
assumptions about the unlikeliness of a given character as well as 
getting rid of the multiple substitutions.
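
Something along these lines, as a rough sketch only. The %swap table,
the sub name swap_all and the sample input are made up for illustration,
and the same idea should work for keys longer than one character:

```perl
use strict;
use warnings;

# Hypothetical swap table: each key is replaced by its value.
my %swap = (A => 'C', C => 'A', G => 'U', U => 'G');

# Build one alternation from the keys, longest first, so that a longer
# key is preferred over a shorter key that is its prefix.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %swap;

# A single left-to-right pass: replaced text is never scanned again,
# so no "unlikely" placeholder string is needed.
sub swap_all {
    my ($text) = @_;
    $text =~ s/($alternation)/$swap{$1}/g;
    return $text;
}

print swap_all('AACGUU'), "\n";
```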

Sinan

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: 01 May 2008 14:11:53 GMT
From: Jo <jo@nosp.invalid>
Subject: Re: Help: Replace Help
Message-Id: <4819cfa9$0$14355$e4fe514c@news.xs4all.nl>

RedGrittyBrick said:
> The following are all equivalent
>   tr/ACGU/CAUG/;
>   ...
>   tr[ACGU][CAUG];

I'd like to add that whitespace is also allowed between the two
bracketed parts. This can help in writing readable code:

 tr [ACGU]
    [CAUG];




------------------------------

Date: Thu, 01 May 2008 17:50:32 +0100
From: Jon Harrop <jon@ffconsultancy.com>
Subject: Re: pop langs website ranking
Message-Id: <x6SdnUwlzLGga4TVRVnyhQA@plusnet>

xahlee@gmail.com wrote:
> Alexa's data is more reliable than quantcast.

Alexa claim to have accurate data on lots of sites but I just tried to
correlate their data with the exact data on our web server and the
discrepancies are huge. For example, combining our number of absolute
visitors with their measure of "reach" for our site indicates that there
are 58 billion internet users.

So their data are not even order-of-magnitude accurate. The only web analyst
I ever met was an astrophysicist so this does not really surprise me. ;-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u


------------------------------

Date: Thu, 01 May 2008 12:35:23 -0400
From: Sherman Pendley <spamtrap@dot-app.org>
Subject: Re: Read 20 lines when pressing n for next
Message-Id: <m1lk2u0ypg.fsf@dot-app.org>

"Gordon Etly" <get@bentsys.com> writes:

> A. Sinan Unur wrote:
>> Chris Mattern <syscjm@sumire.gwu.edu> wrote in
>> news:slrng1h584.r80.syscjm@sumire.gwu.edu:
>>
>> > On 2008-04-30, A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
>> > > s9uzaa@gmail.com wrote in news:37b9eb38-e188-4dc2-b3a7-
>> > > 5f09cc3b81ea@a70g2000hsh.googlegroups.com:
>> > >
>> > > > I would like to write a perl script with the following criteria
>> > > > match.
>> > >
>> > > Give it a shot. Then post any questions you might encounter 
>> > > (please
>> > > read the posting guidelines first).
>> > >
>> > > > 1. open any text file taken the name from the command line.
>> > > > 2. read top 20 lines and stops, then
>> > > > 3. ask to press letter "n or p" (for next/previous) to print 
>> > > > next
>> > > > or previous 20 lines.
>> > > >  would appreciate any kind of help.
>> > > > 4. must have subroutine used.
>> > >
>> > > Look up $. in perldoc perlvar
>> > >
>> > I got $5 that says this is homework.  "must have subroutine used" is
>> > a dead giveaway.
>>
>> Agreed. Which is why he does not get any fish before showing his
>> attempts at fishing for himself ;-)
>
> Why do you all just assume it's a homework assignment?

Read step four: "Must have subroutine used."

It doesn't take a psychic or a genius to identify homework when it has
requirements like that.

> Could it not just 
> as well be a simplified work project? I would not be at all surprised if 
> this was something handed down by one's boss or project manager, and 
> written in a simplified form (which is what one *should* do, no?)

Uh - no. The point of hiring a professional programmer is that the manager
should *not* have to write detailed, trivial instructions like "be sure to
use subroutines." One gets that kind of assignment at school, but at work
one is expected to know how to do the job without all the hand-holding.

sherm--

-- 
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1498
***************************************

