[32168] in Perl-Users-Digest
Perl-Users Digest, Issue: 3433 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Jul 5 16:09:34 2011
Date: Tue, 5 Jul 2011 13:09:15 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 5 Jul 2011 Volume: 11 Number: 3433
Today's topics:
emacs lisp text processing example (html5 figure/figcap <xahlee@gmail.com>
Re: emacs lisp text processing example (html5 figure/fi <StefanMandl@web.de>
Re: emacs lisp text processing example (html5 figure/fi <xahlee@gmail.com>
Re: Module to check overlap? <ela@yantai.org>
Re: perl prevayler? --------------- <glex_no-spam@qwest-spam-no.invalid>
Re: perl prevayler? www.prevayler.org <jimsgibson@gmail.com>
Posting Guidelines for comp.lang.perl.misc ($Revision: tadmc@seesig.invalid
Re: sort scientific notation value after alphabet <cartercc@gmail.com>
Re: sort scientific notation value after alphabet <uri@StemSystems.com>
Re: sort scientific notation value after alphabet <jondk@FAKE.EMAIL.net>
Re: sort scientific notation value after alphabet <jondk@FAKE.EMAIL.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 3 Jul 2011 23:36:08 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: emacs lisp text processing example (html5 figure/figcaption)
Message-Id: <bf6bcb4b-3f87-4dbe-84e4-1fa157871161@k23g2000pri.googlegroups.com>
OMG, emacs lisp beats perl/python again!
Hiya all, another little emacs lisp tutorial from the tiny Xah's Edu
Corner.
〈Emacs Lisp: Processing HTML: Transform Tags to HTML5 “figure” and
“figcaption” Tags〉
xahlee.org/emacs/elisp_batch_html5_tag_transform.html
plain text version follows.
------------------------------------------
Emacs Lisp: Processing HTML: Transform Tags to HTML5 “figure” and
“figcaption” Tags
Xah Lee, 2011-07-03
Another triumph of using elisp for text processing over perl/python.
----------------------------
The Problem
--------------
Summary
I want to batch transform the image tags in 5 thousand html files to use
HTML5's new “figure” and “figcaption” tags.
I want to be able to view each change interactively, while optionally
giving it a “go ahead” to do the whole job in batch.
Interactive eye-ball verification on many cases lets me be reasonably
sure the transform is done correctly. Yet i don't want to spend days
to think/write/test a mathematically correct program that otherwise
can be finished in 30 min with human interaction.
--------------
Detail
HTML5 has the following new tags: “figure” and “figcaption”. They are
used like this:
<figure>
<img src="cat.jpg" alt="my cat" width="167" height="106">
<figcaption>my cat!</figcaption>
</figure>
(For detail, see: HTML5 “figure” ＆ “figurecaption” Tags Browser
Support)
On my website, i used a similar structure. They look like this:
<div class="img">
<img src="cat.jpg" alt="my cat" width="167" height="106">
<p class="cpt">my cat!</p>
</div>
So, i want to replace them with the HTML5's new tags. This can be done
with a regex. Here's the “find” regex:
<div class="img">
?<img src="\([^.]+?\)\.jpg" alt="\([^"]+?\)" width="\([0-9]+?\)" height="\([0-9]+?\)">?
<p class="cpt">\([^<]+?\)</p>
?</div>
Here's the replacement string:
<figure>
<img src="\1.jpg" alt="\2" width="\3" height="\4">
<figcaption>\5</figcaption>
</figure>
Then, you can use “find-file” and dired's “dired-do-query-replace-regexp”
to work on your 5 thousand pages. Nice. (See: Emacs: Interactively
Find ＆ Replace String Patterns on Multiple Files.)
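For readers on the Perl side, the simple find/replace above could be sketched roughly as follows. This is only an illustration of the one-image, plain-caption case; the sub name simple_figure_transform is made up for the example, each line break in the pattern is read as an optional newline, and the whole file is assumed to be in one string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): handles only the simple
# case of one .jpg image followed by a plain-text caption.
sub simple_figure_transform {
    my ($html) = @_;
    # /x lets the pattern span several lines; literal spaces are escaped.
    $html =~ s{
        <div\ class="img"> \n?
        <img\ src="([^.]+?)\.jpg"\ alt="([^"]+?)"\ width="([0-9]+?)"\ height="([0-9]+?)"> \n?
        <p\ class="cpt">([^<]+?)</p> \n?
        </div>
    }{<figure>\n<img src="$1.jpg" alt="$2" width="$3" height="$4">\n<figcaption>$5</figcaption>\n</figure>}gx;
    return $html;
}

my $page = <<'HTML';
<div class="img">
<img src="cat.jpg" alt="my cat" width="167" height="106">
<p class="cpt">my cat!</p>
</div>
HTML

my $converted = simple_figure_transform($page);
print $converted;
```

A block with two images or with markup inside the caption does not match the pattern and passes through unchanged.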
However, the problem here is more complicated. The image file may be
jpg or png or gif. Also, there may be more than one image per group.
Also, the caption part may also contain complicated html. Here's some
examples:
<div class="img">
<img src="cat1.jpg" alt="my cat" width="200" height="200">
<img src="cat2.jpg" alt="my cat" width="200" height="200">
<p class="cpt">my 2 cats</p>
</div>
<div class="img">
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<p class="cpt">jamie's cat! Her blog is <a href="http://example.com/jamie/">http://example.com/jamie/</a></p>
</div>
So, a solution by regex is out.
----------------------------
Solution
The solution is pretty simple. Here's the major steps:
Use “find-lisp-find-files” to traverse a dir.
For each file, open it.
Search for the string <div class="img">
Use “sgml-skip-tag-forward” to jump to its closing tag.
Save the begin/end positions of these tags.
Ask user if she wants to replace. If so, do it. (using “delete-region” and “insert”)
Repeat.
Here's the code:
;; -*- coding: utf-8 -*-
;; 2011-07-03
;; replace image tags to use html5's “figure” and “figcaption” tags.
;; Example. This:
;; <div class="img">…</div>
;; should become this
;; <figure>…</figure>
;; do this for all files in a dir.
;; rough steps:
;; find the <div class="img">
;; use sgml-skip-tag-forward to move to the ending tag.
;; save their positions.

(require 'find-lisp)
(require 'sgml-mode) ; provides sgml-skip-tag-forward

(defun my-process-file (fpath)
  "Process the file at full path FPATH ..."
  (let (mybuff p1 p2 p3 p4)
    (setq mybuff (find-file fpath))
    (widen)
    (goto-char (point-min)) ; in case buffer already open
    (while (search-forward "<div class=\"img\">" nil t)
      (setq p2 (point))
      (backward-char 17) ; beginning of “div” tag
      (setq p1 (point))
      (forward-char 1)
      (sgml-skip-tag-forward 1) ; move to the closing tag
      (setq p4 (point))
      (backward-char 6) ; beginning of the closing div tag
      (setq p3 (point))
      (narrow-to-region p1 p4)
      (when (y-or-n-p "replace?")
        (delete-region p3 p4)
        (goto-char p3)
        (insert "</figure>")
        (delete-region p1 p2)
        (goto-char p1)
        (insert "<figure>"))
      ;; widen even after answering “n”, so the search can continue
      (widen))
    (when (not (buffer-modified-p mybuff)) (kill-buffer mybuff))))

(let (outputBuffer)
  (setq outputBuffer "*xah img/figure replace output*")
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file
          (find-lisp-find-files "~/web/xahlee_org/emacs/" "\\.html$"))
    (princ "Done deal!")))
Seems pretty simple right?
The “p1” and “p2” variables are the positions of start/end of <div
class="img">. The “p3” and “p4” are the start/end of its closing tag
</div>.
We also used a little trick with “widen” and “narrow-to-region”. It
lets me see just the part that i'm interested in. It narrows to the
beginning/end of the div.img. This makes eye-balling a bit easier.
The real time-saver is the “sgml-skip-tag-forward” function from
“html-mode”. Without that, one'd have to write a mini-parser to deal with
html's nested ways to be able to locate the proper ending tag.
Using the above code, i can comfortably eye-ball and press “y” at the
rate of about 5 per second. That makes 300 replacements per minute. I
have 5000+ files. If we presume there are 6k replacements to be made,
then 5 per second means 20 minutes sitting there pressing “y”.
Quite tiresome.
So, now, the next step is simply to remove the asking (y-or-n-p
"replace?"). Or, if i'm absolutely paranoid, i can make emacs write
into a log buffer for every replacement it makes (together with the
file path). When the batch replacement is done (probably under 3
minutes), i can simply scan thru the log to see if any replacement
went wrong. For how to do that, see: Emacs Lisp: Multi-Pair String
Replacement with Report.
But what about replacing <p class="cpt">…</p> with <figcaption>…</figcaption>?
I simply copied and pasted the above code into a new file, and made
changes in just 4 places. So, the replacing figcaption part is considered a
separate batch job. Of course, one could spend an extra hour or so to
make the code do them both in one pass, but is that one extra hour of
thinking ＆ coding worthwhile for this one-time job?
I ♥ Emacs, do you?
---------------------------------
PS perl and python solution welcome. I haven't looked at perl or
python's html parser libs for 5+ years.
Though, 2 little requirements:
1. it must be correct, of course. Cannot tolerate the possibility
that maybe one out of a thousand replacements introduces a
mismatched tag. (but you can assume that all the input html files are
w3c valid)
2. it must not change the formatting of the html pages. i.e. adding/removing
spaces or tabs.
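One possible Perl take on the div-to-figure half of the job, offered as a rough sketch and not a finished tool: instead of a full HTML parser it walks only the div tags and keeps a stack, so a closing tag is renamed only when its matching opener was <div class="img">. The sub name div_img_to_figure is invented for this example. It assumes valid input (as allowed above) and that "<div" never shows up inside comments, scripts or attribute values. Every other byte is copied through untouched, which is what keeps the formatting unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post. Assumes valid HTML in
# which "<div" never appears inside comments, scripts, or attribute values.
sub div_img_to_figure {
    my ($html) = @_;
    my @stack;      # one flag per open <div>: true if it was class="img"
    my $out = '';

    # The capturing group makes split return the div tags as well as the
    # text between them, so joining all pieces reproduces the input exactly.
    for my $piece (split m{(<div\b[^>]*>|</div\s*>)}i, $html, -1) {
        my $emit = $piece;
        if ($piece =~ m{\A</div\s*>\z}i) {
            # Closing tag: rename it only if its opener was an image div.
            $emit = '</figure>' if pop @stack;
        }
        elsif ($piece =~ m{\A<div\b[^>]*>\z}i) {
            my $is_img = $piece eq '<div class="img">';
            push @stack, $is_img;
            $emit = '<figure>' if $is_img;
        }
        $out .= $emit;
    }
    return $out;
}

my $sample = <<'HTML';
<div class="img">
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<p class="cpt">jamie's cat! Her blog is <a href="http://example.com/jamie/">http://example.com/jamie/</a></p>
</div>
HTML

my $converted_sample = div_img_to_figure($sample);
print $converted_sample;
```

The interactive “y/n” step and the figcaption pass are left out here; the same stack idea would apply to the <p class="cpt"> tags.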
Xah
------------------------------
Date: Mon, 4 Jul 2011 12:13:56 -0700 (PDT)
From: "S.Mandl" <StefanMandl@web.de>
Subject: Re: emacs lisp text processing example (html5 figure/figcaption)
Message-Id: <d5bdb9c9-6a04-456d-b7cd-0d02efdd9e70@j28g2000vbp.googlegroups.com>
Nice. I guess that XSLT would be another (the official) approach for
such a task.
Is there an XSLT-engine for Emacs?
-- Stefan
------------------------------
Date: Tue, 5 Jul 2011 12:47:21 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Re: emacs lisp text processing example (html5 figure/figcaption)
Message-Id: <a4094579-d5ec-4b91-a855-f186a38bc984@q34g2000prf.googlegroups.com>
On Jul 4, 12:13 pm, "S.Mandl" <StefanMa...@web.de> wrote:
> Nice. I guess that XSLT would be another (the official) approach for
> such a task.
> Is there an XSLT-engine for Emacs?
>
> -- Stefan
haven't used XSLT, and don't know if there's one in emacs...
it'd be nice if someone actually gave an example...
Xah
------------------------------
Date: Mon, 4 Jul 2011 12:15:35 -0700
From: "ela" <ela@yantai.org>
Subject: Re: Module to check overlap?
Message-Id: <iurb52$dgk$1@ijustice.itsc.cuhk.edu.hk>
"Ted Zlatanov" <tzz@lifelogs.com> wrote in message
news:8762nmjisc.fsf@lifelogs.com...
>
> There are several areas involved here: set operations (pure math),
> inversion and regular interval lists (data structures) and interval
> normalization and set operations (algorithms). It's definitely not
> simple stuff, but worth learning if you find it interesting and useful.
>
> As I said, you can try the SQL route if this is too complicated. You
> can express all this in SQL, but it will be very specific code that you
> probably will not be able to reuse. It will certainly involve less
> algorithms and more logic.
>
> Working with lists of intervals is fine and not so complicated; I think
> Jurgen explained it. Follow up to his post with questions if you want
> to take that approach.
>
> Ted
Both Jürgen and you were very kind to help me formulate my problem. Although
I haven't followed the inversion list approach, I really benefited a lot from
your discussion (and of course Jürgen's comments). In the end I used an inclusion
list, as it is within my capability (using several if/else tests to check the starts
and ends and move the indices accordingly).
Thanks again!
:-)
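For anyone following the thread, the "check starts and ends and move indices" idea might look something like the sketch below. It is a guess at the general shape, not ela's actual code: the sub name overlapping_pairs is made up, and it assumes two lists of closed [start, end] intervals, each sorted by start and with no overlaps inside a single list.

```perl
use strict;
use warnings;

# Hypothetical helper: both lists hold [start, end] pairs (closed intervals),
# sorted by start, with no overlaps inside a single list.
sub overlapping_pairs {
    my ($list_a, $list_b) = @_;
    my ($i, $j) = (0, 0);
    my @pairs;
    while ($i < @$list_a && $j < @$list_b) {
        my ($a_start, $a_end) = @{ $list_a->[$i] };
        my ($b_start, $b_end) = @{ $list_b->[$j] };
        # Two closed intervals overlap when each starts before the other ends.
        if ($a_start <= $b_end && $b_start <= $a_end) {
            push @pairs, [ $i, $j ];
        }
        # Advance the interval that ends first; it cannot overlap anything later.
        if ($a_end < $b_end) { $i++ } else { $j++ }
    }
    return @pairs;
}

my @first  = ([1, 5], [10, 15], [20, 25]);
my @second = ([4, 11], [16, 18], [22, 30]);
my @hits   = overlapping_pairs(\@first, \@second);
print join(' ', map { join '-', @$_ } @hits), "\n";   # prints "0-0 1-0 2-2"
```

Each pair is (index in first list, index in second list), and the loop visits every interval at most once.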
------------------------------
Date: Tue, 05 Jul 2011 16:16:41 +0000
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: perl prevayler? ---------------
Message-Id: <4e1338e9$0$73615$815e3792@news.qwest.net>
On 07/01/11 06:08, gavino wrote:
> even ncier without any oo bs
>
> can someone translate this to perl?
>
> transactions log to disk
>
> updates n queries on in memory data
>
> dump all in memory data once an hour or so
>
> throw away postgresql/oracle/other shit
#!/usr/bin/perl
print "transactions log to disk\n\n",
"updates n queries on in memory data\n\n",
"dump all in memory data once an hour or so\n\n",
"throw away postgresql/oracle/other sh--\n\n";
------------------------------
Date: Tue, 05 Jul 2011 09:43:46 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: perl prevayler? www.prevayler.org
Message-Id: <050720110943465722%jimsgibson@gmail.com>
In article
<00981bcc-9b56-4bd3-a906-b1f344cede87@s33g2000prg.googlegroups.com>,
gavino <gavcomedy@gmail.com> wrote:
> even ncier without any oo bs
>
> can someone translate this to perl?
>
> transactions log to disk
>
> updates n queries on in memory data
>
> dump all in memory data once an hour or so
>
> throw away postgresql/oracle/other shit
From www.prevayler.org:
"Prevayler is an open source object persistence library for Java. It is
an implementation of the Prevalent System design pattern, in which
business objects are kept live in memory and transactions are journaled
for system recovery. Prevayler is the simplest and fastest way to
provide ACID persistence for your 'plain old Java objects'".
What part of this do you want to port to Perl? It kind of sounds like a
database. Try googling for "perl persistence framework".
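If the four quoted lines are taken literally as the spec, a toy version could look like the sketch below. It is not a port of Prevayler: the package name TinyPrevalence and its methods are invented for illustration, it only handles string keys and values without tabs or newlines, and it makes no attempt at atomic writes, locking or fsync. Storable (a core module) does the dump.

```perl
use strict;
use warnings;

{
    package TinyPrevalence;   # hypothetical name, for illustration only
    use Storable qw(nstore retrieve);

    sub new {
        my ($class, %arg) = @_;
        my $self = bless {
            snapshot => $arg{snapshot},   # path of the full dump
            journal  => $arg{journal},    # path of the transaction log
            data     => {},
        }, $class;

        # Recover: load the last dump, then replay the journal on top of it.
        $self->{data} = retrieve($self->{snapshot}) if -e $self->{snapshot};
        if (-e $self->{journal}) {
            open my $in, '<', $self->{journal} or die "Cannot read journal: $!";
            while (my $line = <$in>) {
                chomp $line;
                my ($key, $value) = split /\t/, $line, 2;
                $self->{data}{$key} = $value;
            }
            close $in;
        }
        return $self;
    }

    sub set {
        my ($self, $key, $value) = @_;
        # Log the transaction to disk first, then update memory.
        open my $out, '>>', $self->{journal} or die "Cannot append to journal: $!";
        print {$out} "$key\t$value\n";
        close $out or die "Cannot close journal: $!";
        $self->{data}{$key} = $value;
        return;
    }

    sub get {
        my ($self, $key) = @_;
        return $self->{data}{$key};   # queries never touch the disk
    }

    sub snapshot {
        my ($self) = @_;
        # Dump all in-memory data, after which the journal can start over.
        nstore($self->{data}, $self->{snapshot});
        unlink $self->{journal};
        return;
    }
}

use File::Temp qw(tempdir);
my $dir   = tempdir(CLEANUP => 1);
my %files = (snapshot => "$dir/data.sto", journal => "$dir/journal.log");

my $db = TinyPrevalence->new(%files);
$db->set(color => 'red');
$db->snapshot;
$db->set(color => 'blue');              # only in the journal, not in the dump

my $recovered = TinyPrevalence->new(%files);
print $recovered->get('color'), "\n";   # prints "blue"
```

Calling snapshot from a timer or cron job would give the "once an hour or so" dump; everything a real system needs for crash safety is missing here.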
--
Jim Gibson
------------------------------
Date: Tue, 05 Jul 2011 02:17:03 -0500
From: tadmc@seesig.invalid
Subject: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
Message-Id: <k-edncK5kufyJ4_TnZ2dnUVZ5vGdnZ2d@giganews.com>
Outline
Before posting to comp.lang.perl.misc
Must
- Check the Perl Frequently Asked Questions (FAQ)
- Check the other standard Perl docs (*.pod)
Really Really Should
- Lurk for a while before posting
- Search a Usenet archive
If You Like
- Check Other Resources
Posting to comp.lang.perl.misc
Is there a better place to ask your question?
- Question should be about Perl, not about the application area
How to participate (post) in the clpmisc community
- Carefully choose the contents of your Subject header
- Use an effective followup style
- Speak Perl rather than English, when possible
- Ask perl to help you
- Do not re-type Perl code
- Provide enough information
- Do not provide too much information
- Do not post binaries, HTML, or MIME
Social faux pas to avoid
- Asking a Frequently Asked Question
- Asking a question easily answered by a cursory doc search
- Asking for emailed answers
- Beware of saying "doesn't work"
- Sending a "stealth" Cc copy
Be extra cautious when you get upset
- Count to ten before composing a followup when you are upset
- Count to ten after composing and before posting when you are upset
-----------------------------------------------------------------
Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
This newsgroup, commonly called clpmisc, is a technical newsgroup
intended to be used for discussion of Perl related issues (except job
postings), whether it be comments or questions.
As you would expect, clpmisc discussions are usually very technical in
nature and there are conventions for conduct in technical newsgroups
going somewhat beyond those in non-technical newsgroups.
The article at:
http://www.catb.org/~esr/faqs/smart-questions.html
describes how to get answers from technical people in general.
This article describes things that you should, and should not, do to
increase your chances of getting an answer to your Perl question. It is
available in POD, HTML and plain text formats at:
http://www.rehabitation.com/clpmisc.shtml
For more information about netiquette in general, see the "Netiquette
Guidelines" at:
http://andrew2.andrew.cmu.edu/rfc/rfc1855.html
A note to newsgroup "regulars":
Do not use these guidelines as a "license to flame" or other
meanness. It is possible that a poster is unaware of things
discussed here. Give them the benefit of the doubt, and just
help them learn how to post, rather than assume that they do
know and are being the "bad kind" of Lazy.
A note about technical terms used here:
In this document, we use words like "must" and "should" as
they're used in technical conversation (such as you will
encounter in this newsgroup). When we say that you *must* do
something, we mean that if you don't do that something, then
it's unlikely that you will benefit much from this group.
We're not bossing you around; we're making the point without
lots of words.
Do *NOT* send email to the maintainer of these guidelines. It will be
discarded unread. The guidelines belong to the newsgroup so all
discussion should appear in the newsgroup. I am just the secretary that
writes down the consensus of the group.
Before posting to comp.lang.perl.misc
Must
This section describes things that you *must* do before posting to
clpmisc, in order to maximize your chances of getting meaningful replies
to your inquiry and to avoid getting flamed for being lazy and trying to
have others do your work.
The perl distribution includes documentation that is copied to your hard
drive when you install perl. Also installed is a program for looking
things up in that (and other) documentation named 'perldoc'.
You should either find out where the docs got installed on your system,
or use perldoc to find them for you. Type "perldoc perldoc" to learn how
to use perldoc itself. Type "perldoc perl" to start reading Perl's
standard documentation.
Check the Perl Frequently Asked Questions (FAQ)
Checking the FAQ before posting is required in Big 8 newsgroups in
general; there is nothing clpmisc-specific about this requirement.
You are expected to do this in nearly all newsgroups.
You can use the "-q" switch with perldoc to do a word search of the
questions in the Perl FAQs.
Check the other standard Perl docs (*.pod)
The perl distribution comes with much more documentation than is
available for most other newsgroups, so in clpmisc you should also
see if you can find an answer in the other (non-FAQ) standard docs
before posting.
It is *not* required, or even expected, that you actually *read* all of
Perl's standard docs, only that you spend a few minutes searching them
before posting.
Try doing a word-search in the standard docs for some words/phrases
taken from your problem statement or from your very carefully worded
"Subject:" header.
Really Really Should
This section describes things that you *really should* do before posting
to clpmisc.
Lurk for a while before posting
This is very important and expected in all newsgroups. Lurking means
to monitor a newsgroup for a period to become familiar with local
customs. Each newsgroup has specific customs and rituals. Knowing
these before you participate will help avoid embarrassing social
situations. Consider yourself to be a foreigner at first!
Search a Usenet archive
There are tens of thousands of Perl programmers. It is very likely
that your question has already been asked (and answered). See if you
can find where it has already been answered.
One such searchable archive is:
http://groups.google.com/advanced_search
If You Like
This section describes things that you *can* do before posting to
clpmisc.
Check Other Resources
You may want to check in books or on web sites to see if you can
find the answer to your question.
But you need to consider the source of such information: there are a
lot of very poor Perl books and web sites, and several good ones
too, of course.
Posting to comp.lang.perl.misc
There can be 200 messages in clpmisc in a single day. Nobody is going to
read every article. They must decide somehow which articles they are
going to read, and which they will skip.
Your post is in competition with 199 other posts. You need to "win"
before a person who can help you will even read your question.
These sections describe how you can help keep your article from being
one of the "skipped" ones.
Is there a better place to ask your question?
Question should be about Perl, not about the application area
It can be difficult to separate out where your problem really is,
but you should make a conscious effort to post to the most
applicable newsgroup. That is, after all, where you are the most
likely to find the people who know how to answer your question.
Being able to "partition" a problem is an essential skill for
effectively troubleshooting programming problems. If you don't get
that right, you end up looking for answers in the wrong places.
It should be understood that you may not know that the root of your
problem is not Perl-related (the two most frequent ones are CGI and
Operating System related), so off-topic postings will happen from
time to time. Be gracious when someone helps you find a better place
to ask your question by pointing you to a more applicable newsgroup.
How to participate (post) in the clpmisc community
Carefully choose the contents of your Subject header
You have 40 precious characters of Subject to win out and be one of
the posts that gets read. Don't waste them. Take care while
composing them, they are the key that opens the door to getting an
answer.
Spend them indicating what aspect of Perl others will find if they
should decide to read your article.
Do not spend them indicating "experience level" (guru, newbie...).
Do not spend them pleading (please read, urgent, help!...).
Do not spend them on non-Subjects (Perl question, one-word
Subject...)
For more information on choosing a Subject see "Choosing Good
Subject Lines":
http://www.cpan.org/authors/id/D/DM/DMR/subjects.post
Part of the beauty of newsgroup dynamics, is that you can contribute
to the community with your very first post! If your choice of
Subject leads a fellow Perler to find the thread you are starting,
then even asking a question helps us all.
Use an effective followup style
When composing a followup, quote only enough text to establish the
context for the comments that you will add. Always indicate who
wrote the quoted material. Never quote an entire article. Never
quote a .signature (unless that is what you are commenting on).
Intersperse your comments *following* each section of quoted text to
which they relate. Unappreciated followup styles are referred to as
"top-posting", "Jeopardy" (because the answer comes before the
question), or "TOFU" (Text Over, Fullquote Under).
Reversing the chronology of the dialog makes it much harder to
understand (some folks won't even read it if written in that style).
For more information on quoting style, see:
http://web.presby.edu/~nnqadmin/nnq/nquote.html
Speak Perl rather than English, when possible
Perl is much more precise than natural language. Saying it in Perl
instead will avoid misunderstanding your question or problem.
Do not say: I have variable with "foo\tbar" in it.
Instead say: I have $var = "foo\tbar", or I have $var = 'foo\tbar',
or I have $var = <DATA> (and show the data line).
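A small illustration of why the Perl form is more precise: the first two assignments above look alike in English but hold different strings. The variable names are arbitrary.

```perl
use strict;
use warnings;

my $double = "foo\tbar";    # double quotes: \t becomes one tab character
my $single = 'foo\tbar';    # single quotes: a backslash followed by "t"

printf "%d %d\n", length($double), length($single);   # prints "7 8"
```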
Ask perl to help you
You can ask perl itself to help you find common programming mistakes
by doing two things: enable warnings (perldoc warnings) and enable
"strict"ures (perldoc strict).
You should not bother the hundreds/thousands of readers of the
newsgroup without first seeing if a machine can help you find your
problem. It is demeaning to be asked to do the work of a machine. It
will annoy the readers of your article.
You can look up any of the messages that perl might issue to find
out what the message means and how to resolve the potential mistake
(perldoc perldiag). If you would like perl to look them up for you,
you can put "use diagnostics;" near the top of your program.
Do not re-type Perl code
Use copy/paste or your editor's "import" function rather than
attempting to type in your code. If you make a typo you will get
followups about your typos instead of about the question you are
trying to get answered.
Provide enough information
If you do the things in this item, you will have an Extremely Good
chance of getting people to try and help you with your problem!
These features are a really big bonus toward your question winning
out over all of the other posts that you are competing with.
First make a short (less than 20-30 lines) and *complete* program
that illustrates the problem you are having. People should be able
to run your program by copy/pasting the code from your article. (You
will find that doing this step very often reveals your problem
directly, leading to an answer much more quickly and reliably than
posting to Usenet.)
Describe *precisely* the input to your program. Also provide example
input data for your program. If you need to show file input, use the
__DATA__ token (perldata.pod) to provide the file contents inside of
your Perl program.
Show the output (including the verbatim text of any messages) of
your program.
Describe how you want the output to be different from what you are
getting.
If you have no idea at all of how to code up your situation, be sure
to at least describe the 2 things that you *do* know: input and
desired output.
Do not provide too much information
Do not just post your entire program for debugging. Most especially
do not post someone *else's* entire program.
Do not post binaries, HTML, or MIME
clpmisc is a text only newsgroup. If you have images or binaries
that explain your question, put them in a publicly accessible
place (like a Web server) and provide a pointer to that location. If
you include code, cut and paste it directly in the message body.
Don't attach anything to the message. Don't post vcards or HTML.
Many people (and even some Usenet servers) will automatically filter
out such messages. Many people will not be able to easily read your
post. Plain text is something everyone can read.
Social faux pas to avoid
The first two below are symptoms of lots of FAQ asking here in clpmisc.
It happens so often that folks will assume that it is happening yet
again. If you have looked but not found, or found but didn't understand
the docs, say so in your article.
Asking a Frequently Asked Question
It should be understood that you may have missed the applicable FAQ
when you checked, which is not a big deal. But if the Frequently
Asked Question is worded similar to your question, folks will assume
that you did not look at all. Don't become indignant at pointers to
the FAQ, particularly if it solves your problem.
Asking a question easily answered by a cursory doc search
If folks think you have not even tried the obvious step of reading
the docs applicable to your problem, they are likely to become
annoyed.
If you are flamed for not checking when you *did* check, then just
shrug it off (and take the answer that you got).
Asking for emailed answers
Emailed answers benefit one person. Posted answers benefit the
entire community. If folks can take the time to answer your
question, then you can take the time to go get the answer in the
same place where you asked the question.
It is OK to ask for a *copy* of the answer to be emailed, but many
will ignore such requests anyway. If you munge your address, you
should never expect (or ask) to get email in response to a Usenet
post.
Ask the question here, get the answer here (maybe).
Beware of saying "doesn't work"
This is a "red flag" phrase. If you find yourself writing that,
pause and see if you can't describe what is not working without
saying "doesn't work". That is, describe how it is not what you
want.
Sending a "stealth" Cc copy
A "stealth Cc" is when you both email and post a reply without
indicating *in the body* that you are doing so.
Be extra cautious when you get upset
Count to ten before composing a followup when you are upset
This is recommended in all Usenet newsgroups. Here in clpmisc, most
flaming sub-threads are not about any feature of Perl at all! They
are most often for what was seen as a breach of netiquette. If you
have lurked for a bit, then you will know what is expected and won't
make such posts in the first place.
But if you get upset, wait a while before writing your followup. I
recommend waiting at least 30 minutes.
Count to ten after composing and before posting when you are upset
After you have written your followup, wait *another* 30 minutes
before committing yourself by posting it. You cannot take it back
once it has been said.
AUTHOR
Tad McClellan and many others on the comp.lang.perl.misc newsgroup.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
------------------------------
Date: Tue, 5 Jul 2011 11:46:59 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: sort scientific notation value after alphabet
Message-Id: <1d365937-6482-4477-a0ce-69c084cc0b3e@o13g2000yqj.googlegroups.com>
On Jun 26, 12:39 pm, "ela" <e...@yantai.org> wrote:
> I have a very large table (1 million+ records) that has five fields and the
> first two rows are shown:
>
>       Identity End C-value Score ID
>       _113_TTAG 26831 8.00E-38 163 282859772
>       _193_TTAG 26831 8.00E-68 163 282859772
>
> I wanna sort the file first by the field "Identity" then "C-value" in
> ascending order (the smallest comes first).
A quick and dirty (if simple minded) brute force solution, if you
don't mind reading the file into memory, would be to create a key by
concatenating the fields you want, using the concatenated string as a
key, throwing the line into a hash, and printing the sorted hash, like
this (obviously untested):
my %datahash;
while (<IN>)
{
    my ($ident, $end, $cv, $score, $id) = split;
    my $key = sprintf("%s%s", $ident, $cv);
    $datahash{$key} = $_;
}
foreach my $key (sort keys %datahash)
{
    print OUT $datahash{$key};
}
CC
------------------------------
Date: Tue, 05 Jul 2011 15:11:32 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: sort scientific notation value after alphabet
Message-Id: <87mxgszgnv.fsf@quad.sysarch.com>
>>>>> "c" == ccc31807 <cartercc@gmail.com> writes:
c> On Jun 26, 12:39 pm, "ela" <e...@yantai.org> wrote:
>> I have a very large table (1 million+ records) that has five fields and the
>> first two rows are shown:
>>
>> Identity End C-value Score ID
>> _113_TTAG 26831 8.00E-38 163 282859772
>> _193_TTAG 26831 8.00E-68 163 282859772
>>
>> I wanna sort the file first by the field "Identity" then "C-value" in
>> ascending order (the smallest comes first).
c> A quick and dirty (if simple minded) brute force solution, if you
c> don't mind reading the file into memory, would be to create a key by
c> concatenating the fields you want, using the concatenated string as a
c> key, throwing the line into a hash, and printing the sorted hash, like
c> this (obviously untested):
c> my %datahash;
c> while (<IN>)
c> {
c> my ($ident, $end, $cv, $score, $id) = split;
why declare vars you never use? make them undef or slice the results of
the split.
c> my $key = sprintf("%s%s", $ident, $cv);
why sprintf there? you don't format the strings so that would just be a
simple "$ident$cv". in fact you should format the strings to make sure
the floats always print the same way so the sort properly.
and this is all done for you and better in Sort::Maker. also there is an
article in there on doing this type of sort (called the GRT).
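For comparison, here is a plain-Perl sketch that keeps the two keys separate, so the C-value is compared as a number and not as text. To be clear, this is a Schwartzian transform, not the GRT and not Sort::Maker; the sub name is made up, and it assumes whitespace-separated fields, the header line already set aside, and enough memory to hold the data.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the data lines (header already removed) and
# returns them ordered by field 1 as a string, then by field 3 as a number.
sub sort_by_identity_then_cvalue {
    my @lines = @_;
    return
        map  { $_->[0] }                                     # back to the line
        sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }  # string, then number
        map  { [ $_, (split ' ', $_)[0, 2] ] }               # [line, Identity, C-value]
        @lines;
}

my @rows = (
    "_193_TTAG 26831 8.00E-68 163 282859772\n",
    "_113_TTAG 26831 8.00E-38 163 282859772\n",
    "_113_TTAG 26831 8.00E-68 163 282859772\n",
);
my @sorted = sort_by_identity_then_cvalue(@rows);
print @sorted;
```

Within _113_TTAG the 8.00E-68 row comes out ahead of the 8.00E-38 row, which a string comparison of the two values would get backwards. Rows with identical keys are all kept, unlike with a hash keyed on the concatenation.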
uri
--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------
------------------------------
Date: Tue, 05 Jul 2011 15:34:47 -0400
From: Jon Du Kim <jondk@FAKE.EMAIL.net>
Subject: Re: sort scientific notation value after alphabet
Message-Id: <861bd$4e13675e$ce534406$9652@news.eurofeeds.com>
Well, did you know that perl syntax is roughly a superset of awk's?
In fact if I take your little awk one liner and run it through the a2p
utility I get the following perl.
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
open(SORT__K1_1__K3_3G, '|sort -k1,1 -k3,3g') ||
    die 'Cannot pipe to "sort -k1,1 -k3,3g".';
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
while (<>) {
    chomp; # strip record separator
    print $_ if $. == 1;;
    if ($. > 1) {
        print SORT__K1_1__K3_3G $_;
    }
}
Like all machine generated code it is kind of tough to look at
but it really isn't so bad and should work just fine (although I haven't
tried it).
At the very least it should provide an interesting code reading
opportunity. :)
On 6/26/2011 12:39 PM, ela wrote:
> I have a very large table (1 million+ records) that has five fields and the
> first two rows are shown:
>
> Identity   End    C-value   Score  ID
> _113_TTAG  26831  8.00E-38  163    282859772
> _193_TTAG  26831  8.00E-68  163    282859772
>
> I wanna sort the file first by the field "Identity" and then by "C-value",
> in ascending order (the smallest comes first). While this can be achieved by:
>
> awk 'NR==1; NR > 1 {print $0 | "sort -k1,1 -k3,3g"}' infile > infile.sort
>
> the code cannot be incorporated in Perl, so it was kindly suggested that I
> study something like:
>
> http://en.wikipedia.org/wiki/Schwartzian_transform
> http://www.perlhowto.com/sort_ordering_by_multiple_columns
>
> but neither of them suggests an integrated solution for sorting this kind of
> alphabetical and scientific-notation based table. Since this problem appears
> to be common, I wonder whether there is already any built-in syntax to handle it.
------------------------------
Date: Tue, 05 Jul 2011 15:45:41 -0400
From: Jon Du Kim <jondk@FAKE.EMAIL.net>
Subject: Re: sort scientific notation value after alphabet
Message-Id: <32dc1$4e1369ec$ce534406$14751@news.eurofeeds.com>
ha ha ha. Ok, the a2p converter cheats like crazy, it seems.
I just took a closer look at the generated code and see that
the sort is done by opening a pipe to the external `sort` command.
Still, maybe that is ok for you in the environment you have. Certainly
the native `sort` command is likely to be much faster than the equivalent
perl code in almost every case.
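If handing the work to the native sort is acceptable, a hand-written version might look like the sketch below. This is not what a2p emits. It swaps the bare write pipe for the core IPC::Open2 module so the sorted lines come back into the program. The helper name is hypothetical, the -g option is assumed to be available (it is in GNU and BSD sort but is not required by POSIX), and LC_ALL=C is an assumption made here to get byte-wise ordering on the first field.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: pass data lines to the external sort command
# and return them in sorted order.
sub sort_externally {
    my @lines = @_;
    local $ENV{LC_ALL} = 'C';    # assumption: byte-wise ordering wanted for field 1
    my $pid = open2(my $from_sort, my $to_sort, 'sort', '-k1,1', '-k3,3g');
    print {$to_sort} "$_\n" for @lines;
    close $to_sort;              # sort writes nothing until it sees end of input
    chomp(my @sorted = <$from_sort>);
    close $from_sort;
    waitpid $pid, 0;
    die "sort exited with status $?" if $?;
    return @sorted;
}

my ($header, @data) = (
    "Identity End C-value Score ID",
    "_193_TTAG 26831 8.00E-68 163 282859772",
    "_113_TTAG 26831 8.00E-38 163 282859772",
    "_113_TTAG 26831 8.00E-68 163 282859772",   # invented row, to exercise the -g key
);

# Keep the header out of the sort, as the awk one-liner does with NR==1.
my @result = ($header, sort_externally(@data));
print "$_\n" for @result;
```

Writing everything before reading anything is safe here only because sort has to consume all of its input before it produces output. A filter that streams its output would need a different arrangement.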
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3433
***************************************