[32163] in Perl-Users-Digest
Perl-Users Digest, Issue: 3428 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Jun 28 21:09:24 2011
Date: Tue, 28 Jun 2011 18:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 28 Jun 2011 Volume: 11 Number: 3428
Today's topics:
Convert Outlook msg message to mime format <dpich@polartel.com>
Re: Convert Outlook msg message to mime format <*@eli.users.panix.com>
Re: Module to check overlap? <tzz@lifelogs.com>
Re: Module to check overlap? <nospam.gravitalsun@hotmail.com.nospam>
Re: open letter to Sherm Pendley about ShuX jaialai.technology@gmail.com
Re: Posting Guidelines for comp.lang.perl.misc ($Revisi <ralph@happydays.com>
Re: read small file, get array of hashes? [newbie] <uri@StemSystems.com>
Re: sort scientific notation value after alphabet <xhoster@gmail.com>
Re: utf-8 of a string <jurgenex@hotmail.com>
Re: utf-8 of a string <*@eli.users.panix.com>
Re: WebOs - Perl Question <edgrsprj@ix.netcom.com>
Re: WebOs - Perl Question <sherm.pendley@gmail.com>
Re: WebOs - Perl Question <edgrsprj@ix.netcom.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 28 Jun 2011 11:02:48 -0500
From: Don Pich <dpich@polartel.com>
Subject: Convert Outlook msg message to mime format
Message-Id: <YKKdnVQzrse1ZpTTnZ2dnUVZ_s6dnZ2d@polarcomm.com>
I have found the following perl script online to convert a msg file saved
from Outlook into a mime file.
<code>
#!/usr/bin/perl -w
use strict;
use warnings;
use Email::Outlook::Message;
use Email::LocalDelivery;
use Getopt::Long;
use Pod::Usage;
use File::Basename;
use vars qw($VERSION);
$VERSION = "0.903";
my $verbose = '';
my $mboxfile = '';
my $help = '';
GetOptions(
'mbox=s' => \$mboxfile,
'verbose' => \$verbose,
'help|?' => \$help) or pod2usage(2);
pod2usage(1) if $help;
defined $ARGV[0] or pod2usage(2);
foreach my $file (@ARGV) {
my $mail = Email::Outlook::Message->new($file, $verbose)->to_email_mime->as_string;
if ($mboxfile ne '') {
Email::LocalDelivery->deliver($mail, $mboxfile);
} else {
my $basename = basename($file, qr/\.msg/i);
my $outfile = "$basename.mime";
open OUT, ">:utf8", $outfile
or die "Can't open $outfile for writing: $!";
print OUT $mail;
close OUT;
}
}
</code>
It is working well, and a huge THANK YOU to the author. The reason I am
posting here is that if I want to use this script, I need to invoke it
by entering the following:
<code>
$ perl ./msg.pl *msg
</code>
What I am trying to accomplish is scan the entire directory for msg
files, and parse them accordingly.
<code>
my @mimefiles = <*.mime>;
foreach my $mimefile (@mimefiles) {
foo....
}
</code>
I am a novice Perl programmer. Could someone offer a suggestion on how I
would go about accomplishing my goal of not having to enter the ' *msg'?
Thank you for your time.
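One way to drop the command-line argument, sketched under the assumption that the .msg files sit in the current directory, is to fall back to glob() when no file names are given. The helper name msg_files_to_process is made up for illustration; the script's foreach loop would iterate over its return value instead of @ARGV, and the "defined $ARGV[0] or pod2usage(2)" check would have to go.

```perl
use strict;
use warnings;

# Hypothetical helper: return the files named on the command line, or
# every *.msg file in the current directory when none were given.
sub msg_files_to_process {
    my @args = @_;
    return @args if @args;
    # glob() expands the shell-style pattern itself, so no shell is needed.
    return sort grep { -f } glob('*.msg');
}

my @files = msg_files_to_process(@ARGV);
print "would convert: $_\n" for @files;
```

Note that the pattern is matched case-sensitively on most Unix systems, so files named *.MSG would need a second pattern.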
------------------------------
Date: Tue, 28 Jun 2011 19:21:34 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: Convert Outlook msg message to mime format
Message-Id: <eli$1106281456@qz.little-neck.ny.us>
In comp.lang.perl.misc, Don Pich <dpich@polartel.com> wrote:
> I have found the following perl script online to convert a msg file saved
> from Outlook into a mime file.
...
> It is working well, and a huge THANK YOU to the author. The reason I am
> posting here is that if I want to use this script, I need to initiate it
> by entering the following:
>
> <code>
> $ perl ./msg.pl *msg
> </code>
>
> What I am trying to accomplish is scan the entire directory for msg
> files, and parse them accordingly.
How about this:
<code>
$ perl pathto/msg.pl directory/with/msg/files
$ perl pathto/msg.pl some/particular/file*.msg
$ perl pathto/msg.pl directory/ and/file.msg
</code>
Change the body like this (untested, might have bugs):
<code>
# function to process one msg file
sub processone($) {
my $file = shift;
my $mail = new Email::Outlook::Message($file, $verbose)->to_email_mime->as_string;
if ($mboxfile ne '') {
Email::LocalDelivery->deliver($mail, $mboxfile);
} else {
my $basename = basename($file, qr/\.msg$/i);
my $outfile = "$basename.mime";
open OUT, ">:utf8", $outfile
or die "Can't open $outfile for writing: $!";
print OUT $mail;
close OUT;
}
}
# scan every argument looking for files or directories
foreach my $arg (@ARGV) {
if(-f $arg) {
# if a file, make sure it is a .msg
if($arg =~ /\.msg$/i) {
processone($arg);
} else {
warn "$0: skipping non .msg: $arg\n";
}
} elsif(-d $arg) {
# it's a directory, so read everything there
if(!opendir(DIR, $arg)) {
warn "$0: directory $arg: $!\n";
next;
}
my $entry;
my $file;
# check all the files in the directory, do not recurse into
# subdirectories
while(defined($entry = readdir(DIR))) {
$file = "$arg/$entry";
if(-f $file) {
# only process msg files, and don't bother warning about others
if($file =~ /\.msg$/i) {
processone($file);
}
}
}
closedir(DIR);
} else {
warn "$0: not a file or directory: $arg\n";
}
}
</code>
This sort of loop and process template is quite useful. I also use a
variation that works on IDs, not files. In that case if $arg is a file
I treat it as a file of IDs one per line, otherwise I treat it as an
ID. All my Flickr API stuff works that way, for example.
<code>
$ flickr-addtoset $setid imgid1 imgid2 fileofids
</code>
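The ID variation described above might look roughly like this. It is only a sketch, not the actual Flickr tooling: the name expand_ids is invented here, and the rule applied is simply "an existing plain file is a list of IDs, one per line; anything else is an ID".

```perl
use strict;
use warnings;

# Expand each argument: an existing plain file is read as a list of IDs,
# one per line; anything else is taken as a literal ID.
sub expand_ids {
    my @ids;
    for my $arg (@_) {
        if (-f $arg) {
            my $fh;
            unless (open $fh, '<', $arg) {
                warn "$0: cannot read $arg: $!\n";
                next;
            }
            while (my $line = <$fh>) {
                chomp $line;
                push @ids, $line if length $line;    # skip blank lines
            }
            close $fh;
        } else {
            push @ids, $arg;
        }
    }
    return @ids;
}

print "$_\n" for expand_ids(@ARGV);
```

The order of the arguments is preserved, so IDs read from a file appear where the file name stood on the command line.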
Elijah
------
has a variation of addtoset that looks a set id up by regexp
------------------------------
Date: Tue, 28 Jun 2011 08:34:40 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Module to check overlap?
Message-Id: <87ei2eoz9r.fsf@lifelogs.com>
On Tue, 28 Jun 2011 06:36:30 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote:
TZ> Sure. Inversion lists are inefficient for "checkerboard" patterns,
TZ> e.g. the set of all the odd numbers. They are efficient when your data
TZ> has long runs of adjacent numbers.
...
TZ> The set of "completely overlapping" points is the union of your two
TZ> inversion lists.
TZ> The set of "partially overlapping" points is the intersection of your
TZ> two inversion lists.
TZ> Think of your problem as a set problem, not as operations on ranges.
TZ> Your database ranges and the inversion lists we build from them are just
TZ> representations of the underlying truth. So in terms of sets and set
TZ> membership, what are you missing?
I should mention here that in addition to ranges (pairs of integers) and
inversion lists, you have other options:
- bitstrings, which can be managed with vec(), can get large but doing
bit logic on them (e.g. finding the union or intersection) is really
easy. Negation is really slow, though.
- R-trees (look them up on Wikipedia)
- if your ranges fit in a 32-bit integer, you can pretend they are IP
addresses and use modules like Net::Netmask to do the set operations.
- PostgreSQL supports time intervals, I believe, so you could store your
data as a 64-bit timestamp (I'm not sure of the details). I don't
know if that will work at all, but maybe something using SQL would be
a good solution for your specific situation, since you said you are
using a database.
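To make the bitstring option concrete, here is a minimal sketch using vec() and Perl's string bitwise operators. The helper names range_to_bits and bits_to_list and the sample ranges are made up for illustration, and the approach assumes small non-negative integers, since the string grows with the largest value stored.

```perl
use strict;
use warnings;

# Build a bitstring with one bit set for every integer in [$from, $to].
sub range_to_bits {
    my ($from, $to) = @_;
    my $bits = '';
    vec($bits, $_, 1) = 1 for $from .. $to;
    return $bits;
}

# List the integers whose bit is set.
sub bits_to_list {
    my ($bits) = @_;
    return grep { vec($bits, $_, 1) } 0 .. 8 * length($bits) - 1;
}

my $x = range_to_bits(3, 10);
my $y = range_to_bits(8, 15);

my $both   = $x & $y;    # intersection: bits set in both strings
my $either = $x | $y;    # union: bits set in either string

print "intersection: @{[ bits_to_list($both) ]}\n";
print "union:        @{[ bits_to_list($either) ]}\n";
```

Both operands must be strings that have never been used as numbers, otherwise & and | do integer arithmetic instead of working on the strings byte by byte.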
I hope that helps... I didn't mean to give you the impression that
inversion lists were the only answer, only that they seem to fit your
problem well.
Ted
------------------------------
Date: Tue, 28 Jun 2011 16:37:36 +0300
From: "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: Re: Module to check overlap?
Message-Id: <iuclak$m1$1@news.ntua.gr>
I do not think inversion lists will do any good for this problem; they will
only invert the problem.
------------------------------
Date: Tue, 28 Jun 2011 09:55:35 -0400
From: jaialai.technology@gmail.com
Subject: Re: open letter to Sherm Pendley about ShuX
Message-Id: <iucmgl$urr$1@speranza.aioe.org>
On 6/27/11 10:40 PM, Sherm Pendley wrote:
> jaialai.technology@gmail.com writes:
>
>> First, I love the fact that I can use "ShuX" like I did 10+ years ago
>> with MacPerl!
>> My request: Back in the day, when ShuX would open, a small sound file
>> would play: some hickish voice saying "Awww Shucks!". I am feeling
>> a bit of nostalgia here but could you bring that back?
>
> It was never in ShuX - maybe you're thinking of the original Shuck that
> came with MacPerl?
>
>> Maybe leave it as an option (off by default?) for people
>> with a bit of nostalgia?
>
> It's an easy enough thing to add... I'll see if I can track down the
> original sound file.
Awesome! Yes, I was thinking of Shuck. It's been so long
I forgot the original spelling. ;)
------------------------------
Date: Tue, 28 Jun 2011 20:13:20 -0400
From: Ralph Malph <ralph@happydays.com>
Subject: Re: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
Message-Id: <825f8$4e0a6e24$ce534406$30896@news.eurofeeds.com>
Very interestingly, San Francisco State comp. sci. professor Kent Bottles
has a persuasive essay about Usenet that has a lengthy section on
posting guidelines such as these.
See Prof. Bottles' blog (it's the top article) at
http://www.bottleguy.com/
On 6/28/2011 3:16 AM, tadmc@seesig.invalid wrote:
> Outline
> Before posting to comp.lang.perl.misc
> Must
> - Check the Perl Frequently Asked Questions (FAQ)
> - Check the other standard Perl docs (*.pod)
> Really Really Should
> - Lurk for a while before posting
> - Search a Usenet archive
> If You Like
> - Check Other Resources
> Posting to comp.lang.perl.misc
> Is there a better place to ask your question?
> - Question should be about Perl, not about the application area
> How to participate (post) in the clpmisc community
> - Carefully choose the contents of your Subject header
> - Use an effective followup style
> - Speak Perl rather than English, when possible
> - Ask perl to help you
> - Do not re-type Perl code
> - Provide enough information
> - Do not provide too much information
> - Do not post binaries, HTML, or MIME
> Social faux pas to avoid
> - Asking a Frequently Asked Question
> - Asking a question easily answered by a cursory doc search
> - Asking for emailed answers
> - Beware of saying "doesn't work"
> - Sending a "stealth" Cc copy
> Be extra cautious when you get upset
> - Count to ten before composing a followup when you are upset
> - Count to ten after composing and before posting when you are upset
> -----------------------------------------------------------------
>
> Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
> This newsgroup, commonly called clpmisc, is a technical newsgroup
> intended to be used for discussion of Perl related issues (except job
> postings), whether it be comments or questions.
>
> As you would expect, clpmisc discussions are usually very technical in
> nature and there are conventions for conduct in technical newsgroups
> going somewhat beyond those in non-technical newsgroups.
>
> The article at:
>
> http://www.catb.org/~esr/faqs/smart-questions.html
>
> describes how to get answers from technical people in general.
>
> This article describes things that you should, and should not, do to
> increase your chances of getting an answer to your Perl question. It is
> available in POD, HTML and plain text formats at:
>
> http://www.rehabitation.com/clpmisc.shtml
>
> For more information about netiquette in general, see the "Netiquette
> Guidelines" at:
>
> http://andrew2.andrew.cmu.edu/rfc/rfc1855.html
>
> A note to newsgroup "regulars":
>
> Do not use these guidelines as a "license to flame" or other
> meanness. It is possible that a poster is unaware of things
> discussed here. Give them the benefit of the doubt, and just
> help them learn how to post, rather than assume that they do
> know and are being the "bad kind" of Lazy.
>
> A note about technical terms used here:
>
> In this document, we use words like "must" and "should" as
> they're used in technical conversation (such as you will
> encounter in this newsgroup). When we say that you *must* do
> something, we mean that if you don't do that something, then
> it's unlikely that you will benefit much from this group.
> We're not bossing you around; we're making the point without
> lots of words.
>
> Do *NOT* send email to the maintainer of these guidelines. It will be
> discarded unread. The guidelines belong to the newsgroup so all
> discussion should appear in the newsgroup. I am just the secretary that
> writes down the consensus of the group.
>
> Before posting to comp.lang.perl.misc
> Must
> This section describes things that you *must* do before posting to
> clpmisc, in order to maximize your chances of getting meaningful replies
> to your inquiry and to avoid getting flamed for being lazy and trying to
> have others do your work.
>
> The perl distribution includes documentation that is copied to your hard
> drive when you install perl. Also installed is a program for looking
> things up in that (and other) documentation named 'perldoc'.
>
> You should either find out where the docs got installed on your system,
> or use perldoc to find them for you. Type "perldoc perldoc" to learn how
> to use perldoc itself. Type "perldoc perl" to start reading Perl's
> standard documentation.
>
> Check the Perl Frequently Asked Questions (FAQ)
> Checking the FAQ before posting is required in Big 8 newsgroups in
> general, there is nothing clpmisc-specific about this requirement.
> You are expected to do this in nearly all newsgroups.
>
> You can use the "-q" switch with perldoc to do a word search of the
> questions in the Perl FAQs.
>
> Check the other standard Perl docs (*.pod)
> The perl distribution comes with much more documentation than is
> available for most other newsgroups, so in clpmisc you should also
> see if you can find an answer in the other (non-FAQ) standard docs
> before posting.
>
> It is *not* required, or even expected, that you actually *read* all of
> Perl's standard docs, only that you spend a few minutes searching them
> before posting.
>
> Try doing a word-search in the standard docs for some words/phrases
> taken from your problem statement or from your very carefully worded
> "Subject:" header.
>
> Really Really Should
> This section describes things that you *really should* do before posting
> to clpmisc.
>
> Lurk for a while before posting
> This is very important and expected in all newsgroups. Lurking means
> to monitor a newsgroup for a period to become familiar with local
> customs. Each newsgroup has specific customs and rituals. Knowing
> these before you participate will help avoid embarrassing social
> situations. Consider yourself to be a foreigner at first!
>
> Search a Usenet archive
> There are tens of thousands of Perl programmers. It is very likely
> that your question has already been asked (and answered). See if you
> can find where it has already been answered.
>
> One such searchable archive is:
>
> http://groups.google.com/advanced_search
>
> If You Like
> This section describes things that you *can* do before posting to
> clpmisc.
>
> Check Other Resources
> You may want to check in books or on web sites to see if you can
> find the answer to your question.
>
> But you need to consider the source of such information: there are a
> lot of very poor Perl books and web sites, and several good ones
> too, of course.
>
> Posting to comp.lang.perl.misc
> There can be 200 messages in clpmisc in a single day. Nobody is going to
> read every article. They must decide somehow which articles they are
> going to read, and which they will skip.
>
> Your post is in competition with 199 other posts. You need to "win"
> before a person who can help you will even read your question.
>
> These sections describe how you can help keep your article from being
> one of the "skipped" ones.
>
> Is there a better place to ask your question?
> Question should be about Perl, not about the application area
> It can be difficult to separate out where your problem really is,
> but you should make a conscious effort to post to the most
> applicable newsgroup. That is, after all, where you are the most
> likely to find the people who know how to answer your question.
>
> Being able to "partition" a problem is an essential skill for
> effectively troubleshooting programming problems. If you don't get
> that right, you end up looking for answers in the wrong places.
>
> It should be understood that you may not know that the root of your
> problem is not Perl-related (the two most frequent ones are CGI and
> Operating System related), so off-topic postings will happen from
> time to time. Be gracious when someone helps you find a better place
> to ask your question by pointing you to a more applicable newsgroup.
>
> How to participate (post) in the clpmisc community
> Carefully choose the contents of your Subject header
> You have 40 precious characters of Subject to win out and be one of
> the posts that gets read. Don't waste them. Take care while
> composing them, they are the key that opens the door to getting an
> answer.
>
> Spend them indicating what aspect of Perl others will find if they
> should decide to read your article.
>
> Do not spend them indicating "experience level" (guru, newbie...).
>
> Do not spend them pleading (please read, urgent, help!...).
>
> Do not spend them on non-Subjects (Perl question, one-word
> Subject...)
>
> For more information on choosing a Subject see "Choosing Good
> Subject Lines":
>
> http://www.cpan.org/authors/id/D/DM/DMR/subjects.post
>
> Part of the beauty of newsgroup dynamics, is that you can contribute
> to the community with your very first post! If your choice of
> Subject leads a fellow Perler to find the thread you are starting,
> then even asking a question helps us all.
>
> Use an effective followup style
> When composing a followup, quote only enough text to establish the
> context for the comments that you will add. Always indicate who
> wrote the quoted material. Never quote an entire article. Never
> quote a .signature (unless that is what you are commenting on).
>
> Intersperse your comments *following* each section of quoted text to
> which they relate. Unappreciated followup styles are referred to as
> "top-posting", "Jeopardy" (because the answer comes before the
> question), or "TOFU" (Text Over, Fullquote Under).
>
> Reversing the chronology of the dialog makes it much harder to
> understand (some folks won't even read it if written in that style).
> For more information on quoting style, see:
>
> http://web.presby.edu/~nnqadmin/nnq/nquote.html
>
> Speak Perl rather than English, when possible
> Perl is much more precise than natural language. Saying it in Perl
> instead will avoid misunderstanding your question or problem.
>
> Do not say: I have variable with "foo\tbar" in it.
>
> Instead say: I have $var = "foo\tbar", or I have $var = 'foo\tbar',
> or I have $var =<DATA> (and show the data line).
>
> Ask perl to help you
> You can ask perl itself to help you find common programming mistakes
> by doing two things: enable warnings (perldoc warnings) and enable
> "strict"ures (perldoc strict).
>
> You should not bother the hundreds/thousands of readers of the
> newsgroup without first seeing if a machine can help you find your
> problem. It is demeaning to be asked to do the work of a machine. It
> will annoy the readers of your article.
>
> You can look up any of the messages that perl might issue to find
> out what the message means and how to resolve the potential mistake
> (perldoc perldiag). If you would like perl to look them up for you,
> you can put "use diagnostics;" near the top of your program.
>
> Do not re-type Perl code
> Use copy/paste or your editor's "import" function rather than
> attempting to type in your code. If you make a typo you will get
> followups about your typos instead of about the question you are
> trying to get answered.
>
> Provide enough information
> If you do the things in this item, you will have an Extremely Good
> chance of getting people to try and help you with your problem!
> These features are a really big bonus toward your question winning
> out over all of the other posts that you are competing with.
>
> First make a short (less than 20-30 lines) and *complete* program
> that illustrates the problem you are having. People should be able
> to run your program by copy/pasting the code from your article. (You
> will find that doing this step very often reveals your problem
> directly. Leading to an answer much more quickly and reliably than
> posting to Usenet.)
>
> Describe *precisely* the input to your program. Also provide example
> input data for your program. If you need to show file input, use the
> __DATA__ token (perldata.pod) to provide the file contents inside of
> your Perl program.
>
> Show the output (including the verbatim text of any messages) of
> your program.
>
> Describe how you want the output to be different from what you are
> getting.
>
> If you have no idea at all of how to code up your situation, be sure
> to at least describe the 2 things that you *do* know: input and
> desired output.
>
> Do not provide too much information
> Do not just post your entire program for debugging. Most especially
> do not post someone *else's* entire program.
>
> Do not post binaries, HTML, or MIME
> clpmisc is a text only newsgroup. If you have images or binaries
> that explain your question, put them in a publicly accessible
> place (like a Web server) and provide a pointer to that location. If
> you include code, cut and paste it directly in the message body.
> Don't attach anything to the message. Don't post vcards or HTML.
> Many people (and even some Usenet servers) will automatically filter
> out such messages. Many people will not be able to easily read your
> post. Plain text is something everyone can read.
>
> Social faux pas to avoid
> The first two below are symptoms of lots of FAQ asking here in clpmisc.
> It happens so often that folks will assume that it is happening yet
> again. If you have looked but not found, or found but didn't understand
> the docs, say so in your article.
>
> Asking a Frequently Asked Question
> It should be understood that you may have missed the applicable FAQ
> when you checked, which is not a big deal. But if the Frequently
> Asked Question is worded similar to your question, folks will assume
> that you did not look at all. Don't become indignant at pointers to
> the FAQ, particularly if it solves your problem.
>
> Asking a question easily answered by a cursory doc search
> If folks think you have not even tried the obvious step of reading
> the docs applicable to your problem, they are likely to become
> annoyed.
>
> If you are flamed for not checking when you *did* check, then just
> shrug it off (and take the answer that you got).
>
> Asking for emailed answers
> Emailed answers benefit one person. Posted answers benefit the
> entire community. If folks can take the time to answer your
> question, then you can take the time to go get the answer in the
> same place where you asked the question.
>
> It is OK to ask for a *copy* of the answer to be emailed, but many
> will ignore such requests anyway. If you munge your address, you
> should never expect (or ask) to get email in response to a Usenet
> post.
>
> Ask the question here, get the answer here (maybe).
>
> Beware of saying "doesn't work"
> This is a "red flag" phrase. If you find yourself writing that,
> pause and see if you can't describe what is not working without
> saying "doesn't work". That is, describe how it is not what you
> want.
>
> Sending a "stealth" Cc copy
> A "stealth Cc" is when you both email and post a reply without
> indicating *in the body* that you are doing so.
>
> Be extra cautious when you get upset
> Count to ten before composing a followup when you are upset
> This is recommended in all Usenet newsgroups. Here in clpmisc, most
> flaming sub-threads are not about any feature of Perl at all! They
> are most often for what was seen as a breach of netiquette. If you
> have lurked for a bit, then you will know what is expected and won't
> make such posts in the first place.
>
> But if you get upset, wait a while before writing your followup. I
> recommend waiting at least 30 minutes.
>
> Count to ten after composing and before posting when you are upset
> After you have written your followup, wait *another* 30 minutes
> before committing yourself by posting it. You cannot take it back
> once it has been said.
>
> AUTHOR
> Tad McClellan and many others on the comp.lang.perl.misc newsgroup.
>
------------------------------
Date: Tue, 28 Jun 2011 11:30:06 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: read small file, get array of hashes? [newbie]
Message-Id: <87iprqx9c1.fsf@quad.sysarch.com>
>>>>> "JWK" == John W Krahn <jwkrahn@example.com> writes:
JWK> Tad McClellan wrote:
>> Jim Gibson<jimsgibson@gmail.com> wrote:
>>> In article
>>> <ddbdc213-72f7-403f-9091-cb183dcaff4a@p31g2000vbs.googlegroups.com>,
>>> gry<georgeryoung@gmail.com> wrote:
>>
>>>> print 'dbname of tests[1]=', $test{'dbname'};
>>>
>>> Double-quote interpolation is usually better than string concatenation.
>>
>>
>> Yes it is, but there is no string concatenation there...
JWK> No _explicit_ string concatenation there...
JWK> Using print implies that the list will be concatenated together before
JWK> output to the default filehandle.
if i am not mistaken, that isn't true. try benchmarking a print with a
list of strings and a print with that list concatenated (by . or
join). i bet the print list will be slower by more than a trivial
amount.
i always advocate print rarely, print late. meaning do your own
buffering before you call print and print only after you have all the
text collected. that way you can decide where to print in the higher
level logic.
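a minimal sketch of that idea, with made-up data and a made-up function name (format_tests): the function only builds a string, and the caller makes the single print call wherever it suits the higher level logic.

```perl
use strict;
use warnings;

# Collect the output in one string and return it; the caller decides
# where (and whether) to print it.
sub format_tests {
    my @tests = @_;
    my $out = '';
    for my $i (0 .. $#tests) {
        $out .= "dbname of tests[$i]=$tests[$i]{dbname}\n";
    }
    return $out;
}

my @tests = ({ dbname => 'alpha' }, { dbname => 'beta' });
print format_tests(@tests);    # one print call for all the text
```

the same string could just as well go to a log file or a test assertion, which is the point of keeping print out of the formatting code.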
uri
--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------
------------------------------
Date: Mon, 27 Jun 2011 19:10:34 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: sort scientific notation value after alphabet
Message-Id: <4e0a6d58$0$2394$ed362ca5@nr5-q3a.newsreader.com>
ela wrote:
> I have a very large table (1 million+ records) that has five fields and the
> first two rows are shown:
>
> Identity End C-value Score ID
> _113_TTAG 26831 8.00E-38 163 282859772
> _193_TTAG 26831 8.00E-68 163 282859772
>
> I wanna sort the file first by the field "Identity" then "C-value" in
> ascending order (the smallest comes first). While it can be achieved by:
>
> awk 'NR==1; NR > 1 {print $0 | "sort -k1,1 -k3,3g"}' infile > infile.sort
If it can be achieved by that, then I would use that.
> the code cannot be incorporated in Perl and so it was kindly suggested that I
> study something like:
>
> http://en.wikipedia.org/wiki/Schwartzian_transform
That is going to load the entire data set into memory.
> http://www.perlhowto.com/sort_ordering_by_multiple_columns
I haven't read that, but it will probably load the entire data set into
memory as well.
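For a table that does fit in memory, the two-key sort can be sketched in Perl along these lines. This is an illustration only: it assumes whitespace-separated fields, the name sort_rows is invented, and the third data row is made up to show the numeric ordering within one Identity.

```perl
use strict;
use warnings;

# Sort data rows by field 1 (string order), then by field 3 (numeric
# order). Perl's <=> converts strings such as "8.00E-38" to numbers.
sub sort_rows {
    my @rows = @_;
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
           map  { my @f = split ' ', $_; [ $_, $f[0], $f[2] ] }
           @rows;
}

my ($header, @rows) = (
    "Identity End C-value Score ID",
    "_193_TTAG 26831 8.00E-68 163 282859772",
    "_113_TTAG 26831 8.00E-38 163 282859772",
    "_113_TTAG 26831 2.00E-50 163 282859772",    # made-up row
);

# Keep the header first, as the awk command does with NR==1.
print "$_\n" for $header, sort_rows(@rows);
```

Every row is held in memory at least twice while the sort runs, which is the cost mentioned above.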
Perl is not the answer to every question.
Xho
------------------------------
Date: Tue, 28 Jun 2011 07:27:05 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: utf-8 of a string
Message-Id: <f8oj07pjrkv4a5e3ejn9kuqgopbok307qb@4ax.com>
"dn.perl@gmail.com" <dn.perl@gmail.com> wrote:
>I am asking part one of my questions here because I do not know where
>else to ask it.
> I have been sent a string in some language whose alphabet
> is not known to me. How can I find utf-8 representation
> of this string?
Simple: use one of the modules that do encoding conversions. Decode the
string from whatever encoding it is currently in, then convert it into
UTF-8. Years ago I used Text::Iconv very successfully.
You don't know the encoding of your string? Well, then you have a real
problem and can stop right there. Guessing an encoding based on the mere
binary data is an AI project.
I seem to remember that years ago someone mentioned a module that
heuristically guesses which encoding (and language?) a particular byte
string may have. But even in the best of cases that is just a guess.
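A much narrower question can be answered mechanically: whether a byte string is at least well-formed UTF-8. The sketch below uses the core Encode module for that; the function name looks_like_utf8 and the sample bytes are made up, and a "yes" does not prove the text really was meant as UTF-8 (plain ASCII always passes).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return 1 when the byte string is well-formed UTF-8, 0 otherwise.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when a CHECK is given
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# The same word as UTF-8 bytes and as Latin-1 bytes.
for my $sample ("\xC3\xA9t\xC3\xA9", "\xE9t\xE9") {
    print looks_like_utf8($sample) ? "well-formed UTF-8\n" : "not UTF-8\n";
}
```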
>Part two of my doubts is related to perl. But I haven't really got
>around to grappling with it because right now, part one is an obstacle
>to me.
>
>The next step in part two would be to read a CLOB field in a database,
>and grep it in a perl script to check whether the above string appears
>in it. I can read this CLOB field and write it to an excel sheet, and
>the excel sheet shows the CLOB data to me. I would like to check
>whether this CLOB data contains the string which has been sent me.
The index() function will do that perfectly fine:
index STR,SUBSTR,POSITION
index STR,SUBSTR
The index function searches for one string within another[...]
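A minimal sketch of such a check, assuming both the CLOB contents and the received string arrive as UTF-8 bytes (the sample text and the name contains are made up). Decoding both sides first makes index() compare characters rather than bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# True when $needle occurs anywhere in $haystack; index() returns -1 on a miss.
sub contains {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

# Stand-ins for the CLOB contents and the received string, as UTF-8 bytes.
my $clob   = decode('UTF-8', "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln");
my $needle = decode('UTF-8', "K\xC3\xB6ln");

print contains($clob, $needle) ? "found\n" : "not found\n";
print "character offset: ", index($clob, $needle), "\n";
```

The offset is counted in characters because both strings were decoded; on the raw bytes it would be larger.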
>Is there any forum-FAQ or code where I can find some answers/pointers
>to my questions?
The Perl FAQ is part of any standard Perl installation and is sitting
right there on your hard drive, see "perldoc perldoc". You are looking
for the perlfaq page or the -q option.
jue
------------------------------
Date: Tue, 28 Jun 2011 18:55:48 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: utf-8 of a string
Message-Id: <eli$1106281447@qz.little-neck.ny.us>
In comp.lang.perl.misc, Jürgen Exner <jurgenex@hotmail.com> wrote:
> You don't know the encoding of your string? Well, then you have a real
> problem and can stop right there. Guessing an encoding based on the mere
> binary data is an AI project.
This is a standard application of n-grams / trigrams. A well-trained
trigram database can make a very good guess to the encoding and
language of your input text. Training the thing is the challenge.
There might be research databases available, I just noticed a Google
Labs one and a Microsoft Research one in a search, but I don't know
how easy they are to use.
> I seem to remember that years ago someone mentioned a module that
> heuristically guesses which encoding (and language?) a particular byte
> string may have. But even in the best of cases that is just a guess.
Text::NSP (N-Gram Statistics Project) seems relevant, but I haven't
used it.
Elijah
------
did n-gram analysis of all his email for a few years to ID language
------------------------------
Date: Tue, 28 Jun 2011 14:03:42 -0500
From: "E.D.G." <edgrsprj@ix.netcom.com>
Subject: Re: WebOs - Perl Question
Message-Id: <E7WdnQQL0_oSuJfTnZ2dnUVZ_u-dnZ2d@earthlink.com>
"Sherm Pendley" <sherm.pendley@gmail.com> wrote in message
news:m2iprqekgk.fsf@sherm.shermpendley.com...
> "E.D.G." <edgrsprj@ix.netcom.com> writes:
>
> Seeing as how neither the TouchPad nor WebOS 3.0 has been released yet,
> I'm guessing the answer is "no."
Thanks for the comments.
One of the HP Web pages looks to me like it is stating that they
began taking orders for the TouchPad with WebOs 3 on June 20, 2011. I don't
know if that means that they are actually shipping them. But I would guess
that this is the case.
In any event, if Perl programs cannot be easily run on any of the
WebOs operating systems then I myself would be inclined not to invest in a
computer that uses that operating system unless the price were especially
attractive. I do a certain amount of "invention" type work. And it can
often be important for that type of effort to be able to create customized
software for use in running specialty applications.
------------------------------
Date: Tue, 28 Jun 2011 15:33:01 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: WebOs - Perl Question
Message-Id: <m2sjqtn442.fsf@sherm.shermpendley.com>
"E.D.G." <edgrsprj@ix.netcom.com> writes:
> "Sherm Pendley" <sherm.pendley@gmail.com> wrote in message
> news:m2iprqekgk.fsf@sherm.shermpendley.com...
>> "E.D.G." <edgrsprj@ix.netcom.com> writes:
>>
>> Seeing as how neither the TouchPad nor WebOS 3.0 has been released yet,
>> I'm guessing the answer is "no."
>
> Thanks for the comments.
>
> One of the HP Web pages looks to me like it is stating that they
> began taking orders for the TouchPad with WebOs 3 on June 20, 2011. I
> don't know if that means that they are actually shipping them. But I
> would guess that this is the case.
That's the problem with guessing - sometimes you guess wrong.
No. They're not shipping yet. A couple of ship dates have come and gone,
and the latest target is July 17th.
sherm--
------------------------------
Date: Tue, 28 Jun 2011 15:52:53 -0500
From: "E.D.G." <edgrsprj@ix.netcom.com>
Subject: Re: WebOs - Perl Question
Message-Id: <PeadnSwIvsC1opfTnZ2dnUVZ_v2dnZ2d@earthlink.com>
"Sherm Pendley" <sherm.pendley@gmail.com> wrote in message
news:m2sjqtn442.fsf@sherm.shermpendley.com...
> No. They're not shipping yet. A couple of ship dates have come and gone,
> and the latest target is July 17th.
July 17, 2011 is not that far away if they can maintain that shipping
date. And the history of computers should, I feel, have taught companies
that it is better to delay a release date than start shipping a product that
might have some flaws that could get customers upset.
Also, I would expect that HP might be thinking that they needed to
let people know that a TouchPad computer was going to become available so
that their regular customers did not migrate to some other company just to
get one. These types of computers look like they should be fairly popular,
especially the type that HP is reportedly offering as it appears that it is
supposed to be able to effectively and easily interface with a cell
telephone they are marketing.
So, where are computers in general (and their programs) heading?
They seem to keep changing all the time.
As a part-time inventor who has been working with computers for many
years I believe that I might have a fairly good idea regarding what the
"ultimate" computer could look like and what it will do. Such a computer
could be built today. The basic technology already exists. But the final
product is probably several generations more advanced than existing
computers and their operating systems.
By "ultimate" I mean the most sophisticated type of computer that
could be built before the next major step which would involve actually
creating computers that are intended to be parts of our biological systems.
The thought of having computers merged into our biological systems
would probably be a little frightening to many people (including me).
However, I feel that this is something that is going to have to be done to
at least a limited extent so that certain types of medical problems such as
diabetes can be managed. An internal computer could monitor blood sugar
levels and when necessary, automatically dispense some compound that would
keep the sugar levels in a safe range. Medical researchers are, I understand,
already working on that type of computer. I don't know what its present
status might be. But I would not be surprised to hear that an early model
of such a system already exists.
E.D.G.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3428
***************************************