[30461] in Perl-Users-Digest
Perl-Users Digest, Issue: 1704 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jul 9 00:14:29 2008
Date: Tue, 8 Jul 2008 21:14:20 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 8 Jul 2008 Volume: 11 Number: 1704
Today's topics:
Question about Encode (Windows-1252 to utf-8) williams.wilkie@gmail.com
Re: Question regarding Encode <ben@morrow.me.uk>
Re: Question regarding Encode <fawaka@gmail.com>
Re: Value of "Programming perl" 1st Ed.? <uri@stemsystems.com>
Re: Value of "Programming perl" 1st Ed.? <skyler@shaw.ca>
Re: Value of "Programming perl" 1st Ed.? <uri@stemsystems.com>
Re: XML::DOM question <g.seesig@gmail.com>
Re: XML::DOM question <spamtrap@dot-app.org>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 8 Jul 2008 16:40:53 -0700 (PDT)
From: williams.wilkie@gmail.com
Subject: Question about Encode (Windows-1252 to utf-8)
Message-Id: <9e19c4ef-2c6a-433e-88f0-c9877568ac41@y38g2000hsy.googlegroups.com>
Hello! I have recently been turned on to Encode. We have some folks
who are copying and pasting from Word straight into our CMS and the
need to convert from "Windows-1252" to "utf-8" is now critical.
For a one liner I have been using this....
perl -MEncode=from_to -i -pe 'from_to($_, "windows-1252", "utf-8")' file1.txt file2.txt
It works well for editing in place.
My quandary is that I now need to tackle multiple files in a directory,
and another developer mentioned that if "UTF-8" and "Windows-1252" are
intermixed in a file, it may get confused and I should do a
transliteration like..
tr/\x93/\N{LEFT DOUBLE QUOTATION MARK}/;
I wonder if that's really true, and when it comes to opening and closing
file handles for this, should I be using something like "binmode
OUTPUTFILEHANDLE, ':bytes';"?
I am impressed with Encode but any advice or words that anyone wants
to throw in would be greatly appreciated.
Wilkie
flames go quietly to /dev/null
------------------------------
Date: Wed, 9 Jul 2008 00:15:50 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Question regarding Encode
Message-Id: <629dk5-kgq1.ln1@osiris.mauzo.dyndns.org>
Quoth williams.wilkie@gmail.com:
>
> We have several users in our company cutting and pasting from Word
> into our CMS and now have the need to convert from "Windows-1252" to
> "utf-8"
>
> for in-place editing I have been using
> perl -MEncode=from_to -i -pe 'from_to($_, "windows-1252", "utf-8")'
> file1.txt file2.txt
>
> I am now needing to convert multiple files in a dir and another
> developer mentioned that if UTF-8 and Windows-1252 are intermixed then
> there could be some confusion of the two character sets together.
> Transliteration was suggested..
>
> tr/\x92/\N{RIGHT SINGLE QUOTATION MARK}/;
>
> for example.
>
> What I am wondering is if that is indeed the case. I don't want to
> have to resort to transliteration if it isn't necessary.
I'm not quite sure what the concerns are here, but it sounds a lot like
superstition. If each file is consistent within itself, then from_to
will work perfectly well; you can use Encode::Guess to figure out
whether a file is UTF8 or 1252, and since you're only using UTF8 or an
8bit superset of ASCII, it should be 100% reliable. It would be best to
feed the whole file to guess_encoding in one go (use File::Slurp rather
than <> or -p), and specify UTF8 first on the list, so that pure ASCII
is guessed as utf8 rather than 1252 (since either is valid, and you
don't need to re-encode that file).
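A minimal sketch of that per-file check, assuming every file is wholly
UTF-8 or wholly Windows-1252. Instead of Encode::Guess it swaps in a
simpler test: attempt a strict UTF-8 decode of the whole file and fall
back to cp1252 only if that fails. The sub name to_utf8 is made up for
illustration. The files are opened with the :raw layer so Perl hands the
bytes over untouched.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Return UTF-8 octets for $bytes, which is assumed to be either valid
# UTF-8 already (pure ASCII included) or Windows-1252.
sub to_utf8 {
    my ($bytes) = @_;

    # Strict decode: dies on any malformed UTF-8 sequence.
    # LEAVE_SRC stops decode() from modifying $bytes.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $bytes if $ok;

    # Not valid UTF-8, so treat it as Windows-1252 and re-encode.
    return encode('UTF-8', decode('cp1252', $bytes));
}

# Slurp each file named on the command line, convert, and write it back.
for my $file (@ARGV) {
    open my $in, '<:raw', $file or die "Cannot read $file: $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    my $utf8 = to_utf8($bytes);
    next if $utf8 eq $bytes;            # already UTF-8, nothing to rewrite

    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} $utf8;
    close $out or die "Cannot close $file: $!";
}
```

Because files that already decode as UTF-8 are left alone, running this
twice over the same directory should not double-encode anything. Real
Windows-1252 text that also happens to be valid UTF-8 is possible in
principle but rare in practice.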
If some files contain some portions in UTF8 and some portions in 1252,
then you have a serious problem whatever tool you use. My suggestion
would be to attempt to find blocks you can split the file into, where
each block is guaranteed to have a consistent encoding. Then you can
pass these blocks to guess_encoding individually.
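One way this could look, purely as a sketch: it assumes the line is a
safe block, i.e. that no single line mixes the two encodings, which may
not hold for your data. Each line is kept if it is valid UTF-8 and is
otherwise treated as Windows-1252. The names block_to_utf8 and
convert_mixed are hypothetical, and the strict-decode test stands in for
guess_encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Convert one block: keep it if it is valid UTF-8, else assume cp1252.
sub block_to_utf8 {
    my ($block) = @_;
    my $ok = eval { decode('UTF-8', $block, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? $block : encode('UTF-8', decode('cp1252', $block));
}

# Split the octets after every newline (assumed block boundary),
# convert each piece independently, and glue the result back together.
sub convert_mixed {
    my ($bytes) = @_;
    return join '', map { block_to_utf8($_) } split /(?<=\n)/, $bytes;
}

# Usage as a filter: perl convert_mixed.pl < in.txt > out.txt
unless (-t STDIN) {
    binmode STDIN;
    binmode STDOUT;
    my $input = do { local $/; <STDIN> };
    print convert_mixed($input) if defined $input;
}
```

Splitting on the newline byte is safe in both encodings, since 0x0A
never appears inside a multi-byte UTF-8 sequence.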
Ben
>
> Maybe I need some kind of check to see if a file is encoded a certain
> way before figuring out how to jump into it. I can't ever remember
> using Encode before and now we need it on a massive scope.
>
> Any advice would be appreciated.
> Wilkie
>
> Flames go quietly to /dev/null
--
'Deserve [death]? I daresay he did. Many live that deserve death. And some die
that deserve life. Can you give it to them? Then do not be too eager to deal
out death in judgement. For even the very wise cannot see all ends.'
ben@morrow.me.uk
--
I've seen things you people wouldn't believe: attack ships on fire off
the shoulder of Orion; I watched C-beams glitter in the dark near the
Tannhauser Gate. All these moments will be lost, in time, like tears in rain.
Time to die. ben@morrow.me.uk
------------------------------
Date: Wed, 09 Jul 2008 01:24:52 +0200
From: Leon Timmermans <fawaka@gmail.com>
Subject: Re: Question regarding Encode
Message-Id: <12d83$4873f744$89e0e08f$31166@news1.tudelft.nl>
On Tue, 08 Jul 2008 14:47:55 -0700, williams.wilkie wrote:
> We have several users in our company cutting and pasting from Word into
> our CMS and now have the need to convert from "Windows-1252" to "utf-8"
>
> for in-place editing I have been using perl -MEncode=from_to -i -pe
> 'from_to($_, "windows-1252", "utf-8")' file1.txt file2.txt
>
That looks OK.
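If the shell glob is not enough for the directory case, a rough sketch
of the same conversion as a script might look like this. It assumes
every *.txt file under the directory really is Windows-1252; the sub
name convert_tree and the .txt filter are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(from_to);
use File::Find qw(find);

# Convert every *.txt file under $dir from Windows-1252 to UTF-8 in place.
# Caveat: a file that is already UTF-8 would be double-encoded.
# Returns the list of files that were rewritten.
sub convert_tree {
    my ($dir) = @_;
    my @converted;

    find({
        no_chdir => 1,                  # keep $_ as the full path
        wanted   => sub {
            return unless -f $_ && /\.txt\z/;
            my $path = $_;

            open my $in, '<:raw', $path or die "Cannot read $path: $!";
            my $octets = do { local $/; <$in> };
            close $in;

            from_to($octets, 'cp1252', 'utf-8');   # converts in place

            open my $out, '>:raw', $path or die "Cannot write $path: $!";
            print {$out} $octets;
            close $out or die "Cannot close $path: $!";

            push @converted, $path;
        },
    }, $dir);

    return @converted;
}

convert_tree($_) for @ARGV;
```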
>
> I am now needing to convert multiple files in a dir and another
> developer mentioned that if UTF-8 and Windows-1252 are intermixed then
> there could be some confusion of the two character sets together.
> Transliteration was suggested..
>
I'm not sure I correctly understand the problem. Do you mean some files
are in one encoding and some in another (an issue, but probably solvable),
or that some files use multiple encodings within one file (a very big
problem)?
> tr/\x92/\N{RIGHT SINGLE QUOTATION MARK}/;
>
> for example.
>
Please don't do that. That sort of thing will bite you in the ass.
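A small illustration of one way it can bite, assuming the replacement is
run over raw bytes: the byte 0x92 also occurs inside perfectly valid
UTF-8, for example as the second byte of U+0152 (the "OE" ligature), so
a blind byte substitution would corrupt text that is already correct.
The sample word is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "OE-ligature + uvre" as UTF-8 octets: U+0152 encodes as bytes C5 92.
my $utf8 = encode('UTF-8', "\x{152}uvre");

# tr with an empty replacement list only counts; it does not modify.
my $hits = ($utf8 =~ tr/\x92//);

printf "byte 0x92 occurs %d time(s) in valid UTF-8 text\n", $hits;
```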
> What I am wondering is if that is indeed the case. I don't want to have
> to resort to transliteration if it isn't necessary.
>
> Maybe I need some kind of check to see if a file is encoded a certain
> way before figuring out how to jump into it. I can't ever remember using
> Encode before and now we need it on a massive scope.
>
There are some heuristic algorithms to do just that, but to be honest I
would assume all data is in the same encoding unless you have proof
otherwise. If it isn't, your CMS *REALLY* screwed up.
Regards,
Leon Timmermans
------------------------------
Date: Tue, 08 Jul 2008 22:25:14 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Value of "Programming perl" 1st Ed.?
Message-Id: <x7fxqkxbcl.fsf@mail.sysarch.com>
>>>>> "S" == Skyler <skyler@shaw.ca> writes:
S> Uri Guttman wrote:
>>>>>>> "S" == Skyler <skyler@shaw.ca> writes:
S> No, that's your sorry rule. You have no authority to impose
>> that on
S> anyone so stop kidding yourself, stop performing homo sexual acts in
S> public, and get a real life.
>> wow, you're a homophobe too! how typical of the angry loser who can't
>> get any perl action here.
S> If "perl action" (and I thought you made big arguments about spelling
S> Perl right) means have to suck up to assholes like you, then no
S> thanks. I get plenty of "perl action" in my line of work. I make great
S> wonderful use of it, but that doesn't mean I'm going to pander to
S> snotty pricks like yourself in order to be successful.
nah, i don't want suckups. they usually don't know any perl. and you
haven't shown any perl skills yet either. so that makes you a suckup!
congratulations on that award!
>> >> i drain my diluted piss on you!! now go clean up and keep quiet.
S> Too late, I think Sherman already drank before it could hit the
S> ground. And why is it green? Have you been drinking anti-freeze again?
>> wow, such wit. i am ashamed to be sparring with you.
S> Yeah, unarmed people usually are.
wow, junior high school raises its revered head! i am waiting for i am
rubber and you are glue. now i order you to keep posting and digging
deeper. i have no fear about what others here will say or think about
me. but for a newcomer like you, entering many killfiles at once is a
good thing. and making you respond is so easy.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Tue, 08 Jul 2008 17:53:02 -0700
From: Skyler <skyler@shaw.ca>
Subject: Re: Value of "Programming perl" 1st Ed.?
Message-Id: <QZTck.12139$89.1697@nlpi069.nbdc.sbc.com>
Uri Guttman wrote:
>>>>>> "S" == Skyler <skyler@shaw.ca> writes:
>
> S> Uri Guttman wrote:
> >>>>>>> "S" == Skyler <skyler@shaw.ca> writes:
> S> No, that's your sorry rule. You have no authority to impose
> >> that on
> S> anyone so stop kidding yourself, stop performing homo sexual acts in
> S> public, and get a real life.
> >> wow, you're a homophobe too! how typical of the angry loser who can't
> >> get any perl action here.
>
> S> If "perl action" (and I thought you made big arguments about spelling
> S> Perl right) means have to suck up to assholes like you, then no
> S> thanks. I get plenty of "perl action" in my line of work. I make great
> S> wonderful use of it, but that doesn't mean I'm going to pander to
> S> snotty pricks like yourself in order to be successful.
>
> nah, i don't want suckups. they usually don't know any perl. and you
> haven't shown any perl skills yet either. so that makes you a suckup!
> congratulations on that award!
You're not a very good liar, you know. Seems you have a whole legion of
followers ready to kiss your feet at a moment's notice.
> >> >> i drain my diluted piss on you!! now go clean up and keep quiet.
> S> Too late, I think Sherman already drank before it could hit the
> S> ground. And why is it green? Have you been drinking anti-freeze again?
> >> wow, such wit. i am ashamed to be sparring with you.
>
> S> Yeah, unarmed people usually are.
>
> wow, junior high school raises its revered head!
Yep, keep proving that you are utterly unarmed and you don't even know
how to throw a rock. Go home already.
-sky
------------------------------
Date: Wed, 09 Jul 2008 01:43:20 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Value of "Programming perl" 1st Ed.?
Message-Id: <x7lk0bu91k.fsf@mail.sysarch.com>
>>>>> "S" == Skyler <skyler@shaw.ca> writes:
S> Uri Guttman wrote:
>>>>>>> "S" == Skyler <skyler@shaw.ca> writes:
S> Uri Guttman wrote:
>> >>>>>>> "S" == Skyler <skyler@shaw.ca> writes:
S> No, that's your sorry rule. You have no authority to impose
>> >> that on
S> anyone so stop kidding yourself, stop performing homo sexual acts in
S> public, and get a real life.
>> >> wow, you're a homophobe too! how typical of the angry loser who can't
>> >> get any perl action here.
S> If "perl action" (and I thought you made big arguments about
>> spelling
S> Perl right) means have to suck up to assholes like you, then no
S> thanks. I get plenty of "perl action" in my line of work. I make great
S> wonderful use of it, but that doesn't mean I'm going to pander to
S> snotty pricks like yourself in order to be successful.
>> nah, i don't want suckups. they usually don't know any perl. and you
>> haven't shown any perl skills yet either. so that makes you a suckup!
>> congratulations on that award!
S> You're not a very good liar, you know. Seems you have a whole legion
S> of followers ready to kiss your feet at a moment's notice.
i am not a good liar, how observant of you!! thank you for such kind
words.
but notice you don't see people saying plonk after my posts. you will be
seeing those soon enough. keep digging all the way to china!
S> Yep, keep proving that you are utterly unarmed and you don't even know
S> how to throw a rock. Go home already.
but i am home. you don't get it. you are the one who will leave. i am
having fun chortling over your pitiful existence in this group. please
post your cpan id or some of your perl code. i will have fun with that
too. my cpan id is uri. see if you can understand any of it (and it is
considered very good code overall but you wouldn't know so why tell
you).
have fun being you!
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Tue, 8 Jul 2008 17:34:03 -0700
From: "Gordon Corbin Etly" <g.seesig@gmail.com>
Subject: Re: XML::DOM question
Message-Id: <6difbtF2pporU1@mid.individual.net>
Sherman Pendley wrote:
> Konstantinos Agouros <elwood@agouros.de> writes:
> > if XML::DOM fails in getElementsByTagName my script terminates. Is
> > there a way to catch this?
> Does it fail immediately, when you call this method? If so, you could
> try wrapping the call in a block eval:
>
> eval {
>     my $nodeList = $doc->getElementsByTagName('FOO');
> }
>
> if ($@) {
>     # Handle the failure
>     warn $@;
> }
Keep in mind that $nodeList should be declared outside (and before) that
eval statement, otherwise it's gone with the wind once the eval block
finishes. And I know it's sample code, but it may be worth it to
quick-note the fact that your eval is missing a semicolon at the end -
the syntax error message wasn't too helpful, until I realized that was
the problem. In a complex program it would be that much harder to weed
out :)
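For reference, a sketch of the sample with both points applied. To keep
it runnable without XML::DOM installed, it uses a hypothetical stand-in
class (FailingDoc) whose lookup always dies; with the real module, $doc
would be the parsed document instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XML::DOM document whose lookup dies.
{
    package FailingDoc;
    sub new { return bless {}, shift }
    sub getElementsByTagName { die "no such element\n" }
}

my $doc = FailingDoc->new;

my $nodeList;                   # declared outside so it survives the eval
my $ok = eval {
    $nodeList = $doc->getElementsByTagName('FOO');
    1;                          # reached only if nothing died
};                              # the semicolon the quoted sample lacked

if (!$ok) {
    # Handle the failure
    my $err = $@ || "unknown error\n";
    warn "Lookup failed: $err";
}
```

Testing the return value of eval rather than $@ alone is a common
defensive habit, since $@ can be cleared or overwritten before it is
inspected.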
--
Gordon C. Etly
Email: perl -e "print q{}.reverse(q{moc.liamg@ylte.nodrog})"
------------------------------
Date: Tue, 08 Jul 2008 23:27:18 -0400
From: Sherman Pendley <spamtrap@dot-app.org>
Subject: Re: XML::DOM question
Message-Id: <m1iqvfaga1.fsf@dot-app.org>
"Gordon Corbin Etly" <g.seesig@gmail.com> writes:
> Sherman Pendley wrote:
>> Konstantinos Agouros <elwood@agouros.de> writes:
>
>> > if XML::DOM fails in getElementsByTagName my script terminates. Is
>> > there a way to catch this?
>
>> Does it fail immediately, when you call this method? If so, you could
>> try wrapping the call in a block eval:
>>
>> eval {
>>     my $nodeList = $doc->getElementsByTagName('FOO');
>> }
>>
>> if ($@) {
>>     # Handle the failure
>>     warn $@;
>> }
>
> Keep in mind that $nodeList should be declared outside (and before) that
> eval statement, otherwise it's gone with the wind once the eval block
> finishes. And I know it's sample code, but it may be worth it to
> quick-note the fact that your eval is missing a semicolon at the end -
> the syntax error message wasn't too helpful, until I realized that was
> the problem. In a complex program it would be that much harder to weed
> out :)
Agreed on all counts. Thanks for catching my mistakes - that'll teach
me to post without proofreading. :-)
sherm--
--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1704
***************************************