[24297] in Perl-Users-Digest
Perl-Users Digest, Issue: 6488 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Apr 29 18:06:56 2004
Date: Thu, 29 Apr 2004 15:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 29 Apr 2004 Volume: 10 Number: 6488
Today's topics:
(R) character in RegEXP (Tsu-na-mi)
Re: (R) character in RegEXP <flavell@ph.gla.ac.uk>
Re: (R) character in RegEXP <jtc@shell.dimensional.com>
ANNOUNCE: Audio::M4pDecrypt 0.01 released <wherrera@lynxview.com>
ANNOUNCE: Javascript::MD5 1.03 <ron@savage.net.au>
ANNOUNCE: Javascript::SHA1 1.00 <ron@savage.net.au>
ANNOUNCE: Spreadsheet::WriteExcel 0.43 <jmcnamara@cpan.org>
ANNOUNCE: Spreadsheet::WriteExcelXML 0.02 <jmcnamara@cpan.org>
Re: convert utf8 to latin-1/iso-8859-1 <shailesh@nothing.but.net>
Re: convert utf8 to latin-1/iso-8859-1 <shailesh@nothing.but.net>
Re: convert utf8 to latin-1/iso-8859-1 <shailesh@nothing.but.net>
Re: Count how many times find and replaced happened <tadmc@augustmail.com>
Re: Count how many times find and replaced happened <webmaster @ infusedlight . net>
cywin versus activestate on xp <beau@oblios-cap.com>
generating time series graphs with perl <a5ufv8u02@sneakemail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 29 Apr 2004 12:32:12 -0700
From: tsunami@zedxinc.com (Tsu-na-mi)
Subject: (R) character in RegEXP
Message-Id: <a6b1e337.0404291132.1541084e@posting.google.com>
Hi,
I am having trouble getting a simple regexp to recognize the
registered trademark symbol (R) when it is read from XML. The XML
uses ® for the symbol, and if I print the string after parsing,
it prints correctly. However, the regexp:
$string =~ s/(R)/somethingelse/g;
does not recognize the (R) symbol. NOTE: (R) is the single-ASCII
character. I also tried using \x{AE} which did not work either. The
regular TM symbol doesn't work either, and seems to throw everything
into unicode mode, screwing up other stuff like the bullet and
copyright symbols.
So my question is, If I have XML like :
<P>This is my Widget®</P>
And read it into a string with XML::Parser, how should I address this
character (and any char > 256 if you know).
For the record, I am using Perl 5.8.3 on Red Hat 9.0. Thanks for any
help anyone can provide.
------------------------------
Date: Thu, 29 Apr 2004 21:58:55 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: (R) character in RegEXP
Message-Id: <Pine.LNX.4.53.0404292110370.28014@ppepc56.ph.gla.ac.uk>
On Thu, 29 Apr 2004, Tsu-na-mi wrote:
> I am having trouble getting a simple regexp to recognize the
> registered trademark symbol (R) when it is read from XML. The XML
> uses ® for the symbol,
But what are you giving to Perl - the character itself, or that
numerical character reference?
> However, the regexp:
>
> $string =~ s/(R)/somethingelse/g;
>
> does not recognize the (R) symbol.
You said the XML contains ®
How do you expect Perl to know what it's intended to mean?
Or were you getting that as a result from the XML parser? (sorry if
your description didn't seem clear enough).
> NOTE: (R) is the single-ASCII character.
Ahem. The ASCII code does not contain this character. ASCII is a
7-bit code, which is at the basis of numerous 8-bit codings.
Presumably you're thinking in terms of iso-8859-1, or maybe even
Windows-1252, as the 8-bit coding.
> I also tried using \x{AE} which did not work either. The
> regular TM symbol doesn't work either, and seems to throw everything
> into unicode mode, screwing up other stuff like the bullet and
> copyright symbols.
> So my question is,
Not so fast! Let's see some Perl code first. Preferably a
manageable-sized snippet that's complete enough to run for ourselves
and that demonstrates the problem you're experiencing. You're
obviously in a quagmire, and folks would like to help you, but if you
keep yelling and struggling, then you're liable to just get deeper;
stay calm, take it step by step, show us your working...
> If I have XML like :
>
> <P>This is my Widget®</P>
>
> And read it into a string with XML::Parser, how should I address this
> character (and any char > 256 if you know).
With respect, my advice would have to be that you -do- need to take a
while out with perldoc perluniintro and (maybe) perlunicode, to get a
start on Perl's handling of unicode. Just getting a prescription
handed out isn't going to be a great deal of help - one needs to
understand this sufficiently for it to make sense, rather than just
applying magic incantations.
> For the record, I am using Perl 5.8.3 on Red Hat 9.0. Thanks for any
> help anyone can provide.
Looks OK. But I think you need to (a) show a bit more of your working
and (b) understand the difference between Perl's legacy 8-bit handling
and its Unicode character model.
So, let's see a bit more of your code, if we're to help with the
code. But your part of the bargain would be (no offence intended) to
get a bit more up to speed with the underlying principles.
hope this helps.
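To make the octets-versus-characters distinction concrete, here is a minimal sketch. It assumes the parser returns a character string containing U+00AE; the sample text and variable names are invented for illustration and are not the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical sample: a character string holding U+00AE REGISTERED SIGN.
my $string = "This is my Widget\x{AE}";

# Match the character by code point instead of pasting a literal byte.
(my $replaced = $string) =~ s/\x{AE}/(R)/g;

# The same text as UTF-8 octets: U+00AE is stored as two bytes (C2 AE).
my $octets = $string;
utf8::encode($octets);

printf "%d characters, %d octets\n", length($string), length($octets);
print "$replaced\n";
```

A pattern written against characters only matches reliably when the data really is a character string; run against raw UTF-8 octets, the same \x{AE} would see the second byte of a two-byte sequence.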
------------------------------
Date: 29 Apr 2004 15:17:02 -0600
From: Jim Cochrane <jtc@shell.dimensional.com>
Subject: Re: (R) character in RegEXP
Message-Id: <slrnc92s6e.ts8.jtc@shell.dimensional.com>
In article <a6b1e337.0404291132.1541084e@posting.google.com>, Tsu-na-mi wrote:
> Hi,
>
> I am having trouble getting a simple regexp to recognize the
> registered trademark symbol (R) when it is read from XML. The XML
> uses ® for the symbol, and if I print the string after parsing,
> it prints correctly. However, the regexp:
>
> $string =~ s/(R)/somethingelse/g;
If this is literally what is in your code, you're using the (...) grouping
construct. If you want to literally match '(R)', you need to escape the
parens: s/\(R\)/...
>
> does not recognize the (R) symbol. NOTE: (R) is the single-ASCII
> character. I also tried using \x{AE} which did not work either. The
> regular TM symbol doesn't work either, and seems to throw everything
> into unicode mode, screwing up other stuff like the bullet and
> copyright symbols.
>
> So my question is, If I have XML like :
>
><P>This is my Widget®</P>
>
> And read it into a string with XML::Parser, how should I address this
> character (and any char > 256 if you know).
>
> For the record, I am using Perl 5.8.3 on Red Hat 9.0. Thanks for any
> help anyone can provide.
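The grouping point can be checked with a short sketch, assuming the pattern really contains the three ASCII characters "(R)"; the sample string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample text containing the literal three characters "(R)".
my $string = 'Widget(R) and Gadget(R)';

# Unescaped, (R) is a capture group that matches a bare "R".
(my $grouped = $string) =~ s/(R)/X/g;

# Escaped, or quoted with \Q...\E, the pattern matches the parentheses too.
(my $escaped = $string) =~ s/\(R\)/X/g;
(my $quoted  = $string) =~ s/\Q(R)\E/X/g;

print "$grouped\n$escaped\n$quoted\n";
```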
--
Jim Cochrane; jtc@dimensional.com
[When responding by email, include the term non-spam in the subject line to
get through my spam filter.]
------------------------------
Date: Wed, 28 Apr 2004 23:41:08 GMT
From: Bill <wherrera@lynxview.com>
Subject: ANNOUNCE: Audio::M4pDecrypt 0.01 released
Message-Id: <Hwy59J.oBG@zorch.sf-bay.org>
The pure Perl module Audio::M4pDecrypt has been posted to CPAN.
NAME
Audio::M4pDecrypt -- DRMS decryption of Apple iTunes style MP4 player files
DESCRIPTION
Perl port of the DeDRMS.cs program by Jon Lech Johansen
SYNOPSIS
use Audio::M4pDecrypt;
my $mp4file = 'myfile';
my $outfile = 'mydecodedfile';
my $deDRMS = Audio::M4pDecrypt->new;
$deDRMS->DeDRMS($mp4file, $outfile);
--Bill
------------------------------
Date: Wed, 28 Apr 2004 09:42:13 GMT
From: Ron Savage <ron@savage.net.au>
Subject: ANNOUNCE: Javascript::MD5 1.03
Message-Id: <Hwy58q.oAF@zorch.sf-bay.org>
The pure Perl module Javascript::MD5 1.03
is available immediately from CPAN,
and from http://savage.net.au/Perl-modules.html.
On-line docs, and a *.ppd for ActivePerl are also
available from the latter site.
An extract from the docs:
1.03 Tue Apr 27 14:59:04 2004
- Replace Yahoo!'s version of the Javascript with Paul Johnston's version,
from his web site: http://pajhome.org.uk/crypt/md5
- Change the name of the function you call in your submit button code,
from RetMD5() - the Yahoo! name - to str2hex_md5() - a name more in keeping
with Paul's naming convention
- Add 2 extra functions, str2b64_md5() and str2str_md5(), to return other versions
of the digest
--
Cheers
Ron Savage, ron@savage.net.au on 28/04/2004
http://savage.net.au/index.html
------------------------------
Date: Wed, 28 Apr 2004 09:43:16 GMT
From: Ron Savage <ron@savage.net.au>
Subject: ANNOUNCE: Javascript::SHA1 1.00
Message-Id: <Hwy58x.1vsy@zorch.sf-bay.org>
The pure Perl module Javascript::SHA1 1.00
is available immediately from CPAN,
and from http://savage.net.au/Perl-modules.html.
On-line docs, and a *.ppd for ActivePerl are also
available from the latter site.
An extract from the docs:
1.00 Fri Mar 05 10:23:29 2004
- Original version
- The Javascript is Paul Johnston's version, from his web site:
http://pajhome.org.uk/crypt/md5
- There are 3 functions, str2hex_sha1(), str2b64_sha1() and str2str_sha1(), to return various versions
of the digest
--
Cheers
Ron Savage, ron@savage.net.au on 28/04/2004
http://savage.net.au/index.html
------------------------------
Date: Wed, 28 Apr 2004 23:05:23 GMT
From: John McNamara <jmcnamara@cpan.org>
Subject: ANNOUNCE: Spreadsheet::WriteExcel 0.43
Message-Id: <Hwy59A.oBM@zorch.sf-bay.org>
======================================================================
ANNOUNCE
Spreadsheet::WriteExcel version 0.43 has been uploaded to CPAN.
http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcel
======================================================================
NAME
Spreadsheet::WriteExcel - Write formatted text and numbers to a
cross-platform Excel binary file.
======================================================================
CHANGES
Minor release
! Fixed longstanding bug where page setup features didn't
show up in OpenOffice.
! Fixed localised @_ bug when using threaded perls.
======================================================================
DESCRIPTION
The Spreadsheet::WriteExcel module can be used to create a
cross-platform Excel binary file. Multiple worksheets can be added to a
workbook and formatting can be applied to cells. Text, numbers,
formulas, hyperlinks and images can be written to the cells.
The Excel file produced by this module is compatible with Excel 5,
95, 97, 2000 and 2002.
The module will work on the majority of Windows, UNIX and
Macintosh platforms. Generated files are also compatible with the
Linux/UNIX spreadsheet applications Gnumeric and OpenOffice.
The generated files are not compatible with MS Access.
This module cannot be used to read an Excel file. See
Spreadsheet::ParseExcel or look at the main documentation for some
suggestions.
This module cannot be used to write to an existing Excel file.
======================================================================
SYNOPSIS
To write a string, a formatted string, a number and a formula to
the first worksheet in an Excel workbook called perl.xls:
use Spreadsheet::WriteExcel;
# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcel->new("perl.xls");
# Add a worksheet
my $worksheet = $workbook->addworksheet();
# Add and define a format
my $format = $workbook->addformat(); # Add a format
$format->set_bold();
$format->set_color('red');
$format->set_align('center');
# Write a formatted and unformatted string
my ($row, $col) = (0, 0);
$worksheet->write($row, $col, "Hi Excel!", $format);
$worksheet->write(1, $col, "Hi Excel!");
# Write a number and a formula using A1 notation
$worksheet->write('A3', 1.2345);
$worksheet->write('A4', '=SIN(PI()/4)');
======================================================================
REQUIREMENTS
This module requires Perl 5.005 (or later), Parse::RecDescent and
File::Temp
http://search.cpan.org/search?dist=Parse-RecDescent
http://search.cpan.org/search?dist=File-Temp
======================================================================
AUTHOR
John McNamara (jmcnamara@cpan.org)
--
------------------------------
Date: Wed, 28 Apr 2004 23:03:55 GMT
From: John McNamara <jmcnamara@cpan.org>
Subject: ANNOUNCE: Spreadsheet::WriteExcelXML 0.02
Message-Id: <Hwy595.oAx@zorch.sf-bay.org>
======================================================================
ANNOUNCE
Spreadsheet::WriteExcelXML version 0.02 has been uploaded to CPAN.
http://search.cpan.org/~jmcnamara/Spreadsheet-WriteExcelXML/
======================================================================
NAME
Spreadsheet::WriteExcelXML - Create an Excel file in XML format.
======================================================================
DESCRIPTION
The Spreadsheet::WriteExcelXML module can be used to create an
Excel file in XML format. The Excel XML format is supported in
Excel 2002 and 2003.
Multiple worksheets can be added to a workbook and formatting
can be applied to cells. Text, numbers, and formulas can be
written to the cells. The module supports strings up to 32,767
characters and the strings can be in UTF8 format.
Spreadsheet::WriteExcelXML uses the same interface as
Spreadsheet::WriteExcel.
This module cannot, as yet, be used to write to an existing
Excel XML file.
======================================================================
SYNOPSIS
To write a string, a formatted string, a number and a formula to
the first worksheet in an Excel XML spreadsheet called perl.xml:
use Spreadsheet::WriteExcelXML;
# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcelXML->new("perl.xml");
# Add a worksheet
my $worksheet = $workbook->add_worksheet();
# Add and define a format
my $format = $workbook->add_format(); # Add a format
$format->set_bold();
$format->set_color('red');
$format->set_align('center');
# Write a formatted and unformatted string.
my ($row, $col) = (0, 0);
$worksheet->write($row, $col, "Hi Excel!", $format);
$worksheet->write(1, $col, "Hi Excel!");
# Write a number and a formula using A1 notation
$worksheet->write('A3', 1.2345);
$worksheet->write('A4', '=SIN(PI()/4)');
======================================================================
REQUIREMENTS
This module requires Perl 5.005 (or later).
======================================================================
INSTALLATION
Use the standard Unix-style installation. A ppm for Windows
users will be available in the next release.
Unzip and untar the module as follows or use winzip:
tar -zxvf Spreadsheet-WriteExcelXML-0.xx.tar.gz
The module can be installed using the standard Perl procedure:
perl Makefile.PL
make
make test
make install # You may need to be root
make clean # or make realclean
======================================================================
AUTHOR
John McNamara (jmcnamara@cpan.org)
--
------------------------------
Date: Thu, 29 Apr 2004 19:35:59 GMT
From: Shailesh <shailesh@nothing.but.net>
Subject: Re: convert utf8 to latin-1/iso-8859-1
Message-Id: <zOckc.39929$Vp5.6607@fe2.columbus.rr.com>
Never mind, the trademark symbol is a Microsoft specific extension to
the Latin1 character set, and in actuality has no representation in
iso-8859-1. That is why it is correctly being erased by Perl during
the conversion.
http://casa.colorado.edu/~ajsh/iso8859-1.html
http://www.fourmilab.ch/webtools/demoroniser/
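The difference between the two 8-bit encodings can be seen with a minimal sketch. It assumes the Encode module bundled with Perl 5.8 and a made-up sample string; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample line: "abc", U+2122 TRADE MARK SIGN, "123".
my $line = "abc\x{2122}123";

# ISO-8859-1 has no trade mark sign, so Encode's default behaviour is to
# substitute a "?" for it.
my $latin1 = encode('iso-8859-1', $line);

# Windows-1252 places the trade mark sign at 0x99 (decimal 153).
my $cp1252 = encode('cp1252', $line);

# %vd prints the ordinal of each octet, joined by dots.
printf "iso-8859-1: %vd\n", $latin1;
printf "cp1252:     %vd\n", $cp1252;
```

If octet 153 is what the output file needs, the target encoding is Windows-1252 rather than ISO-8859-1.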
Shailesh wrote:
> I have a file that contains the trademark character encoded in
> utf8/Unicode (i.e. it takes up 3 octets of space). I can read this file
> into a string like this:
>
> $filename = "testutf8.txt";
> $source = IO::File->new( $filename, 'r' );
> binmode( $source, ':utf8' );
> @filedata = <$source>;
> $line = $filedata[0];
>
> The problem is, I want to write out the $line to another file encoded as
> iso-8859-1 (a.k.a latin-1 and extended US ASCII). The trademark
> character should be translated to a single octet with value 153. I have
> tried encode/decode and pack/unpack with no success. The trademark
> character either gets erased, or converted into three separate gibberish
> characters. The only thing that seems to work is
> HTML::Entities::encode_entities, which correctly detects the trademark
> symbol in utf8, and converts it to the entity "&trade;". Any help?
>
> I have attached the utf8 encoded file, which contains "abc|123", where
> '|' is the TM symbol. Note that the file is 9 bytes, as expected. Below
> is the rest of the testing code.
>
> # Open a test output file
> $dest = IO::File->new("outenc.txt" , 'w' );
>
> # Write a correct trademark symbol
> $test = pack("C", 153);
> print "char is: ".$test."\n";
> $dest->write($test);
> $dest->write("\n");
>
> # Write the utf8 line converted to iso-8859-1 -- HOW TO CONVERT IT?
> $dest->write($line);
>
> $source->close();
> $dest->close();
>
>
> ------------------------------------------------------------------------
>
> abc™123
------------------------------
Date: Thu, 29 Apr 2004 19:57:19 GMT
From: Shailesh <shailesh@nothing.but.net>
Subject: Re: convert utf8 to latin-1/iso-8859-1
Message-Id: <z6dkc.40077$Vp5.31173@fe2.columbus.rr.com>
This one-liner does the conversion correctly:
$line = Unicode::String::utf8($line)->latin1();
Shailesh wrote:
> Never mind, the trademark symbol is a Microsoft specific extension to the
> Latin1 character set, and in actuality has no representation in
> iso-8859-1. That is why it is correctly being erased by Perl during the
> conversion.
>
> http://casa.colorado.edu/~ajsh/iso8859-1.html
>
> http://www.fourmilab.ch/webtools/demoroniser/
>
>
> Shailesh wrote:
>
>> I have a file that contains the trademark character encoded in
>> utf8/Unicode (i.e. it takes up 3 octets of space). I can read this
>> file into a string like this:
>>
>> $filename = "testutf8.txt";
>> $source = IO::File->new( $filename, 'r' );
>> binmode( $source, ':utf8' );
>> @filedata = <$source>;
>> $line = $filedata[0];
>>
>> The problem is, I want to write out the $line to another file encoded
>> as iso-8859-1 (a.k.a latin-1 and extended US ASCII). The trademark
>> character should be translated to a single octet with value 153. I
>> have tried encode/decode and pack/unpack with no success. The
>> trademark character either gets erased, or converted into three
>> separate gibberish characters. The only thing that seems to work is
>> HTML::Entities::encode_entities, which correctly detects the trademark
>> symbol in utf8, and converts it to the entity "&trade;". Any help?
>>
>> I have attached the utf8 encoded file, which contains "abc|123", where
>> '|' is the TM symbol. Note that the file is 9 bytes, as expected.
>> Below is the rest of the testing code.
>>
>> # Open a test output file
>> $dest = IO::File->new("outenc.txt" , 'w' );
>>
>> # Write a correct trademark symbol
>> $test = pack("C", 153);
>> print "char is: ".$test."\n";
>> $dest->write($test);
>> $dest->write("\n");
>>
>> # Write the utf8 line converted to iso-8859-1 -- HOW TO CONVERT IT?
>> $dest->write($line);
>>
>> $source->close();
>> $dest->close();
>>
>>
>> ------------------------------------------------------------------------
>>
>> abc™123
------------------------------
Date: Thu, 29 Apr 2004 21:44:35 GMT
From: Shailesh <shailesh@nothing.but.net>
Subject: Re: convert utf8 to latin-1/iso-8859-1
Message-Id: <7Hekc.41033$Vp5.40794@fe2.columbus.rr.com>
Shailesh wrote:
> This one-liner does the conversion correctly:
>
> $line = Unicode::String::utf8($line)->latin1();
The above only seems to work for the trademark symbol. This is more reliable:
use utf8;
s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
From: http://perl-xml.sourceforge.net/faq/#encoding_conversion
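For output that must stay in ISO-8859-1 without losing characters, another possibility is the Encode module's FB_XMLCREF fallback, which writes anything the target encoding lacks as a hexadecimal character reference. A minimal sketch, assuming the Encode module that ships with Perl 5.8 and a made-up sample string:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: U+00E9 exists in ISO-8859-1, U+2122 does not.
my $line = "caf\x{E9} abc\x{2122}123";

# FB_XMLCREF turns unrepresentable characters into &#xHHHH; references;
# LEAVE_SRC keeps encode() from modifying $line in place.
my $octets = encode('iso-8859-1', $line,
                    Encode::FB_XMLCREF() | Encode::LEAVE_SRC());

print "$octets\n";
```

Unlike a blanket substitution over everything above 0x7F, this leaves characters that ISO-8859-1 can represent as single octets and only escapes the rest.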
------------------------------
Date: Thu, 29 Apr 2004 15:52:59 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Count how many times find and replaced happened
Message-Id: <slrnc92qpb.3qo.tadmc@magna.augustmail.com>
Robin <webmaster@infusedlight> wrote:
> From: "Robin" <webmaster @ infusedlight . net>
Please choose one posting address and stick to it.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Thu, 29 Apr 2004 13:37:31 -0800
From: "Robin" <webmaster @ infusedlight . net>
Subject: Re: Count how many times find and replaced happened
Message-Id: <c6rsp9$lnp$1@reader2.nmix.net>
"Tad McClellan" <tadmc@augustmail.com> wrote in message
news:slrnc92qpb.3qo.tadmc@magna.augustmail.com...
> Robin <webmaster@infusedlight> wrote:
>
> > From: "Robin" <webmaster @ infusedlight . net>
>
>
> Please choose one posting address and stick to it.
yeah I did...sorry.
-Robin
------------------------------
Date: Thu, 29 Apr 2004 20:03:27 GMT
From: usenet_spam_cygwin <beau@oblios-cap.com>
Subject: cywin versus activestate on xp
Message-Id: <Pine.CYG.4.58.0404291259130.1808@beren>
Howdy. A google search "activestate versus cygwin" didn't do me much
good, so I'm asking the fine folks at comp.lang.perl.misc: any clear
reason to use one over the other? My objective, really, is as seamless as
possible an experience as I work my way through the llama book for the
first time. Many thanks! (My feelings won't be hurt by backchannel
responses if you feel it's not sufficiently on topic.)
--
beau
------------------------------
Date: Thu, 29 Apr 2004 21:45:06 GMT
From: Po Boy <a5ufv8u02@sneakemail.com>
Subject: generating time series graphs with perl
Message-Id: <pan.2004.04.29.21.46.50.587897@sneakemail.com>
I'm trying to find a perl module that will help me make a certain kind of
graph. I believe it's called a time-series graph, but I'm not sure. It's a
graph of how two variables behave over time. You can see a couple examples
at:
http://www.trendmacro.com/a/goodman/keyIndicators/pvCharting.asp
and
http://www.thestreet.com/comment/openbook/1332231.html
I have tried using gnuplot and the GD package for these graphs, but have
been unable to get either one to generate reasonable looking graphs. I
believe it's because the data that I'm graphing is not strictly described
by any function since some X value can have multiple Y values (just at
different times).
Has anyone had success in graphing this kind of data? Can you recommend a
module or some documentation or pointers that may help me out?
Looking forward to any help you can give me!
-pb
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6488
***************************************