[31370] in Perl-Users-Digest
Perl-Users Digest, Issue: 2622 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Oct 6 00:09:29 2009
Date: Mon, 5 Oct 2009 21:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 5 Oct 2009 Volume: 11 Number: 2622
Today's topics:
FAQ 4.72 How do I determine whether a scalar is a numbe <brian@theperlreview.com>
FAQ 5.21 Why can't I just open(FH, ">file.lock")? <brian@theperlreview.com>
FAQ 8.45 How do I install a module from CPAN? <brian@theperlreview.com>
FAQ 9.13 How do I edit my .htpasswd and .htgroup files <brian@theperlreview.com>
FAQ 9.3 How can I get better error messages from a CGI <brian@theperlreview.com>
How to parse section of html code? <jegan473@comcast.net>
Re: How to parse section of html code? <ben@morrow.me.uk>
Re: How to parse section of html code? <jurgenex@hotmail.com>
Re: How to parse section of html code? sln@netherlands.com
load frequently used modules from one include file <sanjeeb25@gmail.com>
Re: load frequently used modules from one include file <ben@morrow.me.uk>
regex bug: "variable ... will not stay shared" <gb345@invalid.com>
Re: regex bug: "variable ... will not stay shared" <ben@morrow.me.uk>
Re: regex bug: "variable ... will not stay shared" <gb345@invalid.com>
Re: regex bug: "variable ... will not stay shared" sln@netherlands.com
Replace Unicode character <ryanchan404@gmail.com>
Re: Replace Unicode character <bugbear@trim_papermule.co.uk_trim>
Re: Replace Unicode character <ryanchan404@gmail.com>
Re: Replace Unicode character <benkasminbullock@gmail.com>
Re: Replace Unicode character <hjp-usenet2@hjp.at>
Re: Replace Unicode character <OJZGSRPBZVCX@spammotel.com>
Re: Replace Unicode character sln@netherlands.com
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 06 Oct 2009 04:00:05 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 4.72 How do I determine whether a scalar is a number/whole/integer/float?
Message-Id: <9hzym.16730$ma7.9300@newsfe04.iad>
This is an excerpt from the latest version perlfaq4.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
4.72: How do I determine whether a scalar is a number/whole/integer/float?
Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression.
if (/\D/) { print "has nondigits\n" }
if (/^\d+$/) { print "is a whole number\n" }
if (/^-?\d+$/) { print "is an integer\n" }
if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
{ print "a C float\n" }
There are also some commonly used modules for the task. Scalar::Util
(distributed with 5.8) provides access to perl's internal function
"looks_like_number" for determining whether a variable looks like a
number. Data::Types exports functions that validate data types using
both the above and other regular expressions. Thirdly, there is
"Regexp::Common" which has regular expressions to match various types of
numbers. Those three modules are available from the CPAN.
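As a rough illustration of the Scalar::Util approach, the sketch below
runs looks_like_number over a few sample strings. The sample values are
made up for this example; only looks_like_number itself comes from the
module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Sample inputs chosen for illustration only.
my @samples = ('42', '-3.14', '1e5', 'abc', '');

for my $value (@samples) {
    # looks_like_number returns true if perl would treat the string
    # as a number without warning.
    my $verdict = looks_like_number($value) ? 'number' : 'not a number';
    print "'$value': $verdict\n";
}
```

Note that it follows perl's own idea of a number, so it may accept
strings (exponent notation, for instance) that the simpler patterns
above reject.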
If you're on a POSIX system, Perl supports the "POSIX::strtod" function.
Its semantics are somewhat cumbersome, so here's a "getnum" wrapper
function for more convenient access. This function takes a string and
returns the number it found, or "undef" for input that isn't a C float.
The "is_numeric" function is a front end to "getnum" if you just want to
say, "Is this a float?"
sub getnum {
    use POSIX qw(strtod);
    my $str = shift;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    $! = 0;
    my($num, $unparsed) = strtod($str);
    if (($str eq '') || ($unparsed != 0) || $!) {
        return undef;
    }
    else {
        return $num;
    }
}
sub is_numeric { defined getnum($_[0]) }
Or you could check out the String::Scanf module on the CPAN instead. The
"POSIX" module (part of the standard Perl distribution) provides the
"strtod" and "strtol" functions for converting strings to doubles and
longs, respectively.
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
------------------------------
Date: Mon, 05 Oct 2009 10:00:09 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 5.21 Why can't I just open(FH, ">file.lock")?
Message-Id: <Jsjym.77568$4t6.40680@newsfe06.iad>
This is an excerpt from the latest version perlfaq5.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
5.21: Why can't I just open(FH, ">file.lock")?
A common bit of code NOT TO USE is this:
sleep(3) while -e "file.lock"; # PLEASE DO NOT USE
open(LCK, "> file.lock"); # THIS BROKEN CODE
This is a classic race condition: you take two steps to do something
which must be done in one. That's why computer hardware provides an
atomic test-and-set instruction. In theory, this "ought" to work:
sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
or die "can't open file.lock: $!";
except that lamentably, file creation (and deletion) is not atomic over
NFS, so this won't work (at least, not every time) over the net. Various
schemes involving link() have been suggested, but these tend to involve
busy-wait, which is also less than desirable.
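The sysopen line above relies on the O_* constants, which come from the
Fcntl module. Here is a minimal sketch of the whole thing, assuming a
local (non-NFS) filesystem; try_lock is a hypothetical helper name used
only for this example, not part of any module.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: returns a filehandle if we created the lock file,
# or nothing if the file already existed (someone else holds the lock).
sub try_lock {
    my ($path) = @_;
    # O_CREAT|O_EXCL makes "test for existence" and "create" one step.
    if (sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)) {
        return $fh;
    }
    return if $!{EEXIST};            # lock is already held
    die "can't open $path: $!";      # any other failure is a real error
}

my $lock = "file.lock";
if (my $fh = try_lock($lock)) {
    print {$fh} "$$\n";              # record our process ID
    close $fh;
    # ... do the protected work here ...
    unlink $lock or warn "can't remove $lock: $!";
}
else {
    print "$lock is held by someone else\n";
}
```

The caller decides what to do when the lock is taken; this sketch does
not retry, so it avoids the busy-wait mentioned above but also does not
wait for the lock to be released.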
------------------------------
Date: Mon, 05 Oct 2009 16:00:05 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 8.45 How do I install a module from CPAN?
Message-Id: <9Koym.57856$944.56051@newsfe09.iad>
This is an excerpt from the latest version perlfaq8.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
8.45: How do I install a module from CPAN?
The easiest way is to have a module also named CPAN do it for you. This
module comes with perl version 5.004 and later.
$ perl -MCPAN -e shell
cpan shell -- CPAN exploration and modules installation (v1.59_54)
ReadLine support enabled
cpan> install Some::Module
To manually install the CPAN module, or any well-behaved CPAN module for
that matter, follow these steps:
1 Unpack the source into a temporary area.
2 perl Makefile.PL
3 make
4 make test
5 make install
If your version of perl is compiled without dynamic loading, then you
just need to replace step 3 (make) with make perl and you will get a new
perl binary with your extension linked in.
See ExtUtils::MakeMaker for more details on building extensions. See
also the next question, "What's the difference between require and
use?".
------------------------------
Date: Mon, 05 Oct 2009 04:00:08 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 9.13 How do I edit my .htpasswd and .htgroup files with Perl?
Message-Id: <cbeym.316455$vp.283681@newsfe12.iad>
This is an excerpt from the latest version perlfaq9.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
9.13: How do I edit my .htpasswd and .htgroup files with Perl?
The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a consistent
OO interface to these files, regardless of how they're stored. Databases
may be text, dbm, Berkeley DB or any database with a DBI compatible
driver. HTTPD::UserAdmin supports files used by the "Basic" and "Digest"
authentication schemes. Here's an example:
use HTTPD::UserAdmin ();
HTTPD::UserAdmin
->new(DB => "/foo/.htpasswd")
->add($username => $password);
------------------------------
Date: Mon, 05 Oct 2009 22:00:03 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 9.3 How can I get better error messages from a CGI program?
Message-Id: <D%tym.38559$bP1.15409@newsfe24.iad>
This is an excerpt from the latest version perlfaq9.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
9.3: How can I get better error messages from a CGI program?
Use the CGI::Carp module. It replaces "warn" and "die", plus the normal
Carp module's "carp", "croak", and "confess" functions with more verbose
and safer versions. It still sends them to the normal server error log.
use CGI::Carp;
warn "This is a complaint";
die "But this one is serious";
The following use of CGI::Carp also redirects errors to a file of your
choice, placed in a BEGIN block to catch compile-time warnings as well:
BEGIN {
    use CGI::Carp qw(carpout);
    open(LOG, ">>/var/local/cgi-logs/mycgi-log")
        or die "Unable to append to mycgi-log: $!\n";
    carpout(*LOG);
}
You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.
use CGI::Carp qw(fatalsToBrowser);
die "Bad error here";
Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever you've
sent them with "carpout") with the application name and date stamp
prepended.
------------------------------
Date: Mon, 05 Oct 2009 23:05:29 GMT
From: James Egan <jegan473@comcast.net>
Subject: How to parse section of html code?
Message-Id: <ZYuym.18511$UY2.9562@en-nntp-04.dc1.easynews.com>
I have some .html files, and the files contain a line in which I need to
extract a substring. In the two example lines below, I need to extract
the "CUSTOMER SERVICE" string in the first, and "EXECUTIVE ACCOUNTING" in
the second. I tried using Text::Balanced, but it did not work as
expected. How can I parse this value from the html code?
-Thanks
<td><span class="FirstColumn">Department Desc:</span></td><td><span
class="Value">001</span><span class="Value">(CUSTOMER SERVICE) </
span><span class="Prompt">Vs:</span><span class="Value">(SMITH, WILLIAM )
</span></td>
<td><span class="FirstColumn">Department Desc:</span></td><td><span
class="Value">001</span><span class="Value">(EXECUTIVE ACCOUNTING) </
span><span class="Prompt">Vs:</span><span class="Value">(JONES, JANE )</
span></td>
------------------------------
Date: Tue, 6 Oct 2009 01:30:15 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to parse section of html code?
Message-Id: <nlfqp6-474.ln1@osiris.mauzo.dyndns.org>
Quoth James Egan <jegan473@comcast.net>:
> I have some .html files, and the files contain a line in which I need to
> extract a substring. In the two example lines below, I need to extract
> the "CUSTOMER SERVICE" string in the first, and "EXECUTIVE ACCOUNTING" in
> the second. I tried using Text::Balanced, but it did not work as
> expected. How can I parse this value from the html code?
>
>
> <td><span class="FirstColumn">Department Desc:</span></td><td><span
> class="Value">001</span><span class="Value">(CUSTOMER SERVICE) </
> span><span class="Prompt">Vs:</span><span class="Value">(SMITH, WILLIAM )
> </span></td>
>
>
> <td><span class="FirstColumn">Department Desc:</span></td><td><span
> class="Value">001</span><span class="Value">(EXECUTIVE ACCOUNTING) </
> span><span class="Prompt">Vs:</span><span class="Value">(JONES, JANE )</
> span></td>
With the two lines you have given, I would use
/\(([^)]*)\)/
If that is not sufficient, you will need to better explain what possible
forms the input data might take.
If the HTML structure is fixed (say, the data you want is always in the
second <span> in the second <td> in a <tr>) and the document is true
valid HTML, you could also use XML::LibXML's HTML parsing functions and
apply an XPath expression to the result. Depending on how the input is
generated, this may be more robust against minor changes in formatting.
Ben
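A small sketch of the regex suggestion above, applied to the two sample
lines from the original post (joined onto single lines here for the
example). It assumes the department is always the first parenthesized
string on the line, which holds for the samples but is only a guess
about the real data.

```perl
use strict;
use warnings;

# The two sample rows from the post, unwrapped onto one line each.
my @lines = (
    '<td><span class="FirstColumn">Department Desc:</span></td><td><span class="Value">001</span><span class="Value">(CUSTOMER SERVICE) </span><span class="Prompt">Vs:</span><span class="Value">(SMITH, WILLIAM )</span></td>',
    '<td><span class="FirstColumn">Department Desc:</span></td><td><span class="Value">001</span><span class="Value">(EXECUTIVE ACCOUNTING) </span><span class="Prompt">Vs:</span><span class="Value">(JONES, JANE )</span></td>',
);

my @depts;
for my $line (@lines) {
    # The leftmost match wins, so this captures the first (...) group
    # on the line and ignores the name that follows "Vs:".
    if (my ($dept) = $line =~ /\(([^)]*)\)/) {
        push @depts, $dept;
    }
}

print "$_\n" for @depts;
```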
------------------------------
Date: Mon, 05 Oct 2009 18:21:20 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: How to parse section of html code?
Message-Id: <fn6lc5hkpgk9lapp2573omvnudfodgl1sc@4ax.com>
James Egan <jegan473@comcast.net> wrote:
[...]
>expected. How can I parse this value from the html code?
Canned answer #2: by using a proper HTML parser.
See "perldoc -q HTML":
How do I remove HTML from a string?
or DejaNews for former answers to this frequent question.
jue
------------------------------
Date: Mon, 05 Oct 2009 20:37:16 -0700
From: sln@netherlands.com
Subject: Re: How to parse section of html code?
Message-Id: <hjelc59vhivsm3u2jidahu93ii2rkiqcgf@4ax.com>
On Mon, 05 Oct 2009 23:05:29 GMT, James Egan <jegan473@comcast.net> wrote:
>I have some .html files, and the files contain a line in which I need to
>extract a substring. In the two example lines below, I need to extract
>the "CUSTOMER SERVICE" string in the first, and "EXECUTIVE ACCOUNTING" in
>the second. I tried using Text::Balanced, but it did not work as
>expected. How can I parse this value from the html code?
>
>-Thanks
>
>
><td><span class="FirstColumn">Department Desc:</span></td><td><span
>class="Value">001</span><span class="Value">(CUSTOMER SERVICE) </
>span><span class="Prompt">Vs:</span><span class="Value">(SMITH, WILLIAM )
></span></td>
>
>
><td><span class="FirstColumn">Department Desc:</span></td><td><span
>class="Value">001</span><span class="Value">(EXECUTIVE ACCOUNTING) </
>span><span class="Prompt">Vs:</span><span class="Value">(JONES, JANE )</
>span></td>
Your html lines look tightly constructed, so a loose regex
maybe?
-sln
---------
'CUSTOMER SERVICE'
'EXECUTIVE ACCOUNTING'
'PAYROL'
---------
use strict;
use warnings;
my $file = join '', <DATA>;
my @depts = map {chomp; length() ? $_ : ()}
split /^ .+? >001< .+? \( | \) .+? >001< .+? \( | \) .+? $ /xs, $file ;
for (@depts) {
print "'$_'\n";
}
__DATA__
<td><span class="FirstColumn">Department Desc:</span></td><td><span
class="Value">001</span><span class="Value">(CUSTOMER SERVICE) </
span><span class="Prompt">Vs:</span><span class="Value">(SMITH, WILLIAM )
</span></td>
<td><span class="FirstColumn">Department Desc:</span></td><td><span
class="Value">001</span><span class="Value">(EXECUTIVE ACCOUNTING) </
span><span class="Prompt">Vs:</span><span class="Value">(JONES, JANE )</
span></td>
<td><span class="FirstColumn">Department Desc:</span></td><td><span
class="Value">001</span><span class="Value">(PAYROL) </
span><span class="Prompt">Vs:</span><span class="Value">(SMITH, WILLIAM )
</span></td>
------------------------------
Date: Mon, 5 Oct 2009 04:13:11 -0700 (PDT)
From: sanjeeb <sanjeeb25@gmail.com>
Subject: load frequently used modules from one include file
Message-Id: <26ba8e43-2e2d-4be6-b865-a5741ba91779@m33g2000pri.googlegroups.com>
I know use, do, require but could not achieve the following requirement.
I have several modules which are used frequently in my project. I want
to create a file which has all the modules included and just load
that file, which in turn loads the modules in the include file.
e.g. strict, Data::Dumper, Switch; these modules are used frequently.
I want to create a file "LoadBasemodules" and just put all the above
modules in it.
In my project file I include "LoadBasemodules", which in turn loads
strict, Data::Dumper, Switch etc.
Is there any way to do this in Perl?
------------------------------
Date: Mon, 5 Oct 2009 13:56:26 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: load frequently used modules from one include file
Message-Id: <q07pp6-bg.ln1@osiris.mauzo.dyndns.org>
Quoth sanjeeb <sanjeeb25@gmail.com>:
> I know use, do, require but could not achieve the following requirement.
>
> I have several modules which are used frequently in my project. I want
> to create a file which has all the modules included and just load
> that file, which in turn loads the modules in the include file.
>
> e.g. strict, Data::Dumper, Switch; these modules are used frequently.
Don't use Switch. It was a very bad idea, and can lead to very odd
failures.
> I want to create a file "LoadBasemodules" and just put all the above
> modules in it.
>
> In my project file I include "LoadBasemodules", which in turn loads
> strict, Data::Dumper, Switch etc.
>
> Is there any way to do this in Perl?
In general this is not terribly easy. The most straightforward solution
for most modules is probably something like
package LocalBasemodules;
sub import {
    my $pkg = caller;
    eval qq{
        package $pkg;
        use Data::Dumper;
    };
}
but that won't work for modules like strict which have lexical effect.
For such modules simply calling their import method like
package LocalBasemodules;
sub import {
    require strict;
    strict->import;
}
will often work.
Ben
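Putting the two ideas together, here is a single-file sketch, not a
tested recipe for every module. It uses Exporter's export_to_level for
Data::Dumper instead of the string eval above (a swapped-in technique
that works because Data::Dumper is Exporter-based), and the %INC line
exists only so the demo can live in one file; in a real project the
block would be its own LocalBasemodules.pm.

```perl
{
    package LocalBasemodules;
    use strict;
    use warnings;
    use Data::Dumper ();

    # Demo only: pretend the module was loaded from disk so that the
    # "use" below does not go looking for LocalBasemodules.pm.
    BEGIN { $INC{'LocalBasemodules.pm'} = __FILE__ }

    sub import {
        # Lexical pragmas: their import methods act on the scope that
        # is being compiled right now, i.e. the file that said "use".
        strict->import;
        warnings->import;

        # Exporter-based module: install its default exports (Dumper)
        # one level up, in the package that called our import.
        Data::Dumper->export_to_level(1, 'Data::Dumper');
    }
}

package main;
use LocalBasemodules;    # strict, warnings and Dumper are now in effect

my %config = (name => 'demo');
print Dumper(\%config);
```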
------------------------------
Date: Mon, 5 Oct 2009 14:50:07 +0000 (UTC)
From: gb345 <gb345@invalid.com>
Subject: regex bug: "variable ... will not stay shared"
Message-Id: <had12u$10r$1@reader1.panix.com>
Some code that used to work fine with perl 5.8.8 now generates a
warning when I run it with perl 5.10.0. The error is caused by a
recursive regex that is supposed to match a balanced expression.
This regex is defined using qr//, and is assigned to the variable
$pex, and mentions this variable in its definition.
The warning message is:
Variable "$pex" will not stay shared at (re_eval 5) line 2.
Variable "$pex" will not stay shared at (re_eval 6) line 2.
Here's the beast:
my $pex;
$pex = qr/
    \{
    (?:
        (?:
            "
            ( (?> (?:[^"\\]|\\[^"])* ) (?> \\" (?:[^"\\]|\\[^"])* )* )
            "
        |
            (?!\{) ( [^",]* ) (?<!\})
        |
            (??{ $pex })
        )
        (?:
            ,
            (?:
                "
                ( (?> (?:[^"\\]|\\[^"])* ) (?> \\" (?:[^"\\]|\\[^"])* )* )
                "
            |
                (?!\{) ( [^",]* ) (?<!\})
            |
                (??{ $pex })
            )
        )*
    )?
    \}
/x;
I want to fix whatever it is that the warning is warning about
(rather than simply turn off the warning), but it's not clear to
me exactly what the problem is (especially since this code has been
performing flawlessly up to now).
Any clarifications of what the error message is actually saying,
or suggestions to fix the problem would be much appreciated.
Many thanks in advance!
Gabe
------------------------------
Date: Mon, 5 Oct 2009 16:31:40 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: regex bug: "variable ... will not stay shared"
Message-Id: <s3gpp6-em1.ln1@osiris.mauzo.dyndns.org>
Quoth gb345 <gb345@invalid.com>:
>
> Some code that used to work fine with perl 5.8.8 now generates a
> warning when I run it with perl 5.10.0. The error is caused by a
> recursive regex that is supposed to match a balanced expression.
> This regex is defined using qr//, and is assigned to the variable
> $pex, and mentions this variable in its definition.
>
> The warning message is:
>
> Variable "$pex" will not stay shared at (re_eval 5) line 2.
> Variable "$pex" will not stay shared at (re_eval 6) line 2.
>
> Here's the beast:
>
> my $pex;
> $pex = qr/
<snip>
> (??{ $pex })
<snip>
>
> I want to fix whatever it is that the warning is warning about
> (rather than simply turn off the warning), but it's not clear to
> me exactly what the problem is (especially since this code has been
> performing flawlessly up to now).
It's documented that using lexical variables inside (??{}) constructions
has been buggy since they were first introduced. The reason is to do with
the details of how and when the compiled closure closes over the
variables, and it is apparently rather hard to fix.
The workaround is to use 'our' variables instead.
Ben
------------------------------
Date: Mon, 5 Oct 2009 16:33:48 +0000 (UTC)
From: gb345 <gb345@invalid.com>
Subject: Re: regex bug: "variable ... will not stay shared"
Message-Id: <had75c$9tf$1@reader1.panix.com>
In <s3gpp6-em1.ln1@osiris.mauzo.dyndns.org> Ben Morrow <ben@morrow.me.uk> writes:
>The workaround is to use 'our' variables instead.
Thanks.
G.
------------------------------
Date: Mon, 05 Oct 2009 12:01:44 -0700
From: sln@netherlands.com
Subject: Re: regex bug: "variable ... will not stay shared"
Message-Id: <k9gkc55r7h4q56ln5944ugg8dubkposmb9@4ax.com>
On Mon, 5 Oct 2009 14:50:07 +0000 (UTC), gb345 <gb345@invalid.com> wrote:
>
>
>
>
>Some code that used to work fine with perl 5.8.8 now generates a
>warning when I run it with perl 5.10.0. The error is caused by a
>recursive regex that is supposed to match a balanced expression.
>This regex is defined using qr//, and is assigned to the variable
>$pex, and mentions this variable in its definition.
>
>The warning message is:
>
>Variable "$pex" will not stay shared at (re_eval 5) line 2.
>Variable "$pex" will not stay shared at (re_eval 6) line 2.
>
<snip>
>
>I want to fix whatever it is that the warning is warning about
>(rather than simply turn off the warning), but it's not clear to
>me exactly what the problem is (especially since this code has been
>performing flawlessly up to now).
>
>Any clarifications of what the error message is actually saying,
>or suggestions to fix the problem would be much appreciated.
>
>Many thanks in advance!
>
>Gabe
I don't get that message on my 5.10.0 multi-threaded x86 build
(although I've seen it before). It's probably the usage; certain parts
were untouched.
Can you post a sample usage that generates the warning?
Also, I don't see how this matches balanced text/expression.
It doesn't on my sample. Maybe you could describe what it is
you are trying to get with it. Are you trying to capture?
-sln
------------------------------
Date: Mon, 5 Oct 2009 08:18:20 -0700 (PDT)
From: Ryan Chan <ryanchan404@gmail.com>
Subject: Replace Unicode character
Message-Id: <42c9db97-136a-456c-b235-359dda904149@v15g2000prn.googlegroups.com>
Hello,
Below my code which want to replace unicode character "□" with empty
string, what wrong with the code?
###################
use strict;
use warnings;
use utf8;
my $s = "□"; # hex value = A1BC
$s =~ s/\xA1\xBC//gi;
print $s;
###################
Thanks.
------------------------------
Date: Mon, 05 Oct 2009 16:24:07 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: Replace Unicode character
Message-Id: <r8ednZAW4O8KklfXnZ2dnUVZ8jdi4p2d@brightview.co.uk>
Ryan Chan wrote:
> Hello,
>
> Below my code which want to replace unicode character "□" with empty
> string, what wrong with the code?
>
> ###################
>
> use strict;
> use warnings;
> use utf8;
>
> my $s = "□"; # hex value = A1BC
>
> $s =~ s/\xA1\xBC//gi;
> print $s;
Your regexp replaces TWO characters, first one A1, second one BC.
Since your target string does not contain either of these
characters, nothing happens.
BugBear
------------------------------
Date: Mon, 5 Oct 2009 08:28:12 -0700 (PDT)
From: Ryan Chan <ryanchan404@gmail.com>
Subject: Re: Replace Unicode character
Message-Id: <d2468874-9e78-4718-9a11-40c904e7663b@u16g2000pru.googlegroups.com>
Hello,
On Oct 5, 11:24 pm, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
> Your regexp replaces TWO characters, first one A1, second one BC.
>
> Since your target string does not contain either of these
> characters, nothing happens.
>
>   BugBear
even I use
$s =~ s/\xA1BC//gi;
the same...
Thanks anyway
------------------------------
Date: Mon, 5 Oct 2009 08:36:18 -0700 (PDT)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: Replace Unicode character
Message-Id: <1ed95788-915f-448a-8695-e2812bdb81f6@v37g2000prg.googlegroups.com>
On Oct 6, 12:28 am, Ryan Chan <ryanchan...@gmail.com> wrote:
> $s =~ s/\xA1BC//gi;
\x{A1BC} works though.
It's documented in "perldoc perlunicode".
According to Unicode::UCD this is the character "YI SYLLABLE LIEX".
------------------------------
Date: Mon, 5 Oct 2009 20:07:34 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Replace Unicode character
Message-Id: <slrnhckdf7.rif.hjp-usenet2@hrunkner.hjp.at>
On 2009-10-05 15:36, Ben Bullock <benkasminbullock@gmail.com> wrote:
> On Oct 6, 12:28 am, Ryan Chan <ryanchan...@gmail.com> wrote:
>> $s =~ s/\xA1BC//gi;
>
> \x{A1BC} works though.
>
> It's documented in "perldoc perlunicode".
>
> According to Unicode::UCD this is the character "YI SYLLABLE LIEX".
>
Also note that UTF-8 "\xA1\xBC" is not equivalent to U+A1BC. In fact
"\xA1\xBC" is not a valid UTF-8 character at all, U+A1BC is
"\xEA\x86\xBC" in UTF-8, and the character in Ryan's posting was U+25A1
(WHITE SQUARE) or "\xE2\x96\xA1" in UTF-8.
hp
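To make the distinction between code points and UTF-8 bytes concrete,
here is a short sketch. It writes the character as "\x{25A1}" rather
than as a literal, so it does not depend on how the source file is
encoded; utf8_hex is a hypothetical helper written for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the UTF-8 bytes of a character string, in hex.
sub utf8_hex {
    my ($chars) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $chars);
}

my $s = "before \x{25A1} after";      # U+25A1 WHITE SQUARE: one character
(my $clean = $s) =~ s/\x{25A1}//g;    # \x{...} names a single code point

binmode STDOUT, ':encoding(UTF-8)';
print "$clean\n";
print utf8_hex("\x{25A1}"), "\n";     # E2 96 A1
print utf8_hex("\x{A1BC}"), "\n";     # EA 86 BC
```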
------------------------------
Date: Mon, 05 Oct 2009 20:56:39 +0200
From: "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com>
Subject: Re: Replace Unicode character
Message-Id: <op.u1cb8pfvmk9oye@frodo>
On Mon, 05 Oct 2009 17:18:20 +0200, Ryan Chan <ryanchan404@gmail.com>
wrote:
> Below my code which want to replace unicode character "□" with empty
> string, what wrong with the code?
Since it has not been spelled out yet:
$s contains one character. The regex contains two characters. One
character never matches two characters.
Funnily, if you're working in an utf8 environment, even a simple \xA1 can
actually be stored as two *bytes*:
> perl -e '$s="\xa1"; print $s; binmode STDOUT,":encoding(utf8)"; print
> $s;' | hexdump -C
00000000 a1 c2 a1 |...|
00000003
------------------------------
Date: Mon, 05 Oct 2009 14:53:32 -0700
From: sln@netherlands.com
Subject: Re: Replace Unicode character
Message-Id: <44pkc5p5ms5d54jqhv2d1ke1tdqckibrpn@4ax.com>
On Mon, 05 Oct 2009 20:56:39 +0200, "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com> wrote:
>On Mon, 05 Oct 2009 17:18:20 +0200, Ryan Chan <ryanchan404@gmail.com>
>wrote:
>
>> Below my code which want to replace unicode character "□" with empty
>> string, what wrong with the code?
>
>Since it has not been spelled out yet:
>
>$s contains one character. The regex contains two characters. One
>character never matches two characters.
>
>Funnily, if you're working in an utf8 environment, even a simple \xA1 can
>actually be stored as two *bytes*:
>
>> perl -e '$s="\xa1"; print $s; binmode STDOUT,":encoding(utf8)"; print
>> $s;' | hexdump -C
>00000000 a1 c2 a1 |...|
>00000003
I guess scalar data can actually be stored as bytes (0..255) before say
decoding octets into Perl's internal form. Either the resultant string
is all ASCII or a mix with the utf8 flag turned on (character semantics).
I think this is the base storage strategy for Perl. It speeds things up.
Encoding just converts it back into octets, turning off the utf8 flag
(byte semantics). This process is not always symmetrical and there is
sometimes more than one encoded representation of the same thing.
Sort of a bastardized system.
-sln
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2622
***************************************