[31337] in Perl-Users-Digest
Perl-Users Digest, Issue: 2582 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Sep 4 06:09:41 2009
Date: Fri, 4 Sep 2009 03:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 4 Sep 2009 Volume: 11 Number: 2582
Today's topics:
Re: command perl - SR (Tim McDaniel)
Re: command perl - SR sln@netherlands.com
Re: command perl - SR sln@netherlands.com
Re: command perl - SR sln@netherlands.com
Re: create a form with cgi and a multidimensional array sln@netherlands.com
Re: create a form with cgi and a multidimensional array <mstep@podiuminternational.org>
Re: create a form with cgi and a multidimensional array <mstep@podiuminternational.org>
Re: Data cleaning issue involving bad wide characters i <mvdwege@mail.com>
Re: Data cleaning issue involving bad wide characters i sln@netherlands.com
Re: Data cleaning issue involving bad wide characters i sln@netherlands.com
FAQ 2.4 I copied the perl binary from one machine to an <brian@theperlreview.com>
FAQ 2.7 Is there an ISO or ANSI certified version of Pe <brian@theperlreview.com>
FAQ 4.22 How do I expand function calls in a string? <brian@theperlreview.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 3 Sep 2009 20:33:26 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: command perl - SR
Message-Id: <h7p96m$ft5$1@reader1.panix.com>
In article <44c40c9e-fd7f-44f4-b5d3-8991ac52fcc7@j9g2000vbp.googlegroups.com>,
fred <fred78980@yahoo.com> wrote:
>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
I'm afraid that your syntax and semantics have enough wrong that it
would take a while to explain.
I think that approach is barking up the wrong tree anyway. It looks
more like array element manipulation than string processing. I think
it's easier to use split(), either explicitly or implicitly, to split
the input into an array, and then just manipulate $field[2].
You didn't explicitly specify what to do
- if the third field is not empty.
I gather that you would want it to be left alone.
- if there is no third field.
I assume that no third field is to be added.
and the test data didn't include such test cases.
I also include a test case to make sure that the right number of empty
fields (other than the third) are preserved.
perl '-F&' -wape 'use strict;
if (@F > 2 && $F[2] eq "") {
$F[2] = "foo"; $_ = join("&", @F);
}' 093.txt
applied to
xxxx&(ght)(hgf)&&(yyt)
xx9x&(gg)(ff)&&(yyt)
oixxx&(hfd)(jj)&&(yyt)
xxxx&(jj)(kk)&&(yyt)
xjhxxx&(jj)(j)&&(yyt)
xjhxxx&(jj)(j)&NO FOO HERE&(yyt)
only_one_field
two&fields
&null&&fields&5 more&&&&&
produces
xxxx&(ght)(hgf)&foo&(yyt)
xx9x&(gg)(ff)&foo&(yyt)
oixxx&(hfd)(jj)&foo&(yyt)
xxxx&(jj)(kk)&foo&(yyt)
xjhxxx&(jj)(j)&foo&(yyt)
xjhxxx&(jj)(j)&NO FOO HERE&(yyt)
only_one_field
two&fields
&null&foo&fields&5 more&&&&&
(I don't know a variable to use instead of the repeated "&".)
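The same idea can also be written as a reusable subroutine. This is a minimal sketch that assumes '&' is the only delimiter and that the lines have already been chomped; the name fill_third_field is made up for illustration. Passing a negative limit (-1) to split() keeps trailing empty fields. The one-liner above gets the same effect only because, without -l, the newline stays attached to the last field, so that field is never empty.

```perl
use strict;
use warnings;

# Hypothetical helper: put $fill into the third '&'-separated field,
# but only when that field exists and is empty.
sub fill_third_field {
    my ($line, $fill) = @_;
    # A limit of -1 keeps trailing empty fields, so join() restores them.
    my @field = split /&/, $line, -1;
    if (@field > 2 && $field[2] eq '') {
        $field[2] = $fill;
    }
    return join '&', @field;
}

# Sample lines taken from the test data above (already chomped).
my @lines = (
    'xxxx&(ght)(hgf)&&(yyt)',
    'xjhxxx&(jj)(j)&NO FOO HERE&(yyt)',
    'only_one_field',
    'two&fields',
    '&null&&fields&5 more&&&&&',
);

print fill_third_field($_, 'foo'), "\n" for @lines;
```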
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Thu, 03 Sep 2009 14:26:34 -0700
From: sln@netherlands.com
Subject: Re: command perl - SR
Message-Id: <puc0a5tg3ft5b7mhm0fel7l94t8v9akhkc@4ax.com>
On Thu, 3 Sep 2009 10:55:18 -0700 (PDT), fred <fred78980@yahoo.com> wrote:
>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
>
>xxxx&(ght)(hgf)&&(yyt)
>xx9x&(gg)(ff)&&(yyt)
>oixxx&(hfd)(jj)&&(yyt)
>xxxx&(jj)(kk)&&(yyt)
>xjhxxx&(jj)(j)&&(yyt)
>
>
>Thanks
perl -pe 's/^(.+&.+&)(&.+)$/$1foo$2/' text.txt
or
perl -pe 's/^((?:.+&){2})(&.+)$/$1foo$2/' text.txt
-sln
------------------------------
Date: Thu, 03 Sep 2009 14:49:58 -0700
From: sln@netherlands.com
Subject: Re: command perl - SR
Message-Id: <0ae0a595kqa0s51kpu738k020s3aoe5djv@4ax.com>
On Thu, 3 Sep 2009 10:55:18 -0700 (PDT), fred <fred78980@yahoo.com> wrote:
>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
>
>xxxx&(ght)(hgf)&&(yyt)
>xx9x&(gg)(ff)&&(yyt)
>oixxx&(hfd)(jj)&&(yyt)
>xxxx&(jj)(kk)&&(yyt)
>xjhxxx&(jj)(j)&&(yyt)
>
>
>Thanks
If only the first 5 lines:
perl -0 -pe 's/^(.+&.+&)(&.+)/++$n<6 ? $1.foo.$2 : $1.$2/mge' text.txt
or
perl -0 -pe 's/^((?:.+&){2})(&.+)/++$n<6 ? $1.foo.$2 : $1.$2/mge' text.txt
-sln
------------------------------
Date: Thu, 03 Sep 2009 15:07:47 -0700
From: sln@netherlands.com
Subject: Re: command perl - SR
Message-Id: <0ef0a5d06be5c9dnr676l7vhe39ji8c73d@4ax.com>
On Thu, 3 Sep 2009 10:55:18 -0700 (PDT), fred <fred78980@yahoo.com> wrote:
>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
>
>xxxx&(ght)(hgf)&&(yyt)
>xx9x&(gg)(ff)&&(yyt)
>oixxx&(hfd)(jj)&&(yyt)
>xxxx&(jj)(kk)&&(yyt)
>xjhxxx&(jj)(j)&&(yyt)
>
>
>Thanks
Or, to fix what you have (untested):
perl -0 -pe 's/(&&)/++$n < 6 ? "&foo&" : $1/eg' text.txt
or
perl -0 -pe 's(&&)(++$n < 6 ? "&foo&" : "&&")eg' text.txt
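You can check the counter trick in the two commands above without a file. Wrap it in a subroutine and run it on a multi-line string, which is roughly what -0 hands to the code: the whole file as one record, assuming it contains no NUL bytes. This is a minimal sketch, and the subroutine name and the $limit parameter are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the first $limit occurrences of "&&"
# with "&foo&", leaving later ones untouched.
sub fill_first_n {
    my ($text, $limit) = @_;
    my $n = 0;
    # With /e the replacement is evaluated as Perl code for every match,
    # so the counter decides whether to insert "foo" or restore "&&".
    $text =~ s/&&/++$n <= $limit ? '&foo&' : '&&'/ge;
    return $text;
}

# Seven sample lines, each with one empty third field.
my $text = join '', map { "line$_&(a)&&(b)\n" } 1 .. 7;
print fill_first_n($text, 5);
```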
-sln
------------------------------
Date: Fri, 04 Sep 2009 03:01:09 -0700
From: sln@netherlands.com
Subject: Re: create a form with cgi and a multidimensional array (0/1)
Message-Id: <jgo1a5tmldp9grvfhmagv228ahvs52idag@4ax.com>
On Fri, 4 Sep 2009 00:39:25 -0700 (PDT), Marek <mstep@podiuminternational.org> wrote:
>On 2 Sep., 21:33, s...@netherlands.com wrote:
>> snip
>
>
>Hello -sln!
>
>
>
>Thank you very much for your smart suggestion, and sorry for my late
>answer! I have been trying to adapt your smart suggestion for two days
>now.
>
>Unfortunately there is nothing in @menus. I boiled down my script
>to the following. Perhaps I overlooked something?
>
My fault, I rushed it. It is fixed below, and it runs on my
machine. You may need to clean up the indentation of the
CGI-generated output.
Thanks, it was fun and I learned some details on page generation.
Maybe I will put some more time into it.
Let me know how it works out.
-sln
========================
use strict;
use warnings;
use CGI qw(:standard escapeHTML);
use CGI::Carp qw(fatalsToBrowser);
$CGI::DISABLE_UPLOADS = $CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX = $CGI::POST_MAX = 4096;
my $color;
my @element_liste = (
{
type => "ueberschrift",
name => "Abholdatum/Abholzeit",
color => "#C0C0C4"
},
{
type => 'popup',
bez => 'Tag/Monat/Jahr/ - std:min',
date_time => [
{
name => 'day',
value => [
"01", "02", "03", "04", "05", "06", "07", "08",
"09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31"
]
},
{
name => 'month',
value => [
"Jan", "Feb", "Mär", "Apr", "Mai", "Jun",
"Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
]
},
{
name => 'year',
value => [ '2009', '2010', '2011', '2012' ]
}
]
}
);
print header ( -charset => 'utf-8' ),
start_html(
{
-dtd => '-//W3C//DTD XHTML 1.0 Transitional//EN',
-title => 'FormTest',
-style =>
{ 'src' => '../style/style.css' }
}
);
my $table_start = <<"EOF";
<table width="1006" align="center" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
<td align="left" valign="middle" bgcolor="#0E1E3F" colspan="2">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
<tr>
<td bgcolor="#EDD4C3" colspan="2">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
EOF
my $table_end = <<"EOF";
<tr>
<td bgcolor="#EDD4C3" colspan="2" height="3">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
<tr>
<td bgcolor="#0E1E3F" colspan="2" height="3">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
</table>
EOF
my $aktion = lc( param("aktion") );
if ( $aktion eq "" )
{
show_form( \@element_liste );
}
print end_html();
exit(0);
#
#-------------------------------------
# Eingabe-Formular anzeigen
sub show_form {
my $element_liste_ref = shift;
my @zeilen;
print start_form ( -action => url() );
print $table_start;
foreach my $f ( @{$element_liste_ref} ) {
my $type = $f->{type};
my $bez = $f->{bez};
if ( $type eq "ueberschrift" )
{
$color = $f->{color};
my $ueberschrift = $f->{name};
push @zeilen,
Tr(
{ -bgcolor => "$color" },
td( { -colspan => '2' }, h1($ueberschrift) )
);
}
elsif ( $type eq 'popup' and $f->{bez} =~ /Monat/ )
{
my @menus = ();
for (@{$f->{date_time}}) {
push @menus,
popup_menu(
-name => $_->{name},
-value => $_->{value}
);
}
push @zeilen,
Tr(
{ -bgcolor => "$color" },
td( {-align => 'center' }, $f->{bez} ),
td( {-align => 'left' }, @menus )
);
# or, any of these works just fine ...
#
# Tr(
# { -bgcolor => "$color", -align => 'center' },
# td( { -colspan => '2' }, $f->{bez}, @menus ) # <- this uses one td()
# );
#
# Tr(
# { -bgcolor => "$color" },
# td( $f->{bez} ), td( @menus )
# );
}
}
foreach my $zeile (@zeilen) {
print $zeile;
}
print qq(<tr><td bgcolor="#EDD4C3" colspan="2">);
print p( { -align => 'right' },
submit( -name => "aktion", -value => "Absenden" ) );
print "</td></tr>";
print $table_end;
print endform;
print "<p> </p>";
}
__END__
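The real difference between the two versions is the key in the second hash. Writing my $date_time => [ ... ] inside the anonymous hash declares a new lexical variable and uses its value, undef, as the key. The array reference therefore ends up stored under the empty-string key, and $date_time itself stays undefined, so the loop over @$date_time has nothing to iterate. Here is a small sketch of the effect, using made-up data:

```perl
use strict;
use warnings;

# Correct (as in the fixed script): 'date_time' is an ordinary hash key
# whose value is a reference to an array of hashes.
my %good = (
    bez       => 'Tag/Monat/Jahr',
    date_time => [ { name => 'day' }, { name => 'month' } ],
);
my @names = map { $_->{name} } @{ $good{date_time} };
print "good: @names\n";

# Broken: "my $date_time => [...]" declares a new, undefined lexical and
# uses its value as the key. undef becomes the empty-string key, and
# $date_time itself is never assigned.
my %bad;
my $was_assigned;
{
    no warnings 'uninitialized';    # silence the undef-key warning
    %bad = ( bez => 'Tag/Monat/Jahr', my $date_time => [ { name => 'day' } ] );
    $was_assigned = defined $date_time ? 1 : 0;
}
printf "bad: empty-string key holds the data: %s, \$date_time defined: %s\n",
    ( exists $bad{''} ? 'yes' : 'no' ), ( $was_assigned ? 'yes' : 'no' );
```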
------------------------------
Date: Fri, 4 Sep 2009 00:17:12 -0700 (PDT)
From: Marek <mstep@podiuminternational.org>
Subject: Re: create a form with cgi and a multidimensional array
Message-Id: <1b860736-c4be-44af-969b-76be57a9032e@38g2000yqr.googlegroups.com>
On 2 Sep., 21:33, s...@netherlands.com wrote:
> snip
Hello -sln!
Thank you very much for your smart suggestion, and sorry for my late
answer! I have been trying to adapt your smart suggestion for two days
now.
Unfortunately there is nothing in @menus. I boiled down my script
to the following. Perhaps I overlooked something?
Thank you again!
marek
#! /usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard escapeHTML);
use CGI::Carp qw(fatalsToBrowser);
$CGI::DISABLE_UPLOADS = $CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX = $CGI::POST_MAX = 4096;
my $color;
my @element_liste = (
{
type => "ueberschrift",
name => "Abholdatum/Abholzeit",
color => "#C0C0C4"
},
{
type => 'popup',
bez => 'Tag/Monat/Jahr/ - std:min',
my $date_time => [
{
name => 'day',
value => [
"01", "02", "03", "04", "05", "06", "07", "08",
"09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31"
]
},
{
name => 'month',
value => [
"Jan", "Feb", "Mär", "Apr", "Mai", "Jun",
"Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
]
},
{
name => 'year',
value => [ '2009', '2010', '2011', '2012' ]
}
]
}
);
print header ( -charset => 'utf-8' ),
start_html(
{
-dtd => '-//W3C//DTD XHTML 1.0 Transitional//EN',
-title => 'FormTest',
-style =>
{ 'src' => '../style/style.css' }
}
);
my $table_start = <<"EOF";
<table width="1006" align="center" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
<td align="left" valign="middle" bgcolor="#0E1E3F" colspan="2">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
<tr>
<td bgcolor="#EDD4C3" colspan="2">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
EOF
my $table_end = <<"EOF";
<tr>
<td bgcolor="#EDD4C3" colspan="2" height="3">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
<tr>
<td bgcolor="#0E1E3F" colspan="2" height="3">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
</table>
EOF
my $aktion = lc( param("aktion") );
if ( $aktion eq "" )
{
show_form( \@element_liste );
}
print end_html();
exit(0);
#
#----------------------------------------------------------------------
# Eingabe-Formular anzeigen
sub show_form {
my $element_liste_ref = shift;
my @zeilen;
print start_form ( -action => url() );
print $table_start;
foreach my $f ( @{$element_liste_ref} ) {
my $type = $f->{type};
my $bez = $f->{bez};
if ( $type eq "ueberschrift" ) {
$color = $f->{color};
my $ueberschrift = $f->{name};
push(
@zeilen,
Tr(
{ -bgcolor => "$color" },
td( { -colspan => '2' }, h1($ueberschrift) )
)
);
}
elsif ( $type eq 'popup' and $f->{bez} =~ /Monat/ )
{
my @menus;
for (@$date_time) {
push @menus,
popup_menu(
-name => $_->{name},
-value => $_->{value}
);
}
push @zeilen,
Tr( { -bgcolor => "$color" }, td( $f->{bez} ), td( @menus ) );
}
}
foreach my $zeile (@zeilen) {
print $zeile;
}
print qq(<tr><td bgcolor="#EDD4C3" colspan="2">);
print p( { -align => 'right' },
submit( -name => "aktion", -value => "Absenden" ) );
print "</td></tr>";
print $table_end;
print endform;
print "<p> </p>";
}
__END__
------------------------------
Date: Fri, 4 Sep 2009 00:39:25 -0700 (PDT)
From: Marek <mstep@podiuminternational.org>
Subject: Re: create a form with cgi and a multidimensional array
Message-Id: <de132be0-7809-4a03-ad6e-427d7e3fd1bd@e8g2000yqo.googlegroups.com>
On 2 Sep., 21:33, s...@netherlands.com wrote:
> snip
Hello -sln!
Thank you very much for your smart suggestion, and sorry for my late
answer! I have been trying to adapt your smart suggestion for two days
now.
Unfortunately there is nothing in @menus. I boiled down my script
to the following. Perhaps I overlooked something?
Thank you again!
marek
#! /usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard escapeHTML);
use CGI::Carp qw(fatalsToBrowser);
$CGI::DISABLE_UPLOADS = $CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX = $CGI::POST_MAX = 4096;
my $color;
my @element_liste = (
{
type => "ueberschrift",
name => "Abholdatum/Abholzeit",
color => "#C0C0C4"
},
{
type => 'popup',
bez => 'Tag/Monat/Jahr/ - std:min',
my $date_time => [
{
name => 'day',
value => [
"01", "02", "03", "04", "05", "06", "07", "08",
"09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31"
]
},
{
name => 'month',
value => [
"Jan", "Feb", "Mär", "Apr", "Mai", "Jun",
"Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
]
},
{
name => 'year',
value => [ '2009', '2010', '2011', '2012' ]
}
]
}
);
print header ( -charset => 'utf-8' ),
start_html(
{
-dtd => '-//W3C//DTD XHTML 1.0 Transitional//EN',
-title => 'FormTest',
-style =>
{ 'src' => '../style/style.css' }
}
);
my $table_start = <<"EOF";
<table width="1006" align="center" border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
<td align="left" valign="middle" bgcolor="#0E1E3F" colspan="2">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
<tr>
<td bgcolor="#EDD4C3" colspan="2">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
EOF
my $table_end = <<"EOF";
<tr>
<td bgcolor="#EDD4C3" colspan="2" height="3">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
<tr>
<td bgcolor="#0E1E3F" colspan="2" height="3">
<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
</td>
</tr>
</table>
EOF
my $aktion = lc( param("aktion") );
if ( $aktion eq "" )
{
show_form( \@element_liste );
}
print end_html();
exit(0);
#
#----------------------------------------------------------------------
# Eingabe-Formular anzeigen
sub show_form {
my $element_liste_ref = shift;
my @zeilen;
print start_form ( -action => url() );
print $table_start;
foreach my $f ( @{$element_liste_ref} ) {
my $type = $f->{type};
my $bez = $f->{bez};
if ( $type eq "ueberschrift" ) {
$color = $f->{color};
my $ueberschrift = $f->{name};
push(
@zeilen,
Tr(
{ -bgcolor => "$color" },
td( { -colspan => '2' }, h1($ueberschrift) )
)
);
}
elsif ( $type eq 'popup' and $f->{bez} =~ /Monat/ )
{
my @menus;
for (@$date_time) {
push @menus,
popup_menu(
-name => $_->{name},
-value => $_->{value}
);
}
push @zeilen,
Tr( { -bgcolor => "$color" }, td( $f->{bez} ), td( @menus ) );
}
}
foreach my $zeile (@zeilen) {
print $zeile;
}
print qq(<tr><td bgcolor="#EDD4C3" colspan="2">);
print p( { -align => 'right' },
submit( -name => "aktion", -value => "Absenden" ) );
print "</td></tr>";
print $table_end;
print endform;
print "<p> </p>";
}
__END__
------------------------------
Date: Fri, 04 Sep 2009 07:44:33 +0200
From: Mart van de Wege <mvdwege@mail.com>
Subject: Re: Data cleaning issue involving bad wide characters in what ought to be ascii data
Message-Id: <86d467w9fi.fsf@gareth.avalon.lan>
Jürgen Exner <jurgenex@hotmail.com> writes:
> Ted Byers <r.ted.byers@gmail.com> wrote:
>>I thought I'd have to resort to a regex, if I could figure out what to
>>scan for, but if there is a perl package that will make it easier to
>>deal with this odd character, great.
>
> Forgot to mention:
> There is Text::Iconv (see
> http://search.cpan.org/~mpiotr/Text-Iconv-1.7/Iconv.pm) which will
> convert text between different encodings. However I have no idea what it
> does with characters that do not exist in the target character set.
>
If it uses iconv, or works the same as iconv, it'll drop them.
Mart
--
"We will need a longer wall when the revolution comes."
--- AJS, quoting an uncertain source.
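For comparison, the core Encode module lets the caller choose what happens to characters that do not exist in the target character set, through its CHECK argument. This is a minimal sketch that uses Encode rather than Text::Iconv. With the default CHECK, the ASCII encoder substitutes a '?'. With Encode::FB_PERLQQ, it writes a \x{....} escape instead.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "Rec\x{2019}d";    # U+2019 RIGHT SINGLE QUOTATION MARK

# Default CHECK: unmappable characters become the substitution character.
my $subst = encode('ascii', $text);

# FB_PERLQQ: unmappable characters become a \x{HHHH} escape.
my $escaped = encode('ascii', $text, Encode::FB_PERLQQ);

print "$subst\n";
print "$escaped\n";
```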
------------------------------
Date: Thu, 03 Sep 2009 16:07:07 -0700
From: sln@netherlands.com
Subject: Re: Data cleaning issue involving bad wide characters in what ought to be ascii data
Message-Id: <t4i0a551jfa4kn02k6lfn08konvepkikdc@4ax.com>
On Thu, 3 Sep 2009 07:10:36 -0700 (PDT), Ted Byers <r.ted.byers@gmail.com> wrote:
>Again, I am trying to automatically process data I receive by email,
>so I have no control over the data that is coming in.
>
>The data is supposed to be plain text/HTML, but there are quite a
>number of records where the contraction "rec'd" is misrepresented when
>written to standard out as "Rec\342\200\231d"
>
>When the data is written to a file, these characters are represented
>by the character ' when it is opened using notepad, but by the string
>'’' when it is opened by open office.
>
>So how do I tell what character it is when in three different contexts
>it is displayed in three different ways? How can I make certain that
>when I either print it or store it in my DB, I get the correct
>"rec'd" (or, better, "received")?
>
>I suspect a minor glitch in the software that makes and send the email
>as this is the ONLY string where what ought to be an ascii ' character
>is identified as a wide character. Regardless of how that happens (as
>I don't control that), I need to clean this. And it gets confusing
>when different applications handle the i18n differently (Notepad is
>undoubtedly using the OS i18n support and Open Office is handling it
>differently, and Emacs is doing it differently from both).
>
>A little enlightenment would be appreciated.
>
>Thanks
>
>Ted
What you have there is a UTF-8 encoded character with
code point \x{2019}.
It is NOT an ASCII single quote but a Unicode curly
single quote (right). See this table and this web site:
Character                            Hex    Escape
copyright sign                       00A9   \u00A9
registered sign                      00AE   \u00AE
trademark sign                       2122   \u2122
em-dash                              2014   \u2014
euro sign                            20AC   \u20AC
curly single quotation mark (left)   2018   \u2018
curly single quotation mark (right)  2019   \u2019
curly double quotation mark (left)   201C   \u201C
curly double quotation mark (right)  201D   \u201D
http://moock.org/asdg/technotes/usingSpecialCharacters/
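To confirm which character a code point really is, you can use the core charnames module to look up its official Unicode name. This is a minimal sketch, assuming a Perl recent enough to ship charnames::viacode (5.8 or later):

```perl
use strict;
use warnings;
use charnames ':full';

# Code points from the table above.
my @code_points = (0x2018, 0x2019, 0x201C, 0x201D, 0x2014);

for my $cp (@code_points) {
    # viacode() returns the official Unicode name for a code point.
    printf "U+%04X %s\n", $cp, charnames::viacode($cp);
}
```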
By the way, it displays fine in Notepad and Word. It is
not ASCII, so you need a font and an application that can
display UTF-8 characters.
If you want to convert these special characters, use a regex
to replace them with ASCII equivalents.
First, note that the string arrives as raw UTF-8 octets
('Rec\342\200\231d'). They must be decoded into Perl characters
before you can use code points in the regex.
You can replace these after you decode. Something like this:
$str = decode ('utf8', "your received string"); # utf-8 octets
$str =~ s/\x{2018}/'/g;
$str =~ s/\x{2019}/'/g;
$str =~ s/\x{201C}/"/g;
$str =~ s/\x{201D}/"/g;
etc, ...
Find a more efficient way to do the substitutions though.
See below for an example.
-sln
===========================
use strict;
use warnings;
use Encode;
my $str = decode ('utf8', "Rec\342\200\231d"); # utf-8 octets
my $data = "Rec\x{2019}d"; # Unicode Code Point
if ($str eq $data) {
print "yes they're equal\n";
}
# Write through an encoding layer to avoid a "Wide character in print" warning
open my $fh, '>:encoding(UTF-8)', 'chr1.txt' or die "can't open chr1.txt: $!";
print $fh $data;
close $fh;
exit;
sub ordsplit
{
my $string = shift;
my $buf = '';
for (map {ord $_} split //, $string) {
$buf.= sprintf ("%c %02x ",$_,$_);
}
return $buf;
}
__END__
------------------------------
Date: Thu, 03 Sep 2009 17:22:38 -0700
From: sln@netherlands.com
Subject: Re: Data cleaning issue involving bad wide characters in what ought to be ascii data
Message-Id: <v8n0a5lbch5mv3qr76ansgje8kfdqsb5m1@4ax.com>
On Thu, 03 Sep 2009 16:07:07 -0700, sln@netherlands.com wrote:
>On Thu, 3 Sep 2009 07:10:36 -0700 (PDT), Ted Byers <r.ted.byers@gmail.com> wrote:
>
>You can strip these after you decode. Something like this:
>
>$str = decode ('utf8', "your received string"); # utf-8 octets
>$str =~ s/\x{2018}/'/g;
>$str =~ s/\x{2019}/'/g;
>$str =~ s/\x{201C}/"/g;
>$str =~ s/\x{201D}/"/g;
>
>etc, ...
>
-sln
------------------
use strict;
use warnings;
use Encode;
binmode (STDOUT, ':utf8');
my $str = decode ('utf8', "Rec\342\200\231d"); # utf8 octets
my $data = "Rec\x{2019}d"; # Unicode Code Point
if ($str eq $data) {
print "yes they're equal\n";
}
print ordsplit($data),"\n";
# Substitute selected Unicode characters with their ASCII equivalents
my %unisub = (
"\x{2018}" => "'",
"\x{2019}" => "'",
"\x{201C}" => '"',
"\x{201D}" => '"',
);
$str =~ s/$_/$unisub{$_}/ge for keys (%unisub);
print $str,"\n";
# OR -- substitute every code point from 0x100 to 0x1fffff with a ? character
$data =~ s/[\x{100}-\x{1fffff}]/?/g;
print $data,"\n";
exit;
sub ordsplit {
my $string = shift;
my $buf = '';
for (map {ord $_} split //, $string) {
$buf.= sprintf ("%c %02x ",$_,$_);
}
return $buf;
}
__END__
output:
yes they're equal
R 52 e 65 c 63 GÇÖ 2019 d 64
Rec'd
Rec?d
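Each of these replacements maps one character to one character, so a single tr/// can handle all of them in one pass instead of one s/// per key. This is a minimal sketch that assumes the input has already been decoded as described above:

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# Raw UTF-8 octets for: Rec'd "ok" (with curly quotes).
my $str = decode('UTF-8', "Rec\342\200\231d \342\200\234ok\342\200\235");

# Map curly single and double quotes to ASCII quotes in one pass.
# tr/// does not interpolate variables, but it does understand \x{...}.
(my $ascii = $str) =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;

print "$ascii\n";
```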
------------------------------
Date: Fri, 04 Sep 2009 10:00:01 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 2.4 I copied the perl binary from one machine to another, but scripts don't work.
Message-Id: <By5om.2910$uF2.966@newsfe03.iad>
This is an excerpt from the latest version of perlfaq2.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
2.4: I copied the perl binary from one machine to another, but scripts don't work.
That's probably because you forgot libraries, or library paths differ.
You really should build the whole distribution on the machine it will
eventually live on, and then type "make install". Most other approaches
are doomed to failure.
One simple way to check that things are in the right place is to print
out the hard-coded @INC that perl looks through for libraries:
% perl -le 'print for @INC'
If this command lists any paths that don't exist on your system, then
you may need to move the appropriate libraries to these locations, or
create symbolic links, aliases, or shortcuts appropriately. @INC is also
printed as part of the output of
% perl -V
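A short script can go one step further and list only the @INC entries that are missing. This is a minimal sketch. It tests each entry with -d and skips code-reference hooks, which @INC is also allowed to contain:

```perl
use strict;
use warnings;

# Collect @INC entries that are plain paths but not existing directories.
# @INC can also hold code references (hooks); those are skipped.
my @missing = grep { !ref($_) && !-d $_ } @INC;

if (@missing) {
    print "Missing library directories:\n";
    print "  $_\n" for @missing;
}
else {
    print "All \@INC directories exist.\n";
}
```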
You might also want to check out "How do I keep my own module/library
directory?" in perlfaq8.
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
------------------------------
Date: Fri, 04 Sep 2009 04:00:03 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 2.7 Is there an ISO or ANSI certified version of Perl?
Message-Id: <7h0om.18366$Y83.2480@newsfe21.iad>
This is an excerpt from the latest version of perlfaq2.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
2.7: Is there an ISO or ANSI certified version of Perl?
Certainly not. Larry expects that he'll be certified before Perl is.
--------------------------------------------------------------------
------------------------------
Date: Thu, 03 Sep 2009 22:00:03 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 4.22 How do I expand function calls in a string?
Message-Id: <D%Wnm.135502$sC1.7764@newsfe17.iad>
This is an excerpt from the latest version of perlfaq4.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
4.22: How do I expand function calls in a string?
(contributed by brian d foy)
This is documented in perlref, and although it's not the easiest thing
to read, it does work. In each of these examples, we call the function
inside the braces used to dereference a reference. If we have more than
one return value, we can construct and dereference an anonymous array.
In this case, we call the function in list context.
print "The time values are @{ [localtime] }.\n";
If we want to call the function in scalar context, we have to do a bit
more work. We can put any code we like inside the braces, so we simply
have to end with a scalar reference; how we produce that reference is
up to us. Note that the use of parens creates a list context, so we
need "scalar" to force the scalar context on the function:
print "The time is ${\(scalar localtime)}.\n";
print "The time is ${ my $x = localtime; \$x }.\n";
If your function already returns a reference, you don't need to create
the reference yourself.
sub timestamp { my $t = localtime; \$t }
print "The time is ${ timestamp() }.\n";
The "Interpolation" module can also do a lot of magic for you. You can
specify a variable name, in this case "E", to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.
use Interpolation E => 'eval';
print "The time values are $E{localtime()}.\n";
In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.
print "The time is " . localtime() . ".\n";
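The output of localtime() changes from run to run, so the same two techniques are easier to see with deterministic subroutines. This is a minimal sketch, and double() and pair() are made-up example functions:

```perl
use strict;
use warnings;

sub double { return 2 * $_[0] }              # hypothetical example
sub pair   { return ( $_[0], $_[0] + 1 ) }   # hypothetical example

# List context: the anonymous array's elements are joined with $" (a space).
my $list = "Pair: @{[ pair(4) ]}";

# Scalar context: dereference a reference to the scalar result.
my $scalar = "Double: ${\( scalar double(21) )}";

print "$list\n";
print "$scalar\n";
```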
--------------------------------------------------------------------
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2582
***************************************