[31337] in Perl-Users-Digest

Perl-Users Digest, Issue: 2582 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Sep 4 06:09:41 2009

Date: Fri, 4 Sep 2009 03:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 4 Sep 2009     Volume: 11 Number: 2582

Today's topics:
    Re: command perl - SR (Tim McDaniel)
    Re: command perl - SR sln@netherlands.com
    Re: command perl - SR sln@netherlands.com
    Re: command perl - SR sln@netherlands.com
    Re: create a form with cgi and a multidimensional array sln@netherlands.com
    Re: create a form with cgi and a multidimensional array <mstep@podiuminternational.org>
    Re: create a form with cgi and a multidimensional array <mstep@podiuminternational.org>
    Re: Data cleaning issue involving bad wide characters i <mvdwege@mail.com>
    Re: Data cleaning issue involving bad wide characters i sln@netherlands.com
    Re: Data cleaning issue involving bad wide characters i sln@netherlands.com
        FAQ 2.4 I copied the perl binary from one machine to an <brian@theperlreview.com>
        FAQ 2.7 Is there an ISO or ANSI certified version of Pe <brian@theperlreview.com>
        FAQ 4.22 How do I expand function calls in a string? <brian@theperlreview.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 3 Sep 2009 20:33:26 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: command perl - SR
Message-Id: <h7p96m$ft5$1@reader1.panix.com>

In article <44c40c9e-fd7f-44f4-b5d3-8991ac52fcc7@j9g2000vbp.googlegroups.com>,
fred  <fred78980@yahoo.com> wrote:
>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt

I'm afraid that your syntax and semantics have enough wrong that it
would take a while to explain.

I think that approach is barking up the wrong tree anyway.  It looks
more like array element manipulation than string processing.  I think
it's easier to use split(), either explicitly or implicitly, to split
the input into an array, and then just manipulate $field[2].

You didn't explicitly specify what to do
- if the third field is not empty.
  I gather that you would want it to be left alone.
- if there is no third field.
  I assume that no third field is to be added.
and the test data didn't include any such test cases.

I also include a test case to make sure that the right number of empty
fields (other than the third) are preserved.

perl '-F&' -wape 'use strict;
    if (@F > 2 && $F[2] eq "") {
        $F[2] = "foo"; $_ = join("&", @F);
    }' 093.txt

applied to

xxxx&(ght)(hgf)&&(yyt)
xx9x&(gg)(ff)&&(yyt)
oixxx&(hfd)(jj)&&(yyt)
xxxx&(jj)(kk)&&(yyt)
xjhxxx&(jj)(j)&&(yyt)
xjhxxx&(jj)(j)&NO FOO HERE&(yyt)
only_one_field
two&fields
&null&&fields&5 more&&&&&

produces

xxxx&(ght)(hgf)&foo&(yyt)
xx9x&(gg)(ff)&foo&(yyt)
oixxx&(hfd)(jj)&foo&(yyt)
xxxx&(jj)(kk)&foo&(yyt)
xjhxxx&(jj)(j)&foo&(yyt)
xjhxxx&(jj)(j)&NO FOO HERE&(yyt)
only_one_field
two&fields
&null&foo&fields&5 more&&&&&


(I don't know a variable to use instead of the repeated "&".)
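
On that closing remark: in a script, as opposed to a one-liner, the
separator can live in a single variable. What follows is only a sketch of
the same split-and-join idea, not a drop-in replacement; the names $sep
and fill_third_field are made up for illustration, and the sample lines
are taken from the test data above.

```perl
use strict;
use warnings;

my $sep = '&';    # the field separator, named once (assumed name)

# Return the line with an empty third field replaced by $fill.
# Lines with fewer than three fields, or with a non-empty third
# field, come back unchanged.
sub fill_third_field {
    my ( $line, $fill ) = @_;

    # Set the line ending aside so it never counts as field content.
    my $eol = $line =~ s/(\r?\n)\z// ? $1 : '';

    # A limit of -1 makes split keep trailing empty fields.
    my @f = split /\Q$sep\E/, $line, -1;
    $f[2] = $fill if @f > 2 && $f[2] eq '';

    return join( $sep, @f ) . $eol;
}

my @sample = (
    "xxxx&(ght)(hgf)&&(yyt)\n",
    "xjhxxx&(jj)(j)&NO FOO HERE&(yyt)\n",
    "two&fields\n",
    "&null&&fields&5 more&&&&&\n",
);
print fill_third_field( $_, 'foo' ) for @sample;
```

Unlike the one-liner, this strips the line ending before splitting, so an
empty third field at the very end of a line is filled as well.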

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Thu, 03 Sep 2009 14:26:34 -0700
From: sln@netherlands.com
Subject: Re: command perl - SR
Message-Id: <puc0a5tg3ft5b7mhm0fel7l94t8v9akhkc@4ax.com>

On Thu, 3 Sep 2009 10:55:18 -0700 (PDT), fred <fred78980@yahoo.com> wrote:

>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
>
>xxxx&(ght)(hgf)&&(yyt)
>xx9x&(gg)(ff)&&(yyt)
>oixxx&(hfd)(jj)&&(yyt)
>xxxx&(jj)(kk)&&(yyt)
>xjhxxx&(jj)(j)&&(yyt)
>
>
>Thanks

perl -pe 's/^(.+&.+&)(&.+)$/$1foo$2/' text.txt
 or
perl -pe 's/^((?:.+&){2})(&.+)$/$1foo$2/' text.txt

-sln


------------------------------

Date: Thu, 03 Sep 2009 14:49:58 -0700
From: sln@netherlands.com
Subject: Re: command perl - SR
Message-Id: <0ae0a595kqa0s51kpu738k020s3aoe5djv@4ax.com>

On Thu, 3 Sep 2009 10:55:18 -0700 (PDT), fred <fred78980@yahoo.com> wrote:

>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
>
>xxxx&(ght)(hgf)&&(yyt)
>xx9x&(gg)(ff)&&(yyt)
>oixxx&(hfd)(jj)&&(yyt)
>xxxx&(jj)(kk)&&(yyt)
>xjhxxx&(jj)(j)&&(yyt)
>
>
>Thanks

If only the first 5 lines:

  perl -0 -pe 's/^(.+&.+&)(&.+)/++$n<6 ? $1."foo".$2 : $1.$2/mge' text.txt
or
  perl -0 -pe 's/^((?:.+&){2})(&.+)/++$n<6 ? $1."foo".$2 : $1.$2/mge' text.txt

-sln


------------------------------

Date: Thu, 03 Sep 2009 15:07:47 -0700
From: sln@netherlands.com
Subject: Re: command perl - SR
Message-Id: <0ef0a5d06be5c9dnr676l7vhe39ji8c73d@4ax.com>

On Thu, 3 Sep 2009 10:55:18 -0700 (PDT), fred <fred78980@yahoo.com> wrote:

>Would like to replace my third field (empty) for each five lines by
>foo. This command is not correct. Can you help me fix it ?
>
>!perl -pe 's(&&)($n++ = 5 ? (&foo&)eg' text.txt
>
>xxxx&(ght)(hgf)&&(yyt)
>xx9x&(gg)(ff)&&(yyt)
>oixxx&(hfd)(jj)&&(yyt)
>xxxx&(jj)(kk)&&(yyt)
>xjhxxx&(jj)(j)&&(yyt)
>
>
>Thanks

Or, to fix what you have (untested):

  perl -0 -pe 's/(&&)/++$n < 6 ? "&foo&" : $1/eg' text.txt
or
  perl -0 -pe 's(&&)(++$n < 6 ? "&foo&" : "&&")eg' text.txt

-sln
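
The "first five lines only" variants in this thread read the whole file
as one record and count matches. Another way to get the same effect is to
count lines. The following is a rough sketch of that alternative, not the
poster's method; fill_first_lines is a hypothetical name, and the pattern
assumes "&" never occurs inside a field.

```perl
use strict;
use warnings;

# Replace an empty third field with $fill, but only in the
# first $limit lines. Returns modified copies of the lines.
sub fill_first_lines {
    my ( $limit, $fill, @lines ) = @_;
    my $n = 0;
    for my $line (@lines) {
        next if ++$n > $limit;

        # Two "&"-terminated fields, immediately followed by another "&",
        # which means the third field is empty.
        $line =~ s/^((?:[^&]*&){2})(?=&)/$1$fill/;
    }
    return @lines;
}

my @in = ( "xxxx&(ght)(hgf)&&(yyt)\n", "xx9x&(gg)(ff)&&(yyt)\n" );
print fill_first_lines( 1, 'foo', @in );
```

With a limit of 1, only the first sample line gets "foo"; the second is
printed unchanged.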


------------------------------

Date: Fri, 04 Sep 2009 03:01:09 -0700
From: sln@netherlands.com
Subject: Re: create a form with cgi and a multidimensional array (0/1)
Message-Id: <jgo1a5tmldp9grvfhmagv228ahvs52idag@4ax.com>

On Fri, 4 Sep 2009 00:39:25 -0700 (PDT), Marek <mstep@podiuminternational.org> wrote:

>On 2 Sep., 21:33, s...@netherlands.com wrote:
>> snip
>
>
>Hello -sln!
>
>
>
>Thank you very much for your smart suggestion and sorry for my late
>answer! I have been trying to adapt your suggestion for two days
>now.
>
>Unfortunately there is nothing in @menus. I boiled down my script
>to the following. Perhaps I overlooked something?
>

My fault, I rushed it. It is fixed below, and it runs on my
machine. You may need to clean up the indentation of the
CGI-generated output.

Thanks, it was fun and I learned some details on page generation.
Maybe I will put some more time into it.
Let me know how it works out.

-sln
========================
use strict;
use warnings;
use CGI qw(:standard escapeHTML);
use CGI::Carp qw(fatalsToBrowser);

$CGI::DISABLE_UPLOADS = $CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX        = $CGI::POST_MAX        = 4096;



my $color;

my @element_liste = (
    {
        type  => "ueberschrift",
        name  => "Abholdatum/Abholzeit",
        color => "#C0C0C4"
    },
    {
        type          => 'popup',
        bez           => 'Tag/Monat/Jahr/ - std:min',
        date_time => [
            {
                name  => 'day',
                value => [
                    "01", "02", "03", "04", "05", "06", "07", "08",
                    "09", "10", "11", "12", "13", "14", "15", "16",
                    "17", "18", "19", "20", "21", "22", "23", "24",
                    "25", "26", "27", "28", "29", "30", "31"
                ]
            },
            {
                name  => 'month',
                value => [
                    "Jan", "Feb", "Mär", "Apr", "Mai", "Jun",
                    "Jul", "Aug", "Sep",  "Okt", "Nov", "Dez"
                ]
            },
            {
                name  => 'year',
                value => [ '2009', '2010', '2011', '2012' ]
            }
        ]
    }
);

print header ( -charset => 'utf-8' ),
  start_html(
    {
        -dtd   => '-//W3C//DTD XHTML 1.0 Transitional//EN',
        -title => 'FormTest',
        -style =>
          { 'src' => '../style/style.css' }
    }
  );

my $table_start = <<"EOF";
	<table width="1006" align="center" border="0" cellspacing="0" cellpadding="0">
		<tr>
			<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
			<td align="left" valign="middle" bgcolor="#0E1E3F" colspan="2">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
			<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
		<tr>
			<td bgcolor="#EDD4C3" colspan="2">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
EOF

my $table_end = <<"EOF";
		<tr>
			<td bgcolor="#EDD4C3" colspan="2" height="3">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
		<tr>
			<td bgcolor="#0E1E3F" colspan="2" height="3">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
	</table>
EOF



my $aktion = lc( param("aktion") );
if ( $aktion eq "" )
{
    show_form( \@element_liste );
}


print end_html();

exit(0);


#
#-------------------------------------
# Show the input form

sub show_form {
    my $element_liste_ref = shift;
    my @zeilen;

    print start_form ( -action => url() );
    print $table_start;

    foreach my $f ( @{$element_liste_ref} ) {
        my $type = $f->{type};
        my $bez  = $f->{bez};
        if ( $type eq "ueberschrift" )
        {
            $color = $f->{color};
            my $ueberschrift = $f->{name};
            push @zeilen,
                Tr(
                    { -bgcolor => "$color" },
                    td( { -colspan => '2' }, h1($ueberschrift) )
                  );
        }
        elsif ( $type eq 'popup' and $f->{bez} =~ /Monat/ )
        {
            my @menus = ();
            for (@{$f->{date_time}}) {
                push @menus,
                  popup_menu(
                    -name  => $_->{name},
                    -value => $_->{value}
                  );
            }
            push @zeilen,
              Tr(
                  { -bgcolor => "$color" },
                  td( {-align => 'center' }, $f->{bez} ),
                  td( {-align => 'left' }, @menus )
                );
             # or, any of these works just fine ...
             #
             # Tr(
             #     { -bgcolor => "$color", -align => 'center' },
             #     td( { -colspan => '2' }, $f->{bez}, @menus )   # <- this uses one td()
             #   );
             #
             # Tr(
             #     { -bgcolor => "$color" },
             #     td( $f->{bez} ), td( @menus )
             #   );
        }
    }

    foreach my $zeile (@zeilen) {
        print $zeile;
    }
    print qq(<tr><td bgcolor="#EDD4C3" colspan="2">);
    print p( { -align => 'right' },
        submit( -name => "aktion", -value => "Absenden" ) );
    print "</td></tr>";
    print $table_end;
    print endform;
    print "<p>&nbsp;</p>";

}
__END__
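
As far as I can tell, the change that matters in the script above is the
data access: the menu list is stored under the plain hash key date_time
and reached through $f->{date_time}, rather than through a separate
lexical variable. Here is a minimal sketch of just that access pattern,
without CGI; the trimmed-down data and the name menu_summaries are
assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical, shortened version of the element list.
my @element_liste = (
    { type => 'ueberschrift', name => 'Abholdatum/Abholzeit' },
    {
        type      => 'popup',
        bez       => 'Tag/Monat/Jahr',
        date_time => [
            { name => 'day',   value => [ '01', '02', '03' ] },
            { name => 'month', value => [ 'Jan', 'Feb' ] },
        ],
    },
);

# Build one "name: v1 v2 ..." string per menu of every popup entry.
sub menu_summaries {
    my ($liste_ref) = @_;
    my @out;
    for my $f ( @{$liste_ref} ) {
        next unless $f->{type} eq 'popup';

        # Dereference the array stored under the date_time key.
        for my $menu ( @{ $f->{date_time} } ) {
            push @out, "$menu->{name}: @{ $menu->{value} }";
        }
    }
    return @out;
}

print "$_\n" for menu_summaries( \@element_liste );
```

Each inner hash plays the role of one popup_menu() call in the real
script.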




------------------------------

Date: Fri, 4 Sep 2009 00:17:12 -0700 (PDT)
From: Marek <mstep@podiuminternational.org>
Subject: Re: create a form with cgi and a multidimensional array
Message-Id: <1b860736-c4be-44af-969b-76be57a9032e@38g2000yqr.googlegroups.com>

On 2 Sep., 21:33, s...@netherlands.com wrote:
> snip


Hello -sln!



Thank you very much for your smart suggestion and sorry for my late
answer! I have been trying to adapt your suggestion for two days
now.

Unfortunately there is nothing in @menus. I boiled down my script
to the following. Perhaps I overlooked something?


Thank you again!


marek


#! /usr/bin/perl

use strict;
use warnings;
use CGI qw(:standard escapeHTML);
use CGI::Carp qw(fatalsToBrowser);

$CGI::DISABLE_UPLOADS = $CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX        = $CGI::POST_MAX        = 4096;



my $color;

my @element_liste = (
    {
        type  => "ueberschrift",
        name  => "Abholdatum/Abholzeit",
        color => "#C0C0C4"
    },
    {
        type          => 'popup',
        bez           => 'Tag/Monat/Jahr/ - std:min',
        my $date_time => [
            {
                name  => 'day',
                value => [
                    "01", "02", "03", "04", "05", "06", "07", "08",
                    "09", "10", "11", "12", "13", "14", "15", "16",
                    "17", "18", "19", "20", "21", "22", "23", "24",
                    "25", "26", "27", "28", "29", "30", "31"
                ]
            },
            {
                name  => 'month',
                value => [
                    "Jan", "Feb", "Mär", "Apr", "Mai", "Jun",
                    "Jul", "Aug", "Sep",  "Okt", "Nov", "Dez"
                ]
            },
            {
                name  => 'year',
                value => [ '2009', '2010', '2011', '2012' ]
            }
        ]
    }
);

print header ( -charset => 'utf-8' ),
  start_html(
    {
        -dtd   => '-//W3C//DTD XHTML 1.0 Transitional//EN',
        -title => 'FormTest',
        -style =>
          { 'src' => '../style/style.css' }
    }
  );

my $table_start = <<"EOF";
	<table width="1006" align="center" border="0" cellspacing="0" cellpadding="0">
		<tr>
			<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
			<td align="left" valign="middle" bgcolor="#0E1E3F" colspan="2">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
			<td align="left" valign="middle" bgcolor="#0E1E3F" rowspan="7">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
		<tr>
			<td bgcolor="#EDD4C3" colspan="2">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
EOF

my $table_end = <<"EOF";
		<tr>
			<td bgcolor="#EDD4C3" colspan="2" height="3">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
		<tr>
			<td bgcolor="#0E1E3F" colspan="2" height="3">
				<img src="../pix/grafix/transparent.gif" alt="" width="3" height="3" />
			</td>
		</tr>
	</table>
EOF



my $aktion = lc( param("aktion") );
if ( $aktion eq "" )
{
    show_form( \@element_liste );
}


print end_html();

exit(0);

#
#----------------------------------------------------------------------
# Show the input form

sub show_form {
    my $element_liste_ref = shift;
    my @zeilen;

    print start_form ( -action => url() );
    print $table_start;

    foreach my $f ( @{$element_liste_ref} ) {
        my $type = $f->{type};
        my $bez  = $f->{bez};
        if ( $type eq "ueberschrift" ) {
            $color = $f->{color};
            my $ueberschrift = $f->{name};
            push(
                @zeilen,
                Tr(
                    { -bgcolor => "$color" },
                    td( { -colspan => '2' }, h1($ueberschrift) )
                )
            );
        }
        elsif ( $type eq 'popup' and $f->{bez} =~ /Monat/ )
        {
            my @menus;
            for (@$date_time) {
                push @menus,
                  popup_menu(
                    -name  => $_->{name},
                    -value => $_->{value}
                  );
            }
            push @zeilen,
              Tr( { -bgcolor => "$color" }, td( $f->{bez} ), td( @menus ) );
        }
    }

    foreach my $zeile (@zeilen) {
        print $zeile;
    }
    print qq(<tr><td bgcolor="#EDD4C3" colspan="2">);
    print p( { -align => 'right' },
        submit( -name => "aktion", -value => "Absenden" ) );
    print "</td></tr>";
    print $table_end;
    print endform;
    print "<p>&nbsp;</p>";

}


__END__


------------------------------

Date: Fri, 04 Sep 2009 07:44:33 +0200
From: Mart van de Wege <mvdwege@mail.com>
Subject: Re: Data cleaning issue involving bad wide characters in what ought  to be ascii data
Message-Id: <86d467w9fi.fsf@gareth.avalon.lan>

Jürgen Exner <jurgenex@hotmail.com> writes:

> Ted Byers <r.ted.byers@gmail.com> wrote:
>>I thought I'd have to resort to a regex, if I could figure out what to
>>scan for, but if there is a perl package that will make it easier to
>>deal with this odd character, great.
>
> Forgot to mention:
> There is Text::Iconv (see
> http://search.cpan.org/~mpiotr/Text-Iconv-1.7/Iconv.pm) which will
> convert text between different encodings. However I have no idea what it
> does with characters that do not exist in the target character set.
>
If it uses iconv, or works the same as iconv, it'll drop them.

Mart

-- 
"We will need a longer wall when the revolution comes."
--- AJS, quoting an uncertain source.


------------------------------

Date: Thu, 03 Sep 2009 16:07:07 -0700
From: sln@netherlands.com
Subject: Re: Data cleaning issue involving bad wide characters in what ought to be  ascii data
Message-Id: <t4i0a551jfa4kn02k6lfn08konvepkikdc@4ax.com>

On Thu, 3 Sep 2009 07:10:36 -0700 (PDT), Ted Byers <r.ted.byers@gmail.com> wrote:

>Again, I am trying to automatically process data I receive by email,
>so I have no control over the data that is coming in.
>
>The data is supposed to be plain text/HTML, but there are quite a
>number of records where the contraction "rec'd" is misrepresented when
>written to standard out as "Rec\342\200\231d"
>
>When the data is written to a file, these characters are represented
>by the character ' when it is opened using notepad, but by the string
>'’' when it is opened by open office.
>
>So how do I tell what character it is when in three different contexts
>it is displayed in three different ways?  How can I make certain that
>when I either print it or store it in my DB, I get the correct
>"rec'd" (or, better, "received")?
>
>I suspect a minor glitch in the software that makes and send the email
>as this is the ONLY string where what ought to be an ascii ' character
>is identified as a wide character.  Regardless of how that happens (as
>I don't control that), I need to clean this.  And it gets confusing
>when different applications handle the i18n differently (Notepad is
>undoubtedly using the OS i18n support and Open Office is handling it
>differently, and Emacs is doing it differently from both).
>
>A little enlightenment would be appreciated.
>
>Thanks
>
>Ted


What you have there is a UTF-8 encoded character with
code point \x{2019}.

It is NOT an ascii single quote, rather a Unicode curly
single quote (right). See this table and this web site:

copyright sign                       00A9    \u00A9
registered sign                      00AE    \u00AE
trademark sign                       2122    \u2122
em-dash                              2014    \u2014
euro sign                            20AC    \u20AC
curly single quotation mark (left)   2018    \u2018
curly single quotation mark (right)  2019    \u2019
curly double quotation mark (left)   201C    \u201C
curly double quotation mark (right)  201D    \u201D

http://moock.org/asdg/technotes/usingSpecialCharacters/

By the way, it displays fine in Notepad and Word. It is not
ASCII, so you need a font and an app that can display
UTF-8 characters.

If you want to convert these special characters, use a regex
to replace them in your data.

Before you do that, note that the character apparently arrives
as raw octets, 'Rec\342\200\231d', which need to be decoded from
UTF-8 into Perl characters. Then you can use code points in the regex.

You can strip these after you decode. Something like this:

$str = decode ('utf8', "your received string"); # utf-8 octets
$str =~ s/\x{2018}/'/g;
$str =~ s/\x{2019}/'/g;
$str =~ s/\x{201C}/"/g;
$str =~ s/\x{201D}/"/g;

etc, ...

Find a more efficient way to do the substitutions though.

See below for an example.
-sln
===========================
use strict;
use warnings;
use Encode;

my $str = decode ('utf8', "Rec\342\200\231d"); # utf-8 octets

my $data  = "Rec\x{2019}d"; # Unicode Code Point

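if ($str eq $data) {
	print "yes, they're equal\n";
}
# Write the character out as UTF-8; without an encoding layer Perl
# warns about a "Wide character in print".
open my $fh, '>:encoding(UTF-8)', 'chr1.txt' or die "can't open chr1.txt: $!";

print $fh $data;
close $fh or die "can't close chr1.txt: $!";
exit;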

sub ordsplit
{ 
	my $string = shift;
	my $buf = '';
	for (map {ord $_} split //, $string) {
		$buf.= sprintf ("%c %02x  ",$_,$_);
	}
	return $buf;
}
__END__







------------------------------

Date: Thu, 03 Sep 2009 17:22:38 -0700
From: sln@netherlands.com
Subject: Re: Data cleaning issue involving bad wide characters in what ought to be  ascii data
Message-Id: <v8n0a5lbch5mv3qr76ansgje8kfdqsb5m1@4ax.com>

On Thu, 03 Sep 2009 16:07:07 -0700, sln@netherlands.com wrote:

>On Thu, 3 Sep 2009 07:10:36 -0700 (PDT), Ted Byers <r.ted.byers@gmail.com> wrote:
>
>You can strip these after you decode. Something like this:
>
>$str = decode ('utf8', "your received string"); # utf-8 octets
>$str =~ s/\x{2018}/'/g;
>$str =~ s/\x{2019}/'/g;
>$str =~ s/\x{201C}/"/g;
>$str =~ s/\x{201D}/"/g;
>
>etc, ...
>
-sln
------------------
use strict;
use warnings;
use Encode;

binmode (STDOUT, ':utf8');

my $str = decode ('utf8', "Rec\342\200\231d"); # utf8 octets
my $data  = "Rec\x{2019}d"; # Unicode Code Point

if ($str eq $data) {
	print "yes, they're equal\n";
}
print ordsplit($data),"\n";

# Substitute select Unicode to ascii equivalent
my %unisub = (
"\x{2018}" => "'",
"\x{2019}" => "'",
"\x{201C}" => '"',
"\x{201D}" => '"',
);
$str =~ s/$_/$unisub{$_}/ge for keys (%unisub);
print $str,"\n";

# OR -- Substitute all Unicode code points, 100 - 1fffff with ? character
$data =~ s/[\x{100}-\x{1fffff}]/?/g;
print $data,"\n";

exit;

sub ordsplit { 
	my $string = shift;
	my $buf = '';
	for (map {ord $_} split //, $string) {
		$buf.= sprintf ("%c %02x  ",$_,$_);
	}
	return $buf;
}
__END__

output:

yes, they're equal
R 52  e 65  c 63  ’ 2019  d 64
Rec'd
Rec?d
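
The hash-driven loop above runs one substitution per key. For a fixed set
of single-character replacements, tr/// can probably do the same job in
one pass. This is only a sketch under that assumption; ascii_quotes is a
made-up name, and it covers just the four curly quotes from the table
earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode UTF-8 octets into characters, then map the four curly
# quotes to their ASCII counterparts in a single pass.
sub ascii_quotes {
    my ($octets) = @_;
    my $str = decode( 'UTF-8', $octets );

    # U+2018 and U+2019 become ', U+201C and U+201D become ".
    $str =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;
    return $str;
}

print ascii_quotes("Rec\342\200\231d"), "\n";
```

Anything outside those four code points passes through untouched, so
other non-ASCII characters would still need their own handling.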



------------------------------

Date: Fri, 04 Sep 2009 10:00:01 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 2.4 I copied the perl binary from one machine to another, but scripts don't work.
Message-Id: <By5om.2910$uF2.966@newsfe03.iad>

This is an excerpt from the latest version of perlfaq2.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

2.4: I copied the perl binary from one machine to another, but scripts don't work.

    That's probably because you forgot libraries, or library paths differ.
    You really should build the whole distribution on the machine it will
    eventually live on, and then type "make install". Most other approaches
    are doomed to failure.

    One simple way to check that things are in the right place is to print
    out the hard-coded @INC that perl looks through for libraries:

        % perl -le 'print for @INC'

    If this command lists any paths that don't exist on your system, then
    you may need to move the appropriate libraries to these locations, or
    create symbolic links, aliases, or shortcuts appropriately. @INC is also
    printed as part of the output of

        % perl -V
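
    The existence check described above can also be scripted. The
    following is a small sketch rather than part of the FAQ answer; the
    helper name partition_dirs is hypothetical.

```perl
use strict;
use warnings;

# Split a list of directory names into those that exist
# and those that do not.
sub partition_dirs {
    my @dirs = @_;
    my ( @found, @missing );
    for my $dir (@dirs) {
        if ( -d $dir ) { push @found,   $dir }
        else           { push @missing, $dir }
    }
    return ( \@found, \@missing );
}

# @INC may also contain references (hooks); only plain paths are checked.
my ( $found, $missing ) = partition_dirs( grep { !ref } @INC );

print "missing: $_\n" for @{$missing};
my $total = @{$found} + @{$missing};
printf "%d of %d \@INC directories exist\n", scalar @{$found}, $total;
```

    A missing entry is not always a problem, since perl skips @INC
    directories that do not exist, but a long list of them suggests the
    binary was built for a different layout.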

    You might also want to check out "How do I keep my own module/library
    directory?" in perlfaq8.



--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in 
perlfaq.pod.


------------------------------

Date: Fri, 04 Sep 2009 04:00:03 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 2.7 Is there an ISO or ANSI certified version of Perl?
Message-Id: <7h0om.18366$Y83.2480@newsfe21.iad>

--------------------------------------------------------------------

2.7: Is there an ISO or ANSI certified version of Perl?

    Certainly not. Larry expects that he'll be certified before Perl is.



--------------------------------------------------------------------


------------------------------

Date: Thu, 03 Sep 2009 22:00:03 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 4.22 How do I expand function calls in a string?
Message-Id: <D%Wnm.135502$sC1.7764@newsfe17.iad>

This is an excerpt from the latest version of perlfaq4.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

4.22: How do I expand function calls in a string?

    (contributed by brian d foy)

    This is documented in perlref, and although it's not the easiest thing
    to read, it does work. In each of these examples, we call the function
    inside the braces used to dereference a reference. If we have more than
    one return value, we can construct and dereference an anonymous array.
    In this case, we call the function in list context.

            print "The time values are @{ [localtime] }.\n";

    If we want to call the function in scalar context, we have to do a bit
    more work. We can really have any code we like inside the braces, so we
    simply have to end with the scalar reference, although how you do that
    is up to you, and you can use code inside the braces. Note that the use
    of parens creates a list context, so we need "scalar" to force the
    scalar context on the function:

            print "The time is ${\(scalar localtime)}.\n";

            print "The time is ${ my $x = localtime; \$x }.\n";

    If your function already returns a reference, you don't need to create
    the reference yourself.

            sub timestamp { my $t = localtime; \$t }

            print "The time is ${ timestamp() }.\n";

    The "Interpolation" module can also do a lot of magic for you. You can
    specify a variable name, in this case "E", to set up a tied hash that
    does the interpolation for you. It has several other methods to do this
    as well.

            use Interpolation E => 'eval';
            print "The time values are $E{localtime()}.\n";

    In most cases, it is probably easier to simply use string concatenation,
    which also forces scalar context.

            print "The time is " . localtime() . ".\n";



--------------------------------------------------------------------


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2582
***************************************

