[32671] in Perl-Users-Digest


Perl-Users Digest, Issue: 3947 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue May 14 14:09:36 2013

Date: Tue, 14 May 2013 11:09:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 14 May 2013     Volume: 11 Number: 3947

Today's topics:
        async module <nospam.gravitalsun.noadsplease@hotmail.noads.com>
    Re: async module <rweikusat@mssgmbh.com>
    Re: How to break a bash command into an array consistin <jurgenex@hotmail.com>
    Re: How to break a bash command into an array consistin <rweikusat@mssgmbh.com>
    Re: How to break a bash command into an array consistin <rweikusat@mssgmbh.com>
        How to break a bash command into an array consisting of <pengyu.ut@gmail.com>
        Iterating hashes <dave@invalid.invalid>
    Re: Iterating hashes <peter@makholm.net>
    Re: Iterating hashes <rweikusat@mssgmbh.com>
    Re: Iterating hashes <dave@invalid.invalid>
    Re: Iterating hashes <ben@morrow.me.uk>
    Re: Iterating hashes <ben@morrow.me.uk>
    Re: Iterating hashes <rweikusat@mssgmbh.com>
    Re: Iterating hashes <willem@turtle.stack.nl>
        utf8 <nospam.gravitalsun.noadsplease@hotmail.noads.com>
    Re: utf8 <manfred.lotz@arcor.de>
    Re: utf8 <nospam.gravitalsun.noadsplease@hotmail.noads.com>
    Re: utf8 <manfred.lotz@arcor.de>
    Re: utf8 <hjp-usenet3@hjp.at>
    Re: utf8 <nospam.gravitalsun.noadsplease@hotmail.noads.com>
    Re: Why do Perl programmers make more money than Python <rweikusat@mssgmbh.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 13 May 2013 14:18:37 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: async module
Message-Id: <kmqi50$11tm$1@news.ntua.gr>

There are numerous event/parallel-based modules on CPAN.
I don't have time to study and test them all. I'm thinking of picking
AnyEvent and working with it.
What do you think from your experience: is it a good choice?


------------------------------

Date: Mon, 13 May 2013 13:52:03 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: async module
Message-Id: <87bo8fghzg.fsf@sapphire.mobileactivedefense.com>

George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
writes:
> There are numerous event/parallel-based modules on CPAN.
> I don't have time to study and test them all. I'm thinking of picking
> AnyEvent and working with it.
> What do you think from your experience: is it a good choice?

So far, I've written three 'larger' (>10,000 LOC) Perl programs
structured around synchronous I/O multiplexing. The first one used
IO::Poll, but I abandoned the idea of ever using that again after
looking at the implementation (because it destroyed and recreated the
'interest set' data structure for every poll call). For the second, I
used a 'saner' poll module I wrote myself, but I've since lost the
right to use this code. For the third, I needed a quick solution, and
because of this, I wrote two small extension modules making
sigwaitinfo(2) and the Linux 'struct siginfo' available to Perl and
used (Linux-specific) queued realtime signals for I/O readiness
notification.
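For readers unfamiliar with the pattern, here is a minimal sketch of synchronous I/O multiplexing in plain Perl. It uses the core IO::Select module, not IO::Poll, the signal-based approach described above, or any event framework, and a socketpair stands in for real network connections. It illustrates the general technique only, under those assumptions:

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# A connected pair of sockets stands in for a real client connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The set of handles of interest lives in one object that persists
# across calls; handles are added and removed as needed.
my $sel = IO::Select->new($reader);

syswrite($writer, "hello\n") or die "syswrite: $!";
close($writer);                        # the reader sees EOF after the data

my @got;
while ($sel->count) {
    my @ready = $sel->can_read(1.0);   # wait at most one second
    last unless @ready;                # timeout: give up
    for my $fh (@ready) {
        my $n = sysread($fh, my $buf, 4096);
        if (!$n) {                     # 0 means EOF, undef means error
            $sel->remove($fh);
            close($fh);
            next;
        }
        push @got, $buf;
    }
}
print @got;
```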

I wouldn't want to use any 'event module' for the same reason I
wouldn't want to use any YARFPOO or any kind of 'C string library':
this is reasonably simple stuff that hobbyists in search of 'fun
programming projects' delight in solving, and the reason why there are
10^5 different ways to do the same thing is that all are deficient in
this or that aspect. Although they are seriously over-generalized for
the needs of any single program, the problem itself is sufficiently
ill-defined that a sensible 'one size fits all' solution simply
doesn't exist.



------------------------------

Date: Sat, 11 May 2013 22:41:36 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: How to break a bash command into an array consisting of the arguments in the command?
Message-Id: <6jauo8doo8h1lrjgo37lmf54pru0l5rqce@4ax.com>

Peng Yu <pengyu.ut@gmail.com> wrote:
>Suppose that I have a bash command in a string, e.g.
>
>cmd.sh a 'a b' '
>'
>
>I want to get an array consisting of "cmd.sh" "a" "a b" "\n". Is there a robust way to do so in Perl that can handle all the possible cases? Thanks.

A brief glance at a BASH EBNF shows:

<command> ::=  <simple_command>
            |  <shell_command>
            |  <shell_command> <redirection_list>

<shell_command> ::=  <for_command>
                  |  <case_command>
                  |  while <compound_list> do <compound_list> done
                  |  until <compound_list> do <compound_list> done
                  |  <select_command>
                  |  <if_command>
                  |  <subshell>
                  |  <group_command>
                  |  <function_def>

Somewhat simplified: a command can be rather complex; in particular, it
can be recursive and it can contain pretty much any BASH element
whatsoever. And that implies that the answer to your question is: yes,
there is a robust way, namely writing a fully-featured BASH parser in
Perl, because nothing short of a fully-featured BASH parser will be able
to parse a BASH command line.
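Short of a full parser, the core module Text::ParseWords offers shellwords(), which splits a string on unquoted whitespace while honouring single quotes, double quotes and backslashes. It only approximates the shell's rules (no expansions or substitutions of any kind, and some backslash corner cases differ from bash), but it never executes anything. A minimal sketch on the example from the question:

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Split on unquoted whitespace; quoted text may contain spaces and newlines.
my @args = shellwords("cmd.sh a 'a b' '\n'");

# Nothing is executed: the backticks stay literal text.
my @sub = shellwords(q{a "`ls /`"});

printf "%d: [%s]\n", $_, $args[$_] for 0 .. $#args;
print "$sub[1]\n";
```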

jue


------------------------------

Date: Sun, 12 May 2013 19:33:39 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: How to break a bash command into an array consisting of the arguments in the command?
Message-Id: <871u9ccakc.fsf@sapphire.mobileactivedefense.com>

Peng Yu <pengyu.ut@gmail.com> writes:
> Suppose that I have a bash command in a string, e.g.
>
> cmd.sh a 'a b' '
> '
>
> I want to get an array consisting of "cmd.sh" "a" "a b" "\n". Is there
> a robust way to do so in perl that can handle all the possible
> cases? Thanks.

Do you consider command substitution a possible case? And what about
process substitution? In case process substitution isn't needed, a
reasonably simple idea would be to invoke the shell to let it perform
"word-splitting" on the command string and use perl to transport the
result of that to a 'parent perl' with the help of some 'suitable
encoding', e.g.:

--------------------
sub shell_cmd_to_list
{
    my ($all, @l, $one, $pos);

    $all = `perl -e 'my \$v; \$v .= pack("Z*", \$_) for \@ARGV; print \$v' $_[0]`;
    $pos = 0;
    do {
	$one = unpack('@'.$pos.'Z*', $all);
	push(@l, $one);
	$pos += length($one) + 1;
    } while ($pos < length($all));

    return @l;
}

my $cmd = "a 'a b' 'a\nb' \"`ls /`\"";
my @v = shell_cmd_to_list($cmd);

printf("%u\n--\n%s\n\n", $_, $v[$_]) for (0 .. $#v);


------------------------------

Date: Mon, 13 May 2013 11:27:54 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: How to break a bash command into an array consisting of the arguments in the command?
Message-Id: <87zjvz2mz9.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Peng Yu <pengyu.ut@gmail.com> writes:
>> Suppose that I have a bash command in a string, e.g.
>>
>> cmd.sh a 'a b' '
>> '
>>
>> I want to get an array consisting of "cmd.sh" "a" "a b" "\n". Is there
>> a robust way to do so in perl that can handle all the possible
>> cases?

[...]

> sub shell_cmd_to_list
> {
>     my ($all, @l, $one, $pos);
>
>     $all = `perl -e 'my \$v; \$v .= pack("Z*", \$_) for \@ARGV; print \$v' $_[0]`;
>     $pos = 0;
>     do {
> 	$one = unpack('@'.$pos.'Z*', $all);
> 	push(@l, $one);
> 	$pos += length($one) + 1;
>     } while ($pos < length($all));
>
>     return @l;
> }

Come to think of it, this should probably rather be

----------------
sub shell_cmd_to_list
{
    my ($all, @l, $one, $pos);

    $all = `perl -e 'my \$v; \$v .= pack("Z*", \$_) for \@ARGV; print \$v' $_[0]`;
    $? and return ();
    
    $pos = 0;
    while ($pos < length($all)) {
	$one = unpack('@'.$pos.'Z*', $all);
	push(@l, $one);
	$pos += length($one) + 1;
    } 

    return @l;
}
-----------------

which will return an empty list if the shell encountered a syntax
error in the argument string (alternatively, an exception could be
thrown) or if there were no arguments.

It should also be noted that this will not only perform command
substitution aka 'run arbitrary commands contained in the argument
string' but will also run an arbitrary 'trailing shell script'
attached to $_[0], IOW, it is completely unsuitable for processing
input from untrusted sources. OTOH, the shell already knows how to
parse 'shell commands' and using it to do this instead of
reprogramming the parser in Perl is IMHO generally sensible.

Special note: This is one of the rare cases where initializing a
variable is actually necessary in Perl because '@Z*' is not the same
as '@0Z*'.
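The NUL-terminated output can also be decoded without tracking an offset at all. Below is a sketch using split instead of the unpack loop. decode_nul_list is a hypothetical name, and the code assumes every field is terminated by exactly one NUL, as pack('Z*') produces it. Arguments that arrive through argv can never contain a NUL themselves, so the separator is unambiguous:

```perl
use strict;
use warnings;

# Hypothetical helper: split a buffer of NUL-terminated fields.
sub decode_nul_list {
    my ($all) = @_;
    my @fields = split /\0/, $all, -1;   # -1 keeps empty fields
    pop @fields;                         # the final terminator yields one extra empty field
    return @fields;
}

my $packed = join '', map { pack 'Z*', $_ } 'a', 'a b', '', "a\nb";
my @args = decode_nul_list($packed);
printf "%u\n--\n%s\n\n", $_, $args[$_] for 0 .. $#args;
```

Splitting an empty string yields no fields, so an empty buffer (no arguments) returns an empty list, and a lone "\0" correctly returns one empty argument.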


------------------------------

Date: Sat, 11 May 2013 22:09:53 -0700 (PDT)
From: Peng Yu <pengyu.ut@gmail.com>
Subject: How to break a bash command into an array consisting of the arguments in the command?
Message-Id: <c7ccb0e0-f506-433b-82dd-a32627bae119@googlegroups.com>

Hi,

Suppose that I have a bash command in a string, e.g.

cmd.sh a 'a b' '
'

I want to get an array consisting of "cmd.sh" "a" "a b" "\n". Is there a robust way to do so in Perl that can handle all the possible cases? Thanks.


------------------------------

Date: Tue, 14 May 2013 13:37:47 +0000 (UTC)
From: "Dave Saville" <dave@invalid.invalid>
Subject: Iterating hashes
Message-Id: <fV45K0OBJxbE-pn2-LBYm3sPyq7F5@paddington.bear.den>

If I have

foreach (sort keys %hash)

Then I know that there is a possible performance hit, as the keys are
all extracted and then sorted. But is this because of the sort, or are
they always all extracted first? If the latter is true, then how do you
iterate over a hash without taking the hit?

I am having some problems with a script that has two hashes on which
I am trying to do inner and outer joins, amongst other things. The
hashes are roughly the same size and have over 60,000 keys; the
majority of the keys are approximately 70 characters long. The hash
values are three-element arrays: a mixed-case copy of the key and two
integers.
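For reference, inner and outer joins on the keys of two hashes need no sorting: each side can be probed with exists(). Here is a minimal sketch in which small made-up hashes %left and %right (hypothetical names) stand in for the real 60,000-key ones, with values shaped as described (mixed-case key, two integers):

```perl
use strict;
use warnings;

# Hypothetical sample data: lowercased key => [mixed-case key, int, int].
my %left  = (alpha => ['Alpha', 1, 2], beta  => ['Beta',  3, 4]);
my %right = (beta  => ['BETA',  5, 6], gamma => ['Gamma', 7, 8]);

# Inner join: keys present in both hashes. exists() is a hash lookup,
# so neither side has to be sorted.
my @inner = grep { exists $right{$_} } keys %left;

# Left outer join: every key of %left, paired with the matching
# %right entry, or undef where there is none.
my %left_outer = map { $_ => [ $left{$_}, $right{$_} ] } keys %left;

# Full outer join: the union of both key sets, built with a hash slice.
my %union;
@union{ keys %left, keys %right } = ();
my @full = sort keys %union;    # sorted here only to get stable output

print "inner: @inner\n";
print "full:  @full\n";
```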

-- 
Regards
Dave Saville


------------------------------

Date: Tue, 14 May 2013 15:51:13 +0200
From: Peter Makholm <peter@makholm.net>
Subject: Re: Iterating hashes
Message-Id: <87li7hd60e.fsf@vps1.hacking.dk>

"Dave Saville" <dave@invalid.invalid> writes:

> Then I know that there is a possible performance hit as the keys are 
> all extracted and then sorted. But is this because of the sort or are 
> they always all extracted first? If the latter is true then how do you
> iterate over a hash without taking the hit?

You can use the each() function in a while loop:

    while (($key, $value) = each %hash) {
        print $key, "\n";
    }

See the relevant part of the perlfunc manual page or
http://perldoc.perl.org/functions/each.html
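One property of each() is worth knowing when using it this way. A hash has a single built-in iterator, so leaving a while/each loop early leaves the iterator part-way through, and the next each() continues from there. Calling keys() on the hash, even in void context, resets it. Here is a minimal sketch with made-up data:

```perl
use strict;
use warnings;

my %h = map { $_ => length $_ } qw(apple banana cherry);

# Take one pair and stop early: the hash's iterator is now mid-way.
my ($first_key) = each %h;

keys %h;    # keys() in void context resets the iterator

my $count = 0;
while (defined(my $k = each %h)) {
    $count++;
}
print "$count\n";    # 3; without the reset it would be 2
```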

//Makholm


------------------------------

Date: Tue, 14 May 2013 15:49:46 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Iterating hashes
Message-Id: <874ne5aa5x.fsf@sapphire.mobileactivedefense.com>

Peter Makholm <peter@makholm.net> writes:
> "Dave Saville" <dave@invalid.invalid> writes:
>
>> Then I know that there is a possible performance hit as the keys are 
>> all extracted and then sorted. But is this because of the sort or are 
>> they always all extracted first? If the latter is true then how do you
>> iterate over a hash without taking the hit?
>
> You can use the each() function in a while loop:
>
>     while (($key, $value) = each %hash) {
>         print $key, "\n";
>     }

There's actually a variety of options, and depending on the number of
pairs in the hash, they perform differently ('values' came out first
everywhere in the tests I made).

----------------------------
use Benchmark qw(cmpthese);

sub traversal_bench
{
    my %h = map { $_, 1; } 0 .. $_[0];

    print("\n===\n$_[0] keys\n===\n");

    cmpthese(-4,
	      {
	       keys => sub {
		   my $v;
		
		   $v = $h{$_} for keys(%h);
	       },

	       sorted_keys => sub {
		   my $v;
		
		   $v = $h{$_} for sort(keys(%h));
	       },

	       values => sub {
		   1 for values(%h);
	       },
	   
	       scalar_each => sub {
		   my ($v, $k);

		   $v = $h{$k} while $k = each(%h);
	       },

	       list_each => sub {
		   my ($v, $k);

		   1 while ($k, $v) = each(%h);
	       }});
}

traversal_bench($_) for 10, 100, 1000, 10000, 100000, 1000000;


------------------------------

Date: Tue, 14 May 2013 15:30:32 +0000 (UTC)
From: "Dave Saville" <dave@invalid.invalid>
Subject: Re: Iterating hashes
Message-Id: <fV45K0OBJxbE-pn2-9E0b5UWZ6RBe@paddington.bear.den>

On Tue, 14 May 2013 14:49:46 UTC, Rainer Weikusat 
<rweikusat@mssgmbh.com> wrote:

> Peter Makholm <peter@makholm.net> writes:
> > "Dave Saville" <dave@invalid.invalid> writes:
> >
> >> Then I know that there is a possible performance hit as the keys are 
> >> all extracted and then sorted. But is this because of the sort or are 
> >> they always all extracted first? If the latter is true then how do you
> >> iterate over a hash without taking the hit?
> >
> > You can use the each() function in a while loop:
> >
> >     while (($key, $value) = each %hash) {
> >         print $key, "\n";
> >     }
> 
> There's actually a variety of options and depending on the number of
> pairs in the hash, they perform differently ('values' came out first
> everywhere for the tests I made).
> 
> ----------------------------
> use Benchmark qw(cmpthese);
> 
> sub traversal_bench
> {
>     my %h = map { $_, 1; } 0 .. $_[0];
> 
>     print("\n===\n$_[0] keys\n===\n");
> 
>     cmpthese(-4,
> 	      {
> 	       keys => sub {
> 		   my $v;
> 		
> 		   $v = $h{$_} for keys(%h);
> 	       },
> 
> 	       sorted_keys => sub {
> 		   my $v;
> 		
> 		   $v = $h{$_} for sort(keys(%h));
> 	       },
> 
> 	       values => sub {
> 		   1 for values(%h);
> 	       },
> 	   
> 	       scalar_each => sub {
> 		   my ($v, $k);
> 
> 		   $v = $h{$k} while $k = each(%h);
> 	       },
> 
> 	       list_each => sub {
> 		   my ($v, $k);
> 
> 		   1 while ($k, $v) = each(%h);
> 	       }});
> }
> 
> traversal_bench($_) for 10, 100, 1000, 10000, 100000, 1000000;

===
100 keys
===
               Rate sorted_keys scalar_each   list_each        keys      values
sorted_keys  5715/s          --        -28%        -36%        -46%        -85%
scalar_each  7956/s         39%          --        -10%        -25%        -80%
list_each    8863/s         55%         11%          --        -16%        -77%
keys        10602/s         85%         33%         20%          --        -73%
values      39161/s        585%        392%        342%        269%          --

Care to explain the numbers please?
-- 
Regards
Dave Saville


------------------------------

Date: Tue, 14 May 2013 17:40:31 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Iterating hashes
Message-Id: <v06a6a-8nn1.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Peter Makholm <peter@makholm.net> writes:
> > "Dave Saville" <dave@invalid.invalid> writes:
> >
> >> Then I know that there is a possible performance hit as the keys are 
> >> all extracted and then sorted. But is this because of the sort or are 
> >> they always all extracted first? If the latter is true then how do you
> >> iterate over a hash without taking the hit?
> >
> > You can use the each() function in a while loop:
> >
> >     while (($key, $value) = each %hash) {
> >         print $key, "\n";
> >     }
> 
> There's actually a variety of options and depending on the number of
> pairs in the hash, they perform differently ('values' came out first
> everywhere for the tests I made).
> 
> ----------------------------
> use Benchmark qw(cmpthese);
> 
> sub traversal_bench
> {
>     my %h = map { $_, 1; } 0 .. $_[0];
> 
>     print("\n===\n$_[0] keys\n===\n");
> 
> 
> 	       sorted_keys => sub {
> 		   my $v;
> 		
> 		   $v = $h{$_} for sort(keys(%h));
> 	       },
> 
> 	       values => sub {
> 		   1 for values(%h);
> 	       },

That's hardly a fair comparison. In fact, 'values' coming out faster is
a red herring as well: it's only happening because the values are all 1,
which is much faster to copy than a string.

With a fairer test like

    use Benchmark qw/cmpthese/;

    my %h = map +("$_", "$_"), 1..60_000;
    my $x;

    cmpthese -5, {
        keys    => sub { $x = $_ for keys %h },
        sort    => sub { $x = $_ for sort keys %h },
        values  => sub { $x = $_ for values %h },
        keach   => sub { 1 while $x = each %h },
        veach   => sub { 1 while $x = (each %h)[1] },
    };

I consistently get

             Rate   sort  veach   keys values  keach
    sort   9.11/s     --   -61%   -72%   -76%   -81%
    veach  23.6/s   159%     --   -27%   -37%   -51%
    keys   32.5/s   256%    37%     --   -13%   -33%
    values 37.2/s   309%    58%    15%     --   -23%
    keach  48.4/s   431%   105%    49%    30%     --

which is pretty much what I would expect: 'sort' is very expensive,
'keach' is cheap, and 'veach' is unavoidably more expensive than either
'keys' or 'values' because it has to do more work. I'm not sure why
'values' is cheaper than 'keys', but I suspect it has something to do
with the fact that hash keys are shared.

Attempting to compensate for 'veach's unfair disadvantage like this

    use Benchmark qw/cmpthese/;

    use constant C => "60000";
    my %h = map +("$_", "$_"), 1..60_000;
    my ($k, $v);

    cmpthese -5, {
        keys    => sub { $k = 1; ($k, $v) = ($_, C) for keys %h },
        values  => sub { $k = 1; ($k, $v) = (C, $_) for values %h },
        keach   => sub { $k = 1; ($k, $v) = (scalar each %h, C) while $k },
        veach   => sub { $k = 1; 1 while ($k, $v) = each %h },
    };

gives

             Rate   keys values  veach  keach
    keys   18.4/s     --   -11%   -16%   -49%
    values 20.7/s    12%     --    -5%   -43%
    veach  21.8/s    18%     6%     --   -39%
    keach  35.9/s    95%    74%    65%     --


which puts 'each' fastest again; I'm not sure why 'keach' is faster than
'veach', but I suspect that comparison still isn't entirely fair.

Ben



------------------------------

Date: Tue, 14 May 2013 17:46:07 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Iterating hashes
Message-Id: <fb6a6a-8nn1.ln1@anubis.morrow.me.uk>


Quoth "Dave Saville" <dave@invalid.invalid>:
> If I have
> 
> foreach (sort keys %hash)
> 
> Then I know that there is a possible performance hit as the keys are 
> all extracted and then sorted. But is this because of the sort or are 
> they always all extracted first? If the latter is true then how do you
> iterate over a hash without taking the hit?
> 
> I am having some problems with a script that has two hashes upon which
> I am trying to do inner and outer joins amongst other things. The 
> hashes are roughly the same size and have over 60,000 keys the 
> majority of the keys have a length of approx 70 characters. The hash 
> values are a three element array: A mixed case copy of the key and two
> integers.

Have you considered using SQLite? DBD::SQLite has the option of creating
an in-memory database, which may well be faster and will certainly be
easier than trying to write joins in Perl. (I don't know if it will be
faster because DBI assumes it's talking to a remote database server, so
it tends to be quite heavy. A straight SQLite binding to Perl would be
what you want, but I don't think that exists.)

Ben



------------------------------

Date: Tue, 14 May 2013 18:27:43 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Iterating hashes
Message-Id: <87hai58oa8.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Peter Makholm <peter@makholm.net> writes:
>> > "Dave Saville" <dave@invalid.invalid> writes:
>> >
>> >> Then I know that there is a possible performance hit as the keys are 
>> >> all extracted and then sorted. But is this because of the sort or are 
>> >> they always all extracted first? If the latter is true then how do you
>> >> iterate over a hash without taking the hit?
>> >
>> > You can use the each() function in a while loop:
>> >
>> >     while (($key, $value) = each %hash) {
>> >         print $key, "\n";
>> >     }
>> 
>> There's actually a variety of options and depending on the number of
>> pairs in the hash, they perform differently ('values' came out first
>> everywhere for the tests I made).
>> 
>> ----------------------------
>> use Benchmark qw(cmpthese);
>> 
>> sub traversal_bench
>> {
>>     my %h = map { $_, 1; } 0 .. $_[0];
>> 
>>     print("\n===\n$_[0] keys\n===\n");
>> 
>> 
>> 	       sorted_keys => sub {
>> 		   my $v;
>> 		
>> 		   $v = $h{$_} for sort(keys(%h));
>> 	       },
>> 
>> 	       values => sub {
>> 		   1 for values(%h);
>> 	       },
>
> That's hardly a fair comparison. In fact, 'values' coming out faster is
> a red herring as well: it's only happening because the values are all 1,
> which is much faster to copy than a string.
>
> With a fairer test like
>
>     use Benchmark qw/cmpthese/;
>
>     my %h = map +("$_", "$_"), 1..60_000;
>     my $x;
>
>     cmpthese -5, {
>         keys    => sub { $x = $_ for keys %h },
>         sort    => sub { $x = $_ for sort keys %h },
>         values  => sub { $x = $_ for values %h },
>         keach   => sub { 1 while $x = each %h },
>         veach   => sub { 1 while $x = (each %h)[1] },
>     };

Why would this be 'a fair test' when the keys are copied for no
particular reason while no attempt is made to determine the values
except for the 'values' and 'each in list context' cases? Iterating
over the keys of a hash while not looking at the values associated
with those keys at all seems to be a rather bizarre idea of a use
case.


------------------------------

Date: Tue, 14 May 2013 17:58:38 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: Iterating hashes
Message-Id: <slrnkp4uqe.2t0g.willem@turtle.stack.nl>

Rainer Weikusat wrote:
) Why would this be 'a fair test' when the keys are copied for no
) particular reason while no attempt is made to determine the values
) except for the 'values' and 'each in list context' cases? Iterating
) over the keys of a hash while not looking at the values associated
) with those keys at all seems to be a rather bizarre idea of a use
) case.

Perl doesn't have a 'set' type; a hash is typically used for that, and
that is a perfectly legitimate use case for using only the keys of a hash.
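Here is a minimal sketch of the set idiom described here, with made-up sample data. A hash-slice assignment creates the keys and leaves the values undef, exists() tests membership, and enumerating the set needs only keys(), never the values:

```perl
use strict;
use warnings;

my @words = qw(apple pear apple plum pear);

# A hash used as a set: only the keys matter, the values stay undef.
my %set;
@set{@words} = ();

my @members  = sort keys %set;             # iterate over the keys only
my $has_plum = exists $set{plum} ? 1 : 0;  # membership test
print "@members ($has_plum)\n";            # apple pear plum (1)
```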


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Mon, 13 May 2013 14:05:00 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: utf8
Message-Id: <kmqhbe$sgi$1@news.ntua.gr>

Is there any easy way to decide if a string is valid UTF-8?


------------------------------

Date: Mon, 13 May 2013 14:51:46 +0200
From: Manfred Lotz <manfred.lotz@arcor.de>
Subject: Re: utf8
Message-Id: <20130513145146.2be51ad7@arcor.com>

On Mon, 13 May 2013 14:05:00 +0300
George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com> wrote:

> Is there any easy way to decide if a string is valid UTF-8?

Minimal example:

#! /usr/bin/perl

use strict;
use warnings;

use utf8;
use Encode;

my $string = 'Hä';

Encode::is_utf8($string) or die "bad string";

my $bad_string = 0x123456;
Encode::is_utf8($bad_string) or die "bad string";


-- 
Manfred



------------------------------

Date: Mon, 13 May 2013 16:22:36 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: Re: utf8
Message-Id: <kmqpdf$2f7l$1@news.ntua.gr>

On 13/5/2013 15:51, Manfred Lotz wrote:
> On Mon, 13 May 2013 14:05:00 +0300
> George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com> wrote:
>
>> Is there any easy way to decide if a string is valid UTF-8?
>
> Minimal example:
>
> #! /usr/bin/perl
>
> use strict;
> use warnings;
>
> use utf8;
> use Encode;
>
> my $string = 'Hä';
>
> Encode::is_utf8($string) or die "bad string";
>
> my $bad_string = 0x123456;
> Encode::is_utf8($bad_string) or die "bad string";
>
>



Thanks, it works.
I had tried the same thing, but my mistake was that I had not used the
line "use utf8;"!




------------------------------

Date: Mon, 13 May 2013 15:43:52 +0200
From: Manfred Lotz <manfred.lotz@arcor.de>
Subject: Re: utf8
Message-Id: <20130513154352.3c622989@arcor.com>

On Mon, 13 May 2013 16:22:36 +0300
George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com> wrote:

> On 13/5/2013 15:51, Manfred Lotz wrote:
> > On Mon, 13 May 2013 14:05:00 +0300
> > George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
> > wrote:
> >
> >> Is there any easy way to decide if a string is valid UTF-8?
> >
> > Minimal example:
> >
> > #! /usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > use utf8;
> > use Encode;
> >
> > my $string = 'Hä';
> >
> > Encode::is_utf8($string) or die "bad string";
> >
> > my $bad_string = 0x123456;
> > Encode::is_utf8($bad_string) or die "bad string";
> >
> >
>
>
>
> Thanks, it works.
> I had tried the same thing, but my mistake was that I had not used the
> line "use utf8;"!
>
>

Yes, that is important.


-- 
Manfred







------------------------------

Date: Tue, 14 May 2013 01:10:59 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: utf8
Message-Id: <slrnkp2so3.skk.hjp-usenet3@hrunkner.hjp.at>

On 2013-05-13 12:51, Manfred Lotz <manfred.lotz@arcor.de> wrote:
> On Mon, 13 May 2013 14:05:00 +0300
> George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com> wrote:
>> Is there any easy way to decide if a string is valid UTF-8?
>
> Minimal example:
>
> #! /usr/bin/perl
>
> use strict;
> use warnings;
>
> use utf8;
> use Encode;
>
> my $string = 'Hä';

This string is not UTF-8 in any useful sense. It consists of two
characters, U+0048 LATIN CAPITAL LETTER H and U+00e4 LATIN SMALL
LETTER A WITH DIAERESIS. The same string encoded in UTF-8 would consist
of three bytes, "\x{48}\x{C3}\x{A4}". Note that the former string has
length 2, the latter has length 3.
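The character/byte distinction can be checked directly. Below is a small sketch using Encode's encode(). The string is written with an escape, \x{E4}, so the example does not depend on "use utf8" or on the encoding of the source file:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "H\x{E4}";                 # 2 characters: U+0048, U+00E4
my $bytes = encode('UTF-8', $chars);   # the same text as UTF-8 octets

# %vX prints each character's ordinal in hex, joined with dots.
printf "%d characters, %d bytes: %vX\n", length $chars, length $bytes, $bytes;
```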


> Encode::is_utf8($string) or die "bad string";

This tests whether the internal representation of the string is
utf-8-like, which you almost never want to know in a Perl program. It
also tells you whether the string has character semantics (unless you
use a rather new version of perl with the unicode_strings feature),
which is sometimes useful.
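To illustrate that Encode::is_utf8 reports the internal representation rather than anything about the content, here is a sketch with two strings that compare equal yet differ in what is_utf8 returns. utf8::upgrade is built in (no module or "use utf8" needed) and changes only the internal storage:

```perl
use strict;
use warnings;
use Encode ();

my $s = "H\xE4";      # two characters, stored as bytes internally
my $t = $s;
utf8::upgrade($t);    # same two characters, stored as UTF-8 internally

printf "same string: %s\n", $s eq $t ? 'yes' : 'no';
printf "is_utf8: %d vs %d\n",
    Encode::is_utf8($s) ? 1 : 0, Encode::is_utf8($t) ? 1 : 0;
```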

If you want to know whether a string is a correctly encoded UTF-8
sequence, try to decode it:

    use Encode qw(decode FB_CROAK);
    $decoded = eval { decode('UTF-8', $string, FB_CROAK) };

(decode(..., FB_CROAK) will die if $string is not UTF-8, so you need to
catch that. All other check parameters are even less convenient).
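The same check can be wrapped in a small predicate. Here is a sketch with a hypothetical function name, is_valid_utf8. Encode's documentation warns that some CHECK values let decode() modify its input in place; LEAVE_SRC prevents that. Here the argument is already a copy, so the flag is belt-and-braces:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: true if $bytes is a well-formed UTF-8 byte sequence.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return defined eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
}

my $good = "H\xC3\xA4";   # "Hä" encoded as UTF-8: three bytes
my $bad  = "H\xE4";       # "Hä" in Latin-1: not valid UTF-8

printf "%s: %s\n", $_->[0], is_valid_utf8($_->[1]) ? 'valid' : 'invalid'
    for ['good', $good], ['bad', $bad];
```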

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpat. -- Ralph Babel


------------------------------

Date: Tue, 14 May 2013 09:58:51 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: Re: utf8
Message-Id: <kmsn9t$2kl9$1@news.ntua.gr>

>
> If you want to know whether a string is a correctly encoded UTF-8
> sequence, try to decode it:
>
>      $decoded = eval { decode('UTF-8', $string, FB_CROAK) };
>
> (decode(..., FB_CROAK) will die if $string is not UTF-8, so you need to
> catch that. All other check parameters are even less convenient).
>

nice !



------------------------------

Date: Sun, 12 May 2013 17:13:11 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Why do Perl programmers make more money than Python programmers
Message-Id: <87fvxsnpm0.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Jürgen Exner <jurgenex@hotmail.com>:
>> johannes falcone <visphatesjava@gmail.com> wrote:
>> 
>> >being slaves to fascism?
>> 
>> If I'm not very much mistaken a certain Sir Winston, KG, OM, CH, TD, DL,
>> FRS, Hon. RA  took care of that about 70 years ago.
>
> I'm hardly proud of this, but there was also a Sir Oswald who carried on
> that sort of unpleasantness for quite a while after that, not to mention
> the likes of the BNP which are unfortunately with us to this day.

Another thing which doesn't belong here, but since I would be delighted
if I were paid as regularly as the treasury is, and I have to listen to
this dreck on a day-by-day basis: there's also the guy who publicly
stated that [paraphrase] "If we don't stop them now, one day it will
be civil war between us and the immigrants" and stood for council
election without being considered a right-wing extremist, the national
outrage caused by the idea that taxing foreigners more heavily than UK
nationals (cleverly disguised as "Benefit Fraud !!1" by the Daily
Mail) is actually prohibited by the EU treaties and the member of the
ruling coalition who flatly denied that 'immigrants' who 'contribute
to society' [as in paying taxes and NI] exist in the UK at all.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3947
***************************************

