[23313] in Perl-Users-Digest


Perl-Users Digest, Issue: 5533 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Sep 20 11:05:44 2003

Date: Sat, 20 Sep 2003 08:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 20 Sep 2003     Volume: 10 Number: 5533

Today's topics:
    Re: A question about nested eval for alarm (James Willmore)
        convert .eml to .mbox in win32 <bill@my.place>
    Re: How can I comment out a large block of perl code? <jidanni@jidanni.org>
    Re: How can I comment out a large block of perl code? <jidanni@jidanni.org>
    Re: Little survey for Unix users <dha@panix.com>
    Re: locale games - looking for portable ways to get a l <doom@kzsu.stanford.edu>
    Re: Order of evaluation of expressions <REMOVEsdnCAPS@comcast.net>
    Re: perl lib all over the place <kuujinbo@hotmail.com>
    Re: perl man pages in xemacs (James Willmore)
    Re: Perl regular expression does not work on 5.8.0 <thepoet@nexgo.de>
        search a string <member40284@dbforums.com>
    Re: search a string <pilsl_usenet@goldfisch.at>
    Re: search a string <jurgenex@hotmail.com>
    Re: search a string <raisin@delete-this-trash.mts.net>
    Re: sort  issue <mpapec@yahoo.com>
    Re: transforming an explicit range based on implicit ex <mpapec@yahoo.com>
        troubles with unicode (incorrect sorting and basic unde <pilsl_usenet@goldfisch.at>
    Re: troubles with unicode (incorrect sorting and basic  <jurgenex@hotmail.com>
    Re: troubles with unicode (incorrect sorting and basic  <flavell@mail.cern.ch>
    Re: troubles with unicode (incorrect sorting and basic  <pilsl_usenet@goldfisch.at>
        What are you allowed to share? (David Morel)
    Re: wtf is the deal? (James Willmore)
    Re:  <bwalton@rochester.rr.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 20 Sep 2003 04:46:21 -0700
From: jwillmore@cyberia.com (James Willmore)
Subject: Re: A question about nested eval for alarm
Message-Id: <e0160815.0309200346.2df8c962@posting.google.com>

Xiaoqin Qiu <xqiu@wlv.agilent.com> wrote in message news:<3F6BD10F.4E3E4C2@wlv.agilent.com>...
> Hi,
> 
> I am trying to write code to time out a possibly hung process. I saw some
> sample code that uses nested evals, but I cannot see why the nested
> evals are needed. I am listing both the nested-eval and single-eval
> sample code. Could anyone help me find out whether there is any
> problem with using just a single eval?
<snip>
> Nested eval code:
> -------------
> $SIG{ALRM} = \&timeout;    # Same as single eval code
> 
> sub timeout    # Same as single eval code
> {
>   my $pid;
>   ......          # To find out PID of cat bigfile
>   kill 9 => $pid;
>   die "Timeout";
> }
> 
> if (open(FH, "cat bigfile 2>&1 |"))
> {

    alarm(10);

because ....

>   eval
>   {
>     alarm(10);
>      eval
>      {
>       while(<FH>)
>       {
>          print $_;
>        }
>       };
>      alarm(0);
>   };
>   alarm(0);

you have ^^^ this here ....

From my understanding, you have to pair up the alarm calls (each
alarm(10) with a matching alarm(0)) to get their full effect.  I could
be wrong :(

>   close(FH);
> }
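
One commonly given reason for the inner eval is that alarm(0) must
still run when the guarded code dies for some unrelated reason;
otherwise a pending alarm could fire later, outside any eval. A minimal
sketch of that pattern, assuming a Unix-like system where alarm() is
available. run_with_timeout is a hypothetical helper name, not
something from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns (1, @results) on success, or (0, $error) on failure or timeout.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my @results;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        # Inner eval: trap any die from $code so that alarm(0) below always runs.
        my $inner_ok  = eval { @results = $code->(); 1 };
        my $inner_err = $@;
        alarm(0);
        die $inner_err unless $inner_ok;    # re-throw to the outer eval
        1;
    };
    my $err = $@;
    alarm(0);    # outer safety net, in case the alarm fired just after the inner eval
    return $ok ? (1, @results) : (0, $err);
}

my ($ok, @rest) = run_with_timeout(5, sub { return 6 * 7 });
print $ok ? "finished: @rest\n" : "failed: $rest[0]";
```

The caller can tell a timeout from any other failure by comparing the
returned error with "timeout\n".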

HTH

Jim


------------------------------

Date: Fri, 19 Sep 2003 23:57:06 -0500
From: Mike James <bill@my.place>
Subject: convert .eml to .mbox in win32
Message-Id: <Xns93FC9ABEF5E9maentigejmsethnnt@206.127.4.25>

Hi, I'm working with some code I copied from here: 
http://www.usethesource.com/articles/02/06/17/1957223.shtml

My system: WinXP Home, ActiveState Perl 5.6.1, operated by Perl newbie...

The script is supposed to descend into sub-folders, read *.eml messages, 
then concatenate them into .mbox files. The problem is that the script 
doesn't descend into the sub-folders. Instead, it only operates on the 
files (and folders) in the folder where the script is located.

Here's the script for convert.pl:

sub convert
{
	my ($file) = @_;
	my $from = '';
	my $date = '';
	my $message = '';
	my $stop_looking = 0;
	
	open OE, "<$file";

	while (<OE>)
	{
		my $line;
		chomp;
		$line = $_;
		
		$message .= $_ . "\n";

		if ( $stop_looking == 0 )
		{
			if ( $line =~ /(From:.*\<(.*)\>)/ )
			{
				$from = $2;
			} 
			else 
			{
				if ( $line =~ /(From: ([^\<]*))/ )
				{
					$from = $2;
				}
			}
			
			if ( $line =~ /Date: (.*)/ )
			{
				$date = $1;
				
				if ( $date =~ /(\w{3}), (\d+) (\w{3}) (\d{4}) (\d\d:\d\d:\d\d)/ )
				{
					my ($dow, $day, $month, $year, $time) = ($1,$2,$3,$4,$5);
					
					if ( $day =~ /0(\d)/ )
					{
						$day = " $1";
					}
					
					$date = "$dow $month $day $time $year";
				}
			}
			
			if ( $line =~ /X-UIDL:/ )
			{
				$message = "\n\nFrom $from $date\n" . $message;

				$stop_looking = 1;
			}
		}
	}
	
	close OE;
	
	return $message;
}

sub convert_folder
{
	my ($folder) = @_;
	my @files = glob "$folder/*.eml";

	open MBOX, ">$folder.mbox";
	foreach	my $file (@files)
	{
		print "File $file...\n";
		print MBOX convert( $file );
	}
	close MBOX;
}

my @folders = glob '*';
foreach	my $folder (@folders)
{
	print "Converting $folder...\n";
	convert_folder( $folder );
}
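
The glob '*' above only lists entries in the current directory, which
would explain why nothing below the top level is visited. One possible
approach, sketched here with the core File::Find module, is to collect
every directory at or below a starting point that contains *.eml files
and then convert each one. eml_folders is a hypothetical helper name,
not part of the original script:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every directory at or below $top
# that directly contains at least one *.eml file.
sub eml_folders {
    my ($top) = @_;
    my %seen;
    find(
        sub {
            # File::Find chdir()s into each directory, so $_ is a bare
            # file name and $File::Find::dir is the directory being visited.
            $seen{$File::Find::dir} = 1 if -f $_ && /\.eml\z/i;
        },
        $top
    );
    return sort keys %seen;
}

# Possible use with the script above (left commented out here):
# convert_folder($_) for eml_folders('.');
```

Note that convert_folder() writes "$folder.mbox" next to each folder, so
nested folders would produce mbox files scattered through the tree.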


------------------------------

Date: Sat, 20 Sep 2003 07:18:21 +0800
From: Dan Jacobson <jidanni@jidanni.org>
Subject: Re: How can I comment out a large block of perl code?
Message-Id: <87y8wks6yq.fsf@jidanni.org>

$ perldoc perlpod|grep =rem
$
=rem is undocumented, but it seems to work.
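
For comparison, the form described in the FAQ uses documented pod
directives. A small sketch, assuming the directives start in column 1
with blank lines around them; the @ran array is only there to show
which statements execute:

```perl
use strict;
use warnings;

my @ran;
push @ran, 'before';

=begin comment

push @ran, 'skipped';    # never compiled: perl ignores everything up to =cut

=end comment

=cut

push @ran, 'after';
print "@ran\n";
```

The perl parser skips from a pod directive to the next =cut, which is
presumably also why an unknown directive such as =rem appears to work.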


------------------------------

Date: Sat, 20 Sep 2003 13:16:20 +0800
From: Dan Jacobson <jidanni@jidanni.org>
Subject: Re: How can I comment out a large block of perl code?
Message-Id: <878yokrqe3.fsf@jidanni.org>

D> my bugs just go into a black hole.

A> What do you mean?  Some Cabale?

Well, ok, I suppose I was just used to Debian's auto etc. acknowledgments.

Anyway: Dear CC'd perlbug: see the current thread
http://groups.google.com/groups?threadm=bkfl3b%24b94%241%40mamenchi.zrz.TU-Berlin.DE
for mention of perlpod undocumented features etc.


------------------------------

Date: Sat, 20 Sep 2003 08:21:10 +0000 (UTC)
From: "David H. Adler" <dha@panix.com>
Subject: Re: Little survey for Unix users
Message-Id: <slrnbmo3fm.hmg.dha@panix2.panix.com>

In article <Pine.LNX.4.53.0309191838540.4262@ppepc56.ph.gla.ac.uk>, Alan
J. Flavell wrote:

> On Fri, 19 Sep 2003, Eric Schwartz wrote:
> 
>> -=Eric (ed is the STANDARD EDITOR!!!!!)
> 
> We have a user who still uses EDT, or rather, an emulation of EDT in
> vi which he got from somewhere to ease the transition from VAX/VMS to
> Ultrix, and that was about 2 OSes back.  Which only goes to prove the
> point that editors are a very personal thing and not worth
> flamefesting over.

Since:

a) You bring up emulation
b) We seem to have wound up with a long thread anyway

 ...I would like to point out that I've been known to use viper-mode (a
vi emulation mode) in emacs.

dha

-- 
David H. Adler - <dha@panix.com> - http://www.panix.com/~dha/
If your head has exploded, you should avoid sausage factories. 
        - Larry Wall


------------------------------

Date: 20 Sep 2003 01:48:09 -0700
From: Joseph Brenner <doom@kzsu.stanford.edu>
Subject: Re: locale games - looking for portable ways to get a list of valid locales
Message-Id: <m3ad8zvoae.fsf@crack.nonagon.org>

Ilja Tabachnik <billy@arnis-bsl.com> writes:

> Joseph Brenner wrote:
> ...
> > 
> > So I would *think* that what I need is some way of getting a list of
> > valid locales, and I don't see any way of doing that out on CPAN.
> ...
> 
> IMHO, the portable way (for Unix-like systems) is just to use
> the locale(1) command, say @all_locales = `locale -a`.
> The locale(1) command and its '-a' switch are defined in SUSv2 (and 3?)

Well, that certainly works on my Linux box, but what about
all those benighted Windows machines out there? 

Also, my experience with trying to do cross-platform shell
scripts rubbed my nose in the lack of compatibility between
different unices... I presume there are reasons that the 
"perllocale" documentation doesn't want to commit to what 
you should do:

       Finding locales

       For locales available in your system, consult also setlocale(3) to see
       whether it leads to the list of available locales (search for the SEE
       ALSO section).  If that fails, try the following command lines:

               locale -a

               nlsinfo

               ls /usr/lib/nls/loc

               ls /usr/lib/locale

               ls /usr/lib/nls

               ls /usr/share/locale
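
A rough sketch of how the suggestions above might be combined, for
Unix-like systems only: try `locale -a` first, then fall back to
listing some of the directories that perllocale mentions.
available_locales is a hypothetical helper name, and the fallback
results are only directory entries, not validated locale names:

```perl
use strict;
use warnings;

# Hypothetical helper: best-effort list of locale names on a Unix-like system.
sub available_locales {
    # Assumes a Bourne-style shell for the 2>/dev/null redirection.
    my @out = `locale -a 2>/dev/null`;
    chomp @out;
    my @found = grep { length } @out;
    return @found if @found;

    # Fallback: directories listed in the perllocale excerpt above.
    for my $dir (qw(/usr/lib/nls/loc /usr/lib/locale /usr/lib/nls /usr/share/locale)) {
        opendir(my $dh, $dir) or next;
        my @names = grep { !/^\.\.?\z/ } readdir $dh;
        closedir $dh;
        return sort @names if @names;
    }
    return;    # nothing found
}

print "$_\n" for available_locales();
```

This does nothing for Windows, where neither the command nor the
directories exist.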



------------------------------

Date: Sat, 20 Sep 2003 09:02:22 -0500
From: "Eric J. Roode" <REMOVEsdnCAPS@comcast.net>
Subject: Re: Order of evaluation of expressions
Message-Id: <Xns93FC661F95B86sdn.comcast@206.127.4.25>

dkcombs@panix.com (David Combs) wrote in news:bkdtkp$2j8$1
@reader2.panix.com:

> In article <Xns93EEEC3E82233sdn.comcast@206.127.4.25>,
> Eric J. Roode <REMOVEsdnCAPS@comcast.net> wrote:
> ...
> 
> 
> What I believe to be true is that when dealing
> with *reals*, addition doesn't necessarily
> commute, i.e. a + b doesn't necessarily equal b + a,
> because of the limited precision of reals.

Offhand, I can't think of an example where this would be true.  Can you 
give an example?
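
For what it is worth, with IEEE 754 arithmetic the property that
limited precision breaks is associativity, not commutativity: a + b
and b + a round the same exact sum, but the grouping of three or more
terms can change the result. A small sketch, assuming perl's numbers
are IEEE 754 floating point (the values are chosen only for
illustration):

```perl
use strict;
use warnings;

my ($big, $small) = (1e300, 1);

# Swapping the operands gives the same result.
my $commutes = ($big + $small) == ($small + $big);

# Regrouping does not.
my $left  = ($big + -$big) + $small;    # 0 + 1
my $right = $big + (-$big + $small);    # -$big + 1 rounds back to -$big

print "commutes: ", ($commutes ? "yes" : "no"), "\n";
print "left = $left, right = $right\n";
```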

-- 
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print


------------------------------

Date: Sat, 20 Sep 2003 17:10:02 +0900
From: ko <kuujinbo@hotmail.com>
Subject: Re: perl lib all over the place
Message-Id: <bkh23k$p56$1@pin3.tky.plala.or.jp>

Sam wrote:
> After installing Debian, I learned that its Perl version is 5.6. Knowing
> that Perl 5.8 is out there and 6 is just around the corner, and also
> knowing there is no .deb file for 5.8 yet, I downloaded the source,
> compiled it and installed it, so now I have 5.8. Later I learned I should
> not have done that, since it would upset the Debian way of doing things.
> So I hope I did not shoot myself in the foot. But if I did, how can I fix
> it and avoid future problems?

Sorry, I don't have any experience with Debian. My *nix box is FreeBSD, 
which has a package system that doesn't mess with the default Perl 
install. From the error messages you show, I'm guessing that when you 
installed Perl 5.8 a link/symlink was created to reference 5.8. You 
could try 'ls -l /usr/bin/perl' from your shell (substitute the full 
path to Perl if it's different on your system) to confirm whether my 
guess is correct. If so, just change the link to point to wherever 5.6.1 
resides on your system and you have your default installation back. But 
I want to *stress* that this is not a Perl problem, and that you should 
check one of the debian mailing lists, or post to one of the 
comp.os.linux.* groups to get the real facts that will solve your problem.

>> perl -e 'print "$_\n" foreach (@INC)'

Maybe you made a typo? As Tad suggested in another reply in this thread, 
you can do 'perl -V'. The contents of @INC are output at the end.

perldoc lib
perldoc -q module
perldoc -q include
'How do I add a directory to my include path at runtime?'

May also be insightful.
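
As a small illustration of what 'perldoc lib' describes, 'use lib' puts
a directory at the front of @INC at compile time. The path below is
simply the one quoted in this thread and may not exist on your system:

```perl
use strict;
use warnings;

# Path taken from the thread; adjust to wherever your modules live.
use lib '/usr/local/lib/perl/5.6.1';

# Show the resulting search path, one directory per line.
print "$_\n" for @INC;
```

Running this with each perl binary on the system (for example
/usr/bin/perl and /usr/local/bin/perl) shows which search path each
one uses.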

> using xemacs, perl->Run and if I don't have use lib 
> "/usr/local/lib/perl/5.6.1"; near the top of my module I get
> 
> cd /home/username/perl-programs/
> /usr/local/bin/perl -w "/home/usrname/perl-programs/datetest.pl"
> Can't locate Date/Calc.pm in @INC (@INC contains: 
> /usr/local/lib/perl5/5.8.0/i686-linux /usr/local/lib/perl5/5.8.0 
> /usr/local/lib/perl5/site_perl/5.8.0/i686-linux 
> /usr/local/lib/perl5/site_perl/5.8.0 /usr/local/lib/perl5/site_perl .) 
> at /home/username/perl-programs/trend.pl line 5.
> BEGIN failed--compilation aborted at 
> /home/username/perl-programs/datetest.pl line 5.
> 
> Compilation exited abnormally with code 2 at Sat Sep 20 08:39:55

Here's another (bad) way to find out what is in @INC, and it shows that 
you're running 5.8. You should also be aware that in some cases modules 
need to be compiled for specific versions of Perl - your 'use lib..' is 
using the Date::Calc module built for 5.6.1 (I don't think this is a 
problem with Date::Calc, if I'm wrong, someone please correct me).

Good luck - keith



------------------------------

Date: 20 Sep 2003 04:42:11 -0700
From: jwillmore@cyberia.com (James Willmore)
Subject: Re: perl man pages in xemacs
Message-Id: <e0160815.0309200342.533e5dd8@posting.google.com>

Sam <samj@austarmetro.com.au> wrote in message news:<3F6B7D24.2060400@austarmetro.com.au>...
> Hello
> How can I use xemacs to read Perl man pages? Currently I do
> bash$ man <PerlPackageName>
> 
> Another point:
> in the text of a Perl doc: "See section <SectionName> at the bottom of 
> this document."
> Is there a pod reader where <SectionName> would be hyperlinked to that 
> location in the document for ease of navigation?
> Even pod2html does not do this.
> 
> Any hint is appreciated
> 
> thanks

Tk::Pod (http://search.cpan.org)

HTH

Jim


------------------------------

Date: Sat, 20 Sep 2003 10:59:07 +0200
From: "Christian Winter" <thepoet@nexgo.de>
Subject: Re: Perl regular expression does not work on 5.8.0
Message-Id: <3f6c16f2$0$23104$9b4e6d93@newsread2.arcor-online.net>

"Stefan Rupp" <stefan.rupp@inform-ac.com> wrote:
> How do I 'disable utf8 support' in RH 9?  I'm more used to Debian, so
> I'm unfamiliar with 'The RedHat way'.

Me as well. But I would just try "The Debian way"
and put
export LC_ALL="POSIX"
or
export LC_ALL="en_US"
or whatever language-specific locale is needed
into /etc/profile. In the default configuration,
"locale" should show something like
LANG="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
 ...

> The other option, to update perl, would be to go to perl 5.8.1 beta,
> which I'd rather avoid.  I'd need libdb-4.1.so for that as well.

Yes, the old problem with db versions. Changing
locale settings as a work-around should do until
5.8.1 stable is available on RHN.

-Christian




------------------------------

Date: Sat, 20 Sep 2003 07:45:10 -0400
From: alexanderl <member40284@dbforums.com>
Subject: search a string
Message-Id: <3394225.1064058310@dbforums.com>


i would like to do the following with perl script:



I want to search a file for a particular string, but only where the
string appears inside an <h1>       </h1> tag.

For example, I search for the string "Click" inside the
<h1></h1> tag. If the string is found, it should be replaced
with the string "Link".



is there a simple way to do that pls?



thanks


--
Posted via http://dbforums.com


------------------------------

Date: Sat, 20 Sep 2003 14:59:50 +0200
From: peter pilsl <pilsl_usenet@goldfisch.at>
Subject: Re: search a string
Message-Id: <3f6c4ffc$1@e-post.inode.at>

alexanderl wrote:

> 
> i would like to do the following with perl script:
> 
> 
> 
> to check and search a particular string within a file but the string
> must be within <h1>       </h1> tag
> 
> say for example i check and search the string "Click" that is within the
> <h1></h1> tag. If the string is found then that string will be replace
> with the string "Link".
> 

s/<h1>Click</h1>/<h1>Link</h1>/g;

man perlop

peter


-- 
peter pilsl
pilsl_usenet@goldfisch.at
http://www.goldfisch.at



------------------------------

Date: Sat, 20 Sep 2003 13:35:06 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: search a string
Message-Id: <eIYab.3396$iJ.3007@nwrddc01.gnilink.net>

peter pilsl wrote:
> alexanderl wrote:
>> to check and search a particular string within a file but the string
>> must be within <h1>       </h1> tag
>>
>> say for example i check and search the string "Click" that is within
>> the <h1></h1> tag. If the string is found then that string will be
>> replace with the string "Link".
>
> s/<h1>Click</h1>/<h1>Link</h1>/g;

Right idea but:
    syntax error at C:\tmp\t.pl line 4, near "h1>"

Your command will try to replace '<h1>Click<' with 'h1>' and then there is a
lot of garbage behind the substitution string.
Either escape the forward slash within your RE and the replace string or
even better use a different separator:

    s=<h1>Click</h1>=<h1>Link</h1>=g;
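
If the heading may contain other text besides "Click", one possible
variation is to match each <h1>...</h1> element first and substitute
only inside it. This is a sketch, not a real HTML parser, and
relabel_h1 is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: replace $from with $to, but only inside <h1> elements.
sub relabel_h1 {
    my ($html, $from, $to) = @_;
    $html =~ s{(<h1\b[^>]*>)(.*?)(</h1>)}{
        # Copy the captures first: the inner s/// resets $1, $2, $3.
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s/\Q$from\E/$to/g;
        "$open$body$close";
    }gsie;
    return $html;
}

print relabel_h1('<h1>Click here</h1><p>Click</p>', 'Click', 'Link'), "\n";
```

For anything beyond simple, well-formed pages, a parser module from
CPAN is likely the safer route.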

jue




------------------------------

Date: Sat, 20 Sep 2003 09:03:16 -0500
From: Barry Kimelman <raisin@delete-this-trash.mts.net>
Subject: Re: search a string
Message-Id: <MPG.19d61a20c3b3605f989688@news.mts.net>

[This followup was posted to comp.lang.perl.misc]

In article <3394225.1064058310@dbforums.com>, member40284@dbforums.com 
says...
> 
> i would like to do the following with perl script:
>
> to check and search a particular string within a file but the string
> must be within <h1>       </h1> tag
> 
> say for example i check and search the string "Click" that is within the
> <h1></h1> tag. If the string is found then that string will be replace
> with the string "Link".
> 
> is there a simple way to do that pls?
> 


$string = "<h1>Click</h1>";

$string =~ s/Click/Link/;


------------------------------

Date: Sat, 20 Sep 2003 11:19:33 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: sort  issue
Message-Id: <uo6omvsqntpso88no9aa7561u7vffbc5sl@4ax.com>

X-Ftn-To: John W. Krahn 

"John W. Krahn" <krahnj@acm.org> wrote:
>>     $a->[1] cmp $b->[1] ||
>>     $a->[2] <=> $b->[2]
>>   }
>>   map [ $_, /(\w+)(\d+)/ ], @arr;
>
>Because \w includes \d and + is greedy, the third element will always be
>one character.  In other words, /(\w+)(\d+)/ is the same as
>/(\w+)(\d)/.  You probably want /(\w+?)(\d+)/ or /(\w*\D)(\d+)/ or
>something else instead.
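
A quick way to see the difference is to run the three patterns
mentioned above against the same string. split_key is a hypothetical
helper name used only for this demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the two captured pieces for a pattern.
sub split_key {
    my ($re, $string) = @_;
    my @parts = $string =~ $re;
    return @parts;
}

for my $re (qr/(\w+)(\d+)/, qr/(\w+?)(\d+)/, qr/(\w*\D)(\d+)/) {
    my @parts = split_key($re, 'abc123');
    print join(' | ', @parts), "\n";
}
```

With the greedy \w+ the digit group is left with only the final digit;
the other two patterns keep the whole run of digits together.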

Yes, didn't look for greediness; tnx.



-- 
Matija


------------------------------

Date: Sat, 20 Sep 2003 11:16:45 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: transforming an explicit range based on implicit exceptions
Message-Id: <uf9nmvsk71rum36sha6kf66feoklmn434e@4ax.com>

X-Ftn-To: YAPoster 

yaposter@yahoo.com (YAPoster) wrote:
>$range = "1-1000";
>and implicit exceptions to it, say:
>$excepts = "3,5-7,12,14,16-18..."
>how do i convert $range into:
>
>$newrange = "1,2,8-11,13,15,19..."

my $excepts = "3,5-7,12,14,16-18";
my $range = "1-100";

my ($fr, $to) = split /-/, $range;
my $str = '1' x ($to-$fr+1);

substr($str, $_-$fr, 1) = '0' for
  map /(\d+)-(\d+)/ ? $1..$2 : $_,
  split /,/, $excepts;

my @tmp;
while ($str =~ /1+/g) {
  my @r = ($-[0]+$fr, $+[0]+$fr-1);
  push @tmp, ($r[0]==$r[1] ? $r[0] : "$r[0]-$r[1]");
}
my $newrange = join ',', @tmp;
print "$newrange\n";

Pure Perl. :)

-- 
$_=$2.$1,s sg s sgmx,$/.=$_,while q mnagdaLrrgysiHgsihgca
ek r=m=~m m(. ) (. ) mmxg;print"There is only one God,$/"


------------------------------

Date: Sat, 20 Sep 2003 13:57:36 +0200
From: peter pilsl <pilsl_usenet@goldfisch.at>
Subject: troubles with unicode (incorrect sorting and basic understandingproblems)
Message-Id: <3f6c4165$1@e-post.inode.at>

 
Sorry, but I don't get this Unicode thing right. I don't even know if it is 
a Perl problem, because I have three applications interacting.

The task is to enter text via a web interface, store it in an SQL database 
(Postgres) and print it out to a web page again. The text can be anything 
from Russian, German, English, Spanish ...

Everything seems to run OK: the text entered in the web interface is 
processed correctly in Perl (the Unicode chars appear as two bytes in the 
string) and stored correctly in the database (as two bytes again).  Even the 
way back works like a charm and all the text is printed out correctly.

The problem starts when doing the sorting.  
If I let the SQL database do the sorting, I get all "exotic" chars sorted 
wrongly (German umlaut-O is between A and B ...).  I'll ask the 
Postgres people about this.
If I let Perl's sort do the job, I get all the "exotics" at the end (umlaut-O 
is after Z).

I expect German umlaut-O to occur right between O and P.

Of course I could implement my own sorting algorithm that deals with these 
special cases, but this would slow things down and I don't think I should 
have to, because Perl should be able to do it.

I think the problem (with both the SQL sort and the Perl sort) is that 
obviously only the first byte of the two-byte character is taken into 
account when sorting. Is this because I am doing something wrong, or should 
I do something, or is this a very common problem?

To illustrate my problem, I put a small sample-script online:

http://www.goldfisch.at/cgi-bin/unicodetest4.pl

if you enter some text, the text will be inserted in the database and all 
existing entries will be printed out, sorted by perl.

see the source at  http://www.goldfisch.at/test/unicodetest4.txt


I've read about Unicode::Collate as a way to change the collating/sorting 
behaviour, but I couldn't work out how to use it to get "default" Latin sorting ..
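
For reference, a minimal sketch of Unicode::Collate with its default
settings, assuming the strings are already Perl character strings (not
raw UTF-8 bytes). The word list is made up for the illustration, and
"\x{D6}" is the escape for capital O with umlaut:

```perl
use strict;
use warnings;
use Unicode::Collate;

my @words = ('Zebra', "\x{D6}sterreich", 'Post', 'Ofen', 'Apfel');

# Plain sort compares code points, so the umlaut lands after Z.
my @by_codepoint = sort @words;

# The default collator follows the Unicode Collation Algorithm,
# which treats O-umlaut as a variant of O.
my $collator = Unicode::Collate->new;
my @collated = $collator->sort(@words);

binmode STDOUT, ':encoding(UTF-8)';
print "code point order: @by_codepoint\n";
print "collated order:   @collated\n";
```

Language-specific orderings (Swedish, for example) need tailoring on
top of this default.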

Any help is appreciated ..

thnx,
peter








-- 
peter pilsl
pilsl_usenet@goldfisch.at
http://www.goldfisch.at



------------------------------

Date: Sat, 20 Sep 2003 13:48:57 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: troubles with unicode (incorrect sorting and basic understandingproblems)
Message-Id: <dVYab.3400$iJ.2896@nwrddc01.gnilink.net>

peter pilsl wrote:

Can't help you much, but ...

> The task is to enter text via a webinterface, store it to a
> sql-database (postgres) and print it out to a webpage again. The text
> can be anything from russian, german, english, spanish ...
>
> Everything seems to run ok: The text entered in the webinterface is
> processed in perl correct (The unicode-chars appears as two bytes in
> the string) and stored in the database correct (as two-byte again).

So I guess you are talking about UTF-16 then, right?
"Unicode" by itself is not very precise because it has 3 defining encodings
now, so you better say which one you are using.

> Even the way back works like a charm and all the text is printed out
> correct.
>
> The problem starts when doing the sorting.
> If I let the SQL-database do the sorting I get all "exotic" chars
> sorted wrong. (german umlaut-O is between A and B ...).  I'll ask the
> postgres-people about this.
> If I let perl-sort do the job I get all the "exotics" at the end.
> (umlaut-O is after Z)

> I expect german umlaut-O to occure right between O and P.

Careful!
That would be correct for German, but for let's say Swedish all umlauts are
sorted lexicographically after the Z, i.e. Perl happens to do the right
thing for Swedish.
In other words: the sorting order for extended characters depends on the
locale of the text.
Maybe it will be sufficient to just tell Perl which locale to use for this
sort, but I've never tried it myself.

> [...] but I didnt get any clue how to use this
> to make "default"-latin sorting ..

There is no such thing as a default sorting order for extended characters.

jue




------------------------------

Date: Sat, 20 Sep 2003 16:11:29 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: troubles with unicode (incorrect sorting and basic understandingproblems)
Message-Id: <Pine.LNX.4.53.0309201402560.21862@lxplus002.cern.ch>

On Sat, Sep 20, peter pilsl inscribed on the eternal scroll:

> sorry, but I dont get this unicode-thingy right. I dont even know if it is
> a perl-problem, cause I've three applications interacting.

Then I'd have to recommend separating the parts and
understanding each component separately, to at least the level
necessary to understand how they interface to each other.

> The task is to enter text via a webinterface, store it to a sql-database
> (postgres) and print it out to a webpage again. The text can be anything
> from russian, german, english, spanish ...

Chinese?  Arabic?

> Everything seems to run ok: The text entered in the webinterface is
> processed in perl correct (The unicode-chars appears as two bytes in the
> string)

I don't know quite how to say this, but since you give the impression
that you aren't clear about what's going wrong, how can you be in a
position to tell us that this particular part is "correct"?  Perl's
native Unicode representation uses utf-8, in which characters occupy
one, two, three or more bytes, not exactly two.  What character
representation are you using at this point that you get exactly two?

> If I let perl-sort do the job I get all the "exotics" at the end. (umlaut-O
> is after Z)

Have you said anything to Perl about what language locale it's meant
to be using?  Sorting order is different for different languages.

Are you allowing Perl to work with Unicode characters, or is it
working at the byte level?  I looked briefly at your source code but
to be honest I wasn't able to answer that.

> I think the problem (with sql-sort and perl-sort) is that obviously only
> the first byte of the two-byte is taking into account when sorting.

I deduce from this that you are trying to work at the byte level,
with your characters occupying pairs of bytes?  That is _not_ Perl's
own internal character representation.

I think my first move would be to factor-out your database stuff.
It's only acting as a temporary store for your data, after all, and
as a potential cause of extraneous confusion, as far as I can see.


------------------------------

Date: Sat, 20 Sep 2003 16:41:49 +0200
From: peter pilsl <pilsl_usenet@goldfisch.at>
Subject: Re: troubles with unicode (incorrect sorting and basic understandingproblems)
Message-Id: <3f6c67e3$1@e-post.inode.at>

Alan J. Flavell wrote:

<skip>

OK - I tried to reduce the problem and wrote a small script that simply 
reads the content of a web form (textarea), lowercases and sorts the lines 
in it, and prints out the sorted lines.

The web form has UTF-8 as its charset, and its content reaches Perl via 
Apache and the CGI module. Then I use the standard lc() and 
sort() functions to process the text. 
Since I'm using Perl 5.8.0, I didn't specify any UTF pragma.

What I observe is that lc() does not lowercase German umlauts and sort() 
puts German umlauts after Z.

My question now is: how can I tell Perl (inside my script) that it should 
use "German" conventions for sorting and lowercasing? Is this possible at all?

The script is online under 
http://www.goldfisch.at/cgi-bin/unicodetest7.pl

The source is below. If I also print out the length of the string using the 
length() function, I notice that each German umlaut increases length() by 
two, so I was thinking that it is represented as a two-byte char, and I also 
wondered why length() counts the bytes rather than the chars. But as you 
already mentioned: I'm still not exactly clear what I'm talking about at 
all :(
Maybe I need to transform the text that is delivered by the CGI module 
to "real" Unicode first, or whatever.
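
A small sketch of that idea, assuming the form data arrives as raw
UTF-8 bytes: decoding them with the core Encode module turns them into
character strings, after which length() counts characters and lc()
knows about the umlaut. The sample string is made up for the
illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "\xC3\x96" is the two-byte UTF-8 encoding of capital O with umlaut.
my $bytes = "\xC3\x96L";
my $chars = decode('UTF-8', $bytes);

print "length as bytes:      ", length($bytes), "\n";
print "length as characters: ", length($chars), "\n";

my $lower = lc $chars;    # character semantics: the umlaut is lowercased too
binmode STDOUT, ':encoding(UTF-8)';
print "lowercased: $lower\n";
```

Output then has to be encoded again, here via the :encoding(UTF-8)
layer on STDOUT.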


thnx a lot for any help,
peter


------------------------
#!/usr/local/bin/perl

use CGI;
use strict;

my $query = new CGI;
my $charset = 'UTF-8';

# print header
print $query->header(-type=>'text/html; charset='.$charset),
  $query->start_html(-title=>'Unicodetest');

# read, process, print the form-content
if ($query->param('submit'))
{
  foreach (sort(map {lc $_} split(/\n/,$query->param('unicode')))) {
    print $_,"<br>\n";                  # print the lowered string
#    print length($_)-1,"<br>\n";   # print the length
  }
}

# print the form
print '<br><br>enter your unicode-testtext here : 
',$query->start_multipart_form,
  $query->textarea(-name=>'unicode',-rows=>10,-columns=>100),
  "\n<br>\n",
  $query->submit(-name=>'submit',-value=>'proceed'),"\n",
  $query->endform,"\n";


print $query->end_html;
--------------------






> 
> I don't know quite how to say this, but since you give the impression
> that you aren't clear about what's going wrong, how can you be in a
> position to tell us that this particular part is "correct"?  Perl's
> native Unicode representation uses utf-8, in which characters occupy
> one, two, three or more bytes, not exactly two.  What character
> representation are you using at this point that you get exactly two?
> 
>> If I let perl-sort do the job I get all the "exotics" at the end.
>> (umlaut-O is after Z)
> 
> Have you said anything to Perl about what language locale it's meant
> to be using?  Sorting order is different for different languages.
> 
> Are you allowing Perl to work with Unicode characters, or is it
> working at the byte level?  I looked briefly at your source code but
> to be honest I wasn't able to answer that.
> 
>> I think the problem (with sql-sort and perl-sort) is that obviously only
>> the first byte of the two-byte is taking into account when sorting.
> 
> I deduce from this that you are trying to work at the byte level,
> with your characters occupying pairs of bytes?  That is _not_ Perl's
> own intenal character representation.
> 
> I think my first move would be to factor-out your database stuff.
> It's only acting as a temporary store for your data, after all, and
> as a potential cause of extraneous confusion, as far as I can see.
> 

-- 
peter pilsl
pilsl_usenet@goldfisch.at
http://www.goldfisch.at



------------------------------

Date: 19 Sep 2003 22:45:26 -0700
From: altalingua@hotmail.com (David Morel)
Subject: What are you allowed to share?
Message-Id: <60c4a7b1.0309192145.32c7eff0@posting.google.com>

Hi all,

I'm trying to do some threads programming, and I need to share a
filehandle. Perl won't let me do that. I get the error: "Cannot share
globs yet..." when I try to share a filehandle. Perl won't even let me
share a pointer to a filehandle, as I get the error: "Invalid value
for shared scalar..."

How can I work around these limitations?
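
One workaround that is sometimes suggested is to share the numeric
file descriptor instead of the handle, since a plain integer can be
shared or passed to a thread, and to reopen it where it is needed.
The sketch below shows only the reopening step and does not start any
threads; handle_from_fd is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: build a new Perl filehandle on an existing
# file descriptor number. $mode is '<', '>' or '>>'.
sub handle_from_fd {
    my ($fd, $mode) = @_;
    open(my $fh, "$mode&=", $fd)
        or die "cannot reopen descriptor $fd: $!";
    return $fh;
}

# Example: write to STDOUT through a second handle on the same descriptor.
my $out = handle_from_fd(fileno(STDOUT), '>');
print {$out} "written via descriptor ", fileno(STDOUT), "\n";
```

Both handles refer to the same descriptor, so closing one closes it
for the other as well, and buffered output from several writers may
interleave unless each writer flushes and some locking is used.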

Thanks,
David Morel


------------------------------

Date: 20 Sep 2003 04:40:00 -0700
From: jwillmore@cyberia.com (James Willmore)
Subject: Re: wtf is the deal?
Message-Id: <e0160815.0309200340.a64de4a@posting.google.com>

"Tom" <tom@nosleep.net> wrote in message news:<3f6aaf48$1@nntp0.pdx.net>...
> > > I for the life
> > > of me cannot understand what people's problem is with where the text
>  goes.
> >
> >
> > *plonk*
> >
> >
> > -- 
> >     Tad McClellan                          SGML consulting
> >     tadmc@augustmail.com                   Perl programming
> >     Fort Worth, Texas
> 
> Now there is about the most childish bullshit I've seen in a long time.
> Grow the fuck up. Ahh, that explains it, Texas.

Childish is what you're doing, trying to justify doing what _you_ want to
do.  It's something called _respect_.  I realize that, in this day and
age, that's something long since forgotten - but _not_ here in this
newsgroup.

If you respect the wishes of the group, we will respect your wishes. 
If you 'dis us, we will 'dis you.  I'm thinking this is the same
everywhere, but it's observed here in this newsgroup very strongly.

Pretty simple - even for an engineer  ... or a husband ... or
boyfriend ... or significant other ... or co-worker .... or friend ...
:-)

Warmest regards

Jim


------------------------------

Date: Sat, 19 Jul 2003 01:59:56 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: 
Message-Id: <3F18A600.3040306@rochester.rr.com>

Ron wrote:

> Tried this code get a server 500 error.
> 
> Anyone know what's wrong with it?
> 
> if $DayName eq "Select a Day" or $RouteName eq "Select A Route") {

(---^


>     dienice("Please use the back button on your browser to fill out the Day
> & Route fields.");
> }
 ...
> Ron

 ...
-- 
Bob Walton



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5533
***************************************

