[32979] in Perl-Users-Digest

Perl-Users Digest, Issue: 4255 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jul 25 14:09:16 2014

Date: Fri, 25 Jul 2014 11:09:03 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 25 Jul 2014     Volume: 11 Number: 4255

Today's topics:
        knocking rough edges off template project: issue1: enco <cal@example.invalid>
        Regex to select between first and third occurence of co <balaji.draj@gmail.com>
    Re: Regex to select between first and third occurence o <rweikusat@mobileactivedefense.com>
    Re: Regex to select between first and third occurence o <jurgenex@hotmail.com>
    Re: Regex to select between first and third occurence o <jurgenex@hotmail.com>
    Re: Regex to select between first and third occurence o (hymie!)
    Re: Regex to select between first and third occurence o <balaji.draj@gmail.com>
    Re: Regex to select between first and third occurence o <rweikusat@mobileactivedefense.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 24 Jul 2014 15:43:01 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: knocking rough edges off template project: issue1: encodings.
Message-Id: <c3dgfmFscedU1@mid.individual.net>

I wanted to post this now, because I have enough partial results to 
frame what my perl scripts are missing now.  In this overhaul of them, I 
use an array of arrays for the first time in perl.  I was well served to 
write my print_aoa, and I use this notation through the script:

sub print_aoa{
use strict;
use warnings;
use 5.010;

my $a = shift;
my @AoA = @$a;
     for my $i ( 0 .. $#AoA ) {
         my $aref = $AoA[$i];
         for my $j ( 0 .. $#{$aref} ) {
             print "elt $i $j is $AoA[$i][$j]\n";
         }
     }
return $a;
}

The slices of this array in one direction will correspond to the 
repetitive body of the html doc: image, english caption, russian 
caption.  I've got them a little mixed up here, as I'm currently sorting 
by the first letter of their content, as opposed to the lexicographic 
order of the filenames, which is the design.  I've just seen it fail for 
the first time during testing with new content, so that's also something 
that needs to be fixed.

The issue I want to focus on today is on effective use of encodings in 
perl.  For the sake of specificity, we shall stipulate that ruscaptions 
are to be cyrillic.

Let me walk back a step.  Previous versions of this template system look 
like this:
http://merrillpjensen.com/pages/renoir1.html , where the nominal is a 
picture and a caption.  The body was written by something pretty 
close to the specifier that I think Ben Bacarisse suggested in days gone by:

my $specifier =  read_file( $vars{"body"} ) ;
print "specifier is $specifier\n";
# print content to file
my $refc = get_content($rvars);
say "----------";
my %content = %$refc;
foreach my $key (sort keys %content) {
     printf $fh $specifier,$remote_dir,$key,$content{$key};
}

When I passed cyrillic content through this, it became a bunch of little 
domino monsters with fractions all over the place.  I can see the proper 
cyrillic as it sits in the files where I write it; I believe it's in 
Windows-1251 encoding.  I can see it on stdout as it gets read before 
printf turns it into dominoes.  If I strip the dominoes out of the 
resulting html page, paste in the cyrillic from the files, and upload 
that file, I've got proper cyrillic content.  It was getting clobbered 
by printf.  This shows the cyrillic pasted into the first field, and 
the nasty dominoes in the bottom one, which is what's happening now:

http://merrillpjensen.com/pages/norway5.html

What to do?

I try to write my own.  Ben always tells me to use meaningful variable 
names.  If you're rolling your own version of printf, you call it schmintf:

sub schmintf{
use strict;
use warnings;
use 5.010;
use Text::Template;
my $rvars = shift;
my $reftoAoA = shift;
my %vars = %$rvars;
my @AoA = @$reftoAoA;
say "in schmint ";
my $body = $vars{"body"};
my $template = Text::Template->new(
     ENCODING => 'utf8',
     SOURCE => $body)
     or die "Couldn't construct template: $!";
my $return;
for my $i ( 0 .. $#AoA ){
$vars{"file"} = $AoA[$i][0];
$vars{"english"} = $AoA[$i][1];
$vars{"russian"} = $AoA[$i][2];

my $result = $template->fill_in(HASH => \%vars);

$return = $return.$result;
}
#say "return is $return";
return \$return;
}

I thought for sure that I had it when I added  ENCODING => 'utf8', but 
this does not fare any better than did printf.

Q1) How do I use perl to represent cyrillic faithfully?

What I have now looks like this:

http://merrillpjensen.com/pages/sunflower5.html

where the cyrillic part was achieved by using this site to populate the 
files:

http://mashke.org/Conv/

, where the content goes from WIN1251 to KOI8, which essentially turns 
the whole thing into stuff that looks like this:

&#1084;&#1080;&#1090;&#1080; &#1092;&#1072;&#1082;&#1090;&#1080; 
&#1074;&#1110;&#1076; &#1074;&#1080;&#1075;&#1072;&#1076;&#1086;&#1082; 
&#1050;&#1088;&#1077;&#1084;

It's one heck of a time-consuming workaround, when there's no need to 
make it unreadable as data.  I don't have these exotic characters in the 
script, either.

Do German source materials have to look like f&#252;rchten ? It seems 
incredible to me that Germans would put up with such an ugly thing. 
Alright, long post.  The script that makes the page is listed below the 
last one I posted.  The <code> tag for css doesn't quite work.  It seems 
to gobble up <STDIN>, but it otherwise looks about right.

Thanks for your comment,
-- 
Cal Dershowitz


------------------------------

Date: Thu, 24 Jul 2014 06:17:50 -0700 (PDT)
From: IJALAB <balaji.draj@gmail.com>
Subject: Regex to select between first and third occurence of comma
Message-Id: <4abccad2-9c01-4451-9607-6d247e4f385e@googlegroups.com>

Hi,

I am not well versed with regex. So, kindly help me with a regex to achieve the below example:

I have lot of lines like below:
                                                                           "MONTH", 1, NULL, 0}, //11
                                                                           "YEAR", 1, NULL, 0}, //12
                                                                           "HOUR", 1, NULL, 0}, //13
                                                                           "MINUTE", 1, NULL, 0}, //14
                                                                           "SECOND", 1, NULL, 0}, //15


I need the following output:

                                                                           "MONTH",  //11
                                                                           "YEAR",  //12
                                                                           "HOUR", //13
                                                                           "MINUTE",  //14
                                                                           "SECOND",  //15


Basically I tried to write a perl script by splitting based on comma and printing the first and last fields. It worked, but I will be thankful if someone can give a regular expression to achieve the same. Also kindly explain the syntax if time permits. Thanks



------------------------------

Date: Thu, 24 Jul 2014 15:20:57 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <87vbqmj0s6.fsf@sable.mobileactivedefense.com>

IJALAB <balaji.draj@gmail.com> writes:
> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>
> I have lot of lines like below:
>                                                                            "MONTH", 1, NULL, 0}, //11
>                                                                            "YEAR", 1, NULL, 0}, //12
>                                                                            "HOUR", 1, NULL, 0}, //13
>                                                                            "MINUTE", 1, NULL, 0}, //14
>                                                                            "SECOND", 1, NULL, 0}, //15
>
>
> I need the following output:
>
>                                                                            "MONTH",  //11
>                                                                            "YEAR",  //12
>                                                                            "HOUR", //13
>                                                                            "MINUTE",  //14
>                                                                            "SECOND",  //15

That's something I'd do with sed:

sed 's/,.*\/\//, \/\//'

The substitution expression also works with perl:

perl -pe 's/,.*\/\//, \/\//'


------------------------------

Date: Thu, 24 Jul 2014 08:13:22 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <4d82t9lnof8kfp7obn9gn5nq7bsofidfiv@4ax.com>

IJALAB <balaji.draj@gmail.com> wrote:

Could you please limit your line length to <75 characters as has been a
proven custom on Usenet for over 2 decades? Thank you.
Line length not fixed.

>I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>
>I have lot of lines like below:
>                                                                           "MONTH", 1, NULL, 0}, //11
[...]
>I need the following output:
>                                                                           "MONTH",  //11
[...]

No need for a RE, this can more easily be done with a simple split:
	$_ = '"MONTH", 1, NULL, 0}, //11';
	print join ',', (split ',', $_)[0,-1];

>Basically I tried to write  a perl script by splitting based on comma and printing the first and last back reference. It worked but Will be thankful if someone can give a regular expression to achieve the same. Also kkindly explain the syntax if time permits. thanks

Oh, you did that already. Why do you think that solution is not ok?

jue


------------------------------

Date: Thu, 24 Jul 2014 08:18:36 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <kq82t9hga48luhmnmn94ftn1qitvivhkpe@4ax.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>IJALAB <balaji.draj@gmail.com> writes:
>> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>>
>> I have lot of lines like below:
>>                                                                            "MONTH", 1, NULL, 0}, //11
[...]
>> I need the following output:
>>                                                                            "MONTH",  //11
>
>The substitution expression also works with perl:
>
>perl -pe 's/,.*\/\//, \/\//'

If it really has to be a RE (which I don't agree with), then it's much
simpler:
	s/,.*,/,/;
Replace everything between the first and last comma including them with
a single comma.

jue


------------------------------

Date: 24 Jul 2014 17:09:36 GMT
From: hymie@lactose.homelinux.net (hymie!)
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <53d13dd0$0$20121$882e7ee2@usenet-news.net>

In our last episode, the evil Dr. Lacto had captured our hero,
  Jürgen Exner <jurgenex@hotmail.com>, who said:

>If it really has to be a RE (which I don't agree with),

Sometimes your favorite tool is a hammer, and then you try to turn
everything into a nail.

--hymie!    http://lactose.homelinux.net/~hymie    hymie@lactose.homelinux.net


------------------------------

Date: Thu, 24 Jul 2014 10:39:29 -0700 (PDT)
From: IJALAB <balaji.draj@gmail.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <975cf536-f716-46b4-a49e-e4c525ddb0b0@googlegroups.com>

On Thursday, July 24, 2014 6:47:50 PM UTC+5:30, IJALAB wrote:
> Hi,
>
> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>
> I have lot of lines like below:
>
>                                                                            "MONTH", 1, NULL, 0}, //11
>                                                                            "YEAR", 1, NULL, 0}, //12
>                                                                            "HOUR", 1, NULL, 0}, //13
>                                                                            "MINUTE", 1, NULL, 0}, //14
>                                                                            "SECOND", 1, NULL, 0}, //15
>
> I need the following output:
>
>                                                                            "MONTH",  //11
>                                                                            "YEAR",  //12
>                                                                            "HOUR", //13
>                                                                            "MINUTE",  //14
>                                                                            "SECOND",  //15
>
> Basically I tried to write  a perl script by splitting based on comma and printing the first and last back reference. It worked but Will be thankful if someone can give a regular expression to achieve the same. Also kkindly explain the syntax if time permits. thanks

Hello All,

Thanks for the reply. The reason why I wanted RE only was to use the RE in an editor where the code existed and didn't want to use an external script. But the suggestions (I tried all) worked great. thanks a ton for the help.


------------------------------

Date: Thu, 24 Jul 2014 18:50:02 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <877g32ir3p.fsf@sable.mobileactivedefense.com>

Jürgen Exner <jurgenex@hotmail.com> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>IJALAB <balaji.draj@gmail.com> writes:
>>> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>>>
>>> I have lot of lines like below:
>>>                                                                            "MONTH", 1, NULL, 0}, //11
> [...]
>>> I need the following output:
>>>                                                                            "MONTH",  //11
>>
>>The substitution expression also works with perl:
>>
>>perl -pe 's/,.*\/\//, \/\//'
>
> If it really has to be a RE (which I don't agree with), then it's much
> simpler:
> 	s/,.*,/,/;
> Replace everything between the first and last comma including them with
> a single comma.

Replace everything between the first comma and a //-sequence including
them with ', //' is no more complicated than that.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4255
***************************************

