[32979] in Perl-Users-Digest
Perl-Users Digest, Issue: 4255 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jul 25 14:09:16 2014
Date: Fri, 25 Jul 2014 11:09:03 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 25 Jul 2014 Volume: 11 Number: 4255
Today's topics:
knocking rough edges off template project: issue1: enco <cal@example.invalid>
Regex to select between first and third occurence of co <balaji.draj@gmail.com>
Re: Regex to select between first and third occurence o <rweikusat@mobileactivedefense.com>
Re: Regex to select between first and third occurence o <jurgenex@hotmail.com>
Re: Regex to select between first and third occurence o <jurgenex@hotmail.com>
Re: Regex to select between first and third occurence o (hymie!)
Re: Regex to select between first and third occurence o <balaji.draj@gmail.com>
Re: Regex to select between first and third occurence o <rweikusat@mobileactivedefense.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 24 Jul 2014 15:43:01 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: knocking rough edges off template project: issue1: encodings.
Message-Id: <c3dgfmFscedU1@mid.individual.net>
I wanted to post this now, because I have enough partial results to
frame what my perl scripts are missing now. In this overhaul of them, I
use an array of arrays for the first time in perl. I was well served to
write my print_aoa, and I use this notation throughout the script:
sub print_aoa {
    use strict;
    use warnings;
    use 5.010;
    # $a is reserved for sort(), so the reference gets its own name
    my $aoa_ref = shift;
    my @AoA = @$aoa_ref;
    for my $i ( 0 .. $#AoA ) {
        my $aref = $AoA[$i];
        for my $j ( 0 .. $#{$aref} ) {
            print "elt $i $j is $AoA[$i][$j]\n";
        }
    }
    return $aoa_ref;
}
The slices of this array in one direction will correspond to the
repetitive body of the html doc: image, english caption, russian
caption. I've got them a little mixed up here now, as I'm sorting by
the first letter of their content now, as opposed to the lexicographic
order of the filenames, which is the design. I've just seen it fail for
the first time during testing with new content, so that's also something
that needs fixing.
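The intended ordering (rows sorted by filename rather than by caption
text) could be sketched roughly as below. This is only an illustration:
the row layout [file, english, russian] follows the description above,
while the sample filenames and captions are made up.

```perl
use strict;
use warnings;

# Hypothetical rows: [ filename, english caption, russian caption ]
my @AoA = (
    [ 'img_03.jpg', 'apple',  'caption c' ],
    [ 'img_01.jpg', 'cherry', 'caption a' ],
    [ 'img_02.jpg', 'banana', 'caption b' ],
);

# Sort whole rows by column 0 (the filename), lexicographically
my @by_file = sort { $a->[0] cmp $b->[0] } @AoA;

print join( ' | ', @$_ ), "\n" for @by_file;
```

Sorting the row references keeps each image together with its two
captions, so the columns cannot drift apart.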
The issue I want to focus on today is on effective use of encodings in
perl. For the sake of specificity, we shall stipulate that ruscaptions
are to be cyrillic.
Let me walk back a step. Previous versions of this template system look
like this:
http://merrillpjensen.com/pages/renoir1.html , where the nominal is a
picture and a caption. The body was written by something pretty
close to the specifier that I think Ben Bacarisse suggested in days gone by:
my $specifier = read_file( $vars{"body"} ) ;
print "specifier is $specifier\n";
# print content to file
my $refc = get_content($rvars);
say "----------";
my %content = %$refc;
foreach my $key (sort keys %content) {
    printf $fh $specifier, $remote_dir, $key, $content{$key};
}
When I passed cyrillic content through this, it became a bunch of domino
little monsters with fractions all over the place. I can see the proper
cyrillic as it sits in the files where I write it; I believe it's in
Windows-1251 encoding. I can see it on stdout as it gets read before
printf turns it into dominoes. If I strip the dominoes out of the
resulting html page, paste in the cyrillic from the files, and upload
that file, I've got proper cyrillic content. It was getting clobbered
by printf. This shows the cyrillic pasted into the first field, and
those nasty dominoes in the bottom one, which is what's happening now:
http://merrillpjensen.com/pages/norway5.html
What to do?
I try to write my own. Ben always tells me to use meaningful variable
names. If you're rolling your own version of printf, you call it schmintf:
sub schmintf {
    use strict;
    use warnings;
    use 5.010;
    use Text::Template;
    my $rvars = shift;
    my $reftoAoA = shift;
    my %vars = %$rvars;
    my @AoA = @$reftoAoA;
    say "in schmint ";
    my $body = $vars{"body"};
    # Text::Template reports constructor failures in $Text::Template::ERROR, not $!
    my $template = Text::Template->new(
        ENCODING => 'utf8',
        SOURCE => $body)
        or die "Couldn't construct template: $Text::Template::ERROR";
    # start from an empty string so the first append does not warn
    my $return = '';
    for my $i ( 0 .. $#AoA ){
        $vars{"file"} = $AoA[$i][0];
        $vars{"english"} = $AoA[$i][1];
        $vars{"russian"} = $AoA[$i][2];
        my $result = $template->fill_in(HASH => \%vars);
        $return = $return.$result;
    }
    #say "return is $return";
    return \$return;
}
I thought for sure that I had it when I added ENCODING => 'utf8', but
this does not fare any better than did printf.
Q1) How do I use perl to represent cyrillic faithfully?
What I have now looks like this:
http://merrillpjensen.com/pages/sunflower5.html
where the cyrillic part was achieved by using this site to populate the
files:
http://mashke.org/Conv/
, where the content goes from WIN1251 to KOI8, which essentially turns
the whole thing into stuff that looks like this:
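&#1084;&#1080;&#1090;&#1080; &#1092;&#1072;&#1082;&#1090;&#1080;
&#1074;&#1110;&#1076; &#1074;&#1080;&#1075;&#1072;&#1076;&#1086;&#1082;
&#1050;&#1088;&#1077;&#1084;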
It's one heck of a time-consuming workaround, when there's no need to
make it unreadable as data. I don't have these exotic characters in the
script, either.
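One common way to avoid an entity workaround of this kind is to decode
the text on input and encode it once on output. The following is a
minimal sketch, assuming the caption files really are Windows-1251
(cp1251) and the page is meant to be served as UTF-8. The byte string
stands in for a caption file and the HTML snippet is hypothetical;
neither is taken from the script discussed here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: four cp1251 bytes spelling a short Cyrillic word
my $cp1251_bytes = "\xCA\xF0\xE5\xEC";

# Decode the bytes into a Perl character string
my $caption = decode( 'cp1251', $cp1251_bytes );

# Character-level work (sprintf, templates, length) happens on decoded text
my $html = sprintf '<p class="%s">%s</p>', 'russian', $caption;

# Encode once, on the way out, as UTF-8
my $utf8_bytes = encode( 'UTF-8', $html );

binmode STDOUT;    # raw bytes: the string was already encoded by hand
print $utf8_bytes, "\n";
```

For files, the same effect can be had with I/O layers, for example
opening the input with '<:encoding(cp1251)' and the output with
'>:encoding(UTF-8)'. The generated page would also need to declare
charset=utf-8 so the browser reads the bytes the same way.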
Do german source materials have to look like f&uuml;rchten ? It seems
incredible to me that germans would put up with such an ugly thing.
Alright long post. The script that makes the page is listed below the
last one I posted. The <code> tag for css doesn't quite work. It seems
to gobble up <STDIN>, but it otherwise looks about right.
Thanks for your comment,
--
Cal Dershowitz
------------------------------
Date: Thu, 24 Jul 2014 06:17:50 -0700 (PDT)
From: IJALAB <balaji.draj@gmail.com>
Subject: Regex to select between first and third occurence of comma
Message-Id: <4abccad2-9c01-4451-9607-6d247e4f385e@googlegroups.com>
Hi,
I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
I have lot of lines like below:
"MONTH", 1, NULL, 0}, //11
"YEAR", 1, NULL, 0}, //12
"HOUR", 1, NULL, 0}, //13
"MINUTE", 1, NULL, 0}, //14
"SECOND", 1, NULL, 0}, //15
I need the following output:
"MONTH", //11
"YEAR", //12
"HOUR", //13
"MINUTE", //14
"SECOND", //15
Basically I tried to write a perl script by splitting based on comma and printing the first and last back reference. It worked but Will be thankful if someone can give a regular expression to achieve the same. Also kkindly explain the syntax if time permits. thanks
------------------------------
Date: Thu, 24 Jul 2014 15:20:57 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <87vbqmj0s6.fsf@sable.mobileactivedefense.com>
IJALAB <balaji.draj@gmail.com> writes:
> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>
> I have lot of lines like below:
> "MONTH", 1, NULL, 0}, //11
> "YEAR", 1, NULL, 0}, //12
> "HOUR", 1, NULL, 0}, //13
> "MINUTE", 1, NULL, 0}, //14
> "SECOND", 1, NULL, 0}, //15
>
>
> I need the following output:
>
> "MONTH", //11
> "YEAR", //12
> "HOUR", //13
> "MINUTE", //14
> "SECOND", //15
That's something I'd do with sed:
sed 's/,.*\/\//, \/\//'
The substitution expression also works with perl:
perl -pe 's/,.*\/\//, \/\//'
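Since the question also asked for the syntax to be explained, here is
the same substitution written with brace delimiters and the /x flag so
that each piece can carry a comment. It is an annotated restatement of
the expression above, not a different method; the sample line is copied
from the question and the variable names are arbitrary.

```perl
use strict;
use warnings;

my $line = '"MONTH", 1, NULL, 0}, //11';

# Using braces as delimiters avoids escaping every slash
( my $out = $line ) =~ s{
    ,       # the first comma on the line (leftmost match wins)
    .*      # greedy: as many characters as possible ...
    //      # ... while still leaving a literal // to match
}{, //}x;

print "$out\n";    # "MONTH", //11
```

Everything from the first comma up to and including the last // is
replaced by ', //', which leaves the quoted name and the trailing
number in place.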
------------------------------
Date: Thu, 24 Jul 2014 08:13:22 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <4d82t9lnof8kfp7obn9gn5nq7bsofidfiv@4ax.com>
IJALAB <balaji.draj@gmail.com> wrote:
Could you please limit your line length to <75 characters as has been a
proven custom on Usenet for over 2 decades? Thank you.
Line length not fixed.
>I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>
>I have lot of lines like below:
> "MONTH", 1, NULL, 0}, //11
[...]
>I need the following output:
> "MONTH", //11
[...]
No need for a RE, this can more easily be done with a simple split:
$_ = '"MONTH", 1, NULL, 0}, //11';
print join ',', (split ',', $_)[0,-1];
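Applied to several of the sample lines at once, the split approach
might look like the sketch below. The @lines and @short arrays are
added for illustration only; the split and slice are the same as in
the two lines above.

```perl
use strict;
use warnings;

my @lines = (
    '"MONTH", 1, NULL, 0}, //11',
    '"YEAR", 1, NULL, 0}, //12',
    '"HOUR", 1, NULL, 0}, //13',
);

# Keep only the first and last comma-separated fields of each line
my @short = map { join ',', ( split /,/, $_ )[ 0, -1 ] } @lines;

print "$_\n" for @short;
```

The space after the comma in the output comes from the last field
itself, which still begins with the space that followed the final
comma in the input.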
>Basically I tried to write a perl script by splitting based on comma and printing the first and last back reference. It worked but Will be thankful if someone can give a regular expression to achieve the same. Also kkindly explain the syntax if time permits. thanks
Oh, you did that already. Why do you think that solution is not ok?
jue
------------------------------
Date: Thu, 24 Jul 2014 08:18:36 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <kq82t9hga48luhmnmn94ftn1qitvivhkpe@4ax.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>IJALAB <balaji.draj@gmail.com> writes:
>> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>>
>> I have lot of lines like below:
>> "MONTH", 1, NULL, 0}, //11
[...]
>> I need the following output:
>> "MONTH", //11
>
>The substitution expression also works with perl:
>
>perl -pe 's/,.*\/\//, \/\//'
If it really has to be a RE (which I don't agree with), then it's much
simpler:
s/,.*,/,/;
Replace everything between the first and last comma including them with
a single comma.
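This relies on .* being greedy. A small sketch contrasting it with the
non-greedy form, using the sample line from the question (the variable
names are arbitrary):

```perl
use strict;
use warnings;

my $line = '"MONTH", 1, NULL, 0}, //11';

# Greedy .* runs from the first comma to the LAST comma
( my $greedy = $line ) =~ s/,.*,/,/;

# Non-greedy .*? stops at the NEXT comma, which is not what is wanted here
( my $lazy = $line ) =~ s/,.*?,/,/;

print "$greedy\n";    # "MONTH", //11
print "$lazy\n";      # "MONTH", NULL, 0}, //11
```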
jue
------------------------------
Date: 24 Jul 2014 17:09:36 GMT
From: hymie@lactose.homelinux.net (hymie!)
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <53d13dd0$0$20121$882e7ee2@usenet-news.net>
In our last episode, the evil Dr. Lacto had captured our hero,
Jürgen Exner <jurgenex@hotmail.com>, who said:
>If it really has to be a RE (which I don't agree with),
Sometimes your favorite tool is a hammer, and then you try to turn
everything into a nail.
--hymie! http://lactose.homelinux.net/~hymie hymie@lactose.homelinux.net
------------------------------
Date: Thu, 24 Jul 2014 10:39:29 -0700 (PDT)
From: IJALAB <balaji.draj@gmail.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <975cf536-f716-46b4-a49e-e4c525ddb0b0@googlegroups.com>
On Thursday, July 24, 2014 6:47:50 PM UTC+5:30, IJALAB wrote:
> Hi,
>
> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>
> I have lot of lines like below:
> "MONTH", 1, NULL, 0}, //11
> "YEAR", 1, NULL, 0}, //12
> "HOUR", 1, NULL, 0}, //13
> "MINUTE", 1, NULL, 0}, //14
> "SECOND", 1, NULL, 0}, //15
>
> I need the following output:
> "MONTH", //11
> "YEAR", //12
> "HOUR", //13
> "MINUTE", //14
> "SECOND", //15
>
> Basically I tried to write a perl script by splitting based on comma and printing the first and last back reference. It worked but Will be thankful if someone can give a regular expression to achieve the same. Also kkindly explain the syntax if time permits. thanks
Hello All,
Thanks for the reply. The reason why I wanted RE only was to use the RE in an editor where the code existed and didn't want to use an external script. But the suggestions (I tried all) worked great. thanks a ton for the help.
------------------------------
Date: Thu, 24 Jul 2014 18:50:02 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex to select between first and third occurence of comma
Message-Id: <877g32ir3p.fsf@sable.mobileactivedefense.com>
Jürgen Exner <jurgenex@hotmail.com> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>IJALAB <balaji.draj@gmail.com> writes:
>>> I am not well versed with regex. So, kindly help me with a regex to achieve the below example:
>>>
>>> I have lot of lines like below:
>>> "MONTH", 1, NULL, 0}, //11
> [...]
>>> I need the following output:
>>> "MONTH", //11
>>
>>The substitution expression also works with perl:
>>
>>perl -pe 's/,.*\/\//, \/\//'
>
> If it really has to be a RE (which I don't agree with), then it's much
> simpler:
> s/,.*,/,/;
> Replace everything between the first and last comma including them with
> a single comma.
Replace everything between the first comma and a //-sequence including
them with ', //' is no more complicated than that.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4255
***************************************