[31076] in Perl-Users-Digest
Perl-Users Digest, Issue: 2321 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Apr 5 21:09:48 2009
Date: Sun, 5 Apr 2009 18:09:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 5 Apr 2009 Volume: 11 Number: 2321
Today's topics:
Re: Ambiguity with lc sln@netherlands.com
Re: Ambiguity with lc <cherryplankton@gmail.com>
Re: Ambiguity with lc sln@netherlands.com
Re: Ambiguity with lc <glennj@ncf.ca>
Re: bareword question sln@netherlands.com
Re: bareword question sln@netherlands.com
Re: check file exists with case sensitive on a case ins sln@netherlands.com
Re: finding maximum sln@netherlands.com
Re: Hidden mode Apr. 4, 2009 sln@netherlands.com
Re: missing newlines <hjp-usenet2@hjp.at>
Re: missing newlines sln@netherlands.com
Re: missing newlines sln@netherlands.com
Re: missing newlines <wl@gnu.org>
Re: missing newlines <wl@gnu.org>
Re: Process header record and concatenate files sln@netherlands.com
Re: Regex compiler not showing backtracking for * sln@netherlands.com
Re: resolve single line with multiple items into mutlip sln@netherlands.com
Re: XRC vs Perl (GUI) generated code <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 05 Apr 2009 22:51:07 GMT
From: sln@netherlands.com
Subject: Re: Ambiguity with lc
Message-Id: <g7dit4p3mq4mj2n2q06o6hggbkc0pcrbjh@4ax.com>
On Sat, 04 Apr 2009 13:39:51 +0200, Frank Seitz <devnull4711@web.de> wrote:
>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
>my @arr = map {lc.'X'} qw/A B C/;
>print "@arr\n";
>
>__END__
>Warning: Use of "lc" without parentheses is ambiguous at ./test.pl line 6.
>aX bX cX
>
>I don't see the problem. Where is the ambiguity?
>
>Frank
So have you been invited to rewrite the Perl parser?
No?
What's the big deal? Where is the ambiguity?
Any retard could do it, don't you think?
-sln
------------------------------
Date: Sun, 05 Apr 2009 17:52:30 -0500
From: cherryplankton <cherryplankton@gmail.com>
Subject: Re: Ambiguity with lc
Message-Id: <1384480352260664576.616835cherryplankton-gmail.com@news.giganews.com>
Frank Seitz <devnull4711@web.de> wrote:
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my @arr = map {lc.'X'} qw/A B C/;
> print "@arr\n";
>
> __END__
> Warning: Use of "lc" without parentheses is ambiguous at ./test.pl
> line 6.
> aX bX cX
>
> I don't see the problem. Where is the ambiguity?
>
> Frank
What would perl6 do?
------------------------------
Date: Sun, 05 Apr 2009 23:02:04 GMT
From: sln@netherlands.com
Subject: Re: Ambiguity with lc
Message-Id: <urdit4hirtcviknj5fbunv4lun86g3bsu2@4ax.com>
On Sat, 04 Apr 2009 13:39:51 +0200, Frank Seitz <devnull4711@web.de> wrote:
>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
>my @arr = map {lc.'X'} qw/A B C/;
>print "@arr\n";
>
>__END__
>Warning: Use of "lc" without parentheses is ambiguous at ./test.pl line 6.
>aX bX cX
>
>I don't see the problem. Where is the ambiguity?
>
>Frank
Everybody knows lc.'X' is ambigous. Thats why everybody always uses spaces
that don't turn warnings off. The compiler gets abmigous if you don't.
Perl is so rich.and ambigous. Let this be a lesson to you!
-sln
------------------------------
Date: 6 Apr 2009 00:58:23 GMT
From: Glenn Jackman <glennj@ncf.ca>
Subject: Re: Ambiguity with lc
Message-Id: <slrngtiktf.kbb.glennj@smeagol.ncf.ca>
At 2009-04-04 10:17AM, "Frank Seitz" wrote:
> I don't believe that anybody means lc($_.'X') when she writes lc.'X'.
> I expect lc($_).'X' and nothing else.
I don't believe anyone trying to write a maintainable program would
write "lc.'x'"
--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous
------------------------------
Date: Sun, 05 Apr 2009 22:28:13 GMT
From: sln@netherlands.com
Subject: Re: bareword question
Message-Id: <fvbit4dapt56395un9ur70hapkffvicnol@4ax.com>
On Wed, 01 Apr 2009 13:19:42 -0500, Ted Zlatanov <tzz@lifelogs.com> wrote:
>On Mon, 30 Mar 2009 09:39:25 -0700 (PDT) DaLoverhino <DaLoveRhino@hotmail.com> wrote:
>
>D> I have a function that takes an array of quoted words:
>
>D> use strict;
>D> use warnings;
>D> my_function( "hello", "world!");
>
>D> I can do this:
>
>D> my_function qq(hello world!);
>
>D> is there anyway to tell perl, that any barewords after my_function
>D> (and my_function alone) should be treated as quoted words? So that I
>D> can just simply do this:
>
>D> my_function(hello world!);
>
>Make the parameters a string and split it on space in the function.
>
>use Data::Dumper;
>
>my_function("hello world!");
>my_function(qw/hello world/);
>
>sub my_function
>{
> my $p = shift @_;
> my @p;
> if (scalar @_)
> {
> @p = ($p, @_);
> }
> else
> {
> @p = split ' ', $p;
> }
>
> print "Parameters = " . Dumper(\@p);
>}
>
>Produces:
>
>Parameters = $VAR1 = [
> 'hello',
> 'world!'
> ];
>Parameters = $VAR1 = [
> 'hello',
> 'world'
> ];
>
>Doing it the way you suggest is unnecessary obfuscation.
>
>Ted
Why can't you quote like a normal human being?
Whats all the "D> asdfasdf.as.sa.sa." crap.
And you inject code and comments.
None of it has dilineation from what your quoting.
The most immature crap a poster can do.
-sln
------------------------------
Date: Sun, 05 Apr 2009 22:35:26 GMT
From: sln@netherlands.com
Subject: Re: bareword question
Message-Id: <dacit4pgr0h3n3j9kpd252fal5dgui83g4@4ax.com>
On Wed, 01 Apr 2009 13:19:42 -0500, Ted Zlatanov <tzz@lifelogs.com> wrote:
>On Mon, 30 Mar 2009 09:39:25 -0700 (PDT) DaLoverhino <DaLoveRhino@hotmail.com> wrote:
>
[cut]
>sub my_function
>{
> my $p = shift @_;
> my @p;
> if (scalar @_)
> {
> @p = ($p, @_);
> }
> else
> {
> @p = split ' ', $p;
> }
Why do you post code with 1 space indentation???
Are you seriously retarded or what?
Why not just make it a one-liner?
sub my_function{my $p = shift @_;my @p;if (scalar @_){@p = ($p, @_);}else{@p = split ' ', $p;}
This is more readable.
-sln
------------------------------
Date: Sun, 05 Apr 2009 23:21:44 GMT
From: sln@netherlands.com
Subject: Re: check file exists with case sensitive on a case insensitive file system
Message-Id: <r5fit4hqmln9ce6206qv5ldd5e7smv42gl@4ax.com>
On Sun, 5 Apr 2009 08:18:49 -0700 (PDT), Xah Lee <xahlee@gmail.com> wrote:
>been a while i coded in perl...
>
>how to check if a file exists with case sensitiveness on os x?
>
>a quick google search says call system ls & grep. That seems too slow.
>I have some over 10 thousand files to check.
>
> Xah
>? http://xahlee.org/
>
>?
This will do it for you Zah Zah.
Somewhere in here is the answer to your problem.
-sln
-----------------------------------------
#!/usr/bin/perl
package SMG;
use strict;
use File::stat;
use File::Path;
use File::Copy;
use File::Find;
use File::Spec;
use sort 'stable';
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(SafeMerge GetCommonElementsNxM GetCommonElements);
my $VERSION = 1.00;
#my $current = sort::current();
####################################################################################
# SAFE MERGE
# The two major options -
# 1. Merge 2 directories into 1, preserving/renaming duplicate named files.
# 2. Compare 2 directories and just rename the duplicates of 1 directory.
# Also -
# - The prefix level will be on a per-file monitored basis.
# - File/Dir names are cached and renamed in the "From" array,
# - Move is done via os-rename function into the "ToDir".
# - Remove "From" directory option (after rename).
# - Duplicate's-renaming-only is done in-place in the "FromDir".
# - Report output is created in the current directory (for now).
# - Itteration scheme are within-directory, then cross-directory until no more dups.
#-----------------------------------------------------------------------------------
sub SafeMerge ($$$$$$$)
{
return 0 if (@_ < 2 || @_ > 7);
my ($ToDir,$FromDir,$PrefixName,$PrefixLevel,$Exclude,$Duponly,$RmvFrom) = @_;
$PrefixName = '' unless defined $PrefixName; # Renaming Prefix
$PrefixLevel = 0 unless defined $PrefixLevel; # Number of times to += prefix
$Exclude = [] unless defined $Exclude; # Exclude list (must be re)
$Duponly = 0 unless defined $Duponly; # Flag to just rename dups in "from"
$RmvFrom = 0 unless defined $RmvFrom; # Flag to remove "from" after operation
my @To = ();
my @From = ();
my @Xfound = ();
my $dupsize = 0;
my $passes = 0;
print "\n\n *******************************\n";
print "********[ Safe Merge ]*********\n";
print " *******************************\n";
print "\nPrefix\t$PrefixName\nLevel\t$PrefixLevel\nChecks ... ";
$ToDir = File::Spec->canonpath( "$ToDir" );
$FromDir = File::Spec->canonpath( "$FromDir" );
#print "$ToDir\n$FromDir";
if ($ToDir eq $FromDir) { print "\rFrom/To are identical: $ToDir\n"; return 0; }
if (!-e $ToDir) { print "\rTo directory does not exist: $ToDir\n"; return 0; }
if (!-e $FromDir) { print "\rFrom directory does not exist: $FromDir\n"; return 0;}
# Glob seems to be bothered by spaces in dir path names
my $Sep = "\\".File::Spec->canonpath( "/" );
{
my $td = File::Spec->canonpath( "$ToDir/*" );
my $fd = File::Spec->canonpath( "$FromDir/*" );
if ($td =~ / /) {
$td = "'$td'";
}
if ($fd =~ / /) {
$fd = "'$fd'";
}
foreach (glob ("$td")) {
/.+$Sep(.+)$/;
# /.+\/(.+)$/;
push (@To, [$1, 0]);
}
foreach (glob ("$fd")) {
/.+$Sep(.+)$/;
# /.+\/(.+)$/;
my $ftmp = $1;
for (@{$Exclude}) {
if ($ftmp =~ /$_/) {
# save in @Xfound if $RmvFrom is set (see below)
print "\rExclude - $ftmp\n";
push (@Xfound, $ftmp) if ($RmvFrom);
$ftmp = undef;
last;
}
}
push (@From, [$ftmp, 0, '']) if defined ($ftmp);
}
}
# Will return if "From" empty. Technically not an error.
# If it is empty through exclusions (regexp filter), "From" could be rmdir'd below.
if (@From == 0) {
print "\rEmpty dir: $FromDir\n";
return 0;
}
print "OK\n";
## ============================================================================
## Iterate until there are no more dups across directories after (D2) rename
##
do {
my $Dupref = GetCommonElementsNxM_HashMethod(\@To,\@From, 1, 0);
$dupsize = @{$Dupref};
$passes++;
print "\nP A S S \# $passes\n---------------------\n";
print "Found $dupsize dups:\n";
#**********************************
# Check for cross-directory dups
# -------------
if ($dupsize)
{
my $errflg = 0;
print "\n<DIR> $FromDir\n\n";
for my $ndx (@{$Dupref})
{
my $Fromref = $From[$ndx];
my $pname = $PrefixName;
print "\t$Fromref->[0] ... $PrefixName";
#*****************************************
# Rename and check for in-directory dups
# ----------
while (1)
{
# Level check
if ($Fromref->[1] >= $PrefixLevel)
{
print "$Fromref->[0] <<level $Fromref->[1] exceeded>>\n";
# Clear the name in From so this file doesn't get moved
$Fromref->[0] = "";
$errflg = 1;
last;
}
my $stmp = uc($pname.$Fromref->[0]);
my $found = 0;
for (@From) { if (uc($_->[0]) eq $stmp) { $found = 1; last; } }
if (!$found) {
# OK to rename! Dup not found in "From"
# Change the name in "From" to this new one
# so it gets moved later. Increment its level of prefix
# ---------------------
print "$Fromref->[0]";
print "\n";
if ($Fromref->[1] == 0) { $Fromref->[2] = $Fromref->[0]; }
$Fromref->[0] = $pname.$Fromref->[0];
$Fromref->[1] += 1;
last;
}
else {
# File exist with that name, add another level of prefix.
# ---------------------
print "$PrefixName";
$pname = $pname.$PrefixName;
if ($Fromref->[1] == 0) { $Fromref->[2] = $Fromref->[0]; }
$Fromref->[1] += 1;
}
}
}
if ($errflg) {
print "!!! Can't rename some files, raise the level or change the prefix.\n";
#return 0;
}
}
} while ($dupsize > 0);
##
## End cross-dir iteration
## =============================
#*************************************************************
# Check for rename of duplicates in D2 (only) without moving
# ------------
if ($Duponly)
{
my $rentried = 0;
my $renok = 0;
print "\nRename only ... ";
for (@From) {
my $res = 0;
if ($_->[1] > 0 && $_->[1] < $PrefixLevel) {
$rentried++;
my $fr = File::Spec->canonpath( "$FromDir/$_->[2]" );
my $fr2 = File::Spec->canonpath( "$FromDir/$_->[0]" );
if (!($res = rename ("$fr", "$fr2"))) {
print "Rename error:\t$_->[2] ... $_->[0]\n";
} else {
$renok++; }
}
}
if ($rentried) { print "$renok out of $rentried files OK\n"; }
else { print "No duplicates found\n"; }
return 1;
}
#**********************
# Move "From" into "To"
# -------------------
#print @From."@From\n";
my $sep = File::Spec->canonpath( "/" );
print "\nMoving:\t$FromDir$sep* to $ToDir$sep\n";
my $movcnt = 0;
for (@From) {
my $res = 0;
if ($_->[1] > 0 && $_->[1] < $PrefixLevel) {
my $fr = File::Spec->canonpath( "$FromDir/$_->[2]" );
my $to = File::Spec->canonpath( "$ToDir/$_->[0]" );
$res = File::Copy::move ("$fr", "$to");
if ($res == 0) { print "Move error:\t$_->[2] ... $to\n"; }
}
if ($_->[1] == 0 && $_->[0] gt '') {
my $fr = File::Spec->canonpath( "$FromDir/$_->[0]" );
my $to = File::Spec->canonpath( "$ToDir/$_->[0]" );
$res = File::Copy::move ("$fr", "$to");
if ($res == 0) { print "Move error:\t$fr\n" }
}
$movcnt++ if ($res);
}
my $notmoved = @From - $movcnt;
print "Moved:\t$movcnt out of ".@From." files.\n";
#***************************************
# Check if "FromDir" is to be deleted
# --------------
if ($RmvFrom) {
print "Remove:\t$FromDir ... ";
if ($notmoved > 0) {
print "not deleted, contains $notmoved file(s) that couldn't be moved!\n";
}
else {
# check if @Xfound has values (and their not directories)
# if so, delete these first before trying to remove the From directory
# for now, don't want to "unlink" a directory, and we're not doing a tree here
# ----------------
for (@Xfound) {
my $fr = File::Spec->canonpath( "$FromDir/$_" );
unlink ("$fr") if (!-d $fr) ; }
my $fr = File::Spec->canonpath( "$FromDir" );
if (!rmdir "$fr") {print "$!\n" } else {print "OK\n"; }
}
}
return 1;
}
#######################################################
# Get Common Elements (from two N-dimensioned Array's)
# IN - Refs to the NxN arrays to compare,
# sort flag and the compare field.
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#
sub GetCommonElementsNxM($$$$)
{
my ($A_Left,$A_Right,$Srtflg,$Fld) = @_;
$Srtflg = 0 unless defined $Srtflg;
$Fld = 0 unless defined $Fld;
# my @Dup = ();
my @Ndx = ();
if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Left};
@{$A_Right} = sort {uc($a->[$Fld]) cmp uc($b->[$Fld])} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}
my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = undef;
my $rlast = undef;
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element->[$Fld]) eq uc($llast->[$Fld]));
$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element->[$Fld]) lt uc($right_element->[$Fld]));
$cnt++;
next if (uc($right_element->[$Fld]) eq uc($rlast->[$Fld]));
if (uc($left_element->[$Fld]) eq uc($right_element->[$Fld]))
{
# push (@Dup, $right_element->[$Fld]); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (\@Dup);
return (\@Ndx);
}
sub GetCommonElementsNxM_HashMethod($$$$)
{
my ($A_Left,$A_Right,$Srtflg,$Fld) = @_;
$Srtflg = 0 unless defined $Srtflg;
$Fld = 0 unless defined $Fld;
my ($element,%count);
#my @Dup = ();
my @Ndx = ();
my $idx = 0;
foreach $element (@{$A_Left}) {
$count{uc $element->[$Fld]} = 1;
}
foreach $element (@{$A_Right}) {
if ($count{uc $element->[$Fld]} == 1) {
#push @Dup, $element->[$Fld];
push @Ndx, $idx;
$count{$element->[$Fld]}++;
}
$idx++;
}
# return (\@Dup);
return (\@Ndx);
}
#######################################################
# Get Common Elements from single Array's
# IN - Refs to the Nx1 arrays to compare, sort flag
# OUT - Ndx's into Right_Array of matching elements
# ---------------------------------------------------
# Notes -
# 1. Elements are assumed textual and case insensitive
# 2. Ignores in-array duplicates
# 3. Sort will be done if sort flag > 0
#######################################################
sub GetCommonElements($$$)
{
my ($A_Left,$A_Right,$Srtflg) = @_;
$Srtflg = 0 unless defined $Srtflg;
# my @Dup = ();
my @Ndx = ();
if ($Srtflg > 0) {
@{$A_Left} = sort {uc($a) cmp uc($b)} @{$A_Left};
@{$A_Right} = sort {uc($a) cmp uc($b)} @{$A_Right};
} else {print "==> Common Elements : Not sorting arrays\n";}
my $rpos = 0;
my $rend = @{$A_Right};
my $cnt = 0;
my $llast = '';
my $rlast = '';
foreach my $left_element (@{$A_Left})
{
next if (uc($left_element) eq uc($llast));
$rpos += $cnt;
$cnt = 0;
foreach my $right_element (@{$A_Right}[$rpos..($rend-1)])
{
last if (uc($left_element) lt uc($right_element));
$cnt++;
next if (uc($right_element) eq uc($rlast));
if (uc($left_element) eq uc($right_element))
{
# push (@Dup, $right_element); # the string
push (@Ndx, $rpos+$cnt-1); # the index into R_Array
last;
}
$rlast = $right_element;
}
$llast = $left_element;
last if ($rpos >= $rend);
}
# return (\@Dup);
return (\@Ndx);
}
sub GetCommonElements_HashMethod($$$$)
{
my ($A_Left,$A_Right,$Srtflg) = @_;
$Srtflg = 0 unless defined $Srtflg;
my ($element,%count);
#my @Dup = ();
my @Ndx = ();
my $idx = 0;
foreach $element (@{$A_Left}) {
$count{uc $element} = 1;
}
foreach $element (@{$A_Right}) {
if ($count{uc $element} == 1) {
#push @Dup, $element;
push @Ndx, $idx;
$count{$element}++;
}
$idx++;
}
# return (\@Dup);
return (\@Ndx);
}
1;
------------------------------
Date: Sun, 05 Apr 2009 22:42:35 GMT
From: sln@netherlands.com
Subject: Re: finding maximum
Message-Id: <uncit4dj1mhps7gvri8grk7n7dr3vq9p9s@4ax.com>
On Wed, 1 Apr 2009 07:44:43 -0700 (PDT), Fred <fred.l.kleinschmidt@boeing.com> wrote:
>On Mar 31, 8:24 pm, Tad J McClellan <ta...@seesig.invalid> wrote:
>> ["Followup-To:" header set to comp.lang.perl.misc.]
>>
>> Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>> > sandeep wrote:
>> >> in idiomatic C?
>>
>> > There is no such thing as "idiomatic C." There are only
>> > "High C" (as practiced during the High C Era), and "Trade C"
>> > (the patois that has taken hold among those who'd rather get
>> > things done than argue about the proprieties). There are also
>> > some debased C-ish dialects, notably "Rapper's C" or "CRap."
>>
>> Leaving out "Tenne C" is inexcusable.
>>
>
>Or just use ordinary C, sometimes called C-plain
I've heard it called ANSI-C, but that may be too specific.
We got your C flavor, strawberry, bannana, vanilla, chocolate
and bannana-split C cream pie.
-sln
------------------------------
Date: Sun, 05 Apr 2009 23:05:27 GMT
From: sln@netherlands.com
Subject: Re: Hidden mode Apr. 4, 2009
Message-Id: <n6eit4hpvglj1u7rhsm3uj8a8f59segqfc@4ax.com>
On Sat, 4 Apr 2009 06:12:47 -0500, "E.D.G." <edgrsprj@ix.netcom.com> wrote:
>When using the Windows operating system, is there a way to start a Perl
>program running in a background, hidden, or invisible mode so that no
>Windows icon is visible while it runs in the background?
>
>
>
>When multiple applications are being run and it is necessary to move between
>windows, that would mean one less window to keep track of.
>
>
>
>
Can a Perl script be compiled as a exe service? Sure, why not?
-sln
------------------------------
Date: Sun, 5 Apr 2009 23:10:16 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: missing newlines
Message-Id: <slrngti7ho.pmu.hjp-usenet2@hrunkner.hjp.at>
On 2009-04-05 19:59, Werner Lemberg <wl@gnu.org> wrote:
> Using perl v5.10.0 on my GNU/Linux box, running the script and the data
> below as
>
> perl merge-major-hyphens.pl in1 in2
>
> I get strange output:
>
> Ab-bau=ma-te-ri-als
> Ab-bau=schAb-bau=schab-be=kom-men-den
>
> However, I expected this:
>
> Ab-bau=ma-te-ri-als
> ab-be=kom-men-den
>
> What am I doing wrong? I can't really believe that there is a bug in Perl
>:-)
>
> The idea of the script is to insert the major hyphenation points of `in2'
> into `in1' in case the hyphenation positions are the same to avoid false
> positives. Otherwise the line is discarded; the same happens if there is no
> hyphenation at all.
>
> [The data files contain latin-1 encoded characters so I'm attaching a tar.gz
> archive also to assure correct transmission.]
Thanks, that was helpful.
>==merge-major-hyphens.pl=====================================================
[...]
> while (1) {
> my @orig_field = split(//, <ORIG>);
> my @major_field = split(//, <MAJOR>);
I'd remove the newlines here and add them again in the output. The
newlines are not part of the words you want to hyphenate.
> my $out = "";
> my $hit = 0;
>
> for (my $o = 0, my $m = 0; $o <= $#orig_field; $o++, $m++) {
> if ($major_field[$m] eq "-") {
> if ($orig_field[$o] eq "\xb7") {
> # a hit
> $out .= "=";
> $hit = 1;
> next;
> }
> else {
> # a failure
> last;
This just leaves the for loop.
> }
> }
[...]
> }
> print $out if $hit;
So in case of a failure you get here and print the contents of $out
(which is incomplete and doesn't contain a newline).
> last if eof(ORIG);
> }
> close(ORIG);
> close(MAJOR);
>
>==in1========================================================================
> Ab·bau·schritt
>==in2========================================================================
> Abbau-sch-ritt
>=============================================================================
And here's the (first) problem in your data: Your program assumes that
each major hyphenation point is also a minor hyphenation point. But the
hypenation point between "sch" and "ritt" (which is wrong, btw) appears
only in the "major" file, not the "orig" file.
You already check for this condition, but you don't seem to handle it
properly: You either need to print a warning or error message (so that
the problem can be fixed) or decide which file is right (and either
insert the hyphenation point or ignore it).
hp
------------------------------
Date: Sun, 05 Apr 2009 21:34:37 GMT
From: sln@netherlands.com
Subject: Re: missing newlines
Message-Id: <hr8it4ljjnsltclj4lu8pehvatijm4jrvu@4ax.com>
On Sun, 5 Apr 2009 19:59:43 +0000 (UTC), Werner Lemberg <wl@gnu.org> wrote:
>
>Using perl v5.10.0 on my GNU/Linux box, running the script and the data
>below as
>
> perl merge-major-hyphens.pl in1 in2
>
>I get strange output:
>
> Ab-bau=ma-te-ri-als
> Ab-bau=schAb-bau=schab-be=kom-men-den
>
>However, I expected this:
>
> Ab-bau=ma-te-ri-als
> ab-be=kom-men-den
>
>What am I doing wrong? I can't really believe that there is a bug in Perl
>:-)
>
>The idea of the script is to insert the major hyphenation points of `in2'
>into `in1' in case the hyphenation positions are the same to avoid false
>positives. Otherwise the line is discarded; the same happens if there is no
>hyphenation at all.
>
>[The data files contain latin-1 encoded characters so I'm attaching a tar.gz
> archive also to assure correct transmission.]
>
>
> Werner
>
>
>==merge-major-hyphens.pl=====================================================
>
>use strict;
>use locale;
>use POSIX qw(locale_h);
>
>my $prog = $0;
>$prog =~ s@.*/@@;
>
>setlocale(LC_CTYPE, "de_DE.ISO8859-1");
>setlocale(LC_COLLATE, "de_DE.ISO8859-1");
>
>my $orig = $ARGV[0];
>my $major = $ARGV[1];
>
># in1: Zwi·schen·mahl·zeit
># in2: Zwischen-mahl-zeit
>#
># out: Zwi-schen=mahl=zeit
>
>open(ORIG, $orig);
>open(MAJOR, $major);
>
>while (1) {
> my @orig_field = split(//, <ORIG>);
> my @major_field = split(//, <MAJOR>);
>
> my $out = "";
> my $hit = 0;
>
> for (my $o = 0, my $m = 0; $o <= $#orig_field; $o++, $m++) {
> if ($major_field[$m] eq "-") {
> if ($orig_field[$o] eq "\xb7") {
> # a hit
> $out .= "=";
> $hit = 1;
> next;
> }
> else {
> # a failure
^^^^^^^^^
If this is a failure, better not print it out then?
$hit = 0;
> last;
> }
> }
> if ($orig_field[$o] eq "\xb7") {
> # after a hyphen there is always an ordinary character
> $out .= "-";
> $o++;
> }
> # since the characters in `orig' and `major' are the same,
> # we don't need to test equality and can emit directly
> $out .= $orig_field[$o];
> }
> print $out if $hit;
^^^^
Not a hit if a failure?
>
> last if eof(ORIG);
>}
>close(ORIG);
>close(MAJOR);
>
>==in1========================================================================
>
>Ab·bau·ma·te·ri·als
>Ab·bau·men·ge
>Ab·bau·men·gen
>Ab·bau·schritt
>Ab·bau·schrit·te
>ab·bau·test
>Ab·bau·un·ter·neh·men
>Ab·be
>Ab·bé
>ab·bei·ßen·de
>ab·bei·ßen·den
>ab·bei·ßen·der
>ab·bei·ßen·des
>ab·be·kom·men·den
>
>==in2========================================================================
>
>Abbau-materials
>Abbaumenge
>Abbaumengen
>Abbau-sch-ritt
>Abbau-sch-ritte
>abbautest
>Abbauunternehmen
>Abbe
>Abbé
>abbeißende
>abbeißenden
>abbeißender
>abbeißendes
>abbe-kommenden
>
>=============================================================================
>
-sln
------------------------------
Date: Sun, 05 Apr 2009 21:54:15 GMT
From: sln@netherlands.com
Subject: Re: missing newlines
Message-Id: <ck9it4pplk7ntft89nkhpf6c6oomnl0ct1@4ax.com>
On Sun, 5 Apr 2009 23:10:16 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>On 2009-04-05 19:59, Werner Lemberg <wl@gnu.org> wrote:
>> Using perl v5.10.0 on my GNU/Linux box, running the script and the data
>> below as
>>
[break]
>>==in1========================================================================
>> Ab·bau·schritt
>>==in2========================================================================
>> Abbau-sch-ritt
>>=============================================================================
>
>And here's the (first) problem in your data: Your program assumes that
>each major hyphenation point is also a minor hyphenation point. But the
>hypenation point between "sch" and "ritt" (which is wrong, btw) appears
>only in the "major" file, not the "orig" file.
>
>You already check for this condition, but you don't seem to handle it
>properly: You either need to print a warning or error message (so that
>the problem can be fixed) or decide which file is right (and either
>insert the hyphenation point or ignore it).
>
> hp
So this is some bad code imo.
It lives dangerously on the edge by assuming a relational construct between
two unequal strings that will permeate via an algol that does not reference
unitialized array elements.
Besides that huge impairment, I don't see any relationships of hyphenation
major or minor. I guess I'm not a word processor.
-sln
------------------------------
Date: Sun, 5 Apr 2009 21:55:24 +0000 (UTC)
From: Werner Lemberg <wl@gnu.org>
Subject: Re: missing newlines
Message-Id: <grb9cc$hc8$1@news.albasani.net>
Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> > while (1) {
> > my @orig_field = split(//, <ORIG>);
> > my @major_field = split(//, <MAJOR>);
> I'd remove the newlines here and add them again in the output. The
> newlines are not part of the words you want to hyphenate.
Well, yes, but...
> > my $out = "";
> > my $hit = 0;
> >
> > for (my $o = 0, my $m = 0; $o <= $#orig_field; $o++, $m++) {
> > if ($major_field[$m] eq "-") {
> > if ($orig_field[$o] eq "\xb7") {
> > # a hit
> > $out .= "=";
> > $hit = 1;
> > next;
> > }
> > else {
> > # a failure
> > last;
> This just leaves the for loop.
Yes, if there is a hyphenation point in `in2' at a wrong position, the loop
is left and nothing gets printed (since `$hit' isn't set).
> > }
> > }
> [...]
> > }
> > print $out if $hit;
> So in case of a failure you get here and print the contents of $out
> (which is incomplete and doesn't contain a newline).
Aah, found it. In case of failure I have to reset $hit to zero. Now it
works.
> [...] Your program assumes that each major hyphenation point is also a
> minor hyphenation point. But the hypenation point between "sch" and "ritt"
> (which is wrong, btw)
Of course it is wrong -- it is the very reason why I'm writing this perl
script :-)
In case you are interested in hyphenation pattern generation for German (and
soon with patterns for `Haupttrennstellen' only), you might have a look
here:
http://repo.or.cz/w/wortliste.git
Thanks to all for helping!
Werner
------------------------------
Date: Sun, 5 Apr 2009 21:56:22 +0000 (UTC)
From: Werner Lemberg <wl@gnu.org>
Subject: Re: missing newlines
Message-Id: <grb9e5$hc8$2@news.albasani.net>
sln@netherlands.com wrote:
> > # a failure
> ^^^^^^^^^
> If this is a failure, better not print it out then?
> $hit = 0;
This was exactly the problem, thanks.
Werner
------------------------------
Date: Sun, 05 Apr 2009 23:12:21 GMT
From: sln@netherlands.com
Subject: Re: Process header record and concatenate files
Message-Id: <efeit4phjop9u4fimt3o7mpni5jqo60vch@4ax.com>
On Sat, 4 Apr 2009 19:22:48 -0700 (PDT), Scott Bass <sas_l_739@yahoo.com.au> wrote:
>Hi,
>
>I'm not looking for a full blown solution, just architectural advice
>for the following design criteria...
>
Just submit this to my office for a bid. Arch advice is not available
however, since we consider that non-advice and a free service, which
we don't offer.
Any other advice is free however. Like should you buy GM stock.
That advice is always free.
-sln
------------------------------
Date: Sun, 05 Apr 2009 22:21:27 GMT
From: sln@netherlands.com
Subject: Re: Regex compiler not showing backtracking for *
Message-Id: <ijbit4hvg0urgnk7caehkql5i29dvnek9n@4ax.com>
>Let's agree with you here. What must have changed? The regex rules?
>The pragma's output? If so, what is the authoritative text now for me?
>I'll just go back and read my way through. My ecstatic feeling was
>that using the pragma and the rules from the book, I can learn to
>write better regexes.
Yeah, thats cool an all. So far you've learned what the * quantifier is.
Your well on your way to learning Regular Expressions. Another 20 years
and you might be able to parse something.
-sln
------------------------------
Date: Sun, 05 Apr 2009 23:35:47 GMT
From: sln@netherlands.com
Subject: Re: resolve single line with multiple items into mutliple lines, single items
Message-Id: <dnfit41hv6cjfjlcnkmdl162vtmjti6spl@4ax.com>
On Sun, 5 Apr 2009 14:50:04 +0800, "ela" <ela@yantai.org> wrote:
>Old line(columns tab-delimited):
>
>Col1 Col2 Col3 ... Coln
>A B1@B2 C ... N1@N2@N3
>
>New lines
>A B1 C .. N1
>A B1 C .. N2
>A B1 C .. N3
>A B2 C .. N1
>A B2 C .. N2
>A B2 C .. N3
>
>The problem is: although pattern matching can recognize "@", but how to
>write the code generically so to get all N1, N2 and N3, such that the number
>of items aren't known beforehand?
>
I just saw this. I didn't read the other posted responces that may have
actually solved this apparent easy problem.
From now on, not only will you Chinese, pigeon-English speaking, non-Perl
programming, American dollar sucking folks have to provide some DOLLA'S,
for the solution (that is what you want isin't it, source and all?),
but you will have to LEARN ENOUGH CORRECT ENLISH TO PROPERLY EXPLAIN THE
PROBLEM !!!!
If this takes hiring an Amrican (English first language) translator, then all the
better. Dok, dac, toa, dit, do, don, just don't cut it.
-sln
------------------------------
Date: Sun, 5 Apr 2009 23:11:51 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XRC vs Perl (GUI) generated code
Message-Id: <slrngti7ko.pmu.hjp-usenet2@hrunkner.hjp.at>
On 2009-04-05 20:03, ccc31807 <cartercc@gmail.com> wrote:
> On Apr 4, 5:37 pm, king kikapu <aboudou...@panafonet.gr> wrote:
>> I think Perl has become a general programming language and that
>> "extraction and reporting" section was only meant when Perl started,
>> now Perl is way beyond that.
>
> We disagree.
Is this the royal we?
hp
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2321
***************************************