[25249] in Perl-Users-Digest


Perl-Users Digest, Issue: 7494 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Dec 8 00:05:30 2004

Date: Tue, 7 Dec 2004 21:05:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 7 Dec 2004     Volume: 10 Number: 7494

Today's topics:
    Re: Generate Squential Numbers? <1usa@llenroc.ude.invalid>
    Re: Generate Squential Numbers? <ahamm@mail.com>
    Re: Generate Squential Numbers? <1usa@llenroc.ude.invalid>
        grammar function <ken_sington@nospam_abcdefg.com>
    Re: grammar function <1usa@llenroc.ude.invalid>
    Re: grammar function <matthew.garrish@sympatico.ca>
    Re: grammar function <ken_sington@nospam_abcdefg.com>
    Re: grammar function <ken_sington@nospam_abcdefg.com>
    Re: grammar function <matthew.garrish@sympatico.ca>
        horizontal join of array elements <awkster@yahoo.com>
    Re: horizontal join of array elements <1usa@llenroc.ude.invalid>
    Re: horizontal join of array elements <awkster@yahoo.com>
    Re: horizontal join of array elements <1usa@llenroc.ude.invalid>
    Re: horizontal join of array elements <awkster@yahoo.com>
    Re: PerlCom Replacements <not@home.net>
        Reading poorly structured data <amead@comcast.net>
    Re: Reading poorly structured data <1usa@llenroc.ude.invalid>
    Re: Reading poorly structured data <amead@comcast.net>
    Re: Reading poorly structured data <1usa@llenroc.ude.invalid>
    Re: Reading poorly structured data <1usa@llenroc.ude.invalid>
    Re: RegExp Help <sbryce@scottbryce.com>
    Re: RegExp Help <ahamm@mail.com>
        why the following HereDoc print don't work? <zhao_bingfeng@topsec.com.cn>
    Re: why the following HereDoc print don't work? <1usa@llenroc.ude.invalid>
    Re: why the following HereDoc print don't work? <ahamm@mail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 8 Dec 2004 02:23:44 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Generate Squential Numbers?
Message-Id: <Xns95B8D9A59187Dasu1cornelledu@132.236.56.8>

"Andrew Hamm" <ahamm@mail.com> wrote in
news:31n3inF3brdedU1@individual.net: 

> Hope you don't think I'm trying to lecture you. Just having a pleasant
> conversation about the nature of learning and putting my viewpoint.

And that's the way I take comments here.

However, I submit to you that dealing with co-workers (and students in my 
case, although I don't teach programming) is different from dealing with 
posters to a Usenet group. 

The value of this group to me is the fact that I learn by reading about 
specific problems and solutions. If it becomes a place where people just 
post vague problem descriptions and expect answers, I, along with people 
who are much more qualified than myself, would really have no incentive to 
hang around anymore. 

I suspect there are probably better solutions to (what I surmise to be) the 
OP's problem but since he/she has not spent enough effort actually 
formulating some of the parameters, we won't really know.

Anyway ...

Sinan


------------------------------

Date: Wed, 8 Dec 2004 14:25:58 +1100
From: "Andrew Hamm" <ahamm@mail.com>
Subject: Re: Generate Squential Numbers?
Message-Id: <31nai8F3cd7p3U1@individual.net>

A. Sinan Unur wrote:
>
> If it becomes a place where
> people just post vague problem descriptions and expect answers, I,
> along with people who are much more qualified than myself, would
> really have no incentive to hang around anymore.

Yeah, that could happen for sure. Maybe the volume of traffic is too large for
people to notice the easy questions. Still, even a vague newsgroup question is
often much clearer than a dodgy 30-page spec from an unqualified "business
analyst".

Now I must go wash my mind out and say a few prayers to the Gods of
"customerisalwaysrightahh" aka Hari Kissass




------------------------------

Date: 8 Dec 2004 04:20:40 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Generate Squential Numbers?
Message-Id: <Xns95B8ED7681F12asu1cornelledu@132.236.56.8>

"Andrew Hamm" <ahamm@mail.com> wrote in
news:31nai8F3cd7p3U1@individual.net: 

> Now I must go wash my mind out and say a few prayers to the Gods of
> "customerisalwaysrightahh" aka Hari Kissass

:)))

I had never heard that one before.

Sinan.


------------------------------

Date: Tue, 07 Dec 2004 21:23:40 -0500
From: Ken Sington <ken_sington@nospam_abcdefg.com>
Subject: grammar function
Message-Id: <HJmdnb8C-OEc-CvcRVn-pw@speakeasy.net>

I've been working on a simple one-word grammar checker.

I'm hoping others can add ideas to it.

You access the function like this:
my $word = grammarAdjust("command", "word", <quantity>);

So table becomes tables if there is more than one,
and chair remains chair if there is one or none.

If there is more than one james, we get jameses, and if something belongs to them, it's
jameses'.

etc...


# grammarAdjust #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#
# takes single word and determines proper word grammar
# to use:
# grammarAdjust("command", "word", <quantity>);
# commands:
# ("plural", "word", <qty>)
# ("possessive", "word", <qty>)
sub grammarAdjust {
    my ($command, $word, $qty) = @_;
    my ($returnIt, $s)=();

    if ($command eq "plural"){
        if ($qty > 1){
            if (1 == 10){
                # 1 is not 10 of course
            } elsif ($word =~ m/es$/){
                $returnIt = $word ."es"; # james -> jameses
            } elsif ($word =~ m/x$/){
                $returnIt = $word . "es"; # box -> boxes
            } elsif ($word =~ m/ey$/){
                $returnIt = $word =~ s/ey$/ies/; # monkey -> monkies
            } elsif ($word =~ m/oy$/){
                $returnIt = $word . "s"; # boy -> boys
            } elsif ($word =~ m/^(woman|man)$/){
                $word =~ s/an$/en/; # (wo)man -> (wo)men
                $returnIt = $word;
            } elsif ($word !~ m/s$/){
                $returnIt = $word ."s"; # word -> words
            }


        } else {
            $returnIt = $word;
        }
    }
    elsif ($command eq "possessive") {
        if ($qty > 1){
            $s = $word =~ m/s$/ ? "\'" : "s\'" unless ($word =~ m/es$/);
            $word =~ s/es$/eses\'/ if ($word =~ m/es$/);
        } else {
            $s = $word =~ m/s$/ ? "\'" : "\'s" unless ($word =~ m/es$/);
            $word =~ s/es$/es\'s/ if ($word =~ m/es$/);
        }
        $returnIt = "$word$s";
    }



    return $returnIt;
}
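As written, a few branches of grammarAdjust misbehave: $returnIt = $word =~ s/ey$/ies/ assigns the substitution count (1) rather than the word, and "monkies" should be "monkeys" (a vowel + y just takes s); a plural of a word ending in a bare s, such as "bus", falls through every branch and returns undef; and in the possessive branches a word ending in "es" leaves $s undefined, which triggers an "uninitialized value" warning under use warnings. The following is a minimal sketch that keeps the same interface and the post's rules (a quantity of 0 or 1 keeps the singular), with those problems fixed. The extra rules for z/ch/sh endings and consonant + y (city -> cities) are additions of this sketch, and it is nowhere near a complete treatment of English plurals.

```perl
use strict;
use warnings;

# Sketch only: same interface as grammarAdjust, a handful of English rules.
# As in the original, a quantity of 0 or 1 keeps the singular form.
sub grammarAdjust {
    my ($command, $word, $qty) = @_;

    if ($command eq 'plural') {
        return $word if $qty <= 1;
        if ($word =~ /^(?:wo)?man$/) {                      # (wo)man -> (wo)men
            (my $plural = $word) =~ s/an$/en/;
            return $plural;
        }
        return $word . 'es' if $word =~ /(?:s|x|z|ch|sh)$/; # james -> jameses, box -> boxes
        return $word . 's'  if $word =~ /[aeiou]y$/;        # monkey -> monkeys, boy -> boys
        if ($word =~ /y$/) {                                # city -> cities
            (my $plural = $word) =~ s/y$/ies/;
            return $plural;
        }
        return $word . 's';                                 # word -> words
    }

    if ($command eq 'possessive') {
        # Concatenate rather than write "$word's": inside double quotes Perl
        # reads ' as an old-style package separator ($word::s).
        if ($qty > 1) {
            my $plural = grammarAdjust('plural', $word, $qty);
            return $plural =~ /s$/ ? $plural . "'" : $plural . "'s";  # jameses', men's
        }
        return $word . "'s" if $word =~ /es$/;   # james -> james's (original rule)
        return $word . "'"  if $word =~ /s$/;    # boss  -> boss'   (original rule)
        return $word . "'s";                     # table -> table's
    }

    return;    # unknown command: undef, as in the original
}

print grammarAdjust('plural', 'monkey', 2), "\n";
```

The plural possessive is built from the plural, so irregular plurals such as "men" get "men's" instead of "mans'".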


------------------------------

Date: 8 Dec 2004 02:27:43 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: grammar function
Message-Id: <Xns95B8DA527694Dasu1cornelledu@132.236.56.8>

Ken Sington <ken_sington@nospam_abcdefg.com> wrote in
news:HJmdnb8C-OEc-CvcRVn-pw@speakeasy.net: 

> I've been working on a simple one word grammar checker.
> 
> I'm hoping other's can add ideas to it.
> 
> you access the function like this:
> my $word = grammarAdjust("command", "word", <quantity>);
> 
> so table becomes tables if there are  more than one.
> chair remains chair if there are one or zero.

Have you looked at Lingua::EN::Inflect?

http://search.cpan.org/~dconway/Lingua-EN-Inflect-1.88/

Sinan


------------------------------

Date: Tue, 7 Dec 2004 21:33:08 -0500
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: grammar function
Message-Id: <yJttd.32014$dC3.684367@news20.bellglobal.com>


"Ken Sington" <ken_sington@nospam_abcdefg.com> wrote in message 
news:HJmdnb8C-OEc-CvcRVn-pw@speakeasy.net...
> I've been working on a simple one word grammar checker.
>
> I'm hoping other's can add ideas to it.
>
> you access the function like this:
> my $word = grammarAdjust("command", "word", <quantity>);
>
[snip]
> $returnIt = $word =~ s/ey$/ies/; # monkey -> monkies
                                               ^^^^^^^

There's one good reason not to venture into computerized grammar checking. 
Take a look at how bad most spell checkers are and you'll find ample reason 
not to follow this road to madness... : )

Matt 




------------------------------

Date: Tue, 07 Dec 2004 21:51:30 -0500
From: Ken Sington <ken_sington@nospam_abcdefg.com>
Subject: Re: grammar function
Message-Id: <Qr2dnUlwRLWb8SvcRVn-3Q@speakeasy.net>

Matt Garrish wrote:
>                                                ^^^^^^^
> 
> There's one good reason not to venture into computerized grammar checking. 
> Take a look at how bad most spell checkers are and you'll find ample reason 
> not to follow this road to madness... : )
> 
> Matt 
> 
> 
too late, I'm already a lunatic.
I went nuts years ago.


------------------------------

Date: Tue, 07 Dec 2004 21:53:57 -0500
From: Ken Sington <ken_sington@nospam_abcdefg.com>
Subject: Re: grammar function
Message-Id: <Qr2dnUhwRLUH8SvcRVn-3Q@speakeasy.net>

A. Sinan Unur wrote:

> Have you looked at Lingua::EN::Inflect?
> 
> http://search.cpan.org/~dconway/Lingua-EN-Inflect-1.88/
> 
> Sinan
Lingua? I saw Mr. Conway live, in person.

So many possibilities. I would never have guessed.

I searched only for "spell" and "grammar" and pretty much found nothing (... that I wanted).


------------------------------

Date: Tue, 7 Dec 2004 22:15:25 -0500
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: grammar function
Message-Id: <blutd.32317$dC3.698027@news20.bellglobal.com>


"A. Sinan Unur" <1usa@llenroc.ude.invalid> wrote in message 
news:Xns95B8DA527694Dasu1cornelledu@132.236.56.8...
> Ken Sington <ken_sington@nospam_abcdefg.com> wrote in
> news:HJmdnb8C-OEc-CvcRVn-pw@speakeasy.net:
>
>> I've been working on a simple one word grammar checker.
>>
>> I'm hoping other's can add ideas to it.
>>
>> you access the function like this:
>> my $word = grammarAdjust("command", "word", <quantity>);
>>
>> so table becomes tables if there are  more than one.
>> chair remains chair if there are one or zero.
>
> Have you looked at Lingua::EN::Inflect?
>
> http://search.cpan.org/~dconway/Lingua-EN-Inflect-1.88/
>

At least he has a sense of humour about it... : )

<doc quote>
BUGS AND IRRITATIONS

The endless inconsistencies of English.
</doc quote>

Matt 




------------------------------

Date: 7 Dec 2004 18:33:39 -0800
From: "Jorge" <awkster@yahoo.com>
Subject: horizontal join of array elements
Message-Id: <1102473219.758126.11630@z14g2000cwz.googlegroups.com>

Hi, it's me again -- Jorge

My input file is a list of x/y coordinates representing a geometry, with
each vertex listed vertically on separate lines. There are several
shapes in the list; each one always has a START delimiter pattern of
/initial/ and will always have a STOP delimiter of either /polygon/ or
/attribute/. The /terminal/ lines between the delimiters need to be
retained.

I need to gather each set of vertices into a horizontal string
separated by whitespace.

So far I have managed to isolate the individual groups (after cleaning
them) and then I push the members of each group into an array with the
hopes of joining them on whitespace.

The program works (I think) until I apply the join command at which
time it still returns a vertical listing where I had expected a
horizontal string of all the elements.

I've been through perlfaq5/6 and other places and cannot see what I
did wrong with this join command.

Any help is appreciated, Jorge.

Code begins here ....

open (FILE, "file.txt") or die "cant open file: $!";

my @arr;
my $arr;
my $clean_string;

while(<FILE>){
if(/initial|terminal/){
$clean_string = cleanser("$_");
push(@arr, "$clean_string");
}
elsif(/polygon|attribute/){
$clean_string = cleanser("$_");
push(@arr, "$clean_string");
}
$arr = join(" ", @arr);
print "$arr";
@arr = (); $arr = "";
}

close(FILE);

sub cleanser{
$_[0] =~ s/\$\$initial\(\[//g;
$_[0] =~ s/\$\$terminal\(\[//g;
$_[0] =~ s/\]\, \, \@nosnap \)\;//g;
$_[0] =~ s/\] \)\;//g; s/\]\)\;//g;
return $_[0];
}

file.txt begins here ...

$$initial([-0.086,0.062], , @nosnap ); # <- START
$$terminal([-0.052,0.062] );
$$terminal([-0.052,0.138] );
$$terminal([-0.061,0.138] );
------- snip ------
$$terminal([-0.061,-0.114] );
$$terminal([-0.052,-0.114] );
$$terminal([-0.052,-0.062] );
$$terminal([-0.086,-0.062] );
$$polygon( "POWER" );		#<- STOP
$$initial([-0.046,-0.022], , @nosnap ); #<- START
$$terminal([-0.012,-0.022] );
$$terminal([-0.012,-0.154] );
$$terminal([-0.021,-0.154] );
-------- snip --------
$$terminal([0.012,-0.154] );
$$terminal([0.012,-0.022] );
$$terminal([0.064,-0.022] );
$$terminal([0.064,0.022] );
$$attribute(...);		#<- STOP
 .
 .
 .
     more of the same
           .
           .



------------------------------

Date: 8 Dec 2004 03:11:23 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: horizontal join of array elements
Message-Id: <Xns95B8E1B8D879Dasu1cornelledu@132.236.56.8>

"Jorge" <awkster@yahoo.com> wrote in 
news:1102473268.122656.75950@f14g2000cwb.googlegroups.com:

> The program works (I think) until I apply the join command at which
> time it still returns a vertical listing where I had expected a
> horizontal string of all the elements.

This is clear as mud. I hope you have considered using Data::Dumper to look 
into your data structures and also run this script in the debugger to see 
what is happening at each stage.

> Code begins here ....

It is best to format your code so it can easily be read by other people.

> open (FILE, "file.txt") or die "cant open file: $!";
> 
> my @arr;
> my $arr;
> my $clean_string;

You don't need $clean_string in this scope.
 
> while(<FILE>){
> if(/initial|terminal/){
> $clean_string = cleanser("$_");

Useless use of quotes.

> push(@arr, "$clean_string");

Useless use of quotes.

> }

Use capturing regular expressions to extract the part of the string that 
you are interested in.

> elsif(/polygon|attribute/){
> $clean_string = cleanser("$_");

Useless use of quotes.

> push(@arr, "$clean_string");

Useless use of quotes.

> }

You are doing the exact same thing in the 'if' and the 'elsif'.
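Dropping the quotes in cleanser("$_") changes behavior, not just style: cleanser edits $_[0], and @_ aliases the caller's arguments, so cleanser($_) would rewrite $_ in place, while cleanser("$_") only edits a temporary copy. (The second substitution on the last line of cleanser, s/\]\)\;//g, has no $_[0] =~ in front of it, so it works on the global $_ rather than on the argument.) A small sketch with a hypothetical strip_prefix helper, written in the same style as cleanser, shows the difference:

```perl
use strict;
use warnings;

# Hypothetical helper written like cleanser(): it edits $_[0] in place.
sub strip_prefix {
    $_[0] =~ s/^\$\$terminal\(\[//;
    return $_[0];
}

my $line = '$$terminal([-0.052,0.062] );';

my $from_copy  = strip_prefix("$line");   # quoted: a temporary copy is edited
my $after_copy = $line;                   # still the original text

my $from_alias  = strip_prefix($line);    # unquoted: $_[0] is $line itself
my $after_alias = $line;                  # now '-0.052,0.062] );'

print "$after_copy\n$after_alias\n";
```

Both calls return the same cleaned text; only the unquoted call also changes the caller's variable.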

Based on your verbal description, I came up with something. It would be 
better if you could give an example of the output you expect from the data 
you showed. This deals with only part of the data you posted because I am 
not sure what to do with 'attribute' etc.

#! perl

use strict;
use warnings;

my @x;
my @y;

my %polygons;

while(<DATA>) {
    if( /^\$\$polygon\(\s*"(\w+)"\s*\);$/ ) {
        $polygons{$1} = { x => [ @x ], y => [ @y ] };
        @x = @y = ();
        next;
    }
    if( /^\$\$initial\(\s*\[(.+)\]/
        || /^\$\$terminal\(\s*\[(.+)\]/) {
        my ($x, $y) = split /,/, $1;
        push @x, $x;
        push @y, $y;
        next;
    }
}

for my $p (values %polygons) {
    print 'x => ', join(' ', @{ $p->{x} }), "\n";
    print 'y => ', join(' ', @{ $p->{y} }), "\n";
}


__DATA__
$$initial([-0.086,0.062], , @nosnap );
$$terminal([-0.052,0.062] );
$$terminal([-0.052,0.138] );
$$terminal([-0.061,0.138] );
$$terminal([-0.061,-0.114] );
$$terminal([-0.052,-0.114] );
$$terminal([-0.052,-0.062] );
$$terminal([-0.086,-0.062] );
$$polygon( "POWER" );
__END__

D:\Home>perl t6.pl
x => -0.086 -0.052 -0.052 -0.061 -0.061 -0.052 -0.052 -0.086
y => 0.062 0.062 0.138 0.138 -0.114 -0.114 -0.062 -0.062


------------------------------

Date: 7 Dec 2004 20:07:04 -0800
From: "Jorge" <awkster@yahoo.com>
Subject: Re: horizontal join of array elements
Message-Id: <1102478824.153598.45460@z14g2000cwz.googlegroups.com>

For starters, I had the post formatted when I pasted it into the dialog;
maybe I need to check whether my editor works correctly with
cut-and-paste.

I expect the output to look like this for each group ...

-0.052,0.062 -0.052,0.138 -0.061,0.138 -0.061,-0.114 -0.052,-0.114
-0.052,-0.062 -0.086,-0.062 polygon( "POWER

The last token can also be ATTRIBUTE, if that is the case.
I apologize for the confusion.

Jorge



------------------------------

Date: 8 Dec 2004 04:19:21 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: horizontal join of array elements
Message-Id: <Xns95B8ED3D1C816asu1cornelledu@132.236.56.8>

"Jorge" <awkster@yahoo.com> wrote in 
news:1102478824.153598.45460@z14g2000cwz.googlegroups.com:

> I expect the output to look like this for each group ...
> 
> -0.052,0.062 -0.052,0.138 -0.061,0.138 -0.061,-0.114 -0.052,-0.114
> -0.052,-0.062 -0.086,-0.062 polygon( "POWER

Please quote some context. From your original post:

$$initial([-0.086,0.062], , @nosnap ); # <- START
$$terminal([-0.052,0.062] );
$$terminal([-0.052,0.138] );
$$terminal([-0.061,0.138] );
------- snip ------
$$terminal([-0.061,-0.114] );
$$terminal([-0.052,-0.114] );
$$terminal([-0.052,-0.062] );
$$terminal([-0.086,-0.062] );
$$polygon( "POWER" );

Or are you now saying that you actually want to disregard the initial points?

Anyway, it should be trivial to modify the code I posted to do this.

Sinan.
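The original loop prints vertically because the join, the print and the @arr = () reset all run once per input line, so every "group" holds a single element, and that element still carries its newline because nothing is chomped. A minimal sketch of the fix collects vertices until a STOP line and joins once per shape. It assumes the initial vertex should be kept (drop initial| from the first pattern if not) and that the STOP token should come out as polygon( "POWER" ) or attribute(...); the inline @input array stands in for file.txt.

```perl
use strict;
use warnings;

# Inline sample standing in for file.txt (two shapes).
my @input = (
    '$$initial([-0.086,0.062], , @nosnap ); # <- START',
    '$$terminal([-0.052,0.062] );',
    '$$terminal([-0.086,-0.062] );',
    '$$polygon( "POWER" );  #<- STOP',
    '$$initial([-0.046,-0.022], , @nosnap ); #<- START',
    '$$terminal([0.064,0.022] );',
    '$$attribute(...);  #<- STOP',
);

my @group;    # "x,y" tokens of the shape currently being read
my @lines;    # one joined line per shape

for my $line (@input) {
    if ($line =~ /^\$\$(?:initial|terminal)\(\s*\[([^\]]+)\]/) {
        push @group, $1;                    # capture only the coordinates
    }
    elsif ($line =~ /^\$\$(polygon|attribute)\((.*)\);/) {
        push @group, "$1($2)";              # keep the STOP token as well
        push @lines, join(' ', @group);     # join once per shape ...
        @group = ();                        # ... and only then start over
    }
}

print "$_\n" for @lines;
```

For the real file, replace the for loop header with open my $fh, '<', 'file.txt' or die "can't open file.txt: $!"; and while (my $line = <$fh>). Because the patterns capture only the text inside the brackets, the trailing newline never reaches the output.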


------------------------------

Date: 7 Dec 2004 20:23:37 -0800
From: "Jorge" <awkster@yahoo.com>
Subject: Re: horizontal join of array elements
Message-Id: <1102479817.758540.136010@z14g2000cwz.googlegroups.com>

Sinan

You have given me more than enough to work with and I thank you for
taking the time to do so.

Jorge



------------------------------

Date: Wed, 08 Dec 2004 04:50:14 GMT
From: "VBSome" <not@home.net>
Subject: Re: PerlCom Replacements
Message-Id: <aKvtd.27807$fC4.14389@newssvr11.news.prodigy.com>

> Are there no modules on CPAN? I have never used COM but surely someone
> else has independently made a tool.

So far I've only seen modules for using COM from inside Perl, not Perl from inside 
COM.

>
> Was PerlCom based on an open source effort, or was it proprietry from
> Active State? If open source, how do you feel about building it yourself
> from the (presumably) free source?
>
It was proprietary ActiveState stuff. The basically said "sorry don't 
support it anymore". Can;t believe nobody else is trying to do the same 
thing. They have a "replacement called: PerlCtrl. It builds standalone 
ActiveX controls from a Perl script. If I wanted that I would build it in 
one of the MS languages to begin with. I want to extend my app with a script 
language!

Building it myself? I am good, but not let's build a COM interface for Perl 
good *grin*





------------------------------

Date: Tue, 07 Dec 2004 20:40:14 -0600
From: Alan Mead <amead@comcast.net>
Subject: Reading poorly structured data
Message-Id: <pan.2004.12.08.02.40.13.661534@comcast.net>

I have five files of contact info (one for each year of a conference). 
All five have slightly different, fairly unstructured formats.  One looks
like this:

Bush, George, President, 1 White House Way, Washington, 
DC 00000; gbush@whitehouse.gov
Kerry, John, 1 Main, Detroit, MI 00000; jkerry@yahoo.com
Williams, Robin, 2 Main, Burbank, CA 00000
Newman, Paul, President and Principal Spokesperson,
Paul Newmans's Own Brand Foods, 123 Main Street, 
Olympia Fields, WY 00000; paul@newmans.org
Blair, Tony, 1 Downing Street, London, UK 0000000
 ... etc..

So the fields are comma-separated, except for email which may be absent,
and the record may be split over two or three lines.

In a later file dozens of records appear on the same line.

I'd like to output

lname=Bush
fname=George
address=President, 1 White House Way, Washington, DC 00000
email=gbush@whitehouse.gov

Any ideas how to parse this using Perl? So far I can parse about 60% of
the records with the below hack. It gets tripped up when the number
of commas in a record is large (some people have five lines of 
address with embedded commas) in which cases it will parse the 
first half of the record fairly well and then try to parse the 
next half as a new record.

-Alan

my $i=0;
while($i<=$count) {
  $i++;
  my($lname,$fname,$address,$email)=('','','','');
  my $line = $lines{$i}; 
  if ($line =~ /[,;]$/) { # clearly more on next line
    $lines{$i+1} = "$line $lines{$i+1}";
    next;
  }
  if ( (scalar split/,/,$line) > 4) { # a proper name and address will
                                      # have at least 5 parts
    if ($line =~ /@/) {
      my @bits = split(/;/,$line); # email is last element when split 
                                   # on semicolons, so save it
      $email = pop(@bits);
      $line = join(';',@bits);     # put line back together (just
                                   # in case there's more than one
                                   # semi-colon in the record)
    }
    my @bits = split(/,/,$line);    # now split on commas 
    $lname = shift @bits;        # lname is first bit 
    $fname = shift @bits;        # followed by fname 
    $address = join(',',@bits);  # the rest is the address
  } else {
    $lines{$i+1} = "$line $lines{$i+1}";
    next;
  }
 ...
}




------------------------------

Date: 8 Dec 2004 04:04:53 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Reading poorly structured data
Message-Id: <Xns95B8EAC9DD89Dasu1cornelledu@132.236.56.8>

Alan Mead <amead@comcast.net> wrote in 
news:pan.2004.12.08.02.40.13.661534@comcast.net:

> I have five files of contact info (one for each year of a conference). 
> All five have slightly different fairly unstructured formats.  One looks
> like this:
> 
> Bush, George, President, 1 White House Way, Washington, 
> DC 00000; gbush@whitehouse.gov
> Kerry, John, 1 Main, Detroit, MI 00000; jkerry@yahoo.com
> Williams, Robin, 2 Main, Burbank, CA 00000
> Newman, Paul, President and Principal Spokesperson,
> Paul Newmans's Own Brand Foods, 123 Main Street, 
> Olympia Fields, WY 00000; paul@newmans.org
> Blair, Tony, 1 Downing Street, London, UK 0000000
> ... etc..

Here is somewhat of a kludge that "works" for the snippet you posted. Hope 
this helps.

#! perl

use strict;
use warnings;

use File::Slurp;

my $input = read_file(\*DATA);
$input =~ tr/\n/ /;

my @records;

while(length $input) {
    my %record;
    $record{lname} = grab_name($input);
    $record{fname} = grab_name($input);
    $input =~ /[A-Z]{2} \d+/g;
    $record{address} = substr $input, 0, pos($input);
    $input = substr $input, pos($input);
    if($input =~ /^;\s*(\w+\@\w+\.\w+)\s*/g) {
        $record{email} = $1;
        $input = substr $input, pos $input;
    }
    push @records, \%record;
}

use Data::Dumper;
print Dumper \@records;
    
sub grab_name {
    my $off = index $_[0], ',';
    my $name = substr $_[0], 0, $off;
    $_[0] = substr $_[0], $off + 2;
    return $name;
}

__DATA__
Bush, George, President, 1 White House Way, Washington, 
DC 00000; gbush@whitehouse.gov
Kerry, John, 1 Main, Detroit, MI 00000; jkerry@yahoo.com
Williams, Robin, 2 Main, Burbank, CA 00000
Newman, Paul, President and Principal Spokesperson,
Paul Newmans's Own Brand Foods, 123 Main Street, 
Olympia Fields, WY 00000; paul@newmans.org
Blair, Tony, 1 Downing Street, London, UK 0000000




------------------------------

Date: Tue, 07 Dec 2004 22:29:11 -0600
From: Alan Mead <amead@comcast.net>
Subject: Re: Reading poorly structured data
Message-Id: <pan.2004.12.08.04.29.10.851237@comcast.net>

On Wed, 08 Dec 2004 04:04:53 +0000, A. Sinan Unur wrote:

> Here is somewhat of a kludge that "works" for the snippet you posted. Hope 
> this helps.
> 
> #! perl
> use strict;
> use warnings;
> use File::Slurp;
> my $input = read_file(\*DATA);
> $input =~ tr/\n/ /;
> my @records;
> while(length $input) {
>     my %record;
>     $record{lname} = grab_name($input);
>     $record{fname} = grab_name($input);
>     $input =~ /[A-Z]{2} \d+/g;
>     $record{address} = substr $input, 0, pos($input);
>     $input = substr $input, pos($input);
>     if($input =~ /^;\s*(\w+\@\w+\.\w+)\s*/g) {
>         $record{email} = $1;
>         $input = substr $input, pos $input;
>     }
>     push @records, \%record;
> }
[...]

And so it does very nicely.  I think you are making use of the fact that
these all had a pair of capital letters near the end (including the
convenient UK) but there is a 'D.C.' in my data and some other
addresses outside the US (that lack this feature).  I should have included
a better sample. But this may get me to 95% ... The way you've slurped the
file makes this perfectly applicable to the rest of the files which is a
REALLY BIG help.

Thanks!

-Alan


------------------------------

Date: 8 Dec 2004 04:58:49 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Reading poorly structured data
Message-Id: <Xns95B8F3ED5DCB9asu1cornelledu@132.236.56.8>

Alan Mead <amead@comcast.net> wrote in
news:pan.2004.12.08.04.29.10.851237@comcast.net: 

> On Wed, 08 Dec 2004 04:04:53 +0000, A. Sinan Unur wrote:
> 
>>     $input =~ /[A-Z]{2} \d+/g;
 ...

> And so it does very nicely.  I think you are making use of the fact
> that these all had a pair of capital letters near the end (including
> the convenient UK) but there is a 'D.C.' in my data and some other
> addresses outside the US (that lack this feature).

Actually, that is a standing for some kind of Country/State Code with 
numeric postal code match because all your addresses seemed to end with 
that. 

The "two capital letters followed by some digits as end of mailing address 
indicator" was one of the things that made the code kludgy.

I am sure others will provide better ways once the sun comes up. Good luck.

Sinan.
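One way to avoid relying on a two-letter code before the postal number (which breaks on 'D.C.' and many non-US addresses) is to lean on the line layout instead, as the original hack already does: a record continues onto the next line when a line ends with a comma or semicolon. A minimal sketch under that assumption, using a subset of the posted sample inline. It does not cover the later file with many records per line. It glues continuation lines, peels off a trailing "; email" if present, then splits off the last and first names with a limit of 3 so the address keeps its commas.

```perl
use strict;
use warnings;

# A subset of the posted sample, one element per physical line.
my @raw = (
    'Bush, George, President, 1 White House Way, Washington, ',
    'DC 00000; gbush@whitehouse.gov',
    'Williams, Robin, 2 Main, Burbank, CA 00000',
    'Newman, Paul, President and Principal Spokesperson,',
    "Paul Newmans's Own Brand Foods, 123 Main Street, ",
    'Olympia Fields, WY 00000; paul@newmans.org',
    'Blair, Tony, 1 Downing Street, London, UK 0000000',
);

# Step 1: a line ending in a comma or semicolon continues on the next line.
my @joined;
for my $line (@raw) {
    if (@joined && $joined[-1] =~ /[,;]\s*$/) {
        $joined[-1] =~ s/\s*$/ /;       # normalise the break to one space
        $joined[-1] .= $line;
    }
    else {
        push @joined, $line;
    }
}

# Step 2: split each logical record into fields.
my @records;
for my $rec (@joined) {
    my %r;
    $rec =~ s/\s+$//;
    # An email address, if any, follows the last semicolon.
    $r{email} = $1 if $rec =~ s/;\s*(\S+\@\S+)$//;
    # Limit 3: last name, first name, and everything else as the address.
    @r{qw(lname fname address)} = split /\s*,\s*/, $rec, 3;
    push @records, \%r;
}

# Print in the key=value layout asked for in the original post.
for my $r (@records) {
    for my $key (qw(lname fname address email)) {
        print "$key=$r->{$key}\n" if defined $r->{$key};
    }
    print "\n";
}
```

A record whose last address line happens to end in a comma would still swallow the next record, so it is worth spot-checking records with unusually long addresses.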


------------------------------

Date: 8 Dec 2004 05:00:36 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Reading poorly structured data
Message-Id: <Xns95B916EAB76asu1cornelledu@132.236.56.8>

"A. Sinan Unur" <1usa@llenroc.ude.invalid> wrote in
news:Xns95B8F3ED5DCB9asu1cornelledu@132.236.56.8: 

> Actually, that is a standing for some kind of Country/State Code with 
                      ^^^^^^^^
I meant 'stand-in'. Sorry.

Sinan


------------------------------

Date: Tue, 07 Dec 2004 19:46:52 -0700
From: Scott Bryce <sbryce@scottbryce.com>
Subject: Re: RegExp Help
Message-Id: <5tWdnUSl__GB9ivcRVn-2Q@comcast.com>

Andrew Hamm wrote:

> so that looks like a pre-existing text file to me.

No doubt.

> That's why I think it's
> too late to perform a fix where you suggest.

Huh?

I'm not suggesting fixing the file. I am only suggesting an easier way 
to drop trailing zeroes to the right of the decimal than a regex.

> Still, only Indigo knows for
> sure, and it would be nice if Indigo could satisfy our curiosity with a
> bit of info about where it's coming from

That doesn't matter. Once his script has the data, the source of the 
data is irrelevant, if all he wants to do is strip trailing zeroes to 
the right of the decimal.

Or did I miss something?

OK, taking another look at the original post, it looks like he wants to 
retain the formatting of each line. If that is the case, you have a point.



------------------------------

Date: Wed, 8 Dec 2004 14:45:43 +1100
From: "Andrew Hamm" <ahamm@mail.com>
Subject: Re: RegExp Help
Message-Id: <31nbn9F3c9pl3U1@individual.net>

Scott Bryce wrote:
>
> OK, taking another look at the original post, it looks like he wants
> to retain the formatting of each line. If that is the case, you have
> a point.

yup - that was my interpretation. It struck a chord for me because in the
business data world, that's the sort of crap we sometimes have to do;
process outside data which can be in as bad a format as a captured report.
I sincerely hope that XML will soon make report-scraping, CSV, "unload
files" and plain text files nothing but a fuzzy memory.

I'm also kind of wishing that there might be integers in the report which
need protecting against loss of trailing zeros. I'm letting my brain
"background" a solution because it's interesting. A bit of progress has
already popped up. I'm not sure it won't be another step up in difficulty,
but I'm not willing to put the time in to play yet. I really shouldn't be
posting messages either...
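The "easier way" is not shown in this digest; a common one is numeric context ($n + 0), which drops trailing zeros but re-renders each number, so it does not keep a report's layout. When the line's formatting must survive and integers such as 100 must not lose their zeros, a substitution confined to digits after a decimal point is one option. A minimal sketch of both, on a made-up sample line:

```perl
use strict;
use warnings;

# Numeric context: trailing zeros vanish, but each number is re-rendered.
my @numeric = map { $_ + 0 } qw(1.2500 3.000 10.50 100);

# Text substitution limited to digits after a decimal point, so spacing and
# integers such as 100 are left alone.
sub strip_trailing_zeros {
    my ($line) = @_;
    $line =~ s/(\d\.\d*?[1-9])0+\b/$1/g;   # 1.2500 -> 1.25, 10.50 -> 10.5
    $line =~ s/(\d)\.0+\b/$1/g;            # 3.000  -> 3
    return $line;
}

my $in  = 'Total   1.2500 items 100 price 3.000 x 10.50';
my $out = strip_trailing_zeros($in);

print "@numeric\n";   # 1.25 3 10.5 100
print "$out\n";
```

The regex version leaves the run of spaces after "Total" and the integer 100 untouched; column widths still shrink where zeros are removed.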




------------------------------

Date: Wed, 8 Dec 2004 10:17:48 +0800
From: "╒╘▒√╖х" <zhao_bingfeng@topsec.com.cn>
Subject: why the following HereDoc print don't work?
Message-Id: <cp5ofh$16vj$1@mail.cn99.com>

In a script that generates a C source file, I wrote:

<CODE>
 ...
print SOURCE <<EOF;
default:
return 0;
}

return ret;
}
EOF
 ...
</CODE>

Perl complained "Can't find string terminator "EOF" anywhere before EOF at
XXX(line No.)".
What happened and why?









------------------------------

Date: 8 Dec 2004 02:31:16 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: why the following HereDoc print don't work?
Message-Id: <Xns95B8DAEC0A2ABasu1cornelledu@132.236.56.8>

"╒╘▒√╖х" <zhao_bingfeng@topsec.com.cn> wrote in
news:cp5ofh$16vj$1@mail.cn99.com: 

> In a generator of C code file script, I wrote:
> 
> <CODE>
> ...
> print SOURCE <<EOF;
> default:
> return 0;
> }
> 
> return ret;
> }
> EOF
> ...
> </CODE>
> 
> Perl complained "Can't find string terminator "EOF" anywhere before
> EOF at XXX(line No.)".
> What happened and why?

Show real code.

D:\Home>cat t8.pl
use strict;
use warnings;

open SOURCE, '>-' or die $!;

print SOURCE <<EOF;
default:
return 0;
}

return ret;

EOF


D:\Home>perl t8.pl
default:
return 0;
}

return ret;

Sinan


------------------------------

Date: Wed, 8 Dec 2004 14:29:40 +1100
From: "Andrew Hamm" <ahamm@mail.com>
Subject: Re: why the following HereDoc print don't work?
Message-Id: <31nap6F3d6tgkU1@individual.net>

╒╘▒√╖х wrote:
>
> Perl complained "Can't find string terminator "EOF" anywhere before
> EOF at XXX(line No.)".
> What happened and why?

Agree with Sinan, but then, showing real code might be difficult: many
newsreaders seem to rip out leading tabs and spaces when you make a
posting. From your lack of indentation, either you are a messy programmer,
or the newsreader you use has modified your message.

The EOF needs to be in the very left column. It should not be indented. If
the complete lack of indentation in your posting is how you really have
it in your script, then it looks to me like it should work; indeed, a test
does work for me.

Summary: put EOF in the very left column of the line, with nothing else on
that line (not even trailing spaces), and you should be happy.
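The terminator has to match the whole line exactly, so leading indentation and trailing whitespace after EOF both produce this error. A small sketch that compiles three variants with string eval (the snippet text is made up for illustration) and reports which ones find the terminator:

```perl
use strict;
use warnings;

# Only the terminator line differs between the cases.
my %terminator = (
    'flush left'     => 'EOF',
    'indented'       => '    EOF',
    'trailing space' => 'EOF ',
);

my %compiles;
for my $case (sort keys %terminator) {
    # Source being compiled:  my $s = <<EOF; / body / terminator / 1;
    my $code = "my \$s = <<EOF;\nreturn ret;\n}\n$terminator{$case}\n1;\n";
    my $ok   = eval $code;                  # undef if compilation failed
    $compiles{$case} = defined $ok ? 1 : 0;
    if ($compiles{$case}) { print "$case: ok\n" }
    else                  { print "$case: $@" }   # $@ already ends in a newline
}
```

Only the flush-left case compiles. Perl 5.26 and later also accept <<~EOF, which lets the body and terminator be indented.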




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 7494
***************************************

