[31422] in Perl-Users-Digest


Perl-Users Digest, Issue: 2674 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Nov 14 00:09:44 2009

Date: Fri, 13 Nov 2009 21:09:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 13 Nov 2009     Volume: 11 Number: 2674

Today's topics:
    Re: How to get a program output into array <tadmc@seesig.invalid>
    Re: How to get a program output into array <ben@morrow.me.uk>
    Re: How to get a program output into array <uri@StemSystems.com>
    Re: How to get a program output into array <ben@morrow.me.uk>
    Re: How to identify double bytes language? <nospam-abuse@ilyaz.org>
    Re: How to identify double bytes language? <ben@morrow.me.uk>
    Re: How to identify double bytes language? <sqlcamel@yahoo.com.hk>
        Please help with processing flat file <nobody@nowhere.com>
    Re: Please help with processing flat file <ben@morrow.me.uk>
    Re: Please help with processing flat file <tadmc@seesig.invalid>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 13 Nov 2009 15:03:33 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: How to get a program output into array
Message-Id: <slrnhfri7t.nvf.tadmc@tadbox.sbcglobal.net>

Thomas Barth <txbarth@web.de> wrote:

>      open(SOXIN, "sox $path -r 8000 -c 1 $src_dir/$basename.vox stat |");


You should always, yes *always*, check the return value from open:

    open(SOXIN, "sox $path -r 8000 -c 1 $src_dir/$basename.vox stat |")
        or die "could not open the sox program: $!";

see also:

    perldoc -q pipe

        Why doesn't open() return an error when a pipe open fails?


>      close(SOXIN);


So with "pipe open"s you should also check the return value from close().
If sox runs but exits with a non-zero status, $! is 0 and the status is in $?:

    close(SOXIN) or die "problem running sox: $! (wait status $?)";
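A minimal sketch of the same two checks, using a lexical filehandle and the list form of a piped open. It assumes Perl 5.8 or later on a Unix-like system. Because sox may not be installed, the running perl (`$^X`) stands in for it, and the sox argument list in the comment is only illustrative.

```perl
use strict;
use warnings;

# Stand-in command so the sketch runs without sox: the running perl
# ($^X) prints two lines. A real call might be a list such as
# ('sox', $path, '-r', 8000, '-c', 1, "$src_dir/$basename.vox", 'stat').
my @cmd = ($^X, '-e', 'print "line one\nline two\n"');

# List-form piped open (Perl 5.8+, Unix-like systems): no shell runs,
# so spaces or quotes in file names cannot break the command.
open(my $pipe, '-|', @cmd)
    or die "could not start $cmd[0]: $!";

chomp(my @lines = <$pipe>);

# close() on a piped open waits for the child. It returns false if the
# child exited non-zero; in that case $! is 0 and $? holds the status.
close($pipe)
    or die $! ? "error closing pipe: $!\n"
              : "$cmd[0] exited with status " . ($? >> 8) . "\n";

print "$_\n" for @lines;
```

The list form has no shell, so there is no way to write `2>&1` with it. For a program that reports on STDERR (as sox's stat turns out to do later in this thread), the string form with a redirection, or the core IPC::Open3 module, may be more practical.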


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Fri, 13 Nov 2009 23:09:04 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to get a program output into array
Message-Id: <gh51t6-q2p2.ln1@osiris.mauzo.dyndns.org>


Quoth "Uri Guttman" <uri@StemSystems.com>:
>   >> its cleared, I got it with the command
>   >> open(SOXIN, "sox $path -r 8000 -c 1 $src_dir/$basename.vox stat 2>&1 |");
> 
>   sr> Or this:
> 
>   sr> my @soxin = split /\n/, qx/ sox $path -r 8000 -c 1 $src_dir/
>   sr> $basename.vox stat /;
> 
> no need for the split. backticks/qx will split on \n in a list context.
> 
> also that won't work as you are using / for the delimiter and / is on
> the data. so use another delimiter and {} is usually the best choice
> there.

Also, that won't work since qx// doesn't automatically do the 2>&1 that
was missing in the first place...

Ben



------------------------------

Date: Fri, 13 Nov 2009 19:01:47 -0500
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: How to get a program output into array
Message-Id: <878weaht3o.fsf@quad.sysarch.com>

>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:

  BM> Quoth "Uri Guttman" <uri@StemSystems.com>:
  >> >> its cleared, I got it with the command
  >> >> open(SOXIN, "sox $path -r 8000 -c 1 $src_dir/$basename.vox stat 2>&1 |");
  >> 
  sr> Or this:
  >> 
  sr> my @soxin = split /\n/, qx/ sox $path -r 8000 -c 1 $src_dir/
  sr> $basename.vox stat /;
  >> 
  >> no need for the split. backticks/qx will split on \n in a list context.
  >> 
  >> also that won't work as you are using / for the delimiter and / is on
  >> the data. so use another delimiter and {} is usually the best choice
  >> there.

  BM> Also, that won't work since qx// doesn't automatically do the 2>&1 that
  BM> was missing in the first place...

yeah, i thought about that after i saw the other posts. it wasn't clear
from the OP that it was stderr coming out of sox (which is kind of odd
anyhow).

so the above code is wrong on 3 counts: no need for split, broken
delimiter and not redirecting stderr. not bad! :)
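Putting the three corrections together might look like the following sketch. It assumes a POSIX shell, and a small `sh -c` command stands in for sox so the example runs anywhere. The sox line in the comment is the thread's command with the corrections applied, not something tested here.

```perl
use strict;
use warnings;

# Stand-in for the sox command (sox may not be installed): a shell
# snippet that writes one line to STDOUT and one to STDERR. For sox it
# would be something like
#   qx{sox $path -r 8000 -c 1 $src_dir/$basename.vox stat 2>&1}
my $cmd = q{sh -c 'echo to-stdout; echo to-stderr >&2'};

# qx{...} sidesteps the '/' characters in file paths, list context
# returns one element per line (no split needed), and 2>&1 sends STDERR
# to the same place as STDOUT so qx captures both.
my @output = qx{$cmd 2>&1};
die "command failed with status ", $? >> 8, "\n" if $?;
chomp @output;

print "[$_]\n" for @output;
```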

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sat, 14 Nov 2009 01:12:03 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to get a program output into array
Message-Id: <3oc1t6-5t9.ln1@osiris.mauzo.dyndns.org>


Quoth "Uri Guttman" <uri@StemSystems.com>:
> 
> yeah, i thought about that after i saw the other posts. it wasn't clear
> from the OP that it was stderr coming out of sox (which is kind of odd
> anyhow).

IIRC sox usually expects to emit a (converted) sound file on STDOUT; the
'stat' filter presumably doesn't emit a sound file at all and instead
emits some analysis on STDERR.
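If the stat report does arrive on STDERR while STDOUT carries (or would carry) sound data, the shell can be told to capture just STDERR. A sketch assuming a POSIX shell. The stand-in command and its `stat-report` line are invented, not real sox output.

```perl
use strict;
use warnings;

# Invented stand-in for sox: "audio" on STDOUT, a made-up report line
# on STDERR. Neither line is real sox output.
my $cmd = q{sh -c 'echo fake-audio-data; echo stat-report >&2'};

# The shell applies redirections left to right: 2>&1 first points
# STDERR at the pipe qx reads from, then >/dev/null discards STDOUT.
# Net effect: only the STDERR report is captured.
my @report = qx{$cmd 2>&1 >/dev/null};
chomp @report;

print "$_\n" for @report;
```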

Ben



------------------------------

Date: Fri, 13 Nov 2009 21:58:13 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: How to identify double bytes language?
Message-Id: <slrnhfrljl.pil.nospam-abuse@powdermilk.math.berkeley.edu>

On 2009-11-13, sqlcamel <sqlcamel@yahoo.com.hk> wrote:
> Hello,
>
> I have a text file, there are some double-bytes words in it, like
> Chinese, Japanese.
> Is there a way to identify them separately with Perl? Thanks.

As you can see, the posters may be confused about the meaning of your
question.

Myself, I think your question is about "how to guess which encoding it
is?".  But please be more specific...
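If the question is indeed "which encoding is this?", the core Encode::Guess module can test a byte string against a list of candidate encodings. A minimal sketch, assuming the sample is GB2312 (which Encode calls `euc-cn`). The bytes are the ones from the example sqlcamel posts later in the thread.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding()

# The raw bytes from the question: GB2312 ("euc-cn" in Encode) for
# three Chinese characters, embedded in ASCII text.
my $bytes = "There is a \xD6\xD0\xB9\xFA\xC8\xCB in the park.";

# guess_encoding() only chooses among the suspects it is given (plus
# ascii and utf8 by default). It returns an encoding object on success
# and an error string on failure.
my $enc = guess_encoding($bytes, 'euc-cn');
ref $enc or die "cannot guess encoding: $enc\n";

my $text = $enc->decode($bytes);
printf "guessed %s, %d characters\n", $enc->name, length $text;
```

With several East Asian suspects at once (say `euc-cn`, `big5-eten` and `shiftjis`), a short sample is often reported as ambiguous because their byte ranges overlap. In that case `$enc` holds an error string rather than an object, which is why the `ref` check matters.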

Ilya


------------------------------

Date: Fri, 13 Nov 2009 23:06:28 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to identify double bytes language?
Message-Id: <kc51t6-q2p2.ln1@osiris.mauzo.dyndns.org>


Quoth "Dr.Ruud" <rvtol+usenet@xs4all.nl>:
> Ben Morrow wrote:
> > Dr.Ruud:
> 
> >>> I have a text file, there are some double-bytes words in it, like
> >>> Chinese, Japanese.
> >>> Is there a way to identify them separately with Perl? Thanks.
> >> See
> >>    `perldoc perlopentut`,
> >>    `perldoc -f open`,
> >>    `perldoc open`,
> >>    `perldoc PerlIO`
> >> and look for "layer".
> > 
> > IMHO you should start with perldoc perlunitut and perldoc perlunicode.
> 
> I don't understand. Maybe you thought that UTF-16 was meant?

I didn't. The IO encoding is irrelevant: what matters is that once the
data gets into Perl it's in Unicode-marked strings, so you need to be
prepared to deal with them. Perl's handling of Unicode is...
interesting, for mostly good reasons.

> The data in the "double-byte" encoded files (probably Shift-JIS, GB2312 
> or Big5) will just become normal Perl strings if the right IO-layer is used.

No, they will become SvUTF8 strings, which (shouldn't, but do) behave
differently from byte strings under some circumstances.
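A small sketch of both points: reading GB2312 data through an `:encoding()` layer, as Dr.Ruud suggests, and seeing that the result is a character string with Perl's internal UTF-8 flag set, as Ben says. An in-memory filehandle stands in for a real file, and `euc-cn` is Encode's name for GB2312.

```perl
use strict;
use warnings;

# Six GB2312 bytes for three Chinese characters.
my $bytes = "\xD6\xD0\xB9\xFA\xC8\xCB";

# Read them through an :encoding() layer; the in-memory filehandle
# stands in for opening a real file.
open(my $fh, '<:encoding(euc-cn)', \$bytes) or die "open: $!";
my $text = do { local $/; <$fh> };
close $fh;

# After decoding, length() counts characters rather than bytes, and
# the string carries Perl's internal UTF-8 flag (the SvUTF8 flag).
printf "bytes: %d, characters: %d, utf8 flag: %s\n",
    length($bytes), length($text), utf8::is_utf8($text) ? 'on' : 'off';
```

`utf8::is_utf8` exposes an internal detail and is shown only for illustration. Ordinary code should care about characters versus bytes, not about the flag itself.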

Ben



------------------------------

Date: Fri, 13 Nov 2009 19:31:55 -0800 (PST)
From: sqlcamel <sqlcamel@yahoo.com.hk>
Subject: Re: How to identify double bytes language?
Message-Id: <1615aa71-6413-4acb-8129-18f41e154d0b@z3g2000prd.googlegroups.com>

Thanks for all the suggestions.
What I wanted is, for example, given the text piece below:

There is a 中国人 in the park.

So how can I extract the GB2312 word 中国人 from the text?

Thanks again.
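Two possible ways to pull the Chinese word out of that line, sketched under the assumption that the file is GB2312 (Encode's `euc-cn`). In that encoding the word is the six bytes D6 D0 B9 FA C8 CB. The first approach decodes and matches Han characters by Unicode script; the second stays with raw bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The line from the question as raw GB2312 bytes.
my $bytes = "There is a \xD6\xD0\xB9\xFA\xC8\xCB in the park.";

# Approach 1: decode to characters, then match runs of characters
# whose Unicode script is Han.
my $text  = decode('euc-cn', $bytes);
my @words = $text =~ /(\p{Han}+)/g;

# Encode back if the words must be written out as GB2312 bytes again.
my @gb_words = map { encode('euc-cn', $_) } @words;

# Approach 2: stay with bytes. A GB2312 double-byte character is a
# lead byte 0xA1-0xF7 followed by a trail byte 0xA1-0xFE; ASCII bytes
# are all below 0x80, so they never match.
my @raw = $bytes =~ /((?:[\xA1-\xF7][\xA1-\xFE])+)/g;

printf "%d Han run(s), %d GB2312 run(s)\n", scalar @words, scalar @raw;
```

The byte-level pattern also matches GB2312's full-width punctuation and symbols, and would misfire on text in another encoding, so the decoding approach is usually the safer of the two.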


On Nov 14, 5:58 AM, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> On 2009-11-13, sqlcamel <sqlca...@yahoo.com.hk> wrote:
>
> > Hello,
>
> > I have a text file, there are some double-bytes words in it, like
> > Chinese, Japanese.
> > Is there a way to identify them separately with Perl? Thanks.
>
> As you can see, the posters may be confused about the meaning of your
> question.
>
> Myself, I think your question is about "how to guess which encoding it
> is?".  But please be more specific...
>
> Ilya



------------------------------

Date: Sat, 14 Nov 2009 00:12:20 GMT
From: nobody <nobody@nowhere.com>
Subject: Please help with processing flat file
Message-Id: <EBmLm.188644$8m4.125180@en-nntp-07.dc1.easynews.com>

I'm trying to process flat files with many thousands of records.  In 
these files several rows comprise the information for a single customer.  
In the example __DATA__ below, I'm trying to fill the variables with the 
customer information while the customer number is 06020004293, then for 
customer number 07020000279, and finally customer number 09020000251.  I 
believe my problem is looping while the customer number remains the same, 
then move on to the next customer numbers.  I've been pulling my hair out 
with nested while and do loops.  I've included the desired output below.  
Here's what I'm working with so far:



#!/usr/bin/perl

use strict;
use warnings;

my (
  $Name,
  $City,
  $Street
);


while (<DATA>) {

  chomp;

  if (substr($_, 12, 1) eq 'A') {
    $Name = substr($_, 14, 17);
  }

  if (substr($_, 12, 1) eq 'B') {
    $City = substr($_, 14, 17);
  }

  if (substr($_, 12, 1) eq 'C') {
    $Street = substr($_, 33, 19);
  }
  


}


print "Name: $Name\n";
print "City: $City\n";
print "Street: $Street\n";


# Desired output:

#Name: Fred Flintstone
#City: Bedrock
#Street: 123 Bedrock Road

#Name: George Washington
#City: Washington D.C.
#Street: 

#Name: Joe Smith
#City: Smallville
#Street: 


__DATA__
06020004293 A Fred Flintstone   123 Bedrock Road
06020004293 B Bedrock            Gravel Pit
06020004293 C Loney Toons       123 Bedrock Road
07020000279 A George Washington 234 Washington Ave.
07020000279 B Washington D.C.   234 Washington Ave.
09020000251 A Joe Smith         54 Abbey Road
09020000251 B Smallville        54 Abbey Road 


------------------------------

Date: Sat, 14 Nov 2009 01:56:57 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Please help with processing flat file
Message-Id: <9cf1t6-b8a.ln1@osiris.mauzo.dyndns.org>


Quoth nobody <nobody@nowhere.com>:
> I'm trying to process flat files with many thousands of records.  In 
> these files several rows comprise the information for a single customer.  
> In the example __DATA__ below, I'm trying to fill the variables with the 
> customer information while the customer number is 06020004293, then for 
> customer number 07020000279, and finally customer number 09020000251.  I 
> believe my problem is looping while the customer number remains the same, 
> then move on to the next customer numbers.  I've been pulling my hair out 
> with nested while and do loops.  I've included the desired output below.  
> Here's what I'm working with so far:

Since you want to print out the information you have every time you see
a new customer number, you need to extract and remember the number from
each line. I'll make minimal additions to your code to achieve this,
then talk about general style later.

> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> my (
>   $Name,
>   $City,
>   $Street

    $Customer,
    $Last_Customer,

You will also need a comma after $Street. Note that Perl explicitly
allows you to include a trailing comma in lists, so if you always write
one you can add and remove lines without worrying about whether an
entry is the last one or not.

> );
> 
> while (<DATA>) {
> 
>   chomp;
> 

    # Get the customer number for the new line
    $Customer = substr($_, 0, 11);
    
    if (
        # ...we've seen at least one line already, and...
        defined $Last_Customer and 

        # ...the new line is for a different customer from the last...
        $Customer ne $Last_Customer
    ) {
        # ...print out the data for the old customer before we proceed
        # to extract the data for the new one.
        print "Name: $Name\n";
        print "City: $City\n";
        print "Street: $Street\n";
        print "\n";
    }

    # Remember which customer we were on for next time round the loop.
    $Last_Customer = $Customer;

>   if (substr($_, 12, 1) eq 'A') {
>     $Name = substr($_, 14, 17);
>   }
> 
>   if (substr($_, 12, 1) eq 'B') {
>     $City = substr($_, 14, 17);
>   }
> 
>   if (substr($_, 12, 1) eq 'C') {
>     $Street = substr($_, 33, 19);
>   }
>   
> }

We need to keep this final section in this version of the program, since
otherwise the very last customer will never get their information
printed. *However*, that fact should immediately make you say to
yourself 'I've just written the same thing twice. How could I have
avoided that?'.

> print "Name: $Name\n";
> print "City: $City\n";
> print "Street: $Street\n";

The first comment to make about style is, IMHO, that multiple 'print'
statements are always a bad idea. Perl has a special form of multi-line
quoting called 'here documents' which allow you to avoid that:

    print <<OUTPUT;
    Name: $Name
    City: $City
    Street: $Street

    OUTPUT

See the section "<<EOF" in perldoc perlop for more details.
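A runnable form of the here-document, building the text in a variable first. In a real script the body and the closing `OUTPUT` line must start in column 0 (the indentation in the post is just presentation); Perl 5.26 and later also accept `<<~OUTPUT` for an indented body. The sample values are taken from the desired output.

```perl
use strict;
use warnings;

my ($Name, $City, $Street) = ('Fred Flintstone', 'Bedrock', '123 Bedrock Road');

# Double-quoted here-doc: variables interpolate. The body and the
# terminator start in column 0; the blank line before OUTPUT becomes
# the blank line that separates one customer from the next.
my $report = <<"OUTPUT";
Name: $Name
City: $City
Street: $Street

OUTPUT

print $report;
```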

The second is that it would be much easier to split the line into fields
first, rather than picking out pieces as you need them. For this I would
use a regex, which will additionally let you check that the line looks
as you expect. So, I might write something like

    my @record = /^(\d{11}) ([ABC]) (.{17}) (.*)$/
        or die "Invalid record: [$_]";

which does rather a lot of things in one statement. First the /.../
expression matches $_ against the given pattern, and returns a list of
substrings. Start with perldoc perlretut to understand the syntax used
for the patterns. Next, the 'my @record =' takes that list of
substrings, and puts it in a newly-declared array. Finally, if the
pattern match failed, the whole expression is 'false', so the 'or die
"..."' will fire to alert you of the error. (The reason for putting the
offending line in [] in the error message is so you can easily see if
there is extra whitespace at either end.)

Using this array is then straightforward: the customer number is in
$record[0], the line code in $record[1], and the two data fields in
$record[2] and $record[3].
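Applied to one line of the sample data, the pattern splits it as follows. The sketch assumes the layout seen in that data: 11-digit customer numbers and a first data field 17 characters wide.

```perl
use strict;
use warnings;

my $line = '07020000279 A George Washington 234 Washington Ave.';

# 11-digit customer number, one-letter line code, a 17-character
# fixed-width field, then whatever is left of the line. The list
# assignment is false (0 elements) if the match fails.
my @record = $line =~ /^(\d{11}) ([ABC]) (.{17}) (.*)$/
    or die "Invalid record: [$line]";

print "customer=$record[0] code=$record[1]\n";
print "field1=[$record[2]] field2=[$record[3]]\n";
```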

(The next step would be to turn the printing into a subroutine, so you
don't have to duplicate the code, and to build up a hash for each
customer rather than using global variables; but this post is already
quite long enough... :).)
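Ben stops before the next step, so here is one way it might look: a sketch (not code from the thread) that keeps one hash per customer, formats it in a single subroutine, and emits it whenever the customer number changes. The mapping `%field_for`, the name `format_customer` and the in-memory copy of the sample data are illustrative assumptions; a real program would read the file or `<DATA>` instead.

```perl
use strict;
use warnings;

# Sample records from the question (in-memory copy for this sketch).
my $data = <<'END';
06020004293 A Fred Flintstone   123 Bedrock Road
06020004293 B Bedrock            Gravel Pit
06020004293 C Loney Toons       123 Bedrock Road
07020000279 A George Washington 234 Washington Ave.
07020000279 B Washington D.C.   234 Washington Ave.
09020000251 A Joe Smith         54 Abbey Road
09020000251 B Smallville        54 Abbey Road
END

# Assumed mapping, inferred from the desired output: line code =>
# [field name, data column], where column 0 is the 17-character field
# and column 1 is the rest of the line.
my %field_for = (A => ['Name', 0], B => ['City', 0], C => ['Street', 1]);

# Format one customer's fields; missing fields come out empty.
sub format_customer {
    my ($cust) = @_;
    my $text = '';
    for my $field (qw(Name City Street)) {
        my $value = defined $cust->{$field} ? $cust->{$field} : '';
        $value =~ s/\s+$//;    # drop fixed-width padding
        $text .= "$field: $value\n";
    }
    return "$text\n";
}

my ($current_id, %customer);
my $report = '';
open(my $in, '<', \$data) or die "open: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($id, $code, @cols) = $line =~ /^(\d{11}) ([ABC]) (.{17})\s*(.*)$/
        or die "Invalid record: [$line]";

    # A new customer number means the previous customer is complete.
    if (defined $current_id and $id ne $current_id) {
        $report .= format_customer(\%customer);
        %customer = ();
    }
    $current_id = $id;

    my ($field, $col) = @{ $field_for{$code} };
    $customer{$field} = $cols[$col];
}
close $in;

# Flush the last customer, which no following line will trigger.
$report .= format_customer(\%customer) if defined $current_id;

print $report;
```

The `\s*` between the two columns absorbs any extra padding, and trailing spaces from the fixed-width fields are trimmed before printing. The final call after the loop is still needed, but it is now one line rather than a repeated block of `print` statements.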

Ben



------------------------------

Date: Fri, 13 Nov 2009 20:37:20 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Please help with processing flat file
Message-Id: <slrnhfs5po.p9q.tadmc@tadbox.sbcglobal.net>

nobody <nobody@nowhere.com> wrote:
> I'm trying to process flat files with many thousands of records.  In 
> these files several rows comprise the information for a single customer.  
> In the example __DATA__ below, I'm trying to fill the variables with the 
> customer information while the customer number is 06020004293, then for 
> customer number 07020000279, and finally customer number 09020000251.  


Another way of saying that is:

    fill the variables with the customer information until the start
    of the next customer record (marked by an 'A' row).


> I 
> believe my problem is looping while the customer number remains the same, 


or looping until an 'A' line is found...


> then move on to the next customer numbers.

[snip]

> # Desired output:
>
> #Name: Fred Flintstone
> #City: Bedrock
> #Street: 123 Bedrock Road
>
> #Name: George Washington
> #City: Washington D.C.
> #Street: 
>
> #Name: Joe Smith
> #City: Smallville
> #Street: 


--------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my %buffer;
while ( <DATA> ) {
    chomp;
    my $code = substr $_, 12, 1;

    if ( $code eq 'A' ) {
        if ( keys %buffer) {
            output(%buffer);
            %buffer = ();
        }
        $buffer{Name} = substr $_, 14, 17;
    }
    elsif ( $code eq 'B' ) {
        $buffer{City} = substr $_, 14, 17;
    }
    elsif ( $code eq 'C' ) {
        $buffer{Street} = substr $_, 32;
    }
    else {
        warn "code '$code' is invalid\n";
    }
}
output(%buffer);


sub output {
    my %h = @_;
    foreach my $key (qw/Name City Street/) {
        print "#$key: ";
        print $h{$key} if defined $h{$key};
        print "\n";
    }
    print "\n";
}


__DATA__
06020004293 A Fred Flintstone   123 Bedrock Road
06020004293 B Bedrock            Gravel Pit
06020004293 C Loney Toons       123 Bedrock Road
07020000279 A George Washington 234 Washington Ave.
07020000279 B Washington D.C.   234 Washington Ave.
09020000251 A Joe Smith         54 Abbey Road
09020000251 B Smallville        54 Abbey Road
--------------------------------


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2674
***************************************
