[28881] in Perl-Users-Digest


Perl-Users Digest, Issue: 125 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 9 23:51:29 2007

Date: Fri, 9 Feb 2007 20:50:54 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 9 Feb 2007     Volume: 11 Number: 125

Today's topics:
        sort a text file <gabyr@yahoo.co.uk>
    Re: sort a text file <nobull67@gmail.com>
    Re: sort a text file <gabyr@yahoo.co.uk>
    Re: sort a text file <bik.mido@tiscalinet.it>
    Re: sort a text file <gabyr@yahoo.co.uk>
    Re: sort a text file <bik.mido@tiscalinet.it>
    Re: sort a text file <bik.mido@tiscalinet.it>
    Re: sort a text file <tadmc@augustmail.com>
    Re: sort a text file <bik.mido@tiscalinet.it>
        split : string containing brackets <google@pmburton.clara.co.uk>
    Re: split : string containing brackets <wahab-mail@gmx.de>
    Re: split : string containing brackets <google@pmburton.clara.co.uk>
    Re: split : string containing brackets <glex_no-spam@qwest-spam-no.invalid>
    Re: split : string containing brackets <glex_no-spam@qwest-spam-no.invalid>
    Re: split : string containing brackets <attn.steven.kuo@gmail.com>
    Re: split : string containing brackets <glex_no-spam@qwest-spam-no.invalid>
    Re: split : string containing brackets <ayaz@dev.slash.null>
    Re: split : string containing brackets anno4000@radom.zrz.tu-berlin.de
    Re: split : string containing brackets <wahab-mail@gmx.de>
    Re: split : string containing brackets <google@pmburton.clara.co.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 03 Feb 2007 09:06:15 +0100
From: Gabriel <gabyr@yahoo.co.uk>
Subject: sort a text file
Message-Id: <eq1fpn$8b1$1@cormoran.emeteo.local>

Hello,

I need to write a subroutine to sort a text file like this:
"ID";"name1";"name2";"surname";age;address;

I have written this one, and it works OK, but just once. (?)
I think it writes the file incorrectly, and then it simply can't
understand it.


sub ordenar {
     open(ORIGINAL_FILE,  "<"."$_[0]");
     open(SORT_FILE, ">"."$_[0]"."sort");
     while ($linea_actual=<ORIGINAL_FILE>) {
        @menor_linea = split /;/, $linea_actual;
        while ($linea_actual=<ORIGINAL_FILE>) {
           @linea_actual = split /;/, $linea_actual;
	  if ($menor_linea[$_[1]] gt $linea_actual[$_[1]]) {
               @menor_linea = @linea_actual;
           }
        }
        $linea = 
"$menor_linea[0];$menor_linea[1];$menor_linea[2];$menor_linea[3];$menor_linea[4];$menor_linea[5]";
        &eliminar_registro($_[0], $menor_linea[0]);
        print SORT_FILE $linea;
        close(SORT_FILE);
        close(ORIGINAL_FILE);
        open(SORT_FILE, ">>"."$_[0]"."sort");
        open(ORIGINAL_FILE, "<"."$_[0]");
     }
     close(ORIGINAL_FILE);
     close(SORT_FILE);
     unlink($_[0]);
     rename("$_[0]"."sort", $_[0]);
}



------------------------------

Date: 3 Feb 2007 00:22:24 -0800
From: "Brian McCauley" <nobull67@gmail.com>
Subject: Re: sort a text file
Message-Id: <1170490944.855859.63620@q2g2000cwa.googlegroups.com>

On Feb 3, 8:06 am, Gabriel <g...@yahoo.co.uk> wrote:
> Hello,
>
> I need to write a subroutine to sort a text file like this:
> "ID";"name1";"name2";"surname";age;address;
>
> I have written this one, and it works OK, but just once. (?)
> I think it writes the file incorrectly, and then it simply can't
> understand it.
>
> sub ordenar {
>      open(ORIGINAL_FILE,  "<"."$_[0]");
>      open(SORT_FILE, ">"."$_[0]"."sort");
>      while ($linea_actual=<ORIGINAL_FILE>) {
>         @menor_linea = split /;/, $linea_actual;
>         while ($linea_actual=<ORIGINAL_FILE>) {
>            @linea_actual = split /;/, $linea_actual;
>           if ($menor_linea[$_[1]] gt $linea_actual[$_[1]]) {
>                @menor_linea = @linea_actual;
>            }
>         }
>         $linea =
> "$menor_linea[0];$menor_linea[1];$menor_linea[2];$menor_linea[3];$menor_linea[4];$menor_linea[5]";
>         &eliminar_registro($_[0], $menor_linea[0]);
>         print SORT_FILE $linea;
>         close(SORT_FILE);
>         close(ORIGINAL_FILE);
>         open(SORT_FILE, ">>"."$_[0]"."sort");
>         open(ORIGINAL_FILE, "<"."$_[0]");
>      }
>      close(ORIGINAL_FILE);
>      close(SORT_FILE);
>      unlink($_[0]);
>      rename("$_[0]"."sort", $_[0]);
>
> }

That's very, very confused. You keep opening, closing and re-reading
files.

If the file is big, just use an external sort program. (Writing a sort
for big files is not something you should be bothered with.) If the
file is not in a format that the external program will handle, then use
Perl to transform it into such a format and back.

If the file is small, slurp and use Perl's sort function.
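
For a small file, the slurp-and-sort advice above might look like the
following minimal sketch. The helper name sort_file_by_field and the
0-based field index are assumptions made for illustration, fields are
compared as plain strings, and every line is assumed to end in a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the lines of $file on the field at $index
# (0-based, ';'-separated), comparing as strings, and write them back.
sub sort_file_by_field {
    my ($file, $index) = @_;

    open my $in, '<', $file or die "Can't open '$file' for reading: $!";
    my @lines = <$in>;              # slurp: reasonable for a small file
    close $in;

    my @sorted = sort {
        my $key_a = (split /;/, $a)[$index];
        my $key_b = (split /;/, $b)[$index];
        $key_a cmp $key_b;
    } @lines;

    open my $out, '>', $file or die "Can't open '$file' for writing: $!";
    print {$out} @sorted;
    close $out or die "Can't close '$file': $!";
    return scalar @sorted;          # number of lines written
}
```

The file is opened exactly twice, once to read and once to write, and
the sorting itself is left to Perl's built-in sort.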



------------------------------

Date: Sat, 03 Feb 2007 09:27:19 +0100
From: Gabriel <gabyr@yahoo.co.uk>
Subject: Re: sort a text file
Message-Id: <eq1h17$ai2$1@cormoran.emeteo.local>

Brian McCauley escribió:
> On Feb 3, 8:06 am, Gabriel <g...@yahoo.co.uk> wrote:
>> Hello,
>>
>> I need to write a subroutine to sort a text file like this:
>> "ID";"name1";"name2";"surname";age;address;
>>
>> I have written this one, and it works OK, but just once. (?)
>> I think it writes the file incorrectly, and then it simply can't
>> understand it.
>>
>> sub ordenar {
>>      open(ORIGINAL_FILE,  "<"."$_[0]");
>>      open(SORT_FILE, ">"."$_[0]"."sort");
>>      while ($linea_actual=<ORIGINAL_FILE>) {
>>         @menor_linea = split /;/, $linea_actual;
>>         while ($linea_actual=<ORIGINAL_FILE>) {
>>            @linea_actual = split /;/, $linea_actual;
>>           if ($menor_linea[$_[1]] gt $linea_actual[$_[1]]) {
>>                @menor_linea = @linea_actual;
>>            }
>>         }
>>         $linea =
>> "$menor_linea[0];$menor_linea[1];$menor_linea[2];$menor_linea[3];$menor_linea[4];$menor_linea[5]";
>>         &eliminar_registro($_[0], $menor_linea[0]);
>>         print SORT_FILE $linea;
>>         close(SORT_FILE);
>>         close(ORIGINAL_FILE);
>>         open(SORT_FILE, ">>"."$_[0]"."sort");
>>         open(ORIGINAL_FILE, "<"."$_[0]");
>>      }
>>      close(ORIGINAL_FILE);
>>      close(SORT_FILE);
>>      unlink($_[0]);
>>      rename("$_[0]"."sort", $_[0]);
>>
>> }
> 
> That's very, very confused. You keep opening, closing and re-reading
> files.
> 
> If the file is big, just use an external sort program. (Writing a sort
> for big files is not something you should be bothered with.) If the
> file is not in a format that the external program will handle, then use
> Perl to transform it into such a format and back.
> 
> If the file is small, slurp and use Perl's sort function.
> 


I would like to! But my teacher is a little bit ridiculous.

How can I avoid opening and closing files? With seek?



------------------------------

Date: Sat, 03 Feb 2007 12:07:08 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: sort a text file
Message-Id: <m4q8s2d814ssjtm4eu4ul7k27ipbouacg5@4ax.com>

On Sat, 03 Feb 2007 09:27:19 +0100, Gabriel <gabyr@yahoo.co.uk> wrote:

>I would like to! But my teacher is a little bit ridiculous.
>
>How can I avoid opening and closing files? With seek?

seek() is one option. Slurping them in all at once is another one. We
recommend not to slurp files unless needed. This is one situation in
which it may be sensible. Back to your original question, you wrote:

: I need to write a subroutine to sort a text file like this:
: "ID";"name1";"name2";"surname";age;address;

And you want to sort on what? Supposing you want to sort on every
field, alphabetically on all of them except ID and age (which compare
numerically), with the highest priority given to the leftmost field, a
simple solution with a Schwartzian transform would be

  sub ordenar {
      my $file=shift;
      my @lines=do {
          open my $fh, '<', $file
            or die "Can't open `$file': $!\n";
          <$fh>;
      };
      open my $fh, '>', $file
        or die "Can't open `$file': $!\n";
      # <=> assumes ID and age are bare numbers; strip any quotes first
      print $fh map $_->[0],
        sort { $a->[1] <=> $b->[1] ||
               $a->[2] cmp $b->[2] ||
               $a->[3] cmp $b->[3] ||
               $a->[4] cmp $b->[4] ||
               $a->[5] <=> $b->[5] ||
               $a->[6] cmp $b->[6]
        } map [$_, split /;/], @lines;
  }
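
As for the seek() option mentioned at the top of this message, here is
a minimal sketch of rewinding an already-open handle instead of closing
and reopening the file. The helper name read_twice is hypothetical and
exists only to show the call.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of an open handle twice,
# rewinding with seek() in between instead of close/open.
sub read_twice {
    my ($fh) = @_;

    my @first = <$fh>;              # first pass runs to end-of-file

    seek $fh, 0, 0                  # offset 0, whence 0 = start of file
        or die "Can't rewind: $!";

    my @second = <$fh>;             # second pass sees the same lines
    return (\@first, \@second);
}
```

Each pass still reads the whole file, so this removes the repeated
open/close calls but not the repeated reading.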


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Sat, 03 Feb 2007 12:44:44 +0100
From: Gabriel <gabyr@yahoo.co.uk>
Subject: Re: sort a text file
Message-Id: <eq1sjg$o5$1@cormoran.emeteo.local>

Michele Dondi escribió:
> On Sat, 03 Feb 2007 09:27:19 +0100, Gabriel <gabyr@yahoo.co.uk> wrote:
> 
>> I would like to! But my teacher is a little bit ridiculous.
>>
>> How can I avoid opening and closing files? With seek?
> 
> seek() is one option. Slurping them in all at once is another one. We
> recommend not to slurp files unless needed. This is one situation in
> which it may be sensible. Back to your original question, you wrote:
> 
> : I need to write a subroutine to sort a text file like this:
> : "ID";"name1";"name2";"surname";age;address;
> 
> And you want to sort on what? 

I pass an argument to my sort subroutine which indicates the field
I want to sort on. (Sorry for my English.)


> Supposing you want to sort on every field,
> alphabetically on all of them except ID and age (which compare
> numerically), with the highest priority given to the leftmost field,
> a simple solution with a Schwartzian transform would be
> 
>   sub ordenar {
>       my $file=shift;

What does $file contain once you have done "$file=shift"?



------------------------------

Date: Sat, 03 Feb 2007 13:00:48 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: sort a text file
Message-Id: <j5u8s2t1k9np3dktp2pl51h4on8ofpkufc@4ax.com>

On Sat, 03 Feb 2007 12:44:44 +0100, Gabriel <gabyr@yahoo.co.uk> wrote:

>> And you want to sort on what? 
>
>I pass an argument to my sort subroutine which indicates the field
>I want to sort on. (Sorry for my English.)

I mean: based *on which criteria* do you want to swort that file?

>>   sub ordenar {
>>       my $file=shift;
>
>What does $file contain once you have done "$file=shift"?

That is a standard way to pass an argument to a sub. It also modifies
@_. There is no reason why it should be a problem in this case. IMHO
an explicit file name like $file is much more manageable than a thing
like $_[0]. Or else I didn't understand your question...
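
To make the difference concrete, here is a small sketch contrasting
shift with $_[0]. The sub names and the 'data.txt' argument are made up
for illustration.

```perl
use strict;
use warnings;

sub with_shift {
    my $file = shift;               # removes the first element of @_ and returns it
    return ($file, scalar @_);      # @_ is now one element shorter
}

sub with_index {
    my $file = $_[0];               # copies the first argument; @_ is untouched
    return ($file, scalar @_);
}

my @shifted = with_shift('data.txt', 3);    # ('data.txt', 1)
my @indexed = with_index('data.txt', 3);    # ('data.txt', 2)
```

Either way $file holds the first argument. The only difference is
whether that argument is still present in @_ afterwards.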


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Sat, 03 Feb 2007 14:23:37 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: sort a text file
Message-Id: <s439s2d11h81qf1j7243ghirafuhv9ucnq@4ax.com>

On Sat, 03 Feb 2007 13:00:48 +0100, Michele Dondi
<bik.mido@tiscalinet.it> wrote:

>I mean: based *on which criteria* do you want to swort that file?
                                                  ^^^^^
                                                  ^^^^^

"sort", of course.

>That is a standard way to pass an argument to a sub. It also modifies
>@_. There is no reason why it should be a problem in this case. IMHO
>an explicit file name like $file is much more manageable than a thing
             ^^^^
             ^^^^

>like $_[0]. Or else I didn't understand your question...

"variable", of course.


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Sat, 3 Feb 2007 08:45:37 -0600
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: sort a text file
Message-Id: <slrnes980h.t1e.tadmc@tadmc30.august.net>

Gabriel <gabyr@yahoo.co.uk> wrote:


>      while ($linea_actual=<ORIGINAL_FILE>) {


As a new programmer you should (quickly) adopt sensible style
guidelines to follow when you write your programs.

One such general guideline that is widely accepted is to use
whitespace to separate important program elements:

   while ($linea_actual = <ORIGINAL_FILE>) {
or
   while ( $linea_actual = <ORIGINAL_FILE> ) {

Because that makes it much easier for people to read and understand.

See:

   perldoc perlstyle

for some more good ideas to adopt early on.


>         $linea = 
> "$menor_linea[0];$menor_linea[1];$menor_linea[2];$menor_linea[3];$menor_linea[4];$menor_linea[5]";


You can write that as:

   $linea = join ';', @menor_linea[0..5];

Which is probably easier to read and understand.

Of course neither of them will work should your data end up having
five or seven fields, so it is probably better to write it to handle
any number of fields rather than only 6 fields:

   $linea = join ';', @menor_linea;


>         &eliminar_registro($_[0], $menor_linea[0]);


You should not use the ampersand (&) character on most
subroutine calls.

   perldoc -q "&"

       What’s the difference between calling a function as &foo and foo()?
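
One difference that FAQ entry covers can be seen in a few lines. This
is only an illustrative sketch, and count_args and caller_sub are
invented names: calling &name with no parentheses hands the caller's
current @_ to the callee.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub caller_sub {
    my $plain     = count_args();   # explicit empty argument list
    my $ampersand = &count_args;    # no parens: callee sees the caller's @_
    return ($plain, $ampersand);
}

my ($plain, $ampersand) = caller_sub('a', 'b', 'c');
# $plain is 0, $ampersand is 3
```

The & form also bypasses any prototype on the sub, which is another
reason to avoid it on ordinary calls.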


>         open(ORIGINAL_FILE, "<"."$_[0]");


You should always, yes *always*, check the return value from open().

See also:

   perldoc -q vars

       What’s wrong with always quoting "$vars"?

Then make that:

   open( ORIGINAL_FILE, "<" . $_[0] ) or die "could not open '$_[0]' $!";
or
   open( ORIGINAL_FILE, "<$_[0]" ) or die "could not open '$_[0]' $!";
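
A commonly recommended variant, offered here as a sketch and not as
part of the advice above, is the three-argument form of open with a
lexical filehandle. The helper name read_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $file, or die with the
# operating system's error message if the file cannot be opened.
sub read_lines {
    my ($file) = @_;

    open my $fh, '<', $file         # mode and filename are separate arguments
        or die "could not open '$file': $!";

    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

Because the mode is its own argument, a filename that happens to begin
with ">" or end with "|" cannot change what open does.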


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sat, 03 Feb 2007 20:46:15 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: sort a text file
Message-Id: <fip9s258kp6mqs9vb709ggsks4fo8ivsoc@4ax.com>

On Sat, 3 Feb 2007 08:45:37 -0600, Tad McClellan
<tadmc@augustmail.com> wrote:

>>         $linea = 
>> "$menor_linea[0];$menor_linea[1];$menor_linea[2];$menor_linea[3];$menor_linea[4];$menor_linea[5]";
>
>
>You can write that as:
>
>   $linea = join ';', @menor_linea[0..5];
>
>Which is probably easier to read and understand.
>
>Of course neither of them will work should your data end up having
>five or seven fields, so it is probably better to write it to handle
>any number of fields rather than only 6 fields:

Of course Tad means that a warning will be issued, if warnings are on
- as they should be!
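
A small sketch of what happens in each case. The sample field values
are invented, and the warning is collected through a __WARN__ handler
only so that it can be inspected afterwards.

```perl
use strict;
use warnings;

my @caught;                                     # collected warning messages
local $SIG{__WARN__} = sub { push @caught, @_ };

my @five  = ('"7"', '"Ana"', '"Maria"', '"Lopez"', 30);   # one field short
my @seven = (@five, 'Main St 1', 'extra');                # one field too many

my $short = join ';', @five[0..5];    # index 5 is undef: warns, leaves a trailing ';'
my $long  = join ';', @seven[0..5];   # silently drops the seventh field
my $all   = join ';', @seven;         # keeps every field, whatever the count
```

So the fixed slice "works" in the sense of producing a string, but it
either warns about an uninitialized value or quietly loses data.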


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: 8 Feb 2007 06:53:18 -0800
From: "Prototype" <google@pmburton.clara.co.uk>
Subject: split : string containing brackets
Message-Id: <1170946398.216900.298790@l53g2000cwa.googlegroups.com>

I've got a string something like this:
$text='val1 , val2 , max(val3,val4)'

which I want to parse into its individual components:
@parts=split(/,/,$text)

but this simplistic approach doesn't cope with the last part of $text
which contains a bracketed expression containing the delimiting ","
character.

I've read perldoc -q delimited, and this gives useful guidance for the
similar case of:
$text='val1 , val2 , max "val3,val4"'
but brackets are more difficult to cope with, and the methods shown
there can't be directly applied (think of nested brackets, for example).

I'm aware of Text::Balanced, which I think may be part of the
solution, but I can't see how!

The only solution I can see at the moment is the fairly brute-force
approach of going through the string a character at a time, counting
open & closing brackets and only splitting into a new array element
every time a comma is found AND the bracket count is zero.

Something tells me there is probably a more elegant approach than
this! Anyone care to suggest it to me please ;-)
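
For reference, the character-by-character approach described above can
be sketched roughly as follows. The helper name split_top_level is
hypothetical, and the sketch assumes that only round brackets matter.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on commas that are not inside
# parentheses, tracking the nesting depth one character at a time.
sub split_top_level {
    my ($text) = @_;
    my @parts = ('');
    my $depth = 0;

    for my $char (split //, $text) {
        if ($char eq ',' && $depth == 0) {
            push @parts, '';        # top-level comma: start a new part
            next;
        }
        $depth++ if $char eq '(';
        $depth-- if $char eq ')' && $depth > 0;
        $parts[-1] .= $char;
    }

    s/^\s+|\s+$//g for @parts;      # trim whitespace around each part
    return @parts;
}

my @parts = split_top_level('val1 , val2 , max( val3, min( val6, val7) )');
# ('val1', 'val2', 'max( val3, min( val6, val7) )')
```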

Paul.



------------------------------

Date: Thu, 08 Feb 2007 16:21:01 +0100
From: Mirco Wahab <wahab-mail@gmx.de>
Subject: Re: split : string containing brackets
Message-Id: <eqffep$62h$1@mlucom4.urz.uni-halle.de>

Prototype wrote:
> I've got a string something like this:
> $text='val1 , val2 , max(val3,val4)'
> 
> which I want to parse into its individual components:
> @parts=split(/,/,$text)
> 
> but this simplistic approach doesn't cope with the last part of $text
> which contains a bracketed expression containing the delimiting ","
> ...
> Something tells me there is probably a more elegant approach than
> this! Anyone care to suggest it to me please ;-)

So we have something like:

val1 , val2 , max( val3, val4 )
val1 , val2 , max( val3, min( val6, val7) )
val1 , val2 , max( val3, max( min( val8, val9 ), val7) )

This is clearly a recursive formulation
and should of course be 'elegantly'
parsed by an appropriate recursive
regex.

*If* that really is what you want ...
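
As a rough illustration of what such a recursive regex could look like,
here is a sketch that assumes Perl 5.10 or later (for named recursion
and possessive quantifiers) and balanced round brackets in the input.
The helper name split_balanced and the group name "paren" are invented
for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the top-level comma-separated items of
# $text, treating any balanced (...) group as part of its item.
sub split_balanced {
    my ($text) = @_;
    my @parts;

    while ($text =~ m{
        ( (?: [^,()]++ | (?&paren) )+ )     # capture 1: one top-level item
        (?(DEFINE)
            (?<paren> \( (?: [^()]++ | (?&paren) )* \) )   # balanced brackets
        )
    }gx) {
        my $part = $1;
        $part =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
        push @parts, $part;
    }
    return @parts;
}
```

The (?&paren) call inside its own definition is what makes the pattern
recursive: a bracketed group may contain further bracketed groups to
any depth.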

But before starting this, I'd better
ask you what you *really* want to do.

Can you give a worked-out pseudocode example
of what you want to get out of your lines and
what that should look like?

Regards

M.


------------------------------

Date: 8 Feb 2007 07:39:29 -0800
From: "Prototype" <google@pmburton.clara.co.uk>
Subject: Re: split : string containing brackets
Message-Id: <1170949169.816117.188960@a75g2000cwd.googlegroups.com>

Well, in the examples you have given, I would want the results:
'val1 , val2 , max( val3, val4 )'
  $part[0]='val1'
  $part[1]='val2'
  $part[2]='max(val3,val4)'

'val1 , val2 , max( val3, min( val6, val7) )'
  $part[0]='val1'
  $part[1]='val2'
  $part[2]='max( val3, min( val6, val7) )'

'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
  $part[0]='val1'
  $part[1]='val2'
  $part[2]='max( val3, max( min( val8, val9 ), val7) )'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

Paul.

On 8 Feb, 15:21, Mirco Wahab <wahab-m...@gmx.de> wrote:
> Prototype wrote:
> > I've got a string something like this:
> > $text='val1 , val2 , max(val3,val4)'
>
> > which I want to parse into its individual components:
> > @parts=split(/,/,$text)
>
> > but this simplistic approach doesn't cope with the last part of $text
> > which contains a bracketed expression containing the delimiting ","
> > ...
> > Something tells me there is probably a more elegant approach than
> > this! Anyone care to suggest it to me please ;-)
>
> So we have something like:
>
> val1 , val2 , max( val3, val4 )
> val1 , val2 , max( val3, min( val6, val7) )
> val1 , val2 , max( val3, max( min( val8, val9 ), val7) )
>
> This is clearly a recursive formulation
> and should of course be 'elegantly'
> parsed by an appropriate recursive
> regex.
>
> *If* that really is what you want ...
>
> But before starting this, I'd better
> ask you what you *really* want to do.
>
> Can you give a worked-out pseudocode example
> of what you want to get out of your lines and
> what that should look like?
>
> Regards
>
> M.




------------------------------

Date: Thu, 08 Feb 2007 10:23:22 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: split : string containing brackets
Message-Id: <45cb4e37$0$10309$815e3792@news.qwest.net>

Prototype wrote:
> Well, in the examples you have given, I would want the results:

Please don't top post....

> 'val1 , val2 , max( val3, val4 )'
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max(val3,val4)'
> 
> 'val1 , val2 , max( val3, min( val6, val7) )'
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max( val3, min( val6, val7) )'
> 
> 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max( val3, max( min( val8, val9 ), val7) )'
> 
> All I'm looking for is to do a "simple" split parse of a string on
> commas, but avoiding splitting where the comma is enclosed in some
> brackets.

If you're 100% sure that each "part" is separated by ' , ', then:

my $str = 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) ';
my @part = split( / , /, , 3 );
print join("\n", @part);
val1
val2
max( val3, max( min( val8, val9 ), val7) )


------------------------------

Date: Thu, 08 Feb 2007 10:24:13 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: split : string containing brackets
Message-Id: <45cb4e6a$0$10309$815e3792@news.qwest.net>

J. Gleixner wrote:
> Prototype wrote:
>> Well, in the examples you have given, I would want the results:
> 
> Please don't top post....
> 
>> 'val1 , val2 , max( val3, val4 )'
>>   $part[0]='val1'
>>   $part[1]='val2'
>>   $part[2]='max(val3,val4)'
>>
>> 'val1 , val2 , max( val3, min( val6, val7) )'
>>   $part[0]='val1'
>>   $part[1]='val2'
>>   $part[2]='max( val3, min( val6, val7) )'
>>
>> 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
>>   $part[0]='val1'
>>   $part[1]='val2'
>>   $part[2]='max( val3, max( min( val8, val9 ), val7) )'
>>
>> All I'm looking for is to do a "simple" split parse of a string on
>> commas, but avoiding splitting where the comma is enclosed in some
>> brackets.
> 
> If you're 100% sure that each "part" is separated by ' , ', then:
> 
> my $str = 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) ';
> my @part = split( / , /, , 3 );
correction..
my @part = split( / , /, $str, 3 );

> print join("\n", @part);
> val1
> val2
> max( val3, max( min( val8, val9 ), val7) )


------------------------------

Date: 8 Feb 2007 08:28:45 -0800
From: "attn.steven.kuo@gmail.com" <attn.steven.kuo@gmail.com>
Subject: Re: split : string containing brackets
Message-Id: <1170952125.000548.171560@j27g2000cwj.googlegroups.com>

On Feb 8, 6:53 am, "Prototype" <goo...@pmburton.clara.co.uk> wrote:
> I've got a string something like this:
> $text='val1 , val2 , max(val3,val4)'
>

(snipped)

> I'm aware of Text::Balanced, which I think may be part of the
> solution, but I can't see how!


It took me a while to become familiar with Text::Balanced. YMMV.

If your inputs are sufficiently simple, then try:

use strict;
use warnings;
use Text::Balanced qw/extract_bracketed/;
use Text::CSV;
use Data::Dumper;
use constant {
    EXTRACTED => 0,
    REMAINDER => 1,
    SKIPPED   => 2,
};

my $csv = Text::CSV->new;

while (<DATA>)
{
    (my $text = $_);
    my @atoms;
    my @found;
    while ( @found = extract_bracketed( $text, '()', qr/[^(]*/ )
            and defined $found[EXTRACTED] )
    {

        # Use one of the techniques discussed in 'perldoc -q delimited'
        # for the SKIPPED portion, or use a simple split() if you can
        # get away with it.  Here I'll demonstrate with Text::CSV

        $csv->parse($found[SKIPPED]);
        my @fields = $csv->fields();
        $fields[-1] .= $found[EXTRACTED];

        push @atoms, @fields;

        # Check for remaining text to be parsed

        $text = $found[REMAINDER];

    }

    # Stuff at the end that's not in brackets:

    if (length $text)
    {
        $csv->parse($text);
        my @fields = $csv->fields();
        shift @fields; # extra comma
        push @atoms, @fields;
    }


    print $_;
    print Dumper \@atoms;
}

__DATA__
val1 , val2 , max(val3,val4)
max(val1, min(val2, val3)), val4, val5
val1, min(val2, pow(val3, 42)), 10E12


> The only solution I can see at the moment is the fairly brute-force
> approach of going through the string a character at a time, counting
> open & closing brackets and only splitting into a new array element
> every time a comma is found AND the bracket count is zero.
>
> Something tells me there is probably a more elegant approach than
> this! Anyone care to suggest it to me please ;-)



If your "grammar" is really complicated, then you may
want to look at parser generators on CPAN:


Parse::Yapp
or
Parse::RecDescent

--
Hope this helps,
Steven




------------------------------

Date: Thu, 08 Feb 2007 10:30:59 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: split : string containing brackets
Message-Id: <45cb5000$0$503$815e3792@news.qwest.net>

J. Gleixner wrote:
> J. Gleixner wrote:
>> Prototype wrote:
>>> Well, in the examples you have given, I would want the results:
>>
>> Please don't top post....
>>
>>> 'val1 , val2 , max( val3, val4 )'
>>>   $part[0]='val1'
>>>   $part[1]='val2'
>>>   $part[2]='max(val3,val4)'
>>>
>>> 'val1 , val2 , max( val3, min( val6, val7) )'
>>>   $part[0]='val1'
>>>   $part[1]='val2'
>>>   $part[2]='max( val3, min( val6, val7) )'
>>>
>>> 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
>>>   $part[0]='val1'
>>>   $part[1]='val2'
>>>   $part[2]='max( val3, max( min( val8, val9 ), val7) )'
>>>
>>> All I'm looking for is to do a "simple" split parse of a string on
>>> commas, but avoiding splitting where the comma is enclosed in some
>>> brackets.
>>
>> If you're 100% sure that each "part" is separated by ' , ', then:
>>
>> my $str = 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) ';
>> my @part = split( / , /, , 3 );
> correction..
> my @part = split( / , /, $str, 3 );

Or.. even if there is/isn't any whitespace around the separator:

my @part = split( /\s*,\s*/, $str, 3 );

You could also split on /,/ and trim the leading/trailing
whitespace.  Probably the helpful part is the third argument
to split.
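
A short sketch of the effect of that third argument, using one of the
sample strings from this thread:

```perl
use strict;
use warnings;

my $str = 'val1 , val2 , max( val3, min( val6, val7) )';

my @unlimited = split /\s*,\s*/, $str;      # also splits inside the brackets
my @limited   = split /\s*,\s*/, $str, 3;   # at most 3 fields; the rest stays whole

# @unlimited has 5 elements, @limited has 3, and
# $limited[2] is 'max( val3, min( val6, val7) )'
```

This only helps when the bracketed expression is the last field and the
number of fields before it is known in advance.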


------------------------------

Date: Thu, 08 Feb 2007 22:02:23 +0500
From: Ayaz Ahmed Khan <ayaz@dev.slash.null>
Subject: Re: split : string containing brackets
Message-Id: <pan.2007.02.08.16.30.48.314667@dev.slash.null>

"Prototype" typed:

> Well, in the examples you have given, I would want the results:
> 'val1 , val2 , max( val3, val4 )'
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max(val3,val4)'
> 
> 'val1 , val2 , max( val3, min( val6, val7) )'
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max( val3, min( val6, val7) )'
> 
> 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max( val3, max( min( val8, val9 ), val7) )'
> 
> All I'm looking for is to do a "simple" split parse of a string on
> commas, but avoiding splitting where the comma is enclosed in some
> brackets.

For the specific case where the expression to be parsed is built strictly
on the pattern "somevalue, somevalue, max(somevalue, somevalue, ...)",
you could provide split() with a LIMIT argument to split only into a max
of LIMIT fields. Something like this, perhaps:
	
	@part = split(/,/, $text, 3);

-- 
Ayaz Ahmed Khan

A witty saying proves nothing, but saying something pointless gets
people's attention.


------------------------------

Date: 8 Feb 2007 17:05:26 GMT
From: anno4000@radom.zrz.tu-berlin.de
Subject: Re: split : string containing brackets
Message-Id: <5313imF1qk7duU2@mid.dfncis.de>

attn.steven.kuo@gmail.com <attn.steven.kuo@gmail.com> wrote in comp.lang.perl.misc:
> On Feb 8, 6:53 am, "Prototype" <goo...@pmburton.clara.co.uk> wrote:

[...]

> If your "grammar" is really complicated, then you may
> want to look at parser generators on CPAN:
> 
> 
> Parse::Yapp
> or
> Parse::RecDescent

The latter is even a core module.

Anno


------------------------------

Date: Thu, 08 Feb 2007 20:01:42 +0100
From: Mirco Wahab <wahab-mail@gmx.de>
Subject: Re: split : string containing brackets
Message-Id: <eqfvtr$aqb$1@mlucom4.urz.uni-halle.de>

Prototype wrote:
> Well, in the examples you have given, I would want the results:
> 'val1 , val2 , max( val3, val4 )'
>   $part[0]='val1'
>   $part[1]='val2'
>   $part[2]='max(val3,val4)'
> 
> All I'm looking for is to do a "simple" split parse of a string on
> commas, but avoiding splitting where the comma is enclosed in some
> brackets.

OK, then you may count the brackets
within the regex. I stupidly took some
not-so-easily-parseable examples, so it
took longer. The regex is relatively
simple (just counting), you'll get the idea.

This is my first shot:



  use strict;
  use warnings;

  my $o;
  my $reg = qr/
   (?{$o=0}) \s*
   (
     (?:  [^,()\s]+
        | \((?{ $o++ })
        | \)(?{ $o-- })
        | (?(?{ $o }),|\s)
     )+
   )
   [,\s]*/xs;

  while( <DATA> ) {
     chomp;
     my @fields = /$reg/g;
     print "$_\n", scalar @fields, "\n", (map "  $_\n", @fields), "\n"
  }


__DATA__
val1(), val2 , max(val3,val4)
(val0,val1) , val2 , max(val3,val4)
max(val3,val4), val1 , val2 , max(val3(val6,val7),val4)



Regards

M.


------------------------------

Date: 9 Feb 2007 01:42:19 -0800
From: "Prototype" <google@pmburton.clara.co.uk>
Subject: Re: split : string containing brackets
Message-Id: <1171014139.259075.240590@m58g2000cwm.googlegroups.com>

On 8 Feb, 19:01, Mirco Wahab <wahab-m...@gmx.de> wrote:
> Prototype wrote:
> > Well, in the examples you have given, I would want the results:
> > 'val1 , val2 , max( val3, val4 )'
> >   $part[0]='val1'
> >   $part[1]='val2'
> >   $part[2]='max(val3,val4)'
>
> > All I'm looking for is to do a "simple" split parse of a string on
> > commas, but avoiding splitting where the comma is enclosed in some
> > brackets.
>
> OK, then you may count the brackets
> within the regex. I stupidly took some
> not-so-easily-parseable examples, so it
> took longer. The regex is relatively
> simple (just counting), you'll get the idea.
>
> This is my first shot:
>
>   use strict;
>   use warnings;
>
>   my $o;
>   my $reg = qr/
>    (?{$o=0}) \s*
>    (
>      (?:  [^,()\s]+
>         | \((?{ $o++ })
>         | \)(?{ $o-- })
>         | (?(?{ $o }),|\s)
>      )+
>    )
>    [,\s]*/xs;
>
>   while( <DATA> ) {
>      chomp;
>      my @fields = /$reg/g;
>      print "$_\n", scalar @fields, "\n", (map "  $_\n", @fields), "\n"
>   }
>
> __DATA__
> val1(), val2 , max(val3,val4)
> (val0,val1) , val2 , max(val3,val4)
> max(val3,val4), val1 , val2 , max(val3(val6,val7),val4)
>
> Regards
>
> M.


Thanks to all for your various comments and solutions. I'm going with
this one, as it does exactly what I need, and has taught me some neat
new regexp stuff.

P.



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 125
**************************************

