[24545] in Perl-Users-Digest


Perl-Users Digest, Issue: 6723 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jun 24 11:05:43 2004

Date: Thu, 24 Jun 2004 08:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 24 Jun 2004     Volume: 10 Number: 6723

Today's topics:
    Re: [Newbie] Stupid problem need simple answer (Array & <ThomasKratz@REMOVEwebCAPS.de>
    Re: [Newbie] Stupid problem need simple answer (Array & (Anno Siegel)
    Re: [Newbie] Stupid problem need simple answer (Array & <daedalus@videotron.ca>
    Re: [Newbie] Stupid problem need simple answer (Array & <daedalus@videotron.ca>
    Re: [Newbie] Stupid problem need simple answer (Array & <daedalus@videotron.ca>
    Re: anonymous pipe, one writer, many readers (Edouard GAULUE)
    Re: Dealing with lists with SWIG. <mlanzkron@yahoo.co.uk>
    Re: Find length of files (Anno Siegel)
    Re: Find length of files <rustyp@freeshell.org>
    Re: help with perl dbi and update without locks <jcharth@hotmail.com>
    Re: help with perl dbi and update without locks <ThomasKratz@REMOVEwebCAPS.de>
    Re: help with perl dbi and update without locks ctcgag@hotmail.com
    Re: Help! - Need a CGI redirect which passes a querystr <matthew.garrish@sympatico.ca>
    Re: Idiom for partitioning array? <bmb@ginger.libs.uga.edu>
    Re: noob trying to learn where to start? <rustyp@freeshell.org>
    Re: REGEX Negation <nobull@mail.com>
    Re: Regexp, Strings and spaces (Florent Carli)
    Re: Regexp, Strings and spaces (Florent Carli)
    Re: Regexp, Strings and spaces (Anno Siegel)
    Re: split xml file between two processing instructions (kcwolle)
    Re: split xml file between two processing instructions <tadmc@augustmail.com>
    Re: Suggestions on editing multiple files <monkeyjob@ntlworld.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 24 Jun 2004 13:59:38 +0200
From: Thomas Kratz <ThomasKratz@REMOVEwebCAPS.de>
Subject: Re: [Newbie] Stupid problem need simple answer (Array & RegExp)
Message-Id: <40dac229$0$14510$bb690d87@news.main-rheiner.de>

Anno Siegel wrote:

> 
> Your code (in all variants) has another problem.  You are not supposed
> to change an array while running a for-loop over it.  (See "foreach" in
> perlsyn, I suppose.)  That it appears to work in this instance doesn't
> mean it will with other versions of Perl.

AFAIK you are not supposed to change the array, but changing the aliased 
elements should be ok.

so:

push(@array, $_) for @array # not recommended

s/A/B/g for @array          # ok

Right?

Thomas

-- 
open STDIN,"<&DATA";$=+=14;$%=50;while($_=(seek( #J~.> a>n~>>e~.......>r.
STDIN,$:*$=+$,+$%,0),getc)){/\./&&last;/\w| /&&( #.u.t.^..oP..r.>h>a~.e..
print,$_=$~);/~/&&++$:;/\^/&&--$:;/>/&&++$,;/</  #.>s^~h<t< ..~. ...c.^..
&&--$,;$:%=4;$,%=23;$~=$_;++$i==1?++$,:_;}__END__#....>>e>r^..>l^...>k^..


------------------------------

Date: 24 Jun 2004 12:06:09 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: [Newbie] Stupid problem need simple answer (Array & RegExp)
Message-Id: <cbeg3h$gii$2@mamenchi.zrz.TU-Berlin.DE>

Thomas Kratz  <ThomasKratz@REMOVEwebCAPS.de> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:
> 
> > 
> > Your code (in all variants) has another problem.  You are not supposed
> > to change an array while running a for-loop over it.  (See "foreach" in
> > perlsyn, I suppose.)  That it appears to work in this instance doesn't
> > mean it will with other versions of Perl.
> 
> AFAIK you are not supposed to change the array, but changing the aliased 
> elements should be ok.
> 
> so:
> 
> push(@array, $_) for @array # not recommended
> 
> s/A/B/g for @array          # ok
> 
> Right?

Right.

Anno
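
The distinction agreed on above can be sketched as follows. This is only an illustration with made-up sample data; iterating over a snapshot is one possible safe alternative, not something proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the thread.
my @array = qw(ABBA BANANA CAB);

# OK: $_ is an alias for each element, so s/// edits the elements in
# place without changing the length of the array.
s/A/B/g for @array;

# Not recommended: push(@array, $_) for @array;
# Growing the array while foreach walks it is what perlsyn warns about.
# One safe alternative is to loop over a snapshot of the array instead.
my @snapshot = @array;
push @array, $_ for @snapshot;

print "@array\n";
```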


------------------------------

Date: Thu, 24 Jun 2004 09:55:20 -0400
From: "Daedalus" <daedalus@videotron.ca>
Subject: Re: [Newbie] Stupid problem need simple answer (Array & RegExp)
Message-Id: <l1BCc.4666$uY3.53065@wagner.videotron.net>

> >      if ($string =~ s/(^the\b)|(^a\b)//i){
>
> What are the capturing parentheses in the pattern for? s/^the\b|^a\b//i
> does the same thing.

Actually it was for precedence; I wasn't sure about this \b|^ thing. It does no
harm, does it?
I'll have to familiarize myself with regexp precedence.


> >         $string_list[++$#string_lis] = $string
>                                     ^^
> Typo here.  Don't re-type code, copy/paste it.

Sorry I didn't know, but it makes sense to me now.

> Your code (in all variants) has another problem.  You are not supposed
> to change an array while running a for-loop over it.  (See "foreach" in
> perlsyn, I suppose.)  That it appears to work in this instance doesn't
> mean it will with other versions of Perl.

OK, about this: I wanted the foreach to look at each new string in the array
until there's nothing left to add. So if the array starts with one string like
"to the bookstore", and the regexp is s/^to\b|^the\b//i, I would end up with 3
strings in the array: ("to the bookstore", "the bookstore", "bookstore"). I
need to extract all possible variants, so I don't want to process the array
only once and get rid of everything at the same time. There is probably a
better way to achieve this, but that's the only one I found.


> There are lots of ways to repair this, one is
>
>     push @string_list, map /(?:the\b|a\b)?\s*(.*)/i, @string_list;
>
> Instead of deleting the unwanted part, this captures the wanted part.
> I have also changed the regex so that it also catches trailing
> whitespace with the articles.

Thanks for that. It doesn't act exactly like I want, since it would remove "to
the" at the same time, but it reminds me of the push operator, which I
could use instead of "$string_list[++$#string_list]", and I just learned the map
operator, which would avoid overwriting the original array. Well, I'm learning
and I like it!  But for now (with my limited knowledge) I'm still stuck with
writing to the array from inside the foreach.


> Anno




------------------------------

Date: Thu, 24 Jun 2004 10:15:40 -0400
From: "Daedalus" <daedalus@videotron.ca>
Subject: Re: [Newbie] Stupid problem need simple answer (Array & RegExp)
Message-Id: <pkBCc.4696$uY3.62797@wagner.videotron.net>


"Anno Siegel" <anno4000@lublin.zrz.tu-berlin.de> wrote in message
news: cbeg3h$gii$2@mamenchi.zrz.TU-Berlin.DE...
> Thomas Kratz  <ThomasKratz@REMOVEwebCAPS.de> wrote in comp.lang.perl.misc:
> > Anno Siegel wrote:
> >
> > >
> > > Your code (in all variants) has another problem.  You are not supposed
> > > > to change an array while running a for-loop over it.  (See "foreach"
> > > > in perlsyn, I suppose.)  That it appears to work in this instance doesn't
> > > mean it will with other versions of Perl.
> >
> > AFAIK you are not supposed to change the array, but changing the aliased
> > elements should be ok.
> >
> > so:
> >
> > push(@array, $_) for @array # not recommended
> >
> > s/A/B/g for @array          # ok
> >
> > Right?
>
> Right.

Well, the problem is that s///g doesn't do what I want, since it changes
everything at the same time.
Anyway (for now) it can't turn into an infinite loop, since the body
of the loop will inevitably stop adding to the array once it stops finding
matches (which is also inevitable, since even if all the words match, the
loop will end with an empty string).
I know it's not the perfect solution (and I'll try to find a better way
while I'm learning), so if someone has any ideas... I'd be
glad to correct this misuse.

DAE
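
One way to collect every variant without pushing onto an array that foreach is walking is an index-based loop. This is a minimal sketch, not code from the thread: the sub name all_variants and the added \s* (so the next leading word lines up with ^) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Returns the original strings plus every variant obtained by repeatedly
# stripping one leading "to" or "the" (word list taken from the example).
sub all_variants {
    my @string_list = @_;

    # C-style loop: the bound is re-checked on every pass, so strings
    # pushed during the loop are visited too, with no foreach involved.
    for (my $i = 0; $i < @string_list; $i++) {
        my $string = $string_list[$i];            # work on a copy
        push @string_list, $string
            if $string =~ s/^(?:to|the)\b\s*//i;  # strip one word per pass
    }
    return @string_list;
}

print join(' | ', all_variants('to the bookstore')), "\n";
```

The loop ends on its own because each pushed string is shorter than the one it came from.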





------------------------------

Date: Thu, 24 Jun 2004 10:37:52 -0400
From: "Daedalus" <daedalus@videotron.ca>
Subject: Re: [Newbie] Stupid problem need simple answer (Array & RegExp)
Message-Id: <dFBCc.4729$uY3.71467@wagner.videotron.net>

> Yes.  You are modifying the array element as they are being processed
> since the foreach variable is aliased to the array element and not
> a copy of the array element.

Thanks, since it's an alias, that explains everything.

DAE




------------------------------

Date: 24 Jun 2004 04:32:58 -0700
From: Edouard.Gaulue@ensg.ign.fr (Edouard GAULUE)
Subject: Re: anonymous pipe, one writer, many readers
Message-Id: <77a8fc.0406240332.53162431@posting.google.com>

> ************************************
> using sysread, syswrite (working) :
> ************************************ 
 
>     if ($i == 0) {close READER;print "!!! Close father READER !!!\n";}

Sorry, if you really want this second solution to work, you have to
remove the line above and just put a 'close READER;' at the end.
Don't ask me why!

Regards, EG


------------------------------

Date: Thu, 24 Jun 2004 14:29:30 +0300
From: Motti Lanzkron <mlanzkron@yahoo.co.uk>
Subject: Re: Dealing with lists with SWIG.
Message-Id: <hheld0l4hsrk07jb0tfa7mov213cf3g24l@4ax.com>

Tassilo v. Parseval wrote:
>Also sprach Motti Lanzkron:
>
>> I'm trying to write a Perl module in C++ and I've downloaded SWIG.
>> But I'm having trouble finding how to deal with lists. The examples I
>> see only deal with scalars.
>> 
>> Perl:					C++:
>> ------------------------------------
>> sub foo { 1 }		=>	int foo() { return 1; }
>> sub bar { (1..10) }	=>	????
>> 
>> What code do I in C++ write if I want to return a list of integers?
 ...
>Maybe you have the time and patience to make yourself acquainted with XS
>or Inline::C. It is probably more difficult at the beginning, but it has
>advantages on the long run. One advantage is that you can get help much
>more easily.

Unfortunately I lack the time and patience, I'll just write the whole
thing in C++ (which will probably end up taking more time ;o).


------------------------------

Date: 24 Jun 2004 10:34:53 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Find length of files
Message-Id: <cbeaod$dcc$1@mamenchi.zrz.TU-Berlin.DE>

Tad McClellan  <tadmc@augustmail.com> wrote in comp.lang.perl.misc:
> Michael Preminger <michaelp@hio.no> wrote:
> 
> > I have a huge directory, for which I need the word-count of all files 
> > (like wc -w * , and then put all length into a database)
> > 
> > Is there a smart way to do it in perl? 
> 
> 
> Yes and no, depending on the definition of "smart".  :-)
> 
> 
> > (apart from wc * > file and then 
> > open the file..)
> 
> 
> I don't like shelling-out for things that are easily done
> in native Perl.
> 
> Perhaps you can adapt this "wc -w" workalike one-liner to your purposes?
> 
>    perl -ane '$c+=@F; print("$c $ARGV\n"), $c=0 if eof(ARGV)' *
> 
> 
> or suitable for a Real Program:
> 
>    my $cnt = 0;
>    while ( <> ) {
>       my @words = split;
>       $cnt += @words;
>       if ( eof(ARGV) ) {
>          print "$cnt $ARGV\n";
>          $cnt = 0;
>       }
>    }

Alternatively, tr/// can be used if speed is an issue but space isn't.

    my $cnt;
    for ( do { local $/; <> } ) {
        tr/\n\t / /s;    # replace sequences of white space with single blanks
        $cnt = tr/ //;   # count blanks
    }

Because split() ignores trailing white space but tr/// doesn't, the
tr/// count may be one higher than the split() count, but that's
small stuff :)

Anno
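
The two counting approaches can be compared on a string rather than on files. This is a sketch only; the sub names count_words_tr and count_words_split and the sample text are invented for illustration.

```perl
use strict;
use warnings;

# Count words by squeezing white space to single blanks and counting blanks.
sub count_words_tr {
    my ($text) = @_;
    (my $copy = $text) =~ tr/\n\t / /s;   # squeeze runs of white space
    return ($copy =~ tr/ //);             # number of blanks
}

# Count words the way the split-based loop does.
sub count_words_split {
    my ($text) = @_;
    my @words = split ' ', $text;
    return scalar @words;
}

my $sample = "the quick  brown\tfox\n";    # hypothetical input
printf "tr: %d, split: %d\n",
    count_words_tr($sample), count_words_split($sample);
```

The two counts agree when the text has no leading white space and ends in white space; otherwise the blank count can be off by one in either direction.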


------------------------------

Date: Thu, 24 Jun 2004 06:06:46 -0500
From: Rusty Phillips <rustyp@freeshell.org>
Subject: Re: Find length of files
Message-Id: <pan.2004.06.24.11.06.11.174401@freeshell.org>

> Uhm, I guess for every file you have to fork, so I doubt it.
> 
You only have to run wc once ("wc -w *"), so there should only be one
fork.  Because wc is a compiled program designed especially for this
purpose, it is hopefully faster than perl at fetching and reading all 
of the files - enough so to overcome the cost of 
forking once (probably - need a benchmark to be sure).

In addition, it makes the perl coding simpler.  You don't have to 
bother with globbing and opening and closing the multiple files it needs,
or with scanning through the files.  All you have to do is parse the
output from the wc command.
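
Parsing that output could look roughly like this. It is a sketch under assumptions: the sub name parse_wc_output is invented, the sample lines are made up, and the layout assumed is the usual "count filename" with a final "total" line.

```perl
use strict;
use warnings;

# Turn lines of "wc -w" output into a hash of filename => word count.
sub parse_wc_output {
    my @lines = @_;
    my %count_for;
    for my $line (@lines) {
        # Each line is expected to look like "   12 filename".
        next unless $line =~ /^\s*(\d+)\s+(.+?)\s*$/;
        my ($count, $name) = ($1, $2);
        next if $name eq 'total';   # summary line; assumes no file has this name
        $count_for{$name} = $count;
    }
    return %count_for;
}

# Real input could come from a pipe, for example:
#   open(my $wc, '-|', 'wc', '-w', @files) or die "Cannot run wc: $!";
#   my %count_for = parse_wc_output(<$wc>);
my %count_for = parse_wc_output("  3 a.txt\n", " 10 b.txt\n", " 13 total\n");
print "$_: $count_for{$_}\n" for sort keys %count_for;
```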


------------------------------

Date: Thu, 24 Jun 2004 12:21:49 GMT
From: joe <jcharth@hotmail.com>
Subject: Re: help with perl dbi and update without locks
Message-Id: <Xns9512547FDBA26josephthecianet@207.69.154.202>

Well, I am working with DBI; I put all the queries in a .pm file.
The first function inserts the record with the following statements:
$QRY1="select max(pk)+1 from $vparam1";
$QRY2="insert $vparam1( pk, mwuser )  values( ? , ? )";
The second function verifies that the user that inserted the record was the 
right user:
$QRY1="select mwuser from mmsqlmsgtable where pk = ?";
If the user is different because of a concurrent transaction, 
I repeat function 1 until function 2 returns the right user.
I am trying to make this work on DBD::mysql, DBD::Oracle and DBD::ODBC.



------------------------------

Date: Thu, 24 Jun 2004 14:28:30 +0200
From: Thomas Kratz <ThomasKratz@REMOVEwebCAPS.de>
Subject: Re: help with perl dbi and update without locks
Message-Id: <40dac8ed$0$14512$bb690d87@news.main-rheiner.de>

joe wrote:

> i have a table with the fields pk and user; i am trying to avoid conflicts 
> when 2 or more users do a select max(pk) and then insert pkmaxvalue, user.
> i created a function that assigns max(pk) to a variable and then uses this 
> variable to create a record with the variable and another field for the 
> user that created this record.  I am trying to avoid conflicts by retrieving 
> the user from the record after the update. if the user does not match i 
> created a loop to create a record until the user in the record is equal to 
> the user that created the record. would this work?
>  i am trying to make the script work with mysql, unixodbc and iodbc and i 
> am trying to avoid locks because i dont know if i can do 
> $dbh->do("lock table my table in exclusive mode");
>  for all dbi and dbiodbc implementations. 

If I understand correctly you want to auto increment a numeric primary key 
while inserting new values into a table?
This is heavily dependent on the database you are using. I would suggest 
asking in a MySQL newsgroup.

Most databases can do this internally with something like (This is MSSQL):

CREATE TABLE [dbo].[mytable] (
	[pk] [int] IDENTITY (1, 1) NOT NULL ,
         ....

Which means begin with 1 and increase by 1 for every insert.

Another common method is to use an insert trigger that calculates the next 
primary key value. There may be others.

Doing the increment on the user's side is the worst of all methods.

Thomas

-- 
open STDIN,"<&DATA";$=+=14;$%=50;while($_=(seek( #J~.> a>n~>>e~.......>r.
STDIN,$:*$=+$,+$%,0),getc)){/\./&&last;/\w| /&&( #.u.t.^..oP..r.>h>a~.e..
print,$_=$~);/~/&&++$:;/\^/&&--$:;/>/&&++$,;/</  #.>s^~h<t< ..~. ...c.^..
&&--$,;$:%=4;$,%=23;$~=$_;++$i==1?++$,:_;}__END__#....>>e>r^..>l^...>k^..


------------------------------

Date: 24 Jun 2004 14:53:07 GMT
From: ctcgag@hotmail.com
Subject: Re: help with perl dbi and update without locks
Message-Id: <20040624105307.834$LI@newsreader.com>

joe <jcharth@hotmail.com> wrote:
> i have a table with the fields pk and user; i am trying to avoid conflicts
> when 2 or more users do a select max(pk) and then insert pkmaxvalue,
> user.

This is generally the worst way to do it.  Each database (even MySQL) has
some built-in way to accomplish this.  Create different modules for each
database.

> i created a function that assigns max(pk) to a variable and then
> uses this variable to create a record with the variable and another field
> for the user that created this record.  I am trying to avoid conflicts by
> retrieving the user from the record after the update.

This doesn't make much sense.  If your database checks pk for uniqueness,
then you will get an error if you try to insert a value that has just been
inserted by someone else.  If your database doesn't check pk for
uniqueness, then there may be more than one user retrieved by the same PK.
So in one case you don't need to do a select to check, and in the other
case your check is inadequate.

Xho



------------------------------

Date: Thu, 24 Jun 2004 09:31:13 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: Help! - Need a CGI redirect which passes a querystring  value
Message-Id: <zIACc.31753$Nz.1637133@news20.bellglobal.com>


"Gunnar Hjalmarsson" <noreply@gunnar.cc> wrote in message
news:2jvk8iF167949U1@uni-berlin.de...
> Matt Garrish wrote:
> > Both mod_perl and aspx scripts only need to compile once (and with
> > mod_perl you can precompile the modules).
>
> Don't take for granted that mod_perl is an available option. The OP in
> this thread used the wording "the Windows server I am hosted on".
>

Granted, but my point was simply that that line of argument is fast becoming
out-dated.

Matt




------------------------------

Date: Thu, 24 Jun 2004 10:07:59 -0400
From: Brad Baxter <bmb@ginger.libs.uga.edu>
Subject: Re: Idiom for partitioning array?
Message-Id: <Pine.A41.4.58.0406240958360.22212@ginger.libs.uga.edu>

On Thu, 24 Jun 2004, Anno Siegel wrote:

> Brad Baxter  <bmb@ginger.libs.uga.edu> wrote in comp.lang.perl.misc:
> > On Wed, 23 Jun 2004, Anno Siegel wrote:
> > > Brad Baxter  <bmb@ginger.libs.uga.edu> wrote in comp.lang.perl.misc:
> > > > On Wed, 23 Jun 2004, Anno Siegel wrote:
>
> [...]
>
> > > > >     map [ splice @a, 0, $n], 0 .. $#a/$n;
> > > >
> > > > Can't beat that.
> > >
> > > It isn't entirely correct, however.  If @a is empty (and $n > 1), it
> > > returns one empty slice, but should return none.  That's because
> > > the implicit int( $#a/$n) is 0, not something negative, when $#a = -1.
> >
> > Which behavior the non-destructive version also exhibits.
> >
> > ... if @a;
> >
> > perhaps?
>
> That, or map [ splice @a, 0, $n], 1 .. $#a/$n + 1;

And so,

sub min { $_[ $_[1] < $_[0] ] }
my @r = map [ @a[ ($_-1)*$n .. min $_*$n-1, $#a ] ], 1 .. $#a/$n + 1;

Can that be classed as idiomatic?  :-)

Brad
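
For comparison, here are both variants from this subthread wrapped in subs with the bounds spelled out. This is a sketch; the sub names partition_splice and partition_slice are invented, and each sub works on its own copy of the list.

```perl
use strict;
use warnings;

# Destructive idiom: splice off $n elements at a time (from a copy).
sub partition_splice {
    my ($n, @a) = @_;
    # The range is built before map starts, so shrinking @a is harmless.
    return map { [ splice @a, 0, $n ] } 1 .. $#a / $n + 1;
}

# Non-destructive idiom: take array slices with explicit bounds.
sub partition_slice {
    my ($n, @a) = @_;
    return map {
        my $first = ($_ - 1) * $n;
        my $last  = $_ * $n - 1;
        $last = $#a if $last > $#a;   # clamp the final, shorter chunk
        [ @a[ $first .. $last ] ];
    } 1 .. $#a / $n + 1;
}

print join(' | ', map { join ' ', @$_ } partition_slice(3, 1 .. 7)), "\n";
```

Starting the range at 1 and ending it at $#a/$n + 1 is what makes an empty input yield no chunks at all.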


------------------------------

Date: Thu, 24 Jun 2004 06:23:41 -0500
From: Rusty Phillips <rustyp@freeshell.org>
Subject: Re: noob trying to learn where to start?
Message-Id: <pan.2004.06.24.11.23.40.378201@freeshell.org>

IMHO, if you've got a few languages under your belt already, you'll waste
a bit of time on the basics with a lot of the "intro" guides. If you've 
coded in other things, what you need is a
guide to the tools available in Perl, rather than the language in 
general (which should be intuitive given experience in other
languages and all the syntactic sugar in perl).  For that I'd go 
with the Perl Cookbook, also an O'Reilly book.

But if VB is all you've got, may I suggest a more structured language
first? There are lots of popular ones, and the knowledge will easily
transfer over to perl.  The advantage is that you'll be able to 
write robust code easier after you are forced to do it with 
structure.  At least that was my experience (I started from BASIC
before there was VB).


------------------------------

Date: 24 Jun 2004 13:17:12 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: REGEX Negation
Message-Id: <u9y8mdf8vr.fsf@wcl-l.bham.ac.uk>

Rusty Phillips <rustyp@freeshell.org> writes:

> I know about negative lookahead and negative character closures, 
> but I can't find any good way to do actual negation.

There is in general no way to do negation in regex.

> One thing I'd like to use this for is to match quotes while 
> guaranteeing that I'm not matching backslashed quotes (that is, if I
> find a backslash in the string, the quote in front of it should not
> be matched).

You are talking about negative lookbehind. This is documented not far
from where negative lookahead is documented.  

However, one usually looks for an even number of backslashes followed
by a quote.  (Note: zero is an even number).

/(?<!\\)(?:\\\\)*"/
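
A small sketch of that pattern in use, with the lookbehind written in Perl's (?<!...) syntax. The sub name first_unescaped_quote and the sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Return the offset of the first quote that is not backslash-escaped,
# or -1 if there is none.
sub first_unescaped_quote {
    my ($text) = @_;
    # An even run of backslashes (possibly empty), not itself preceded
    # by a backslash, followed by the quote.
    return -1 unless $text =~ /(?<!\\)(?:\\\\)*"/;
    return $+[0] - 1;   # the quote is the last character of the match
}

print first_unescaped_quote('a\"b"c'), "\n";   # the text is: a \ " b " c
```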

Another approach not using lookbehind is given in the answer to the
FAQ "How can I split a [character] delimited string except when inside
[character]?  (Comma-separated files)"

Not, of course, that you could have been expected to guess that,
because yours is not really the same question but is in fact the _next_
question people usually ask after asking the one in the FAQ.

> There are many more places where I'd like to use a negation 
> technique -

Sorry, you have to refactor your question so that it does not involve
negation.

> especially I'd like to match things of the form:
> "match the largest string that doesn't contain the character sequence 
> 'blah.'"

Regex can never find the longest - it will always find the first (or
occasionally the last).  Within matches starting at the same position
it can be made to favour long or short.  So to get the globally
longest match you need to find all such strings and sort.

These strings will be the same set as the set of shortest strings to
start at the beginning of the input or at the 'l' of 'blah' and to end
at the end of the input or at the 'a' of 'blah'

my @substrings = /(?=((?:^|(?<=b)lah).*?(?:$|bla(?=h))))/g;

For example for $_='xxxxblablahwibbleblahfoo' this gives @substrings =
('xxxxblabla','lahwibblebla','lahfoo').

You can then find the longest with sort() or List::Util::reduce().
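
Put together, with List::Util::reduce picking the winner, that might look like this sketch. The sub name longest_without_blah is invented; the pattern and sample string are the ones given above.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Longest substring of $text that does not contain 'blah'.
sub longest_without_blah {
    my ($text) = @_;
    # Collect every maximal candidate, one per possible starting point.
    my @substrings = $text =~ /(?=((?:^|(?<=b)lah).*?(?:$|bla(?=h))))/g;
    # Keep the longer of each pair; on a tie the earlier one wins.
    return reduce { length($a) >= length($b) ? $a : $b } @substrings;
}

print longest_without_blah('xxxxblablahwibbleblahfoo'), "\n";
```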

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


------------------------------

Date: 24 Jun 2004 04:28:06 -0700
From: nospam@tomcat.ca.tc (Florent Carli)
Subject: Re: Regexp, Strings and spaces
Message-Id: <6d12cccb.0406240328.4261cf62@posting.google.com>

> 
> Sure:  /"?([^"]*)/
> 
This does not work since 'field=hello field2="world"' would get you
'hello field2=' into $1.


------------------------------

Date: 24 Jun 2004 04:35:41 -0700
From: nospam@tomcat.ca.tc (Florent Carli)
Subject: Re: Regexp, Strings and spaces
Message-Id: <6d12cccb.0406240335.e7fceed@posting.google.com>

>    $line =
>     'field1="value with or without spaces" field2=valuewithoutspaces'
> 
>    while ( $line =~ m/="([^"]*)"|=(\w*)/g )
>    {
>       push @res, $1  if defined $1;
>       push @res, $2  if defined $2;
>    }
> 

I think my specifications were bad.
The "line" can be as long as it wants, with any number of fields.
It can be
field1="test" field2=test2 field3="test 3" field4="testagain"
and the next line could be
field1="test 4" field2="test 5" field3=test_6 field4="test n°7"

What I need is to get the value of field2 for any type of field2 I can
get: "value with space", "valuewithoutspace", valuewithoutspace, or
even empty or "".
In all cases, the value alone (without quotes) must go into $1 and $1
only.

For now, the only regexp able to do this I have found is : 
field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
But like I said, the software I use to parse is using a version of
perl that does not support lookbehinds ...

I'm trying to do basically the same thing windows does when you type :
copy "my file.doc" "d:\my documents"
or
copy myfile.doc d:\

But only with one regexp (and no second pass in perl to remove the
quotes for instance ;) )
any idea ?


------------------------------

Date: 24 Jun 2004 11:54:02 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Regexp, Strings and spaces
Message-Id: <cbefcq$gii$1@mamenchi.zrz.TU-Berlin.DE>

Florent Carli <nospam@tomcat.ca.tc> wrote in comp.lang.perl.misc:
> > 
> > Sure:  /"?([^"]*)/
> > 
> This does not work since 'field=hello field2="world"' would get you
> 'hello field2=' into $1.

I didn't read your original specification that way.

The best solution is probably a module (Text::Balanced, or one of
the CSV modules).  For background information, see the FAQ:

How can I split a [character] delimited string except when inside [character]

Anno
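
As an illustration of the module route, here is a sketch using the core module Text::ParseWords, which is a substitute for the modules named above rather than one of them. It also gives up the single-regexp requirement. The sub name parse_fields and the sample line are assumptions.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Split a line of name=value pairs into a hash, honouring double quotes.
sub parse_fields {
    my ($line) = @_;
    my %value_of;
    # Second argument 0 means: drop the quotes from the returned tokens.
    for my $token (quotewords('\s+', 0, $line)) {
        next unless defined $token;          # can occur after trailing space
        my ($name, $value) = split /=/, $token, 2;
        $value_of{$name} = defined $value ? $value : '';
    }
    return %value_of;
}

my %value_of = parse_fields('field1="test 4" field2="test 5" field3=test_6');
print "field2 is '$value_of{field2}'\n";
```

Note that quotewords treats backslashes as escapes, so values such as Windows paths would need extra care.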


------------------------------

Date: 24 Jun 2004 04:46:29 -0700
From: kcwolle@freenet.de (kcwolle)
Subject: Re: split xml file between two processing instructions
Message-Id: <4e98b249.0406240346.37489ea8@posting.google.com>

Hello Anno,

I tried the following code to split the document. The problem is that
I get only the first two <no> elements and not the first and the last.

use strict;

my $text;
my $file = shift;
my $outfile = shift;
my $testfile;
open(INPUT, "<$file") or die "Cannot read file $file!\n";
local $/;
$text = <INPUT>;
close INPUT;


while ($text =~ /<\?split \?>(.*?)(?=<\?split \?>)/sg)
{
	my $fragment = $1;
  	my ($from, $to) = $fragment =~ /<no>(.*?)<\/no>/isg;
	$testfile = $outfile."\\test-nr".${from}."to".${to}."\.xml";
	open(OUTPUT, ">$testfile") or die "Cannot write file $testfile!!!\n";
	print OUTPUT $fragment;
	close OUTPUT;
}

The general problem with using regular expressions is that there could
be broken elements, e.g.
<?split ?><level1><text>xxx</text><level2><text>yyy</text></level2><?split ?><level2><text>zzz</text></level2></level1>
where a level1 tag begins in the first <?split ?> and ends in the
second.
How can such broken elements be handled, so that I have well-formed
XML?

On the other hand, if I use an XML module, the PI is a node that has no
children. How can the following nodes up to the next PI be handled?

Btw I'm a relative newbie to Perl and XML programming so that I need
some support in these things. Maybe you can help me? :-|

Yours

Wolfgang


------------------------------

Date: Thu, 24 Jun 2004 07:54:21 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: split xml file between two processing instructions
Message-Id: <slrncdljnt.gfk.tadmc@magna.augustmail.com>

kcwolle <kcwolle@freenet.de> wrote:

> The problem is that
> I get only the first two <no> elements and not the first and the last.


>   	my ($from, $to) = $fragment =~ /<no>(.*?)<\/no>/isg;


Use a "list slice" ("Slices" section in perldata.pod) to slice
the list that m//g is returning, like I did in my earlier followup:


    my ($from, $to) = ($fragment =~ /<no>(.*?)<\/no>/isg)[ 0, -1 ];
                      ^                                 ^^^^^^^^^^
                      ^                                 ^^^^^^^^^^
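
A self-contained sketch of the list slice at work. The sub name first_and_last_no and the sample fragment are invented for illustration.

```perl
use strict;
use warnings;

# Return the contents of the first and the last <no> element.
sub first_and_last_no {
    my ($fragment) = @_;
    # m//g in list context returns every capture; [ 0, -1 ] keeps the
    # first and the last of them.
    return ($fragment =~ /<no>(.*?)<\/no>/isg)[ 0, -1 ];
}

my ($from, $to) = first_and_last_no('<no>1</no><x/><no>2</no><no>3</no>');
print "from $from to $to\n";
```

Without the slice, the two-element list assignment simply takes the first two captures, which is the behaviour described in the question.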

-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Thu, 24 Jun 2004 14:45:12 GMT
From: "monkeyjob" <monkeyjob@ntlworld.com>
Subject: Re: Suggestions on editing multiple files
Message-Id: <6ECCc.220$mt6.25@newsfe5-gui.server.ntli.net>


Hi,
 if you're running under Microsoft Windows, try FileMonkey:
 http://www.monkeyjob.com/FileMonk.html

It can perform search and replace on multiple files in multiple folders
using multi-line search and replace phrases and basic wildcard support.

The product is shareware (free for 30 days).

Regards,
www.monkeyjob.com


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6723
***************************************

