[31831] in Perl-Users-Digest

Perl-Users Digest, Issue: 3094 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Aug 22 21:09:24 2010

Date: Sun, 22 Aug 2010 18:09:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 22 Aug 2010     Volume: 11 Number: 3094

Today's topics:
    Re: Multi-level list generation <tuxedo@mailinator.com>
    Re: Multi-level list generation <tuxedo@mailinator.com>
    Re: Multi-level list generation <tuxedo@mailinator.com>
    Re: Multi-level list generation <uri@StemSystems.com>
    Re: Multi-level list generation <stevem_@nogood.com>
        Regex: deleting non-matching words <no_one_you_know@notthisaddress.com>
    Re: Regex: deleting non-matching words <sbryce@scottbryce.com>
    Re: Regex: deleting non-matching words <tadmc@seesig.invalid>
    Re: Regex: deleting non-matching words <derykus@gmail.com>
    Re: Regex: deleting non-matching words <no_one_you_know@notthisaddress.com>
        Simple script execution problems (newbie) <paul@pstech-inc.com>
    Re: Simple script execution problems (newbie) <ben@morrow.me.uk>
    Re: Simple script execution problems (newbie) <stevem_@nogood.com>
    Re: Simple script execution problems (newbie) <paul@pstech-inc.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 22 Aug 2010 21:30:20 +0200
From: Tuxedo <tuxedo@mailinator.com>
Subject: Re: Multi-level list generation
Message-Id: <i4rtsd$46n$03$1@news.t-online.com>

Steve wrote:

> On 08/22/2010 07:19 AM, Tuxedo wrote:
> > Steve wrote:
> >
> >> On 08/21/2010 03:32 PM, Tuxedo wrote:
> >>> Steve wrote:
> >>>
> >>>> On 08/21/2010 02:04 PM, Tuxedo wrote:
> >>>>> I have a question about how to generate a multi-level (nested) list
> >>>>> structure by perl. I currently have a 2-level
> >>>>> <ul><li><ul><li></li></ul></li></ul>    structure produced via a
> >>>>> perl script, which works fine for its purpose. An example HTML
> >>>>> output by the existing script is:
> >>>>>
> >>>>> <ul>
> >>>>>       <li><a href=subject_1.0.html>Subject 1.0</a>
> >>>>>          <ul>
> >>>>>             <li><a href=page_1.1.html>Page 1.1</a></li>
> >>>>>             <li><a href=page_1.2.html>Page 1.2</a></li>
> >>>>>             <li><a href=page_1.3.html>Page 1.3</a></li>
> >>>>>         </ul>
> >>>>>       </li>
> >>>>> </ul>
> >>>>>
> >>>>> There's more non-relevant code that I've stripped for a bit of
> >>>>> clarity, such as CSS etc. In fact, the actual HTML code is largely
> >>>>> irrelevant. Anyway, a barebone version of the perl procedure
> >>>>> generating the above is:
> >>>>>
> >>>>> #!/usr/bin/perl -w
> >>>>>
> >>>>> use Tie::IxHash;
> >>>>> use strict;
> >>>>> use warnings;
> >>>>>
> >>>>> my $object1 = tie my %listoflinks, "Tie::IxHash";
> >>>>>
> >>>>> %listoflinks = ('subject_1.0.html', =>    'Subject 1.0',
> >>>>>                    'page1.1.html', =>    'Page 1.1',
> >>>>>                    'page1.2.html', =>    'Page 1.2',
> >>>>>                    'page1.3.html', =>    'Page 1.3');
> >>>>>
> >>>>> for (\%listoflinks) {
> >>>>> my $firstkey = each %$_;
> >>>>>
> >>>>> print "<ul>\n"; # open 1st UL
> >>>>>
> >>>>> print "<li><a href=$firstkey>$listoflinks{$firstkey}</a>\n"; # open 1st LI
> >>>>>
> >>>>> print "<ul>\n"; # open nested UL
> >>>>>
> >>>>> while ( local $_ = each %$_ ) {
> >>>>>       { print "<li><a href=$_>$listoflinks{$_}</a></li>    \n" } # print some LI's
> >>>>> }
> >>>>>
> >>>>> print "</ul>\n"; # close nested UL
> >>>>> print "</li>\n"; # close first LI
> >>>>> print "</ul>\n"; # close first UL
> >>>>>
> >>>>> }
> >>>>>
> >>>>> The above procedure was put together with good help from this group
> >>>>> ages ago. As mentioned, the code takes care of the 2-level list
> >>>>> structure and does so by fetching the $firstkey from the array
> >>>>> entries or LoH and inserting the needed opening and closing UL's and
> >>>>> LI's in the right places.
> >>>>>
> >>>>> However, I'm not quite sure how to change the script to generate a
> >>>>> third level, such as:
> >>>>>
> >>>>> <ul>
> >>>>>       <li><a href=subject_1.0.html>Subject 1.0</a>
> >>>>>          <ul>
> >>>>>             <li><a href=page_1.1.html>Page 1.1</a></li>
> >>>>>             <li><a href=page_1.2.html>Page 1.2</a></li>
> >>>>>             <li><a href=page_1.3.html>Page 1.3</a></li>
> >>>>>             <li><a href=subject_2.0.html>Subject 2.0</a>
> >>>>>                <ul>
> >>>>>                   <li><a href=page_2.1.html>Page 2.1</a></li>
> >>>>>                   <li><a href=page_2.1.html>Page 2.2</a></li>
> >>>>>                </ul>
> >>>>>             </li>
> >>>>>         </ul>
> >>>>>       </li>
> >>>>> </ul>
> >>>>>
> >>>>> Or for example, the same structure, with another two second-level
> >>>>> list items at the end:
> >>>>>
> >>>>> <ul>
> >>>>>       <li><a href=subject_1.0.html>Subject 1.0</a>
> >>>>>          <ul>
> >>>>>             <li><a href=page_1.1.html>Page 1.1</a></li>
> >>>>>             <li><a href=page_1.2.html>Page 1.2</a></li>
> >>>>>             <li><a href=page_1.3.html>Page 1.3</a></li>
> >>>>>             <li><a href=subject_2.0.html>Subject 2.0</a>
> >>>>>                <ul>
> >>>>>                   <li><a href=page_2.1.html>Page 2.1</a></li>
> >>>>>                   <li><a href=page_2.1.html>Page 2.2</a></li>
> >>>>>                </ul>
> >>>>>             </li>
> >>>>>             <li><a href=page_1.4.html>Page 1.4</a></li>
> >>>>>             <li><a href=page_1.5.html>Page 1.5</a></li>
> >>>>>         </ul>
> >>>>>       </li>
> >>>>> </ul>
> >>>>>
> >>>>> Naturally a different array structure would be required in my
> >>>>> %listoflinks to output the above. Any advice or examples how this
> >>>>> may be pieced together would be most helpful.
> >>>>>
> >>>>> Perhaps someone has a procedure in use that does something similar
> >>>>> already?
> >>>>>
> >>>>> Many thanks,
> >>>>> Tuxedo
> >>>>>
> >>>>> NB: System load efficiency is not an issue, as the procedure will
> >>>>> run only occasionally on a local machine to generate HTML sent onto
> >>>>> a web server in static format. In other words, the script will not
> >>>>> run against any real web page requests. The procedure is simply
> >>>>> meant to be an easy maintenance tool.
> >>>>>
> >>>>>
> >>>>
> >>>> I use recursive routines a *lot* in printing out nested data
> >>>> structures, and they are your friend in cases like this...
> >>>>
> >>>> The below is:
> >>>> 1) not tested in any way,
> >>>> 2) may not even compile,
> >>>> 3) and is just a concept.
> >>>>
> >>>> %hash = ( whatever, too lazy to make one );
> >>>>
> >>>> recurse_hash( \%hash );
> >>>>
> >>>> sub recurse_hash {
> >>>>        my $refhash = shift;
> >>>>        $refhash or return '';
> >>>>
> >>>>        print "<ul>\n";
> >>>>
> >>>>        while( keys %{$refhash} ){
> >>>>            if( ref $refhash->{$_} eq 'HASH' ){
> >>>>                recurse_hash( $refhash->{$_} );
> >>>>            }
> >>>>            else{
> >>>>                print "<li>$refhash->{$_}</li>\n";
> >>>>            }
> >>>>        }
> >>>>
> >>>>        print "</ul>\n";
> >>>> }
> >>>>
> >>>> The beauty of a recursive is it flat doesn't matter how many levels
> >>>> deep the data structure is.
> >>>>
> >>>> The downside is it flat doesn't matter how many levels deep the
> >>>> recursive 'thinks' the data structure is and sloppy programming can
> >>>> bite you big time..... infinite recursion anyone?
> >>>>
> >>>> hth,
> >>>>
> >>>> \s
> >>>>
> >>>>
> >>>
> >>> Thanks for the above solution! However, it is a bit difficult for me
> >>> to figure exactly how the %hash = (part may be composed) to generate a
> >>> multi-level list structure. Any additional pointers anyone?
> >>>
> >>> Tuxedo
> >>>
> >>>
> >>>
> >>
> >> Well.....
> >>
> >> my %hash = (
> >>       key1 =>  { subkey =>  'value',
> >>                 subhash =>  {
> >>                             subsubkey1 =>  'value',
> >>                             subsubkey2 =>  'another value',
> >>                            },
> >>               },
> >>       key2 =>  'value',
> >>       etc  =>  {
> >>               },
> >>
> >> );
> >>
> >> But, 'more better' would be to go a reliable (and vetted) source like:
> >>
> >> http://perldoc.perl.org/perldsc.html
> >>
> >> hth,
> >>
> >> \s
> >
> >
> > Thanks for the hash example and perldoc resource. However, it's still a
> > bit confusing. To gain a bit further understanding I tried to run your
> > script 'as is' but it failed to compile, as of course you warned me it
> > may do. In saving your script as a file named testrun.pl and running it,
> > the following errors are repeatedly returned to the shell until I hit
> > Ctrl+C:
> >
> > <li></li>
> > Use of uninitialized value in hash element at ./testrun.pl line 27.
> > Use of uninitialized value in hash element at ./testrun.pl line 31.
> > Use of uninitialized value in concatenation (.) or string at
> > ./testrun.pl line 31.
> > <li></li>
> > Use of uninitialized value in hash element at ./testrun.pl line 27.
> > Use of uninitialized value in hash element at ./testrun.pl line 31.
> > Use of uninitialized value in concatenation (.) or string at
> > ./testrun.pl line 31.
> > <li></li>
> >
> > etc.
> >
> > Unfortunately I do not understand the meaning of the above errors and if
> > I redirect the output to a file, like in ./testrun.pl > file.txt, the file
> > contains a long list of empty <li></li> containers after an opening <ul>:
> >
> > <ul>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> > <li></li>
> >
> > etc...
> >
> > Below is the exact copy of what I tried to run:
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> > use warnings;
> >
> > my %hash = (
> >       key1 =>  { subkey =>  'value',
> >                 subhash =>  {
> >                             subsubkey1 =>  'value',
> >                             subsubkey2 =>  'another value',
> >                            },
> >               },
> >       key2 =>  'value',
> >
> > );
> >
> >
> > recurse_hash( \%hash );
> >
> > sub recurse_hash {
> >       my $refhash = shift;
> >       $refhash or return '';
> >
> >       print "<ul>\n";
> >
> >       while( keys %{$refhash} ){
> >           if( ref $refhash->{$_} eq 'HASH' ){
> >               recurse_hash( $refhash->{$_} );
> >           }
> >           else{
> >               print "<li>$refhash->{$_}</li>\n";
> >           }
> >       }
> >
> >       print "</ul>\n";
> > }
> >
> > Anyone knows where the error(s) in the above procedure are?
> >
> > Thanks again,
> > Tuxedo
> >
> >
> >
> 
> OK, here:
> 1) tested
> 2) compiles
> 3) works
> 
> 
> #! /usr/bin/perl
> 
> use warnings;
> use strict;
> 
> my %one = (
>      onek1v1 => 'hash one value 1',
>      onek2v1 => 'hash one value 2',
> );
> 
> my %oneref = (
>      onerefk1v1 => { anotherlevel => 'hash one down two' },
>      onerefval  => 'hash one value down one',
> );
> 
> $one{'downone'} = \%oneref;
> 
> my %two = (
>      twok1v1 => 'hash two value 1',
>      twok2v1 => 'hash two value 2',
> );
> 
> my %three = (
>      threek1v1 => 'hash three value 1',
>      threek2v1 => 'hash three value 2',
> );
> 
> my %main_hash = (
>      one => \%one,
>      two => \%two,
>      three => \%three,
> );
> 
> our $spct = 0;  ## track leading spaces count
> 
> recurse_hash( \%main_hash );
> 
> sub recurse_hash {
> 
>         my $refhash = shift;
>         $refhash or return '';
> 
>         $spct and print  '  ' x $spct;  # spacecount x spacespace
> 
>         print "<ul>\n";
> 
>         $spct += 1;
> 
>         for( keys %{$refhash} ){
> 
>             if( ref $refhash->{$_} eq 'HASH' ){
> 
>                 $spct += 1;
> 
>                 recurse_hash( $refhash->{$_} );
> 
>                 $spct -= 1;
>             }
>             else{
> 
>                 $spct+= 1;
>                 print '  ' x $spct;
> 
>                 print "<li>$refhash->{$_}</li>\n";
> 
>                 $spct -= 1;
>             }
>         }
> 
>         $spct -= 1;
> 
>         print '  ' x $spct;
> 
>         print "</ul>\n";
> 
>   }
> 
> exit;
> 
> 
> Command line Output:
> 
> <ul>
>      <ul>
>          <li>hash three value 1</li>
>          <li>hash three value 2</li>
>      </ul>
>      <ul>
>          <li>hash one value 2</li>
>          <ul>
>              <li>hash one value down one</li>
>              <ul>
>                  <li>hash one down two</li>
>              </ul>
>          </ul>
>          <li>hash one value 1</li>
>      </ul>
>      <ul>
>          <li>hash two value 2</li>
>          <li>hash two value 1</li>
>      </ul>
> </ul>
> 
> 
> 
> You'll note the order is NOT preserved as I'm not using Tie::IxHash.
> 
> Anyway, play with the script portion to get a feel for what is going on.
> Add/Remove hash refs, whatever.
> 
> Once the concept starts sinking in you will have taken a major step in
> understanding/using Perl.
> 
> I read somewhere once that if you are not using hashes you are not doing
> Perl.
> 
> Spot on.
> 
> \s


Thanks for this excellent example! 

The only odd thing I see at first sight is that the first <ul> is not 
followed directly by an <li> at any level, which is the intended structure 
of the particular HTML list. Instead, there is a <ul><ul> structure. I 
think I need to apply Tie::IxHash to see through this a bit better. I will 
tinker with it and will surely find it a useful learning experience :-)

Thanks again,
Tuxedo
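
One way to avoid the <ul><ul> structure noted above is to emit each nested <ul> inside the <li> that owns it. Using arrays instead of plain hashes also keeps the order without Tie::IxHash. The following is only a sketch: the 'href', 'text' and 'children' keys and the render_list name are invented for illustration, and no HTML escaping is done.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical structure: each item is a hash with 'href', 'text' and an
# optional 'children' array ref holding items of the same shape.
my @links = (
    {   href     => 'subject_1.0.html',
        text     => 'Subject 1.0',
        children => [
            { href => 'page_1.1.html', text => 'Page 1.1' },
            { href => 'page_1.2.html', text => 'Page 1.2' },
            {   href     => 'subject_2.0.html',
                text     => 'Subject 2.0',
                children => [
                    { href => 'page_2.1.html', text => 'Page 2.1' },
                    { href => 'page_2.2.html', text => 'Page 2.2' },
                ],
            },
            { href => 'page_1.3.html', text => 'Page 1.3' },
        ],
    },
);

# Return the HTML for one list level; recurse for items that have children.
sub render_list {
    my ( $items, $depth ) = @_;
    $depth ||= 0;
    my $pad  = '  ' x ( 2 * $depth );
    my $html = "$pad<ul>\n";
    for my $item (@$items) {
        $html .= "$pad  <li><a href=\"$item->{href}\">$item->{text}</a>";
        if ( $item->{children} && @{ $item->{children} } ) {
            # The nested <ul> goes inside the still-open <li>.
            $html .= "\n" . render_list( $item->{children}, $depth + 1 );
            $html .= "$pad  ";
        }
        $html .= "</li>\n";
    }
    return $html . "$pad</ul>\n";
}

print render_list( \@links );
```

Because the sub returns a string rather than printing, the same routine can feed a file or a template. Items can be added at any depth without touching the code.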






------------------------------

Date: Sun, 22 Aug 2010 21:39:18 +0200
From: Tuxedo <tuxedo@mailinator.com>
Subject: Re: Multi-level list generation
Message-Id: <i4rud7$fe4$00$1@news.t-online.com>

Tad McClellan wrote:

> Tuxedo <tuxedo@mailinator.com> wrote:
> 
> > 'as is' but it failed to compile,
> 
> 
> None of the messages you've shown indicate a failure to compile.
> 
> What makes you think compilation failed?
> 
> 
> > as of course you warned me it may do. In
> > saving your script as a file named testrun.pl and running it, the
> > following errors are repeatedly returned to the shell until I hit
> > Ctrl+C:
> >
> ><li></li>
> > Use of uninitialized value in hash element at ./testrun.pl line 27.
> > Use of uninitialized value in hash element at ./testrun.pl line 31.
> > Use of uninitialized value in concatenation (.) or string at
> > ./testrun.pl line 31.
> ><li></li>
> > Use of uninitialized value in hash element at ./testrun.pl line 27.
> > Use of uninitialized value in hash element at ./testrun.pl line 31.
> > Use of uninitialized value in concatenation (.) or string at
> > ./testrun.pl line 31.
> 
> 
> None of those are error messages.
> 
> They are warning messages.
> 
> 
> > Unfortunately I do not understand the meaning of the above errors
> 
> 
> Even after looking up the messages in Perl's standard docs?
> 
>     perldoc perldiag

Yes, warnings. You're right. Will take a note of 'perldoc perldiag'.

> > and if I
> > redirect the output to a file, like in ./testrun.pl > file.txt, the file
> > contains a long list of empty <li></li> containers after an opening
> > <ul>:
> 
> 
> If the program ran, then it must have compiled successfully...

It sure ran, in fact it didn't stop running without Ctrl+C. As with errors 
vs. warnings, just wrong use of terminology on my part.

Thanks for putting the records straight!

Tuxedo
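
For anyone following along, the endless run of empty <li></li> lines is consistent with how while( keys %{$refhash} ) behaves. In a while() condition, keys is evaluated in scalar context, so it yields the number of keys (always true for a non-empty hash) and never assigns anything to $_. A small sketch of the difference, using a made-up %h:

```perl
use strict;
use warnings;

my %h = ( key1 => 'value', key2 => 'another value' );

# Scalar context, as in a while() condition: keys gives the key count.
# That count never changes inside the loop, so the loop never ends,
# and $_ stays undefined (hence the "uninitialized value" warnings).
my $count = keys %h;
print "scalar keys: $count\n";

# List context, as in for(): $_ is aliased to each key in turn.
my @seen;
for ( sort keys %h ) {
    push @seen, "$_=$h{$_}";
}
print join( ', ', @seen ), "\n";
```

The sort is only there to make the output order predictable; it is not needed for the loop to terminate.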


------------------------------

Date: Sun, 22 Aug 2010 22:16:18 +0200
From: Tuxedo <tuxedo@mailinator.com>
Subject: Re: Multi-level list generation
Message-Id: <i4s0ii$jqa$00$1@news.t-online.com>

Uri Guttman wrote:

> >>>>> "T" == Tuxedo  <tuxedo@mailinator.com> writes:
> 
>   T> Uri Guttman wrote:
>   >> 
>   >> the OP isn't parsing but generating html. parsing it should be done
>   >> with a module. generating it is done well with templates but he still
>   >> needs to learn data structures to work with them. regardless of the
>   >> technology, nested html needs nested data which means perl data
>   >> structures. they are used in so many perl programs that they are
>   >> critical to learn early on. perlreftut, perldsc and perllol are
>   >> required reading from the perl docs.
> 
>   T> Will look into perlreftut, perldsc and perllol. My task may seem
>   T> trivial, but as you say, the relevant data structures are needed,
>   T> which makes this one a hard nut to crack without a fairly deep level
>   T> of perl knowledge.
> 
> i consider perl refs and data structure mid-level perl and not deep
> knowledge. they aren't that hard to learn and they are used all the time
> which make them important to learn.
> 
> as for templating html, there are many choices. i, of course, recommend
> Template::Simple which you can learn quickly and will help in this
> task. regardless of the method you need to learn perl data structures.
> 
> uri
> 

Point taken! I'm sure even a basic understanding of data structures is a 
requirement to use perl for any advanced programming rather than some kind 
of rudimentary shell like enhancement tool.

Tuxedo



------------------------------

Date: Sun, 22 Aug 2010 18:37:57 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Multi-level list generation
Message-Id: <87bp8us27u.fsf@quad.sysarch.com>


have you ever heard of editing quoted posts? read the group guidelines
for more on this. 291 lines of quote and maybe 100 lines of new stuff is
not a good ratio.

also using named variables like that makes your solution unextendable
and hard to change. this should be done with anon hashes and such. and
there are many ways to do this.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sun, 22 Aug 2010 15:54:54 -0500
From: Steve <stevem_@nogood.com>
Subject: Re: Multi-level list generation
Message-Id: <BWhco.21759$1v3.6235@newsfe20.iad>

On 08/22/2010 05:37 PM, Uri Guttman wrote:
>
> have you ever heard of editing quoted posts? read the group guidelines
> for more on this. 291 lines of quote and maybe 100 lines of new stuff is
> not a good ratio.
>

Yep, I agree absolutely. And I cringed when I posted all that 'stuff'.

Given the newbyness of the OP though, I wondered if it might make sense 
to keep it all in one post......

> also using named variables like that makes your solution unextendable
> and hard to change. this should be done with anon hashes and such. and
> there are many ways to do this.
>

:-) See above....

I normally would NOT name all that nonsense in working code. In fact I 
would avoid constructing a data hash by hand like the plague. That is 
what code is for.

But again...

My thinking was that if the OP was struggling to understand references, 
compacting/simplifying the code down might not help the matter.

I've been known to be wrong, though.....

> uri
>


thanks,

\s


------------------------------

Date: Sun, 22 Aug 2010 21:06:00 GMT
From: pete <no_one_you_know@notthisaddress.com>
Subject: Regex: deleting non-matching words
Message-Id: <slrni7349o.1cu.no_one_you_know@corv.local>

I have input strings where some words start with an underscore. The plan
is to remove all words that do NOT start with an underscore and simply
keep the rest. So for example starting with
"word1 word2 _word3 word4 word5 _word6 _word7 word8"

I'm trying to end up with 
"_word3 _word6 _word7"

The expression I have got so far is s/.*?(_[a-z0-9]+).*?/ $1/gi;
and my understanding is as follows:
The first ".*?" part removes everything up to the first matching RE
The "(_[a-z0-9]+)" matches any letter/number combination that starts
with an underscore [sidenote: yes, I know: \w+]
The final ".*?" removes everything up to the next match, or up to
the end of the string.

Here's how I have the RE in a program
$_=(<>);
s/.*?(_[a-z0-9]+).*?/ $1/gi;
print "Have: $_";

and here's how I run it:
echo "word1 word2 _word3 word4 word5 _word6 _word7 word8" | perl s.pl

and here's the output I get:
Have:  _word3 _word6 _word7 word8

Question: Why didn't "word8" get eaten like all its predecessors? and
what do I have to do to match it for removal.

If you have time, I'm looking for enlightenment more than solutions. I
am obviously missing something crucial, but all the online tutorials
I've found stop short of explaining this sort of thing.


------------------------------

Date: Sun, 22 Aug 2010 15:58:16 -0600
From: Scott Bryce <sbryce@scottbryce.com>
Subject: Re: Regex: deleting non-matching words
Message-Id: <i4s6i3$91l$1@news.eternal-september.org>

pete wrote:
> I have input strings where some words start with an underscore. The plan
> is to remove all words that do NOT start with an underscore and simply
> keep the rest. So for example starting with
> "word1 word2 _word3 word4 word5 _word6 _word7 word8"
> 
> I'm trying to end up with 
> "_word3 _word6 _word7"
> 
> The expression I have got so far is s/.*?(_[a-z0-9]+).*?/ $1/gi;
> and my understanding is as follows:
> The first ".*?" part removes everything up to the first matching RE
> The "(_[a-z0-9]+)" matches any letter/number combination that starts
> with an underscore [sidenote: yes, I know: \w+]
> The final ".*?" removes everything up to the next match, or up to
> the end of the string.
> 
> Here's how I have the RE in a program
> $_=(<>);
> s/.*?(_[a-z0-9]+).*?/ $1/gi;
> print "Have: $_";
> 
> and here's how I run it:
> echo "word1 word2 _word3 word4 word5 _word6 _word7 word8" | perl s.pl
> 
> and here's the output I get:
> Have:  _word3 _word6 _word7 word8
> 
> Question: Why didn't "word8" get eaten like all its predecessors? and
> what do I have to do to match it for removal.

An RE does not remove anything, as you have suggested. An RE matches
something. A substitute replaces whatever is matched with the
replacement string.

After you have matched '_word3', '_word6', and '_word7', nothing else in
the string matches your RE, so no further substitutions are made.

When a string can be split into words, and each word evaluated based on
its first character, I wouldn't use REs.

----------------

use strict;
use warnings;

my $original_string = 'word1 word2 _word3 word4 word5 _word6 _word7 word8';
my @word_list;

for my $word (split ' ', $original_string)
{
	push @word_list, $word if index($word, '_') == 0;
}

my $new_string = join ' ', @word_list;
print $new_string;


------------------------------

Date: Sun, 22 Aug 2010 17:03:29 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Regex: deleting non-matching words
Message-Id: <slrni737dn.ejb.tadmc@tadbox.sbcglobal.net>

pete <no_one_you_know@notthisaddress.com> wrote:
> I have input strings where some words start with an underscore. The plan
> is to remove all words that do NOT start with an underscore and simply
> keep the rest. So for example starting with
> "word1 word2 _word3 word4 word5 _word6 _word7 word8"
>
> I'm trying to end up with 
> "_word3 _word6 _word7"
>
> The expression I have got so far is s/.*?(_[a-z0-9]+).*?/ $1/gi;


What will your program do if a word has an "interior" underscore, like:

    word1 word2 _word3 word4_and_four word5 _word6 _word7 word8

Try your program on it. Is that what you want to happen in that case?

If not, then you probably want to make use of a "word boundary" (\b)
assertion. See the "Assertions" section in:

    perldoc perlre


> and my understanding is as follows:
> The first ".*?" part removes everything up to the first matching RE


 .*? *matches* everything up to the first word that starts with
an underscore.

It is part of an RE; there is no "first RE" or "second RE".

The s/// operator takes a *single* regular expression.

Regular expressions never "remove" anything, they only "match"
or "do not match".

It is the s/// *operator* that does the "removing".


> The "(_[a-z0-9]+)" matches any letter/number combination that starts
> with an underscore [sidenote: yes, I know: \w+]
> The final ".*?" removes everything up to the next match, or up to
> the end of the string.


The final .*? has no effect whatsoever. You get the same output
if you remove it.

 .*? means "match zero or more, preferring the shortest", so it 
always matches zero characters. (only because it is last in
your particular regular expression)


> and here's the output I get:
> Have:  _word3 _word6 _word7 word8
>
> Question: Why didn't "word8" get eaten like all its predecessors?


Because it was never matched by anything.

The s/// operator does nothing when it fails to match.


> and
> what do I have to do to match it for removal.
>
> If you have time, I'm looking for enlightenment more than solutions. I
> am obviously missing something crucial, but all the online tutorials
> I've found stop short of explaining this sort of thing.


Because "this sort of thing" is highly dependent on both the pattern
being matched, and the string that it is being matched against.

You need to "become the regex engine" and walk through its operation
on your particular string. The first match is:

word1 word2 _word3 word4 word5 _word6 _word7 word8
 ............^^^^^^          

after the 1st iteration of s///g you are left with

 _word3 word4 word5 _word6 _word7 word8
       ^

with the regex's pos() pointer as marked, match again from that pos():

 _word3 word4 word5 _word6 _word7 word8
       .............^^^^^^

then do the substitution leaving

 _word3 _word6 _word7 word8
              ^

match yet again:

 _word3 _word6 _word7 word8
              .^^^^^^

leaving:

 _word3 _word6 _word7 word8
                     ^

match again: match fails, no substitution is performed, and s///g is all done.




If this task was for me to do, I would either use a m//g in list context:

    $_ = join ' ', /\b(_[a-z0-9]+)/g;

or separate out the words and find which ones start with an underscore:

    $_ = join ' ', grep /^_/, split;


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.


------------------------------

Date: Sun, 22 Aug 2010 16:18:26 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Regex: deleting non-matching words
Message-Id: <1f1cb064-d994-4ae9-8c65-4e17c237dc62@k17g2000prf.googlegroups.com>

On Aug 22, 2:06 pm, pete <no_one_you_k...@notthisaddress.com> wrote:
> I have input strings where some words start with an underscore. The plan
> is to remove all words that do NOT start with an underscore and simply
> keep the rest. So for example starting with
> "word1 word2 _word3 word4 word5 _word6 _word7 word8"
>
> I'm trying to end up with
> "_word3 _word6 _word7"
>
> The expression I have got so far is s/.*?(_[a-z0-9]+).*?/ $1/gi;
> and my understanding is as follows:
> The first ".*?" part removes everything up to the first matching RE
> The "(_[a-z0-9]+)" matches any letter/number combination that starts
> with an underscore [sidenote: yes, I know: \w+]
> The final ".*?" removes everything up to the next match, or up to
> the end of the string.
>
> Here's how I have the RE in a program
> $_=(<>);
> s/.*?(_[a-z0-9]+).*?/ $1/gi;
> print "Have: $_";
>
> and here's how I run it:
> echo "word1 word2 _word3 word4 word5 _word6 _word7 word8" | perl s.pl
>
> and here's the output I get:
> Have:  _word3 _word6 _word7 word8
>
> Question: Why didn't "word8" get eaten like all its predecessors? and
> what do I have to do to match it for removal.
>
> If you have time, I'm looking for enlightenment more than solutions. I
> am obviously missing something crucial, but all the online tutorials
> I've found stop short of explaining this sort of thing.

The problem is that once you've matched a
target substring, ie,  _[a-z0-9]+  then the
regex .*? lazily stops as soon as possible
since .*? says match any character 0 or more
times minimally (also termed lazily). So the
regex lazily chooses 0 and completes a match.

That works but then the only glitch is that
the lazy .*? fails to consume the rest of the
string once the final target _word7 is found
and you're left with ' word8'.

One way to fix that:

 s/ .*?               # match minimally
   ( _[a-z0-9]+ | $ ) # up to target or eol
  / $1/gix;

Now the regex matches _word7, but then
tries to match one of two alternatives:

  Either: _[a-z0-9]+
  or:     end-of-line

The former isn't found but the latter is and
the rest of the string is consumed up to
the end-of-line just before \n.

--
Charles DeRykus


------------------------------

Date: Mon, 23 Aug 2010 00:43:54 GMT
From: pete <no_one_you_know@notthisaddress.com>
Subject: Re: Regex: deleting non-matching words
Message-Id: <slrni73h2a.8l9.no_one_you_know@corv.local>

Thanks Charles, Scott and Tad 
I do appreciate the time you've all taken to put me right. The big thing
I was missing was the REGEX pointer and how it iterates through input
string. I did think that the trailing ".*?" would match everything after
the last bracketed part of the RE but now I think I understand better
why it doesn't.

I have to admit, I've never used Perl's grep - though I use the Unix/Linux
versions frequently. Looks like I have a new tool to play with!

Pete


------------------------------

Date: Sun, 22 Aug 2010 17:37:00 -0400
From: "Paul E. Schoen" <paul@pstech-inc.com>
Subject: Simple script execution problems (newbie)
Message-Id: <iMgco.50634$1F6.31518@newsfe01.iad>

I am trying to set up a webspace so I can use CGI, and I'm getting errors 
and/or it is not behaving as it should using IE8 or Firefox. I read the FAQ 
and it suggests that this should be asked in 
comp.infosystems.www.authoring.CGI, but that NG has been removed. So, at the 
very least, please change the perl FAQ to reflect this and provide a NG that 
is appropriate.

I have a simple perl script which I copied from documentation here: 
http://blog.dreamhosters.com/kbase/index.cgi?area=144

  #!/usr/bin/perl
  use CGI;
  my $query= new CGI;
  print $query->header;
  print "hello people in my head\n";

Here are the problems I'm having:

On the dreamhost server http://www.pauleschoen.com/cgi-bin/first.pl, I get 
Internal Server Error 500 and the error log reports "Premature end of script 
headers:". On my other server, www.smart.net/~pstech/cgi-bin/first.pl, the 
IE8 browser tries to download the file, and Firefox just shows the source 
code text. If I use telnet to execute the script in smart.net, it seems to 
echo the correct header and text. I have also used the pico editor to 
eliminate possible problems with CR+LF newlines. I have not been able to log 
on to the dreamhost server with telnet.

I have tried changing the permissions as recommended, and I have the 
 .htaccess file configured as:

  Options +ExecCGI

My next step will probably be to contact the support or community 
discussions for these servers, but I was hoping that someone here might be 
able to help. I am planning to do some less trivial programming in Perl once 
I get this simple script working as it should.

Thanks!

Paul



------------------------------

Date: Sun, 22 Aug 2010 23:00:01 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Simple script execution problems (newbie)
Message-Id: <18j8k7-m8u2.ln1@osiris.mauzo.dyndns.org>


Quoth "Paul E. Schoen" <paul@pstech-inc.com>:
> I am trying to set up a webspace so I can use CGI, and I'm getting errors 
> and/or it is not behaving as it should using IE8 or Firefox. I read the FAQ 
> and it suggests that this should be asked in 
> comp.infosystems.www.authoring.CGI, but that NG has been removed. So, at the 
> very least, please change the perl FAQ to reflect this and provide a NG that 
> is appropriate.

There is no guarantee that there is one. (If the FAQ points to a
non-existent NG, that reference should be removed, of course.)

> I have a simple perl script which I copied from documentation here: 
> http://blog.dreamhosters.com/kbase/index.cgi?area=144
> 
>   #!/usr/bin/perl
>   use CGI;
>   my $query= new CGI;
>   print $query->header;
>   print "hello people in my head\n";
> 
> Here are the problems I'm having:
> 
> On the dreamhost server http://www.pauleschoen.com/cgi-bin/first.pl, I get 
> Internal Server Error 500 and the error log reports "Premature end of script 
> headers:".

Is /usr/bin/perl a valid path to perl on that machine? Is the script
executable? Does the webserver user (or whatever user is used to run the
CGI) have read and execute access to the script and execute access to
all directories down from the root? Does that perl have CGI installed?
Is mod_perl involved? Do you see any other errors? Have you looked for
any Dreamhost documentation on running Perl CGI scripts?
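
Several of these questions can be checked mechanically if you have a
shell account on the machine. The following is only a rough sketch:
check_script is a hypothetical helper name, not part of any module, and
it only looks at the #! line, CR+LF line endings and the execute bit. It
tests access as the user who runs it, which is not necessarily the user
the web server runs CGI scripts as.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# check_script is a hypothetical helper, not part of any module.
# It returns a list of likely reasons a CGI script fails to start.
sub check_script {
    my ($path) = @_;
    my @problems;

    open my $fh, '<', $path or return ("cannot open $path: $!");
    binmode $fh;                 # keep CR characters visible on any platform
    my $first = <$fh>;
    close $fh;
    $first = '' unless defined $first;

    # A CR before the newline makes the system look for "perl\r".
    push @problems, 'first line ends in CR+LF' if $first =~ /\r\n\z/;

    if ($first =~ /^#!\s*(\S+)/) {
        my $interp = $1;         # \S+ stops before any trailing CR
        push @problems, "interpreter $interp is missing or not executable"
            unless -x $interp;
    }
    else {
        push @problems, 'no #! line found';
    }

    push @problems, "$path has no execute permission for this user"
        unless -x $path;

    return @problems;
}

# Run as: perl check.pl /path/to/first.pl
unless (caller) {
    for my $script (@ARGV) {
        my @problems = check_script($script);
        print "$script: ",
            (@problems ? join('; ', @problems) : 'no obvious problems'), "\n";
    }
}
```

An empty list does not prove the script will run under the web server;
it only rules out the most common causes of a 500 error at startup.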

> On my other server, www.smart.net/~pstech/cgi-bin/first.pl, the 
> IE8 browser tries to download the file, and Firefox just shows the source 
> code text.

Then you haven't got CGI enabled for that directory, or the script isn't
executable, or .pl is mapped to something other than CGI, or...

> If I use telnet to execute the script in smart.net, it seems to 
> echo the correct header and text.

How did you execute it--by executing the file directly, or by running
perl script.pl? Were you doing this as the webserver user or as
yourself?
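
One way to see the difference is a script that reports who it is
running as. This is a minimal sketch using only core Perl; run_report is
a hypothetical name, and GATEWAY_INTERFACE is the standard CGI
environment variable that a web server sets for CGI requests. Run it
once from the shell and once through the browser, then compare the two
outputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# run_report is a hypothetical helper: it describes the user and the
# perl binary that are executing this script.
sub run_report {
    my $name = getpwuid($>);     # scalar context gives the login name
    $name = '(unknown)' unless defined $name;

    my $via = defined $ENV{GATEWAY_INTERFACE}
        ? "web server ($ENV{GATEWAY_INTERFACE})"
        : 'command line';

    return sprintf("effective uid: %d (%s)\n", $>, $name)
         . sprintf("perl binary:   %s\n", $^X)
         . sprintf("perl version:  %s\n", $])
         . sprintf("invoked via:   %s\n", $via);
}

unless (caller) {
    # text/plain so the browser shows the report as-is
    print "Content-type: text/plain\n\n";
    print run_report();
}
```

If the uid shown in the browser differs from the one shown in the shell,
the permission checks have to be made for the former.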

> I have also used the pico editor to 
> eliminate possible problems with CR+LF newlines. I have not been able to log 
> on to the dreamhost server with telnet.
> 
> I have tried changing the permissions as recommended, and I have the 
> .htaccess file configured as:
> 
>   Options +ExecCGI

This won't necessarily have any effect. It depends on how the server is
configured (which is OT for clpmisc).

> My next step will probably be to contact the support or community 
> discussions for these servers,

That would be a good idea.

Ben



------------------------------

Date: Sun, 22 Aug 2010 16:38:49 -0500
From: Steve <stevem_@nogood.com>
Subject: Re: Simple script execution problems (newbie)
Message-Id: <Mzico.3494$cE1.1384@newsfe18.iad>

On 08/22/2010 04:37 PM, Paul E. Schoen wrote:
> I am trying to set up a webspace so I can use CGI, and I'm getting
> errors and/or it is not behaving as it should using IE8 or Firefox. I
> read the FAQ and it suggests that this should be asked in
> comp.infosystems.www.authoring.CGI, but that NG has been removed. So, at
> the very least, please change the perl FAQ to reflect this and provide a
> NG that is appropriate.
>
> I have a simple perl script which I copied from documentation here:
> http://blog.dreamhosters.com/kbase/index.cgi?area=144
>
> #!/usr/bin/perl

# EEEK!!!

> use CGI;
> my $query= new CGI;
> print $query->header;
> print "hello people in my head\n";
>

Keep it *really* simple:

#!/usr/bin/perl

use warnings;  # just do it, don't worry about why for now
use strict;    # you'll save oodles of time debugging code later

print "Content-type: text/html; charset=utf-8\n\n";

print "Hello World\n";

exit;


> Here are the problems I'm having:
>
> On the dreamhost server http://www.pauleschoen.com/cgi-bin/first.pl, I
> get Internal Server Error 500 and the error log reports "Premature end
> of script headers:". On my other server,

Mime-type, permissions, ownership, combination of.... sounds like 
cgi-bin might be set up more or less correctly though.

Assuming the webserver thinks the script owner and user configured to 
run the script match up. (IF SUEXEC is enabled. If it isn't: run screaming)

> www.smart.net/~pstech/cgi-bin/first.pl, the IE8 browser tries to
> download the file, and Firefox just shows the source code text.

Sounds like cgi-bin is NOT configured correctly in httpd.conf.

Ah... did *you* create the cgi-bin directory or was it already there?


> If I use
> telnet to execute the script in smart.net, it seems to echo the correct
> header and text. I have also used the pico editor to eliminate possible
> problems with CR+LF newlines. I have not been able to log on to the
> dreamhost server with telnet.
>
> I have tried changing the permissions as recommended, and I have the
> .htaccess file configured as:
>
> Options +ExecCGI

The above may or may not do any good depending on Server configuration.

>
> My next step will probably be to contact the support or community
> discussions for these servers, but I was hoping that someone here might
> be able to help. I am planning to do some less trivial programming in
> Perl once I get this simple script working as it should.
>
> Thanks!
>
> Paul
>

Here is a link to a page you may find useful:

http://brian-d-foy.cvs.sourceforge.net/viewvc/brian-d-foy/CGI_MetaFAQ/CGI_MetaFAQ.html

And you really should set up a web server on your local box. The first 
time you fire off a script with an infinite while(){} loop in it, your 
providers will not be happy..... and it *will* happen.

hth,

\s






------------------------------

Date: Sun, 22 Aug 2010 20:55:12 -0400
From: "Paul E. Schoen" <paul@pstech-inc.com>
Subject: Re: Simple script execution problems (newbie)
Message-Id: <5Gjco.26337$EF1.6580@newsfe14.iad>


"Ben Morrow" <ben@morrow.me.uk> wrote in message 
news:18j8k7-m8u2.ln1@osiris.mauzo.dyndns.org...
>
> Quoth "Paul E. Schoen" <paul@pstech-inc.com>:
>> I am trying to set up a webspace so I can use CGI, and I'm getting errors
>> and/or it is not behaving as it should using IE8 or Firefox. I read the 
>> FAQ
>> and it suggests that this should be asked in
>> comp.infosystems.www.authoring.CGI, but that NG has been removed. So, at 
>> the
>> very least, please change the perl FAQ to reflect this and provide a NG 
>> that
>> is appropriate.
>
> There is no guarantee that there is one. (If the FAQ points to a
> non-existent NG that should be removed, of course.)

It appears that comp.infosystems.www.authoring.misc is still supported, but 
has very little activity. The last post was on 7/6/2010. I'll try there.


>> I have a simple perl script which I copied from documentation here:
>> http://blog.dreamhosters.com/kbase/index.cgi?area=144
>>
>>   #!/usr/bin/perl
>>   use CGI;
>>   my $query= new CGI;
>>   print $query->header;
>>   print "hello people in my head\n";
>>
>> Here are the problems I'm having:
>>
>> On the dreamhost server http://www.pauleschoen.com/cgi-bin/first.pl, I 
>> get
>> Internal Server Error 500 and the error log reports "Premature end of 
>> script
>> headers:".
>
> Is /usr/bin/perl a valid path to perl on that machine?

I can't access it using FTP, and I can't use Telnet, so I'm not 100% sure. 
But that is the standard location and it is from the documentation on the 
server.


> Is the script executable?

I have permissions set as 755, so it should be executable.


> Does the webserver user (or whatever user is used to run the
> CGI) have read and execute access to the script and execute access to
> all directories down from the root?

The directory permissions are all 755


> Does that perl have CGI installed?

It should be. Here is the documentation: http://wiki.dreamhost.com/CGI


> Is mod_perl involved?

I don't know.


> Do you see any other errors?

There is only a "suexec failure: could not open log file", "fopen: 
Permission denied", "Premature end of script headers: test.cgi", "File does 
not exist: /home/pes1949/pauleschoen.com/internal_error.html"

I could not set the permissions for the error.log higher than 644, but it is 
being updated, so that can't be it. I couldn't find a suexec log.


> Have you looked for
> any Dreamhost documentation on running Perl CGI scripts?

Yes. See above.


>> On my other server, www.smart.net/~pstech/cgi-bin/first.pl, the
>> IE8 browser tries to download the file, and Firefox just shows the source
>> code text.
>
> Then you haven't got CGI enabled for that directory, or the script isn't
> executable, or .pl is mapped to something other than CGI, or...

Maybe that's something I must request from the ISP/host?


>> If I use telnet to execute the script in smart.net, it seems to
>> echo the correct header and text.
>
> How did you execute it--by executing the file directly, or by running
> perl script.pl? Were you doing this as the webserver user or as
> yourself?

I logged on with my username and password. Then I went to the cgi-bin 
directory and typed "first.pl". The header and text were printed as text 
(stdout?)


>> I have also used the pico editor to
>> eliminate possible problems with CR+LF newlines. I have not been able to 
>> log
>> on to the dreamhost server with telnet.
>>
>> I have tried changing the permissions as recommended, and I have the
>> .htaccess file configured as:
>>
>>   Options +ExecCGI
>
> This won't necessarily have any effect. It depends on how the server is
> configured (which is OT for clpmisc).

I renamed .htaccess to dothtaccess and nothing changed.


>> My next step will probably be to contact the support or community
>> discussions for these servers,
>
> That would be a good idea.

I just now got a reply on the forum, asking for more information. I can't 
believe this should be so difficult. Thanks for your time. I'll report back 
if and when I get this working. Maybe it will help someone else.

Paul 



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3094
***************************************

