[17827] in Perl-Users-Digest


Perl-Users Digest, Issue: 5247 Volume: 9

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 4 22:38:39 2001

Date: Thu, 4 Jan 2001 19:38:13 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <978665892-v9-i5247@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Thu, 4 Jan 2001     Volume: 9 Number: 5247

Today's topics:
        Search Engine - matching file from list of files <renegade.master@dial.pipex.com>
        Searching for tags - Is this optimal <rrocky@bigfoot.com>
    Re: Searching for tags - Is this optimal (Tad McClellan)
    Re: Searching for tags - Is this optimal (Tad McClellan)
    Re: Searching for tags - Is this optimal (Tad McClellan)
    Re: Searching for tags - Is this optimal <joe+usenet@sunstarsys.com>
    Re: Searching for tags - Is this optimal <rrocky@bigfoot.com>
    Re: Searching for tags - Is this optimal <rrocky@bigfoot.com>
    Re: Searching for tags - Is this optimal egwong@netcom.com
    Re: Searching for tags - Is this optimal <aqumsieh@hyperchip.com>
        Seeming bug with @DB::args and "caller" function <kevin@vaildc.net>
    Re: Seeming bug with @DB::args and "caller" function <bwalton@rochester.rr.com>
    Re: Segmented Mac OS X dev tools (was: Yet another gcc  (Weston Cann)
    Re: Segmented Mac OS X dev tools (was: Yet another gcc  <mail@mikeash.com>
    Re: Segmented Mac OS X dev tools (was: Yet another gcc  <NOdburgunSPAM@earthlink.net>
    Re: Segmented Mac OS X dev tools (Martien Verbruggen)
    Re: Segmented Mac OS X dev tools (Ilya Zakharevich)
        sending emails with attachments <ojenni@freesurf.ch>
    Re: sending emails with attachments <jhelman@wsb.com>
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 4 Jan 2001 17:45:33 -0000
From: "Renegade Master" <renegade.master@dial.pipex.com>
Subject: Search Engine - matching file from list of files
Message-Id: <3a54b6a0$0$18976$4d4eb98e@beta.news.uk.uu.net>

Hi,

I've got a search engine script running in Perl. What I'm trying to do is
add the capability to have a list of "excluded" files.  Basically I want to
have a text file, say "no-index.txt" or something.  Then, when the indexer
script opens that file, I want it to check if that file exists in the
no-index list, if so, to skip that one.  A section of code from the indexer
script is shown at the end of this posting, showing the part that opens each
file in turn and works on it.

I guess it would have something to do with regular expressions or something,
but I just can't seem to work it out.  Any kind soul out there wanna save
me?

TIA

David P

== here's a section of code from the indexer script (whole script available
on request) ==

open (OUTPUT, ">searchindex.dat") or medie ("Can't open searchindex.dat - $!");

push @dirs, $basedir;

while (@dirs) {
   $dir = shift (@dirs);

   opendir (DIR, "$dir") or medie ("Can't open $dir - $!");
   my @files = readdir (DIR);
   closedir (DIR);

   foreach $file (@files) {

      if ((-d "$dir/$file") and ($file ne ".") and ($file ne "..")) {
         push @dirs, "$dir/$file";
         next;
      }

      next if ($file !~ /\.$ext$/i);

      open (FILE, "$dir/$file") or medie ("Can't open $dir/$file - $!");
      my $page = join ('', <FILE>);
      close (FILE);

      $page =~ s/(\n|\r|\t)/ /isg;

      my $title;
      if ($page =~ /<title>(.+?)<\/title>/i) {
         $title = $1;

      } else {
         $title = "No Title";

      }

      $page =~ s/<head>.+?<\/head>//isg;
      $page =~ s/<.+?>//isg;
      $page =~ s/\&.{2,6};/ /isg;

      my $description = $page;
      $description =~ s/\s+/ /isg;
      $description =~ s/^(.{255}).*/$1\.\.\./isg;

      $page =~
s/[\!@\#\$\%\^\&\*\(\)\[\]\{\}\+\=\|\\\/\.\,\_\-\~\`\'\"\;\:]/ /isg;
      $page =~ s/\b($words)\b//isg;
      $page =~ s/\s+/ /isg;
      $page =~ s/^(.{1024}).*/$1/isg;

      $thisdir = $dir;
      $thisdir =~ s/^$basedir//isg;
      my $URL = "$BaseURL$thisdir/$file";

      print OUTPUT "$URL|%%|$title|%%|$description|%%|$page\n";
      print "Indexed <a href=\"$URL\">$title</a>\n<br>\n";

      $| = 1;

   }

}

close (OUTPUT);
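
One possible way to handle the exclusion list, sketched under assumptions:
the list file (no-index.txt, as suggested above) holds one path per line,
written the same way the indexer builds "$dir/$file". The helper name
load_exclusions is hypothetical and not part of the original script. For
exact matches a hash lookup is enough and no regular expression is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: read an exclusion list (one path per line)
# into a hash so that each check is a single hash lookup.
sub load_exclusions {
    my ($list_file) = @_;
    my %skip;
    open(my $fh, '<', $list_file) or die "Can't open $list_file - $!";
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;                # trim whitespace, including the newline
        next if $line eq '' or $line =~ /^#/;   # allow blank lines and # comments
        $skip{$line} = 1;
    }
    close($fh);
    return \%skip;
}

# Inside the indexer's foreach loop, assuming the list was loaded once
# before the loop with:  my $skip = load_exclusions("no-index.txt");
#
#    next if $skip->{"$dir/$file"};
```

Loading the list once, before the directory walk, avoids re-reading the
file for every page that gets indexed.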







--
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
David Precious
Web Developer, PIPEX Internet Ltd
davidp@pipex.net   www.pipex.net
~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~
A girl phoned me the other day and said .... Come on over, there's nobody
home. I went over. Nobody was home.





------------------------------

Date: Tue, 02 Jan 2001 13:13:01 -0800
From: Rocky Raccoon <rrocky@bigfoot.com>
Subject: Searching for tags - Is this optimal
Message-Id: <3A52445D.7E4913A0@bigfoot.com>

I am searching & printing out all the html tags in an
input file.

while(<>)
{
    $ln = $_;
    while ( $ln =~ /(\<.*?\>)/)
    {
        print "$1\n" ;
        $ln = $' ;
    }
}

Is this an optimal way or can I do it better ?

[ P.S. All this isn't homework. I am a professional C++ Programmer,
looking to do a particular part of the project in Perl because of its
string/file parsing capabilities. Today is almost Day 1 on Perl, so 
please bear with me if some questions are stupid. There seem to be so
many ways to do the same thing in perl, that I am a little bit
confused.]

-- 
Rocky
RSC - http://www.slack.net/~shiva/rsc.html



------------------------------

Date: Tue, 2 Jan 2001 15:28:58 -0500
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: Searching for tags - Is this optimal
Message-Id: <slrn954ega.ftp.tadmc@magna.metronet.com>

Rocky Raccoon <rrocky@bigfoot.com> wrote:
>I am searching & printing out all the html tags in an
>input file.
>
>while(<>)
>{
>    $ln = $_;


Try it with this legal HTML:

   $ln = '<img src="cool.jpg" alt=">>Cool Pic!<<">';


>    while ( $ln =~ /(\<.*?\>)/)

>Is this an optimal way or can I do it better ?


It isn't even _correct_, so trying to optimize would be premature :-)

You don't need to copy it to the $ln temp variable, just use $_.

That while can never execute its body more than once, because the
pattern can never match more than once. You need a m//g option
if you want to match more than once.

Angle brackets are not regex metacharacters, so backslashing them
is needless clutter that makes it harder to read your own code.

   while ( $ln =~ /(<.*?>)/g )  # a dirty hack approach


   perldoc -q HTML

      "How do I remove HTML from a string?"
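
A minimal sketch of the failure, using only the example string and the
"dirty hack" pattern from this message (the variable @tags is introduced
here for illustration):

```perl
use strict;
use warnings;

# The "legal HTML" example from above: the alt text contains > and <.
my $ln = '<img src="cool.jpg" alt=">>Cool Pic!<<">';

# In list context, m//g returns every captured match at once.
my @tags = $ln =~ /(<.*?>)/g;

print "$_\n" for @tags;
```

Instead of the single img tag, this reports two fragments,
<img src="cool.jpg" alt="> and <<">, because the pattern stops at the
first > it sees, even inside a quoted attribute value.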


-- 
    Tad McClellan                          SGML consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: Tue, 2 Jan 2001 15:31:57 -0500
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: Searching for tags - Is this optimal
Message-Id: <slrn954elt.ftp.tadmc@magna.metronet.com>

Rocky Raccoon <rrocky@bigfoot.com> wrote:
>I am searching & printing out all the html tags in an
>input file.
>
>while(<>)
>{
>    $ln = $_;
>    while ( $ln =~ /(\<.*?\>)/)
>    {
>        print "$1\n" ;
>        $ln = $' ;
               ^^
               ^^

Gak!

Mentioning $' $` or $& along with "optimal" indicates a Big Problem.

Now that you know about perlvar, go look up $' there, and don't
miss the BUGS section at the end.
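
If the text after a match really is needed, one alternative that avoids
$' is the @+ array (also documented in perlvar), which holds match end
offsets. A small sketch; the helper name after_first_tag is made up for
this example:

```perl
use strict;
use warnings;

# Return the text after the first tag-like match without touching $'.
sub after_first_tag {
    my ($text) = @_;
    return undef unless $text =~ /<.*?>/;
    return substr($text, $+[0]);   # $+[0] is the offset where the match ended
}

print after_first_tag('abc<b>def'), "\n";   # prints "def"
```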


-- 
    Tad McClellan                          SGML consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: Tue, 2 Jan 2001 15:22:17 -0500
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: Searching for tags - Is this optimal
Message-Id: <slrn954e3p.ftp.tadmc@magna.metronet.com>

Rocky Raccoon <rrocky@bigfoot.com> wrote:
>I am searching & printing out all the html tags in an
>input file.
>
>Is this an optimal way or can I do it better ?


Processing HTML with a regex is NEVER optimal. Using pattern
matching on a non-regular grammar (which is what HTML is)
is always a "dirty hack".

Sometimes a dirty hack is good enough, but it would be a stretch
to characterize it as any form of "optimal".

Use a module that understands HTML for processing HTML.

You can get modules from CPAN:

   http://search.cpan.org/


>Today is almost Day 1 on Perl, so 
>please bear with me if some questions are stupid. 


Stupid questions are just fine. Ask away!

(questions easily answered by checking Perl's standard docs are
 NOT fine though! Those are the kind you should try to avoid 
 asking of hundreds/thousands of newsgroup readers.
)


>There seem to be so
>many ways to do the same thing in perl, that I am a little bit
>confused.]


Perl was designed to be that way:

   http://www.wall.org/~larry/natural.html


That is a Good Thing. With B-D languages, you must find the
One True Way before you can proceed. With Perl you only need
to find one of the ways, and you are back to getting your
work done.


-- 
    Tad McClellan                          SGML consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: 02 Jan 2001 17:41:49 -0500
From: Joe Schaefer <joe+usenet@sunstarsys.com>
Subject: Re: Searching for tags - Is this optimal
Message-Id: <m3lmsty3te.fsf@mumonkan.sunstarsys.com>

Rocky Raccoon <rrocky@bigfoot.com> writes:

> I am searching & printing out all the html tags in an
> input file.
> 
> while(<>)
> {
>     $ln = $_;
>     while ( $ln =~ /(\<.*?\>)/)
>     {
>         print "$1\n" ;
>         $ln = $' ;
>     }
> }
> 
> Is this an optimal way or can I do it better ?

% man perlop
 ...
               In scalar context, each execution of m//g finds
               the next match, returning TRUE if it matches, and
               FALSE if there is no further match.  The position
               after the last match can be read or set using the
               pos() function; see the pos entry in the perlfunc
               manpage.   A failed match normally resets the
               search position to the beginning of the string,
               but you can avoid that by adding the /c modifier
               (e.g. m//gc).  Modifying the target string also
               resets the search position.
 ...

Hence your loop could be written

while(<>) {
        print "$1\n" while /(\<.*?\>)/g;
}
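
To make the perlop excerpt concrete, here is a small sketch (sample
string and variable names are assumptions for illustration) that records
pos() after each successful m//g match in scalar context:

```perl
use strict;
use warnings;

my $line = 'a <b>bold</b> word';
my @seen;

# Each pass through the condition resumes where the previous match ended.
while ($line =~ /(<.*?>)/g) {
    push @seen, "$1 ends at " . pos($line);
}

print "$_\n" for @seen;
# After the final, failed match the search position is reset,
# so pos($line) is undefined again here.
```

This prints "<b> ends at 5" and "</b> ends at 13"; no copy of the
remaining string is needed to continue the search.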

That being said- this will NOT print all html tags in a html
file.  It's not even close to being right.  Please use a module 
like HTML::Parser that can do this for you quite accurately.

To progress from being a (now irritating) newbie to a beginning
perl programmer, please learn how to use the documentation 
that comes with perl

% perldoc -q HTML

Also learn to search dejanews before redundantly asking 
frequently asked questions here, without clarifying that
you have read the FAQ answer, searched dejanews, and are 
somehow unsatisfied with the hundreds of repetitive answers
to the question that you have now reasked for the umpteenth
time.

It is quite bad form to ignore the volumes of information 
that's packaged within your perl installation, as well as
any searchable website like

CPAN- www.cpan.org, 
the dejanews archive for clp.misc at www.dejanews.com,
the www.perl.com site, 
and a decent internet search engine like google.

It will draw to you considerable ire from this newsgroup.
 
> [ P.S. All this isn't homework. I am a professional C++ Programmer,
> looking to do a particular part of the project in Perl because of its
> string/file parsing capabilities. Today is almost Day 1 on Perl, so 
> please bear with me if some questions are stupid. There seem to be so
> many ways to do the same thing in perl, that I am a little bit
> confused.]

Fine- then you can appreciate the amount of time and effort
that was spent supplying you with the truckload of Perl-related
information that is sitting on your computer.  Please attempt 
to use it before asking FAQs here, and prove that you have 
done so before asking another one.

-- 
Joe Schaefer



------------------------------

Date: Tue, 02 Jan 2001 16:59:35 -0800
From: Rocky Raccoon <rrocky@bigfoot.com>
Subject: Re: Searching for tags - Is this optimal
Message-Id: <3A527977.9C00F77A@bigfoot.com>

Tad McClellan wrote:
> 
> Rocky Raccoon <rrocky@bigfoot.com> wrote:
> >I am searching & printing out all the html tags in an
> >input file.
> >
> >Is this an optimal way or can I do it better ?
> 
> Processing HTML with a regex is NEVER optimal. Using pattern
> matching on a non-regular grammar (which is what HTML is)
> is always a "dirty hack".
> 
> Sometimes a dirty hack is good enough, but it would be a stretch
> to characterize it as any form of "optimal".
> 
> Use a module that understands HTML for processing HTML.
> 
> You can get modules from CPAN:
> 
>    http://search.cpan.org/
> 


I found HTMLTree & downloaded it. Also downloaded
HTMLTagset & HTMLParser necessary to build it.
Built everything.
A trial dump program also works.

But what I want to do is to traverse the tree.
I can't find any docs or examples for this.
The docs for HTML element say that traverse is only
for the morbid or something like that. 
There are docs for Building a tree from scratch, Dumping a 
built-tree, etc etc.

But I can't find docs for manually traversing the tree.
i.e After reading my HTML file 
my $tree = HTML::TreeBuilder->new; # empty tree
$tree->parse_file($file_name);

I want to now go through the elements etc etc.

Any idea how to do it ?






-- 
Rocky
RSC - http://www.slack.net/~shiva/rsc.html


------------------------------

Date: Tue, 02 Jan 2001 17:11:02 -0800
From: Rocky Raccoon <rrocky@bigfoot.com>
Subject: Re: Searching for tags - Is this optimal
Message-Id: <3A527C26.6CC325BC@bigfoot.com>


Rocky Raccoon wrote:
> 
> Tad McClellan wrote:
> >
> > Rocky Raccoon <rrocky@bigfoot.com> wrote:
> > >I am searching & printing out all the html tags in an
> > >input file.
> > >
> > >Is this an optimal way or can I do it better ?
> >
> > Processing HTML with a regex is NEVER optimal. Using pattern
> > matching on a non-regular grammar (which is what HTML is)
> > is always a "dirty hack".
> >
> > Sometimes a dirty hack is good enough, but it would be a stretch
> > to characterize it as any form of "optimal".
> >
> > Use a module that understands HTML for processing HTML.
> >
> > You can get modules from CPAN:
> >
> >    http://search.cpan.org/
> >
> 
> I found HTMLTree & downloaded it. Also downloaded
> HTMLTagset & HTMLParser necessary to build it.
> Built everything.
> A trial dump program also works.
> 
> But what I want to do is to traverse the tree.
> I can't find any docs or examples for this.
> The docs for HTML element say that traverse is only
> for the morbid or something like that.
> There are docs for Building a tree from scratch, Dumping a
> built-tree, etc etc.
> 
> But I can't find docs for manually traversing the tree.
> i.e After reading my HTML file
> my $tree = HTML::TreeBuilder->new; # empty tree
> $tree->parse_file($file_name);
> 
> I want to now go through the elements etc etc.
> 
> Any idea how to do it ?


This is what I want to do.
Find a particular tag say <XYZ>. Once I find it, go to the
next tag, check if it's <ABC>; if it is, then I want to read some text
inside the tag. If it isn't <ABC> I want to find the next <XYZ> whose
immediate next tag is <ABC>.

Any samples which do something like this ?

-- 
Rocky
RSC - http://www.slack.net/~shiva/rsc.html


------------------------------

Date: Wed, 03 Jan 2001 05:12:38 GMT
From: egwong@netcom.com
Subject: Re: Searching for tags - Is this optimal
Message-Id: <axy46.34136$bw.2164300@news.flash.net>

Rocky Raccoon <rrocky@bigfoot.com> wrote:
[cut]

> This is what I want to do.
> Find a particular tag say <XYZ>. Once I find it, go to the
> next tag, check if it's <ABC>; if it is, then I want to read some text
> inside the tag. If it isn't <ABC> I want to find the next <XYZ> whose
> immediate next tag is <ABC>.

> Any samples which do something like this ?

I've never used HTML::TreeBuilder, but this pretty much does what you
want with HTML::Parser.

    use strict;
    use HTML::Parser;
    
    my @tags = ();
    
    my $parser = HTML::Parser->new(
      start_h => [ sub { push( @tags, $_[0]); }, "tagname"],
      # pop back to the matching start tag; stop if the stack runs dry
      end_h   => [ sub { while ( @tags and pop(@tags) ne $_[0] ) { ; } } , "tagname"],
      text_h  => [
        sub { 
          print @_ if (@tags >= 2 && $tags[-1] eq 'abc' && $tags[-2] eq 'xyz');
        }, "text"] );
    
    while ( my $chunk = <DATA> ) {
      $parser->parse($chunk);
    }
    $parser->eof;   # tell the parser the input is finished
    
    __END__
    <html> 
      <xyz> xyz xyz xyz 
        <def> def def def </def>    <!-- not printed -->
      xyz xyz xyz </xyz>
      <xyz> xyz xyz xyz 
        <abc> abc abc abc </abc>    <!-- printed -->
      xyz xyz xyz </xyz>
    </html>

What this does is set up an HTML::Parser whose start handler (the function
that's called every time a start tag is encountered) just pushes the
element name (the sole argument passed to the handler) onto an array.
The end handler pops elements off of the @tags array until it finds
the matching start (this probably isn't the best behavior possible,
since poorly nested tags (like "<i><b></i></b>") might mess things up.
Splicing out the last matching tag might be better -- but this is Good
Enough for demonstration purposes.)

The text handler is what does the bulk of the work.  It prints out all
text when the last two tags seen are 'xyz' and 'abc'.

Of course this is all explained in gory detail in the HTML::Parser
perldocs.

HTH,
Eric

-- 


------------------------------

Date: Thu, 04 Jan 2001 15:39:54 GMT
From: Ala Qumsieh <aqumsieh@hyperchip.com>
Subject: Re: Searching for tags - Is this optimal
Message-Id: <7a7l4b72cz.fsf@merlin.hyperchip.com>


Rocky Raccoon <rrocky@bigfoot.com> writes:

> I am searching & printing out all the html tags in an
> input file.
> 
> while(<>)
> {
>     $ln = $_;
>     while ( $ln =~ /(\<.*?\>)/)
>     {
>         print "$1\n" ;
>         $ln = $' ;
>     }
> }
> 
> Is this an optimal way or can I do it better ?

You can use the /g modifier for the regexp:

	while (<>) {
		while (/(<.*?>)/g) {
			print $1, "\n";
		}
	}

--Ala


------------------------------

Date: Sun, 31 Dec 2000 15:45:44 -0500
From: Kevin Michael Vail <kevin@vaildc.net>
Subject: Seeming bug with @DB::args and "caller" function
Message-Id: <kevin-89403E.15454431122000@news.his.com>

The 'caller' documentation says that if you use caller from within the 
debugger and with an expression, it will set the variable @DB::args to 
the arguments that the function was called with.  It also warns about 
the optimizer possibly optimizing away stack frames and possibly leaving 
information from the _previous_ call to caller.  Is this what I'm 
running into here, or have I uncovered a bug?

Here's the code, cut-and-pasted, that reproduces the problem:
-----
#!/usr/bin/perl -w

use strict;

sub proto ($);

sub test {
    print "Test args: ", join(':', @_), "\n";
    proto '$($.)*';
    shift;
}

test("1even", -option1 => 1, -option2 => 2);
test("2none");
test("3odd", '-option3');
test("4even", -option4 => 4, -option5 => 5, -option6 => 6);
test("5odd", -option7 => 7, '-option8');

sub proto ($) {
    package DB;
    use vars qw(@args);
    my (@callerdata) = caller(1);
    print 'Proto Args: ', join(':', @args), "\n";
}
-----

What I expect to see when I run it is something like this:

Test args: 1even:-option1:1:-option2:2
Proto Args: 1even:-option1:1:-option2:2
Test args: 2none
Proto Args: 2none
Test args: 3odd:-option3
Proto Args: 3odd:-option3
Test args: 4even:-option4:4:-option5:5:-option6:6
Proto Args: 4even:-option4:4:-option5:5:-option6:6
Test args: 5odd:-option7:7:-option8
Proto Args: 5odd:-option7:7:-option8

i.e., the arguments are the same in both places.  This is what I in fact 
get if I comment out the "shift" in the test subroutine.  What I get if 
I leave it in is:

Test args: 1even:-option1:1:-option2:2
Proto Args: 1even:-option1:1:-option2:2
Test args: 2none
Proto Args: 1even:2none
Test args: 3odd:-option3
Proto Args: 1even:2none:3odd:-option3
Test args: 4even:-option4:4:-option5:5:-option6:6
Proto Args: 4even:-option4:4:-option5:5:-option6:6
Test args: 5odd:-option7:7:-option8
Proto Args: 4even:5odd:-option7:7:-option8

The @DB::args array is finding leftover arguments from previous calls.

If I change the order of the calls, the results change...the problem 
doesn't seem to occur until after the first "even" call, but I haven't 
tried to track down a pattern on this.

I don't know if this happens with a more complicated (real-world) 
example yet, because I ran into this while trying to check the basic 
logic and thought I'd better check this out before investing a lot more 
effort!

This is Perl 5.6.0, compiled for ppclinux.
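
For comparison, a reduced sketch of the same mechanism. The helper name
caller_args is hypothetical; the point is only that caller() fills
@DB::args when it is called with an argument from code compiled in
package DB, and that the values are copied right away rather than read
later:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the arguments the *calling*
# sub was invoked with.
sub caller_args {
    package DB;
    our @args;                 # this is @DB::args
    my @frame = caller(1);     # frame 1 is the sub that called caller_args
    return @frame ? @args : ();
}

sub test {
    my @seen = caller_args();  # copy before anything else touches @_
    return join(':', @seen);
}

print test('1even', -option1 => 1), "\n";   # prints "1even:-option1:1"
```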
-- 
Kevin Michael Vail | a billion stars go spinning through the night,
kevin@vaildc.net   | blazing high above your head.
 . . . . . . . . .  | But _in_ you is the presence that
 . . . . . . . . . | will be, when all the stars are dead.  (Rainer Maria Rilke)


------------------------------

Date: Sun, 31 Dec 2000 23:40:20 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: Seeming bug with @DB::args and "caller" function
Message-Id: <3A4FC446.BB838B02@rochester.rr.com>

Kevin Michael Vail wrote:
> 
> The 'caller' documentation says that if you use caller from within the
> debugger and with an expression, it will set the variable @DB::args to
> the arguments that the function was called with.  It also warns about
> the optimizer possibly optimizing away stack frames and possibly leaving
> information from the _previous_ call to caller.  Is this what I'm
> running into here, or have I uncovered a bug?
> 
> Here's the code, cut-and-pasted, that reproduces the problem:
> -----
> #!/usr/bin/perl -w
> 
> use strict;
> 
> sub proto ($);
> 
> sub test {
>     print "Test args: ", join(':', @_), "\n";
>     proto '$($.)*';
>     shift;
> }
> 
> test("1even", -option1 => 1, -option2 => 2);
> test("2none");
> test("3odd", '-option3');
> test("4even", -option4 => 4, -option5 => 5, -option6 => 6);
> test("5odd", -option7 => 7, '-option8');
> 
> sub proto ($) {
>     package DB;
>     use vars qw(@args);
>     my (@callerdata) = caller(1);
>     print 'Proto Args: ', join(':', @args), "\n";
> }
> -----
> 
> What I expect to see when I run it is something like this:
> 
> Test args: 1even:-option1:1:-option2:2
> Proto Args: 1even:-option1:1:-option2:2
> Test args: 2none
> Proto Args: 2none
> Test args: 3odd:-option3
> Proto Args: 3odd:-option3
> Test args: 4even:-option4:4:-option5:5:-option6:6
> Proto Args: 4even:-option4:4:-option5:5:-option6:6
> Test args: 5odd:-option7:7:-option8
> Proto Args: 5odd:-option7:7:-option8
> 
> i.e., the arguments are the same in both places.  This is what I in fact
> get if I comment out the "shift" in the test subroutine.  What I get if
> I leave it in is:
> 
> Test args: 1even:-option1:1:-option2:2
> Proto Args: 1even:-option1:1:-option2:2
> Test args: 2none
> Proto Args: 1even:2none
> Test args: 3odd:-option3
> Proto Args: 1even:2none:3odd:-option3
> Test args: 4even:-option4:4:-option5:5:-option6:6
> Proto Args: 4even:-option4:4:-option5:5:-option6:6
> Test args: 5odd:-option7:7:-option8
> Proto Args: 4even:5odd:-option7:7:-option8
> 
> The @DB::args array is finding leftover arguments from previous calls.
> 
> If I change the order of the calls, the results change...the problem
> doesn't seem to occur until after the first "even" call, but I haven't
> tried to track down a pattern on this.
> 
> I don't know if this happens with a more complicated (real-world)
> example yet, because I ran into this while trying to check the basic
> logic and thought I'd better check this out before investing a lot more
> effort!
> 
> This is Perl 5.6.0, compiled for ppclinux.
> --
> Kevin Michael Vail | a billion stars go spinning through the night,
> kevin@vaildc.net   | blazing high above your head.
 ...
Hmmmmm...for what it's worth, running your code verbatim on my system
gives (with the shift; in sub test either present or commented out):

C:\Bob\junk>perl junk248.pl
Test args: 1even:-option1:1:-option2:2
Proto Args: 1even:-option1:1:-option2:2
Test args: 2none
Proto Args: 2none
Test args: 3odd:-option3
Proto Args: 3odd:-option3
Test args: 4even:-option4:4:-option5:5:-option6:6
Proto Args: 4even:-option4:4:-option5:5:-option6:6
Test args: 5odd:-option7:7:-option8
Proto Args: 5odd:-option7:7:-option8

C:\Bob\junk>

which is exactly what you state you expect the output to be.  I'm using:

This is perl, v5.6.0 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2000, Larry Wall

Binary build 618 provided by ActiveState Tool Corp.
http://www.ActiveState.com
Built 21:03:54 Sep 13 2000

on Windoze 98 SE.
-- 
Bob Walton


------------------------------

Date: Wed, 03 Jan 2001 02:06:09 GMT
From: iowa88_song88.remove_eights@hotmail.com (Weston Cann)
Subject: Re: Segmented Mac OS X dev tools (was: Yet another gcc on Mac OS X posting)
Message-Id: <iowa88_song88.remove_eights-0201011913140001@70.salt-lake-city-05-10rs.ut.dial-access.att.net>

[ Note: In our last episode, a sometime programmer (me) attempted to
  segment the PublicBeta_Developer.tgz into several smaller bits for
  easier download. A perl script was employed to segment them, and
  'cat' to try to put them back together. See http://mmedia.csoft.net/darwin
  for details.  Script in question included at end of post for those who
  don't want to go to the URL. Summary included for new cross-post to
  comp.lang.perl.misc. ]

> >It doesn't work, though, and I'm not quite sure why. After I
> >reassemble the files I get the following:
> >
> >[weston@cs:~/darwin]$ gunzip PB_Dev.reassembled.tar.gz 
> >
> >gunzip: PB_Dev.reassembled.tar.gz: unexpected end of file
> >
> >Hmmmm. Any ideas why?
> 
> Assuming your perl script is not broken, I would assume that it's 
> because .gz files are binary, but cat is treating them as text. 

I suppose this is as good an explanation as any, but I'd always thought
that on unix systems there is no difference between text and binary data
(ie printable and unprintable bytes, but handling text bytes and data bytes
were the same).

> Your 
> script may also be treating them as text. 

I'm trying to treat the data as binary in the perl script; I didn't
invoke BINMODE, but I'm using perl's read() function, which grabs a
specified number of bytes rather than a line of text. Again, I'm not
sure this even matters: from what I've read and been told, there is
no text/binary distinction in perl except on Windows/DOS. But then again,
every once in a while, this issue jumps up and bites me (I've still
never gotten HTTP upload to work properly for binary files using perl).

The other thing that seems funny to me: I wrote another script to segment 
a uuencoded file, which did deal with lines of text, which I thought might 
be OK since uuencoded stuff is all printable... ended up with exactly the
same problem as I described above.

Oh yes, the one other funny thing: the fact that the original, unsegmented
gzipped archive somehow becomes unreadable by gzip after I pass it through
this perl script, despite the fact that as far as I can tell, I'm opening 
the .gz for read only.

[c.p.l.m folks: nothing more perl related below]

> Before you do too much, see 
> if your shell server supports resuming FTP, which will save you from 
> having to do anything. 

Would this happen automatically if it did support it? If not, how would
I check, or make this happen?

> Failing that, I wouldn't be surprised if tar had 
> built-in support for segmenting and unsegmenting files (but I don't 
> really know).

I'll try again to understand the man page. There is a provision for 
specifying block size for tar, but I can't tell exactly how to make this work.
Thanks for the idea, though.

> I would imagine there must be 
> a utility to deal with this already there. 

Anyone know of anything else?

=================================================================
"The best laid plans of mice and men are about equal."
iowa_so8ng@hot8mail.com 
Address is spam repelant. Remove eights to reach me.


------------------------------

Date: Wed, 03 Jan 2001 18:53:06 GMT
From: Michael Ash <mail@mikeash.com>
Subject: Re: Segmented Mac OS X dev tools (was: Yet another gcc on Mac OS X posting)
Message-Id: <mail-A6920F.12522603012001@news-server.wi.rr.com>

In article
<iowa88_song88.remove_eights-0201011913140001@70.salt-lake-city-05-10rs.ut.dial-access.att.net>,
iowa88_song88.remove_eights@hotmail.com (Weston Cann) wrote:

[other stuff snipped]
>> Before you do too much, see 
>> if your shell server supports resuming FTP, which will save you from 
>> having to do anything. 
>
>Would this happen automatically if it did support it? If not, how would
>I check, or make this happen?

You need both a server and a client that support resume. Anarchie or 
whatever it's being called these days has good support, as does 
Transmit. Fetch supports it to an extent, but IIRC you can't resume a 
file if you've quit (or crashed) Fetch since starting the download.

As for figuring out if server support is there, use one of the 
aforementioned programs and start getting a file, then kill it right 
away and see if it has to start over or not.

-- 
"Our doubts are traitors, and make us lose the good we oft might win by fearing to attempt." - William Shakespeare

Mike Ash - www.mikeash.com   My reply address is valid, it has no spamproofing.


------------------------------

Date: Wed, 03 Jan 2001 22:21:53 GMT
From: David Burgun <NOdburgunSPAM@earthlink.net>
Subject: Re: Segmented Mac OS X dev tools (was: Yet another gcc on Mac OS X posting)
Message-Id: <01HW.B678E60200A8285E097FB850@news.earthlink.net>

On Wed, 3 Jan 2001 10:53:06 -0800, Michael Ash wrote
(in message <mail-A6920F.12522603012001@news-server.wi.rr.com>):
> As for figuring out if server support is there, use one of the 
> aforementioned programs and start getting a file, then kill it right 
> away and see if it has to start over or not.
>

Has anyone noticed that "Transmit" says something like:

Host:  Unix (Resume OK)
Host:  Mac (Resume OK)
Host:  Windows NT

I've never seen "Resume OK" under Windows NT, anyone know why? Doesn't it 
support resume? Or is it that Transmit doesn't support Resume when talking to 
an NT Box?

Thanks a lot
Dave




------------------------------

Date: Wed, 3 Jan 2001 13:30:36 +1100
From: mgjv@tradingpost.com.au (Martien Verbruggen)
Subject: Re: Segmented Mac OS X dev tools
Message-Id: <slrn9553mc.edk.mgjv@martien.heliotrope.home>

On Wed, 03 Jan 2001 02:06:09 GMT,
	Weston Cann <iowa88_song88.remove_eights@hotmail.com> wrote:

[no attributions present]

>> >Hmmmm. Any ideas why?
>> 
>> Assuming your perl script is not broken, I would assume that it's 
>> because .gz files are binary, but cat is treating them as text. 

cat shouldn't care.

>> Your 
>> script may also be treating them as text. 
> 
> I'm trying to treat the data as binary in the perl script; I didn't
> invoke BINMODE, but I'm using perl's read() function, which grabs a

You should. Even if you are on a platform where you think it doesn't
matter, you should always, always use binmode if you mean to read a
binary file. If not for portability, at least for code documentation.
Just use it.

The documentation for binmode got changed quite a bit to make sure that
this point was made clearly.

> specified number of bytes rather than a line of text. Again, I'm not
> sure this even matters: from what I've read and been told, there is
> no text/binary distinction in perl except on Windows/DOS. But then again,

That isn't true. There are many other platforms where there is a
distinction. But you shouldn't worry about that, or try to guess it. If
you have binary data, you use binmode.
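
Since the script itself was not posted, the following is only a sketch
of how segmenting and rejoining could look with binmode on every
filehandle. The sub names split_file and join_files and the ".000"
style suffix are assumptions, not the original poster's code:

```perl
use strict;
use warnings;

# Split $src into pieces of at most $size bytes; returns the piece names.
sub split_file {
    my ($src, $size) = @_;
    open(my $in, '<', $src) or die "Can't open $src: $!";
    binmode($in);                               # binary data: always binmode
    my @parts;
    while (1) {
        my $got = read($in, my $buf, $size);
        die "Read error on $src: $!" unless defined $got;
        last if $got == 0;                      # end of file
        my $part = sprintf('%s.%03d', $src, scalar @parts);
        open(my $out, '>', $part) or die "Can't write $part: $!";
        binmode($out);
        print {$out} $buf or die "Write error on $part: $!";
        close($out) or die "Can't close $part: $!";
        push @parts, $part;
    }
    close($in);
    return @parts;
}

# Concatenate the pieces, in the order given, into $dest.
sub join_files {
    my ($dest, @parts) = @_;
    open(my $out, '>', $dest) or die "Can't write $dest: $!";
    binmode($out);
    for my $part (@parts) {
        open(my $in, '<', $part) or die "Can't open $part: $!";
        binmode($in);
        while (1) {
            my $got = read($in, my $buf, 65536);
            die "Read error on $part: $!" unless defined $got;
            last if $got == 0;
            print {$out} $buf or die "Write error on $dest: $!";
        }
        close($in);
    }
    close($out) or die "Can't close $dest: $!";
    return;
}
```

Checking the return value of close on the output side matters here: a
full disk or quota problem can otherwise leave a silently truncated
piece, which would show up later as an "unexpected end of file".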

> every once in a while, this issue jumps up and bites me (I've still
> never gotten HTTP upload to work properly for binary files using perl).

Are you using the CGI.pm module?

> The other thing that seems funny to me: I wrote another script to segment 
> a uuencoded file, which did deal with lines of text, which I thought might 
> be OK since uuencoded stuff is all printable... ended up with exactly the
> same problem as I described above.

Sounds like you're doing something else wrong then. Maybe you're missing
a byte here or there?

> Oh yes, the one other funny thing: the fact that the original, unsegmented
> gzipped archive somehow becomes unreadable by gzip after I pass it through
> this perl script, despite the fact that as far as I can tell, I'm opening 
> the .gz for read only.

I find that terribly hard to believe. If you open something for read,
there is simply no way that it gets modified.

> [c.p.l.m folks: nothing more perl related below]

Thanks for the warning.

I don't carry the other groups on my news server, so I can't go back to
find the thread that started all this.  The URL you posted doesn't let
me have a look at the scripts in that directory, so there won't be any
really Perl (or shell) specific help here. Maybe what you should do is
post your code (it's only 1 kb after all).

It's impossible to say anything sensible about what your scripts do
without seeing them. Reconfigure your web server, or rename the files,
or post the stuff.

Martien
-- 
Martien Verbruggen              | 
Interactive Media Division      | Useful Statistic: 75% of the people
Commercial Dynamics Pty. Ltd.   | make up 3/4 of the population.
NSW, Australia                  | 


------------------------------

Date: 3 Jan 2001 23:39:40 GMT
From: ilya@math.ohio-state.edu (Ilya Zakharevich)
Subject: Re: Segmented Mac OS X dev tools
Message-Id: <930d7s$21v$1@charm.magnus.acs.ohio-state.edu>

[A complimentary Cc of this posting was NOT sent to Martien Verbruggen
<mgjv@tradingpost.com.au>],
who wrote in article <slrn9553mc.edk.mgjv@martien.heliotrope.home>:
> >> Assuming your perl script is not broken, I would assume that it's 
> >> because .gz files are binary, but cat is treating them as text. 
> 
> cat shouldn't care.

Of course it would.  Remember C-z?

> > I'm trying to treat the data as binary in the perl script; I didn't
> > invoke BINMODE, but I'm using perl's read() function, which grabs a
> 
> You should. Even if you are on a platform where you think it doesn't
> matter, you should always, always use binmode if you mean to read a
> binary file. If not for portability, at least for code documentation.
> Just use it.
> 
> The documentation for binmode got changed quite a bit to make sure that
> this point was made clearly.

Yes.  AFAIK, binmode/textmode (=newline translation) is orthogonal to
(non)buffering (which is what read() is about) *on all the Perl platforms*.

Ilya


------------------------------

Date: Mon, 1 Jan 2001 16:28:20 +0100
From: "Oliver Jenni" <ojenni@freesurf.ch>
Subject: sending emails with attachments
Message-Id: <92q7p6$eg4$1@news1.sunrise.ch>

I tried to find a script, but it seems nobody has done something like this
before.

Here is what it should do:

- the user enters their email address
- if the email address contains @xyz.com, an email with a predefined
attachment is sent
- if it is any other email address, a screen says that the file may not be
sent, for security reasons.

Does anybody know where I can find this, or has someone already written
such a script?

Happy 2001!

Oliver






------------------------------

Date: Mon, 01 Jan 2001 15:43:29 GMT
From: Jeff Helman <jhelman@wsb.com>
Subject: Re: sending emails with attachments
Message-Id: <c8915tc9gmcl50vcm6ulmdntaova72hetm@4ax.com>

On Mon, 1 Jan 2001 16:28:20 +0100, "Oliver Jenni" <ojenni@freesurf.ch>
wrote:

>I tried to find a script, but it seems nobody has done something like this
>before.

Trust me, it's been done.

>Here is what it should do:
>
>- the user enters their email address
>- if the email address contains @xyz.com, an email with a predefined
>attachment is sent
>- if it is any other email address, a screen says that the file may not be
>sent, for security reasons.

Use a regular expression to check the e-mail address.  If the
condition is met, you can compose a message (and send it) using the
MIME-Lite package from CPAN.  The documentation for the module goes
over how to do this.
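
A rough sketch of that approach, under stated assumptions: the function
names is_allowed() and send_file(), the sender address, the subject line,
and the attachment type are all made up for illustration. The pattern is
anchored so that the address must end in @xyz.com rather than merely
contain it, which is stricter than the original request. MIME::Lite must
be installed from CPAN; it is loaded only when a message is actually sent.

```perl
use strict;
use warnings;

# Hypothetical check: true only if the address ends in @xyz.com and
# contains no whitespace (which also rejects embedded newlines).
sub is_allowed {
    my ($addr) = @_;
    return $addr =~ /\A[^\s\@]+\@xyz\.com\z/i ? 1 : 0;
}

# Hypothetical sender: mails the file at $path to $addr as an attachment.
sub send_file {
    my ($addr, $path, $filename) = @_;
    die "Address not allowed\n" unless is_allowed($addr);

    require MIME::Lite;    # CPAN module, loaded on demand

    my $msg = MIME::Lite->new(
        From    => 'files@xyz.com',          # assumed sender address
        To      => $addr,
        Subject => 'The file you requested',
        Type    => 'multipart/mixed',
    );
    $msg->attach(
        Type => 'TEXT',
        Data => "The requested file is attached.\n",
    );
    $msg->attach(
        Type        => 'application/octet-stream',
        Path        => $path,
        Filename    => $filename,
        Disposition => 'attachment',
    );
    $msg->send;
}
```

A CGI script would call is_allowed() on the submitted address and either
call send_file() or print the refusal page.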

Hope this helps,
JH


------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending Perl questions to the -request address; I don't have time to
answer them, even if I did know the answer.


------------------------------
End of Perl-Users Digest V9 Issue 5247
**************************************
