[22119] in Perl-Users-Digest


Perl-Users Digest, Issue: 4341 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jan 4 14:05:39 2003

Date: Sat, 4 Jan 2003 11:05:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 4 Jan 2003     Volume: 10 Number: 4341

Today's topics:
        Bullet-proof, complaint-resistant web hosting.  Guarant <sumit.r@net4india.net>
    Re: File type determination using Perl <dave@dave.org.uk>
        How can split ignore quoted characters? (Robert Nicholson)
    Re: How can split ignore quoted characters? <mpapec@yahoo.com>
    Re: How can split ignore quoted characters? <krahnj@acm.org>
    Re: How can split ignore quoted characters? (Greg)
        LWP::UserAgent install ONE test failure in html/form <wlcna@nospam.com>
    Re: Need help with split <Jodyman@hotmail.com>
    Re: Need help with split <Jodyman@hotmail.com>
    Re: Need help with split <jurgenex@hotmail.com>
        Printing and copying lists of lists (J. Romano)
    Re: Problem with huge dataset, 100000000 a magic number <nobody@dev.null>
        While loop + several conditions problem (Stephen Adam)
    Re: While loop + several conditions problem <uri@stemsystems.com>
    Re: While loop + several conditions problem <jurgenex@hotmail.com>
    Re: While loop + several conditions problem (Tad McClellan)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 4 Jan 2003 12:31:24 GMT
From: "Sumit R" <sumit.r@net4india.net>
Subject: Bullet-proof, complaint-resistant web hosting.  Guaranteed.
Message-Id: <70df2d18.31494ebb@new.net4india.net>

Having trouble with complainers?

Need a site to host spamvertised products, hacked wares, bootleg 
music/video, child pornography, or other clandestine goodies?

Whether you are requireing Shared Hosting or Server Colocation, 
then look no further.  Net4India's Internet Data Centers provide the 
physical environment to keep your servers up and running 24/7.

http://net4domains.com/perl/net4.cgi?frm=contactUs





------------------------------

Date: Sat, 04 Jan 2003 08:22:44 +0000
From: "Dave Cross" <dave@dave.org.uk>
Subject: Re: File type determination using Perl
Message-Id: <pan.2003.01.04.08.22.41.991988@dave.org.uk>

On Fri, 03 Jan 2003 13:13:04 +0000, Kevin Newman wrote:

> Hi All,
> 
> I have a program that attempts to determine the file type in order
> to figure out what program to use to process it.  The file may be ASCII,
> EBCDIC, PGP, Pkzip, GZip, or other.  The problem is that EBCDIC and
> PGP files are identified as the same file type.  Because of this
> problem, I've resorted to a brute force method (see below) to
> determine file type.   Can anyone suggest a better, simpler way to
> determine a file's type using Perl?
> 
> Thanks,
> 
> kln
> 
> =============== Type test sample program ==============
> use strict;
> 
> my @output;
> my $exit_value;
> 
> my $filename = $ARGV[0];
> my $filetype = `file $filename`;
> 
> print "Filetype = $filetype";
> 
> if ($filetype =~ /ASCII/i){
> 	print "It's an ASCII File \n";
> }elsif ($filetype =~ /PKZIP/i){
> 	print "It's a PKZip file\n";
> }elsif ($filetype =~ /GZIP/i){
> 	print "It's a GZip file\n";
> }else{
> 	print " Let's try pkzip \n";
> 	@output = `pkunzip -t $filename`;
> 	if ( $? == 0 ){
> 		print "$filename is compressed with PKZip \n";
> 	}else{
> 		print " Let's try decompressing the file with pgp\n";
> 		@output = `pgp --decrypt $filename --overwrite`;
> 		if ($? ==  0 ){
> 			print "$filename is Encrypted \n";
> 		}else{
> 			print "Other file type\n";
> 		}
> 	}
> }
> $exit_value  = $? >> 8; print "Exit Status ==> $exit_value \n";
> =============== Type test program ==============

Have you considered using File::MMagic?

<http://search.cpan.org/dist/File-MMagic>

Dave...

-- 
  Two slightly distorted guitars
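
For readers without File::MMagic installed, the idea behind magic-number
detection can be sketched with core Perl alone. This is only an
illustration, not how File::MMagic is implemented: the helper names
sniff_type and sniff_file are hypothetical, and only three signatures are
checked (the gzip and PKZIP leading bytes and the ASCII-armored PGP header).
Binary PGP and EBCDIC data fall through to 'unknown'.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a type from the leading bytes of a file.
sub sniff_type {
    my ($head) = @_;
    return 'gzip'        if $head =~ /\A\x1f\x8b/;          # gzip magic bytes
    return 'pkzip'       if $head =~ /\APK\x03\x04/;        # zip local file header
    return 'pgp-armored' if $head =~ /\A-----BEGIN PGP /;   # ASCII-armored PGP
    # Only tab, LF, CR and printable 7-bit characters: call it ASCII text.
    return 'ascii' if length $head && $head !~ /[^\x09\x0a\x0d\x20-\x7e]/;
    return 'unknown';
}

# Hypothetical helper: read up to 512 bytes in binary mode and classify them.
sub sniff_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $n = read($fh, my $head, 512);
    die "Cannot read $path: $!" unless defined $n;
    close $fh;
    return sniff_type($head);
}

print sniff_file($ARGV[0]), "\n" if @ARGV;
```

Unlike the backtick approach in the quoted program, nothing here passes the
file name to a shell, so odd characters in the name are harmless.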



------------------------------

Date: 4 Jan 2003 00:22:40 -0800
From: robert@elastica.com (Robert Nicholson)
Subject: How can split ignore quoted characters?
Message-Id: <24a182bd.0301040022.35e0ec86@posting.google.com>

How do you specify the pattern for split to ignore the quoted
characters of the character you want to split on?

ie. you want to split on ',' but you want to ignore \, in the text

if I use

split (/[^\\],/);

that will chop off the last character of any matching field

how can I match without chopping off that last character?


------------------------------

Date: Sat, 04 Jan 2003 12:58:54 +0100
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: How can split ignore quoted characters?
Message-Id: <1pid1vgtm9gu8sb71i1rsmuo90ipg6utvv@4ax.com>

X-Ftn-To: Robert Nicholson 

robert@elastica.com (Robert Nicholson) wrote:
>How do you specify the pattern for split to ignore the quoted
>characters of the character you want to split on?
>
>ie. you want to split on ',' but you want to ignore \, in the text
>
>if I use
>
>split (/[^\\],/);
>
>that will chop off the last character of any matching field
>
>how can I match without chopping off that last character?

split(/(?<=[^\\]),/);



-- 
Matija


------------------------------

Date: Sat, 04 Jan 2003 13:08:16 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: How can split ignore quoted characters?
Message-Id: <3E16DC3B.CE7DC98@acm.org>

Robert Nicholson wrote:
> 
> How do you specify the pattern for split to ignore the quoted
> characters of the character you want to split on?
> 
> ie. you want to split on ',' but you want to ignore \, in the text
> 
> if I use
> 
> split (/[^\\],/);
> 
> that will chop off the last character of any matching field
> 
> how can I match without chopping off that last character?


split /(?<!\\),/;



John
-- 
use Perl;
program
fulfillment


------------------------------

Date: 4 Jan 2003 09:32:03 -0800
From: gdsafford@hotmail.com (Greg)
Subject: Re: How can split ignore quoted characters?
Message-Id: <a8f367ed.0301040932.ad903e0@posting.google.com>

robert@elastica.com (Robert Nicholson) wrote in message news:<24a182bd.0301040022.35e0ec86@posting.google.com>...
> How do you specify the pattern for split to ignore the quoted
> characters of the character you want to split on?
> 
> ie. you want to split on ',' but you want to ignore \, in the text
> 
> if I use
> 
> split (/[^\\],/);
> 
> that will chop off the last character of any matching field
> 
> how can I match without chopping off that last character?

I'm no expert, but perlre documents a 'zero-width negative
look-behind assertion' that anchors a regex without including the tested
text, as in:

use strict;
use warnings;

$_ = 'abc,def\,ghi,jkl';

# Assign the result to an array; a bare split in void context relies on
# the deprecated implicit split to @_.
my @fields = split /(?<!\\),/;

print "@fields\n"; # prints 'abc def\,ghi jkl'

Dunno if this is what you are after.

FWIW.
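
The look-behind answers in this thread can be wrapped in a small sub for
reuse. This is a sketch: the name split_unescaped is made up for
illustration, and the limit of -1 is an addition (not in the posts) that
keeps trailing empty fields.

```perl
use strict;
use warnings;

# Split on commas that are not immediately preceded by a backslash.
# The look-behind is zero-width, so no character of the field is consumed.
sub split_unescaped {
    my ($line) = @_;
    return split /(?<!\\),/, $line, -1;
}

my @fields = split_unescaped('abc,def\,ghi,jkl');
print join('|', @fields), "\n";   # prints abc|def\,ghi|jkl
```

One caveat: a field that ends in an escaped backslash (a literal \\ right
before the separator) is still treated as escaping the comma, so data with
that convention needs a real tokenizer.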


------------------------------

Date: Sat, 04 Jan 2003 18:58:01 GMT
From: "wlcna" <wlcna@nospam.com>
Subject: LWP::UserAgent install ONE test failure in html/form
Message-Id: <Z8GR9.5198$qU5.4130955@newssrv26.news.prodigy.com>

libwww-perl-5.67
Is this an important failure?  While installing, I get a failure in the
HTML::Form tests.  Test 11 seems to expect a warning when parsing a form
containing "<input name=x type="xyzzy">", namely that "xyzzy" is not a
known input type, but evidently no such warning occurs.

So does this failure have any general significance?  I'll be running all
kinds of things relying on perl on this machine.

BACKGROUND:

I get this (partial) output from make test while trying to install
libwww-perl-5.67 (installation of which I actually invoked via CPAN module):

Running make test
/usr/local/perl/bin/perl t/TEST 0
[...everything else ok...]
base/common-req........ok
html/form..............FAILED test 11
        Failed 1/14 tests, 92.86% okay
[...everything else ok...]

test 11 can be found in libwww-perl-5.67/t/html/form.t

and is:

# try some more advanced inputs
$f = HTML::Form->parse(<<'EOT', "http://localhost/");
<form method=post>
   <input name=i type="image" src="foo.gif">
   <input name=c type="checkbox" checked>
   <input name=r type="radio" value="a">
   <input name=r type="radio" value="b" checked>
   <input name=t type="text">
   <input name=p type="PASSWORD">
   <input name=h type="hidden" value=xyzzy>
   <input name=s type="submit" value="Doit!">
   <input name=r type="reset">
   <input name=b type="button">
   <input name=f type="file" value="foo.txt">
   <input name=x type="xyzzy">

   <textarea name=a>
abc
   </textarea>

   <select name=s>
      <option>Foo
      <option value="bar" selected>Bar
   </select>

   <select name=m multiple>
      <option selected value="a">Foo
      <option selected value="b">Bar
   </select>
</form>
EOT

#print $f->dump;
#print $f->click->as_string;

print "not " unless $f->click->as_string eq <<'EOT'; print "ok 10\n";
POST http://localhost/
Content-Length: 73
Content-Type: application/x-www-form-urlencoded

i.x=1&i.y=1&c=on&r=b&t=&p=&h=xyzzy&f=foo.txt&a=%0Aabc%0A+++&s=bar&m=a&m=b
EOT

print "not " unless @warn == 1 && $warn[0] =~ /^Unknown input type 'xyzzy'/;
print "ok 11\n";
@warn = ();
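
The excerpt does not show how @warn gets filled. A common way for a test
script to do this, and only an assumption about form.t, is a __WARN__
handler that collects every warning. A minimal sketch (warnings_from is a
hypothetical name):

```perl
use strict;
use warnings;

# Run a code reference and return every warning it emitted.
sub warnings_from {
    my ($code) = @_;
    my @warn;
    local $SIG{__WARN__} = sub { push @warn, @_ };   # collect instead of printing
    $code->();
    return @warn;
}

my @quiet = warnings_from(sub { 1 });
my @loud  = warnings_from(sub { warn "Unknown input type 'xyzzy'\n" });

print "quiet: ", scalar(@quiet), " warning(s)\n";   # quiet: 0 warning(s)
print "loud:  ", scalar(@loud),  " warning(s)\n";   # loud:  1 warning(s)
print "ok\n" if @loud == 1 && $loud[0] =~ /^Unknown input type 'xyzzy'/;
```

With a check of this shape, the test fails whenever the code under test
stays silent, even if the parsing itself produced the right result.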






------------------------------

Date: Sat, 04 Jan 2003 15:00:26 GMT
From: "Jodyman" <Jodyman@hotmail.com>
Subject: Re: Need help with split
Message-Id: <eGCR9.11767$134.1348069@newsread1.prod.itd.earthlink.net>

"Uri Guttman" <uri@stemsystems.com> wrote in message
news:x74r8pziu9.fsf@mail.sysarch.com...
> >>>>> "J" == Jodyman  <Jodyman@hotmail.com> writes:
>
>   J> "John W. Krahn" <krahnj@acm.org> wrote in message
>   J> news:3E15154F.E518EBC7@acm.org...
>   >> Jodyman wrote:
>   >> >
>   >> > TMTOW with Perl, You don't even need split, try this too:
>   >> >
>   >> > #!c:\perl\bin\perl -w
>   >> > use strict;
>   >> >
>   >> > my $filename1 = 'c:\windows\system32\test.jpg';
>   >> > my $filename2 = '/home/jodyman/pics/test.jpg';
>   >> >
>   >> > my ($results1) = $filename1 =~ /\\(\w+\.\w+)$/;
>   >> > my ($results2) = $filename2 =~ /\/(\w+\.\w+)$/;
>   >> > my ($results3) = $filename2 =~ /[\\|\/](\w+\.\w+)$/;
>   >> > my ($results4) = $filename1 =~ /[\\|\/](\w+\.\w+)$/;
>   >> ^
>   >> Which file system uses | to separate directory names?
>
>   J> Unix uses / Windows uses \   /[\\|\/](\w+\.\w+)$/;
>   J>                                                 ^
>   J> Means look for either \ or /.
>
> no it doesn't. | is a plain char in a char class. so john's comment was
> correct in asking you where | is used in a file system path.
>

OK, I was confused.  I was thinking of alternation.  I see the errors of my
ways.
Thank you.

Yes, I now know that alternation is not performed within [].
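
The difference between a character class and alternation can be checked
directly. The sketch below uses a hypothetical helper, last_component, and
m{...} delimiters so that / needs no escaping.

```perl
use strict;
use warnings;

# Return the last path component, accepting either \ or / as separator.
sub last_component {
    my ($path) = @_;
    my ($name) = $path =~ m{([^\\/]+)\z};
    return $name;
}

print last_component('c:\windows\system32\test.jpg'), "\n";   # test.jpg
print last_component('/home/jodyman/pics/test.jpg'), "\n";    # test.jpg

# Inside [...] the | is an ordinary character, so [\\|/] also matches a pipe.
print "class matches |\n" if '|' =~ m{[\\|/]};

# Alternation belongs outside a class: (?:\\|/) matches only \ or /.
print "alternation rejects |\n" unless '|' =~ m{(?:\\|/)};
```

Because the capture is "anything that is not a separator", names with
spaces or without an extension are handled as well.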




------------------------------

Date: Sat, 04 Jan 2003 15:00:31 GMT
From: "Jodyman" <Jodyman@hotmail.com>
Subject: Re: Need help with split
Message-Id: <jGCR9.11768$134.1348069@newsread1.prod.itd.earthlink.net>

"Kevin Cline" <kcline17@hotmail.com> wrote in message
news:ba162549.0301030202.44313343@posting.google.com...
> "Jodyman" <Jodyman@hotmail.com> wrote in message
news:<XH7R9.9359$134.1043839@newsread1.prod.itd.earthlink.net>...
> > "Jim Janovich" <bigpun@mindspring.com> wrote in message
> > news:aupjv7$pe7$1@slb9.atl.mindspring.net...
> > > I have a string that looks like:
> > >
> > > c:\folder1\folder2\folder3\blah.jpg
> > >
> > > But the folders can be any number of folders.  All I need is the last
> >  piece
> > > (blah.jpg).  Can someone help?  I was going to split on \ but since
there
> > > can be any nuber of them I cannot figure it out.  Any help would be
> > > appreciated.
> >
> > TMTOW with Perl, You don't even need split, try this too:
> >
> > #!c:\perl\bin\perl -w
> > use strict;
> >
> > my $filename1 = 'c:\windows\system32\test.jpg';
> > my $filename2 = '/home/jodyman/pics/test.jpg';
> >
> > my ($results1) = $filename1 =~ /\\(\w+\.\w+)$/;
>
> This won't work if the file name is 'Fred and Barney.jpg'

OK, here's another way:  Look at attachment basename.pl
This will work for unices or msdos/windows etc.

> or just plain 'Fred'.  Use File::Basename.

I don't agree, just using a module sometimes doesn't let you learn
or get to know the language better.  Sometimes modules are way
overkill because they have everything and the kitchen sink.  Some
things are easier done with a few one liners.

My argument to all the "Use this or that module." is that most
people don't know the data you're working with as well as you
do.  If you know that you never have a comma within quotes on
a comma delimited file, you don't need to use a module, you can
use a few simple regexes to accomplish what a module would do.
If you know your html code (because you wrote it or created a
program that writes it) and you know there is no weird stuff in it,
you can use something like: s/<[^>]+>//g; to remove all html tags.

If you never think for yourself, you never learn.  I like jumping into
the modules and learning from the pros.  If you never do that and
blindly "use the modules/packages" you'll never learn the language
well.

Jody




begin 666 basename.pl
M(R%C.EQP97)L7&)I;EQP97)L#0IU<V4@<W1R:6-T.PT*#0IM>2 D9FEL96YA
M;64Q(#T@)V,Z7'=I;F1O=W-<<WES=&5M,S)<0F%R;F5Y(&%N9"!&<F5D('1E
M<W0N:G!G)SL-"FUY("1F:6QE;F%M93(@/2 G+VAO;64O:F]D>6UA;B]P:6-S
M+W1E<W0N:G!G)SL-"FUY("1F:6QE;F%M93,@/2 G+VAO;64O,# Q+W!I8W,O
M,#$R+FIP9R<[#0IM>2 D9FEL96YA;64T(#T@)R]H;VUE+W1E<W0G.PT*;7D@
M)&9I;&5N86UE-2 ]("=C.EQW:6YD;W=S7'-Y<W1E;3,R7'1E<W0@;V8@9FEL
M92<[#0IM>2 D9FEL96YA;64V(#T@)R]H;VUE+W1E<W0@;V8@9FEL92<[#0H-
M"FUY("1B87-E(#T@8F%S92@D9FEL96YA;64Q*3L-"G!R:6YT(")<)&)A<V4@
M/2 D8F%S95QN(CL-"@T*)&)A<V4@/2!B87-E*"1F:6QE;F%M93(I.PT*<')I
M;G0@(EPD8F%S92 ]("1B87-E7&XB.PT*#0HD8F%S92 ](&)A<V4H)&9I;&5N
M86UE,RD[#0IP<FEN=" B7"1B87-E(#T@)&)A<V5<;B([#0H-"B1B87-E(#T@
M8F%S92@D9FEL96YA;64T*3L-"G!R:6YT(")<)&)A<V4@/2 D8F%S95QN(CL-
M"@T*)&)A<V4@/2!B87-E*"1F:6QE;F%M934I.PT*<')I;G0@(EPD8F%S92 ]
M("1B87-E7&XB.PT*#0HD8F%S92 ](&)A<V4H)&9I;&5N86UE-BD[#0IP<FEN
M=" B7"1B87-E(#T@)&)A<V5<;B([#0H-"G-U8B!B87-E('L-"FUY("1R9B ]
M('-H:69T*$!?*3L@;7D@)'=F.PT*)'=F(#T@<F5V97)S92@D<F8I.PT**"1R
M9BD@/2 D=V8@/7X@+RA>+BL_*5M<7%PO72\[#0HD=V8@/2!R979E<G-E*"1R
49BD[#0IR971U<FX@)'=F#0I]#0H`
`
end
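
For comparison, the File::Basename suggestion quoted above might look like
the sketch below. File::Basename ships with Perl; win_basename is a
hypothetical wrapper, and choosing the 'MSWin32' rules (under which both \
and / separate directories) is an assumption made so that one call covers
both sample paths.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse_set_fstype);

# Parse with Windows rules, then restore whatever rules were active before.
sub win_basename {
    my ($path) = @_;
    my $previous = fileparse_set_fstype('MSWin32');
    my $name = basename($path);
    fileparse_set_fstype($previous);
    return $name;
}

print win_basename($_), "\n" for
    'c:\windows\system32\Fred and Barney.jpg',
    '/home/jodyman/pics/test.jpg',
    'Fred';
```

This prints "Fred and Barney.jpg", "test.jpg" and "Fred", the three cases
raised in the thread.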



------------------------------

Date: Sat, 04 Jan 2003 16:12:55 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Need help with split
Message-Id: <bKDR9.14007$ta5.10198@nwrddc01.gnilink.net>

Jodyman wrote:
> I don't agree, just using a module sometimes doesn't let you learn
> or get to know the language better.  Sometimes modules are way
> overkill because they have everything and the kitchen sink.  Some
> things are easier done with a few one liners.

> My argument to all the "Use this or that module." is that most
> people don't know the data you're working with as well as you
> do.  If you know that you never have a comma within quotes on
> a comma delimited file, you don't need to use a module, you can
> use a few simple regexes to accomplish what a module would do.
> If you know your html code (because you wrote it or created a
> program that writes it) and you know there is no weird stuff in it,
> you can use something like: s/<[^>]+>//g; to remove all html tags.


While I recognize your point about modules sometimes being overkill, it takes
a lot (I mean _A LOT_) of experience to recognize whether this is the case or
not. And more often than not it turns out that even an experienced
programmer was wrong in his evaluation and there is that infamous 1% of
data that does not conform to the simple structure. Or the boss just decided
that he needs an additional field which busts your model. It's this one
percent that kills the simple-minded approach.
Just take the example from last week, where the honest usenaut who tried to
recognize valid IP addresses via REs didn't even know which
addresses/address formats are valid to begin with. People then end up in
Usenet asking "This RE works. But how can I fix it such that it matches the
remaining 1%, too?" And they will get an answer which works for half of
those special cases or for their sample data, but still not for all cases.
Unfortunately this is all too common. In almost all cases you could have
saved yourself and others a lot of work and trouble if you had used
the readily available 100% solution from the beginning.
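
The tag-stripping one-liner quoted above is a convenient illustration of
that 1%. The sketch wraps it in a sub (strip_tags_naive is a made-up name)
and feeds it one input it handles and one it does not.

```perl
use strict;
use warnings;

# The substitution from the quoted post, wrapped in a sub.
sub strip_tags_naive {
    my ($html) = @_;
    $html =~ s/<[^>]+>//g;
    return $html;
}

print strip_tags_naive('<b>bold</b> text'), "\n";   # prints: bold text

# A > inside a quoted attribute ends the match too early,
# so part of the tag survives in the output.
print strip_tags_naive('<img alt="a > b" src="x.gif">'), "\n";
```

The second call leaves ` b" src="x.gif">` behind, because the pattern stops
at the first > it sees and has no notion of quoted attribute values.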

> If you never think for yourself, you never learn.  I like jumping into
> the modules and learning from the pros.  If you never do that and
> blindly "use the modules/packages" you'll never learn the language
> well.

Au contraire, mon ami. Here I strongly(!) disagree.
A very important part of learning to program as well as learning a new
programming language is to learn about existing libraries/modules. Can you
imagine programming in C without using any library? You can't even print
anything to the screen without using a module!
Or programming in Windows without using libraries? Or creating an X11
application without using libraries?
And in a professional environment you must be able to use the modules
that are provided by your co-workers. There is simply no way for you to
reinvent the wheel over and over again, even if your data in this
particular instance is simpler than what your co-workers' code can handle.

Modules and libraries (aka programming in the large; aka re-usable code) are
the one big step forward from the early programming paradigms (say COBOL,
Fortran, Basic) to more modern approaches. Learning which modules are
available and learning how to use them properly is just as important as
learning how to use the for statement or the split function. And this does
include learning when not to reinvent the wheel.
This step is not easy, but if you don't make it then you will always be
stuck with programming in the small and never be able to graduate from being
a mediocre bulk programmer of the 1980s.

jue




------------------------------

Date: 4 Jan 2003 11:02:18 -0800
From: jl_post@hotmail.com (J. Romano)
Subject: Printing and copying lists of lists
Message-Id: <b893f5d4.0301041102.5665f03f@posting.google.com>

Hi,

   I was thinking a while back about both Lisp and Perl, and I
realized that when Lisp prints out list of lists ("lol"), every
element is displayed (no matter how far nested), whereas Perl will
only print references to lists that are inside lists.

   Shouldn't there be some easy way to make Perl display all the
nested elements of a list-of-lists?  (After all, Perl's motto "Making
easy things easy and hard things possible" would seem to imply it.) 
Likewise, is there an easy way to make a deep copy of an lol?

   I read through some newsgroups and read the perldoc on perllol and
couldn't find any quick way to do this.  Therefore, I wrote my own
subroutines.  The "lister" function takes a lol as a parameter and
returns a string representation of the list that's fit to be printed. 
The "copyList" function also takes a list-of-lists and returns an
identical copy of that list.

   Keep in mind that my code only works on lists that contain lists
and scalars (and lists-of-lists, of course).  It will not work with
lists that contain hash or object references (those references won't
break the code; they'll just be represented by the string "UNK"). 
However, they should be easy enough to modify so that they will allow
hash (and other) references.

   Here is my "lister" code.  I included example usages after the two
functions:


#!/usr/bin/perl -w

use strict;

sub lister {
   my @originalList = @_;
   my @flatList;
   foreach my $item (@originalList) {
      if (! ref($item)) {
         push @flatList, $item;
      } elsif (ref($item) eq "ARRAY") {
         push @flatList, lister(@$item);
      } else {
         push @flatList, "UNK";  # unknown element
      }
   }
   return "(" . join(",",@flatList) . ")";
}

sub copyList {
   my @originalList = @_;
   my @copy;
   foreach my $item (@originalList) {
      if (! ref($item)) {
         push @copy, $item;
      } elsif (ref($item) eq "ARRAY") {
         my @deref = copyList(@$item);
         push @copy, [ @deref ];
      } else {
         push @copy, "UNK";  # unknown element
      }
   }
   return @copy;
}

my $code;
my @list;
my @copy;

$code='   @list = (0,1,2,["3.0",3.1,["3.20",3.21,3.22],3.3,3.4],4,["5.0"],6);';
print "\nCode that defines \@list:\n";
print "$code\n\n";

eval $code;
warn "Error: $@\n" if $@;

print "\@list = ", lister(@list), "\n";

$code = '   @copy = copyList(@list);';
print "\nCode that makes a copy of \@list:\n";
print "$code\n\n";

eval $code;
warn "Error: $@\n" if $@;

print "\@copy = ", lister(@copy), "\n";

$code = << 'EOT';
   $copy[0] = "A";
   $copy[3][1] = "B";
   $copy[3][2][1] = "C";
   $copy[5][0] = "D";
EOT

print "\nCode that modifies \@copy (changes \@copy but not \@list):\n";
print "$code\n";

eval $code;
warn "Error: $@\n" if $@;

print "\@list = ", lister(@list), "\n\@copy = ", lister(@copy),
"\n\n";

__END__


   That's my code.  Feel free to use it and/or modify it if you find
it useful.

   However, if anyone thinks that I overlooked a simpler way of doing
the same thing, please let me know.

   Thanks,

   J.
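
One possibly simpler route, offered as a sketch and not as a verdict on the
code above, is to lean on two modules that ship with Perl: Data::Dumper for
printing nested structures and Storable's dclone for deep copies. The
wrapper name deep_copy is hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

# dclone takes a reference and returns a reference to an independent copy,
# so wrap the list in an anonymous array and unwrap the result.
sub deep_copy {
    return @{ dclone([@_]) };
}

my @list = (0, 1, [2, [3, 4]], 5);
my @copy = deep_copy(@list);
$copy[2][1][0] = 'changed';      # modifies @copy only

$Data::Dumper::Indent = 1;       # compact but still readable layout
print Dumper(\@list, \@copy);
```

Both modules also cope with hash references, which the hand-written
versions report as "UNK".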


------------------------------

Date: Sat, 04 Jan 2003 14:41:37 GMT
From: Andras Malatinszky <nobody@dev.null>
Subject: Re: Problem with huge dataset, 100000000 a magic number?
Message-Id: <3E16F1D4.4060204@dev.null>



Uri Guttman wrote:

>>>>>>"MH" == Mike Hunter <mthunter@students.uiuc.edu> writes:
>>>>>>
> 
>   MH> On 03 Jan 2003 19:44:10 GMT, ctcgag@hotmail.com wrote:
>   >> 
>   >> Yes.  You have a typo at line 23.
> 
>   MH> Is this the line 23 you're talking about?
> 
>   MH>               $packets, $octets) = split /\s+/, $_;
> 
>   MH> I don't see the typo.
> 
> it was a JOKE! [snip]


What do you mean it was a joke?!?!?!?! The line is clearly missing a 
left parenthesis!
:-)



------------------------------

Date: 4 Jan 2003 10:19:36 -0800
From: 00056312@brookes.ac.uk (Stephen Adam)
Subject: While loop + several conditions problem
Message-Id: <945bf980.0301041019.63f62b09@posting.google.com>

Hi guys (and girls), just a simple problem involving a while loop and
a few conditions.

Thanks for taking a look!


This is an abstract of part of a program I am writing to remove the
HTML tags from HTML documents. This part will look through an array
until it finds a space, an end tag or an error is thrown up due to the
delimeter being found.

Why do I end up in an infinite loop? Surely it should quit when it
finds the space or if not the space then it should quit when it finds
the "xxzxx" delimieter.

Whats going wrong? 


#!C:\perl\perl.exe 

@array = ("a", "b", " ", "xxzxx",); 

$temp = 0; 




while ((@array[$temp] ne (">" or " ")) && (not $error)){

if (@array[$temp] = "xxzxx"){   # null terminator found - throw error
msg and exit
$error = 1;
print "end of array found";
}

print "x";
$temp++;
}


exit(0);


Cheers Steve


------------------------------

Date: Sat, 04 Jan 2003 18:22:55 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: While loop + several conditions problem
Message-Id: <x7ptrcwz9h.fsf@mail.sysarch.com>

>>>>> "SA" == Stephen Adam <00056312@brookes.ac.uk> writes:

  SA> This is an abstract of part of a program I am writing to remove the
  SA> HTML tags from HTML documents. This part will look through an array
  SA> until it finds a space, an end tag or an error is thrown up due to the
  SA> delimeter being found.

use an html parser (found on cpan). doing this yourself will be
difficult and bug prone. the problem has been long solved so don't
reinvent the wheel.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org
Damian Conway Perl Classes - January 2003 -- http://www.stemsystems.com/class


------------------------------

Date: Sat, 04 Jan 2003 18:45:24 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: While loop + several conditions problem
Message-Id: <8ZFR9.10629$tQ6.277@nwrddc02.gnilink.net>

Stephen Adam wrote:
> This is an abstract of part of a program I am writing to remove the
> HTML tags from HTML documents.

Unless this is part of some academic exercise to learn how to write a parser,
you should really use the existing and readily available HTML parser.
See CPAN for HTML::Parser.

> Why do I end up in an infinite loop? Surely it should quit when it
> finds the space or if not the space then it should quit when it finds
> the "xxzxx" delimieter.
>
> Whats going wrong?

Your main problem is that you are not using warnings and strict. Besides
having told you what is wrong with your code they would have found a few
additional issues, too.

> #!C:\perl\perl.exe

use warnings;
use strict;

> @array = ("a", "b", " ", "xxzxx",);
> $temp = 0;
> while ((@array[$temp] ne (">" or " ")) && (not $error)){

You are combining two pieces of text (">" and " ") with a boolean operator
("or"). Perl's "or" returns its first true operand, so (">" or " ") is just
">" and the " " is never looked at. Then you compare only that one string to
the text in @array[$temp] using "ne".
That can't be right.

BTW: @array[$temp] is bogus, too. Did you mean $array[$temp]?

> if (@array[$temp] = "xxzxx"){   # null terminator found - throw error

Again, did you mean $array[$temp]?

jue
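
Putting those corrections together, the loop might be written as in the
sketch below. It guesses at the intent (stop at the first ">" or " ", and
treat the "xxzxx" marker as an error), and find_stop is a hypothetical name.
It is not a replacement for an HTML parser.

```perl
use strict;
use warnings;

# Return the index of the first ">" or " " element, or nothing if the
# "xxzxx" end marker (or the end of the list) is reached first.
sub find_stop {
    my @array = @_;
    for my $i (0 .. $#array) {
        return if $array[$i] eq 'xxzxx';                       # eq compares strings
        return $i if $array[$i] eq '>' or $array[$i] eq ' ';   # each test spelled out
    }
    return;
}

my $stop = find_stop('a', 'b', ' ', 'xxzxx');
print defined $stop ? "stopped at index $stop\n" : "end marker found\n";
```

Iterating over the index range also guarantees termination: the loop cannot
run past the end of the array even if neither a stop character nor the
marker is present.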




------------------------------

Date: Sat, 4 Jan 2003 12:46:40 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: While loop + several conditions problem
Message-Id: <slrnb1eb0g.ufr.tadmc@magna.augustmail.com>

Stephen Adam <00056312@brookes.ac.uk> wrote:

> Thanks for taking a look!


You try folks' patience by not asking for machine help before
resorting to human help.

Please run your code with warnings enabled before posting.

Have you seen the Posting Guidelines that are posted here weekly?


> This is an abstract of part of a program I am writing to remove the
> HTML tags from HTML documents. 


Then we can assume that you've seen the FAQ about that?

   perldoc -q remove

      "How do I remove HTML from a string?"

It is hard to do right even for experts, and you're not one. (no offense)

You should use a module that understands HTML for processing HTML.


> This part will look through an array
> until it finds a space, an end tag 


Your code does not look for end tags.

It appears to look for the end of a start tag, or the end of an
end tag, or any occurrence of > even if it is not part of any tag...


> delimeter being found.
  ^^^^^^^^^
> the "xxzxx" delimieter.
              ^^^^^^^^^^


I realize that you spell funny on your side of the pond,
but not that funny.   :-)


> Whats going wrong? 


Ask perl to tell you by enabling warnings.


> while ((@array[$temp] ne (">" or " ")) && (not $error)){


You have many different errors all on that one line.

@array[$temp] is an array slice, warnings would have told
you to write $array[$temp] instead.

the     or " "       part is NEVER evaluated, because 
the ">" part is always "true".

You probably meant one of

   $array[$temp] ne '>' and $array[$temp] ne ' '
or
   $array[$temp] !~ /^[> ]$/


You compare $array[$temp] to the value of the "or" expression, which is
just ">" (its first true operand). I doubt that is what you want.


> if (@array[$temp] = "xxzxx"){   # null terminator found - throw error
                    ^^


You are using the wrong operator there, warnings would have told
you about that too if you could have been troubled to ask.

Even if you'd used == it would be wrong as that operator is
for comparing numbers.

eq is used for comparing strings.


> msg and exit
> $error = 1;
> print "end of array found";
> }
> 
> print "x";
> $temp++;
> }


Something terrible has happened to the formatting of your code,
you should indent the contents of blocks.

Your code is hopeless, I can't even suggest how to write it
correctly because I can't figure out what you are trying to do.

In addition, I guess that you are going about it all wrong,
parsing arbitrary HTML is really hard, you will work on this
forever if you insist on trying to stumble through it yourself.

Back up a bit and tell us what you really need to do, and maybe
we can give some helpful suggestions to get you on the right path...


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4341
***************************************
