[19450] in Perl-Users-Digest


Perl-Users Digest, Issue: 1645 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 29 03:05:25 2001

Date: Wed, 29 Aug 2001 00:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <999068708-v10-i1645@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Wed, 29 Aug 2001     Volume: 10 Number: 1645

Today's topics:
    Re: a perl module dedicated to lists ? <Francis.Derive@wanadoo.fr>
    Re: adding rows in a text file <goldbb2@earthlink.net>
    Re: dec2hex conversion ,uninitialized value <goldbb2@earthlink.net>
        Difference between .pl, .cgi, and .pm File Extensions. <bob@eawf.nospam.com>
    Re: EXEs from within Perl <goldbb2@earthlink.net>
    Re: Hash of Complex Records <goldbb2@earthlink.net>
    Re: Help with Piped command, capturing output <mbudash@sonic.net>
    Re: How to remove duplicate records in a huge file?? <goldbb2@earthlink.net>
        image pixel dimensions using perl script but NOTsize mo (Brian A)
    Re: Java mucks up split (Trewth Seeker)
    Re: Java mucks up split (Trewth Seeker)
    Re: Java mucks up split (Abigail)
        Perl Question about hash element within list array (Chris)
    Re: Perl Question about hash element within list array <goldbb2@earthlink.net>
        regexp for removing HTML tags <dogansmoobs.NOSPAM@ctel.net>
    Re: regexp for removing HTML tags <jurgenex@hotmail.com>
    Re: regexp for removing HTML tags <dogansmoobs.NOSPAM@ctel.net>
    Re: regexp for removing HTML tags (Malcolm Dew-Jones)
    Re: simple foreach problem (Malcolm Dew-Jones)
    Re: sort alphabetically <goldbb2@earthlink.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 29 Aug 2001 08:53:37 +0200
From: "Francis Derive" <Francis.Derive@wanadoo.fr>
To: "Tassilo von Parseval" <Tassilo.Parseval@post.rwth-aachen.de>
Subject: Re: a perl module dedicated to lists ?
Message-Id: <B7B25E36-3863D@193.248.252.79>

Merci to all of you !

I appreciate your mastering of Perl programming to take for granted that
Perl will do most of that for me.

In fact, I could write this easy member function ... using recursion, though
not as directly or as easily as yours using grep:

sub member {
	my ($item, $ref_liste) = @_;
	my @liste = @$ref_liste;	# copy, so the caller's list is untouched

	if (!@liste) {
		return 0;	# empty list: $item was not found
	}else {
		my $car = shift @liste;
		if ($item eq $car) {
			unshift @liste, $item;
			return @liste;	# Lisp-style: the tail starting at $item
		}else{
			return Listes::member($item, \@liste); #Listes, my Lists Perl package
		}
	}
}

member 3, [1,2,3,4];
=> # 3   4

Forget it.

However, is it possible to go further in life without using recursion?
For example, an element of the list could itself be a list - making it a
tree - in which I would want to search for the $item again.

When it comes to lists, I am used to thinking in terms of Lisp
algorithms, which can hardly be implemented in Perl if recursion is
needed.

I will certainly look at List::Util, but would it be reasonable to consider
calling Lisp from Perl (or Perl from Lisp, in another newsgroup)?

Francis.
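
Recursion over nested lists does work in Perl. A minimal sketch, assuming
sublists are stored as array references; deep_member is a made-up name,
not something from List::Util:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $item occurs anywhere in a nested list.
# Array references stand in for Lisp sublists.
sub deep_member {
    my ($item, $list) = @_;
    for my $elem (@$list) {
        if (ref $elem eq 'ARRAY') {
            return 1 if deep_member($item, $elem);   # recurse into the sublist
        }
        elsif ($elem eq $item) {
            return 1;
        }
    }
    return 0;
}

my @tree = (1, [2, [3, 4]], 5);
print deep_member(3, \@tree) ? "found\n" : "not found\n";
print deep_member(9, \@tree) ? "found\n" : "not found\n";
```

The search stops at the first match, so it does no more work than a loop
that calls last().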

On Tue, Aug 28, 2001 11:39 PM, Tassilo von Parseval
<mailto:Tassilo.Parseval@post.rwth-aachen.de> wrote:
>Logan Shaw wrote:
>
>> In article <B7B1BE31-354F9@193.248.252.253>,
>> Francis Derive <Francis.Derive@wanadoo.fr> wrote:
>> 
>>>This is it : is there a perl module dedicated to lists ?
>>>
>>>i.e., (member $elem @list), oh ! I beg your pardon : member( $elem, @list);
>>>delete( $elem, @list );
>>>
>> 
>> Those don't exist directly, but instead of member, you can do this:
>> 
>> 	grep ($elem eq $_, @list) > 0
>
>Yet one should note that grep can be slightly less efficient than a loop 
>that calls last() on finding an element.
>
>> and instead of delete you can do this:
>> 
>> 	grep ($elem ne $_, @list)
>
>Well, rather @new = grep(...), otherwise it'd be a no-op.
>
>> On the first one, you can leave off "> 0" if it occurs in a scalar
>> context.  The second one returns a new value rather than modifying the
>> list.
>
>Ah, ok, you mentioned yourself.
>
>> If you are worrying about sets of strings (a common case in Perl),
>> you can turn your list into a hash:
>> 
>> 	%hash = map { $_ => 1 } @list;
>> 
>> And then you can test for membership:
>> 
>> 	if (exists $hash{$elem}) { whatever... }
>
>In this case (since you initialize each hash-element with 1) you could 
>even write "if ($hash{$elem})" which I find a little more eye-friendly.
>But here this does rather the same.
>
>> and you can delete:
>> 
>> 	delete $hash{$elem};
>> 
>> You can convert back into a list easily enough:
>> 
>> 	@list = keys %hash;	# or "sort keys %hash" if you want
>
>Though, one might want to add, doing it with hashes has side-effects. 
>You automatically delete duplicate entries. Sure, these can be pleasant 
>side-effects in some cases.
>
>> If you're doing lots of set operations, this is likely an efficient way
>> to do it, since hashes really do internally use a hash and therefore
>> these operations are O(1) execution time.  (At least most of the time.
>> I'm not sure delete is always O(1), but it probably is.)
>
>Ah no. Turning an array into a hash is at least O(n)...turning it back 
>another O(n) plus the additional sort with O(n log(n)) if that is done 
>as well.
>
>Tassilo
>-- 
>$a=[(74,116)];$b=[($a->[1]-1,$a->[1]++,0x20)];$c=[(97,110)];$d=[($c->
>[1]+1,$b->[1],"her")];for(@{[$a,$b,$c,$d]}){for(@{$_}){$_=~/\d+/?print
>(chr($_)):print;}}$c=sub{$l=shift;[(0x20+$l-1,0x50,0x65,0x73-0x01,108
>),(0x20,0x68,0x61,)]};print(map{chr($_)}@{($c->(1))});$h={a=>33*3,b=>
>10**2+7,c=>"1"."0"."1",d=>0162};@h=sort(keys(%$h));for(@h){print(chr(
>ord(chr($h->{$_}))))};
>
>






------------------------------

Date: Wed, 29 Aug 2001 02:10:34 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: adding rows in a text file
Message-Id: <3B8C875A.474B9508@earthlink.net>

soumitra bhattacharya wrote:
> 
> Hi!
> I have a txt file of the following format
> 1||15||8||99||08||7
> 2||15||8||99||08||7
> 3||33||55||3||2||1
> 4||109||3||0||1||80
> 5||78||5||0||2||88
> 6||55||4||0||1||67
> 7||44||6||0||1||85
> 8||25||7||0||1||70
> 13||6||7||0||0||36
> 14||3||13||0||6||50
> 15||9||12||0||0||51
> 16||3||12||0||1||48
> 17||4||32||1||16||165
> 18||1||5||1||2||78
> The first piece of data in each row is the number of logins and
> the next datas are mails read||deleted||etc for login.
> I want to generate a graph
> but I want to sum on the basis of number of logins after 4.
> i.e.
> 1||....
> 2||..
> 3||..
> 4||...
> 5-10||sum of mails read for 5 to 10||sum of mails deleted for5-10||.....
> 11-20||...
> 21-30||....
> 31-40||...
> 41-50||....
> 50-whatever is max
> so in first row will be data for people logged in once,
> but in row 5-10 will be sum of data in rows 5,6,7,8,9,10 if they exist.
> similarly for the row named 11-20 will be sum of data in rows 11..20 if they exist.
> How do I do this.
> Any Help will be appreciated.
> Soumitra

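my %data;
while( my $record = <DATA> ) {
	chomp $record;
	my @fields = split /\Q||/, $record;
	my $key = shift @fields;
	# Round keys above 4 up to the top of their bucket:
	# 5..10 -> 10, 11..20 -> 20, 21..30 -> 30, ...
	$key -= ($key-1) % 10 - 9 if( $key > 4 );
	$data{$key}[$_] += $fields[$_] for( 0 .. $#fields );
}

# Sort numerically so that bucket 10 does not land before bucket 2.
for my $record ( sort { $a <=> $b } keys %data ) {
	print $record > 10 ? (($record-9) . "-" . $record) :
		$record == 10 ? "5-10" : $record;
	print "||", join("||", @{$data{$record}} ), "\n";
}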
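
A small check of the bucket arithmetic on its own, assuming the buckets
requested above (5-10, 11-20, 21-30, ...) are keyed by their upper bound.
bucket_top is a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: map a login count to the top of its bucket.
# 1..4 stay as they are; 5..10 -> 10, 11..20 -> 20, 21..30 -> 30, ...
sub bucket_top {
    my ($key) = @_;
    $key -= ($key - 1) % 10 - 9 if $key > 4;
    return $key;
}

printf "%2d -> %2d\n", $_, bucket_top($_) for 1, 4, 5, 10, 11, 18, 20, 21;
```

With the sample data, rows 5 through 8 would all be summed under key 10,
and rows 13 through 18 under key 20.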

-- 
"I think not," said Descartes, and promptly disappeared.


------------------------------

Date: Wed, 29 Aug 2001 00:49:30 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: dec2hex conversion ,uninitialized value
Message-Id: <3B8C745A.4EE49E25@earthlink.net>

Stephen Lohning wrote:
[snip]
> And does anyone know how to get pack unpack to do decimal to hex
> string conversions ?
> 
> Thanks
> 

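sub dec2hex {
	my ($dec, $width) = @_;
	# "N" packs a 32-bit big-endian integer; "H8" unpacks it as 8 hex
	# digits.  Keep the last $width of them (1 to 8).
	substr( unpack( "H8", pack( "N", $dec ) ), -$width );
}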
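
If pack/unpack is not a hard requirement, sprintf does the same job more
directly. A minimal sketch; dec2hex_sprintf is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical name. Zero-pads to $width hex digits; values that need
# more digits than $width are printed in full rather than truncated.
sub dec2hex_sprintf {
    my ($dec, $width) = @_;
    return sprintf "%0*x", $width, $dec;
}

print dec2hex_sprintf(255, 4), "\n";   # prints 00ff
```

Use "%0*X" instead for upper-case digits.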

-- 
I'm not a programmer but I play one on TV...


------------------------------

Date: Tue, 28 Aug 2001 23:35:23 -0700
From: Bob Holden <bob@eawf.nospam.com>
Subject: Difference between .pl, .cgi, and .pm File Extensions.
Message-Id: <l33potsh0n3evogfoqhg7kthl6bicd7l30@4ax.com>

I'm new to Perl, and trying to figure this all out.  What I can't find
clear references to in all the books I've purchased and info I've
downloaded is:

What's the difference between a .pl, .cgi, and a .pm file, and why
would you use one over another?

TIA


------------------------------

Date: Wed, 29 Aug 2001 02:28:30 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: EXEs from within Perl
Message-Id: <3B8C8B8E.C35849C6@earthlink.net>

Tassilo von Parseval wrote:
> 
> Graham W. Boyes - TOAO.net wrote:
> > Hi,
> > Using the latest versions of ActivePerl and Perl2EXE on DOS 7. 
> > (E.g., a Windows 98SE boot disk.)
> >
> > I was wondering if it's possible to to run a DOS executable, for
> > example, Mem.exe (displays used and unused memory) and have the
> > output put into a variable inside my Perl program.  Right now I'm
> > doing something like "mem > temp.txt" at the DOS prompt, and then
> > running my Perl program which opens the file temp.txt and works on
> > it.  This is rather cumbersome as I do it six times (with different
> > programs) in my Perl program.
> 
> There are different more convenient ways of doing that.
> 
> 1) Using backticks:
> 
> my $var = `mem`;
> 
> 2) Using open:
> 
> open MEM, "mem |" or die "Error: Could not fork: $!";
> my @var = <MEM>;
> 
> The second approach is more sophisticated since it forks off the
> program.

Actually, on unix, both fork off and exec a program, but with the first,
the program runs to completion before the expression returns anything,
and with the second, the open() returns immediately, and you can start
reading data as soon as the program starts outputting it, without having
to wait for the program to exit.

On windows [excluding nt], redirection to and from pipes is emulated
[that is, it *pretends* to do it, but it's actually doing something else
behind the scenes].  When you have the line open(MEM, "mem |") in your
perl program, it actually runs the mem.exe program with its
output redirected to a temporary file, waits until mem is finished,
then opens that file for reading.

Actually, I believe that on win, backticks (that is, `` and qx) work the
same way -- it runs the .exe with output redirected to a temporary file,
then opens that file for reading.  The difference of course is that with
backticks, you don't get to see the file io happening, and that you
*must* slurp the whole file into memory, while with open(...|) you can
read and process a line at a time if you choose.
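
The two styles side by side, as a sketch. The running perl itself ($^X)
stands in for mem.exe here, which is an assumption made only so the
example does not depend on a particular external program:

```perl
use strict;
use warnings;

# $^X is the path of the running perl; it plays the role of the
# external command. Substitute "mem" or any other program.
my $cmd = qq{"$^X" -le "print for 1..3"};

# Backticks: the whole output arrives at once, after the command exits.
my $all = `$cmd`;

# Piped open: read and process one line at a time.
open(my $pipe, "$cmd |") or die "Could not start command: $!";
my @lines;
while (my $line = <$pipe>) {
    chomp $line;
    push @lines, $line;
}
close($pipe) or die "Command failed: $! (exit status $?)";

print "backticks returned ", length($all), " characters\n";
print "pipe returned ", scalar(@lines), " lines\n";
```

Checking the return value of close() on a pipe is how the command's exit
status gets noticed.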

-- 
"I think not," said Descartes, and promptly disappeared.


------------------------------

Date: Wed, 29 Aug 2001 00:42:32 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Hash of Complex Records
Message-Id: <3B8C72B8.1558C2C@earthlink.net>

[please don't top-post]

Mark Riehl wrote:
> 
> The output didn't work for me, it only printed the vehicle IDs from
> the outer loop, never entered the inner loop.  I'm wondering if
> there's a problem on the input side.  Here's what is filling the hash
> (this is inside another while, which performs the same task for
> multiple vehicle ids).
> 
>   $rec = {};
>   $rec->{urn} = $this_urn;
>   @movement_records = ();
> 
>   while (<INPUT>) {
>     ... Get the time, lat, long from a file
> 
>     %fields = (time=>$time, latitude=>$latitude,
>       longitude=>$longitude);
>     push @movement_records, {%fields};
>   }
> 
>   $LocationData{ $rec->{vid} } = @movement_records;

This assigns the number of items in the array @movement_records into the
hash -- this is not what you want.  You want an arrayref there.
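
A short demonstration of the difference, using made-up keys (count, list)
and dummy records:

```perl
use strict;
use warnings;

my @movement_records = ( { time => 1 }, { time => 2 }, { time => 3 } );
my %LocationData;

$LocationData{count} = @movement_records;    # scalar context: stores 3
$LocationData{list}  = \@movement_records;   # stores a reference to the array

print "count: $LocationData{count}\n";                     # count: 3
print "records: ", scalar @{ $LocationData{list} }, "\n";  # records: 3
print "first time: $LocationData{list}[0]{time}\n";        # first time: 1
```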

I would suggest one of:

while( <INPUT> ) {
	chomp;
	my %fields;
	@fields{qw(time latitude longitude)} =
		split /,/;
	push @{ $LocationData{$rec->{vid}} }, \%fields;
}

Or:
$LocationData{$rec->{vid}} = [ map {
	chomp;
	my %fields;
	@fields{qw(time latitude longitude)} =
		split /,/;
	\%fields;
} <INPUT> ];

Or:

  my @movement_records; # where it's declared makes a difference.
  while (<INPUT>) {
    my %fields;
    ... Get the time, lat, long from a file

    @fields{qw(time latitude longitude)} =
      ($time, $latitude, $longitude);
    push @movement_records, \%fields;
  }
  $LocationData{$rec->{vid}} = \@movement_records; # note the \ there

or ...... anyway, TIMTOWTDI

-- 
I'm not a programmer but I play one on TV...


------------------------------

Date: Wed, 29 Aug 2001 05:38:39 GMT
From: Michael Budash <mbudash@sonic.net>
Subject: Re: Help with Piped command, capturing output
Message-Id: <mbudash-2E9A2A.22384028082001@news.sonic.net>

In article <4d6fd5fc.0108281159.1bf3b7f0@posting.google.com>, 
dawfun@seanet.com (Jonathan Cunningham) wrote:

> Michael Budash <mbudash@sonic.net> wrote in message 
> news:<mbudash-008F99.21503120082001@news.sonic.net>...
> > 
> > hmmm... you might try appending " 2>&1" to $pgpcmd and see what you 
> > get... also: it may not be able to find your config file: as i recall, 
> > i 
> > had to put it (pgp.cfg) in the same dir as the script with this is in 
> > it:
> > 
> > PubRing="/full/path/to/pubring.pkr"
> > Verbose=2
> >  
> > this was with pgp 5.x for irix 6.x, so YMMV...
> > 
> > hth-
> 
> No luck here either.

what worked and what didn't?

> Lame.

excuse me? "Lame"?
-- 
Michael Budash ~~~~~~~~~~ mbudash@sonic.net


------------------------------

Date: Wed, 29 Aug 2001 02:59:34 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: How to remove duplicate records in a huge file??
Message-Id: <3B8C92D6.250C610C@earthlink.net>

Wim wrote:
> 
> How can I remove duplicate records in a huge file?
> The file contains about 3.000.000 records (=800Mb)

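use DB_File;

tie my %hash, "DB_File", "temp" or die "Cannot tie temp: $!";

# Remember the line number at which each distinct record first appears.
open( local(*FILE), "<the_big_file" ) or die "Cannot open the_big_file: $!";
while( my $record = <FILE> ) {
	$hash{ $record } = $. if !exists $hash{ $record };
}
close FILE or die "Cannot close the_big_file: $!";

open( FILE, ">temp2" ) or die "Cannot open temp2: $!";
while( my ($record, $index) = each %hash ) {
	# A zero-padded decimal prefix sorts correctly as text.  A packed
	# "N" could contain a newline byte and split the line for sort.
	print FILE sprintf("%010d", $index), $record;
}
close FILE or die "Cannot close temp2: $!";

untie %hash;
unlink "temp";

open( FILE, "sort temp2|" ) or die "Cannot run sort: $!";
print STDOUT substr($_, 10) while <FILE>;	# drop the 10-digit prefix
close FILE or die "sort failed: $! (status $?)";
unlink "temp2";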

__END__

I use the sort program, rather than perl's sort function, because it has
means to do what's known as an "external sort", which means that it does
it without having the entire file in memory.

Assuming you're on unix, and have the DB_File module installed, this is
probably the most robust solution.

If you don't need the records to be in the same order as they started
in, it is a bit simpler... just put them into the hash as I said, then
use each to get them out of the hash and print them where you want...
without bothering to sort.
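
The hash-membership idea on its own, as a sketch for input that fits in
memory (an assumption that does not hold for the 800Mb file, where the
hash would be tied to DB_File as above). unique_lines is a made-up name:

```perl
use strict;
use warnings;

# In-memory sketch: keep only the first occurrence of each record.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @records = ("a\n", "b\n", "a\n", "c\n", "b\n");
print unique_lines(@records);
```

Because grep walks the list in order, this version happens to preserve the
original order as well.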

-- 
"I think not," said Descartes, and promptly disappeared.


------------------------------

Date: Tue, 28 Aug 2001 21:07:55 GMT
From: nospam_bca1000@hotmail.com (Brian A)
Subject: image pixel dimensions using perl script but NOTsize modulle
Message-Id: <3b8c0736.5658060@news.uk.worldonline.com>

I need a script which will give the pixel dimensions of an image file.
All I can find on the Net are ones which use the size module. These
don't seem to work on the server I use (the module doesn't seem to be
there). Support never replies to emails so I am stuck. Are there any simple
alternative scripts to achieve my aim?
Remove 'nospam_' from email address
Brian A.


------------------------------

Date: 28 Aug 2001 22:22:16 -0700
From: trewth_seeker@yahoo.com (Trewth Seeker)
Subject: Re: Java mucks up split
Message-Id: <d690a633.0108282122.73c59523@posting.google.com>

abigail@foad.org (Abigail) wrote in message news:<slrn9onri3.rpa.abigail@alexandra.xs4all.nl>...
> Trewth Seeker (trewth_seeker@yahoo.com) wrote on MMCMXIX September
> MCMXCIII in <URL:news:d690a633.0108280002.6054b2b5@posting.google.com>:
> $$ Those Java folks at Sun seem to have trouble getting it right, even
> $$ when it's already been done for them.  They've added regexps and
> $$ "split" to Java 1.4, but the split doc
> $$ http://java.sun.com/j2se/1.4/docs/api/java/util/regex/Pattern.html
> $$ says
> $$ 
> $$ The input "boo:and:foo", for example, yields the following results
> $$ with these parameters:
> $$ 
> $$ Regex    Limit    Result
> $$    :        2     { "boo", "and" }
> $$ 
> $$ That's just dumb, and misses the whole point of the limit.  This is
> $$ going to lead to confusion and cause some people to think that Perl is
> $$ just as broken, since Java's split is taken from Perl.  What can be
> $$ done to get Sun to do it right?
> 
> 
> Why on earth should this group be bothered? 
> 
> I don't think there's any point in bickering "oh, language X has feature
> foo, that should be in language Y too, and exactly as X has".
> 
> I quickly gave up on the entire perl6 development process as there were
> just too many people trying to turn Perl into Java/C/Python/Frobel/whatever.
> 
> Let Perl remain Perl. Let Java take whatever course it takes. From a
> Perl perspective, it doesn't matter. Nor does it matter what Python 2500
> is going to look like, or the next version of Eiffel or SQL.
> 
> This group is about Perl. Perl. Perl. Perl.
> 
> 
> The Java groups are over there. Complain there if you think Sun is doing
> you injustice. Perhaps someone there will take pity.
> 
> 
> Use Perl. Not Java.
> 
> 
> Abigail

Get stuffed, oh rude one.


------------------------------

Date: 28 Aug 2001 22:34:36 -0700
From: trewth_seeker@yahoo.com (Trewth Seeker)
Subject: Re: Java mucks up split
Message-Id: <d690a633.0108282134.61f58a7e@posting.google.com>

Craig Berry <cberry@cinenet.net> wrote in message news:<Xns910B8D00CBAFEcberrycinenetnet1@207.126.101.92>...
> abigail@foad.org (Abigail) wrote in
> news:slrn9onri3.rpa.abigail@alexandra.xs4all.nl: 
> > Use Perl. Not Java.
> 
> Or use both, applying the correct tool to each job.

And sometimes people don't have a choice as to what to apply.

> And I believe the 
> heads-up regarding split compatibility here in clpm was justified, as Sun 
> has explicitly stated their intention of basing regex-related tools on Perl.  

Thanks; I'm glad *someone* gets it.

> I agree that further discussion of potential remedies would belong in the 
> java groups.

I've filed a bug with Sun.  But the problem is that "the Java groups"
simply won't *care* that Java has a broken version of Perl's facility,
and most Java users will be quite unaware that it *is* broken.  Someone
asked why I want Java to do it right if I'm "a perler" -- I'm not "a perler",
I'm an engineer and a human being, and I care when things are broken.
I have trouble fathoming people who don't.  Sun claims that these
facilities are taken from Perl, and I expected at least one
person in the Perl community (LW, for instance) to care that they
broke it in the process and thereby misrepresent Perl and introduce
pointless inconsistency into the programming community at large, and
my note was directed to such people.  If Abigail or anyone else isn't
one of those people, they are free to ignore the thread.
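
For reference, this is what Perl's own split does with the same input and
a limit of 2. The remainder of the string stays in the last field:

```perl
use strict;
use warnings;

my @parts = split /:/, "boo:and:foo", 2;
print scalar(@parts), " fields\n";        # 2 fields
print join(" | ", @parts), "\n";          # boo | and:foo
```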


------------------------------

Date: 29 Aug 2001 06:29:23 GMT
From: abigail@foad.org (Abigail)
Subject: Re: Java mucks up split
Message-Id: <slrn9op2u6.sp9.abigail@alexandra.xs4all.nl>

Trewth Seeker (trewth_seeker@yahoo.com) wrote on MMCMXX September
MCMXCIII in <URL:news:d690a633.0108282134.61f58a7e@posting.google.com>:
?? 
?? I've filed a bug with Sun.  But the problem is that "the Java groups"
?? simply won't *care* that Java has a broken version of Perl's facility,
?? and most Java users will be quite unaware that it *is* broken.  Someone
?? asked why I want Java to do it right if I'm "a perler" -- I'm not "a perler",
?? I'm an engineer and a human being, and I care when things are broken.
?? I have trouble fathoming people who don't.  Sun claims that these
?? facilities are taken from Perl, and I expected at least one
?? person in the Perl community (LW, for instance) to care that they
?? broke it in the process and thereby misrepresent Perl and introduce
?? pointless inconsistency into the programming community at large, and
?? my note was directed to such people.  If Abigail or anyone else isn't
?? one of those people, they are free to ignore the thread.


I still don't see an issue here. When Larry designed Perl, he "took"
the block structure of C and put it into Perl. Except he didn't copy it
exactly - braces are mandatory after if(), for(), etc, even if there's
one statement.

That didn't cause people in comp.lang.c to whine that Larry "broke" block
structures. Just because you take a feature from one language to
incorporate in your own doesn't mean it needs to be copied exactly.

Is Java's split also "broken" because Java doesn't have a $* variable?


Followups set.

Abigail
-- 
use   lib sub {($\) = split /\./ => pop; print $"};
eval "use Just" || eval "use another" || eval "use Perl" || eval "use Hacker";


------------------------------

Date: 28 Aug 2001 22:57:34 -0700
From: clee008@yahoo.com (Chris)
Subject: Perl Question about hash element within list array
Message-Id: <6b5e9aa2.0108282157.413cab0d@posting.google.com>

Hi all,

I got a difficult question of writing some code to loop through all
the data printing it out

@myData = ( { name =>	'Chris',
              address  => 'London',
              phone => '207 721 2000' },
	    { name =>	'John',
              address =>  'Hong Kong',
              phone => '3306 2457' }, 
             etc ...
);

I tried foreach loop to go through the list but I couldn't manipulate
$_ as hash item. Do you have any idea? Please email me the answer.
Many thanks.

Regard,

Chris Lee


------------------------------

Date: Wed, 29 Aug 2001 02:16:36 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Perl Question about hash element within list array
Message-Id: <3B8C88C4.E4F2F50E@earthlink.net>

Chris wrote:
> 
> Hi all,
> 
> I got a difficult question of writing some code to loop through all
> the data printing it out
> 
> @myData = ( { name =>   'Chris',
>               address  => 'London',
>               phone => '207 721 2000' },
>             { name =>   'John',
>               address =>  'Hong Kong',
>               phone => '3306 2457' },
>              etc ...
> );
> 
> I tried foreach loop to go through the list but I couldn't manipulate
> $_ as hash item. Do you have any idea? Please email me the answer.
> Many thanks.

$_ isn't a hash, it's a reference to a hash.

print "name, address, phone:\n";
foreach( @myData ) {
	# $_ is a hash reference, so take the slice through it
	print join(" ", @{$_}{qw(name address phone)}), "\n";
}
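
A complete version that runs as is, with sample data modeled on the
question and a named loop variable instead of $_:

```perl
use strict;
use warnings;

# Sample data modeled on the question; each element is a hash reference.
my @myData = (
    { name => 'Chris', address => 'London',    phone => '207 721 2000' },
    { name => 'John',  address => 'Hong Kong', phone => '3306 2457' },
);

my @rows;
foreach my $person (@myData) {
    # @{$person}{...} is a hash slice taken through the reference
    push @rows, join(", ", @{$person}{qw(name address phone)});
}
print "$_\n" for @rows;
```

A single field is reached with the arrow, as in $person->{name}.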

-- 
"I think not," said Descartes, and promptly disappeared.


------------------------------

Date: Wed, 29 Aug 2001 00:49:09 -0400
From: Mik Mifflin <dogansmoobs.NOSPAM@ctel.net>
Subject: regexp for removing HTML tags
Message-Id: <toot0p1rls9ka0@corp.supernews.com>

I've been trying to come up with a regexp to remove all HTML tags.  I need 
this because I'm writing a GAIM plugin, and some people's GAIM messages are 
sent in HTML, and I just need the text........

-- 
 - Mik Mifflin


------------------------------

Date: Tue, 28 Aug 2001 21:52:01 -0700
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: regexp for removing HTML tags
Message-Id: <3b8c74f9$1@news.microsoft.com>

"Mik Mifflin" <dogansmoobs.NOSPAM@ctel.net> wrote in message
news:toot0p1rls9ka0@corp.supernews.com...
> I've been trying to come up with a regexp to remove all HTML tags.  I need
> this because I'm writing a GAIM plugin, and some people's GAIM messages are
> sent in HTML, and I just need the text........

As mentioned a gazillion times in this NG: if you want to parse HTML use an
HTML parser, e.g. HTML::Parse.
Please see the FAQ about why it's a bad idea to try using REs for parsing
HTML.

jue




------------------------------

Date: Wed, 29 Aug 2001 02:07:18 -0400
From: Mik Mifflin <dogansmoobs.NOSPAM@ctel.net>
Subject: Re: regexp for removing HTML tags
Message-Id: <top1j9auk1g8ee@corp.supernews.com>

> "Mik Mifflin" <dogansmoobs.NOSPAM@ctel.net> wrote in message
> news:toot0p1rls9ka0@corp.supernews.com...
>> I've been trying to come up with a regexp to remove all HTML tags.  I
>> need this because I'm writing a GAIM plugin, and some people's GAIM
>> messages are sent in HTML, and I just need the text........
> 
> As mentioned a gazillion times in this NG: if you want to parse HTML use
> an HTML parser, e.g. HTML::Parse.
> Please see the FAQ about why it's a bad idea to try using REs for parsing
> HTML.
> 
> jue
> 
> 
> 
I know that, but I don't want to use a module.  I just want to strip away 
the tags.  I've tried things like s/<.*>//g, but they don't work.  I need a 
regular expression to drop everything in between < and >.
-- 
 - Mik Mifflin


------------------------------

Date: 28 Aug 2001 23:16:42 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: regexp for removing HTML tags
Message-Id: <3b8c88ca@news.victoria.tc.ca>

Mik Mifflin (dogansmoobs.NOSPAM@ctel.net) wrote:
: I've been trying to come up with a regexp to remove all HTML tags.  I need 
: this because I'm writing a GAIM plugin, and some people's GAIM messages are 
: sent in HTML, and I just need the text........

: -- 
:  - Mik Mifflin


Try this (untested)

	$entire_html_file =~ s/<[^>]*>//g;

It [maybe] removes everything starting with a < and up to the next >.  (I
can't test it right now.) 
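
A quick comparison with the greedy pattern, on a made-up sample string.
This is still only a sketch: a > inside an attribute value or a comment
will fool the negated class too.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

(my $greedy = $html) =~ s/<.*>//g;      # .* runs to the LAST '>' on the line
(my $tags   = $html) =~ s/<[^>]*>//g;   # [^>]* stops at the NEXT '>'

print "greedy: '$greedy'\n";   # greedy: ''
print "tags:   '$tags'\n";     # tags:   'bold and italic'
```

The greedy version swallows the text between the first and last tag, which
is why s/<.*>//g appears not to work.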

(of course all formatting is lost.)

You can keep some of the most basic formatting with 

	$entire_html_file =~ s/\n/  /g;		# unwrap lines
	$entire_html_file =~ s/<br>/\n/gi;	# break
	$entire_html_file =~ s/<p>/\n\n/gi;	# paragraphs
	$entire_html_file =~ s/<[^>]*>//g;	# zap all other tags

(Text::Wrap can rewrap the text)

Tables one level deep might look ok with the following added before the
"zap all" step...

	$entire_html_file =~ s{</?table[^>]*>}{\n}gi;
	$entire_html_file =~ s{</?t[rh][^>]*>}{\n}gi;
	$entire_html_file =~ s{<td[^>]*>}{\t}gi;


Lists are left as an exercise...

the above is *KLUDGEY*, but you get what you don't pay 4.


------------------------------

Date: 28 Aug 2001 21:53:14 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: simple foreach problem
Message-Id: <3b8c753a@news.victoria.tc.ca>

2obvious (vadivasbro@hotmail.com) wrote:
: I'm an ActiveState Perl user and I'm getting unpredictable results
: with the foreach loop.  Here's a simple example.

: @array = <STDIN>;
: foreach (@array)
: {
: 	print;
: }

: When I type in these three lines and terminate them with a CTRL-Z:

: one
: two
: three

: then it only returns the last two:

: two
: three

: Now, I thought this would have returned all three elements of my list.
:  Am I misunderstanding how foreach works, or this really as strange as
: I think it is?


It is a bug in something.  I have seen it using activestate perl on
windows.  It might be a bug in windows, as it is somewhat similar to bugs
in windows I have encountered in the past that affected reading (or
writing) standard input using pipes or redirections with regular .EXE
programs. 


The work around is to alter how you access the file on the command line. 
I forget how I solved this when I saw it, but for example, if you're doing
'perl script.pl < file' then changes such as 'type file | perl script.pl'
or @array=<> combined with 'perl script.pl file' can solve the problem. 




------------------------------

Date: Wed, 29 Aug 2001 00:25:36 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: sort alphabetically
Message-Id: <3B8C6EC0.7FBF188D@earthlink.net>

greg wrote:
> 
> I'm having a hard time trying to find out how to sort alphabetically,
> i have two fields, title and text and i'm trying to sort
> alphabetically by the title field.  Here is my code
> 
> open(DATA,"<$data") || print "Error Opening File";
> while ($riga = <DATA>)
>         {
>         ($title,$text) = split(/\|/,$riga);
>         print "$title";
>         print "$text";  }
> close(DATA);
> 
> can somebody please help me?

for(sort do {
	open( local(*DATA), "<$data" )
		or die "Couldn't open $data: $!";
	<DATA> } ) {
	local $, = " ";
	print split(/\|/, $_), "\n";
}
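
Since the question asks for a sort on the title field specifically, here
is a sketch that compares only the part before the first "|", ignoring
case. The sample lines are made up and stand in for the contents of $data:

```perl
use strict;
use warnings;

# Made-up sample lines in the "title|text" format from the question.
my @lines = ("pear|green fruit\n", "Apple|red fruit\n", "fig|small fruit\n");

# Sort case-insensitively on the title, the part before the first "|".
my @sorted = sort {
    lc( (split /\|/, $a)[0] ) cmp lc( (split /\|/, $b)[0] )
} @lines;

for my $line (@sorted) {
    chomp( my $copy = $line );
    my ($title, $text) = split /\|/, $copy, 2;
    print "$title $text\n";
}
```

For large files the titles could be extracted once up front instead of on
every comparison.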


-- 
I'm not a programmer but I play one on TV...


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 1645
***************************************
