[11743] in Perl-Users-Digest


Perl-Users Digest, Issue: 5343 Volume: 8

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Apr 9 16:07:32 1999

Date: Fri, 9 Apr 99 13:00:18 -0700
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 9 Apr 1999     Volume: 8 Number: 5343

Today's topics:
    Re: =~tr / / /; problem (Bart Lateur)
    Re: Anyone help with reading from data file?  Read here <uri@home.sysarch.com>
    Re: Incrementing a hash value. <jglascoe@giss.nasa.gov>
    Re: Incrementing a hash value. <tbriles@austin.ibm.com>
    Re: Incrementing a hash value. <smiles@wfubmc.edu>
    Re: Incrementing a hash value. (Larry Rosler)
    Re: Incrementing a hash value. <uri@home.sysarch.com>
    Re: Latest AdminMisc and ActiveState's perl (Andrew Haveland-Robinson)
        LWP and sockets <smiles@wfubmc.edu>
    Re: perl regular expression (Tad McClellan)
    Re: perl regular expression (Larry Rosler)
    Re: Piping Input into a Perl script on NT <jsjensen@Paramin.COM>
        PPM has created a new dir. <greg2@surfaid.org>
    Re: Privacy for slaves forced to use a proxy/firewall t <kperrier@blkbox.com>
    Re: Privacy for slaves forced to use a proxy/firewall t <nospam@nospam.org>
    Re: SORT BY DATE (Larry Rosler)
    Re: SORT BY DATE (Sam Holden)
    Re: SORT BY DATE <jglascoe@jay.giss.nasa.gov>
    Re: SORT BY DATE (Sam Holden)
    Re: SORT BY DATE <jglascoe@giss.nasa.gov>
    Re: SORT BY DATE (Sam Holden)
    Re: SORT BY DATE (Larry Rosler)
    Re: Stripping html tags within perl <cassell@mail.cor.epa.gov>
    Re: Stripping html tags within perl <nsandow@otnnet.com>
    Re: Stripping html tags within perl (Sam Holden)
    Re: Yet another regexp question (Sam Holden)
        Special: Digest Administrivia (Last modified: 12 Dec 98) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 09 Apr 1999 18:57:55 GMT
From: bart.lateur@skynet.be (Bart Lateur)
Subject: Re: =~tr / / /; problem
Message-Id: <370e4d35.325210@news.skynet.be>

Larry Rosler wrote:

>/s : Translate a sequence of identical characters to one character.

Er... no.

	tr/\r\n\t / /s;

will translate any sequence of return characters, newlines, tabs or
spaces into a single space. First, translate the characters. Then squeeze
each run of identical translated characters in the result down to one.
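Here is a quick way to check this, written for a modern perl with strict and warnings (the sample strings are invented). Only characters that were actually translated get squeezed, so doubled letters outside the search list survive:

```perl
use strict;
use warnings;

# Squeeze runs of CR, LF, tab and space into a single space.
# tr/// works on $_ by default; here it is bound to a fresh copy with =~.
my $text = "one\r\n\t  two   three";
(my $squeezed = $text) =~ tr/\r\n\t / /s;
print "$squeezed\n";    # one two three

# /s only squeezes runs of *translated* characters: the doubled letters
# are not in the search list, so they are left alone.
(my $letters = "aa \t bb") =~ tr/\r\n\t / /s;
print "$letters\n";     # aa bb
```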

	Bart.


------------------------------

Date: 09 Apr 1999 15:30:43 -0400
From: Uri Guttman <uri@home.sysarch.com>
Subject: Re: Anyone help with reading from data file?  Read here.
Message-Id: <x76775siak.fsf@home.sysarch.com>

>>>>> "LR" == Larry Rosler <lr@hpl.hp.com> writes:

  LR> In article <370a2bfe@news.greatbasin.net> on Tue, 6 Apr 1999 08:42:30 -0700,
  LR> Freaky <D@nt.Email.Me> says...
  >> Does anyone know how to read a data file in reverse order? I use the format
  >> open (HEADERREAD, "$file3") or die "Unable to open...
  >> and it will only read in normal order.  Im just curious if theres a way to
  >> reverse it.  Thanks.

  LR> For huge files (for example, server logs), a more sophisticated approach 
  LR> is required.  I suggest you search this newsgroup in DejaNews for 
  LR> keywords 'read file backwards' (I found 103 messages).  In particular, 
  LR> look at a module 'Backwards.pm' that Uri Guttman has developed.

hey, i have to upload that to cpan already. i need a minor tuit and some
kicks like this to do it.
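For files small enough to hold in memory, the simple alternative to that "more sophisticated approach" is to read all the lines and reverse the list. What follows is a minimal sketch, not Uri's Backwards.pm. `lines_backwards` is a hypothetical helper name, and the temporary file exists only for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return a small file's lines in reverse order by reading them all
# and reversing the list. The whole file is held in memory at once.
sub lines_backwards {
    my ($path) = @_;
    open my $fh, '<', $path or die "Unable to open $path: $!";
    my @lines = reverse <$fh>;
    close $fh;
    return @lines;
}

# Demonstration with a throwaway file that is deleted at program exit.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "first\nsecond\nthird\n";
close $tmp;

my @backwards = lines_backwards($tmpname);
print @backwards;   # third, second, first (one per line)
```

Because this slurps the whole file, it is exactly the approach that breaks down for huge files such as server logs.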

uri

-- 
Uri Guttman  -----------------  SYStems ARCHitecture and Software Engineering
uri@sysarch.com  ---------------------------  Perl, Internet, UNIX Consulting
Have Perl, Will Travel  -----------------------------  http://www.sysarch.com
The Best Search Engine on the Net -------------  http://www.northernlight.com


------------------------------

Date: Fri, 09 Apr 1999 14:56:54 -0400
From: Jay Glascoe <jglascoe@giss.nasa.gov>
To: Gabriel Richards <grichard@uci.edu>
Subject: Re: Incrementing a hash value.
Message-Id: <370E4D76.A5635C48@giss.nasa.gov>

Gabriel Richards wrote:
> 
> "Useless use of numeric eq in void context...at line 11."
> 
> Here's the code:
> 
>  if (exists($tally{$line})) {$tally{$line} += 1} #line 11
>  else {$tally{$line} = 1}
> 

I see no error there; perl seems to be complaining about the
use of an equality test in a void context somewhere else. E.g.,

# on a line by itself
1 == 3;


> What does the error message mean?

perldoc perldiag: "Useless use of %s in void context"
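Here is a minimal reproduction of that warning. It assumes perl 5.6 or later for `use warnings`; older perls use -w. The exact wording differs slightly between perl versions, and the variable name is only illustrative. The warning is raised at compile time, so the statement is compiled through a string eval where it can be captured:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# The result of == is computed and thrown away: a comparison in void context.
eval q{ use warnings; my $line = 1; $line == 3; 1 } or die $@;

print @warnings;   # a "Useless use of numeric eq ..." warning
```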


------------------------------

Date: Fri, 09 Apr 1999 13:50:01 -0500
From: Tom Briles <tbriles@austin.ibm.com>
Subject: Re: Incrementing a hash value.
Message-Id: <370E4BD9.1BDB75AC@austin.ibm.com>

Gabriel Richards wrote:

> "Useless use of numeric eq in void context...at line 11."
>
> Here's the code:
>
>  if (exists($tally{$line})) {$tally{$line} += 1} #line 11
>  else {$tally{$line} = 1}
>
> What does the error message mean?

perldoc perldiag

Now you know what any perl error message means!

> How can I increment the value associated
> to $tally{$line}?

Just like you did.

This is not the line causing the error.  Perhaps a line above?

- Tom



------------------------------

Date: Thu, 08 Apr 1999 14:47:08 -0400
From: Steve Miles <smiles@wfubmc.edu>
Subject: Re: Incrementing a hash value.
Message-Id: <370CF9AC.8BCF772C@wfubmc.edu>

Instead of

>  if (exists($tally{$line})) {$tally{$line} += 1} #line 11
>  else {$tally{$line} = 1}

try:

    # increments the value of the hash if it exists
    if (exists $tally{$line}) {$tally{$line}++;}
    else {$tally{$line} = 1;}    # remember your ";" !

Good Luck,
Steve

=============================================
Steve Miles (smiles@wfubmc.edu)
----> http://www.groundbreak.com  <----
Wake Forest University School of Medicine
5019 Hanes, Medical Center Blvd.
Winston-Salem, NC 27157
Phone: 336.716.0454     FAX: 336.716.7200
=============================================



------------------------------

Date: Fri, 9 Apr 1999 12:01:48 -0700
From: lr@hpl.hp.com (Larry Rosler)
Subject: Re: Incrementing a hash value.
Message-Id: <MPG.1177ee778d4df53b98987e@nntp.hpl.hp.com>

In article <TFrP2.4999$tY1.2995@wbnws01.ne.mediaone.net> on Fri, 9 Apr 
1999 14:28:58 -0400, Jon Smirl <jonsmirl@mediaone.com> says...
> I had the same problem and decided that -w does not always make things
> cleaner and clearer.
> 
> Which is more obvious and better performing?
> 
>  if (exists($tally{$line})) {$tally{$line} += 1} else {$tally{$line} = 1}

I get no warning on this from any perl, 5.002 through 5.005_02.

> or
> 
> $tally{$line}++;

Of course, This Is The Right Way To Do It.  But note that the following 
equivalent statement:

  $tally{$line} += 1;

drew a warning until perl 5.004.
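Here is a sketch of the idiom with invented sample input. Counting occurrences needs no exists() test, because ++ on a missing hash element starts from an undefined value, which counts as 0 without a warning:

```perl
use strict;
use warnings;

# Count how often each line occurs; missing keys start from undef (0).
my %tally;
my @input = ("apple\n", "pear\n", "apple\n");
for my $line (@input) {
    $tally{$line}++;
}

# Report the counts, without the trailing newline in each key.
for my $line (sort keys %tally) {
    chomp(my $name = $line);
    print "$name: $tally{$line}\n";   # apple: 2, then pear: 1
}
```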

-- 
(Just Another Larry) Rosler
Hewlett-Packard Company
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: 09 Apr 1999 15:26:42 -0400
From: Uri Guttman <uri@home.sysarch.com>
Subject: Re: Incrementing a hash value.
Message-Id: <x790c1sih9.fsf@home.sysarch.com>

>>>>> "SM" == Steve Miles <smiles@wfubmc.edu> writes:

  SM> Instead of
  >> if (exists($tally{$line})) {$tally{$line} += 1} #line 11
  >> else {$tally{$line} = 1}

  SM> try:

  SM>     # increments the value of the hash if it exists
  SM>     if (exists $tally{$line}) {$tally{$line}++;}
  SM>     else {$tally{$line} = 1;}    # remember your ";" !


not needed here. ; in perl SEPARATES statements, it does not terminate them,
so the last statement in a block (as here) doesn't need one.

if he was missing a ; the program would not have compiled at all.
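Here is a small check of this point. The variable names are borrowed from the thread and the values are invented. The last statement in a block needs no semicolon, but two statements within the same block must still be separated by one:

```perl
use strict;
use warnings;

my %tally;
my $line = 'example';    # placeholder key

# No semicolon before either closing brace: each block holds a single
# statement, and ; only separates statements from one another.
if (exists $tally{$line}) { $tally{$line} += 1 } else { $tally{$line} = 1 }

# Two statements in one block do need a ; between them (but not after the last).
if (exists $tally{$line}) { my $old = $tally{$line}; $tally{$line} = $old + 1 }

print "$tally{$line}\n";   # 2
```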

uri


-- 
Uri Guttman  -----------------  SYStems ARCHitecture and Software Engineering
uri@sysarch.com  ---------------------------  Perl, Internet, UNIX Consulting
Have Perl, Will Travel  -----------------------------  http://www.sysarch.com
The Best Search Engine on the Net -------------  http://www.northernlight.com


------------------------------

Date: Fri, 09 Apr 1999 19:25:32 GMT
From: andy@-nospam-haveland.com (Andrew Haveland-Robinson)
Subject: Re: Latest AdminMisc and ActiveState's perl
Message-Id: <370f5208.10746281@news.demon.co.uk>

On Fri, 9 Apr 1999 02:03:22 -0700, in comp.lang.perl.misc you wrote:

>Andy,
>Keep reading in the README file:
>
> - IF you are using the ActiveState 5.005 version of Win32 Perl:
>    a)  Copy the ADMINMISC.PM file into the directory
>          site\lib\win32\
>    b)  Rename the file ADMINMISC_005.DLL to ADMINMISC.DLL
>    c)  Make a directory:
>          site\lib\MSWin32-x86-object\auto\win32\AdminMisc
>    d) Copy the ADMINMISC.DLL file into the directory in step c

Thanks Dave - I had noticed this in the readme, but it was not obvious to me
that this was the one that was required. I know the ActiveState perl
versions by their build number, not by their 5.005 version number.
The other two adminmiscs named build XXX just served to confuse, so I had to
try installing all of them until I found one that worked.

>    ALTERNATIVELY if you are using ActivePerl (or core perl 5.005 with
>    PERL_OBJECT defined):
>    a) run the Perl Package Manager:
>       perl ppm.pl install http://www.roth.net/perl/packages/win32-adminmisc.ppd

>The last option (using ppm.pl) is by far the easiest.

I did this too... but it didn't install your later versions as
Win32::AdminMisc::GetUsers was missing, and the libraries go into yet
another place, so then there are two versions installed...

>The latest binary (19990407) was built against ActivePerl build 509, not
>511.
Thanks... would it be too much trouble to clarify this for anyone else's
benefit, or am I the only person in here who was confused? :-)

There are so many different builds and versions that it's a minefield for
the non-gurus. I use and program in perl very frequently, but I don't
breathe it!

Cheers,
Andy.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Haveland-Robinson Associates             Tel. +44 (0)1252-845697
6 Haywarden Place, Hartley Wintney,                 ICQ: 1331640
Hants RG27 8UA England              Web: http://www.haveland.com


------------------------------

Date: Thu, 08 Apr 1999 15:05:11 -0400
From: Steve Miles <smiles@wfubmc.edu>
Subject: LWP and sockets
Message-Id: <370CFDE7.54EB2E36@wfubmc.edu>

Hi!

I saw that you were interested in socket connections. I've been writing
a lot of scripts using the LWP library and could help you if you'd like.
Let me know exactly what you want to do - it should be easy. I run a
site at www.groundbreak.com.

See ya,
Steve


=============================================
Steve Miles (smiles@wfubmc.edu)
----> http://www.groundbreak.com  <----
Wake Forest University School of Medicine
5019 Hanes, Medical Center Blvd.
Winston-Salem, NC 27157
Phone: 336.716.0454     FAX: 336.716.7200
=============================================




------------------------------

Date: Fri, 9 Apr 1999 11:03:29 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: perl regular expression
Message-Id: <1s4le7.ur2.ln@magna.metronet.com>

Nick Allan (guardian@silas-2.cc.monash.edu.au) wrote:

: anything that has Content-Type: and does
: not have Text/Plain after it is invalid.

: How do I get a regexp to only match on content-Type: not text/Plain?


   print "only plain allowed\n" unless m#^Content-Type: Text/Plain$#i;

   or

   print "only plain allowed\n" unless $_ eq "Content-Type: Text/Plain\n";


--
    Tad McClellan                          SGML Consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: Fri, 9 Apr 1999 12:39:12 -0700
From: lr@hpl.hp.com (Larry Rosler)
Subject: Re: perl regular expression
Message-Id: <MPG.1177f73aae59de6b989880@nntp.hpl.hp.com>

In article <1s4le7.ur2.ln@magna.metronet.com> on Fri, 9 Apr 1999 
11:03:29 -0400, Tad McClellan <tadmc@metronet.com> says...
> Nick Allan (guardian@silas-2.cc.monash.edu.au) wrote:
> 
> : anything that has Content-Type: and does
> : not have Text/Plain after it is invalid.
> 
> : How do I get a regexp to only match on content-Type: not text/Plain?
> 
>    print "only plain allowed\n" unless m#^Content-Type: Text/Plain$#i;
> 
>    or
> 
>    print "only plain allowed\n" unless $_ eq "Content-Type: Text/Plain\n";

ITYM

print "only plain allowed\n" unless lc eq "content-type: text/plain\n";

as in your regex.  Case is irrelevant for this header.

But in any case, this test should only be made on lines that begin with 
'Content-type: ', if I read the problem request correctly.
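Here is a sketch that combines both suggestions, assuming each header arrives as one line. `bad_content_type` is a hypothetical helper. It only tests lines beginning with Content-Type: and ignores case throughout. Like Tad's regex, it treats any value other than exactly text/plain as invalid, including text/plain with a charset parameter:

```perl
use strict;
use warnings;

# Return 1 for a Content-Type header whose value is not text/plain, else 0.
# Lines that are not Content-Type headers are never flagged.
sub bad_content_type {
    my ($header) = @_;
    return 0 unless $header =~ /^Content-Type:/i;
    return $header !~ m#^Content-Type:\s*text/plain\s*$#i ? 1 : 0;
}

for my $h ("Content-Type: Text/Plain\n",
           "content-type: text/html\n",
           "Subject: hello\n") {
    print "only plain allowed: $h" if bad_content_type($h);
}
```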

-- 
(Just Another Larry) Rosler
Hewlett-Packard Company
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: Fri, 09 Apr 1999 13:05:33 -0600
From: "J. S. Jensen" <jsjensen@Paramin.COM>
Subject: Re: Piping Input into a Perl script on NT
Message-Id: <370E4F7D.3DA16762@Paramin.COM>

Eric Bohlman wrote:

> When you use file associations on NT to automatically run a non-executable
> just by typing its name, you lose the ability to redirect its input or
> output, and piping is a form of redirection.

Why is that?  Because the actual binary is being called as a reference from
the command interpreter and not attaching std fds?


--
J. S. Jensen
mailto:jsjensen@Paramin.COM
http://www.Paramin.COM




------------------------------

Date: Fri, 09 Apr 1999 20:03:56 +0100
From: Greg Griffiths <greg2@surfaid.org>
Subject: PPM has created a new dir.
Message-Id: <370E4F1C.859E1F04@surfaid.org>

I've just started using PPM to update my packages and have noticed that it
has created three new directories in my PERL directory:

\html
\htmlhelp
\site

What can I (or should I) do with them?


------------------------------

Date: 09 Apr 1999 14:35:41 -0500
From: Kent Perrier <kperrier@blkbox.com>
Subject: Re: Privacy for slaves forced to use a proxy/firewall to access the net?
Message-Id: <ysid81dsi2a.fsf@blkbox.com>

foj@nym.alias.net writes:

> 
> The totalitarian micromanagers are everywhere I agree - thus the added need
> for protection for the individual.
                         ^^^^^^^^^^

You spelled this wrong.  It's induhvidual.

Kent 


------------------------------

Date: Fri, 09 Apr 1999 15:31:49 -0400
From: MicroChip <nospam@nospam.org>
Subject: Re: Privacy for slaves forced to use a proxy/firewall to access the net?
Message-Id: <370E55A5.2195@nospam.org>

<-- snip irrelevant debate --> 

> If you know of some way for perl to play a role in the following setup, please
> let me know:
> 
> 1. Have a server setup with SSL.
> 2. People come to a page on the server.
> 3. People enter in a web address.
> 4. A script or program of some sort gets the page the user wanted and sends it
> out to them over the SSL connection - all the while not changing the URL up at
> the top of the user's browser. Perhaps the URL could change if it only
> reflected an address which could NOT be viewed by someone else - to determine
> where the user was going.
> 5. The script also adjusts the links in the document such that they are
> referenced back to the server.
> 
> Can perl act as the "script" above?
> 

yes I believe perl can 'play a part' in your scenario, but, well, don't
ask me to help....i simply won't. try learning perl, the bulk of what you
describe is fairly simple. but as others have said in various ways, you
will probably find only tighter security, if not loss of job, at the end
of that path.

note that my personal opinion is intentionally absent.

MC

-- 
________________________________
  MicroChip Technical Services
  mc at techpage.csv.cmich.edu


------------------------------

Date: Fri, 9 Apr 1999 12:16:29 -0700
From: lr@hpl.hp.com (Larry Rosler)
Subject: Re: SORT BY DATE
Message-Id: <MPG.1177f1ebcd39c5ec98987f@nntp.hpl.hp.com>

[Posted and a courtesy copy mailed.]

In article <370E4991.AF94C93D@giss.nasa.gov> on Fri, 09 Apr 1999 
14:40:17 -0400, Jay Glascoe <jglascoe@giss.nasa.gov> says...
> Larry Rosler wrote:
> > 
> > Oh, here we go again.  There is absolutely nothing wrong with
> > representing a year as a two-digit number, in applications where the
> > problem CONTEXT can determine the correct century unambiguously.
> 
> Okay, I'll bite:  Bababozorg, what is your problem context?
> Does your problem domain involve past/present/future dates?
> Is it even *slightly* conceivable that your code will be
> around a year from now?
> 
> At any rate, here's a line of Perl for you:
> 
> sub for_sort { sprintf "%02d%02d%02d", (split '-', shift)[2,0,1] }
> 
> obviously this breaks if you've the years, say, 1996 and 2003 
> being represented by "96" and "3", resp.

That is why I shouted CONTEXT in my response.  I guess you didn't hear 
it anyhow.

The CONTEXT must be supplied by the code.  For many, if not most, 
problem domains, the following will suffice quite well:

  $year += $year > 69 ? 1900 : 2000 if $year < 100;

If this 100-year window doesn't suit your problem domain, use a 
different window to disambiguate the year.
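Here is a sketch of that window applied to sorting. It assumes dates arrive as MM-DD-YY, which is the order implied by the (split '-', shift)[2,0,1] slice quoted above. `full_year` and `sort_key` are hypothetical names. The 1970-2069 window is simply the one in the line above, so widen or shift it to suit your data:

```perl
use strict;
use warnings;

# Expand a two-digit year with a fixed 100-year window (1970-2069);
# years that already have more digits pass through unchanged.
sub full_year {
    my ($year) = @_;
    $year += $year > 69 ? 1900 : 2000 if $year < 100;
    return $year;
}

# Build a sortable YYYYMMDD key from an MM-DD-YY date (format assumed).
sub sort_key {
    my ($month, $day, $year) = split /-/, shift;
    return sprintf "%04d%02d%02d", full_year($year), $month, $day;
}

my @dates  = ('12-25-96', '01-02-03', '07-04-99');
my @sorted = sort { sort_key($a) cmp sort_key($b) } @dates;
print "@sorted\n";   # 12-25-96 07-04-99 01-02-03
```

Because the key uses a four-digit year, 96, 99 and 03 sort as 1996, 1999 and 2003, which a %02d year key cannot do.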

-- 
(Just Another Larry) Rosler
Hewlett-Packard Company
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: 9 Apr 1999 19:20:33 GMT
From: sholden@pgrad.cs.usyd.edu.au (Sam Holden)
Subject: Re: SORT BY DATE
Message-Id: <slrn7gsko1.p31.sholden@pgrad.cs.usyd.edu.au>

On Fri, 09 Apr 1999 14:40:17 -0400, Jay Glascoe <jglascoe@giss.nasa.gov> wrote:
>[courtesy copy of post sent to Larry and "Bababozorg"]
>
>Larry Rosler wrote:
>> 
>> Oh, here we go again.  There is absolutely nothing wrong with
>> representing a year as a two-digit number, in applications where the
>> problem CONTEXT can determine the correct century unambiguously.
>
>Okay, I'll bite:  Bababozorg, what is your problem context?
>Does your problem domain involve past/present/future dates?

How your code interprets the year 34 will depend on that; no problem,
you code it so it does...

The statement said 'in applications where the problem CONTEXT...'; in other
words, applications where the answer to your question is known.

>Is it even *slightly* conceivable that your code will be
>around a year from now?

Is it even slightly conceivable that you could write a program that knew what
the current year was and used that in its choice of context? This is only for
IO; for internal storage it *knows* its context (and if it's perl it will
probably use 'year - 1900').
>
>At any rate, here's a line of Perl for you:
>
>sub for_sort { sprintf "%02d%02d%02d", (split '-', shift)[2,0,1] }
>
>obviously this breaks if you've the years, say, 1996 and 2003 
>being represented by "96" and "3", resp.

Yes, but if you are using two-digit years and your context spans
two centuries, then you obviously won't use that as a sort routine...

-- 
Sam

Perl was designed to be a mess (though in the nicest of possible ways). 
	--Larry Wall


------------------------------

Date: Fri, 9 Apr 1999 15:27:42 -0400
From: Jay Glascoe <jglascoe@jay.giss.nasa.gov>
To: Larry Rosler <lr@hpl.hp.com>
Subject: Re: SORT BY DATE
Message-Id: <Pine.A32.3.96.990409152258.23698C-100000@jay.giss.nasa.gov>

[posted and mailed]

On Fri, 9 Apr 1999, Larry Rosler wrote:

> The CONTEXT must be supplied by the code.  For many, if not most,
> problem domains, the following will suffice quite well:
> 
>   $year += $year > 69 ? 1900 : 2000 if $year < 100;
  
I often work with years 1700 up to 2200.  1700 up to now is
historic climatology, now to 2200 is climate modelling.
  
> If this 100-year window doesn't suit your problem domain, use a
> different window to disambiguate the year.

# ahhh... a 600 year window
$year += int rand 600;
  
	Jay Glascoe
--  
    	"That which does not kill me makes me stranger."

	--Larry Wall



------------------------------

Date: 9 Apr 1999 19:35:14 GMT
From: sholden@pgrad.cs.usyd.edu.au (Sam Holden)
Subject: Re: SORT BY DATE
Message-Id: <slrn7gslji.p31.sholden@pgrad.cs.usyd.edu.au>

Jay Glascoe <jglascoe@jay.giss.nasa.gov> wrote:
>On Fri, 9 Apr 1999, Larry Rosler wrote:

>> The CONTEXT must be supplied by the code.  For many, if not most,
>> problem domains, the following will suffice quite well:
>> 
>>   $year += $year > 69 ? 1900 : 2000 if $year < 100;
>  
>I often work with years 1700 up to 2200.  1700 up to now is
>historic climatology, now to 2200 is climate modelling.

So don't use two digit dates...

-- 
Sam

You can blame it all on the internet. I do...
	--Larry Wall


------------------------------

Date: Fri, 09 Apr 1999 15:43:39 -0400
From: Jay Glascoe <jglascoe@giss.nasa.gov>
To: sholden@cs.usyd.edu.au
Subject: Re: SORT BY DATE
Message-Id: <370E586B.5847F56A@giss.nasa.gov>

Sam Holden wrote:
> 
> So don't use two digit dates...

precisely my point  ;^)


------------------------------

Date: 9 Apr 1999 19:46:36 GMT
From: sholden@pgrad.cs.usyd.edu.au (Sam Holden)
Subject: Re: SORT BY DATE
Message-Id: <slrn7gsm8s.p31.sholden@pgrad.cs.usyd.edu.au>

On Fri, 09 Apr 1999 15:43:39 -0400, Jay Glascoe <jglascoe@giss.nasa.gov> wrote:
>Sam Holden wrote:
>> 
>> So don't use two digit dates...
>
>precisely my point  ;^)

For your application... For some of my applications it is perfectly
reasonable...


-- 
Sam

 "... the whole documentation is not unreasonably transportable in a
 student's briefcase." - John Lions describing UNIX 6th Edition
 "This has since been fixed in recent versions." - Kernighan & Pike


------------------------------

Date: Fri, 9 Apr 1999 12:49:13 -0700
From: lr@hpl.hp.com (Larry Rosler)
Subject: Re: SORT BY DATE
Message-Id: <MPG.1177f99414a36c35989881@nntp.hpl.hp.com>

[Posted and a courtesy copy mailed.]

In article <Pine.A32.3.96.990409152258.23698C-100000@jay.giss.nasa.gov> 
on Fri, 9 Apr 1999 15:27:42 -0400, Jay Glascoe 
<jglascoe@jay.giss.nasa.gov> says...
> On Fri, 9 Apr 1999, Larry Rosler wrote:
> > The CONTEXT must be supplied by the code.  For many, if not most,
> > problem domains, the following will suffice quite well:
> > 
> >   $year += $year > 69 ? 1900 : 2000 if $year < 100;
>   
> I often work with years 1700 up to 2200.  1700 up to now is
> historic climatology, now to 2200 is climate modelling.
>   
> > If this 100-year window doesn't suit your problem domain, use a
> > different window to disambiguate the year.
> 
> # ahhh... a 600 year window
> $year += int rand 600;

Very funny.  This is degenerating into silliness.  Just perhaps, two-
digit year representation isn't appropriate in *your* problem domain.

As I said in my original response, the Jewish calendar takes a longer 
historical view, and uses a three-digit year.  So should you.

-- 
(Just Another Larry) Rosler
Hewlett-Packard Company
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: Fri, 09 Apr 1999 12:04:05 -0700
From: David Cassell <cassell@mail.cor.epa.gov>
Subject: Re: Stripping html tags within perl
Message-Id: <370E4F25.4FA627F1@mail.cor.epa.gov>

Fredrik Larsson wrote:
> 
> $NAME2 = $NAME;
> $NAME2 ~= s/<.*?>//g;    # erases all html-tags in the string
> 
> $NAME2 now contains the same as $NAME but without all html-tags

Well, it would work if you had written your pattern-binding operator as
=~ instead of ~= ...  

Unless:
[1] your tag runs over more than one line;
[2] there are embedded `>' in your tag;
[3] `<' or `>' can show up somewhere other than in the html tags;
[4] you have html comments to handle also...
Larry Rosler or Uri or Rick will point out some other case I forgot. 
:-)

This is hard to do for the general case.  Check out HTML::Parse and
let it do the dirty work for you.

And check the FAQ first.  This is in perlfaq9.
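For the record, here is a corrected version of the copy-then-substitute idiom using =~. The sample string and URL are invented placeholders. It handles tags that span lines, but none of the other cases listed above:

```perl
use strict;
use warnings;

# Copy $name and strip simple tags from the copy in one statement.
# [^>]* also matches newlines, so a tag split across lines is removed too.
my $name = qq{<a\nhref="http://www.example.com/">Aspirin</a>};
(my $name2 = $name) =~ s/<[^>]*>//g;
print "$name2\n";   # Aspirin
```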

> /Larzon
> 
> >In a perl script I have a variable, $NAME assigned to the first field in
> >each line of an ascii text file.
> >Some of these fields contain html tags <a
> >href="http://blahblahbalh>name</a> surrounding the name.
> >
> >I would like to create another variable which would be the same as $NAME
> >but with the html tags stripped, leaving
> >just the name=$NAME.
> >
> >Can somebody supply me with the correct syntax to accomplish this?
> >
> >Thanks! -Neil
> >rx@rxlist.com   http://www.rxlist.com
> >

-- 
David Cassell, OAO                               
cassell@mail.cor.epa.gov
Senior Computing Specialist                          phone: (541) 754-4468
mathematical statistician                              fax: (541) 754-4716


------------------------------

Date: Fri, 09 Apr 1999 12:11:57 -0700
From: Neil Sandow <nsandow@otnnet.com>
To: Larry Rosler <lr@hpl.hp.com>
Subject: Re: Stripping html tags within perl
Message-Id: <370E50FC.F1DF3ED@otnnet.com>

s/<[^>]*>//g
That was the one that did it.  Thank you! Thank you! Thank you!

-Neil

Larry Rosler wrote:

> In article <VEqP2.6114$x43.10691@nntpserver.swip.net> on Fri, 9 Apr 1999
> 19:19:14 +0200, Fredrik Larsson <nils@hotmail.com> says...
> > $NAME2 = $NAME;
> > $NAME2 ~= s/<.*?>//g;    # erases all html-tags in the string
> >
> > $NAME2 now contains the same as $NAME but without all html-tags
>
> Not if they cross lines, as in the original post.
>
> One might use
>
>     s/<.*?>//gs
>
> or
>
>     s/<[^>]*>//g
>
> --
> (Just Another Larry) Rosler
> Hewlett-Packard Company
> http://www.hpl.hp.com/personal/Larry_Rosler/
> lr@hpl.hp.com



------------------------------

Date: 9 Apr 1999 19:38:02 GMT
From: sholden@pgrad.cs.usyd.edu.au (Sam Holden)
Subject: Re: Stripping html tags within perl
Message-Id: <slrn7gsloq.p31.sholden@pgrad.cs.usyd.edu.au>

On Fri, 09 Apr 1999 12:11:57 -0700, Neil Sandow <nsandow@otnnet.com> wrote:
>s/<[^>]*>//g
>That was the one that did it.  Thank you! Thank you! Thank you!

Until you hit something like:

<img alt="--->" src="arrow.gif">
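Here is a sketch of that failure, assuming any perl 5. The > inside the quoted alt value ends the match early, so part of the tag is left behind as text:

```perl
use strict;
use warnings;

# [^>]* stops at the first ">", even one inside a quoted attribute value.
my $html = q{<img alt="--->" src="arrow.gif">};
(my $stripped = $html) =~ s/<[^>]*>//g;
print "$stripped\n";   # " src="arrow.gif">
```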

-- 
Sam

testing? What's that? If it compiles, it is good, if it boots up it is
perfect.
	--Linus Torvalds


------------------------------

Date: 9 Apr 1999 19:06:32 GMT
From: sholden@pgrad.cs.usyd.edu.au (Sam Holden)
Subject: Re: Yet another regexp question
Message-Id: <slrn7gsjto.p31.sholden@pgrad.cs.usyd.edu.au>

On 9 Apr 1999 18:39:28 GMT, Sam Holden <sholden@pgrad.cs.usyd.edu.au> wrote:
>
>You fail to match <img src="images/asphalt.gif">
>                                 ^^^

I swear that was lined up when I posted it...

-- 
Sam

 "... the whole documentation is not unreasonably transportable in a
 student's briefcase." - John Lions describing UNIX 6th Edition
 "This has since been fixed in recent versions." - Kernighan & Pike


------------------------------

Date: 12 Dec 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Special: Digest Administrivia (Last modified: 12 Dec 98)
Message-Id: <null>


Administrivia:

Well, after 6 months, here's the answer to the quiz: what do we do about
comp.lang.perl.moderated. Answer: nothing. 

]From: Russ Allbery <rra@stanford.edu>
]Date: 21 Sep 1998 19:53:43 -0700
]Subject: comp.lang.perl.moderated available via e-mail
]
]It is possible to subscribe to comp.lang.perl.moderated as a mailing list.
]To do so, send mail to majordomo@eyrie.org with "subscribe clpm" in the
]body.  Majordomo will then send you instructions on how to confirm your
]subscription.  This is provided as a general service for those people who
]cannot receive the newsgroup for whatever reason or who just prefer to
]receive messages via e-mail.

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.

The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V8 Issue 5343
**************************************
