
Perl-Users Digest, Issue: 381 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Feb 28 21:10:46 2001

Date: Wed, 28 Feb 2001 18:10:18 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <983412618-v10-i381@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Wed, 28 Feb 2001     Volume: 10 Number: 381

Today's topics:
    Re: print "</tr><tr>" vs. print $tr <flavell@mail.cern.ch>
    Re: print "</tr><tr>" vs. print $tr <stan_no_spam_for_me@alamo.nmsu.edu>
    Re: print "</tr><tr>" vs. print $tr <c_clarkson@hotmail.com>
    Re: problem with setuid script nobull@mail.com
    Re: PROPOSAL: Graphics::ColorNames (Mark Jason Dominus)
    Re: PROPOSAL: Graphics::ColorNames (Mark Jason Dominus)
    Re: regex help needed <donotreply@interbulletin.bogus>
    Re: regex help please <krahnj@acm.org>
    Re: regex help please <bart.lateur@skynet.be>
    Re: Regexp to match Web urls? <donotreply@interbulletin.bogus>
        Slow down <sks@sierra.net>
        TCP socket writer. <vjayl@emc.com>
        Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 1 Mar 2001 00:28:59 +0100
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: print "</tr><tr>" vs. print $tr
Message-Id: <Pine.LNX.4.30.0103010013110.18159-100000@lxplus003.cern.ch>

On Wed, 28 Feb 2001, Stan McCann wrote:

> Yes!  That's what I am trying to do, write clear, clean, easy to read
> code that works reliably

That's the ticket!

> without sacrificing speed.

I say again, "don't optimise yet".  The bits you seem to be worrying
about are almost certain to be insignificant parts of the big picture.

I was once vaguely concerned with a big piece of software that ran for
several days on a mainframe to get its result.  People were worrying
themselves sick to save a few CPU cycles in the code.  They weren't
able to save anywhere near half of the total time.

Then along came an analyst who told them they were using entirely the
wrong algorithm as the basis of their implementation.  The replacement
algorithm was coded up, and the new program needed less than an hour.

Choosing the right method can be of major importance.  Worrying about
detailed optimisation of code is a diversion of effort, especially
during the design-and-coding stage.  Leave it, at least until the prog
is solidly designed and working.  Most likely you then won't need it,
or your profiling study will assure you that the effort isn't
cost-effective.

If you were writing the Perl system itself, which is going to be used
by millions of users for running myriad scripts, it could make sense
to worry about those last few cycles.  The term is leverage.  Don't
let that mislead you when writing a script of the more usual kind.




------------------------------

Date: Wed, 28 Feb 2001 16:26:08 -0700
From: Stan McCann <stan_no_spam_for_me@alamo.nmsu.edu>
Subject: Re: print "</tr><tr>" vs. print $tr
Message-Id: <3A9D8910.8C8D1C27@alamo.nmsu.edu>

"Alan J. Flavell" wrote:
> 
> On Wed, 28 Feb 2001, Stan McCann wrote:
> 
> > Where do I find *good* documentation on using the cgi.pm module?
> 
> Did you even try?  Although "cgi.pm" isn't really correct, suppose we

Yes, I did try.  I also know that cgi.pm isn't correct; I'm a lazy
typist and only use caps when I have to.

> feed that into Google, then the number 1 hit is
> http://stein.cshl.org/WWW/software/CGI/

Thank you for the link.  I'll check it out.

> which also happens to be the author's own HTML-format documentation
> for the CGI.pm (note the capitalisation) module.
> 
> If we confine ourselves to the Perl documentation, then it's true that
> typing 'perldoc cgi.pm' or 'perldoc -q cgi.pm' produced nothing when I
> tried it, but 'perldoc -q cgi' offered, among other things,
> 
>   Where can I learn about CGI or Web programming in Perl?
> 
> which surely wouldn't be far off finding the answer.
> 
> perldoc CGI (or perldoc CGI.pm) would bring you another version of the
> documentation that would have adequately answered your questions
> (though for the time being I'd say the author's HTML version is better
> - I believe this is being worked on for future releases).
> 
> It's a normal courtesy to the regulars on any usenet group to
> familiarise yourself with their FAQs first.  Nobody minds if you
> missed something that you needed, but at least if you show some
> evidence of having tried, you'll command a lot more sympathy.

I do not need lessons in usenet etiquette; I'm new to perl, not usenet.
If you will go back and look at the original question, you will find
that it was not an FAQ.  The business about the docs is a side issue; I
would not have posted *just* to ask that.  I'm not looking for sympathy,
I was looking for an answer to my question, and I got some pretty good
answers.  You seem to be the only one who has a problem with my
post.

> 
> > using cgi but couldn't find anyplace that documented table use.  I had
> > tried things like start_table, start_row, end_table, end_row, etc. but
> > they don't work.
> 
> Didn't I already propose to you to think in terms of complete
> elements, rather than in these kinds of fragmentary actions?

Each to his own.  When there are starts and ends, I think of them as
starting and ending.  To me, what you are proposing is like trying to do
an entire web page using something like html(everything goes in here)
instead of using start_html and end_html.  When I found information
about start_html and start_form, I (incorrectly but logically) assumed
that other start and end tags would work the same way.

> 
> > I just figured that the cgi mod only had routines for
> > forms.
> 
> Far from it, though you don't _have_ to use CGI.pm for writing all
> your HTML if you don't want to.
> 
> But, as it happens, its table calls can be particularly convenient, as
> its documentation shows.

Sorry, but the documentation sucks.  I can't get the table calls to work
at all, to say nothing about conveniently.


------------------------------

Date: Wed, 28 Feb 2001 17:26:10 -0600
From: "Charles K. Clarkson" <c_clarkson@hotmail.com>
Subject: Re: print "</tr><tr>" vs. print $tr
Message-Id: <97k7lp$unm@library2.airnews.net>

Stan McCann <stan_no_spam_for_me@alamo.nmsu.edu> wrote:
: brian d foy wrote:
: >
: > In article <3A9D1E81.C2182300@alamo.nmsu.edu>,
: >
: > > Which is more efficent?  Is there a better way?
: >
: > how about:
: >
: >     use CGI qw(:standard);
: >
: >     print Tr( ... );
:
: Where do I find *good* documentation on using the cgi.pm module?
:  I am using cgi but couldn't find any place that documented table use.

http://www.perl.com/CPAN-local/doc/manual/html/lib/CGI.html
in the above link click on:
THE DISTRIBUTIVE PROPERTY OF HTML SHORTCUTS

:  I had
: tried things like start_table, start_row, end_table, end_row, etc. but
: they don't work.

    They are not part of the :standard export. Try:

use CGI qw/:standard start_table end_table start_tr end_tr/;
or
use CGI qw/:standard *table *tr/;

For some reason the standard documentation no longer mentions this.
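The mechanism behind those tags is ordinary Exporter plumbing, which can be sketched standalone.  The Demo package and its subs below are made up for illustration; this is not CGI.pm itself, but it shows why a name left out of the :standard tag must be requested by name:

```perl
#!/usr/bin/perl
# Sketch of how tag-based exports work.  The Demo package and its subs
# are invented for this example -- they are not CGI.pm's real internals.
use strict;
use warnings;

package Demo;
use Exporter 'import';
our @EXPORT_OK   = qw(start_table end_table td);
our %EXPORT_TAGS = ( standard => [qw(td)] );   # start_table is NOT in :standard

sub start_table { '<table>' }
sub end_table   { '</table>' }
sub td          { "<td>$_[0]</td>" }

package main;
# Asking for :standard alone would not give us start_table/end_table;
# we have to name them explicitly, just as with CGI.pm above.
Demo->import(qw(:standard start_table end_table));

print start_table(), td('x'), end_table(), "\n";
```
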

: I just figured that the cgi mod only had routines for forms.

The module mentions a few others:

%EXPORT_TAGS = (
  ':html2'=>['h1'..'h6',qw/p br hr ol ul li dl dt dd menu code var strong em
                    tt u i b blockquote pre img a address cite samp dfn html head
                    base body Link nextid title meta kbd start_html end_html
                    input Select option comment charset escapeHTML/],

  ':html3'=>[qw/div table caption th td TR Tr sup Sub strike applet Param
                    embed basefont style span layer ilayer font frameset frame
                    script small big/],

  ':netscape'=>[qw/blink fontsize center/],

  ':form'=>[qw/textfield textarea filefield password_field hidden checkbox
                checkbox_group submit reset defaults radio_group popup_menu
                button autoEscape scrolling_list image_button start_form end_form
                startform endform start_multipart_form end_multipart_form isindex
                tmpFileName uploadInfo URL_ENCODED MULTIPART/],

  ':cgi'=>[qw/param upload path_info path_translated url self_url
                script_name cookie Dump raw_cookie request_method
                query_string Accept user_agent remote_host content_type
                remote_addr referer server_name server_software
                server_port server_protocol virtual_host remote_ident
                auth_type http save_parameters restore_parameters
                param_fetch remote_user user_name header redirect
                import_names put Delete Delete_all url_param cgi_error/],

  ':ssl' => [qw/https/],

  ':imagemap' => [qw/Area Map/],

  ':cgi-lib' => [qw/ReadParse PrintHeader HtmlTop HtmlBot SplitParam Vars/],

  ':html' => [qw/:html2 :html3 :netscape/],

  ':standard' => [qw/:html2 :html3 :form :cgi/],

  ':push' => [qw/multipart_init multipart_start multipart_end/],

  ':all' => [qw/:html2 :html3 :netscape :form :cgi :internal/]
);

HTH,
Charles K. Clarkson






------------------------------

Date: 28 Feb 2001 23:05:08 +0000
From: nobull@mail.com
Subject: Re: problem with setuid script
Message-Id: <u9ofvm8l3v.fsf@wcl-l.bham.ac.uk>

"Ray Rizzuto" <ray.rizzuto@ulticom.com> writes:

> This is a multi-part message in MIME format.

No thank you, this is a plaintext-only newsgroup.

> I am running  a perl script from a korn shell setuid script, and getting =
> these errors:
> 
> Insecure dependency in require while running setuid at reformatCr line =

>   use FindBin;
>   use lib "$FindBin::Bin";

> Any help would be appreciated!  I've had no luck searching the FAQ, or =
> the camel book.

Searching is not relevant.  You have an error message and a module -
look at the documentation of each and the cause of the problem should
be obvious.

If you check the "KNOWN BUGS" section of the FindBin documentation you
will find that it is trivial for a malicious user to cause
$FindBin::Bin to have an arbitrary value.  Clearly, allowing the user to
state that arbitrary directories should be searched before the
standard ones allows the user to cause arbitrary code to be executed.
Allowing the user to cause arbitrary code to be executed under setuid
is exactly the reason the "insecure dependency" error exists.
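One common way out (a sketch, not from this thread) is either to hard-code a trusted library path instead of `use lib "$FindBin::Bin"`, or to untaint the value explicitly through a regex capture after deciding it is trustworthy.  The path and whitelist pattern below are made up for illustration:

```perl
#!/usr/bin/perl
# Untainting sketch.  $bin stands in for a value like $FindBin::Bin;
# the path and the whitelist pattern here are invented for the example.
use strict;
use warnings;

my $bin = "/home/user/scripts";    # imagine this arrived tainted

# Under taint mode, only data extracted through a regex capture is
# considered clean, so accept the path only if it matches a strict pattern.
my ($safe_bin) = $bin =~ m{^(/[\w./-]+)$}
    or die "refusing to use suspicious path: $bin";

print "using lib: $safe_bin\n";
```
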

-- 
     \\   ( )
  .  _\\__[oo
 .__/  \\ /\@
 .  l___\\
  # ll  l\\
 ###LL  LL\\


------------------------------

Date: Wed, 28 Feb 2001 23:07:05 GMT
From: mjd@plover.com (Mark Jason Dominus)
Subject: Re: PROPOSAL: Graphics::ColorNames
Message-Id: <3a9d8499.8b5$6a@news.op.net>
Keywords: Clarence, affectionate, parapsychology, transmittance

In article <973280$p4s$3@mamenchi.zrz.TU-Berlin.DE>,
Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote:
>According to Brad Baxter  <bmb@ginger.libs.uga.edu>:
>
>>     18  *get_colour = *get_couleur = *get_colore = *get_color;
>
>This aliases not only &get_colour but also $get_colour etc.  In a
>general-purpose module this is a bit risky.  use
>
>          *get_colour = *get_couleur = *get_colore = \ &get_color;

This also aliases $get_colour and $get_couleur, etc.

I suggest

        { no strict 'refs';
          for (qw(colour couleur colore)) {
            *{"get_$_"} = \&get_color;
          }
        }
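A quick way to check that the loop creates working aliases; the get_color routine here is a stand-in sub invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub get_color { return 'red' }   # stand-in for the real routine

{
    no strict 'refs';
    for (qw(colour couleur colore)) {
        *{"get_$_"} = \&get_color;   # alias only the CODE slot
    }
}

# All spellings now call the same subroutine.
print get_colour(), " ", get_couleur(), " ", get_colore(), "\n";
```
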


-- 
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print


------------------------------

Date: Wed, 28 Feb 2001 23:11:33 GMT
From: mjd@plover.com (Mark Jason Dominus)
Subject: Re: PROPOSAL: Graphics::ColorNames
Message-Id: <3a9d85a4.8cc$252@news.op.net>
Keywords: giddap, purchase, scabrous, stephanotis

In article <qdbn9tkkaac2hf98ja50bfp87sfus3bnf7@4ax.com>,
Bart Lateur  <bart.lateur@skynet.be> wrote:
>I really don't like the interface to this module. There is no reason to
>hide an elegant "tie" under an ugly "use" interface 

Probably so, but in that case I think the appropriate response is
not to use the module.  You may as well simply use 'tie' directly and
omit the module completely.  This is what I do in my own programs.
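For what it's worth, here is a generic sketch of driving tie directly.  The ColorLookup class is made up for illustration; it is not the actual Graphics::ColorNames internals:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal tied-hash class: case-insensitive color lookup (invented example).
package ColorLookup;
sub TIEHASH { my ($class, %map) = @_; bless { %map }, $class }
sub FETCH   { $_[0]->{ lc $_[1] } }
sub EXISTS  { exists $_[0]->{ lc $_[1] } }

package main;
tie my %color, 'ColorLookup', red => 'FF0000', green => '00FF00';
print $color{Red}, "\n";
```
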

-- 
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print


------------------------------

Date: Wed, 28 Feb 2001 22:25:58 +0000
From: Cryofan <donotreply@interbulletin.bogus>
Subject: Re: regex help needed
Message-Id: <3A9D7AF6.2DFF64A6@interbulletin.com>

Jay Tilton <nouser@emailunwelcome.com> wrote in article 
<6g4p9tkbltmkq3b2dpcm9hcflubn002m7v@4ax.com> : 
>cryofan@mylinuxisp.com (cRYOFAN) wrote:
>
>>Well, I think it's a regex problem anyway.
>
>Yup.
>
>>$array1[$num_words-1] =~ s/>.*$/$blank/;
>
>Look into what the /s modifier does to a m// or s/// regex.
>
>While you're at it, It wouldn't hurt to explore 'grep' and 'push' and
>'qw'.

Well, I have already looked at m// and s///; they don't seem to help.  The others you list look helpful in general, but I don't see how they could solve my problem.

A restatement of my problem and updated code is shown below.
Thanks.


For my senior project I am making a data mining tool that is currently running on a free website with Perl 5 access, but the site does not have the HTTP module. Forgive me if I posted this twice; I tried to post this yesterday, but I can't find any trace of it.

The code posted below does work, except that in one case it has a problem with deleting the end of a string using a regex.

The code below runs on a timed loop: it wakes up once an hour and opens an HTTP connection to a URL ("the Main Page"), where it parses some of the URLs in that webpage. I am interested in URLs that lead to financial news stories. Once I successfully parse the URLs I find, I store them in an array, open an HTTP connection to each, and then count certain words found in those news stories. I then store this data.

I store the Main Page in a string and split it into an array using

@array1= split(/href=/,$content);

And then I step through @array1 with a matching regex thusly:

if($array1[$num_words-1] =~/^http\:\/\/biz\.yahoo\.com\/rb\/.*/)

The only URLs I am interested in do fit this format.
And when I find them I store them in @array2.

But the URLs are not ready yet: there are unneeded characters at the end of all these URLs. I use a regex to wipe out these unwanted characters thusly:

$array1[$num_words-1] =~ s/>.*$/$blank/; #$blank = "";
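(The split-and-strip steps above can be exercised standalone; the sample HTML in this sketch is made up, not the real Yahoo page.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the fetched Main Page.
my $content = '<a href=http://biz.yahoo.com/rb/one.html><b>One</b></a> '
            . '<a href=http://biz.yahoo.com/rb/two.html><b>Two</b></a>';

my @array1 = split /href=/, $content;
my @array2;
for my $frag (@array1) {
    next unless $frag =~ m{^http://biz\.yahoo\.com/rb/};
    $frag =~ s/>.*$//;      # strip everything from the first '>' onward
    push @array2, $frag;
}
print "$_\n" for @array2;
```
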

So this above regex works fine for all the URLs except the FIRST URL that gets stored in @array2. For some reason, that first URL (0th element in @array2) does not get its unwanted characters removed. I have viewed the source code for the Main Page repeatedly; all the URLs of Interest are of this format:

http://biz.yahoo.com/rb/.SomeText.html><b>TextHere_Then_End)

Why does my regex work on all the other @array2 elements, but not the 0th element?

This perl program is currently running on a free server, and works fine, but I am not using the 0th element of @array2, i.e., I am not trying to open an HTTP connection to it. That's not ruinous, since on the Main Page the 0th element is just the newest news story, and in a while that story will be bumped further down the web page, so next go-around I will parse it and it will be the 1st element in @array2.

But I still want to be able to get it in a timely fashion. 

Any clues?

P.S.

This program is intended to gather data on financial news stories, and that data consists of the count of certain Words of Interest in financial news stories, and the time those words occur in a news story. For example, in one week, the word "recession" may occur 10 times; the following program would store that data in a file, along with the date and time. I will add this week a function that will also periodically store several vital stock market indicators (e.g., Dow Jones Industrial Average, etc), in a file, in much the same fashion.

Then I will create (in March-April, after I have some data), a perl CGI program that will access these data files and present my stored data (word count and stock market vital stats) in a graphical format on a web page, with word count on the vertical axis versus time horizontally. Same thing for the vital stats as presented on the same webpage.

The viewer/user could then see if there are any correlations time-wise, between an increase or decrease of certain important words/phrases, and the stock market vitals over any time period of interest.
  
The longest that I have run the program below is about 4 hours. It simply runs as a CGI script triggered from a button on an HTML form. I of course want it to run autonomously for long periods of time. 

Any thoughts on that? Is it doable?

For a free webserver, is it ethical? The duty cycle right now is pretty low--it sleeps for an hour and runs for only a minute or less. But I want to step up the duty cycle. I do intend to display the data on that same web server, and I think it may even be interesting and useful to day traders, stock brokers, etc., so the users would see the banner ads.
Any thoughts?

 
*************
Code is below

#!/usr/local/bin/perl


use LWP::Simple;
print "Content-type:text/html\n\n";

####################

if(open(KILLFILE1, "killfile1.txt"))
	{
	$line= <KILLFILE1>;
	if ($line == 1)
	{
		##########
		if(open(DATAFILEA, ">>datafileA.txt"))
		{
			print DATAFILEA "\n  1:", time, " \n";
			$time1=time;
			do{#start of do while < time span

				$webpage = "http://finance.yahoo.com/?u";
				$content = get($webpage);
				#####################
				$blank="";
				@array1= split(/href=/,$content);
				$num_links=0;
				$num_words = @array1;
				while($num_words > 0)
				{
					if($array1[$num_words-1] =~/^http\:\/\/biz\.yahoo\.com\/rb\/.*/)
					  { 
					     $array1[$num_words-1] =~ s/>.*$/$blank/;
					     $somestring= $array1[$num_words-1];
					     $array2[$num_links]=$somestring;
					    $num_links++;
					   }
   					$num_words--; 
				}#end while $num_words > 0
				##########################
				@words= ("recession", "rates", "profits", "losses",
					"down", "up", "bull", "bear", "trading", "soar", "drop",
					"bush", "greenspan", "slowing", "slowdown", "bears", "bulls",
					"bearish", "bullish", "confidence", "confident", "shaky",
					"fear", "capital", "economy", "positive", "negative",
					"failing", "fail", "rising", "falling", "prices",
					"consumer confidence", "rally", "rallied");
				while($num_links > 1){#cycle thru each link
					#         print "\n";
					#   print $array2[$num_links-1];
        				$page = $array2[$num_links-1];
        				$content2 = get($page);
        				$count =0;$pos=0; 
				        $wordcounter=  @words;
				        print DATAFILEA ("For link: ");
				        print DATAFILEA ($array2[$num_links-1], "\n");
				        ($secs, $mins, $hrs, $days, $mons, $yr) = (localtime)[0,1,2,3,4,5];
				        print DATAFILEA ("\nDate and Time: ", $mons+1, " ", $days, " ", $yr+1900, " ",
				                         $hrs, " ", $mins, " ", $secs, "\n");
				        #for each link, count each word of interest
					while($wordcounter >0){
					          while (($pos=index($content2,$words[$wordcounter-1] ,$pos)) 
					           != -1){
						      		$count++;  $pos++;    
						   }#end while loop to count a word
    	    					   print DATAFILEA ($words[$wordcounter-1]);
    	    					   print DATAFILEA ("  ");
    	       					   print DATAFILEA ($count, "  ");
    	          				  
					           $count=0; $pos=0;
					           $wordcounter--;
 				          }#end while $wordcounter > 0

       					  $num_links--;

   				}#end while $num_links >1
   				
   				
   			        sleep(3600);
      				$time2=time;
		     }while($time2 < ($time1+14400));#end do while loop

	   print "\n",  "it's over!";

	   close(DATAFILEA);
    }#end if datafileA successfully opened
    else{
    print"\n", "datafileA not opened!","\n";
    }
    }#end if killfile == 1
else{

    print "\n", "killed by killfile", "\n";

}
close(KILLFILE1);
}#end if KILLFILE1 successfully opened

else
{

print "\n", "Killfile1 did not open!","\n";

}

########END of FILE
_______________________________________________
Submitted via WebNewsReader of http://www.interbulletin.com



------------------------------

Date: Wed, 28 Feb 2001 23:23:13 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: regex help please
Message-Id: <3A9D89B4.FB150547@acm.org>

Jonas Nilsson wrote:
> 
> > $ perl -e '$_ = "( ( ( ( ( ( x - ( 2 + ( 2x + 3 ( 4x + 5 ) - 8.345 ) ^ -
> > 4 * ( 5 - x / 2 ) + 4 ) - 6 ) ) ) ) ) )"; 1 while
> > s/^\s*\(\s*(\(.+\))\s*\)\s*$/$1/g; print "$_\n";'
> > ( x - ( 2 + ( 2x + 3 ( 4x + 5 ) - 8.345 ) ^ - 4 * ( 5 - x / 2 ) + 4 ) -
> > 6 )
> 
> This doesn't work for ((x-1))((y-3)). My example does :)
> /jN


1 while s/\(\s*(\(\s*[^)(].+?[^)(]\s*\))\s*\)/$1/g;
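Wrapped in a runnable check against the ((x-1))((y-3)) case from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $expr = '((x-1))((y-3))';
# Repeat until no more redundant double parentheses can be collapsed.
1 while $expr =~ s/\(\s*(\(\s*[^)(].+?[^)(]\s*\))\s*\)/$1/g;
print "$expr\n";
```
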


John


------------------------------

Date: Thu, 01 Mar 2001 00:27:45 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: regex help please
Message-Id: <ml5r9tc6mua1arc51cke7ga4be26dmdi5t@4ax.com>

The Mosquito ScriptKiddiot wrote:

>this is a pet-project of my own, i wanna put it up on my
>own website...this problem needs to be solved, because it will be used in a
>larger program that solves derivatives in calculus

In that case, you'll still have similar but much bigger problems to
solve.

I once wrote something similar.  Having some LISP background from AI, I
first converted the infix to prefix (I find prefix notation easier to
manipulate on a symbolic level than infix), applied the derivation
rules to that, and then... simplify.  Ouch.  Even with relatively small
problems, it's easy to run out of memory.  Multi-megabytes.  The problem
was something in the neighbourhood of the 6th derivative of tan(x).

-- 
	Bart.


------------------------------

Date: Wed, 28 Feb 2001 22:23:21 +0000
From: cryofan <donotreply@interbulletin.bogus>
Subject: Re: Regexp to match Web urls?
Message-Id: <3A9D7A59.6E991B52@interbulletin.com>

newsone@cdns.caNOSPAM wrote in article 
<97hodb$miv$1@news.netmar.com> : 
>In article <971bpq$l33$1@panix3.panix.com>, Clay Shirky <clays@panix.com>
>writes:
>>I need the canonical regexp to match urls beginning with http:// (I
>>don't need to worry about ftp:, telnet: or mailto:, in other words)
>>and though I don't want to roll my own, Google searches of the form 
>>
>>  regexp url http 
>>
>>are useless because url and http appear everywhere.
>>
>>Any pointers appreciated.
>>


You can use the HTTP package to do this very easily (I think), if your server/computer has it installed. 
My code does successfully parse URLs of a certain form; you may be able to modify it. It does have one problem, however. The full problem description and the code itself are in my "Re: regex help needed" post, which appears earlier in this digest.

_______________________________________________
Submitted via WebNewsReader of http://www.interbulletin.com



------------------------------

Date: Wed, 28 Feb 2001 23:30:17 -0000
From: adam <sks@sierra.net>
Subject: Slow down
Message-Id: <t9r2g9iapeu897@corp.supernews.com>

Most perl programs I run just blink on and right back off.  I'd sometimes 
like to see the output of a program.  How can I make perl wait before it 
closes the window? (ActivePerl, Windows 98)
   Also, I don't fully understand how the output from a program gets 
displayed on a web page, with SSI or without.  Can someone please clarify?
Adam T.
sks@sierra.net
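(One common fix for the first question, sketched here rather than taken from the digest: ask for a keypress at the end of the script so the console window stays open.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

print "program output goes here\n";

# Keep the console window open until the user presses Enter.
my $prompt = "Press Enter to exit...";
print $prompt;
<STDIN>;
```
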


--
Posted via CNET Help.com
http://www.help.com/


------------------------------

Date: Wed, 28 Feb 2001 19:23:51 -0500
From: "V.Jay Lescoe" <vjayl@emc.com>
Subject: TCP socket writer.
Message-Id: <97k4r2$r031@emcnews1.lss.emc.com>

Hello,

I am new to perl and I need some help. I am trying to
put together an easy way to open a TCP port
and write a text string to it. I have a listener running
on the other end already. Can anyone point me in the
right direction? Please respond directly to me. My E-mail
is vjayl@emc.com  Thanks!

vjl
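(A self-contained sketch of the usual approach with the core IO::Socket::INET module.  Here the listener is created in the same script so the example runs standalone; in the setup described above the listener already exists on the other end, so the host and port are placeholders to be replaced.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Listener and writer in one process so the sketch runs standalone.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # 0 = let the OS pick a free port
    Proto     => 'tcp',
    Listen    => 1,
) or die "listen failed: $!";

# This is the part the poster needs: open a TCP connection and write to it.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',           # placeholder host
    PeerPort => $server->sockport,     # placeholder port
    Proto    => 'tcp',
) or die "connect failed: $!";
print $client "hello from perl\n";
close $client;

my $conn = $server->accept or die "accept failed: $!";
my $line = <$conn>;
print "listener got: $line";
```
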




------------------------------

Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 381
**************************************

