[22186] in Perl-Users-Digest


Perl-Users Digest, Issue: 4407 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jan 15 09:11:33 2003

Date: Wed, 15 Jan 2003 06:10:10 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 15 Jan 2003     Volume: 10 Number: 4407

Today's topics:
    Re: Question about high performance spidering in perl (Ed Kastenmeier)
    Re: Question about high performance spidering in perl <extendedpartition@NOSPAM.yahoo.com>
        remove users from a mailing list  <blnukem@hotmail.com>
    Re: remove users from a mailing list <ubl@schaffhausen.de>
    Re: remove users from a mailing list <bongie@gmx.net>
        Renaming files *.txt to 1234.txt <rubberducky703@hotmail.com>
    Re: Renaming files *.txt to 1234.txt <josef.moellers@fujitsu-siemens.com>
    Re: Renaming files *.txt to 1234.txt <bernard.el-hagin@DODGE_THISlido-tech.net>
    Re: Renaming files *.txt to 1234.txt <josef.moellers@fujitsu-siemens.com>
    Re: return value of backticks under DOS <koos_pol@NO.nl.JUNK.compuware.MAIL.com>
    Re: save and run bytocode <junk_nntp@hoopajoo.net>
    Re: save and run bytocode <ubl@schaffhausen.de>
    Re: save and run bytocode (Barry)
    Re: security of open(TAR, "tar -cvf - $filelist|") <mzawadzk@man.poznan.pl>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 14 Jan 2003 22:33:21 -0800
From: jsmirn@hotmail.com (Ed Kastenmeier)
Subject: Re: Question about high performance spidering in perl
Message-Id: <1b5f07f8.0301142233.32a34d4f@posting.google.com>

Cameron Dorey <camerond@mail.uca.edu> wrote in message news:<3E247595.4090308@mail.uca.edu>...
> Extended Partition wrote:
> 
> > Hello Everyone,
> > 
> > I am looking at a project that aims to create a high performance
> > spider program to assist in internet searches. [snip]
> >
> > "should I use Perl to do it"?
> >
> > Theoretically, this program might be faced with crawling millions of
> > pages. And, while time is not an object, I would like to make this as
> > quick as possible.
> >
> > Now, I know that Perl is GREAT at text parsing and that's why I am
> > considering it for this project. But do you think it's a good choice
> > for a program of this scale? The program itself won't be massive
> > (perhaps a few thousand lines at most) but the recursive and
> > repetitive nature of its functionality might incur some overhead. What
> > do you think? I would like opinions, suggestions, etc.
> 
> 
> When the internet is involved, it is almost always the slow step. Just 
> about anything you do in Perl on your end will take negligible time in 
> comparison. Get "Perl and LWP" and "Programming the Perl DBI" to help 
> get yourself up to speed with the details quickly.
> 
> 
> Cameron

Were you planning on using threads? That could put the ball back in Perl's court.


------------------------------

Date: Wed, 15 Jan 2003 00:42:46 -0600
From: Extended Partition <extendedpartition@NOSPAM.yahoo.com>
Subject: Re: Question about high performance spidering in perl
Message-Id: <2m0a2vcibe2ktvn5evt6ko1vkldclqqvad@4ax.com>

>> When the internet is involved, it is almost always the slow step. Just 
>> about anything you do in Perl on your end will take negligible time in 
>> comparison. Get "Perl and LWP" and "Programming the Perl DBI" to help 
>> get yourself up to speed with the details quickly.
>> 
>> 
>> Cameron
>
>Were you planning on using threads? That could put the ball back in Perl's court.

Actually yes. I was thinking of using threads. So that will increase
performance?

Extended
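
Since the replies above agree that the network is the slow step, the gain
from concurrency comes from overlapping the waiting, not from faster
parsing. A minimal sketch of that idea follows. It uses fork() and pipes
rather than threads (an assumption, chosen because it needs only core
Perl), and fetch_page() and crawl_parallel() are hypothetical names;
fetch_page() only simulates latency where a real spider would call
something like LWP.

```perl
use strict;
use warnings;
use POSIX ();

# fetch_page() is a hypothetical stand-in for a real HTTP fetch;
# here it only sleeps briefly to simulate network latency.
sub fetch_page {
    my ($url) = @_;
    select(undef, undef, undef, 0.05);   # pretend the network is slow
    return length $url;                  # pretend this is the page size
}

# Fetch @urls using $workers forked children; returns { url => result }.
sub crawl_parallel {
    my ($workers, @urls) = @_;
    my @readers;
    for my $w (0 .. $workers - 1) {
        pipe(my $rd, my $wr) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                 # child: handle every Nth URL
            close $rd;
            for (my $i = $w; $i < @urls; $i += $workers) {
                print {$wr} "$urls[$i]\t", fetch_page($urls[$i]), "\n";
            }
            close $wr;                   # flushes the buffered results
            POSIX::_exit(0);
        }
        close $wr;                       # parent keeps only the read end
        push @readers, [$pid, $rd];
    }
    my %result;
    for my $r (@readers) {
        my ($pid, $rd) = @$r;
        while (my $line = <$rd>) {
            chomp $line;
            my ($url, $size) = split /\t/, $line;
            $result{$url} = $size;
        }
        close $rd;
        waitpid($pid, 0);
    }
    return \%result;
}
```

With N workers the simulated waits overlap, so wall-clock time drops
roughly by a factor of N until bandwidth or the remote servers become the
limit.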


------------------------------

Date: Wed, 15 Jan 2003 11:40:24 GMT
From: "Blnukem" <blnukem@hotmail.com>
Subject: remove users from a mailing list 
Message-Id: <IMbV9.166917$FT6.30043946@news4.srv.hcvlny.cv.net>

Hi all

I have a small subroutine that removes users from a mailing list. The
problem is that if it tries to remove a user named bill, it removes every
person whose name contains bill. For example, if I just wanted to remove the
user bill, it would remove bill, bill gates and bill clinton.


sub remove {

open (USERLIST, "<data/mail/list.dat");
my @subscriptions = <USERLIST> ;
close(USERLIST);

foreach $subscribed (@subscriptions){
chomp($subscribed);
my ($name,$email) = split (/\|/, $subscribed);

$goodsubscriptions ="$name|$email\n";

push(@newsubscriptions, $goodsubscriptions) unless ($name eq $username ||
$email eq $useremail);
  }

}




------------------------------

Date: Wed, 15 Jan 2003 12:59:03 +0100
From: Malte Ubl <ubl@schaffhausen.de>
Subject: Re: remove users from a mailing list
Message-Id: <b03lmb$4re$1@news.dtag.de>

Blnukem wrote:
> Hi all
> 
> I have a small subroutine that removes users from a mailing list. The
> problem is that if it tries to remove a user named bill, it removes every
> person whose name contains bill. For example, if I just wanted to remove the
> user bill, it would remove bill, bill gates and bill clinton.

Plus, it will do other evil things in a multitasking environment, since 
you don't do any file locking.

Think of your user list as a set of data. Each element can be in the 
set only once. The very thing that makes each element unique can then 
be used to identify the element. Thus, if you identify the right 
element, you can delete it without destroying anything else.

Erm, if I were you, I'd stop reinventing the wheel and use a database.

->malte

PS: The identifying thing you are looking for is the complete email address.
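
A rough sketch of both points, for anyone who wants to stay with the flat
file: match on the complete email address only, and hold an exclusive
lock while the file is rewritten. The "name|email" layout comes from the
original post; the function name remove_by_email() and the
case-insensitive comparison are assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Remove every record whose email field equals $useremail exactly
# (compared case-insensitively). Returns the number of records removed.
sub remove_by_email {
    my ($file, $useremail) = @_;
    open(my $fh, '+<', $file) or die "cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "cannot lock $file: $!";

    my @keep;
    my $removed = 0;
    while (my $line = <$fh>) {
        chomp(my $record = $line);
        my (undef, $email) = split /\|/, $record, 2;
        if (defined $email && lc($email) eq lc($useremail)) {
            $removed++;                  # drop this record
        } else {
            push @keep, "$record\n";
        }
    }

    # Rewrite the file in place while still holding the lock.
    seek($fh, 0, SEEK_SET) or die "cannot rewind $file: $!";
    truncate($fh, 0)       or die "cannot truncate $file: $!";
    print {$fh} @keep;
    close($fh)             or die "cannot close $file: $!";
    return $removed;
}
```

The lock only helps if every other program that touches the list also
calls flock() on it; it is advisory, not enforced.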

-- 
srand 108641088; print chr int rand 256 for qw<J A P H>



------------------------------

Date: Wed, 15 Jan 2003 13:17:13 +0100
From: "Harald H.-J. Bongartz" <bongie@gmx.net>
Subject: Re: remove users from a mailing list
Message-Id: <1862098.0m21ePkDgt@nyoga.dubu.de>

Blnukem wrote:
> sub remove {
> 

Hm.  No parameters?  I had expected at least to see $username and
$useremail as parameters.

> open (USERLIST, "<data/mail/list.dat");

Always check the result of open():
        open (USERLIST, "<data/mail/list.dat")
                or die "cannot open user list: $!";

> my @subscriptions = <USERLIST> ;
> close(USERLIST);

I hope your user list will not become too large, or this will be quite
memory consuming.

> foreach $subscribed (@subscriptions){

        foreach my $subscribed (...

Choose a scope as small as possible!

> chomp($subscribed);
> my ($name,$email) = split (/\|/, $subscribed);
> 
> $goodsubscriptions ="$name|$email\n";

Looks a bit awkward.  Why not simply
        foreach my $subscribed (@subscriptions) {
                my $goodsubscriptions = $subscribed;
                chomp $subscribed;
                my ($name, $email) = ...
?  You're splitting the line, and then joining again, where you could
use the original line read from the file.

> push(@newsubscriptions, $goodsubscriptions) unless ($name eq $username
> || $email eq $useremail);

Looks good to me, at least as long as there are no subscribers with
empty $name or $email. (I suppose only one of $username and $useremail
will be filled for the search, so any entries with empty $name/$email
would be deleted.)
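
One possible guard against that empty-field case, as a small sketch: only
compare a field when the corresponding search value is non-empty.
should_remove() is a hypothetical helper name, not something from the
original post.

```perl
use strict;
use warnings;

# True if the record should be removed. Empty or undefined search
# values are ignored, so an empty $username never matches an empty $name.
sub should_remove {
    my ($name, $email, $username, $useremail) = @_;
    for ($name, $email, $username, $useremail) {
        $_ = '' unless defined $_;
    }
    return 1 if length($username)  && $name  eq $username;
    return 1 if length($useremail) && $email eq $useremail;
    return 0;
}
```

The push in the loop would then read
"push @newsubscriptions, $goodsubscriptions unless should_remove(...)".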

>   }
> }

Where are you writing the resulting @newsubscriptions?


Here is an alternative version using Tie::File. (It may be a bit slower, I
assume, but I just had to try it. ;-) )

sub remove {
        my ($username, $useremail) = @_;
        my $datafile = "data/mail/list.dat";
        use Tie::File;
        tie my @subscriptions, 'Tie::File', $datafile
                or die "cannot tie to $datafile: $!";
        for (0..$#subscriptions) {
                my ($name, $email) = split /\|/, $subscriptions[$_];
                if ($name eq $username || $email eq $useremail) {
                        splice @subscriptions, $_, 1;
                        last;
                }
        }
        untie @subscriptions;
}

        
Ciao,
        Harald
-- 
Harald H.-J. Bongartz <bongie@gmx.net>
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Interesting Error Messages #20:
Close your eyes and press escape three times. 


------------------------------

Date: Wed, 15 Jan 2003 11:48:32 -0000
From: "Paul Tomlinson" <rubberducky703@hotmail.com>
Subject: Renaming files *.txt to 1234.txt
Message-Id: <b03hjr$kt805$1@ID-116287.news.dfncis.de>

Renaming files *.txt to 1234.txt

Hello,

I need to rename every .txt file in a directory to 1234.txt.  I understand
that this will cause some files to get overridden, but this is ok.  Is there
any one line expression that will do this sort of thing for me?




------------------------------

Date: Wed, 15 Jan 2003 13:04:55 +0100
From: Josef Möllers <josef.moellers@fujitsu-siemens.com>
Subject: Re: Renaming files *.txt to 1234.txt
Message-Id: <3E254E67.6D2926BC@fujitsu-siemens.com>

Paul Tomlinson wrote:
> 
> Renaming files *.txt to 1234.txt
> 
> Hello,
> 
> I need to rename every .txt file in a directory to 1234.txt.  I understand
> that this will cause some files to get overridden, but this is ok.  Is there
> any one line expression that will do this sort of thing for me?

map { rename $_, "1234.txt" } (<*.txt>);

-- 
Josef Möllers (Pinguinpfleger bei FSC)
	If failure had no penalty success would not be a prize
						-- T.  Pratchett


------------------------------

Date: Wed, 15 Jan 2003 12:17:00 +0000 (UTC)
From: Bernard El-Hagin <bernard.el-hagin@DODGE_THISlido-tech.net>
Subject: Re: Renaming files *.txt to 1234.txt
Message-Id: <b03jfs$ph0$1@korweta.task.gda.pl>

In article <3E254E67.6D2926BC@fujitsu-siemens.com>, Josef Möllers wrote:
> Paul Tomlinson wrote:
>> 
>> Renaming files *.txt to 1234.txt
>> 
>> Hello,
>> 
>> I need to rename every .txt file in a directory to 1234.txt.  I understand
>> that this will cause some files to get overridden, but this is ok.  Is there
>> any one line expression that will do this sort of thing for me?
> 
> map { rename $_, "1234.txt" } (<*.txt>);


Or:


  rename $_, '1234.txt' for glob('*.txt');
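
Neither one-liner checks whether rename succeeded. If that matters, a
slightly longer sketch along these lines reports failures; rename_all()
is a hypothetical helper name, and skipping a file that already has the
target name is an assumption about what is wanted.

```perl
use strict;
use warnings;

# Rename every file matching $pattern to $target, warning on failure.
# Returns the number of successful renames.
sub rename_all {
    my ($pattern, $target) = @_;
    my $count = 0;
    for my $file (glob($pattern)) {
        next if $file eq $target;        # already has the target name
        if (rename($file, $target)) {
            $count++;
        } else {
            warn "cannot rename $file to $target: $!\n";
        }
    }
    return $count;
}
```

Called as rename_all('*.txt', '1234.txt'), it leaves a single 1234.txt
holding the contents of whichever file was renamed last.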


Cheers,
Bernard
--
echo 42|perl -pe '$#="Just another Perl hacker,"'


------------------------------

Date: Wed, 15 Jan 2003 14:42:06 +0100
From: Josef Möllers <josef.moellers@fujitsu-siemens.com>
Subject: Re: Renaming files *.txt to 1234.txt
Message-Id: <3E25652E.EFFD736@fujitsu-siemens.com>

Bernard El-Hagin wrote:
> 
> In article <3E254E67.6D2926BC@fujitsu-siemens.com>, Josef Möllers wrote:
> > Paul Tomlinson wrote:
> >>
> >> Renaming files *.txt to 1234.txt
> >>
> >> Hello,
> >>
> >> I need to rename every .txt file in a directory to 1234.txt.  I understand
> >> that this will cause some files to get overridden, but this is ok.  Is there
> >> any one line expression that will do this sort of thing for me?
> >
> > map { rename $_, "1234.txt" } (<*.txt>);
> 
> Or:
> 
>   rename $_, '1234.txt' for glob('*.txt');

Yes, definitely TMTOWTDI.
It's exactly the same number of characters, though B-{)

More cheers,

Josef
-- 
Josef Möllers (Pinguinpfleger bei FSC)
	If failure had no penalty success would not be a prize
						-- T.  Pratchett


------------------------------

Date: Wed, 15 Jan 2003 09:00:04 +0100
From: Koos Pol <koos_pol@NO.nl.JUNK.compuware.MAIL.com>
Subject: Re: return value of backticks under DOS
Message-Id: <newscache$5wwq8h$69a$1@news.emea.compuware.com>

Tad McClellan wrote:

> Koos Pol <koos_pol@NO.nl.JUNK.compuware.MAIL.com> wrote:
>> Chas Friedman wrote (Tuesday 14 January 2003 17:15):
>> 
>>>  $dir=`dir`;
> 
>> backticks should not be used to catch returned values.
> 
> 
> Yes they should.
> 
> That is exactly their purpose.
> 
> 
>> Use the qx operator
>> instead.
> 
> 
> That is just an alternative way of writing backticks.
> 
> ie. qx _is_ backticks (in disguise).
> 
> 
>>See the Regexp Quote-Like Operators entry in the perlop manpage.
> 
> 
> Which shows one description applicable to either of them,
> they are just different ways of writing the same thing.


What an incredible <beep> am I.
I was running the system() vs qx sermon. Apparently I didn't even read the 
OP right. Thank heavens for people smarter than me...
Well placed Tad, (and that includes BillS S. too)
Chas, don't listen to me. I'm insane.
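
For the archive, a short sketch of the distinction being untangled above:
backticks and qx are the same operator and capture the command's output,
while system() returns the wait status. It runs the current perl ($^X) as
the child command purely as an assumption, so that it does not depend on
dir or any other external program.

```perl
use strict;
use warnings;

# Backticks and qx{} are two spellings of the same operator:
# both return whatever the child wrote to standard output.
my $with_backticks = `"$^X" -e "print 42"`;
my $with_qx        = qx{"$^X" -e "print 42"};

# system() does not capture output; it returns the wait status,
# whose high byte is the child's exit code.
my $status    = system($^X, '-e', 'exit 3');
my $exit_code = $status >> 8;
```

After a backtick or qx call, the same status is available in $?.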

-- 
KP



------------------------------

Date: Tue, 14 Jan 2003 22:15:39 -0800
From: Steve Slaven <junk_nntp@hoopajoo.net>
Subject: Re: save and run bytocode
Message-Id: <v29v4bm4949ue9@corp.supernews.com>

Barry wrote:
> Hi All,
> 
> I am relatively new to Perl. I am using it to do a CGI on Apache 2.x.
> 
> If my understanding is correct. Perl will compile the source to
> bytecode then run the bytecode.
> 
> Apache now has an integrated Perl compiler - it used to have mod_perl.
> Both get rid of the time needed to load the Perl interpreter.
> 
> But, if I understand correctly, isn't the Perl code still compiled
> from source each time the CGI is invoked. Or, is the bytecode cached?
> 
> I've been scouring the web. It seems like one should be able to
> compile the source to bytcode. Save the bytecode. Then use this as the
> CGI. Saves the compile step each time.
> 
> There are things called Perl compilers, but they are off on some other
> tangent.
> 
> Maybe it is possible to run from this intermediary step. Is it
> possible to run from precompiled bytecode?
> 
> Is the compiling so relatively fast compared to everything else that
> it does not matter?
> 
> The best solution would be to have Apache (the web server) cache the
> bytecode in RAM for frequently needed CGIs? Maybe this is already
> being done? ColdFusion does this.
> 
> Thanks for your discussion and pointers.
> Barry.
> 
If you are using mod_perl, then yes, the compiled script is cached, with 
certain side effects, like persistent variables and such.  There was a 
way to coredump scripts and "undump" them into executables, but I don't 
think anyone does that anymore.  I guess the simple answer is: if you're 
doing CGI and need the speed, use mod_perl and it'll do everything 
you're asking for, plus give you hooks into the deep insides of Apache 
itself.

-- 
+----------------------------------------------------------------------------+
As soon as we started programming, we found to our surprise that
it wasn't as easy to get programs right as we had thought.
Debugging had to be discovered.  I can remember the exact instant
when I realized that a large part of my life from then on was going
to be spent in finding mistakes in my own programs.
      -- Maurice Wilkes discovers debugging, 1949
+----------------------------------------------------------------------------+
Steve Slaven - http://hoopajoo.net
MIS Programmer, Horizon Distribution - http://horizondistribution.com
Office: (509) 453-3181 x 254 / Fax: (509) 457-5769



------------------------------

Date: Wed, 15 Jan 2003 12:51:33 +0100
From: Malte Ubl <ubl@schaffhausen.de>
Subject: Re: save and run bytocode
Message-Id: <b03l8d$2p7$1@news.dtag.de>

Barry wrote:
> Hi All,
> 
> I am relatively new to Perl. I am using it to do a CGI on Apache 2.x.
> 
> If my understanding is correct. Perl will compile the source to
> bytecode then run the bytecode.
> 
> Apache now has an integrated Perl compiler - it used to have mod_perl.
> Both get rid of the time needed to load the Perl interpreter.

I think you are confusing something here. mod_perl embeds the Perl 
interpreter in Apache, and yes, it does things like caching "bytecode" 
(the perl interpreter doesn't use bytecode, but rather a kind of tree 
structure to represent your code, but that doesn't matter).

Besides not having to recompile the source code on each request, 
mod_perl also saves you the fork to perl, which, depending on the size of 
your source, gives the much greater speedup (plus it does many more 
optimizations).

mod_perl, however, has some pitfalls if you are used to CGI scripting, 
so you should RTFM before you do anything serious.
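
One of those pitfalls, persistent variables, can be imitated in plain
Perl. The sketch below is only a simulation under stated assumptions, not
mod_perl itself: it compiles a script's source once into a subroutine,
keeps it in a hash, and calls it again for each "request". The names
run_cached() and Sandbox are hypothetical.

```perl
use strict;
use warnings;

# Source of a tiny "CGI script". It uses a package global and
# assumes the global starts out empty on every run.
my $script = <<'END_SCRIPT';
our $hits;
$hits++;
return "request number $hits";
END_SCRIPT

# Compile the source once into a subroutine and keep it around,
# so later calls skip the compile step but also keep old state.
my %cache;
sub run_cached {
    my ($name, $source) = @_;
    $cache{$name} ||= eval "package Sandbox; sub { $source }"
        or die "compile failed: $@";
    return $cache{$name}->();
}
```

Run as an ordinary CGI, the script would report "request number 1" every
time. Through run_cached() the second call reports 2, because $hits
survives between calls.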

->malte

-- 
srand 108641088; print chr int rand 256 for qw<J A P H>



------------------------------

Date: Wed, 15 Jan 2003 13:47:40 GMT
From: bschler1@twcny.rr.com (Barry)
Subject: Re: save and run bytocode
Message-Id: <3e2564d8.4891335@news-server.twcny.rr.com>


Hi,
Thanks for the response.

A follow up question:

I've seen tools which encrypt or pre-compile or something the Perl
source so it can be distributed without giving away the source.

The solution I saw seemed to be proprietary.

Is there a standard way to do this in Perl so that most installations
of the Perl interpreter can still run the script?

Do you have a name of what to search for?

Thanks Again,
Barry.



Steve Slaven <junk_nntp@hoopajoo.net> wrote:
>If you are using mod_perl, then yes, the compiled script is cached, with 
>certain side effects, like persistent variables and such.  There was a 
>way to coredump scripts and "undump" them into executables, but I don't 
>think anyone does that anymore.  I guess the simple answer is: if you're 
>doing CGI and need the speed, use mod_perl and it'll do everything 
>you're asking for, plus give you hooks into the deep insides of Apache 
>itself.



------------------------------

Date: Wed, 15 Jan 2003 13:26:43 +0100
From: Marek Zawadzki <mzawadzk@man.poznan.pl>
Subject: Re: security of open(TAR, "tar -cvf - $filelist|")
Message-Id: <Pine.GSO.4.44.0301151316290.18440-100000@rose.man.poznan.pl>

Thank you for all your input.
Now here is what I've done to prevent malicious users from tampering with
my backup script by creating files with "evil" filenames:

1. I prepare the directory listing, dropping the "." and ".." entries:
    opendir(DIR, $dir)
        || abort("can't opendir $dir\n");
    @dir_listing = grep { !/^\.{1,2}\z/ } readdir(DIR);
    closedir(DIR);

2. Instead of tarring things as shown in the subject line, I do:
    $pid = open(TAR, "-|");
    abort("can't fork: $!\n") unless defined $pid;
    if (!$pid) { # child
        @my_arr = ("cf", "-", "--", @filelist);
        exec("/bin/tar", @my_arr)
            || die "can't exec program: $!";
        # NOTREACHED
    } else { # parent
        while (($r = read(TAR, $buffer, $buff_size))) {
            # etc.
        }
    }
# (@filelist is extracted from @dir_listing in step 1.)

Now whatever file/directory I create (with ;, `, spaces, etc.) it works
just fine. I'm not doing metacharacter escaping at all.

I'll appreciate any extra comments to this. The script's gonna be run as
root.
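
The reason this works without escaping is that the list form of exec
never hands the names to a shell. A sketch of the same idea using the
list form of a piped open (available in Perl 5.8 and later on Unix-like
systems) follows. It runs perl itself ($^X) in place of /bin/tar, which
is an assumption made only so the sketch runs anywhere, and list_args()
is a hypothetical helper that echoes back what the child received.

```perl
use strict;
use warnings;

# Start a child without a shell and return the arguments it saw.
# Names are passed as separate list elements, so characters such as
# ";", "`" and spaces reach the child untouched.
sub list_args {
    my @names = @_;
    open(my $fh, '-|', $^X, '-e', 'print "$_\0" for @ARGV', '--', @names)
        or die "cannot start child: $!";
    local $/ = "\0";                     # child separates names with NUL
    chomp(my @seen = <$fh>);
    close($fh) or die "child failed: status $?";
    return @seen;
}
```

The "--" plays the same role as in the tar call above: it stops a name
that begins with "-" from being read as an option.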

-marek



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4407
***************************************
