[18649] in Perl-Users-Digest
Perl-Users Digest, Issue: 817 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue May 1 21:05:40 2001
Date: Tue, 1 May 2001 18:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <988765508-v10-i817@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Tue, 1 May 2001 Volume: 10 Number: 817
Today's topics:
Re: [newbie Q] regexp via variables <dodger@necrosoft.net>
[OT: language religion ] Re: Should Perl be first? <bchambless@nrlssc.navy.mil>
ANNOUNCE: Filter::Simple 0.60 (Damian Conway)
Re: Another regexp question <bart.lateur@skynet.be>
Re: Corrupted scripts Winows to UNIX <jakobs@redrhinosports.com>
Re: Hacker challenge. Can you break this script for me? <jfreeman@tassie.net.au>
Re: Hacker challenge. Can you break this script for me? <jfreeman@tassie.net.au>
Re: Hacker Challenge. Can you break this script for me? <jfreeman@tassie.net.au>
Re: Hacker challenge. Can you break this script for me? (Anno Siegel)
Re: Listing files on client end? (Craig Berry)
Re: Listing files on client end? <bop@mypad.com>
Re: one-line stderr, stdout redirection <bart.lateur@skynet.be>
Re: possible to dupe STDOUT to a file while still STDOU <webmaster@webdragon.unmunge.net>
Re: Q: Using 'rename' with CGI <webmaster@webdragon.unmunge.net>
Re: RegEx Question <nospam@newsranger.com>
Re: Remove Adult Files with Perl (Craig Berry)
Re: Removing Lines... how's this? (Garry Williams)
Re: Removing Lines... how's this? <bart.lateur@skynet.be>
Re: requiring something I only need once <bart.lateur@skynet.be>
Re: requiring something I only need once <dodger@necrosoft.net>
Re: Should Perl be first? (Abigail)
Re: Should Perl be first? <mischief@velma.motion.net>
Re: Should Perl be first? <mischief@velma.motion.net>
Re: Should Perl be first? <bchambless@nrlssc.navy.mil>
Re: Should Perl be first? <billy@localhost.net>
Re: Strange string -> num conversion (David H. Adler)
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 02 May 2001 01:04:50 GMT
From: "Dodger" <dodger@necrosoft.net>
Subject: Re: [newbie Q] regexp via variables
Message-Id: <S2JH6.42950$B22.10449074@news1.rdc2.pa.home.com>
"Fairlight" <fairlite@shell1.iglou.com> wrote in message
news:3aee70a3$1_1@news.iglou.com...
>
> I was attempting to do the following:
>
> chop($string = <STDIN>);
> chop($search = <STDIN>);
> chop($replace = <STDIN>);
> $string =~ s/${search}/${replace}/g;
>
> ...I note that match patterns using parentheses as memory (.mark) ...will
> work in the matches. BUT...in replacement patterns, NONE of the special
> characters seem to work. For instance, \U\1\E gives me that literally as
> the replacement.
In a substitution (s///), the second half is NOT interpreted by the regex
engine but interpolated as a double-quoted string. It's not a
replacement pattern -- it's a replacement string.
In other words, anything in the second half of a substitution is treated as if
it were in "". That interpolation happens only once, so a \U or \1 that arrives
inside $replace is inserted literally rather than interpreted again.
If you use ' as the delimiter (s'this'that'g) it will be treated as a
single-quoted string.
The exception is supposed to be the \1 construct, which is deprecated but is
supposed to work. Because it's deprecated, I'd recommend using $1 instead.
If you want more power in your replacement, use the /e option on the end,
and you then eval the second half.
If this is not something you use yourself -- for instance, if this script is
available to anyone -- I'd be careful, especially if it's setuid. The regex
itself can actually open up some security holes.
By the way, if you're just pulling off the newlines, chomp is favoured over
chop.
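To make the single level of interpolation concrete, here is a minimal sketch. The sample values of $string, $search and $replace are made up, and the /ee form is only one possible approach; it runs the replacement text as Perl code, so it is not something to use on input you do not trust.

```perl
use strict;
use warnings;

my $string  = "hello world";
my $search  = '(\w+)';      # hypothetical user input
my $replace = '\U$1\E';     # hypothetical user input

# One level of interpolation: the text of $replace is inserted as-is.
(my $literal = $string) =~ s/$search/$replace/g;

# Writing the replacement in the source lets Perl process \U and $1.
(my $upper = $string) =~ s/$search/\U$1\E/g;

# /ee evaluates the replacement as an expression, then evaluates the
# resulting string as Perl code. Never do this with untrusted input.
(my $evaled = $string) =~ s/$search/'"' . $replace . '"'/gee;

print "$literal\n$upper\n$evaled\n";
```

Only the first form leaves the backslash sequences in the result, which matches the behaviour the original poster saw.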
--
Dodger
www.dodger.org
www.necrosoft.net
www.gothic-classifieds.com
------------------------------
Date: 2 May 2001 00:25:03 GMT
From: Billy Chambless <bchambless@nrlssc.navy.mil>
Subject: [OT: language religion ] Re: Should Perl be first?
Message-Id: <9cnk4v$get$1@news.datasync.com>
In article <slrn9etqmd.d8k.eins@www42.t-offline.de>,
Rudolf Polzer <eins@durchnull.de> wrote:
>Martien Verbruggen <mgjv@tradingpost.com.au> wrote:
>> If you want to become a professional programmer, a good one, learn some
>> other languages, and C should probably be in there [1].
>Really?
Yes, really.
> Is C still widely used?
Yes, widely.
> AFAIK C++ is, but C?
Yes.
> OK, programming handhelds and things like that has to be done in C,
Not necessarily. :)
> but since space is normally not an issue,
There are more considerations than space involved.
> C++ is much more used because it is more powerful.
Hmmm... the part before the "because" seems to be true in the Wintel world,
partly because nobody seems to be in the C-compiler-for-Wintel business
any more, but I'm seeing a lot more C than C++ on Unix.
As far as "powerful" goes, I'm not sure what you mean.
Are there programs that can be written in C++ that can't be written in C?
Or are you referring to ease of use? Java certainly seems (to me) easier to
use than C++.
>I even noticed it is hard to learn C when you know C++!
Which might support Martien's point. C is a good foundation for learning
C++, Perl, Java, yada yada due to family resemblances.
OBOnTopic: Whatever language one writes production code in, Perl is a must
for tools, etc.
------------------------------
Date: 1 May 2001 21:39:27 GMT
From: damian@cs.monash.edu.au (Damian Conway)
Subject: ANNOUNCE: Filter::Simple 0.60
Message-Id: <teuh31fpj69946@corp.supernews.com>
Keywords: perl, module, release
==============================================================================
Release of version 0.60 of Filter::Simple
==============================================================================
NAME
    Filter::Simple - Simplified source filtering
SYNOPSIS
    # in MyFilter.pm:
        package MyFilter;
        use Filter::Simple;
        FILTER { ... };
        # or just:
        #
        # use Filter::Simple sub { ... };
    # in user's code:
        use MyFilter;
        # this is filtered
        no MyFilter;
        # this is not
DESCRIPTION
    The Filter::Simple module provides a simplified interface to
    Filter::Util::Call; one that is sufficient for most common cases.
AUTHOR
    Damian Conway (damian@conway.org)
COPYRIGHT
    Copyright (c) 2000, Damian Conway. All Rights Reserved. This module
    is free software. It may be used, redistributed and/or modified under
    the terms of the Perl Artistic License
    (see http://www.perl.com/perl/misc/Artistic.html)
==============================================================================
CHANGES IN VERSION 0.60
- Fixed POD nit (thanks Dean)
- Added optional second argument to import to allow
terminator to be changed (thanks Brad)
- Fixed bug when empty filtered text was appended to (thanks Brad)
- Added FILTER as the normal mechanism for specifying filters
==============================================================================
AVAILABILITY
Filter::Simple has been uploaded to the CPAN
and is also available from:
http://www.csse.monash.edu.au/~damian/CPAN/Filter-Simple.tar.gz
==============================================================================
------------------------------
Date: Tue, 01 May 2001 23:26:56 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: Another regexp question
Message-Id: <pchuet8l1ogkedh8m9apulnffirp5qbfob@4ax.com>
Rudolf Polzer wrote:
>> >s/(regex)p/\1/gi;
>>
>> You should enable warnings.
>
>Of course. But this is not an error.
It is, or it should be. \1 has a meaning outside regexes: it's a
reference to the scalar 1.
Also:
$_ = 'gregexpr';
s/(regex)p/$1/;
print;
Duh.
s/(?<=\bregex)p\b//;
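A small sketch of the difference between the two substitutions above. The sub names and the sample words other than 'gregexpr' are invented for illustration.

```perl
use strict;
use warnings;

# Plain capture: removes the p after any "regex", even inside a longer word.
sub drop_p_plain {
    my ($word) = @_;
    $word =~ s/(regex)p/$1/;
    return $word;
}

# Lookbehind plus word boundaries: only touches the whole word "regexp".
sub drop_p_bounded {
    my ($word) = @_;
    $word =~ s/(?<=\bregex)p\b//;
    return $word;
}

for my $word ('gregexpr', 'regexp', 'a regexp here') {
    printf "%-14s plain: %-14s bounded: %s\n",
        $word, drop_p_plain($word), drop_p_bounded($word);
}
```

The bounded version leaves 'gregexpr' alone, which is presumably the point of the "Duh".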
--
Bart.
------------------------------
Date: Tue, 01 May 2001 17:23:06 -0500
From: Paula Jakobs <jakobs@redrhinosports.com>
Subject: Re: Corrupted scripts Winows to UNIX
Message-Id: <3AEF374A.F12A62E5@redrhinosports.com>
Nigel Taylor wrote:
> Hi
> I am currently having problems with scripts that I upload to a Unix server.
>
> For example after editing formmail (Matts Archive) to my parameters, and
> uploading via WS_FTP, the script will not run.
>
> 'Internal Server Error'
>
> My hosting tech had to re-install the script from his end, he said the size
> of the file should be 14k, but after upload it
> is 23k.
>
> I know it is not a path problem etc.
>
> I use Programmers file editor or Notepad to edit, I edit in windows 98, and
> upload with a DSL connection.
>
> This is obviously a real problem if all my scripts are corrupted somehow.
>
> Anyone have any ideas on what may be causing my problems.
>
When using Programmers file editor, make sure to save as a UNIX file (should be
in the options somewhere or on the save screen). Also, when uploading, make
sure to upload it as an ASCII file and not binary.
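If the file has already been saved with DOS line endings, they can also be removed afterwards. A minimal sketch follows; the sub name is made up. A carriage return left on the #! line is a well-known cause of 'Internal Server Error', though this may not account for the size difference reported above.

```perl
use strict;
use warnings;

# Convert CRLF (DOS) and lone CR line endings to LF (Unix).
sub to_unix_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $dos  = "#!/usr/bin/perl\r\nprint qq(hello\\n);\r\n";
my $unix = to_unix_newlines($dos);

print "carriage returns left: ", ($unix =~ tr/\r//), "\n";
```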
Paula Jakobs
------------------------------
Date: Wed, 02 May 2001 09:23:35 +1000
From: Jfreeman <jfreeman@tassie.net.au>
Subject: Re: Hacker challenge. Can you break this script for me?
Message-Id: <3AEF4577.8FC0DB3@tassie.net.au>
"David J. Marcus" wrote:
> Out of curiosity, you say you have tested the stripper (does it tease?) on
> nearly 120,000 lines of code.
>
> This leads, naturally, to the following variants on the same question or
> correctness:
> 1- How do you know that it does indeed strip correctly?
> 2- How do you determine if the generated (stripped down) code is not
> broken?
At the risk of insulting your intelligence, in the original post I wrote:
Testing Protocol
The testing protocol is very simple.
1) First input the target script. Concatenate it into a string. Eval this
string, compiling it but avoid actually running it. Check $@ to ensure that no
errors are detected. If there is nothing in $@ then we can be sure that we have
a valid piece of perl that checks out using this eval method.
2) Next process the script and concatenate it (i.e. remove most of the new
lines) to give a line length approximating, and where possible not exceeding, a
user defined length (say 80, 160, or even 42000 if you want to turn a 2000 line
script into a serious one liner!).
3) Run the new processed script through exactly the compile check algorithm used
in 1. As it was not initially broken it will only be broken now if
stripcomments.pl has broken it. Note that if it is found to be broken then
stripcomments.pl restores the original file from the backup and unlinks the
backup, with a net effect of no change to the original file other than the
timestamp.
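The compile check in steps 1 and 3 could be sketched roughly as below. This is an assumption about one way to compile without running, not the actual code in stripcomments.pl; compile_error is an invented name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns '' if $code compiles, else the error text.
sub compile_error {
    my ($code) = @_;
    # Wrapping the text in an anonymous sub makes eval compile it without
    # running the body. BEGIN blocks and "use" lines would still execute.
    eval "no strict; no warnings; sub {\n$code\n}";
    return $@;
}

for my $code ('my $n = 1; print $n + 1;', 'my $n = ;') {
    my $status = compile_error($code) eq '' ? 'compiles' : 'broken';
    print "$status: $code\n";
}
```

Running the same helper before and after processing gives the before/after comparison the protocol describes.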
Why this logically works.
It is next to impossible to strip a # char and everything following it until the
end of that line (EOL) without causing a compilation error unless the #....EOL
is a real comment.
If you have:
for (0..$#array) {
and incorrectly strip '#array) {' you get a compilation error. This goes for
virtually all cases as you will either strip off a } or a ; at the end of the
line giving a fatal error.
One exception is (c) Abigail:
$_ = "Just another Perl Hacker # No comment, no comment!
# Yes, really!
# I am really a Perl Hacker!
";print;
In this case you could strip all the #...EOL without causing a compilation
error. Although stripcomments.pl did do this once upon a time, it now
doesn't. This is one of the few cases that come to mind where you can
haphazardly hack off a non comment #...EOL without breaking a script.
Essentially you need some block or quoting context where the opening delimiter
and closing delimiter will remain intact after you strip the #..EOL and I
believe the script has all these covered.
So I am fairly confident that stripcomments.pl is not stripping anything that is
not a comment. Although there is a small potential that it may be doing so
undetected I feel that this is unlikely with a test base of 119,566 lines and no
breakages. More test material would be good.
The script concatenation forms the second part of the test. If you have
for (@foo) { # iterate over foo
$_++; # increment each foo
}
and concatenate it you will have
for (@foo) { # iterate over foo $_++; # increment each foo }
this is now a syntax error as the closing } is now hidden in the comment.
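A quick sketch of that effect, with a declaration of @foo added so the sample stands alone. The compile_error helper is an assumption for illustration, not part of stripcomments.pl.

```perl
use strict;
use warnings;

my @lines = (
    'my @foo = (1, 2);',
    'for (@foo) { # iterate over foo',
    '    $_++; # increment each foo',
    '}',
);

# Hypothetical helper: compile the text inside a sub without running it.
sub compile_error {
    my ($code) = @_;
    eval "sub {\n$code\n}";
    return $@;
}

my $intact = compile_error(join "\n", @lines);   # comments end at each newline
my $joined = compile_error(join ' ',  @lines);   # first comment swallows the rest

print "intact: ", ($intact eq '' ? 'compiles' : 'broken'), "\n";
print "joined: ", ($joined eq '' ? 'compiles' : 'broken'), "\n";
```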
About the only way you can still have comments present in a script and still
concatenate it is if you are concatenating to a standard 80 char line and
some comments simply fall, fortuitously, at the end of the line, thus
not breaking the script. By getting stripcomments.pl to concatenate to a number
of different line lengths (causing any theoretical unstripped comments to move
position in the line) I believe it unlikely (but not of course impossible) that
this is occurring.
>
>
> Just having the resulting output still compile is not equivalent to 'not
> breaking' the code.
Of course what you say is true, but given the logic above can you please post
one example where incorrectly stripping a # char to end of line does not
break a script? Naturally you will need to show that stripcomments.pl also
strips it if it is to be of concern. Examples such as:
print "
# not a comment
";
q(
# not a comment
);
are known and not an issue as the '# not a comment' does not strip. In my
experience when you start ripping comments just about any mistake leads to a
syntax error.
>
>
> Short of fully testing each and every module it is virtually impossible to
> certify that nothing has been broken. Full testing is impossible, since that
> requires you to certifiably test that every possible input and external
> machine state combinations have been tested (again, an impossibility).
>
> So, again, how do you know that it didn't break anything?
I don't but I expect to be promptly, and publicly told when and if someone finds
that I have!
It has been suggested that running B::Bytecode before and after stripping and
diffing the results will prove this, which sounds like a brilliant idea. I will
explore it today. Thanks to Rudolf Polzer for this idea.
James
>
>
> -Regards
> David
>
> "Jfreeman" <jfreeman@tassie.net.au> wrote in message
> news:3AEEEB4D.8657D1E4@tassie.net.au...
> >
> >
> > "Godzilla!" wrote:
> >
> > > Jfreeman wrote:
> > >
> > > (snipped)
>
------------------------------
Date: Wed, 02 May 2001 09:29:12 +1000
From: Jfreeman <jfreeman@tassie.net.au>
Subject: Re: Hacker challenge. Can you break this script for me?
Message-Id: <3AEF46C8.C28D2AF2@tassie.net.au>
Rudolf Polzer wrote:
> David J. Marcus <djmarcus@ex-pressnet.com> wrote:
> > Out of curiosity, you say you have tested the stripper (does it tease?) on
> > nearly 120,000 lines of code.
> >
> > This leads, naturally, to the following variants on the same question or
> > correctness:
> > 1- How do you know that it does indeed strip correctly?
> > 2- How do you determine if the generated (stripped down) code is not
> > broken?
> >
> > Just having the resulting output still compile is not equivalent to 'not
> > breaking' the code.
> >
> > Short of fully testing each and every module it is virtually impossible to
> > certify that nothing has been broken. Full testing is impossible, since that
> > requires you to certifiably test that every possible input and external
> > machine state combinations have been tested (again, an impossibility).
> >
> > So, again, how do you know that it didn't break anything?
>
> Perhaps the B::Bytecode approach is possible: create the assembler code file
> and strip out the line numbers before diff-ing.
Brilliant idea. I will explore it today. As I have a diff script floating around
in my subroutine toolbox it might even be quick and easy to implement, here's
hoping.
Cheers
James
>
>
> --
> #!/usr/bin/perl -- Random sig generator. Editor command in slrn => ~/siggs
> $F=shift;open H,"+<$F";$_=join"",<H>;$s=index$_,"\n\n-- \n";$s<0||truncate
> H,$s;close H;system"$ENV{EDITOR} $F</dev/tty>/dev/tty";$s=$n=0;for#sichtig
> (<~/siggs/*>){++$n;int rand$n or$s=$_};`(echo "\n\n-- ")|cat - $s>>$F`+nan
------------------------------
Date: Wed, 02 May 2001 09:49:34 +1000
From: Jfreeman <jfreeman@tassie.net.au>
Subject: Re: Hacker Challenge. Can you break this script for me?
Message-Id: <3AEF4B8E.A4F5C178@tassie.net.au>
"Godzilla!" wrote:
> Rudolf Polzer wrote:
>
> > Godzilla! wrote:
> > > Rudolf Polzer wrote:
> > > > Godzilla! wrote:
> > > > > Jfreeman wrote:
>
> (significant snippage)
>
> > > > I do not see anything in this that does not work.
>
> > > This is correct. My script has been in use for a few
> > > years, by thousands of visitors sustaining well over
> > > three million hits to date. Obviously, it compiles.
This is no reflection on Godzilla's script, which I am sure works. If you check
out the thread 'Capturing the output of perl -c myfile.pl' you will be able to see
the routine used to generate these two lines:
Compile check .\chahta.cgi
Compile check failed!
These next two are generated when the sub compile returns with the value of $@.
Sorry perl script .\chahta.cgi does not compile, Aborting!
Can't localize lexical variable $found at (eval 1) line 1152
$not_ok = &compile($path);
if ($not_ok) {
print "Sorry perl script $path does not compile, Aborting!\n$not_ok\n\n";
next;
}
For good reasons, related to namespace issues, the line eval $code where $code is
a string containing a perl script does not always work. About 5% of valid perl
will not eval.
I have an essentially global package variable called my $found which is causing
this namespace issue.
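A minimal sketch of how a lexical in the checking script can leak into the eval'd code. The sub names and the one-line sample script are assumptions for illustration; this is not the code from stripcomments.pl.

```perl
use strict;
use warnings;

# Compiled before "my $found" below, so that lexical is not in scope here.
sub check_clean {
    my ($code) = @_;
    eval "no strict; sub {\n$code\n}";
    return $@;
}

my $found = 0;    # a file-scoped lexical in the checker itself

# Compiled after "my $found", so code handed to eval can see that lexical.
sub check_leaky {
    my ($code) = @_;
    eval "no strict; sub {\n$code\n}";
    return $@;
}

# Valid on its own, where $found would be a package variable.
my $script = 'local $found = 1;';

print "clean: ", (check_clean($script) || "compiles\n");
print "leaky: ", (check_leaky($script) || "compiles\n");
```

Keeping the eval in a scope with as few lexicals as possible, or in a separate process via perl -c, avoids this kind of clash.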
You can easily kill the compile checking by simply setting the configuration
variable:
$no_eval = 1;
You will need to perl -c myfile.pl your script afterwards to see if it is broken
and, if so, manually restore it from the myfile.pl.bak file. If you do this, what
happens?
Cheers
James
>
>
> > I wanted to know which code was generated by his script and which code was
> > the original one, since the error message looks like a perl syntax error and
> > not a die() from his script. I just wanted to see what the error is!
>
> My suggestion is you run tests of your own. Doing this will
> provide quicker, more authoritative answers.
>
> Godzilla!
------------------------------
Date: 2 May 2001 00:53:58 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Hacker challenge. Can you break this script for me?
Message-Id: <9cnlr6$g1$1@mamenchi.zrz.TU-Berlin.DE>
According to Jfreeman <jfreeman@tassie.net.au>:
[snipped, over-long lines reformatted]
> Why this logically works.
>
> It is next to impossible to strip a # char and everything following it
> until the end of that line (EOL) without causing a compilation error
> unless the #....EOL is a real comment.
This assumption is incredibly naive. This bit
my $length = $#var;
$var = 456;
will compile just fine after you cut off the "comment". It will leave
a nasty bug too, especially if $var happens to be a scalar ref at the
time. Except for the coexistence of $var with @var (yes, Uri, that's
bad praxis :), this code is by no means remarkable.
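For the record, this is what a naive stripper leaves behind. The regex is an assumption for illustration, not what stripcomments.pl does.

```perl
use strict;
use warnings;

my $code = join "\n", 'my $length = $#var;', '$var = 456;', '';

# Naive stripper: delete from any # to the end of that line.
(my $stripped = $code) =~ s/#.*$//mg;

print $stripped;
```

The first line now ends in a bare $, which perl would presumably join with the $var on the next line into a dereference of $var.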
Anno
------------------------------
Date: Tue, 01 May 2001 23:58:21 -0000
From: cberry@cinenet.net (Craig Berry)
Subject: Re: Listing files on client end?
Message-Id: <teujctki7a7800@corp.supernews.com>
Scott Shannon (sshannon@acc.com) wrote:
: Hi. I'd like to know the simplest way, using a perl cgi script, to list
: the files (placed, say, in a special public accesible directory), on
: the *clients* end..i.e. the server should do the equivalent of an ls on
: the clients computer, then create a html page with that listing,
: which it then displays back to the client. Thanks for any info
Impossible(*), thank god. You can set up a frameset and have one of the
frames point at a local directory using file:, but the files won't be
known to the server, or in any sense publicly accessible.
* Modulo the latest Microsoft security misfeatures, of course.
--
| Craig Berry - http://www.cinenet.net/~cberry/
--*-- "When the going gets weird, the weird turn pro."
| - Hunter S. Thompson
------------------------------
Date: Wed, 02 May 2001 00:34:44 GMT
From: "flash" <bop@mypad.com>
Subject: Re: Listing files on client end?
Message-Id: <ECIH6.680814$f36.19174944@news20.bellglobal.com>
Just make a cgi that opens a dir, reads it, and then sends the data to the
server. Then the user can go to your site, log in, and receive what he wants.
After you write the client script, compile it using perlcc (Windows version,
though),
and then the user can download the program and it will do its stuff.
"Scott Shannon" <sshannon@acc.com> wrote in message
news:3AEF23A0.626CA1DE@acc.com...
>
> Hi. I'd like to know the simplest way, using a perl cgi script, to list
> the files (placed, say, in a
> special public accesible directory), on the *clients* end..i.e. the
> server should do the
> equivalent of an ls on the clients computer, then create a html page
> with that listing,
> which it then displays back to the client. Thanks for any info
> S
>
>
>
------------------------------
Date: Tue, 01 May 2001 23:45:24 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: one-line stderr, stdout redirection
Message-Id: <48iuetcv9h9gm15efbdupidc2j61pvpq75@4ax.com>
Rudolf Polzer wrote:
>Can you still use IO::Select when you have >30 open sockets?
I'm not sure, but:
>select() does
>not work because the bit vector can only hold fileno()s in 0..31.
Heh? "perldoc -f select" reveals a lot of text that explicitly and
consistently talks about vec(). This means that the bit vectors are
actually strings. You set/reset bits in the bytes of those strings. The
maximum length of the bitmask thus is virtually unlimited.
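A small sketch of that, using 40 as a stand-in for a fileno() above 31. The operating system may still impose its own limit on descriptors, which this does not show.

```perl
use strict;
use warnings;

my $rin = '';
vec($rin, 40, 1) = 1;    # 40 stands in for a fileno() above 31

# The mask is a string that grows as needed to hold the highest bit set.
print "mask length in bytes: ", length($rin), "\n";
print "bit 40: ", vec($rin, 40, 1), "\n";
print "bit 31: ", vec($rin, 31, 1), "\n";
```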
--
Bart.
------------------------------
Date: 2 May 2001 00:24:51 GMT
From: "Scott R. Godin" <webmaster@webdragon.unmunge.net>
Subject: Re: possible to dupe STDOUT to a file while still STDOUT-ing? :)
Message-Id: <9cnk4j$h7p$0@216.155.32.215>
In article <oh7tetglrot6amgfp9t9stfa61jd0f8p92@4ax.com>,
Bart Lateur <bart.lateur@skynet.be> wrote:
| Scott R. Godin wrote:
|
| >I'd like to know if it's possible to do such a thing.. continue printing
| >to STDOUT but also have same echoed to a file without multiple print
| >statements or shuffling back and forth with select($fh)
|
| What you're asking for is known as tee-ing. There's even an entry in the
| FAQ about it.
|
[snip]
| Bummer. Possible solutions still include a tied filehandle, get tee()
| from the GNU-for-Win32 toolbox (Cygwin), or use a perl script tee clone,
| like (as per the FAQ)
| <http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz>
|
|
| >I have an instance where, a program I've written to output an html-based
| >template from form input is sent to the browser for preview, but I'd
| >also like it dumped to a local file at the same time, so that the user
| >can download it.
|
| Isn't that silly? Can't the user just rightclick (or something similar)
| and choose "Save As..."?
well, yeah, but some *cough* browsers don't preserve the name of the
original file in the save dialog, which in this case is important to me.
The two files created MUST be named after the name of the map described
in the template, as it will be bundled with the map file as a readme
document. The purpose of the script is to 'suggest' away from creating
files named "readme.txt" to bundle with them (which tend to overwrite
the previous files) -- with over 3800 maps for Unreal Tournament on the
NaliCity website, this can be somewhat problematic. :-)
...hence the creation of this CGI.
Ah well, it was just a thought. I have it producing links to the
resultant files in html_output/ as well as to the .tgz file, so all's
well, I suppose. It just would have been interesting to automatically
start the download for them with an appropriate content-type header
after displaying the html to the user.
The result I have is good enough, and usable, so I'll leave it at that
for now. Thanks for the info though.. something to spend a modicum of
research time on, when i have some. (time, that is)
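For later reference, the tied filehandle idea mentioned above might look something like this. It is a rough sketch only: the Tee class name is made up, and an in-memory file stands in for the local file.

```perl
use strict;
use warnings;

package Tee;    # hypothetical class name

sub TIEHANDLE {
    my ($class, @handles) = @_;
    return bless [@handles], $class;
}

# Send every print to each of the underlying handles.
sub PRINT {
    my ($self, @text) = @_;
    my $ok = 1;
    for my $fh (@$self) {
        print {$fh} @text or $ok = 0;
    }
    return $ok;
}

sub PRINTF {
    my ($self, $format, @args) = @_;
    return $self->PRINT(sprintf $format, @args);
}

package main;

open my $real_stdout, '>&', \*STDOUT or die "dup STDOUT: $!";
open my $copy, '>', \my $saved       or die "in-memory file: $!";

tie *STDOUT, 'Tee', $real_stdout, $copy;
print "<html>preview</html>\n";    # goes to both handles
untie *STDOUT;
close $copy;

print "saved copy: $saved";
```

In the CGI the second handle would be opened on the file under html_output/ instead of a scalar.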
--
unmunge e-mail here:
#!perl -w
print map {chr(ord($_)-3)} split //, "zhepdvwhuCzhegudjrq1qhw";
# ( damn spammers. *shakes fist* take a hint. =:P )
------------------------------
Date: 2 May 2001 00:43:52 GMT
From: "Scott R. Godin" <webmaster@webdragon.unmunge.net>
Subject: Re: Q: Using 'rename' with CGI
Message-Id: <9cnl88$h7p$1@216.155.32.215>
In article <GRyH6.384$tD6.31021@nnrp1.ptd.net>,
"Shannon Brown" <news@shannonbrown.net> wrote:
| Perl 5.003
unrelated to your problem, but 5.003 is seriously buggy and should be
replaced immediately do-not-pass-Go with at least 5.004.
check out the perldelta.pod and specifically the 'Changes' file with
extensive information for 5.004 to see why this is so. primarily
security issues.
--
unmunge e-mail here:
#!perl -w
print map {chr(ord($_)-3)} split //, "zhepdvwhuCzhegudjrq1qhw";
# ( damn spammers. *shakes fist* take a hint. =:P )
------------------------------
Date: Tue, 01 May 2001 23:49:52 GMT
From: Dan <nospam@newsranger.com>
Subject: Re: RegEx Question
Message-Id: <AYHH6.6039$SZ5.503032@www.newsranger.com>
In article <m3snipm779.fsf@dhcp9-172.support.tivoli.com>, Ren Maddox says...
>> Thank you... But, how can I get all the data before the $$?
>
>The same way, with the caveat that you need to escape the dollar
>signs.
>
>my ($text) = $data =~ /(.*)\$\$/s;
>
>Or, use the substr/index method I already mentioned:
>
>my $text = substr($data, 0, index($data, '$$'));
>
>--
>Ren Maddox
>ren@tivoli.com
Thank you for your help. The substr method works great for this.
Dan
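One thing to watch with the two suggestions quoted above: they differ when the data holds more than one $$, and index returns -1 when there is none. A minimal sketch, with made-up sample data and an invented sub name:

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\$\$gamma\$\$delta";    # made-up sample

my ($greedy) = $data =~ /(.*)\$\$/s;     # everything up to the LAST $$
my ($lazy)   = $data =~ /(.*?)\$\$/s;    # everything up to the FIRST $$

# index returns -1 when the marker is absent; substr($text, 0, -1) would
# then silently drop the final character, so check first.
sub before_marker {
    my ($text) = @_;
    my $pos = index($text, '$$');
    return $pos < 0 ? $text : substr($text, 0, $pos);
}

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "substr: ", before_marker($data), "\n";
```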
------------------------------
Date: Tue, 01 May 2001 22:34:01 -0000
From: cberry@cinenet.net (Craig Berry)
Subject: Re: Remove Adult Files with Perl
Message-Id: <teueepjpks86a0@corp.supernews.com>
BUCK NAKED1 (dennis100@webtv.net) wrote:
: I want to remove all files in a subdirectory of "wkdir" IF they include
: one of many defined "bad words." I know this is not a complete solution,
: as people can name adult files anything they wish; but at least it's a
: start. My webhost doesn't allow adult material, and is peculiar. They
: search words and if they find certain words on your site, they just
: delete your site. Thus, I have to pad the "dirty words" as I've done
: below.
Yow! I'd change providers, personally. But anyway...
: Is this a good solution, or is there a "dirty word" filter script
: already out there?
:
: $f = "f0u0c0k"; $f =~ s/0//;
: $s = "s0e0x"; $s =~ s/0//;
I'd tend to generalize this, like so:
@dirties = qw( f0u0c0k s0e0x j0a0v0a g0a0t0e0s );
tr/0//d foreach @dirties;
$dirtypat = join '|', @dirties;
That generalizes more easily, and is also far more suitable to using an
external dirty-words data file should that become convenient. It assumes
there are no regex metacharacters in the dirty words list, of course.
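If the list might contain metacharacters, quotemeta removes that assumption. A minimal sketch with harmless placeholder words, padded the same way:

```perl
use strict;
use warnings;

my @words = qw( f0o0o c0+0+ );    # placeholder words
tr/0//d foreach @words;           # now 'foo' and 'c++'

# Escape each word so characters like + are matched literally.
my $pattern = join '|', map { quotemeta($_) } @words;
my $regex   = qr/($pattern)/;

for my $text ('learn c++ today', 'nothing here') {
    print "$text: ", ($text =~ $regex ? "matched $1" : "clean"), "\n";
}
```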
: $wkdir = "wkdir/";
: # Remove files with bad words
: use File::Find;
: find sub {-f;
: if((my $new = $_) =~ $f | $s )
You don't have regex delimiters around $f | $s, which means that whole
line isn't doing what you think it is. I'm also unclear on why you're
copying to $new; $_ is unchanged by an m// operation. Also, that hanging
-f is a little pointless.
: { unlink $_; } }, $wkdir ;
:
: I wrote the above for filtering "dirty" words in filenames, but I'd also
: like a script for filtering out "certain" words in all files in a
: directory too, if anyone has one.
find sub {
    return unless -f;
    open F, $_ or die "$_ : $!";
    my $text = do { local $/; <F> };
    print "unlink $_ (for $1)\n" if $text =~ /($dirtypat)/;
}, $wkdir;
I made it print out an unlink command (and what word it was unlinked for),
rather than executing the unlink, to protect myself during local testing.
Here's what the whole test harness looks like:
#!/usr/bin/perl -w
# dirty - dirty word finder test
# Craig Berry (20010501)
use File::Find;
use strict;
my @dirties = qw( f0u0c0k s0e0x j0a0v0a g0a0t0e0s );
tr/0//d foreach @dirties;
my $dirtypat = join '|', @dirties;
my $wkdir = '.';
find sub {
    return unless -f;
    open F, $_ or die "$_ : $!";
    my $text = do { local $/; <F> };
    print "unlink $File::Find::name (for $1)\n" if $text =~ /($dirtypat)/;
}, $wkdir;
--
| Craig Berry - http://www.cinenet.net/~cberry/
--*-- "When the going gets weird, the weird turn pro."
| - Hunter S. Thompson
------------------------------
Date: Tue, 01 May 2001 22:15:35 GMT
From: garry@ifr.zvolve.net (Garry Williams)
Subject: Re: Removing Lines... how's this?
Message-Id: <slrn9eudc7.clt.garry@zfw.zvolve.net>
On Tue, 01 May 2001 20:04:54 -0000, Craig Berry <cberry@cinenet.net> wrote:
> Garry Williams (garry@ifr.zvolve.net) wrote:
> : s/(?:.*?\n){3}//
> ^
>
> No need for the non-greedy qualifier; it will always match exactly to the
> next newline.
Yes, I missed that and Bart Lateur's post in this thread made the
point, too.
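A quick check of both forms, with a made-up $log. The anchored greedy version is the variant discussed elsewhere in the thread.

```perl
use strict;
use warnings;

my $log = "one\ntwo\nthree\nfour\nfive\n";

# Without /s the dot never matches a newline, so .* and .*? both stop
# at the end of the current line.
(my $lazy   = $log) =~ s/(?:.*?\n){3}//;
(my $greedy = $log) =~ s/^(?:.*\n){3}//;

# Fewer than three lines: nothing matches, nothing is removed.
my $short = "one\ntwo\n";
(my $untouched = $short) =~ s/^(?:.*\n){3}//;

print $lazy eq $greedy ? "same result\n" : "different result\n";
```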
--
Garry Williams
------------------------------
Date: Tue, 01 May 2001 23:35:30 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: Removing Lines... how's this?
Message-Id: <dvhuets4gtlonjddickdm0chfpilckpgor@4ax.com>
Uri Guttman wrote:
>in fact he doesn't need the ^ as it will always use the first match it
>finds.
That's basically why I wrote:
! The ^ mainly serves to indicate the purpose of the code.
Jim Monty wrote:
:Would ^ save futile match attempts for values of $log that contain
:less than three (new)lines?
That's why I wrote "mainly".
I'm not sure it really helps. But it sure doesn't hurt, and the purpose
of the resulting regex is more understandable for the reader.
--
Bart.
------------------------------
Date: Tue, 01 May 2001 23:47:56 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: requiring something I only need once
Message-Id: <2oiuet0ghgn18emknr7h7bnisaqtcm8svi@4ax.com>
Greg Bacon wrote:
>: Here, on the LHS, you're using a symbolic reference, hence, a global
>: variable.
>
>In your post, you asserted that it was necessary to lose the my. I
>demonstrated that your assertion is false.
If you want to stay in the spirit of the OP's code, yes. But you have
failed in eliminating the need of a global. You replaced one global with
another, in an obscure place.
What's so bad about a global once in a while, anyway.
--
Bart.
------------------------------
Date: Wed, 02 May 2001 00:41:31 GMT
From: "Dodger" <dodger@necrosoft.net>
Subject: Re: requiring something I only need once
Message-Id: <%IIH6.42647$B22.10434142@news1.rdc2.pa.home.com>
"Bart Lateur" <bart.lateur@skynet.be> wrote in message
news:2oiuet0ghgn18emknr7h7bnisaqtcm8svi@4ax.com...
> What's so bad about a global once in a while, anyway.
Nothing, provided you know what you are doing with them. It's also generally
a good idea to declare them, to avoid problems with strict and to help other
people see what you are doing before they get lost in your code.
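A minimal sketch of a declared global. The package and variable names are invented, and our needs perl 5.6 or later; older perls would use "use vars" instead.

```perl
use strict;
use warnings;

package Counter;    # hypothetical package

our $count = 0;     # declared once, so strict is satisfied

sub bump { return ++$count }

package main;

Counter::bump() for 1 .. 3;
print "count is $Counter::count\n";
```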
--
Dodger
www.dodger.org
www.necrosoft.net
www.gothic-classifieds.com
------------------------------
Date: Tue, 1 May 2001 22:26:32 +0000 (UTC)
From: abigail@foad.org (Abigail)
Subject: Re: Should Perl be first?
Message-Id: <slrn9eue0o.8j5.abigail@tsathoggua.rlyeh.net>
Rudolf Polzer (eins@durchnull.de) wrote on MMDCCC September MCMXCIII in
<URL:news:slrn9eu6nn.ke.eins@www42.t-offline.de>:
}}
}} Of course. But Pascal->C is IMHO easier than C++->C: C++ and C have the
}} same syntax, but when you start with C++, you'll learn nothing about malloc,
}} char * and such things that you have to learn to deal with. When you did Pas
}} before, you already know malloc as GetMem and you know the pitfalls.
Pascal as Go^WWirth intended doesn't have GetMem.
Abigail
--
print v74.117.115.116.32;
print v97.110.111.116.104.101.114.32;
print v80.101.114.108.32;
print v72.97.99.107.101.114.10;
------------------------------
Date: Tue, 01 May 2001 22:48:58 -0000
From: Chris Stith <mischief@velma.motion.net>
Subject: Re: Should Perl be first?
Message-Id: <teufaqr4ajk7ac@corp.supernews.com>
Rudolf Polzer <eins@durchnull.de> wrote:
> Chris Stith <mischief@velma.motion.net> wrote:
>> Rudolf Polzer <eins@durchnull.de> wrote:
>> > Tad McClellan <tadmc@augustmail.com> wrote:
>> >> Rudolf Polzer <eins@durchnull.de> wrote:
>> >> >Martien Verbruggen <mgjv@tradingpost.com.au> wrote:
>> >> >> If you want to become a professional programmer, a good one, learn some
>> >> >> other languages, and C should probably be in there [1].
>> Besides, not only will knowing C make you a better programmer,
>> but C is found in places that are slow to take on new languages.
>> Such places include cross-compilers for embedded controllers and
>> specialty operating systems.
> I know; I am programming the TI-85 calculator in C... and it is used
> on the Gameboy.
>> The new and free of C++ are still low-level compared to many
>> languages. Pointers are considered low-level in most any
>> modern survey of languages. C++ therefore is still not as
>> HL as could be. Remember, too, that pointers cause a lot of
>> the bugs in C/C++ that memory management doesn't. ;-)
> They are new and delete.
Zoiks! This I know, and yet didn't type correctly. Core
error detected... please reboot. ;-)
> C++ is the highest-level compiled language
> I know (With 'compiled' I mean that native x86 code is generated that
> is not completely bloated).
I'd disagree. In my mind, OCaml is higher level than C++. Cecil
is as well. Dylan is supposed to be, but I haven't played with it.
GNU E is higher level and compiled, but requires an extra run-time
interface to achieve its persistence using Exodus.
Eiffel is higher level, and it doesn't produce object code just
yet but does produce C which can be compiled.
Some high-level languages about which I can't really comment
are ZPL, Sather (native object code output is still in progress, but
output to C works), and Euphoria.
Remember, too, that anything that's purely functional or
close to it can be considered high level. Haskell comes to
mind.
>> I think teaching a subset of Perl as a second language would
>> be a good idea. I'm not sure there's any language I can truly
>> say would be a good first language. All the teaching languages
>> have morphed into something quite broken and overextended when
>> taken too seriously. Most of the general purpose languages are
>> pretty difficult to learn with no experience (C, C++) or can
>> lead to bad habits with no experience (Perl, C).
> I would recommend Pascal here. Unfortunately it is not widely used,
> but it enforces 'good style'. Unfortunately Turbo Pascal does not;
> it supports type casts that do not look bad (like C++'s reinterpret_cast).
There are ISO Pascal compilers for nearly every platform, including
the PC. I don't really agree with Pascal, Modula-2, Modula-3, Ada,
or Oberon as a first language. Pascal's type system is strict, but
it's not quite right. Ada and the Modula family get better, but they
are large and cumbersome languages. The subset-of-Ada languages I've
been reading about, such as Ava, may be a good place to start. Ada
itself is a whale of a language. Perhaps Ava will find a proper
subset which works well.
Heck, Ada + PL/1 + Scheme in the same project, and you'd have nearly
every programming idiom in the world at your fingertips, once you
figure out where they are. :-/
Chris
--
You must not lose faith in humanity. Humanity is an ocean;
if a few drops of the ocean are dirty, the ocean does not
become dirty. -- Mohandas K. Gandhi
------------------------------
Date: Tue, 01 May 2001 22:59:52 -0000
From: Chris Stith <mischief@velma.motion.net>
Subject: Re: Should Perl be first?
Message-Id: <teufv85hocn442@corp.supernews.com>
Bart Lateur <bart.lateur@skynet.be> wrote:
> BUCK NAKED1 wrote:
>>I started learning Perl a few months ago, but still have much to learn.
>>Friends tell me that VB is THE language to learn.
> Well... VB is really great for putting together a (client side) GUI
> really quickly. Then you attach database connectivity to the controls...
> really neat. But that's it. Although I really like the syntax of BASIC,
> it beats C hands down, it is also pretty limited. Building more complex
> stuff is almost as bad as having to write it in raw Assembler -- or C.
This is the best argument for both sides of VB I've heard in a while.
Unfortunately, the dark side of this popular language is much larger
than the light side.
>>or would you recommend another programming
>>language? C++? Java? what?
> Not really. I think those languages require more of the programmer.
In some ways. Java requires more discipline to learn it, but
Perl requires more discipline to use it. ;-)
> Oh BTW, database connectivity with Perl is very good, too, using DBI.
True. Some of the best database connectivity anywhere, largely
because it's nearly transparent which database your program
will use as a backend. This does require parts of the SQL
language to be learned as well, but that's a small price to
pay for how easily Perl lets you work, with no code changes,
across the many database systems it supports.
> As a summary: I don't think it's too wise to limit yourself to just one
> language. All have their typical atmosphere, and I find there's not much
> risk of confusing them. It's often refreshing
> to be able to do something entirely different once in a while.
Perl, of course, is just barely contained by the phrase 'one
language'. You could put together a few different smaller languages
from parts of Perl that don't really overlap much. Larry jokes about
this, but it's funny because it's basically true.
Chris
--
Product shown enlarged to make you think you're getting more.
------------------------------
Date: 2 May 2001 00:12:02 GMT
From: Billy Chambless <bchambless@nrlssc.navy.mil>
Subject: Re: Should Perl be first?
Message-Id: <9cnjci$g5t$1@news.datasync.com>
In article <Pine.SUN.4.33.0105010912140.7659-100000@mamba.cs.Virginia.EDU>,
David Coppit <newspost@coppit.org> wrote:
>C is simple. Programming in C is not. :) Heck, programming in Perl
>isn't simple either unless you use some discipline.
Programming in either language is simple until you have to deal with code
somebody else wrote. :)
------------------------------
Date: 2 May 2001 00:33:14 GMT
From: Billy Chambless <billy@localhost.net>
Subject: Re: Should Perl be first?
Message-Id: <9cnkka$gjq$1@news.datasync.com>
In article <teto4umcm4u49c@corp.supernews.com>,
Chris Stith <mischief@velma.motion.net> wrote:
> The trouble with teaching Perl as a first computer language is that your
> students won't appreciate it till they start learning their second.
>* Don't let anyone tell you what your first computer language should be
> before you've learned several.
These two add up to an important point:
Although a person can certainly learn one language and start getting
real work done, learning several languages is part of the process of
learning the profound truths of programming that are needed to make a
Really Great Programmer.
The best programmers I know know enough languages that they're not bound
by the conceptual limits of any one of them.
------------------------------
Date: 1 May 2001 23:36:56 GMT
From: dha@panix6.panix.com (David H. Adler)
Subject: Re: Strange string -> num conversion
Message-Id: <slrn9eui4p.9o3.dha@panix6.panix.com>
Data points:
using perl -le 'print "0x12" + 1':
debian:~$ perl553 -le 'print "0x12" + 1'
19
This is perl, version 5.005_03 built for i586-linux
debian:~$ perl56 -le 'print "0x12" + 1'
19
This is perl, v5.6.0 built for i586-linux
using Rudolf's C program:
debian:~$ ./stontst
18.000000
1.125000
using Samuel's C program:
debian:~$ ./stontst2
f1: 1.125000
f2: 1.125000
f3: 10.000000
18.000000
1.125000
1.125000
1.125000
10.000000
Good luck figuring out what this all means. :-)
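
For anyone who wants the hexadecimal reading no matter how a given perl build treats the string implicitly, here is a minimal sketch using the built-in hex(). The to_number() helper is hypothetical, not something from the thread, and it only handles the plain "0x..." integer case.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "0x..." strings with hex(), and leave
# everything else to Perl's ordinary string-to-number conversion.
sub to_number {
    my ($text) = @_;
    return hex($text) if $text =~ /^0x[0-9a-f]+$/i;
    return $text + 0;
}

print to_number("0x12") + 1, "\n";   # prints 19
print to_number("12") + 1, "\n";     # prints 13
```

Converting explicitly means the result no longer depends on the implicit conversion in whichever perl you happen to run.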
dha
--
David H. Adler - <dha@panix.com> - http://www.panix.com/~dha/
Trained Philosopher: Will Think For Food
- R. Dan Henry
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 817
**************************************