[32072] in Perl-Users-Digest
Perl-Users Digest, Issue: 3336 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Mar 28 14:09:23 2011
Date: Mon, 28 Mar 2011 11:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 28 Mar 2011 Volume: 11 Number: 3336
Today's topics:
Re: How to avoid searching this folder? geoff@invalid.invalid
REVISED: Variable length array to XML <dpich@polartel.com>
Re: REVISED: Variable length array to XML <uri@StemSystems.com>
Re: using File::Find (Randal L. Schwartz)
Re: using File::Find <hjp-usenet2@hjp.at>
Re: using File::Find <uri@StemSystems.com>
Re: using File::Find <mgogala@no.address.invalid>
Re: using File::Find <uri@StemSystems.com>
Re: using File::Find <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 27 Mar 2011 22:26:07 +0100
From: geoff@invalid.invalid
Subject: Re: How to avoid searching this folder?
Message-Id: <inavo6l2db87t047a0jnd18dpo7mnchjf0@4ax.com>
On Fri, 25 Mar 2011 23:59:18 -0700, "John W. Krahn"
<jwkrahn@example.com> wrote:
>geoff@invalid.invalid wrote:
>> Hello
>
>Hello,
>
>> I am using Tom Boutell's simple search engine on my website but would
>> like it to not index the files in a particular folder called archives.
>>
>> How would I modify the code for this? I have tried and so far failed.
>>
>> Thanks
>>
>> Geoff
>>
>> #!/usr/bin/perl
>
>The next two lines should be:
>
>use warnings;
>use strict;
>
>
>> $path = "/path/public_html";
>> $webpath = "";
>> $indexname = "/path/formmail/searchindex.txt";
>
>my $path = "/path/public_html";
>my $webpath = "";
>my $indexname = "/path/formmail/searchindex.txt";
>
>
>> $nextFd = 0;
>
>It looks like you don't really need this variable, so what is it really
>supposed to do for your program?
>
>
>> open(OUT, ">$indexname");
>
>You should *always* verify that the file was opened correctly before
>trying to use what may be an invalid filehandle:
>
>open OUT, '>', $indexname or die "Cannot open '$indexname' because: $!";
>
>
>> &update($path, $webpath);
>
>In modern versions of Perl you don't need to use ampersands on
>subroutine calls:
>
>update($path, $webpath);
>
>
>> sub update {
>> my($path, $webpath) = @_;
>> my($dd) = $nextFd++;
>
>Why are you storing a number in a variable that you are going to use for
>a directory handle? That makes no sense.
>
>
>> print "Updating in $path\n";
>> if (!opendir($dd, $path)) {
>> print STDERR "Warning: can't open $path\n";
>> return;
>> }
>
>You should declare variables where you first use them and you should
>include $! in the error message so you know why it failed:
>
> opendir my $dd, $path or do {
> warn "Warning: can't open '$path' because: $!";
> return;
> };
>
>
>> while ($entry = readdir($dd)) {
>
> while ( my $entry = readdir $dd ) {
>
>
>> if ($entry =~ /^\.$/) {
>> next;
>> }
>>
>> if ($entry =~ /^\.\.$/) {
>> next;
>> }
>
>Or simply:
>
> next if $entry =~ /\A\.\.?\z/;
>
>
>> if (-d "$path/$entry") {
>> &update("$path/$entry", "$webpath/$entry");
>> next;
>> }
>> if (($entry !~ /.html$/i)&& ($entry !~ /.htm$/i)) {
>> next;
>> }
>
>You have to escape the period or it will match any character and you can
>combine both regular expressions into one (same as example above):
>
> next if $entry !~ /\.html?$/i;
>
>
>> my($fd) = $nextFd++;
>
>Why are you storing a number in a variable that you are going to use for
>a filehandle? That makes no sense.
>
>
>> if (!open($fd, "$path/$entry")) {
>> print STDERR "Warning: can't open $path/$entry\n";
>> next;
>> }
>
>You should declare variables where you first use them and you should
>include $! in the error message so you know why it failed:
>
> open my $fd, '<', "$path/$entry" or do {
> warn "Warning: can't open '$path/$entry' because: $!";
> next;
> };
>
>
>> my(%words) = ( );
>
>Or just:
>
> my %words;
>
>
>> my($line);
>> while ($line =<$fd>) {
>
>Or just:
>
> while ( my $line = <$fd> ) {
>
>
>> # Support for turning off the search engine
>> # indexer for parts of a page. These markers
>> # must have a line to themselves. 3/13/00
>> if ($line =~ /<\!\-\- SEARCH-ENGINE-OFF -->/)
>> {
>> while ($line =<$fd>) {
>> if ($line =~ /<\!\-\- SEARCH-ENGINE-ON -->/) {
>> last;
>> }
>> }
>> next;
>> }
>> # Simple HTML flusher
>> $line =~ s/\<.*?\>//g;
>> # Case insensitive
>> $line =~ tr/A-Z/a-z/;
>> # If it's not a letter, it's whitespace
>> $line =~ s/[^a-z]/ /g;
>
>You could also use tr/// for that:
>
> $line =~ tr/a-z/ /c;
>
>
>> my(@words) = split(/\s+/, $line);
>
>That might be better as:
>
> my @words = split ' ', $line;
>
>
>> my($p);
>> for $p (@words) {
>
>Better as:
>
> for my $p ( @words ) {
>
>
>> if (length($p)) {
>
>Why would $p have zero length? Probably because you are using /\s+/
>instead of ' ' as the first argument to split which will give you a zero
>length string if there is leading whitespace in $line.
>
>
>> $words{$p}++;
>> }
>> }
>> }
>> print OUT "$webpath/$entry ";
>> my($first) = 1;
>
>Why are you forcing list context on a scalar assignment?
>
>
>> while (($key, $val) = each(%words)) {
>
>Better as:
>
> while ( my ( $key, $val ) = each %words ) {
>
>
>> print OUT "$val:$key";
>> if ($first) {
>> $first = 0;
>> } else {
>> print OUT " ";
>> }
>
>So you want no space between the first and second "$val:$key" but a
>space after every other occurrence of "$val:$key" including at the end
>of the line?
>
>
>> }
>> print OUT "\n";
>
>It looks like you could probably do that while loop like this instead:
>
> print OUT join( ' ', map "$words{$_}:$_", keys %words ), "\n";
>
>
>> close($fd);
>> }
>> closedir($dd);
>> }
>> close(OUT);
>
>
>
>
>John
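Two of the quoted suggestions are easy to check in isolation. A minimal sketch, using made-up sample strings rather than anything from the indexer's real input, compares split /\s+/ with split ' ' and shows tr/a-z/ /c replacing every character outside a-z with a space:

```perl
use strict;
use warnings;

# Made-up sample line with leading whitespace (not from the indexer's input).
my $line = "  just another   perl hacker";

# A regex pattern treats the leading whitespace as a separator, so the
# first field returned is an empty string.
my @with_regex = split /\s+/, $line;

# The special pattern ' ' skips leading whitespace before splitting.
my @with_space = split ' ', $line;

printf "/\\s+/ gives %d fields, ' ' gives %d fields\n",
    scalar @with_regex, scalar @with_space;

# tr with the /c flag complements the search list: every character that is
# not a-z is replaced by a space.
(my $clean = "it's 4pm") =~ tr/a-z/ /c;
print "[$clean]\n";
```

With split ' ' there is no empty leading field to filter out, which is why the length($p) test becomes unnecessary.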
John,
You have really made a lot of no doubt useful comments but the code is
not mine - it came from Tom Boutell's site and my only concern was to
be able to avoid indexing some particular files/folders.
Cheers
Geoff
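For the original question, one approach is to skip the unwanted folder by name before recursing into it. A minimal sketch, not Tom Boutell's code: the html_files() helper and the %skip_dir table are hypothetical names, and the sketch only collects file paths instead of building the index:

```perl
use strict;
use warnings;

# Hypothetical table of directory names that should not be searched.
my %skip_dir = map { $_ => 1 } qw(archives);

# Hypothetical helper: return the .htm/.html files found under $path,
# without descending into any directory named in %skip_dir.
sub html_files {
    my ($path) = @_;
    my @found;

    opendir my $dh, $path or do {
        warn "Warning: can't open '$path' because: $!";
        return;
    };

    for my $entry (sort readdir($dh)) {
        next if $entry =~ /\A\.\.?\z/;        # skip . and ..
        my $full = "$path/$entry";
        if (-d $full) {
            next if $skip_dir{$entry};        # excluded folder: do not recurse
            push @found, html_files($full);
            next;
        }
        push @found, $full if $entry =~ /\.html?\z/i;
    }
    closedir $dh;
    return @found;
}

# Optional usage: pass a starting directory on the command line.
if (@ARGV) {
    print "$_\n" for html_files($ARGV[0]);
}
```

In the posted script the same effect should come from a check such as next if $entry eq 'archives'; inside the -d branch, before the recursive update() call. Note that this skips a folder called archives at any depth, not only at the top level.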
------------------------------
Date: Mon, 28 Mar 2011 09:29:27 -0500
From: Don Pich <dpich@polartel.com>
Subject: REVISED: Variable length array to XML
Message-Id: <Q6WdnSqEPZLaBg3QnZ2dnUVZ_uOdnZ2d@polarcomm.com>
I've been messing with this on and off since my last post. Here is my
current code (cleaned up thanks to advice in this board, and made shorter
- Took out some unnecessary arrays):
___CODE___
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $infile = '/media/Docs/Scripts/Perl/Putty/TEST.csv';
open (CSVFILE, $infile) || die ("Could not open $infile! $!");
my @final = ();
my @temp1 = ();
my @temp2 = ();
while (my $line = <CSVFILE>) {
    $line =~ tr/"\r\n//d;
    $line =~ s/\\/:/g;
    (my $C1,my $C2,my $C3,my $C4) = split ',', $line;
    my @temp1 = split (':', $C2);
    shift (@temp1);
    my $tempcount = @temp1;
    for (my $a = 0 ; $a <= $tempcount-1; $a++) {
        push ( @temp2, $temp1[$a] );
    }
    push ( @temp2, $C1 );
    push ( @temp2, $C4 );
    my $temp2count = @temp2;
    unshift ( @temp2, $temp2count );
    push @final, [ @temp2 ];
    $temp2count = 0;
    @temp2 = ();
    @temp1 = ();
}
close CSVFILE;
exit(0);
___CODE___
Here is PART of the input
___ INPUT ___
"Default Settings","Sessions","",""
"ADMSND70AFC.01","Sessions\NOC-CO\AFC","","172.16.22.34"
"ARTHND16AFC.01","Sessions\NOC-CO\AFC","","172.16.22.26"
"CAVWND48AFC.01","Sessions\NOC-CO\AFC","","172.16.22.6"
"CRYSND04AFC.01","Sessions\NOC-CO\AFC","","172.16.22.46"
"CVLRND10AFC.01","Sessions\NOC-CO\AFC","","172.16.22.90"
"PKRVND05AFC.04 GFTN","Sessions\NOC-CO\AFC","","172.16.22.110"
"PMBNND60AFC.01","Sessions\NOC-CO\AFC","","172.16.22.78"
"STTMND02AFC.01","Sessions\NOC-CO\AFC","","172.16.22.74"
"WLCTND67AFC.01","Sessions\NOC-CO\AFC","","172.16.22.114"
"WVTNMN74AFC.01 DMAX","Sessions\NOC-CO\AFC","","172.16.22.94"
"WVTNMN74AFC.02 UMC1000","Sessions\NOC-CO\AFC","","172.16.22.102"
"PKRVND05APMAX.01","Sessions\NOC-CO\APMAX","","apuser@172.16.1.145"
"PKRVND05APMAX.02","Sessions\NOC-CO\APMAX","","apuser@172.16.1.146"
"DVN cisco","Sessions\NOC-CO\Cisco","","10.243.255.250"
"DYTNND01C1924.01","Sessions\NOC-CO\Cisco\1900 Series","","172.16.19.175"
"PKRVND05C1924.01","Sessions\NOC-CO\Cisco\1900 Series","","172.16.19.171"
___ INPUT ___
Here is the Output:
I want to ignore the first line ('Default' etc.) and remove 'Sessions', as
putting that into this information is redundant. I also counted how many
array elements are in each array within the array (i.e. the first line
has four elements: NOC-CO, AFC, ADMSND70AFC.01, 172.16.22.34). Hence, the
first array element is the count of array elements. I'm not sure it's
necessary, but it's there.
What I am really having a hard time wrapping my head around is that it's
obvious that a hash is a better choice for sorting this data. I think
I'm making a mistake by actually using an array, but I understand
arrays. Hashes are simpler, but I must be missing the point.
My goal is to do as others have posted:
___ DESIRED RESULTS ___
<SESSION>
<NOC-CO>
<AFC>
<ADMSND70AFC.01>
<172.16.22.34>
</ADMSND70AFC.01>
etc...
___ DESIRED RESULTS ___
The code below should be what I need.
___ PROPOSED ADDITIONAL CODE ___
print qq(<SESSION>\n);
foreach my $k1 (sort keys %session)
{
    print qq( <$k1>\n);
    foreach my $k2 (sort keys %{$session{$k1}})
    {
        print qq( <$k2>\n);
        foreach my $k3 (sort keys %{$session{$k1}{$k2}})
        {
            print qq( <$k3>\n);
            print qq( $session{$k1}{$k2}{$k3}\n);
            print qq( </$k3>\n);
        }
        print qq( </$k2>\n);
    }
    print qq( </$k1>\n);
}
print qq(</SESSION>\n);
___ PROPOSED ADDITIONAL CODE ___
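The three nested loops assume exactly three levels of keys, while some input paths (for example Sessions\NOC-CO\Cisco\1900 Series) go one level deeper. A recursive walk is one way to handle any depth. This is only a sketch: the %session contents are hand-built from a few input lines, and emit() is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hand-built sample tree: folders are hash references, sessions are
# name => address pairs.
my %session = (
    'NOC-CO' => {
        AFC   => { 'ADMSND70AFC.01' => '172.16.22.34' },
        Cisco => {
            'DVN cisco'   => '10.243.255.250',
            '1900 Series' => { 'DYTNND01C1924.01' => '172.16.19.175' },
        },
    },
);

# Hypothetical helper: return the tagged text for one level of the tree,
# recursing into hash references and emitting plain values as leaves.
sub emit {
    my ($node, $depth) = @_;
    my $pad = ' ' x $depth;
    my $out = '';
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        $out .= "$pad<$key>\n";
        $out .= ref($value) eq 'HASH'
            ? emit($value, $depth + 1)
            : "$pad $value\n";
        $out .= "$pad</$key>\n";
    }
    return $out;
}

print "<SESSION>\n", emit(\%session, 1), "</SESSION>\n";
```

Names such as 1900 Series or 172.16.22.34 are not valid XML element names, so output in this shape is XML-like at best; a strict XML consumer would need the names moved into attributes or text content.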
I'm having a hard time wrapping my head around populating the hash.
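One possible way to populate such a hash is to walk down the tree one folder at a time, creating each level the first time it is seen. A minimal sketch under stated assumptions: the variable names ($name, $path, $address, @folders, $node) are illustrative, the sample lines are copied from the input above, and a plain split on commas is assumed to be good enough because no field contains a comma:

```perl
use strict;
use warnings;

# A few lines in the posted CSV format; a non-interpolating here-document
# keeps the backslashes literal.
my @lines = split /\n/, <<'END_CSV';
"Default Settings","Sessions","",""
"ADMSND70AFC.01","Sessions\NOC-CO\AFC","","172.16.22.34"
"PKRVND05APMAX.01","Sessions\NOC-CO\APMAX","","apuser@172.16.1.145"
"DYTNND01C1924.01","Sessions\NOC-CO\Cisco\1900 Series","","172.16.19.175"
END_CSV

my %session;
for my $line (@lines) {
    $line =~ tr/"\r\n//d;
    my ($name, $path, undef, $address) = split /,/, $line;

    my @folders = split /\\/, $path;
    shift @folders;            # drop the redundant leading 'Sessions'
    next unless @folders;      # nothing left: the 'Default Settings' line

    # Walk down the tree, creating each folder level on first sight.
    my $node = \%session;
    $node = $node->{$_} ||= {} for @folders;

    $node->{$name} = $address; # leaf: session name => address
}

print "$session{'NOC-CO'}{AFC}{'ADMSND70AFC.01'}\n";
```

Because the folder depth varies, a folder can end up holding both addresses and sub-folders, so code that prints the tree has to check each value with ref() rather than assume a fixed number of levels. A session whose name matches a sub-folder name in the same folder would also collide in this layout.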
------------------------------
Date: Mon, 28 Mar 2011 13:49:03 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: REVISED: Variable length array to XML
Message-Id: <878vvzqgog.fsf@quad.sysarch.com>
>>>>> "DP" == Don Pich <dpich@polartel.com> writes:
DP> I've been messing with this on and off since my last post. Here is my
DP> current code (cleaned up thanks to advice in this board, and made shorter
DP> - Took out some unnecessary arrays):
DP> use Data::Dumper;
DP> my $infile = '/media/Docs/Scripts/Perl/Putty/TEST.csv';
DP> open (CSVFILE, $infile) || die ("Could not open $infile! $!");
DP> my @final = ();
DP> my @temp1 = ();
DP> my @temp2 = ();
the = () are not needed.
DP> while (my $line = <CSVFILE>) {
DP> $line =~ tr/"\r\n//d;
DP> $line =~ s/\\/:/g;
DP> (my $C1,my $C2,my $C3,my $C4) = split ',', $line;
DP> my @temp1 = split (':', $C2);
why do you redeclare @temp1? and temp is ALWAYS a bad name. choose a
name that reflects the actual data or usage of a variable.
DP> shift (@temp1);
no comments. you need to say WHY you are doing something like losing an
element from the array.
DP> my $tempcount = @temp1;
DP> for (my $a = 0 ; $a <= $tempcount-1; $a++) {
$a is reserved for use by sort. also a bad generic name which doesn't
say anything.
DP> push ( @temp2, $temp1[$a] );
all you do is copy temp1 to temp2!! that doesn't need a loop.
DP> }
DP> push ( @temp2, $C1 );
DP> push ( @temp2, $C4 );
DP> my $temp2count = @temp2;
DP> unshift ( @temp2, $temp2count );
with all the temps, this code is impossible to follow. there is no easy
way to tell if it is correct or what.
DP> push @final, [ @temp2 ];
since all you do is collect that array and its count:
push @final, [ scalar @temp2, @temp2 ];
you have a fetish for unneeded temp variables
DP> $temp2count = 0;
DP> @temp2 = ();
DP> @temp1 = ();
all unneeded since the first uses of them assign fresh values to them.
you never use any of the computed data. this doesn't DO anything.
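The points above about the copy loop, the counter and the names can be sketched together. The values below stand in for one parsed input line, and the names (@folders, $name, $address, @fields) are only suggestions:

```perl
use strict;
use warnings;

# Sample values standing in for one parsed CSV line.
my @folders = ('NOC-CO', 'AFC');    # path parts after dropping 'Sessions'
my ($name, $address) = ('ADMSND70AFC.01', '172.16.22.34');

# Arrays flatten inside a list, so copying needs no index loop (and no $a).
my @fields = (@folders, $name, $address);

# Store the count and the fields together, without a separate counter.
my @final;
push @final, [ scalar @fields, @fields ];

print join(',', @{ $final[0] }), "\n";
```

Since @fields is declared inside the loop body in real use, it starts empty on every iteration and nothing has to be reset by hand.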
DP> I'm having a hard time wrapping my head around populating the hash.
a clue. PICK NAMES THAT MEAN SOMETHING. if the xml template has useful
names, why not use those in the code? then if you Data::Dumper the hash
tree, you can compare that to the desired output and have a chance of
matching them up and fixing it.
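As an illustration of that comparison, a small sketch with a hand-built %session (the key names are taken from the desired results; the two Data::Dumper settings are optional and only make the dump easier to read):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-built sample; the key names mirror the tags in the desired output.
my %session = (
    'NOC-CO' => {
        AFC => { 'ADMSND70AFC.01' => '172.16.22.34' },
    },
);

$Data::Dumper::Sortkeys = 1;   # stable key order makes comparisons easier
$Data::Dumper::Indent   = 1;   # fixed, compact indentation

my $dump = Dumper(\%session);
print $dump;
```

Each level of nesting in the dump should line up with one level of tags in the desired output; a key that shows up at the wrong depth points at the line of code that populated it.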
also pick a templater to do the main work. look at Template::Simple
which can do this for you with the least amount of work on your part.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Sun, 27 Mar 2011 07:30:13 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: using File::Find
Message-Id: <86d3lcmya2.fsf@red.stonehenge.com>
>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:
Uri> and why do you like bareword file handles? name one advantage they have
Uri> over lexicals.
Less typing. For a one liner, I'd never use anything *but* a bareword,
and as short as possible, like F or B.
print "Just another Perl hacker,"; # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion
------------------------------
Date: Sun, 27 Mar 2011 18:27:42 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: using File::Find
Message-Id: <slrnioupbu.9a6.hjp-usenet2@hrunkner.hjp.at>
On 2011-03-27 14:30, Randal L. Schwartz <merlyn@stonehenge.com> wrote:
>>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:
>
>Uri> and why do you like bareword file handles? name one advantage they have
>Uri> over lexicals.
>
> Less typing. For a one liner, I'd never use anything *but* a bareword,
> and as short as possible, like F or B.
For one-liners most style rules don't apply.
hp
------------------------------
Date: Sun, 27 Mar 2011 13:03:02 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: using File::Find
Message-Id: <87r59sxzqx.fsf@quad.sysarch.com>
>>>>> "RLS" == Randal L Schwartz <merlyn@stonehenge.com> writes:
>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:
Uri> and why do you like bareword file handles? name one advantage they have
Uri> over lexicals.
RLS> Less typing. For a one liner, I'd never use anything *but* a bareword,
RLS> and as short as possible, like F or B.
not many one liners open files! and given one liners don't usually
enable strict, nor worry about closing files, the benefits of lexical
handles become moot. so this is one useful place for barewords.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Sun, 27 Mar 2011 23:44:59 +0000 (UTC)
From: Mladen Gogala <mgogala@no.address.invalid>
Subject: Re: using File::Find
Message-Id: <imoi5r$tqd$1@solani.org>
On Sat, 26 Mar 2011 23:17:52 -0400, Uri Guttman wrote:
> and why do you like bareword file handles?
Because I have used Perl since 1993, when it was still version 4, and I have a
bunch of old scripts for which perlcritic now tells me that they're badly
written. Compatibility is the reason. I use lexicals in my newer scripts
but every now and then someone runs one of my old scripts through
perlcritic and starts nagging. Simply, bareword file handles used to be
The Perl Way(TM) and I don't think it's right to now flag them as bad
programming.
--
http://mgogala.byethost5.com
------------------------------
Date: Sun, 27 Mar 2011 22:04:31 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: using File::Find
Message-Id: <87mxkguhjk.fsf@quad.sysarch.com>
>>>>> "MG" == Mladen Gogala <mgogala@no.address.invalid> writes:
MG> On Sat, 26 Mar 2011 23:17:52 -0400, Uri Guttman wrote:
>> and why do you like bareword file handles?
MG> Because I have used Perl since 1993, when it was still version 4, and I have a
MG> bunch of old scripts for which perlcritic now tells me that they're badly
MG> written. Compatibility is the reason. I use lexicals in my newer scripts
MG> but every now and then someone runs one of my old scripts through
MG> perlcritic and starts nagging. Simply, bareword file handles used to be
MG> The Perl Way(TM) and I don't think it's right to now flag them as bad
MG> programming.
and so what? rewrite them already. they are properly flagged in recent
perl 5 versions. no one cares about perl4 now (and i did tons of perl4
coding too but i didn't keep it alive forever).
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Mon, 28 Mar 2011 10:51:31 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: using File::Find
Message-Id: <slrnip0j0j.slo.hjp-usenet2@hrunkner.hjp.at>
On 2011-03-27 23:44, Mladen Gogala <mgogala@no.address.invalid> wrote:
> On Sat, 26 Mar 2011 23:17:52 -0400, Uri Guttman wrote:
>> and why do you like bareword file handles?
>
> Because I have used Perl since 1993, when it was still version 4, and I have a
> bunch of old scripts for which perlcritic now tells me that they're badly
> written. Compatibility is the reason.
Compatibility with Perl4? Who's still running Perl4 (Or Perl 5.005 -
lexical file handles were only introduced in 5.6)? And you are only
compatible if you don't use *any* newer feature - just avoiding one
particular feature doesn't make your scripts compatible.
> I use lexicals in my newer scripts but every now and then someone runs
> one of my old scripts through perlcritic and starts nagging.
There is a simple answer to those people:
"I like the idea. Send patches!"
They want something changed. They have the source. They can change it
themselves.
> Simply, bareword file handles used to be The Perl Way(TM) and I don't
> think it's right to now flag them as bad programming.
There are now better ways and bareword file handles really are bad
programming. They were bad programming in 1993, there just wasn't an
alternative (except using a different programming language).
hp
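The practical differences can be seen in a small sketch. It assumes only core Perl 5.8 or later (for the in-memory file), and count_lines() is a hypothetical helper, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines readable from whatever handle it
# is given. A lexical handle is an ordinary scalar, so it can be passed
# to a subroutine like any other value.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    return $count;
}

my $text = "one\ntwo\nthree\n";
my $lines;
{
    # Opening a reference to a scalar gives an in-memory file.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    $lines = count_lines($fh);
}   # $fh goes out of scope here, which closes the handle

print "$lines\n";
```

A bareword handle such as F is a package-wide global instead: any other code in the same package that opens F silently replaces it, and nothing closes it at the end of a block.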
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3336
***************************************