[19673] in Perl-Users-Digest
Perl-Users Digest, Issue: 1868 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Oct 4 06:05:32 2001
Date: Thu, 4 Oct 2001 03:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <1002189908-v10-i1868@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Thu, 4 Oct 2001 Volume: 10 Number: 1868
Today's topics:
Re: Accurate timing <iltzu@sci.invalid>
bigram statistical package by pedersen <joerivdv007@hotmail.com>
building perl - installing modules in Win32 <qvyht@removejippii.fi>
Re: building perl - installing modules in Win32 <Thomas@Baetzler.de>
Re: building perl - installing modules in Win32 <qvyht@removejippii.fi>
Re: Efficient code? <s.warhurst@rl.ac.uk>
Re: Efficient code? <s.warhurst@rl.ac.uk>
Re: Efficient code? <s.warhurst@rl.ac.uk>
Re: Efficient code? <s.warhurst@rl.ac.uk>
Re: Efficient code? <s.warhurst@rl.ac.uk>
Re: Efficient code? <s.warhurst@rl.ac.uk>
Re: filehandles between functions <iltzu@sci.invalid>
Re: func prototype enforcement with sub refs nobull@mail.com
Re: MAIN::PASSWD on true64 <goldbb2@earthlink.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 4 Oct 2001 09:54:41 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: Accurate timing
Message-Id: <1002188828.24925@itz.pp.sci.fi>
In article <slrn9rnavr.pp3.mgilfix@bofh.concordia.ca>, Michael Gilfix wrote:
>
> Hi Neb. You might want to try something like this:
>
> use Time::HiRes;
> my $mark = [ Time::HiRes::gettimeofday ];
> # ... Do cool stuff
> my $benchmark = Time::HiRes::tv_interval(
> $mark,
> [ Time::HiRes::gettimeofday ]
> );
> print "Elapsed time: $benchmark seconds\n";
>
> That'll give you floating point seconds.
Or, simpler yet:
use Time::HiRes qw( time );
my $time = -time;
# ... Do cool stuff
$time += time;
print "Elapsed time: $time seconds\n";
Or, using Time::Stopwatch (shameless plug):
use Time::Stopwatch;
tie my $time, 'Tie::Stopwatch';
# ... Do cool stuff;
print "Elapsed time: $time seconds\n";
For the last one, you still need Time::HiRes installed to get fractional
seconds.
--
Ilmari Karonen -- http://www.sci.fi/~iltzu/
"Get real! This is a discussion group, not a helpdesk. You post something,
we discuss its implications. If the discussion happens to answer a question
you've asked, that's incidental." -- nobull in comp.lang.perl.misc
------------------------------
Date: Thu, 04 Oct 2001 09:32:21 GMT
From: "fishy" <joerivdv007@hotmail.com>
Subject: bigram statistical package by pedersen
Message-Id: <F0Wu7.84814$6x5.18846076@afrodite.telenet-ops.be>
Hi, if anyone is familiar with the bigram statistical package by Pedersen,
maybe they can help me.
My problem is the following:
the count.pl offers one the possibility to count bigrams which are not
adjacent and for the user to set the distance, which is allowed between the
words. Suppose on sets the distance at 5, the programme will count bigrams
for a certain word up to 4 words to the right. E.g. The dog eats bones. The
programm will count the following bigrams :
(the, dog)
(the, eats)
(the, bones) and
(the, .)
I want to change the code so that the programme adds a weight to more
distant bigrams. For the example above, the bigram (the, eats) is a
distance3 bigram and should e.g. get the weight 0.5. So instead of adding 1
to the total of the bigrams containing (the, eats), it only adds 0.5, and so
on. I have altered the code, but it doesn't work. Can anyone tell me what
I'm doing wrong?
Here is the code:
#!/usr/local/bin/perl -w
# count.pl version 0.4
# Program to take one or more text files and calculate the bigram frequency
# for the whole corpus.
#
# Copyright (C) 2000-2001
# Ted Pedersen, University of Minnesota, Duluth
# tpederse@d.umn.edu
# Satanjeev Banerjee, University of Minnesota, Duluth
# bane0025@d.umn.edu
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
#---------------------------------------------------------------------------
--
# Start of program
#---------------------------------------------------------------------------
--
# have to be able to use unicode... so turn on unicode support...!
# use utf8;
# utf8 turned off by tdp since regexes are not using utf8 \p{} due
# to the poor efficiency that results in regex mataching
# we have to use commandline options, so use the necessary package!
use Getopt::Long;
# first check if no commandline options have been provided... in which case
# print out the usage notes!
if ( $#ARGV == -1 )
{
&minimalUsageNotes();
exit;
}
# now get the options!
GetOptions("verbose", "recurse", "version", "help", "histogram=s",
"frequency=i", "window=i", "stop=s", "newLine", "extended",
"token=s"
);
# set the variables according to what has been provided!
if ( defined $opt_verbose ) { $verbose = $opt_verbose; }
else { $verbose = 0; }
if ( defined $opt_recurse ) { $recurse = $opt_recurse; }
else { $recurse = 0; }
if ( defined $opt_version ) { $version = $opt_version; }
else { $version = 0; }
if ( defined $opt_help ) { $help = $opt_help; }
else { $help = 0; }
if ( defined $opt_frequency ) { $cutOff = $opt_frequency; }
else { $cutOff = 0; }
if ( defined $opt_histogram )
{
$enableHistogram = 1;
$histFile = $opt_histogram;
}
else { $enableHistogram = 0; }
if ( defined $opt_window ) { $windowSize = $opt_window; }
else { $windowSize = 2; }
if ( defined $opt_stop )
{
$stopList = $opt_stop;
if (!(-e $stopList ))
{
print "Cant find stoplist file $stopList.\n";
askHelp();
exit;
}
}
# now starts the fun...!
# if help has been requested, print out help!
if ( $help )
{
&showHelp();
exit;
}
# if version has been requested, show version!
if ( $version )
{
&showVersion();
exit;
}
# check if tokens file has been supplied. if so, try to open it and extract
the
# regex's.
if ( defined $opt_token )
{
if ( !( -e $opt_token))
{
print "Cant find token definition file $opt_token.\n";
askHelp();
exit;
}
open TOKEN, $opt_token || die "Couldnt open $opt_token\n";
while(<TOKEN>)
{
chomp;
s/^\s*//g;
s/\s*$//g;
if (length($_) <= 0) { next; }
if (!(/^\//) || !(/\/$/))
{
print STDERR "Ignoring regex with no delimiters: $_\n";
next;
}
s/^\///;
s/\/$//;
push @tokenRegex, $_;
}
close TOKEN;
}
else
{
push @tokenRegex, "\\p{IsWord}+";
push @tokenRegex, "\\p{IsPunct}";
}
# create the complete token regex
$tokenizerRegex = "";
foreach $token (@tokenRegex)
{
if ( length($tokenizerRegex) > 0 )
{
$tokenizerRegex .= "|";
}
$tokenizerRegex .= "(";
$tokenizerRegex .= $token;
$tokenizerRegex .= ")";
}
# if you dont have any tokens to work with, abort
if ( $#tokenRegex < 0 )
{
print "No token definitions to work with.\n";
askHelp();
exit;
}
# get/set the maximum token length
# $maxTokenLen = (defined $opt_maxTokenLen)?($opt_maxTokenLen):(100);
# having stripped the commandline of all the options et al, we should now be
# left only with the source/destination files
# so, first get hold of the destination file!
$destination = shift;
# check to see if a destination has been supplied at all...
if ( !($destination ) )
{
print "No file supplied. ";
askHelp();
exit;
}
# check to see if destination exists, and if so, if we should overwrite...
if ( -e $destination )
{
print "File $destination exists! Overwrite (Y/N)? ";
$reply = <STDIN>;
chomp $reply;
$reply = uc $reply;
exit 0 if ($reply ne "Y");
}
# having ascertained that we may open the destination file for output, lets
do
# so...
open ( DST, ">$destination" ) || die "couldnt open $destination";
# whats left in the command line are paths. go thru them and salvage all
# text files to be processed. the following function does just that, putting
# all useful files in @sourceFiles :o)
&getSourceFiles(@ARGV);
# output the files found, if verbose set!
if ( $verbose )
{
print "\nFound the following $index file(s) to source from: \n";
for ( $i = 0; $i < $index; $i ++ ) { print "$sourceFiles[$i]\n"; }
print "\n";
}
# set bigramTotal, which is the variable where we'll get the sample size.
$bigramTotal = 0;
# now get the source files one by one from @sourceFiles, and process them in
a
# loop!
foreach $source (@sourceFiles)
{
# we already know that the file exists... that is checked by
# &getSourceFiles, so no need to check it again! so just open the source
# file!!
open( SRC, "$source" ) || die "Coudlnt open $source, quitting";
# having successfully opened the source file start reading it...
if ( $verbose ) { print "Accessing file $source.\n"; }
# start off the window index which will tell us where in the window array
# we are right now!!!
$windex = 0; # the NEXT place in the array to write to!
# read in the file, tokenize and process the tokens all in one fell swoop
# fell because of the (1) below :)
while (<>)
{
if ( defined $opt_newLine )
{
$opt_newLine = 1;
$windex = 0;
}
while ( /$tokenizerRegex/g )
{
$token = $&;
processToken($token);
}
}
}
# that is the tokenizing and token-processing done!
# now to put in the stop list, if its been provided
if ( defined $stopList )
{
# we have already checked that the stop list exists... so go ahead and open
it!
open ( STP, $stopList ) ||
die ("Couldnt open the stoplist file $stopList\n");
$stopString = "";
while ( <STP> )
{
chomp;
s/^\s+//;
s/\s+$//;
if ( /^\// && /\/$/ )
{
s/^\///;
s/\/$//;
if ( !($stopString eq "") )
{
$stopString .= "|";
}
$stopString .= "(" . $_ . ")";
}
}
close STP;
# having got the file, go thru the pair hash, getting rid of the
# offending bigrams
foreach ( keys %pairFreq )
{
if ( $_ =~ /<>/ )
{
$tempArray[0] = $`;
$tempArray[1] = $';
}
else
{
print STDERR "Fatal internal error! Aborting!\n";
exit;
}
if ( ( $tempArray[0] =~ /^($stopString)$/ ) && ( $tempArray[1] =~
/^($stopString)$/ ) )
{
# reduce bigram total by the frequency of this bigram
$bigramTotal -= $pairFreq{$_};
# reduce left and right counts (and delete keys if
# they become zero) by the frequency of this bigram
$leftFreq{$tempArray[0]} -= $pairFreq{$_};
if ( $leftFreq{$tempArray[0]} <= 0 )
{
delete $leftFreq{$tempArray[0]};
}
$rightFreq{$tempArray[1]} -= $pairFreq{$_};
if ( $rightFreq{$tempArray[1]} <= 0 )
{
delete $rightFreq{$tempArray[1]};
}
# remove this bigram!
delete $pairFreq{$_};
}
}
}
# now to introduce frequency cut off's!
foreach ( keys %pairFreq )
{
if ( $pairFreq{$_} < $cutOff )
{
$bigramTotal -= $pairFreq{$_};
if ( $_ =~ /<>/ )
{
$tempArray[0] = $`;
$tempArray[1] = $';
}
else
{
print STDERR "Fatal internal error! Aborting!\n";
exit;
}
$leftFreq{$tempArray[0]} -= $pairFreq{$_};
if ( $leftFreq{$tempArray[0]} <= 0 )
{
delete $leftFreq{$tempArray[0]};
}
$rightFreq{$tempArray[1]} -= $pairFreq{$_};
if ( $rightFreq{$tempArray[1]} <= 0 )
{
delete $rightFreq{$tempArray[1]};
}
# remove this bigram!
delete $pairFreq{$_};
}
}
# end of processing all the files. now to write out the information.
if ( $verbose ) { print "Writing to $destination.\n"; }
if ( defined $opt_extended )
{
$opt_extended *= 1;
# print out the window size used
print DST "\@count.WindowSize=$windowSize\n";
# print out the frequency cut off used
print DST "\@count.FrequencyCut=$cutOff\n";
}
# finally print out the total bigrams
print DST "$bigramTotal\n";
foreach ( sort { $pairFreq{$b} <=> $pairFreq{$a} } keys %pairFreq )
{
if ( $_ =~ /<>/ )
{
$tempArray[0] = $`;
$tempArray[1] = $';
}
else
{
print STDERR "Fatal internal error! Aborting!\n";
exit;
}
# bit stuffing... if a line starts with a single @, its a command (extended
# output). if it starts with two consequtive @'s, then its a single
'literal' @.
$WeighedTotal = $pairfreq + $Dist3FreqWeight + $Dist4FreqWeight +
$Dist5FreqWeight;
if ( $_ =~ /^@/ ) { print DST "@"; }
print DST "$__"; # bigram
print DST "$pairFreq{$_} "; # bigram freq
print DST "$Dist3FreqWeight{$_} ";
print DST "$Dist4FreqWeight{$_} ";
print DST "$Dist5FreqWeight{$_} ";
print DST "$WeighedTotal{$_} ";
print DST "$leftFreq{$tempArray[0]} "; # single freq of first word
print DST "$rightFreq{$tempArray[1]}\n"; # single freq of second word
}
foreach ( sort { $pairDist3Freq{$b} <=> $pairDist3Freq{$a} } keys
%pairDist3Freq )
{
if ( $_ =~ /<>/ )
{
$tempArray[0] = $`;
$tempArray[1] = $';
}
else
{
print STDERR "Fatal internal error! Aborting!\n";
exit;
}
# bit stuffing... if a line starts with a single @, its a command (extended
# output). if it starts with two consequtive @'s, then its a single
'literal' @.
$WeighedTotal = $pairfreq + $Dist3FreqWeight + $Dist4FreqWeight +
$Dist5FreqWeight;
if ( $_ =~ /^@/ ) { print DST "@"; }
print DST "$__"; # bigram
print DST "$pairFreq{$_} "; # bigram freq
print DST "$Dist3FreqWeight{$_} ";
print DST "$Dist4FreqWeight{$_} ";
print DST "$Dist5FreqWeight{$_} ";
print DST "$WeighedTotal{$_} ";
print DST "$leftFreq{$tempArray[0]} "; # single freq of first word
print DST "$rightFreq{$tempArray[1]}\n"; # single freq of second word
}
foreach ( sort { $pairDist4Freq{$b} <=> $pairDist4Freq{$a} } keys
%pairDist4Freq )
{
if ( $_ =~ /<>/ )
{
$tempArray[0] = $`;
$tempArray[1] = $';
}
else
{
print STDERR "Fatal internal error! Aborting!\n";
exit;
}
# bit stuffing... if a line starts with a single @, its a command (extended
# output). if it starts with two consequtive @'s, then its a single
'literal' @.
$WeighedTotal = $pairfreq + $Dist3FreqWeight + $Dist4FreqWeight +
$Dist5FreqWeight;
if ( $_ =~ /^@/ ) { print DST "@"; }
print DST "$__"; # bigram
print DST "$pairFreq{$_} "; # bigram freq
print DST "$Dist3FreqWeight{$_} ";
print DST "$Dist4FreqWeight{$_} ";
print DST "$Dist5FreqWeight{$_} ";
print DST "$WeighedTotal{$_} ";
print DST "$leftFreq{$tempArray[0]} "; # single freq of first word
print DST "$rightFreq{$tempArray[1]}\n"; # single freq of second word
}
foreach ( sort { $pairDist5Freq{$b} <=> $pairDist5Freq{$a} } keys
%pairDist5Freq )
{
if ( $_ =~ /<>/ )
{
$tempArray[0] = $`;
$tempArray[1] = $';
}
else
{
print STDERR "Fatal internal error! Aborting!\n";
exit;
}
# bit stuffing... if a line starts with a single @, its a command (extended
# output). if it starts with two consequtive @'s, then its a single
'literal' @.
$WeighedTotal = $pairfreq + $Dist3FreqWeight + $Dist4FreqWeight +
$Dist5FreqWeight;
if ( $_ =~ /^@/ ) { print DST "@"; }
print DST "$__"; # bigram
print DST "$pairFreq{$_} "; # bigram freq
print DST "$Dist3FreqWeight{$_} ";
print DST "$Dist4FreqWeight{$_} ";
print DST "$Dist5FreqWeight{$_} ";
print DST "$WeighedTotal{$_} ";
print DST "$leftFreq{$tempArray[0]} "; # single freq of first word
print DST "$rightFreq{$tempArray[1]}\n"; # single freq of second word
}
# having done it all, close all open files...
close SRC;
close DST;
# now check to see if we need to output the histogram... if not leave!
if ( !($enableHistogram) ) { exit; }
# now see if the histogram file exists, and if so if we can overwrite it...
if ( -e $histFile )
{
print "File $histFile exists! Overwrite (Y/N)? ";
$reply = <STDIN>;
chomp $reply;
$reply = uc $reply;
exit 0 if ($reply ne "Y");
}
# having ascertained that we may open the histogram file for output, lets do
# so...
open ( HST, ">$histFile" ) || die "couldnt open $histFile";
# now to construct the histogram hash...
$maxFreq = 0;
foreach ( keys %pairFreq )
{
$histogram{$pairFreq{$_}}++;
if ( $pairFreq{$_} > $maxFreq ) { $maxFreq = $pairFreq{$_}; }
}
# having done that, lets print out to the hashFile...
print HST "Total bigrams = $bigramTotal\n";
for ( $i = 1; $i <= $maxFreq; $i++ )
{
if ( exists $histogram{$i} )
{
printf HST "Number of bigrams that occurred %3d time(s) = %5d (%.2f
percent)\n", $i, $histogram{$i}, ($histogram{$i}*$i*100)/$bigramTotal;
}
}
close HST;
# ... and thats it! :o)
#---------------------------------------------------------------------------
--
# User Defined Function Definitions
#---------------------------------------------------------------------------
--
# function to process tokens
sub processToken
{
my $token = shift;
# first put the word into the window array!
$window[$windex] = $token;
# if this is the first word of the corpus, just keep going!
if ( $windex == 0 )
{
$windex++;
return;
}
# otherwise, create the bigrams! the word that's just come in will
# go in $collocation[1], while we will successively put the rest
# of the array into $collocation[0] to create all the bigrams one
# by one!
$collocation[1] = $window[$windex];
for ( $i = 0; $i < $windex; $i++ )
{
$collocation[0] = $window[$i];
# create the bigram string in $pair...
$pair = $collocation[0] . "_" . $collocation[1];
# take into account how far apart the two words are
# when forming the bigram and multiply that count
# with a weighing factor
if ($windex == 2) {
# increment the total bigram count.
$bigramTotal++;
# increment the frequency count of this bigram
$pairFreq{$pair}++;
}
if ($windex == 3) {
$bigramTotal++;
# increment the frequency count of this bigram
$pairDist3Freq{$pair}++;
# multiply the frequency of this bigram with the weight
# for distance 3 bigrams
$Dist3FreqWeight = $pairDist3Freq * 0.5;
}
if ($windex == 4) {
$bigramTotal++;
# increment the frequency count of this bigram
$pairDist4Freq{$pair}++;
# multiply the frequency of this bigram with the weight
# for distance 4 bigrams
$Dist4FreqWeight = $pairDist4Freq * 0.25;
}
if ($windex == 5) {
$bigramTotal++;
# increment the frequency count of this bigram
$pairDist5Freq{$pair}++;
# multiply the frequency of this bigram with the weight
# for distance 5 bigrams
$Dist5FreqWeight = $pairDist5Freq * 0.125;
}
# increment the left count and right counts of the two words
# that make up the bigram
$leftFreq{$collocation[0]}++;
$rightFreq{$collocation[1]}++;
# $pairFreq{$pair}++;
}
# having dealt with all the bigrams, increment the windex, if less
# than the size, or shift out the first element of the array to
# make place for the next word thats coming in!
# if ( $windex < $windowSize - 1 ) { $windex++; }
# else { shift @window; }
}
# Function &getSourceFiles: function to take the command tail and
# return an array of text files to be used to count! while going thru the
# command line do the following processing:
#
# 1> if the string is a text file and can be opened, add it to the array.
# 2> if the string is a directory name, find all text files in that
directory,
# and append to array.
# 3> if the -r (recursive) option is set, go into all subdirectories of that
# directory too, to do the above!
sub getSourceFiles
{
# get the next commandline string...
my $nextString = shift;
$index = 0;
while ( $nextString )
{
if ( !( -e $nextString ) )
{
# file doesn't exist... ignore!
if ( $verbose ) { print "File $nextString does not exist!\n"; }
$nextString = shift;
next;
}
if ( !( -r $nextString ) )
{
# file can't be read... ignore!
if ( $verbose ) { print "File $nextString cant be read!\n"; }
$nextString = shift;
next;
}
if ( -d $nextString )
{
# this is a directory, go and search this directory for text files
&directorySearch( $nextString );
$nextString = shift;
next;
}
if ( !( -T $nextString ) )
{
# file is not a text file... ignore!
if ( $verbose ) { print "$nextString is not a text file!\n"; }
$nextString = shift;
next;
}
$sourceFiles[$index] = $nextString;
$index++;
$nextString = shift;
}
}
# function to (possibly recursively) search inside the given directory for
# text files
sub directorySearch
{
my $directory = $_[0];
opendir DIR, $directory || "Couldnt open directory $directory!\n";
my @files = grep !/^\./, readdir DIR;
@files = map "$directory/$_", @files;
closedir DIR;
my $file = "";
foreach $file (@files)
{
if ( ( -d $file ) && ( $recurse ) ) { &directorySearch($file); }
if ( ( -T $file ) )
{
$sourceFiles[$index] = $file;
$index++;
}
}
}
# function to output a minimal usage note when the user has not provided any
# commandline options
sub minimalUsageNotes
{
print "Usage: count.pl [OPTIONS] DESTINATION SOURCE [[, SOURCE] ...]\n";
askHelp();
}
# function to output help messages for this program
sub showHelp
{
print "Usage: count.pl [OPTIONS] DESTINATION SOURCE [[, SOURCE] ...]\n\n";
print "Counts the frequency of bigrams occurring in SOURCE and outputs
results to
DESTINATION. SOURCE may be a file or a directory. If a directory is
specified
as SOURCE, all text files in that directory (and all sub directories, if
--recurse is set) are counted. Optionally outputs a histogram to FILE (see
option --histogram below).\n\n";
print "OPTIONS:\n\n";
print " --verbose Switches to verbose mode.\n\n";
print " --version Prints the version number.\n\n";
print " --help Prints this help message.\n\n";
print " --recurse Recursively searches thru all directories mentioned
in the commandline for text files to count from.\n\n";
print " --frequency N Ignores all bigrams of frequency < N.\n\n";
print " --histogram FILE Outputs histogram to FILE. Tabulates how many times
bigrams of a given frequency have occured.\n\n";
print " --window N Sets window size to N. N = 2 by default.\n\n";
print " --stop FILE Removes all bigrams made up entirely of words from
FILE. Words in FILE should be separated by new line.\n\n";
print " --newLine Prevents bigrams from spanning across new-line\n";
print " characters.\n\n";
print " --token FILE Picks up regular expressions from FILE to use to\n";
print " tokenize the input text file. By default two\n";
print " regular expressions are provided.\n\n";
print " --extended Outputs chosen window size and frequency cut-off in
\"extended\" format. Warning: Extended output is not readable
by BSP version <= 0.3.\n\n";
}
# function to output the version number
sub showVersion
{
print "count.pl - Version 0.4\n";
print "A component of the BSP Version 0.4\n";
print "Copyright (C) 2000-2001, Ted Pedersen & Satanjeev Banerjee\n";
}
# function to output "ask for help" message when the user's goofed up!
sub askHelp
{
print "Type count.pl --help for help.\n";
}
------------------------------
Date: Thu, 04 Oct 2001 08:21:41 GMT
From: Hessu <qvyht@removejippii.fi>
Subject: building perl - installing modules in Win32
Message-Id: <3BBC1ACD.5EC25E99@removejippii.fi>
1) ok, I need a compiler but have it to be VC++?
(cannot afford it!)
Can I use Borland's free stuff or what comes
with Cygwin!?
2) I downloaded ActiveState, in Installer.bat
there several instances of string "ActivePerl\Perl\"
i guess it means "c:\ActivePerl\Perl\"
can I replace them with "MyDir\Perl\" (meaning "c:\MyDir\Perl\")?
-and what is "ActivePerl\Perl\bin\perl" then? A mistyping?!
------------------------------
Date: Thu, 04 Oct 2001 11:27:21 +0200
From: =?ISO-8859-1?Q?Thomas_B=E4tzler?= <Thomas@Baetzler.de>
Subject: Re: building perl - installing modules in Win32
Message-Id: <7kaorto606kiuhfmsbpke41p81rjo22cof@4ax.com>
On Thu, 04 Oct 2001 08:21:41 GMT, Hessu <qvyht@removejippii.fi> wrote:
>1) ok, I need a compiler but have it to be VC++?
>(cannot afford it!)
>Can I use Borland's free stuff or what comes
>with Cygwin!?
Only if you roull your own Perl from the sources. Apart from that, the
modules will have to be compield with the same compiler as the Perl
core.
Now, the Good News is that not every module has stuff that needs to be
compiled. Perl-only modules just need a working nmake. Check the FAQ on
the ActiveState site for pointers to an unbundled nmake.
>2) I downloaded ActiveState, in Installer.bat
[...]
Download the MSI package and and the suitable MSInst for your system.
There are problems with the non-MSI package.
HTH,
--
use strict;my($i,$t,@r)=(0,'5 -.@BHJPT4acd6e2hk2lmn2o4r2s3tuz',map{ord}
split//,unpack('u*','L#`T&)QD5#0`#!!`#%1D)#08`#P05!!(3``$$"``#"0L&``('.
'"`P<!`````0$`'));$t=~s/(\d)(.)/$2x$1/eg;map{$t.=substr$t,$i,1,''while
$_--;$i++}@r;print"$t\n";# Thomas@Baetzler.de - http://baetzler.de/perl
------------------------------
Date: Thu, 04 Oct 2001 09:48:10 GMT
From: Hessu <qvyht@removejippii.fi>
Subject: Re: building perl - installing modules in Win32
Message-Id: <3BBC2F15.BD5E8A95@removejippii.fi>
Thomas Bätzler wrote:
>
> On Thu, 04 Oct 2001 08:21:41 GMT, Hessu <qvyht@removejippii.fi> wrote:
> >1) ok, I need a compiler but have it to be VC++?
> >(cannot afford it!)
> >Can I use Borland's free stuff or what comes
> >with Cygwin!?
>
> Only if you roull your own Perl from the sources. Apart from that, the
> modules will have to be compield with the same compiler as the Perl
> core.
>
> Now, the Good News is that not every module has stuff that needs to be
> compiled. Perl-only modules just need a working nmake. Check the FAQ on
> the ActiveState site for pointers to an unbundled nmake.
>
> >2) I downloaded ActiveState, in Installer.bat
> [...]
>
> Download the MSI package and and the suitable MSInst for your system.
> There are problems with the non-MSI package.
>
> HTH,
> --
> use strict;my($i,$t,@r)=(0,'5 -.@BHJPT4acd6e2hk2lmn2o4r2s3tuz',map{ord}
> split//,unpack('u*','L#`T&)QD5#0`#!!`#%1D)#08`#P05!!(3``$$"``#"0L&``('.
> '"`P<!`````0$`'));$t=~s/(\d)(.)/$2x$1/eg;map{$t.=substr$t,$i,1,''while
> $_--;$i++}@r;print"$t\n";# Thomas@Baetzler.de - http://baetzler.de/perl
Done.Thanks!
------------------------------
Date: Thu, 4 Oct 2001 09:31:50 +0100
From: "S Warhurst" <s.warhurst@rl.ac.uk>
Subject: Re: Efficient code?
Message-Id: <9ph6pn$lsm@newton.cc.rl.ac.uk>
"Steffen Müller" <tsee@gmx.net> wrote in message
news:9pd45r$275$02$1@news.t-online.com...
> > @emails = ('john@here.com', 'tim@there.com', 'bill@anisp.com',
> > 'tim@there.com', 'john@here.com')
> >
> > and wanted to make them unique I would use the line:
> >
> > @emails = do {my %h; grep {!$h {$_} ++} @emails}
>
> undef %saw;
> @out = grep(!$saw{$_}++, @in);
That's a very neat little bit of code :) A couple of questions spring to
mind though:
1) What if the strings in the array being sorted conatined other
non-alphanumeric characters? In other words, can the hash keys contains ANY
character? I seem to remember I had problems when I had a hash key starting
with an underscore (eg: _lake_). But what about regexps characters? Could a
string like "abc$^(/\)efg" be used as a hash key?
2) With the above code, the hash key = email address & hash value = count.
The @out variable only contains the email addresses. Is there an easy way of
transferring the cotents of the hash into a multi-dimensional array? The
reason I might want to do this is if I want to add a 3rd column to the data.
Thanks
Spencer
------------------------------
Date: Thu, 4 Oct 2001 09:47:18 +0100
From: "S Warhurst" <s.warhurst@rl.ac.uk>
Subject: Re: Efficient code?
Message-Id: <9ph7mo$tg2@newton.cc.rl.ac.uk>
"S Warhurst" <s.warhurst@rl.ac.uk> wrote in message
news:9ph6pn$lsm@newton.cc.rl.ac.uk...
> 2) With the above code, the hash key = email address & hash value = count.
> The @out variable only contains the email addresses. Is there an easy way
of
> transferring the cotents of the hash into a multi-dimensional array? The
> reason I might want to do this is if I want to add a 3rd column to the
data.
Actually, ignore that question.. David answered that below :)
Spencer
------------------------------
Date: Thu, 4 Oct 2001 09:55:07 +0100
From: "S Warhurst" <s.warhurst@rl.ac.uk>
Subject: Re: Efficient code?
Message-Id: <9ph85c$1754@newton.cc.rl.ac.uk>
"David Wall" <darkon@one.net> wrote in message
news:Xns912EA25175D93darkononenet@207.126.101.97...
> Steffen Müller already answered this. I just want to add: Whenever you
> want something unique, think hash. It will make your life much easier
> than trying to do it with with arrays.
There's a couple of things that made me a little hesitant to use a hash -
pls see my reply to Steffen above.
> my @array = (['bath.ac.uk', '46'],
> ['blackpool.ac.uk', '22'],
> ['hull.ac.uk', '13'],
> ['sussex.ac.uk', '36'],
> ['hull.ac.uk', '31'],
> ['blackpool.ac.uk', '2']);
> my %domains;
> foreach my $ref (@array) {
> $domains{$ref->[0]} += $ref->[1];
> }
You've just caused the penny to drop with regard to accessing columns in
multidimensional arrays, ie: "$ref->[1]" etc. Before, I would have gone to
rather long-winded lengths to split up $ref into another temp array to
access the column values :)
> # now %domains has all the info you need
> # if you want it back in @array:
>
> undef @array; # get rid of the old data in @array
> foreach my $key (sort keys %domains) {
> push @array, [ $key, $domains{$key} ];
> }
>
> Do you have a copy of the _Perl_Cookbook_? It has all sorts of examples
> of how to do things with Perl. Using the FAQ and the cookbook, you can
> learn how to do do all sorts of useful and interesting things.
I haven't.. I got the impression from a review I read on it that it covered
more in-depth scenarios, rather than giving a load of useful one-liners.
Just looked at some more reviews though and it does look good. I'll plug the
boss to get hold of it ;)
Many thanks for your help.
Spencer
------------------------------
Date: Thu, 4 Oct 2001 10:01:33 +0100
From: "S Warhurst" <s.warhurst@rl.ac.uk>
Subject: Re: Efficient code?
Message-Id: <9ph8he$12re@newton.cc.rl.ac.uk>
"Logan Shaw" <logan@cs.utexas.edu> wrote in message
news:9pe4na$b27$1@charity.cs.utexas.edu...
> Another way you can do this is to quote the regular expressions using
> "qr". I think this will be faster since they won't have to keep
> getting rebuilt into finite state machines[1]. Doing that instead
> would result in code like this:
>
> my @list = ( qr/string1/, qr/string2/, qr/"[^"]+"/, qr/(this|that)
stuff/ );
>
> my $line;
> while ($line = <DATA>)
> {
> foreach my $regex (@list)
> {
> print "$regex matched at $.\n" if $line =~ $regex;
> }
> }
I hadn't heard of the qr function before. It isn't in the reference guide I
use.. and after a bit of searching only seems to appear in about half of the
online reference guides. I note there are other ones aswell like qq & qx.
Is there a complete reference guide you can buy in book form that gives ALL
the perl 5.6 functions & operators etc, that gives a detailed descriptions
of each one and an example of how it is most commonly used?
Thanks
Spencer
PS: I wont even ask what finite state machines are ;)
------------------------------
Date: Thu, 4 Oct 2001 10:25:44 +0100
From: "S Warhurst" <s.warhurst@rl.ac.uk>
Subject: Re: Efficient code?
Message-Id: <9ph9up$19i0@newton.cc.rl.ac.uk>
"Bart Lateur" <bart.lateur@skynet.be> wrote in message
news:tb9krtc20kr9uqii0jaes1uc60kabnvbuo@4ax.com...
> Eh... Are you trying to find if there is at least one that matches? I
> think I'd try to combine all the tests in one regex, using "|":
>
> /I like (?:cornflakes|honeypops|muesli) for breakfast/
>
> That's three of them.
>
> If it's something else you want... I don't know. Be more specific.
That was just to say that some of the things I was searching for required
regular expression matching, since I thought that someone might suggest
storing the 50 strings in an array & looping through those for each line of
the file, but I didn't know how to go about doing regexp matching using that
method. Anyway, David & Logan have answered that one, thanks.
> Just sum them in a hash.
>
> my %total;
> $total{$_->[0]} += $_->[1] foreach @array;
> @table = map { [ $_ => $table{$_} ] } sort keys %total;
That last line in particular is very revealing and gives a one-line way of
turning a hash back into a table array. I'm going to save this thread to
disk.. it's an invaluable reference in itself :)
Thanks
Spencer
------------------------------
Date: Thu, 4 Oct 2001 10:44:03 +0100
From: "S Warhurst" <s.warhurst@rl.ac.uk>
Subject: Re: Efficient code?
Message-Id: <9phb14$15ie@newton.cc.rl.ac.uk>
"Joe Schaefer" <joe+usenet@sunstarsys.com> wrote in message
news:m33d51n4rw.fsf@mumonkan.sunstarsys.com...
<snip>
> The line marked XXX might be accelerated by first inverting @line_break
> outside the loops:
>
> my %line_num = map {$line_break[$_] => $_} 0..$#line_break;
>
> # just in case last char isn't "\n":
> $line_num{ -1 } = $line_num{ $line_break[-1] };
>
> and replacing the line below the XXX with
>
> $n = $line_num{ index($_, "\n", pos) };
>
> but I haven't tested this either, so YMMV.
Hmmm.. that's quite a different way of doing it than otherwise suggested. I
can imagine how it might speed things up though. I'll have to do some
benchmarking on this when I get more time. Thanks for writing all that out..
this thread is all going into a reference file I'm keeping :)
Regards
Spencer
------------------------------
Date: 4 Oct 2001 09:42:31 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: filehandles between functions
Message-Id: <1002187942.23902@itz.pp.sci.fi>
[N.B.: This is a superseded version of an earlier article, which
erraneously contained "-s $fh" intead of "-f $fh". This sparked a lot
of discussion about the behavior of -s on pipes, which can be summed as
"it's unreliable, and in any case not what is wanted here". Note that
the purpose is indeed to slurp everything from a filehandle until EOF,
not merely to read one bufferful. I am superseding this article to
avoid the possibility of anyone later cut-and-pasting the buggy code.]
In article <x74rph9xvo.fsf@home.sysarch.com>, Uri Guttman wrote:
>>>>>> "GW" == Garry Williams <garry@ifr.zvolve.net> writes:
>
> GW> sub read_file {
> GW> open( my $fh, $infile ) || die "can't open '$infile': $!\n";
> GW> chomp( my @lines = <$fh> );
> GW> return join "", @lines;
> GW> }
>
>if you want to slurp a file, this is a useful sub
>
>sub read_file {
> my( $file_name ) = shift ;
> my( $buf ) ;
> local( *FH ) ;
> open( FH, $file_name ) || carp "can't open $file_name $!" ;
> return <FH> if wantarray ;
> read( FH, $buf, -s FH ) ;
> return $buf ;
>
># older slow way
># local( $/ ) unless wantarray ;
># <FH>
>}
Another variant, which works for pipes and other funny things too:
sub slurp ($) {
my $fh = shift;
return <$fh> if wantarray;
local $/;
return <$fh> unless -f $fh; # bug fixed in this version
read $fh, my $buf, -s _;
return $buf;
}
This one takes a filehandle, not a filename. If you don't want to open
the file yourself, just add a wrapper:
sub read_file ($) {
my $file = shift;
local *FH;
open FH, $file or do {
my $err = $!; # require() may reset $!
require Carp;
Carp::confess("Can't open '$file': $err\n");
};
return slurp(\*FH);
}
--
Ilmari Karonen -- http://www.sci.fi/~iltzu/
"Get real! This is a discussion group, not a helpdesk. You post something,
we discuss its implications. If the discussion happens to answer a question
you've asked, that's incidental." -- nobull in comp.lang.perl.misc
------------------------------
Date: 04 Oct 2001 08:49:19 +0100
From: nobull@mail.com
Subject: Re: func prototype enforcement with sub refs
Message-Id: <u9bsjnzs45.fsf@wcl-l.bham.ac.uk>
paul.a.balyoz@intel.com (Paul Balyoz) writes:
> How do you enforce function prototypes when you call a subroutine
> from a reference to it? It ignores my prototype.
Prototype handling in Perl is done at compile time and Perl5 doesn't
have typed function pointers like, say, C++ or Delphi. So the simple
answer is you can't honour prototypes on indirect function invoctions
and method calls.
> use strict;
> sub a($) {
> print @_;
> }
> $b = \&a;
> &$b(1,2,3,4);
You can call $b indirectly with a predetermined prototype and even
make Perl barf at runtime if &$b has the wrong prototype thus:
use strict;
use warnings;
sub my_proto ($);
sub a($) {
print @_;
}
$b = \&a;
{
no warnings 'redefine';
use warnings FATAL=>'prototype';
*my_proto = $b;
my_proto(1,2,3,4); # Compile-time error
}
The only way to have the prototype change at runtime would be to use
eval but that solution would almost certainly be orders of magniture
worse than the problem.
> $b = \&a;
>
> Is there any way to get a reference to a sub without using the "&" syntax?
$b = *a{CODE};
> That would probably fix it....
No it would make no difference.
> In Perl 5 when you call a sub with & in front, it uses some kind of
> Perl 4 calling mode (ignores the prototype)
True.
> which I think is biting me here.
Wrong. You still get the same semantics if you do an indirect call
using -> syntax.
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
------------------------------
Date: Thu, 04 Oct 2001 03:20:01 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: MAIN::PASSWD on true64
Message-Id: <3BBC0DA1.7E7970D3@earthlink.net>
Tony wrote:
>
> Hi All,
>
> I'm new to perl and trying some scripts from a book, the following
> code gives me this error;
>
> Name "main::passwd" used only once: possible typo at /scripts/whatgrp
> line 28.
>
> Platform;
> Alpha - True64 4.05
>
> Code;
>
#!/usr/local/bin/perl -w
use strict;
> my @groupNames = ();
> my @groupIds = ();
>
> # Do this until there are no more records.
> ***** offending line below ****
> while (($groupName, $passwd, $gid, $memberList) = getgrent())
while( my ($groupName, $gid, $memberlist) = (getgrent)[0,2,3] )
or:
while( my ($groupName, undef, $gid, $memberlist) = getgrent )
> {
> # We need to look for the login id in the member list.
> if ($memberList =~ /$LoginId/)
if( $memberList =~ m[(?!<\w)\Q$LoginID\E(?!\w)] )
Otherwise, if $LoginID happens to be a substring of a valid member, you
get a true when it should be false. A better way might be:
my %memberList = map { $_ => 1 } split ' ', $memberList;
if( $memberList{$LoginID} )
> {
> # Add the group name ti the group name array.
> push (@groupNames, $groupName);
>
> # Add the GID to the GID array.
> push (@groupIds, $gid);
> }
> }
>
> Any thoughs would be appreciated, as the script completes successfully
> otherwise, my guess is that Digital I mean Compaq I mean HP i thinks,
> implementation of getgrent returning differently nut the man doco just
> states the 4 paramaters as they are sequenced above.
The problem is that you are assigning to a global variable $passwd, and
never using it anywhere else in your script. Either get rid of that
variable, or declare it before using it.
--
"I think not," said Descartes, and promptly disappeared.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 1868
***************************************