[23942] in Perl-Users-Digest
Perl-Users Digest, Issue: 6143 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Feb 16 18:06:04 2004
Date: Mon, 16 Feb 2004 15:05:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 16 Feb 2004 Volume: 10 Number: 6143
Today's topics:
ANNOUNCE: Data::Mining::AssociationRules (Dan Frankowski)
Re: Confused by hashes/data structures (Ian Petts)
Re: Confused by hashes/data structures <ebohlman@earthlink.net>
do not reload anonymous@coolgroups.com
Re: do not reload ctcgag@hotmail.com
Re: do not reload anonymous@coolgroups.com
dup remove - why/how does this work - NEWBIE jason@cyberpine.com
Re: dup remove - why/how does this work - NEWBIE <tony_curtis32@_SPAMTRAP_yahoo.com>
Re: dup remove - why/how does this work - NEWBIE <gnari@simnet.is>
Re: Dynamic Ref names <usenet@morrow.me.uk>
Re: Environment question <flavell@ph.gla.ac.uk>
Re: extract parts of file - newbie <krahnj@acm.org>
Re: extract parts of file - newbie <dwall@fastmail.fm>
more stripping <hillmw@ram.lmtas.lmco.com>
Re: more stripping <ittyspam@yahoo.com>
OPEN( , Get , or slurping problem (Chris)
Re: OPEN( , Get , or slurping problem <ittyspam@yahoo.com>
Re: OPEN( , Get , or slurping problem <usenet@morrow.me.uk>
Re: OPEN( , Get , or slurping problem <noreply@gunnar.cc>
Re: Perl memory allocation <usenet@morrow.me.uk>
Re: Please.. Need some help with wwwboard <sbryce@scottbryce.com>
Problem with Perl/Tk <orion93@club-internet.fr>
Re: Problem with Perl/Tk <usenet@morrow.me.uk>
Re: T*ent C*rry, wsanford@wallysanford.com, and falsely <PleaseSubstituteMyActualFirstNameHere@wallysanford.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Mon, 16 Feb 2004 01:16:02 GMT
From: dfrankow@winternet.com (Dan Frankowski)
Subject: ANNOUNCE: Data::Mining::AssociationRules
Message-Id: <Ht6z8G.13pL@zorch.sf-bay.org>
DESCRIPTION
This module contains some functions to do association rule mining from
text files. This sounds obscure, but really measures beautifully simple
things through counting.
FREQUENT SETS
Frequent sets answer the question, "Which events occur together more
than N times?"
RULES
Association rules answer the (related) question, "When these events
occur, how often do those events also occur?"
Enjoy.
Dan
The README:
NAME
Data::Mining::AssociationRules - Mine association rules and frequent sets
from data.
SYNOPSIS
use Data::Mining::AssociationRules;
my %transaction_map;
my $transaction_file = "foo.txt";
read_transaction_file(\%transaction_map, $transaction_file);
generate_frequent_sets(\%transaction_map, $output_file_prefix,
$support_threshold, $max_n);
generate_rules($output_file_prefix, $support_threshold,
$confidence_threshold, $max_n);
read_frequent_sets($set_map_ref, $file_prefix);
set_debug(1);
perl arm.pl -transaction-file foo.txt -support 2
-confidence-threshold 0.01 -max-set-size 6
See also FUNCTIONS, DESCRIPTION, and EXAMPLES below.
INSTALLATION
The typical:
* perl Makefile.PL
* make test
* make install
FUNCTIONS
read_transaction_file($transaction_map_ref, $transaction_file)
Read in a transaction map from a file which has lines of two
whitespace-separated columns:
transaction-id item-id
generate_frequent_sets($transaction_map_ref, $file_prefix,
$support_threshold, $max_n)
Given
* a map of transactions
* a file prefix
* a support threshold
* a maximum frequent set size to look for (optional)
generate the frequent sets in some files, one file per size of the set.
That is, all 1-sets are in a file, all 2-sets in another, etc.
Each file consists of lines of the form:
support-count item-set
where
* support-count is the number of transactions in which the item-set
appears
* item-set is one or more space-separated items
read_frequent_sets($set_map_ref, $file_prefix)
Given
* a set map
* a file prefix
* support threshold
* max frequent set size (optional)
read all the frequent sets into a single map, which has as its key the
frequent set (joined by single spaces) and as its value the support.
generate_rules($file_prefix, $support_threshold, $confidence_threshold,
$max_n)
Given
* a file prefix
* a support threshold (optional)
* a confidence threshold (optional)
* a maximum frequent set size to look for (optional)
create a file with all association rules in it. The output file is of
the form:
support-count confidence left-hand-set-size right-hand-set-size
frequent-set-size left-hand-set => right-hand-set
DESCRIPTION
This module contains some functions to do association rule mining from
text files. This sounds obscure, but really measures beautifully simple
things through counting.
FREQUENT SETS
Frequent sets answer the question, "Which events occur together more
than N times?"
The detail
The 'transaction file' contains items in transactions. A set of items
has 'support' s if all the items occur together in at least s
transactions. (In many papers, support is a number between 0 and 1
representing the fraction of total transactions. I found the absolute
number itself more interesting, so I use that instead. Sorry for the
confusion.) For an itemset "A B C", the support is sometimes notated
"T(A B C)" (the number of 'T'ransactions).
A set of items is called a 'frequent set' if it has support at least the
given support threshold. Generating frequent sets produces all frequent
sets, and some information about each set (e.g., its support).
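As a rough illustration of the counting described above (not the module's actual code), the support of one itemset can be computed directly from a transaction map. The layout of %transactions (transaction id => hash of items) and the name support_count are assumptions made for this sketch only.

```perl
use strict;
use warnings;

# Assumed layout: transaction id => { item => 1, ... }.
my %transactions = (
    234 => { Orange => 1, Banana => 1, Pear => 1 },
    141 => { Orange => 1 },
    375 => { Orange => 1 },
    467 => { Pear => 1 },
    147 => { Pear => 1 },
);

# Hypothetical helper: number of transactions containing every item given.
sub support_count {
    my ($tmap, @items) = @_;
    my $count = 0;
    for my $items_in_txn (values %$tmap) {
        # grep in scalar context counts the items this transaction lacks.
        my $missing = grep { !exists $items_in_txn->{$_} } @items;
        $count++ unless $missing;
    }
    return $count;
}

print support_count(\%transactions, 'Orange'), "\n";
print support_count(\%transactions, qw(Orange Pear)), "\n";
```

An itemset would then count as a frequent set when this number is at least the chosen support threshold.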
RULES
Association rules answer the (related) question, "When these events
occur, how often do those events also occur?"
The detail
A rule has a left-hand set of items and a right-hand set of items. A
rule "LHS => RHS" with a support s and 'confidence' c means that the
underlying frequent set (LHS + RHS) occurred together in at least s
transactions, and for all the transactions LHS occurred in, RHS also
occurred in at least the fraction c (a number from 0 to 1).
Generating rules produces all rules with support at least the given
support threshold, and confidence at least the given confidence
threshold. The confidence is sometimes notated
"conf(LHS => RHS) = T(LHS + RHS) / T(LHS)". There is also related data
with each rule (e.g., the size of its LHS and RHS, the support, the
confidence, etc.).
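The confidence formula can be sketched in a few lines. This is an illustration only, not the module's implementation: the %support map (keyed by the items joined with single spaces, in sorted order) and the helper name confidence are assumptions, and the counts are the ones listed in the EXAMPLES section.

```perl
use strict;
use warnings;

# Assumed: support counts keyed by the sorted items joined with spaces.
my %support = (
    'Banana'             => 1,
    'Orange'             => 3,
    'Pear'               => 3,
    'Banana Orange'      => 1,
    'Banana Pear'        => 1,
    'Orange Pear'        => 1,
    'Banana Orange Pear' => 1,
);

# Hypothetical helper: conf(LHS => RHS) = T(LHS + RHS) / T(LHS).
sub confidence {
    my ($support, $lhs, $rhs) = @_;
    my $lhs_key  = join ' ', sort @$lhs;
    my $both_key = join ' ', sort @$lhs, @$rhs;
    return $support->{$both_key} / $support->{$lhs_key};
}

printf "%.3f\n", confidence(\%support, ['Orange'], ['Pear']);
printf "%.3f\n", confidence(\%support, ['Banana'], [qw(Orange Pear)]);
```

A rule would be kept when this value is at least the confidence threshold.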
FREQUENT SETS AND ASSOCIATION RULES GENERALLY USEFUL
Although association rule mining is often described in commercial terms
like "market baskets" or "transactions" (collections of events) and
"items" (events), one can imagine events that make this sort of counting
useful across many domains. Events could be
* stock market went down at time t
* patient had symptom X
* flower petal length was > 5mm
For this reason, I believe counting frequent sets and looking at
association rules to be a fundamental tool of any data miner, someone
who is looking for patterns in pre-existing data, whether commercial or
not.
EXAMPLES
Given the following input file:
234 Orange
463 Strawberry
53 Apple
234 Banana
412 Peach
467 Pear
234 Pear
147 Pear
141 Orange
375 Orange
Generating frequent sets at support threshold 1 (a.k.a. 'at support 1')
produces three files:
The 1-sets:
1 Strawberry
1 Banana
1 Apple
3 Orange
1 Peach
3 Pear
The 2-sets:
1 Banana Orange
1 Banana Pear
1 Orange Pear
The 3-sets:
1 Banana Orange Pear
Generating the rules at support 1 produces the following:
1 0.333 1 1 2 Orange => Pear
1 0.333 1 1 2 Pear => Orange
1 1.000 1 2 3 Banana => Orange Pear
1 0.333 1 2 3 Orange => Banana Pear
1 1.000 2 1 3 Banana Orange => Pear
1 0.333 1 2 3 Pear => Banana Orange
1 1.000 2 1 3 Banana Pear => Orange
1 1.000 2 1 3 Orange Pear => Banana
1 1.000 1 1 2 Banana => Orange
1 0.333 1 1 2 Orange => Banana
1 1.000 1 1 2 Banana => Pear
1 0.333 1 1 2 Pear => Banana
Generating frequent sets at support 2 produces one file:
3 Orange
3 Pear
Generating rules at support 2 produces nothing.
Generating rules at support 1 and confidence 0.5 produces:
1 1.000 1 2 3 Banana => Orange Pear
1 1.000 2 1 3 Banana Orange => Pear
1 1.000 2 1 3 Banana Pear => Orange
1 1.000 2 1 3 Orange Pear => Banana
1 1.000 1 1 2 Banana => Orange
1 1.000 1 1 2 Banana => Pear
Note all the lower confidence rules are gone.
ALGORITHM
Generating frequent sets
Generating frequent sets is straight-up Apriori. See for example:
http://www.almaden.ibm.com/software/quest/Publications/papers/vldb94_rj.pdf
I have not optimized. It depends on having the transactions all in
memory. However, given that, it still might scale decently (millions of
transactions).
Generating rules
Generating rules is a very vanilla implementation. It requires reading
all the frequent sets into memory, which does not scale at all. Given
that, since computers have lots of memory these days, you might still be
able to get away with millions of frequent sets (which is << millions of
transactions).
BUGS
There is an existing tool (written in C) to mine frequent sets I kept
running across:
http://fuzzy.cs.uni-magdeburg.de/~borgelt/software.html#assoc
I should check it out to see if it is easy or desirable to be file-level
compatible with it.
One could imagine wrapping it in Perl, but the Perl-C/C++ barrier is
where I have encountered all my troubles in the past, so I wouldn't
personally pursue that.
VERSION
This document describes Data::Mining::AssociationRules version 0.1.
AUTHOR
Dan Frankowski
dfrankow@winternet.com
http://www.winternet.com/~dfrankow
Hey, if you download this module, drop me an email! That's the fun
part of this whole open source thing.
LICENSE
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included
in the distribution and available in the CPAN listing for
Data::Mining::AssociationRules (see www.cpan.org or search.cpan.org).
DISCLAIMER
To the maximum extent permitted by applicable law, the author of this
module disclaims all warranties, either express or implied, including
but not limited to implied warranties of merchantability and fitness for
a particular purpose, with regard to the software and the accompanying
documentation.
------------------------------
Date: 16 Feb 2004 13:57:54 -0800
From: ipetts@ozemail.com.au (Ian Petts)
Subject: Re: Confused by hashes/data structures
Message-Id: <7bca6a49.0402161357.64f360fe@posting.google.com>
Steve May <drumspoorly@reachone.net> wrote
> Since user names are unique (usually) I think I'd use a hash
> of hashes instead, like:
That's fine. Like I said, I was confused by all the options and I
wasn't sure which angle to tackle it from.
This is really good stuff, Steve. Thank you very much. Easy to read
and I think I actually understand it :-)
The only hitch I have now (I think) is when printing out the hash:
> while( sort( keys %userdata ) ){
> # error trapping ignored
> print <<END;
> User: $_ Desc: $userdata{$_}{'desc} Date: $userdata{$_}{'date}
> END
>
> }
Perl complains with the following:
--- 8< ---
Useless use of sort in scalar context at ./try2.pl line 28 (#1)
(W void) You used sort in scalar context, as in :
my $x = sort @y;
This is not very useful, and perl currently optimizes this away.
--- >8 ---
What's going on here?
Thanks to everyone who has replied to my original post. ALL of the
suggestions and help are very much appreciated.
Regards,
Ian.
------------------------------
Date: 16 Feb 2004 22:17:19 GMT
From: Eric Bohlman <ebohlman@earthlink.net>
Subject: Re: Confused by hashes/data structures
Message-Id: <Xns9491A622CCD02ebohlmanomsdevcom@130.133.1.4>
ipetts@ozemail.com.au (Ian Petts) wrote in
news:7bca6a49.0402161357.64f360fe@posting.google.com:
> Steve May <drumspoorly@reachone.net> wrote
> The only hitch I have now (I think) is when printing out the hash:
>
>> while( sort( keys %userdata ) ){
You want for (syn. foreach) rather than while there. Right now the loop
says "keep running (without setting $_ to anything) as long as the return
value of sort(...) in a scalar context is nonzero." I think you got thrown
by the special treatment <HANDLE> gets in while loops.
>> # error trapping ignored
>> print <<END;
>> User: $_ Desc: $userdata{$_}{'desc} Date: $userdata{$_}{'date}
The second-level key here is:
desc} Date: $userdata{$_}{
because the single quotes try to match.
>> END
>>
>> }
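For reference, a minimal sketch of the loop with both fixes applied: foreach over the sorted keys, and the hash keys quoted consistently. The sample data and the helper name report_lines are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data in the hash-of-hashes shape discussed in the thread.
my %userdata = (
    bob   => { desc => 'Operator',      date => '2004-02-16' },
    alice => { desc => 'Administrator', date => '2004-02-15' },
);

# Hypothetical helper: build one report line per user, in sorted key order.
sub report_lines {
    my ($data) = @_;
    my @out;
    for my $user (sort keys %$data) {
        push @out, "User: $user Desc: $data->{$user}{desc} Date: $data->{$user}{date}\n";
    }
    return @out;
}

print report_lines(\%userdata);
```

Here sort is called in list context, so the "Useless use of sort in scalar context" warning does not arise.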
------------------------------
Date: Mon, 16 Feb 2004 20:01:13 GMT
From: anonymous@coolgroups.com
Subject: do not reload
Message-Id: <ad589dc795c762bb4df7a900557f7afe@news.scbiz.com>
How do I make sure that a perl CGI page doesn't get reloaded
when the user hits the back button and goes to it? I've
tried outputting the "Expires" header, but that doesn't seem
to do the trick.
------------------------------
Date: 16 Feb 2004 20:29:34 GMT
From: ctcgag@hotmail.com
Subject: Re: do not reload
Message-Id: <20040216152934.516$vN@newsreader.com>
anonymous@coolgroups.com wrote:
> How do I make sure that a perl CGI page doesn't get reloaded
> when the user hits the back button and goes to it?
You don't. You let it load, then do something else.
> I've
> tried outputting the "Expires" header, but that doesn't seem
> to do the trick.
Two pages ago, you generate a transaction id and embed that in the page
one page ago. Then the first time they submit, you record this transaction
id. The second time they submit, you look up the id in the database, see
that the transaction id has already been submitted, and write an error
page.
Or you use some modules that do effectively the same thing for you.
Or you let the user do whatever they want, and hold them responsible
for their actions.
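A minimal sketch of the transaction-id idea, assuming an in-memory hash in place of the database. The helper names are hypothetical, and a real site would want an id that cannot be guessed plus storage that survives between requests.

```perl
use strict;
use warnings;

my %used;          # stand-in for the database of submitted transaction ids
my $counter = 0;

# Hypothetical helper: make an id to embed in a hidden form field.
sub new_transaction_id {
    return join '-', time, $$, ++$counter;
}

# Hypothetical helper: true only the first time a given id is submitted.
sub accept_submission {
    my ($id) = @_;
    return 0 if $used{$id}++;
    return 1;
}

my $id = new_transaction_id();
print accept_submission($id) ? "processed\n" : "duplicate\n";   # first submit
print accept_submission($id) ? "processed\n" : "duplicate\n";   # resubmit
```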
Xho
------------------------------
Date: Mon, 16 Feb 2004 21:57:24 GMT
From: anonymous@coolgroups.com
Subject: Re: do not reload
Message-Id: <537e9fd056927416fabb34926a199ba8@news.scbiz.com>
I want the user to be able to go back to a previous page. I
just don't want the page to reload.
------------------------------
Date: 16 Feb 2004 13:44:10 -0800
From: jason@cyberpine.com
Subject: dup remove - why/how does this work - NEWBIE
Message-Id: <ef0a04d7.0402161344.15f31191@posting.google.com>
The below simple code works at removing dups from a 20k record file.
Looking for somebody to explain how/why.
$db = "workb.txt";
open (FILE,"$db");
@lines=<FILE>;
close(FILE);
foreach $key (@lines){
$lines{$key} = 1;
}
@lines = keys(%lines);
print @lines;
I understand I am adding a key = 1 to every line (is it to every
line?), but when we recreate @lines what exactly is keys(%lines)
doing/saying? I see that %lines contains 1+unique records in the
file).
Thanks.
------------------------------
Date: Mon, 16 Feb 2004 15:56:09 -0600
From: Tony Curtis <tony_curtis32@_SPAMTRAP_yahoo.com>
Subject: Re: dup remove - why/how does this work - NEWBIE
Message-Id: <87n07i3czq.fsf@limey.hpcc.uh.edu>
>> On 16 Feb 2004 13:44:10 -0800,
>> jason@cyberpine.com said:
> The below simple code works at removing dups from a 20k
> record file. Looking for somebody to explain how/why.
It's not even close, I'm afraid.
No strict, warnings.
> $db = "workb.txt";
> open (FILE,"$db");
open() untested. Unnecessary quotes around variable.
> @lines=<FILE>;
> close(FILE);
Slurp all lines into memory, then below do a 2nd pass. This
is wasteful, you only need to see each line once.
You'll probably want to chomp() the lines too, since the
trailing newline sequence is usually part of the file
representation, not part of the data content per se.
> foreach $key (@lines){
> $lines{$key} = 1;
> }
> @lines = keys(%lines);
> print @lines;
> I understand I am adding a key = 1 to every line (is it to
> every line?), but when we recreate @lines what exactly is
"Adding" is a misleading word here, implying that the value of
the line is being changed. "Associating" would be closer.
> keys(%lines) doing/saying? I see that %lines contains
> 1+unique records in the file).
Using a hash is the right choice here, but see
perldoc -q duplicate
Essentially you want to, for each line, output the line only
if you haven't seen that same line before (i.e. it's not yet a
key of a hash). Output means either print() or save into an
array for later processing, judging from your code.
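A short sketch of that approach, using the %seen idiom that perldoc -q duplicate describes. The helper name unique_lines and the sample lines are made up, and the sketch works on a list so that it runs without a file; reading with while (<FILE>) and testing %seen per line would avoid slurping.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines with later duplicates dropped,
# keeping first-seen order.
sub unique_lines {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a line appears, so the
    # negation keeps only first occurrences.
    return grep { !$seen{$_}++ } @_;
}

my @lines = ("apple\n", "pear\n", "apple\n", "fig\n", "pear\n");
print unique_lines(@lines);
```

Unlike keys(%lines), this keeps the lines in their original order.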
hth
t
------------------------------
Date: Mon, 16 Feb 2004 22:06:36 -0000
From: "gnari" <gnari@simnet.is>
Subject: Re: dup remove - why/how does this work - NEWBIE
Message-Id: <c0rerm$hce$1@news.simnet.is>
<jason@cyberpine.com> wrote in message
news:ef0a04d7.0402161344.15f31191@posting.google.com...
> The below simple code works at removing dups from a 20k record file.
> Looking for somebody to explain how/why.
>
> $db = "workb.txt";
> open (FILE,"$db");
> @lines=<FILE>;
> close(FILE);
> foreach $key (@lines){
> $lines{$key} = 1;
> }
> @lines = keys(%lines);
> print @lines;
>
>
> I understand I am adding a key = 1 to every line (is it to every
> line?), but when we recreate @lines what exactly is keys(%lines)
> doing/saying? I see that %lines contains 1+unique records in the
> file).
this is a common technique using a hash.
a hash is a data structure that maps a set of 'keys' to their
respective 'values'. each key has one value.
in this case the hash is %lines (totally unrelated to the array @lines).
each line of the input file is in turn added as a key to the hash, with
an arbitrary value, in this case 1. as each key can only have 1 value,
when a duplicate is encountered, the value is simply replaced with
the new value, in this case the same value 1.
the function keys() returns a list of the keys of a hash in an
undefined order. in this case, the lines of the input file, with
duplicates removed.
the nice integration of hashes into the language is one of the
distinctive features of Perl, and they are, along with regexes,
usually the key to solving most perl problems.
perldoc perldata
gnari
------------------------------
Date: Mon, 16 Feb 2004 19:52:33 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Dynamic Ref names
Message-Id: <c0r721$kul$1@wisteria.csv.warwick.ac.uk>
Warrick FitzGerald <news.wfitzgerald@crtman.com> wrote:
>
> I'm trying to dynamically create worksheets using Spreadsheet::WriteExcel.
>
> I'm looking through a log file and would like to create a worksheet for
> each date.
>
> In all example I can find you simply do:
> $mySheet = $workbook->addworksheet("somename");
>
> then to add data to that you simply do:
> $mySheet->write(....);
>
> That works great until I try to dynamically create $mySheet.
>
> --------------
> This is what I tried.
>
> $date = $Fields[0]; <--- Get the date from the Logfile
>
> ${$date} = $workbook->addworksheet($date); <-- This created the workbook
> for each date. date is of course formatted to be excel friendly name.
Are you using strictures? Why not?
You don't need to name the variable after the sheet name:
$mySheet = $workbook->addworksheet($date);
will work perfectly well. If you're trying to keep hold of several
sheets at once, one for each date, you should use a hash, not the symbol
table:
my %sheets;
$sheets{$date} = $workbook->addworksheet($date);
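A runnable sketch of the hash approach, creating each date's entry on first use. To keep it free of any spreadsheet module, make_sheet is a stand-in that returns an array ref; in the real script that call would be $workbook->addworksheet($date). The helper names and the log lines are made up.

```perl
use strict;
use warnings;

my %sheets;

# Stand-in for $workbook->addworksheet($date); here a "sheet" is just
# an array ref of rows so the sketch runs without extra modules.
sub make_sheet { return [] }

# Hypothetical helper: fetch the sheet for a date, creating it on first use.
sub sheet_for {
    my ($date) = @_;
    $sheets{$date} ||= make_sheet($date);
    return $sheets{$date};
}

for my $line ("2004-02-15 start", "2004-02-16 stop", "2004-02-15 restart") {
    my ($date, $event) = split ' ', $line, 2;
    push @{ sheet_for($date) }, $event;
}
print "$_: @{ $sheets{$_} }\n" for sort keys %sheets;
```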
Ben
--
$.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
$x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
{$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t # ben@morrow.me.uk
$J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.
------------------------------
Date: Mon, 16 Feb 2004 19:13:53 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Environment question
Message-Id: <Pine.LNX.4.53.0402161910530.26716@ppepc56.ph.gla.ac.uk>
On Sun, 15 Feb 2004, Sherm Pendley wrote:
> Alan J. Flavell wrote:
>
> > It's been a long time since I felt that anyone rated two killfile
> > entries on a single posting. Well done (and you certainly have been).
>
> *Two* killfile entries? What are going to do, ignore him - then ignore him
> harder? ;-)
I see that your posting made it to ahbou; but I'll take the risk of
answering the question anyway. I had set killfile entries for two
aspects of the poster's headers, thinking that the one which I would
normally rely on - the email address, which in this case was a simple
fake - was quite likely to get changed.
------------------------------
Date: Mon, 16 Feb 2004 21:58:52 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: extract parts of file - newbie
Message-Id: <40313D19.412FD094@acm.org>
jason@cyberpine.com wrote:
>
> Hello. New to Perl and trying to figure out if beter way to do the
> following (in Active State Perl under Windows 2000):
>
> I have this DOS text file with about 20,000 lines. In the simple
> example below I can extract lines that contain a particular string.
>
> $db = "work.txt";
> open (FILE,"$db");
> @LINES=<FILE>;
> close(FILE);
> $SIZE=@LINES;
> print $SIZE,"\n";
> for ($i=0;$i<=$SIZE;$i++)
> {
> $_=$LINES[$i];
> if (/motion/i)
> {print "$_";}
> }
A more Perl-ish version of that would be:
use warnings;
use strict;
my $db = 'work.txt';
open FILE, $db or die "Cannot open $db: $!";
my @lines = <FILE>;
close FILE;
print @lines . "\n";
for ( @lines )
{
print if /motion/i;
}
> How can I extract:
>
> 1. 5 lines before and after the string
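for my $i ( 0 .. $#lines )
{
next unless $lines[ $i ] =~ /motion/i;
# clamp the window so it never runs off either end of the array
my $first = $i - 5 < 0 ? 0 : $i - 5;
my $last = $i + 5 > $#lines ? $#lines : $i + 5;
print @lines[ $first .. $last ];
}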
> 2. Columns positions 5-15 (for all selected)
for ( @lines )
{
print substr $_, 4, 11 if /motion/i;
}
> 3. Limit selection to rows 5000-7000
for ( @lines[ 4999 .. 6999 ] )
{
print if /motion/i;
}
> 4. The last 5 lines of the entire file
for ( @lines[ $#lines - 4 .. $#lines ] )
{
print if /motion/i;
}
John
--
use Perl;
program
fulfillment
------------------------------
Date: Mon, 16 Feb 2004 22:44:06 -0000
From: "David K. Wall" <dwall@fastmail.fm>
Subject: Re: extract parts of file - newbie
Message-Id: <Xns9491B467F9C45dkwwashere@216.168.3.30>
jason@cyberpine.com wrote:
> I have this DOS text file with about 20,000 lines.
[snip]
> How can I extract:
>
> 1. 5 lines before and after the string
Search the Google usenet archives; you'll find a number of solutions. In
particular, see the thread starting with a post by Tom Christiansen,
message ID 37e1043e@cs.colorado.edu. (I searched for the words "before
after lines match" (but not as a phrase), and TC's thread was the first
match. Lots of other hits, too.)
> 2. Columns positions 5-15 (for all selected)
perldoc -f substr
> 3. Limit selection to rows 5000-7000
Check out $. in perlvar and read the section on range operators in perlop.
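A sketch of the $. idea, reading line by line instead of slurping. The helper name matches_in_rows is hypothetical and an in-memory file stands in for work.txt; plain comparisons on $. are used here, while the range operator described in perlop is another way to express the same test.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines numbered $from..$to (1-based) that
# match $pattern, reading the handle one line at a time.
sub matches_in_rows {
    my ($fh, $from, $to, $pattern) = @_;
    my @hits;
    while (my $line = <$fh>) {
        next if $. < $from;     # $. is the current input line number
        last if $. > $to;       # stop reading once past the range
        push @hits, $line if $line =~ $pattern;
    }
    return @hits;
}

# In-memory file standing in for work.txt.
my $data = join '', map { "line $_ motion\n" } 1 .. 10;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print matches_in_rows($fh, 3, 5, qr/motion/i);
close $fh;
```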
> 4. The last 5 lines of the entire file
Left as an exercise... :-)
------------------------------
Date: Mon, 16 Feb 2004 14:12:01 -0600
From: Michael Hill <hillmw@ram.lmtas.lmco.com>
Subject: more stripping
Message-Id: <40312411.191D97AB@ram.lmtas.lmco.com>
I have this input from a <textarea> object that is being submitted to a
script.
The input looks like:
<path fill="none" stroke="#000000" d="M0.437,185.49l87-156"/>
<path fill="none" stroke="#000000" d="M87.437,29.49l140-29"/>
<path fill="none" stroke="#000000" d="M227.437,0.49l39,118"/>
<path fill="none" stroke="#000000" d="M266.437,118.49l-104,160"/>
<path fill="none" stroke="#000000" d="M159.437,276.49l-32-101"/>
<path fill="none" stroke="#000000" d="M127.437,175.49l-127,10"/>
I'd like to get where the output for:
foreach $i (@arr)
{
($x, $y) = @$i;
print "d=$x,$y<br>";
}
should be:
d="0.437,185.49"
d="87.437,29.49"
d="227.437,0.49"
d="266.437,118.49"
d="159.437,276.49"
d="127.437,175.49"
This is where I am:
****************************************************************
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs)
{
($name, $value) = split(/=/, $pair);
$value =~ s/%09//g; #strip the tabs out all of them
$value =~ s/%3C//g; #strip the < all of them
$value =~ s/%2F%3E//g; #strip the /> all of them
$value =~ s/%22//g; #strip the " all of them
$value =~ s/%23//g; #strip the # all of them
$value =~ s/%3D/=/g; #change %3D to = all of them
$value =~ s/%2C/,/g; #change %2C to , all of them
$value =~ s/%0D%0A//g; #strip out the carriage returns
$value =~ s/path//g; #strip out the word path .....
hmmm what if i have 'PATH' or Path or paTH? Need mod here
if ( $name eq 'path' )
{
$path = $value;
}
}
@arr = split(/+/, $path);
foreach $i (@arr)
{
($x, $y) = @$i;
print "d=$x,$y<br>";
}
Any help is appreciated.
Mike
------------------------------
Date: Mon, 16 Feb 2004 15:27:01 -0500
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: more stripping
Message-Id: <20040216152117.C23965@dishwasher.cs.rpi.edu>
On Mon, 16 Feb 2004, Michael Hill wrote:
> I have this input from a <textarea> object that is being submitted to a
> script.
>
> The input looks like:
>
> <path fill="none" stroke="#000000" d="M0.437,185.49l87-156"/>
> <path fill="none" stroke="#000000" d="M87.437,29.49l140-29"/>
> <path fill="none" stroke="#000000" d="M227.437,0.49l39,118"/>
> <path fill="none" stroke="#000000" d="M266.437,118.49l-104,160"/>
> <path fill="none" stroke="#000000" d="M159.437,276.49l-32-101"/>
> <path fill="none" stroke="#000000" d="M127.437,175.49l-127,10"/>
>
> I'd like to get where the output for:
> foreach $i (@arr)
> {
> ($x, $y) = @$i;
> print "d=$x,$y<br>";
> }
>
> should be:
> d="0.437,185.49"
> d="87.437,29.49"
> d="227.437,0.49"
> d="266.437,118.49"
> d="159.437,276.49"
> d="127.437,175.49"
>
> This is where I am:
> ****************************************************************
> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
> @pairs = split(/&/, $buffer);
> foreach $pair (@pairs)
> {
> ($name, $value) = split(/=/, $pair);
> $value =~ s/%09//g; #strip the tabs out all of them
> $value =~ s/%3C//g; #strip the < all of them
> $value =~ s/%2F%3E//g; #strip the /> all of them
> $value =~ s/%22//g; #strip the " all of them
> $value =~ s/%23//g; #strip the # all of them
> $value =~ s/%3D/=/g; #change %3D to = all of them
> $value =~ s/%2C/,/g; #change %2C to , all of them
> $value =~ s/%0D%0A//g; #strip out the carriage returns
> $value =~ s/path//g; #strip out the word path .....
> hmmm what if i have 'PATH' or Path or paTH? Need mod here
> if ( $name eq 'path' )
> {
> $path = $value;
> }
> }
>
> @arr = split(/+/, $path);
> foreach $i (@arr)
> {
> ($x, $y) = @$i;
> print "d=$x,$y<br>";
> }
>
> Any help is appreciated.
>
> Mike
>
>
I don't understand why you're doing any of this. Why are you taking so
much effort to remove the stuff you don't want, instead of just taking
what you *do* want?
foreach $line (@pairs) {
($x, $y) = $line =~ /M(\d+\.\d+),(\d+\.\d{2})/;
push @arr, [$x, $y];
}
Now @arr is populated the way you claim to want it.
The above can really be shortened up even more, but I've left it like this
in the hope of clarity.
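One caveat: the POSTed value is still URL-encoded at that point and the whole textarea arrives as a single pair, so the match may need to run on the decoded text with /g. A sketch under those assumptions follows; the helper names are hypothetical and the sample value is an abbreviated, hand-encoded version of the input above.

```perl
use strict;
use warnings;

# Hypothetical helper: undo application/x-www-form-urlencoded escaping.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

# Hypothetical helper: collect the [x, y] that follows each "M" in d="M...".
sub start_points {
    my ($svg) = @_;
    my @points;
    while ($svg =~ /\bd="M(-?[\d.]+),(-?[\d.]+)/g) {
        push @points, [$1, $2];
    }
    return @points;
}

# Two <path .../> elements, encoded the way a form POST would send them.
my $value = '%3Cpath+fill%3D%22none%22+d%3D%22M0.437%2C185.49l87-156%22%2F%3E'
          . '%0D%0A'
          . '%3Cpath+fill%3D%22none%22+d%3D%22M87.437%2C29.49l140-29%22%2F%3E';

print qq{d="$_->[0],$_->[1]"\n} for start_points(url_decode($value));
```

In a real CGI script, the CGI module's param() would do the decoding.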
Paul Lalli
------------------------------
Date: 16 Feb 2004 12:04:27 -0800
From: chris.tunnecliff@markelintl.com (Chris)
Subject: OPEN( , Get , or slurping problem
Message-Id: <b8146f1.0402161204.1f0a7da5@posting.google.com>
Hi,
I'm trying to import a htm file (from an external site) into an array
and then parse each line to check for a certain line. I have tried
the following:
#!/usr/local/bin/perl -w
use warnings;
use strict;
use LWP::Simple;
my @site = ("http://www.webbuyeruk.co.uk/links.htm");
foreach my $site (@site){
my @content = get ($site);
print "Array entries: $#content\n";
}
the above puts all of the lines into the first array entry [0], how
can I change this??
Also the following:
open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
my @filedata = <MYFILE>;
close(MYFILE);
gives me the following result:
Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument
Is this because it is trying to change the file instead of reading it?
How can I get around this?
Chris.
------------------------------
Date: Mon, 16 Feb 2004 15:37:54 -0500
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: OPEN( , Get , or slurping problem
Message-Id: <20040216153455.X23965@dishwasher.cs.rpi.edu>
On Mon, 16 Feb 2004, Chris wrote:
> Hi,
>
> I'm trying to import a htm file (from an external site) into an array
> and then parse each line to check for a certain line. I have tried
> the following:
>
> #!/usr/local/bin/perl -w
> use warnings;
> use strict;
>
> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??
>
perldoc LWP::Simple shows that get() returns a single string. That's its
behavior. If you want each line in a different element of an array, do it
yourself:
my @content = split /\n/, get($site); #assumes \n is what you mean by 'line'
>
> Also the following:
>
> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
> my @filedata = <MYFILE>;
> close(MYFILE);
>
> gives me the following result:
> Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument
>
> Is this because it is trying to change the file instead of reading it?
> How can I get around this?
What are you *trying* to do here? Your code is attempting to open a local
file named "(http://www.webbuyeruk.co.uk/links.htm)" and read from
it. I find it decidedly unlikely such a file exists on your local system.
Paul Lalli
------------------------------
Date: Mon, 16 Feb 2004 20:42:01 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: OPEN( , Get , or slurping problem
Message-Id: <c0r9up$mqe$2@wisteria.csv.warwick.ac.uk>
chris.tunnecliff@markelintl.com (Chris) wrote:
> I'm trying to import a htm
HTML. Never mind that some people still use brain-damaged 8.3 names.
> file (from an external site) into an array
> and then parse each line to check for a certain line. I have tried
> the following:
>
> #!/usr/local/bin/perl -w
> use warnings;
No need for belt and braces: use warnings replaces -w :).
> use strict;
>
> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??
my $content = get $site;
my @content = split /\n/, $content;
Some people would object to my using both $content and @content here...
that is a matter of style you may wish to consider.
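A slightly more defensive sketch of the same split, runnable without network access: a literal string stands in for the result of get($site), and the helper name content_lines is made up. It also accepts "\r\n" line endings and copes with get() returning undef on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: split fetched page text into lines, accepting
# both "\n" and "\r\n" line endings.
sub content_lines {
    my ($content) = @_;
    return () unless defined $content;    # get() returns undef on failure
    return split /\r?\n/, $content;
}

# A literal string stands in for the result of get($site).
my $content = "<html>\r\n<a href=\"x.htm\">x</a>\r\n</html>\r\n";
my @lines = content_lines($content);
print "Number of lines: ", scalar(@lines), "\n";
print "Found link\n" if grep { /<a\s+href=/i } @lines;
```

Note that $#content is the index of the last element, not the number of elements, which is why the count is printed with scalar(@lines).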
> Also the following:
>
> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
> my @filedata = <MYFILE>;
> close(MYFILE);
>
> gives me the following result:
> Can't open http://www.webbuyeruk.co.uk/links.htm : Invalid argument
Well, what did you expect? Perl != PHP: open is for opening *files*.
Presuming you're on a Win32 system (something tells me you are :) this
will be looking for an 'http:' drive, which is, as the error message
said, invalid.
> Is this because it is trying to change the file instead of reading it?
> How can I get around this?
Use LWP, as you were.
You may also be better off using an HTML-parsing module than trying to
parse it by hand, depending on how constant the format of the page is.
Ben
--
perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
2047502190/' # ben@morrow.me.uk
------------------------------
Date: Mon, 16 Feb 2004 21:59:48 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: OPEN( , Get , or slurping problem
Message-Id: <c0rbh7$19rh5n$1@ID-184292.news.uni-berlin.de>
Chris wrote:
> I'm trying to import a htm file (from an external site) into an
> array and then parse each line to check for a certain line. I have
> tried the following:
>
> #!/usr/local/bin/perl -w
> use warnings;
> use strict;
>
> use LWP::Simple;
> my @site = ("http://www.webbuyeruk.co.uk/links.htm");
>
> foreach my $site (@site){
> my @content = get ($site);
>
> print "Array entries: $#content\n";
> }
>
> the above puts all of the lines into the first array entry [0], how
> can I change this??
You need to think about when it's suitable to use an array and when
it's not. The get() function returns the content as a string, so why
not just do:
use LWP::Simple;
my $site = get 'http://www.webbuyeruk.co.uk/links.htm';
while ( $site =~ /(.*)/g ) {
    if ($1 =~ /PATTERN/) {
        print "Found\n";
        last;
    }
}
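If only the first matching line matters, one possible alternative is a single match with the /m modifier, which avoids the explicit loop. This is a sketch under assumptions: the string below is a made-up stand-in for the fetched page, and href= is just an example pattern.

```perl
use strict;
use warnings;

# Hypothetical page content standing in for what get() returns.
my $site = "<p>intro</p>\n<a href=\"links.htm\">links</a>\n<p>outro</p>\n";

# With /m, ^ and $ match at line boundaries, and . never matches a
# newline, so $1 holds exactly one line of the page.
my $found;
if ( $site =~ /^(.*href=.*)$/m ) {
    $found = $1;
    print "Found: $found\n";
}
```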
> Also the following:
>
> open(MYFILE, "<($site[0])") || die "Can't open $site[0] : $!\n";;
You can't open a URL! Please learn the difference between a path and a
URL.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Mon, 16 Feb 2004 19:58:51 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Perl memory allocation
Message-Id: <c0r7dr$kul$2@wisteria.csv.warwick.ac.uk>
Eric Bohlman <ebohlman@earthlink.net> wrote:
> "Ian Cass" <ian.cass@mblox.com> wrote in
> news:FE4Yb.164$_P.2707686@news-text.cableinet.net:
> > Eric Bohlman wrote:
> >>
> >> perldoc -q free
> >>
> >> (the second entry displayed is the one you want)
> >
> > My Perl is compiled with the OS's malloc rather than Perls.
>
> How is that relevant?
perlfaq3.pod:
| Some operating systems [...] can reclaim memory that is no longer
| used, but on such systems, perl must be configured and compiled to use
| the OS's malloc, not perl's.
was no doubt confusing the OP... the important part is 'Some OSen'.
Clearly the OP's isn't one of them.
(As an aside, would it be feasible to change perl's malloc to use mmap
where possible, and release mem back to the OS? This seems to be quite
often wanted...)
Ben
--
Razors pain you / Rivers are damp
Acids stain you / And drugs cause cramp. [Dorothy Parker]
Guns aren't lawful / Nooses give
Gas smells awful / You might as well live. ben@morrow.me.uk
------------------------------
Date: Mon, 16 Feb 2004 12:25:47 -0700
From: Scott Bryce <sbryce@scottbryce.com>
Subject: Re: Please.. Need some help with wwwboard
Message-Id: <103269rh34u66bb@corp.supernews.com>
SR wrote:
> I've got the board installed and working fine.
Matt's version, or the Sourceforge version?
> Can anyone give me a clue where to
> look for my problem ?
Is there a help forum at the site where you got the script? The
Sourceforge site has an email link you can use to get help with their
scripts.
--Scott
------------------------------
Date: 16 Feb 2004 19:56:22 GMT
From: Orion93 <orion93@club-internet.fr>
Subject: Problem with Perl/Tk
Message-Id: <Xns9491D5E63645Forion93clubinternetf@194.158.96.17>
Hi!
I'm trying to build a Perl/Tk interface for this script, but it's my
first attempt and I don't know how to use my variable in the sub. The
script below doesn't work and I don't know why. Please, I need help!
Thanks
use Tk;
$main = MainWindow -> new;
$main->title("Test 1");
$libelF=$main->Label(-text=>'Chemin:')->pack();
$montantF->Entry(-textvariable=>\$nomFic)->pack(-padx=>5);
$valid=$main->Button(-text=>'Ok',-command=>\&recupPages)->pack(-
side=>'left', expand=>1);
$end=$main->Button(-text=>'Fermer',-command=>sub {exit})->pack(-
side=>'right', expand=>1);
MainLoop();
sub recupPages
{
my $rep= $montantF->get();
my $result = shift;
open(F,'$nomFic');
open(SORTIE,'$result');
$i = 0;
while(<F>)
{
$i ++;
}
print SORTIE " $nomFic $i\n";
close F;
close SORTIE;
}
my $emplacement = $nomFic;
my $ficResultat = "e:\\result.txt";
recupPages($_, $ficResultat) for glob '$nomFic';
------------------------------
Date: Mon, 16 Feb 2004 20:34:26 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Problem with Perl/Tk
Message-Id: <c0r9gi$mqe$1@wisteria.csv.warwick.ac.uk>
Orion93 <orion93@club-internet.fr> wrote:
use strict;
use warnings;
> use Tk;
>
> $main = MainWindow -> new;
my $main = ...;
> $main->title("Test 1");
> $libelF=$main->Label(-text=>'Chemin:')->pack();
my $libelF = ...;
&c.
> $montantF->Entry(-textvariable=>\$nomFic)->pack(-padx=>5);
> $valid=$main->Button(-text=>'Ok',-command=>\&recupPages)->pack(-
> side=>'left', expand=>1);
It would be a lot easier to see what is going on if you put some
whitespace in here; also, you never set $montantF:
my $nomFic;
my $montantF = $main->Entry (-textvariable => \$nomFic)
    ->pack (-padx => 5);
my $valid = $main->Button(-text => 'Ok', -command => \&recupPages)
    ->pack (-side => 'left', -expand => 1);
As an aside, I *really* hate apps that call buttons 'Ok': not only is it
inconsistent with every OS I've ever used, it's also Wrong. It's an
abbreviation, so it's spelt 'OK'.
> $end=$main->Button(-text=>'Fermer',-command=>sub {exit})->pack(-
> side=>'right', expand=>1);
> MainLoop();
>
> sub recupPages
> {
> my $rep= $montantF->get();
> my $result = shift;
> open(F,'$nomFic');
> open(SORTIE,'$result');
Use lexical filehandles: they close automatically, which makes your life
easier. Check the return value of open: yes, *every* time. Those single
quotes won't interpolate, so you're trying to open a file called
'$nomFic'. You don't need quotes at all. Say which mode you want, too:
SORTIE is the file you print to, so it has to be opened for writing.
    open my $F, '<', $nomFic or die "can't open $nomFic: $!";
    open my $SORTIE, '>', $result or die "can't open $result: $!";
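A small sketch of the quoting and filehandle points, assuming a throwaway temporary file in place of the real input (the file and its contents are made up for the example):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file, created here so the sketch is self-contained.
my ($tmp, $nomFic) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\nthree\n";
close $tmp or die "can't close $nomFic: $!";

# Single quotes do not interpolate: this is the literal text $nomFic.
my $literal = '$nomFic';
print "single-quoted: $literal\n";

my $count = 0;
{
    # Lexical handle, three-argument open, return value checked.
    open my $F, '<', $nomFic or die "can't open $nomFic: $!";
    $count++ while <$F>;
}   # $F goes out of scope here, which closes the file

print "lines: $count\n";
```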
> $i = 0;
> while(<F>)
> {
> $i ++;
> }
> print SORTIE " $nomFic $i\n";
> close F;
> close SORTIE;
> }
Sort out your indentation: it makes things much easier:
sub recupPages {
    my $rep = $montantF->get();
    my $result = shift;
I'm not sure what you think this does, but I doubt it's what you mean.
Do you not just mean
    my $result = $montantF->get();
? Or, indeed, just use $nomFic, since you've set that up to contain the
value of the entry box... no, hang on, $nomFic is the input file. Where
do you want the name of the output file to come from?
    open my $F...
    ...
    while (<$F>) {
        $i++;
    }
Or, neater:
    $i++ while <$F>;
Or use $. instead of $i:
    1 while <$F>;
    print $SORTIE " $nomFic $.\n";
See perldoc perlvar.
    print $SORTIE " $nomFic $i\n";
    # no need to close the FHs: they will close at the end of the scope.
}
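The $. variant mentioned above can be tried in isolation. This sketch assumes an in-memory filehandle over a made-up string, so no real file is needed:

```perl
use strict;
use warnings;

# In-memory filehandle over a string (hypothetical sample data).
my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";

# Read to end of file; $. tracks the line number of the last handle read.
1 while <$fh>;

# Copy $. before closing, because an explicit close resets it.
my $lines = $.;
print "lines: $lines\n";

close $fh;
```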
> my $emplacement = $nomFic;
> my $ficResultat = "e:\\result.txt";
> recupPages($_, $ficResultat) for glob '$nomFic';
I'm not sure when you want this to execute, but as things stand it
won't when you expect it to. MainLoop doesn't return until the main
window has been destroyed, and your Fermer button calls exit, so Perl
never gets here that way. If you want it to be executed when the OK
button is pressed, it needs to go inside recupPages; if you want it to
be executed at the end of the program (i.e. when the Fermer button is
pressed) it needs to go in an END block:
END {
    my $emplacement = $nomFic; # why? you never use this variable.
    my $ficResultat = 'e:/result.txt'; # yes, use / even on win32
    recupPages($_, $ficResultat) for glob $nomFic;
    # again, the '' quotes won't interpolate the variable.
}
I get the feeling you're not entirely clear about what you want this
program to do... or, at any rate, *I'm* not.
Ben
--
'Deserve [death]? I daresay he did. Many live that deserve death. And some die
that deserve life. Can you give it to them? Then do not be too eager to deal
out death in judgement. For even the very wise cannot see all ends.'
:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-: ben@morrow.me.uk
------------------------------
Date: Mon, 16 Feb 2004 19:43:20 GMT
From: "Wally Sanford" <PleaseSubstituteMyActualFirstNameHere@wallysanford.com>
Subject: Re: T*ent C*rry, wsanford@wallysanford.com, and falsely using existing email addresses
Message-Id: <s39Yb.7042$W74.2511@newsread1.news.atl.earthlink.net>
"Anno Siegel" <anno4000@lublin.zrz.tu-berlin.de> wrote in message
news:c0qgba$for$1@mamenchi.zrz.TU-Berlin.DE...
> From his past behavior I wouldn't expect him to use your address again.
> He seems to pick them up at random for use in one of his campaigns, which
> lasts until the forgery has become absolutely undeniable. The next
> time he picks new ones.
>
> Anno
Thanks, Anno. Yeah, I spent some time searching around about this loon, and
I see that he goes by Man*ny Wil*o, Al MacH**ney, R*bin G*vens, and a host
of others. I think I can tell when he stopped using one of them and started
using my name as an alias.
Harmless though he may be, you can imagine my dismay, in that I do not wish
to appear to be any more of a loon than I manage to be on my own, nor do I
wish to be confused with Uber Loons, lest I get a cool reception on some
group. Say I revived my interest in Perl and unwittingly posted to this
group using my actual name? At best, I might get support only from the, I'd
guess, few that haven't killfiled this character. That's why I was trying to
munge the character's aliases above, in hopes that those with loaded
killfiles might see this.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6143
***************************************