[23261] in Perl-Users-Digest
Perl-Users Digest, Issue: 5481 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Sep 10 18:15:46 2003
Date: Wed, 10 Sep 2003 15:15:20 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 10 Sep 2003 Volume: 10 Number: 5481
Today's topics:
perl regex question (Chad Williams)
Re: perl regex question <mpapec@yahoo.com>
Re: perl regex question <postmaster@castleamber.com>
Re: perl regex question <krahnj@acm.org>
Re: perl regex question <xx087@freenet.carleton.ca>
Re: perl regex question <pinyaj@rpi.edu>
Re: perl regex question (Quantum Mechanic)
Re: perl simple cms <tzz@lifelogs.com>
Re: Printing a hash of hashes using an array for the he <krahnj@acm.org>
Re: Printing a hash of hashes using an array for the he <usenet@dwall.fastmail.fm>
Re: Printing a hash of hashes using an array for the he <mothra@nowhereatall.com>
Re: Printing a hash of hashes using an array for the he <usenet@dwall.fastmail.fm>
Re: Printing a hash of hashes using an array for the he <mothra@nowhereatall.com>
Re: Reading Data File Records <mikeflan@earthlink.net>
Re: Slice an array of hashes? <cs@edu.edu>
Re: Slice an array of hashes? <krahnj@acm.org>
Re: Slice an array of hashes? (Graham)
Re: Slice an array of hashes? (Quantum Mechanic)
Re: Speeding up LWP::Simple <postmaster@castleamber.com>
Re: Speeding up LWP::Simple <tcurrey@no.no.no.i.said.no>
Re: Speeding up LWP::Simple (David Morel)
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 10 Sep 2003 11:40:14 -0700
From: chad_dude@yahoo.com (Chad Williams)
Subject: perl regex question
Message-Id: <2a3b16bc.0309101040.1d8148f7@posting.google.com>
my $procspeed="state on-line state_begin 1058768026 cpu_type sparcv9
fpu_type sparcv9 clock_MHz 450";
How come this:
$procspeed=~ s/.*\s(\d+)\w*/$1/;
gives me $procspeed=450
but this:
$procspeed=~ s/.*\s(\d+)\w*$/$1/;
matches the whole line?
What's the most efficient way to do this? (get the last word/word of
digits, etc)
TIA
------------------------------
Date: Wed, 10 Sep 2003 21:03:05 +0200
From: Matija Papec <mpapec@yahoo.com>
Subject: Re: perl regex question
Message-Id: <91tulvov12jealtunaj4sbb2ighg8o0hr1@4ax.com>
X-Ftn-To: Chad Williams
chad_dude@yahoo.com (Chad Williams) wrote:
>my $procspeed="state on-line state_begin 1058768026 cpu_type sparcv9
>fpu_type sparcv9 clock_MHz 450";
>
>How come this:
>
>$procspeed=~ s/.*\s(\d+)\w*/$1/;
>
>gives me $procspeed=450
>
>but this:
>
>$procspeed=~ s/.*\s(\d+)\w*$/$1/;
>
>matches the whole line?
>
>What's the most efficient way to do this? (get the last word/word of
>digits, etc)
I don't know about most efficient, but this is the simplest :)
print( ($procspeed =~ /(\d+)/g)[-1] );
--
Matija
------------------------------
Date: Wed, 10 Sep 2003 21:14:47 +0200
From: John Bokma <postmaster@castleamber.com>
Subject: Re: perl regex question
Message-Id: <1063221395.261359@halkan.kabelfoon.nl>
Matija Papec wrote:
> X-Ftn-To: Chad Williams
>
> chad_dude@yahoo.com (Chad Williams) wrote:
>
>>my $procspeed="state on-line state_begin 1058768026 cpu_type sparcv9
>>fpu_type sparcv9 clock_MHz 450";
>>
>>How come this:
>>
>>$procspeed=~ s/.*\s(\d+)\w*/$1/;
>>
>>gives me $procspeed=450
>>
>>but this:
>>
>>$procspeed=~ s/.*\s(\d+)\w*$/$1/;
>>
>>matches the whole line?
>>
>>What's the most efficient way to do this? (get the last word/word of
>>digits, etc)
>
>
> I don't know about most efficient, but this is the simplest :)
> print( ($procspeed =~ /(\d+)/g)[-1] );
print substr($procspeed, -3);
assuming it is always between 100 and 999.
--
Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
virtual home: http://johnbokma.com/ ICQ: 218175426
John web site hints: http://johnbokma.com/websitedesign/
------------------------------
Date: Wed, 10 Sep 2003 20:13:18 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: perl regex question
Message-Id: <3F5F85DA.EA7DB638@acm.org>
Chad Williams wrote:
>
> my $procspeed="state on-line state_begin 1058768026 cpu_type sparcv9
> fpu_type sparcv9 clock_MHz 450";
>
> How come this:
>
> $procspeed=~ s/.*\s(\d+)\w*/$1/;
>
> gives me $procspeed=450
>
> but this:
>
> $procspeed=~ s/.*\s(\d+)\w*$/$1/;
>
> matches the whole line?
They both match the same thing. The $ anchor doesn't make a difference
because the .* at the beginning is greedy.
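A quick check of this, as a sketch only. It assumes the string really is on one line (the break in the post being just line wrapping), and the variable names are made up for the example:

```perl
use strict;
use warnings;

# Assumption: no newline inside the string; the post's line break is wrapping.
my $line = "state on-line state_begin 1058768026 cpu_type sparcv9 "
         . "fpu_type sparcv9 clock_MHz 450";

# Work on copies so both substitutions start from the same text.
( my $unanchored = $line ) =~ s/.*\s(\d+)\w*/$1/;
( my $anchored   = $line ) =~ s/.*\s(\d+)\w*$/$1/;

print "unanchored: $unanchored\n";
print "anchored:   $anchored\n";
```

Both copies should end up as 450: the greedy .* first takes the whole string, then backs off only as far as the last whitespace that is followed by digits.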
John
--
use Perl;
program
fulfillment
------------------------------
Date: 10 Sep 2003 20:27:51 GMT
From: Glenn Jackman <xx087@freenet.carleton.ca>
Subject: Re: perl regex question
Message-Id: <slrnblv2a6.630.xx087@freenet10.carleton.ca>
Chad Williams <chad_dude@yahoo.com> wrote:
> my $procspeed="state on-line state_begin 1058768026 cpu_type sparcv9 fpu_type sparcv9 clock_MHz 450";
[...]
> What's the most efficient way to do this? (get the last word/word of
> digits, etc)
Also,
my ($last_digits) = $procspeed =~ m{
(\d+) # a sequence of digits
\D*$ # followed by non-digits at the end of the string
}x;
--
Glenn Jackman
NCF Sysadmin
glennj@ncf.ca
------------------------------
Date: Wed, 10 Sep 2003 17:56:46 -0400
From: Jeff 'japhy' Pinyan <pinyaj@rpi.edu>
To: Chad Williams <chad_dude@yahoo.com>
Subject: Re: perl regex question
Message-Id: <Pine.SGI.3.96.1030910174940.45177B-100000@vcmr-64.server.rpi.edu>
[posted & mailed]
On 10 Sep 2003, Chad Williams wrote:
>my $procspeed="state on-line state_begin 1058768026 cpu_type sparcv9
>fpu_type sparcv9 clock_MHz 450";
>
>How come this:
>
>$procspeed=~ s/.*\s(\d+)\w*/$1/;
>
>gives me $procspeed=450
>
>but this:
>
>$procspeed=~ s/.*\s(\d+)\w*$/$1/;
>
>matches the whole line?
They should both result in $procspeed being 450. (They do for me.)
>What's the most efficient way to do this? (get the last word/word of
>digits, etc)
You can use the .* approach, but you need to be sure you write it
properly. You did, ensuring there's a space before the digits. If you
had left it out, $procspeed would only be '0', since the .* is greedy, and
(\d+) is content in just matching one digit.
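To see the '0' result for yourself, here is a small sketch. The input is shortened to the tail of your string, and $with_space and $no_space are names invented for the example:

```perl
use strict;
use warnings;

my $tail = "clock_MHz 450";

# With \s: .* must stop before the last space, so (\d+) gets all of "450".
( my $with_space = $tail ) =~ s/.*\s(\d+)\w*/$1/;

# Without \s: greedy .* gives back only one character, so (\d+) gets "0".
( my $no_space = $tail ) =~ s/.*(\d+)\w*/$1/;

print "with \\s:    $with_space\n";
print "without \\s: $no_space\n";
```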
I wouldn't suggest the /(\d+)\D*$/ approach, because it requires you try
to match at EACH chunk of digits. If there are many chunks of digits in
your string, that'll be inefficient.
I personally suggest reversing the string, matching the "first" set of
digits, and then reversing that match.
$last_num = reverse( (reverse($str) =~ /(\d+)/)[0] );
Or:
$last_num = reverse $1 if reverse($str) =~ /(\d+)/;
--
Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
------------------------------
Date: 10 Sep 2003 14:57:13 -0700
From: quantum_mechanic_1964@yahoo.com (Quantum Mechanic)
Subject: Re: perl regex question
Message-Id: <f233f2f0.0309101357.22eb12d0@posting.google.com>
chad_dude@yahoo.com (Chad Williams) wrote:
> What's the most efficient way to do this? (get the last word/word of
> digits, etc)
Generally slower to faster options:
$last_word = ($string =~ /(\w+)/g)[-1]; # only keep the last word
$last_word = reverse ( (reverse $string) =~ /(\w+)/ ); # double reverse
$last_word = (split /\W+/, $string )[-1];
$last_word = reverse( (split /\W+/, reverse($string), 2 )[0] );
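The ordering will depend on your data and your perl, so it is worth timing. A rough sketch with the core Benchmark module follows; the sub names are made up, the test string is the one from the original question joined onto one line, and the last variant uses reverse($string) with explicit parentheses so that the split limit 2 is not swallowed by reverse:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = "state on-line state_begin 1058768026 cpu_type sparcv9 "
           . "fpu_type sparcv9 clock_MHz 450";

# Hypothetical labels for the four approaches.
my %candidates = (
    match_all => sub { my $w = ( $string =~ /(\w+)/g )[-1]; $w },
    rev_match => sub { my $w = reverse( ( reverse $string ) =~ /(\w+)/ ); $w },
    split_all => sub { my $w = ( split /\W+/, $string )[-1]; $w },
    rev_split => sub {
        my $w = reverse( ( split /\W+/, reverse($string), 2 )[0] );
        $w;
    },
);

# Sanity check first: every variant should return the same last word.
for my $name ( sort keys %candidates ) {
    my $got = $candidates{$name}->();
    print "$name => $got\n";
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, \%candidates );
```

The timings are not shown here on purpose; they vary by machine and by how long the string is.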
-QM
------------------------------
Date: Wed, 10 Sep 2003 15:49:30 -0400
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: perl simple cms
Message-Id: <4nekyol8z9.fsf@lockgroove.bwh.harvard.edu>
On Wed, 10 Sep 2003, andrew@nospam_andicrook.demon.co.uk wrote:
> yes I have this covered coding-wise, its how I should store the page
> contents in a mysql database how I should store paragraphs, images,
> links ,etc for pages and the web site as a whole i.e. design the
> database structure around a cms
Oh, I see, your question has nothing to do with Perl itself. I
assumed otherwise.
I would store each distinct class of thing in a table, and link
between related things. There may also be other things involved.
You can instead look at any of the open-source CMS projects out there
and see how they implement the database structure. RDBMS table design
is definitely outside the scope of comp.lang.perl.misc, though.
Ted
------------------------------
Date: Wed, 10 Sep 2003 18:58:55 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: Printing a hash of hashes using an array for the headings and getting the columns to line up
Message-Id: <3F5F746B.F1D9B825@acm.org>
Mothra wrote:
>
> I am trying to print out a report using a Hash of Hashes and am having
> trouble getting the columns to print out correctly. I have an array that
> contains the column headings in the correct order but I am stuck as how
> to get the HoH to use this information. I have provided a test script as
> to what I have tried. How can I get the script to print the columns in
> the correct order.
>
> [snip]
>
> my @family = qw(UNIGRAPHICS_NX UNIGRAPHICS SOLID_EDGE WEBTOOLS other);
>
> [snip]
>
> printf "%24s %10s %10s %8s %5s \n", @family;
>
> foreach my $person( sort keys %people ) {
>
> printf "%9s %2d %2d %2d %2d %2d\n", $person,
> map { $people{$person}{$_} } sort keys %{ $people{$person} };
> }
printf "%9s %2d %2d %2d %2d %2d\n",
$person, @{ $people{ $person } }{ @family };
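A self-contained sketch of the hash slice, since the real %people was snipped. The two rows below are sample data only (taken from output posted elsewhere in this thread), and the printf format is just for the example:

```perl
use strict;
use warnings;

my @family = qw(UNIGRAPHICS_NX UNIGRAPHICS SOLID_EDGE WEBTOOLS other);

# Sample rows only; the original data structure was not shown in full.
my %people = (
    friasd => { UNIGRAPHICS_NX => 10, UNIGRAPHICS => 21, SOLID_EDGE => 81,
                WEBTOOLS       => 25, other       => 2 },
    jung   => { UNIGRAPHICS_NX => 5,  UNIGRAPHICS => 13, SOLID_EDGE => 39,
                WEBTOOLS       => 16, other       => 2 },
);

foreach my $person ( sort keys %people ) {
    # Hash slice: values come back in @family order, whatever the key order.
    my @counts = @{ $people{$person} }{@family};
    printf "%-9s %2d %2d %2d %2d %2d\n", $person, @counts;
}
```

The slice asks for the keys in the order given by @family, so the columns follow the headings instead of the sorted key names.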
John
--
use Perl;
program
fulfillment
------------------------------
Date: Wed, 10 Sep 2003 18:17:14 -0000
From: "David K. Wall" <usenet@dwall.fastmail.fm>
Subject: Re: Printing a hash of hashes using an array for the headings and getting the columns to line up
Message-Id: <Xns93F29150BF9E5dkwwashere@216.168.3.30>
Mothra <mothra@nowhereatall.com> wrote:
>"David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
>news:Xns93F27CE764669dkwwashere@216.168.3.30...
>>
>> printf "$form_str\n", ' ', @family;
>>
>> foreach my $person( sort keys %people ) {
>> printf "$form_str\n", $person,
>> map { $people{$person}{$_} } @family;
>> }
>>
> I added this and receive:
> Use of uninitialized value in printf at F:\scripts\rep.pl line 52.
I noticed the missing value and got the same warning message, but
since you didn't seem to care I didn't either.
foreach my $person( sort keys %people ) {
printf "$form_str\n", $person,
map { $people{$person}{$_} or 0 } @family;
}
Replace the '0' with whatever value is appropriate.
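A small sketch of the two ways to fill the gap. The %row below is hypothetical, with no 'other' entry, which is my guess at what the data looks like for the person that triggers the warning:

```perl
use strict;
use warnings;

my @family = qw(UNIGRAPHICS_NX UNIGRAPHICS SOLID_EDGE WEBTOOLS other);

# Hypothetical row with the 'other' key missing.
my %row = ( UNIGRAPHICS_NX => 3, UNIGRAPHICS => 19, SOLID_EDGE => 57,
            WEBTOOLS       => 10 );

# 'or 0' replaces any false value (undef, 0, '') with 0.
my @with_or = map { $row{$_} or 0 } @family;

# A defined() test replaces only missing values, here with a marker string.
my @with_defined = map { defined $row{$_} ? $row{$_} : 'n/a' } @family;

print "or 0:    @with_or\n";
print "defined: @with_defined\n";
```

With a numeric default the two behave the same; the defined() form only matters if you want a default that differs from a legitimate 0.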
--
David Wall
------------------------------
Date: Wed, 10 Sep 2003 12:22:15 -0700
From: "Mothra" <mothra@nowhereatall.com>
Subject: Re: Printing a hash of hashes using an array for the headings and getting the columns to line up
Message-Id: <3f5f7985$1@usenet.ugs.com>
"David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
news:Xns93F29150BF9E5dkwwashere@216.168.3.30...
> Mothra <mothra@nowhereatall.com> wrote:
>
> >"David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
> >news:Xns93F27CE764669dkwwashere@216.168.3.30...
> >>
[snipped]
>
> I noticed the missing value and got the same warning message, but
> since you didn't seem to care I didn't either.
Hmn, now I am confused, how did you come to the conclusion that I didn't
seem to care?
In my original post the code I posted did not generate the warning.
>
>
> foreach my $person( sort keys %people ) {
> printf "$form_str\n", $person,
> map { $people{$person}{$_} or 0 } @family;
> }
>
> Replace the '0' with whatever value is appropriate.
>
Yep!! this works great!!!
Thanks very much!!
Mothra
------------------------------
Date: Wed, 10 Sep 2003 19:38:03 -0000
From: "David K. Wall" <usenet@dwall.fastmail.fm>
Subject: Re: Printing a hash of hashes using an array for the headings and getting the columns to line up
Message-Id: <Xns93F29F04AC5A9dkwwashere@216.168.3.30>
Mothra <mothra@nowhereatall.com> wrote:
>
> "David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
> news:Xns93F29150BF9E5dkwwashere@216.168.3.30...
>> Mothra <mothra@nowhereatall.com> wrote:
>>
>> >"David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
>> >news:Xns93F27CE764669dkwwashere@216.168.3.30...
>> >>
> [snipped]
>>
>> I noticed the missing value and got the same warning message, but
>> since you didn't seem to care I didn't either.
>
> Hmn, now I am confused, how did you come to the conclusion that I
> didn't seem to care?
> In my original post the code I posted did not generate the warning.
I copied, pasted, and ran the original code unaltered, and got this
output:
Use of uninitialized value in printf at E:\perlprog\junk.pl line 50.
UNIGRAPHICS_NX UNIGRAPHICS SOLID_EDGE WEBTOOLS other
friasd 10 21 81 25 2
jung 5 13 39 16 2
riches 18 10 83 24 2
wattsl 3 19 57 10 0
yamaguch 12 9 56 11 4
Looks like a warning to me.
>
>>
>>
>> foreach my $person( sort keys %people ) {
>> printf "$form_str\n", $person,
>> map { $people{$person}{$_} or 0 } @family;
>> }
>>
>> Replace the '0' with whatever value is appropriate.
>>
> Yep!! this works great!!!
> Thanks very much!!
You're welcome.
--
David Wall
------------------------------
Date: Wed, 10 Sep 2003 12:46:38 -0700
From: "Mothra" <mothra@nowhereatall.com>
Subject: Re: Printing a hash of hashes using an array for the headings and getting the columns to line up
Message-Id: <3f5f7f3c$1@usenet.ugs.com>
"David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
news:Xns93F29F04AC5A9dkwwashere@216.168.3.30...
> Mothra <mothra@nowhereatall.com> wrote:
>
> >
> > "David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
> > news:Xns93F29150BF9E5dkwwashere@216.168.3.30...
> >> Mothra <mothra@nowhereatall.com> wrote:
> >>
> >> >"David K. Wall" <usenet@dwall.fastmail.fm> wrote in message
> >> >news:Xns93F27CE764669dkwwashere@216.168.3.30...
> >> >>
> > [snipped]
> >>
> >> I noticed the missing value and got the same warning message, but
> >> since you didn't seem to care I didn't either.
> >
> > Hmn, now I am confused, how did you come to the conclusion that I
> > didn't seem to care?
> > In my original post the code I posted did not generate the warning.
>
> I copied, pasted, and ran the original code unaltered, and got this
> output:
>
> Use of uninitialized value in printf at E:\perlprog\junk.pl line 50.
[snipped]
I think I know what is going on.
F:\scripts>x.pl
UNIGRAPHICS_NX UNIGRAPHICS SOLID_EDGE WEBTOOLS other
friasd 10 21 81 25 2
jung 5 13 39 16 2
riches 18 10 83 24 2
wattsl 3 19 57 10 0
yamaguch 12 9 56 11 4
F:\scripts>perl -v
This is perl, v5.6.1 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)
I knew I was not going crazy.
then this.
F:\scripts>x.pl
UNIGRAPHICS_NX UNIGRAPHICS SOLID_EDGE WEBTOOLS other
friasd 10 21 81 25 2
jung 5 13 39 16 2
riches 18 10 83 24 2
Use of uninitialized value in printf at F:\scripts\x.pl line 50.
wattsl 3 19 57 10 0
yamaguch 12 9 56 11 4
F:\scripts>perl -v
This is perl, v5.8.0 built for MSWin32-x86-multi-thread
I guess I need to use the latest version of perl, Sorry about that :)
Thanks
Mothra
------------------------------
Date: Wed, 10 Sep 2003 21:33:26 GMT
From: Mike Flannigan <mikeflan@earthlink.net>
Subject: Re: Reading Data File Records
Message-Id: <3F5F9944.B78447BB@earthlink.net>
Graham wrote:
>
> It seems it isn't just you. All I am trying to do is get the data
> blocks into a suitable perl structure so I can calculate some simple
> statistics and reformat it for another program. See comments in the
> second while loop.
>
> I really appreciate the help. I have a pile of files with this type
> of structure (a legacy of an ancient postdoc) that I need to
> manipulate and reformat.
snip
Don't be afraid to slurp the whole file. I slurp 400,000+
line files very quickly and do the processing. The only
trouble is if you do it more than once in the program.
You might see a big slowdown - at least on Win2000.
I never found a good solution to this (yet), so I just
run a bunch of individual perl scripts - one for each
file.
If you find a better solution, let us know.
Mike
------------------------------
Date: Wed, 10 Sep 2003 11:16:36 -0700
From: Chief Squawtendrawpet <cs@edu.edu>
Subject: Re: Slice an array of hashes?
Message-Id: <3F5F6A84.5A56899F@edu.edu>
Graham wrote:
> I want to find all the indices of atm where $id = "foo" or "bar" and
> extract them into a new data array called data2.
Your code uses @data, but your follow-up discussion speaks of @atm. I
assume they are the same.
Things like @atm[0..2]{id} don't make sense and give an error, as you
found out. Before you can get access to the items deeper in the data
structure (say, for the 'id' key), you have to commit to a subscript
at the higher levels; otherwise, Perl doesn't know which 'id' value to
grab.
Something like this would allow you to select items from @atm
according to the 'id' characteristics (code not tested):
@atm2 = grep { $_->{id} eq 'foo' } @atm;
Or if you want indexes to @atm instead of the hash refs themselves
(also not tested):
@ind = grep { $atm[$_]{id} eq 'foo' } 0 .. $#atm;
As a side note, use:
push @data, {HASH};
instead of code like this:
@data = (@data, {HASH});
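Since the two grep forms above were not tested, here is a runnable sketch of both. The records are hypothetical; only the 'id', 'units' and 'data' keys come from your post:

```perl
use strict;
use warnings;

# Hypothetical records, built with push as suggested above.
my @atm;
push @atm, { id => 'foo', units => 'K',  data => [ 1, 2 ] };
push @atm, { id => 'bar', units => 'Pa', data => [ 3 ] };
push @atm, { id => 'foo', units => 'm',  data => [ 4, 5, 6 ] };

# Hash refs whose id is 'foo'.
my @atm2 = grep { $_->{id} eq 'foo' } @atm;

# Indexes into @atm whose id is 'foo'.
my @ind = grep { $atm[$_]{id} eq 'foo' } 0 .. $#atm;

print "matching indexes: @ind\n";
print "units of matches: ", join( ' ', map { $_->{units} } @atm2 ), "\n";
```

Note that @atm2 holds the same hash refs as @atm, not copies, so changes made through one are visible through the other.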
Chief S.
------------------------------
Date: Wed, 10 Sep 2003 19:40:37 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: Slice an array of hashes?
Message-Id: <3F5F7E30.CA1B4408@acm.org>
Graham wrote:
>
> Is it possible to slice a list of hashes?
Yes.
> I read in data into the @data array as follows:
>
> while ( $i < $numLev )
> {
> $line = <$fh>; # Take next line
> $line =~ s/^\s+|\s+$//g; # Strip whitespace
> $line =~ s/,/ /g; # Replace commas
> @prf = ( @prf, (split /\s+/, $line) ); # Get the data
> $i = $#prf + 1; # Update total
> }
Better written as:
while ( my $line = <$fh> ) {               # Take next line
    s/^\s+//, s/\s+$// for $line;          # Strip whitespace
    $line =~ tr/,/ /;                      # Replace commas
    push @prf, split ' ', $line;           # Get the data
    last if @prf >= $numLev;               # Stop once enough values are read
}
> @data = (@data, {"id"=>$id, "units"=>$units, "data"=>[@prf]});
push @data, { id => $id, units => $units, data => [@prf] };
> }
>
> I want to find all the indices of atm where $id = "foo" or "bar" and
> extract them into a new data array called data2.
>
> Alas, at this point I cannot even slice into @data and select the "id"
> fields.
>
> print $atm[(0..2)]; # Gives: HASH(0x121064)
> print @atm[(0..2)]; # Gives:
> HASH(0x121064)HASH(0x11a0e0)HASH(0x112ce8)
> print @atm[(0..2)]{"id"}; # Gives: compilation error.
> print $atm[(0..2)]{"id"}; # Gives: string id tag for index 2
for my $hash ( @atm[ 0 .. 2 ] ) {
    if ( $hash->{ id } eq 'foo' or $hash->{ id } eq 'bar' ) {
        # do stuff with $hash->{ id }
        # or $hash->{ units } or $hash->{ data }
    }
}
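The reading loop earlier in this post can be tried without a data file. This is only a sketch: the input text and the value of $numLev are invented, and the in-memory filehandle needs perl 5.8 or later:

```perl
use strict;
use warnings;

# Hypothetical input standing in for the real data file.
my $text = " 1.0, 2.0 ,3.0\n4.0 5.0\n6.0, 7.0\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $numLev = 5;    # assumed number of values wanted
my @prf;

while ( my $line = <$fh> ) {
    s/^\s+//, s/\s+$// for $line;    # strip leading and trailing whitespace
    $line =~ tr/,/ /;                # commas become spaces
    push @prf, split ' ', $line;     # append the fields from this line
    last if @prf >= $numLev;         # stop once enough values are read
}
close $fh;

print "read ", scalar(@prf), " values: @prf\n";
```

The third input line is never read here, because the second line already brings @prf up to $numLev values.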
John
--
use Perl;
program
fulfillment
------------------------------
Date: 10 Sep 2003 13:03:56 -0700
From: GrahamWilsonCA@yahoo.ca (Graham)
Subject: Re: Slice an array of hashes?
Message-Id: <eda30d78.0309101203.23854801@posting.google.com>
news:<eda30d78.0309100645.64745eb2@posting.google.com>...
>> #untested
>> my @idx = grep $atm[$_]{id} =~ /^(?:foo|bar)$/, 0 .. $#atm;
Thanks a lot Matija! That is very close to what I want. The only
trouble is that 'foo' and 'bar' are given in a command line option and
I cannot seem to build a (for lack of a better word) 'dynamic' regex
that is composed of my command line options.
What is wrong with:
$search = "\^(\?:" . join("|", @search) . ")\$";
my @idx = grep $atm[$_]{"id"} =~ /$search/, 0 .. $#atm;
Thanks again!
------------------------------
Date: 10 Sep 2003 13:49:45 -0700
From: quantum_mechanic_1964@yahoo.com (Quantum Mechanic)
Subject: Re: Slice an array of hashes?
Message-Id: <f233f2f0.0309101249.1fa9445a@posting.google.com>
GrahamWilsonCA@yahoo.ca (Graham) wrote in message news:<eda30d78.0309100645.64745eb2@posting.google.com>...
> Is it possible to slice a list of hashes?
Yes. But not as a literal slice, not the way you intended.
If you build @data like this:
@data = (@data, {"id"=>$id, "units"=>$units, "data"=>[@prf]});
then this will give you a slice:
@slice = @data[0,3,7..19];
But @slice now contains hashrefs, not just the "id" values you wanted.
To get the list of indices matching "foo" or "bar":
@indices = grep { $data[$_]{id} =~ /^(?:foo|bar)$/ } 0..@data-1;
If instead you want the subset of @data itself:
@subset = grep { $_->{id} =~ /^(?:foo|bar)$/ } @data;
But syntax like this:
@ids = @data[0..2]{"id"}
is equivalent to:
@ids = $data[2]{"id"}
which is the last index listed in the slice, regardless of the slice
indices or their order. [I tried this on ActivePerl 5.8.0.]
Someone more familiar with perlguts can provide an explanation, but it
appears that once the slice is generated, Perl treats it as a
comma-separated list, so only the last "expression" is returned.
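For the index and subset selections earlier in this post, here is a sketch for the case where the ids are only known at run time. The @wanted list and the sample records are hypothetical; quotemeta and qr// are standard Perl:

```perl
use strict;
use warnings;

# Hypothetical records; 'foobar' is there to show why the anchors matter.
my @data = (
    { id => 'foo',    units => 'K'  },
    { id => 'foobar', units => 'K'  },
    { id => 'bar',    units => 'Pa' },
    { id => 'baz',    units => 'm'  },
);

my @wanted = qw(foo bar);    # e.g. taken from the command line

# Escape each id, join with |, and group so ^ and $ apply to every branch.
my $alternation = join '|', map { quotemeta($_) } @wanted;
my $re          = qr/^(?:$alternation)$/;

my @indices = grep { $data[$_]{id} =~ $re } 0 .. $#data;
my @subset  = @data[@indices];

print "indices: @indices\n";
print "ids:     ", join( ' ', map { $_->{id} } @subset ), "\n";
```

Without the (?:...) group, ^ would bind only to the first alternative and $ only to the last, so 'foobar' would match as well.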
-QM
------------------------------
Date: Wed, 10 Sep 2003 20:43:00 +0200
From: John Bokma <postmaster@castleamber.com>
Subject: Re: Speeding up LWP::Simple
Message-Id: <1063219488.463658@halkan.kabelfoon.nl>
2mb wrote:
> David,
> Why not just use one of the commercially available email harvesting
> packages. Most are available for 19.99. If you are going to spam, it is better
> to get your list built and operational as soon as possible, before more
> legislation goes into effect.
Why is David named a spammer without any proof? Just for harvesting
30,000,000 webpages? There are so many legal things to use it for like
text analyses, language analyses, making an estimate of size, analyzing
HTML tag use, analysing use of scripting, stylesheets etc.
Also, there are websites where one can download entire email databases
for free. I recently saw one. There was one file of 250 (!) MB. Also
files sorted based on country etc.
If he is a spammer, well let spammers find all the rope they need to
hang themselves.
Oh, and if you think I am a spammer or supporter... get a life.
--
Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
virtual home: http://johnbokma.com/ ICQ: 218175426
John web site hints: http://johnbokma.com/websitedesign/
------------------------------
Date: Wed, 10 Sep 2003 12:55:30 -0700
From: "Trent Curry" <tcurrey@no.no.no.i.said.no>
Subject: Re: Speeding up LWP::Simple
Message-Id: <bjnvnr$oi$1@news.astound.net>
John Bokma wrote:
> 2mb wrote:
>
>> David,
>> Why not just use one of the commercially available email harvesting
>> packages. Most are available for 19.99. If you are going to spam, it
>> is better to get your list built and operational as soon as
>> possible, before more legislation goes into effect.
>
> Why is David named a spammer without any proof? Just for harvesting
> 30,000,000 webpages? There are so many legal things to use it for like
> text analyses, language analyses, making an estimate of size,
> analyzing
> HTML tag use, analysing use of scripting, stylesheets etc.
>
> Also, there are websites where one can download entire email databases
> for free. I recently saw one. There was one file of 250 (!) MB. Also
> files sorted based on country etc.
While I don't know if he is a spammer or not, you have pointed out a rather
ugly flaw in this group: too many people are too dang eager to jump to
conclusions and can easily be mistaken, like in a case like this (and
others), and end up condemning a person who was completely undeserving of
such treatment. In this particular case it's easy to say he could be
harvesting emails, but, as spoken above, there is no proof of that.
On the spam note, there already is more and more legislation either going
into effect, already in effect, or pending. Little by little, the government
(or governmentS I should say, as other countries have not been sitting idle
either.) It's still a big problem; it's not easy to get away once you've
been tagged by a spammer, short of changing your email address. Still, I
really hope one day we won't have to worry about them anymore, and hopefully
it's not just a | dream...
------------------------------
Date: 10 Sep 2003 14:47:28 -0700
From: altalingua@hotmail.com (David Morel)
Subject: Re: Speeding up LWP::Simple
Message-Id: <60c4a7b1.0309101347.3fdd4ea2@posting.google.com>
John Bokma <postmaster@castleamber.com> wrote in message news:<1063219488.463658@halkan.kabelfoon.nl>...
> Why is David named a spammer without any proof? Just for harvesting
> 30,000,000 webpages? There are so many legal things to use it for like
> text analyses, language analyses, making an estimate of size, analyzing
> HTML tag use, analysing use of scripting, stylesheets etc.
Hi all,
No, I am not a spammer. I'm more likely to be the next google.com than
the next spammer. The mere accusation is offensive to me.
Let me take a moment to clarify some things. If I were a spammer, why
would I take the time to write my own harvester? I'm sure very
effective ones have already been written. Also, I use a hotmail
account here to prevent spam from reaching my school inbox. Don't
harvesters gather emails from Google groups? I hate spam as much as
you do.
Thank you John Bokma for giving me some suggestions. I thought that I
would have to learn how to do some programming with processes/threads,
and you confirmed this.
Why do I need to gather the HTML? Google already offers 900,000+ free
web pages to the general public for analysis purposes. See
http://www.google.com/programming-contest/index.html. I have used
these pages in the past, but there is one problem with them: the
900,000 pages are taken only from the .edu namespace... I need pages
from all of the namespaces. There are other uses to HTML data than
spamming, believe it or not. The Google programming contest lists a
few of them:
* Detecting common templates in pages, and separating out the common
structure from the individual content.
* Classifying links on a page.
* Detecting pages that are near-duplicates of one another.
* Clustering pages by topic or type.
Is there any company out there that sells big databases of web pages?
Perhaps I can avoid some work after all.
Thanks,
David Morel
P.S
Let's be nicer to each other here :)
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5481
***************************************