[32421] in Perl-Users-Digest


Perl-Users Digest, Issue: 3688 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri May 11 09:09:40 2012

Date: Fri, 11 May 2012 06:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 11 May 2012     Volume: 11 Number: 3688

Today's topics:
        Taint mode help <dave@invalid.invalid>
    Re: Taint mode help <ben@morrow.me.uk>
    Re: Taint mode help (Randal L. Schwartz)
    Re: Taint mode help <ben@morrow.me.uk>
    Re: Taint mode help (Randal L. Schwartz)
    Re: Taint mode help <dave@invalid.invalid>
    Re: TIEHANDLE and deep recursion <ben@morrow.me.uk>
    Re: TIEHANDLE and deep recursion <tw+usenet@dionic.net>
    Re: TIEHANDLE and deep recursion <tw+usenet@dionic.net>
    Re: TIEHANDLE and deep recursion <rweikusat@mssgmbh.com>
    Re: TIEHANDLE and deep recursion <rweikusat@mssgmbh.com>
    Re: TIEHANDLE and deep recursion <tw+usenet@dionic.net>
    Re: TIEHANDLE and deep recursion <rweikusat@mssgmbh.com>
    Re: TIEHANDLE and deep recursion <tw+usenet@dionic.net>
    Re: WWW::Mechanize and 3rd party APIs (Google) <ben@morrow.me.uk>
    Re: WWW::Mechanize and 3rd party APIs (Google) <justin.1203@purestblue.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 10 May 2012 19:16:54 +0000 (UTC)
From: "Dave Saville" <dave@invalid.invalid>
Subject: Taint mode help
Message-Id: <fV45K0OBJxbE-pn2-SwGiYDdg7mHi@localhost>

One of my cgi scripts is throwing:

 Insecure dependency in chdir while running with -t switch at 
d:/usr/lib/perl/lib/5.8.2/File/Find.pm line 814., referer: 
*cgi-bin/upload.pl

I know how to fix this if it were in my code but in the supplied 
modules? I must be missing something :-)

I am not passing anything to find. I chdir to a hard coded directory 
and then find(\&wanted, '.');
-- 
Regards
Dave Saville


------------------------------

Date: Thu, 10 May 2012 21:56:32 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Taint mode help
Message-Id: <0lmt79-sf.ln1@anubis.morrow.me.uk>


Quoth "Dave Saville" <dave@invalid.invalid>:
> One of my cgi scripts is throwing:
> 
>  Insecure dependency in chdir while running with -t switch at 
> d:/usr/lib/perl/lib/5.8.2/File/Find.pm line 814., referer: 
> *cgi-bin/upload.pl

-t and not -T? That's a bad idea for a CGI script. So is using 5.8.2,
which is 9 years old now and entirely unsupported.

> I know how to fix this if it were in my code but in the supplied 
> modules? I must be missing something :-)
> 
> I am not passing anything to find. I chdir to a hard coded directory 
> and then find(\&wanted, '.');

See either the no_chdir option or the untaint* options to File::Find.
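Here is a minimal sketch of the no_chdir route, assuming a stock File::Find. It builds a throwaway directory tree with File::Temp, which stands in for the real hard-coded directory (the tree and file names are illustrative only), and collects the plain files under it. The callback is passed as a code reference. By contrast, find(&wanted, ...) would call wanted() once and pass its return value. The script is meant to be run with perl -T. With no_chdir set, find() should never hand a readdir()-derived name to chdir().

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(mkpath);
use File::Temp qw(tempdir);

# Throwaway tree standing in for the real, hard-coded directory.
my $root = tempdir(CLEANUP => 1);
mkpath("$root/a/b");
for my $file ("$root/top.txt", "$root/a/mid.txt", "$root/a/b/deep.txt") {
    open my $out, '>', $file or die "open $file: $!";
    close $out;
}

my @found;
find(
    {
        # A code reference (sub { ... } or \&wanted), not a call like &wanted.
        wanted   => sub { push @found, $File::Find::name if -f $_ },
        # Never chdir() into the (tainted) names that readdir() returns;
        # with no_chdir, $_ holds the full path rather than a basename.
        no_chdir => 1,
    },
    $root,
);

print "$_\n" for sort @found;
```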

Ben



------------------------------

Date: Thu, 10 May 2012 13:46:24 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Taint mode help
Message-Id: <8662c3228v.fsf@red.stonehenge.com>

>>>>> "Dave" == Dave Saville <dave@invalid.invalid> writes:

Dave> I am not passing anything to find. I chdir to a hard coded directory 
Dave> and then find(\&wanted, '.');

Give find() the full hardcoded path then.  It's very likely taint mode
is upset by the response from trying to turn '.' into an absolute path,
since it involves values "from the outside".

print "Just another Perl hacker,"; # the original

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


------------------------------

Date: Thu, 10 May 2012 23:15:39 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Taint mode help
Message-Id: <b9rt79-a11.ln1@anubis.morrow.me.uk>


Quoth merlyn@stonehenge.com (Randal L. Schwartz):
> >>>>> "Dave" == Dave Saville <dave@invalid.invalid> writes:
> 
> Dave> I am not passing anything to find. I chdir to a hard coded directory 
> Dave> and then find(\&wanted, '.');
> 
> Give find() the full hardcoded path then.  It's very likely taint mode
> is upset by the response from trying to turn '.' into an absolute path,
> since it involves values "from the outside".

No, it's simpler than that: find (without no_chdir) chdirs into each
directory as it recurses down. Since those directory names came from
readdir, they are tainted, and you can't pass them back to chdir without
untainting them. This may seem a little silly, but that's the way it is.
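The sketch below makes that concrete. It reads a directory, reports whether each name is tainted, and untaints each name with a regex capture before passing it to chdir(). The directory layout and the whitelist pattern are illustrative assumptions and are not what File::Find's untaint options use. Under perl -T the readdir() names should report as tainted. Without taint mode, Scalar::Util::tainted() always returns false. The final chdir('/') only steps back out of the temporary tree before File::Temp removes it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Scalar::Util qw(tainted);

my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir $root/sub: $!";

opendir my $dh, $root or die "opendir $root: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

my @entered;
for my $name (@names) {
    # Under perl -T this reports "yes": readdir() output is outside data.
    printf "%s tainted: %s\n", $name, tainted($name) ? 'yes' : 'no';

    # Untaint by capturing against a whitelist (an illustrative pattern:
    # word characters, dots and hyphens, so never a '/'); $1 is untainted.
    my ($safe) = $name =~ /\A([\w.-]+)\z/ or next;
    chdir "$root/$safe" or die "chdir $root/$safe: $!";
    push @entered, $safe;
}
chdir '/' or die "chdir /: $!";    # step out before the tree is removed
```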

(I'm not actually certain there isn't some way, on some system, of
arranging for readdir to return something which chdir will misinterpret.
ISTR, for instance, that it's possible for a misconfigured Samba server
to return directory names with / in, and Windows will happily pass them
back to the application but then chdir to the wrong place when you try
to enter the directory. I may be misremembering, though.)

Ben



------------------------------

Date: Thu, 10 May 2012 20:06:53 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Taint mode help
Message-Id: <86pqabza9e.fsf@red.stonehenge.com>

>>>>> "Ben" == Ben Morrow <ben@morrow.me.uk> writes:

Ben> No, it's simpler than that: find (without no_chdir) chdirs into
Ben> each directory as it recurses down. Since those directory names
Ben> came from readdir, they are tainted, and you can't pass them back
Ben> to chdir without untainting them. This may seem a little silly, but
Ben> that's the way it is.

Ahh, thanks for that.  Haven't had as much experience with taint as most
people attribute to me. :)

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


------------------------------

Date: Fri, 11 May 2012 12:08:27 +0000 (UTC)
From: "Dave Saville" <dave@invalid.invalid>
Subject: Re: Taint mode help
Message-Id: <fV45K0OBJxbE-pn2-qlRdVxjRnXdH@localhost>

On Thu, 10 May 2012 20:56:32 UTC, Ben Morrow <ben@morrow.me.uk> wrote:

> 
> Quoth "Dave Saville" <dave@invalid.invalid>:
> > One of my cgi scripts is throwing:
> > 
> >  Insecure dependency in chdir while running with -t switch at 
> > d:/usr/lib/perl/lib/5.8.2/File/Find.pm line 814., referer: 
> > *cgi-bin/upload.pl
> 
> -t and not -T? That's a bad idea for a CGI script. So is using 5.8.2,
> which is 9 years old now and entirely unsupported.
> 


-t: oops, a forgotten switch. Well spotted, thanks. Ages ago I had a
problem turning taint mode on at all with Apache: our port barfs on
*any* command-line switches for perl. I got around it by setting the
PERL5OPT environment variable in the script that starts Apache. You
need to ensure it is in the list of environment variables passed
through, though. I set it to -t to catch anything I may have missed in
testing - you know what users are like :-) - and seeing as the script
had been running without any taint checks at all for some time, it did
not seem any worse. And of course I forgot all about it. 5.8.2 happens
to be the latest version that actually works on my platform, and if it
ain't broke...... The latest port has a heap of problems that are yet
to be resolved. :-(
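It is easy to mix up -t and -T, especially when the switch arrives through PERL5OPT rather than on the command line. A script can report which mode it is actually running under. This sketch relies only on the ${^TAINT} variable. perlvar documents it as 1 under -T, -1 when only taint warnings are enabled (-t, or -TU), and 0 when taint checks are off. The message wording is illustrative.

```perl
use strict;
use warnings;

# perlvar: ${^TAINT} is 1 with -T, -1 when only taint warnings are on
# (-t, or -TU), and 0 when taint mode is off.
my %describe = (
     1 => 'on: insecure operations die (-T)',
    -1 => 'warnings only: insecure operations just warn (-t)',
     0 => 'off',
);

my $state = $describe{ ${^TAINT} };
print "taint mode is $state\n";
```

Run as perl -t script.pl, it should report the warnings-only state. That is the state Dave's server was in.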

> > I know how to fix this if it were in my code but in the supplied 
> > modules? I must be missing something :-)
> > 
> > I am not passing anything to find. I chdir to a hard coded directory 
> > and then find(\&wanted, '.');
> 
> See either the no_chdir option or the untaint* options to File::Find.
> 

Thanks - no_chdir is perfect. Did not even need to touch the "wanted" 
code as I was working with FQNs anyway. :-)
-- 
Regards
Dave Saville


------------------------------

Date: Wed, 9 May 2012 23:52:11 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <r19r79-6dm2.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>         
>     ${*$fh{SCALAR}} = $name;

> It also relies on the feature that the SCALAR slot of a glob
> autovivifies like any other 'real reference', something which is
> documented as
> 
> 	This might change in a future release.
>         (perlref)

That's extremely unlikely. That sentence was (I believe) intended to
prepare people for the possibility of perl's glob creation mechanism
changing so that a glob did not necessarily have a SCALAR component. 

In 5.8 and earlier, when a glob was created the SCALAR slot was filled
at the same time:

    ~% perl5.8.9 -MDevel::Peek -e'Dump *foo'
    SV = PVGV(0x284b0068) at 0x28403d08
        [...]
      NAME = "foo"
      NAMELEN = 3
      GvSTASH = 0x2840309c	"main"
      GP = 0x284b7760
        SV = 0x28403cb4
          [...]
        AV = 0x0
        HV = 0x0
        CV = 0x0
          [...]

Notice that although AV, HV and CV are null, SV has already been filled.
This is reflected in the corresponding *foo{THING} operations:

    ~% perl5.8.9 -le'print for *foo{ARRAY}, *foo{SCALAR}'

    SCALAR(0x28403cc0)
    ~%

For 5.10 this was changed: most globs in modern Perl programs only
use the CODE slot (since most variables are now lexicals), so creating a
scalar for the SCALAR slot was a waste of memory:

    ~% perl5.10.0 -MDevel::Peek -e'Dump *foo'
    SV = PVGV(0x800ecc630) at 0x800e8fed0
        [...]
      NAME = "foo"
      NAMELEN = 3
      GvSTASH = 0x800e0a108	"main"
      GP = 0x800ebb4c0
        SV = 0x0
          [...]
        AV = 0x0
        HV = 0x0
        CV = 0x0
          [...]

However, in order not to break code like the example above, it was made
impossible to see this from Perl. As soon as you try to look at the
SCALAR slot, it autovivs and pretends there was a scalar there all
along, so the Perl test gives the same result as before:

    ~% perl5.10.0 -le'print for *foo{SCALAR}, *foo{ARRAY}'

    SCALAR(0x800e0a2a0)
    ~%

Since the change has been made already, and in a way which preserves
backcompat, I think it extremely unlikely that this behaviour will ever
go away.

If you're worried, though, all you need to do is create a fresh scalar
and stuff it into the glob, just as you would if you wanted to use the
ARRAY or HASH slot. There isn't an explicit anonymous scalar constructor
(though I sometimes wish there was) but a lexical that's about to go out
of scope will do just fine:

    *$fh = do { \my $tmp };
    ${*$fh{SCALAR}} = $name;

In a short method, of course, you don't need to bother with the do
block.
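The idiom can be shown as a self-contained sketch. The open_named() helper and the in-memory filehandle are illustrative choices and are not part of any module. The constructor installs a fresh scalar in the handle's glob before it stores the file name there, and the caller then reads the name back.

```perl
use strict;
use warnings;

# Hypothetical constructor in the style of the SafeFile example.
sub open_named {
    my ($name) = @_;
    open my $fh, '>', \my $buffer or die "open in-memory file: $!";
    # Install a fresh scalar in the handle's glob first, as one would
    # for the ARRAY or HASH slot, instead of relying on the SCALAR slot
    # autovivifying the first time it is looked at.
    *$fh = \my $slot;
    ${ *$fh{SCALAR} } = $name;
    return $fh;
}

my $fh = open_named('/tmp/example');
print {$fh} "some data\n";

my $stored = ${ *$fh{SCALAR} };
print "name stored in the glob: $stored\n";
```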

Ben



------------------------------

Date: Thu, 10 May 2012 08:16:30 +0100
From: Tim Watts <tw+usenet@dionic.net>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <ej6s79-8kp.ln1@squidward.local.dionic.net>

Ben Morrow wrote:

> Conceptually the tied filehandle points to a different 'file' from the
> non-tied one: a tied filehandle doesn't need a real file behind it at
> all, and as far as Perl is concerned the object it's tied to is 'the
> file'. So you have one filehandle pointing to a real file, and another
> pointing to an object, and the fact that object copies data into the
> file is just a coincidence.
> 
> In principle you *could* use just one filehandle, by untying it and
> then retying afterwards, but it would be pretty awkward. For one thing,
> you'd need to set up your TIEHANDLE method so that if you passed it an
> already-constructed object it used that rather than constructing a new
> one. For another, as Rainer pointed out, you'd've just created a
> reference loop, and you'd need to explicitly break it before the object
> would be destroyed.

Yes - it's all clear now.

I will re-read the perltie man page, but I have a strong feeling that the
manpage does not point this out - it was natural to assume one would
tie the actual handle, and that's where it all went wrong.
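For comparison, here is a minimal sketch of the two-handle arrangement Ben describes. It uses a hypothetical SafeFile::Tied class. The handle the caller sees is tied and never opened, and all real output goes to a separate lexical filehandle on the .tmp file. Because PRINT writes to that other handle, it cannot recurse into itself. A plain close() by the caller is routed to CLOSE, which commits the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Symbol qw(gensym);

package SafeFile::Tied;    # hypothetical class name

sub TIEHANDLE {
    my ($class, $name) = @_;
    # The tied handle itself is never opened: all real I/O goes through
    # a second, ordinary filehandle on "$name.tmp".
    open my $real, '>', "$name.tmp" or die "open $name.tmp: $!";
    return bless { real => $real, name => $name }, $class;
}

sub PRINT {
    my $self = shift;
    # Printing to the *other* handle avoids PRINT calling itself forever.
    return print { $self->{real} } @_;
}

sub CLOSE {
    my $self = shift;
    my $real = delete $self->{real} or return 1;    # already committed
    close $real or die "close $self->{name}.tmp: $!";
    rename "$self->{name}.tmp", $self->{name}
        or die "rename to $self->{name}: $!";
    return 1;
}

# Commit if the caller never closes (e.g. the handle goes out of scope).
sub DESTROY { $_[0]->CLOSE }

package main;

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/passwd";

my $fh = gensym;
tie *$fh, 'SafeFile::Tied', $target;
print $fh "root:x:0:0::/root:/bin/sh\n";
close $fh;    # an ordinary close() is routed to CLOSE and commits the file
untie *$fh;
```

Only PRINT and CLOSE are implemented here. printf, write and the rest would need PRINTF, WRITE and so on, as listed in perltie.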

Perhaps as a "newbie" to this feature I should contribute a documentation 
patch :)

-- 
Tim Watts


------------------------------

Date: Thu, 10 May 2012 09:32:03 +0100
From: Tim Watts <tw+usenet@dionic.net>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <31bs79-lrq.ln1@squidward.local.dionic.net>



Rainer Weikusat wrote:

> 
> An entirely different approach to accomplish this:
> 
> ---------------
> package SafeFile;
> 
> sub new
> {
>     my ($class, $name) = @_;
>     my $fh;
> 
>     open($fh, '>', $name.'.tmp')
> or die("open: $name.tmp: $!");
>         
>     ${*$fh{SCALAR}} = $name;
>     return bless($fh, $class);
> }
> 
> sub DESTROY
> {
>     my $fh = $_[0];
>     my $name;
> 
>     close($fh);
> 
>     $name = ${*$fh{SCALAR}};
>     rename($name.'.tmp', $name);
> }
> 
> package main;
> 
> {
>     my $fh = SafeFile->new('/tmp/ziegenwurst');
> 
>     print $fh ("Ziege\n");
>     print $fh ("Salz\n");
> }

Hi Rainer,

I did consider something like this - but more like a fully OO style where 
the print/write methods would be implemented and the filehandle would not be 
obtainable.

In the end, given tie works, I like the tie approach as it is less accident 
prone.

Background:

When I worked at Imperial College, we built a perl framework for driving 
sysadmin scripts, and one of the modules implemented "SafeFile", so that a 
crash could never leave a half-written system file (eg /etc/passwd). The 
module also added some functionality such as lockouts (eg if 
/etc/passwd-special existed, then /etc/passwd would not be overwritten); it 
could force file uid/gid/modes, and it did various other tricks. It was 
declared to be the *only* way to write a file from the system scripts that 
ran from cron.

There were loads of other modules for logging, machine-class handling, 
merging config files and so on.

Sadly, we never opensourced it (though the nod was given by the Head of 
Dept).

Now I work at King's College London, I have need of the same sort of setup - 
so I am reimplementing the functionality, but changing the design a little 
to add some new ideas.

The original "SafeFile" returned a ($fh, $obj) pair - use $fh and close via 
$obj. In hindsight I never liked that, but we were lazy and it worked - so I 
set out to "fix" it this time around.

My code is open source from the get-go - though it's going to be a while 
before any of it is ready. Imperial had 4-5 people working 3 months over 
the summer to do theirs, in work time. I have 1 person (me), a few % of work 
time and my own time. The code name (for when I eventually upload to Google 
Code or Bitbucket) is Sys::SysUpdate.

-- 
Tim Watts


------------------------------

Date: Thu, 10 May 2012 12:55:17 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <87lil08d3u.fsf@sapphire.mobileactivedefense.com>

Tim Watts <tw+usenet@dionic.net> writes:
> Ben Morrow wrote:
>> Conceptually the tied filehandle points to a different 'file' from the
>> non-tied one: a tied filehandle doesn't need a real file behind it at
>> all, and as far as Perl is concerned the object it's tied to is 'the
>> file'.

[...]

> I will re-read the perltie man page, but I have a strong feeling that 
> manpage is lacking in pointing out this - it was natural to assume one would 
> tie the actual handle and that's where it all went wrong.

This actually seems rather 'unnatural' to me: the idea behind the
tying mechanism is that some kind of 'familiar' Perl construct (like a
hash or a filehandle) can be used to interface with some conceptually
similar 'other thing', the classic example being a hashed flat-file
database. It is based on a set of abstract operations which define 'a
hash' (or 'a filehandle') not in terms of what it is but in terms of
how it behaves in reply to certain messages. This means that tying a
filehandle which actually refers to some file implies losing the
ability to use this filehandle to manipulate the file in the 'usual'
way, using the built-in filehandle-based file manipulation operations.
This may be OK if this filehandle is henceforth supposed to act as a
mock filehandle interface object to $something_completely_different,
although I would rather avoid that.
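Rainer's point can be checked directly. In this sketch an already-open file handle is tied, so print goes to the object rather than to the file. After untie, the same handle writes to the file again. The Capture class is hypothetical and implements only PRINT.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

package Capture;    # hypothetical tie class: it only implements PRINT
sub TIEHANDLE { my ($class) = @_; return bless { lines => [] }, $class }
sub PRINT     { my $self = shift; push @{ $self->{lines} }, join('', @_); return 1 }

package main;

my ($fh, $path) = tempfile(UNLINK => 1);

my $obj = tie *$fh, 'Capture';    # from here on, print $fh ... calls PRINT
print $fh "goes to the object\n";
my @captured = @{ $obj->{lines} };

undef $obj;                       # drop the extra reference before untie
untie *$fh;                       # $fh is an ordinary file handle again
print $fh "goes to the file\n";
close $fh or die "close $path: $!";

open my $in, '<', $path or die "open $path: $!";
my $on_disk = do { local $/; <$in> };
print "captured by the object: @captured";
print "written to the file:    $on_disk";
```

The reference returned by tie() is dropped before untie. If it were kept, Perl under use warnings would warn that untie was attempted while inner references still exist.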


------------------------------

Date: Thu, 10 May 2012 13:10:44 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <87bolw8ce3.fsf@sapphire.mobileactivedefense.com>

Tim Watts <tw+usenet@dionic.net> writes:
> Rainer Weikusat wrote:
>> An entirely different approach to accomplish this:
>> 
>> ---------------
>> package SafeFile;
>> 
>> sub new
>> {
>>     my ($class, $name) = @_;
>>     my $fh;
>> 
>>     open($fh, '>', $name.'.tmp')
>> or die("open: $name.tmp: $!");
>>         
>>     ${*$fh{SCALAR}} = $name;
>>     return bless($fh, $class);
>> }
>> 
>> sub DESTROY
>> {
>>     my $fh = $_[0];
>>     my $name;
>> 
>>     close($fh);
>> 
>>     $name = ${*$fh{SCALAR}};
>>     rename($name.'.tmp', $name);
>> }
>> 
>> package main;
>> 
>> {
>>     my $fh = SafeFile->new('/tmp/ziegenwurst');
>> 
>>     print $fh ("Ziege\n");
>>     print $fh ("Salz\n");
>> }
>
> Hi Rainer,
>
> I did consider something like this - but more like a fully OO style where 
> the print/write methods would be implemented

Read: You didn't "consider something like this". But you can't help
the temptation to badmouth it a little using whatever empty phrases
happen to come into your head, such as 'not fully OO style' (aka doesn't
reimplement half of the universe uselessly just to control
finalization).

[...]

> I like the tie approach as it is less accident prone.

or "it is accident prone" ...

> Background:
>
> When I worked at Imperial College,

[...]

> Now I work at Kings College London,

"I'm a varsity guy!"

How come I'm not surprised ...



------------------------------

Date: Thu, 10 May 2012 15:26:53 +0100
From: Tim Watts <tw+usenet@dionic.net>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <dqvs79-6r7.ln1@squidward.local.dionic.net>

Rainer Weikusat wrote:

> Tim Watts <tw+usenet@dionic.net> writes:
>> Rainer Weikusat wrote:
>>> An entirely different approach to accomplish this:
>>> 
>>> ---------------
>>> package SafeFile;
>>> 
>>> sub new
>>> {
>>>     my ($class, $name) = @_;
>>>     my $fh;
>>> 
>>>     open($fh, '>', $name.'.tmp')
>>> or die("open: $name.tmp: $!");
>>>         
>>>     ${*$fh{SCALAR}} = $name;
>>>     return bless($fh, $class);
>>> }
>>> 
>>> sub DESTROY
>>> {
>>>     my $fh = $_[0];
>>>     my $name;
>>> 
>>>     close($fh);
>>> 
>>>     $name = ${*$fh{SCALAR}};
>>>     rename($name.'.tmp', $name);
>>> }
>>> 
>>> package main;
>>> 
>>> {
>>>     my $fh = SafeFile->new('/tmp/ziegenwurst');
>>> 
>>>     print $fh ("Ziege\n");
>>>     print $fh ("Salz\n");
>>> }
>>
>> Hi Rainer,
>>
>> I did consider something like this - but more like a fully OO style where
>> the print/write methods would be implemented
> 
> Read: You didn't "consider something like this".

What's got your goat?

I did consider something like this (I think I'd know) - specifically:

my $fobj = SafeFile->new($filename, %options);

$fobj->print("Blah\n");

$fobj->abort(); # Backs out of the final rename and deletes the tmp file.

$fobj->close(); # Otherwise: closes and renames the tmp file into place.

> But you can't help
> the temptation to badmouth it a little using whatever empty phrases
> happen to come into your head, such as 'not fully OO style' (aka doesn't
> reimplement half of the universe uselessly just to control
> finalization)

I am not "badmouthing" it - it is a valid solution. However, for the 
situation I have, I feel that the tie() solution fits better.

You said: "This doesn't work with explicit close calls"

This is exactly the type of error I am guarding against - this framework is 
designed to be fairly strict in use and it is not improbable that *someone 
else* programming against it might make the error of calling "close $fh" 
because it is normal to them.
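Here is one way the fully OO variant sketched above (new, print, abort, close) could look. This is a hedged sketch, not Tim's actual code. The raw handle stays inside the object, so a caller has no filehandle to close() by mistake. The class name SafeFile::OO is hypothetical. Discarding an unfinished file in DESTROY, rather than committing it, is a design assumption. The class uses IO::Handle methods internally so that its own print and close methods never collide with the builtins of the same name.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

package SafeFile::OO;    # hypothetical class name

sub new {
    my ($class, $name) = @_;
    open my $fh, '>', "$name.tmp" or die "open $name.tmp: $!";
    # The raw handle lives only inside the object.
    return bless { fh => $fh, name => $name }, $class;
}

sub print {
    my $self = shift;
    my $fh = $self->{fh} or die "$self->{name}: already closed or aborted\n";
    return $fh->print(@_);
}

sub abort {
    my $self = shift;
    my $fh = delete $self->{fh} or return;    # nothing left to undo
    $fh->close;
    unlink "$self->{name}.tmp";
    return;
}

sub close {
    my $self = shift;
    my $fh = delete $self->{fh} or return;
    $fh->close or die "close $self->{name}.tmp: $!";
    rename "$self->{name}.tmp", $self->{name}
        or die "rename $self->{name}.tmp: $!";
    return 1;
}

# Assumption: an object dropped without close() throws its output away.
sub DESTROY { $_[0]->abort }

package main;

my $dir = tempdir(CLEANUP => 1);

my $ok = SafeFile::OO->new("$dir/passwd");
$ok->print("root:x:0:0::/root:/bin/sh\n");
$ok->close;     # commits: renames passwd.tmp to passwd

my $bad = SafeFile::OO->new("$dir/shadow");
$bad->print("half-written\n");
$bad->abort;    # discards: removes shadow.tmp
```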



> 
> [...]
> 
>> I like the tie approach as it is less accident prone.
> 
> or "it is accident prone" ...
> 
>> Background:
>>
>> When I worked at Imperial College,
> 
> [...]
> 
>> Now I work at Kings College London,
> 
> "I'm a varsity guy!"
> 
> How come I'm not surprised ...

What's that supposed to mean? I appreciate the good ideas you have suggested 
- and your explanation did help me to understand the internal mechanics of 
tie. But there is really no need to throw a strop. My job requires me to 
work with, configure and understand dozens of major software packages as 
well as the OS, VMware, bits of Python and god knows what else. With most of 
those I am aiming to be 70% competent - sorry I cannot manage 100% on all 
of them, including perl!!



Tim

-- 
Tim Watts


------------------------------

Date: Thu, 10 May 2012 15:44:05 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <87wr4knlje.fsf@sapphire.mobileactivedefense.com>

Tim Watts <tw+usenet@dionic.net> writes:
> Rainer Weikusat wrote:
>> Tim Watts <tw+usenet@dionic.net> writes:
>>> Rainer Weikusat wrote:
>>>> An entirely different approach to accomplish this:
>>>> 
>>>> ---------------
>>>> package SafeFile;
>>>> 
>>>> sub new
>>>> {
>>>>     my ($class, $name) = @_;
>>>>     my $fh;
>>>> 
>>>>     open($fh, '>', $name.'.tmp')
>>>> or die("open: $name.tmp: $!");
>>>>         
>>>>     ${*$fh{SCALAR}} = $name;
>>>>     return bless($fh, $class);
>>>> }
>>>> 
>>>> sub DESTROY
>>>> {
>>>>     my $fh = $_[0];
>>>>     my $name;
>>>> 
>>>>     close($fh);
>>>> 
>>>>     $name = ${*$fh{SCALAR}};
>>>>     rename($name.'.tmp', $name);
>>>> }
>>>> 
>>>> package main;
>>>> 
>>>> {
>>>>     my $fh = SafeFile->new('/tmp/ziegenwurst');
>>>> 
>>>>     print $fh ("Ziege\n");
>>>>     print $fh ("Salz\n");
>>>> }
>>>
>>> Hi Rainer,
>>>
>>> I did consider something like this - but more like a fully OO style where
>>> the print/write methods would be implemented
>> 
>> Read: You didn't "consider something like this".
>
> What's got your goat?
>
> I did consider something like this (I think I'd know) - specifically:
>
> my $fobj = SafeFile->new($filename, %options);
>
> $fobj->print("Blah\n");

As I already wrote: You didn't. And you wouldn't ever ...
Apart from that, I'm not interested in this pissing contest.


------------------------------

Date: Thu, 10 May 2012 15:53:59 +0100
From: Tim Watts <tw+usenet@dionic.net>
Subject: Re: TIEHANDLE and deep recursion
Message-Id: <7d1t79-ng8.ln1@squidward.local.dionic.net>

Rainer Weikusat wrote:


> As I already wrote: You didn't. And you wouldn't ever ...

Then you would be wrong. I have toyed with several possible methods.

> Apart from that, I'm not interested in this pissing contest.

Whatever...
-- 
Tim Watts


------------------------------

Date: Thu, 10 May 2012 00:15:01 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: WWW::Mechanize and 3rd party APIs (Google)
Message-Id: <lcar79-0jm2.ln1@anubis.morrow.me.uk>


Quoth Eli the Bearded <*@eli.users.panix.com>:
> In comp.lang.perl.misc, Justin C  <justin.1203@purestblue.com> wrote:
> > If the site allows a user to collate the data manually by visiting the
> > pages one at a time, then I believe they're happy to share that data. I
> > just happen to be lazy, and don't want to do this the hard way. I'm not
> > trying to access anything the site isn't happy to share.
> 
> That's something of an assumption. Why do the site owners want to
> share the data? Is it because they get advertising revenue from
> ads shown with the data or is it something else?

There is a difference between 'what the site owners want' and 'what they
are reasonably entitled to insist upon'. For instance, if you use an
ad-blocking browser plugin, you are theoretically depriving someone of
some income (either the site owner or the advertising client, depending
on exactly how the blocking is done and how payment is calculated). Does
that make such plugins 'unethical'?

Despite what some people try to pretend, information is not a thing you
can own. You can choose to keep it a secret, and you can choose not to
tell someone else unless they sign a contract agreeing that they, in
turn, will keep it a secret, but once you've told someone outside the
context of an NDA they can do what they like with it.

(Copyright is a thing you can own, at least in the English-speaking
world, but its scope is a lot more limited than most people think. It
applies only to the actual words you wrote, so if, for instance, you
take a piece of code, write out the algorithm in English, then write it
out again in the same language as before without reference to the
original, you've broken the chain of copyright and the new code is
entirely 'yours'.)

> Wikipedia has a site full of content. They are happy to share it
> with the world, and rely on people who like their content donating.
> They are not happy to share it with all random robots however,
> because some place too much strain on their servers.

Restricting access for that reason is, of course, entirely reasonable.
Doing so using dodgy heuristics is not ideal, but sometimes can't be
avoided.

> > I don't consider it unethical, but I do consider that the site owners
> > weren't aware that someone may want to access the data differently.
> 
> If you aren't paying for it, you have to consider strongly that you
> are the product, not the customer. The site operators are likely
> more interested in the customer's interests.

If I am the product then I have no moral obligations towards the site
owner whatever. They are trying to sell me to someone else without my
consent, and there is no reason I should cooperate with that.

Ben



------------------------------

Date: Fri, 11 May 2012 13:12:57 +0100
From: Justin C <justin.1203@purestblue.com>
Subject: Re: WWW::Mechanize and 3rd party APIs (Google)
Message-Id: <9bcv79-8fe.ln1@zem.masonsmusic.co.uk>

On 2012-05-09, Eli the Bearded <*@eli.users.panix.com> wrote:
> In comp.lang.perl.misc, Justin C  <justin.1203@purestblue.com> wrote:
>> If the site allows a user to collate the data manually by visiting the
>> pages one at a time, then I believe they're happy to share that data. I
>> just happen to be lazy, and don't want to do this the hard way. I'm not
>> trying to access anything the site isn't happy to share.
>
> That's something of an assumption. Why do the site owners want to
> share the data? Is it because they get advertising revenue from
> ads shown with the data or is it something else?
>
> Consider carefully who pays for the site and what they hope to get
> from its existence. 

[snip]

>> I don't consider it unethical, but I do consider that the site owners
>> weren't aware that someone may want to access the data differently.
>
> If you aren't paying for it, you have to consider strongly that you
> are the product, not the customer. The site operators are likely
> more interested in the customer's interests.

In this specific case the site is a directory (of about 200
entries) of a specific type of independent (no chains, none wholly
owned by MegaCorps) UK business. The organisation promotes the
activity of those businesses as well as maintaining the directory.
I'm not sure where any money is being made by the organisation. It
may be run not-for-profit jointly by member businesses.

The businesses in question are, and have traditionally been, core
customers of my employers for over thirty years, and core customers
to other businesses in this line for many more decades. A large
percentage of the businesses are already known to us; we're looking
to reach those businesses we don't yet know - more recent entrants
to the field.

Before anyone goes off the deep end about the fact that we're
looking to make money by marketing to those businesses, I'd like to
point out that the products we hope to sell to them are for resale
at a profit. While our goal isn't altruistic, we are looking to
make those businesses more profitable and keep them active on our
high streets into the future. Yes, we want to make money out of
what I'm doing, but a side effect of that is that staff in those
businesses still have jobs, and mazonA, layP, etc., don't get a
small percentage of 1% increased turnover.

I'm not using the 'no loss of jobs' argument to morally justify my
actions; I see no moral problem with scraping the site because the
site exists to promote the businesses we're looking to reach - though
its main goal is to get people to spend money with them, not to sell
to them. The end result, I hope, is the same.

Maybe, if anyone wants to continue, we should go to
alt.comp.issues.ethics?

   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3688
***************************************
