[33041] in Perl-Users-Digest

Perl-Users Digest, Issue: 4317 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Nov 26 05:17:18 2014

Date: Wed, 26 Nov 2014 02:17:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 26 Nov 2014     Volume: 11 Number: 4317

Today's topics:
        [Q] How does a sub know it is called during compilation <bauhaus@futureapps.invalid>
    Re: [Q] How does a sub know it is called during compila <peter@makholm.net>
    Re: [Q] How does a sub know it is called during compila <gravitalsun@hotmail.foo>
    Re: [Q] How does a sub know it is called during compila <rweikusat@mobileactivedefense.com>
    Re: How do I compare two files byte-by-byte? <gravitalsun@hotmail.foo>
    Re: How do I compare two files byte-by-byte? <m@rtij.nl.invlalid>
    Re: How do I compare two files byte-by-byte? <rweikusat@mobileactivedefense.com>
    Re: How do I compare two files byte-by-byte? <jblack@nospam.com>
    Re: How do I compare two files byte-by-byte? <kaz@kylheku.com>
    Re: open and pipes <rweikusat@mobileactivedefense.com>
    Re: open and pipes <gandalf23@mail.com>
    Re: open and pipes <rweikusat@mobileactivedefense.com>
    Re: open and pipes <gandalf23@mail.com>
    Re: push/shift/keys/... on refs (was: A hash of referen <whynot@pozharski.name>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 25 Nov 2014 13:27:55 +0100
From: "G.B." <bauhaus@futureapps.invalid>
Subject: [Q] How does a sub know it is called during compilation?
Message-Id: <m51sjs$mju$1@dont-email.me>

Hi,

a few Perl scripts I need to maintain and improve have

use constant => X;

where X is an object produced by a typical constructor, "new".
("use constant" needs to stay for now, as in MUST.)

The constructor "new" uses other objects and these other objects
appear to be intertwined with the compilation stage, maybe,
somewhere. I am now trying to avoid the references to the other
objects in "new" conditionally, by learning if "new" was called
in such unfortunate circumstances as compilation, to see if
it helps.

As first approximation, I am trying the following sub:

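sub is_BEGIN_in_caller
{
     # Walk up the call stack and look for a frame whose sub name
     # ends in BEGIN (e.g. "main::BEGIN").
     my ($k, $who) = (0);

     while (1)
     {
         $who = (caller ++$k)[3];      # sub name of frame $k
         return 0 if ! $who;           # ran off the top of the stack
         return 1 if $who =~ m/ \b BEGIN $/x;
     }
}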

Is this enough and reliable?  Would I have to consider "require"
and "eval" up the call chain, too, btw.?

The background is a mod_perl2 environment, and the "other
objects" mentioned above include the server object and very
likely I/O handles. I am assuming compilation stage because of
occurrences of sigkill in the server log and some silent deaths
during testing.


------------------------------

Date: Tue, 25 Nov 2014 15:22:23 +0100
From: Peter Makholm <peter@makholm.net>
Subject: Re: [Q] How does a sub know it is called during compilation?
Message-Id: <878uizqs0w.fsf@vps1.hacking.dk>

"G.B." <bauhaus@futureapps.invalid> writes:

> As first approximation, I am trying the following sub:
>
> sub is_BEGIN_in_caller
> {

Since Perl 5.14.0 there is a ${^GLOBAL_PHASE} variable that exposes
which phase the perl interpreter is in. It might be able to tell you
whether you are called from a BEGIN block, where the global phase is
set to 'START'.

That might be a more reliable way to solve your problem.

Whether this catches BEGIN blocks in code that is require'd isn't clear
to me. But that should be easy to figure out with a couple of
experiments. (Left as an exercise for the eager reader).

Read the documentation for ${^GLOBAL_PHASE} in the perlvar manual page.
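
A minimal sketch of such a check, assuming Perl 5.14 or later. The
helper name in_start_phase is made up for illustration. Per perlvar,
'START' only describes the compile time of the top-level program; a
compile time entered at run time (for example a require executed while
the program is already running) is not reflected in ${^GLOBAL_PHASE},
so this cannot replace a caller-based check in every situation.

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical helper: true only while the top-level program is compiling.
sub in_start_phase {
    return ${^GLOBAL_PHASE} eq 'START' ? 1 : 0;
}

our $during_begin;
BEGIN { $during_begin = in_start_phase() }   # runs while this file is compiled
my $during_run = in_start_phase();           # runs at ordinary run time

print "BEGIN block: $during_begin, run time: $during_run\n";
```

A constructor could call such a helper and skip the server object and
I/O handles while it returns true.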

//Makholm


------------------------------

Date: Tue, 25 Nov 2014 17:14:48 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: [Q] How does a sub know it is called during compilation?
Message-Id: <m526ca$2g69$1@news.ntua.gr>

On 25/11/2014 14:27, G.B. wrote:
> Hi,
>
> a few Perl scripts I need to maintain and improve have
>

I do not know if it helps, but an object can tell you which class it
belongs to, so that you can decide what to do




#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Digest::MD5;
my $md5 = Digest::MD5->new;
my $obj = SomeClass->new;

say "my class       : ". __PACKAGE__;
say "class of \$obj : ". ref $obj;
say "class of \$md5 : ". ref $md5;


BEGIN  {
package SomeClass;

	sub new {
	my $invocant = shift;
	# Accept either a class name or an existing object as invocant.
	my $class = ref($invocant) || $invocant;
	my $self  = {};
	bless $self, $class;
	return $self
	}

1;}# END OF CLASS



------------------------------

Date: Tue, 25 Nov 2014 15:28:20 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: [Q] How does a sub know it is called during compilation?
Message-Id: <873897e1uz.fsf@doppelsaurus.mobileactivedefense.com>

"G.B." <bauhaus@futureapps.invalid> writes:
> a few Perl scripts I need to maintain and improve have

[...]

> The background is a mod_perl2 environment, and the "other
> objects" mentioned above include the server object and very
> likely I/O handles. I am assuming compilation stage because of
> occurrences of sigkill in the server log and some silent deaths
> during testing.

Two somewhat more general remarks (which may or may not be useful):

	- Apache aggressively kills subordinate processes it considers
          to be 'behaving strangely', eg, take too long to execute or
          close their stdin input file descriptor

	- 'silent deaths' usually mean the interpreter encountered a
          fatal runtime error, eg, an undefined subroutine or method,
          and the stderr output went into the void. A 'global' exception
          handler (eval {}) which catches everything and deposits its
          stringification in a place guaranteed to exist might help
          here. Even 'during compilation' as the eval could be wrapped
          around a require pulling in the actual code.
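
A minimal sketch of such a catch-all handler wrapped around a require.
The helper name load_guarded and the idea of passing in a log path are
assumptions made for illustration; any location that is guaranteed to
be writable would do.

```perl
use strict;
use warnings;

# Hypothetical helper: load $file under eval and append any failure,
# including compile-time errors, to the log at $log_path.
# Returns 1 on success, 0 on failure.
sub load_guarded {
    my ($file, $log_path) = @_;

    my $ok = eval { require $file; 1 };
    return 1 if $ok;

    my $err = $@ || "unknown error\n";
    if (open(my $log, '>>', $log_path)) {
        print {$log} scalar(localtime), " $file: $err";
        close($log);
    }
    return 0;
}
```

Called as load_guarded('/path/to/real_code.pl', '/tmp/errors.log'), the
error text ends up in the log even when stderr goes nowhere.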


------------------------------

Date: Mon, 24 Nov 2014 12:31:27 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <m4v1d1$2tua$1@news.ntua.gr>

On 24/11/2014 10:48, Robbie Hatley wrote:
>
> On 11/23/2014 03:02 PM, George Mpouras wrote:
>
>> did you read my answer at your previous thread ?
>
> Yes. I thought I answered this one already. Ah, I see, I
> clicked "reply" instead of "followup" so you alone got it.
> (Short recap for everyone else: I basically said "no" to
> databases, checksums, and CPAN modules, saying I prefered
> to write this program "manually" as a "learning Perl"
> exercise.)
>
>> By the way if you want to do that you have to open both
>> files in binary mode and seek/read/compare buffer in parallel
>> until their first difference or their end.
>
> Thanks for your mention of "read"; that's the function
> I didn't know I needed to get this program up and running!
> Turns out I didn't need "seek"; just open (in "raw" mode),
> read to end, compare bytes (had to use "ord()" to get the
> numerical value of each byte), and close.
>
>> ... but it is something completely useless ...
>
> I find it quite useful.
>
> I've now got my "dedup" program working correctly up to
> the point of alerting the user as to whether each pair
> of same-size files are "identical" or "different".
>


You should have changed the ... microsoft.foo to ... microsoft.com.
Whatever.
If you want to do it correctly, rewrite it and apply the following rules
from top to bottom. As soon as one rule matches, the files are different
and you should skip to the next file.


1) the file sizes differ
2) the md5 of the first    1024 bytes differs
3) the md5 of the last     1024 bytes differs
4) the md5 of the "middle" 1024 bytes differs
5) the md5 of the whole files differs
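
The rule cascade above could be sketched roughly as follows, using the
core Digest::MD5 module. The helper names (chunk_md5, whole_md5,
files_differ) and the exact placement of the "middle" chunk are
assumptions for illustration, not something fixed by the rules.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

use constant CHUNK => 1024;

# Hypothetical helper: MD5 of up to CHUNK bytes starting at $offset.
sub chunk_md5 {
    my ($path, $offset) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    seek($fh, $offset, 0)        or die "seek $path: $!";
    my $buf = '';
    defined(read($fh, $buf, CHUNK)) or die "read $path: $!";
    close($fh);
    return md5_hex($buf);
}

# Hypothetical helper: MD5 of the whole file.
sub whole_md5 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Returns 1 as soon as one rule shows the files differ, 0 if none does.
sub files_differ {
    my ($p, $q) = @_;
    my $size_p = (stat $p)[7];
    my $size_q = (stat $q)[7];
    die "cannot stat input\n" unless defined $size_p && defined $size_q;
    return 1 if $size_p != $size_q;

    # Offsets of the first, last and (assumed) middle chunk.
    my $last   = $size_p > CHUNK ? $size_p - CHUNK : 0;
    my $middle = int($last / 2);
    for my $offset (0, $last, $middle) {
        return 1 if chunk_md5($p, $offset) ne chunk_md5($q, $offset);
    }
    return whole_md5($p) ne whole_md5($q) ? 1 : 0;
}
```

The cheap checks come first, so most differing pairs are rejected
without reading either file completely.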

Implement your code as parallel queues (or threads if you prefer), as
many as there are CPU cores. Every queue examines a different set of
files.

All queues must update/read the same "parent" data structure or database
connection.










------------------------------

Date: Mon, 24 Nov 2014 12:35:03 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <7ohbkb-asc.ln1@news.rtij.nl>

On Mon, 24 Nov 2014 01:07:35 -0800, Robbie Hatley wrote:

> On 11/23/2014 11:39 PM, Martijn Lievaart wrote:
> 
>  > On Sun, 23 Nov 2014 14:49:41 -0800, Robbie Hatley wrote:
>  >
>  > > ... how to read two open files, synchronously, byte by byte. How
>  > > would I go about doing that?
>  >
>  > You don't as it is slow as molasseses. Open the files binary and use
>  > sysread to read large chunks, then compare the chunks.
> 
> According to "Programming Perl", Perl's "read()" uses a buffer between
> itself and the operating system's "read()", so I don't think it actually
> sends 1-byte requests to the OS. More likely 4096 bytes or something
> like that. So if a program compares two 15,000byte files, Perl likely
> sends 3 reads to the OS,
> not 15,000 reads.

True, but it is still slow as you compare byte by byte. Much better to
compare large chunks, waaaay faster. Perl's eq operator will do this
nicely for you, it can compare megabytes efficiently.

Normally I would say, make it correct first, optimize later. But in this
case I can tell you that the optimize later will be mostly a rewrite
anyway and that experience tells that comparing by bytes is always slower
than using specialized functions (in this case the eq operator).

But see for yourself, write both and benchmark them. You will be 
surprised at the difference.

And you have to open binary anyway, or you will be at the mercy of 
whatever encoding the file is in and what your program expects. If the 
file is a binary file (an executable f.i.), you may well run into invalid 
UTF8, if that is what your program expects. (And if you open the file in 
binary, sysread is the logical choice to read the file anyway.)

And a minor issue, on Windows iirc, both "\r\n" and "\n" are translated 
in text mode to "\n". Do you want them to compare the same?

Rule of thumb. If you want to process text files, use open and <>. If you 
want to process binary files, use sysopen and sysread.

> 
> But if I run into trouble comparing mid-sized files (such as 7MB mp3
> files) or large files (such as 4000MB Linux install DVD images), I'll
> try switching over to sysread and try to tinker with buffer sizes.
> 
>  > But as others already have stated, this may not be the best way to do
>  > it when there are more then two files to byte wise compare.
> 
> I think it's the best way for the purpose of a general-purpose script
> which one does not want to be dependent on databases and checksums
> (which may not exist), and which one wants to be portable to multiple
> platforms (Windows and Linux) with no modifications.

I would expect such a program to do something like this:

Make a list of all files
Sort the list by filesize
for each size in the list
  if the number of files > 5 # note you may want to tune this threshold
    calculate a checksum for these files
    explicitly compare the files where the checksum matches
  else
    explicitly compare those files
 
The algorithm with the checksum (the if part) runs in O(n), the else part
runs in O(n*n). As n gets larger, the checksummed algorithm quickly
becomes the faster one. For lower values of n, files may be processed
twice, making it slower than the simple compare algorithm.
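
A rough Perl sketch of the outline above, using the core Digest::MD5
and File::Compare modules. The names find_duplicates, pairwise_groups,
file_md5 and THRESHOLD are made up for illustration, and the threshold
of 5 is simply the value from the outline.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare qw(compare);

use constant THRESHOLD => 5;    # tune as needed

# Hypothetical helper: MD5 of a whole file.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Explicitly compare files; returns array refs of identical files.
sub pairwise_groups {
    my @files = @_;
    my @groups;
    for my $file (@files) {
        my $placed = 0;
        for my $group (@groups) {
            next unless compare($file, $group->[0]) == 0;
            push @$group, $file;
            $placed = 1;
            last;
        }
        push @groups, [$file] unless $placed;
    }
    return grep { @$_ > 1 } @groups;
}

# Returns a list of array refs, each holding files with equal content.
sub find_duplicates {
    my @files = @_;
    my %by_size;
    push @{ $by_size{ (stat $_)[7] } }, $_ for @files;

    my @dups;
    for my $size (sort { $a <=> $b } keys %by_size) {
        my @same = @{ $by_size{$size} };
        next if @same < 2;
        if (@same > THRESHOLD) {
            # Many candidates: bucket by checksum, then confirm.
            my %by_sum;
            push @{ $by_sum{ file_md5($_) } }, $_ for @same;
            push @dups, pairwise_groups(@$_)
                for grep { @$_ > 1 } values %by_sum;
        }
        else {
            push @dups, pairwise_groups(@same);
        }
    }
    return @dups;
}
```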

[ Now if you want to do this on a regular basis, you could store the 
checksums in a hidden file and recalculate only if the file was modified 
since. However, this is completely optional ]

HTH,
M4


------------------------------

Date: Mon, 24 Nov 2014 15:08:04 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <8761e4iqln.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:
> On 11/23/2014 11:39 PM, Martijn Lievaart wrote:
>> On Sun, 23 Nov 2014 14:49:41 -0800, Robbie Hatley wrote:
>>
>> > ... how to read two open files, synchronously, byte
>> > by byte. How would I go about doing that?
>>
>> You don't as it is slow as molasseses. Open the files binary
>> and use sysread to read large chunks, then compare the chunks.
>
> According to "Programming Perl", Perl's "read()" uses a buffer
> between itself and the operating system's "read()", so I don't
> think it actually sends 1-byte requests to the OS. More likely
> 4096 bytes or something like that. So if a program compares
> two 15,000byte files, Perl likely sends 3 reads to the OS,
> not 15,000 reads.

The speed difference is nevertheless dramatic. I ran the code below
with 8K input files filled with zeroes and the 'blocks' variant ran at
about 171 times the speed of the other, even though the code isn't more
complicated:

-----------------
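use Benchmark;

# Usage: perl thisscript FILE0 FILE1
# Each variant is run for at least 3 CPU seconds.
timethese(-3,
          {
           # Compare 8K blocks read with sysread.
           blocks => sub {
              my ($fh0, $fh1, $d0, $d1);

              open($fh0, '<', $ARGV[0]) or die "$ARGV[0]: $!";
              open($fh1, '<', $ARGV[1]) or die "$ARGV[1]: $!";

              while (sysread($fh0, $d0, 8192)) {
                  sysread($fh1, $d1, 8192);
                  last unless $d0 eq $d1;
              }
          },

          # Compare single bytes read through the buffered read.
          bytes => sub {
              my ($fh0, $fh1, $b0, $b1);

              open($fh0, '<', $ARGV[0]) or die "$ARGV[0]: $!";
              open($fh1, '<', $ARGV[1]) or die "$ARGV[1]: $!";

              while (read($fh0, $b0, 1)) {
                  read($fh1, $b1, 1);
                  last unless $b0 eq $b1;
              }
          }});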
----------------

Generally, using the built-in buffering is only sensible if

	a) developer time matters more than program runtime

        b) data is processed in units of differently sized records whose
           length isn't known in advance aka 'dealing with lines of
           text'

NB: I'm intentionally ignoring the 'STREAMS for Perl' features.
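
For use outside a benchmark, the block-wise variant could be packaged
roughly like this. The names same_content and read_block and the 64K
default block size are assumptions for illustration. Unlike the
benchmark code, the sketch loops over short reads and reports whether
the files are equal.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $want bytes, retrying after short
# reads. Returns the data read, which is '' at end of file.
sub read_block {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        my $got = sysread($fh, $buf, $want - length($buf), length($buf));
        die "sysread: $!" unless defined $got;
        last if $got == 0;    # end of file
    }
    return $buf;
}

# Returns 1 if both files hold the same bytes, 0 otherwise.
sub same_content {
    my ($p, $q, $block) = @_;
    $block ||= 65536;

    open(my $fh0, '<:raw', $p) or die "open $p: $!";
    open(my $fh1, '<:raw', $q) or die "open $q: $!";

    while (1) {
        my $d0 = read_block($fh0, $block);
        my $d1 = read_block($fh1, $block);
        return 0 if $d0 ne $d1;
        return 1 if $d0 eq '';    # both files ended together
    }
}
```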


------------------------------

Date: Mon, 24 Nov 2014 14:33:55 -0600
From: John Black <jblack@nospam.com>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <MPG.2edd522c56e5e0079897fc@news.eternal-september.org>

In article <7ohbkb-asc.ln1@news.rtij.nl>, m@rtij.nl.invlalid says...
> And a minor issue, on Windows iirc, both "\r\n" and "\n" are translated 
> in text mode to "\n". Do you want them to compare the same?
> 

If the general strategy is to compare only those files whose lengths are the same, then these 
cannot compare the same, right?

John Black


------------------------------

Date: Mon, 24 Nov 2014 21:31:33 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <20141124132840.494@kylheku.com>

On 2014-11-24, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
> On Mon, 24 Nov 2014 01:07:35 -0800, Robbie Hatley wrote:
>
>> On 11/23/2014 11:39 PM, Martijn Lievaart wrote:
>> 
>>  > On Sun, 23 Nov 2014 14:49:41 -0800, Robbie Hatley wrote:
>>  >
>>  > > ... how to read two open files, synchronously, byte by byte. How
>>  > > would I go about doing that?
>>  >
>>  > You don't as it is slow as molasseses. Open the files binary and use
>>  > sysread to read large chunks, then compare the chunks.
>> 
>> According to "Programming Perl", Perl's "read()" uses a buffer between
>> itself and the operating system's "read()", so I don't think it actually
>> sends 1-byte requests to the OS. More likely 4096 bytes or something
>> like that. So if a program compares two 15,000byte files, Perl likely
>> sends 3 reads to the OS,
>> not 15,000 reads.
>
> True, but it is still slow as you compare byte by byte. Much better to 
> compare large chunks, waaaay faster. Perl eq operator will do this nicely 
> for you, it can compare megabytes efficiently.
>
> Normally I would say, make it correct first, optimize later. But in this 
> case I can tell you that the optimize later will be mostly a rewrite 
> anyway and that experience tells that comparing by bytes is always slower 
> that using specialized functions (in this case the eq operator).
>
> But see for yourself, write both and benchmark them. You will be 
> surprised at the difference.
>
> And you have to open binary anyway, or you will be at the mercy of 
> whatever encoding the file is in and what your program expects. If the 
> file is a binary file (an executable f.i.), you may well run into invalid 
> UTF8, if that is what your program expects. (And if you open the file in 
> binary, sysread is the logical choice to read the file anyway.)
>
> And a minor issue, on Windows iirc, both "\r\n" and "\n" are translated 
> in text mode to "\n". Do you want them to compare the same?

Although the subject line says "byte by byte", in the OP's article
body the phrase "byte for byte" appears, which has a different meaning.

A "byte by byte" comparison could indeed take line endings into account;
it just refers to looping over bytes.  Whereas "byte for byte" comparison
means that the corresponding bytes from the two sources have to match.


------------------------------

Date: Mon, 24 Nov 2014 13:41:41 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: open and pipes
Message-Id: <87h9xowwa2.fsf@doppelsaurus.mobileactivedefense.com>

Kiuhnm Mnhuik <gandalf23@mail.com> writes:
> I'm not a Perl programmer, but I need to understand a command
> injection example. I looked up the open function in the
> documentation. AFAIK, a pipe can be put before or after a command, but
> what does the pipe in the middle mean here?
>
>   open($myhandle, "/a/b/|ls|")

It's a shell meta-character and because of this, this will end up
executing

sh -c '/a/b|ls'

which means the shell will try to start /a/b as a program whose output
is supposed to be fed into ls whose output ends up being available to
perl.

'Executable example': Create a file named noise with the following
content:

---------
#!/bin/sh
echo Noise! >/dev/tty
---------

and make it executable (chmod +x noise). The following will happen when
opening a similarly constructed filename:

[rw@doppelsaurus]/tmp#perl -e 'open($fh, "./noise|ls|")'
[rw@doppelsaurus]/tmp#Noise!
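
For contrast, a small sketch of how the injection is usually avoided:
with an explicit mode argument, open does not interpret '|' in the
name. The helper names open_file_only and command_output are made up
for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: open $name strictly as a file for reading.
# With an explicit '<' mode, a '|' in $name is just part of the name.
sub open_file_only {
    my ($name) = @_;
    open(my $fh, '<', $name) or return undef;
    return $fh;
}

# Hypothetical helper: run a command without a shell and return its
# output lines. In the list form each argument reaches the program
# unchanged, so shell meta-characters have no special meaning.
sub command_output {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}
```

With these, open_file_only("./noise|ls|") merely fails to find a file
of that odd name instead of running anything.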


------------------------------

Date: Mon, 24 Nov 2014 07:59:28 -0800 (PST)
From: Kiuhnm Mnhuik <gandalf23@mail.com>
Subject: Re: open and pipes
Message-Id: <e2cde1b4-ce11-4ffe-8e7c-68c05e2e5b13@googlegroups.com>

On Monday, November 24, 2014 2:41:46 PM UTC+1, Rainer Weikusat wrote:
> Kiuhnm Mnhuik <gandalf23@mail.com> writes:
> > I'm not a Perl programmer, but I need to understand a command
> > injection example. I looked up the open function in the
> > documentation. AFAIK, a pipe can be put before or after a command, but
> > what does the pipe in the middle mean here?
> >
> >   open($myhandle, "/a/b/|ls|")
> 
> It's a shell meta-character and because of this, this will end up
> executing
> 
> sh -c '/a/b|ls'
> 
> which means the shell will try to start /a/b as a program whose output
> is supposed to be fed into ls whose output ends up being available to
> perl.

That's what I thought, but the problem is that /a/b doesn't exist. Maybe Linux doesn't care and proceed anyway? I'm on Windows right now so I can't try it out.


------------------------------

Date: Mon, 24 Nov 2014 16:08:38 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: open and pipes
Message-Id: <87y4r0h989.fsf@doppelsaurus.mobileactivedefense.com>

Kiuhnm Mnhuik <gandalf23@mail.com> writes:
> On Monday, November 24, 2014 2:41:46 PM UTC+1, Rainer Weikusat wrote:
>> Kiuhnm Mnhuik <gandalf23@mail.com> writes:
>> > I'm not a Perl programmer, but I need to understand a command
>> > injection example. I looked up the open function in the
>> > documentation. AFAIK, a pipe can be put before or after a command, but
>> > what does the pipe in the middle mean here?
>> >
>> >   open($myhandle, "/a/b/|ls|")
>> 
>> It's a shell meta-character and because of this, this will end up
>> executing
>> 
>> sh -c '/a/b|ls'
>> 
>> which means the shell will try to start /a/b as a program whose output
>> is supposed to be fed into ls whose output ends up being available to
>> perl.
>
> That's what I thought, but the problem is that /a/b doesn't exist.

That's an example. 

> Maybe Linux doesn't care and proceed anyway?

Because that's just an example, it shows you what would happen were it
an existing command. Apart from that, the pipeline operates 'as
intended' as ls doesn't care for its standard input but that's not the
point.


------------------------------

Date: Mon, 24 Nov 2014 09:20:56 -0800 (PST)
From: Kiuhnm Mnhuik <gandalf23@mail.com>
Subject: Re: open and pipes
Message-Id: <480728f6-98cb-4908-b6a9-ff406e8b1d47@googlegroups.com>

On Monday, November 24, 2014 5:08:43 PM UTC+1, Rainer Weikusat wrote:
> Kiuhnm Mnhuik <gandalf23@mail.com> writes:
> > On Monday, November 24, 2014 2:41:46 PM UTC+1, Rainer Weikusat wrote:
> >> Kiuhnm Mnhuik <gandalf23@mail.com> writes:
> >> > I'm not a Perl programmer, but I need to understand a command
> >> > injection example. I looked up the open function in the
> >> > documentation. AFAIK, a pipe can be put before or after a command, but
> >> > what does the pipe in the middle mean here?
> >> >
> >> >   open($myhandle, "/a/b/|ls|")
> >> 
> >> It's a shell meta-character and because of this, this will end up
> >> executing
> >> 
> >> sh -c '/a/b|ls'
> >> 
> >> which means the shell will try to start /a/b as a program whose output
> >> is supposed to be fed into ls whose output ends up being available to
> >> perl.
> >
> > That's what I thought, but the problem is that /a/b doesn't exist.
> 
> That's an example. 
> 
> > Maybe Linux doesn't care and proceed anyway?
> 
> Because that's just an example, it shows you what would happen were it
> an existing command. Apart from that, the pipeline operates 'as
> intended' as ls doesn't care for its standard input but that's not the
> point.

Mine wasn't just an example. I'm doing some challenges on https://www.hackthissite.org and I'm positive that /a/b/ doesn't exist.
You know what, I'll try it in a Linux VM...
Done. It works.
I tried with "/a/b/|ls" and it says:
  bash: /a/b/: No such file or directory
  Desktop
so it works.
It doesn't work in Windows and I thought it was the same in Linux.
Anyway, thanks for the help.


------------------------------

Date: Mon, 24 Nov 2014 12:01:08 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: push/shift/keys/... on refs (was: A hash of references to arrays of references to hashes... is there a better way?)
Message-Id: <slrnm760f4.5p3.whynot@orphan.zombinet>

with <slrnm73qmm.pgj.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
> On 2014-11-23 07:55, Eric Pozharski <whynot@pozharski.name> wrote:
>> with <slrnm7107n.l3u.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:

*SKIP*
>>>     push $CurDirFiles{$Size}, ...
>>> which eliminates some line noise. I think it's still considered
>>> experimental, though.)
>> AFAIK it's deprecated alright in latest versions.
> What? That's one of the most useful additions to the Perl language in
> recent years. It makes my code a lot less cluttered. I really hope you
> are wrong.  Source?

Well, I was fascinated too (although for a different reason).  I've seen
some noise about deprecation (probably on IRC), and that was the final
straw that made me subscribe to p5p (lurking only).  AAMOF I'm less
interested in verifying the fact of deprecation than in the story of how
exactly the feature came *in*.  From what I've seen, the feature
affected the parser.  Thus -- it's out.  I'm kind of busy these days, but
unless I forget, I'll do some fact checking.  Stay tuned.
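
Whatever the status of push on a reference, the explicit dereference
does not depend on it. A minimal sketch; the helper name add_file and
the record layout are assumptions for illustration, with the hash keyed
by file size as in the quoted snippet.

```perl
use strict;
use warnings;

# Hypothetical helper: append $record to the list stored under $size.
# The array is created on first use (autovivification).
# Returns the number of records now stored for that size.
sub add_file {
    my ($index, $size, $record) = @_;
    push @{ $index->{$size} }, $record;    # explicit dereference
    return scalar @{ $index->{$size} };
}

my %CurDirFiles;
add_file(\%CurDirFiles, 1024, { Name => 'a.txt' });
add_file(\%CurDirFiles, 1024, { Name => 'b.txt' });
```

It is a little more line noise than the short form, but it does not
rely on any experimental feature.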

*CUT*

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4317
***************************************

