[32729] in Perl-Users-Digest


Perl-Users Digest, Issue: 3993 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jul 17 18:09:34 2013

Date: Wed, 17 Jul 2013 15:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 17 Jul 2013     Volume: 11 Number: 3993

Today's topics:
    Re: [OT] engineering <rweikusat@mssgmbh.com>
    Re: [OT] engineering <oneingray@gmail.com>
    Re: [OT] engineering <rweikusat@mssgmbh.com>
    Re: [OT] engineering <rweikusat@mssgmbh.com>
        names, values, boxes and microchips <rweikusat@mssgmbh.com>
        Open source Catalyst applications, for study? <allergic-to-spam@no-spam-allowed.invalid>
    Re: the fastest way to create a directory <hjp-usenet3@hjp.at>
    Re: the fastest way to create a directory <nospam.gravitalsun.noadsplease@hotmail.noads.com>
    Re: the fastest way to create a directory <nospam.gravitalsun.noadsplease@hotmail.noads.com>
    Re: the fastest way to create a directory <hjp-usenet3@hjp.at>
    Re: the fastest way to create a directory <rweikusat@mssgmbh.com>
    Re: the fastest way to create a directory <hjp-usenet3@hjp.at>
    Re: the fastest way to create a directory <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
    Re: the fastest way to create a directory <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
    Re: the fastest way to create a directory (Tim McDaniel)
    Re: the fastest way to create a directory <rweikusat@mssgmbh.com>
    Re: the fastest way to create a directory <rweikusat@mssgmbh.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 16 Jul 2013 21:49:57 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: [OT] engineering
Message-Id: <8761wai616.fsf@sapphire.mobileactivedefense.com>

Ivan Shmakov <oneingray@gmail.com> writes:
>>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>>>>>> Ivan Shmakov <oneingray@gmail.com> writes:
>
> [...]
>
>  >>> When people can't have multiple disjunct sets of variables used by
>  >>> unrelated parts of the same 'aggregate subroutine', they do what
>  >>> they should be doing instead, namely, structure their code.
>
>  >> I disagree.  I deem the use of nested scopes as crucial to code
>  >> structuring.  Should the "roles" of the variables (whether input,
>  >> output, or local) become apparent later, it'd be trivial to split
>  >> the function, -- and that's likely to be done exactly along the
>  >> scope boundaries previously coded in.
>
>  > To quote my boss: Make it work now quickly and clean it up later :-).
>  > Which means: Take some existing code which performs a more-or-less
>  > related task, copy'n'paste it to some part of the countryside where
>  > no trenches have been dug yet or the old ones worn out over time,
>  > create a new nested scope lest all hell break loose because of any
>  > accidental interaction with the surroundings, any details about them
>  > long lost in the land of ancient lore,
>
> [...]
>
> 	... Once, I will find the patience to wait for the food
> 	engineers out there to design a sound nutritional solution.
>
> 	Meanwhile, I'm forced to rely on the off-the-shelf products,
> 	which are known to be full of undocumented features, deviate
> 	from the specifications every now and then, and (while I'm yet
> 	to see one myself) are reported to contain actual bugs...

Minus some obvious misconceptions (eg, the 'off the shelf' food is
designed by 'food engineers' to be 'soundly nutritional', ie, contain
everything fashion currently demands that it should and not contain
anything fashion demands that it currently mustn't, while less
sophisticated people like me get by somehow with vegetables, meat,
spices and tools to prepare these in some completely 'unscientific'
way), I have no idea what this was supposed to mean.


------------------------------

Date: Wed, 17 Jul 2013 09:27:29 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: [OT] engineering
Message-Id: <87a9llv8n2.fsf@violet.siamics.net>

>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>>>>> Ivan Shmakov <oneingray@gmail.com> writes:
>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[...]

 >>> To quote my boss: Make it work now quickly and clean it up later

[...]

 >> ... Once, I will find the patience to wait for the food engineers
 >> out there to design a sound nutritional solution.

 >> Meanwhile, I'm forced to rely on the off-the-shelf products, which
 >> are known to be full of undocumented features, deviate from the
 >> specifications every now and then, and (while I'm yet to see one
 >> myself) are reported to contain actual bugs...

 > Minus some obvious misconceptions (eg, the 'off the shelf' food is
 > designed by 'food engineers' to be 'soundly nutritional', ie, contain
 > everything fashion currently demands that it should and not contain
 > anything fashion demands that it currently mustn't,

	The nutritional requirements of an average healthy adult are
	more or less well-known (check, e. g., [1]), and do not depend
	much on "fashion," whatever one's misconceptions may be.

[1] http://www.iom.edu/Global/News%20Announcements/~/media/Files/Activity%20Files/Nutrition/DRIs/DRI_Summary_Listing.pdf

 > while less sophisticated people like me get by somehow with
 > vegetables, meat, spices

	(... Except that all of the above were "engineered," one way or
	the other.)

 > and tools to prepare these in some completely 'unscientific' way),

	The "food engineers" of today have learned that they have to
	make food "tasty", not "healthy," in order to succeed.  Which
	more or less corresponds to what I may otherwise call an
	"unscientific" way.

	(Not that there's much difference to "software engineers" in
	this respect.)

 > I have no idea what this was supposed to mean.

	My point is simple: if the deadline is today, one has to forget
	about "science" (be it Wirth's, Borlaug's, or someone else's),
	and use whatever "ingredients" available to solve the task at
	hand.  Be it a program, or a dinner.

	And using nested scopes is as beneficial to writing software,
	as washing one's hands is to preparing food.

-- 
FSF associate member #7257


------------------------------

Date: Wed, 17 Jul 2013 15:12:30 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: [OT] engineering
Message-Id: <878v15z35d.fsf@sapphire.mobileactivedefense.com>

The text below is only remotely concerned with software engineering
and some parts of it may be seriously offensive to some people or
groups of people.

Ivan Shmakov <oneingray@gmail.com> writes:
>>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>>>>>> Ivan Shmakov <oneingray@gmail.com> writes:
>>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>
> [...]
>
>  >>> To quote my boss: Make it work now quickly and clean it up later
>
> [...]
>
>  >> ... Once, I will find the patience to wait for the food engineers
>  >> out there to design a sound nutritional solution.
>
>  >> Meanwhile, I'm forced to rely on the off-the-shelf products, which
>  >> are known to be full of undocumented features, deviate from the
>  >> specifications every now and then, and (while I'm yet to see one
>  >> myself) are reported to contain actual bugs...
>
>  > Minus some obvious misconceptions (eg, the 'off the shelf' food is
>  > designed by 'food engineers' to be 'soundly nutritional', ie, contain
>  > everything fashion currently demands that it should and not contain
>  > anything fashion demands that it currently mustn't,
>
> 	The nutritional requirements of an average healthy adult are
> 	more or less well-known (check, e. g., [1]), and do not depend
> 	much on "fashion," whatever one's misconceptions may be.
>
> [1] http://www.iom.edu/Global/News%20Announcements/~/media/Files/Activity%20Files/Nutrition/DRIs/DRI_Summary_Listing.pdf

I'm sorry if I didn't pay proper respect to your preferred
(mis-)conception, but there are simply too many of them, even when just
counting 'current' ones which include books of impressive-looking
tables. Prior to Mad Cow Disease, nutritional requirements of cows were
already well-known. Do we really have a surge of ideologically blinded
suicide bombers nowadays? Or maybe Mad Muslim Disease caused by an
unfortunate diet combining with unfortunate circumstances?

>  > while less sophisticated people like me get by somehow with
>  > vegetables, meat, spices
>
> 	(... Except that all of the above were "engineered," one way or
> 	the other.)

Indeed. I remember an old joke which went roughly like this: Imagine
there's a mathematician on a strange planet wholly covered with grass
and the only other living being is a single sheep. The mathematician
is to catch this sheep, how does he proceed? Answer: He builds a
fence around himself and defines the place occupied by him as
'outside'.

With the help of a suitable set of definitions, any term can
be interpreted to mean anything, at the expense of rendering
meaningful communication impossible (which may be desired).

>  > and tools to prepare these in some completely 'unscientific' way),
>
> 	The "food engineers" of today have learned that they have to
> 	make food "tasty", not "healthy," in order to succeed.  Which
> 	more or less corresponds to what I may otherwise call an
> 	"unscientific" way.

The purpose of 'the sense of taste' is to enable distinction between
'healthy' and 'unhealthy' things one could possibly eat. It works
better for horses because these tend to approach the matter
empirically and with an unprejudiced mind, something humans, especially
humans wielding statistics, rarely do.

>  > I have no idea what this was supposed to mean.
>
> 	My point is simple: if the deadline is today, one has to forget
> 	about "science" (be it Wirth's, Borlaug's, or someone's else),
> 	and use whatever "ingredients" available to solve the task at
> 	hand.  Be it a program, or a dinner.

That's just a convenient justification the proverbial old poodle uses in
order to defend against the supposition of having to learn new tricks:
Whatever the benefits might be, I've got no time for this ATM, I'm too
busy performing the old ones, constantly working around their
deficiencies, and won't ever have any time for that, either.


------------------------------

Date: Wed, 17 Jul 2013 15:53:57 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: [OT] engineering
Message-Id: <8738rdz18a.fsf@sapphire.mobileactivedefense.com>

Ivan Shmakov <oneingray@gmail.com> writes:

[...]

> 	My point is simple: if the deadline is today, one has to forget
> 	about "science" (be it Wirth's, Borlaug's, or someone's else),
> 	and use whatever "ingredients" available to solve the task at
> 	hand.  Be it a program, or a dinner.

I also wrote a longer reply to this which I shouldn't have posted
(which will continue to be available via servers not processing
cancels) but this is the more interesting part: According to my
experience, writing 'bad' code (for a suitable definition of 'bad')
doesn't take less time than writing 'good' code (for a suitable
definition of 'good') to begin with and 'corners cut in order to be
able to present something superficially suitable to $superior now'
will come back to haunt you, ie, when taking the time necessary for
repairs/maintenance and the time wasted trying to puzzle out the
meaning of 'the usual thicket' into account, 'bad code' needs a lot
more time overall.




------------------------------

Date: Tue, 16 Jul 2013 21:45:59 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: names, values, boxes and microchips
Message-Id: <87a9lmi67s.fsf@sapphire.mobileactivedefense.com>

Instead of trying another explanation of the 'it is a name given to a
value' versus 'it is a register' issue, I thought I should rather try
an example.

Introductory remarks: The subroutine whose code is included below
takes two sorted lists of 'DNS [reverse] domain objects' as arguments
(represented as references to arrays[*]) and is supposed to produce an
output list of all objects on the first input list which don't
'conflict' with the items on the filter list. One application this is
put to is to determine which RFC1918 reverse zones still need to be
'terminated' on some DNS server, given a list of reverse domains
configured by a customer which may cover some or all of the RFC1918
address space(s).

The subroutine could be regarded as a software emulation of a
seriously 'high-level' special purpose IC performing the same
operation. In particular, it has a set of 'working registers'
(represented as my variables) whose contents change as the algorithm
proceeds. This is something totally different from 'assigning a
[transient] name to a value' and not at all closely related to
'programming a general purpose IC using its machine language'.

The return value of the net_compare method call is stored using list
assignment because this method returns a list of two values in some
cases, the first of which is interesting to this subroutine ('is it
before or behind'); the second ('is it immediately adjacent to the
other object') is used by another subroutine.

	[*] Contrary to another, immensely popular myth, using linked
	lists instead of arrays, based on using anonymous 2-element
	arrays for representing 'cons cells', is actually faster for
	algorithms like this, especially if the lists become
	large. For this particular case, they're expected to be small
	and the operation performed infrequently (whenever someone
	changes the configuration in the GUI), hence, the more
	convenient approach was chosen.

sub filter_against
{
    my ($in, $filter) = @_;
    my ($in_item, $in_pos, $filter_item, $filter_pos);
    my (@out, $step_in, $step_filter, $rc);

    $in_pos = 0;
    $in_item = $in->[0];
    $step_in = sub {
	last LOOP if ++$in_pos == @$in;
	$in_item = $in->[$in_pos];
    };

    $filter_pos = 0;
    $filter_item = $filter->[0];
    $step_filter = sub {
	last LOOP if ++$filter_pos == @$filter;
	$filter_item = $filter->[$filter_pos];
    };

    LOOP: {
	($rc) = $in_item->net_compare($filter_item);

	p_alloc('in %s, filter %s, rc %u',
		$in_item, $filter_item, $rc);
	
	given ($rc) {
	    when (R_AFTER) {
		push(@out, $in_item);
		&$step_in;
	    }

	    when ([R_BEFORE, R_SUB]) {
		&$step_filter;
	    }

	    when ([R_SAME, R_SUPER]) {
		p_alloc('%s: dropping %s (%s)', __func__,
			$in_item, $filter_item);
		
		&$step_in;
	    }
	}

	redo;
    }

    push(@out, @$in[$in_pos .. $#$in]);
    return \@out;
}

[This code is copyrighted by my employer and cited here for
educational purposes].
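
The walk over the two sorted lists can also be shown in a stripped-down
form. What follows is only a sketch under stated assumptions, not the
code above: plain integers stand in for the domain objects, the numeric
<=> operator stands in for net_compare (so the containment cases R_SUB
and R_SUPER have no counterpart), and the name filter_sorted is made up
for the illustration.

```perl
use strict;
use warnings;

# Hypothetical simplification: both lists hold sorted integers and an
# input item 'conflicts' with a filter item only if the two are equal.
sub filter_sorted {
    my ($in, $filter) = @_;
    my ($in_pos, $filter_pos) = (0, 0);
    my @out;

    while ($in_pos < @$in && $filter_pos < @$filter) {
        my $rc = $in->[$in_pos] <=> $filter->[$filter_pos];

        if ($rc < 0) {
            # input item sorts before the filter item: keep it
            push(@out, $in->[$in_pos++]);
        }
        elsif ($rc > 0) {
            # filter item is already behind us: try the next one
            $filter_pos++;
        }
        else {
            # conflict: drop the input item
            $in_pos++;
        }
    }

    # whatever is left on the input list cannot conflict anymore
    push(@out, @$in[$in_pos .. $#$in]);
    return \@out;
}

my $kept = filter_sorted([1, 3, 4, 7, 9], [3, 5, 7, 8]);
print "@$kept\n";    # prints "1 4 9"
```

Both positions only ever move forward, so each list is traversed at
most once regardless of how many items get dropped.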


------------------------------

Date: Wed, 17 Jul 2013 00:17:08 +0000 (UTC)
From: Jim Cochrane <allergic-to-spam@no-spam-allowed.invalid>
Subject: Open source Catalyst applications, for study?
Message-Id: <slrnkubok4.em6.allergic-to-spam@no-spam-allowed.invalid>

I'm looking for some well-designed, relatively modern applications that
use the Catalyst web framework, with source code available.  My goal is
to study these applications to learn more about current good practices
in designing and implementing a Catalyst application.  I'd appreciate
any pointers to such applications.

Also, if you know of a newsgroup that is a more appropriate place to
post this question, please let me know.


Thanks!


------------------------------

Date: Tue, 16 Jul 2013 23:44:15 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: the fastest way to create a directory
Message-Id: <slrnkubflf.ef9.hjp-usenet3@hrunkner.hjp.at>

On 2013-07-15 22:40, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>:
>> Create a directory with all upper directories if missing.
>> it uses the minimum possible disk access and checks.
>> 
>> Mkdir_recursive('/some/dir/d1/d2') or die;
>> 
>> sub Mkdir_recursive
>> {
>> return 1 if $_[0] eq '' || -d $_[0];
>> Mkdir_recursive( $_[0] =~/^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
>> mkdir $_[0] || return undef
>> }
>
> You'd be better off calling mkdir blind and keying off $! if it fails.
> That way you save a stat in the case where the creation succeeds.

That shouldn't make a noticeable difference. If the stat does cause any
disk accesses, those would also have been caused by the mkdir, and if it
doesn't (i.e. everything is already in the cache) the time for the stat
calls is completely swamped by the mkdir's. 

To my surprise the second loop of my test program seems actually to be
a bit faster with a blind mkdir, but the difference is less than the
variability, so I'd need more runs to see if the difference is
significant.

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Wed, 17 Jul 2013 11:09:29 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
To: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: the fastest way to create a directory
Message-Id: <51E65139.2010905@hotmail.noads.com>

Please post the code you tested


------------------------------

Date: Wed, 17 Jul 2013 11:10:07 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: Re: the fastest way to create a directory
Message-Id: <ks5jfi$1cgl$2@news.ntua.gr>

Please post the code you tested


------------------------------

Date: Wed, 17 Jul 2013 10:24:34 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: the fastest way to create a directory
Message-Id: <slrnkucl65.e0g.hjp-usenet3@hrunkner.hjp.at>

On 2013-07-17 08:09, George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com> wrote:
> Please post the code you tested

Please quote relevant parts of the postings you are replying to.

	hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Wed, 17 Jul 2013 10:49:16 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: the fastest way to create a directory
Message-Id: <87mwplo6sj.fsf@sapphire.mobileactivedefense.com>

"Peter J. Holzer" <hjp-usenet3@hjp.at> writes:
> On 2013-07-15 22:40, Ben Morrow <ben@morrow.me.uk> wrote:
>> Quoth George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>:
>>> Create a directory with all upper directories if missing.
>>> it uses the minimum possible disk access and checks.
>>> 
>>> Mkdir_recursive('/some/dir/d1/d2') or die;
>>> 
>>> sub Mkdir_recursive
>>> {
>>> return 1 if $_[0] eq '' || -d $_[0];
>>> Mkdir_recursive( $_[0] =~/^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
>>> mkdir $_[0] || return undef
>>> }
>>
>> You'd be better off calling mkdir blind and keying off $! if it fails.
>> That way you save a stat in the case where the creation succeeds.
>
> That shouldn't make a noticeable difference. If the stat does cause any
> disk accesses, those would also have been caused by the mkdir, and if it
> doesn't (i.e. everything is already in the cache) the time for the stat
> calls is completely swamped by the mkdir's.

Both stat and mkdir are system calls and 'one system call' is going to
be faster than 'two system calls'. 'In certain situations' (eg, for
the FFS and UFS filesystems or one of the Linux ext? mounted with the
dirsync option), mkdir will be executed synchronously and then, it is
going to take a 'long' time (possibly, an infinitely/arbitrarily long
time for storage devices expected to fail routinely during normal
operation where the error handling method is 'retry until successful
or powered down' aka SSDs). But all these ought to be regarded as
corner cases and "everything's in the cache" as the general one and in
this case, Ben's suggestion is sensible if it is expected that
directories are more often created than not created.

I don't think I'd use recursion for that because an iterative
equivalent is still fairly simple, for example:

------------------
use Errno qw(EEXIST ENOTDIR);

sub mkdir_p
{
    my ($cur, $next, $remain);

    $remain = $_[0];
    while ($remain) {
	($next, $remain) = $remain =~ /^(\/*[^\/]*)(.*)/;
	$cur .= $next;
	
	mkdir($cur) or do {
	    return if $! != EEXIST;
	    next if $remain;
	    
	    -d($cur) or $! = ENOTDIR, return;
	};
    }

    return 1;
}

mkdir_p($ARGV[0]) or die("$!");
--------------------

NB: I didn't perform any benchmarks on this. But it works with
pathnames a la /////b/c///d/e//g while the original doesn't (OTOH, it
doesn't support MS-DOSE [German for 'tin'] but that's not my problem
:-).




------------------------------

Date: Wed, 17 Jul 2013 22:37:19 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: the fastest way to create a directory
Message-Id: <slrnkue03v.dn8.hjp-usenet3@hrunkner.hjp.at>

On 2013-07-17 09:49, Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> "Peter J. Holzer" <hjp-usenet3@hjp.at> writes:
>> On 2013-07-15 22:40, Ben Morrow <ben@morrow.me.uk> wrote:
>>> Quoth George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>:
>>>> sub Mkdir_recursive
>>>> {
>>>> return 1 if $_[0] eq '' || -d $_[0];
>>>> Mkdir_recursive( $_[0] =~/^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
>>>> mkdir $_[0] || return undef
>>>> }
>>>
>>> You'd be better off calling mkdir blind and keying off $! if it fails.
>>> That way you save a stat in the case where the creation succeeds.
>>
>> That shouldn't make a noticeable difference. If the stat does cause any
>> disk accesses, those would also have been caused by the mkdir, and if it
>> doesn't (i.e. everything is already in the cache) the time for the stat
>> calls is completely swamped by the mkdir's.
>
> Both stat and mkdir are system calls and 'one system call' is going to
> be faster than 'two system calls'.

As stated, that's trivially untrue ('one call to exec(2)' will *not* be
faster than 'two calls to time(2)' except under pathological
circumstances), but even if I translate that into 'an additional system
call will take additional time', it's not necessarily true.

In this case I think that the stat system call would normally add a
little time but it will be completely swamped by the time spent in
mkdir:

Each successful mkdir call will cause at least 5 disk accesses on a
typical Linux file system: 1 for the journal, 2 for the inode and
content of the parent directory and 2 for the inode and content of the
new directory (Oh, I forgot the bitmaps, add another 2 or 4 ...). These
will happen *after* mkdir returns because of the writeback cache, and
the kernel will almost certainly succeed in coalescing at least some and
maybe many of those writes, but if you create a lot of directories
(George wrote about "thousands", in my tests I created about 150000)
these writes will eventually dominate.

Now, in addition to writing new blocks, where does the pair 
    stat($d); mkdir($d)
spend time? 

If the ancestor directories of $d are in cache (that would be the normal
case), both stat and mkdir will walk exactly the same in-memory
structure until they fail to find $d. So, yes, that part will be
uselessly duplicated, but it's very fast compared to actually writing a
new directory to the disk, so the extra time is negligible.

If the ancestor directories of $d are not in cache, stat will load them
into the cache, which may take a noticeable time. But that time will then
be saved by mkdir which can now use the cache instead of loading the
directories itself: So again the difference is one walk through
in-memory structures, which is insignificant compared to loading the
structures from disk and then writing a new directory (which will happen
anyway).

The ratios will be different depending on the relative speed of RAM and
storage: Maybe SSDs are fast enough that the additional walk through the
cache is noticeable, but I doubt it. Of course anybody is free to post
benchmark results to prove me wrong.
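
One rough way to gather such numbers, as a sketch only: create the same
number of fresh directories once with a preceding -d test and once
blind, and time both runs. The helper name make_dirs and the count of
1000 are assumptions made for the illustration; this is not the test
program mentioned earlier in the thread, and a single run says little
because of caching and write-back.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes qw(time);

# Hypothetical helper: create $count directories below $base, optionally
# testing with -d (a stat call) before each mkdir. Returns the number of
# directories actually created.
sub make_dirs {
    my ($base, $count, $check_first) = @_;
    my $made = 0;

    for my $i (1 .. $count) {
        my $dir = "$base/d$i";
        next if $check_first && -d $dir;
        mkdir($dir) and $made++;
    }

    return $made;
}

my $count = 1000;    # arbitrary; raise it to get past the write-back cache

for my $check_first (1, 0) {
    my $base  = tempdir(CLEANUP => 1);    # fresh, empty directory per run
    my $start = time;
    my $made  = make_dirs($base, $count, $check_first);

    printf("%-11s created %d dirs in %.4f s\n",
           $check_first ? 'stat+mkdir' : 'blind mkdir',
           $made, time - $start);
}
```

Repeating the runs several times and comparing the spread with the
difference between the two variants shows whether the difference is
significant at all.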

	hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Thu, 18 Jul 2013 00:14:19 +0300
From: "George Mpouras" <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
Subject: Re: the fastest way to create a directory
Message-Id: <ks71fk$2rgk$1@news.ntua.gr>

Your code is not good. It ALWAYS accesses the hard disk as many times as the
number of upper dirs.
This is completely unnecessary.
To check it with your own eyes, run:



#!/usr/bin/perl
use Errno qw(EEXIST ENOTDIR);
my $access=0;
mkdir_p("/opt/d1/d2/d3/d4/d5") or die("$!");
print "disk touches : $access\n";


sub mkdir_p
{
    my ($cur, $next, $remain);
    $remain = $_[0];

    while ($remain) {
        ($next, $remain) = $remain =~ /^(\/*[^\/]*)(.*)/;
        $cur .= $next;

        $access++;

        mkdir($cur) or do {
            return if $! != EEXIST;
            next if $remain;

            -d($cur) or $! = ENOTDIR, return;
        };
    }

    return 1;
}



------------------------------

Date: Thu, 18 Jul 2013 00:16:03 +0300
From: "George Mpouras" <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
Subject: Re: the fastest way to create a directory
Message-Id: <ks71is$2rtb$1@news.ntua.gr>



"Peter J. Holzer" wrote in message
news:slrnku8saq.gkv.hjp-usenet3@hrunkner.hjp.at...

On 2013-07-15 21:04, George Mpouras 
<nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam> wrote:
> yes there are thousands of dirs. try to understand the code. looks
> simple but it is not

What? It's trivial.


probably you understand nothing at all





------------------------------

Date: Wed, 17 Jul 2013 21:21:31 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: the fastest way to create a directory
Message-Id: <ks71sr$dta$1@reader2.panix.com>

In article <ks71fk$2rgk$1@news.ntua.gr>,
George Mpouras <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam> wrote:
>Your code is not good. It ALWAYS accesses the hard disk as many times as the
>number of upper dirs.

I think you don't understand disk caching.  Any reasonable modern
system will cache data.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 17 Jul 2013 22:22:22 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: the fastest way to create a directory
Message-Id: <87vc48yj8x.fsf@sapphire.mobileactivedefense.com>

"Peter J. Holzer" <hjp-usenet3@hjp.at> writes:
> On 2013-07-17 09:49, Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>> "Peter J. Holzer" <hjp-usenet3@hjp.at> writes:
>>> On 2013-07-15 22:40, Ben Morrow <ben@morrow.me.uk> wrote:
>>>> Quoth George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>:
>>>>> sub Mkdir_recursive
>>>>> {
>>>>> return 1 if $_[0] eq '' || -d $_[0];
>>>>> Mkdir_recursive( $_[0] =~/^(.*?)[\\\/][^\\\/]+$/ ) || return undef;
>>>>> mkdir $_[0] || return undef
>>>>> }
>>>>
>>>> You'd be better off calling mkdir blind and keying off $! if it fails.
>>>> That way you save a stat in the case where the creation succeeds.
>>>
>>> That shouldn't make a noticeable difference. If the stat does cause any
>>> disk accesses, those would also have been caused by the mkdir, and if it
>>> doesn't (i.e. everything is already in the cache) the time for the stat
>>> calls is completely swamped by the mkdir's.
>>
>> Both stat and mkdir are system calls and 'one system call' is going to
>> be faster than 'two system calls'.
>
> As stated, that's trivially untrue ('one call to exec(2)' will *not* be
> faster than 'two calls to time(2)' except under pathological
> circumstances),

Yes. And a single call to pause(2) will take forever except 'under
pathological circumstances'. Another interesting (Linux-)system call would be
reboot because not only doesn't it ever return, it might even cause
the system to be powered down!

Moral: Thanks to inherent vagueness in natural language and the
sloppy ways people usually use it, anything can be interpreted in a
way which doesn't make sense.

> but even if I translate that into 'an additional system
> call will take additional time', it's not necessarily true.

What about trying the intended meaning: Calling stat and mkdir is
going to take more time than calling stat and not mkdir or mkdir and
not stat (the two system calls in question).

BTW: Should I immediately supply a fresh round of nonsensical
interpretations?

[...]

> Each successful mkdir call will cause at least 5 disk accesses on a
> typical Linux file system: 1 for the journal, 2 for the inode and
> content of the parent directory and 2 for the inode and content of the
> new directory (Oh, I forgot the bitmaps, add another 2 or 4 ...). These
> will happen *after* mkdir returns because of the writeback cache, and
> the kernel will almost certainly succeed in coalescing at least some and
> maybe many of those writes, but if you create a lot of directories
> (George wrote about "thousands", in my tests I created about 150000)
> these writes will eventually dominate.

[more of this]

As I already tried to communicate in the earlier posting: 'Disk I/O'
is in itself a pathological situation or at least one the kernel tries
very hard to avoid. Since it is going to be slower than pretty much
everything else, talking about the execution speed of different
algorithms becomes completely meaningless then. Consequently, it
should be disregarded when doing this.




------------------------------

Date: Wed, 17 Jul 2013 22:31:14 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: the fastest way to create a directory
Message-Id: <87oba0yiu5.fsf@sapphire.mobileactivedefense.com>

"George Mpouras"
<nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
writes:
> Your code is not good. It ALWAYS accesses the hard disk as many times as
> the number of upper dirs.

Believe it or not but I understand how this algorithm works because I
wrote it. Anything like this depends heavily on what the expected
situation happens to be. Eg, if I run your code with an argument of
/tmp/a/b/c/d/e, assuming nothing except /tmp exists initially, it will
do the following system calls:

stat("/tmp/a/b/c/d/e", 0x602130)        = -1 ENOENT (No such file or directory)
stat("/tmp/a/b/c/d", 0x602130)          = -1 ENOENT (No such file or directory)
stat("/tmp/a/b/c", 0x602130)            = -1 ENOENT (No such file or directory)
stat("/tmp/a/b", 0x602130)              = -1 ENOENT (No such file or directory)
stat("/tmp/a", 0x602130)                = -1 ENOENT (No such file or directory)
stat("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=94208, ...}) = 0
mkdir("/tmp/a", 0777)                   = 0
mkdir("/tmp/a/b", 0777)                 = 0
mkdir("/tmp/a/b/c", 0777)               = 0
mkdir("/tmp/a/b/c/d", 0777)             = 0
mkdir("/tmp/a/b/c/d/e", 0777)           = 0

OTOH, the mkdir_p I posted will just do

mkdir("/tmp", 0777)                     = -1 EEXIST (File exists)
mkdir("/tmp/a", 0777)                   = 0
mkdir("/tmp/a/b", 0777)                 = 0
mkdir("/tmp/a/b/c", 0777)               = 0
mkdir("/tmp/a/b/c/d", 0777)             = 0
mkdir("/tmp/a/b/c/d/e", 0777)           = 0


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3993
***************************************

