[32746] in Perl-Users-Digest
Perl-Users Digest, Issue: 4010 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 8 16:09:27 2013
Date: Thu, 8 Aug 2013 13:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 8 Aug 2013 Volume: 11 Number: 4010
Today's topics:
Re: configuring STD* IO to use locale's encoding? (Seymour J.)
Re: fast scan <rweikusat@mssgmbh.com>
fork it <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Re: fork it <peter@makholm.net>
Re: fork it <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Re: fork it <kst-u@mib.org>
Re: fork it <derykus@gmail.com>
Re: fork it <derykus@gmail.com>
Re: Merge files <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Re: translate human-readable time shorthand (Seymour J.)
Re: translate human-readable time shorthand <hjp-usenet3@hjp.at>
Re: translate human-readable time shorthand <ben@morrow.me.uk>
Re: translate human-readable time shorthand <rweikusat@mssgmbh.com>
Re: translate human-readable time shorthand <rweikusat@mssgmbh.com>
Re: translate human-readable time shorthand <ben@morrow.me.uk>
Re: translate human-readable time shorthand <ben@morrow.me.uk>
Re: translate human-readable time shorthand <rweikusat@mssgmbh.com>
Re: translate human-readable time shorthand (Tim McDaniel)
Re: translate human-readable time shorthand <jimsgibson@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 07 Aug 2013 09:41:43 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: configuring STD* IO to use locale's encoding?
Message-Id: <52024e97$6$fuzhry+tra$mr2ice@news.patriot.net>
In <slrnl03sq8.icm.hjp-usenet3@hrunkner.hjp.at>, on 08/07/2013
at 09:12 AM, "Peter J. Holzer" <hjp-usenet3@hjp.at> said:
>???
The pod2ipf utility generates pragma documentation under the heading
"Pragmata: change Perl's behaviour"; I thought that the code came from
pod2latex, but it appears not.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Wed, 07 Aug 2013 14:43:25 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: fast scan
Message-Id: <87haf1d3b6.fsf@sapphire.mobileactivedefense.com>
Charles DeRykus <derykus@gmail.com> writes:
> On 8/6/2013 4:26 PM, Rainer Weikusat wrote:
>> Charles DeRykus <derykus@gmail.com> writes:
>>> On 8/6/2013 3:32 AM, Rainer Weikusat wrote:
>>>> Charles DeRykus <derykus@gmail.com> writes:
>>>>> On 8/5/2013 12:28 PM, Rainer Weikusat wrote:
>>>> ...
>>>>
>>>> The solution is really simply to rate-limit requests being sent
>>
>> ..
>>
>>> Overall send-rate is throttled with the configurable parallelism
>>> setting.
>>
>> 'Send 1000 requests as fast as you can, then, do nothing for ten
>> seconds' is not the same as 'continue sending a request every 0.01s
>> for 10 seconds': Again simplifying things, 'an ethernet is binary': At
>> any given time, it is either 'in use' or 'not in use'. The bulk send
>> means it is 'in use' for a relatively long period of time at the
>> beginning and will be 'in use' for a similarly long period of time as
>> soon as the replies start arriving. Otherwise, it will be 'in use' for
>> many short time periods and 'be available' in between (the same is
>> true for resources on the sending/ receiving host where it means 'be
>> available to deal with replies').
[...]
> POE auto-sizes various default settings including 'Parallelism'
> based on OS and other factors.
According to what I read yesterday, it sets it to the number which
results from dividing the socket buffer size by the 'assumed' size of
an ICMP echo request message. Naively, this can be thought of as the
number of send calls the application could make before being blocked
until space in the socket buffer is again available, but this isn't
really true because as soon as there's data in the socket buffer, the
process of sending it and thus removing it from the buffer begins in
the background and the application can neither observe that (if it
isn't monitoring all outgoing ethernet frames in real-time) nor
predict how long the execution of its send calls will actually take:
The way to avoid blocking is to keep sending in non-blocking mode
until an EAGAIN error occurs and then continue sending once 'a
suitable system call' (for Perl, that would be select) indicated that
buffer space is again available. There's really no reason for using
this particular number except "I need some number. This is a
number. Therefore, I need that."
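The non-blocking approach described above (keep sending until EAGAIN, then wait for select to report buffer space) might look roughly like the following. This is only a sketch: send_queue, its arguments and the timeout handling are made up for illustration and are not taken from POE or any ping module, and it assumes a datagram socket so that partial sends don't have to be handled.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Hypothetical helper: send every message in @$queue on $sock without
# ever blocking inside send(). Returns the number of messages sent.
# Gives up once the socket stays unwritable for $timeout seconds.
sub send_queue {
    my ($sock, $queue, $timeout) = @_;

    $sock->blocking(0);                       # switch to non-blocking mode
    my $sel  = IO::Select->new($sock);
    my $sent = 0;

    while (@$queue) {
        if (defined send($sock, $queue->[0], 0)) {
            shift @$queue;                    # message accepted by the kernel
            $sent++;
            next;
        }
        # Anything other than "buffer full" is a real error.
        die "send failed: $!\n" unless $!{EAGAIN} || $!{EWOULDBLOCK};

        # Buffer full: wait until select says there is room again.
        last unless $sel->can_write($timeout);
    }
    return $sent;
}
```

A real program would normally service incoming replies in the same select loop instead of only waiting for writability.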
In an ideal scenario, this 'parallelism' idea would work such that
one starts by sending, without any delay, as many messages as someone
guesses the network will be capable of handling and afterwards, the
algorithm auto-adjusts because each reception of a reply will trigger
the next send. Even this can already be considered problematic
because it's the opposite of the TCP 'slow start' behaviour: TCP is
supposed to start transmitting data slowly and increase the send rate
as information regarding what 'the network' can actually handle
becomes available. This algorithm 'starts fast' and is supposed to
slow down if fast was actually too fast. Also, this ignores the fact
that 'sending requests as fast as we can' means 'replies arriving as
fast as they can' and they are more and more powerful than us: There's
no attempt at 'reply flow control'.
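For comparison, the rate-limited alternative from the quoted text above ('a request every 0.01s') can be sketched as below. The name paced_send and the callback interface are assumptions made for this example only; the callback stands in for whatever actually transmits a request.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: call $send->($request) for each request, spacing
# the calls $interval seconds apart instead of sending them in one burst.
sub paced_send {
    my ($interval, $send, @requests) = @_;

    my $next = time;                  # earliest time for the next send
    for my $request (@requests) {
        my $wait = $next - time;
        sleep($wait) if $wait > 0;    # idle until the slot is due
        $send->($request);
        $next += $interval;           # fixed schedule, so delays don't accumulate
    }
    return;
}
```

With an interval of 0.01, a thousand requests are spread over roughly ten seconds, leaving the link and the host free in between.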
And real conditions are not ideal: Let's assume the local computer
starts with sending a batch of 100 requests within a millisecond. Now
what? TCP says that one should wait for twice the estimated RTT before
assuming that a message was lost in transit but what is 'the estimated
RTT'? There's no data available yet. Also, what if the batch send
happened at an unfortunate time and 95% of the messages got dropped? Or
if the 'host list segment' in question happens to be a black hole in
the sense that the IP addresses it is composed of are unused? In both
cases, the application will now uselessly sit idle for some 'best
random guess' for a sensible RTO which may not have the slightest
correlation to the actual RTT.
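To illustrate why an RTO needs measurements first, here is a sketch of the smoothed estimator current TCP implementations are specified to use (RFC 6298), which replaced the older 'twice the RTT' rule. The constructor name is invented for this example, and the clock-granularity term and the one-second lower bound from the RFC are left out. Until the first sample arrives, all it can offer is the fixed initial guess.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that is fed one measured RTT
# (in seconds) per call and returns the updated retransmission timeout.
sub new_rto_estimator {
    my ($srtt, $rttvar);              # undefined until the first sample

    return sub {
        my ($r) = @_;
        return 1 unless defined $r;   # no data yet: RFC 6298 initial RTO

        if (!defined $srtt) {         # first measurement
            $srtt   = $r;
            $rttvar = $r / 2;
        }
        else {                        # later measurements: alpha = 1/8, beta = 1/4
            $rttvar = 0.75  * $rttvar + 0.25  * abs($srtt - $r);
            $srtt   = 0.875 * $srtt   + 0.125 * $r;
        }
        return $srtt + 4 * $rttvar;
    };
}
```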
I could go on about this for some paragraphs because this idea is
really less-than-brilliant in so many ways that mentioning them all is
likely to put off a reader to such a degree that he would rather
ignore the issue than work through half of a library ...
------------------------------
Date: Thu, 08 Aug 2013 14:55:55 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: fork it
Message-Id: <ku00uv$ehh$1@news.ntua.gr>
The idea is to finish a "special" job as soon as possible by automatically
splitting it and explicitly assigning its parts to dedicated cores.
It worked, with significant time benefits. Simplified, the idea is the
following:
my @Threads;
foreach my $cpu (0 .. (qx[grep "processor" /proc/cpuinfo|wc -l] - 1))
{
    my $answer = fork;
    die "Could not fork because \"$^E\"\n" unless defined $answer;
    if (0 == $answer)
    {
        print "I the thread $$ doing some parallel work\n";
        for(;;){}
        exit
    }
    else
    {
        push @Threads, $answer;
        `/bin/taskset -pc $cpu $answer`;
    }
}
print "Main program $$ waiting the threads: @Threads\n";
sleep 20;
foreach my $tid (@Threads) { kill(9,$tid) }
------------------------------
Date: Thu, 08 Aug 2013 15:06:18 +0200
From: Peter Makholm <peter@makholm.net>
Subject: Re: fork it
Message-Id: <87r4e4fi2d.fsf@vps1.hacking.dk>
George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
writes:
> The idea is to finish a "special" job as soon as possible by auto
> split it and explicitly assign its parts to dedicated cores.
Often it will be hard to reliably split the job into a number of
chunks that precisely fits the number of cores available.
Most of the time I split the task into natural chunks and then maintain
a queue of chunks to be processed. Then I fork a new process for each
chunk with some code to ensure that I only have $N jobs running at the
same time.
This scheme is implemented by Parallel::ForkManager available on CPAN.
https://metacpan.org/module/Parallel::ForkManager
I have never cared about pinning a task to a specific CPU. Most of my
tasks are inherently IO-bound and often run on servers doing other
work at the same time. Both issues make pinning less important, if
not outright bad.
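The queue-of-chunks scheme described above can be sketched with nothing but fork and wait. This is not Parallel::ForkManager's code or interface; run_chunks and its arguments are invented for this example, and the worker's return value is simply used as the child's exit status.

```perl
use strict;
use warnings;

# Hypothetical helper: run $worker->($chunk) for every chunk, each in its
# own child process, with at most $max children alive at any time.
# Returns a hash mapping child pid => exit status.
sub run_chunks {
    my ($max, $worker, @chunks) = @_;
    my (%running, %status);

    my $reap = sub {
        my $pid = wait;                        # block until some child exits
        if ($pid == -1) { %running = (); return }   # no children left
        delete $running{$pid};
        $status{$pid} = $? >> 8;
    };

    for my $chunk (@chunks) {
        $reap->() while keys(%running) >= $max;     # keep at most $max running

        my $pid = fork;
        die "Could not fork: $!\n" unless defined $pid;
        exit $worker->($chunk) if $pid == 0;        # child: do the work, then exit
        $running{$pid} = $chunk;                    # parent: remember the child
    }

    $reap->() while %running;                       # collect the stragglers
    return %status;
}
```

Because chunks are handed out as children finish, the number of chunks does not have to match the number of cores.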
//Makholm
------------------------------
Date: Thu, 08 Aug 2013 16:39:47 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: Re: fork it
Message-Id: <ku071n$ua4$1@news.ntua.gr>
> This scheme is implemented by Parallel::ForkManager available on CPAN.
>
> https://metacpan.org/module/Parallel::ForkManager
This module is only a fork wrapper that keeps track of the threads.
> I have never cared about pinning a task to a specific CPU. Most of my
> task are inherently IO-bound and often running on servers doing other
> work at the same time. Both issues that makes pinning less important, if
> not right out bad.
>
> //Makholm
>
It can be bad or not. It depends on the scenario.
------------------------------
Date: Thu, 08 Aug 2013 11:15:56 -0700
From: Keith Thompson <kst-u@mib.org>
Subject: Re: fork it
Message-Id: <lnk3jwoxpf.fsf@nuthaus.mib.org>
George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
writes:
[...]
> my @Threads;
>
> foreach my $cpu (0 .. (qx[grep "processor" /proc/cpuinfo|wc -l] - 1))
> {
> my $answer = fork;
> die "Could not fork because \"$^E\"\n" unless defined $answer;
>
> if (0 == $answer)
> {
> print "I the thread $$ doing some parallel work\n";
> for(;;){}
> exit
> }
> else
> {
> push @Threads, $answer;
> `/bin/taskset -pc $cpu $answer`;
> }
> }
>
> print "Main program $$ waiting the threads: @Threads\n";
> sleep 20;
> foreach my $tid (@Threads) { kill(9,$tid) }
Consistent indentation would make this a lot easier to read.
`/bin/taskset -pc $cpu $answer`;
Since you're not using the output, this would be clearer as:
system("/bin/taskset -pc $cpu $answer");
or, even better:
system('/bin/taskset', '-pc', $cpu, $answer);
which avoids the overhead of invoking a shell.
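A sketch of the list form with the exit status checked as well, since neither backticks nor a bare system call report a failing taskset. The wrapper name run_cmd is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a command without a shell and return its
# exit status, dying if it could not be started or was killed by a signal.
sub run_cmd {
    my @cmd = @_;

    # The block form guarantees that no shell is involved, even when
    # @cmd has only one element.
    my $rc = system { $cmd[0] } @cmd;

    die "Could not run $cmd[0]: $!\n" if $rc == -1;
    die "$cmd[0] died from signal " . ($? & 127) . "\n" if $? & 127;
    return $? >> 8;
}
```

In the posted program this would be called as run_cmd('/bin/taskset', '-pc', $cpu, $answer), with a non-zero return value treated as an error.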
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
------------------------------
Date: Thu, 08 Aug 2013 11:21:49 -0700
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: fork it
Message-Id: <ku0nku$527$1@speranza.aioe.org>
On 8/8/2013 4:55 AM, George Mpouras wrote:
> The idea is to finish a "special" job as soon as possible by auto split
> it and explicitly assign its parts to dedicated cores.
> it worked with significant time benefits. Simplistic the idea is the
> following
> ...
>
> print "Main program $$ waiting the threads: @Threads\n";
> sleep 20;
> foreach my $tid (@Threads) { kill(9,$tid) }
^^^^^^^^^^^^
Why not first try waitpid to reap the forked processes normally before
resorting to SIGKILL...?
--
Charles DeRykus
------------------------------
Date: Thu, 08 Aug 2013 11:42:34 -0700
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: fork it
Message-Id: <ku0orr$8g2$1@speranza.aioe.org>
On 8/8/2013 11:21 AM, Charles DeRykus wrote:
> On 8/8/2013 4:55 AM, George Mpouras wrote:
>> The idea is to finish a "special" job as soon as possible by auto split
>> it and explicitly assign its parts to dedicated cores.
>> it worked with significant time benefits. Simplistic the idea is the
>> following
>> ...
>>
>> print "Main program $$ waiting the threads: @Threads\n";
>> sleep 20;
>> foreach my $tid (@Threads) { kill(9,$tid) }
> ^^^^^^^^^^^^
>
> Why not first try waitpid to reap the forked processes normally before
> resorting to SIGKILL...?
>
Also, preferable to try SIGTERM first:
kill(TERM=>$tid) or kill(KILL=>$tid)
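Note that kill returns the number of processes signalled, so the one-liner above only falls back to KILL when TERM could not be delivered at all. A sketch that combines both suggestions (TERM first, reap with waitpid, KILL only after a grace period) might look like this; stop_child and the grace-period argument are assumptions for illustration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(time sleep);

# Hypothetical helper: ask child $pid to terminate, give it $grace seconds
# to comply, then kill it. Returns the name of the signal that was needed.
sub stop_child {
    my ($pid, $grace) = @_;

    kill TERM => $pid;
    my $deadline = time + $grace;

    while (time < $deadline) {
        # waitpid returns the pid once the child has been reaped,
        # 0 while it is still running, and -1 if there is no such child.
        return 'TERM' if waitpid($pid, WNOHANG) != 0;
        sleep 0.05;
    }

    kill KILL => $pid;        # the child ignored or blocked SIGTERM
    waitpid($pid, 0);         # reap it so no zombie is left behind
    return 'KILL';
}
```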
--
Charles DeRykus
------------------------------
Date: Thu, 08 Aug 2013 16:40:49 +0300
From: George Mpouras <nospam.gravitalsun.noadsplease@hotmail.noads.com>
Subject: Re: Merge files
Message-Id: <ku073m$ua4$2@news.ntua.gr>
a little bug correction
# Merge all files that exist in a directory into one big file.
# It tries to be clever by merging the other files into the biggest of them.
# It also prefers the newer files first, to help any potential sort later.
# MergeFiles( DIR => '/tmp' , OUTPUTFILE => '/tmp/big' ) || die;
#
sub MergeFiles
{
    my %option = @_;
    exists $option{$_} || die "The \"$_\" argument is missing from ".(caller 0)[3]."\n" foreach qw/DIR OUTPUTFILE/;

    # Collect [ path, size, age in days ] for every plain file
    opendir DIRFORMERGEFILES, $option{'DIR'} or return 0;
    my @File;
    while (my $name = readdir DIRFORMERGEFILES) {
        my $path = "$option{'DIR'}/$name";
        next unless -f $path;
        push @File, [ $path , -s _ , -M _ ];
    }
    closedir DIRFORMERGEFILES;
    return 1 if -1 == $#File;

    # Biggest file first; among equal sizes, the newest first
    my @FileSorted;
    for ( sort { $b->[1] <=> $a->[1] || $a->[2] <=> $b->[2] } @File )
    {
        push @FileSorted, $_->[0]
    }
    @File=();

    if (scalar @FileSorted > 1)
    {
        # Put a final new line character at the bigger file we are going to
        # merge to in case it does not exist
        my $data;
        open BIGERFILETOMERGE, '<', $FileSorted[0] or return 0;
        binmode BIGERFILETOMERGE, ':raw';
        seek BIGERFILETOMERGE, -1 , 2;
        read BIGERFILETOMERGE, $data , 1;
        close BIGERFILETOMERGE;
        my $the_bigger_file_a_final_new_line_character = $data eq chr 10 ? 1 : 0;

        open BIGERFILETOMERGE, '>>', $FileSorted[0] or return 0;
        print BIGERFILETOMERGE "\n" unless $the_bigger_file_a_final_new_line_character;

        for (my $i=1; $i < @FileSorted; $i++) {
            open MERGETHISFILE, '<', $FileSorted[$i] or return 0;
            # Skip blank lines and strip trailing whitespace
            while (<MERGETHISFILE>) { next if /^\s*$/; chomp; s/\s*$//; print BIGERFILETOMERGE "$_\n" }
            close MERGETHISFILE;
            # "or", not "||": with "||" the die could never be reached
            unlink $FileSorted[$i] or die "Could not delete file \"$FileSorted[$i]\" because \"$^E\"\n";
        }
        close BIGERFILETOMERGE;
    }

    unless ($FileSorted[0] eq $option{'OUTPUTFILE'}) {
        rename($FileSorted[0], $option{OUTPUTFILE}) or die "Could not rename file \"$FileSorted[0]\" to \"$option{OUTPUTFILE}\" because \"$^E\"\n";
    }
    return 1
}
------------------------------
Date: Wed, 07 Aug 2013 08:50:34 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: translate human-readable time shorthand
Message-Id: <5202429a$5$fuzhry+tra$mr2ice@news.patriot.net>
In <ktsq35$tnt$1@news.ntua.gr>, on 08/07/2013
at 09:40 AM, George Mpouras
<nospam.gravitalsun.noadsplease@hotmail.noads.com> said:
># There is the correct answer and the fast one. Here is the fast one
>!
I'd prefer a clear problem definition and a fast correct answer to the
stated problem. In particular, "human input" normally implies case
independent.
Even had the problem statement allowed for treating "m" and "M"
differently, months come in four sizes, at least in the Gregorian
calendar.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Wed, 7 Aug 2013 23:39:42 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: translate human-readable time shorthand
Message-Id: <slrnl05fku.iao.hjp-usenet3@hrunkner.hjp.at>
On 2013-08-07 08:31, Ivan Shmakov <oneingray@gmail.com> wrote:
>>>>>> Ulli Horlacher <framstag@rus.uni-stuttgart.de> writes:
>>>>>> Mathias Kőrber <mathias@koerber.org> wrote:
>
> >> I am looking for a module which can help translate human input for
> >> durations such as
>
> >> 3w4d20m10s into seconds (2161210). Spaces inside the input should
> >> be ignored. If it can accept other formats, the better.
>
> > This is easy:
>
> > sub seconds {
>
> > local $_ = shift; my $seconds = 0;
>
> > s/\s//g;
>
> This one above seems a bit redundant...
>
> > $seconds += $1*60*60*24*7 if /(\d+)w/;
> > $seconds += $1*60*60*24 if /(\d+)d/;
>
> ... given that these REs are already going to ignore spaces.
No, they don't. For example, they won't accept "1 w" as input.
> And not only spaces, BTW. Consider, e. g.:
>
> my $r
> = seconds ("Hello, wor1d!");
>
> Now $r is 86400.
>
Yes. So Ulli's solution accepts a lot of input which should probably not
be accepted. However, to satisfy the spec "Spaces inside the input
should be ignored" the statement s/\s//g is necessary.
hp
--
_ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
| | | hjp@hjp.at | die Satzbestandteile des Satzes nicht mehr
__/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
------------------------------
Date: Wed, 7 Aug 2013 22:45:27 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: translate human-readable time shorthand
Message-Id: <norada-ai8.ln1@anubis.morrow.me.uk>
Quoth Mathias Kőrber <mathias@koerber.org>:
> I am looking for a module which can help
> translate human input for durations such as
>
> 3w4d20m10s
> into seconds (2161210). Spaces inside the
> input should be ignored. If it can accept
> other formats, the better.
If you s/m/mn/g; s/(?<!\d)(?=\d)/ /g; then these can be parsed by
Date::Manip::Delta. You could also use DateTime::Format::DateManip; in
general I'd recommend using DateTime, because it gets all the nasty
corner cases right. There are probably other DateTime modules which will
do the job as well.
Ben
------------------------------
Date: Thu, 08 Aug 2013 12:02:39 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: translate human-readable time shorthand
Message-Id: <878v0ch2cw.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Mathias Kőrber <mathias@koerber.org>:
>> I am looking for a module which can help
>> translate human input for durations such as
>>
>> 3w4d20m10s
>> into seconds (2161210). Spaces inside the
>> input should be ignored. If it can accept
>> other formats, the better.
>
> If you s/m/mn/g; s/(?<!\d)(?=\d)/ /g; then these can be parsed by
> Date::Manip::Delta.
In other words: Date::Manip::Delta can't solve the problem.
> You could also use DateTime::Format::DateManip; in general I'd
> recommend using DateTime, because it gets all the nasty corner cases
> right.
Which 'nasty corner cases'? The only real problem is the ill-defined
units. When assuming that George's/ Ulli's 'approximations' (which is a
euphemism for 'garbage in, garbage out' here) are appropriate, the
problem is simple:
----------------
$d = $ARGV[0];
$d =~ s/\s//g;
%units = (
Y => 365 * 86400,
M => 30 * 86400,
w => 7 * 86400,
d => 86400,
h => 3600,
m => 60,
s => 1);
$p = 0;
$p = pos($d), $seconds += $1 * $units{$2} while $d =~ /\G(\d+)([YMdwhms])/g;
die("error at $p") if $p < length($d);
print($seconds, "\n");
----------------
In scalar context, the $d =~ /\G(\d+)([YMdwhms])/g matches a sequence of
digits followed by a 'unit abbreviation'. The former is put into $1, the
latter into $2. The expression returns true if a match could be found
and false otherwise. The \G means 'start where the last match stopped',
the /g 'continue with this string' (also an approximation).
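The same \G loop, wrapped into a subroutine that runs under strict and warnings, could look like this. The sub name is made up for the example; the unit table is the one posted above, 'approximations' included.

```perl
use strict;
use warnings;

my %units = (
    Y => 365 * 86400,
    M => 30 * 86400,
    w => 7 * 86400,
    d => 86400,
    h => 3600,
    m => 60,
    s => 1,
);

# Hypothetical wrapper around the loop shown above: returns the number of
# seconds, or dies with the offset of the first character it can't parse.
sub duration_to_seconds {
    my ($d) = @_;
    $d =~ s/\s//g;                          # spaces in the input are ignored

    my ($seconds, $p) = (0, 0);
    while ($d =~ /\G(\d+)([YMdwhms])/g) {   # \G: resume where the last match ended
        $seconds += $1 * $units{$2};
        $p = pos($d);                       # remember how far parsing got
    }
    die "error at $p\n" if $p < length($d); # trailing or embedded garbage
    return $seconds;
}

print duration_to_seconds('3w4d20m10s'), "\n";   # prints 2161210
```

Unlike the unanchored per-unit matches discussed earlier in the thread, this rejects input such as "Hello, wor1d!" because the very first \G match fails at offset 0.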
------------------------------
Date: Thu, 08 Aug 2013 12:14:11 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: translate human-readable time shorthand
Message-Id: <874nb0h1to.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Ben Morrow <ben@morrow.me.uk> writes:
>> Quoth Mathias Kőrber <mathias@koerber.org>:
>>> I am looking for a module which can help
>>> translate human input for durations such as
>>>
>>> 3w4d20m10s
>>> into seconds (2161210). Spaces inside the
>>> input should be ignored. If it can accept
>>> other formats, the better.
[...]
> ----------------
> $d = $ARGV[0];
> $d =~ s/\s//g;
>
> %units = (
> Y => 365 * 86400,
> M => 30 * 86400,
> w => 7 * 86400,
> d => 86400,
> h => 3600,
> m => 60,
> s => 1);
>
> $p = 0;
> $p = pos($d), $seconds += $1 * $units{$2} while $d =~ /\G(\d+)([YMdwhms])/g;
[...]
It is possible to do without the explicit whitespace removal:
$p = pos($d), $seconds += $1 * $units{$2} while $d =~ /\G\s*(\d+)\s*([YMdwhms])\s*/g;
NB: Thanks to Unicode, the \d and \s might match 'arbitrary
garbage'. In particular, \d+ might match something Perl doesn't
consider to be a number.
------------------------------
Date: Thu, 8 Aug 2013 12:45:26 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: translate human-readable time shorthand
Message-Id: <mvccda-3ua2.ln1@anubis.morrow.me.uk>
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>
> NB: Thanks to Unicode, the \d and \s might match 'arbitrary
> garbage'. In particular, \d+ might match something Perl doesn't
> consider to be a number.
That's what /a is for.
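A small demonstration of the difference, assuming Perl 5.14 or later (where the /a modifier was introduced). The test character is ARABIC-INDIC DIGIT THREE, chosen arbitrarily as one non-ASCII digit.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";      # a Unicode digit that is not 0-9

# Without /a, \d matches any Unicode decimal digit.
my $unicode_match = $arabic_three =~ /^\d+$/  ? 1 : 0;

# With /a, \d is restricted to the ASCII digits 0-9.
my $ascii_match   = $arabic_three =~ /^\d+$/a ? 1 : 0;

print "without /a: $unicode_match, with /a: $ascii_match\n";
# prints "without /a: 1, with /a: 0"
```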
Ben
------------------------------
Date: Thu, 8 Aug 2013 12:44:55 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: translate human-readable time shorthand
Message-Id: <nuccda-3ua2.ln1@anubis.morrow.me.uk>
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Mathias Kőrber <mathias@koerber.org>:
> >> I am looking for a module which can help
> >> translate human input for durations such as
> >>
> >> 3w4d20m10s
> >> into seconds (2161210). Spaces inside the
> >> input should be ignored. If it can accept
> >> other formats, the better.
> >
> > If you s/m/mn/g; s/(?<!\d)(?=\d)/ /g; then these can be parsed by
> > Date::Manip::Delta.
>
> In other words: Date::Manip::Delta can't solve the problem.
It can be used to solve the problem. The slightly weird input format
means it can't be used directly.
> > You could also use DateTime::Format::DateManip; in general I'd
> > recommend using DateTime, because it gets all the nasty corner cases
> > right.
>
> Which 'nasty corner cases'? The only real problem are the ill-defined
> units. When assuming that George's/ Ulli's 'approximations' (which is a
> euphemism for 'garbage in, garbage out' here) are appropriate, the
> problem is simple:
>
> ----------------
> $d = $ARGV[0];
> $d =~ s/\s//g;
>
> %units = (
> Y => 365 * 86400,
> M => 30 * 86400,
> w => 7 * 86400,
> d => 86400,
> h => 3600,
> m => 60,
> s => 1);
A year is not always 365 days. A month is only occasionally 30 days. A
day is not always 86400 seconds. DateTime will handle expressions like
'one month and three days' correctly; it also handles Summer Time
correctly when working in local time.
If I were doing this I might write my own DateTime::Format module to
handle this format, rather than trying to make it fit the Date::Manip
format.
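The point about month lengths can be checked without DateTime. The following sketch uses only the core Time::Local module and works in UTC, so it shows the calendar effect but none of the Summer Time or leap second handling mentioned above; month_seconds is a name invented for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: length in seconds of the given calendar month
# ($month is 1..12), measured in UTC from the 1st to the next 1st.
sub month_seconds {
    my ($year, $month) = @_;
    my ($ny, $nm) = $month == 12 ? ($year + 1, 1) : ($year, $month + 1);
    return timegm(0, 0, 0, 1, $nm - 1, $ny)
         - timegm(0, 0, 0, 1, $month - 1, $year);
}

printf "%d-%02d: %d days\n", 2013, $_, month_seconds(2013, $_) / 86400
    for 1 .. 12;
```

Only four of the twelve lines printed for 2013 show 30 days, which is why a fixed 'M => 30 * 86400' can only be an approximation.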
Ben
------------------------------
Date: Thu, 08 Aug 2013 14:45:09 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: translate human-readable time shorthand
Message-Id: <8738qkqot6.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
[...]
>> Which 'nasty corner cases'? The only real problem are the ill-defined
>> units. When assuming that George's/ Ulli's 'approximations' (which is a
>> euphemism for 'garbage in, garbage out' here) are appropriate, the
>> problem is simple:
>>
>> ----------------
>> $d = $ARGV[0];
>> $d =~ s/\s//g;
>>
>> %units = (
>> Y => 365 * 86400,
>> M => 30 * 86400,
>> w => 7 * 86400,
>> d => 86400,
>> h => 3600,
>> m => 60,
>> s => 1);
>
> A year is not always 365 days. A month is only occasionally 30 days. A
> day is not
[...]
I figure I have now written 3 or 4 postings pointing out that the
units used in this example are not well-defined, including the one
you're replying to, cf first paragraph. DateTime can't "handle that"
because in absence of a start date the interval is supposed to apply
to, the problem can't be solved. Even then, it can't be solved,
neither by DateTime nor by anything else, because 'leap second insertion' is
not predictable. Consequently, I take this as 'no such corner cases
exist in the parser'.
------------------------------
Date: Thu, 8 Aug 2013 15:48:24 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: translate human-readable time shorthand
Message-Id: <ku0ek8$87s$1@reader1.panix.com>
In article <norada-ai8.ln1@anubis.morrow.me.uk>,
Ben Morrow <ben@morrow.me.uk> wrote:
>
>Quoth Mathias Kőrber <mathias@koerber.org>:
>> I am looking for a module which can help
>> translate human input for durations such as
>>
>> 3w4d20m10s
>> into seconds (2161210). Spaces inside the
>> input should be ignored. If it can accept
>> other formats, the better.
>
>If you s/m/mn/g; s/(?<!\d)(?=\d)/ /g; then these can be parsed by
>Date::Manip::Delta.
I am not at all familiar with the fancy-pants newfangled stuff in
regexps like in that second example. To save other people trouble,
- "m" has to be expressed as "mn" (in Date::Manip::Delta,
"m" appears to be "month" and "mn" is "minute")
- Date::Manip::Delta requires space (or comma) before digits.
Find each place where the character before is not a digit and the
character following is a digit, and put a space there.
(Those are zero-width assertions.) I see no reason why it could not
be expressed, albeit probably with less efficiency, as
s/(\d+)/ $1/g
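A quick check that the two spellings produce the same string for the example input. Only the preprocessing is shown; the sub names are invented for this comparison, and Date::Manip itself is not called.

```perl
use strict;
use warnings;

# Preprocessing with the zero-width assertions from the quoted post.
sub prep_lookaround {
    my ($s) = @_;
    $s =~ s/m/mn/g;                  # "m" (minute here) becomes "mn"
    $s =~ s/(?<!\d)(?=\d)/ /g;       # a space before each run of digits
    return $s;
}

# The same thing with a plain capture group.
sub prep_capture {
    my ($s) = @_;
    $s =~ s/m/mn/g;
    $s =~ s/(\d+)/ $1/g;
    return $s;
}

print "[", prep_lookaround('3w4d20m10s'), "]\n";   # prints "[ 3w 4d 20mn 10s]"
print "[", prep_capture('3w4d20m10s'), "]\n";      # prints "[ 3w 4d 20mn 10s]"
```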
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Thu, 08 Aug 2013 09:41:25 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: translate human-readable time shorthand
Message-Id: <080820130941259904%jimsgibson@gmail.com>
In article <nuccda-3ua2.ln1@anubis.morrow.me.uk>, Ben Morrow
<ben@morrow.me.uk> wrote:
> > > Quoth Mathias Kőrber <mathias@koerber.org>:
> > >> I am looking for a module which can help
> > >> translate human input for durations such as
> > >>
> > >> 3w4d20m10s
> > >> into seconds (2161210). Spaces inside the
> > >> input should be ignored. If it can accept
> > >> other formats, the better.
> > >
> A year is not always 365 days. A month is only occasionally 30 days. A
> day is not always 86400 seconds. DateTime will handle expressions like
> 'one month and three days' correctly; it also handles Summer Time
> correctly when working in local time.
There is some ambiguity in the concept of /duration/ when applied to
such large units as years and months. A "unit" of duration should have
a fixed definition that doesn't depend upon the date and time when the
period starts and stops.
For example, if you say you want your egg boiled for "three minutes",
you mean exactly 180 seconds, regardless of when you actually place the
egg into the boiling water and whether or not a leap second is added to
the calendar during the cooking. If you want your concrete driveway to
cure for "three days" before parking your car on it, you want to wait
no less than 259,200 seconds, even if you happen to pour the concrete
on the Friday before daylight savings goes into effect.
So we should try to agree that one minute of duration is 60 seconds,
one hour is 60 minutes, one day is 24 hours, and one week is 7 days,
regardless of when those periods start or stop.
When it comes to months and years, there is more ambiguity and more
possibility for disagreement. A solar year is 365.242 days, or
31,556,908.8 seconds. An average "month" would be one-twelfth of that,
or 2,629,742.4 seconds = 30.4368 days. The potential disagreements of
what constitutes a "year" or a "month" probably preclude those terms
from being used as units of duration unless it is obvious what values
are being used.
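The arithmetic above, written out. The fixed units are the ones proposed in this post; the 365.242-day solar year is the figure quoted here, not an authoritative constant, and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Units with a fixed, start-independent definition, in seconds.
my %fixed = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 24 * 60 * 60,
    w => 7 * 24 * 60 * 60,
);

# Derived 'average' units, using the solar year quoted above.
my $solar_year    = 365.242 * $fixed{d};
my $average_month = $solar_year / 12;

printf "three days    = %d s\n",   3 * $fixed{d};
printf "solar year    = %.1f s\n", $solar_year;
printf "average month = %.1f s = %.4f days\n",
    $average_month, $average_month / $fixed{d};
```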
--
Jim Gibson
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4010
***************************************