[31446] in Perl-Users-Digest
Perl-Users Digest, Issue: 2698 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Nov 28 18:09:41 2009
Date: Sat, 28 Nov 2009 15:09:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 28 Nov 2009 Volume: 11 Number: 2698
Today's topics:
Re: DLL unload question for embedded Perl on Windows <u8526505@gmail.com>
Re: DLL unload question for embedded Perl on Windows <ben@morrow.me.uk>
Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <justin.0911@purestblue.com>
Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <sysadmin@example.com>
Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <justin.0911@purestblue.com>
Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <hjp-usenet2@hjp.at>
Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <OJZGSRPBZVCX@spammotel.com>
Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Per <ben@morrow.me.uk>
Re: perl hash: low-level implementation details? <xhoster@gmail.com>
Re: perl hash: low-level implementation details? <xhoster@gmail.com>
perlio vs. sysread speed (was: Quick CGI question (spec <hjp-usenet2@hjp.at>
Re: regexp for removing {} around latin1 characters <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 28 Nov 2009 00:27:00 -0800 (PST)
From: cyl <u8526505@gmail.com>
Subject: Re: DLL unload question for embedded Perl on Windows
Message-Id: <640e9702-4c8d-4982-ac10-78ec0499dd09@f20g2000prn.googlegroups.com>
On 11月28日, 上午4時37分, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth cyl <u8526...@gmail.com>:
>
> > 1. The loaded DLLs do not unload after the Perl interpreter is
> > shutdown
>
> I would not expect Cwd.dll to be unloaded until after perl_free is
> called. It is not normal for a perl interpreter to ever unload a loaded
> extension dll.
>
I just want to have a clean environment like a new process starts. Is
there any method to achieve this?
> > 2. perl_destruct() always throws exception. I have to comment out it
> >
>
> I suspect this may have something to do with your misuse of SYS_INIT3
> and SYS_TERM, but I don't know. If fixing that doesn't help, build perl
> with -DDEBUGGING and see if you get more information.
>
I modified my code in this way

int main(int argc, char **argv, char **env)
{
    int i = 0;
    PERL_SYS_INIT3(&argc, &argv, NULL);
    for (i = 0; i < 2; i++)
        runperl();
    PERL_SYS_TERM();
}
and the result is a bit confusing. It ran fine on the machine that
built the executable but still threw an exception from perl_destruct()
on another machine. The only difference I can tell is that one machine
has MS Visual Studio 2005 and the other does not.
>
> This is expected. Your perl is built with threads (since you're on
> Win32) and you are effectively trying to use two different interpreters
Multi-threading is another problem I ran into but didn't mention. I'm
using ActivePerl 5.8.8 but my program always crashes with multiple
threads. For now I have removed the thread functions and want to solve
the problems I mentioned first.
------------------------------
Date: Sat, 28 Nov 2009 17:53:27 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: DLL unload question for embedded Perl on Windows
Message-Id: <nl48u6-l3j1.ln1@osiris.mauzo.dyndns.org>
Quoth cyl <u8526505@gmail.com>:
> On 11月28日, 上午4時37分, Ben Morrow <b...@morrow.me.uk> wrote:
> > Quoth cyl <u8526...@gmail.com>:
> >
> > > 1. The loaded DLLs do not unload after the Perl interpreter is
> > > shutdown
> >
> > I would not expect Cwd.dll to be unloaded until after perl_free is
> > called. It is not normal for a perl interpreter to ever unload a loaded
> > extension dll.
> >
>
> I just want to have a clean environment like a new process starts. Is
> there any method to achieve this?
I don't know. Have you checked to see whether perl_free unloads the dlls
or not? If it does then that's what you want.
> > > 2. perl_destruct() always throws exception. I have to comment out it
> > >
> >
> > I suspect this may have something to do with your misuse of SYS_INIT3
> > and SYS_TERM, but I don't know. If fixing that doesn't help, build perl
> > with -DDEBUGGING and see if you get more information.
> >
>
> I modified my code in this way
>
> int main(int argc, char **argv, char **env)
> {
> int i=0;
> PERL_SYS_INIT3(&argc,&argv,NULL);
Um, what did I say? *You need to pass the same arguments as were passed
to main*. That *includes* env.
> for (i=0;i<2;i++)
> runperl();
> PERL_SYS_TERM();
> }
>
> and the result is a bit confusing. It ran fine on the machine that
> built the executable but still threw an exception from perl_destruct()
> on another machine. The only difference I can tell is that one machine
> has MS Visual Studio 2005 and the other does not.
As a general rule you must use the same compiler your copy of perl was
built with. If you are using 32-bit AS perl that means MSVC 6. If you
don't have the right compiler, rebuild perl with the compiler you *do*
have.
> > This is expected. Your perl is built with threads (since you're on
> > Win32) and you are effectively trying to use two different interpreters
>
> Multi-threading is another problem I ran into but didn't mention. I'm
> using ActivePerl 5.8.8 but my program always crashes with multiple
> threads. For now I have removed the thread functions and want to solve
> the problems I mentioned first.
Have you read *all* of perlembed? It explains quite carefully what you
must do to handle multiple threads.
Ben
------------------------------
Date: Sat, 28 Nov 2009 13:46:54 +0000
From: Justin C <justin.0911@purestblue.com>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <e7m7u6-n7r.ln1@purestblue.com>
In article <jSYPm.59723$rE5.9343@newsfe08.iad>, PerlFAQ Server wrote:
> This is an excerpt from the latest version perlfaq4.pod, which
> comes with the standard Perl distribution. These postings aim to
> reduce the number of repeated questions as well as allow the community
> to review and update the answers. The latest version of the complete
> perlfaq is at http://faq.perl.org .
>
> --------------------------------------------------------------------
>
> 4.18: Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
I don't believe that this is still a FAQ. What sort of person would be
asking it ten years after the event?
Justin.
--
Justin C, by the sea.
------------------------------
Date: Sat, 28 Nov 2009 13:41:05 -0800
From: Wanna-Be Sys Admin <sysadmin@example.com>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <RNgQm.75892$Xf2.9289@newsfe12.iad>
Justin C wrote:
> In article <jSYPm.59723$rE5.9343@newsfe08.iad>, PerlFAQ Server wrote:
>> This is an excerpt from the latest version perlfaq4.pod, which
>> comes with the standard Perl distribution. These postings aim to
>> reduce the number of repeated questions as well as allow the
>> community to review and update the answers. The latest version of the
>> complete perlfaq is at http://faq.perl.org .
>>
>> --------------------------------------------------------------------
>>
>> 4.18: Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
>
> I don't believe that this is still a FAQ. What sort of person would be
> asking it ten years after the event?
>
> Justin.
>
Probably the same people that still create or see the problem. I've
seen a lot of sites online still show odd dates for the year. But,
you're right in a way, because those people will never read (or care to
read) the FAQ anyway.
--
Not really a wanna-be, but I don't know everything.
------------------------------
Date: Sat, 28 Nov 2009 22:28:49 +0000
From: Justin C <justin.0911@purestblue.com>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <1qk8u6-6e5.ln1@purestblue.com>
In article <slrnhh2gg8.323.hjp-usenet2@hrunkner.hjp.at>, Peter J. Holzer wrote:
> On 2009-11-28 13:46, Justin C <justin.0911@purestblue.com> wrote:
>> In article <jSYPm.59723$rE5.9343@newsfe08.iad>, PerlFAQ Server wrote:
>>> 4.18: Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
>>
>> I don't believe that this is still a FAQ. What sort of person would be
>> asking it ten years after the event?
>
> I predict that it won't take very long until programs start breaking
> when they have to deal with dates before 2000.
>
> Also, the year 2038 is approaching. Is perl Y2038-compliant yet? ;-).
See FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Justin.
--
Justin C, by the sea.
------------------------------
Date: Sat, 28 Nov 2009 16:30:16 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <slrnhh2gg8.323.hjp-usenet2@hrunkner.hjp.at>
On 2009-11-28 13:46, Justin C <justin.0911@purestblue.com> wrote:
> In article <jSYPm.59723$rE5.9343@newsfe08.iad>, PerlFAQ Server wrote:
>> 4.18: Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
>
> I don't believe that this is still a FAQ. What sort of person would be
> asking it ten years after the event?
I predict that it won't take very long until programs start breaking
when they have to deal with dates before 2000.
Also, the year 2038 is approaching. Is perl Y2038-compliant yet? ;-).
hp
------------------------------
Date: Sat, 28 Nov 2009 17:49:39 +0100
From: "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <op.u336c1chmk9oye@frodo>
On Sat, 28 Nov 2009 16:30:16 +0100, Peter J. Holzer <hjp-usenet2@hjp.at>
wrote:
> Also, the year 2038 is approaching. Is perl Y2038-compliant yet? ;-).
Not out of the box, but there is a Y2038 module which makes it compliant.
------------------------------
Date: Sat, 28 Nov 2009 22:31:11 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: FAQ 4.18 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
Message-Id: <fuk8u6-dek1.ln1@osiris.mauzo.dyndns.org>
Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:
> On 2009-11-28 13:46, Justin C <justin.0911@purestblue.com> wrote:
> > In article <jSYPm.59723$rE5.9343@newsfe08.iad>, PerlFAQ Server wrote:
> >> 4.18: Does Perl have a Year 2000 problem? Is Perl Y2K compliant?
> >
> > I don't believe that this is still a FAQ. What sort of person would be
> > asking it ten years after the event?
>
> I predict that it won't take very long until programs start breaking
> when they have to deal with dates before 2000.
>
> Also, the year 2038 is approaching. Is perl Y2038-compliant yet? ;-).
5.12 will be, insofar as the underlying OS allows (with very reasonable
workarounds where it doesn't).
Ben
------------------------------
Date: Sat, 28 Nov 2009 11:11:04 -0800
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: perl hash: low-level implementation details?
Message-Id: <4b118df5$0$5242$ed362ca5@nr5-q3a.newsreader.com>
Ilya Zakharevich wrote:
> On 2009-11-25, Xho Jingleheimerschmidt <xhoster@gmail.com> wrote:
>>>> I'd guess roughly it comes up to something like: 48 bytes for the key
>>>> and associated structure, 40 bytes for the value-scalar (which holds an
>>>> arrayref), 160 bytes for the array overhead, and 48 bytes for each
>>>> scalar (usually 1) inside each array.
>>> ??? Where do you get these numbers?
>> Memory of past experiments, and a little guessing.
>
> Probably "a faulty memory of past experiments"...
I've since repeated some experiments on a 64 bit machine. I greatly
underestimated the size of the hash keys (and associated overhead).
They take a minimum of 88 bytes, and obviously more if they are long.
(but less if they are shared with other hashes). My original estimate
of the array overhead included the scalars to hold the array references
(and a single array to hold those scalars), as you can't have millions
of arrays without some way to hold them. So I was double counting that
part.
The values for small arrays are quite wobbly, because they are path
dependent. For example, pushing onto an autoviv takes more room than
explicitly assigning an arrayref, that is
push @{$x{$_}}, 1, 2;
versus
$x{$_}=[1,2];
Also, adding the first element to a previously empty array takes a lot
more space than adding additional elements, so unless you plan on having
lots of references to empty arrays, you would want to benchmark the size
of a single-element array and then subtract the size of the element (or
create the array with a single element, then delete that element, inside
the loop), rather than using virgin arrays.
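The two idioms compared above can be put side by side. This is a minimal sketch (the key name is made up) showing that they build identical contents, even though, as described, their memory footprints differ depending on which path allocated the array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (%x, %y);

# Autovivification: push conjures the array behind $x{a} on the fly.
push @{ $x{a} }, 1, 2;

# Explicit assignment of an anonymous arrayref.
$y{a} = [1, 2];

# Same contents either way; only the allocation path (and hence the
# per-array memory overhead) differs between the two.
print "identical contents\n" if "@{ $x{a} }" eq "@{ $y{a} }";
```

To actually measure the difference you would compare process memory (or use a module such as Devel::Size, if installed) around each construction.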
>
>>> On 32-bit machine,
>> He isn't using a 32 bit machine.
>
> .... so multiplying by 2 gives an upper bound...
Maybe, maybe not. I get better results with the experimental method
rather than trying to do it all from first principles, and inevitably
overlooking things.
Xho
------------------------------
Date: Sat, 28 Nov 2009 11:30:31 -0800
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: perl hash: low-level implementation details?
Message-Id: <4b118dfb$0$5244$ed362ca5@nr5-q3a.newsreader.com>
David Harmon wrote:
> On Mon, 23 Nov 2009 21:23:14 -0800 in comp.lang.perl.misc, Xho
> Jingleheimerschmidt <xhoster@gmail.com> wrote,
>> Each value slot will have exactly one value in it--that is how Perl
>> hashes work. However, in your code that value will be a reference to an
>> array, which array will on average have close to 1 element in it.
>>
>> And there goes your memory. You have about 50 million tiny arrays, each
>> one using a lot of overhead.
>
> I'm thinking a good way to store it would be the actual value in the
> hash as long as there is only one for that key, or an array reference as
> soon as it grows to more than one.
Been there, done that. But I wouldn't recommend it. If every store and
fetch has to have special case code anyway, and as long as you can pick
a character guaranteed not to be in the values, I'd just join the
multiple values on the magic character on storage and split on that
character on retrieval. That way you get the space savings even if many
or most values end up being multivalued.
I guess it depends on what assumption is least likely to be violated in
the future, the one that the vast majority of entries will remain
single-valued, or the one that your magic character will remain magic.
Or maybe pack and unpack would be better than join/concat and split,
depending on the nature of the values.
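A minimal sketch of the join/split scheme described above (the sub and key names are made up for illustration), using "\0" as the magic separator on the assumption that values never contain it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $SEP = "\0";    # the magic character, assumed never to occur in a value
my %h;

# Append a value to a key, concatenating onto any existing values.
sub store_val {
    my ($key, $val) = @_;
    die "value contains separator" if index($val, $SEP) >= 0;
    $h{$key} = defined $h{$key} ? $h{$key} . $SEP . $val : $val;
}

# Return the list of values stored under a key.
sub fetch_vals {
    my ($key) = @_;
    return defined $h{$key} ? split(/\0/, $h{$key}) : ();
}

store_val('color', 'red');
store_val('color', 'blue');
my @vals = fetch_vals('color');
print "@vals\n";    # prints "red blue"
```

Single-valued keys cost one plain string each; the per-array overhead never appears, no matter how many keys grow a second value.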
> You would need a sub to insert and
> to access, to keep complexity under control. At which point maybe it
> makes sense to make it a class module. Am I going down a bad path here?
I'm not sure what you mean by a class module. An OO module? I think
that if you need to optimize for space today, then there is a good
chance you will need to optimize for speed tomorrow. So I'd be somewhat
reluctant to start routing all of the accesses (probably in the
innermost loop) through OO calls. But I guess it would depend on how
many different places in the code needed to access this, and on whether
I already knew that something else was going to be an irreducible bottleneck.
Xho
------------------------------
Date: Sat, 28 Nov 2009 16:26:24 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))
Message-Id: <slrnhh2g93.323.hjp-usenet2@hrunkner.hjp.at>
On 2009-11-27 18:49, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:
>
> PJH> On 2009-11-25 22:11, Uri Guttman <uri@StemSystems.com> wrote:
> >> if you want more speed, use sysread and syswrite.
>
> PJH> sysread/syswrite probably aren't much faster than read/print. The latter
> PJH> have a bit more buffer handling overhead but that is almost certainly
> PJH> negligible when you read data from a disk and send it over the network.
>
> they both avoid stdio (or perl's version) so they are faster. how much
> depends on the amount of i/o and how many calls are made. this is why
> file::slurp uses sysread/write. see its benchmarks to see the difference
> from read/print.
Your benchmark was for a 300 MHz SPARC. CPU speed has improved more than
disk speed since then.
So I grabbed the server with the fastest disks I had access to (disk
array of SSDs), created a file with 400 million lines of 80 characters
(plus newline) each and ran some benchmarks:
method                  time      speed (MB/s)
----------------------------------------------
perlio  $/ = "\n"       2:35.12   209
perlio  $/ = \4096      1:35.36   340
perlio  $/ = \1048576   1:35.25   340
sysread bs = 4096       1:35.28   340
sysread bs = 1048576    1:35.18   340
The times are the median of three runs. Times between the runs differed
by about 1 second, so the difference between reading line by line and
block by block is significant, but the difference between perlio and
sysread or between different blocksizes isn't.
I was a bit surprised that reading line by line was so much slower than
blockwise reading. Was it because of the higher loop overhead (81 bytes
read per loop instead of 4096 means 50 times more overhead) or because
splitting a block into lines is so expensive?
So I did another run of benchmarks with different block sizes:
method                     block      user    system   cpu     total
read_file_by_perlio_block   4096     0.64s    26.87s   31%   1:27.91
read_file_by_perlio_block   2048     1.48s    28.65s   34%   1:28.56
read_file_by_perlio_block   1024     5.14s    29.03s   37%   1:30.59
read_file_by_perlio_block    512    11.98s    31.33s   47%   1:31.22
read_file_by_perlio_block    256    26.84s    33.13s   61%   1:36.85
read_file_by_perlio_block    128    43.53s    29.05s   71%   1:41.66
read_file_by_perlio_block     64    77.26s    28.16s   88%   1:59.70
read_file_by_line                  104.68s    28.01s   93%   2:22.34
(The times are a bit lower now because the system was idle for this
batch, whereas it carried a relatively constant load during the first one.)
As expected elapsed time as well as CPU time increases with shrinking
block size. However, even at 64 bytes, reading in blocks is still 20%
faster than reading in lines, even though the loop is now executed 27%
more often.
Conclusions:
* The difference between sysread and blockwise <> isn't even measurable.
* Above 512 Bytes the block size matters very little (and above 4k, not
at all).
* Reading line by line is significantly slower than reading by blocks.
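The two reading styles benchmarked above look roughly like this (the function names are mine, not from the benchmark script):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file through perlio in fixed-size blocks; return bytes read.
sub read_by_block {
    my ($file, $bs) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my ($buf, $total) = ('', 0);
    $total += length $buf while read $fh, $buf, $bs;
    close $fh;
    return $total;
}

# Read the same file line by line ($/ = "\n"); return bytes read.
sub read_by_line {
    my ($file) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my $total = 0;
    while (my $line = <$fh>) {
        $total += length $line;
    }
    close $fh;
    return $total;
}

# Tiny demo on a throwaway file; the benchmark used 400 million lines.
my ($tmp, $name) = tempfile(UNLINK => 1);
print $tmp 'x' x 80, "\n" for 1 .. 1000;
close $tmp;
printf "block: %d bytes, line: %d bytes\n",
    read_by_block($name, 4096), read_by_line($name);
```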
> PJH> However, if the files are large (and videos can be quite large),
> PJH> you can save quite a lot of time by reading the file in smallish
> PJH> chunks (a few kB to a few MB) and send each chunk immediately. If
> PJH> you read the whole file into memory first and then send it to the
> PJH> client the times for reading from disk and sending over the net
> PJH> add up. Otherwise they overlap resulting in a shorter total time.
>
> for some definition of large and small! :)
Let's use a specific example. I have several videos on my disk. The
largest of them is 542 MB.
Let's assume I have this file on the aforementioned SSD array and want to
send it over a gbit network connection. I can read the whole file in
542MB / 340MB/s == 1.6s. I can send it over the network in
542MB / 120MB/s == 4.5 seconds. If I first read it completely into memory
and then send it over the network, the total transfer time is
1.6s + 4.5s == 6.1s. If I read the file in 4kB blocks or even line by
line (not that reading a video line by line makes much sense) I can
still read it faster than it can be sent over the network, but since I
start sending only milliseconds after I start reading, the total
transfer time now is 4.5 seconds, or 35% faster.
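The chunked approach described above can be sketched like this (the sub name is mine; in the demo two temp files stand in for the disk file and the client socket):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stream a file to another handle in 64 kB chunks, so reading from
# disk and sending to the client overlap instead of adding up.
sub stream_file {
    my ($in, $out) = @_;
    my $buf;
    while (1) {
        my $n = sysread $in, $buf, 65536;
        die "sysread: $!" unless defined $n;
        last if $n == 0;                       # EOF
        my $off = 0;
        while ($off < $n) {                    # syswrite may be partial
            my $w = syswrite $out, $buf, $n - $off, $off;
            die "syswrite: $!" unless defined $w;
            $off += $w;
        }
    }
}

# Demo: copy a small "video" between two temp files.
my ($src, $srcname) = tempfile(UNLINK => 1);
print $src "some video data\n";
close $src;
my ($dst, $dstname) = tempfile(UNLINK => 1);
open my $in, '<', $srcname or die "open: $!";
stream_file($in, $dst);
close $dst;
close $in;
```

With a real socket as $out, each chunk is on the wire while the next one is still being read.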
hp
------------------------------
Date: Sat, 28 Nov 2009 11:43:17 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: regexp for removing {} around latin1 characters
Message-Id: <slrnhh1vm5.323.hjp-usenet2@hrunkner.hjp.at>
[Please don't cc usenet postings]
On 2009-11-27 20:59, Michael Friendly <friendly@yorku.ca> wrote:
> Peter J. Holzer wrote:
>> On 2009-11-27 17:39, Glenn Jackman <glennj@ncf.ca> wrote:
>>> At 2009-11-27 12:05PM, "Michael Friendly" wrote:
>>>> I have BibTeX files containing accented characters in forms like
>>>> {Johann Peter S{\"u}ssmilch}
>>>> {Johann Peter S\"ussmilch}
>>>> where, in BibTex, the {} are optional.
>>> [...]
>>>> So, I'm looking to complete the process by finding a regexp to remove
>>>> the braces around single accented latin1 characters.
>>>>
>>>> recode latex..latin1 < my.bib | perl -pe "s|\{([WHATGOESHERE])\}|$1|g"
>>> Maybe:
>>>
>>> s#{(\\.(?:{.+?}|.+?))}#$1#g
>>
>> more likely:
>>
>> perl -pe "s|\{([\xA0-\xFF])\}|$1|g"
>>
>>
>> I think you are trying to replace the recode, too, but for that you need
>> a lookup table with all the accented characters.
>
> No, all I want to do is to strip the {} around the accented characters;
Yes, I was following up to Glenn here, so the "you" was referring him.
If you look at his regexp, you will see that it matches, for example,
{\"{u}} or {\"u}. That doesn't work after the recode, because the \" has
already been replaced.
> recode does the conversion well. With the small test bib file below,
> here's what I get using only recode, vs. recode + perl
>
>
> % recode latex..latin1 < timeref.bib | grep ssmilch
> @BOOK{Sussmilch:1741,
> author = {Johann Peter S{ü}ssmilch},
>
> % recode latex..latin1 < timeref.bib | perl -pe
> "s|\{([\xA0-\xFF])\}|$1|g" | grep ssmilch
> @BOOK{Sussmilch:1741,
> author = {Johann Peter Sssmilch},
>
> Note that the ü just disappears.
You are using a unixish system? The shell replaces $1 inside the double
quotes with the current value of the shell variable $1 (in your case
probably nothing), so that the code that perl sees is:
s|\{([\xA0-\xFF])\}||g
On unixish systems you should always use single quotes to enclose perl
code unless you want the shell to substitute part of your code. Since
you were using double quotes I was assuming you are on Windows.
In general you should only use one-liners if you are familiar with the
shell you are using.
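The substitution itself, once protected from the shell, behaves like this (a small sketch; the sample string stands in for recode's output, with \xFC being latin1 u-umlaut):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# After recode, an accented latin1 character may still be wrapped in
# braces; strip the braces when they hold exactly one such character.
my $s = "Johann Peter S{\xFC}ssmilch";
$s =~ s|\{([\xA0-\xFF])\}|$1|g;
print "$s\n";
```

On the command line the same code must be single-quoted, e.g. perl -pe 's|\{([\xA0-\xFF])\}|$1|g', so that the shell leaves $1 alone.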
hp
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2698
***************************************