[32785] in Perl-Users-Digest


Perl-Users Digest, Issue: 4049 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Oct 5 09:09:37 2013

Date: Sat, 5 Oct 2013 06:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 5 Oct 2013     Volume: 11 Number: 4049

Today's topics:
        multiple codepages <gravitalsun@hotmail.foo>
    Re: multiple codepages <bjoern@hoehrmann.de>
    Re: multiple codepages <gravitalsun@hotmail.foo>
    Re: multiple codepages <hjp-usenet3@hjp.at>
    Re: multiple codepages <gravitalsun@hotmail.foo>
    Re: multiple codepages <gravitalsun@hotmail.foo>
    Re: multiple codepages <*@eli.users.panix.com>
    Re: multiple codepages <hjp-usenet3@hjp.at>
    Re: multiple codepages <jurgenex@hotmail.com>
    Re: multiple codepages <hhr-m@web.de>
    Re: multiple codepages <gravitalsun@hotmail.foo>
    Re: openCV - cpan/Cv <netnews@invalid.com>
    Re: Opening a file with its Windows app in Activestate <bernie@fantasyfarm.com>
    Re: perl hash utilities <rweikusat@mobileactivedefense.com>
    Re: perl hash utilities <ben@morrow.me.uk>
    Re: perl hash utilities <rweikusat@mobileactivedefense.com>
        runnable.com <eric@fruitcom.com>
    Re: runnable.com <*@eli.users.panix.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 03 Oct 2013 23:39:13 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: multiple codepages
Message-Id: <l2kkli$olj$1@news.ntua.gr>

I receive files containing text in multiple codepages (within the same
file). The codepage of each line is not known in advance.
Any idea how to convert them to valid UTF-8?


------------------------------

Date: Thu, 03 Oct 2013 22:47:11 +0200
From: Bjoern Hoehrmann <bjoern@hoehrmann.de>
Subject: Re: multiple codepages
Message-Id: <7olr49tbvji40sba15bemp3kabsogvs8c3@hive.bjoern.hoehrmann.de>

* George Mpouras wrote in comp.lang.perl.misc:
>I receive files containing text of multiple codepages (at the same file) 
>. You can not know the codepage of every line from before.
>Any idea to convert it to valid utf8 ?

In order to properly convert to UTF-8 you have to know the encoding the
bytes are in prior to the conversion. Switching between encodings inside
a single file should be no problem so long as you can isolate the bytes
around the positions where the encoding changes. If you cannot do that,
or cannot know the encoding of the bytes through any means at all, then
you have a problem. Perhaps you can elaborate on your problem?
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


------------------------------

Date: Fri, 04 Oct 2013 00:29:04 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: multiple codepages
Message-Id: <l2knj1$108j$1@news.ntua.gr>

On 3/10/2013 23:47, Bjoern Hoehrmann wrote:
> * George Mpouras wrote in comp.lang.perl.misc:
>> I receive files containing text of multiple codepages (at the same file)
>> . You can not know the codepage of every line from before.
>> Any idea to convert it to valid utf8 ?
>
> In order to properly convert to UTF-8 you have to know the encoding the
> bytes are in prior to the conversion. Switching between encodings inside
> a single file should be no problem so long as you can isolate the bytes
> around the positions where the encoding changes. If you cannot do that,
> or cannot know the encoding of the bytes through any means at all, then
> you have a problem. Perhaps you can elaborate on your problem?
>

There is no way to know it; they are email headers in a big log.


------------------------------

Date: Thu, 3 Oct 2013 23:31:57 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: multiple codepages
Message-Id: <slrnl4roid.6vv.hjp-usenet3@hrunkner.hjp.at>

On 2013-10-03 21:29, George Mpouras <gravitalsun@hotmail.foo> wrote:
> On 3/10/2013 23:47, Bjoern Hoehrmann wrote:
>> * George Mpouras wrote in comp.lang.perl.misc:
>>> I receive files containing text of multiple codepages (at the same file)
>>> . You can not know the codepage of every line from before.
>>> Any idea to convert it to valid utf8 ?
>>
>> In order to properly convert to UTF-8 you have to know the encoding the
>> bytes are in prior to the conversion. Switching between encodings inside
>> a single file should be no problem so long as you can isolate the bytes
>> around the positions where the encoding changes. If you cannot do that,
>> or cannot know the encoding of the bytes through any means at all, then
>> you have a problem. Perhaps you can elaborate on your problem?
>>
>
> there is no way to know it , they are email headers on big log

Email headers use RFC 2047 encoding.

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Fri, 04 Oct 2013 00:38:46 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: multiple codepages
Message-Id: <l2ko57$11rc$1@news.ntua.gr>

On 4/10/2013 00:31, Peter J. Holzer wrote:
> On 2013-10-03 21:29, George Mpouras <gravitalsun@hotmail.foo> wrote:
>> On 3/10/2013 23:47, Bjoern Hoehrmann wrote:
>>> * George Mpouras wrote in comp.lang.perl.misc:
>>>> I receive files containing text of multiple codepages (at the same file)
>>>> . You can not know the codepage of every line from before.
>>>> Any idea to convert it to valid utf8 ?
>>>
>>> In order to properly convert to UTF-8 you have to know the encoding the
>>> bytes are in prior to the conversion. Switching between encodings inside
>>> a single file should be no problem so long as you can isolate the bytes
>>> around the positions where the encoding changes. If you cannot do that,
>>> or cannot know the encoding of the bytes through any means at all, then
>>> you have a problem. Perhaps you can elaborate on your problem?
>>>
>>
>> there is no way to know it , they are email headers on big log
>
> Email headers use RFC 2047 encoding.
>
> 	hp
>
>

Maybe, but there is Cyrillic, French, whatever... in the "username".


------------------------------

Date: Fri, 04 Oct 2013 00:45:35 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: multiple codepages
Message-Id: <l2koi0$13dm$1@news.ntua.gr>

On 3/10/2013 23:47, Bjoern Hoehrmann wrote:
> * George Mpouras wrote in comp.lang.perl.misc:
>> I receive files containing text of multiple codepages (at the same file)
>> . You can not know the codepage of every line from before.
>> Any idea to convert it to valid utf8 ?
>
> In order to properly convert to UTF-8 you have to know the encoding the
> bytes are in prior to the conversion. Switching between encodings inside
> a single file should be no problem so long as you can isolate the bytes
> around the positions where the encoding changes. If you cannot do that,
> or cannot know the encoding of the bytes through any means at all, then
> you have a problem. Perhaps you can elaborate on your problem?
>

I remember a module called Encode::Guess... maybe this will work.
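
For reference, a minimal sketch of what Encode::Guess (shipped with Encode)
can and cannot do here. The helper name describe_guess and the sample byte
strings are made up for illustration. The module only returns an encoding
object when exactly one suspect fits, so two single-byte codepages that both
accept the same bytes should come back as an ambiguity message rather than
an answer.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: report either the guessed encoding name or the
# reason Encode::Guess could not settle on exactly one candidate.
sub describe_guess {
    my ($bytes, @suspects) = @_;
    my $guess = guess_encoding($bytes, @suspects);
    return ref $guess
        ? 'guessed ' . $guess->name
        : "no unique guess: $guess";
}

# Valid non-ASCII UTF-8; only the default suspects are tried.
print describe_guess("caf\xC3\xA9"), "\n";

# A lone high byte is legal in both single-byte suspects, so the
# result is ambiguous.
print describe_guess("caf\xE9", 'iso-8859-1', 'iso-8859-7'), "\n";
```

In other words, it helps to tell UTF-8 apart from "something else", but not
to tell one 8-bit codepage from another.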


------------------------------

Date: Thu, 3 Oct 2013 21:47:43 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: multiple codepages
Message-Id: <eli$1310031747@qz.little-neck.ny.us>

In comp.lang.perl.misc, George Mpouras  <gravitalsun@hotmail.foo> wrote:
> there is no way to know it , they are email headers on big log

Email headers should be UTF-8 (if RFC-6532 compliant message/global) or
seven-bit with RFC-2047 MIME-Encoded words that describe the "charset"
and transfer-encoding.

https://www.rfc-editor.org/rfc/rfc6532.txt
https://www.rfc-editor.org/rfc/rfc2047.txt

In general, a well-trained n-gram system can make good guesses about
the language and character set of a string in an unknown format. The
"well-trained" bit can be hard to do, if you don't have large
resources to apply to the problem. 

ObPerl: I have used the Lingua::Ident module in the past for n-gram
language identification.
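
To make the n-gram idea concrete, here is a toy sketch in plain Perl. It is
not Lingua::Ident and not how that module works internally; the two
one-sentence "training" texts, the %profile hash and the identify function
are all invented for illustration, and a usable system would need far
larger samples (and byte-level profiles per character set).

```perl
use strict;
use warnings;

# Count the overlapping three-character sequences in a string.
sub trigrams {
    my ($text) = @_;
    my $t = lc $text;
    my %count;
    $count{ substr($t, $_, 3) }++ for 0 .. length($t) - 3;
    return \%count;
}

# Tiny, hypothetical training samples; real profiles need much more text.
my %profile = (
    en => trigrams('the quick brown fox jumps over the lazy dog and the cat sleeps in the sun'),
    de => trigrams('der schnelle braune fuchs springt in den garten und die katze schlaeft in der sonne'),
);

# Score a string against each profile by counting shared trigrams and
# return the language whose profile overlaps most.
sub identify {
    my ($text) = @_;
    my $grams = trigrams($text);
    my ($best, $best_score) = ('', -1);
    for my $lang (sort keys %profile) {
        my $score = 0;
        for my $gram (keys %$grams) {
            $score += $grams->{$gram} if exists $profile{$lang}{$gram};
        }
        ($best, $best_score) = ($lang, $score) if $score > $best_score;
    }
    return $best;
}

print identify('the dog and the fox'), "\n";
print identify('der fuchs und die katze'), "\n";
```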

Elijah
------
statistics for the win!


------------------------------

Date: Thu, 3 Oct 2013 23:52:05 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: multiple codepages
Message-Id: <slrnl4rpo5.9ik.hjp-usenet3@hrunkner.hjp.at>

On 2013-10-03 21:38, George Mpouras <gravitalsun@hotmail.foo> wrote:
> On 4/10/2013 00:31, Peter J. Holzer wrote:
>> On 2013-10-03 21:29, George Mpouras <gravitalsun@hotmail.foo> wrote:
>>> On 3/10/2013 23:47, Bjoern Hoehrmann wrote:
>>>> * George Mpouras wrote in comp.lang.perl.misc:
>>>>> I receive files containing text of multiple codepages (at the same file)
>>>>> . You can not know the codepage of every line from before.
>>>>> Any idea to convert it to valid utf8 ?
>>>>
>>>> In order to properly convert to UTF-8 you have to know the encoding the
>>>> bytes are in prior to the conversion. Switching between encodings inside
>>>> a single file should be no problem so long as you can isolate the bytes
>>>> around the positions where the encoding changes. If you cannot do that,
>>>> or cannot know the encoding of the bytes through any means at all, then
>>>> you have a problem. Perhaps you can elaborate on your problem?
>>>>
>>>
>>> there is no way to know it , they are email headers on big log
>>
>> Email headers use RFC 2047 encoding.
>
> maybe but there are Cyrillic, France , whatever .. at "username"

RFC 2047 encoding includes the encoding. So there's a way to know it
(otherwise non-ascii characters in subjects, from or to headers etc.
would be impossible).
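
A minimal sketch of that, assuming the logged headers really are RFC 2047
encoded words: the 'MIME-Header' encoding in the core Encode module reads
the charset out of each encoded word. The two sample lines, the
example.invalid addresses and the helper name header_to_utf8 are made up
for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode the RFC 2047 encoded words in one header line (each word names
# its own charset) and return the whole line as UTF-8 bytes.
sub header_to_utf8 {
    my ($raw) = @_;
    my $chars = decode('MIME-Header', $raw);
    return encode('UTF-8', $chars);
}

# Hypothetical log lines using two different charsets.
my @log_lines = (
    'From: =?ISO-8859-1?Q?J=FCrgen?= Exner <jurgenex@example.invalid>',
    'From: =?ISO-8859-7?Q?=C3=E9=FE=F1=E3=EF=F2?= <user@example.invalid>',
);

print header_to_utf8($_), "\n" for @log_lines;
```

Raw 8-bit bytes that are not wrapped in an encoded word carry no charset
label, and this approach cannot help with those.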

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaÃŸt. -- Ralph Babel


------------------------------

Date: Thu, 03 Oct 2013 16:06:30 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: multiple codepages
Message-Id: <3rtr49t0840pfemj4d5sl860f0j489eig7@4ax.com>

George Mpouras <gravitalsun@hotmail.foo> wrote:
>I receive files containing text of multiple codepages (at the same file) 
>. You can not know the codepage of every line from before.
>Any idea to convert it to valid utf8 ?

Given your statement that you do not know the codepage for each line,
no, that is not possible.
The simple text 'abcd' would be exactly the same byte sequence (0x61
0x62 0x63 0x64) in ASCII, Latin-1, Latin-9 (ISO-8859-15), Windows-1252, UTF-8, and
several dozen other encodings. Without additional external information
it is not possible to determine which one is the right one.
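
A short sketch of both halves of the problem, using only the core Encode
module (the variable names are arbitrary): the same text yields the same
bytes under several encodings, and a single high byte decodes to a
different character under each single-byte codepage, with nothing in the
byte itself to say which reading is right.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# 'abcd' encodes to the same four bytes in all of these encodings.
my %hex_for;
for my $name (qw(ascii iso-8859-1 iso-8859-15 cp1252 UTF-8)) {
    my $text  = 'abcd';
    my $bytes = encode($name, $text);
    $hex_for{$name} = join ' ',
        map { sprintf('0x%02x', ord($_)) } split(//, $bytes);
    print "$name: $hex_for{$name}\n";
}

# The single byte 0xE9 is a different character in each codepage.
my %codepoint_for;
for my $name (qw(iso-8859-1 iso-8859-7 koi8-r)) {
    my $byte = "\xE9";
    $codepoint_for{$name} = ord(decode($name, $byte));
    printf "0xE9 as %-10s -> U+%04X\n", $name, $codepoint_for{$name};
}
```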

jue


------------------------------

Date: Fri, 4 Oct 2013 10:02:15 +0200
From: Helmut Richter <hhr-m@web.de>
Subject: Re: multiple codepages
Message-Id: <alpine.LNX.2.00.1310040959150.5543@badwlrz-clhri01.ws.lrz.de>

On Thu, 3 Oct 2013, George Mpouras wrote:

> I receive files containing text of multiple codepages (at the same file) . You
> can not know the codepage of every line from before.
> Any idea to convert it to valid utf8 ?

I have a tool that translates a mixture of UTF-8 and *one* codepage into 
pure UTF-8 (under the assumption that a valid UTF-8 byte sequence is 
indeed meant as a UTF-8 character). But if more than one 8-bit code is 
involved, you have to do some hand massage before or after.
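
The idea can be sketched roughly as follows. This is not the tool itself,
just a per-line simplification under the same assumption (bytes that form
valid UTF-8 are taken to be UTF-8); the function name line_to_utf8 and the
choice of fallback codepage are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Try a strict UTF-8 decode first; if the bytes are not valid UTF-8,
# decode them with the one fallback codepage instead.
sub line_to_utf8 {
    my ($bytes, $fallback) = @_;
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    $chars = decode($fallback, $bytes) unless defined $chars;
    return encode('UTF-8', $chars);
}

# The same word once as UTF-8 and once as Latin-1.
my @lines = ("caf\xC3\xA9", "caf\xE9");
print line_to_utf8($_, 'iso-8859-1'), "\n" for @lines;
```

A line that mixes both kinds of bytes would be treated entirely as the
fallback codepage by this sketch, so a real converter has to work on
smaller units than whole lines.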

If you are interested, I'll make it available somehow.

-- 
Helmut Richter


------------------------------

Date: Sat, 05 Oct 2013 14:30:25 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: multiple codepages
Message-Id: <l2ot8l$ra6$1@news.ntua.gr>

On 4/10/2013 11:02, Helmut Richter wrote:
> On Thu, 3 Oct 2013, George Mpouras wrote:
>
>> I receive files containing text of multiple codepages (at the same file) . You
>> can not know the codepage of every line from before.
>> Any idea to convert it to valid utf8 ?
>
> I have a tool that translates a mixture of UTF-8 and *one* codepage into
> pure UTF-8 (under the assumption that a valid UTF-8 byte sequence is
> indeed meant as an UTF-8 character). But if more than one 8-bit code is
> involved, you have to do some hand massage before or after.
>
> If you are interested, I'll make it available somehow.
>

I would love to have a look if you can


------------------------------

Date: Thu, 03 Oct 2013 14:51:39 -0700
From: HASM <netnews@invalid.com>
Subject: Re: openCV - cpan/Cv
Message-Id: <87siwikopg.fsf@127.0.0.1>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

>> I had tried that already, but I guess I put it in the wrong place.

> First, you should determine if the Perl seed is used in the module code.

The Cv module is not up to par with others on CPAN in ease of use and
installation, and the Japanese-to-English translation of the instructions
doesn't really help someone like me who is not that familiar with XS.

I had put the #undef seed after the perl.h in a file called Cv.inc, next to
other #undefs in the same file for other macros.  Then I compiled and got
so many more warnings that I didn't realize the seed error was no longer
in there.

I cleared the whole thing, restarted, put the #undef in the same place and
saved the make output to a file.  It's several "pages" long but it is all
warnings.  

Some of the tests fail, but some of those failures are due to the
unavailability of the SURF stuff (patent related), and I was able to
make install and use the code.

Thanks,

-- HASM



------------------------------

Date: Thu, 03 Oct 2013 14:51:49 -0400
From: Bernie Cosell <bernie@fantasyfarm.com>
Subject: Re: Opening a file with its Windows app in Activestate
Message-Id: <sjer495ev7bf0476absnhd91fgo6e6mt3t@library.airnews.net>

Ben Morrow <ben@morrow.me.uk> wrote:

Thanks for the advice.  It wouldn't have occurred to me that
 system("mysheet.xls")

would do the right thing and actually start Excel.  If it does return
immediately, I can always wait for the user to hit <ENTER> or something to
move on [the user leaving Excel running or not, at their pleasure]

} IME the only even-remotely reliable way to invoke real Windows programs
} (rather than ported Unix programs, which tend to behave a little
} differently) is to use Win32::Process directly. The Create call will
} always return immediately, but you can then call ->Wait to wait for it
} to terminate. (Obviously if you used 'start' this still won't get you
} anywhere.)

AHA!   I couldn't quite figure that out through the win32 docs.  But I see
it now:

Win32::Process::Create($ProcessObj,
                                "C:\\winnt\\system32\\notepad.exe",
                                "notepad temp.txt",
                                0,
                                NORMAL_PRIORITY_CLASS,
                                ".")|| die ErrorReport();

that's perfect! [there's even a ->Wait method!].  The second arg is
just given as "command line args".  I guess that means a string, with
double-quotes in the necessary places and spaces between the args, just as
in cmd.exe.

And more poking around in the activestate docs [I wish there was a better
search!]  I found:

ActiveState::Win32::Shell - Windows Shell Functions
FindExecutable( $document )

    Returns the executable registered to open the $document or undef.

So presumably that could be used to find the first argument to feed to
Process::Create.   Oh boy... if this all works it'll be a miracle...:o)

THANKS!!   /bernie\

-- 
Bernie Cosell                     Fantasy Farm Fibers
bernie@fantasyfarm.com            Pearisburg, VA
    -->  Too many people, too few sheep  <--          


------------------------------

Date: Fri, 04 Oct 2013 13:56:21 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: perl hash utilities
Message-Id: <874n8xqjnu.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Cal Dershowitz <cal@example.invalid>:
>> 
>> and as long as the $ref itself or a copy of $ref exists, the data is 
>> available.  If the references go undefined then all bets are off.  I'm 
>> not sure if I understood that section of the alpaca book.
>
> It's very simple (or at least, it's very simple as long as you don't
> involve circular or weak refs). A data structure continues to exist for
> as long as someone can get to it. As soon as it becomes completely
> invisible, it self-destructs.

This is based on the same wrong-headed "we don't want to do this now,
some student can do this later" 'everything is an IBM 704 running batch
jobs' assumption as a so-called tracing garbage collector (this is not
quite true, one can argue that the truth is clearly visible in this
statement to everyone who already knows it).

perl provides automatic resource management based on reference
counting. This means every Perl object has a counter associated with it
and this counter is incremented whenever a new reference to the object
is created, eg,

[rw@sable]~#perl -MDevel::Peek -e '$x = []; Dump($x); $y = $x; Dump($x); $x = undef; Dump($y);'
SV = RV(0x8195ca4) at 0x8195c98
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x817b818
  SV = PVAV(0x817c880) at 0x817b818
    REFCNT = 1
    FLAGS = ()
    ARRAY = 0x0
    FILL = -1
    MAX = -1
    ARYLEN = 0x0
    FLAGS = (REAL)
SV = RV(0x8195ca4) at 0x8195c98
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x817b818
  SV = PVAV(0x817c880) at 0x817b818
    REFCNT = 2
    FLAGS = ()
    ARRAY = 0x0
    FILL = -1
    MAX = -1
    ARYLEN = 0x0
    FLAGS = (REAL)
SV = RV(0x8195d24) at 0x8195d18
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x817b818
  SV = PVAV(0x817c880) at 0x817b818
    REFCNT = 1
    FLAGS = ()
    ARRAY = 0x0
    FILL = -1
    MAX = -1
    ARYLEN = 0x0
    FLAGS = (REAL)

In this example, the anonymous array created with [] is the
object. That's the SV = PVAV (array value) thing in the output.
REFCNT is the reference count of the AV.
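
The same counting can be watched from within a script, without reading
Devel::Peek dumps. The following is only a sketch: the Tracked class, the
refcount helper and the @events log are invented for the demonstration,
and the count is read through the core B module.

```perl
use strict;
use warnings;
use B ();

our @events;

package Tracked;
sub new     { return bless [], shift }
sub DESTROY { push @main::events, 'destroyed' }

package main;

# Reference count of the thing a reference points to, read via B.
sub refcount { return B::svref_2object($_[0])->REFCNT }

my $x = Tracked->new;
push @events, 'refcnt=' . refcount($x);    # only $x refers to the array
my $y = $x;
push @events, 'refcnt=' . refcount($x);    # $x and $y both refer to it
$x = undef;
push @events, 'refcnt=' . refcount($y);    # only $y is left
$y = undef;                                # last reference goes away
push @events, 'after last undef';

print "$_\n" for @events;
```

The DESTROY entry is logged before 'after last undef', i.e. the object is
freed at the moment its counter reaches zero, not at some later time.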

As can be seen above, the counter is decremented when a reference to the
object is destroyed, as in the $x = undef in the executed code. Once
the counter value drops to zero, the object is destroyed. The obvious
drawback of this scheme is that it can't handle so-called 'circular
references' on its own. Example:

---------------
use Devel::Peek;

{
    my $x;
    $x->[0] = \$x;
    Dump($x);
}
---------------

The output of that is

SV = RV(0x8195cac) at 0x8195ca0
  REFCNT = 2
  FLAGS = (PADMY,ROK)
  RV = 0x817b968
  SV = PVAV(0x817c86c) at 0x817b968
    REFCNT = 1
    FLAGS = ()
    ARRAY = 0x81906b0
    FILL = 0
    MAX = 3
    ARYLEN = 0x0
    FLAGS = (REAL)
    Elt No. 0
    SV = RV(0x817b704) at 0x817b6f8
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x8195ca0
      SV = RV(0x8195cac) at 0x8195ca0
        REFCNT = 2
        FLAGS = (PADMY,ROK)
        RV = 0x817b968
        SV = PVAV(0x817c86c) at 0x817b968
          REFCNT = 1
          FLAGS = ()
          ARRAY = 0x81906b0
          FILL = 0
          MAX = 3
          ARYLEN = 0x0
          FLAGS = (REAL)

The outermost object is an RV (reference value), that's what's directly
referred to by $x. As can be seen, the reference count of this RV is
2. As in the previous example, the RV refers to an AV with a reference
count of one. Inside the AV, there's a reference to the same RV object
which points to the AV. That's why the reference count of the RV is 2.
When $x goes out of scope after the block, the reference count of the RV
is decremented. Because of the additional reference to that in the
array, it will be 1 and not 0 afterwards, hence, the object won't be
freed. Because of this, the reference count of the array is never
decremented and it continues to exist. But the only way to access this
array was by going through $x which doesn't exist anymore, hence, the AV
containing the RV will sit inaccessible in the memory of the perl
interpreter until that exits, causing a so-called 'memory leak'.

In order to deal with that, Perl also supports so-called 'weak
references'. These can be created with the Scalar::Util::weaken
routine. Example:

-------------
use Devel::Peek;
use Scalar::Util qw(weaken);

{
    my ($x, $y);

    $x->[0] = \$y;
    $y = $x;
    weaken($y);

    Dump($x);

    print STDERR ("\n========\n");

    $x = undef;

    Dump($y);
}
--------------

I'm not going to post the output of that because it is too lengthy. For
a weak reference the owner/ object relationship is inverted: While an
ordinary reference owns the object it refers to (and hence, causes its
reference count to be incremented), weak references are owned by the
object they refer to whose reference count is not incremented (or, more
correctly, is decremented as side-effect of weaken) because of them. If
the reference count of an object owning weak references goes to zero,
the weak references are cleared. That's why $y has no value after $x was
set to undef.
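
A shorter way to observe the difference is to count DESTROY calls for a
two-object cycle built with and without a weakened back link. This is a
sketch only; the Node class, the build_pair routine and the counter are
made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

package Node;
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

# Build a parent/child pair that refer to each other, optionally
# weakening the child's back link, then let both lexicals go out of scope.
sub build_pair {
    my ($weak_back_link) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent}) if $weak_back_link;
    return;
}

build_pair(1);
my $after_weak = $destroyed;                  # objects freed so far
build_pair(0);
my $after_strong = $destroyed - $after_weak;  # freed by the strong cycle

print "weak back link:   $after_weak objects destroyed\n";
print "strong back link: $after_strong objects destroyed\n";
```

With the weak back link both objects are destroyed as soon as the routine
returns; with the strong one neither is until the interpreter exits.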

It is important to understand that 'weakness' is an attribute of the RV
and that it is not transitive. The following code sequence

--------------
use Scalar::Util qw(weaken);

my ($x, $y, $z);

$x = [];
$y = $x;
weaken($y);
$z = $y;
-------------

does not cause the $z RV to be a weak reference as well.
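
This can be checked with Scalar::Util::isweak. The sketch below repeats the
sequence and records what happens as the strong references are dropped; the
flag variables are only there to make the result easy to inspect.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my ($x, $y, $z);

$x = [];
$y = $x;
weaken($y);
$z = $y;    # copying a weak reference yields an ordinary (strong) one

my $y_is_weak = isweak($y) ? 1 : 0;
my $z_is_weak = isweak($z) ? 1 : 0;

$x = undef;
my $alive_via_z = defined $y ? 1 : 0;     # $z still owns the array

$z = undef;
my $alive_at_end = defined $y ? 1 : 0;    # last strong reference is gone

print "y weak: $y_is_weak, z weak: $z_is_weak\n";
print "array alive after undef x: $alive_via_z, after undef z: $alive_at_end\n";
```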

NB: I could continue this with elaborating about the relative merits and
historical origins of both schemes for automatic resource management but
this text is too long already.


------------------------------

Date: Fri, 4 Oct 2013 20:15:51 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: perl hash utilities
Message-Id: <7og3ia-idr2.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Cal Dershowitz <cal@example.invalid>:
> >> 
> >> and as long as the $ref itself or a copy of $ref exists, the data is 
> >> available.  If the references go undefined then all bets are off.  I'm 
> >> not sure if I understood that section of the alpaca book.
> >
> > It's very simple (or at least, it's very simple as long as you don't
> > involve circular or weak refs). A data structure continues to exist for
> > as long as someone can get to it. As soon as it becomes completely
> > invisible, it self-destructs.
> 
> This is based on the same wrong-headed "we don't want to do this now,
> some student can do this later" 'everything is an IBM 704 running batch
> jobs' assumption as a so-called tracing garbage collector (this is not
> quite true, one can argue that the truth is clearly visible in this
> statement to everyone who already knows it).

What on Earth?

Do you have a 'gratuitous-obscure-insult-a-day' calendar, or something?

[snip excessive technical detail I already know]

How exactly is that meant to help someone who's having trouble
understanding the lifetime of Perl data structures?

Ben



------------------------------

Date: Fri, 04 Oct 2013 21:18:30 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: perl hash utilities
Message-Id: <87pprk6b8p.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> > Quoth Cal Dershowitz <cal@example.invalid>:
>> >> 
>> >> and as long as the $ref itself or a copy of $ref exists, the data is 
>> >> available.  If the references go undefined then all bets are off.  I'm 
>> >> not sure if I understood that section of the alpaca book.
>> >
>> > It's very simple (or at least, it's very simple as long as you don't
>> > involve circular or weak refs). A data structure continues to exist for
>> > as long as someone can get to it. As soon as it becomes completely
>> > invisible, it self-destructs.
>> 
>> This is based on the same wrong-headed "we don't want to do this now,
>> some student can do this later" 'everything is an IBM 704 running batch
>> jobs' assumption as a so-called tracing garbage collector (this is not
>> quite true, one can argue that the truth is clearly visible in this
>> statement to everyone who already knows it).
>
> What on Earth?
>
> Do you have a 'gratuitous-obscure-insult-a-day' calendar, or
> something?

Trying to sum this up briefly: As can be seen in this text,

http://www-formal.stanford.edu/jmc/history/lisp/node3.html

the inventors of LISP originally considered using reference counting for
automatic memory management but they chose against it for two reasons:

1) Because of constraints imposed by both the IBM 704 hardware and the
code which had been written so far, there was no convenient way to store
the counters.

2) By adopting a "well, we didn't run out of memory yet so let's wait
until we do" approach, they could postpone dealing with the problem
until 'some later time' ("[...] only toy examples were being done").

The idea that, once all memory has been allocated, the set of all
allocated objects can be partitioned into a 'live set' and a 'dead set'
by determining whether an object is reachable according to the
information available in the memory of the machine, with the 'dead set'
becoming the free store list afterwards, relies on the fact that all
memory can actually be examined while its contents don't change, which
means the system must be a single-tasking system, so that stopping the
'current tasks' means 'all activity stops', the single task must
actually be able to examine all storage locations, ie there isn't such a
thing as a higher-privileged kernel whose memory is protected from
user-mode access, and there must not be any realtime requirements, not
even a single user using a single computer interactively, so that
'stopping all activity' will delay the final result but won't change it.
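
As a toy illustration of that partitioning step (not of any real
collector), the 'heap' below is just a hash mapping made-up object names to
the names they refer to, and the routine assumes nothing changes while it
runs, which is exactly the precondition discussed above.

```perl
use strict;
use warnings;

# Split a toy heap into live and dead sets by following references
# from the roots. Assumes the heap does not change during the walk.
sub partition_heap {
    my ($heap, $roots) = @_;
    my %live;
    my @todo = @$roots;
    while (defined(my $id = shift @todo)) {
        next if $live{$id}++;             # already visited
        push @todo, @{ $heap->{$id} };    # follow outgoing references
    }
    my @dead = grep { !$live{$_} } sort keys %$heap;
    return ([sort keys %live], \@dead);
}

# Hypothetical heap: a -> b is reachable from the root, c <-> d is an
# unreachable cycle.
my %heap = (a => ['b'], b => [], c => ['d'], d => ['c']);
my ($live, $dead) = partition_heap(\%heap, ['a']);

print "live: @$live\n";
print "dead: @$dead\n";
```

Note that the unreachable c/d cycle lands in the dead set here, which is
the one case plain reference counting cannot handle on its own.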

All of this ceased to be true about 40 years ago (simplification). But
why abandon an ill-conceived makeshift approach partially chosen because
of researcher laziness just because it is also technically completely
outdated? 

> [snip excessive technical detail I already know]
>
> How exactly is that meant to help someone who's having trouble
> understanding the lifetime of Perl data structures?

Ideally, by avoiding diffuse and subtly wrong analogies like "someone
can get to it" or "it is visible" in favour of explaining how the thing
works, including its obvious limitation and the workarounds for that.


------------------------------

Date: 04 Oct 2013 08:32:29 GMT
From: Eric smith <eric@fruitcom.com>
Subject: runnable.com
Message-Id: <524e7d1d$0$15971$e4fe514c@news2.news.xs4all.nl>

Methinks someone should put perl onto http://runnable.com/


-- 
Eric Smith


------------------------------

Date: Fri, 4 Oct 2013 21:01:19 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: runnable.com
Message-Id: <eli$1310041658@qz.little-neck.ny.us>

In comp.lang.perl.misc, Eric smith  <mailbox@fruitcom.com> wrote:
> Methinks someone should put perl onto http://runnable.com/

So, why don't you?

I just looked at their "about" page, where they have an address
on Hamilton Ave in Palo Alto with a map pointer showing Turk St
in San Francisco, and I'm not impressed with them. How hard is
it to (a) link to a live map or (b) get a screen shot of the
right map, instead of (c) using this:

http://runnable.com/images/about-contact.png

Elijah
------
maybe someone has some runnable code for including maps in pages


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4049
***************************************
