[31759] in Perl-Users-Digest
Perl-Users Digest, Issue: 3022 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jul 7 09:09:22 2010
Date: Wed, 7 Jul 2010 06:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 7 Jul 2010 Volume: 11 Number: 3022
Today's topics:
Re: Archive::Tar, difference in size of output file <justin.1007@purestblue.com>
Re: Archive::Tar, difference in size of output file <justin.1007@purestblue.com>
Re: Are there any MySQL queries or software packages fo <bugbear@trim_papermule.co.uk_trim>
Re: Are there any MySQL queries or software packages fo <jstucklex@attglobal.net>
Re: Are there any MySQL queries or software packages fo <bugbear@trim_papermule.co.uk_trim>
Re: Are there any MySQL queries or software packages fo <jstucklex@attglobal.net>
building a hash path index? <bugbear@trim_papermule.co.uk_trim>
Re: building a hash path index? <bugbear@trim_papermule.co.uk_trim>
Re: FAQ 5.38 Why does Perl let me delete read-only file <nospam-abuse@ilyaz.org>
Re: FAQ 5.38 Why does Perl let me delete read-only file <hjp-usenet2@hjp.at>
Re: how to pass a function name to a function, and have <rvtol+usenet@xs4all.nl>
Re: how to pass a function name to a function, and have <jurgenex@hotmail.com>
Re: Posting Guidelines for comp.lang.perl.misc ($Revisi <ralph@happydays.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 7 Jul 2010 12:09:18 +0100
From: Justin C <justin.1007@purestblue.com>
Subject: Re: Archive::Tar, difference in size of output file
Message-Id: <slrni38o2u.vif.justin.1007@zem.masonsmusic.co.uk>
On 2010-07-06, Peter Makholm <peter@makholm.net> wrote:
> Justin C <justin.1007@purestblue.com> writes:
>
>>     my $tar = Archive::Tar->new();
>>     foreach my $dir (0..9, 'a'..'z') {
>>         $tar->add_files(glob "$dir/*jpg");
>>     }
>>     $tar->write($fname, COMPRESS_GZIP, "catalogue_images");
>>
>> This was creating .tgz files much, much larger than the total
>> uncompressed size of images.
>
> If you examine the resulting file, does it contain exactly what you
> expect?
Yes. But I'm not able to recreate this problem now. I've tried changing
the code back, but I must have missed something, because the files are now
almost identical in size (+/-200 bytes in one case, and +/-2000 in
another).
Thank you for your help. It looks like something else in my code was
causing this, but I left it out when trying to re-create the problem.
Justin.
--
Justin C, by the sea.
------------------------------
Date: Wed, 7 Jul 2010 12:13:54 +0100
From: Justin C <justin.1007@purestblue.com>
Subject: Re: Archive::Tar, difference in size of output file
Message-Id: <slrni38obi.vif.justin.1007@zem.masonsmusic.co.uk>
On 2010-07-06, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth Justin C <justin.1007@purestblue.com>:
>> I'm working on a program to create .tgz archives of catalogue images for
>> our customers to download. Initially I was doing this:
>>
>>     my $tar = Archive::Tar->new();
>>     foreach my $dir (0..9, 'a'..'z') {
>>         $tar->add_files(glob "$dir/*jpg");
>>     }
>>     $tar->write($fname, COMPRESS_GZIP, "catalogue_images");
>>
>> This was creating .tgz files much, much larger than the total
>> uncompressed size of images. I decided to try a different way of
>> creating the archive, and now do this:
>>
>>     my @files;
>>     foreach my $dir (0..9, 'a'..'z') {
>>         push @files, glob "$dir/*jpg";
>>     }
>>     Archive::Tar->create_archive($fname, COMPRESS_GZIP, @files);
>>
>> and the file sizes are, as I would expect, much smaller.
>>
>> Can someone tell me why this is?
>
> I don't see that here: the two files are exactly the same size. What
> version of Archive::Tar are you using? Can you see what the difference
> is between the two files: is one of them simply not compressed?
Thank you for your reply, Ben. I've had another look at this and now I'm
not able to recreate it. It wasn't a one-off; it was consistently
happening until I changed my code. Though a tgz created with each method
contained identical files, the .tgz files were dramatically different in
size (one about four times the size of the other - and bigger than the
uncompressed directory tree!).
Should I post a similar query another time, I shall make sure not to
destroy my code before the fault can be found (I'm now sure the fault is
mine and not Archive::Tar's, because I can't recreate this).
Sorry for wasting your time on this.
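For anyone hitting something similar: one quick way to see whether a .tgz
really was gzip-compressed is to look for the gzip magic bytes (0x1f 0x8b)
at the start of the file. A rough sketch (the file name below is only an
illustration):

```perl
use strict;
use warnings;

# Return true if $fname starts with the two gzip magic bytes 0x1f 0x8b.
sub looks_gzipped {
    my ($fname) = @_;
    open my $fh, '<:raw', $fname or die "open $fname: $!";
    my $magic;
    my $n = read $fh, $magic, 2;
    close $fh;
    return defined $n && $n == 2 && $magic eq "\x1f\x8b";
}

# Example (hypothetical file name):
# print looks_gzipped("catalogue_images.tgz")
#     ? "gzip-compressed\n" : "not compressed\n";
```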
Justin.
--
Justin C, by the sea.
------------------------------
Date: Tue, 06 Jul 2010 16:29:41 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: Are there any MySQL queries or software packages for "finding similar items"
Message-Id: <GOGdnbhVC4t71q7RnZ2dnUVZ8g-dnZ2d@brightview.co.uk>
Jerry Stuckle wrote:
> But what you're looking for is to get a computer to be a natural
> language processor, which is still beyond our current programming
> capabilities. IBM has recently come up with a test system ("Watson")
> which does a fair job, but still has a long ways to go. Once we get
> there, we'll have a Star Trek capability :)
>
> With that said, it doesn't mean all is hopeless. Levenshtein can help,
> as can trigram matching and other things mentioned (except SoundEx). But
> it will also require a lot of work on your part to "train" the system as
> to whether two questions are similar or not.
Surely something like concept extraction/matching
(like the old Excite ICE model)
would be helpful.
BugBear
------------------------------
Date: Tue, 06 Jul 2010 17:08:41 -0400
From: Jerry Stuckle <jstucklex@attglobal.net>
Subject: Re: Are there any MySQL queries or software packages for "finding similar items"
Message-Id: <i1060v$vqt$1@news.eternal-september.org>
bugbear wrote:
> Jerry Stuckle wrote:
>> But what you're looking for is to get a computer to be a natural
>> language processor, which is still beyond our current programming
>> capabilities. IBM has recently come up with a test system ("Watson")
>> which does a fair job, but still has a long ways to go. Once we get
>> there, we'll have a Star Trek capability :)
>>
>> With that said, it doesn't mean all is hopeless. Levenshtein can help,
>> as can trigram matching and other things mentioned (except SoundEx).
>> But it will also require a lot of work on your part to "train" the
>> system as to whether two questions are similar or not.
>
> Surely something like concept extraction/matching
> (like the old Excite ICE model)
> would be helpful.
>
> BugBear
It's possible, but I'm not sure it's public domain, is it? And trying
to generate your own concept extraction/matching module would be a huge
undertaking.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
------------------------------
Date: Wed, 07 Jul 2010 09:08:43 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: Are there any MySQL queries or software packages for "finding similar items"
Message-Id: <yI6dnTbQs_OWq6nRnZ2dnUVZ7o2dnZ2d@brightview.co.uk>
Jerry Stuckle wrote:
> bugbear wrote:
>> Jerry Stuckle wrote:
>>> But what you're looking for is to get a computer to be a natural
>>> language processor, which is still beyond our current programming
>>> capabilities. IBM has recently come up with a test system ("Watson")
>>> which does a fair job, but still has a long ways to go. Once we get
>>> there, we'll have a Star Trek capability :)
>>>
>>> With that said, it doesn't mean all is hopeless. Levenshtein can
>>> help, as can trigram matching and other things mentioned (except
>>> SoundEx). But it will also require a lot of work on your part to
>>> "train" the system as to whether two questions are similar or not.
>>
>> Surely something like concept extraction/matching
>> (like the old Excite ICE model)
>> would be helpful.
>>
>> BugBear
>
> It's possible, but I'm not sure it's public domain, is it? And trying
> to generate your own concept extraction/matching module would be a huge
> undertaking.
There have been many academic versions (indeed, they came first):
google for "latent semantic analysis"
and/or
"singular value decomposition"
I think the Excite engine's novelty was an efficient
and fairly accurate "incremental mode", where the entire
SVD didn't have to be fully redone when a document was added to the corpus.
BugBear
------------------------------
Date: Wed, 07 Jul 2010 07:24:22 -0400
From: Jerry Stuckle <jstucklex@attglobal.net>
Subject: Re: Are there any MySQL queries or software packages for "finding similar items"
Message-Id: <i11o57$d3t$2@news.eternal-september.org>
bugbear wrote:
> Jerry Stuckle wrote:
>> bugbear wrote:
>>> Jerry Stuckle wrote:
>>>> But what you're looking for is to get a computer to be a natural
>>>> language processor, which is still beyond our current programming
>>>> capabilities. IBM has recently come up with a test system
>>>> ("Watson") which does a fair job, but still has a long ways to go.
>>>> Once we get there, we'll have a Star Trek capability :)
>>>>
>>>> With that said, it doesn't mean all is hopeless. Levenshtein can
>>>> help, as can trigram matching and other things mentioned (except
>>>> SoundEx). But it will also require a lot of work on your part to
>>>> "train" the system as to whether two questions are similar or not.
>>>
>>> Surely something like concept extraction/matching
>>> (like the old Excite ICE model)
>>> would be helpful.
>>>
>>> BugBear
>>
>> It's possible, but I'm not sure it's public domain, is it? And trying
>> to generate your own concept extraction/matching module would be a
>> huge undertaking.
>
> There have been many academic versions (indeed, they came first):
>
> google for "latent semantic analysis"
> and/or
> "singular value decomposition"
>
> I think the Excite engine's novelty was an efficient
> and fairly accurate "incremental mode", where the entire
> SVD didn't have to be fully redone when a document was added to the corpus.
>
> BugBear
Have you ever used these?
Academic versions are not the same as commercial, and generally have
restrictions on their use. Also, early versions are comparatively
limited in their functionality. And significant training is still required.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
------------------------------
Date: Wed, 07 Jul 2010 09:48:12 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: building a hash path index?
Message-Id: <H5OdnUWCOKfRoqnRnZ2dnUVZ7sKdnZ2d@brightview.co.uk>
I have a large (array) of hashes, and each hash
has several fields.
I would like to be able to group
the hashes by some of the fields,
so I thought of creating a hash so
that I find an array of selected hashes via:
$index->{field1}->{field2}->{field3}
I would like to create a method which could
be called like this:
my $hash_index = make_index($list_of_hashes, [ 'date', 'name' ])
resulting in $hash_index being a hash such that
$index->{"jul-10"}->{paul} was an array of all hashes
with the corresponding date and name.
This is easy to do with a fixed field list,
but I can't see a clear road to parameterising
it as per my example call.
It's the variable length of the index-name-array
that causes me difficulty.
BugBear
------------------------------
Date: Wed, 07 Jul 2010 10:05:54 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: building a hash path index?
Message-Id: <b6mdnR1E3OLv3qnRnZ2dnUVZ8qadnZ2d@brightview.co.uk>
bugbear wrote:
> I have a large (array) of hashes, and each hash
> has several fields.
>
> I would like to be able to group
> the hashes by some of the fields,
> so I thought of creating a hash so
> that I find an array of selected hashes via:
>
> $index->{field1}->{field2}->{field3}
>
> I would like to create a method which could
> be called like this:
>
> my $hash_index = make_index($list_of_hashes, [ 'date', 'name' ])
>
> resulting in $hash_index being a hash such that
>
> $index->{"jul-10"}->{paul} was an array of all hashes
> with the corresponding date and name.
>
> This is easy to do with a fixed field list,
> but I can't see a clear road to parameterising
> it as per my example call.
>
> It's the variable length of the index-name-array
> that causes me difficulty.
Here's my inelegant code; I suspect there's a MUCH
more elegant solution to be had:
sub _mk_index {
    my ($dst, $hash, $fields) = @_;
    if (scalar(@$fields) == 0) {
        if (!defined($dst)) {
            $dst = [];
        }
        push @$dst, $hash;
    } else {
        if (!defined($dst)) {
            $dst = {};
        }
        my $key = $hash->{$fields->[0]};
        my @tail = @$fields;
        shift @tail;
        $dst->{$key} = _mk_index($dst->{$key}, $hash, \@tail);
    }
    return $dst;
}

sub mk_index {
    my ($list, $fields) = @_;
    my $index;
    foreach my $h (@$list) {
        $index = _mk_index($index, $h, $fields);
    }
    return $index;
}
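One more compact possibility, a sketch along the same lines (the sub name
matches the make_index call in the original post; it assumes at least one
field name is given): walk down the index with a moving reference and let
||= autovivify the intermediate levels.

```perl
use strict;
use warnings;

# Build a nested index so that $index->{$v1}{$v2}... is an array of
# all hashes whose named fields have those values.
sub make_index {
    my ($list, $fields) = @_;
    my %index;
    for my $h (@$list) {
        my $node = \%index;
        # Descend one level per field except the last; "||= {}"
        # creates each intermediate hash on first use.
        for my $i (0 .. $#$fields - 1) {
            $node = $node->{ $h->{ $fields->[$i] } } ||= {};
        }
        # At the bottom, push onto an array keyed by the last field.
        push @{ $node->{ $h->{ $fields->[-1] } } }, $h;
    }
    return \%index;
}
```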
BugBear
------------------------------
Date: Tue, 6 Jul 2010 22:23:21 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: FAQ 5.38 Why does Perl let me delete read-only files? Why does "-i" clobber protected files? Isn't this a bug in Perl?
Message-Id: <slrni37b6o.uo2.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-07-05, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
>> On 2010-07-05, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> > This answer is Unix-centric. AFAIK on Windows the permissions on the
>> > file are also considered, plus a few other circumstances (file open,
>> > locked, ...). So while I very much endorse the first sentence "learn how
>> > your filesystem works", I would qualify the rest with "On Unix systems
>> > ..." or maybe "On POSIX-conforming file systems ..." (Frankly, I don't
>> > know how NTFS on Linux or ext3 on Windows behave).
>>
>> Does POSIX specify behaviour of filesystems at all?
>
> From SUSv3:
>
> The unlink() function shall fail and shall not unlink the file if:
>
> [EACCES]
> Search permission is denied for a component of the path prefix,
> or write permission is denied on the directory containing the
> directory entry to be removed.
Those are properties of an API call, not filesystem behaviour. It may be
implemented in the CRTL... So it would be
"On POSIX-conforming CRTL"
not
"On POSIX-conforming file systems"
(As in: HOLYFS is not POSIX-conforming on RH version < 23.2 ;-)
Yours,
Ilya
------------------------------
Date: Wed, 7 Jul 2010 13:23:38 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: FAQ 5.38 Why does Perl let me delete read-only files? Why does "-i" clobber protected files? Isn't this a bug in Perl?
Message-Id: <slrni38otq.a99.hjp-usenet2@hrunkner.hjp.at>
On 2010-07-06 22:23, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> On 2010-07-05, Ben Morrow <ben@morrow.me.uk> wrote:
>> Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
>>> Does POSIX specify behaviour of filesystems at all?
>>
>> From SUSv3:
>>
>> The unlink() function shall fail and shall not unlink the file if:
>>
>> [EACCES]
>> Search permission is denied for a component of the path prefix,
>> or write permission is denied on the directory containing the
>> directory entry to be removed.
>
> That's properties of an API call, not filesystem behaviour. It may be
> implemented in CRTL... So it would be
>
> "On POSIX-conforming CRTL"
>
> not
>
> "On POSIX-conforming file systems"
POSIX cares only about the behaviour of the unlink call, not the
implementation. However, on most systems at least part of the
implementation is filesystem specific and it has been common for Unix
systems to provide both filesystems with POSIX semantics (UFS, FFS, ...)
and without (full) POSIX semantics (NFS, ...). So you could never say
"the unlink function on HP-UX 8.0 is POSIX-conforming", you could only
say "the unlink function of HP-UX 8.0 on a UFS file system with default
options is POSIX conforming, but the unlink function of HP-UX 8.0 on an
NFS file system is not". So it makes sense to view this as a property of
the file system code, not the OS as a whole. On modern systems this is
even more pronounced because they support more file systems.
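This is easy to demonstrate from Perl on a POSIX filesystem: unlink()
consults the directory's write permission, not the file's own mode bits.
A sketch (run as a non-root user in a writable directory; the file name is
just an example, and on Windows the unlink would fail instead):

```perl
use strict;
use warnings;

# Create a read-only file in a writable directory.
my $file = 'readonly_demo.txt';
open my $fh, '>', $file or die "open: $!";
print $fh "protected?\n";
close $fh;
chmod 0444, $file or die "chmod: $!";    # mode r--r--r--

# The file itself is read-only, but removing its directory entry
# only needs write permission on the directory, so this succeeds.
my $removed = unlink $file;
print $removed
    ? "unlinked read-only file\n"
    : "unlink failed: $!\n";
```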
hp
------------------------------
Date: Tue, 06 Jul 2010 17:36:57 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: how to pass a function name to a function, and have it call it
Message-Id: <4c334d99$0$22918$e4fe514c@news.xs4all.nl>
Sherm Pendley wrote:
> Nick Wedd <nick@maproom.co.uk> writes:
>> Now I would like to generalise it, to work for subroutines other than
>> 'arc'. I can promise that their first two arguments will be the 'from'
>> point and the 'to' point, I can't promise anything about the other
>> arguments. So I want to do something like
>>
>> sub polyanything {
>>     my ( $displist, $from, $to, $functionname, @rest ) = @_;
>>     foreach my $d ( @$displist ) {
>>         CALL $functionname( add($from,$d), add($to,$d), @rest );
>>     }
>> }
>>
>> but, how do I do CALL?
>
> Symbolic references (which is what you're asking about here) are evil.
> Use a reference to a function instead:
>
> sub somefunc { ... }
>
> sub polyanything {
>     my ( $displist, $from, $to, $func, @rest ) = @_;
>     foreach my $d ( @$displist ) {
>         $func->( add($from, $d), add($to, $d), @rest );
>     }
> }
>
> Then you can call your polyanything with:
>
> polyanything( \@displist, $from, $to, \&somefunc, $foo, $bar, $baz);
Nobody mentioned a dispatch table yet. It is often very handy.
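A dispatch table is just a hash mapping names to code references, which
also gives you a natural place to reject unknown names. A sketch (the sub
names are made up, and the 'add two points' step is simplified to plain
numeric addition for the example):

```perl
use strict;
use warnings;

sub arc  { return "arc(@_)" }
sub line { return "line(@_)" }

# Dispatch table: external names to code references.
my %draw = (
    arc  => \&arc,
    line => \&line,
);

sub polyanything {
    my ($displist, $from, $to, $name, @rest) = @_;
    my $func = $draw{$name}
        or die "unknown drawing function '$name'";
    # Call the chosen sub once per displacement.
    return map { $func->($from + $_, $to + $_, @rest) } @$displist;
}
```

Unlike a symbolic reference, an unlisted name dies cleanly instead of
calling an arbitrary sub.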
--
Ruud
------------------------------
Date: Tue, 06 Jul 2010 18:59:44 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: how to pass a function name to a function, and have it call it
Message-Id: <5ln736t8j5kh76tsa3jbg5smm8ckki4fm4@4ax.com>
Nick Wedd <nick@maproom.co.uk> wrote:
[...]
>Now I would like to generalise it, to work for subroutines other than
>'arc'. I can promise that their first two arguments will be the 'from'
>point and the 'to' point, I can't promise anything about the other
>arguments. So I want to do something like
>
> sub polyanything {
>     my ( $displist, $from, $to, $functionname, @rest ) = @_;
>     foreach my $d ( @$displist ) {
>         CALL $functionname( add($from,$d), add($to,$d), @rest );
>     }
> }
>
>but, how do I do CALL? I have found googling for "Perl function call"
>unhelpful, as you might expect.
While from a technical point of view this is possible, in general it is
A Very Bad Idea(TM).
A much, much better approach would be using references, dispatch tables,
and if applicable even closures.
jue
------------------------------
Date: Tue, 06 Jul 2010 13:51:33 -0400
From: Ralph Malph <ralph@happydays.com>
Subject: Re: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
Message-Id: <e3287$4c336d25$40779ac3$24444@news.eurofeeds.com>
tl, dnr
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3022
***************************************