[31908] in Perl-Users-Digest
Perl-Users Digest, Issue: 3171 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Oct 13 00:10:34 2010
Date: Tue, 12 Oct 2010 21:09:57 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 12 Oct 2010 Volume: 11 Number: 3171
Today's topics:
Re: Communication across Perl scripts <hjp-usenet2@hjp.at>
Re: Communication across Perl scripts <ben@morrow.me.uk>
Re: Communication across Perl scripts <ben@morrow.me.uk>
cryptic error in cgi script <andrey.vul@gmail.com>
Re: cryptic error in cgi script <tadmc@seesig.invalid>
Re: cryptic error in cgi script <sherm.pendley@gmail.com>
Re: cryptic error in cgi script <ben@morrow.me.uk>
Re: cryptic error in cgi script <andrey.vul@gmail.com>
Re: cryptic error in cgi script <uri@StemSystems.com>
Re: cryptic error in cgi script <sherm.pendley@gmail.com>
Re: Date difference in days <paul@pstech-inc.com>
Re: How to make XSUBs thread-safe? xsubpp switches? <nospam-abuse@ilyaz.org>
Re: How to make XSUBs thread-safe? xsubpp switches? <ben@morrow.me.uk>
Re: suitable key for a hash <jimsgibson@gmail.com>
Re: suitable key for a hash <xhoster@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 12 Oct 2010 22:24:47 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Communication across Perl scripts
Message-Id: <slrnib9h0g.q1l.hjp-usenet2@hrunkner.hjp.at>
On 2010-10-12 00:59, Ted Zlatanov <tzz@lifelogs.com> wrote:
> On Mon, 11 Oct 2010 14:25:59 -0700 (PDT) "C.DeRykus" <derykus@gmail.com> wrote:
> CD> With a named pipe though, each script just deals with the named file
> CD> for reading or writing while the OS takes care of the messy IPC
> CD> details for you. The 2nd script will just block until data is
> CD> available so running order isn't a concern. As long as the two
> CD> scripts are running more or less concurrently, I would guess memory
> CD> use will be manageable too since the reader will be draining the
> CD> pipe as the data arrives.
>
> The only warning I have there is that pipes are pretty slow and have
> small buffers by default in the Linux kernel (assuming Linux).
Hmm. On my system (a 1.86 GHz Core2 - not ancient, but not the latest
and greatest, either) I can transfer about 800 MB/s through a pipe at
32 kB buffer size. For larger buffers it gets a bit slower, but a buffer
size of 1MB is still quite ok.
You may be confusing that with other systems. Windows pipes have a reputation
for being slow. Traditionally Unix pipes were restricted to a rather
small buffer (8 or 10 kB). I do think Linux pipes become synchronous for
large writes, though.
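To check figures like these on your own machine, one can fork a child that writes a fixed amount of data into an anonymous pipe while the parent reads it and times the transfer. This is only a rough sketch for Unix-like systems: measure_pipe is a hypothetical helper, and the 64 MB total and 32 kB chunk size are arbitrary choices (the latter matching the buffer size mentioned above). Results will vary with kernel, CPU and load.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

# Hypothetical helper: push $total bytes through an anonymous pipe in
# $chunk-sized writes from a forked child and return
# (bytes received by the parent, elapsed wall-clock seconds).
sub measure_pipe {
    my ($total, $chunk) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    binmode $reader;
    binmode $writer;

    my $start = time;
    my $pid   = fork() // die "fork: $!";
    if ($pid == 0) {                       # child: the writing end
        close $reader;
        my $buf  = 'x' x $chunk;
        my $sent = 0;
        while ($sent < $total) {
            my $want = $total - $sent < $chunk ? $total - $sent : $chunk;
            my $n = syswrite($writer, $buf, $want);
            die "syswrite: $!" unless defined $n;
            $sent += $n;
        }
        close $writer;
        POSIX::_exit(0);                   # skip END blocks and buffer flushes in the child
    }

    close $writer;                         # parent: the reading end
    my $got = 0;
    while (1) {
        my $n = sysread($reader, my $buf, $chunk);
        die "sysread: $!" unless defined $n;
        last if $n == 0;                   # EOF: the child closed its end
        $got += $n;
    }
    close $reader;
    waitpid($pid, 0);
    return ($got, time - $start);
}

my ($bytes, $secs) = measure_pipe(64 * 1024 * 1024, 32 * 1024);
printf "%d bytes in %.3f s, about %.0f MB/s\n",
    $bytes, $secs, $bytes / ($secs || 1e-6) / 1e6;
```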
> I forget exactly why, I think it's due to terminal disciplines or
> something, I didn't dig too much.
Unix pipes have nothing to do with terminals. Originally they were
implemented as files, BSD 4.x reimplemented them on top of Unix sockets.
I don't know how Linux implements them, but I'm quite sure that no
terminals are involved, and certainly no terminal disciplines.
Are you confusing them with ptys, perhaps?
> I ran into this earlier this year.
Can you dig up the details?
hp
------------------------------
Date: Tue, 12 Oct 2010 21:20:27 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Communication across Perl scripts
Message-Id: <bhseo7-07f.ln1@osiris.mauzo.dyndns.org>
Quoth "jl_post@hotmail.com" <jl_post@hotmail.com>:
> On Oct 12, 2:05 am, Peter Makholm <pe...@makholm.net> wrote:
> >
> > As long as you just use it for a single host for very temporary files,
> > Storable is fine. But I have been bitten by Storable not being
> > compatible between versions or different installations one time to
> > many to call it 'highly recommended'.
>
>
> I was under the impression that Storable::nstore() was cross-
> platform compatible (as opposed to Storable::store(), which isn't).
> "perldoc Storable" has this to say about it:
Storable::nstore is compatible across different byte-orders. Different
versions of Storable and different word sizes (32bit vs 64bit machines)
are still often incompatible.
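A minimal round trip with nstore and retrieve, as one script might hand data to another through a temporary file on the same host. The %job contents are made up; per the caveat above, nstore only protects against byte-order differences, not against different Storable versions or word sizes.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Illustrative data one script wants to pass to another.
my %job = (id => 42, files => ['a.txt', 'b.txt'], done => 0);

# nstore writes in network byte order, so the file survives a change
# of endianness; it does not make different Storable versions or
# 32/64-bit builds compatible with each other.
my (undef, $path) = tempfile(UNLINK => 1);
nstore(\%job, $path) or die "cannot store to $path";

# ... later, possibly in the other script:
my $copy = retrieve($path);
print "job $copy->{id}: @{ $copy->{files} }\n";
```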
Ben
------------------------------
Date: Wed, 13 Oct 2010 03:32:18 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Communication across Perl scripts
Message-Id: <iaifo7-g9k.ln1@osiris.mauzo.dyndns.org>
Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:
>
> Unix pipes have nothing to do with terminals. Originally they were
> implemented as files, BSD 4.x reimplemented them on top of Unix sockets.
> I don't know how Linux implements them, but I'm quite sure that no
> terminals are involved, and certainly no terminal disciplines.
Linux and FreeBSD keep pipes in what is effectively their own
filesystem; that is, they have their own set of hooks into the VFS, and
if you fstat a named pipe you don't get the st_dev of the filesystem the
pipe appears in, but the st_dev for the pipe filesystem. IIRC Linux used
to use different implementations for named and anonymous pipes, but that
may have changed.
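The named-pipe approach discussed earlier in this thread (one script writes, the other blocks until data arrives) can be sketched in a single file, with fork standing in for the second script. This assumes a Unix-like system where POSIX::mkfifo is available; the FIFO name "channel" and the three test lines are arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $fifo = "$dir/channel";
mkfifo($fifo, 0600) or die "mkfifo $fifo: $!";

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    # Writer ("script 1"): open() blocks until a reader opens the FIFO.
    open(my $out, '>', $fifo) or die "open $fifo for writing: $!";
    print $out "line $_\n" for 1 .. 3;
    close $out;
    POSIX::_exit(0);    # leave without running the parent's cleanup handlers
}

# Reader ("script 2"): blocks until the writer has opened the FIFO and
# sent data, so it does not matter which side starts first.
open(my $in, '<', $fifo) or die "open $fifo for reading: $!";
my @lines = <$in>;      # returns once the writer closes its end (EOF)
close $in;
waitpid($pid, 0);
print "reader got: $_" for @lines;
```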
> Are you confusing them with ptys, perhaps?
Or SysV STREAMS? I think pipes on true SysV systems are always
implemented with STREAMS.
Ben
------------------------------
Date: Tue, 12 Oct 2010 18:20:03 -0700 (PDT)
From: andrey the giant <andrey.vul@gmail.com>
Subject: cryptic error in cgi script
Message-Id: <e5a83431-207a-4ae0-a032-98e487dcdae3@a36g2000yqc.googlegroups.com>
Vim doesn't give me any red marks and the code looks right, but I get
the following errors:
syntax error at contact.pl line 50, near ")
{"
syntax error at contact.pl line 64, near "}"
contact.pl had compilation errors.
The code is a cgi email script. All sensitive data has been
genericized.
The code is:
#!/usr/bin/perl
use warnings;
use strict;
use CGI;
use Email::MIME::Creator;
use Email::Sender::Simple qw(sendmail);
use Try::Tiny;
our $cgi = CGI->new;
our $name = $cgi->param('name');
our $addr = $cgi->param('email');
our $subject = $cgi->param('subj');
our $xdest = $cgi->param('dest');
our $message = $cgi->param('body');
our $urgent = $cgi->param('urgent');
if ( !defined($subject) || !length($subject) ) { $subject = ""; }
if ( !defined($urgent) || !length($urgent) ) { $urgent = "no"; }
my $defineds = 0;
$defineds |= 1 if ( defined($name) && length($name) );
$defineds |= 2 if ( defined($addr) && length($addr) );
$defineds |= 4
    if ( defined($xdest) && length($xdest) && ( $xdest =~ /[aucm]/ ) );
$defineds |= 8 if ( defined($message) && length($message) );
our $complete = ( $defineds == 15 );
our $ret = 0;
our $err = "";
if ($complete) {
    our $dest;
    our $SMS = '1234567890@example.com';
    if ( $xdest =~ /a/ ) { $dest = 'a@example.com'; }
    elsif ( $xdest =~ /u/ ) { $dest = 'b@example.com'; }
    elsif ( $xdest =~ /c/ ) { $dest = 'c@example.com'; }
    elsif ( $xdest =~ /m/ ) { $dest = '@example.com'; }
    our $email = Email::MIME->create(
        header => [
            From => $name . ' <' . $addr . '>',
            To => $dest,
            Subject => $subject,
            'X-Source' => 'webform',
        ],
        body => $message,
    );
    try {
        sendmail( $email, { from => 'postmaster@example.com' } );
        $ret = 1;
    }
    catch { $ret = 2; $err = "$_"; }
    if ( ( $urgent =~ /yes/ ) && ( $ret != 2 ) )
    { #error here
        $email = Email::MIME->create(
            header => [
                From => 'a@example.com',
                To => $SMS,
            ],
            body => "Urgent message in $dest.\n",
        );
        try {
            sendmail( $email, { from => 'a@example.com' } );
            $ret = 3;
        }
        catch { $ret = 4; $err = "$_"; };
    }
} #and here
print $cgi->header();
if ( $ret == 0 ) {
    print $cgi->start_html('Message not sent'),
        "Not all required fields were filled", $cgi->end_html();
}
elsif ( $ret == 1 ) {
    print $cgi->start_html('Message sent'), "Message successfully sent",
        $cgi->end_html();
}
elsif ( $ret == 2 ) {
    print $cgi->start_html('Error sending mail'), "Sendmail: $err",
        $cgi->end_html();
}
elsif ( $ret == 3 ) {
    print $cgi->start_html('Message sent'),
        "Message successfully sent, mobile notified", $cgi->end_html();
}
elsif ( $ret == 4 ) {
    print $cgi->start_html('Error sending mail'),
        "Message successfully sent, but mobile could not be notified: $err",
        $cgi->end_html();
}
------------------------------
Date: Tue, 12 Oct 2010 21:02:39 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: cryptic error in cgi script
Message-Id: <slrniba4vs.s83.tadmc@tadbox.sbcglobal.net>
andrey the giant <andrey.vul@gmail.com> wrote:
> use Try::Tiny;
try/catch is a statement...
> try {
> sendmail( $email, { from => 'postmaster@example.com' } );
> $ret = 1;
> }
> catch { $ret = 2; $err = "$_"; }
...so it needs a statement separator if there are more statements after it.
You are missing a semicolon on that catch line.
(I saw it easily, because I tend to forget the semicolon often
with eval BLOCK, so now I jump right on it :-)
)
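The eval BLOCK trap Tad mentions works the same way: eval {} is an expression, so when it is combined with "or do {}" the statement still needs its own terminating semicolon. A small sketch of that idiom; risky() is a hypothetical stand-in for sendmail().

```perl
use strict;
use warnings;

# Hypothetical stand-in for sendmail(): dies on failure.
sub risky {
    my ($fail) = @_;
    die "delivery failed\n" if $fail;
    return 1;
}

my ($ret, $err) = (0, '');

eval {
    risky(1);
    $ret = 1;
    1;                          # make eval return true on success
} or do {
    $ret = 2;
    $err = $@ || 'unknown error';
};                              # this semicolon ends the whole eval/do statement

print "ret=$ret err=$err";
```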
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
------------------------------
Date: Tue, 12 Oct 2010 22:11:52 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: cryptic error in cgi script
Message-Id: <m24ocqrgd3.fsf@sherm.shermpendley.com>
andrey the giant <andrey.vul@gmail.com> writes:
> Vim doesn't give me any red marks and the code looks right, but I get
> the following errors:
> syntax error at contact.pl line 50, near ")
> {"
> syntax error at contact.pl line 64, near "}"
> contact.pl had compilation errors.
>
> try {
> sendmail( $email, { from => 'postmaster@example.com' } );
> $ret = 1;
> }
> catch { $ret = 2; $err = "$_"; }
You're missing a semicolon after that catch block.
The above is actually two anonymous sub references being passed to a
try() method via indirect object syntax (see 'perldoc perlobj') to mimic
the creation of new language syntax. But, code blocks don't normally
require semicolons after them, and your editor probably isn't aware
that this one does.
Remember - only perl can parse Perl. ;-)
sherm--
--
Sherm Pendley
<http://camelbones.sourceforge.net>
Cocoa Developer
------------------------------
Date: Wed, 13 Oct 2010 03:40:12 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: cryptic error in cgi script
Message-Id: <cpifo7-g9k.ln1@osiris.mauzo.dyndns.org>
Quoth Sherm Pendley <sherm.pendley@gmail.com>:
> andrey the giant <andrey.vul@gmail.com> writes:
>
> > try {
> > sendmail( $email, { from => 'postmaster@example.com' } );
> > $ret = 1;
> > }
> > catch { $ret = 2; $err = "$_"; }
>
> You're missing a semicolon after that catch block.
>
> The above is actually two anonymous sub references being passed to a
> try() method via indirect object syntax (see 'perldoc perlobj') to mimic
> the creation of new language syntax.
No it's not. It parses as (and could in fact be written as)
try( sub { ... }, catch( sub { ... } ) )
where 'catch' is a subroutine that packages up the subref so 'try' knows
what it is.
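A stripped-down imitation of the mechanism Ben describes, using & prototypes, shows where the semicolon requirement comes from. my_try, my_catch and the My::Catch tag class are hypothetical names; this is not Try::Tiny's source, which additionally preserves $@, supports finally blocks and handles more edge cases.

```perl
use strict;
use warnings;

# (&;@): the first argument may be written as a bare block, which perl
# turns into an anonymous sub; no 'sub' keyword and no comma needed.
sub my_catch (&;@) {
    my ($handler, @rest) = @_;
    # Tag the handler so my_try can recognise it among its arguments.
    return (bless(\$handler, 'My::Catch'), @rest);
}

sub my_try (&;@) {
    my ($try, @handlers) = @_;
    my ($catch) = grep { ref($_) eq 'My::Catch' } @handlers;
    my @result = eval { $try->() };
    if (my $error = $@) {
        return unless $catch;       # no handler: swallow the error
        local $_ = $error;          # expose the error in $_ as well
        return ${$catch}->($error);
    }
    return wantarray ? @result : $result[-1];
}

my $ret = 0;

# Parses as my_try(sub { ... }, my_catch(sub { ... })): one expression,
# so the statement needs a semicolon like any other.
my_try {
    die "boom\n";               # simulate a failing sendmail()
}
my_catch { $ret = 2; };

print "ret=$ret\n";
```

Dropping the semicolon after the my_catch block makes the following print part of the same statement, which gives the same kind of syntax error as in the original post.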
> But, code blocks don't normally
> require semicolons after them, and your editor probably isn't aware
> that this one does.
In Perl, they do if they're expressions: do{}, eval{}, sub{}. Of course,
one wouldn't expect try/catch to be an expression, but that's the only
even-remotely-reliable way to create new syntax (at least, until 5.14
comes out, which has *lots* of hooks to make this sort of thing easier).
> Remember - only perl can parse Perl. ;-)
PPI?
Ben
------------------------------
Date: Tue, 12 Oct 2010 20:35:35 -0700 (PDT)
From: andrey the giant <andrey.vul@gmail.com>
Subject: Re: cryptic error in cgi script
Message-Id: <f391ab3c-92ee-4258-b65b-a3e4103a2a34@l20g2000yqm.googlegroups.com>
On Oct 12, 10:40 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth Sherm Pendley <sherm.pend...@gmail.com>:
>
> > andrey the giant <andrey....@gmail.com> writes:
>
> > >     try {
> > >         sendmail( $email, { from => 'postmas...@example.com' } );
> > >         $ret = 1;
> > >     }
> > >     catch { $ret = 2; $err = "$_"; }
>
> > You're missing a semicolon after that catch block.
A subtle issue, completely overlooked by one fluent primarily in C/C++/Java.
>
> > The above is actually two anonymous sub references being passed to a
> > try() method via indirect object syntax (see 'perldoc perlobj') to mimic
> > the creation of new language syntax.
>
> No it's not. It parses as (and could in fact be written as)
>
>     try( sub { ... }, catch( sub { ... } ) )
>
> where 'catch' is a subroutine that packages up the subref so 'try' knows
> what it is.
>
> > But, code blocks don't normally
> > require semicolons after them, and your editor probably isn't aware
> > that this one does.
>
Ugh, I'm so used to C/C++/Java I totally missed the semicolon after
the catch in the cpan docs for Email::Sender::Simple :|
And I thought C++ templates gave cryptic error messages :P
Everything works now.
Thanks to everybody.
------------------------------
Date: Tue, 12 Oct 2010 23:38:23 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: cryptic error in cgi script
Message-Id: <87aami92z4.fsf@quad.sysarch.com>
>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:
>> Remember - only perl can parse Perl. ;-)
BM> PPI?
it fails with wacky prototypes and such as it doesn't do a true parse
based on declarations. it is a very good top level parser though.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Tue, 12 Oct 2010 23:57:06 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: cryptic error in cgi script
Message-Id: <m239sa219p.fsf@sherm.shermpendley.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Sherm Pendley <sherm.pendley@gmail.com>:
>> andrey the giant <andrey.vul@gmail.com> writes:
>>
>> > try {
>> > sendmail( $email, { from => 'postmaster@example.com' } );
>> > $ret = 1;
>> > }
>> > catch { $ret = 2; $err = "$_"; }
>>
>> You're missing a semicolon after that catch block.
At least I got *that* part right. :-)
>> The above is actually two anonymous sub references being passed to a
>> try() method via indirect object syntax (see 'perldoc perlobj') to mimic
>> the creation of new language syntax.
>
> No it's not. It parses as (and could in fact be written as)
>
> try( sub { ... }, catch( sub { ... } ) )
>
> where 'catch' is a subroutine that packages up the subref so 'try' knows
> what it is.
Oh, I get it now (after examining the source, and reviewing perlsub...)
It was the lack of a comma that led me to think of indirect object
syntax. So, sub arguments that are prototyped with '&' allow one to
call the sub with no comma after the block? I suppose the intent is
to allow subs that have similar syntax to the block forms of grep(),
map(), eval(), etc.
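That reading matches perlsub: an & in the first position of a prototype lets the caller pass a bare block, with no "sub" keyword and no comma after it, just as with grep and map. A minimal sketch with a hypothetical apply_each:

```perl
use strict;
use warnings;

# (&@): a block first, then an ordinary list, like grep/map.
sub apply_each (&@) {
    my ($code, @items) = @_;
    my @out;
    for (@items) {              # aliases $_ to each item, as map does
        push @out, $code->();
    }
    return @out;
}

my @doubled = apply_each { $_ * 2 } 1, 2, 3;
print "@doubled\n";
```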
>> Remember - only perl can parse Perl. ;-)
>
> PPI?
Is written in Perl. :-)
sherm--
--
Sherm Pendley
<http://camelbones.sourceforge.net>
Cocoa Developer
------------------------------
Date: Tue, 12 Oct 2010 23:06:30 -0400
From: "Paul E. Schoen" <paul@pstech-inc.com>
Subject: Re: Date difference in days
Message-Id: <5n9to.15043$nj3.13133@newsfe04.iad>
"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in message
news:slrnib9cpj.q1l.hjp-usenet2@hrunkner.hjp.at...
> On 2010-10-12 06:58, Paul E. Schoen <paul@pstech-inc.com> wrote:
>>
>> "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> wrote in message
>> news:4cb31756$0$2511$815e3792@news.qwest.net...
>>> Paul E. Schoen wrote:
>>> [...]
>>>> The script can be seen in action as the hit counter in:
>>> [...]
>>>
>>> perldoc -q "I still don't get locking."
>>
>> Yes, I read that (although I had to log in to my server with telnet,
>> which is inconvenient from a Win Vista machine).
>>
>> This is the relevant code.
>>
>> open (LOG, '<', "$logpath");
>> my @file = <LOG>; # an array of the file contents
>> close(LOG);
>>
>> my $count = $file[0]; # the count value is the first line in the file, i.e. $file[0]
>> $count++; # increments the counter value
>>
>> open (LOG, ">$logpath"); # opens the log file for writing
>> flock(LOG, 2); # file lock set
>> print LOG "$count\n"; # prints out the new counter value to the file
>> flock(LOG, 8); # file lock unset
>> close(LOG);
>>
>> This was from an existing script.
>
> Wherever you got that script from, don't get any more scripts from
> there. That's just awful.
>
> As Tina mentioned, the flock there is useless. It doesn't protect the
> two obvious race conditions. All it protects is a single print,
> which almost certainly doesn't change the file anyway (the count is only
> written at close, *after* you release the lock).
>
> There are two ways to do this.
>
> The safe way:
>
> use IO::Handle; # for flush
> use Fcntl qw(:flock SEEK_SET); # for LOCK_EX, LOCK_UN and SEEK_SET
>
> open (my $log_fh, '+<', $logpath) or die "cannot open $logpath: $!";
>
> # the file is now open for reading and writing, so we can lock it
> flock($log_fh, LOCK_EX) or die "cannot lock $logpath: $!";
>
> # read current counter and increment it
> my $count = <$log_fh>;
> $count++;
>
> # rewind to begin of file and write the new counter
> seek($log_fh, 0, SEEK_SET);
> print $log_fh $count;
> $log_fh->flush() or die "cannot flush $logpath: $!";
>
> # after flush we know that the file counter has been written
> # (at least to the OS disk cache, not necessarily the disk),
> # so we can release the lock
>
> flock($log_fh, LOCK_UN);
>
> # done - close the file
> close($log_fh) or die "cannot close $logpath: $!";
>
> (actually, in this simple case, flush and flock($log_fh, LOCK_UN) are
> redundant - close will automatically flush any pending writes and unlock
> the file (in this order)).
>
>> BTW this webpage gets about 12 hits per day. Not much chance of a
>> collision, and probably not much damage if it happens.
>
> If you don't mind losing a hit every now and then you can do it safely
> without locks:
>
> open (my $log_fh, '<', $logpath) or die "cannot open $logpath: $!";
> my $count = <$log_fh>;
> $count++;
> close($log_fh);
> open (my $tmp_fh, '>', "$logpath.$$") or die "cannot open $logpath.$$: $!";
> print $tmp_fh $count;
> close($tmp_fh) or die "cannot close $logpath.$$: $!";
> rename("$logpath.$$", $logpath)
>     or die "cannot rename $logpath.$$ to $logpath: $!";
>
> If a second hit happens between the open and the rename, it won't be
> counted. But the counter will never be accidentally reset to zero.
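The locked read-modify-write quoted above can be packaged as a reusable sub. This sketch adds two small things that are not in the quoted code: it writes a trailing newline, and it truncates the file after writing, so the file never keeps leftover bytes from a longer previous content. increment_counter and the temporary demo file are illustrative assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use IO::Handle;                 # for $fh->flush
use File::Temp qw(tempfile);

# Hypothetical helper: lock, read, increment, rewrite, unlock.
sub increment_counter {
    my ($path) = @_;
    open(my $fh, '+<', $path) or die "cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "cannot lock $path: $!";

    # Treat a missing or non-numeric first line as 0.
    my $line  = <$fh>;
    my $count = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $count++;

    seek($fh, 0, SEEK_SET)   or die "cannot seek $path: $!";
    print {$fh} "$count\n"   or die "cannot write $path: $!";
    $fh->flush               or die "cannot flush $path: $!";
    truncate($fh, tell($fh)) or die "cannot truncate $path: $!";  # drop leftover bytes
    close($fh)               or die "cannot close $path: $!";     # also releases the lock
    return $count;
}

# Demo on a throwaway file standing in for the real counter file.
my ($tmp_fh, $counter_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "9\n";
close $tmp_fh;
increment_counter($counter_file) for 1 .. 2;
```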
This is one source (but not the one I used):
http://www.comptechdoc.org/independent/web/cgi/perlmanual/perlhit.html
I think this is it:
http://www.akamarketing.com/simple-hit-counter-with-perl.html
There are many other scripts, and some have safeguards against "false" hits,
and some keep track of user data. Thanks for the many explanations and
suggestions. I may try searching the logs for information as a learning
exercise. I have viewed them manually but there is only a small portion of
interest.
Thanks,
Paul
------------------------------
Date: Tue, 12 Oct 2010 21:09:37 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: How to make XSUBs thread-safe? xsubpp switches?
Message-Id: <slrnib9jkh.k6d.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-10-12, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
>> Suppose I have an interface module to an external library with
>> hundreds (or thousands) of XSUBs, and I want to make it thread-safe.
>> The external library is not. So what looks like the simplest
>> first-stage solution is to automatically insert locking code about all
>> XSUBs. But we already have a preprocessor which massages XSUBs:
>> xsubpp.
>>
>> So: is it possible to instruct xsubpp to insert locking code about
> >> all/selected XSUBs? At least the docs of the perl-5.8.8 xsubpp do not
>> mention anything like this...
>
> I don't think so. It's not even clear what should be locked: the
> ithreads locking code only allows you to lock a shared variable, so the
> 5005threads solution of locking the CV doesn't work any more.
As far as I can see, this makes only locking of BOOT problematic.
Other XSUBs would lock local_XSUB_lock_shared_SV. It would be a
function of BOOT to initialize this SV*.
So one would need to manually instrument BOOT only, the rest could be
done (inefficiently but) automatically...
For best result, one would have "global" switches like
AUTOLOCK_ALL: local_XSUB_lock_shared_SV
and an attribute
AUTOLOCK: local_XSUB_lock_shared_SV
AUTOLOCK: 0
for individual XSUBs...
Ilya
------------------------------
Date: Wed, 13 Oct 2010 04:57:26 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to make XSUBs thread-safe? xsubpp switches?
Message-Id: <6anfo7-2on.ln1@osiris.mauzo.dyndns.org>
Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> On 2010-10-12, Ben Morrow <ben@morrow.me.uk> wrote:
> >
> > Quoth Ilya Zakharevich <nospam-abuse@ilyaz.org>:
> >> Suppose I have an interface module to an external library with
> >> hundreds (or thousands) of XSUBs, and I want to make it thread-safe.
> >> The external library is not. So what looks like the simplest
> >> first-stage solution is to automatically insert locking code about all
> >> XSUBs. But we already have a preprocessor which massages XSUBs:
> >> xsubpp.
> >>
> >> So: is it possible to instruct xsubpp to insert locking code about
> > >> all/selected XSUBs? At least the docs of the perl-5.8.8 xsubpp do not
> >> mention anything like this...
> >
> > I don't think so. It's not even clear what should be locked: the
> > ithreads locking code only allows you to lock a shared variable, so the
> > 5005threads solution of locking the CV doesn't work any more.
>
> As far as I can see, this makes only locking of BOOT problematic.
> Other XSUBs would lock local_XSUB_lock_shared_SV. It would be a
> function of BOOT to initialize this SV*.
>
> So one would need to manually instrument BOOT only, the rest could be
> done (inefficiently but) automatically...
>
> For best result, one would have "global" switches like
>
> AUTOLOCK_ALL: local_XSUB_lock_shared_SV
>
> and an attribute
>
> AUTOLOCK: local_XSUB_lock_shared_SV
> AUTOLOCK: 0
>
> for individual XSUBs...
Well, you can get the latter with
SCOPE: ENABLE
PREINIT:
dMY_CXT;
INIT:
SvLOCK(MY_CXT.locksv);
plus appropriate logic in BOOT and CLONE. It's not ideal, and it's
certainly not as convenient as an AUTOLOCK_ALL option, but it's better
than nothing.
If you don't mind nasty tricks, it's possible to re#define dXSARGS for
this sort of thing. After all, that's what 5005's XSlock.h did...
You could always do up a patch for ExtUtils::ParseXS, though I'm not
sure I'd recommend it :). IMHO this feature would be sufficiently useful
that it should go in.
Ben
------------------------------
Date: Tue, 12 Oct 2010 14:00:57 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: suitable key for a hash
Message-Id: <121020101400574036%jimsgibson@gmail.com>
In article
<9a9c9ce1-b08f-4781-a055-8af8cca793ae@28g2000yqm.googlegroups.com>,
ccc31807 <cartercc@gmail.com> wrote:
> I have a data file to process that consists of about 25K rows and
> about 30 columns. This file contains no column with unique values,
> that is, every column contains duplicate values. I am placing the data
> in a hash to process it (so I can access the data values by name
> rather than position), and the only 'key' I can come up with is the $.
> variable for the input line numbers.
>
> Surely someone must have dealt with this problem before. Is there a
> better solution?
If you have records with duplicate keys and you want to store the data
in a hash for rapid lookup, use array references as hash values
(untested):
while(<>) {
    my( $name, @rest ) = split;
    push( @{$data{$name}}, \@rest );
}
>
> The processing requires dumping the data into discrete categories,
> e.g., level, state, person's name, status, for the purpose of
> generating reports, e.g., by level, by state, by name, by status, and
> not having a unique key isn't an issue.
Store the data in an array and create indices for key fields (untested):
while(<>) {
    my @fields = split;
    push( @data, \@fields );
    push( @{$field1_index{$fields[0]}}, $#data );
    push( @{$field2_index{$fields[1]}}, $#data );
    # ... and so on for the other key fields
}
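A self-contained version of the index approach, reading a few made-up rows from an array instead of <>, shows how a per-state report falls out of the index. The field layout and the names %by_name and %by_state are illustrative assumptions.

```perl
use strict;
use warnings;

# Illustrative whitespace-separated rows: name state status
my @input = (
    "alice OH active",
    "bob   OH retired",
    "carol KY active",
    "alice KY active",   # duplicate name, so no column is a unique key
);

my (@data, %by_name, %by_state);
for my $line (@input) {
    my @fields = split ' ', $line;
    push @data, \@fields;
    push @{ $by_name{  $fields[0] } }, $#data;   # row numbers per name
    push @{ $by_state{ $fields[1] } }, $#data;   # row numbers per state
}

# Report by state: walk the index, then fetch the full rows.
for my $state (sort keys %by_state) {
    print "$state:\n";
    print "  @{ $data[$_] }\n" for @{ $by_state{$state} };
}
```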
--
Jim Gibson
------------------------------
Date: Tue, 12 Oct 2010 18:00:20 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: suitable key for a hash
Message-Id: <4cb517b0$0$14744$ed362ca5@nr5-q3a.newsreader.com>
ccc31807 wrote:
> I have a data file to process that consists of about 25K rows and
> about 30 columns. This file contains no column with unique values,
> that is, every column contains duplicate values.
Jointly, or just severally?
> I am placing the data
> in a hash to process it (so I can access the data values by name
> rather than position),
If you wish to access it by name, then you must know what the name is.
> and the only 'key' I can come up with is the $.
> variable for the input line numbers.
Why not just an array, in that case?
>
> Surely someone must have dealt with this problem before. Is there a
> better solution?
>
> The processing requires dumping the data into discrete categories,
> e.g., level, state, person's name, status, for the purpose of
> generating reports, e.g., by level, by state, by name, by status, and
> not having a unique key isn't an issue.
Ok, so just stick it directly into those structures.
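One way to follow both suggestions (keep the rows in an array, and put them straight into the per-category structures) while still accessing fields by name is to turn each row into a hash with a hash slice. The column names and sample rows below are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical column names; a real file would supply ~30 of them.
my @columns = qw(name state level status);

my @rows = (
    "smith OH 2 active",
    "jones OH 1 inactive",
    "smith KY 2 active",
);

my (@records, %by_state, %by_status);
for my $line (@rows) {
    my %rec;
    @rec{@columns} = split ' ', $line;   # hash slice: fields by name, not position
    push @records, \%rec;                # the array index is the implicit row id
    push @{ $by_state{  $rec{state}  } }, \%rec;
    push @{ $by_status{ $rec{status} } }, \%rec;
}

for my $state (sort keys %by_state) {
    printf "%s: %s\n", $state, join ', ', map { $_->{name} } @{ $by_state{$state} };
}
```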
Xho
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3171
***************************************