[32199] in Perl-Users-Digest


Perl-Users Digest, Issue: 3464 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 4 16:09:27 2011

Date: Thu, 4 Aug 2011 13:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 4 Aug 2011     Volume: 11 Number: 3464

Today's topics:
        'invalid token' error and segmentation fault (XML issue <dn.perl@gmail.com>
    Re: 'invalid token' error and segmentation fault (XML i <bjoern@hoehrmann.de>
    Re: 'invalid token' error and segmentation fault (XML i <jurgenex@hotmail.com>
    Re: 'invalid token' error and segmentation fault (XML i <nospam-abuse@ilyaz.org>
    Re: 'invalid token' error and segmentation fault (XML i <dn.perl@gmail.com>
        Delaying interpolation in a qr <bernie@fantasyfarm.com>
    Re: Delaying interpolation in a qr <rvtol+usenet@xs4all.nl>
    Re: Delaying interpolation in a qr <uri@StemSystems.com>
    Re: Harcopy/book on Moose <bernie@fantasyfarm.com>
    Re: need help in perl arrays (New to perl but know TCL  <news@lawshouse.org>
    Re: perl + doxygen  + dbi <lyttlec@removegmail.com>
        seeking advice on problem difficulty <ela@yantai.org>
    Re: seeking advice on problem difficulty <rweikusat@mssgmbh.com>
    Re: seeking advice on problem difficulty <ben.usenet@bsb.me.uk>
    Re: seeking advice on problem difficulty <ela@yantai.org>
    Re: seeking advice on problem difficulty <rweikusat@mssgmbh.com>
    Re: seeking advice on problem difficulty <ela@yantai.org>
    Re: seeking advice on problem difficulty <rweikusat@mssgmbh.com>
    Re: seeking advice on problem difficulty <ela@yantai.org>
    Re: seeking advice on problem difficulty <ela@yantai.org>
    Re: seeking advice on problem difficulty (Randal L. Schwartz)
        the super hidden <r@thevoid1.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 3 Aug 2011 16:44:15 -0700 (PDT)
From: "dn.perl@gmail.com" <dn.perl@gmail.com>
Subject: 'invalid token' error and segmentation fault (XML issue?)
Message-Id: <d86eb330-0de4-478e-9491-939590b3ae5b@m5g2000prh.googlegroups.com>


I have started getting an error:
not well-formed (invalid token) at line 12 -- but the error-log does
not say which file happens to have this invalid token.

If it helps, the shell script has started giving Segmentation fault
for:
   echo     $PERL extract-history.pl   --start "$dt 00:00:00"
But in the very next line, it seems to call correctly:
/home/y/bin/perl extract-history.pl  --start 2011-08-02 00:00:00

Well, extract-history.pl does use Data::Dumper, but it is difficult to
say where any xml or utf-8 related stuff may be popping up and causing
the error messages.

These scripts were working perfectly until 10 hours ago.

Please advise.


------------------------------

Date: Thu, 04 Aug 2011 01:57:17 +0200
From: Bjoern Hoehrmann <bjoern@hoehrmann.de>
Subject: Re: 'invalid token' error and segmentation fault (XML issue?)
Message-Id: <8nnj379jng0s10i0v6i7hj6hot3t9bdi19@hive.bjoern.hoehrmann.de>

* dn.perl@gmail.com wrote in comp.lang.perl.misc:
>I have started getting an error:
>not well-formed (invalid token) at line 12 -- but the error-log does
>not say which file happens to have this invalid token.
>
>If it helps, the shell script has started giving Segmentation fault
>for:
>   echo     $PERL extract-history.pl   --start "$dt 00:00:00"
>But in the very next line, it seems to call correctly:
>/home/y/bin/perl extract-history.pl  --start 2011-08-02 00:00:00
>
>Well, extract-history.pl does use Data::Dumper, but it is difficult to
>say where any xml or utf-8 related stuff may be popping up and causing
>the error messages.

Well, you don't tell us what the code is and don't tell us what the
input is, and you claim there is a segmentation fault, but a
segmentation fault would not come with a "not well-formed (invalid token)"
error message. The error message is most likely from `expat`, an XML
parser that is used by modules such as XML::Parser. Your favourite
search engine should point out, for a query like '"not well-formed
(invalid token)" perl' the perl-xml FAQ

  http://perl-xml.sourceforge.net/faq/#not_well_formed

which lists a couple of common errors in XML documents that trigger
this error. Most likely your script obtains an XML document from
somewhere, and the document is not well-formed, meaning it cannot be
parsed by typical XML processors. In order to address this, you would
have to find out where the malformed document is coming from and fix
it (or have others fix it, or find some workaround), or otherwise
remove the erroneous document from the process modeled by the script.

It might help to use a debugger or profiler to get a better idea of
what the script does, like which modules are involved, where the XML
parsing code is called from, and so on.
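
One way to find out which file triggers the error is to wrap each parse
call in eval and re-raise the error with the file name attached. This is
only a minimal sketch: parse_with_context and the stub parser are
hypothetical names, and the stub stands in for whatever XML parsing call
the real script (or a script it calls) makes.

```perl
use strict;
use warnings;

# Hypothetical helper: run a parser callback on one file and, if it dies,
# re-raise the error with the file name prepended.
sub parse_with_context {
    my ($file, $parser) = @_;
    my $result = eval { $parser->($file) };
    if (my $err = $@) {
        chomp $err;
        die "$file: $err\n";
    }
    return $result;
}

# Stand-in for a real XML parse call (for instance XML::Parser's
# parsefile method), used here so the sketch runs without extra modules.
my $stub_parser = sub {
    my ($file) = @_;
    die "not well-formed (invalid token) at line 12\n" if $file =~ /bad/;
    return "parsed $file";
};

for my $file (qw(good.xml bad.xml)) {
    my $out = eval { parse_with_context($file, $stub_parser) };
    print defined $out ? "$out\n" : "error: $@";
}
```

With the file name in the message, the error log points at the offending
document instead of only at a line number.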
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


------------------------------

Date: Wed, 03 Aug 2011 17:28:35 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: 'invalid token' error and segmentation fault (XML issue?)
Message-Id: <aspj37duakrtlt0vc9oulkt7uk2pt0n7dr@4ax.com>

"dn.perl@gmail.com" <dn.perl@gmail.com> wrote:
>I have started getting an error:
>not well-formed (invalid token) at line 12 -- but the error-log does
>not say which file happens to have this invalid token.

You are missing semicolon in line 42.

jue


------------------------------

Date: Thu, 4 Aug 2011 11:36:43 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: 'invalid token' error and segmentation fault (XML issue?)
Message-Id: <slrnj3l12b.fil.nospam-abuse@chorin.math.berkeley.edu>

On 2011-08-03, dn.perl@gmail.com <dn.perl@gmail.com> wrote:

> I have started getting an error:
> not well-formed (invalid token) at line 12 -- but the error-log does
> not say which file happens to have this invalid token.

You need to provide an exact error message (including newlines).

> If it helps, the shell script has started giving Segmentation fault
> for:
>    echo     $PERL extract-history.pl   --start "$dt 00:00:00"
> But in the very next line, it seems to call correctly:
> /home/y/bin/perl extract-history.pl  --start 2011-08-02 00:00:00

Likewise.

Note that a shell SCRIPT can't be "giving Segmentation fault".  You
need to find which EXECUTABLE faults before asking for help...

Ilya


------------------------------

Date: Thu, 4 Aug 2011 08:51:30 -0700
From: "dn.perl" <dn.perl@gmail.com>
Subject: Re: 'invalid token' error and segmentation fault (XML issue?)
Message-Id: <99vtfeF34eU1@mid.individual.net>


Jurgen was off by 6 lines.
A semi-colon was missing in line 48 (not 42), the error said line 12, which
is (48/4). And 12 = (48/6) + 4.

A totally unsuspected line had created the issue. I had to comment out a
block of 20 lines and find the offending line by uncommenting these 20 lines
one-by-one. The owner of a called script (which gives out the vague
error-message: not-well-formed line 12, no-filename-mentioned) has since
fixed the issue.

Will sort out later why the 'segmentation fault' error also appeared in the
bargain.
Thanks for the leads, Bjoern and Ilya. The 'segmentation fault' error really
threw me off balance.





------------------------------

Date: Thu, 04 Aug 2011 14:30:55 -0400
From: Bernie Cosell <bernie@fantasyfarm.com>
Subject: Delaying interpolation in a qr
Message-Id: <88ol371md1212t0at36c5v4pe7b52kkonm@library.airnews.net>

This with perl 5.6 [alas!!].  I have a regular expression that includes a
'runtime' variable.  My problem is twofold: first, that's one of several
REs I need to handle as variables, so I'm using qr{}s for them.  And
second, I need the interpolation of the variable to be deferred until the
RE is *used*.  Is there any way to defer the interpolation of a variable in
a qr{} re to when the pattern is actually used?  The kind of test I tried
was:
 ------------------------------
#!/usr/bin/perl

# Test of interpolation in qr's

my $vbl = 'abc' ;
my $pat = qr{!$vbl!} ;
$vbl = 'def' ;

sub check
{   my $vbl = 'ghi' ;
    warn "first\n" if "!abc!" =~ /$pat/ ;
    warn "second\n" if "!def!" =~ /$pat/ ;
    warn "local\n" if "!ghi!" =~ /$pat/ ;
}
check() ;
------------------------------------
And what I'd like is to have the match hit on the 'local', but I could deal
with it if it hit on 'second', but everything I've tried hits 'first'.  Is
there some trick that'll do that -- carry the vbl un-evaluated in the qr
until it is used?  THANKS

  /B\
-- 
Bernie Cosell                     Fantasy Farm Fibers
bernie@fantasyfarm.com            Pearisburg, VA
    -->  Too many people, too few sheep  <--          


------------------------------

Date: Thu, 04 Aug 2011 21:25:21 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Delaying interpolation in a qr
Message-Id: <4e3af221$0$23969$e4fe514c@news2.news.xs4all.nl>

On 2011-08-04 20:30, Bernie Cosell wrote:

> I have a regular expression that includes a
> 'runtime' variable. [...] I need the interpolation of
> the variable to be deferred until the RE is *used*.

Postponed evaluation spells 'sub'.

my $make_qr = sub { qr/\Q!$_[0]!/ };

But the example you gave, only needs eq, not any regular expression.
See also perldoc -f index.
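
A minimal sketch of the 'sub' idea applied to the test script from the
question. The names $pat, $make_pat and matches are illustrative only.
The first closure picks up the current value of the outer $vbl when it is
called; the second takes the value as an argument, since a closure cannot
see a "my" variable declared in some other scope.

```perl
use strict;
use warnings;

my $vbl = 'abc';

# Building the qr// inside a sub postpones interpolation until the sub
# is called, so the pattern reflects the value $vbl holds at that moment.
my $pat = sub { qr{!\Q$vbl\E!} };

# Variant that takes the value as an argument, for callers whose value
# lives in their own lexical variable.
my $make_pat = sub { my ($v) = @_; qr{!\Q$v\E!} };

$vbl = 'def';

sub matches {
    my ($string) = @_;
    return $string =~ $pat->() ? 1 : 0;
}

sub check {
    my $vbl = 'ghi';
    warn "first\n"  if matches('!abc!');
    warn "second\n" if matches('!def!');
    warn "local\n"  if '!ghi!' =~ $make_pat->($vbl);
}
check();
```

The pattern is recompiled on every call, which is the price of deferring
the interpolation.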

-- 
Ruud


------------------------------

Date: Thu, 04 Aug 2011 15:30:10 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Delaying interpolation in a qr
Message-Id: <871ux13rgd.fsf@quad.sysarch.com>

>>>>> "R" == Ruud  <rvtol+usenet@xs4all.nl> writes:

  R> On 2011-08-04 20:30, Bernie Cosell wrote:
  >> I have a regular expression that includes a
  >> 'runtime' variable. [...] I need the interpolation of
  >> the variable to be deferred until the RE is *used*.

  R> Postponed evaluation spells 'sub'.

  R> my $make_qr = sub { qr/\Q!$_[0]!/ };

but that still doesn't fix his request. i suspect an xy problem
here. why does the interpolation have to happen so late? why not build
up a hash of qr's and select the needed one in the actual regex. the
same logic used to assign to the scalar can be used to select the qr
from the hash. and even why is the qr needed? just interpolate the
latest value of that scalar. in any case we need more explanation of the
reason this is needed. if given, i suspect a very easy answer (like the
hash of qr's) will work fine.
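
A rough sketch of the hash-of-qr's idea, assuming the set of possible
values is known in advance. The keys, %pat_for and check_key are made up
for illustration.

```perl
use strict;
use warnings;

# Precompile one pattern per expected value (hypothetical keys).
my %pat_for = map { $_ => qr{!\Q$_\E!} } qw(abc def ghi);

# Select the precompiled pattern by key at the point of use.
sub check_key {
    my ($key, $string) = @_;
    my $pat = $pat_for{$key} or die "no pattern for '$key'\n";
    return $string =~ $pat ? 1 : 0;
}

print check_key('ghi', 'x!ghi!y') ? "local\n" : "no match\n";
```

The same logic that would have assigned to the scalar now chooses the key,
and each pattern is compiled only once.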

  R> But the example you gave, only needs eq, not any regular expression.
  R> See also perldoc -f index.

agreed, but it may have been a too simplistic example.

uri

-- 
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


------------------------------

Date: Thu, 04 Aug 2011 14:57:13 -0400
From: Bernie Cosell <bernie@fantasyfarm.com>
Subject: Re: Harcopy/book on Moose
Message-Id: <8rql37tp2bi6fqmlg3777cjp3gh7phistv@library.airnews.net>

Mart van de Wege <mvdwege@mail.com> wrote:

} Bernie Cosell <bernie@fantasyfarm.com> writes:
} 
} > Is there a document or book on Moose?

} 'Modern Perl' by chromatic also has a lot on Moose. That one is even
} available for free at http://www.modernperlbooks.com/mt/index.html

I've been meaning to get that book... didn't know it covered Moose, too.
Great. Thanks!  /b\
-- 
Bernie Cosell                     Fantasy Farm Fibers
bernie@fantasyfarm.com            Pearisburg, VA
    -->  Too many people, too few sheep  <--          


------------------------------

Date: Wed, 03 Aug 2011 17:56:14 +0100
From: Henry Law <news@lawshouse.org>
Subject: Re: need help in perl arrays (New to perl but know TCL well)
Message-Id: <TMKdnXwtAu8y4KTTnZ2dnUVZ8uCdnZ2d@giganews.com>

On 03/08/11 12:18, Anil A Kumar wrote:
> I saw some where:
>
> my @options = ("value1!", "value2:s", "value3:s@", "value4:s%");
>
> I would like to know what this !, @, % characters represent here?

Anil, my friend, you've hit upon a very confusing bit of Perl code to 
learn from!

As others have said, the !, @ and % characters (other than the @ in 
"@options") have no special meaning to Perl here, but what may confuse 
you is (a) they do have special meanings in /almost every/ other case; 
and (b) it's very probable that in this case they do have special 
meanings to some Perl code (as distinct from Perl itself), almost 
certainly the Perl module Getopt::Long.

So be aware that @options is defined as an array (by the @ character) 
and initialised to four character strings which include the characters 
!, @ and %.  $options[0] is "value1!", $options[3] is "value4:s%", and 
so on.
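
If the strings are indeed Getopt::Long option specifications, they would
be used roughly as in the sketch below. The command line is simulated and
the values are invented; only the four specification strings come from
the question.

```perl
use strict;
use warnings;
use Getopt::Long;

my @options = ("value1!", "value2:s", "value3:s@", "value4:s%");

# Simulated command line, so the sketch runs without real arguments.
@ARGV = qw(--novalue1 --value2=two --value3=a --value3=b --value4=k=v);

my %opt;
GetOptions(\%opt, @options) or die "bad options\n";

print "value1: $opt{value1}\n";          # "!"  negatable flag
print "value2: $opt{value2}\n";          # ":s" optional string value
print "value3: @{ $opt{value3} }\n";     # "@"  may repeat, collected in a list
print "value4: k=$opt{value4}{k}\n";     # "%"  key=value pairs, kept in a hash
```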

-- 

Henry Law            Manchester, England


------------------------------

Date: Wed, 03 Aug 2011 16:42:32 -0400
From: lyttlec <lyttlec@removegmail.com>
Subject: Re: perl + doxygen  + dbi
Message-Id: <j1cbrm$rf9$1@speranza.aioe.org>

On 08/02/2011 06:21 AM, George Mpouras wrote:
> You could read the xml files recursively and for each one of them you could do
> something, e.g. create another file.
> the pm files can also be used easily with a  do ... pm statement
>
>
>
>
>
> my $xml = '
>
>    <compound refid="d8/d5a/structtm" kind="struct"><name>tm</name>
>       <member refid="d8/d5a/structtm_1a4d098a9a5c03a00b2ee61e10851de81e"
> kind="variable"><name>tm_sec</name></member>
>       <member refid="d8/d5a/structtm_1af414eb7c86cc3099595211eee4d4211b"
> kind="variable"><name>tm_min</name></member>
>       <member refid="d8/d5a/structtm_1a3e7ca4e37f1abcaf56b8a916c38eb9fe"
> kind="variable"><name>tm_hour</name></member>
>       <member refid="d8/d5a/structtm_1ab8d8904bad43b0c8b96e61941c5b5310"
> kind="variable"><name>tm_mday</name></member>
>       <member refid="d8/d5a/structtm_1a112ac36fa2f593777138a417cf031e17"
> kind="variable"><name>tm_mon</name></member>
>       <member refid="d8/d5a/structtm_1a33adf78fd6476b2120ce3b9c4a852053"
> kind="variable"><name>tm_year</name></member>
>       <member refid="d8/d5a/structtm_1afe81a8c46f1c693c43f259b288859f4f"
> kind="variable"><name>tm_wday</name></member>
>       <member refid="d8/d5a/structtm_1a93a0ba77cc23796df84405dcbcc57eb1"
> kind="variable"><name>tm_yday</name></member>
>       <member refid="d8/d5a/structtm_1a5645ca0580c8ab2c24f6c2965d9c9f9c"
> kind="variable"><name>tm_isdst</name></member>
>     </compound>
>
> ';
>
>
>
> use XML::Simple;
> my $data = XMLin($xml);
> print "do something with tm_min  : $data->{member}->{'tm_min'}->{'refid'}\n";
> print "do something with tm_hour : $data->{member}->{'tm_hour'}->{'refid'}\n";
>
>
  I guess it boils down to which is best :
xml-> perl-> xml,
latex -> perl -> xml,
or perlmod -> perl -> xml?

The Ray & McIntosh "Perl & XML", 2002 seems a bit dated now.
How is XML::Simple now?


------------------------------

Date: Thu, 4 Aug 2011 20:14:17 -0700
From: "ela" <ela@yantai.org>
Subject: seeking advice on problem difficulty
Message-Id: <j1duq8$dni$1@ijustice.itsc.cuhk.edu.hk>

Dear all,

I've been working on this problem for 4 days and still cannot come up with a 
good solution and would appreciate it if you could comment on the problem.

Given a table containing cells delimited by tab like this (the 1st line 
being the header and below I use delimiter space for clarity):

ID F1 F2 F3 F4 F5
1 SuperC3 C1 subC4 dummy hi
1 SuperC3 C1 subC3 dumdum hell
2 SuperC3 C1 subC3 hello hello
3 SuperC3 C2 subC7 hel hel
 ...
1000 SuperC1 C8 subC10 hi hi


and I have another table that group the ID's together, e.g.
Group 1:1,2, 16, 200
Group 2:99, 136, 555
 ...
Group 15: 123, 124, 999


The two tables above can contain non-overlapping entries though most of the 
time the entries do overlap. So the task is to make use of the 2nd table and 
to look up the first table for evaluation. By checking group 1, we know that 
ID's 1,2, 16 and 200 are in the same group. And what to evaluate? Say, if 
ID's 16 and 200 also contain subC3 in F3, then we can assign this group 1 
members to be subC3. If the consistency does not exceed certain threshold 
(e.g. 0.7), we then look up F2 and check the statistics again and may 
conclude C1, and finally for F1 (SuperC3). If F1 consistency also cannot 
pass the threshold, "inconsistent" is returned. The main problem here is: I 
have been overwhelmed by hash of hash of hash !

First, I use a hash to bin the groups with the ID's, e.g.
$vhash{$group}{$ID} = 1;
and memorize which $ID belongs to which $group
$group{$ID} = $groupID;


Then I use another hash to bin the other table, e.g. ($cells[] refers to the 
parsing result while reading the table in a file)

$chash{$ID}{F1}{$cells[$F1_location]} = 1;
$chash{$ID}{F2}{$cells[$F2_location]} = 1;
$chash{$ID}{F3}{$cells[$F3_location]} = 1;

And the complicated thing appears here....

for $cid (keys %chash) {
    if (exists $group{$cid}) {
        for $f1 (keys %{$chash{$cid}{F1}}) {
            for $gid (keys %{$vhash{$group{$cid}}}) {
 ....

Okay, I know it's difficult to read... and that's why I'm seeking advice 
here as the complicated stuff only just starts there... Is there any 
convenient data structure that helps better manage this kind of 
"cross/circulate-referencing"-like problem? 




------------------------------

Date: Thu, 04 Aug 2011 14:26:45 +0200
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: seeking advice on problem difficulty
Message-Id: <87ei11uzui.fsf@sapphire.mobileactivedefense.com>

"ela" <ela@yantai.org> writes:
> I've been working on this problem for 4 days and still cannot come out a 
> good solution and would appreciate if you could comment on the problem.
>
> Given a table containing cells delimited by tab like this

[ please see original for the indeed gory details ]

Provided I understood the problem correctly, a possible solution could
look like this (this code has had very little testing): First, you
define your groups by associating array references containing the group
members with the 'group ID' with the help of a hash:

$grp{1} = [1, 2];

Then, you create a hash mapping the column name to the column value
for each ID and put these hashes into an id hash associated with the
ID:

$id{1} = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC4' };
$id{2} = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC3' };

Provided this has been done, a 'consistency' routine can be defined as

sub consistency($$)
{
    my ($grp, $col) = @_;
    my %seen;

    $seen{$_} = 1 
	for (map { $id{$_}{$col} } @{$grp{$grp}});

    return 1.0 / keys(%seen);
}

This takes a group ID and a column name as argument and returns the
'consistency' of this column for this group. Then, an array needs to
be created which names the columns in the order they are supposed to
be checked in:

@order  = qw(F3 F2 F1);

Now, a 'decide' routine can be defined like this:

sub decide($)
{
    my $grp = $_[0];

    consistency($grp, $_) >= THRESHOLD and return $_
	for (@order);

    return undef;
}

This takes a group ID as argument and returns either the name of the
first column (checked in the order given by @order) whose consistency
is >= the THRESHOLD or undef (::= inconsistent). As a complete script:

------------------
#!/usr/bin/perl

use constant THRESHOLD =>	0.7;

my (%grp, %id, @order, $res);

@order  = qw(F3 F2 F1);

$grp{1} = [1, 2];

$id{1} = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC4' };
$id{2} = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC3' };

sub consistency($$)
{
    my ($grp, $col) = @_;
    my %seen;

    $seen{$_} = 1 
	for (map { $id{$_}{$col} } @{$grp{$grp}});

    return 1.0 / keys(%seen);
}

sub decide($)
{
    my $grp = $_[0];

    consistency($grp, $_) >= THRESHOLD and return $_
	for (@order);

    return undef;
}

$res = decide(1);
$res //= 'inconsistent';

print("$res\n");
--------------------

NB: This might not be a very sensible algorithm (ie, "I don't know
this").




------------------------------

Date: Thu, 04 Aug 2011 14:09:23 +0100
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: seeking advice on problem difficulty
Message-Id: <0.1974a0af27d9bc0648a0.20110804140923BST.87ty9xtjb0.fsf@bsb.me.uk>

"ela" <ela@yantai.org> writes:

> I've been working on this problem for 4 days and still cannot come out a 
> good solution and would appreciate if you could comment on the problem.
>
> Given a table containing cells delimited by tab like this (the 1st line 
> being the header and below I use delimiter space for clarity):
>
> ID F1 F2 F3 F4 F5
> 1 SuperC3 C1 subC4 dummy hi
> 1 SuperC3 C1 subC3 dumdum hell
> 2 SuperC3 C1 subC3 hello hello
> 3 SuperC3 C2 subC7 hel hel

You have a duplicate ID.  If this is not a typo, please ignore the rest
of the post -- I've assumed that IDs are unique.

> ...
> 1000 SuperC1 C8 subC10 hi hi
>
>
> and I have another table that group the ID's together, e.g.
> Group 1:1,2, 16, 200
> Group 2:99, 136, 555
> ...
> Group 15: 123, 124, 999
>
>
> The two tables above can contain non-overlapping entries though most of the 
> time the entries do overlap. So the task is to make use of the 2nd table and 
> to look up the first table for evaluation. By checking group 1, we know that 
> ID's 1,2, 16 and 200 are in the same group. And what to evaluate? Say, if 
> ID's 16 and 200 also contain subC3 in F3, then we can assign this group 1 
> members to be subC3. If the consistency does not exceed certain threshold 
> (e.g. 0.7), we then look up F2 and check the statistics again and may 
> conclude C1, and finally for F1 (SuperC3). If F1 consistency also cannot 
> pass the threshold, "inconsistent" is returned. The main problem here is: I 
> have been overwhelmed by hash of hash of hash !

I would not use hashes for this data.  It is naturally an array of
arrays.  Because you need to scan a column at selected row positions,
I'd make it an array of columns:

  # after skipping the first line...
  my @column = ();
  while (<>) {
      my $c = 0;
      push(@{$column[$c++]}, $_) foreach split;
  }

Now you can use a slice to extract the data for a particular column.
Having got your group list as an array:

  my @group = map {$_ - 1} split /,\s*/, $group_string;

(the -1 is to adjust for zero based indexing) you can write

  @{$column[0]}[@group]

to get the items from the first column whose frequencies you are
interested in.

<snip code>
> for $cid (keys %chash) {
>     if (exists $group{$cid}) {
>         for $f1 (keys %{$chash{$cid}{F1}}) {
>             for $gid (keys %{$vhash{$group{$cid}}}) {
> ....

Functions are crucial to managing complexity.  I'd want a function
'most_frequent' that can take an array of values and find the frequency
of the most common value among them.  It could return both that value
and the frequency.  Something like:

  sub most_frequent
  {
      my ($most_freq, %count) = ('', '' => 0);
      for my $item (@_) {
          $most_freq = $item if ++$count{$item} > $count{$most_freq};
      }
      return ($most_freq, $count{$most_freq}/@_);
  }

This is passed a slice for a given column and group:

  most_frequent(@{$column[$col]}[@group])

With that function you simply need to find the first $col whose most
frequent item meets your threshold.  'First' may not be correct, since
in your example you consider F3 before F1.  Maybe you need the item from
any column that has the greatest frequency?  Maybe there is a specified
order in which the columns should be tested?  Whatever the answer, it
should be easy with the above function.
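
As a rough sketch of how the pieces might fit together, assuming the
columns are to be tested in a fixed order and that both the column name
and the winning value are wanted: the sample data, the decide routine and
the THRESHOLD value are illustrative, and this most_frequent is a
reworked variant of the function above, not the same code.

```perl
use strict;
use warnings;

use constant THRESHOLD => 0.7;

# Hypothetical sample data: one array per column, rows in file order.
my %column = (
    F1 => [qw(SuperC3 SuperC3 SuperC3 SuperC3)],
    F2 => [qw(C1      C1      C1      C2)],
    F3 => [qw(subC4   subC3   subC3   subC7)],
);

# Return the most common value in the list and its relative frequency.
sub most_frequent {
    return (undef, 0) unless @_;
    my %count;
    $count{$_}++ for @_;
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return ($best, $count{$best} / @_);
}

# Check columns in the given order; return (column, value, frequency)
# for the first one whose most common value meets the threshold.
sub decide {
    my ($rows, @order) = @_;
    for my $col (@order) {
        my ($value, $freq) = most_frequent(@{ $column{$col} }[@$rows]);
        return ($col, $value, $freq) if $freq >= THRESHOLD;
    }
    return;    # empty list means "inconsistent"
}

my @group = (0, 1, 2);    # zero-based row positions of one group
my ($col, $value, $freq) = decide(\@group, qw(F3 F2 F1));
print defined $col ? "$col: $value ($freq)\n" : "inconsistent\n";
```

Changing the argument list of decide is enough to try a different column
order.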

<snip>
-- 
Ben.


------------------------------

Date: Thu, 4 Aug 2011 22:25:50 -0700
From: "ela" <ela@yantai.org>
Subject: Re: seeking advice on problem difficulty
Message-Id: <j1e6gt$gdt$1@ijustice.itsc.cuhk.edu.hk>

Thanks both Rainer and Ben. The 1st table does contain duplicate ID's but 
not duplicate rows because one ID can contain several F3's so (ID, F3) form 
a primary key.

Now I learn more about how to tackle this problem from you and shall benefit 
in reusing the skills learnt in future. Thanks a lot!! 




------------------------------

Date: Thu, 04 Aug 2011 15:47:02 +0200
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: seeking advice on problem difficulty
Message-Id: <87wretthk9.fsf@sapphire.mobileactivedefense.com>

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

[...]


>   my @column = ();

There is little reason to empty an empty array by assigning an empty
list to it.

[...]

>   sub most_frequent
>   {
>       my ($most_freq, %count) = ('', '' => 0);

A more sensible initialization would be

	my ($most_freq, %count);

        $most_freq = shift;
        $count{$most_freq} = 1;
        
>       for my $item (@_) {
>           $most_freq = $item if ++$count{$item} > $count{$most_freq};
>       }
>       return ($most_freq, $count{$most_freq}/@_);

and then use (@_ + 1) for the division.


------------------------------

Date: Thu, 4 Aug 2011 23:18:57 -0700
From: "ela" <ela@yantai.org>
Subject: Re: seeking advice on problem difficulty
Message-Id: <j1e9kg$hl9$1@ijustice.itsc.cuhk.edu.hk>


"Rainer Weikusat" <rweikusat@mssgmbh.com> wrote in message 
news:87ei11uzui.fsf@sapphire.mobileactivedefense.com...
> Then, you create a hash mapping the column name to the column value
> for each ID and put these hashes into an id hash associated with the
> ID:
>
> $id{1} = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC4' };

Well, the duplicate ID curse appears here... so maybe an array has to be used 
instead?

$id{1}[0] = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC4' };
$id{1}[1] = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC5' };
$id{2}[0] = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC3' };

>    $seen{$_} = 1 for (map { $id{$_}{$col} } @{$grp{$grp}});
>    return 1.0 / keys(%seen);
> This takes a group ID and a column name as argument and returns the
> 'consistency' of this column for this group.

In fact, I don't quite understand what these codes are doing. what does 
$grp{$grp} refer to? And as it's no longer $id{$_} but $id{}[], so how to 
adjust then?

>    my $grp = $_[0];
Why do you make it an array element?

>    consistency($grp, $_) >= THRESHOLD and return $_ for (@order);
the exact item (here, say, C1, in addition to F2) should also be 
returned.... (Well, that's why I said the problem is complicated...)

> $res //= 'inconsistent';
Is this a typo? What's this? 




------------------------------

Date: Thu, 04 Aug 2011 16:31:37 +0200
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: seeking advice on problem difficulty
Message-Id: <87sjphtfhy.fsf@sapphire.mobileactivedefense.com>

"ela" <ela@yantai.org> writes:
> "Rainer Weikusat" <rweikusat@mssgmbh.com> wrote in message 
> news:87ei11uzui.fsf@sapphire.mobileactivedefense.com...
>> Then, you create a hash mapping the column name to the column value
>> for each ID and put these hashes into an id hash associated with the
>> ID:
>>
>> $id{1} = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC4' };
>
> Well, the duplicate ID curse appears here... so maybe an array has to use 
> instead?
>
> $id{1}[0] = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC4' };
> $id{1}[1] = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC5' };
> $id{2}[0] = { F1 => 'SuperC1', F2 => 'C1',  F3 => 'subC3' };
>
>>    $seen{$_} = 1 for (map { $id{$_}{$col} } @{$grp{$grp}});
>>    return 1.0 / keys(%seen);
>> This takes a group ID and a column name as argument and returns the
>> 'consistency' of this column for this group.
>
> In fact, I don't quite understand what these codes are doing. what does 
> $grp{$grp} refer to?

Using the same name twice was arguably somewhat confusing: The inner
$grp refers to the scalar variable $grp (defined in the function) that
holds the group ID and it is used to select the value (array of IDs)
associated with the corresponding key in the %grp hash.

> And as it's no longer $id{$_} but $id{}[], so how to 
> adjust then?

Provided that all different 'column lists' associated with the same ID
can be treated as if they were associated with different IDs, you
could turn the map expression into something like

	map { map { $_->{$col} } @{$id{$_}}; } @{$grp{$grp}}

(uncompiled).
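
The nested map can be spelled out with explicit loops, which may make the
two levels easier to follow. This is a sketch on invented sample data;
column_values is a hypothetical name, and skipping IDs that have no rows
is an assumption about how group members missing from the table should be
handled.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape discussed above:
# %grp maps a group ID to its member IDs, %id maps an ID to a list of rows.
my %grp = (1 => [1, 2, 16]);
my %id  = (
    1 => [ { F3 => 'subC4' }, { F3 => 'subC5' } ],
    2 => [ { F3 => 'subC3' } ],
);

# Collect the values of one column for every row of every group member.
# IDs that appear in the group but not in %id (16 here) are skipped.
sub column_values {
    my ($group, $col) = @_;
    my @values;
    for my $member (@{ $grp{$group} }) {      # role of the outer map
        next unless exists $id{$member};
        for my $row (@{ $id{$member} }) {     # role of the inner map
            push @values, $row->{$col};
        }
    }
    return @values;
}

print join(',', column_values(1, 'F3')), "\n";
```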
        
>>    my $grp = $_[0];
> Why do you make it an array element?

Make what? This line defined a scalar variable named $grp and assigned
the first argument of the sub to it: All arguments are passed in the
array @_, hence $_[0] is the first argument.

>>    consistency($grp, $_) >= THRESHOLD and return $_ for (@order);
> the exact item (here, say, C1, in addition to F2) should also be 
> returned.... (Well, that's why I said the problem is complicated...)

Then, you'll need something similar to Ben's most_frequent routine.

>> $res //= 'inconsistent';
> Is this a typo? What's this?

// is similar to || except that it tests for 'definedness' and not
'trueness': It will assign the string to the variable if its value is
undef.
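
The // and //= operators need Perl 5.10 or later. On an older perl the
same effect can be written with defined, as in this small sketch
(default_if_undef is a made-up helper name):

```perl
use strict;
use warnings;

# Portable spelling of "$res //= 'inconsistent'" for Perl before 5.10.
sub default_if_undef {
    my ($value, $default) = @_;
    return defined $value ? $value : $default;
}

my $res;    # undef, as when no column passes the threshold
$res = default_if_undef($res, 'inconsistent');
print "$res\n";

# Unlike ||, a defined but false value such as 0 is kept.
my $zero = default_if_undef(0, 'inconsistent');
print "$zero\n";
```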



------------------------------

Date: Fri, 5 Aug 2011 01:42:02 -0700
From: "ela" <ela@yantai.org>
Subject: Re: seeking advice on problem difficulty
Message-Id: <j1ei0p$l2p$1@ijustice.itsc.cuhk.edu.hk>


"Rainer Weikusat" <rweikusat@mssgmbh.com> wrote in message 
news:87sjphtfhy.fsf@sapphire.mobileactivedefense.com...
> Provided that all different 'column lists' associated with the same ID
> can be treated as if they were associated with different IDs, you
> could turn the map expression into something like
>
> map { map { $_->{$col} } @{$id{$_}}; } @{$grp{$grp}}

"map" seems to be doing the "cross-referencing" task that I have to 
accomplish. I have just learnt more complicated data structures a few months 
ago and would appreciate if you could explain what's doing here. So when an 
element from the $grp is extracted, it becomes the second "$_" by the 
outside map? and then the inner map extracts another array element from $id 
and this element becomes the first "$_"? Finally, is there a quick way to 
find the intersection between the $grp and $id hashes? I want to learn your 
elegant one-line map code but it cannot process the non-overlapping id's so 
I have to process them the other way.

> // is similar to || except that it tests for 'definedness' and not
> 'trueness': It will assign the string to the variable if its value is
> undef.

but this error "Search pattern not terminated at test.pl" occurs... 




------------------------------

Date: Fri, 5 Aug 2011 01:55:27 -0700
From: "ela" <ela@yantai.org>
Subject: Re: seeking advice on problem difficulty
Message-Id: <j1eipu$l9v$1@ijustice.itsc.cuhk.edu.hk>


"Ben Bacarisse" <ben.usenet@bsb.me.uk> wrote in message 
news:0.1974a0af27d9bc0648a0.20110804140923BST.87ty9xtjb0.fsf@bsb.me.uk...
>      my ($most_freq, %count) = ('', '' => 0);
I guess the above line is doing some sort of initialization of zero's, but 
why is ", " used? Making both $most_freq and %count separated by comma to be 
zero?

>      for my $item (@_) {
How does this @_ correspond to @{$column[$col]}[@group] below?

>          $most_freq = $item if ++$count{$item} > $count{$most_freq};
>      }
>      return ($most_freq, $count{$most_freq}/@_);
>  }
>
> This is passed a slice for a given column and group:
>
>  most_frequent(@{$column[$col]}[@group])
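
Both questions come down to list flattening, which the following sketch
tries to illustrate. show_args and the sample data are hypothetical and
exist only for this demonstration.

```perl
use strict;
use warnings;

# The right-hand side is one flat list of three items: '', '', 0.
# The scalar takes the first item; the hash takes the rest as key/value
# pairs, so %count starts with the single entry '' => 0.
my ($most_freq, %count) = ('', '' => 0);
print "most_freq is the empty string\n" if $most_freq eq '';
print "count for the empty string: $count{''}\n";

# Arguments of a sub call are flattened into @_ in the same way, so a
# slice passed to a sub arrives as its individual elements.
sub show_args { return scalar(@_) . " args: @_" }

my @column = ([qw(subC4 subC3 subC3 subC7)]);
my @group  = (0, 2);    # zero-based row positions
my $shown  = show_args(@{ $column[0] }[@group]);
print "$shown\n";
```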





------------------------------

Date: Thu, 04 Aug 2011 11:13:28 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
To: "ela" <ela@yantai.org>
Subject: Re: seeking advice on problem difficulty
Message-Id: <86aabpavuf.fsf@red.stonehenge.com>

>>>>> "ela" == ela  <ela@yantai.org> writes:

ela> but this error "Search pattern not terminated at test.pl"
ela> occurs... 

Upgrade your Perl to 5.10 or later.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


------------------------------

Date: Wed, 03 Aug 2011 18:13:36 -0600
From: robin <r@thevoid1.net>
Subject: the super hidden
Message-Id: <j1co7m$n8k$1@speranza.aioe.org>

truth can be used to stop wars

-hydra


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3464
***************************************
