[31776] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 3039 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jul 22 03:09:23 2010

Date: Thu, 22 Jul 2010 00:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 22 Jul 2010     Volume: 11 Number: 3039

Today's topics:
    Re: data structures/query question <m@rtij.nl.invlalid>
    Re: data structures/query question <m@rtij.nl.invlalid>
        Extracting table in html page <mulshankar@gmail.com>
    Re: Extracting table in html page <jimsgibson@gmail.com>
    Re: Extracting table in html page sln@netherlands.com
    Re: Extracting table in html page <netnews@invalid.com>
    Re: Extracting table in html page <sopan.shewale@gmail.com>
    Re: Is Mason dead? Was: MasonCompRoot and globs <tw@dionic.net>
        Mason - is this sensible? <tw@dionic.net>
    Re: Need help/advice to improve script sln@netherlands.com
    Re: Need help/advice to improve script sln@netherlands.com
    Re: Need help/advice to improve script sln@netherlands.com
    Re: Need help/advice to improve script <sopan.shewale@gmail.com>
    Re: Need help/advice to improve script <uri@StemSystems.com>
    Re: Stress testing process start <nospam-abuse@ilyaz.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 21 Jul 2010 22:13:35 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: data structures/query question
Message-Id: <f01kh7-i09.ln1@news.rtij.nl>

On Wed, 21 Jul 2010 07:33:52 -0700, ccc31807 wrote:

> I've solved this twice, but still am not happy with my solution. I can
> post working code that meets the requirements, and I will do so. But
> first, I'd like to listen to the wisdom of this august body.
> 
> I have reproduced a sample of the data file below. The top row consists
> of position codes. The left column consists of function codes. The cells
> consists of the number of people who have the position identified in the
> top row that perform the function identified in the left column.

From counting columns, I suspect that fID is the column name for the 
function codes and not a real position.

> 
> The requirement is: return a list of all position codes where at least
> one person with that position performs all the functions in the query. A
> query can look like this:
> ----------------query (in.txt)---------------- 53ProcStuR 15DataEntr
> 54ProspStu 02AcadAdvUg 48PrtcipOr 68Trnscrpt 69TrnfrCrd 27GrdProcA
> 
> Here is a sample run:
> ------------output-----------------
> jobs1>perl crunch_data_2.plx
> Enter the name of your data file (comma separated values): Consol.csv
> Enter the name of the functions file (space separated values): in.txt
> 
> The functions you specified are: 53ProcStuR 15DataEntr 54ProspStu
> 02AcadAdvUg 48PrtcipOr 68Trnscrpt 69Trn frCrd 27GrdProcA
> 
> The positions that matched: fID 1BDD 1ADSS 1SC 1E 1SD 1ASD 1AA 1SSS 1ASC
> 1DSS
> ------------end output---------------------

Your output does not match your sample data. F.i. there is no position 
code 1E.

> 
> As I said, I can and will post working code that meets the requirements,
> but I don't want to poison the well before hand. I would appreciate very
> much your best idea of how to do this.

OK, first note that we can skip any line that starts with a function code 
we're not interested in. After that, we just count the number of non-zero 
functions per position and print the ones that have the same number as 
the query.

M4


#!/usr/bin/perl

use strict;
use warnings;

# I/O simplyfied
# Error handling omitted

my @in = qw/53ProcStuR 15DataEntr 54ProspStu 02AcadAdvUg 48PrtcipOr 
68Trnscrpt 69TrnfrCrd 27GrdProcA/;

my $line = <DATA>;
chomp $line;
my @positioncodes = split /,/, $line;
shift @positioncodes; # skip fID column

my %positions;
while ($line = <DATA>) {
    my ($function, @people) = split /,/, $line;
    next unless grep { $_ eq $function } @in; # skip uninteresting
    for my $pos (@positioncodes) {
	my $t = shift @people;
	$positions{$pos}++ if $t;
    }
}

my @matches;
while (my ($position, $nr) = each %positions) {
#    print "$position=$nr\n";
    push @matches, $position if ($nr == @in);
}

print "The positions that matched: @matches\n";

exit

__DATA__
fID,1AA,1AAD,1AD,1ADA,1ADB,1ADBFA,1ADBO,1ADFG,1ADFS,1ADMAS,1ADOF,1ADSS,1AE,1AF,1AS,1AS2,1ASC,1ASD,1ASDSS,1ASST,1ATRD,1ATTD,1BD,1BDD,1BDR,1BDS,1BM,1CA,1CLK1,1CLK3
15DataEntr,31,1,1,1,1,1,1,,,1,1,5,1,1,1,1,4,6,1,1,1,1,1,3,1,1,,1,1,1
54ProspStu,31,,1,1,,,,,,1,1,5,1,,,,4,5,1,1,,,1,6,1,,,,1,1
03ClericSup,23,,,1,1,1,1,,1,1,,3,1,1,1,1,2,4,,1,1,1,,1,1,1,,1,1,1
48PrtcipOr,27,,1,1,,1,,,1,1,1,4,1,1,,1,3,5,1,,1,,,4,,,,,,
53ProcStuR,36,,1,1,,,,,,1,1,3,1,1,,,4,3,,,,,1,2,1,,,,,
42MPlnRcrt,12,1,,,,,,1,,,,3,,,,,1,5,,1,1,,1,7,1,1,,,,1
52PrepStuF,22,,,1,1,,,,,1,1,2,,1,,1,3,4,,,,,,1,,,,,,
55RcvScanD,13,,,,,1,,,,1,,,,1,,1,1,2,,1,1,1,,,,,,1,1,1
24FileDevM,15,,,1,1,1,,,,1,1,4,,1,1,1,2,3,1,1,,,,3,,,,1,1,1
56RecepDut,12,,,,,,,,,1,,,,1,,,1,2,,1,1,1,,,,,,1,1,1
65TestProc,18,,,1,,,,1,,,1,3,1,,,,4,2,,,1,,,2,,,,,,1
04AdmssProc,18,,,1,,,,,,1,1,3,1,1,,,3,3,,,,,,2,,,,,1,
02AcadAdvUg,30,,,1,,,,1,,,1,4,1,1,,,4,3,,,,,,2,,,,,,
27GrdProcA,23,1,,1,1,,1,,1,1,,6,,1,1,,2,4,1,1,,,,2,1,,,,,
09CommInvPR,8,1,1,,,,,1,,1,,2,,,,,2,4,1,1,,,1,7,1,1,,,,1
17RetenStr,7,1,1,1,,,,,,1,,4,,,,,2,6,1,,,,1,6,1,1,,,,
16RecrtStr,7,1,,,,,,1,,1,,1,1,,,,1,5,,,,,1,7,1,1,,,,
18DocImagi,9,,,1,,,,,,,1,2,,1,,1,2,3,,1,,,,,,,,1,1,
68Trnscrpt,27,,,,,,,,,,1,4,1,,,,2,3,,,,,1,1,1,,,,,
57RecrdsMa,8,,,1,1,1,,,,1,1,4,,1,,1,2,3,1,,,,,1,,,,,,
01AcadAdvGr,23,,,1,,,,1,,1,,6,,,,,3,3,,,,,,1,,,,,,
07ClassCoor,5,,1,1,,,,,,,1,3,,,,,1,6,,1,,,,,,,,,,1
58ReprtPre,9,,,,1,1,1,,,1,,1,,1,,1,1,2,1,,,1,,2,,,,1,,
28GrdProce,17,,1,1,,1,,,,,1,6,,,,1,3,3,1,,,,,,,,,,,
45SupEqpPu,1,,,,1,1,1,,1,,1,,,1,1,1,2,2,,,,1,,,,1,,,,
49EndTrmPr,20,,1,1,,,,,,,1,4,,,,,3,3,,,,,,,,,,,,
20EnrLeadM,3,1,,,,,,1,,1,,,,,,,1,3,1,,,,1,6,1,1,,,,
33MailRoom,1,,,,,1,,,1,,,,,1,1,1,1,2,,1,,1,,,,1,,1,,1
51ClassSch,4,1,,1,,,,,,,,2,1,,,,2,2,,,1,,,,,,,,,
29ImplPoli,1,1,1,,,1,2,1,,1,1,5,,,,,1,3,1,,,,,2,,,,,,
35MktgBusR,4,1,,,,,,,,1,,1,,,,,1,2,,,,,,6,,1,,,,
69TrnfrCrd,26,,,,,,,,,,1,3,1,,,,2,1,,,,,,1,,,,,,
13StuSvcsT,7,,1,1,,1,,,,,1,8,1,,,,2,3,1,,,,,1,,,,,,
70TrblshtS,7,,,,1,1,,,1,1,1,1,,1,1,1,3,2,,,,,,1,,,,,,
06BldgMonSe,3,1,,,,1,,,,,1,,1,,,,1,4,,,,,,,,,2,,,1
10HRDocs,,1,,,1,1,2,,1,,1,3,,,,,1,1,1,,,1,,,,,,,,
47OvrseeSt,5,,1,1,,1,,1,,,1,4,1,,,,1,2,1,,,,,,,,,,,
11ComplRpt,2,1,,,1,1,1,,,1,,,,1,,1,,2,,,1,,,1,,,,1,,
62StaffSup,,1,,,,1,1,,1,,1,3,,,,,1,1,,,,,,2,,,,,,


------------------------------

Date: Wed, 21 Jul 2010 22:51:30 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: data structures/query question
Message-Id: <i73kh7-i09.ln1@news.rtij.nl>

On Wed, 21 Jul 2010 22:13:35 +0200, Martijn Lievaart wrote:

Here's a variant. Start out with all positions in a set, delete those 
positions from the set where there are no people in a function we're 
interested in. Print what is left.

#!/usr/bin/perl

use strict;
use warnings;

# I/O simplyfied
# Error handling omitted

my @in = qw/53ProcStuR 15DataEntr 54ProspStu 02AcadAdvUg 48PrtcipOr 
68Trnscrpt 69TrnfrCrd 27GrdProcA/;

my $line = <DATA>;
chomp $line;
my @positioncodes = split /,/, $line;
shift @positioncodes;

my %positions = map { ($_, 1) } @positioncodes;

while ($line = <DATA>) {
    my ($function, @people) = split /,/, $line;
    next unless grep { $_ eq $function } @in; # skip uninteresting
    for my $pos (@positioncodes) {
	my $t = shift @people;
	delete $positions{$pos} unless $t;
    }
}

my @matches = keys %positions;

print "The positions that matched: @matches\n";

exit

__DATA__

(same as previous)


------------------------------

Date: Wed, 21 Jul 2010 15:21:33 -0700 (PDT)
From: shankar_perl_rookie <mulshankar@gmail.com>
Subject: Extracting table in html page
Message-Id: <233f66ab-b5eb-449d-b3b0-b2542b8dbe31@i31g2000yqm.googlegroups.com>

Hello All,

I have an html file where I am trying to extract a table. The problem
I am facing is there are lot of tables in the page and the table I am
looking to extract appears after a particular string say $some_text. I
know of a way that I can search for the string in the html page but
what I want to do is capture a table that immediately follows the
$some_text.

Any suggestions on how to do this ??

Thanks,
Shankar


------------------------------

Date: Wed, 21 Jul 2010 16:08:17 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: Extracting table in html page
Message-Id: <210720101608173447%jimsgibson@gmail.com>

In article
<233f66ab-b5eb-449d-b3b0-b2542b8dbe31@i31g2000yqm.googlegroups.com>,
shankar_perl_rookie <mulshankar@gmail.com> wrote:

> Hello All,
> 
> I have an html file where I am trying to extract a table. The problem
> I am facing is there are lot of tables in the page and the table I am
> looking to extract appears after a particular string say $some_text. I
> know of a way that I can search for the string in the html page but
> what I want to do is capture a table that immediately follows the
> $some_text.
> 
> Any suggestions on how to do this ??

The most reliable way would be to use the HTML::Parser module to parse
the html file, register appropriate handlers for the table elements
(<table>, <tr>, <td>) and one for text elements, look for your string,
and process the next table encountered in a callback (handler
subroutines are called as callbacks by the parsing method).

Another way would be to use a module to extract tables from HTML. There
are at least two on CPAN: HTML::TableExtract and HTML::TableParser. The
problem using these is to find the table after the specified text. Is
there some other way of identifying the table?

The quick and dirty way is to use a regular expression (untested):

if( $html =~ m{ $some_text .*? <table> (.*?) </table> }isx ) {
  # table contents in $1
}

However, this will not always work. It fails if you have nested tables,
for example, which is a common occurrence in some HTML. However, if you
are in a hurry it might work for you. It is always better to use a real
parser for HTML.

-- 
Jim Gibson


------------------------------

Date: Wed, 21 Jul 2010 18:01:23 -0700
From: sln@netherlands.com
Subject: Re: Extracting table in html page
Message-Id: <sf5f46dlg4kvfrq30vpq34jmemp69bv363@4ax.com>

On Wed, 21 Jul 2010 16:08:17 -0700, Jim Gibson <jimsgibson@gmail.com> wrote:

>In article
><233f66ab-b5eb-449d-b3b0-b2542b8dbe31@i31g2000yqm.googlegroups.com>,
>shankar_perl_rookie <mulshankar@gmail.com> wrote:
>
[snip]

>The quick and dirty way is to use a regular expression (untested):
>
>if( $html =~ m{ $some_text .*? <table> (.*?) </table> }isx ) {
>  # table contents in $1
>}
>
>However, this will not always work. It fails if you have nested tables,
>for example, which is a common occurrence in some HTML. However, if you
>are in a hurry it might work for you. It is always better to use a real
>parser for HTML.

Its ALWAYS trivial to parse a markup language's markup.
ie: parse out tags(open|close)/attributes and content.
Creating an element tree (document) with HTML is another
process altogether. Xhtml/Xml, not so bad, sgml er ..

I always laugh when people say a 'real parser for HTML' because they
don't know what thier saying, instead, just parroting phrases from
so called God's, then passing them along.
As if a SAX parser does nothing more than a realtime parse on a stream,
ie: a markup parse. Easily done by regular expressions.

Oh, and before anybody starts that "regular language" crap, they better
be able to explain what the "can't" part means!

-sln


------------------------------

Date: Wed, 21 Jul 2010 18:06:25 -0700
From: HASM <netnews@invalid.com>
Subject: Re: Extracting table in html page
Message-Id: <m3sk3cnx4e.fsf@127.0.0.1>

Jim Gibson <jimsgibson@gmail.com> writes:

>> I have an html file where I am trying to extract a table. The problem
>> I am facing is there are lot of tables in the page and the table I am
>> looking to extract appears after a particular string say $some_text. 

> The most reliable way would be to use the HTML::Parser module to parse
> the html file, 

Or HTML::TreeBuilder;

use HTML::TreeBuilder;
use LWP::UserAgent;
my $url = 'http://www.example.com/...";
my $browser = LWP::UserAgent->new;
my $response = $browser->request (HTTP::Request->new(GET => $url));
if ($response->is_success) {
  my $tree = HTML::TreeBuilder->new;
  my $content =
    $tree->parse_content($response->decoded_content);
  # search for text with look_down (there are other way)
  my $text = $content->look_down (...)
  # then for your table
  my $table = $content->look_down ('_tag', 'table', ...)

etc,

-- HASM


------------------------------

Date: Wed, 21 Jul 2010 22:22:33 -0700 (PDT)
From: "sopan.shewale@gmail.com" <sopan.shewale@gmail.com>
Subject: Re: Extracting table in html page
Message-Id: <ea454385-76f7-466b-9e53-ae786513abcc@m35g2000prn.googlegroups.com>

The best way can be:
use split on $some_text and throw away the first part.
my ($junk, $interest_html) =3D split (/$some_text/, $html);

on $interest_html - use HTML::TreeBuilder to parse the tables.
grab the first table - you are done.

Let me know if you find difficult to use HTML::TreeBuilder.

--sopan shewale



On Jul 22, 3:21=A0am, shankar_perl_rookie <mulshan...@gmail.com> wrote:
> Hello All,
>
> I have an html file where I am trying to extract a table. The problem
> I am facing is there are lot of tables in the page and the table I am
> looking to extract appears after a particular string say $some_text. I
> know of a way that I can search for the string in the html page but
> what I want to do is capture a table that immediately follows the
> $some_text.
>
> Any suggestions on how to do this ??
>
> Thanks,
> Shankar



------------------------------

Date: Wed, 21 Jul 2010 19:11:49 +0100
From: Tim Watts <tw@dionic.net>
Subject: Re: Is Mason dead? Was: MasonCompRoot and globs
Message-Id: <i27d96$n0t$1@news.eternal-september.org>

J. Gleixner <glex_no-spam@qwest-spam-no.invalid>
  wibbled on Friday 16 July 2010 17:09

> Maybe this will help you:
> 
> http://masonhq.com/?Component:Recurser
> 

Hi,

Yes I found that most useful.

I rehacked a version that will do what I want (wit due credits to your 
source in the comments which are snipped here for brevity).

Mine bases its search path relative to the request component as that's the 
canonical search path for my application.

eg if I want a component called '_mason/wibble.mas' and the request uri was 
/a/b/c/doobry.html

I want it to search 

/a/b/c/_mason/wibble.mas
/a/b/_mason/wibble.mas
/a/_mason/wibble.mas
/_mason/wibble.mas

I kept the reverse, exec and collectall functionality as I could see that 
being useful and the cost is minor (complexity and speed).

=========

<%perl>
	# Cleanup the path
	$name =~ s#/{2,}#/#g;

	my @sdirs = split('/', $m->request_comp->dir_path);
	my @paths = ();

	while (@sdirs) 
	{
		my $path = join('/', @sdirs) . '/';
		if ($m->comp_exists($path . $name)) 
		{
			push @paths,$path . $name;
			last if !$all;
		}
		pop @sdirs;
	}
	@paths = reverse @paths if $reverse;
	if ($exec) 
	{ 
		foreach my $comp (@paths) 
		{ 
			$m->comp($comp); 
		}
	}
	else 
	{ 
		scalar @paths > 1 ? return \@paths : return $path[0] 
	}
	print join(':', @paths);
</%perl>

<%args>
	$name
	$exec => 1
	$all => 0
	$reverse => 0
</%args>

========================

Thanks again.

-- 
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.



------------------------------

Date: Wed, 21 Jul 2010 19:25:25 +0100
From: Tim Watts <tw@dionic.net>
Subject: Mason - is this sensible?
Message-Id: <i27e2m$34h$1@news.eternal-september.org>

(Also posted to mason-users)

Hi,

OK - scenario:

I want .html files to be served as complete content, and .page files to be 
wrapped with the HTML head stuff, a menu and a footer. The idea is that most 
of my content will be .page files, wrapped with a constant set of stuff. But 
I can stick a .html file in and know that it will not be wrapped, should I 
need to...

ie wibble.page is only the text content of a single page without any common 
stuff, tedious head/html tags and friends etc...

I have a default handler with this in (gets called if the target URI cannot 
be resolved by apache):

====================
<%perl>
	# Handle permanant redirects
	$m->redirect('/venuespace.html', 301) if $r->uri() eq 
'/villagehall.html';

	# OK - there is no target file for the URI
	my $rcompname = $r->uri();
	
	# Try for a .page component
	$rcompname =~ s/\.html$/\.page/;
	if ($m->comp_exists($rcompname))
	{
		$m->subexec($rcompname);
		return;
	}
	$m->abort(404); # Not found
</%perl>

<%attr>
	html_extra_headers 	=> '<!-- Wibble -->'
	pagetitle 		=> 'Robertsbridge'
	footercomment		=> '&copy; 2010'
	navid			=> ''
</%attr>
====================

So far, for a target exisiting .html, .cgi, .pl etc file, this does not get 
called.

If the client asks for /index.html but index.page exists, this subexecs 
/index.page after checking for redirects, and if it fails, aborts with a 
404.

OK - now to handle the wrapping of index.page:

We have an autohander thus
===================
<%perl>
	# If the request is for a .page component, remapped by default.mas
	if ( $m->request_comp->name() =~ m/\.page$/ )
	{
		$m->comp('/_mason/page.mas');
		return;
	}
	# Otherwise, run with the original request
	else
	{
		$m->call_next;
	}
</%perl>
===================


Just for completeness, page.mas looks roughly like, with lots of snippage 
for brevity:

===================
 ...
<html xmlns="http://www.w3.org/1999/xhtml">
 ...
<head>
    <title>
            <% $m->request_comp->attr('pagetitle') %>
    </title>

 ...
    <% $m->request_comp->attr('html_extra_headers') %>

 ...

</head>
 ...
<body id="<% $m->request_comp->attr('navid') %>">


 ...
        <div id="headertext">
            <h1><% $m->request_comp->attr('pagetitle') %></h1>
        </div> <!--headertext-->
 ...
# Here we incorporate the index.page component: # 
	<% $m->call_next %>


            <& /_mason/main_menu.mas &>
 ...


<p id="footer-author"><% $m->request_comp->attr('footercomment') %></p>

 ...

</body>
</html>
===================

Obviously there's loads of div blocks and other stuff in there, but you see 
the general execution flow...


Does that look reasonably sane? If so, it means I "get" Mason. If not, 
well.. But I can say that it seems to function in reality.

Thanks for any comments :)

Cheers

Tim

-- 
Tim Watts

Managers, politicians and environmentalists: Nature's carbon buffer.



------------------------------

Date: Wed, 21 Jul 2010 16:30:43 -0700
From: sln@netherlands.com
Subject: Re: Need help/advice to improve script
Message-Id: <7l0f46dnhr2kbc4cbapml57lbga5nuta2q@4ax.com>

On Wed, 21 Jul 2010 02:17:20 -0700 (PDT), "sopan.shewale@gmail.com" <sopan.shewale@gmail.com> wrote:

>Hello,
>
>I have Apache htpasswd format password store. The fields are ":"
>separated and has more fields than standard format. I have script to
>read the fields and load into Hash.
>
>I need advice/help to improve the script. The current one works-but
>not really good script.  The script goes as follow:
>--------------------------------
>#!/usr/bin/perl
>use strict;
>use warnings;
>
>my $data = {};
>
>for (<DATA>) {
>
>    my $line = $_;
>    if ( defined $line ) {
>        if (   $line =~ /^(.*?):(.*?):(.*?):(.*?):(.*?)(?::(.*))?$/
>            || $line =~ /^(.*?):(.*?):(.*?):(.*?)(?::(.*))?$/
>            || $line =~ /^(.*?):(.*?):(.*?)(?::(.*))?$/
>            || $line =~ /^(.*?):(.*?)(?::(.*))?$/ )
>        {
>            $data->{$1}->{pass} = $2;
>            $data->{$1}->{emails} = $3 || '';
>            $data->{$1}->{flag} =
>              ( ( defined $4 ) && ( $4 == 0 ) ) ? 0 : ( $4 || '' );
>            $data->{$1}->{pass_change} = $5 || '';
>            $data->{$1}->{flag_change} = $6 || '';
>        }
>    }
>
>}
>
>use Data::Dumper;
>print Data::Dumper->Dump( [$data] );
>
>__DATA__
>AllPresent:hjliEO35kCgwI:all@example.com:1:23232:24324
>LastMissing:CyL92g3OKi.jM:lastmissing@example.com:1:2323
>FlagZero:CyL92g3OKi.jM:flagzero@example.com:0:23232
>OnlyFlag:ZuqpLZ7AxHBvw:onlyflag@example.com:0
>RestMissing:ZuqpLZ7AxHBvw:missing@example.com
>
>-------------------------
>Please note - only first two fields are must fields, rest may be
>missing from lines.  first is - username, second is encrypted
>password, third is email, fourth is flag to force user to change
>password(0-pass change is not required, 1-user must change pass),
>fifth- the last time password changed epoch time, sixth- time of
>change of flag (some times flag is changed by admins of the system).
>
>thanks in advance

use strict;
use warnings;

my $data = {};

for (<DATA>) {

# ^^ Why for ?

    my $line = $_;
    if ( defined $line ) {
        if (
#               $line =~ /^ (.*?) : (.*?) : (.*?)   : (.*?) : (.*?)  (?:  : (.*) ) ? $ /x
#            || $line =~ /^ (.*?) : (.*?) : (.*?)   : (.*?)          (?:  : (.*) ) ? $ /x
#            || $line =~ /^ (.*?) : (.*?) : (.*?)                    (?:  : (.*) ) ? $ /x
#            || $line =~ /^ (.*?) : (.*?)                            (?:  : (.*) ) ? $ /x )

               # ^^ Replace with this ->
                $line =~ /^ (.*?) :(.*?) (?::(.*?))? (?::(.*?))? (?::(.*?))? (?::(.*))? $ /x

               # Here, be more proactive, expect certain class of values with a length
               # Should 'if()' fail, send the line to a log file, don't clutter the hash

        )
        {
           # Validate more stuff, at least check that $1 is > ''

            $data->{$1}->{pass} = $2;
                #   ^^ could be '' then key is ''

            $data->{$1}->{emails} = $3 || '';
            $data->{$1}->{flag} =
              ( ( defined $4 ) && ( $4 == 0 ) ) ? 0 : ( $4 || '' );
                 #                     ^^ What if $4 is alpha-num ? Throws a warning
                 # Consolidate the above logic !

            $data->{$1}->{pass_change} = $5 || '';
            $data->{$1}->{flag_change} = $6 || '';
        }
    }

}

### ^^^^
### Would 'split()' be more affective ??


use Data::Dumper;
print Data::Dumper->Dump( [$data] );

__DATA__
AllPresent:hjliEO35kCgwI:all@example.com:1:23232:24324
LastMissing:CyL92g3OKi.jM:lastmissing@example.com:1:2323
FlagZero:CyL92g3OKi.jM:flagzero@example.com:0:23232
OnlyFlag:ZuqpLZ7AxHBvw:onlyflag@example.com:0
RestMissing:ZuqpLZ7AxHBvw:missing@example.com
:::asdf:

-sln


------------------------------

Date: Wed, 21 Jul 2010 16:35:42 -0700
From: sln@netherlands.com
Subject: Re: Need help/advice to improve script
Message-Id: <6v0f46lfupqo5b2enedjtt4mspd59142ao@4ax.com>

On Wed, 21 Jul 2010 16:30:43 -0700, sln@netherlands.com wrote:

>On Wed, 21 Jul 2010 02:17:20 -0700 (PDT), "sopan.shewale@gmail.com" <sopan.shewale@gmail.com> wrote:
>
>>Hello,
>>
>>I have Apache htpasswd format password store. The fields are ":"
>>separated and has more fields than standard format. I have script to
>>read the fields and load into Hash.
>>
>>I need advice/help to improve the script. The current one works-but
>>not really good script.  The script goes as follow:
>>--------------------------------
>>#!/usr/bin/perl
>>use strict;
>>use warnings;
>>
>>my $data = {};
>>
>>for (<DATA>) {
>>
>>    my $line = $_;
>>    if ( defined $line ) {
>>        if (   $line =~ /^(.*?):(.*?):(.*?):(.*?):(.*?)(?::(.*))?$/
>>            || $line =~ /^(.*?):(.*?):(.*?):(.*?)(?::(.*))?$/
>>            || $line =~ /^(.*?):(.*?):(.*?)(?::(.*))?$/
>>            || $line =~ /^(.*?):(.*?)(?::(.*))?$/ )
>>        {
>>            $data->{$1}->{pass} = $2;
>>            $data->{$1}->{emails} = $3 || '';
>>            $data->{$1}->{flag} =
>>              ( ( defined $4 ) && ( $4 == 0 ) ) ? 0 : ( $4 || '' );
>>            $data->{$1}->{pass_change} = $5 || '';
>>            $data->{$1}->{flag_change} = $6 || '';
>>        }
>>    }
>>
>>}
>>
>>use Data::Dumper;
>>print Data::Dumper->Dump( [$data] );
>>
>>__DATA__
>>AllPresent:hjliEO35kCgwI:all@example.com:1:23232:24324
>>LastMissing:CyL92g3OKi.jM:lastmissing@example.com:1:2323
>>FlagZero:CyL92g3OKi.jM:flagzero@example.com:0:23232
>>OnlyFlag:ZuqpLZ7AxHBvw:onlyflag@example.com:0
>>RestMissing:ZuqpLZ7AxHBvw:missing@example.com
>>
>>-------------------------
>>Please note - only first two fields are must fields, rest may be
>>missing from lines.  first is - username, second is encrypted
>>password, third is email, fourth is flag to force user to change
>>password(0-pass change is not required, 1-user must change pass),
>>fifth- the last time password changed epoch time, sixth- time of
>>change of flag (some times flag is changed by admins of the system).
>>
>>thanks in advance
>
>use strict;
>use warnings;
>
>my $data = {};
>
>for (<DATA>) {
>
[snip]

Oh, and the array primary keys are suseptible to bein overwritten as your
list processes.

-sln


------------------------------

Date: Wed, 21 Jul 2010 16:51:42 -0700
From: sln@netherlands.com
Subject: Re: Need help/advice to improve script
Message-Id: <1t1f46hgse4d6r1ec1mt8i56n2q4qvkjg2@4ax.com>

On Wed, 21 Jul 2010 19:54:36 +0200, "Dr.Ruud" <rvtol+usenet@xs4all.nl> wrote:

>Peter J. Holzer wrote:
>
>> But “empty” (you mean “undef”) isn't the same as a reference to an empty
>> hash. It makes no difference in this program (the anonymous hash is
>> autovivified at the first access), but an explicit initialization like
>> this:
>> 
>>  * Serves as a reminder that this variable is intended to be used as a 
>>    hashref (and not as an arrayref, string, number, or whatever)
>>  * May catch some usage errors 
>>  * May prevent autovivification at the wrong place. For example,
>>     my $data; func($data);
>>    is not the same as
>>     my $data = {}; func($data);
>>    if func() accesses $data->{something}.
>> 
>> So it's sometimes necessary, often useful and otherwise harmless. 
>
>I also, and pretty often, use it in this way:
>
>     my $dfbb = $data{ foo }{ bar }{ baz } ||= {};

If I have to spend more than 3 seconds understanding a line of
Perl that isin't a regular expression, I pass it by.

-sln


------------------------------

Date: Wed, 21 Jul 2010 22:08:22 -0700 (PDT)
From: "sopan.shewale@gmail.com" <sopan.shewale@gmail.com>
Subject: Re: Need help/advice to improve script
Message-Id: <c7835f7d-47d5-4d56-b08c-515dc5a3f77e@m35g2000prn.googlegroups.com>

Appreciate every one for responding to this question.

The solution with split is also good idea. I am sticking to following
solution.

------------------------

#!/usr/bin/perl
use strict;
use warnings;

my $data = {};

for (<DATA>) {
    chomp;
    my $line = $_;

    if ( defined $line ) {
        if ( $line =~ /^([^:]*):([^:]*):([^:]*)(?::([^:]*)(?::([^:]*)
(?::([^:]*)(?::(.*))?)?)?)?$/ )
 {

            $data->{$1}->{pass} = $2;
               #   ^^ $1 will never be '' or undef...
               #   in that case there is issue with password line
            $data->{$1}->{emails} = $3 || '';
            $data->{$1}->{flag} =
              ( ( defined $4 ) && ( $4 == 0 ) ) ? 0 : ( $4 || '' );
               #                we are assumming $4 as 0 or 1
            $data->{$1}->{pass_change} = $5 || '';
            $data->{$1}->{flag_change} = $6 || '';
        }
    }

}

use Data::Dumper;
print Data::Dumper->Dump( [$data] );

__DATA__
AllPresent:hjliEO35kCgwI:all@example.com:1:23232:24324
LastMissing:CyL92g3OKi.jM:lastmissing@example.com:1:2323
FlagZero:CyL92g3OKi.jM:flagzero@example.com:0:23232
OnlyFlag:ZuqpLZ7AxHBvw:onlyflag@example.com:0
RestMissing:ZuqpLZ7AxHBvw:missing@example.com
-----------------------------------------------------------

I got the help on regex from Peter Thoeny (http://twiki.org)
The part of this code is checked-in into TWiki repository
thanks every one for helping





------------------------------

Date: Thu, 22 Jul 2010 01:43:05 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Need help/advice to improve script
Message-Id: <8739vcm5qu.fsf@quad.sysarch.com>

>>>>> "ssc" == sopan shewale@gmail com <sopan.shewale@gmail.com> writes:

  ssc> Appreciate every one for responding to this question.
  ssc> The solution with split is also good idea. I am sticking to following
  ssc> solution.

why would you stick with this complex solution over the much simpler and
faster split?

  ssc>     if ( defined $line ) {
  ssc>         if ( $line =~ /^([^:]*):([^:]*):([^:]*)(?::([^:]*)(?::([^:]*)
  ssc> (?::([^:]*)(?::(.*))?)?)?)?$/ )

ok, here is a test. explain what that line is doing in 30 words or
less. if you can't explain a single line of code like that, it is overly
complex code. note that the ? and : chars both play two different roles
(and many dupes of them). that is NOT a good solution. just my strong
opinion.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Wed, 21 Jul 2010 22:42:48 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Stress testing process start
Message-Id: <slrni4etv8.1aq.nospam-abuse@powdermilk.math.berkeley.edu>

On 2010-07-21, QoS <email@invalid.net> wrote:
> On Windows XP (This is perl, v5.8.8 built for MSWin32-x86-multi-thread):
>
> List form of pipe open not implemented at C:\Documents and Settings\Admin\Deskto
> p\tst.pl line 23.

You mean that

      open GRANDKID, q(-|), "$^X -wl tmp_script.pl 2>&1" or die 12;

is not working?!!  This is not even a "List form", it is just 3-arg form...
I specially spent a lot of effort to not use the list form to support
inferior ports...

So: Thanks, I converted 3-arg form to 2-arg form, and updated the script.

Yours,
Ilya



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3039
***************************************


home help back first fref pref prev next nref lref last post