[32111] in Perl-Users-Digest


Perl-Users Digest, Issue: 3376 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu May 5 18:09:25 2011

Date: Thu, 5 May 2011 15:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 5 May 2011     Volume: 11 Number: 3376

Today's topics:
    Re: FAQ flood MUST end <john@castleamber.com>
        Moving data from one machine to another <justin.1104@purestblue.com>
    Re: Moving data from one machine to another <cartercc@gmail.com>
    Re: Moving data from one machine to another <john@castleamber.com>
    Re: Sandboxing re <xhoster@gmail.com>
    Re: Sandboxing re <john@castleamber.com>
    Re: Sandboxing re <derykus@gmail.com>
    Re: Sandboxing re <cartercc@gmail.com>
    Re: UK postcode validation <bytebrothers.uk@gmail.com>
    Re: UK postcode validation sln@netherlands.com
    Re: UK postcode validation sln@netherlands.com
    Re: UK postcode validation (David Canzi)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 04 May 2011 20:32:52 -0500
From: John Bokma <john@castleamber.com>
Subject: Re: FAQ flood MUST end
Message-Id: <87ei4e2apn.fsf@castleamber.com>

Keith Thompson <kst-u@mib.org> writes:

> John Bokma <john@castleamber.com> writes:
> [...]
>> Maybe moving the outdated ones to a separate "outdated" FAQ. I do think
>> the variable suicide entry is still good to know about (I do now and
>> then help customers running on very antique Perl), but not to be
>> posted here.
> [...]
>
> I should have read that before posting my previous followup.
>
> Yes, it makes sense that some FAQs might be worth keeping, but not worth
> posting.  On the other hand, posting them here doesn't bother me (and
> the first sentence makes it clear that it applies only to old versions
> of Perl).

Hmmm, maybe flag them in the subject?

N - for new FAQ entry
U - for updated FAQ entry
O - for obsolete FAQ entry

That way I can score the new & updated ones high, and the obsolete ones
low.

-- 
John Bokma                                                               j3b

Blog: http://johnbokma.com/    Facebook: http://www.facebook.com/j.j.j.bokma
    Freelance Perl & Python Development: http://castleamber.com/


------------------------------

Date: Thu, 5 May 2011 16:49:22 +0100
From: Justin C <justin.1104@purestblue.com>
Subject: Moving data from one machine to another
Message-Id: <2hta98-ive.ln1@zem.masonsmusic.co.uk>

I have some SQL that extracts data from a database into a text file
(it's run by a cron job), the text file is then emailed to another
machine where the data will be used.

I did look at running the SQL from the final destination machine so that
I don't have to massage the data twice but I can't install the module
required (DBD::Unify) without a 'working environment' - which is on the
other machine.

So I'm exporting the data to perl sticking it into a hash, exporting
with Storable and emailing the file to another machine where procmail
hands the file off to another perl program which, with Storable,
retrieves the data... only it doesn't. If I print the exported data into
the email I get 'Corrupted storable file', if I use nstore I get 'Byte
order is not compatible at...'. And if I, instead, use MIME::Lite I
don't seem to get the same file out the other end that I put in...
probably down to MIME::Lite encoding the file - also, I'm not convinced
that I can get procmail to just give me the attachment, and not the
whole email.

I've contemplated Net::SCP, but the two users are different and
therefore the scp can't be done without authentication (no pre-shared
keys being different users).

This seems to be getting much more complicated than it need be, there
must be [another|a better] way. Any suggestions how I might better do
this? Am I over thinking this and I should just massage the data twice?

   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Thu, 5 May 2011 09:28:55 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Moving data from one machine to another
Message-Id: <e75bfce6-e1e5-4b17-b521-18e4832a6d47@f15g2000pro.googlegroups.com>

On May 5, 11:49 am, Justin C <justin.1...@purestblue.com> wrote:
> I have some SQL that extracts data from a database into a text file
> (it's run by a cron job), the text file is then emailed to another
> machine where the data will be used.

I do this a lot in my job, and my usual method is to use FTP (or
similar) to do the transfer. Now, I mostly use pscp (part of putty),
and it works without a hitch.

SERVER 1: Connect to the database, run the query, retrieve the data,
run a script to create an appropriate CSV file, and park the file on
the machine.

SERVER 2: Connect to SERVER 1, transfer the file using FTP or similar,
and run a script that (1) opens and reads the file, (2) processes the
data as necessary, (3) creates the appropriate documents (PDF, RTF,
CSV, etc.), (4) sends the emails with attached files, and (5) writes
to various logs. This also includes making the files available for
other processes to use, and FTPing some files back to a database
server for inserting the altered records in the database.

In my case, I have to merge and alter data in various ways, and it's
easier to do this on an ad hoc basis, rather than have one monolithic
script that does it all. I put the common code in a module, so I only
have to maintain one copy, and call those functions in the script
files that run automatically.

I have five scripts that run on a daily basis that do this, have run
for five or more years, and that have never failed except for hardware
or network problems.

CC.


------------------------------

Date: Thu, 05 May 2011 12:05:12 -0500
From: John Bokma <john@castleamber.com>
Subject: Re: Moving data from one machine to another
Message-Id: <87iptp3won.fsf@castleamber.com>

Justin C <justin.1104@purestblue.com> writes:

> I have some SQL that extracts data from a database into a text file
> (it's run by a cron job), the text file is then emailed to another
> machine where the data will be used.
>
> I did look at running the SQL from the final destination machine so that
> I don't have to massage the data twice but I can't install the module
> required (DBD::Unify) without a 'working environment' - which is on the
> other machine.

You mean a compiler? While most likely not recommended in general, just
install it on the working environment in a directory you specify (CPAN),
make a tarball of it, copy it to the other machine, and set PERL5LIB, or
use use lib '....';

> So I'm exporting the data to perl sticking it into a hash, exporting
> with Storable and emailing the file to another machine where procmail
> hands the file off to another perl program which, with Storable,
> retrieves the data... only it doesn't. If I print the exported data into
> the email I get 'Corrupted storable file', if I use nstore I get 'Byte
> order is not compatible at...'. And if I, instead, use MIME::Lite I
> don't seem to get the same file out the other end that I put in...
> probably down to MIME::Lite encoding the file - also, I'm not convinced
> that I can get procmail to just give me the attachment, and not the
> whole email.

Why on Earth are you using Storable? It's a text file... I would say:
gzip the data and email that, and then do the post-processing on the
other machine. If you want to "massage" the data on-site, why not store
it using JSON or YAML? It sounds like most of it is text data. You can
always compress the JSON/YAML output using gzip.
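
A minimal sketch of the JSON-plus-gzip route suggested above, assuming the
extracted rows already sit in a hash reference. JSON::PP and the
IO::Compress modules ship with Perl 5.14 and later; the helper names
pack_rows and unpack_rows and the sample data are made up for
illustration.

```perl
use strict;
use warnings;
use JSON::PP;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Sender side: serialise a hash reference to JSON, then gzip the text.
sub pack_rows {
    my ($data) = @_;
    my $json = JSON::PP->new->utf8->canonical->encode($data);
    my $packed;
    gzip(\$json => \$packed) or die "gzip failed: $GzipError";
    return $packed;
}

# Receiver side: gunzip, then decode the JSON back into a hash reference.
sub unpack_rows {
    my ($packed) = @_;
    my $json;
    gunzip(\$packed => \$json) or die "gunzip failed: $GunzipError";
    return JSON::PP->new->utf8->decode($json);
}

my $rows = { 1001 => { name => 'Widget', qty => 3 } };   # made-up sample data
my $copy = unpack_rows(pack_rows($rows));
print "$copy->{1001}{name} x $copy->{1001}{qty}\n";
```

The packed bytes are binary, so in an email they would need to travel as
a base64-encoded attachment rather than in the message body.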

MIME::Lite shouldn't be a problem. How do you decode the email on the
other side?

-- 
John Bokma                                                               j3b

Blog: http://johnbokma.com/    Facebook: http://www.facebook.com/j.j.j.bokma
    Freelance Perl & Python Development: http://castleamber.com/


------------------------------

Date: Wed, 04 May 2011 20:17:28 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: Sandboxing re
Message-Id: <4dc2171e$0$27407$ed362ca5@nr5-q3a.newsreader.com>

John Bokma wrote:
> I want to be able to allow users to enter regular expressions. Since
> Perl's default re are way too powerful (code execution), 

I thought that that was allowed only under "use re qw(eval);"

Xho


------------------------------

Date: Wed, 04 May 2011 22:36:59 -0500
From: John Bokma <john@castleamber.com>
Subject: Re: Sandboxing re
Message-Id: <87hb99n7hg.fsf@castleamber.com>

Xho Jingleheimerschmidt <xhoster@gmail.com> writes:

> John Bokma wrote:
>> I want to be able to allow users to enter regular expressions. Since
>> Perl's default re are way too powerful (code execution), 
>
> I thought that that was allowed only under "use re qw(eval);"

Thanks Xho, wasn't aware of that one (I still haven't learned all new
features).

If it's default off, are there any other possible unsafe constructs?

Thanks,

-- 
John Bokma                                                               j3b

Blog: http://johnbokma.com/    Facebook: http://www.facebook.com/j.j.j.bokma
    Freelance Perl & Python Development: http://castleamber.com/


------------------------------

Date: Thu, 5 May 2011 04:29:53 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Sandboxing re
Message-Id: <b7a23b4d-e126-4cf1-9ffe-b3f80ddb6770@k15g2000pri.googlegroups.com>

On May 4, 8:36 pm, John Bokma <j...@castleamber.com> wrote:
> Xho Jingleheimerschmidt <xhos...@gmail.com> writes:
> > John Bokma wrote:
> >> I want to be able to allow users to enter regular expressions. Since
> >> Perl's default re are way too powerful (code execution),
>
> > I thought that that was allowed only under "use re qw(eval);"
>
> Thanks Xho, wasn't aware of that one (I still haven't learned all new
> features).
>
> If it's default off, are there any other possible unsafe constructs?
>
>

although tagged experimental, the extended
"(?{ code })" pattern:


perl  -we '"f" =~/f(?{qx{rm somefile}})/'

* deletes in 5.10.1 and 5.12.2
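
The one-liner above has the code block written literally in the program
source. A pattern that arrives in a variable, as user input would, is a
different case: Perl refuses to compile an interpolated "(?{ ... })"
unless "use re 'eval'" is in effect. A minimal sketch, with
try_user_pattern as a hypothetical helper name:

```perl
use strict;
use warnings;

our $code_ran = 0;   # would be set only if the embedded code block ran

# Hypothetical helper: match $text against a pattern held in a variable.
# Returns 1 or 0 for match / no match, or the error text if Perl refused
# to compile the pattern.
sub try_user_pattern {
    my ($text, $user_pattern) = @_;
    my $matched = eval { $text =~ /$user_pattern/ ? 1 : 0 };
    return defined $matched ? $matched : "rejected: $@";
}

print try_user_pattern('f', 'f+'), "\n";
print try_user_pattern('f', 'f(?{ $main::code_ran = 1 })'), "\n";
print "code ran: $code_ran\n";
```

This only covers code execution; a user-supplied pattern can still
backtrack for a very long time on unlucky input.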

--
Charles DeRykus





------------------------------

Date: Thu, 5 May 2011 06:22:29 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Sandboxing re
Message-Id: <589a5094-11a7-49cc-a47e-3792899757c8@k15g2000pri.googlegroups.com>

On May 4, 4:41 pm, John Bokma <j...@castleamber.com> wrote:
> I want to be able to allow users to enter regular expressions. Since
> Perl's default re are way too powerful (code execution), is there some
> way to sandbox or limit the capabilities?

You have a similar condition when you accept user supplied SQL
statements for a database app. If you use taint mode (assuming use of
HTTP and port 80) you wash the user supplied command through a regular
expression.

I'm not sure how you would code an application to use a user supplied
regular expression as a command, and I don't really think you can,
again assuming use of HTTP and port 80. If you are using the REs to
match against text, I think you would have to engage in some pretty
fancy gymnastics to execute the user supplied values as code.

Do you mean, like this?

use strict;
use warnings;

print "Enter search string: ";
my $user_supplied = <STDIN>;
chomp $user_supplied;
while (<DATA>)
{
  print();
  do_something($_) if $_ =~ /$user_supplied/;
}

sub do_something
{
  my $val = shift;
  print qq(\nI say again: $val\n);
}


__DATA__
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
John Quincy Adams


------------------------------

Date: Thu, 5 May 2011 01:24:36 -0700 (PDT)
From: Keith <bytebrothers.uk@gmail.com>
Subject: Re: UK postcode validation
Message-Id: <1623d5a5-2916-4c8c-b4bf-f7838d94c06f@x1g2000yqb.googlegroups.com>

On May 4, 4:46 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
> >>>>> "K" == Keith  <bytebrothers...@gmail.com> writes:
>
>   K> I recently found this monster on Wikipedia, which validates almost all
>   K> UK postcodes, catching all invalid areas, and most invalid districts.
>   K> I'm just wondering whether this can be 'cleaned up' or made more
>   K> readable for maintainers.  I've had a go today, but it still looks a
>   K> mess.  Here's the regex:
<snip>
> some quick comments.
>
> use the /x modifier so you can write that over multiple lines and
> comment the parts. it will also allow you to show the nested groups
> better.

I'd missed that one.  This is much better - here's what my little test
script looks like now...

#=============================================
#!perl
use strict;
use warnings;

my $postcode_regex = '^(GIR 0AA)|
                      (((A[BL]|
                         B[ABDHLNRSTX]?|
                         C[ABFHMORTVW]|
                         D[ADEGHLNTY]|
                         E[HNX]?|
                         F[KY]|
                         G[LUY]?|
                         H[ADGPRSUX]|
                         I[GMPV]|
                         JE|
                         K[ATWY]|
                         L[ADELNSU]?|
                         M[EKL]?|
                         N[EGNPRW]?|
                         O[LX]|
                         P[AEHLOR]|
                         R[GHM]|
                         S[AEGKLMNOPRSTY]?|
                         T[ADFNQRSW]|
                         UB|
                         W[ADFNRSV]|
                         YO|
                         ZE)
                      [1-9]?[0-9]|
                      ((E|N|NW|SE|SW|W)1|
                      EC[1-4]|
                      WC[12])[A-HJKMNPR-Y]|
                      (SW|W)([2-9]|[1-9][0-9])|
                      EC[1-9][0-9])\
                      [0-9][ABD-HJLNP-UW-Z]{2})$';
while (<>)
{
  next if substr($_,0,1) eq '#';
  chomp;
  if (/$postcode_regex/ox) {
    print $_, 'is a valid postcode', "\n";
  }
  else {
    print $_, 'is a bad postcode', "\n";
  }
}
#=============================================


------------------------------

Date: Thu, 05 May 2011 09:05:46 -0700
From: sln@netherlands.com
Subject: Re: UK postcode validation
Message-Id: <a5i5s696sucbd50uq5isg69am6enfbicrq@4ax.com>

On Wed, 4 May 2011 08:37:08 -0700 (PDT), Keith <bytebrothers.uk@gmail.com> wrote:

>
>I recently found this monster on Wikipedia, which validates almost all
>UK postcodes, catching all invalid areas, and most invalid districts.
>I'm just wondering whether this can be 'cleaned up' or made more
>readable for maintainers.  I've had a go today, but it still looks a
>mess.  Here's the regex:
>
>'^(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|
>F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|
>N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|
>W[ADFNRSV]|YO|ZE)[1-9]?[0-9]|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-
>HJKMNPR-Y]|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]
>{2})$';
>
>I know the space after the initial 'GIR' screws the formatting, but
>that really is just one long regex.
>
>Anyone any good at these things?

I ran this through my cgrx program:

c:\temp>perl cgrx.pl itt.txt

'^(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|
--1---------234--------------------------------------------------------
F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|
N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|
W[ADFNRSV]|YO|ZE)[1-9]?[0-9]|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-
-----------------------------56-----------------------------------
HJKMNPR-Y]|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]
-----------7-----8-----------------------------------------------------
{2})$';

It found all the ugly capture grouping.

It is better to expand it as others have said with the /x modifier.
As for those embedded spaces, if using /x, throw in a [ ] for the space.
The proper way to FULLY format this regex is something like below.
This is individual taste though.
Note that capture groups are not even necessary (although I left them in),
use (?:) instead.

-sln

-------------

use strict;
use warnings;


if ("EC22 0AA" =~ /

^
  (GIR [ ] 0AA)                         #1
|
  (                                 #start 2
     (                                  #start 3
          (                                  #start 4
              A[BL]
            | B[ABDHLNRSTX]?
            | C[ABFHMORTVW]
            | D[ADEGHLNTY]
            | E[HNX]?
            | F[KY]
            | G[LUY]?
            | H[ADGPRSUX]
            | I[GMPV]
            | JE
            | K[ATWY]
            | L[ADELNSU]?
            | M[EKL]?
            | N[EGNPRW]?
            | O[LX]
            | P[AEHLOR]
            | R[GHM]
            | S[AEGKLMNOPRSTY]?
            | T[ADFNQRSW]
            | UB
            | W[ADFNRSV]
            | YO
            | ZE
          )                                  #end 4
          [1-9]?[0-9]
       | 
          (                                  #start 5
              ( E | N | NW | SE | SW | W )       #6
              1
            | EC[1-4]
            | WC[12]
          )                                  #end 5
          [A-HJKMNPR-Y]
       |
          (SW | W)                           #7
          ([2-9] | [1-9][0-9])               #8
       |
          EC[1-9][0-9]
     )                                  #end 3
     [ ]
     [0-9][ABD-HJLNP-UW-Z]{2}
  )                                 #end 2

$
/x )

{
    print "matched $&\n";
}



------------------------------

Date: Thu, 05 May 2011 09:26:52 -0700
From: sln@netherlands.com
Subject: Re: UK postcode validation
Message-Id: <qfj5s6lpl2eilav6dof2gqstd00nfeva5u@4ax.com>

On Thu, 5 May 2011 01:24:36 -0700 (PDT), Keith <bytebrothers.uk@gmail.com> wrote:

>On May 4, 4:46 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
>> >>>>> "K" == Keith  <bytebrothers...@gmail.com> writes:
>>
>>   K> I recently found this monster on Wikipedia, which validates almost all
>>   K> UK postcodes, catching all invalid areas, and most invalid districts.
>>   K> I'm just wondering whether this can be 'cleaned up' or made more
>>   K> readable for maintainers.  I've had a go today, but it still looks a
>>   K> mess.  Here's the regex:
><snip>
>> some quick comments.
>>
>> use the /x modifier so you can write that over multiple lines and
>> comment the parts. it will also allow you to show the nested groups
>> better.
>
>I'd missed that one.  This is much better - here's what my little test
>script looks like now...

But it doesn't work does it?

>
>#=============================================
>#!perl
>use strict;
>use warnings;
>
>my $postcode_regex = '^(GIR 0AA)|
                            ^
  this is ignored when using /x modifier.
  better to use [ ] or '\ ' here

>                      (((A[BL]|
                       ^^^
  overall, this is bad formatting when using /x modifier.

[snip]
>                      (SW|W)([2-9]|[1-9][0-9])|
>                      EC[1-9][0-9])\
                                    ^ 
  you are escaping a newline, there is no space after this.
  better to use [ ] or '\ ' here

>                      [0-9][ABD-HJLNP-UW-Z]{2})$';

Better to quote the regex using qr//.
my $postcode_regex = qr/^ ... $/;
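
A small sketch of the whitespace point, using only the "GIR 0AA" branch
rather than the whole pattern. The variable names are illustrative.

```perl
use strict;
use warnings;

my $bare_space  = qr/^ GIR 0AA $/x;      # space is ignored: same as ^GIR0AA$
my $class_space = qr/^ GIR [ ] 0AA $/x;  # [ ] keeps one literal space
my $escaped     = qr/^ GIR \  0AA $/x;   # backslash-space also keeps it

for my $input ('GIR 0AA', 'GIR0AA') {
    printf "%-8s bare=%s class=%s escaped=%s\n", $input,
        ($input =~ $bare_space  ? 'yes' : 'no'),
        ($input =~ $class_space ? 'yes' : 'no'),
        ($input =~ $escaped     ? 'yes' : 'no');
}
```

With qr// the pattern is compiled once with its /x flag attached, so the
later match needs neither /x nor /o.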
 
-sln


------------------------------

Date: Thu, 5 May 2011 21:59:57 +0000 (UTC)
From: dmcanzi@remulak.uwaterloo.ca (David Canzi)
Subject: Re: UK postcode validation
Message-Id: <ipv6kt$m8n$1@rumours.uwaterloo.ca>

In article <4b14e639-a0ca-48c5-931b-000ab781c362@e35g2000yqc.googlegroups.com>,
Keith  <bytebrothers.uk@gmail.com> wrote:
>
>I recently found this monster on Wikipedia, which validates almost all
>UK postcodes, catching all invalid areas, and most invalid districts.
>I'm just wondering whether this can be 'cleaned up' or made more
>readable for maintainers.  I've had a go today, but it still looks a
>mess.  Here's the regex:
>
>'^(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|
>F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|
>N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|
>W[ADFNRSV]|YO|ZE)[1-9]?[0-9]|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-
>HJKMNPR-Y]|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]
>{2})$';
>
>I know the space after the initial 'GIR' screws the formatting, but
>that really is just one long regex.
>
>Anyone any good at these things?

The ^ and $ in the expression above are not doing what you think
they're doing.  Your pattern would classify "GIR 0AA badger badger"
as a valid postcode.

The regex "^(abc)|(def)$" matches "abc foo" and "bar def".  This
can be corrected by adding more parentheses: "^((abc)|(def))$"
Now it only matches "abc" and "def".

The original parentheses in my example pattern are unnecessary,
so it can be simplified to: "^(abc|def)$"  The same two pairs of
parentheses in your expression are also unnecessary.  This is
easy to tell for the pair around "GIR 0AA", and not so easy to
tell for the other pair.
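
The precedence point can be checked directly. A minimal sketch using the
abc/def example from above, with (?:) for the grouping since nothing is
captured; the variable names are illustrative.

```perl
use strict;
use warnings;

my $loose = qr/^(abc)|(def)$/;    # parsed as: (^abc) OR (def$)
my $tight = qr/^(?:abc|def)$/;    # both anchors apply to both alternatives

for my $input ('abc', 'def', 'abc foo', 'bar def') {
    printf "%-8s loose=%s tight=%s\n", $input,
        ($input =~ $loose ? 'yes' : 'no'),
        ($input =~ $tight ? 'yes' : 'no');
}
```

The same wrapping applies to the postcode pattern: one group around the
whole alternation, with ^ and $ outside it.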

Wouldn't it be nice if there was some site you could use to
check the validity of postcodes over the web?

-- 
David Canzi		| Life is too short to point out every mistake. |


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3376
***************************************

