[33127] in Perl-Users-Digest


Perl-Users Digest, Issue: 4403 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Mar 30 16:48:42 2015

Date: Sun, 29 Mar 2015 06:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 29 Mar 2015     Volume: 11 Number: 4403

Today's topics:
    Re: An error on page 142 of The Camel Book. (Seymour J.)
        idea for use of a & prototype <rweikusat@mobileactivedefense.com>
        Replacing addresses with regex's <noreply2me@yahoo.com>
    Re: Replacing addresses with regex's <hhr-m@web.de>
    Re: Replacing addresses with regex's <rweikusat@mobileactivedefense.com>
    Re: Replacing addresses with regex's <gravitalsun@hotmail.foo>
    Re: Replacing addresses with regex's <rweikusat@mobileactivedefense.com>
    Re: Replacing addresses with regex's <sbryce@scottbryce.com>
    Re: Replacing addresses with regex's <rweikusat@mobileactivedefense.com>
    Re: Replacing addresses with regex's <gamo@telecable.es>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 27 Mar 2015 17:51:55 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: An error on page 142 of The Camel Book.
Message-Id: <5515d0fb$12$fuzhry+tra$mr2ice@news.patriot.net>

In <87r3sacsr6.fsf@doppelsaurus.mobileactivedefense.com>, on 03/27/2015 at
04:25 PM, Rainer Weikusat <rweikusat@mobileactivedefense.com> said:

>I also don't see much of a difference between
>
>while (!$done) {
>	# some statements here
>
>	if (something()) {
>		$done = 1
>	} else {
>		# some more statements here
>	}
>}
>
>and
>
>ups:
>while (!$done) {
>	# ...
>
>	if (something()) goto ups;
>
>	# ...
>}

The obvious difference is that the first terminates after something()
returns true while the second does not. Perhaps you meant to put the
label at the end.
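
For comparison, a minimal sketch of the early exit written with Perl's own loop control: `last` behaves like a jump to a label placed after the loop. The `something()` stub and its canned results are assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for something(): false twice, then true.
my @results = (0, 0, 1);
sub something { return shift @results }

my $iterations = 0;
while (1) {
    $iterations++;

    # Leave the loop as soon as something() reports true.
    last if something();

    # Statements here only run while something() was false.
}

print "left the loop after $iterations iterations\n";
```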

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Fri, 27 Mar 2015 23:18:29 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: idea for use of a & prototype
Message-Id: <87384qc9mi.fsf@doppelsaurus.mobileactivedefense.com>

Using & as the prototype for the first argument of a subroutine means the
first argument passed to this subroutine has to be a code reference,
typically an anonymous subroutine, and in that case the 'sub' keyword may
be omitted from its definition. I haven't used this for anything (except
'pseudo loop examples') so far; however, an idea I consider rather neat
is using it as follows:

    until_success {
        sleep(rand(10));
        exit(1) if rand(100) >= 50;

        $rq = make_request($srv_info);
        print($rq->format());
    };

[the code in the subroutine is obviously just for testing]

Ultimately, this will end up connecting to an Apple DEP server in
order to retrieve a (so-called) session authentication token, which comes
with all the usual perils of network programming, so the 'communication'
part should be retried (with some randomized delay) unless the process
is stopped by admin intervention before it succeeds (it is not supposed
to be used interactively).

The until_success subroutine implements the 'try this, wait for some
time if it didn't work and try again' part while the productive code
will end up in the block.
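
A minimal sketch of what such an until_success could look like. The post does not show the actual implementation, so everything here is an assumption: an attempt counts as successful when the block returns a true value without dying, and `$max_delay` is a hypothetical knob for the randomized wait. The subroutine has to be declared before the call for the prototype to take effect.

```perl
use strict;
use warnings;

# Hypothetical upper bound (in seconds) for the randomized retry delay.
our $max_delay = 10;

# Assumed semantics: retry the block until it returns a true value.
sub until_success(&) {
    my ($attempt) = @_;

    while (1) {
        # eval catches die() inside the block; exit() still ends the process.
        my $ok = eval { $attempt->() };
        return $ok if $ok;

        warn "attempt failed: $@" if $@;
        sleep(int(rand($max_delay))) if $max_delay > 0;
    }
}

# Demo: the block "fails" twice and succeeds on the third call.
$max_delay = 0;    # no waiting in this demo
my $tries  = 0;
my $result = until_success {
    ++$tries >= 3 ? "token-$tries" : 0;
};

print "succeeded after $tries tries: $result\n";
```

Returning the block's value lets the caller pick up whatever the successful attempt produced, for instance the token.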


------------------------------

Date: Sat, 28 Mar 2015 04:23:24 -0700
From: "Robert Crandal" <noreply2me@yahoo.com>
Subject: Replacing addresses with regex's
Message-Id: <-pqdnV_modu2EovInZ2dnUVZ5hOdnZ2d@giganews.com>

So, if I have the following string:

"He lives at 550 S. Gutensohn Road near me."

I need to transform it into this new string:

"He lives at ****************** near me."

But, since there are lots of different ways to
represent a US street address, I am not sure how
to set up a good regular expression that can cover
as many addresses as possible.

What comes to mind is something like this:

(\d)+ (N.|S.|E.|W.) \w*(Road|Street|Avenue|Drive)

That is a basic solution that comes to mind, but can anyone
give me a better solution or any tips?  I don't expect to be
able to detect every possible street address, but any suggestions
will be helpful.

thank you





------------------------------

Date: Sat, 28 Mar 2015 12:38:08 +0100
From: Helmut Richter <hhr-m@web.de>
Subject: Re: Replacing addresses with regex's
Message-Id: <mf63qr$1np$1@news.in.tum.de>

On 28.03.2015 at 12:23, Robert Crandal wrote:

> What comes to mind is something like this:
>
> (\d)+ (N.|S.|E.|W.) \w*(Road|Street|Avenue|Drive)

So, you are sure that every street *must* contain N/S/E/W, and that 
these are always abbreviated with the initial letter plus one more 
character (this is what the dot stands for)? I would be much more 
liberal in what follows the street number, e.g.


\d+ .+? (Road|Street|Avenue|Drive|Place|Square|Plaza)

The best solution is probably between your restrictive and my liberal one.
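
A small sketch of how the liberal pattern could be used for the masking itself, replacing each match with as many asterisks as it has characters. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Street-type words from the liberal pattern above.
my $street_type = qr/(?:Road|Street|Avenue|Drive|Place|Square|Plaza)/;

my $text = "He lives at 550 S. Gutensohn Road near me.";

# /e evaluates the replacement as code, /g handles every address found.
(my $masked = $text) =~ s/(\d+ .+? $street_type)/'*' x length($1)/ge;

print "$masked\n";
```

Note that the non-greedy .+? only picks the nearest street-type word; it does not limit how much text may sit between the number and that word, so a sentence such as "I have 3 kids and live on Elm Street" would be masked from the 3 onwards.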

-- 
Helmut Richter



------------------------------

Date: Sat, 28 Mar 2015 14:59:02 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Replacing addresses with regex's
Message-Id: <87619lcgnd.fsf@doppelsaurus.mobileactivedefense.com>

"Robert Crandal" <noreply2me@yahoo.com> writes:
> So, if I have the following string:
>
> "He lives at 550 S. Gutensohn Road near me."
>
> I need to transform it into this new string:
>
> "He lives at ****************** near me."
>
> But, since there are lots of different ways to
> represent a US street address, I am not sure how
> to setup a good regular expression that can cover
> as many addresses as possible.

What about something like this?

s/\d+ (?:[A-Z]\w*\.?([ .?!,]))+/**$1/

(matches a sequence of digits followed by some possibly abbreviated,
capitalized words followed by a space or an 'end of sentence' marker).

The general solution is to define some kind of grammar for 'addresses'
and use a regex implementing that.

Possibly helpful idea:

------
my $in  = <<TT;
He lives at 550 S. Gutensohn Road, she lives at 16 Possessed Gravel Mews,
the kids roam about 550 S. Gutensohn Road or 16 Possessed Gravel Mews.
TT

my (%addrs, $n, $a, $s);

$in =~ s/(\d+ (?:[A-Z]\w*\.?([ .?!,]))+)/$a = substr($1, 0, -1);$s = $2;$addrs{$a} or $addrs{$a} = "**".++$n;"$addrs{$a}$s"/ge;

print $in;

print STDERR ("$_\t=>\t$addrs{$_}\n") for keys(%addrs);
------

i.e., attach a running number to each replacement string and print a
dictionary of replaced and replacement strings so that errors made by
the code can be corrected easily.
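
The 'grammar' idea could be sketched by composing the regex from named qr// pieces. The building blocks below are illustrative assumptions only and nowhere near a complete description of US addresses.

```perl
use strict;
use warnings;

# Hypothetical building blocks of a tiny address grammar.
my $number    = qr/\d+/;
my $direction = qr/(?:[NSEW]\.?|North|South|East|West)/;
my $word      = qr/[A-Z][a-z]+/;
my $suffix    = qr/(?:Road|Street|Avenue|Drive|Mews)\b/;

# address := number [direction] word* suffix
my $address = qr/$number (?:$direction )?(?:$word )*?$suffix/;

my $text = "He lives at 550 S. Gutensohn Road, "
         . "she lives at 16 Possessed Gravel Mews.";

# In list context, /g returns every captured address.
my @found = $text =~ /($address)/g;

print "$_\n" for @found;
```

Each piece can then be extended on its own (more suffixes, ordinal street names and so on) without touching the rest.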


------------------------------

Date: Sat, 28 Mar 2015 17:17:24 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: Replacing addresses with regex's
Message-Id: <mf6gm8$6td$1@news.grnet.gr>

On 28/3/2015 1:23 PM, Robert Crandal wrote:
> So, if I have the following string:
>
> "He lives at 550 S. Gutensohn Road near me."
>
> I need to transform it into this new string:
>
> "He lives at ****************** near me."
>
> But, since there are lots of different ways to
> represent a US street address, I am not sure how
> to setup a good regular expression that can cover
> as many addresses as possible.
>
> What comes to mind is something like this:
>
> (\d)+ (N.|S.|E.|W.) \w*(Road|Street|Avenue|Drive)
>
> That is a basic solution that comes to mind, but can anyone
> give me a better solution or any tips?  I don't expect to be
> able to detect every possible street address, but any suggestions
> will be helpful.
>
> thank you
>
>
>
I suspect that what you need is this
https://metacpan.org/pod/HTML::Template



------------------------------

Date: Sat, 28 Mar 2015 15:20:58 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Replacing addresses with regex's
Message-Id: <871tk9cfmt.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

[...]

> $in =~ s/(\d+ (?:[A-Z]\w*\.?([ .?!,]))+)/$a = substr($1, 0, -1);$s = $2;$addrs{$a} or $addrs{$a} = "**".++$n;"$addrs{$a}$s"/ge;

RSPCA[*] version

$in=~s/(\d+ (?:[A-Z]\w*\.?([ .?!,]))+)/$s=$2;($a)=$1=~m=(.*).=;($addrs{$a}||='='.++$n).$s/ge;

[*] Royal Society for Parsing Convoluted Agglomerations


------------------------------

Date: Sat, 28 Mar 2015 11:19:39 -0600
From: Scott Bryce <sbryce@scottbryce.com>
Subject: Re: Replacing addresses with regex's
Message-Id: <mf6npr$mi5$1@dont-email.me>

On 3/28/2015 5:23 AM, Robert Crandal wrote:
> That is a basic solution that comes to mind

But it won't work in Utah where it is not uncommon for an address to
look like this:

5643 West 4700 South

Or Seattle, where addresses can look like

4567 NE 8th St

or

4567 8th Ave NE

I'm sure there are a lot of other common patterns that you will never be
able to identify.
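
A quick way to check this is to run the pattern from the original post against the sample addresses. The loop and variable names are just scaffolding for the test.

```perl
use strict;
use warnings;

# Pattern exactly as given in the original post.
my $op_pattern = qr/(\d)+ (N.|S.|E.|W.) \w*(Road|Street|Avenue|Drive)/;

my @samples = (
    '5643 West 4700 South',
    '4567 NE 8th St',
    '4567 8th Ave NE',
);

# Record and report whether each sample matches.
my %matched;
for my $sample (@samples) {
    $matched{$sample} = $sample =~ $op_pattern ? 1 : 0;
    printf "%-22s %s\n", $sample, $matched{$sample} ? 'match' : 'no match';
}
```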


------------------------------

Date: Sat, 28 Mar 2015 17:51:37 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Replacing addresses with regex's
Message-Id: <871tk9au3a.fsf@doppelsaurus.mobileactivedefense.com>

Scott Bryce <sbryce@scottbryce.com> writes:
> On 3/28/2015 5:23 AM, Robert Crandal wrote:
>> That is a basic solution that comes to mind
>
> But it won't work in Utah where it is not uncommon for an address to
> look like this:
>
> 5643 West 4700 South
>
> Or Seattle, where addresses can look like
>
> 4567 NE 8th St
>
> or
>
> 4567 8th Ave NE
>
> I'm sure there a lot of other common patterns that you will never be
> able to identify.

The problem is that there isn't a common pattern, i.e., there is no
defined/enforced grammar for 'street addresses', but the OP already
wrote that he wasn't expecting to be able to match all existing or
conceivable addresses. OTOH, matching _most_ of the street addresses
actually appearing in some text (while not matching lots of things
which aren't addresses) ought to be possible (the only 'common pattern'
I can see in your examples is that they all contain several sequences of
lower- and uppercase letters and digits).


------------------------------

Date: Sat, 28 Mar 2015 21:35:00 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Replacing addresses with regex's
Message-Id: <mf739l$i56$1@speranza.aioe.org>

On 28/03/15 at 18:51, Rainer Weikusat wrote:
> Scott Bryce <sbryce@scottbryce.com> writes:
>> On 3/28/2015 5:23 AM, Robert Crandal wrote:
>>> That is a basic solution that comes to mind
>>
>> But it won't work in Utah where it is not uncommon for an address to
>> look like this:
>>
>> 5643 West 4700 South
>>
>> Or Seattle, where addresses can look like
>>
>> 4567 NE 8th St
>>
>> or
>>
>> 4567 8th Ave NE
>>
>> I'm sure there a lot of other common patterns that you will never be
>> able to identify.
>
> The problem is that there isn't a common pattern, ie, there is no
> defined/ enforced grammar for 'street addresses' but the OP already
> wrote that he wasn't expecting to be able to match all existing or
> conceivable addresses. OTOH, matching _most_ of the street addresses
> actually appearing in some text (while not matching lots of things
> which aren't addresses) ought to be possible (the only 'common pattern'
> I can see in your examples is that they all contain several sequence of
> lower- and uppercase letters and digits).
>

But there are warning signs before an address, such as the words
live/lives/reside/address; after one of them you could replace every
digit with *, obfuscating the content.
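
A rough sketch of that idea, assuming the cue words listed above and assuming that "after" means up to the end of the line. The subroutine name is made up for this example.

```perl
use strict;
use warnings;

# Mask every digit that follows a cue word, up to the end of the line.
sub mask_digits_after_cue {
    my ($text) = @_;

    $text =~ s{\b((?:lives?|resides?|address)\b)(.*)}{
        my ($cue, $rest) = ($1, $2);
        $rest =~ tr/0-9/*/;    # every digit becomes an asterisk
        $cue . $rest;
    }gei;

    return $text;
}

print mask_digits_after_cue("He lives at 550 S. Gutensohn Road near me."), "\n";
print mask_digits_after_cue("Room 42 is free."), "\n";
```

This hides the house number but leaves the street name readable, and it also masks unrelated digits later on the same line.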

-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4403
***************************************

