[31164] in Perl-Users-Digest
Perl-Users Digest, Issue: 2409 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon May 11 16:09:48 2009
Date: Mon, 11 May 2009 13:09:13 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 11 May 2009 Volume: 11 Number: 2409
Today's topics:
Re: FAQ 6.15 How can I print out a word-frequency or li jidanni@jidanni.org
Re: FAQ 6.15 How can I print out a word-frequency or li <uri@stemsystems.com>
Re: FAQ 6.15 How can I print out a word-frequency or li <nat.k@gm.ml>
Re: Finding domain and subdomains from host name <nospam@somewhere.com>
Re: Help with regexp <cartercc@gmail.com>
Re: Help with regexp <1usa@llenroc.ude.invalid>
Re: IO::Socket::INET on OSX or TCP stack problem derykus@gmail.com
Re: IO::Socket::INET on OSX or TCP stack problem <uri@stemsystems.com>
Re: IO::Socket::INET on OSX or TCP stack problem derykus@gmail.com
Re: Proposal: Image::EXIF::DateTime::Parser <marcin.owsiany@gmail.com>
Syntax of split (WAS: writing get_script()) <jurgenex@hotmail.com>
Re: writing get_script() <jurgenex@hotmail.com>
Re: writing get_script() <jurgenex@hotmail.com>
Re: writing get_script() <someone@example.com>
Re: writing get_script() <someone@example.com>
Re: writing get_script() <jurgenex@hotmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 12 May 2009 01:59:33 +0800
From: jidanni@jidanni.org
Subject: Re: FAQ 6.15 How can I print out a word-frequency or line-frequency summary?
Message-Id: <877i0n7c6y.fsf@jidanni.org>
I saved you 2 bytes!:
--- perlfaq6.ORIG.pod 2009-05-12 01:56:38.000000000 +0800
+++ perlfaq6.pod 2009-05-12 01:57:44.706933151 +0800
@@ -539,8 +539,8 @@
in the previous question:
while (<>) {
- while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
- $seen{$1}++;
+ while ( /\b[^\W_\d][\w'-]+\b/g ) { # misses "`sheep'"
+ $seen{$&}++;
}
}
------------------------------
Date: Mon, 11 May 2009 14:17:42 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: FAQ 6.15 How can I print out a word-frequency or line-frequency summary?
Message-Id: <87pref8px5.fsf@quad.sysarch.com>
>>>>> "j" == jidanni <jidanni@jidanni.org> writes:
j> I saved you 2 bytes!:
and you slowed down every other regex in the entire program! congrats!!
j> + $seen{$&}++;
read up on why $& is bad for your program speed and is strongly
discouraged.
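For anyone who wants the whole match without an extra pair of parens, here is a minimal sketch, assuming Perl 5.10 or later. The /p modifier makes ${^MATCH} available for that one match only. As far as perlvar describes it, merely mentioning $& anywhere penalised every match in the program on the perls of this era (the cost was largely removed in 5.20). The sample lines are made up for illustration.

```perl
use strict;
use warnings;

my %seen;
my @lines = ( "the sheep's wool", "the well-fed sheep" );   # made-up input

for my $line (@lines) {
    # /p asks perl to preserve the matched text for this match only;
    # ${^MATCH} then holds what $& would have held.
    while ( $line =~ /\b[^\W_\d][\w'-]+\b/gp ) {
        $seen{ ${^MATCH} }++;
    }
}

print "$_ $seen{$_}\n" for sort keys %seen;
```

Keeping the capture group and $1, as the FAQ does, works on every perl and costs nothing extra.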
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Mon, 11 May 2009 12:03:54 -0700
From: Nathan Keel <nat.k@gm.ml>
Subject: Re: FAQ 6.15 How can I print out a word-frequency or line-frequency summary?
Message-Id: <uE_Nl.62773$WT7.50966@newsfe11.iad>
jidanni@jidanni.org wrote:
> I saved you 2 bytes!:
> --- perlfaq6.ORIG.pod 2009-05-12 01:56:38.000000000 +0800
> +++ perlfaq6.pod 2009-05-12 01:57:44.706933151 +0800
> @@ -539,8 +539,8 @@
> in the previous question:
>
> while (<>) {
> - while ( /(\b[^\W_\d][\w'-]+\b)/g ) { # misses "`sheep'"
> - $seen{$1}++;
> + while ( /\b[^\W_\d][\w'-]+\b/g ) { # misses "`sheep'"
> + $seen{$&}++;
> }
> }
I wouldn't use $&
------------------------------
Date: Mon, 11 May 2009 15:39:47 -0400
From: "Thrill5" <nospam@somewhere.com>
Subject: Re: Finding domain and subdomains from host name
Message-Id: <gu9uu5$pl8$1@news.motzarella.org>
"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in message
news:slrnh0eeoi.phs.hjp-usenet2@hrunkner.hjp.at...
> On 2009-05-10 17:46, Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
>> John wrote:
>>> But, my original problem which is to locate all domains using a single
>>> IP
>>> address remains.
>>> My recent searching throws up 'reverse IP' as maybe the term I should be
>>> looking at to find all domains on a single IP.
>>
>> No. All domains resolve to an IP,
>
> No. Many domains don't have A records. Some have NS or MX or SRV or TXT
> records, some don't have any records at all and serve only as containers
> for their subdomains. And that's without considering specialized domains
> like in-addr.arpa.
>
>> but the other way around only works occasionally.
>
> Right. There is no way to find all domains which contain an A record
> with a specific IP address. The only way to do that would be to walk
> recursively through the complete domain name space, but most name
> servers don't allow that any more.
>
> "Reverse lookups" typically return only the canonical name of the
> interface. It is rare that a PTR lookup returns more than one result
> (try "dig -x 143.130.20.2" for an example)
>
> hp
The "in-addr.arpa" domain is where the reverse lookups are (i.e. the PTR
records). The IP address is reversed and placed into the "in-addr.arpa"
domain. For example to create a PTR record for 143.130.20.2, the actual
record is:
2.20.130.143.in-addr.arpa PTR mx.luga.at.
Just because you have created an "A" record doesn't mean that there is a
corresponding "PTR" record; they are created independently of each other. A
"PTR" is very similar to a "CNAME", in that you query a name, and a name is
returned. If a "PTR" record isn't created then there is no way to do a
reverse lookup. In a nutshell, given an IP address there is no way to find
all the domains that are associated with that IP address unless someone
created a PTR record for every A record. This is very rare, as it is standard
practice to create only a single PTR record for each IP address.
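The name reversal described above can be sketched in a few lines. The helper name ptr_name is made up for illustration, and the sketch handles IPv4 dotted quads only. It builds the query name; it does not perform a DNS lookup.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an IPv4 address into the name under
# in-addr.arpa where its PTR record (if any) would live.
sub ptr_name {
    my ($ip) = @_;
    return join( '.', reverse split /\./, $ip ) . '.in-addr.arpa';
}

print ptr_name('143.130.20.2'), "\n";   # 2.20.130.143.in-addr.arpa
```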
------------------------------
Date: Mon, 11 May 2009 07:12:19 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Help with regexp
Message-Id: <b2ab8de9-e977-4610-985a-00e9753c6ea5@21g2000vbk.googlegroups.com>
On May 11, 2:47 am, mike <mikaelpetter...@hotmail.com> wrote:
> I have the following string that I need to check the following:
>
> CXP_1212232_R1A01
# 1st, assign your text to a variable
my $var = 'CXP_1212232_R1A01'; #or somesuch
# 2nd, assign the parts to different variables
my ($first, $second, $third) = split /_/, $var;
# 3rd, test it
if ($first =~ /CXP/) { ... do stuff ... }
if ($second =~ /[0-]{7}/) { ... do stuff ... }
if ($third =~ /R\d[0-9A-Z]+/) { ... do stuff ... )
If getting the key in $third is necessary, use substr like this:
my $key = substr($third, 2);
CC
------------------------------
Date: Mon, 11 May 2009 17:48:05 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Help with regexp
Message-Id: <Xns9C088C65A2DE9asu1cornelledu@127.0.0.1>
ccc31807 <cartercc@gmail.com> wrote in news:b2ab8de9-e977-4610-985a-
00e9753c6ea5@21g2000vbk.googlegroups.com:
> On May 11, 2:47 am, mike <mikaelpetter...@hotmail.com> wrote:
>> I have the following string that I need to check the following:
>>
>> CXP_1212232_R1A01
>
> # 1st, assign your text to a variable
> my $var = 'CXP_1212232_R1A01'; #or somesuch
> # 2nd, assign the parts to different variables
> my ($first, $second, $third) = split /_/, $var;
> # 3rd, test it
Careless as usual:
> if ($first =~ /CXP/) { ... do stuff ... }
Is
+++CXP!!!_1212232_R1A01
acceptable?
> if ($second =~ /[0-]{7}/) { ... do stuff ... }
This specifies that the second part of the string should match a string
of seven zeros or dashes. I'll ask again. Is
CXP_$++-0-0-00_R1A01
acceptable?
> if ($third =~ /R\d[0-9A-Z]+/) { ... do stuff ... )
So I guess you think:
CXP_1212232_ZR\x{1815}AAAAAAAAAAAAAAAAAAAAAAAAAAA
is also OK.
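The three objections above are easy to check. This is a small sketch that runs the unanchored patterns against the counter-examples and then a fully anchored pattern against the same inputs. The variable names are made up, and the strict pattern reflects only one reading of the OP's format.

```perl
use strict;
use warnings;

# Unanchored patterns accept junk around (or instead of) the intended text.
my $loose_first  = '+++CXP!!!'  =~ /CXP/     ? 1 : 0;   # 1: junk on both sides
my $loose_second = '$++-0-0-00' =~ /[0-]{7}/ ? 1 : 0;   # 1: seven of '0' or '-'
my $real_second  = '1212232'    =~ /[0-]{7}/ ? 1 : 0;   # 0: the real data fails
# \d matches any Unicode digit, here U+1815 MONGOLIAN DIGIT FIVE.
my $loose_third  = "ZR\x{1815}AAA" =~ /R\d[0-9A-Z]+/ ? 1 : 0;   # 1

# One anchored pattern for the whole string (assumed format).
my $strict = qr/\ACXP_[0-9]{7}_R[0-9][A-Z][0-9]{2}\z/;
my @verdicts = map { /$strict/ ? 'ok' : 'rejected' }
    'CXP_1212232_R1A01', 'CXP_1212232_RAB01', '+++CXP!!!_1212232_R1A01';

print "loose: $loose_first $loose_second $real_second $loose_third\n";
print "strict: @verdicts\n";
```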
To the OP, see script below which I wrote based on my understanding of
your problem:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
next unless /\S/;
chomp;
/\ACXP_[0-9]{7}_R[0-9][A-Z][0-9]{2}\z/ and next;
print "'$_' did not match\n";
}
__DATA__
CXP_1212232_R1A01
CXP_1212232_RAB01
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
------------------------------
Date: Mon, 11 May 2009 04:59:54 -0700 (PDT)
From: derykus@gmail.com
Subject: Re: IO::Socket::INET on OSX or TCP stack problem
Message-Id: <34da8ec2-d194-4bec-95df-180e235ac49c@z16g2000prd.googlegroups.com>
On May 10, 6:39 pm, "Dr.Ruud" <rvtol+use...@xs4all.nl> wrote:
> dery...@gmail.com wrote:
> > eval { local $SIG{ALRM} = sub { die 'socket t/o';
> >        alarm(...);
> >        $socket->read($r, 6) };
> >        alarm(0);
> >      };
> > if ( $@ =~ m{socket t/o} and not $socket->connected ) {
> >    ...  reopen socket etc.
> > }
>
> That template looks wrong. My go at it:
>
>    eval {
>        local $SIG{ALRM} = sub { die 'socket t/o' };
>        alarm 8;
>        $socket->read($r, 6);
>        alarm 0;
>        1;  # success
>    }
>    or do {
>        my $err = $@ || "unknown";
>        alarm 0;
>        if ( $err =~ m{socket t/o} and not $socket->connected ) {
>            ...  reopen socket etc.
>        }
>    };
>
>
Much better.
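For reference, the same eval/alarm idiom can be tried without a socket. This is a sketch only: with_timeout is a made-up helper, the blocking read is simulated with a four-argument select, and it assumes a platform where alarm delivers SIGALRM.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the eval/alarm template quoted above.
sub with_timeout {
    my ( $seconds, $code ) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;
        1;    # success
    };
    my $err = $@ || 'unknown';
    alarm 0;  # make sure no alarm is left pending after a die
    return 'ok' if $ok;
    return $err eq "timeout\n" ? 'timeout' : "error: $err";
}

# select with a timeout stands in for a read that blocks for 3 seconds.
print with_timeout( 1, sub { select undef, undef, undef, 3 } ), "\n";
print with_timeout( 2, sub { 1 } ), "\n";
```

The trailing "1;" inside the eval is what lets the caller tell success apart from a die whose message happens to be empty.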
Perhaps a lower-level recv/send pair might be a further improvement to
get the RST directly:
unless ( $socket->recv($r, 6) ) {
if ( $! == ECONNRESET ) {
... re-open socket, etc.
}
...
--
Charles DeRykus
------------------------------
Date: Mon, 11 May 2009 11:45:48 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: IO::Socket::INET on OSX or TCP stack problem
Message-Id: <87ljp3bq37.fsf@quad.sysarch.com>
>>>>> "d" == derykus <derykus@gmail.com> writes:
d> Perhaps a lower-level recv/send pair might be a further improvement to
d> get the RST directly:
recv/send are just different apis from sysread/syswrite. they don't do
anything special underneath. they were intended for tcp to support data
packet boundaries. i forget the proto name but the flags are defined and
unsupported. recv/send would work with those boundaries but since they
aren't supported, they are just like sysread/write.
uri
------------------------------
Date: Mon, 11 May 2009 11:05:07 -0700 (PDT)
From: derykus@gmail.com
Subject: Re: IO::Socket::INET on OSX or TCP stack problem
Message-Id: <8b884b2a-7bc9-41fd-9ec0-30be6d761a4a@j9g2000prh.googlegroups.com>
On May 11, 8:45 am, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "d" == derykus  <dery...@gmail.com> writes:
>
>   d> Perhaps a lower-level recv/send pair might be a further improvement to
>   d> get the RST directly:
>
> recv/send are just different apis from sysread/syswrite. they don't do
> anything special underneath. they were intended for tcp to support data
> packet boundaries. i forget the proto name but the flags are defined and
> unsupported. recv/send would work with those boundaries but since they
> aren't supported, they are just like sysread/write.
>
Ah, the usual sysread/syswrite should work then if
the TCP errors all propagate back:
unless ( $socket->sysread($r,6) ) {
if ( $! == ECONNRESET ) {
... re-open socket etc.
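
One caveat with the fragment above: sysread returns 0 at end of file and undef on error, and $! is only meaningful in the undef case. A plain "unless" treats both the same way. Here is a rough sketch that separates them, assuming a Unix-like system (it uses an AF_UNIX socketpair so it runs without a network). read_status is a made-up name, and ECONNRESET comes from the Errno module.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Errno qw(ECONNRESET);

# Hypothetical helper: classify the result of one sysread.
sub read_status {
    my ($sock) = @_;
    my $n = sysread( $sock, my $buf, 6 );
    if ( !defined $n ) {
        # Only here does $! describe what went wrong.
        return $! == ECONNRESET ? 'reset' : "error: $!";
    }
    return $n == 0 ? 'eof' : "data: $buf";
}

socketpair( my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
syswrite( $writer, 'abcdef' );
my $first = read_status($reader);    # data: abcdef
close $writer;
my $second = read_status($reader);   # eof (orderly close, not a reset)
print "$first\n$second\n";
```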
--
Charles DeRykus
------------------------------
Date: Mon, 11 May 2009 10:11:46 -0700 (PDT)
From: Marcin Owsiany <marcin.owsiany@gmail.com>
Subject: Re: Proposal: Image::EXIF::DateTime::Parser
Message-Id: <19437ce7-c982-4fa3-acef-42cff0981658@o14g2000vbo.googlegroups.com>
On May 9, 10:59 am, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> On 2009-05-08, Marcin Owsiany <mar...@owsiany.pl> wrote:
>
> >| This module provides a parser for "DateTime" strings as defined in
> >| Exchangeable image file format for digital still cameras:
> >| Exif Version 2.2
> >| Section 4.6.4 "TIFF Revision 6.0 Attribute information"
> >| Subsection D. "Other Tags", DateTime
>
> Why use a separate module? Should not just ExifTool be fixed to
> transparently translate to a uniform format?
Sorry, I meant to clarify this but forgot.
This module is simply a string parser. It does NOT know how to extract
a date/time string from an image file.
You just give it a short string and it returns a time_t.
Keeping this module separate gives the programmer freedom to choose
the method of reading the actual image, without forcing them to use
ExifTool or any other EXIF library. At the same time it gives us a
single place to maintain the date/time parsing code.
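To give a feel for the scope, here is a rough sketch of parsing such a string. This is not the proposed module's interface: parse_exif_datetime is a made-up name, only the plain "YYYY:MM:DD HH:MM:SS" form is accepted, and the time is treated as UTC purely so that the result is reproducible (EXIF DateTime carries no zone).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical parser: returns a time_t, or undef if the string does
# not look like "YYYY:MM:DD HH:MM:SS" or names an impossible date.
sub parse_exif_datetime {
    my ($string) = @_;
    my ( $y, $mo, $d, $h, $mi, $s ) =
        $string =~ /\A(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})\z/
        or return;
    # timegm croaks on out-of-range fields, so trap that and return undef.
    return scalar eval { timegm( $s, $mi, $h, $d, $mo - 1, $y ) };
}

my $t = parse_exif_datetime('2009:05:11 13:09:13');
print scalar( gmtime $t ), "\n";
```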
Marcin
------------------------------
Date: Mon, 11 May 2009 08:17:58 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Syntax of split (WAS: writing get_script())
Message-Id: <72gg051mkr04tna7267bk0ketk85bfl3ng@4ax.com>
Franken Sense <frank@example.invalid> wrote:
>
> split /PATTERN/,EXPR,LIMIT
>
>Since EXPR would default to $_, I have to wonder how general this syntax
>could be. What would happen if limit weren't 2? (Perldoc perlfunc doesn't
>have a lot on this.)
What potential problem do you see? Limit can be whatever value you
choose. The docs say
If LIMIT is specified and positive, splits into no more than
that many fields (though it may split into fewer).
The clarification in parentheses is obviously addressing the case that
there are fewer fields in the data than LIMIT, so that is covered, too.
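A quick illustration of the LIMIT cases, with made-up sample strings:

```perl
use strict;
use warnings;

my $line = 'a b c d e';

my @two  = split ' ', $line, 2;     # ('a', 'b c d e')
my @many = split ' ', $line, 10;    # only 5 fields exist, so 5 come back

# Without a LIMIT trailing empty fields are dropped; a negative LIMIT
# keeps them.
my @trimmed = split /,/, 'a,b,,,';       # ('a', 'b')
my @kept    = split /,/, 'a,b,,,', -1;   # ('a', 'b', '', '', '')

printf "%d %d %d %d\n",
    scalar @two, scalar @many, scalar @trimmed, scalar @kept;   # 2 5 2 5
```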
jue
------------------------------
Date: Mon, 11 May 2009 06:46:14 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: writing get_script()
Message-Id: <lvag05d2u9fb51vr1vc8i3e8l11q4ggpql@4ax.com>
"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>On 2009-05-11 04:38, Jürgen Exner <jurgenex@hotmail.com> wrote:
>> Franken Sense <frank@example.invalid> wrote:
>> [...]
>>> my @s = split /\s+/, $_;
>>> my $verse = $s[0];
>>> my $script = join(' ', @s[1..$#s]);
>>
>> As I wrote before you can replace those three lines above with a single
>>
>> my ($verse, $script) = split /\s+/, $_, 2;
>
>That's not the same. Franken's version splits the script into
>white-space separated words and then joins them with a single space. In
>other words, it replaces all sequences of whitespace with a single
>space. Your version doesn't.
You are right, I missed that desired side effect, sorry.
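The difference is easy to see on a line with irregular whitespace. The sample text is made up:

```perl
use strict;
use warnings;

my $line = "44:005:017 Then the   high\npriest rose up";

# LIMIT of 2: the remainder comes back untouched, inner whitespace and all.
my ( $verse, $rest ) = split /\s+/, $line, 2;

# Full split plus join: every run of whitespace becomes one space.
my ( $verse2, @words ) = split /\s+/, $line;
my $script = join ' ', @words;

# The short form can still be normalized afterwards.
( my $squeezed = $rest ) =~ s/\s+/ /g;

print "same\n" if $script eq $squeezed;
```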
jue
------------------------------
Date: Mon, 11 May 2009 08:22:42 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: writing get_script()
Message-Id: <gegg059f2ocr2ln4cig3f5ca3aoc167vng@4ax.com>
Franken Sense <frank@example.invalid> wrote:
>I want to get rid of the cr-lf's in the middle of a verse. I think with
>the more verbose version, I get rid of newlines that don't immediately
>precede the next verse.
The most prudent course of action might be to radically clean up your
data by removing _ALL_ cr-lf's anywhere in your text by a simple
s/\n//g;
Then at least you know exactly what your data looks like and you can
easily add a new "\n" at the very end if you feel like it.
Personally I consider "\n" not to be data but formatting and in general
will add those only while actually print()ing data.
jue
------------------------------
Date: Mon, 11 May 2009 11:11:57 -0700
From: "John W. Krahn" <someone@example.com>
Subject: Re: writing get_script()
Message-Id: <OTZNl.48515$bi7.35480@newsfe07.iad>
Jürgen Exner wrote:
> Franken Sense <frank@example.invalid> wrote:
>> I want to get rid of the cr-lf's in the middle of a verse. I think with
>> the more verbose version, I get rid of newlines that don't immediately
>> precede the next verse.
>
> The most prudent course of action might be to radically clean up your
> data by removing _ALL_ cr-lf's anywhere in your text by a simple
> s/\n//g;
cr is represented by \r so that doesn't remove any cr just lf.
John
-- 
Those people who think they know everything are a great
annoyance to those of us who do. -- Isaac Asimov
------------------------------
Date: Mon, 11 May 2009 11:17:49 -0700
From: "John W. Krahn" <someone@example.com>
Subject: Re: writing get_script()
Message-Id: <hZZNl.16140$hX2.4386@newsfe19.iad>
Franken Sense wrote:
> In Dread Ink, the Grave Hand of Jürgen Exner Did Inscribe:
>
>>> What I want it to do is join the first through the ultimate words in s.
>> If you don't know what $#s means, then maybe you could ask? If you don't
>> know what (1..$#s) means, then maybe you could ask? Using code fragments
>> that you picked up somewhere without knowing their meaning and throwing
>> them together rarely produces useful code.
>>
>> $#s is the highest index in the array @s, i.e. a number.
>> (1..$#s) is the list of numbers from 1 to the highest index of @s.
>> Nowhere does it relate to the content of @s.
>>
>> What you want is maybe
>> @s[1..$#s]
>> which is a slice of the array @s, containing the elements from index 1
>> to the highest index. However I would probably use shift() instead to
>> remove the first element from an array.
>
>
> C:\MinGW\source>perl m9.pl
> 44:005:017 Then the high priest rose up, and all they that were with him, (which
> is the sect of the Sadducees,) and were filled with indignation,
> 44:005:018 And laid their hands on the apostles, and put them in the common pris
> on.
> 44:005:019 But the angel of the Lord by night opened the prison doors, and broug
> ht them forth, and said,
> 44:005:020 Go, stand and speak in the temple to the people all the words of this
> life.
>
> C:\MinGW\source>type m9.pl
> #!/usr/bin/perl
> # perl m9.pl
> use warnings;
> use strict;
>
> local $/="";
> while ( <DATA> ) {
>     my @s = split /\s+/, $_;
>     my $verse = $s[0];
>     my $script = join(' ', @s[1..$#s]);
>     print "$verse $script\n";
> }
local $/ = '';
while ( <DATA> ) {
    my ( $verse, @s ) = split;
    my $script = join ' ', @s;
    print "$verse $script\n";
}
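
The loop above can be tried without a data file. This sketch reads from an in-memory filehandle; the two verses are abbreviated stand-ins for the real input and the variable names are illustrative. Paragraph mode ($/ set to '') returns one blank-line-separated record per read, and a bare split breaks $_ on whitespace after skipping any leading whitespace.

```perl
use strict;
use warnings;

# Abbreviated, made-up stand-in for the verse file.
my $data = "44:005:017 Then the high priest\nrose up,\n\n"
         . "44:005:018 And laid their\nhands on the apostles,\n";

open my $fh, '<', \$data or die "open: $!";

my @out;
{
    local $/ = '';    # paragraph mode: one record per blank-line-separated block
    while ( <$fh> ) {
        my ( $verse, @words ) = split;    # same as split ' ', $_
        push @out, join ' ', $verse, @words;
    }
}
print "$_\n" for @out;
```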
John
------------------------------
Date: Mon, 11 May 2009 12:27:40 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: writing get_script()
Message-Id: <h0vg059koliu4pj9ogdq47mp6tm48arcsd@4ax.com>
"John W. Krahn" <someone@example.com> wrote:
>Jürgen Exner wrote:
>> Franken Sense <frank@example.invalid> wrote:
>>> I want to get rid of the cr-lf's in the middle of a verse. I think with
>>> the more verbose version, I get rid of newlines that don't immediately
>>> precede the next verse.
>>
>> The most prudent course of action might be to radically clean up your
>> data by removing _ALL_ cr-lf's anywhere in your text by a simple
>> s/\n//g;
>
>cr is represented by \r so that doesn't remove any cr just lf.
That depends upon what your OS considers to be "\n".
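For what it's worth, on both Unix and Windows perls "\n" inside the program is a single LF character. On Windows it is the text-mode I/O layer that turns CRLF into LF on input, so data read in binary mode, or DOS-format files read on Unix, still carries the CRs. A small sketch with a made-up string shows the difference and one way to strip both:

```perl
use strict;
use warnings;

# Made-up sample with DOS line endings, as seen when no CRLF translation occurs.
my $raw = "verse one\r\ncontinues\r\nhere\n";

( my $lf_only = $raw ) =~ s/\n//g;       # the CRs survive
( my $both    = $raw ) =~ tr/\r\n//d;    # removes CR and LF alike

# Make the leftover CRs visible when printing.
( my $shown = $lf_only ) =~ s/\r/\\r/g;
print "$shown\n$both\n";
```

Note that deleting the line ends outright glues the neighbouring words together; substituting a space instead may be closer to what the OP wants.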
jue
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2409
***************************************