[31797] in Perl-Users-Digest
Perl-Users Digest, Issue: 3060 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Aug 3 18:09:25 2010
Date: Tue, 3 Aug 2010 15:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 3 Aug 2010 Volume: 11 Number: 3060
Today's topics:
Re: [OT] Memory architecture [not only] Perl <nospam-abuse@ilyaz.org>
Re: CGI Program Questions <tzz@lifelogs.com>
Re: Extract variable length numbers (tab delimitered) f sln@netherlands.com
Re: Help with regular expression <tzz@lifelogs.com>
Re: Help with regular expression <hhr-m@web.de>
Re: Help with regular expression sln@netherlands.com
Re: Help with regular expression <tzz@lifelogs.com>
Re: Help with regular expression sln@netherlands.com
Re: Help with regular expression <hhr-m@web.de>
Re: Help with regular expression sln@netherlands.com
Re: If Perl is compiled on a 32-bit system, and the sys <hjp-usenet2@hjp.at>
Re: piped open and shell metacharacters <nospam-abuse@ilyaz.org>
Re: Plot Module Question <paduille.4061.mumia.w+nospam@earthlink.net>
Re: Plot Module Question <nospam-abuse@ilyaz.org>
Re: Posting Guidelines for comp.lang.perl.misc ($Revisi <ralph@happydays.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 3 Aug 2010 21:51:43 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: [OT] Memory architecture [not only] Perl
Message-Id: <slrni5h3rf.km7.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-08-03, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> They do on OS/2: the DLL's-related memory is loaded into shared
>> address region. (This way one does not need any "extra"
>> per-process-context patching or redirection of DLL address accesses.)
> Sounds a bit like the pre-ELF shared library system in Linux.
No, there is a principal difference: on Linux (and most other flavors
of Unix), you never know whether your program would be "assembled"
(from shared modules) correctly or not: it is Russian roulette which
C symbol is resolved to which shared module (remember these Perl_ and
PL_ prefixes? They are the only workaround I know of). On OS/2, the
linking is done at link time; each symbol KNOWS to which DLL (and
which entry point inside the DLL) it must link.
So:
a DLL is compiled to the same assembler code as EXE (no indirection)
if a DLL is used from two different programs, its text pages would be
the same - provided all modules it links to are loaded at the same
addresses (no process-specific fixups).
- and it is what happens. DLL runs as quick as EXE, and there is no
overhead if it is reused.
[And, of course, a program loader living in user space is another
gift from people having no clue about security... As if an
executable stack was not enough ;-)]
> Of course that was designed when 16 MB was a lot of RAM and
> abandoned when 128 MB became normal for a server (but then I guess
> the same is true for OS/2).
No, it was designed when 2M was a lot of RAM. ;-) On the other hand,
the architecture was designed by mainframe people, so they may have had
different experiences.
> I'd still be surprised if anybody ran an application mix on OS/2 where
> the combined code size of all DLLs exceeds 1 GB.
"Contemporary flavors" of OS/2 still run comfortably in 64MB systems.
(Of course, there is no FireWire/Bluetooth support, but I do not
believe that they would add much - IIRC, the USB stack is coded with
pretty minimal overhead.)
> the kernel. But of course making large changes for a factor of at most 2
> doesn't make much sense in a world governed by Moore's law, and anybody
> who needed the space moved to 64 bit systems anyway.
I do not believe in Moore's law (at least not in this context). Even
with today's prices on memory, DockStar has only 128MB of memory. 2
weeks ago it cost $24.99 on Amazon (not now, though!). I think in a
year or two we can start getting Linux stations in about the $10 range.
In this price range, memory size matters.
So Moore's law works both ways; low-memory situation does not
magically go out of scope.
Yours,
Ilya
------------------------------
Date: Tue, 03 Aug 2010 11:15:43 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: CGI Program Questions
Message-Id: <874ofbekpc.fsf@lifelogs.com>
On Mon, 2 Aug 2010 17:04:54 -0500 "E.D.G." <edgrsprj@ix.netcom.com> wrote:
EDG> "Ted Zlatanov" <tzz@lifelogs.com> wrote in message
EDG> news:87tyndf3xy.fsf@lifelogs.com...
>> It's amazing how people manage to avoid discovering NNTP.
EDG> This is a fairly complex project involving a number of people.
EDG> And Perl is being used exclusively for everything possible. Other
EDG> types of applications are being developed and used when necessary.
EDG> But if possible, and as soon as possible, they are converted to Perl
EDG> language code.
NNTP is a protocol and can be implemented in Perl. Is it insufficient
for your needs? How do you plan to deal with disconnected servers
coming back, offline access, universal article storage and access
formats, indexing, etc.?
EDG> The computer programs being developed need to be able to run on both a
EDG> Windows PC and a Web server.
OK, but the statement above (you realize a Windows PC can be a web
server, right?) and the requirements you posted are not encouraging.
Ted
------------------------------
Date: Tue, 03 Aug 2010 08:32:56 -0700
From: sln@netherlands.com
Subject: Re: Extract variable length numbers (tab delimitered) from a string?
Message-Id: <mvcg565dpm0hp8kdb8rkpr5t3psd5p2b98@4ax.com>
On Tue, 03 Aug 2010 15:10:47 +0200, wolf <wolf@gsheep.com> wrote:
>Thomas Andersson schrieb:
>> As the topic says. I have a settings file where each line contains 2 numbers
>> of varying length and I want to extract each number and assign to a
>> variable, how would I go about that?
>>
>>
>use SPLIT: split( /\s+/, $input) splits on any whitespace(s) including
>tab(s). split( /\t/, $input) splits on every tab.
>
>open (my $infile, '<', 'mynumbers.txt') or die;
>my ($input, $number1, $number2);
>
>while ($input = <$infile>) {
> chomp $input;
> ($number1, $number2) = split( /\s+/, $input);
Don't forget to validate $number(s) or you could run
into errors when doing stuff like
  if ($number1 == $number2) { ... }
So after the split() it could be validated something like
  $number1 =~ s/^\s+//;
  $number1 =~ s/\s+$//;
  if ($number1 =~ /^[+-]?\d*?\.?\d+$/) { ... }   # for non-exponent
Or it can all be done in one line (anchored, so stray text on the line
makes the match fail instead of being skipped)
  ($number1, $number2) = $input =~ /^\s*([+-]?\d*?\.?\d+)\s+([+-]?\d*?\.?\d+)\s*$/;
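A possible alternative to a hand-written pattern is looks_like_number()
from the core module Scalar::Util. This is only a minimal sketch, assuming
exactly two whitespace-separated fields per line; parse_pair is a
hypothetical helper name, and looks_like_number() accepts more forms
(exponents, Inf, NaN) than the pattern above does.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: split one settings line on whitespace and return the
# two numbers, or an empty list if the line does not hold exactly two.
sub parse_pair {
    my ($line) = @_;
    my @fields = split ' ', $line;    # ' ' also skips leading whitespace
    return unless @fields == 2;
    return unless looks_like_number($fields[0])
               && looks_like_number($fields[1]);
    return @fields;
}

for my $line ("12\t3456\n", "  7.5 -2\n", "abc\t9\n") {
    my ($n1, $n2) = parse_pair($line) or next;    # skip invalid lines
    print "n1=$n1 n2=$n2\n";
}
```

Lines that fail validation are skipped silently here; a real settings
reader would probably warn with the line number instead.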
-sln
------------------------------
Date: Tue, 03 Aug 2010 10:54:26 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Help with regular expression
Message-Id: <878w4nelot.fsf@lifelogs.com>
On Mon, 2 Aug 2010 00:10:44 +0200 "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
PJH> Then the problem cannot be solved with a real regular expression.
...
PJH> I agree with Eric: Write a proper grammar and use that to parse your
PJH> expressions. If you've ever heard of BNF, using Parse::Yapp or
PJH> Parse::RecDescent shouldn't be too hard (I prefer the former, although
PJH> the docs assume that you are already familiar with yacc).
I don't think even a grammar will help. The requirements are
fundamentally broken because there's more than one way to interpret
nested parens. The OP should explain what he's trying to do and give
real-world examples he needs parsed.
Also, a grammar is pretty slow compared to regular expressions. So I
always hesitate before recommending it for anything except
low-throughput situations, e.g. input submitted by a user or small
files.
Ted
------------------------------
Date: Tue, 3 Aug 2010 18:40:09 +0200
From: Helmut Richter <hhr-m@web.de>
Subject: Re: Help with regular expression
Message-Id: <Pine.LNX.4.64.1008031816410.4418@lxhri01.lrz.lrz-muenchen.de>
On Tue, 3 Aug 2010, Ted Zlatanov wrote:
> I don't think even a grammar will help. The requirements are
> fundamentally broken because there's more than one way to interpret
> nested parens.
I do not think so:
Let X be the regular language of nonempty words not containing any
parentheses. Then the language L of words that are double-parenthesis
enclosed is:
L -> (( inside ))
inside -> inside1 | inside2
inside1 -> X | inside1 single-paren | inside1 X
inside2 -> single-paren X | single-paren single-paren | inside2 X
| inside2 single-paren
single-paren -> ( inside ) | ( )
"inside1" should be the language of all properly nested strings that do not
begin with "(", and "inside2" the language of all properly nested strings that
begin with "(" except when the last token is the matching ")".
Not that I find that grammar pretty or easy to parse -- but at least it is not
ambiguous.
> The OP should explain what he's trying to do and give
> real-world examples he needs parsed.
That should be a requirement for such weird questions.
--
Helmut Richter
------------------------------
Date: Tue, 03 Aug 2010 11:20:58 -0700
From: sln@netherlands.com
Subject: Re: Help with regular expression
Message-Id: <4gng569rq3vnri717lsaq2di0eolngch3l@4ax.com>
On Sun, 1 Aug 2010 19:47:18 +0000 (UTC), Mark Hobley <markhobley@yahoo.donottypethisbit.co> wrote:
>On Sun, 25 Jul 2010 22:04:25 +0200, Peter J. Holzer wrote:
>
>> Is this a match?
>>
>> (((1 + 2) * (3 +4)))
>
>Yes. That is a match.
Then
((3 * bar) + ((foo))) - This is a match
((3 * bar) + ((foo))bar) - This is a match.
is matched by ((foo)) only ...
It can't be both ways.
You have 2 requirements: balanced double parenths
that do not include balanced single parenths.
Pretty easy actually. Do you know regular expressions?
-sln
--------------------
use strict;
use warnings;
# Require (())
my $x = quotemeta '((';
my $y = quotemeta '))';
# Require not ()
my $m = quotemeta '(';
my $n = quotemeta ')';
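my $regex = qr/
  (                                   # group 1: double-parenthesised text
    $x
    (?:
        (                             # group 2: balanced single parentheses
          $m
          (?:
              (?>(?:(?!$x|$y|$m|$n).)+)   # run of non-parenthesis characters
            | (?2)                        # or a nested single-paren group
          )*
          $n
        )
      |
        (?>(?:(?!$x|$y|$m|$n).)+)
      | (?1)                          # or a nested double-paren group
    )*
    $y
  )
/xs;
while (<DATA>) {
  if (/$regex/) {
    print "$1\n";
  }
}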
__DATA__
I need a regular expression with the following properties.
I need to match text (typically, though not necessarily expressions)
enclosed within double parentheses. However, I do not want to match nested
single parentheses enclosed text.
So ((*)) is a match, but ((*)*(*)) is not a match.
Here are some examples to illustrate this.
((FOO)) - This is a match
(()) - This is a match
((3 + 2)) - This is a match
((3 + 2) + (2 * foo)) - This is not a match
((3 * bar) + ((foo))) - This is a match
((3 * bar) + ((foo))bar) - This is a match.
I hope that lot makes sense.
Thanks in advance to anyone who can help.
--
Mark Hobley
Linux User: #370818 http://markhobley.yi.org/
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
On Sun, 25 Jul 2010 22:04:25 +0200, Peter J. Holzer wrote:
> Is this a match?
>
> (((1 + 2) * (3 +4)))
Yes. That is a match.
--
Mark Hobley
Linux User: #370818 http://markhobley.yi.org/
--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
------------------------------
Date: Tue, 03 Aug 2010 13:42:02 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Help with regular expression
Message-Id: <87sk2vczd1.fsf@lifelogs.com>
On Tue, 3 Aug 2010 18:40:09 +0200 Helmut Richter <hhr-m@web.de> wrote:
HR> On Tue, 3 Aug 2010, Ted Zlatanov wrote:
>> I don't think even a grammar will help. The requirements are
>> fundamentally broken because there's more than one way to interpret
>> nested parens.
HR> I do not think so:
HR> Let X be the regular language of nonempty words not containing any
HR> parentheses. Then the language L of words that are double-parenthesis
HR> enclosed is:
HR> L -> (( inside ))
HR> inside -> inside1 | inside2
HR> inside1 -> X | inside1 single-paren | inside1 X
HR> inside2 -> single-paren X | single-paren single-paren | inside2 X
HR> | inside2 single-paren
HR> single-paren -> ( inside ) | ( )
HR> "inside1" should be the language of all properly nested strings that do not
HR> begin with "(", and "inside2" the language of all properly nested strings that
HR> begin with "(" except when the last token is the matching ")".
HR> Not that I find that grammar pretty or easy to parse -- but at least it is not
HR> ambiguous.
I didn't parse the requirements that way, but that's probably my error.
Thanks for explaining.
Ted
------------------------------
Date: Tue, 03 Aug 2010 11:57:29 -0700
From: sln@netherlands.com
Subject: Re: Help with regular expression
Message-Id: <3jpg56t5s1pqe7dbbb424nfg97n6lha5nt@4ax.com>
On Tue, 03 Aug 2010 10:54:26 -0500, Ted Zlatanov <tzz@lifelogs.com> wrote:
>On Mon, 2 Aug 2010 00:10:44 +0200 "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>
>PJH> Then the problem cannot be solved with a real regular expression.
>...
>PJH> I agree with Eric: Write a proper grammar and use that to parse your
>PJH> expressions. If you've ever heard of BNF, using Parse::Yapp or
>PJH> Parse::RecDescent shouldn't be too hard (I prefer the former, although
>PJH> the docs assume that you are already familiar with yacc).
>
>I don't think even a grammar will help. The requirements are
>fundamentally broken because there's more than one way to interpret
>nested parens. The OP should explain what he's trying to do and give
>real-world examples he needs parsed.
>
>Also, a grammar is pretty slow compared to regular expressions. So I
>always hesitate before recommending it for anything except
>low-throughput situations, e.g. input submitted by a user or small
>files.
>
>Ted
I don't think the requirements are broken at all.
As soon as the OP said that neither of the double
parenths can be part of an inner single parenth,
it pretty much made it complete. Trivial or not,
I believe it's a complete requirement that can be done with
a simple regular expression.
The match will satisfy the requirements, however,
outlier parenths may not be balanced relative to the
match. Though, additional expressions could be added
to balance the complete text.
-sln
------------------------------
Date: Tue, 3 Aug 2010 22:11:16 +0200
From: Helmut Richter <hhr-m@web.de>
Subject: Re: Help with regular expression
Message-Id: <Pine.LNX.4.64.1008032159460.4195@lxhri01.lrz.lrz-muenchen.de>
On Tue, 3 Aug 2010, Ted Zlatanov wrote:
> On Tue, 3 Aug 2010 18:40:09 +0200 Helmut Richter <hhr-m@web.de> wrote:
>
> HR> On Tue, 3 Aug 2010, Ted Zlatanov wrote:
> >> I don't think even a grammar will help. The requirements are
> >> fundamentally broken because there's more than one way to interpret
> >> nested parens.
>
> HR> I do not think so:
>
> HR> Let X be the regular language of nonempty words not containing any
> HR> parentheses.
No, this is an error, albeit easy to fix. X should be the language of
one-token words where the token is not a parenthesis. Otherwise
concatenating the X would introduce an ambiguity.
> HR> Then the language L of words that are double-parenthesis
> HR> enclosed is:
>
> HR> L -> (( inside ))
> HR> inside -> inside1 | inside2
> HR> inside1 -> X | inside1 single-paren | inside1 X
> HR> inside2 -> single-paren X | single-paren single-paren | inside2 X
> HR> | inside2 single-paren
> HR> single-paren -> ( inside ) | ( )
>
> HR> "inside1" should be the language of all properly nested strings that do not
> HR> begin with "(", and "inside2" the language of all properly nested strings that
> HR> begin with "(" except when the last token is the matching ")".
>
> HR> Not that I find that grammar pretty or easy to parse -- but at least it is not
> HR> ambiguous.
Well, it *is* easy to parse: nearly LR(0) with the only exception that
there is a minor shift-reduce conflict when ")" is encountered. So it is
certainly LR(1). Writing the L rule as two rules makes it even a
precedence grammar for several notions of precedence. This allows writing
a parser by hand, whereas LR parsers are better generated by a tool.
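To illustrate the "by hand" route: the following is not a parser for the
grammar above, only a simpler stack-based scan under one assumed reading of
the requirements, namely that a region counts when two adjacent opening
parentheses are closed by two adjacent closing parentheses. The name
double_paren_spans is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring of $text that starts with two
# adjacent "(" whose matching ")" are adjacent as well.
sub double_paren_spans {
    my ($text) = @_;
    my @chars = split //, $text;
    my (@stack, %close_of);

    # Pass 1: pair each "(" with its ")" using a stack of open positions.
    for my $i (0 .. $#chars) {
        if ($chars[$i] eq '(') {
            push @stack, $i;
        }
        elsif ($chars[$i] eq ')' && @stack) {
            $close_of{ pop @stack } = $i;
        }
    }

    # Pass 2: keep an outer pair only if the "(" right after it closes
    # immediately before the outer ")".
    my @found;
    for my $open (sort { $a <=> $b } keys %close_of) {
        my $inner = $open + 1;
        next unless exists $close_of{$inner};
        next unless $close_of{$inner} == $close_of{$open} - 1;
        push @found, substr($text, $open, $close_of{$open} - $open + 1);
    }
    return @found;
}

print "$_\n" for double_paren_spans('((3 * bar) + ((foo)))');
```

Under this reading "((3 + 2) + (2 * foo))" yields nothing, because the
second "(" closes long before the outer ")". Other readings of the
requirements would need a different second pass.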
> I didn't parse the requirements that way, but that's probably my error.
Well, when I set up the grammar, there were ambiguities of interpretation
of the requirements. This is just my interpretation. But at least I chose
it because I found it to be the most plausible, and not in order that
parsing be possible.
I am not sure the extended notion of regexp in perl, which goes beyond
regular languages, cannot be used to parse such a thing. After all, regexp
handling involves backtracking, which is not normally considered a good
technique in context-free parsing.
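For reference, a minimal sketch of the kind of extended pattern meant here:
Perl 5.10 and later allow a group to recurse into itself with (?1), which is
enough to recognise balanced parentheses, something no regular language can
do. The variable name $balanced and the sample strings are made up for the
illustration.

```perl
use strict;
use warnings;

# Group 1 matches one balanced parenthesised unit and recurses into itself.
my $balanced = qr/
    (                 # group 1
        \(
        (?:
            [^()]++   # a run of non-parenthesis characters, never given back
          | (?1)      # or a nested balanced unit
        )*
        \)
    )
/x;

for my $s ('(a(b)c)', '((x)', '()') {
    my $verdict = $s =~ /\A$balanced\z/ ? 'balanced' : 'not balanced';
    print "$s: $verdict\n";
}
```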
--
Helmut Richter
------------------------------
Date: Tue, 03 Aug 2010 14:03:40 -0700
From: sln@netherlands.com
Subject: Re: Help with regular expression
Message-Id: <911h56lfoi8a4d608uar6vf94s9rms8mon@4ax.com>
On Tue, 3 Aug 2010 22:11:16 +0200, Helmut Richter <hhr-m@web.de> wrote:
>I am not sure the extended notion of regexp in perl, which goes beyond
>regular languages, cannot be used to parse such a thing. After all, regexp
>handling involves backtracking, which is not normally considered a good
>technique in context-free parsing.
But mixing expressions without backtracking with expressions with
backtracking is a feature of extended notation.
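A small sketch of the difference, assuming nothing beyond core Perl: the
atomic group (?>...) never gives back what it has matched, while an ordinary
quantifier does. Both kinds can appear in the same pattern.

```perl
use strict;
use warnings;

my $text = 'aaab';

# Ordinary greedy quantifier: a+ first takes "aaa", then gives one "a" back
# so that the trailing "ab" can match.
my $backtracking = $text =~ /a+ab/ ? 1 : 0;

# Atomic group: (?>a+) keeps every "a" it took, so the literal "a" that
# follows has nothing left to match.
my $atomic = $text =~ /(?>a+)ab/ ? 1 : 0;

print "with backtracking: $backtracking\n";
print "atomic group:      $atomic\n";
```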
-sln
------------------------------
Date: Tue, 3 Aug 2010 18:44:55 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: If Perl is compiled on a 32-bit system, and the system is upgraded to 64-bit...
Message-Id: <slrni5ghs7.6td.hjp-usenet2@hrunkner.hjp.at>
On 2010-08-02 21:19, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> On 2010-08-02, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>> What is important is the ratio of data/text.
>>
>> No. What is important is the ratio between code and the usable address
>> space.
>
> I see (below) that we discuss different scenarios.
>
>>> In your case, it is less than 10. (With more memory, you run more of
>>> OTHER monsters. ;-)
>
>> Yes, but those other monsters get their own virtual address space, so
>> they don't matter in this discussion.
>
> They do on OS/2: the DLL's-related memory is loaded into shared
> address region. (This way one does not need any "extra"
> per-process-context patching or redirection of DLL address accesses.)
Sounds a bit like the pre-ELF shared library system in Linux. Of course
that was designed when 16 MB was a lot of RAM and abandoned when 128 MB
became normal for a server (but then I guess the same is true for OS/2).
I'd still be surprised if anybody ran an application mix on OS/2 where
the combined code size of all DLLs exceeds 1 GB. Heck, I'd be surprised
if anybody did it on Linux (with code I really mean code - many systems
put read-only data into the text segment of an executable, but you
couldn't move that to a different address space, so it doesn't count
here).
>> No, you misunderstood. If you now have an address space of 2 GB for
>> code+data, and you move the code to a different segment, you win 40MB
>> for data. But if the OS is changed to give each process a 4 GB address
>> space, then you win 2 GB, which is a lot more than 40 MB.
>
> I do not see how one would lift this limit (without a segmented
> architecture ;-).
If you can move code to a different segment you obviously have a
segmented architecture. But even without ...
> I expect that (at least) this would make context switch majorly
> costlier...
I don't see why the kernel should need a large address space in the same
context as the running process. When both the size of physical RAM and
the maximum VM of any process could realistically be expected to be much
smaller than 4GB, a fixed split between user space and kernel space
(traditionally 2GB + 2GB in Unix, but 3GB + 1GB in Linux) made some
sense: Within a system call, the kernel could access the complete
address space of the calling process and the complete RAM without
fiddling with page tables. But when physical RAM exceeded the kernel
space that was no longer possible anyway, so there was no longer a
reason to reserve a huge part of the address space of each process for
the kernel. But of course making large changes for a factor of at most 2
doesn't make much sense in a world governed by Moore's law, and anybody
who needed the space moved to 64 bit systems anyway.
hp
------------------------------
Date: Tue, 3 Aug 2010 21:18:10 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: piped open and shell metacharacters
Message-Id: <slrni5h1si.km7.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-08-03, Uri Guttman <uri@StemSystems.com> wrote:
> >> When used between two "logically connected" parts of the list (as in
> >> $^X, -I => $INC, ...), it is, IMO, very appropriate.
> i don't use => except for real pairs
> in hashes or arrays (that will eventually be copied to hashes)
Note that if system($foo, -I => $INC) is used with a GetOpt Perl
program $foo, what we deal with is an array (well, list) that will
eventually be copied to a hash. 1/3 ;-)
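A minimal sketch of the point, with a made-up include path and option names:
the fat comma only quotes the word to its left, so the same flat list serves
as an argument list or as hash contents.

```perl
use strict;
use warnings;

my $inc = '/opt/lib';    # hypothetical include directory

# "-I => ..." is a plain two-element list: => quotes the bareword I and the
# unary minus turns it into the string "-I".
my @args = (-I => $inc, 'script.pl');

# The same kind of flat list, copied into a hash as an option parser might.
my %opt = (-I => $inc, -D => 'DEBUG');

print join(' ', @args), "\n";
print "$_ = $opt{$_}\n" for sort keys %opt;
```

Passing @args to the list form of system() would hand each element to the
program as a separate argument, with no shell in between.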
Ilya
------------------------------
Date: Tue, 03 Aug 2010 10:35:41 -0500
From: "Mumia W." <paduille.4061.mumia.w+nospam@earthlink.net>
Subject: Re: Plot Module Question
Message-Id: <V-6dnaQ5vqCG2cXRnZ2dnUVZ_qadnZ2d@earthlink.com>
On 08/03/2010 03:37 AM, E.D.G. wrote:
> "Mumia W." <paduille.4061.mumia.w+nospam@earthlink.net> wrote in message
>
>>> Perhaps it would be a good idea to even create a Perl FAQ section that
>>> was specifically intended for science researchers. It would discuss all
>
>> Perhaps you should create that FAQ. This is user-supported software.
>
> I would probably be happy to help. However, I would first need to know
> how to get the various Perl routines etc. to work myself.
>
So learn those things over the course of this (and the next) year and
create the FAQ when you're done. It's okay if you ask for help writing
the FAQ.
------------------------------
Date: Tue, 3 Aug 2010 21:21:41 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Plot Module Question
Message-Id: <slrni5h235.km7.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-08-02, Jim Gibson <jimsgibson@gmail.com> wrote:
> A CGI program desiring to display gnuplot-generated graphs would start
> a gnuplot session, send it commands to generate a graph and send the
> output to a PNG file, then put a link in a returned HTML page to
> display the PNG file as an image.
Fine today; but for usage tomorrow, one may start to investigate
optional pathway through SVG, not PNG.
Myself, I would go through Term::Gnuplot, not gnuplot. But I'm biased...
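A rough sketch of the external-gnuplot route described above, with the
output format as a parameter so SVG is a one-word change. The helper name
gnuplot_script and the file name are made up, and a gnuplot binary with the
chosen terminal compiled in is assumed; the part that actually runs gnuplot
is left commented out.

```perl
use strict;
use warnings;

# Hypothetical helper: build the gnuplot commands that render one plot to a
# file. $terminal is a gnuplot terminal name such as "png" or "svg".
sub gnuplot_script {
    my ($terminal, $outfile, $expr) = @_;
    return join "\n",
        "set terminal $terminal",
        "set output '$outfile'",
        "plot $expr",
        '';
}

my $script = gnuplot_script('svg', 'graph.svg', 'sin(x)');
print $script;

# To render, pipe the script to gnuplot (assumed to be on the PATH):
#   open(my $gp, '|-', 'gnuplot') or die "cannot start gnuplot: $!";
#   print {$gp} $script;
#   close($gp) or die "gnuplot failed";
```

A CGI program would then emit an HTML page that refers to the generated
file, as described in the quoted text.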
Yours,
Ilya
------------------------------
Date: Tue, 03 Aug 2010 13:29:32 -0400
From: Ralph Malph <ralph@happydays.com>
Subject: Re: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
Message-Id: <bff9c$4c5851fc$40779ac3$24134@news.eurofeeds.com>
tl, dnr
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3060
***************************************