[28034] in Perl-Users-Digest
Perl-Users Digest, Issue: 9398 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jun 28 11:05:49 2006
Date: Wed, 28 Jun 2006 08:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 28 Jun 2006 Volume: 10 Number: 9398
Today's topics:
Re: crash on fork in taint mode <ynleder@nspark.org>
Re: how to remote a unix server in my cgi script <Deb.Fang@gmail.com>
Java identifiers (was: languages with full unicode supp <david.nospam.hopwood@blueyonder.co.uk>
Re: languages with full unicode support <david.nospam.hopwood@blueyonder.co.uk>
Re: languages with full unicode support <chris.uppal@metagnostic.REMOVE-THIS.org>
Re: Newbie precompile/PPM question anno4000@zrz.tu-berlin.de
Re: References as hash keys (Srinivasan's "Advanced Per <a24061@yahoo.com>
replacement of slow unpack <u8526505@gmail.com>
Re: Scalable method for searching in relatively big fil <hjp-usenet2@hjp.at>
Re: Scalable method for searching in relatively big fil <tadmc@augustmail.com>
Re: Scalable method for searching in relatively big fil <simon.chao@fmr.com>
Re: Scalable method for searching in relatively big fil <tadmc@augustmail.com>
Re: Scalable method for searching in relatively big fil <simon.chao@fmr.com>
Re: Searching each element of an array with grep <peace.is.our.profession@gmx.de>
Re: Single-liner for one-line substitute? <mritty@gmail.com>
Re: Single-liner for one-line substitute? <hawk007@flight.us>
Re: Single-liner for one-line substitute? <mritty@gmail.com>
Re: Single-liner for one-line substitute? <mwp@nospam.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 28 Jun 2006 12:52:33 +0200
From: Yohan N. Leder <ynleder@nspark.org>
Subject: Re: crash on fork in taint mode
Message-Id: <MPG.1f0c7dd86502c2ca9898ad@news.tiscali.fr>
In article <1151443070.093979.120750@y41g2000cwy.googlegroups.com>,
nobull67@gmail.com says...
>
> Yohan N. Leder wrote:
> > Hello. I'm using ActivePerl 5.8.8 build 817 and Apache2 for Windows. The
> > script below crashes on fork() when I run it in taint mode (not if I
> > remove -T).
>
> > Did you seen something like this ?
>
> Yes,
>
> http://groups.google.com/groups/search?q=fork+taint
>
>
Would building Perl 5.8.8 for Windows from
<http://www.perl.com/download.csp>, rather than using ActivePerl 5.8.8,
resolve this fork/taint problem?
------------------------------
Date: 28 Jun 2006 05:29:27 -0700
From: "debbie523" <Deb.Fang@gmail.com>
Subject: Re: how to remote a unix server in my cgi script
Message-Id: <1151497767.332530.76020@y41g2000cwy.googlegroups.com>
Jim Gibson wrote:
> In article <1151417862.372738.220860@y41g2000cwy.googlegroups.com>,
> debbie523 <Deb.Fang@gmail.com> wrote:
>
> > I have a cgi script stay in one unix server A, and in this cgi script I
> > have to call a program which sits on another unix server B. so there
> > should have server lines to login B in my cgi script. Is there somebody
> > help me figure out what's the specific command I should use.
>
> What protocol does your CGI program on server A use to run the program
> on server B? SSH? RSH? RPC? HTTP/CGI?
SSH
> Is the program on server B
> running all the time or does the program on server A have to launch it?
The program on server A has to launch it.
>How do you run the program without using Perl?
It is a C++ parallel program.
> The answers to these questions will greatly affect the answers.
------------------------------
Date: Wed, 28 Jun 2006 13:56:18 GMT
From: David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Subject: Java identifiers (was: languages with full unicode support)
Message-Id: <6Uvog.490321$tc.256914@fe2.news.blueyonder.co.uk>
Note Followup-To: comp.lang.java.programmer
Chris Uppal wrote:
> Since the interpretation of characters which are yet to be added to
> Unicode is undefined (will they be digits, "letters", operators, symbol,
> punctuation.... ?), there doesn't seem to be any sane way that a language could
> allow an unrestricted choice of Unicode in identifiers. Hence, it must define
> a specific allowed sub-set. C certainly defines an allowed subset of Unicode
> characters -- so I don't think you could call its Unicode support "half-baked"
> (not in that respect, anyway). A case -- not entirely convincing, IMO -- could
> be made that it would be better to allow a wider range of characters.
>
> And no, I don't think Java's approach -- where there /is no defined set of
> allowed identifier characters/ -- makes any sense at all :-(
Java does have a defined set of allowed identifier characters. However, you
certainly have to go around the houses a bit to work out what that set is:
<http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.8>
# An identifier is an unlimited-length sequence of Java letters and Java digits,
# the first of which must be a Java letter. An identifier cannot have the same
# spelling (Unicode character sequence) as a keyword (§3.9), boolean literal
# (§3.10.3), or the null literal (§3.10.7).
[...]
# A "Java letter" is a character for which the method
# Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit"
# is a character for which the method Character.isJavaIdentifierPart(int)
# returns true.
[...]
# Two identifiers are the same only if they are identical, that is, have the
# same Unicode character for each letter or digit.
For Java 1.5.0:
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html>
# Character information is based on the Unicode Standard, version 4.0.
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isJavaIdentifierStart(int)>
# A character may start a Java identifier if and only if one of the following
# conditions is true:
#
# * isLetter(codePoint) returns true
# * getType(codePoint) returns LETTER_NUMBER
# * the referenced character is a currency symbol (such as "$")
[This means that getType(codePoint) returns CURRENCY_SYMBOL, i.e. Unicode
General Category Sc.]
# * the referenced character is a connecting punctuation character (such as "_").
[This means that getType(codePoint) returns CONNECTOR_PUNCTUATION, i.e. Unicode
General Category Pc.]
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isJavaIdentifierPart(int)>
# A character may be part of a Java identifier if any of the following are true:
#
# * it is a letter
# * it is a currency symbol (such as '$')
# * it is a connecting punctuation character (such as '_')
# * it is a digit
# * it is a numeric letter (such as a Roman numeral character)
[General Category Nl.]
# * it is a combining mark
[General Category Mc (see <http://www.unicode.org/versions/Unicode4.0.0/ch04.pdf>).]
# * it is a non-spacing mark
[General Category Mn (ditto).]
# * isIdentifierIgnorable(codePoint) returns true for the character
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isDigit(int)>
# A character is a digit if its general category type, provided by
# getType(codePoint), is DECIMAL_DIGIT_NUMBER.
[General Category Nd.]
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isIdentifierIgnorable(int)>
# The following Unicode characters are ignorable in a Java identifier or a Unicode
# identifier:
#
# * ISO control characters that are not whitespace
# o '\u0000' through '\u0008'
# o '\u000E' through '\u001B'
# o '\u007F' through '\u009F'
# * all characters that have the FORMAT general category value
[FORMAT is General Category Cf.]
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isLetter(int)>
# A character is considered to be a letter if its general category type, provided
# by getType(codePoint), is any of the following:
#
# * UPPERCASE_LETTER
# * LOWERCASE_LETTER
# * TITLECASE_LETTER
# * MODIFIER_LETTER
# * OTHER_LETTER
====
To cut a long story short, the syntax of identifiers in Java 1.5 is therefore:
  Keyword ::= one of
    abstract    continue    for           new          switch
    assert      default     if            package      synchronized
    boolean     do          goto          private      this
    break       double      implements    protected    throw
    byte        else        import        public       throws
    case        enum        instanceof    return       transient
    catch       extends     int           short        try
    char        final       interface     static       void
    class       finally     long          strictfp     volatile
    const       float       native        super        while
  Identifier ::= IdentifierChars butnot (Keyword | "true" | "false" | "null")
  IdentifierChars ::= JavaLetter | IdentifierChars JavaLetterOrDigit
  JavaLetter ::= Lu | Ll | Lt | Lm | Lo | Nl | Sc | Pc
  JavaLetterOrDigit ::= JavaLetter | Nd | Mn | Mc |
                        U+0000..0008 | U+000E..001B | U+007F..009F | Cf
where the two-letter terminals refer to General Categories in Unicode 4.0.0
(exactly).
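As a rough cross-check, the grammar above can be approximated with Unicode
property classes in a Perl regex. This is only a sketch: the sub name
is_java_identifier is made up for illustration, and Perl uses the property
tables of whatever Unicode version it ships with rather than Unicode 4.0.0
exactly, so edge cases may differ from what a Java 1.5 compiler accepts.

```perl
use strict;
use warnings;

# Keywords plus the three literals that the grammar excludes.
my %reserved = map { $_ => 1 } qw(
    abstract continue for new switch assert default if package synchronized
    boolean do goto private this break double implements protected throw
    byte else import public throws case enum instanceof return transient
    catch extends int short try char final interface static void
    class finally long strictfp volatile const float native super while
    true false null
);

# JavaLetter ::= Lu | Ll | Lt | Lm | Lo | Nl | Sc | Pc
# (\p{L} is the union of the five letter categories.)
my $java_letter = qr/[\p{L}\p{Nl}\p{Sc}\p{Pc}]/;

# JavaLetterOrDigit adds Nd, Mn, Mc, Cf and the listed control ranges.
my $java_letter_or_digit =
    qr/[\p{L}\p{Nl}\p{Sc}\p{Pc}\p{Nd}\p{Mn}\p{Mc}\p{Cf}\x{00}-\x{08}\x{0E}-\x{1B}\x{7F}-\x{9F}]/;

# Hypothetical helper: true if $name matches the Identifier production.
sub is_java_identifier {
    my ($name) = @_;
    return 0 if $reserved{$name};
    return $name =~ /\A${java_letter}${java_letter_or_digit}*\z/ ? 1 : 0;
}

printf "%-8s %s\n", $_, is_java_identifier($_) ? 'ok' : 'rejected'
    for 'foo', '$x', '_tmp', '1abc', 'while';
```

Note that the control characters and Cf characters are matched as ordinary
identifier parts here, in line with the language spec rather than the
"ignorable" wording of the API documentation.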
Note that the so-called "ignorable" characters (for which
isIdentifierIgnorable(codePoint) returns true) are not ignorable; they are
treated like any other identifier character. This quote from the API spec:
# The following Unicode characters are ignorable in a Java identifier [...]
should be ignored (no pun intended). It is contradicted by:
# Two identifiers are the same only if they are identical, that is, have the
# same Unicode character for each letter or digit.
in the language spec. Unicode does have a concept of ignorable characters in
identifiers, which is probably where this documentation bug crept in.
The inclusion of U+0000 and various control characters in the set of valid
identifier characters is also a dubious decision, IMHO.
Note that I am not defending in any way the complexity of this definition; there's
clearly no excuse for it (or for the "ignorable" documentation bug). The language
spec should have been defined directly in terms of the Unicode General Categories,
and then the API in terms of the language spec. The way it is done now is
completely backwards.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
------------------------------
Date: Wed, 28 Jun 2006 11:03:05 GMT
From: David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Subject: Re: languages with full unicode support
Message-Id: <Jltog.230285$8W1.8494@fe1.news.blueyonder.co.uk>
Tim Roberts wrote:
> "Xah Lee" <xah@xahlee.org> wrote:
>
>>Languages with Full Unicode Support
>>
>>As far as i know, Java and JavaScript are languages with full, complete
>>unicode support. That is, they allow names to be defined using unicode.
>>(the JavaScript engine used by FireFox support this)
>>
>>As far as i know, here's few other lang's status:
>>
>>C ? No.
>
> This is implementation-defined in C. A compiler is allowed to accept
> variable names with alphabetic Unicode characters outside of ASCII.
It is not implementation-defined in C99 whether Unicode characters are
accepted; only how they are encoded directly in the source multibyte character
set.
Characters escaped using \uHHHH or \U00HHHHHH (H is a hex digit), and that
are in the sets of characters defined by Unicode for identifiers, are required
to be supported, and should be mangled in some consistent way by a platform's
linker. There are Unicode text editors which encode/decode \u and \U on the fly,
so you can treat this essentially like a Unicode transformation format (it
would have been nicer to require support for UTF-8, but never mind).
C99 6.4.2.1:
# 3 Each universal character name in an identifier shall designate a character
# whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
# annex D. 59) The initial character shall not be a universal character name
# designating a digit. An implementation may allow multibyte characters that
# are not part of the basic source character set to appear in identifiers;
# which characters and their correspondence to universal character names is
# implementation-defined.
#
# 59) On systems in which linkers cannot accept extended characters, an encoding
# of the universal character name may be used in forming valid external
# identifiers. For example, some otherwise unused character or sequence of
# characters may be used to encode the \u in a universal character name.
# Extended characters may produce a long external identifier.
--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
------------------------------
Date: Wed, 28 Jun 2006 12:38:04 +0100
From: "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Subject: Re: languages with full unicode support
Message-Id: <44a26911$1$660$bed64819@news.gradwell.net>
Joachim Durchholz wrote:
> > This is implementation-defined in C. A compiler is allowed to accept
> > variable names with alphabetic Unicode characters outside of ASCII.
>
> Hmm... that code would be nonportable, so C support for Unicode is
> half-baked at best.
Since the interpretation of characters which are yet to be added to
Unicode is undefined (will they be digits, "letters", operators, symbol,
punctuation.... ?), there doesn't seem to be any sane way that a language could
allow an unrestricted choice of Unicode in identifiers. Hence, it must define
a specific allowed sub-set. C certainly defines an allowed subset of Unicode
characters -- so I don't think you could call its Unicode support "half-baked"
(not in that respect, anyway). A case -- not entirely convincing, IMO -- could
be made that it would be better to allow a wider range of characters.
And no, I don't think Java's approach -- where there /is no defined set of
allowed identifier characters/ -- makes any sense at all :-(
-- chris
------------------------------
Date: 28 Jun 2006 11:54:13 GMT
From: anno4000@zrz.tu-berlin.de
Subject: Re: Newbie precompile/PPM question
Message-Id: <4gf8v5F1m9fdrU1@news.dfncis.de>
<mgarrish@gmail.com> wrote in comp.lang.perl.misc:
> Bart Lateur wrote:
>
> > Jockser wrote:
> >
> > >ppm> install http://theoryx5.uwinnipeg.ca/ppms/appconfig.ppd
> > >Error: Failed to download URL
> > >http://theoryx5.uwinnipeg.ca/ppms/appconfig.ppd: 404 Not Found
> >
> > It would help if you used the proper case - internet URLs are case
> > sensitive.
> >
> > <http://theoryx5.uwinnipeg.ca/ppms/AppConfig.ppd>
> >
>
> That depends on the underlying server/OS,
A server can decide to serve the same data for different URLs. That
doesn't make the URLs equivalent.
> and isn't even true in the
> strictest of senses as domains are never case sensitive.
So? The domain is only part of a URL. The rest of it is still
case-sensitive.
Anno
------------------------------
Date: Wed, 28 Jun 2006 11:15:31 +0100
From: Adam Funk <a24061@yahoo.com>
Subject: Re: References as hash keys (Srinivasan's "Advanced Perl Programming")?
Message-Id: <3f1an3-2dd.ln1@news.ducksburg.com>
On 2006-06-27, Jim Gibson <jgibson@mail.arc.nasa.gov> wrote:
> The practical effect is that if you do want to use references as hash
> keys, you can, but if you need to dereference the references, you can't
> store the references only in the hash as keys unless you use the
> Tie::RefHash module. Without using that module, you need to store the
> references separately from the hash, in an array for example, and
> dereference those values and use them as keys to the hash.
>
> Like the documents say, this is rarely necessary. I only contemplated
> doing it once in many years of Perl programming, and, although it
> worked, I soon found it unnecessary and abandoned it.
Interesting; thanks.
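
A minimal sketch of the behaviour described in the quoted text, assuming
nothing beyond core Perl: a reference used as an ordinary hash key is
stored as a string, while a hash tied to Tie::RefHash hands back real
references. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

my $ref = [ 1, 2, 3 ];

# Ordinary hash: the key is the stringified reference, e.g. "ARRAY(0x...)".
my %plain = ( $ref => 'some value' );
my ($plain_key) = keys %plain;
print ref($plain_key) ? "still a reference\n" : "just a string\n";

# Tie::RefHash keeps the original reference, so it can be dereferenced.
tie my %refhash, 'Tie::RefHash';
$refhash{$ref} = 'some value';
my ($ref_key) = keys %refhash;
print "@$ref_key\n";
```

With the plain hash, the lookup $plain{$ref} still works (the reference
stringifies to the same key), but the key itself can no longer be followed
back to the array.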
------------------------------
Date: 28 Jun 2006 06:34:24 -0700
From: "cyl" <u8526505@gmail.com>
Subject: replacement of slow unpack
Message-Id: <1151501663.992642.106040@i40g2000cwc.googlegroups.com>
This block of code took about 25 seconds on my computer:
open A,"a_500mb_file";
binmode A;
while(sysread(A,$x,256)){
#do nothing
}
close A;
and this took 215 seconds:
open A,"a_500mb_file";
binmode A;
while(sysread(A,$x,256)){
my @c=unpack('C*',$x);
}
close A;
So why is unpack so slow here? I thought it was very fast. Am I using
it the wrong way, or is there a replacement? Thanks.
------------------------------
Date: Wed, 28 Jun 2006 13:31:34 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Scalable method for searching in relatively big files
Message-Id: <mapt7e.st5.ln@teal.hjp.at>
KaZ wrote:
> KaZ wrote:
>> --------------------------------------------------------------------------------------------------------
>> open (FILEA, '<', "./filea.txt")
>>
>> while (defined($line = <FILEA>)) {
[...]
>> if ( $var3 eq "blah") {
[...]
>> }
>> --------------------------------------------------------------------------------------------------------
>
> Sorry, I made a mistake:
> each "if ($var eq "some_string")" is to be replaced by a sub which
> search in an excel list, of about 400 rows, using
> Spreadsheet::ParseExcel.
>
> I already used this sub in other scripts, and it was slow but still
> below 1 minute, so I thought, it was not the reason for the slowness
> here.
If you parse an excel file several times for each line of a 4 MB file,
you are probably parsing it about a hundred thousand times. No wonder
this is slow. You should parse the excel file once at startup, extract
the information you need and store it in an appropriate perl data
structure (most likely a hash). Then you can replace parsing your excel
sheet with a simple hash lookup.
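
A minimal sketch of that approach. The spreadsheet contents and the input
lines below are made-up stand-ins so that the example runs on its own; in
the real script the rows would be read once with Spreadsheet::ParseExcel
before the main loop, and the lines would come from the file.

```perl
use strict;
use warnings;

# Stand-in for the ~400 spreadsheet rows (hypothetical data).
my @excel_rows = (
    [ 'blah', 'category A' ],
    [ 'foo',  'category B' ],
);

# Build the lookup table once at startup.
my %lookup = map { $_->[0] => $_->[1] } @excel_rows;

# Stand-in for the lines of the big tab-separated input file.
my @input = ( "1\t2\t3\tblah\n", "4\t5\t6\tnope\n", "7\t8\t9\tfoo\n" );

my @hits;
for my $line (@input) {
    chomp $line;
    my @fields = split /\t/, $line;
    my $var3   = $fields[3];
    # A hash lookup replaces re-parsing the sheet for every line.
    push @hits, "$var3 => $lookup{$var3}" if exists $lookup{$var3};
}
print "$_\n" for @hits;
```

The cost of parsing the sheet is then paid once, and each line of the big
file costs only a split and a hash lookup.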
> But if you think perl is able to process such a script much
> faster normally, then I have to make a text version of it.
Just avoid doing the same thing over and over again if you already know
the result.
hp
--
_ | Peter J. Holzer | Man könnte sich [die Diskussion] auch
|_|_) | Sysadmin WSR/LUGA | sparen, wenn man sie sich einfach sparen
| | | hjp@hjp.at | würde.
__/ | http://www.hjp.at/ | -- Ralph Angenendt in dang 2006-04-15
------------------------------
Date: Wed, 28 Jun 2006 07:29:35 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Scalable method for searching in relatively big files
Message-Id: <slrnea4thf.tho.tadmc@magna.augustmail.com>
KaZ <kaz219@gawab.com> wrote:
> open (FILEA, '<', "./filea.txt")
You should always, yes *always*, check the return value from open():
open (FILEA, '<', './filea.txt') or die "could not open './filea.txt' $!";
> @line = split '\t', $line;
A pattern match should *look like* a pattern match:
@line = split /\t/, $line;
> $var0 = @line[0];
You should always enable warnings when developing Perl code!
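
Putting the three points together, a sketch under stated assumptions: the
temporary file and its contents are made up so that the example is
self-contained, and lexical filehandles are used instead of the OP's
bareword FILEA.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway tab-separated file standing in for ./filea.txt (made-up contents).
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/filea.txt";
open my $out, '>', $file or die "could not create '$file': $!";
print {$out} "a\tb\tc\n", "d\te\tf\n";
close $out or die "could not close '$file': $!";

# Always check the return value of open().
open my $in, '<', $file or die "could not open '$file': $!";
my @first_fields;
while ( defined( my $line = <$in> ) ) {
    chomp $line;
    my @line = split /\t/, $line;    # a pattern that looks like a pattern
    # $line[0] is a single element; @line[0] is a one-element slice, which
    # warnings report as "Scalar value @line[0] better written as $line[0]".
    my $var0 = $line[0];
    push @first_fields, $var0;
}
close $in;
print "@first_fields\n";
```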
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 28 Jun 2006 05:56:25 -0700
From: "it_says_BALLS_on_your forehead" <simon.chao@fmr.com>
Subject: Re: Scalable method for searching in relatively big files
Message-Id: <1151499385.544491.64770@m73g2000cwd.googlegroups.com>
Tad McClellan wrote:
> KaZ <kaz219@gawab.com> wrote:
>
>
> A pattern match should *look like* a pattern match:
>
> @line = split /\t/, $line;
I agree with you, Tad. However, from the Programming Perl 3rd ed. book,
pg. 63...
PICK YOUR OWN QUOTES
"You can use whichever nonalphanumeric, nonwhitespace delimiter you
like in place of '/'."
An interesting thing about the single quote - it's not supposed to
interpolate, and if the pattern is a variable, it won't. But in the
case of at least tab characters '\t', it does. This behavior is not
consistent with how tabs behave between single quotes with the print
function.
------------------------------
Date: Wed, 28 Jun 2006 09:10:49 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Scalable method for searching in relatively big files
Message-Id: <slrnea53f9.tot.tadmc@magna.augustmail.com>
it_says_BALLS_on_your forehead <simon.chao@fmr.com> wrote:
> Tad McClellan wrote:
>> KaZ <kaz219@gawab.com> wrote:
[ the snipped OP's code was: @line = split '\t', $line; ]
>> A pattern match should *look like* a pattern match:
>>
>> @line = split /\t/, $line;
>
> I agree with you, Tad. However, From the Programming Perl 3rd ed. book
> pg. 63...
>
> PICK YOUR OWN QUOTES
>
> "You can use whichever nonalphanumeric, nonwhitespace delimiter you
> like in place of '/'."
Please don't cite a resource with limited distribution when there
is a widely available resource that says the same thing (perlop.pod).
If "/" is the delimiter then the initial C<m> is optional.
With the C<m> you can use any pair of non-alphanumeric,
non-whitespace characters as delimiters.
But that doesn't apply to the OP's code, because it does not have the C<m>:
@line = split m'\t', $line;
The OP is not supplying a pattern as split's first arg, he is
supplying a string instead (which will then be forced into a pattern
by the DWIMer).
IMHO, the DWIMer is being rather too helpful in the OP's case, which
is why I made my comment in the first place.
The OP's code acts like a pattern match but does not look like a pattern match.
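
A small sketch of what happens, with made-up variable names: the
single-quoted string really is two characters, and it only matches a tab
because split hands it to the regex engine, where \t means "tab".

```perl
use strict;
use warnings;

my $line = "a\tb\tc";

# In single quotes, '\t' is two characters: a backslash and a "t".
my $sep = '\t';
print length($sep), "\n";

# split treats its first argument as a pattern, so the regex engine
# (not string interpolation) turns \t into "match a tab".
my @from_string  = split '\t', $line;
my @from_pattern = split /\t/, $line;
print scalar(@from_string), " ", scalar(@from_pattern), "\n";

# print does no such conversion: this prints a backslash followed by t.
print '\t', "\n";
```

So nothing is interpolated in either case; the difference from print is
only that split compiles its argument as a regex.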
> An interesting thing about the single quote - it's not supposed to
> interpolate, and if the pattern is a variable, it won't. But in the
> case of at least tab characters '\t', it does. This behavior is not
> consistent with how tabs behave between single quotes with the print
> function.
Yet another reason to make the pattern *look like* a pattern then, yes?
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 28 Jun 2006 07:27:17 -0700
From: "it_says_BALLS_on_your forehead" <simon.chao@fmr.com>
Subject: Re: Scalable method for searching in relatively big files
Message-Id: <1151504837.623933.309360@j72g2000cwa.googlegroups.com>
Tad McClellan wrote:
> it_says_BALLS_on_your forehead <simon.chao@fmr.com> wrote:
> > Tad McClellan wrote:
> >> KaZ <kaz219@gawab.com> wrote:
>
>
> [ the snipped OP's code was: @line = split '\t', $line; ]
>
>
> >> A pattern match should *look like* a pattern match:
> >>
> >> @line = split /\t/, $line;
> >
> > I agree with you, Tad. However, From the Programming Perl 3rd ed. book
> > pg. 63...
> >
> > PICK YOUR OWN QUOTES
> >
> > "You can use whichever nonalphanumeric, nonwhitespace delimiter you
> > like in place of '/'."
>
>
> Please don't cite a resource with limited distribution when there
> is a widely available resource that says the same thing (perlop.pod).
>
> If "/" is the delimiter then the initial C<m> is optional.
> With the C<m> you can use any pair of non-alphanumeric,
> non-whitespace characters as delimiters.
>
I see nothing wrong with citing one of the definitive Perl reference
books when I provide the quote.
>
> But that doesn't apply to the OP's code, because it does not have the C<m>:
>
> @line = split m'\t', $line;
>
> The OP is not supplying a pattern as split's first arg, he is
> supplying a string instead (which will then be forced into a pattern
> by the DWIMer).
That I did not know; interesting.
>
> IMHO, the DWIMer is being rather too helpful in the OP's case, which
> is why I made my comment in the first place.
>
> The OP's code acts like a pattern match but does not look like a pattern match.
>
>
> > An interesting thing about the single quote - it's not supposed to
> > interpolate, and if the pattern is a variable, it won't. But in the
> > case of at least tab characters '\t', it does. This behavior is not
> > consistent with how tabs behave between single quotes with the print
> > function.
>
>
> Yet another reason to make the pattern *look like* a pattern then, yes?
As I stated before, I agree with you.
------------------------------
Date: Wed, 28 Jun 2006 15:01:04 +0200
From: Mirco Wahab <peace.is.our.profession@gmx.de>
Subject: Re: Searching each element of an array with grep
Message-Id: <e7tupc$2jg$1@mlucom4.urz.uni-halle.de>
Thus spoke Ben Morrow (on 2006-06-27 22:34):
> It is slower, though. A BLOCK has a finite amount of bookkeeping:
>
> ~% perl -MBenchmark=cmpthese -e'
> ...
> It's always cleaner and often faster to say what you mean.
Uhhh!
This is a very interesting (for me, at least)
aspect of array processing - one I wasn't
aware of.
I didn't expect expression-grep to come out fastest
in almost all circumstances.
Thanks for this hint!
Mirco
==>
use Benchmark qw(cmpthese);
my @foo = ("foo") x 4000;
cmpthese 0, {
iteration => sub { my @bar; for(@foo){ push(@bar,$_) if /foo/ } },
expr_map => sub { my @bar = map /foo/ ? $_ : (), @foo },
expr_grep => sub { my @bar = grep /foo/, @foo },
blck_map => sub { my @bar = map { /foo/ ? $_ : () } @foo },
blck_grep => sub { my @bar = grep { /foo/ } @foo },
};
------------------------------
Date: 28 Jun 2006 06:52:58 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Single-liner for one-line substitute?
Message-Id: <1151502778.400455.264450@d56g2000cwd.googlegroups.com>
anno4000@zrz.tu-berlin.de wrote:
> Mike Pearson <mwp@nospam.com> wrote in comp.lang.perl.misc:
> > perl -pi -e 's/old/new/g' file
> >
> > for a global search/replace in a file. I've tried to modify this to
> > change only a string on the first line of a file, leaving the string
> > unchanged elsewhere in the file, but I haven't been able to find a way
> > to do this. Simply removing the 'g' has no effect - it still does a
> > global replace.
>
> The match operator is applied to every line in the file. The /g
> modifier changes the behavior of each application. It does not
> work across applications.
>
> Here is one way:
>
> perl -pi -e '$. == 1 && s/old/new/g' file
That would still cause Perl to loop through the entire file, each time
checking the value of $., even though we know it will only match the
first time. I wonder if this might be "better":
perl -pi -e's/old/new/g; exit;' file
That way, regardless of the success or failure of the s///, the program
ends after the first iteration of the implicit while(<>) loop....
Paul Lalli
------------------------------
Date: 28 Jun 2006 07:13:23 -0700
From: "Andrew" <hawk007@flight.us>
Subject: Re: Single-liner for one-line substitute?
Message-Id: <1151504003.766228.234510@j72g2000cwa.googlegroups.com>
Paul Lalli wrote:
> anno4000@zrz.tu-berlin.de wrote:
> > Mike Pearson <mwp@nospam.com> wrote in comp.lang.perl.misc:
> > > perl -pi -e 's/old/new/g' file
> > >
> > > for a global search/replace in a file. I've tried to modify this to
> > > change only a string on the first line of a file, leaving the string
> > > unchanged elsewhere in the file, but I haven't been able to find a way
> > > to do this. Simply removing the 'g' has no effect - it still does a
> > > global replace.
> >
> > The match operator is applied to every line in the file. The /g
> > modifier changes the behavior of each application. It does not
> > work across applications.
> >
> > Here is one way:
> >
> > perl -pi -e '$. == 1 && s/old/new/g' file
>
> That would still cause Perl to loop through the entire file, each time
> checking the value of $., even though we know it will only match the
> first time. I wonder if this might be "better":
>
> perl -pi -e's/old/new/g; exit;' file
>
> That way, regardless of the success or failure of the s///, the program
> ends after the first iteration of the implicit while(<>) loop....
Interesting, Paul, I wasn't aware of the implicit "while".
I just tried your suggestion, however, and it mysteriously zeroed out
the file. Using "last;" in place of "exit;" yields the same empty-file
result.
I intuit you're on the right track (if, indeed, a "while (<>)" is
implied), but perhaps there needs to be an additional explicit command
preceding "exit" or "last", which forces the modified data to be
written back to the file?
andrew
------------------------------
Date: 28 Jun 2006 07:24:16 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Single-liner for one-line substitute?
Message-Id: <1151504656.436439.235890@m73g2000cwd.googlegroups.com>
Andrew wrote:
> Paul Lalli wrote:
> > anno4000@zrz.tu-berlin.de wrote:
> > > Mike Pearson <mwp@nospam.com> wrote in comp.lang.perl.misc:
> > > > perl -pi -e 's/old/new/g' file
> > > >
> > > > for a global search/replace in a file. I've tried to modify this to
> > > > change only a string on the first line of a file, leaving the string
> > > > unchanged elsewhere in the file, but I haven't been able to find a way
> > > > to do this. Simply removing the 'g' has no effect - it still does a
> > > > global replace.
> > >
> > > The match operator is applied to every line in the file. The /g
> > > modifier changes the behavior of each application. It does not
> > > work across applications.
> > >
> > > Here is one way:
> > >
> > > perl -pi -e '$. == 1 && s/old/new/g' file
> >
> > That would still cause Perl to loop through the entire file, each time
> > checking the value of $., even though we know it will only match the
> > first time. I wonder if this might be "better":
> >
> > perl -pi -e's/old/new/g; exit;' file
> >
> > That way, regardless of the success or failure of the s///, the program
> > ends after the first iteration of the implicit while(<>) loop....
>
> Interesting, Paul, i wasn't aware of the implicit "while".
Take a look at perldoc perlrun, for the -p and -n options.
> I just tried your suggestion, however, and it mysteriously zeroed out
> the file. Using "last;" in place of "exit;" yields the same empty-file
> result.
WHOOPS! You are absolutely right. I was completely forgetting how
the -p and -i options work, in that they print each line to the
newly-modified file right after that line has been read (and possibly
modified by the -e'' code). Definitely cannot put an exit or last
there.
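
For reference, a sketch of roughly what the working one-liner
(perl -pi -e '$. == 1 && s/old/new/g' file) does when spelled out as a
script. The temporary file is made up for the example, and an empty $^I
(in-place editing without a backup) may not be supported on every platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway file containing "old" on every line (made-up contents).
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/file.txt";
open my $out, '>', $file or die "could not create '$file': $!";
print {$out} "old line 1\n", "old line 2\n", "old line 3\n";
close $out or die "could not close '$file': $!";

{
    local $^I   = '';         # what -i does: edit in place, no backup
    local @ARGV = ($file);
    while (<>) {              # what -p does: loop over every input line...
        $. == 1 && s/old/new/g;    # only touch the first line
    }
    continue {
        print;                # ...and print each line to the new file
    }
}

open my $in, '<', $file or die "could not open '$file': $!";
my @lines = <$in>;
close $in;
print @lines;
```

Because the new file only receives what the print in the continue block
writes, leaving the loop early with exit or last drops every line that has
not been printed yet, which is why the file came out empty.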
Profuse apologies to the OP and to Anno, for my erroneous "correction".
Paul Lalli
------------------------------
Date: Wed, 28 Jun 2006 15:54:02 +0100
From: Mike Pearson <mwp@nospam.com>
Subject: Re: Single-liner for one-line substitute?
Message-Id: <5v55a294gmubm7l2c9c6ioghi8jmej5q3o@4ax.com>
On 28 Jun 2006 09:54:51 GMT, anno4000@zrz.tu-berlin.de wrote:
> perl -pi -e '$. == 1 && s/old/new/g' file
Many thanks - that's done the job.
Mike
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 9398
***************************************