[32589] in Perl-Users-Digest


Perl-Users Digest, Issue: 3861 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jan 16 09:09:20 2013

Date: Wed, 16 Jan 2013 06:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 16 Jan 2013     Volume: 11 Number: 3861

Today's topics:
    Re: best way to make a few changes in a large data file <xhoster@gmail.com>
    Re: best way to make a few changes in a large data file <derykus@gmail.com>
    Re: best way to make a few changes in a large data file <ben@morrow.me.uk>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
    Re: best way to make a few changes in a large data file <derykus@gmail.com>
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
    Re: How do I get address of scalars? <ben@morrow.me.uk>
    Re: How do I get address of scalars? <nawglan@gmail.com>
    Re: plural and singular syntax in Perl5, PHP and Perl6 <source@netcom.com>
    Re: plural and singular syntax in Perl5, PHP and Perl6 <gogala.mladen@gmail.com>
    Re: Regular expression for BOM required <bugbear@trim_papermule.co.uk_trim>
    Re: Regular expression for BOM required <hjp-usenet2@hjp.at>
    Re: Regular expression for BOM required <petergoATnetspace.net.au>
    Re: Regular expression for BOM required <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 14 Jan 2013 19:24:45 -0800
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <50f4cd14$0$18903$862e30e2@ngroups.net>

On 01/09/2013 06:10 AM, C.DeRykus wrote:
>
> Since speed isn't critical, the Tie::File suggestion would simplify
> the code considerably. Since the whole file isn't loaded, big files
> won't be problematic

I haven't used it in a while, but if I recall correctly Tie::File stores 
the entire table of line-number/byte-offset in RAM, and that can often 
be about as large as storing the entire file if the lines are fairly short.	

Xho


------------------------------

Date: Tue, 15 Jan 2013 02:08:27 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <13473acd-fd35-414f-a2e0-58afe93f05c4@googlegroups.com>

On Monday, January 14, 2013 7:24:45 PM UTC-8, Xho Jingleheimerschmidt wrote:
> On 01/09/2013 06:10 AM, C.DeRykus wrote:
> >
> > Since speed isn't critical, the Tie::File suggestion would simplify
> > the code considerably. Since the whole file isn't loaded, big files
> > won't be problematic
>
> I haven't used it in a while, but if I recall correctly Tie::File stores
> the entire table of line-number/byte-offset in RAM, and that can often
> be about as large as storing the entire file if the lines are fairly short.

Actually IIUC, Tie::File is more parsimonious of memory than even
DB_File for instance and employs a "lazy cache" whose size can be
user-specified.

See: http://perl.plover.com/TieFile/why-not-DB_File

So, even with overhead of 310 bytes per record, that 
would get slow only if the file gets really huge and 
least-recently read records start to get tossed.
But the stated aim was accuracy rather than speed.

And since there's a 10Mb record limit with only 200-300K records, that's
unlikely to be a show-stopper. Reading a comparably sized file took only
a couple of seconds in my simple test.
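For reference, the cache size mentioned above is set through Tie::File's `memory` option. A minimal sketch, assuming a scratch file and made-up replacement text (neither comes from the original problem):

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small scratch file so the sketch is self-contained.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "record $_\n" for 1 .. 1000;
close $fh or die "close: $!";

# 'memory' caps the read cache in bytes (the documented default is 2 MiB).
tie my @lines, 'Tie::File', $file, memory => 1_000_000
    or die "Cannot tie $file: $!";

# Records are zero-indexed; the record separator is added automatically.
$lines[41]  = 'record 42 (changed)';
$lines[998] = 'record 999 (changed)';

untie @lines;    # flush pending writes and release the file
```

Only the two assigned records are rewritten; everything else in the file is left as it was.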

-- 
Charles DeRykus


------------------------------

Date: Tue, 15 Jan 2013 19:49:27 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <7fogs9-3bq2.ln1@anubis.morrow.me.uk>


Quoth "C.DeRykus" <derykus@gmail.com>:
> On Monday, January 14, 2013 7:24:45 PM UTC-8, Xho Jingleheimerschmidt wrote:
> > 
> > I haven't used it in a while, but if I recall correctly Tie::File stores 
> > the entire table of line-number/byte-offset in RAM, and that can often 
> > be about as large as storing the entire file if the lines are fairly short.	
> 
> Actually IIUC, Tie::File is more parsimonious of memory than even
> DB_File for instance and employs a 
> "lazy cache" whose size can be user-specified.

Xho is correct (check the source). Tie::File keeps an array containing
the byte-offset of every line up to the last line you have accessed so
far, and that array is *not* counted against the cache. On a 64bit
machine an array element is 24 bytes, so if your lines are shorter than
that reading a file into Tie::File will use more memory than slurping
the whole thing. (Of course, having slurped you then have to manipulate
the data, which will use more memory.)
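As a back-of-the-envelope illustration of that comparison, using the 24-bytes-per-element figure above and made-up file dimensions (hypothetical numbers, not measurements):

```perl
use strict;
use warnings;

# Per-element cost quoted above for a 64-bit perl; treat it as an estimate.
my $bytes_per_offset = 24;

# Hypothetical file: ten million lines averaging 16 bytes each.
my $line_count   = 10_000_000;
my $avg_line_len = 16;

my $offset_table = $line_count * $bytes_per_offset;   # offsets array alone
my $file_size    = $line_count * $avg_line_len;       # raw file contents

printf "offset table: %d MB, file contents: %d MB\n",
    $offset_table / 1e6, $file_size / 1e6;
```

With these numbers the offsets array alone comes to 240 MB against 160 MB of file contents, before any cached records are counted.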

Ben



------------------------------

Date: Tue, 15 Jan 2013 20:40:32 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87y5fu2n9b.fsf@sapphire.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:

[...]

> Tie::File is more parsimonious of memory than even DB_File for instance and employs a 
> "lazy cache" whose size can be user-specified.
>
> See: http://perl.plover.com/TieFile/why-not-DB_File
>
> So, even with overhead of 310 bytes per record, that 
> would get slow only if the file gets really huge and 
> least-recently read records start to get tossed.
> But the stated aim was accuracy rather than speed.

Nevertheless, Tie::File not only needs *much* more memory than a
line-by-line processing loop (~5000 bytes vs 138M for a 63M file) but
is also atrociously slow: Replacing 10 randomly selected lines in a
53,248 lines file with a total size of 251K needs (on the system
where I tested this) about 0.02s when reading but about 0.51s when
using Tie::File (and it is probably still completely unsuitable to
solve the original problem to begin with).
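A line-by-line version of this kind might look like the following. It is only a sketch under assumptions: the `%replacement` table (line number to new text) and the scratch file are hypothetical, and this is not the code that was timed above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch input so the sketch runs on its own.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 1 .. 20;
close $fh or die "close: $!";

# Hypothetical edits: 1-based line number => replacement text.
my %replacement = (3 => "THREE\n", 17 => "SEVENTEEN\n");

open my $in,  '<', $file       or die "open $file: $!";
open my $out, '>', "$file.new" or die "open $file.new: $!";
while (my $line = <$in>) {
    # $. is the number of the line just read; only one line is held at a time.
    print {$out} $replacement{$.} // $line;
}
close $in  or die "close: $!";
close $out or die "close: $!";

# Swap the edited copy into place.
rename "$file.new", $file or die "rename: $!";
```

Memory use stays flat regardless of file size because nothing but the current line and the small edit table is kept.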



------------------------------

Date: Tue, 15 Jan 2013 13:26:11 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <38f8bc52-ab87-4efb-8a95-f771873b25c2@googlegroups.com>

On Tuesday, January 15, 2013 12:40:32 PM UTC-8, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
>
> [...]
>
> > Tie::File is more parsimonious of memory than even DB_File for instance and employs a
> > "lazy cache" whose size can be user-specified.
> >
> > See: http://perl.plover.com/TieFile/why-not-DB_File
> >
> > So, even with overhead of 310 bytes per record, that
> > would get slow only if the file gets really huge and
> > least-recently read records start to get tossed.
> > But the stated aim was accuracy rather than speed.
>
> Nevertheless, Tie::File not only needs *much* more memory than a
> line-by-line processing loop (~5000 bytes vs 138M for a 63M file) but
> is also atrociously slow: Replacing 10 randomly selected lines in a
> 53,248 lines file with a total size of 251K needs (on the system
> where I tested this) about 0.02s when reading but about 0.51s when
> using Tie::File (and it is probably still completely unsuitable to
> solve the original problem to begin with).

In general I'd agree. But there's an upper bound of 10M records. If that
scenario changed or some threshold was impacted, you could re-design. But
who cares here if you lose a second of runtime... or memory bumps during
that short window. The OP said accuracy - not speed - was the objective:
"it wouldn't matter if it took 5 seconds to run or 5 minutes to run, as
long as it produces the correct results."

The code becomes simpler, more intuitive, timelier. You can quickly move
on... to more pressing/interesting/challenging issues.

-- 
Charles DeRykus


------------------------------

Date: Tue, 15 Jan 2013 22:25:10 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87txqi2iex.fsf@sapphire.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:

[...]

>> Nevertheless, Tie::File not only needs *much* more memory than a
>> line-by-line processing loop (~5000 bytes vs 138M for a 63M file) but
>> is also atrociously slow: Replacing 10 randomly selected lines in a
>> 53,248 lines file with a total size of 251K needs (on the system
>> where I tested this) about 0.02s when reading but about 0.51s when
>> using Tie::File (and it is probably still completely unsuitable to
>> solve the original problem to begin with).
>
> In general I'd agree. But there's an upper bound of 10M records. If
> that scenario changed or some threshold was impacted, you could
> re-design. But, who cares here if you lose a second of runtime... or
> memory bumps during that short window. The OP said accuracy - not
> speed - was the objective: "it wouldn't matter if it took 5 seconds
> to run or 5 minutes to run, as long as it produces the correct
> results."
>
> The code becomes simpler, more intuitive, timelier. You can quickly
> move on... to more pressing/interesting/challenging issues.

The code does not 'become simpler', it becomes a lot more complicated.
Not even the 'front-end code' which needs to be written specifically for
this is shorter than a sensible (meaning, it performs well)
implementation without Tie::File since it was 8 lines of code in both
cases. A 'performance doesn't matter' implementation can be shorter
than that, as demonstrated. IMO, this is really an example of using a
module because it exists, even though it isn't suitable for solving the
described problem, is a lot more complicated than just using the
facilities already provided by perl, and is vastly technically inferior
to them as well.


------------------------------

Date: Mon, 14 Jan 2013 01:13:45 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How do I get address of scalars?
Message-Id: <9n2cs9-7er.ln1@anubis.morrow.me.uk>


Quoth Dez <nawglan@gmail.com>:
> I am trying to detect circular references inside a hash.  I was thinking 
> of doing this by tracking the address of the variables.  However, the 
> test program I have below is showing the same address for the scalars in 
> the hash, while the references in the hash are showing as expected.  
> Basically, I'm getting a false positive for hash member id and name.

Are you aware of Devel::Cycle? I think it does what you want, and it's
not quite as easy as you might think to do it correctly. However,
writing a function like this is a good way to learn about Perl's
reference model, so if you're doing it for that reason that's not a bad
idea.

> I tried the code at http://www.perlmonks.org/?node_id=406446 but it also 
> is showing the same address, which I assume is the address of my $ref, 
> not the address of the variable passed in.
> 
> Any ideas?
> 
> ---------- Output of Test Program -------
> 
> parent hash     ref = 164306792 type1 = HASH    blessed = 
> key = arr       ref = 163412544 type1 = ARRAY   blessed = 
> key = circ      ref = 164306812 type1 = SCALAR  blessed = 
> key = circ2     ref = 164306792 type1 = HASH    blessed = 
> key = circ3     ref = 163412544 type1 = ARRAY   blessed = 
> key = code      ref = 164340068 type1 = CODE    blessed = 
> key = has       ref = 164340768 type1 = HASH    blessed = 
> key = id        ref = 164339168 type1 = notaref blessed = 
> key = name      ref = 164339168 type1 = notaref blessed = 
> key = obj       ref = 164306732 type1 = SCALAR  blessed = JSON
> key = regex     ref = 164306712 type1 = REGEXP  blessed = Regexp
> 
> ---------- Test Program Below -----------
> 
> use strict;
> use warnings;
> 
> use B;
> use JSON;
> 
> my %tmap = qw(
>     B::NULL   SCALAR
>     B::HV     HASH
>     B::AV     ARRAY
>     B::CV     CODE
>     B::IO     IO
>     B::GV     GLOB
>     B::REGEXP REGEXP
> );
> 
> sub refaddr($) {
>   ref($_[0]) ? 0+$_[0] : undef
> }

    use Scalar::Util qw/refaddr/;

> sub chktype {
>   my $r = shift;
> 
>   return unless length(ref($r));
> 
>   my $t = ref(B::svref_2object($r));
> 
>   return
>       exists $tmap{$t} ? $tmap{$t}
>     : length(ref($$r)) ? 'REF'
>     :                    'SCALAR';
> }

    use Scalar::Util qw/reftype/;

> sub walk($) {

Don't use prototypes unless you have a very good reason. In this case
forcing scalar context on the argument is far more likely to be
confusing than helpful. 

(About the only useful use for a ($) prototype is for functions
imitating ref(), so Scalar::Util's refaddr, reftype and blessed are
($)-prototyped.)

>   my $ref = shift;

You've just copied $_[0] into $ref. This means that when you check
refaddr \$ref below, this will not give you the address of the variable
you passed into the function, it will give you the address of the local
$ref variable. Under normal circumstances (if the function is not
reentrant and doesn't store refs to local variables somewhere external)
this will give you the same address every time.

What you want to do instead is take a reference to $_[0]: @_ is passed
as aliases, so this will give you a reference to the original variable
that was passed in.

    my $ref = \$_[0];

You will obviously then need to replace $ref with $$ref throughout.
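
A small sketch of the difference, using Scalar::Util's `refaddr`; the sub names and the sample hash are made up for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Address of the sub's own lexical, not of the caller's variable.
sub addr_of_copy { my $copy = shift; return refaddr(\$copy) }

# Address of the caller's variable, reached through the @_ alias.
sub addr_of_arg { return refaddr(\$_[0]) }

my %h = (id => 1, name => 'x');
for my $key (sort keys %h) {
    printf "%-4s copy=%d arg=%d real=%d\n", $key,
        addr_of_copy($h{$key}), addr_of_arg($h{$key}), refaddr(\$h{$key});
}
```

The arg and real columns should agree on each row and differ between rows, while the copy column will typically repeat the same address.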

>   my $type1 = chktype $ref;
>   my $type2 = ref $ref;
> 
>   if (!defined $type1) {
>     $type1 = 'notaref';
>   }
> 
>   my $blessed = ($type1 ne $type2) ? $type2 : ''; 

    use Scalar::Util qw/blessed/;

> 
>   # if ref is a reference, get address to which it points
>   my $addr = $type2 ? refaddr $ref : refaddr \$ref;

I'm not sure if this is just a work-in-progress, but even when it's
working this will only detect circular refs of the form

    $test_hash->{circ} = \$test_hash->{name};

Other forms, such as

    $test_hash->{self} = \$test_hash->{self};

    $test_hash->{ouro} = \$test_hash->{bouros};
    $test_hash->{bouros} = \$test_hash->{ouro};

will require checking refaddr(\$_[0]) against a hash regardless of its
type, storing it in the hash if it wasn't there, and then recursing
through its referent if $_[0] is a ref itself. 
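
One possible shape for such a walker, as a sketch only. The name `walk`, its extra arguments and the path strings are assumptions for illustration; it also records the addresses of hashes and arrays themselves, which goes slightly beyond the description above, and it ignores code refs, globs and the like.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# walk(VALUE, PATH, \%seen, \@again): push onto @again every path at which
# an already-visited scalar, array or hash is reached a second time.
sub walk {
    my (undef, $path, $seen, $again) = @_;    # leave $_[0] as an alias

    # Check the address of the scalar itself, whatever it holds.
    if ($seen->{ refaddr(\$_[0]) }++) {
        push @$again, $path;
        return;
    }

    my $type = reftype($_[0]);
    return unless defined $type;              # plain value: nothing to follow

    if ($type eq 'HASH' or $type eq 'ARRAY') {
        # Also note the container, so a second reference to it is caught.
        if ($seen->{ refaddr($_[0]) }++) {
            push @$again, $path;
            return;
        }
        if ($type eq 'HASH') {
            my $href = $_[0];
            walk($href->{$_}, $path . '{' . $_ . '}', $seen, $again)
                for sort keys %$href;
        }
        else {
            my $aref = $_[0];
            walk($aref->[$_], $path . '[' . $_ . ']', $seen, $again)
                for 0 .. $#$aref;
        }
    }
    elsif ($type eq 'SCALAR' or $type eq 'REF') {
        walk(${ $_[0] }, '${' . $path . '}', $seen, $again);
    }
    return;
}

my %h = (name => 'x');
$h{self} = \$h{self};          # a scalar that refers to itself
my $top = \%h;

my @again;
walk($top, '$top', {}, \@again);
print "$_\n" for @again;
```

For the self-referencing example this reports the single path `${$top{self}}`. Note that a path is reported whenever something is reached twice, so shared (non-circular) references show up as well.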

Ben



------------------------------

Date: 14 Jan 2013 02:35:40 GMT
From: Dez <nawglan@gmail.com>
Subject: Re: How do I get address of scalars?
Message-Id: <50f36efb$0$21005$c3e8da3$1cbc7475@news.astraweb.com>

On Mon, 14 Jan 2013 01:13:45 +0000, Ben Morrow wrote:

> Quoth Dez <nawglan@gmail.com>:
>> I am trying to detect circular references inside a hash.  I was
>> thinking of doing this by tracking the address of the variables. 
>> However, the test program I have below is showing the same address for
>> the scalars in the hash, while the references in the hash are showing
>> as expected. Basically, I'm getting a false positive for hash member id
>> and name.
> 
> Are you aware of Devel::Cycle? I think it does what you want, and it's
> not quite as easy as you might think to do it correctly. However,
> writing a function like this is a good way to learn about Perl's
> reference model, so if you're doing it for that reason that's not a bad
> idea.
> 
>> I tried the code at http://www.perlmonks.org/?node_id=406446 but it
>> also is showing the same address, which I assume is the address of my
>> $ref, not the address of the variable passed in.
>> 
>> Any ideas?
>> 
>> ---------- Output of Test Program -------
>> 
>> parent hash     ref = 164306792 type1 = HASH    blessed =
>> key = arr       ref = 163412544 type1 = ARRAY   blessed =
>> key = circ      ref = 164306812 type1 = SCALAR  blessed =
>> key = circ2     ref = 164306792 type1 = HASH    blessed =
>> key = circ3     ref = 163412544 type1 = ARRAY   blessed =
>> key = code      ref = 164340068 type1 = CODE    blessed =
>> key = has       ref = 164340768 type1 = HASH    blessed =
>> key = id        ref = 164339168 type1 = notaref blessed =
>> key = name      ref = 164339168 type1 = notaref blessed =
>> key = obj       ref = 164306732 type1 = SCALAR  blessed = JSON
>> key = regex     ref = 164306712 type1 = REGEXP  blessed = Regexp
>> 
>> ---------- Test Program Below -----------
>> 
>> use strict;
>> use warnings;
>> 
>> use B;
>> use JSON;
>> 
>> my %tmap = qw(
>>     B::NULL   SCALAR
>>     B::HV     HASH
>>     B::AV     ARRAY
>>     B::CV     CODE
>>     B::IO     IO
>>     B::GV     GLOB
>>     B::REGEXP REGEXP
>> );
>> 
>> sub refaddr($) {
>>   ref($_[0]) ? 0+$_[0] : undef
>> }
> 
>     use Scalar::Util qw/refaddr/;
> 
>> sub chktype {
>>   my $r = shift;
>> 
>>   return unless length(ref($r));
>> 
>>   my $t = ref(B::svref_2object($r));
>> 
>>   return
>>       exists $tmap{$t} ? $tmap{$t}
>>     : length(ref($$r)) ? 'REF'
>>     :                    'SCALAR';
>> }
> 
>     use Scalar::Util qw/reftype/;
> 
>> sub walk($) {
> 
> Don't use prototypes unless you have a very good reason. In this case
> forcing scalar context on the argument is far more likely to be
> confusing than helpful.
> 
> (About the only useful use for a ($) prototype is for functions
> imitating ref(), so Scalar::Util's refaddr, reftype and blessed are
> ($)-prototyped.)
> 
>>   my $ref = shift;
> 
> You've just copied $_[0] into $ref. This means that when you check
> refaddr \$ref below, this will not give you the address of the variable
> you passed into the function, it will give you the address of the local
> $ref variable. Under normal circumstances (if the function is not
> reentrant and doesn't store refs to local variables somewhere external)
> this will give you the same address every time.
> 
> What you want to do instead is take a reference to $_[0]: @_ is passed
> as aliases, so this will give you a reference to the original variable
> that was passed in.
> 
>     my $ref = \$_[0];
> 
> You will obviously then need to replace $ref with $$ref throughout.
> 
>>   my $type1 = chktype $ref;
>>   my $type2 = ref $ref;
>> 
>>   if (!defined $type1) {
>>     $type1 = 'notaref';
>>   }
>> 
>>   my $blessed = ($type1 ne $type2) ? $type2 : '';
> 
>     use Scalar::Util qw/blessed/;
> 
> 
>>   # if ref is a reference, get address to which it points
>>   my $addr = $type2 ? refaddr $ref : refaddr \$ref;
> 
> I'm not sure if this is just a work-in-progress, but even when it's
> working this will only detect circular refs of the form
> 
>     $test_hash->{circ} = \$test_hash->{name};
> 
> Other forms, such as
> 
>     $test_hash->{self} = \$test_hash->{self};
> 
>     $test_hash->{ouro} = \$test_hash->{bouros};
>     $test_hash->{bouros} = \$test_hash->{ouro};
> 
> will require checking refaddr(\$_[0]) against a hash regardless of its
> type, storing it in the hash if it wasn't there, and then recursing
> through its referent if $_[0] is a ref itself.
> 
> Ben

Thanks for the reply.  I'm doing this as a programming exercise. 8)  
Basically, I'm trying to flatten a hash into an array of key/value pairs, 
keeping track of parent / child relationships.  I want to be able to 
track references so that when I rebuild the object, I can restore 
internal references (circular) if they exist.  The reason for all this is 
to attempt to write a dbd storage module.  I know about Devel::Cycle, but 
as my main goal is a dbd module, I don't want to depend on it.  I'm also 
doing all this to learn a little more about the internals of perl.  I've 
been following perl5porters for a while now, and the traffic is very 
interesting.

-- 
Dez.


------------------------------

Date: Sun, 13 Jan 2013 20:44:45 -0800
From: David Harmon <source@netcom.com>
Subject: Re: plural and singular syntax in Perl5, PHP and Perl6
Message-Id: <KKidnX6CkYv4EG7NnZ2dnUVZ_qOdnZ2d@earthlink.com>

On Sat, 05 Jan 2013 19:37:42 -0500 in comp.lang.perl.misc, Shmuel
(Seymour J.) Metz <spamtrap@library.lspace.org.invalid> wrote,
>dies. From the traffic in CLP, it appears that real people *don't*
>find the Perl 5 use of sigils to be intuitive,

A big part is the fault of the tutorials/books.  

They will say something like a simple variable has a $ in front and
an array has @.  Then comes $foos[$i];  foos is an array, so where
is its @ sign?  It's often a long wait for a clear answer to that.
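
The short version is that in Perl 5 the sigil describes what the expression yields, not what kind of variable it came from. A brief illustration (the variable names and contents are made up):

```perl
use strict;
use warnings;

my @foos = ('a', 'b', 'c');

my $count = @foos;          # array in scalar context: the element count
my $one   = $foos[1];       # $ because a single element comes back
my @two   = @foos[0, 2];    # @ because a slice is a list

my %age    = (al => 30, bo => 40);
my $al_age = $age{al};            # one value from a hash: $
my @ages   = @age{qw(al bo)};     # several values (a hash slice): @

print "$count $one @two $al_age @ages\n";
```

So `$foos[$i]` carries a `$` because it names one scalar value that happens to live in the array `@foos`.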



------------------------------

Date: Tue, 15 Jan 2013 22:38:37 +0000 (UTC)
From: Mladen Gogala <gogala.mladen@gmail.com>
Subject: Re: plural and singular syntax in Perl5, PHP and Perl6
Message-Id: <kd4lpd$d57$1@solani.org>

On Sun, 13 Jan 2013 20:44:45 -0800, David Harmon wrote:

> They will say something like a simple variable has a $ in front and an
> array has @.  Then comes $foos[$i];  foos is an array, so where is its @
> sign?  It's often a long wait for a clear answer to that.

And then there is this, which is much more clear and understandable:

$bar=\@foo;

$x=$bar->[$i]

bless $bar, $God;

-- 
http://mgogala.byethost5.com


------------------------------

Date: Mon, 14 Jan 2013 10:12:31 +0000
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: Regular expression for BOM required
Message-Id: <DvednaomXbCSR27NnZ2dnUVZ8qednZ2d@brightview.co.uk>

Peter Gordon wrote:
> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in news:slrnkf30s7.kis.hjp-
> usenet2@hrunkner.hjp.at:
>
>> You want to match the single character U+FEFF BOM here, not a sequence
>> of two characters U+00FE LATIN SMALL LETTER THORN U+00FF LATIN SMALL
>> LETTER Y WITH DIAERESIS.
>>
>> So you have to write
>>
>>         say "Found regular expression" if /\x{FEFF}/;
>>
>>       print;
>> }
>>
> Thanks Peter,
> It was the curly braces which I was missing.
>

Presumably you also have to check for the "other order" ?

  BugBear


------------------------------

Date: Mon, 14 Jan 2013 12:21:50 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Regular expression for BOM required
Message-Id: <slrnkf7qie.n4u.hjp-usenet2@hrunkner.hjp.at>

On 2013-01-14 10:12, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
> Peter Gordon wrote:
>> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in news:slrnkf30s7.kis.hjp-
>> usenet2@hrunkner.hjp.at:
>>
>>> You want to match the single character U+FEFF BOM here, not a sequence
>>> of two characters U+00FE LATIN SMALL LETTER THORN U+00FF LATIN SMALL
>>> LETTER Y WITH DIAERESIS.
>>>
>>> So you have to write
>>>
>>>         say "Found regular expression" if /\x{FEFF}/;
>>>
>>>       print;
>>> }
>>>
>> Thanks Peter,
>> It was the curly braces which I was missing.
>>
>
> Presumably you also have to check for the "other order" ?

No. After decoding there is no byte order any more, just characters, and
the character you want to match is \x{FEFF}. 

If you try to open a big-endian file with :encoding(utf16le), the script
will die trying to read the first line.

(If you open it with :encoding(utf16), the BOM will be used to determine
endianness and *not* passed through - this seems a little inconsistent
to me)
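
A short sketch of both behaviours using Encode directly; the sample text is made up, and `decode` stands in here for the corresponding `:encoding(...)` layers:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical file contents: a BOM followed by one line of text.
my $le_bytes = "\xFF\xFE" . encode('UTF-16LE', "nm=Track 7\n");
my $be_bytes = "\xFE\xFF" . encode('UTF-16BE', "nm=Track 7\n");

# Fixed-endian decoding keeps the BOM as the single character U+FEFF.
my $text = decode('UTF-16LE', $le_bytes);
print "BOM present\n" if $text =~ /^\x{FEFF}/;
$text =~ s/^\x{FEFF}//;     # one character, whatever the byte order was

# Plain UTF-16 reads the BOM to pick the byte order, then drops it.
my $auto = decode('UTF-16', $be_bytes);
print "same text\n" if $auto eq $text;
```

Either way the regex only ever sees characters, so there is no second byte order to test for.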

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: 14 Jan 2013 21:04:42 GMT
From: Peter Gordon <petergoATnetspace.net.au>
Subject: Re: Regular expression for BOM required
Message-Id: <XnsA1494802620C9petergonetspacenetau@216.151.153.22>

bugbear <bugbear@trim_papermule.co.uk_trim> wrote in 
news:DvednaomXbCSR27NnZ2dnUVZ8qednZ2d@brightview.co.uk:

> Peter Gordon wrote:
>> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in 
>> news:slrnkf30s7.kis.hjp-usenet2@hrunkner.hjp.at:
>>
>>> You want to match the single character U+FEFF BOM here, not a sequence
>>> of two characters U+00FE LATIN SMALL LETTER THORN U+00FF LATIN SMALL
>>> LETTER Y WITH DIAERESIS.
>>>
>>> So you have to write
>>>
>>>         say "Found regular expression" if /\x{FEFF}/;
>>>
>>>       print;
>>> }
>>>
>> Thanks Peter,
>> It was the curly braces which I was missing.
>>
> 
> Presumably you also have to check for the "other order" ?
> 
>   BugBear

The files I'm editing are the playlists of Zoomplayer, which is
an Israeli media player, so they are consistent in their Unicode
and format.  Is there a method for getting Unicode to work with
the combination of the diamond operator and in-place editing?
The code below runs fine when run as a program eg: $insertTT.pl aa.zpl
but crashes when I try to run it with the -i command line option. eg:
$perl -i insertTT.pl aa.zpl

#!/cygdrive/c/cygwin/bin/perl
# Used to insert a "tt=NUMBER: " line in a new .df files.
use strict;
use warnings;
use 5.14.0;
use Encode qw(encode decode);
use open qw(:std IN :encoding(utf16-le));

# $^I = ".bak";
my $first = 1;
while( <> ) {
    my $line = $_;
    if ( $first == 1 ) {
        $line =~ s/\x{FEFF}nm=(.*)/nm=$1/;
        $first = 0;
    }
    $line = decode("utf8", $line);
    print $line;
    if ( $line =~ /nm=/ ) {
        my $num = $line;
        chomp($num);
        $num =~ s/nm=.*?(\d+).*/$1/;
        print "tt=$num: \n";
    }
}



------------------------------

Date: Tue, 15 Jan 2013 15:34:44 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Regular expression for BOM required
Message-Id: <slrnkfaq84.ms8.hjp-usenet2@hrunkner.hjp.at>

On 2013-01-14 21:04, Peter Gordon <petergoATnetspace.net.au> wrote:
> The code below runs fine when run as a program eg: $insertTT.pl aa.zpl
> but crashes when I try to run it with the -i command line option. eg:

If perl crashes you should file a bug report.

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3861
***************************************

