[32588] in Perl-Users-Digest
Perl-Users Digest, Issue: 3860 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jan 13 16:09:19 2013
Date: Sun, 13 Jan 2013 13:09:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 13 Jan 2013 Volume: 11 Number: 3860
Today's topics:
Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
How do I get address of scalars? <nawglan@gmail.com>
Regular expression for BOM required <petergoATnetspace.net.au>
Re: Regular expression for BOM required <hjp-usenet2@hjp.at>
Re: Regular expression for BOM required <petergoATnetspace.net.au>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 12 Jan 2013 15:47:27 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <87txqm5row.fsf@sapphire.mobileactivedefense.com>
BobMCT <r.mariotti@fdcx.net> writes:
> On Fri, 11 Jan 2013 13:56:32 +0000, Rainer Weikusat
> <rweikusat@mssgmbh.com> wrote:
>
>>Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>>> "C.DeRykus" <derykus@gmail.com> writes:
> Just a thought, but did you ever consider loading the data into a
> temporary indexed database table and 'batch' updating it using the
> indexing keys?
As I wrote in a reply to an earlier posting: This would be a
perfect job for one of the available 'flat file' database packages,
e.g., DB_File. But unless the same 'base data' file is processed more
than once, this means 'read the big file', 'write a big file', 'read
this big file', 'write another big file' and the replacement step
would turn into 'modify the big file'. I doubt that this would be
worth the effort.
------------------------------
Date: 13 Jan 2013 04:27:51 GMT
From: Dez <nawglan@gmail.com>
Subject: How do I get address of scalars?
Message-Id: <50f237c6$0$7141$c3e8da3$b1356c67@news.astraweb.com>
I am trying to detect circular references inside a hash. I was thinking
of doing this by tracking the address of the variables. However, the
test program I have below is showing the same address for the scalars in
the hash, while the references in the hash are showing as expected.
Basically, I'm getting a false positive for hash member id and name.
I tried the code at http://www.perlmonks.org/?node_id=406446 but it also
is showing the same address, which I assume is the address of my $ref,
not the address of the variable passed in.
Any ideas?
---------- Output of Test Program -------
parent hash ref = 164306792 type1 = HASH blessed =
key = arr ref = 163412544 type1 = ARRAY blessed =
key = circ ref = 164306812 type1 = SCALAR blessed =
key = circ2 ref = 164306792 type1 = HASH blessed =
key = circ3 ref = 163412544 type1 = ARRAY blessed =
key = code ref = 164340068 type1 = CODE blessed =
key = has ref = 164340768 type1 = HASH blessed =
key = id ref = 164339168 type1 = notaref blessed =
key = name ref = 164339168 type1 = notaref blessed =
key = obj ref = 164306732 type1 = SCALAR blessed = JSON
key = regex ref = 164306712 type1 = REGEXP blessed = Regexp
---------- Test Program Below -----------
use strict;
use warnings;
use B;
use JSON;
my %tmap = qw(
    B::NULL   SCALAR
    B::HV     HASH
    B::AV     ARRAY
    B::CV     CODE
    B::IO     IO
    B::GV     GLOB
    B::REGEXP REGEXP
);

sub refaddr($) {
    ref($_[0]) ? 0+$_[0] : undef
}

sub chktype {
    my $r = shift;
    return unless length(ref($r));
    my $t = ref(B::svref_2object($r));
    return
        exists $tmap{$t}   ? $tmap{$t}
      : length(ref($$r))   ? 'REF'
      :                      'SCALAR';
}

sub walk($) {
    my $ref = shift;
    my $type1 = chktype $ref;
    my $type2 = ref $ref;
    if (!defined $type1) {
        $type1 = 'notaref';
    }
    my $blessed = ($type1 ne $type2) ? $type2 : '';
    # if ref is a reference, get address to which it points
    my $addr = $type2 ? refaddr $ref : refaddr \$ref;
    return "ref = $addr\ttype1 = $type1\tblessed = $blessed\n";
}

my $test_hash = {
    name  => 'Some Name',
    arr   => ['a','b','3','5'],
    has   => {
        key1 => 'val1',
        key2 => 'val2',
        key3 => 'val3'
    },
    regex => qr/thisisaregex/x,
    code  => sub {return 1},
    id    => 4,
    obj   => JSON->new
};
$test_hash->{circ} = \$test_hash->{name};
$test_hash->{circ2} = $test_hash;
$test_hash->{circ3} = $test_hash->{arr};

print "\n";
print "parent hash\t" . walk $test_hash;
foreach my $key (sort keys %{$test_hash}) {
    print "key = $key\t" . walk $test_hash->{$key};
}
print "\n";
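
A possible explanation for the identical addresses, offered as a sketch rather than a definitive answer: walk() copies its argument into "my $ref", so \$ref is the address of that lexical (which Perl tends to reuse from one call to the next), not the address of the hash slot. The example below assumes the core Scalar::Util module; slot_addr is a hypothetical helper name, and it relies on @_ aliasing the caller's variable.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: for a reference, return the address of the thing
# it points to; for a plain scalar, return the address of the scalar
# container that was passed in.
sub slot_addr {
    # $_[0] is an alias to the caller's variable, so \$_[0] refers to
    # the real hash slot rather than to a local copy of its value.
    return ref $_[0] ? refaddr($_[0]) : refaddr(\$_[0]);
}

my %h = (id => 4, name => 'Some Name');
$h{circ} = \$h{name};    # reference back to the 'name' slot

my %addr = map { $_ => slot_addr($h{$_}) } keys %h;

printf "%-5s %d\n", $_, $addr{$_} for sort keys %addr;
```

With this approach id and name should report different addresses, while circ reports the same address as name, which is the relationship a cycle detector needs to see.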
------------------------------
Date: 12 Jan 2013 11:54:51 GMT
From: Peter Gordon <petergoATnetspace.net.au>
Subject: Regular expression for BOM required
Message-Id: <XnsA146DEF32763Apetergonetspacenetau@207.246.207.22>
#!/cygdrive/c/cygwin/bin/perl
use strict;
use warnings;
use 5.14.0;
open my $fh, '<:encoding(utf16le)', "00Tst.zpl" or die "File opening error\n";
while( <$fh> ) {
    say "Found regular expression" if /\xFE\xFF/;
    # say "Found it!" if s/\A.*nm=//;
    print;
}
# I'm trying to match a byte order mark in a file. Below is
# the start of an octal dump of the file.
# 0000000 177377 000156 000155 000075 000142 000157 000164 000164
# The line:
# say "Found it!" if s/\A.*nm=//;
# works correctly, but I can't write a regular expression which matches
# octal 0000000 177377 at the start of a line. Help with the
# regular expression would be appreciated.
# If it matters, I'm working on Windows 7.
------------------------------
Date: Sat, 12 Jan 2013 16:38:47 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Regular expression for BOM required
Message-Id: <slrnkf30s7.kis.hjp-usenet2@hrunkner.hjp.at>
On 2013-01-12 11:54, Peter Gordon <petergoATnetspace.net.au> wrote:
> #!/cygdrive/c/cygwin/bin/perl
> use strict;
> use warnings;
> use 5.14.0;
> open my $fh, '<:encoding(utf16le)', "00Tst.zpl" or die "File opening error\n";
> while( <$fh> ) {
> say "Found regular expression" if /\xFE\xFF/;
You want to match the single character U+FEFF BOM here, not a sequence
of two characters U+00FE LATIN SMALL LETTER THORN U+00FF LATIN SMALL
LETTER Y WITH DIAERESIS.
So you have to write
say "Found regular expression" if /\x{FEFF}/;
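
To illustrate the difference, here is a minimal sketch that builds the first bytes of such a file in memory instead of reading 00Tst.zpl. It assumes the core Encode module; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes as they sit in a UTF-16LE file: BOM (FF FE) followed by "nm=".
my $bytes = "\xFF\xFE" . "n\0m\0=\0";

# Decoding with an explicit byte order keeps the BOM as an ordinary
# character at the start of the string.
my $text = decode('UTF-16LE', $bytes);

my $two_chars = $text =~ /\xFE\xFF/   ? 1 : 0;   # U+00FE followed by U+00FF
my $one_char  = $text =~ /\A\x{FEFF}/ ? 1 : 0;   # the single character U+FEFF

# Remove the BOM if it is present at the very start.
(my $stripped = $text) =~ s/\A\x{FEFF}//;

print "two-character pattern matched: $two_chars\n";
print "single-character pattern matched: $one_char\n";
print "text without BOM: $stripped\n";
```

Only the \x{FEFF} pattern matches the decoded text; anchoring it with \A restricts the match to the start of the data.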
> print;
> }
>
> # I'm trying to match a byte order mark in a file. Below is
> # the start of an octal dump of the file.
> # 0000000 177377 000156 000155 000075 000142 000157 000164 000164
            ^^^^^^
The default output format of od (little endian 16 bit values in octal)
is confusing. Yes, 0xFEFF is 0177377 in octal, but 177377 looks too much
like 7FFF for me to do the bitshift intuitively in my head.
Better to use "od -tx1" or "od -tx2".
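
For anyone who wants to check the conversion without doing the bit shifting by hand, a small sketch using only built-in functions (the variable names are invented for the example):

```perl
use strict;
use warnings;

# The first 16-bit word from the octal dump.
my $word = oct('177377');

# The same value in hexadecimal.
my $hex = sprintf '%04X', $word;

# The two bytes as stored on disk in little-endian order, formatted the
# way a byte-wise hex dump would show them.
my @bytes     = unpack 'C2', pack('v', $word);
my $bytes_hex = join ' ', map { sprintf '%02x', $_ } @bytes;

# The next three words of the dump, read as character codes.
my $chars = join '', map { chr(oct($_)) } qw(000156 000155 000075);

print "word:  0x$hex\n";
print "bytes: $bytes_hex\n";
print "text:  $chars\n";
```

The word comes out as 0xFEFF, stored as the bytes ff fe, and the following three words spell "nm=".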
hp
--
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) | Sysadmin WSR       | You keep polishing your text until
| |   | hjp@hjp.at         | the parts of the sentence no longer
__/   | http://www.hjp.at/ | fits together. -- Ralph Babel
------------------------------
Date: 12 Jan 2013 18:22:17 GMT
From: Peter Gordon <petergoATnetspace.net.au>
Subject: Re: Regular expression for BOM required
Message-Id: <XnsA1472C77C4D18petergonetspacenetau@216.151.153.22>
"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in news:slrnkf30s7.kis.hjp-usenet2@hrunkner.hjp.at:
> You want to match the single character U+FEFF BOM here, not a sequence
> of two characters U+00FE LATIN SMALL LETTER THORN U+00FF LATIN SMALL
> LETTER Y WITH DIAERESIS.
>
> So you have to write
>
> say "Found regular expression" if /\x{FEFF}/;
>
> print;
> }
>
Thanks Peter,
It was the curly braces which I was missing.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3860
***************************************