[31291] in Perl-Users-Digest


home	help	back	first	fref	pref	prev	next	nref	lref	last	post
Perl-Users Digest, Issue: 2536 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 5 06:09:43 2009

Date: Wed, 5 Aug 2009 03:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 5 Aug 2009     Volume: 11 Number: 2536

Today's topics:
    Re: C api to implement perl storable (freeze) <xhoster@gmail.com>
        daemon helper <jak@isp2dial.com>
    Re: Determining if reference $ref is a Hash <usenet05@drabble.me.uk>
    Re: Determining if reference $ref is a Hash <ben@morrow.me.uk>
    Re: FAQ 7.27 How can I comment out a large block of per <cartercc@gmail.com>
    Re: FAQ 7.27 How can I comment out a large block of per <ben@morrow.me.uk>
    Re: FAQ 7.27 How can I comment out a large block of per <cartercc@gmail.com>
        New computer forums <eldelmich_2000@yahoo.com>
    Re: problem with fork <xhoster@gmail.com>
    Re: Problem with procentage character when reading BAT  <tadmc@seesig.invalid>
    Re: Problem with procentage character when reading BAT  <tadmc@seesig.invalid>
    Re: Problem with procentage character when reading BAT  <wxdeveloper@gmail.com>
    Re: Using Excel without Macros <justin.0903@purestblue.com>
        UTF -  SEEK_SET workaround for BOM encoding(utf-16/32)  sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 03 Aug 2009 21:15:19 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: C api to implement perl storable (freeze)
Message-Id: <4a78ce7e$1$4841$ed362ca5@nr5-q3a.newsreader.com>

Qiang wrote:
> I have a perl client server program to exchange data using perl
> Storable module. Now I wanted to convert the server program to c++.
> Does anyone know is there a C API to implement something similar to
> perl freeze? 

The guts of the Storable module itself are written in C.  But where are 
you going to get a Perl data structure to freeze with it?

Xho


------------------------------

Date: Wed, 05 Aug 2009 03:58:21 +0000
From: John Kelly <jak@isp2dial.com>
Subject: daemon helper
Message-Id: <jq0i75ltu3b2dq238qq0m9nurfroifaq80@4ax.com>


daemon helper: start any program or script as a daemon.

It's a small C program with a simple interface and a liberal license. It
already does what I want it to do; I am interested in correctness
patches more than new features, unless they appear to be universally
desirable.

Hope you find it useful.

ftp://ftp.isp2dial.com/users/jak/src/dh/


-- 
Webmail for Dialup Users
http://www.isp2dial.com/freeaccounts.html
 


------------------------------

Date: Tue, 04 Aug 2009 12:18:58 +0100
From: Graham Drabble <usenet05@drabble.me.uk>
Subject: Re: Determining if reference $ref is a Hash
Message-Id: <Xns9C5D7D49D688Fgrahamdrabblelineone@drabble.me.uk>

On 04 Aug 2009 Michaelp <michaelp@hio.no> wrote in news:200ea0e8-4a75-
4ce3-8d06-2bc372b79961@o6g2000yqj.googlegroups.com:

> Hello!
> 
> How do I cleanly determine if $ref refers to a hash?
> 
> To be used like:
> if (isHash ($ref)){
>           # traverse the hash
>}

perldoc -f ref

if (ref($ref) eq "HASH") {

-- 
Graham Drabble
http://www.drabble.me.uk/


------------------------------

Date: Tue, 4 Aug 2009 17:23:07 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Determining if reference $ref is a Hash
Message-Id: <bs3mk6-6at2.ln1@osiris.mauzo.dyndns.org>


Quoth Graham Drabble <usenet05@drabble.me.uk>:
> On 04 Aug 2009 Michaelp <michaelp@hio.no> wrote in news:200ea0e8-4a75-
> 4ce3-8d06-2bc372b79961@o6g2000yqj.googlegroups.com:
> 
> > How do I cleanly determine if $ref refers to a hash?
> > 
> > To be used like:
> > if (isHash ($ref)){
> >           # traverse the hash
> >}
> 
> perldoc -f ref
> 
> if (ref($ref) eq "HASH") {

    use Scalar::Util qw/reftype/;

    if (reftype $ref eq "HASH") {

is safer. Consider

    ref bless [], "HASH";

If you are dealing with arbitrary scalars, you probably also want to
check Scalar::Util::blessed: you shouldn't usually be poking about in
the innards of an object just because it happens to be a hashref.

Ben



------------------------------

Date: Tue, 4 Aug 2009 08:33:07 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: FAQ 7.27 How can I comment out a large block of perl code?
Message-Id: <f79cb1f9-5c16-4662-b14a-d8efd5decf11@k30g2000yqf.googlegroups.com>

On Aug 2, 9:52=A0pm, Keith Thompson <ks...@mib.org> wrote:
> IHMO it's also worth mentioning that you can simply insert a '#'
> character at the beginning of each line. =A0This has the advantage
> that you can see which lines are commented out without having to
> scan for POD directives that might be some distance away. =A0It's
> also easy to do with any decent editor.

Using vi, comment blocks using the ed command (the colon, ':') with
the line numbers and a substitution, as for example, to comment lines
34 through 67

:34,67 s/^/#/

This command will place the comment character at the beginning of
lines 34 through 67. To uncomment lines, do this

:34,44 s/#//

This removes the comment character from lines 34 through 44.

CC


------------------------------

Date: Tue, 4 Aug 2009 17:19:14 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: FAQ 7.27 How can I comment out a large block of perl code?
Message-Id: <2l3mk6-6at2.ln1@osiris.mauzo.dyndns.org>


Quoth ccc31807 <cartercc@gmail.com>:
> On Aug 2, 9:52 pm, Keith Thompson <ks...@mib.org> wrote:
> > IHMO it's also worth mentioning that you can simply insert a '#'
> > character at the beginning of each line.  This has the advantage
> > that you can see which lines are commented out without having to
> > scan for POD directives that might be some distance away.  It's
> > also easy to do with any decent editor.
> 
> Using vi, comment blocks using the ed command (the colon, ':') with
<snip>

I am confused. Why did you think it was useful to post this? Working out
how to use your editor is not a topic for clpmisc.

Ben



------------------------------

Date: Tue, 4 Aug 2009 13:49:30 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: FAQ 7.27 How can I comment out a large block of perl code?
Message-Id: <7ca99afc-952d-402c-82f7-93e59d63b61c@b14g2000yqd.googlegroups.com>

On Aug 4, 12:19=A0pm, Ben Morrow <b...@morrow.me.uk> wrote:
> I am confused. Why did you think it was useful to post this? Working out
> how to use your editor is not a topic for clpmisc.
>
> Ben

I had nothing to do at work and was bored stiff. ;-)

Actually, I was responding to the post about emacs and cperl, not the
original post. I've spent the last month or so trying to make heads
and tails of emacs, without too much success, and seeing the smiley
following 'emacs' just tempted me too much. I don't know if emacs has
the equivalent of ed commands, but I use ed a great deal, and my
frustration level with emacs has reached a high level. Sorry if my OT
post offended you.

CC



------------------------------

Date: Tue, 4 Aug 2009 04:20:57 -0700 (PDT)
From: Premium Business Training <eldelmich_2000@yahoo.com>
Subject: New computer forums
Message-Id: <e0f23cd3-8446-425d-a8e5-25db6e8ea96e@32g2000yqj.googlegroups.com>

Hello dear fellow programmers,

You are invited to come help and share your passion for programming at
http://www.mycomputerforum.com , a website featuring various forums in
computer science, web development and computer hardware. See you
there !

Regards,
Eldel


------------------------------

Date: Mon, 03 Aug 2009 21:25:31 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: problem with fork
Message-Id: <4a78ce7f$0$4841$ed362ca5@nr5-q3a.newsreader.com>

friend.05@gmail.com wrote:
> My scipt is using fork so there are child processes. Sometimes my
> script runs properly and output is correct. But sometime my script
> gets stuck just after exiting child process. And this does happen
> always.

I am confused.  The freeze happens sometimes, or it happens always?

<I've applied some reformatting>

> 
> foreach my $w (keys %worklist) {
> 	my $child;
> 	unless ($child = fork()) {
> 	          die("connot for: $!") unless defined $child;
>                 foreach my $file (@{$worklist{$w}}) {
>                           #reading files processing of data and creating hash tables.
>                 }
>                 print "Worker $w exiting\n"; 

I'd print $$ here as well.

>                 #it gets stuck after printing this statement

How do you know?  What are you expecting to see that you do not see?
What system monitoring tool do you use to know that it has not exited?

> 
>                 exit;
> 	}
>            push(@workers, $child);   #array of child PID
> }
> 
> #wating for each child to finish.
> foreach my $pid (@workers) {

         print "About to wait for $pid\n";

>       $s = waitpid($pid, 0);
> 	print "$s finished\n";
> }
> 
> 

Xho


------------------------------

Date: Tue, 04 Aug 2009 06:35:03 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Problem with procentage character when reading BAT file to perl  array
Message-Id: <slrnh7g6pj.1pf.tadmc@tadmc30.sbcglobal.net>

wxdeveloper <wxdeveloper@gmail.com> wrote:

> Printf allows me to provide a handle to a file where I want to store
> the result... Could I send the result to a file without printf ?


Have you read the documentation for the function you are using?

    perldoc -f printf

The 3rd word of the description names another function that can
be passed a filehandle, but that does not treat percent signs specially.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Tue, 04 Aug 2009 10:34:05 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Problem with procentage character when reading BAT file to perl  array
Message-Id: <slrnh7gkpo.4t9.tadmc@tadmc30.sbcglobal.net>

wxdeveloper <wxdeveloper@gmail.com> wrote:
>> Have you read the documentation for the function you are using?
>>
>>     perldoc -f printf
>>
>> The 3rd word of the description names another function that can
>> be passed a filehandle, but that does not treat percent signs specially.
>>
>
> Using print with a file handle solves my problem.
>
>
> Thank you for your help


Please note that this was my most helpful contribution to this thread:

    Have you read the documentation for the function you are using?

That should be part of your Standard Operating Procedure for programming.

If it had been your SOP, you would have solved your problem in
a few seconds rather than the few minutes/hours/days that posting
to Usenet typically takes.


Work smart, use the docs!


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Tue, 4 Aug 2009 05:22:29 -0700 (PDT)
From: wxdeveloper <wxdeveloper@gmail.com>
Subject: Re: Problem with procentage character when reading BAT file to perl  array
Message-Id: <b3468396-1e36-40f9-8123-6761213f2f6b@s31g2000yqs.googlegroups.com>

> Have you read the documentation for the function you are using?
>
> =A0 =A0 perldoc -f printf
>
> The 3rd word of the description names another function that can
> be passed a filehandle, but that does not treat percent signs specially.
>

Using print with a file handle solves my problem.


Thank you for your help
J.




------------------------------

Date: Tue, 04 Aug 2009 13:57:40 -0000
From: Justin C <justin.0903@purestblue.com>
Subject: Re: Using Excel without Macros
Message-Id: <6de7.4a783e54.4a9db@zem>

On 2009-08-03, Henning <ratazong@yahoo.com> wrote:
> Hello!
> I have "inherited" the following code:
>
> use Win32::OLE::Const 'Microsoft Excel';
> $Win32::OLE::Warn = 3;
> $Excel = Win32::OLE->new('Excel.Application', 'Quit');  # get a new
> excel
> $Book1 = $Excel->Workbooks->Open("$abs_file_name");     # open Excel
> file
>
> When opening Excel (line 4) excel tries to start any macros associated
> with the file.
> Is there a way to avoid this (meaning: force excel to open the file
> with disabled macros)?
> And how to do so?

Use Spreadsheet::ReadExcel (or Spreadsheet::WriteExcel) instead? It
depends what you are wanting to do. Excel starting and running macros
automatically is something Excel does, I believe there is somewhere in
Excel to turn this off. If you need to actually start Excel from Perl I
don't think it's possible to do what you want, that's Excel specific,
not Perl.

	Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Wed, 05 Aug 2009 01:23:43 -0700
From: sln@netherlands.com
Subject: UTF -  SEEK_SET workaround for BOM encoding(utf-16/32) layer Bug
Message-Id: <rgdi7511foqosf2096l8sd8brfr32d4q5k@4ax.com>

* warning, tired person, large source code ahead ..
------------------------------------------------------

I don't guess they fixed this, and probably won't.
The failure of Encode to seek past the BOM on subsequent reads
then SEEK_SET 0 isin't something unique to Perl. Python has the same bug.

Here is a way to do it by analysing the BOM in the character semantics
of the encoding layer on the open handle without touching any layers
with binmode. The result is a conversion of a BOM into byte string
that can be compared to the octets of the Unicode family of encodings.
These octets of all the encoded BOM's are then simply read as keys to
a hash.

BOM's are unique in thier encoding's and file position so there can
be no ambiguity when analysing them in any particular read encoding 
character semantic. Whatever the encoding, enough characters are
read (1 at a time, up to the Max_Bom_Width octet count) to convert
its ordinal value into the first Max_Bom_Width bytes of the characters.
The width is gleaned withe each read from a file position differential.
The bytes from each character ordinal are taken off in a loop with
a bitwise & 0xff then a shift >>= 8 on the ordinal until its zero.
0x0 is padded until the byte count equals the width.
Get the next character, then do it again until the total bytes
in the conversion adds up to Max_Bom_Width bytes (octets).

I threw this together from another project where I needed to do
this. I didn't really want to have to do this but was forced to
because of other requirements.

For better or worse, here it is. It works.
I thought I would throw it up here incase anybody else needs this
solution besides me.

-sln

--------------------------
# File: bomtest.pl
#
# CheckUTF() - 
#   UTF-8/16/32 check, Will add a guessed utf-16/32 layer
#   if there is no current encoding() layer.
#   Opened filehandle should be in byte context.
#   This is more for automatically adding a utf layer on
#   unknown files opened in byte mode.
#   Only guesses utf-8/16/32 encodings, BOMb'd or not.
#   Used standalone or optionally with Get_BOM().
#
# Get_BOM() -
#    SEEK_SET workaround for BOMb'd Encoding(utf-) layer Bug
#    and a general way to get the BOM and offset for subsequent
#    seeks.
#    Fixes SEEK_SET if need to itterate reading of BOMb'd file.
#    Once called, subsequent seek calls are positionalble past
#    the BOM from the offset returned. Only needs to be called
#    once and preferably after a call to optional CheckUTF().
#    Does not alter the PerlIO layers in any way.
#    The BOM is determined with character semantics
#    (that was the hard part!), a file opened in byte semantics
#    is just more gravy.
#    This will also detect a mismatch in width between the
#    detected BOM and the current opened encoding layer.
#    Used standalone or optionally with CheckUTF().
#
# test_surrogate_pair() -
#    Just a utility function to print hi/lo surrogate pairs
#    given a unicode test character (> 0x10000).
#
#
# Modify the top variables to test various configurations.
# It basically creates a test file in an encoding, reads it
# back in that encoding. Then reads it back in byte mode.
# The write flag can be set so that just reading occurs.
#
#
#  -sln, 8/4/09
# --------------------------------------------------------


use strict;
use warnings;

binmode (STDOUT, ':utf8');

my $fname = 'dummy.txt';
my $Unichar = "\x{21000}";

my $data  = "\x{1}\x{2}$Unichar";
my $enc   = 'UTF-16'; # blank or UTF-xxLE/BE or :bytes, :utf8, :raw, etc ...
my $enc_string = ($enc =~ /:/) ? "$enc" : (length($enc) ? "encoding($enc)" : '');
my $write_file = 1;
my $test_surr  = 0;

my ($fh, $line);
my ($width,$bom,$offset);

 
# Test surrogate pairs ..
#
  test_surrogate_pair($Unichar) if $test_surr;



# Create a test file with encoding ..
#
  if ($write_file) {
	print "\nWriting sample data as '$enc' to $fname\n",'-'x20,"\n";
	open $fh, ">$enc_string", $fname or die "can't open $fname for write $!";

	print $fh $data;
	close $fh;
  }


# Read test file with encoding ..
#
  print "\nReading $fname as '$enc'\n",'-'x20,"\n";
  open $fh, "<$enc_string", $fname or die "can't open $fname for read $!";


  CheckUTF($fh,'size'=> 100, 'verbose'=> 1);

  ($width,$bom,$offset) = Get_BOM( $fh, 'v' );  # list context, verbose prints
  #            $offset  = Get_BOM( $fh );       # scalar context, no prints

  print "\nGet_BOM returned $bom, $width,  offset = $offset\n";

  for (1 .. 3) {
	print "Pass $_:  ";
	seek($fh, $offset, 0);   # SEEK_SET
	my $fpos = tell($fh);
	my $ok = ($fpos == $offset) ? 'ok' : 'bad';
	$line = <$fh>;
	print "seek $offset/$fpos $ok, data (".length($line).") =   ";
	for (map {ord $_} split //, $line) {
		printf ("%x  ",$_);
	}
	print "\n";
  }
  close $fh;


# Read test file in byte mode ..
#
  print "\nReading $fname as bytes\n",'-'x20,"\n";
  open $fh, '<', $fname or die "can't open $fname for read $!";

  $offset = 0;

  for (1 .. 3) {
	print "Pass $_:  ";
	seek($fh, $offset, 0);   # SEEK_SET
	my $fpos = tell($fh);
	my $ok = ($fpos == $offset) ? 'ok' : 'bad';
	$line = <$fh>;
	print "seek $offset/$fpos $ok, data (".length($line).") =   ";
	for (map {ord $_} split //, $line) {
		printf ("%x  ",$_);
	}
	print "\n";
  }
  close $fh;

exit 0;

##
## End of program  ...
##


sub Get_BOM
{
	my ($fh, $verbose) = @_;
	$verbose = 0 unless (defined $verbose and $verbose);
	use Encode;

	my %bom2enc = (
	   map { encode($_, "\x{feff}") => $_ } qw(
	    UTF-8
	    UTF-16BE
	    UTF-16LE
	    UTF-32BE
	    UTF-32LE
	   )
	);
	my %enc2bom = (
	   reverse(%bom2enc),
	   map { $_ => encode($_, "\x{feff}") } qw(
	    UCS-2
	    iso-10646-1
	    utf8
	   )
	);
	my @bombs = sort { length $b <=> length $a } keys %bom2enc;
	my $MAX_BOM_LENGTH = length $bombs[0];
	my $bomstr = join '|', @bombs;
	my $bom_re = qr/^($bomstr)/o;
	my $layers = "@{[PerlIO::get_layers($fh)]}";

	print "\nGet_BOM() - Layers = $layers\n" if $verbose;

	if (defined($fh) && (ref($fh) eq 'GLOB' || ref(\$fh) eq 'GLOB'))
	{
		# Note - Must SEEK_SET/read twice to
		# mitigate UTF-16/32 BOM seek bug 
		# -----------------------------------
		if (seek($fh, 0, 0))
		{
			my $sample = '';

			read ($fh, $sample, 1);
			seek($fh, 0, 0); # SEEK_SET
			my $fpos = tell($fh);
			my @samparray = ();
			
			## Read in $MAX_BOM_LENGTH 'characters'
			## in whatever encoding the file was opened with.
			## Store each 'character' and bytes/char - based
			## on the differential file position per read.
			## A BOM can be sucessfully read un-alterred in
			## 1,2,4 or 8 byte sequences.
			## -------------------------------------------
			for (1 .. $MAX_BOM_LENGTH)
			{
				my $count = read ($fh, my $buff, 1);
				last if (!defined $count or $count != 1);
				push @samparray, {'char'=> $buff, 'width'=> (tell($fh)-$fpos)};
				$fpos = tell($fh);
			}
			$sample = '';

			print ">> Reading up to $MAX_BOM_LENGTH characters\n" if $verbose;

			for my $aref (@samparray) {
				$sample .= $aref->{'char'};
				printf ("     - char %x , bytes = %d\n", ord($aref->{'char'}), $aref->{'width'}) if $verbose;
			}
			seek($fh, 0, 0); # SEEK_SET, last seek, set to position 0

			if ($verbose) {
				print "     - Read ".length($sample)." characters. Position = $fpos\n";
				print "     - Sample (".length($sample).") = ";
				for (map {ord $_} split //, $sample) {
					printf ("%02x  ", $_);
				}
				print "\n>> Converting to bytes\n";
			}
			## Convert the 'characters' to bytes.
			## Only process $MAX_BOM_LENGTH bytes (octets).
			## We only care about the BOM, which will match
			## the encoding bytes we already have in %bom2enc.
			## -----------------------------------------------

			my ($octets, $offset) = ('',0);
			my $numb_octets = 0;

			for my $aref (@samparray)
			{
				my @ar = ();
				my ($width, $ordchar) = (
					$aref->{'width'},
					ord( $aref->{'char'})
				);
				last if (($numb_octets + $width) > $MAX_BOM_LENGTH);
				$numb_octets += $width;
				
				while ($ordchar > 0) {
					push @ar, $ordchar & 0xff;
					$ordchar >>= 8;
					--$width;
				}
				push (@ar,0) while ($width-- > 0);
				for (reverse @ar) {
					vec ($octets, $offset++, 8) = $_;
				}
			}
			if ($verbose) {
				print "     BOMS avail = ";
				for my $bom (@bombs) {
					print '( ';
					for (map {ord $_} split //, $bom) {
						printf ("%02x",$_);
					}
					print ' ) ';
				}
				print "\n     - Bom octets from sample (".length($octets).") = ";
				for (map {ord $_} split //, $octets) {
					printf ("%02x  ", $_);
				}
				print "\n";
			}
			# end convert

			if (my ($found) = $octets =~ /$bom_re/) {
				my ($offset, $bomstring) = (
					length($found),
					$bom2enc{$found}
				);
				if ($verbose) {
					print ">> Found $bomstring, BOM = ";
					for (map {ord $_} split //, $found) {
						printf ("%02x", $_);
					}
					print "\n";
				}

				my $noendian = $bomstring;
				$noendian =~ s/(?:LE|BE)$//i;
				my $width = ($layers =~ /$noendian/i) ? 'ok' : 'mismatch';
				print "     - Does not match width of current encoding layer\n" if ($verbose and $width ne 'ok');

				return ($width, $bomstring,$offset);
			}
			print ">> BOM Not found\n" if $verbose;
			return ('','',0);
		}
		# can't seek here, fall through
	}
	# There was an error ..
	print "\nGet_BOM() - Layers = $layers\n" if (!$verbose);
	print ">> Invalid filehandle or file is unseekable\n    - Utf BOMs available (max length = $MAX_BOM_LENGTH):\n";
	while (my ($key,$val) = each %enc2bom)
	{
		my $valstring = '';
		for (split //, $val) {
			$valstring .= sprintf ("%x ",ord $_);
		}
		print  "       $key = $bom2enc{$val} = $valstring\n";
		#print "       $key = $bom2enc{$val} = '$val' = '$enc2bom{$key}' = $valstring\n";
	}
	return ('','',-1);
}


sub CheckUTF
{
	# UTF-8/16/32 check
	# Will add a layer if there is no current
	# encoding() layer. So open filehandle should
	# be in byte context.
	# ----------------------------------------
	# Result value:
	# -1 - not filehandle (or undef'ed)
	# 0  - read error or empty file
	# 1  - already have encoding layer (could force change later)
	# 2  - guess_encoder() error message
	# 3  - do nothing unless utf-16/32(be/le)
	# 4  - added a layer

	my ($fh,@args) = @_;
	my %parm = ('verbose'=> 0, 'size'=> 60);
	while (my ($name, $val) = splice (@args, 0, 2)) {
		$name =~ s/^\s+|\s+$//; next if not defined $val;
		if (lc($name) eq 'verbose' || (lc($name) eq 'size' && $val > 4))
			{ $parm{$name} = $val; }
	}
	my ($Res, $Utfmsg, $Layers) = (0, 'UTF Check', '');

	if (!defined($fh) || (ref($fh) ne 'GLOB' && ref(\$fh) ne 'GLOB')) {
		$Utfmsg .= ", not a filehandle ..";
		print "$Utfmsg\n" if $parm{'verbose'};
		return wantarray ? ($Res,$Utfmsg,'') : $Utfmsg;
	}
	$Layers = ':'.join (':', PerlIO::get_layers($fh)).':';

	if ($Layers =~ /:encoding/) {
		$Utfmsg .= ", already have encoding layer";
		$Res = 1;
	} else {
		my ($count, $sample);
		my $utf8layer = $Layers =~ /:utf8/;

		binmode ($fh,":bytes");
		seek ($fh, 0, 0);

		# Try to read a large sample
		if (!defined($count = read ($fh,$sample,$parm{'size'},0))) {
			$Utfmsg .= ". $!"; # read error
		} elsif ($count <= 0) {
			$Utfmsg .= ". File is empty";
		} else {
			seek ($fh, 0, 0);   # SEEK_SET
			use Encode::Guess;

			my $decoder = guess_encoding ($sample); # ascii/utf8/BOMed UTF
			$Res = 2;

			if (ref($decoder)) {
				my $name = $decoder->name;
				$decoder = 'Do nothing';
				$Res = 3;
				$Utfmsg .= ", guess($count read): $name";
				if ($name =~ /UTF.*?(?:16|32)/i) {
					# $name =~ s/(?:LE|BE)$//i;
					$decoder = "Adding layer";
					$Res = 4;
					binmode ($fh, ":encoding($name)");
				}
			}
			$Utfmsg .= ". $decoder" if (defined $decoder); # guess error
		}
		binmode ($fh,":utf8") if $utf8layer;
	}
	$Utfmsg .= ' ..';
	$Layers = "@{[PerlIO::get_layers($fh)]}";
	print "@{[$Utfmsg,$Layers]}\n" if $parm{'verbose'};
	return wantarray ? ($Res,$Utfmsg,$Layers) : "@{[$Utfmsg,$Layers]}";
}


sub test_surrogate_pair
{
	my $test = shift;
	print "\nTesting surrogate pairs\n",'-'x20,"\n";
	if (ord($test) < 0x10000) {
		print "nothing to test < 0x10000\n";
	} else {
		my $hi = (ord($test) - 0x10000) / 0x400 + 0xD800;
		my $lo = (ord($test) - 0x10000) % 0x400 + 0xDC00;
		my $uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
		printf "test uni      = %x\n", ord($test);
		printf "hi surrogate  = %x\n", $hi;
		printf "lo surrogate  = %x\n", $lo;
		printf "uni           = %x\n", $uni;
	}
}


__END__


C:\temp>perl bomtest.pl

Writing sample data as 'UTF-16' to dummy.txt
--------------------

Reading dummy.txt as 'UTF-16'
--------------------
UTF Check, already have encoding layer .. unix crlf encoding(UTF-16) utf8

Get_BOM() - Layers = unix crlf encoding(UTF-16) utf8
>> Reading up to 4 characters
     - char feff , bytes = 2
     - char 1 , bytes = 2
     - char 2 , bytes = 2
     - char 21000 , bytes = 4
     - Read 4 characters. Position = 10
     - Sample (4) = feff  01  02  21000
>> Converting to bytes
     BOMS avail = ( fffe0000 ) ( 0000feff ) ( efbbbf ) ( fffe ) ( feff )
     - Bom octets from sample (4) = fe  ff  00  01
>> Found UTF-16BE, BOM = feff

Get_BOM returned UTF-16BE, ok,  offset = 2
Pass 1:  seek 2/2 ok, data (3) =   1  2  21000
Pass 2:  seek 2/2 ok, data (3) =   1  2  21000
Pass 3:  seek 2/2 ok, data (3) =   1  2  21000

Reading dummy.txt as bytes
--------------------
Pass 1:  seek 0/0 ok, data (10) =   fe  ff  0  1  0  2  d8  44  dc  0
Pass 2:  seek 0/0 ok, data (10) =   fe  ff  0  1  0  2  d8  44  dc  0
Pass 3:  seek 0/0 ok, data (10) =   fe  ff  0  1  0  2  d8  44  dc  0

C:\temp>



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2536
***************************************

home	help	back	first	fref	pref	prev	next	nref	lref	last	post
[31291] in Perl-Users-Digest

Perl-Users Digest, Issue: 2536 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)Wed Aug 5 06:09:43 2009

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 5 06:09:43 2009