[31494] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 2753 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Jan 5 18:09:42 2010

Date: Tue, 5 Jan 2010 15:09:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 5 Jan 2010     Volume: 11 Number: 2753

Today's topics:
        Determine physical location of IP <user@example.net>
    Re: Determine physical location of IP <spamtotrash@toomuchfiction.com>
    Re: Determine physical location of IP <tadmc@seesig.invalid>
    Re: Determine physical location of IP <rvtol+usenet@xs4all.nl>
    Re: Determine physical location of IP <ben@morrow.me.uk>
    Re: I need to make some cash here <tadmc@seesig.invalid>
        Online Math Coding Contest <kvenkan@gmail.com>
        significant figures <spam.meplease@ntlworld.com>
    Re: unicode newbie, can you help? <OJZGSRPBZVCX@spammotel.com>
    Re: unicode newbie, can you help? <jurgenex@hotmail.com>
    Re: unicode newbie, can you help? sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 05 Jan 2010 16:57:16 -0500
From: monkeys paw <user@example.net>
Subject: Determine physical location of IP
Message-Id: <IqWdnQP-M6YzKN7WnZ2dnUVZ_ridnZ2d@insightbb.com>

Has anyone coded anything to determine the physical location
of an IP address? Something that takes an IP and tells you where 
geographically the server exists?

e.g.

my $location = ip2location('216.14.104.174');

print $location;

RESULT:
New York, New York

I know some www sites perform this service, i'm interested in
the underlying network code that could accomplish this. A working
example would be fab, a point in the right direction much appreciated.


------------------------------

Date: Tue, 5 Jan 2010 22:35:03 +0000 (UTC)
From: Kevin Collins <spamtotrash@toomuchfiction.com>
Subject: Re: Determine physical location of IP
Message-Id: <slrnhk7fko.18g.spamtotrash@vai.unix-guy.com>

On 2010-01-05, monkeys paw <user@example.net> wrote:
> Has anyone coded anything to determine the physical location
> of an IP address? Something that takes an IP and tells you where 
> geographically the server exists?
>
> e.g.
>
> my $location = ip2location('216.14.104.174');
>
> print $location;
>
> RESULT:
> New York, New York
>
> I know some www sites perform this service, i'm interested in
> the underlying network code that could accomplish this. A working
> example would be fab, a point in the right direction much appreciated.

I knew there was something with "Geo" in the name out there, and a quick CPAN
search points to (among others):

Geo::IP2Location   

Looks like exactly what you want...

Kevin


------------------------------

Date: Tue, 05 Jan 2010 16:37:55 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Determine physical location of IP
Message-Id: <slrnhk7fk2.536.tadmc@tadbox.sbcglobal.net>

monkeys paw <user@example.net> wrote:
> Has anyone coded anything to determine the physical location
> of an IP address? Something that takes an IP and tells you where 
> geographically the server exists?


    http://search.cpan.org/search?query=Geo::IP&mode=all


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"


------------------------------

Date: Tue, 05 Jan 2010 23:39:41 +0100
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Determine physical location of IP
Message-Id: <4b43bfad$0$22919$e4fe514c@news.xs4all.nl>

monkeys paw wrote:

> Has anyone coded anything to determine the physical location
> of an IP address?

On CPAN there are several, for example Geo::IP.

-- 
Ruud


------------------------------

Date: Tue, 5 Jan 2010 22:49:50 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Determine physical location of IP
Message-Id: <e9sc17-igp.ln1@osiris.mauzo.dyndns.org>


Quoth monkeys paw <user@example.net>:
> Has anyone coded anything to determine the physical location
> of an IP address? Something that takes an IP and tells you where 
> geographically the server exists?
> 
> e.g.
> 
> my $location = ip2location('216.14.104.174');
> 
> print $location;
> 
> RESULT:
> New York, New York
> 
> I know some www sites perform this service, i'm interested in
> the underlying network code that could accomplish this. A working
> example would be fab, a point in the right direction much appreciated.

There's Geo::IP, but for anything more specific than 'country' you have
to pay. In general there's no 'underlying network code' for this, you
just have to look the IP address up in a table. The people who compile
these tables tend to want to get paid.

You could also try WHOIS lookups (there are plenty of modules on the
CPAN) but that information is not likely to be any more accurate. In any
case you should carefully read the Terms of Use of any database you are
using, to be sure you aren't exceeding any usage limits.

Ben



------------------------------

Date: Tue, 05 Jan 2010 12:04:13 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: I need to make some cash here
Message-Id: <slrnhk6vis.3ne.tadmc@tadbox.sbcglobal.net>

Ralph Malph <ralph.malph@altavista.com> wrote:

> Ignore Uri. While he is good at posting advice on
> basic perl for beginner to intermediate
> programmers


While you are NOT good at posting advice on Perl.

You've never helped even a single person here with a Perl problem...

Should we pay attention to the person who has posted 3 times, never on
the topic of Perl, or the person who has posted thousands of times
over more than a decade?

Ignore Ralph Malph (unless you prefer unhappy days).


> Uri's claims of "placing programmers" and
> other such claims of professional success are all
> lies. He has placed very few if any people.


He has placed 4 members of the Perl community that I am aware of.

Including me.

Twice.


> Most of his
> income is from


You have access to Uri's bank accounts?

Or are you just making shit up (again)?


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"


------------------------------

Date: Tue, 5 Jan 2010 11:42:13 -0800 (PST)
From: kannan <kvenkan@gmail.com>
Subject: Online Math Coding Contest
Message-Id: <698774f4-df27-47d2-a832-834119284331@a21g2000yqc.googlegroups.com>

Hi , we gladly invite you to take part in Athena - the Online Math
Coding Contest of Kurukshetra 2010 , the International Techo-
Management Fest organised by College  Of Engineering Guindy , India
organised under the patronage of UNESCO .

Here's your chance to lock horns against the best minds across the
globe and showcase your algorithmic prowess and mathematical acumen !!
Participants from 20+  countries already registered!!

catch the action at  www.athena.kurukshetra.org.in       Exciting
prizes to be won !!

Dates :
Practice contest  : 6th January 5.00 P.M Indian Standard Time
Main  contest      : 8th January - 15th January

awaiting your presence ,
Team Athena


------------------------------

Date: Tue, 05 Jan 2010 21:38:41 GMT
From: dan <spam.meplease@ntlworld.com>
Subject: significant figures
Message-Id: <BjO0n.1326$Mq5.176@newsfe29.ams2>

Whilst trying to create something that would parse a number into one with 
an appropriate number of significant figures, I accidentally wrote this:

sub sigfig {
  my ($sigfigs, $number) = @_;

  my $divisor = 10**(length(int $number) - $sigfigs);
  $number /= $divisor;
  $number = sprintf "%1.0f", $number;
  $number *= $divisor;

  return $number
}

which seems to work for positive numbers not in scientific notation.

As my friends and colleagues consider me big-headed, I would appreciate 
it if somebody could point out any inadequacies,foibles and associated 
notwithstandings of the above code, bringing me down a much needed peg or 
to.

dan


------------------------------

Date: Tue, 05 Jan 2010 18:05:33 +0100
From: "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com>
Subject: Re: unicode newbie, can you help?
Message-Id: <op.u52kfjucmk9oye@frodo>

On Tue, 05 Jan 2010 16:58:56 +0100, alexxx.magni@gmail.com  
<alexxx.magni@gmail.com> wrote:

> wow, so many things I didnt know... there isnt any utility able to
> detect the coding of a file ?

There is Encode::Guess, but, again, you'd have to tell perl to use that.  
It is not possible to reliably detect the encoding of a file, so Perl does  
not try to do so by default. In other words, it does not change the bytes  
it reads or writes unless you tell it to, generally.

Do you know the encoding of your input files? If it is utf8, first try to  
get your code to work with the minimum amount of additions (for example,  
those I gave you); you can fiddle around with Encode::Guess afterwards. If  
it is not utf8... well... substitute whatever encoding it is. If you don't  
know the encoding, it will become complicated.

> I tried your suggestion, but discovered I do not even have Encoding.pm
> installed - I need to do a bit of homework I guess...

Sorry, it's "use Encode" instead of "use Encoding". I assume you are on a  
unix/linux machine; "perldoc Encode" should bring up its documentation.


------------------------------

Date: Tue, 05 Jan 2010 09:44:19 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: unicode newbie, can you help?
Message-Id: <mit6k5por18dh3k1qudlkjeh3af8qhsc3l@4ax.com>

"alexxx.magni@gmail.com" <alexxx.magni@gmail.com> wrote:
> there isnt any utility able to
>detect the coding of a file ?

Impossible. 
At the very best you can make an educated guess, e.g. based on
- specific characteristics of the file format (if it has a byte order
mark then probably it is UTF-16 or UCS-2; this lead double byte is
unlikly to occur in any other encoding as the lead character(s) except
in Windows UTF-8), 
- statistical analysis (if I assume this is Windows-1251 then I get a
character distribution that is consistent with the Serbian language)
- text analysis (if I assume this file as encoded in ISO 639-3 then most
of the resulting words can be found in a Farsi dictionary)

Obviously none of these methods are conclusive.

jue


------------------------------

Date: Tue, 05 Jan 2010 10:45:59 -0800
From: sln@netherlands.com
Subject: Re: unicode newbie, can you help?
Message-Id: <q027k5dbim0pab46die07cv4psh7bv0ha0@4ax.com>

On Tue, 5 Jan 2010 06:40:24 -0800 (PST), "alexxx.magni@gmail.com" <alexxx.magni@gmail.com> wrote:

>sorry if it's a naive question, but I never needed up to now to deal
>with anything beyond a-zA-Z0-9....
>
>now I have a long text with standard ascii and, sometimes, cyrillic
>text - and I want to filter it out (the cyrillic, of course).
>I tried with
>
>perl -wne 'foreach($_){if (/(\p{InCyrillic})/){print"record $_ matches
>>>>$1<<<\n"}else{print"ok\n"}}' test.htm
>
>but it fails to recognize it - what am I doing wrong?
>
>
>thanks for any help...
>
>
>alessandro

Try this on your file and post the output.
Assign $fname = 'your file' and run it.

-sln

# File: bomtest.pl
#
# CheckUTF() - 
#   UTF-8/16/32 check, Will add a guessed utf-16/32 layer
#   if there is no current encoding() layer.
#   Opened filehandle should be in byte context.
#   This is more for automatically adding a utf layer on
#   unknown files opened in byte mode.
#   Guesses utf-8/16/32 encodings, BOMb'd or not.
#   Uses Encode::Guess module. Will use standard guess settings (see docs).
#   Used standalone or optionally with Get_BOM().
#
# Get_BOM() -
#    SEEK_SET workaround for BOMb'd Encoding(utf-) layer Bug
#    and a general way to get the BOM and offset for subsequent
#    seeks.
#    The file can be opened using any encoding. The handle passed
#    is duped, that handle set to ':raw'. No character semantics
#    are used when analysing the BOM.
#    This will also detect a mismatch in width between the
#    detected BOM and the current opened encoding layer.
#    Used standalone or optionally with CheckUTF().
#    CheckUTF() has an option to get the bom on the callers
#    behalf, combining steps.
#
# test_surrogate_pair() -
#    Just a utility function to print hi/lo surrogate pairs
#    given a unicode test character (> 0x10000).
#
#
# Modify the top variables to test various configurations.
# Includes:
#    $fname            File name for output and input
#    $write_file       Flag 0/1 to create a test file
#                      (0 to not write, just read file $fname)
#    $data             Data string for writing test file, can be unicode
#    $enc              Encoding to read/write file $fname
#    $test_surr        Stand-alone surrogate pair test
#
#  -sln, 8/13/09
# --------------------------------------------------------

use strict;
use warnings;

binmode (STDOUT, ':utf8');

{
#  my $fname = 'dummy.txt';

  my $fname = 'c:\temp\XML\CollectedData_1804.xml';

  my $Unichar = "\x{21000}";

  my $data  = "\x{1}\x{2}$Unichar";
  my $enc   = ''; # blank or UTF-xxLE/BE or :bytes, :utf8, :raw, etc ...
  my $enc_string = ($enc =~ /:/) ? "$enc" : (length($enc) ? "encoding($enc)" : '');
  my $write_file = 0;
  unless ($write_file eq 'dummy.txt') { $write_file = 0;  }
  my $test_surr  = 0;

  my ($fh, $line);
  my ($width,$encfound,$bom,$offset);
 
# Test surrogate pairs ..
##
  test_surrogate_pair($Unichar) if $test_surr;


# Create a test file with encoding ..
##
  if ($write_file) {
	print "\nWriting sample data as '$enc' to $fname\n",'-'x20,"\n";
	open $fh, ">$enc_string", $fname or die "can't open $fname for write $!";

	print $fh $data;

	close $fh;
  } else {
	print "\nNOT WRITING to $fname !!\n",'-'x20,"\n";
  }

# Read test file with encoding ..
##
  print "\nReading $fname as '$enc'\n",'-'x20,"\n";
  open $fh, "<$enc_string", $fname or die "can't open $fname for read $!";

  $offset = CheckUTF($fh, 'size'=> 100, 'verbose'=> 1, 'getbom_v'=> 1);
  print "\n";

  for (1 .. 3) {
	print "Pass $_:  ";
	seek($fh, $offset, 0);   # SEEK_SET
	my $fpos = tell($fh);
	my $ok = ($fpos == $offset) ? 'ok' : 'bad';
	$line = <$fh>;
	print "seek $offset/$fpos $ok, data (".length($line).") = ".ordsplit($line)."\n";
	print "$line\n";
  }
  close $fh;

# Read test file with encoding into scalar, open handle to scalar as :utf8 ..
##
  print "\nBuffering $fname as '$enc', open/read buffer as ':utf8'\n",'-'x20,"\n";
  open $fh, "<$enc_string", $fname or die "can't open $fname for read $!";

  $offset = CheckUTF($fh, 'size'=> 100, 'verbose'=> 1, 'getbom_v'=> 1);
  seek($fh, $offset, 0);   # SEEK_SET

  my $membuf = <$fh>;
  close $fh;

  open $fh, "<:utf8", \$membuf or die "can't open \$membuf for read $!";

  $offset = 0;
  print "\n";

  for (1 .. 3) {
	print "Pass $_:  ";
	seek($fh, $offset, 0);   # SEEK_SET
	my $fpos = tell($fh);
	my $ok = ($fpos == $offset) ? 'ok' : 'bad';
	$line = <$fh>;
	print "seek $offset/$fpos $ok, data (".length($line).") = ".ordsplit($line)."\n";
  }
  close $fh;

# Read test file in byte mode ..
##
  print "\nReading $fname as bytes\n",'-'x20,"\n";
  open $fh, '<', $fname or die "can't open $fname for read $!";

  $offset = 0;

  for (1 .. 3) {
	print "Pass $_:  ";
	seek($fh, $offset, 0);   # SEEK_SET
	my $fpos = tell($fh);
	my $ok = ($fpos == $offset) ? 'ok' : 'bad';
	$line = <$fh>;
	print "seek $offset/$fpos $ok, data (".length($line).") =   ";
	for (map {ord $_} split //, $line) {
		printf ("%x  ",$_);
	}
	print "\n";
  }
  close $fh;

}
exit 0;

##
## End of program  ...
##

sub ordsplit
{ 
	my $string = shift;
	my $buf = '';
	for (map {ord $_} split //, $string) {
		$buf.= sprintf ("%x  ",$_);
	}
	return $buf;
}

sub Get_BOM
{
	my ($fh_orig, $verbose) = @_;
	$verbose = 0 unless (defined $verbose and lc($verbose) eq 'v');
	use Encode;

	my %bom2enc = (
	   map { encode($_, "\x{feff}") => $_ } qw(
	    UTF-8
	    UTF-16BE
	    UTF-16LE
	    UTF-32BE
	    UTF-32LE
	   )
	);
	my %enc2bom = (
	   reverse(%bom2enc),
	   map { $_ => encode($_, "\x{feff}") } qw(
	    UCS-2
	    iso-10646-1
	    utf8
	   )
	);
	my @bombs = sort { length $b <=> length $a } keys %bom2enc;
	my $MAX_BOM_LENGTH = length $bombs[0];
	my $bomstr = join '|', @bombs;
	my $bom_re = qr/^($bomstr)/o;
	my @arlays = PerlIO::get_layers($fh_orig);
	my $layers = "@arlays";

	print "Get_BOM() - Layers = $layers\n" if $verbose;

	if (defined($fh_orig) && (ref($fh_orig) eq 'GLOB' || ref(\$fh_orig) eq 'GLOB'))
	{
		# Dup the passed in handle, binmode it to raw
		# We will ONLY do byte semantics !!
		# --------------------------------------------------
		open my $fh_dup, "<&", $fh_orig or die "can't dup filehandle: $!";
		binmode ($fh_dup, ":raw");

		if (seek($fh_dup, 0, 0))     # SEEK_SET
		{
			my $sample = '';

			## Read in $MAX_BOM_LENGTH characters.
			## As a check, store character/bytes read based
			## on the differential file position per read.
			## Since our dup handle is in byte mode, the bytes
			## per character will always be 1.
			## This is a hold over when there could be encoded
			## characters and is not necessary.
			## -----------------------------------------------
			my $fpos = tell($fh_dup);
			my @samparray = ();
			print " * Reading up to $MAX_BOM_LENGTH characters\n" if $verbose;

			for (1 .. $MAX_BOM_LENGTH)
			{
				my $count = read ($fh_dup, my $buff, 1);
				last if (!defined $count or $count != 1);
				push @samparray, {'char'=> $buff, 'width'=> (tell($fh_dup)-$fpos)};
				$fpos = tell($fh_dup);
			}
			$sample = '';
			for my $href (@samparray) {
				$sample .= $href->{'char'};
				printf ("     - char %x , bytes = %d\n", ord($href->{'char'}), $href->{'width'}) if $verbose;
			}
			if ($verbose) {
				print "     - Read ".length($sample)." characters. Position = $fpos\n";
				print "     - Sample (".length($sample).") = ";
				for (map {ord $_} split //, $sample) {
					printf ("%02x  ", $_);
				}
				print "\n * Analysing bytes\n";
			}
			## Convert the 'characters' to bytes.
			## Only process $MAX_BOM_LENGTH bytes.
			## We only care about the BOM, which will match
			## the encoding bytes we already have in %bom2enc.
			## -----------------------------------------------
			my ($octets, $numb_octets, $vecoffset) = ('',0,0);

			for my $href (@samparray)
			{
				my @ar = ();
				my ($charwidth, $ordchar) = (
					$href->{'width'},
					ord( $href->{'char'})
				);
				last if (($numb_octets + $charwidth) > $MAX_BOM_LENGTH);
				$numb_octets += $charwidth;
				
				while ($ordchar > 0) {
					push @ar, $ordchar & 0xff;
					$ordchar >>= 8;
					--$charwidth;
				}
				push (@ar,0) while ($charwidth-- > 0);
				for (reverse @ar) {
					vec ($octets, $vecoffset++, 8) = $_;
				}
			}
			if ($verbose) {
				print "     BOMS avail = ";
				for my $bomavail (@bombs) {
					print '( ';
					for (map {ord $_} split //, $bomavail) {
						printf ("%02x",$_);
					}
					print ' ) ';
				}
				print "\n     - Bom bytes from sample (".length($octets).") = ";
				for (map {ord $_} split //, $octets) {
					printf ("%02x  ", $_);
				}
				print "\n";
			}
			# end convert

			my ($bomenc, $bom, $bomwidth, $bomoffset) = ('','','',0);
			if (my ($found) = $octets =~ /$bom_re/) {

				($bomenc, $bomoffset) = (
				    $bom2enc{$found},
				    length($found)
				);
				for (map {ord $_} split //, $found) {
					$bom .= sprintf ("%02x", $_);
				}
				print " * Found $bomenc, BOM = $bom\n" if ($verbose);

				my $noendian = $bomenc;
				$noendian =~ s/(?:LE|BE)$//i;
				$bomwidth = ($layers =~ /$noendian/i) ? 'ok' : 'mismatch';
				print "     - Does not match width of current encoding layer\n" if ($verbose and $bomwidth ne 'ok');
			} else {
				print " * BOM Not found\n" if $verbose;
			}

			close ($fh_dup);

			# Original handle -> Must SEEK_SET/read twice to
			# mitigate UTF-16/32 BOM seek bug 
			# ----------------------------------------------
			seek($fh_orig, 0, 0);        # SEEK_SET
			read ($fh_orig, $sample, 1); # read
			seek($fh_orig, 0, 0);        # SEEK_SET last. Caller to seek past BOM.

			return ($bomenc, $bom, $bomwidth, $bomoffset);
		}
		close ($fh_dup);

		# seek failed, fall through
	}
	# There was an error ..
	if ($verbose) {
		print " * Invalid filehandle or file is unseekable\n    - Utf BOMs available (max length = $MAX_BOM_LENGTH):\n";
		while (my ($key,$val) = each %enc2bom)
		{
			my $valstring = '';
			for (split //, $val) {
				$valstring .= sprintf ("%x ",ord $_);
			}
			print  "       $key = $bom2enc{$val} = $valstring\n";
			#print "       $key = $bom2enc{$val} = '$val' = '$enc2bom{$key}' = $valstring\n";
		}
	}
	return ('','','',-1);
}

sub CheckUTF
{
	# UTF-8/16/32 check
	# Will add a layer if there is no current
	# encoding() layer. So open filehandle should
	# be in byte context.
	# ----------------------------------------
	# Result value:
	# -1 - not filehandle (or undef'ed)
	# 0  - read error or empty file
	# 1  - already have encoding layer (could force change later)
	# 2  - guess_encoder() error message
	# 3  - do nothing unless utf-16/32(be/le)
	# 4  - added a layer

	my ($fh_check,@args) = @_;
	my %parm = ('verbose'=>0, 'size'=>60, 'getbom' =>0, 'bomopts'=>'');
	while (my ($name, $val) = splice (@args, 0, 2))
	{
		$name =~ s/^\s+|\s+$//;
		$name = lc $name;
		next if not defined $val;

		if ( $name =~ /^getbom(?:_(\w*))?/ and $val) {
			$parm{'gbom'} = 1;
			$parm{'bomopts'} = $1 if defined($1);
		}
		elsif ($name eq 'verbose' || $name eq 'size' && $val > 4) {
			$parm{$name} = $val;
		}
	}
	#    0       1         2       3       4       5          6
	my ($Res, $Utfmsg, $layers, $bomenc, $bom, $bomwidth, $bomoffset) = (
	       0, 'UTF Check',  '',      '',   '',        '',       0);
	$layers = ':'.join (':', PerlIO::get_layers($fh_check)).':';

	if (!defined($fh_check) || (ref($fh_check) ne 'GLOB' && ref(\$fh_check) ne 'GLOB')) {
		$Utfmsg .= ", not a filehandle";
		print "$Utfmsg\n" if $parm{'verbose'};
		$Res = -1;
	}
	elsif ($layers =~ /:encoding/) {
		$Utfmsg .= ", already have encoding layer";
		$Res = 1;
	}
	else {
		my ($count, $sample);
		my $utf8layer = $layers =~ /:utf8/;

		binmode ($fh_check,":bytes");
		seek ($fh_check, 0, 0);         # SEEK_SET

		# Try to read a large sample
		if (!defined($count = read ($fh_check,$sample,$parm{'size'},0))) {
			$Utfmsg .= ". $!"; # read error
		} elsif ($count <= 0) {
			$Utfmsg .= ". File is empty";
		} else {
			seek ($fh_check, 0, 0);   # SEEK_SET
			use Encode::Guess;

			my $decoder = guess_encoding ($sample); # ascii/utf8/BOMed UTF
			$Res = 2;

			if (ref($decoder)) {
				my $name = $decoder->name;
				$decoder = 'Do nothing';
				$Res = 3;
				$Utfmsg .= ", guess($count): $name";
				if ($name =~ /UTF.*?(?:16|32)/i) {
					# $name =~ s/(?:LE|BE)$//i;
					$decoder = "Adding layer";
					$Res = 4;
					binmode ($fh_check, ":encoding($name)");
				}
			}
			$Utfmsg .= ". $decoder" if (defined $decoder); # guess error
		}
		binmode ($fh_check,":utf8") if $utf8layer;
	}
	$Utfmsg .= ' ..';
	$layers = "@{[PerlIO::get_layers($fh_check)]}";

	print "@{[$Utfmsg,$layers]}\n" if $parm{'verbose'};

	$bomoffset = $Res if $Res == -1;
	if ($Res != -1 && $parm{'gbom'}) {
		($bomenc, $bom, $bomwidth, $bomoffset) = Get_BOM ($fh_check, $parm{'bomopts'});
		$Utfmsg .= " bom($bomoffset) = $bom ..";
		print "Get_BOM returned $bomenc, $bom, $bomwidth, $bomoffset\n" if $parm{'verbose'};
	}
	return ($Res, $Utfmsg, $layers, $bomenc, $bom, $bomwidth, $bomoffset);
}

sub test_surrogate_pair
{
	my $test = shift;
	print "\nTesting surrogate pairs\n",'-'x20,"\n";
	if (ord($test) < 0x10000) {
		print "nothing to test < 0x10000\n";
	} else {
		my $hi = (ord($test) - 0x10000) / 0x400 + 0xD800;
		my $lo = (ord($test) - 0x10000) % 0x400 + 0xDC00;
		my $uni = 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
		printf "test uni      = %x\n", ord($test);
		printf "hi surrogate  = %x\n", $hi;
		printf "lo surrogate  = %x\n", $lo;
		printf "uni           = %x\n", $uni;
	}
}
__END__


c:\temp\new_v3c>perl bomtest.pl

NOT WRITING to c:\temp\XML\CollectedData_1804.xml !!
--------------------

Reading c:\temp\XML\CollectedData_1804.xml as ''
--------------------
UTF Check, guess(100): UTF-16. Adding layer .. unix crlf encoding(UTF-16) utf8
Get_BOM() - Layers = unix crlf encoding(UTF-16) utf8
 * Reading up to 4 characters
     - char ff , bytes = 1
     - char fe , bytes = 1
     - char 3c , bytes = 1
     - char 0 , bytes = 1
     - Read 4 characters. Position = 4
     - Sample (4) = ff  fe  3c  00
 * Analysing bytes
     BOMS avail = ( fffe0000 ) ( 0000feff ) ( efbbbf ) ( fffe ) ( feff )
     - Bom bytes from sample (4) = ff  fe  3c  00
 * Found UTF-16LE, BOM = fffe
Get_BOM returned UTF-16LE, fffe, ok, 2

Pass 1:  seek 2/2 ok, data (42) = 3c  3f  78  6d  6c  20  76  65  72  73  69  6f
  6e  3d  22  31  2e  30  22  20  65  6e  63  6f  64  69  6e  67  3d  22  75  6e
  69  63  6f  64  65  22  3f  3e  d  a
<?xml version="1.0" encoding="unicode"?>

Pass 2:  seek 2/2 ok, data (42) = 3c  3f  78  6d  6c  20  76  65  72  73  69  6f
  6e  3d  22  31  2e  30  22  20  65  6e  63  6f  64  69  6e  67  3d  22  75  6e
  69  63  6f  64  65  22  3f  3e  d  a
<?xml version="1.0" encoding="unicode"?>

Pass 3:  seek 2/2 ok, data (42) = 3c  3f  78  6d  6c  20  76  65  72  73  69  6f
  6e  3d  22  31  2e  30  22  20  65  6e  63  6f  64  69  6e  67  3d  22  75  6e
  69  63  6f  64  65  22  3f  3e  d  a
<?xml version="1.0" encoding="unicode"?>


Buffering c:\temp\XML\CollectedData_1804.xml as '', open/read buffer as ':utf8'
--------------------
UTF Check, guess(100): UTF-16. Adding layer .. unix crlf encoding(UTF-16) utf8
Get_BOM() - Layers = unix crlf encoding(UTF-16) utf8
 * Reading up to 4 characters
     - char ff , bytes = 1
     - char fe , bytes = 1
     - char 3c , bytes = 1
     - char 0 , bytes = 1
     - Read 4 characters. Position = 4
     - Sample (4) = ff  fe  3c  00
 * Analysing bytes
     BOMS avail = ( fffe0000 ) ( 0000feff ) ( efbbbf ) ( fffe ) ( feff )
     - Bom bytes from sample (4) = ff  fe  3c  00
 * Found UTF-16LE, BOM = fffe
Get_BOM returned UTF-16LE, fffe, ok, 2

Pass 1:  seek 0/0 ok, data (42) = 3c  3f  78  6d  6c  20  76  65  72  73  69  6f
  6e  3d  22  31  2e  30  22  20  65  6e  63  6f  64  69  6e  67  3d  22  75  6e
  69  63  6f  64  65  22  3f  3e  d  a
Pass 2:  seek 0/0 ok, data (42) = 3c  3f  78  6d  6c  20  76  65  72  73  69  6f
  6e  3d  22  31  2e  30  22  20  65  6e  63  6f  64  69  6e  67  3d  22  75  6e
  69  63  6f  64  65  22  3f  3e  d  a
Pass 3:  seek 0/0 ok, data (42) = 3c  3f  78  6d  6c  20  76  65  72  73  69  6f
  6e  3d  22  31  2e  30  22  20  65  6e  63  6f  64  69  6e  67  3d  22  75  6e
  69  63  6f  64  65  22  3f  3e  d  a

Reading c:\temp\XML\CollectedData_1804.xml as bytes
--------------------
Pass 1:  seek 0/0 ok, data (85) =   ff  fe  3c  0  3f  0  78  0  6d  0  6c  0  2
0  0  76  0  65  0  72  0  73  0  69  0  6f  0  6e  0  3d  0  22  0  31  0  2e
0  30  0  22  0  20  0  65  0  6e  0  63  0  6f  0  64  0  69  0  6e  0  67  0
3d  0  22  0  75  0  6e  0  69  0  63  0  6f  0  64  0  65  0  22  0  3f  0  3e
 0  d  0  a
Pass 2:  seek 0/0 ok, data (85) =   ff  fe  3c  0  3f  0  78  0  6d  0  6c  0  2
0  0  76  0  65  0  72  0  73  0  69  0  6f  0  6e  0  3d  0  22  0  31  0  2e
0  30  0  22  0  20  0  65  0  6e  0  63  0  6f  0  64  0  69  0  6e  0  67  0
3d  0  22  0  75  0  6e  0  69  0  63  0  6f  0  64  0  65  0  22  0  3f  0  3e
 0  d  0  a
Pass 3:  seek 0/0 ok, data (85) =   ff  fe  3c  0  3f  0  78  0  6d  0  6c  0  2
0  0  76  0  65  0  72  0  73  0  69  0  6f  0  6e  0  3d  0  22  0  31  0  2e
0  30  0  22  0  20  0  65  0  6e  0  63  0  6f  0  64  0  69  0  6e  0  67  0
3d  0  22  0  75  0  6e  0  69  0  63  0  6f  0  64  0  65  0  22  0  3f  0  3e
 0  d  0  a

c:\temp\new_v3c>



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2753
***************************************


home help back first fref pref prev next nref lref last post