[28067] in Perl-Users-Digest
Perl-Users Digest, Issue: 9431 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jul 6 14:15:18 2006
Date: Thu, 6 Jul 2006 11:15:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 6 Jul 2006 Volume: 10 Number: 9431
Today's topics:
Re: smarter zipcode search algorithm anno4000@radom.zrz.tu-berlin.de
Re: smarter zipcode search algorithm gmillerd@gmail.com
Re: smarter zipcode search algorithm gmillerd@gmail.com
Re: smarter zipcode search algorithm <jgibson@mail.arc.nasa.gov>
Re: smarter zipcode search algorithm xhoster@gmail.com
Sticky form <franzl.wisseworst@mailinator.net>
Re: Symbol Table and References <bol@adv.magwien.gv.at>
Re: Using reference for performance gain? <howachen@gmail.com>
Re: Using reference for performance gain? <uri@stemsystems.com>
Re: Win32: File Manipulation <bart.lateur@pandora.be>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 6 Jul 2006 07:34:38 GMT
From: anno4000@radom.zrz.tu-berlin.de
Subject: Re: smarter zipcode search algorithm
Message-Id: <4h3soeF1mk0b4U1@news.dfncis.de>
<premgrps@gmail.com> wrote in comp.lang.perl.misc:
> Hi,
> I have a database with two tables
> a) A table of 2 million records with city, zip and associated
> information (say XYZ) and
> b) zipcode latitude, longitude table having >40,000 records/zip codes
>
> PROBLEM:
> I need to find the the XYZs within the the range of a certain zipcode.
> This zipcode and radial range in miles is entered by the user (web
> interface).
[...]
Your question has nothing to do with Perl (nor with zip codes, btw.).
It is a problem in computational geometry, and that is where I would
start looking for algorithms.
Anno
------------------------------
Date: 6 Jul 2006 02:46:06 -0700
From: gmillerd@gmail.com
Subject: Re: smarter zipcode search algorithm
Message-Id: <1152179166.353107.23980@m73g2000cwd.googlegroups.com>
i wrote this many years ago for a singles website i made ... you may
find that just making a sql table with zipcode_to,zipcode_from,distance
is where you want to be. making a table for three digits and for five
digit zipcodes is a plus as well.
also you can do the lookups now with google map api i just noticed, way
to get banned with your 29000 zipcode entries :)
good luck
use strict;
use Math::Trig;
##
## racine wisconsin
##
my $a_lat = 42.13404;
my $a_long = -87.934097;
##
## wheeling illinois
##
my $b_lat = 42.716112;
my $b_long = -87.823329;
my $b_lat = 41.37372;
my $b_long = -93.376719;
######################################################################
print
gc(
$a_lat,
$a_long,
$b_lat,
$b_long),
"\n";
######################################################################
sub gc
{
my ($a_lat, $a_long, $b_lat, $b_long) = @_;
##
## convert degrees to radians
##
$a_long = deg2rad($a_long);
$b_long = deg2rad($b_long);
$a_lat = deg2rad($a_lat);
$b_lat = deg2rad($b_lat);
##
## meters in a mile
##
my $meterspermile = 1609.344;
##
## earth's radius
##
my $earth_radius = 6367000;
my $earth_radius = 6378.14 * 1000;
my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
* cos($a_long - $b_long));
$d = ($earth_radius * $d) / $meterspermile;
return($d);
}
anno4000@radom.zrz.tu-berlin.de wrote:
> <premgrps@gmail.com> wrote in comp.lang.perl.misc:
> > Hi,
> > I have a database with two tables
> > a) A table of 2 million records with city, zip and associated
> > information (say XYZ) and
> > b) zipcode latitude, longitude table having >40,000 records/zip codes
> >
> > PROBLEM:
> > I need to find the the XYZs within the the range of a certain zipcode.
> > This zipcode and radial range in miles is entered by the user (web
> > interface).
>
> [...]
>
> Your question has nothing to do with Perl (nor with zip codes, btw.).
> It is a problem in computational geometry, and that is where I would
> start looking for algorithms.
>
> Anno
------------------------------
Date: 6 Jul 2006 02:46:12 -0700
From: gmillerd@gmail.com
Subject: Re: smarter zipcode search algorithm
Message-Id: <1152179172.688422.26520@j8g2000cwa.googlegroups.com>
i wrote this many years ago for a singles website i made ... you may
find that just making a sql table with zipcode_to,zipcode_from,distance
is where you want to be. making a table for three digits and for five
digit zipcodes is a plus as well.
also you can do the lookups now with google map api i just noticed, way
to get banned with your 29000 zipcode entries :)
good luck
use strict;
use Math::Trig;
##
## racine wisconsin
##
my $a_lat = 42.13404;
my $a_long = -87.934097;
##
## wheeling illinois
##
my $b_lat = 42.716112;
my $b_long = -87.823329;
my $b_lat = 41.37372;
my $b_long = -93.376719;
######################################################################
print
gc(
$a_lat,
$a_long,
$b_lat,
$b_long),
"\n";
######################################################################
sub gc
{
my ($a_lat, $a_long, $b_lat, $b_long) = @_;
##
## convert degrees to radians
##
$a_long = deg2rad($a_long);
$b_long = deg2rad($b_long);
$a_lat = deg2rad($a_lat);
$b_lat = deg2rad($b_lat);
##
## meters in a mile
##
my $meterspermile = 1609.344;
##
## earth's radius
##
my $earth_radius = 6367000;
my $earth_radius = 6378.14 * 1000;
my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
* cos($a_long - $b_long));
$d = ($earth_radius * $d) / $meterspermile;
return($d);
}
anno4000@radom.zrz.tu-berlin.de wrote:
> <premgrps@gmail.com> wrote in comp.lang.perl.misc:
> > Hi,
> > I have a database with two tables
> > a) A table of 2 million records with city, zip and associated
> > information (say XYZ) and
> > b) zipcode latitude, longitude table having >40,000 records/zip codes
> >
> > PROBLEM:
> > I need to find the the XYZs within the the range of a certain zipcode.
> > This zipcode and radial range in miles is entered by the user (web
> > interface).
>
> [...]
>
> Your question has nothing to do with Perl (nor with zip codes, btw.).
> It is a problem in computational geometry, and that is where I would
> start looking for algorithms.
>
> Anno
------------------------------
Date: Thu, 06 Jul 2006 09:50:03 -0700
From: Jim Gibson <jgibson@mail.arc.nasa.gov>
Subject: Re: smarter zipcode search algorithm
Message-Id: <060720060950030541%jgibson@mail.arc.nasa.gov>
In article <1152179172.688422.26520@j8g2000cwa.googlegroups.com>,
<gmillerd@gmail.com> wrote:
> i wrote this many years ago for a singles website i made ... you may
> find that just making a sql table with zipcode_to,zipcode_from,distance
> is where you want to be. making a table for three digits and for five
> digit zipcodes is a plus as well.
Your code below uses a spherical earth approximation and is not very
accurate for long distances. While probably not needed for this
application, you should be aware that there are modules on CPAN that
provide better accuracy: Geo::Distance and Geo::Ellipsoid (which I
wrote -- the module, not the algorithm).
> use strict;
use warnings;
> use Math::Trig;
>
> ##
> ## racine wisconsin
> ##
> my $a_lat = 42.13404;
> my $a_long = -87.934097;
>
> ##
> ## wheeling illinois
> ##
> my $b_lat = 42.716112;
> my $b_long = -87.823329;
>
> my $b_lat = 41.37372;
> my $b_long = -93.376719;
Which is it? You will get warnings '"my" variable $b_lat masks earlier
declaration in same scope at ...' if you have 'use warnings;' in your
program.
>
> ######################################################################
>
> print
> gc(
> $a_lat,
> $a_long,
> $b_lat,
> $b_long),
> "\n";
>
> ######################################################################
>
> sub gc
> {
> my ($a_lat, $a_long, $b_lat, $b_long) = @_;
>
> ##
> ## convert degrees to radians
> ##
> $a_long = deg2rad($a_long);
> $b_long = deg2rad($b_long);
> $a_lat = deg2rad($a_lat);
> $b_lat = deg2rad($b_lat);
>
> ##
> ## meters in a mile
> ##
> my $meterspermile = 1609.344;
>
> ##
> ## earth's radius
> ##
> my $earth_radius = 6367000;
> my $earth_radius = 6378.14 * 1000;
Another duplicate declaration.
>
> my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
> * cos($a_long - $b_long));
>
> $d = ($earth_radius * $d) / $meterspermile;
>
> return($d);
> }
Here is a cleaned-up version of your code with a comparison with the
more accurate results that Geo::Ellipsoid can provide (with distances
in meters instead of miles):
#!/usr/local/bin/perl
#
use strict;
use warnings;
use Geo::Ellipsoid;
use Math::Trig;
my $geo = Geo::Ellipsoid->new(units=>'degrees');
my $a_lat = 42.13404;
my $a_long = -87.934097;
my $b_lat = 42.716112;
my $b_long = -87.823329;
my $gc_d = gc( $a_lat, $a_long, $b_lat, $b_long);
my $geo_d = $geo->range($a_lat, $a_long, $b_lat, $b_long);
printf "Distance from (%.6f,%.6f) => (%.6f,%.6f):\n",
$a_lat, $a_long, $b_lat, $b_long;
printf " %10s: %12.6f\n", $$_[0], $$_[1]
for ( ['Spherical', $gc_d], ['Ellipsoid',$geo_d] );
sub gc
{
my ($a_lat, $a_long, $b_lat, $b_long) = @_;
$a_long = deg2rad($a_long);
$b_long = deg2rad($b_long);
$a_lat = deg2rad($a_lat);
$b_lat = deg2rad($b_lat);
my $earth_radius = 6378140;
my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
* cos($a_long - $b_long));
return ($earth_radius * $d);
}
Which produces:
Distance from (42.134040,-87.934097) => (42.716112,-87.823329):
Spherical: 65432.132083
Ellipsoid: 65296.887943
------------------------------
Date: 06 Jul 2006 16:54:09 GMT
From: xhoster@gmail.com
Subject: Re: smarter zipcode search algorithm
Message-Id: <20060706125513.407$Si@newsreader.com>
premgrps@gmail.com wrote:
> Hi,
> I have a database with two tables
> a) A table of 2 million records with city, zip and associated
> information (say XYZ) and
> b) zipcode latitude, longitude table having >40,000 records/zip codes
40,001 and 10**89 are both greater than 40,000. I assuming you mean ~40,
000 rather than >40,000, otherwise it is meaningless.
Is that ~40,000 records where each record is a zipcode?
Or is it ~40,000 records per zip code?
> PROBLEM:
> I need to find the the XYZs within the the range of a certain zipcode.
> This zipcode and radial range in miles is entered by the user (web
> interface).
Are you assuming zip codes to be points?
> The brute force way is to calculate the distance between the user
> zipcode and all the zipcodes in the database. Once the zipcode_range
> subroutine gives back the zipcodes within a certain radius, I need to
> find all the XYZs from the table #1.
OK. You could probably even build a table giving the distances between
every pair of zip codes, or at least every pair within 300 miles or so
of each other.
> Another approach is to find the zipcodes with a square region (min/max
> of the user zipcode latitude/longitude position).
Yep, that is another approach. You could construct a square around the
circle, use a data structure to get everything in the square, then brute
force just those things in the square to see if they are also within the
circle. (neglecting the curvature of the earth.)
> Both the approaches are consuming too much time. Especially if the
> radial distance starts increasing.
Starts increasing? You make it sound like an organic creature. Isn't it
a simple number provided by the end user?
Anyway, what is taking a long time? Getting a list of zip codes within
X miles of the reference zip code? Turning that list into a list of all
the qualifying XYZ?
>
> My questions:
> 1. Is there any other smart way to do the above task.
We only have the vaguest idea of how you are doing it now. You can't make
tuning decisions based on vague notions.
> 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
> increase the performance. Right now each select command to the
> 2Million record table takes about a minute.
What database system are you using? What query are you issuing to it?
What indices exist? What does this have to do with Perl?
If you are using Perl as your database system, have you looked at Tree::R?
Xho
--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
------------------------------
Date: Thu, 6 Jul 2006 19:36:01 +0200
From: Franzl Wisseworst <franzl.wisseworst@mailinator.net>
Subject: Sticky form
Message-Id: <e8jijd$b20$02$1@news.t-online.com>
I'm searching for a simply cgi/html form output where the form <textarea>
fills the dual functionality the information display and saving of the
updated state. So for example, editing the textarea,and pressing a save
button below the form will update / save to file. Basically like a sticky
note html page without fancy html output etc. Has anyone seen some free
script anywhere which does something like it? I didn't find any via
cgi-resources.com etc. I could only find more complex html editing systems.
------------------------------
Date: Thu, 6 Jul 2006 17:18:54 +0200
From: "Ferry Bolhar" <bol@adv.magwien.gv.at>
Subject: Re: Symbol Table and References
Message-Id: <1152199135.604188@proxy.dienste.wien.at>
Daniel:
> Hello,
> I am trying to understand symbol tables and references. Maybe someone
> can give me a hint.
> Assume we are loking at symbol table st.
> 0) What is the internal format of a symbol table?
A special form of a hash - a symbol table hash (shortly called "stash").
The hash itself is the package (the namespace, where its name
includes always the leading "::"), its keys are the names of all variables
in this package and its values are typeglobs (see below). So, writing
*x (in package 'main')
is similar to
$main::{x}
(except that the first is processed by the compiler and the second is
evaluated at run-time).
> 1) What is actually stored at e.g. @st? The data of @a or a pointer to
> this data?
@st is an array and so is stored internally in an array value (AV) .The
AV contains a pointer to its element list which is a C array of pointers
to scalar values (SV).
> 2) What is returned by (1,2,3), the data itself or a pointer to this data?
Depends on the context. What do you mean with 'returned'? When
returned? To whom?
> 3) Is it correct, that a reference is simply a pointer?
Not "simply", but yes. Unlike a pointer in C, a reference points
_always_ to valid data. ;-)
> 4) What is *st exactly, a pointer to the start of the symbol table?
Well, now we are at typeglobs - as mentioned above, the values
of a stash (which implements global variables) are typeglobs. A
typeglob is simply a holder and has a 'slot' for each kind of 'value'
Perl provides for a particular identifier:
scalar, array, hash, code, IO handle and format.
For each of these values, a typeglob can hold a pointer - a
reference -. You can assign a value to a typeglob by assigning
a reference of the particular type:
$a = 3; is the same as *a = \3; 1)
@a = (1,2) is the same as *a = [1,2];
%a = ('a' => 1); is the same as *a = {'a' => 1};
sub a {...}; is the same as *a = sub {...};
1) except that in the latter case, the scalar is marked as read-only.
In fact, the compiler generates entries in the symbol table in
exact this way when it encounters identifiers in a script. In
addition, some pragmas ('vars', 'subs') and modules ('Exporter')
use this mechanism for their operations.
You can assign the slots in a typeglob by using a hash-like
notation with the typeglob:
$scalarref = *a{SCALAR};
$arrayref = *a{ARRAY};
$hashref = *a{HASH};
$coderef = *a{CODE};
$handleref = *a{IO};
$fmtref = *a{FORMAT};
$globref = *a{GLOB}; # Reference to the glob itself
However, in this way, you can read the typeglob slots only; see
exampels above how to write into them.
All this will return references (scalars) and so you can store them
into scalar variables like any other scalars. In other words, @x
will address to ARRAY slot in the typeglob *x which contains
a reference to the actual AV and will dereference it to address
the actual array.
There is also a *a{NAME} notation which returns the glob
name ('a' in this case (useful if you get the glob passed by refe-
rence and you want to obtain its name) and *a{PACKAGE}
which returns the package which the glob belongs to.
Because typeglobs contain references, using them was the only
way to pass references in Perl 4, where the reference operator
'\' did not yet exist.
I'd suggest to read the chapters "Symbol Tables" in perlmod and
"Making References" in perlref which addresses these topics in
greater detail.
Finally, note that the above is meaningful for global (package)
variables only, not lexicals (those declared with "my"). These
are implemented in a complete different way.
> 5) What is the difference between *st and /*st?
The first addresses the glob *st and the second is interpreted
as unterminated search pattern ;-): Assuming you mean \*st,
this a reference to glob *st.
> 6) What exactly is a file handle?
The same as a file descriptior in C - a logical channel through
which you can perform I/O.
Greetings, Ferry
--
Ing. Ferry Bolhar
Municipality of Vienna, Department 14
A-1010 Vienna / AUSTRIA
E-mail: bol@adv.magwien.gv.at
------------------------------
Date: 6 Jul 2006 09:51:28 -0700
From: "howa" <howachen@gmail.com>
Subject: Re: Using reference for performance gain?
Message-Id: <1152204688.305078.215320@m79g2000cwm.googlegroups.com>
Uri Guttman =E5=AF=AB=E9=81=93=EF=BC=9A
> >>>>> "h" =3D=3D howa <howachen@gmail.com> writes:
>
> h> anno4000@radom.zrz.tu-berlin.de
>
> >> Your benchmarks don't tell much about the actual performance of
> >> returning a list vs. returning a reference. For one, you are buildi=
ng
> >> the hash on each call to the routine. That is going to swamp out
> >> other performance differences.
>
> h> i have considered this, but this just reflect the real world suitati=
on.
>
> what real world? if you are building constant hashes in each call, then
> your real world is very slow. and as anno said, it will swamp out any
> return overhead.
>
in real world, complex data are always generated and returning from a
function, and this is the reason why we need to consider using
reference instead.
of coz purely comparing the speed of reference & value might be a
factor of 100, but when used in real world suitation, this might be
just a factor of 2 as you must have some min. overhead of other things
else.
thanks anyway.
------------------------------
Date: Thu, 06 Jul 2006 13:34:17 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Using reference for performance gain?
Message-Id: <x74pxuzlgm.fsf@mail.sysarch.com>
>>>>> "h" == howa <howachen@gmail.com> writes:
h> Uri Guttman 寫道:
>> >>>>> "h" == howa <howachen@gmail.com> writes:
>>
h> anno4000@radom.zrz.tu-berlin.de
>>
>> >> Your benchmarks don't tell much about the actual performance of
>> >> returning a list vs. returning a reference. For one, you are building
>> >> the hash on each call to the routine. That is going to swamp out
>> >> other performance differences.
>>
h> i have considered this, but this just reflect the real world suitation.
>>
>> what real world? if you are building constant hashes in each call, then
>> your real world is very slow. and as anno said, it will swamp out any
>> return overhead.
>>
h> in real world, complex data are always generated and returning from a
h> function, and this is the reason why we need to consider using
h> reference instead.
h> of coz purely comparing the speed of reference & value might be a
h> factor of 100, but when used in real world suitation, this might be
h> just a factor of 2 as you must have some min. overhead of other things
h> else.
you arent' getting it but i can't think of any way to explain it to
you. optimization is a skill in itself and it requires understanding of
when things happen. loading large CONSTANT hashes inside a sub is a
killer of cpu power. it also will distort any benchmarks you are
doing. as for the real world, the whole point of a benchmark is to try
to simulate real world conditions. your benchmark was useless for that
and for isolating whether returning a hash or a reference was faster
(besides it being broken in how it returned stuff).
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: Thu, 06 Jul 2006 09:08:34 GMT
From: Bart Lateur <bart.lateur@pandora.be>
Subject: Re: Win32: File Manipulation
Message-Id: <7okpa2trflmd15893l3bc71kstpu8ovod8@4ax.com>
AJ wrote:
> if ($PickFTP->Put($nextname.dnl)){
> if ($PickFTP->Rename($nextname.dnl,$nextname)) {
$nextname.dnl is not a valid name for a variable.
--
Bart.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 9431
***************************************