[30351] in Perl-Users-Digest


Perl-Users Digest, Issue: 1594 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri May 30 11:14:57 2008

Date: Fri, 30 May 2008 08:14:14 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 30 May 2008     Volume: 11 Number: 1594

Today's topics:
        sorting a hash / 2008-06-01 <dn.perl@gmail.com>
    Re: sorting a hash / 2008-06-01 <someone@example.com>
    Re: sorting a hash / 2008-06-01 <1usa@llenroc.ude.invalid>
    Re: sorting a hash / 2008-06-01 <noreply@gunnar.cc>
    Re: sorting a hash / 2008-06-01 <1usa@llenroc.ude.invalid>
        Speed comparison of regex versus index, lc, and / /i <benkasminbullock@gmail.com>
    Re: Speed comparison of regex versus index, lc, and / / <someone@example.com>
    Re: Speed comparison of regex versus index, lc, and / / <benkasminbullock@gmail.com>
    Re: Speed comparison of regex versus index, lc, and / / <simon.chao@fmr.com>
    Re: Speed comparison of regex versus index, lc, and / / xhoster@gmail.com
    Re: Using perl locally on a Windows XP system <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 30 May 2008 00:21:42 -0700 (PDT)
From: "dn.perl@gmail.com" <dn.perl@gmail.com>
Subject: sorting a hash / 2008-06-01
Message-Id: <1851690d-ce84-4bb7-94dc-f262fc061f37@l64g2000hse.googlegroups.com>


I want to sort a hash. The hash contains a list of cities and their
temperature and I want the 4 cities with max temp. The problem is that
the city-names are one extra level deep with the state-name coming in-
between. I wondered whether I should build the hash differently. A
different format would be: state_city, with the underbar separating
the state and the city.
$hash{Calif_Cupertino}{max_temp}  = 38 ;
instead of
$hash{Calif}{Cupertino}{max_temp}  = 38 ;


my %hash = () ;
$hash{Calif}{San Jose}{max_temp}  = 84 ;
$hash{Calif}{San Fran}{max_temp}  = 94 ;
$hash{Calif}{Cupertino}{max_temp}  = 38 ;
$hash{Calif}{Fremont}{max_temp}  = 66 ;
$hash{Texas}{Dallas}{max_temp} =  72 ;
$hash{Texas}{Austin}{max_temp} =  96 ;
$hash{Texas}{Fort Worth}{max_temp} =  62 ;
$hash{Mass}{Boston}{max_temp} =  96 ;
$hash{Mass}{Framingham}{max_temp} =  55 ;
$hash{Mass}{Worcester}{max_temp} =  55 ;

How do I sort this hash, please?
Thanks in advance.





------------------------------

Date: Fri, 30 May 2008 12:49:12 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: sorting a hash / 2008-06-01
Message-Id: <cJS%j.603$cV.202@edtnps92>

dn.perl@gmail.com wrote:
> I want to sort a hash. The hash contains a list of cities and their
> temperature and I want the 4 cities with max temp. The problem is that
> the city-names are one extra level deep with the state-name coming in-
> between. I wondered whether I should build the hash differently. A
> different format would be: state_city, with the underbar separating
> the state and the city.
> $hash{Calif_Cupertino}{max_temp}  = 38 ;
> instead of
> $hash{Calif}{Cupertino}{max_temp}  = 38 ;
> 
> 
> my %hash = () ;
> $hash{Calif}{San Jose}{max_temp}  = 84 ;
> $hash{Calif}{San Fran}{max_temp}  = 94 ;
> $hash{Calif}{Cupertino}{max_temp}  = 38 ;
> $hash{Calif}{Fremont}{max_temp}  = 66 ;
> $hash{Texas}{Dallas}{max_temp} =  72 ;
> $hash{Texas}{Austin}{max_temp} =  96 ;
> $hash{Texas}{Fort Worth}{max_temp} =  62 ;
> $hash{Mass}{Boston}{max_temp} =  96 ;
> $hash{Mass}{Framingham}{max_temp} =  55 ;
> $hash{Mass}{Worcester}{max_temp} =  55 ;
> 
> How do I sort this hash, please?

$ perl -le'
my %hash = (
     Calif => {
         "San Jose" => { max_temp => 84 },
         "San Fran" => { max_temp => 94 },
         Cupertino  => { max_temp => 38 },
         Fremont    => { max_temp => 66 },
         },
     Texas => {
         Dallas       => { max_temp => 72 },
         Austin       => { max_temp => 96 },
         "Fort Worth" => { max_temp => 62 },
         },
     Mass  => {
         Boston     => { max_temp => 96 },
         Framingham => { max_temp => 55 },
         Worcester  => { max_temp => 55 },
         },
     );

print "City: $_->[0]  Temperature: $_->[1]"
     for (
     sort { $b->[ 1 ] <=> $a->[ 1 ] }
     map  { my $hash = $_;
            map [ $_, $hash->{ $_ }{ max_temp } ], keys %$hash }
     values %hash
     )[ 0 .. 3 ];
'
City: Austin  Temperature: 96
City: Boston  Temperature: 96
City: San Fran  Temperature: 94
City: San Jose  Temperature: 84
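
Note that the one-liner above reports only the city name; the state is lost
when the inner hashes are flattened. A minimal sketch that keeps the state and
breaks ties in a repeatable order follows. The helper name top_cities and the
trimmed data set are assumptions made for illustration, not part of the
original post.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a state => city => { max_temp } structure
# into [ state, city, temp ] records and return the $n hottest.
sub top_cities {
    my ( $data, $n ) = @_;
    my @rows;
    for my $state ( keys %$data ) {
        for my $city ( keys %{ $data->{$state} } ) {
            push @rows, [ $state, $city, $data->{$state}{$city}{max_temp} ];
        }
    }

    # Sort by temperature (descending), then by city and state, so that ties
    # come out in the same order regardless of hash ordering.
    my @sorted = sort {
        $b->[2] <=> $a->[2] or $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0]
    } @rows;

    $n = @sorted if $n > @sorted;    # do not slice past the end
    return @sorted[ 0 .. $n - 1 ];
}

# Trimmed copy of the data from the question.
my %hash = (
    Calif => {
        'San Jose' => { max_temp => 84 },
        'San Fran' => { max_temp => 94 },
    },
    Texas => {
        Austin => { max_temp => 96 },
        Dallas => { max_temp => 72 },
    },
    Mass => { Boston => { max_temp => 96 } },
);

printf "%-10s %-6s %d\n", $_->[1], $_->[0], $_->[2]
    for top_cities( \%hash, 4 );
```

Returning whole records rather than printing inside the helper leaves the
choice of output format to the caller.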




John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Fri, 30 May 2008 12:59:18 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: sorting a hash / 2008-06-01
Message-Id: <Xns9AAE5B6F3DF0Basu1cornelledu@127.0.0.1>

"dn.perl@gmail.com" <dn.perl@gmail.com> wrote in
news:1851690d-ce84-4bb7-94dc-f262fc061f37@l64g2000hse.googlegroups.com:

> I want to sort a hash. The hash contains a list of cities and their
> temperature and I want the 4 cities with max temp.

I am going to assume you want the cities with the top four maximum 
temperatures.

> The problem is that
> the city-names are one extra level deep with the state-name coming in-
> between. I wondered whether I should build the hash differently. A
> different format would be: state_city, with the underbar separating
> the state and the city.
> $hash{Calif_Cupertino}{max_temp}  = 38 ;
> instead of
> $hash{Calif}{Cupertino}{max_temp}  = 38 ;

First, note that it might make your life easier later to use standard 
abbreviations such as CA for California, TX for Texas. If you need to 
also present longer names, you could use a lookup table.

> $hash{Calif}{San Fran}{max_temp}  = 94 ;

Similarly, I would use the actual identifier for the reporting 
temperature measurement station instead of a cutesy abbreviation of the 
city name. You could use a lookup table to map station identifiers to 
city names.

It is also not nice to post code that you have not tested at all:

C:\Temp> cat s.pl
#!/usr/bin/perl

use strict;
use warnings;

my %hash = () ;
$hash{Calif}{San Jose}{max_temp}  = 84 ;
$hash{Calif}{San Fran}{max_temp}  = 94 ;
$hash{Calif}{Cupertino}{max_temp}  = 38 ;
$hash{Calif}{Fremont}{max_temp}  = 66 ;
$hash{Texas}{Dallas}{max_temp} =  72 ;
$hash{Texas}{Austin}{max_temp} =  96 ;
$hash{Texas}{Fort Worth}{max_temp} =  62 ;
$hash{Mass}{Boston}{max_temp} =  96 ;
$hash{Mass}{Framingham}{max_temp} =  55 ;
$hash{Mass}{Worcester}{max_temp} =  55 ;


C:\Temp> s
Can't locate object method "San" via package "Jose" (perhaps you forgot 
to load "Jose"?) at C:\Temp\s.pl line 7.

You need to quote keys that do not consist solely of \w characters.
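
As a small illustration of that rule (a sketch reusing values from the
question, not code from the original post):

```perl
use strict;
use warnings;

my %hash;

# A bareword key is auto-quoted only when it is a simple identifier.
$hash{Calif}{Cupertino}{max_temp} = 38;

# Keys containing spaces (or other non-identifier characters) must be quoted.
$hash{Calif}{'San Jose'}{max_temp} = 84;

# The fat comma quotes a simple identifier on its left, but "Fort Worth"
# still needs explicit quotes.
$hash{Texas} = {
    Dallas       => { max_temp => 72 },
    'Fort Worth' => { max_temp => 62 },
};

print join( ', ', sort keys %{ $hash{Calif} } ), "\n";
```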

Please read the posting guidelines for this group and follow them.

Second, the data structure is dictated by the problem at hand usually. I 
think it is futile to venture whether a different data structure would 
be more appropriate using only the information you give in this post. In 
general, I do not favor using composite hash keys unless there is a 
compelling reason to do so.

However, assuming you are only storing a single attribute for each city, 
then I would recommend:

my %maxtemp;

$maxtemp{Calif}->{'San Jose'} = 84;

etc.

> How do I sort this hash, please?

By definition, a hash table is unsorted. You may present the contents in 
a certain order, but the data structure itself remains unaffected by 
that presentational transformation.

That said, a full blown sort seems to be unnecessary here. You said all 
you want are the four cities with the highest maximum temperatures. On 
the other hand, simply sorting and picking up the four elements at the 
top has a certain appeal as well. Both of these might fail according to 
some criteria if there are ties. How do you handle ties?

The following code will find the highest four temperatures and list the 
cities with those temperatures. It does not handle the case where there 
are fewer than four distinct temperatures in the whole data set.


#!/usr/bin/perl

use strict;
use warnings;

my %maxtemp = () ;
$maxtemp{Calif}->{'San Jose'}   = 84;
$maxtemp{Calif}->{'San Fran'}   = 94;
$maxtemp{Calif}->{'Cupertino'}  = 38;
$maxtemp{Calif}->{'Fremont'}    = 66;
$maxtemp{Texas}->{'Dallas'}     = 72;
$maxtemp{Texas}->{'Austin'}     = 96;
$maxtemp{Texas}->{'Fort Worth'} = 62;
$maxtemp{Mass}->{'Boston'}      = 96;
$maxtemp{Mass}->{'Framingham'}  = 55;
$maxtemp{Mass}->{'Worcester'}   = 55;

my %cities_by_temp;

for my $state ( keys %maxtemp ) {
    for my $city ( keys %{ $maxtemp{$state} } ) {
        my $temp = $maxtemp{$state}->{$city};
        push @{ $cities_by_temp{$temp} }, "$city, $state";
    }
}

my @highest = (sort { $b <=> $a } keys %cities_by_temp)[0 .. 3];

for my $temp ( @highest ) {
    print "$temp\t", join("\n\t", @{$cities_by_temp{$temp}} ), "\n";
}

__END__
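
The limitation noted above (fewer than four distinct temperatures) comes from
the fixed [0 .. 3] slice, which can run past the end of the sorted list. One
possible way around it is to trim the list instead of slicing it. This is a
sketch only; the helper name highest_temps and the two-entry sample data are
assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n of the highest distinct temperatures
# found as keys of %$cities_by_temp, highest first.
sub highest_temps {
    my ( $cities_by_temp, $n ) = @_;
    my @temps = sort { $b <=> $a } keys %$cities_by_temp;
    splice @temps, $n if @temps > $n;    # keep at most $n entries
    return @temps;
}

# Sample data with only two distinct temperatures.
my %cities_by_temp = (
    96 => [ 'Austin, Texas', 'Boston, Mass' ],
    94 => ['San Fran, Calif'],
);

for my $temp ( highest_temps( \%cities_by_temp, 4 ) ) {
    print "$temp\t", join( "\n\t", sort @{ $cities_by_temp{$temp} } ), "\n";
}
```

How ties should be counted (four temperatures or four cities) remains a
decision for whoever owns the requirement; this version still counts distinct
temperatures.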





-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: Fri, 30 May 2008 15:11:22 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sorting a hash / 2008-06-01
Message-Id: <6aacqtF36ku0jU1@mid.individual.net>

dn.perl@gmail.com wrote:
> I want to sort a hash. The hash contains a list of cities and their 
> temperature

Well, I'd rather say it contains three hash references.

This is one sensible way to sort that data structure:

   foreach my $state ( sort keys %hash ) {
     print "State: $state\n";
     foreach my $city ( sort { $a cmp $b } keys %{ $hash{$state} } ) {
       print "$city = $hash{$state}{$city}{max_temp}\n";
     }
     print "\n";
   }

> and I want the 4 cities with max temp.

I'm not sure what you mean by that. Please clarify what's the desired 
output.

> The problem is that 
> the city-names are one extra level deep with the state-name coming in- 
> between. I wondered whether I should build the hash differently.

Probably. This is one idea:

   my %hash = (
     'San Jose' => {
       state => 'Calif',
       max_temp => 84,
     },
     'San Fran' => {
       state => 'Calif',
       max_temp => 94,
     },
   );

> my %hash = () ;
> $hash{Calif}{San Jose}{max_temp}  = 84 ;
---------------^^^^^^^^
Hash keys with spaces need to be quoted.

   $hash{Calif}{'San Jose'}{max_temp}  = 84 ;

> $hash{Calif}{San Fran}{max_temp}  = 94 ;
> $hash{Calif}{Cupertino}{max_temp}  = 38 ;
> $hash{Calif}{Fremont}{max_temp}  = 66 ;
> $hash{Texas}{Dallas}{max_temp} =  72 ;
> $hash{Texas}{Austin}{max_temp} =  96 ;
> $hash{Texas}{Fort Worth}{max_temp} =  62 ;
> $hash{Mass}{Boston}{max_temp} =  96 ;
> $hash{Mass}{Framingham}{max_temp} =  55 ;
> $hash{Mass}{Worcester}{max_temp} =  55 ;

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Fri, 30 May 2008 13:51:51 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: sorting a hash / 2008-06-01
Message-Id: <Xns9AAE6457BCD72asu1cornelledu@127.0.0.1>

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in
news:6aacqtF36ku0jU1@mid.individual.net:

> dn.perl@gmail.com wrote:

 ...

>> The problem is that 
>> the city-names are one extra level deep with the state-name coming
>> in-between. I wondered whether I should build the hash differently.
> 
> Probably. This is one idea:
> 
>    my %hash = (
>      'San Jose' => {
>        state => 'Calif',
>        max_temp => 84,
>      },
>      'San Fran' => {
>        state => 'Calif',
>        max_temp => 94,
>      },
>    );

http://en.wikipedia.org/wiki/Athens_(town),_New_York

http://en.wikipedia.org/wiki/Athens,_Georgia
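
The two links illustrate the weakness of keying on the city name alone:
different cities can share a name. One possible alternative, sketched below,
is a plain list of records. The state abbreviations and the Athens
temperatures are placeholders invented for the example.

```perl
use strict;
use warnings;

# Placeholder records; the Athens temperatures are invented for illustration.
my @stations = (
    { city => 'Athens',   state => 'NY',    max_temp => 81 },
    { city => 'Athens',   state => 'GA',    max_temp => 93 },
    { city => 'San Jose', state => 'Calif', max_temp => 84 },
);

# Keying on the city alone silently keeps only one Athens.
my %by_city;
$by_city{ $_->{city} } = $_ for @stations;

# Sorting the records directly keeps both.
my @hottest_first = sort { $b->{max_temp} <=> $a->{max_temp} } @stations;

print scalar( keys %by_city ), " distinct city names, ",
    scalar(@stations), " records\n";
print "$_->{city}, $_->{state}: $_->{max_temp}\n" for @hottest_first;
```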

Sinan

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: Fri, 30 May 2008 13:04:19 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Speed comparison of regex versus index, lc, and / /i
Message-Id: <g1ou0j$kme$2@ml.accsnet.ne.jp>

In a recent discussion on this newsgroup, it was mentioned that
"index" is better for matching fixed strings than using regular
expressions.  Coincidentally I've recently been setting up a search
system for a fairly large volume (about 30 megabytes) of text files,
and as a first approximation for the search system I made a simple
routine to open each file and search for the string in the file using
"index".

As a test of the proposition that index is better than regexes, I also
tried using a regex to do the same job. My results were that the
version using regexes was almost identical (within a few percent) in
speed to "index", leading me to think that under the bonnet "index" is
probably just using a regex anyway. Furthermore, the biggest
bottleneck in the code wasn't the pattern matching, but the use of
"lc" to convert the text into lower case for case insensitive
search. I found that saving all the text files as lower case before
doing the matching, rather than converting the strings using lc, saved
more than half of the total execution time, so the difference between
"index" and regexes was not even significant compared to the time
spent converting to lower case. I also found that the "i" option to
the regex similarly meant that the regex ran drastically
slower. Similarly another big bottleneck I identified was conversion
of the files - opening the (utf8 encoded) files with

open my $file, "<:utf8", $filename;

saved about 30% of the total execution time compared to converting the
text after reading it in.

So my conclusion is that "index" isn't necessary and one can always
use regexes - unless anyone can prove otherwise - and Perl's regexes
are so fast that they may not be much of a bottleneck. But why "lc"
should be so slow I don't know.
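
No benchmark code was posted with these figures. For readers who want to try
the comparison themselves, the following is a minimal sketch using the core
Benchmark module. The synthetic text, the needle, the variant names and the
iteration count are all assumptions; results will depend on the perl version,
the data and the encoding layers in use.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic haystack (an assumption; the post describes about 30 MB of files).
my $text =
    ( 'The quick brown Fox jumps over the lazy dog. ' x 2000 ) . 'NEEDLE here';
my $needle = 'needle';

# Four hypothetical variants of a case-insensitive "does it contain" test.
my %variant = (
    index_lc  => sub { index( lc($text), $needle ) >= 0 },
    regex_lc  => sub { lc($text) =~ /\Q$needle\E/ ? 1 : 0 },
    regex_i   => sub { $text =~ /\Q$needle\E/i ? 1 : 0 },
    index_pre => do {
        my $lower = lc $text;    # lower-cased once, outside the timed code
        sub { index( $lower, $needle ) >= 0 };
    },
);

# Run each variant a fixed number of times and print a comparison table.
cmpthese( 500, \%variant );
```

The index_pre variant corresponds to storing the files already lower-cased,
so the gap between it and index_lc gives a rough idea of what lc itself costs
on this input.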


------------------------------

Date: Fri, 30 May 2008 13:28:21 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Speed comparison of regex versus index, lc, and / /i
Message-Id: <VhT%j.606$cV.200@edtnps92>

Ben Bullock wrote:
> In a recent discussion on this newsgroup, it was mentioned that
> "index" is better for matching fixed strings than using regular
> expressions.  Coincidentally I've recently been setting up a search
> system for a fairly large volume (about 30 megabytes) of text files,
> and as a first approximation for the search system I made a simple
> routine to open each file and search for the string in the file using
> "index".
> 
> As a test of the proposition that index is better than regexes, I also
> tried using a regex to do the same job. My results were that the
> version using regexes was almost identical (within a few percent) in
> speed to "index", leading me to think that under the bonnet "index" is
> probably just using a regex anyway.

Just the opposite.  AFAIK if searching for a literal string (as opposed 
to a regular expression pattern) the regexp engine will use the same 
algorithm as index().

See perlreguts.pod for details: 
http://search.cpan.org/~rgarcia/perl-5.10.0/pod/perlreguts.pod


> Furthermore, the biggest
> bottleneck in the code wasn't the pattern matching, but the use of
> "lc" to convert the text into lower case for case insensitive
> search. I found that saving all the text files as lower case before
> doing the matching, rather than converting the strings using lc, saved
> more than half of the total execution time, so the difference between
> "index" and regexes was not even significant compared to the time
> spent converting to lower case. I also found that the "i" option to
> the regex similarly meant that the regex ran drastically
> slower. Similarly another big bottleneck I identified was conversion
> of the files - opening the (utf8 encoded) files with
> 
> open my $file, "<:utf8", $filename;
> 
> saved about 30% of the total execution time compared to converting the
> text after reading it in.
> 
> So my conclusion is that "index" isn't necessary and one can always
> use regexes - unless anyone can prove otherwise - and Perl's regexes
> are so fast that they may not be much of a bottleneck. But why "lc"
> should be so slow I don't know.

I assume that most of the slowdown was caused by the introduction of the 
use of UTF, etc.



John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Fri, 30 May 2008 14:10:09 +0000 (UTC)
From: Ben Bullock <benkasminbullock@gmail.com>
Subject: Re: Speed comparison of regex versus index, lc, and / /i
Message-Id: <g1p1s1$m4g$1@ml.accsnet.ne.jp>

On Fri, 30 May 2008 13:28:21 +0000, John W. Krahn wrote:

> Just the opposite.  AFAIK if searching for a literal string (as opposed
> to a regular expression pattern) the regexp engine will use the same
> algorithm as index().

I don't know what it does internally, but actually using non-literal
strings in the regular expression match like "something|else" or
"first.*second" did not result in a significant slowdown. The search
string did not change at all during the execution of the program, so
the regular expression would only have been compiled once.

> I assume that most of the slowdown was caused by the introduction of the
> use of UTF, etc.

No - the "lc"-related slowdown was experienced even if I read in the
files as bytes and did not convert them into anything. I'm sure of
this because I converted to using UTF-8 halfway through coding because
of an unrelated problem, and by that point I'd already noticed that "lc"
or / /i more than doubled the time of the program execution. In fact at the 
same time that I converted the searched files into UTF-8, I also converted 
them to lower case.




------------------------------

Date: Fri, 30 May 2008 07:20:40 -0700 (PDT)
From: nolo contendere <simon.chao@fmr.com>
Subject: Re: Speed comparison of regex versus index, lc, and / /i
Message-Id: <e782ccb4-aae1-4431-a834-28cb1049cac8@59g2000hsb.googlegroups.com>

On May 30, 10:10 am, Ben Bullock <benkasminbull...@gmail.com> wrote:
> On Fri, 30 May 2008 13:28:21 +0000, John W. Krahn wrote:
> > Just the opposite.  AFAIK if searching for a literal string (as opposed
> > to a regular expression pattern) the regexp engine will use the same
> > algorithm as index().
>
> I don't know what it does internally, but actually using non-literal
> strings in the regular expression match like "something|else" or
> "first.*second" did not result in a significant slowdown. The search
> string did not change at all during the execution of the program, so
> the regular expression would only have been compiled once.
>
> > I assume that most of the slowdown was caused by the introduction of the
> > use of UTF, etc.
>
> No - the "lc"-related slowdown was experienced even if I read in the
> files as bytes and did not convert them into anything. I'm sure of
> this because I converted to using UTF-8 halfway through coding because
> of an unrelated problem, and by that point I'd already noticed that "lc"
> or / /i more than doubled the time of the program execution. In fact at the
> same time that I converted the searched files into UTF-8, I also converted
> them to lower case.

Could you post the code you used to compare, as well as the output?
I'm assuming you used Benchmark, please correct me if I'm wrong.
Also, what's the output of perl -V, and what are your system specs?


------------------------------

Date: 30 May 2008 14:55:04 GMT
From: xhoster@gmail.com
Subject: Re: Speed comparison of regex versus index, lc, and / /i
Message-Id: <20080530105505.273$pB@newsreader.com>

Ben Bullock <benkasminbullock@gmail.com> wrote:
> In a recent discussion on this newsgroup, it was mentioned that
> "index" is better for matching fixed strings than using regular
> expressions.

Yes, it is.  If using regex to match fixed strings, you need to worry
about special characters or syntax errors in the regex, like the problem
with the literal string like "[l-c]" which we recently witnessed here.

> Coincidentally I've recently been setting up a search
> system for a fairly large volume (about 30 megabytes) of text files,
> and as a first approximation for the search system I made a simple
> routine to open each file and search for the string in the file using
> "index".
>
> As a test of the proposition that index is better than regexes,

"Better" is a much bigger issue than merely faster.

 ...

>
> So my conclusion is that "index" isn't necessary and one can always
> use regexes

True.  On the other hand, Perl isn't necessary and one can always use other
languages.  Computers aren't necessary and one can always use paper and
pencil.  Where is this headed?

And of course, if you are interested in where the string matches (i.e. the
return value of index, and not just whether or not it is -1) then it is
simpler to get it from index than from a regex.
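
Both points can be seen in a few lines. This is a sketch; the sample text is
made up around the "[l-c]" string mentioned above.

```perl
use strict;
use warnings;

my $text   = 'range [l-c] is not a valid character class';
my $needle = '[l-c]';

# index treats the needle as plain characters and returns the offset (or -1).
my $pos_index = index( $text, $needle );

# Interpolating the needle unescaped would be compiled as a character class
# and fail with an invalid-range error. \Q...\E escapes it, and the match
# offset then has to be read from the @- array.
my $pos_regex = $text =~ /\Q$needle\E/ ? $-[0] : -1;

print "index: $pos_index, regex: $pos_regex\n";
```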

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Fri, 30 May 2008 16:16:29 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Using perl locally on a Windows XP system
Message-Id: <slrng4031v.680.hjp-usenet2@hrunkner.hjp.at>

On 2008-05-29 22:42, Jürgen Exner <jurgenex@hotmail.com> wrote:
> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>On 2008-05-28 13:50, Jürgen Exner <jurgenex@hotmail.com> wrote:
>>> Oh, you are not talking about Perl at all, you are talking about an HTTP
>>> server. The "best" way to do that is to install a proxy server, which
>>> redirects all requests to your test environment, in this case to
>>> localhost.
>>
>>What do you need the proxy for? 
>
> A proxy server is the easiest method to redirect traffic between the
> live and test site.

I find typing "http://test.example.com" instead of
"http://www.example.com" into the URL bar much easier.

	hp


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1594
***************************************
