[30317] in Perl-Users-Digest


Perl-Users Digest, Issue: 1560 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue May 20 14:09:56 2008

Date: Tue, 20 May 2008 11:09:18 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 20 May 2008     Volume: 11 Number: 1560

Today's topics:
    Re: Example for open3 on windows? <szrRE@szromanMO.comVE>
    Re: FAQ 4.41 How can I remove duplicate elements from a <szrRE@szromanMO.comVE>
        high and low bytes of a decimal <swest@gmx.de>
    Re: high and low bytes of a decimal <swest@gmx.de>
    Re: high and low bytes of a decimal <jl_post@hotmail.com>
    Re: high and low bytes of a decimal <jl_post@hotmail.com>
    Re: high and low bytes of a decimal <swest@gmx.de>
    Re: high and low bytes of a decimal <swest@gmx.de>
    Re: high and low bytes of a decimal <wahab@chemie.uni-halle.de>
    Re: high and low bytes of a decimal <swest@gmx.de>
    Re: How to determine if a word has an extended characte <jurgenex@hotmail.com>
    Re: How to determine if a word has an extended characte <JustMe@somewhere.de>
        How would I set up a hash for the following chadda@lonemerchant.com
    Re: How would I set up a hash for the following <jurgenex@hotmail.com>
    Re: How would I set up a hash for the following chadda@lonemerchant.com
    Re: How would I set up a hash for the following <roblund@gmail.com>
    Re: I need ideas on how to sort 350 million lines of da <tzz@lifelogs.com>
        Installing Perl Module <amerar@iwc.net>
    Re: Installing Perl Module (Jens Thoms Toerring)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 20 May 2008 10:10:04 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: Example for open3 on windows?
Message-Id: <g0v0lc02d4b@news4.newsguy.com>

zentara wrote:
> On Tue, 20 May 2008 07:33:37 +0200, Manuel Reimer
> <mreimer@expires-31-05-2008.news-group.org> wrote:
>
>> Hello,
>>
>> I tried to forward input, output and error of a command to files
>> using open3 on windows.
>>
>> I used the following code:
>>
>> open(F_IN, "<$f_in");
>> open(F_OUT, ">$f_out");
>> open(F_ERR, ">$f_err");
>> my $pid = open3(\*F_IN, \*F_OUT, \*F_ERR, "$command");
>> waitpid($pid, 0);
>>
>> Anything, this code does, is hanging on waitpid.
>>
>> Does open3 work on windows? If yes: How can I use it to forward all
>> input/output channels to files?
>>
>> Thanks in advance
>
> open3 uses pipes to open the filehandles, but Win32 dosn't
> handle pipes well,

Could you please elaborate on what you mean by, "doesn't[sic] handle 
pipes well" ?


I find under 2000 and XP I can use pipes just as I can in a Linux/UNIX 
environment.

E.G. :

  ls -l | egrep "^d" | sort
  some_program_with_lots_of_output | more
  grep "1.2.3.4" access_log | tail -n 25
  ls -l long_dir | less

(Note that some of these examples use Win32 ports of Linux utilities like 
grep, sort, ls, tail, less... etc, which I use on all my Windows systems 
so I can do things in a similar manner on either win or nix :-) )
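
For the original question (sending a command's input, output and errors to files), one possibility is the "<&" / ">&" handle notation documented for IPC::Open3, which hands existing filehandles to the child instead of creating pipes. The following is only a sketch under assumptions: run_with_files is a hypothetical helper name, the file names are placeholders, and the thread does not establish how this behaves on Win32.

```perl
use strict;
use warnings;
use IPC::Open3;

# Run @cmd with stdin, stdout and stderr attached directly to files.
# The "<&" / ">&" prefixes ask open3 to duplicate the existing handles
# for the child rather than create pipes, so the parent does not have
# to read or write anything while the child runs.
# run_with_files is a hypothetical helper, not part of IPC::Open3.
sub run_with_files {
    my ($f_in, $f_out, $f_err, @cmd) = @_;

    open(F_IN,  '<', $f_in)  or die "Cannot read $f_in: $!";
    open(F_OUT, '>', $f_out) or die "Cannot write $f_out: $!";
    open(F_ERR, '>', $f_err) or die "Cannot write $f_err: $!";

    my $pid = open3('<&F_IN', '>&F_OUT', '>&F_ERR', @cmd);
    waitpid($pid, 0);
    my $status = $? >> 8;      # exit code of the child

    close(F_IN);               # may already be closed by open3
    close(F_OUT);
    close(F_ERR);
    return $status;
}
```

A call might look like run_with_files('in.txt', 'out.txt', 'err.txt', 'some_command', 'arg'), where all five arguments are placeholders.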

-- 
szr 




------------------------------

Date: Tue, 20 May 2008 08:52:39 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: FAQ 4.41 How can I remove duplicate elements from a list or array?
Message-Id: <g0us48028ff@news4.newsguy.com>

brian d foy wrote:
> In article <g0q0580tcc@news4.newsguy.com>, szr <szrRE@szromanMO.comVE>
> wrote:
>
>> Point taken, though my goal was to show how to do what was shown at
>> the end of the FAQ with one less line.
>
> Generally I don't golf perlfaq answers, and even take more lines than
> I would in my normal programming. Beginners can see more discrete
> steps, and advanced people can golf it as much as they like. :)

Agreed. It was more of a fun little mini-challenge, if you will :-)

-- 
szr 




------------------------------

Date: Tue, 20 May 2008 18:28:00 +0200
From: Susanne West <swest@gmx.de>
Subject: high and low bytes of a decimal
Message-Id: <c955a$4832fc0d$544b8434$22812@news.hispeed.ch>


i need to split/switch high and low bytes from a decimal,
my head is spinning and i'm really not sure how to
'properly' do it, because every search to this topic
yields another 10 different methods. so maybe once and
for all someone could help:


foreach $decimal (0..65025){
   $lowbyte = ???;
   $highbyte = $decimal >> 8;	#fails for $decimal>65000?

   #ex. to specify order:
   $int16_lowhigh = $lowbyte . $highbyte;
   $int16_highlow = $highbyte . $lowbyte;

}

 ... i bet there's a more efficient way with pack or sprintf,
but i'm not sure how this all behaves for the different
ranges (0-255, 255-65025, 65025-...)

the reason for all this is a bytestream specification
that i need to meet that switches byteorders for different
values...

thanks!!!



------------------------------

Date: Tue, 20 May 2008 18:40:11 +0200
From: Susanne West <swest@gmx.de>
Subject: Re: high and low bytes of a decimal
Message-Id: <4832FEEB.5080305@gmx.de>



addition:

> foreach $decimal (0..65025){
 >   $lowbyte = ???;

     $lowbyte = $decimal % 256;    #seems pretty fast

>   $highbyte = $decimal >> 8;    #fails for $decimal>65000?
> 
>   #ex. to specify order:
>   $int16_lowhigh = $lowbyte . $highbyte;
>   $int16_highlow = $highbyte . $lowbyte;
> 
> }



------------------------------

Date: Tue, 20 May 2008 10:00:02 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: high and low bytes of a decimal
Message-Id: <f48e5098-d9b4-4a03-995d-a51a9741a935@a9g2000prl.googlegroups.com>

On May 20, 10:28 am, Susanne West <sw...@gmx.de> wrote:
>
> i need to split/switch high and low bytes from a decimal,
> my head is spinning and i'm really not sure how to
> 'properly' do it, because every search to this topic
> yields another 10 different methods. so maybe once and
> for all someone could help:
>
> ...
>
> the reason for all this is a bytestream specification
> that i need to meet that switches byteorders for different
> values...


Dear Susanne,

   If you're trying to fit integers into a bytestream I'm guessing
that you're trying to work through the issue of little-endian vs. big-
endian byte ordering and sending the integer to the stream one byte at
a time.

   If that's the case, you need to encode your integers into a string
of bytes, and then send that string into the bytestream.  But you have
to decide first if you want to encode that integer in little- or big-
endian order.

   Assuming you have an integer to send, such as:

      my $integerToSend = 2008;

you can encode it in big-endian order like this:

      my $stringToSend = pack('N', $integerToSend);

or you can encode it in little-endian order like this:

      my $stringToSend = pack('V', $integerToSend);

   Chances are you're going to want to use big-endian order, as that's
what's favored by many networks.  But in the end it really depends on
the specifications for the bytestream you're writing to.  If the
bytestream expects the high-byte to be first, use big-endian
encoding.  But if it expects the low-byte first, use little-endian.

(Note:  These pack() calls assume you want to encode the numbers as 32-
bit (4-byte) integers.  If you only want 16-bit (2-byte) integers,
change 'N' and 'V' to 'n' and 'v', respectively.  You can also read
"perldoc -f pack" for more details.)

   Once you have the string to send, you can just print() it to your
file or socket handle, like this:

      print FILE_OR_SOCKET_HANDLE $stringToSend;

   Also, be aware that you probably want your file or socket handle to
be writing out in binary mode, so be sure to call binmode() on it,
like this:

      binmode(FILE_OR_SOCKET_HANDLE);

before printing to it.  Otherwise integers with bytes holding a value
of 10 (such as 10, 266, and 522) might get extra bytes sent along with
them.
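
As a rough illustration of the binmode() point, the sketch below writes a few 16-bit big-endian values to a temporary file and reads them back. The helper name write_and_read_uint16 and the use of File::Temp are assumptions made for the example, not part of the original advice.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write @values as 16-bit big-endian integers, then read the file back.
# Returns the raw byte string followed by the decoded values.
# (write_and_read_uint16 is a hypothetical helper for this example.)
sub write_and_read_uint16 {
    my (@values) = @_;

    my ($out, $file) = tempfile(UNLINK => 1);
    binmode($out);                        # no newline translation on output
    print {$out} pack('n*', @values);
    close($out) or die "close failed: $!";

    open(my $in, '<', $file) or die "Cannot read $file: $!";
    binmode($in);                         # and none on input
    my $data = do { local $/; <$in> };    # slurp the whole file
    close($in);

    return ($data, unpack('n*', $data));
}

my ($data, @decoded) = write_and_read_uint16(10, 266, 522, 2008);
print length($data), " bytes: @decoded\n";   # 8 bytes: 10 266 522 2008
```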

   I hope this helps, Susanne.

   -- Jean-Luc


------------------------------

Date: Tue, 20 May 2008 10:19:03 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: high and low bytes of a decimal
Message-Id: <4bc233ee-07a9-4c7b-bd3c-052298ab9696@l28g2000prd.googlegroups.com>

On May 20, 11:00 am, "jl_p...@hotmail.com" <jl_p...@hotmail.com>
wrote:
>
>    Assuming you have an integer to send, such as:
>
>       my $integerToSend = 2008;
>
> you can encode it in big-endian order like this:
>
>       my $stringToSend = pack('N', $integerToSend);
>
> or you can encode it in little-endian order like this:
>
>       my $stringToSend = pack('V', $integerToSend);


   In case you want to see (or extract) the values of the bytes in the
$stringToSend, you can do so easily with this:

      print ord, "\n"  foreach split //, $stringToSend;

This will print out the value of each byte.

   For example, if you were to write:

      my $integerToSend = 2008;
      my $stringToSend = pack('N', $integerToSend);
      print ord, "\n"  foreach split //, $stringToSend;

you would see the following output:

0
0
7
216

(You get 2008 from that by adding 0*256^3 + 0*256^2 + 7*256 + 216.)

   But if you were to write it this way, with little-endian ordering:

      my $integerToSend = 2008;
      my $stringToSend = pack('V', $integerToSend);
      print ord, "\n"  foreach split //, $stringToSend;

you would see the following output:

216
7
0
0

Because you encoded the integer in little-endian order, the bytes are
reversed (with the low-byte shown first).
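
The same byte values can also be obtained with unpack('C*', ...), and the integer is recovered by unpacking with the template that was used for packing. A small sketch (byte_values is a hypothetical helper name):

```perl
use strict;
use warnings;

# Numeric value of every byte in a packed string.
sub byte_values {
    my ($packed) = @_;
    return unpack('C*', $packed);
}

my $big    = pack('N', 2008);
my $little = pack('V', 2008);

print join(' ', byte_values($big)),    "\n";   # 0 0 7 216
print join(' ', byte_values($little)), "\n";   # 216 7 0 0

# Decoding must use the same template that was used for encoding.
print unpack('N', $big),    "\n";   # 2008
print unpack('V', $little), "\n";   # 2008
```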

   I hope this helps, Susanne.

   -- Jean-Luc


------------------------------

Date: Tue, 20 May 2008 19:29:56 +0200
From: Susanne West <swest@gmx.de>
Subject: Re: high and low bytes of a decimal
Message-Id: <48330A94.1010602@gmx.de>



hi jean-luc


thanks very much for clarifying! yes, you are more
or less right. but i do need all the variants:
  - encoded little endian (goes into bytestream)
  - encoded big endian (goes into bytestream)
  - only low byte (for other reasons)
  - only high byte (for other reasons)


problem 1: bytestream
the point with pack() is that i have a hard time to
understand size and system specifics: i need to pack
an int16 so i would indeed do
   $integerToSend = 2008;
   my $stringToSend = pack('n', $integerToSend);
but what happens if for unexpected reasons:
   $integerToSend = 200000;
   my $stringToSend = pack('n', $integerToSend);
is it correct, that this is truncated to 65025?


problem 2: only high and low bytes (similar story)
what is the best (fastest, safest) way to extract
the lowest two bytes of a decimal of unknown length?
i'm currently using
  my $decimal = 2008;
  my $lowbyte = $decimal % 256;
  my $highbyte = $decimal >> 8;
but i doubt that this is the 'proper' way to do it.
especially when (again) for unexpected reasons:
  my $decimal = 200000;
  my $lowbyte = $decimal % 256;
  my $highbyte = $decimal >> 8;
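
One common way to handle values of unknown size is to mask each byte with & 0xFF, so that anything above the low 16 bits is dropped. This is only a sketch: low_high_bytes is a hypothetical helper, and silently discarding the upper bits is an assumption that may not suit every bytestream specification.

```perl
use strict;
use warnings;

# Split a non-negative integer into the low and high byte of its low
# 16 bits. Bits above the 16th are discarded by the masks.
sub low_high_bytes {
    my ($n) = @_;
    my $low  = $n & 0xFF;           # bits 0..7
    my $high = ($n >> 8) & 0xFF;    # bits 8..15
    return ($low, $high);
}

for my $n (2008, 65535, 200000) {
    my ($low, $high) = low_high_bytes($n);
    printf "%6d -> low=%3d high=%3d\n", $n, $low, $high;
}
```

Without the second mask, $decimal >> 8 for 200000 gives 781, which no longer fits in one byte; with it the result is 13.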


thanks for your comments. you've almost put me back
on track...

susanne.




jl_post@hotmail.com wrote:
>    Assuming you have an integer to send, such as:
> 
>       my $integerToSend = 2008;
> 
> you can encode it in big-endian order like this:
> 
>       my $stringToSend = pack('N', $integerToSend);
> 
> or you can encode it in little-endian order like this:
> 
>       my $stringToSend = pack('V', $integerToSend);
> 
>    Chances are you're going to want to use big-endian order, as that's
> what's favored by many networks.  But in the end it really depends on
> the specifications for the bytestream you're writing to.  If the
> bytestream expects the high-byte to be first, use big-endian
> encoding.  But if it expects the low-byte first, use little-endian.
> 
> (Note:  These pack() calls assume you want to encode the numbers as 32-
> bit (4-byte) integers.  If you only want 16-bit (2-byte) integers,
> change 'N' and 'V' to 'n' and 'v', respectively.  You can also read
> "perldoc -f pack" for more details.)
> 
>    Once you have the string to send, you can just print() it to your
> file or socket handle, like this:
> 
>       print FILE_OR_SOCKET_HANDLE $stringToSend;
> 
>    Also, be aware that you probably want your file or socket handle to
> be writing out in binary mode, so be sure to call binmode() on it,
> like this:
> 
>       binmode(FILE_OR_SOCKET_HANDLE);
> 
> before printing to it.  Otherwise integers with bytes holding a value
> of 10 (such as 10, 266, and 522) might get extra bytes sent along with
> them.
> 
>    I hope this helps, Susanne.
> 
>    -- Jean-Luc


------------------------------

Date: Tue, 20 May 2008 19:33:38 +0200
From: Susanne West <swest@gmx.de>
Subject: Re: high and low bytes of a decimal
Message-Id: <fa51$48330b74$544b8434$24096@news.hispeed.ch>




> This will print out the value of each byte.
> 
>    For example, if you were to write:
> 
>       my $integerToSend = 2008;
>       my $stringToSend = pack('N', $integerToSend);
>       print ord, "\n"  foreach split //, $stringToSend;
> 
> you would see the following output:
> 
> 0
> 0
> 7
> 216
> 
> (You get 2008 from that by adding 0*256^3 + 0*256^2 + 7*256 + 216.)
> 
>    But if you were to write it this way, with little-endian ordering:
> 
>       my $integerToSend = 2008;
>       my $stringToSend = pack('V', $integerToSend);
>       print ord, "\n"  foreach split //, $stringToSend;
> 
> you would see the following output:
> 
> 216
> 7
> 0
> 0
> 

wow... that's a very helpful example! thanks!





------------------------------

Date: Tue, 20 May 2008 18:42:00 +0200
From: Mirco Wahab <wahab@chemie.uni-halle.de>
Subject: Re: high and low bytes of a decimal
Message-Id: <g0v22b$ibc$1@nserver.hrz.tu-freiberg.de>

Susanne West wrote:
> .... i bet there's a more efficient way with pack or sprintf,
> but i'm not sure how this all behaves for the different
> ranges (0-255, 255-65025, 65025-...)

I don't know if it's efficient for
your purpose, but you could play
w/pack and unpack and keep what's
necessary:

One (somehow extended) example:

  ...
  foreach my $decimal (0..65025){
     my ($lowbyte, $highbyte) = unpack "B8 B8", pack "J", $decimal;
     print "$lowbyte, $highbyte => ";

     my $int16_lowhigh = unpack "V", pack "B32", $lowbyte.$highbyte;
     my $int16_highlow = unpack "V", pack "B32", $highbyte.$lowbyte;
     print "$int16_highlow, $int16_lowhigh\n"
  }
  ...

(http://perldoc.perl.org/functions/pack.html)

Regards

M.


------------------------------

Date: Tue, 20 May 2008 19:57:52 +0200
From: Susanne West <swest@gmx.de>
Subject: Re: high and low bytes of a decimal
Message-Id: <f39f0$4833111d$544b8434$27498@news.hispeed.ch>




for the record and those reading along:

thanks to jean-luc's example it's easy to reproduce the
problem with oversizes:

   my $integerToSend = 200000;
   my $stringToSend = '';

   print "\n2 bytes big endian:\n";
   $stringToSend = pack('n', $integerToSend);
   print ord, "\n"  foreach split //, $stringToSend;

   print "\n2 bytes little endian:\n";
   $stringToSend = pack('v', $integerToSend);
   print ord, "\n"  foreach split //, $stringToSend;

will yield to output:

----
2 bytes big endian:
13
64

2 bytes little endian:
64
13
----

so:  13*256 + 64 would be 3392...
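
In other words, only the low 16 bits survive: 3392 is 200000 modulo 65536, and the largest value a 16-bit field can hold is 65535. If an oversized value should be treated as an error rather than wrapped, a range check before packing is one option. The helper pack_uint16 and its 'big'/'little' argument are hypothetical, chosen only for this sketch.

```perl
use strict;
use warnings;

# Pack $n as an unsigned 16-bit integer, refusing values that do not fit.
# $order is 'big' (template 'n') or 'little' (template 'v').
# pack_uint16 is a hypothetical helper, not a built-in.
sub pack_uint16 {
    my ($n, $order) = @_;
    die "value $n does not fit in 16 bits\n" if $n < 0 || $n > 0xFFFF;
    return pack($order eq 'little' ? 'v' : 'n', $n);
}

print 200000 % 65536, "\n";                                      # 3392
print join(' ', unpack('C*', pack_uint16(2008, 'big'))),    "\n"; # 7 216
print join(' ', unpack('C*', pack_uint16(2008, 'little'))), "\n"; # 216 7

# An oversized value is rejected instead of being wrapped silently.
print "rejected\n" unless eval { pack_uint16(200000, 'big'); 1 };
```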




------------------------------

Date: Tue, 20 May 2008 13:50:56 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: How to determine if a word has an extended character?
Message-Id: <nrk5345ri927q4ak1b8j0rltufddm5bi3m@4ax.com>

ambarish.mitra@gmail.com wrote:
>I have a file which contains just one word. My task is just to find
>out if the word has any extended character. Thats all.
>
>I can use regex, but am not able to find out a regex pattern for
>extended character. Any hints?

[Interpreting 'extended' as non-ASCII]

You could simply use the POSIX character class [:ascii:] (written as
[[:ascii:]] inside a regex)

Another way would be to check for each character, if its ord() is less
than 128. That should work at least for the most common encodings like
ISO-Latin-1, Windows-1252, ...

Or: [untested]
if (/^[A-Za-z]*$/) {
	print 'false';
} else {
	print 'true';
}

You could probably also set your locale to EN-US and use
if (/\W/) {
	print 'true';
} else {
	print 'false';
}

All of these do somewhat different things, so you have some options to
choose the one that most closely matches your needs.
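
Still interpreting 'extended' as non-ASCII, the ord()-below-128 idea can be written as a single character-class test. This is a sketch with a hypothetical helper name (has_extended); it looks at code points 0 to 127 only and makes no assumption about which encoding the file uses.

```perl
use strict;
use warnings;

# True if the word contains any character outside the 7-bit ASCII range.
# (has_extended is a hypothetical helper for this example.)
sub has_extended {
    my ($word) = @_;
    return $word =~ /[^\x00-\x7F]/ ? 1 : 0;
}

# "\xE9" is e-acute in ISO-Latin-1 / Windows-1252.
for my $word ("sample", "sampl\xE9") {
    print has_extended($word) ? "true\n" : "false\n";
}
```

This prints false for the first word and true for the second.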

jue


------------------------------

Date: Tue, 20 May 2008 15:55:11 +0200
From: Hartmut Camphausen <JustMe@somewhere.de>
Subject: Re: How to determine if a word has an extended character?
Message-Id: <MPG.229cf6a7814b8654989684@news.t-online.de>

In <<405f2950-fa4a-4a3e-b5a7-c030a4604b2f@k1g2000prb.googlegroups.com>>
wrote ...
> I have a file which contains just one word. My task is just to find
> out if the word has any extended character. Thats all.
>
> I can use regex, but am not able to find out a regex pattern for
> extended character. Any hints?
>
>
> For example, if the file content is: sample, then the Perl code prints
> false; and if the file content is samplé, then the Perl code prints
> true.


  $string =~ m/[^\w]/ ? print "\nhas extended." : print "\nOK.";

should do the trick.

This prints "has extended" if $string contains any characters other
([^...]) than 'a' to 'z', 'A' to 'Z', '0' to '9' plus '_' (the \w
character class).

If you want to exclude the '_' (contained in \w), use   [^a-zA-Z0-9]
If you want to include more "valid" characters, expand the [^...]
accordingly (note: if you want to include '-' as valid character, put it
at the very end of the characters list).

See
   perldoc perlre
   perldoc perlrequick
   perldoc perlreref
   perldoc perlretut



hth, Hartmut

-- 
  ------------------------------------------------
Hartmut Camphausen      h.camp[bei]textix[punkt]de


------------------------------

Date: Tue, 20 May 2008 09:52:31 -0700 (PDT)
From: chadda@lonemerchant.com
Subject: How would I set up a hash for the following
Message-Id: <08d8a66d-9acb-4a0e-97bb-a7b30ae54b1f@u12g2000prd.googlegroups.com>

I hardcode the categories into the script because I have no idea how
to make the script traverse   a site that has urls going to other urls
that in turn is going to other urls. Added on top of that, I want the
script to only follow the urls that have certain words in them.

Anyhow, when I do something like the following...


#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser;
use LWP::Simple;
use LWP::UserAgent;
use HTML::LinkExtor;

my @urls;

#for privoxy
my $browser = LWP::UserAgent->new;
$browser->proxy( ['http', 'https' ], "http://localhost:8118");

#my categories
my $acer_laptops = 'http://www.doba.com/catalog/search/search.php?filters[submit]=advanced&filters[inc_noimage]=0&filters[inc_outofstock]=0&filters[inc_discontinued]=0&filters[inc_refurbished]=1&filters[inc_pro_only]=0&filters[min_qty]=0&filters[category]=112666';

my $html = get($acer_laptops);

my $get_links = new HTML::LinkExtor;
$get_links->parse($html);

my @links = $get_links->links;
foreach (@links) {
    # $_ contains [type, [name, value], ...]
    shift @$_;
    while (my ($name, $value) = splice(@$_, 0, 2)) {
        if ($value =~ /\/catalog\/search\/search_hit/) {
            push(@urls, $value);
            push(@urls, "\n");
            #print "  $name -> $value\n";
        }
    }
}

I get the following
 ./buildfile.pl
/catalog/search/search_hit.php?product_id=2969975&location=/catalog/2969975.html
/catalog/search/search_hit.php?product_id=2969975&location=/catalog/2969975.html
/catalog/search/search_hit.php?product_id=2988526&location=/catalog/2988526.html
/catalog/search/search_hit.php?product_id=2988526&location=/catalog/2988526.html
/catalog/search/search_hit.php?product_id=2994617&location=/catalog/2994617.html
/catalog/search/search_hit.php?product_id=2994617&location=/catalog/2994617.html
/catalog/search/search_hit.php?product_id=3041783&location=/catalog/3041783.html
/catalog/search/search_hit.php?product_id=3041783&location=/catalog/3041783.html
/catalog/search/search_hit.php?product_id=3117275&location=/catalog/3117275.html
/catalog/search/search_hit.php?product_id=3117275&location=/catalog/3117275.html
/catalog/search/search_hit.php?product_id=3132778&location=/catalog/3132778.html
/catalog/search/search_hit.php?product_id=3132778&location=/catalog/3132778.html
/catalog/search/search_hit.php?product_id=3137118&location=/catalog/3137118.html
/catalog/search/search_hit.php?product_id=3137118&location=/catalog/3137118.html
/catalog/search/search_hit.php?product_id=3137121&location=/catalog/3137121.html
/catalog/search/search_hit.php?product_id=3137121&location=/catalog/3137121.html
/catalog/search/search_hit.php?product_id=3137123&location=/catalog/3137123.html
/catalog/search/search_hit.php?product_id=3137123&location=/catalog/3137123.html
/catalog/search/search_hit.php?product_id=3137124&location=/catalog/3137124.html
/catalog/search/search_hit.php?product_id=3137124&location=/catalog/3137124.html
/catalog/search/search_hit.php?product_id=3610730&location=/catalog/3610730.html
/catalog/search/search_hit.php?product_id=3610730&location=/catalog/3610730.html
/catalog/search/search_hit.php?product_id=3610734&location=/catalog/3610734.html
/catalog/search/search_hit.php?product_id=3610734&location=/catalog/3610734.html

50% of the URLS are duplicates. I know there is a perl faq for
removing duplicate hash entries. The question is, how would I set up a
hash when the only values are the urls? Also, input on how to improve
my code are more than welcome.


------------------------------

Date: Tue, 20 May 2008 16:59:09 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: How would I set up a hash for the following
Message-Id: <bl0634davk8vmgo9rapmk51a5tvqmmssv0@4ax.com>

chadda@lonemerchant.com wrote:

>50% of the URLS are duplicates. I know there is a perl faq for
>removing duplicate hash entries. The question is, how would I set up a
>hash when the only values are the urls? 

Create a hash, for each URL add an entry in that hash where the key is
the URL and the value is 1 (or even leave the value undefined, you will
never use it anyway).
To retrieve all URLs just do a keys() on the hash.
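
A minimal sketch of that approach, assuming the URLs are already collected in an array. The sample values are shortened stand-ins, and unique_urls is a hypothetical helper that additionally keeps the first-seen order, which a plain keys() call does not.

```perl
use strict;
use warnings;

# Return the distinct entries of a list, keeping first-seen order.
sub unique_urls {
    my %seen;                             # key: URL, value: times seen so far
    return grep { !$seen{$_}++ } @_;
}

my @urls = (
    '/catalog/search/search_hit.php?product_id=2969975',
    '/catalog/search/search_hit.php?product_id=2969975',
    '/catalog/search/search_hit.php?product_id=2988526',
);

# Plain hash-as-set: keys() returns each URL once, in no particular order.
my %url_set;
$url_set{$_} = 1 for @urls;
print scalar(keys %url_set), " distinct URLs\n";   # 2 distinct URLs

print "$_\n" for unique_urls(@urls);
```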

jue


------------------------------

Date: Tue, 20 May 2008 10:00:59 -0700 (PDT)
From: chadda@lonemerchant.com
Subject: Re: How would I set up a hash for the following
Message-Id: <3f86d1eb-aece-47aa-b8a6-9b85004ef9da@z24g2000prf.googlegroups.com>

On May 20, 9:59 am, Jürgen Exner <jurge...@hotmail.com> wrote:
> cha...@lonemerchant.com wrote:
> >50% of the URLS are duplicates. I know there is a perl faq for
> >removing duplicate hash entries. The question is, how would I set up a
> >hash when the only values are the urls?
>
> Create a hash, for each URL add an entry in that hash where the key is
> the URL and the value is 1 (or even leave the value undefined, you will
> never use it anyway).
> To retrieve all URLs just do a keys() on the hash.
>
> jue


Got it. Thanks.


------------------------------

Date: Tue, 20 May 2008 10:07:14 -0700 (PDT)
From: Gibbering <roblund@gmail.com>
Subject: Re: How would I set up a hash for the following
Message-Id: <76b7f0a9-ae6d-4858-bf35-f5f728f73a9b@b5g2000pri.googlegroups.com>

On May 20, 9:52 am, cha...@lonemerchant.com wrote:
> I hardcode the categories into the script because I have no idea how
> to make the script traverse   a site that has urls going to other urls
> that in turn is going to other urls. Added on top of that, I want the
> script to only follow the urls that have certain words in them.
>
> Anyhow, when I do something like the following...
>
> #!/usr/bin/perl

*SNIP*

What you probably want to do is hash the product ids ... something
like:

use Data::Dumper;
my %ids;
for ($get_links->links) {
    my $url = pop @$_;
    my ($id) = $url =~ /product_id=(\d+)/;
    ++$ids{$id} if $id;
}

print Dumper \%ids;




------------------------------

Date: Tue, 20 May 2008 10:42:19 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: I need ideas on how to sort 350 million lines of data
Message-Id: <867idpq8x0.fsf@lifelogs.com>

On Tue, 20 May 2008 11:42:55 +0100 bugbear <bugbear@trim_papermule.co.uk_trim> wrote: 

b> Ted Zlatanov wrote:
>> One simple way, without using databases, is to take smaller pieces (say,
>> 10K lines each) and sort them individually by whatever field you need.
>> Then you take the top or bottom of each piece, make a new set, and sort
>> that set for the final result.  
>> 
>> If you need to sort the whole list and not just get the max/min, apply
>> the same algorithm except you keep each sorted piece open and keep
>> taking the smallest/largest element from the top/bottom of the piece
>> that contains it.
>> 
>> For more information and if my explanation doesn't make sense, look up
>> the "merge sort" algorithm.

b> IIRC Linux/Unix sort used quicksort for in RAM
b> and merge sort (via disc) if the data size exceeds RAM size,
b> again using quicksort in RAM when the portion to be
b> merged fit in RAM.

Yes, but a) it writes them in /tmp (unless you use -T in newer sort
implementations), b) it's not as flexible as what I described, and c) it
only works on Unix-like systems (on Windows you have to install cygwin
or other packages, etc.).

(b) is particularly important IMO for anything but simple sorting.
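
A rough sketch of the chunk-and-merge idea described above, under several assumptions: plain string comparison of whole lines stands in for "whatever field you need", every line ends in a newline, and external_sort is a hypothetical name. The linear scan over the chunk heads keeps the example short; with many thousands of chunks a heap, or merging in several rounds to stay under the open-file limit, would probably be needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sort the lines read from $in_fh and write them to $out_fh, holding at
# most $chunk_size lines in memory at a time.
sub external_sort {
    my ($in_fh, $out_fh, $chunk_size) = @_;

    # Pass 1: sort fixed-size pieces and park each one in a temp file.
    my @chunks;
    until (eof $in_fh) {
        my @lines;
        while (scalar(@lines) < $chunk_size and defined(my $line = <$in_fh>)) {
            push @lines, $line;
        }
        my $tmp = tempfile();                 # anonymous read/write temp file
        print {$tmp} sort @lines;
        seek($tmp, 0, 0) or die "seek failed: $!";   # rewind for reading
        push @chunks, $tmp;
    }

    # Pass 2: keep one "head" line per piece and repeatedly emit the
    # smallest head, refilling it from the piece it came from.
    my @heads = map { scalar readline($_) } @chunks;
    while (1) {
        my $min;
        for my $i (0 .. $#heads) {
            next unless defined $heads[$i];   # this piece is exhausted
            $min = $i if !defined $min or $heads[$i] lt $heads[$min];
        }
        last unless defined $min;             # all pieces exhausted
        print {$out_fh} $heads[$min];
        $heads[$min] = readline($chunks[$min]);
    }
    return;
}
```

To sort on a particular field, the two places that compare lines (the sort call and the lt test) would both need the same comparison function.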

Ted


------------------------------

Date: Tue, 20 May 2008 07:54:57 -0700 (PDT)
From: "amerar@iwc.net" <amerar@iwc.net>
Subject: Installing Perl Module
Message-Id: <5cc9d053-5ee6-47c9-91cd-8dde67e31969@b1g2000hsg.googlegroups.com>


Hi,

I'm not very good with figuring this stuff out, so I'm hoping someone
can offer some advice.

We need to install some ability to create line charts from Perl.  So,
I'm trying to install DBIx::Chart.  One of the requirements is to
install GD 1.19.

I tried installing that, but when I get to the make command I get
these errors:

/usr/bin/perl GD/Image.pm.PLS GD/Image.pm
Extracting Image.pm (with variable substitutions)
cp GD/Polyline.pm blib/lib/GD/Polyline.pm
cp qd.pl blib/lib/qd.pl
cp GD/Image.pm blib/lib/GD/Image.pm
cp GD.pm blib/lib/GD.pm
AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD)
cp GD/Simple.pm blib/lib/GD/Simple.pm
cp GD/Polygon.pm blib/lib/GD/Polygon.pm
/usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap /usr/lib/
perl5/5.8.5/ExtUtils/typemap -typemap typemap  GD.xs > GD.xsc && mv
GD.xsc GD.c
gcc -c  -I/usr/local/include -D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -
fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -
D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -
march=i386 -mtune=pentium4   -DVERSION=\"2.39\" -DXS_VERSION=\"2.39\" -
fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -
DHAVE_JPEG -DHAVE_FT -DHAVE_XPM -DHAVE_GIF -DHAVE_PNG -DHAVE_ANIMGIF -
DVERSION_33 -DHAVE_UNCLOSEDPOLY -DHAVE_FONTCONFIG -DHAVE_FTCIRCLE GD.c
GD.xs: In function `XS_GD__Image_stringFT':
GD.xs:2211: error: `gdFTEX_DISABLE_KERNING' undeclared (first use in
this function)
GD.xs:2211: error: (Each undeclared identifier is reported only once
GD.xs:2211: error: for each function it appears in.)

I have no clue what all that means or how to solve it.  Can anyone
help?  It would really help me out a lot.

Thanks!


------------------------------

Date: 20 May 2008 15:45:32 GMT
From: jt@toerring.de (Jens Thoms Toerring)
Subject: Re: Installing Perl Module
Message-Id: <69ga0sF32qj1rU1@mid.uni-berlin.de>

amerar@iwc.net <amerar@iwc.net> wrote:
> I'm not very good with figuring this stuff out, so I'm hoping someone
> can offer some advice.

> We need to install some ability to create line charts from Perl.  So,
> I'm trying to install DBIx::Chart.  One of the requirements is to
> install GD 1.19.

> I tried installing that, but when I get to the make command I get
> these errors:

> /usr/bin/perl GD/Image.pm.PLS GD/Image.pm
> Extracting Image.pm (with variable substitutions)
> cp GD/Polyline.pm blib/lib/GD/Polyline.pm
> cp qd.pl blib/lib/qd.pl
> cp GD/Image.pm blib/lib/GD/Image.pm
> cp GD.pm blib/lib/GD.pm
> AutoSplitting blib/lib/GD.pm (blib/lib/auto/GD)
> cp GD/Simple.pm blib/lib/GD/Simple.pm
> cp GD/Polygon.pm blib/lib/GD/Polygon.pm
> /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap /usr/lib/
> perl5/5.8.5/ExtUtils/typemap -typemap typemap  GD.xs > GD.xsc && mv
> GD.xsc GD.c
> gcc -c  -I/usr/local/include -D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -
> fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -
> D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -
> march=i386 -mtune=pentium4   -DVERSION=\"2.39\" -DXS_VERSION=\"2.39\" -
> fPIC "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -
> DHAVE_JPEG -DHAVE_FT -DHAVE_XPM -DHAVE_GIF -DHAVE_PNG -DHAVE_ANIMGIF -
> DVERSION_33 -DHAVE_UNCLOSEDPOLY -DHAVE_FONTCONFIG -DHAVE_FTCIRCLE GD.c
> GD.xs: In function `XS_GD__Image_stringFT':
> GD.xs:2211: error: `gdFTEX_DISABLE_KERNING' undeclared (first use in
> this function)
> GD.xs:2211: error: (Each undeclared identifier is reported only once
> GD.xs:2211: error: for each function it appears in.)

The C compiler is complaining that an identifier (here
`gdFTEX_DISABLE_KERNING') is not declared in any header it found. The
README for the GD module says:

  If this module fails to compile and link, you are probably using an
  older version of libgd.  Symptoms of this problem include errors
  about functions not being recognized in the gd.h header file, and
  undefined symbols from the linker.  If you are having this type of
  error, please REMOVE all versions of libgd, gd.h from your system
  and reinstall libgd 2.0.28 or higher.  Do not contact Lincoln for
  help until you have done this.

Your symptoms look exactly like described here. So as a first step
I would do what's recommended, i.e. install the newest version of
libgd available from http://www.boutell.com/gd/.

                              Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1560
***************************************

