[31995] in Perl-Users-Digest


Perl-Users Digest, Issue: 3259 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jan 17 14:09:25 2011

Date: Mon, 17 Jan 2011 11:09:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 17 Jan 2011     Volume: 11 Number: 3259

Today's topics:
        Compare two extremely large lists? <j.joeyoung@gmail.com>
    Re: Compare two extremely large lists? <smallpond@juno.com>
    Re: Compare two extremely large lists? <peter@makholm.net>
    Re: Compare two extremely large lists? <glex_no-spam@qwest-spam-no.invalid>
    Re: Compare two extremely large lists? <skye.shaw@gmail.com>
    Re: Compare two extremely large lists? <anfi@onet.eu>
        LWP::Simple - relative or absolute path? <jwcarlton@gmail.com>
    Re: LWP::Simple - relative or absolute path? <RedGrittyBrick@spamweary.invalid>
    Re: LWP::Simple - relative or absolute path? <jwcarlton@gmail.com>
    Re: LWP::Simple - relative or absolute path? <hjp-usenet2@hjp.at>
    Re: LWP::Simple - relative or absolute path? <tadmc@seesig.invalid>
    Re: LWP::Simple - relative or absolute path? <sherm.pendley@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 17 Jan 2011 05:37:16 -0800 (PST)
From: Joe Young <j.joeyoung@gmail.com>
Subject: Compare two extremely large lists?
Message-Id: <56f43271-aec7-47df-a518-cdc90d632140@15g2000vbz.googlegroups.com>

I have a list of several thousands of numerical ids


and in another file I have a database dump of hundreds of thousands of
records


I need to parse the first list, and with each id select the
corresponding record from the database dump.



file1
20121
2193403
334
4343
43434
3535340
948548
34543
And so on.......



file 2
72371  more.jpg green No Friday
034     Leicester.png Yes
8213   sport.jpeg No Saturday Two Pass
2313   feline.jpg Yes Wednesday












------------------------------

Date: Mon, 17 Jan 2011 05:50:10 -0800 (PST)
From: smallpond <smallpond@juno.com>
Subject: Re: Compare two extremely large lists?
Message-Id: <af5104c5-aee0-49dd-8a0f-b9f2c8b51238@w29g2000vba.googlegroups.com>

On Jan 17, 8:37 am, Joe Young <j.joeyo...@gmail.com> wrote:
> I have a list of several thousands of numerical ids
>
> and in another file I have a database dump of hundreds of thousands of
> records
>
> I need to parse the first list, and with each id select the
> corresponding record from the database dump.
>
> file1
> 20121
> 2193403
> 334
> 4343
> 43434
> 3535340
> 948548
> 34543
> And so on.......
>
> file 2
> 72371  more.jpg green No Friday
> 034     Leicester.png Yes
> 8213   sport.jpeg No Saturday Two Pass
> 2313   feline.jpg Yes Wednesday


Why would you not use the database for this?  That's what they're for.

You can put it all in a hash in memory using id as a key.  Several
hundred thousand short lines of text is only a few MB.


------------------------------

Date: Mon, 17 Jan 2011 14:55:22 +0100
From: Peter Makholm <peter@makholm.net>
Subject: Re: Compare two extremely large lists?
Message-Id: <87mxmz3a9h.fsf@vps1.hacking.dk>

Joe Young <j.joeyoung@gmail.com> writes:

> I have a list of several thousands of numerical ids

Several thousand ids is, in my opinion, not necessarily an extremely
large list. I would just do the naïve thing: parse file2 into a hash,
then read file1 line by line and output the relevant data from the
hash.

Based on your examples, that should be doable using a meager 1MB of
memory for storing the data.
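
A minimal sketch of the approach described above, under a few assumptions:
the id is the first whitespace-delimited field of each dump line, ids must
match as exact strings (so "034" and "34" are different keys), and the sub
name select_records is made up for this illustration. It indexes the dump
in a hash and then looks up each id from the list.

```perl
use strict;
use warnings;

# Index the dump by its leading id, then look up each wanted id.
# Returns the matching dump lines in the order the ids were listed.
sub select_records {
    my ($ids_fh, $dump_fh) = @_;

    my %record_for;
    while (my $line = <$dump_fh>) {
        chomp $line;
        # Assumption: the id is the first run of digits on the line.
        my ($id) = $line =~ /^\s*(\d+)\b/ or next;
        $record_for{$id} = $line;
    }

    my @found;
    while (my $id = <$ids_fh>) {
        chomp $id;
        $id =~ s/^\s+|\s+$//g;    # trim stray whitespace around the id
        push @found, $record_for{$id} if exists $record_for{$id};
    }
    return @found;
}

# Usage: perl select.pl file1 file2
if (@ARGV == 2) {
    open my $ids_fh,  '<', $ARGV[0] or die "$ARGV[0]: $!";
    open my $dump_fh, '<', $ARGV[1] or die "$ARGV[1]: $!";
    print "$_\n" for select_records($ids_fh, $dump_fh);
}
```

Ids that have no record in the dump are skipped silently; whether that
should instead be reported is a design choice the original question does
not settle.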

//Makholm


------------------------------

Date: Mon, 17 Jan 2011 12:16:41 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: Compare two extremely large lists?
Message-Id: <4d3476f5$0$46840$815e3792@news.qwest.net>

Joe Young wrote:
> I have a list of several thousands of numerical ids
> 
> 
> and in another file I have a database dump of hundreds of thousands of
> records
> 
> 
> I need to parse the first list, and with each id select the
> corresponding record from the database dump.

open file1, for read.
while reading file1, line by line
parse the line for ID
store the ID as a key in a hash
close file1.

open file2, for read.
while reading through file2, line by line
parse the line for the id and the record information
print the record information if the id exists as a key in the file1 hash.
close file2

perldoc perlopentut



Or, insert all ids from file1 into a table and use
the database to select the record information for
all rows where the ids match.


------------------------------

Date: Mon, 17 Jan 2011 10:33:56 -0800 (PST)
From: "Skye Shaw!@#$" <skye.shaw@gmail.com>
Subject: Re: Compare two extremely large lists?
Message-Id: <6b1816fa-9ed6-4e3a-9aad-3b53efced072@c13g2000prc.googlegroups.com>

On Jan 17, 5:37 am, Joe Young <j.joeyo...@gmail.com> wrote:
> I have a list of several thousands of numerical ids
>
> and in another file I have a database dump of hundreds of thousands of
> records
>
> I need to parse the first list, and with each id select the
> corresponding record from the database dump.

If that's all you have to do, try using join:

Skyes-MacBook-Pro-15:~ sshaw$ sort -n file1 > sorted1  # lines should be sorted
Skyes-MacBook-Pro-15:~ sshaw$ sort -n file2 > sorted2
Skyes-MacBook-Pro-15:~ sshaw$ join sorted1 sorted2
334 Leicester.png Yes
4343 feline.jpg Yes Wednesday

-Skye


------------------------------

Date: Mon, 17 Jan 2011 19:38:15 +0100
From: Andrzej Adam Filip <anfi@onet.eu>
Subject: Re: Compare two extremely large lists?
Message-Id: <wan5cl9dks+B1H@jeffrey.huge.strangled.net>

Joe Young <j.joeyoung@gmail.com> wrote:
> I have a list of several thousands of numerical ids
>
>
> and in another file I have a database dump of hundreds of thousands of
> records
>
>
> I need to parse the first list, and with each id select the
> corresponding record from the database dump.
>
>
>
> file1
> 20121
> 2193403
> 334
> 4343
> 43434
> 3535340
> 948548
> 34543
> And so on.......
>
>
>
> file 2
> 72371  more.jpg green No Friday
> 034     Leicester.png Yes
> 8213   sport.jpeg No Saturday Two Pass
> 2313   feline.jpg Yes Wednesday

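use strict;
use warnings;

# Remember every id listed in file1.
my %Keys;
open( my $F1,'<','file1') or die "file1: $!";
while(<$F1>) {
  chomp; $Keys{$_}++;
}
# Print each file2 record whose leading id was seen in file1.
open( my $F2,'<','file2') or die "file2: $!";
while(<$F2>) {
  die "unparsable line $." unless /^(\d+)\s+(\S.*)$/;
  print if $Keys{$1};
}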

-- 
[pl>en Andrew] Andrzej Adam Filip : anfi@onet.eu : Andrzej.Filip@gmail.com
I have a hard time being attracted to anyone who can beat me up.
  -- John McGrath, Atlanta sportswriter, on women weightlifters.


------------------------------

Date: Sun, 16 Jan 2011 23:05:30 -0800 (PST)
From: jwcarlton <jwcarlton@gmail.com>
Subject: LWP::Simple - relative or absolute path?
Message-Id: <4658e4da-c0f2-44f1-ba9c-34ddb85b3171@30g2000yql.googlegroups.com>

I'm having a server issue at the moment that's keeping me from using a
relative path with LWP::Simple. I can use an absolute path with no
problem, though. It's not a big deal; I made several security updates,
and inadvertently changed something that's causing this error.

So, here's the question. Is there a speed/server load difference
between using a relative path or an absolute path? For this purpose,
the absolute path is on the same domain, IP, and server as the script
loading it.

I would normally test it myself, but since I can't load a relative
path right now, I can't test it. I'm hoping that one of you will
already know, before I spend too much time trying to correct a server
issue that may be irrelevant.


------------------------------

Date: Mon, 17 Jan 2011 10:00:57 +0000
From: RedGrittyBrick <RedGrittyBrick@spamweary.invalid>
Subject: Re: LWP::Simple - relative or absolute path?
Message-Id: <4d341372$0$2512$db0fefd9@news.zen.co.uk>

On 17/01/2011 07:05, jwcarlton wrote:
> I'm having a server issue at the moment that's keeping me from using a
> relative path with LWP::Simple.

A *relative path*?

Are you using LWP::Simple's get($url) subroutine?

If so, the URL has to fully specify the protocol, server and the full 
document path.


> I can use an absolute path with no
> problem, though. It's not a big deal; I made several security updates,
> and inadvertently changed something that's causing this error.

Which error did you have in mind? What is the actual error message?


> So, here's the question. Is there a speed/server load difference
> between using a relative path or an absolute path? For this purpose,
> the absolute path is on the same domain, IP, and server as the script
> loading it.

HTTP requests (such as those transmitted by LWP::Simple get($url)) use 
fully specified URLs. The returned HTML may contain links that are 
relative - in which case a browser following those links would use the 
page context to construct a full URL from the relative link.

In other words, I don't understand what you are saying. Maybe you could 
elaborate a little and provide some example code and actual output?

-- 
RGB


------------------------------

Date: Mon, 17 Jan 2011 02:33:28 -0800 (PST)
From: jwcarlton <jwcarlton@gmail.com>
Subject: Re: LWP::Simple - relative or absolute path?
Message-Id: <ccb64994-7fc7-494d-b3e0-2bbb9f1b6721@q35g2000vbb.googlegroups.com>

> A *relative path*?
>
> Are you using LWP::Simple's get($url) subroutine?
>
> If so, the URL has to fully specify the protocol, server and the full
> document path.

That's correct; I'm just curious which of these would be less of a
server load (if there's a difference at all):

# relative path
$var = get("../whatever.txt");

# or absolute path
$var = get("http://www.example.com/whatever.txt");


> Which error did you have in mind? What is the actual error message?

I don't have an error message or anything, and nothing shows up in the
logs; it simply returns an empty variable. Using a relative path (like
above) worked fine until a week or so ago, when I made several
security updates to the server, so I'm guessing that one of those
updates is now preventing the file from opening in this way.

I wasn't actually using LWP::Simple on my existing website, but I'm
making several updates and changes that do use LWP::Simple. I've been
making security updates for about 10 days now, and only just yesterday
discovered that this was no longer working the same as before.

It's not really a big deal, though, and I suspect that I'm making
things better by requiring an absolute path, anyway. I just wanted to
make sure that switching my scripts to absolute paths like this would
be the same load (or preferably, less of a load), or if I should track
down whatever it was that made relative paths stop working in the
first place.


------------------------------

Date: Mon, 17 Jan 2011 12:38:47 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: LWP::Simple - relative or absolute path?
Message-Id: <slrnij8ai7.dgg.hjp-usenet2@hrunkner.hjp.at>

On 2011-01-17 10:33, jwcarlton <jwcarlton@gmail.com> wrote:
>> A *relative path*?
>>
>> Are you using LWP::Simple's get($url) subroutine?
>>
>> If so, the URL has to fully specify the protocol, server and the full
>> document path.
>
> That's correct; I'm just curious which of these would be less of a
> server load (if there's a difference at all):

The difference is simple:
>
> # relative path
> $var = get("../whatever.txt");

This doesn't work.

>
> # or absolute path
> $var = get("http://www.example.com/whatever.txt");

This works.


>> Which error did you have in mind? What is the actual error message?
>
> I don't have an error message or anything, and nothing shows up in the
> logs; it simply returns an empty variable.

That's why you shouldn't use LWP::Simple for production: It 
returns just undef on error and you are left guessing what the error
was.
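
To illustrate the point about error reporting, here is a minimal sketch
that uses LWP::UserAgent instead of LWP::Simple, so that a failed request
yields a status line rather than a bare undef. The sub name
fetch_with_reason and the 10-second timeout are assumptions made for this
example, not something from the thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch a URL. Returns (content, undef) on success,
# or (undef, reason) on failure, where reason is the response status line.
sub fetch_with_reason {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new( timeout => 10 );    # timeout is an arbitrary choice
    my $res = $ua->get($url);
    return ( $res->decoded_content, undef ) if $res->is_success;
    return ( undef, $res->status_line );
}

# Usage: perl fetch.pl http://www.example.com/whatever.txt
if (@ARGV) {
    my ( $content, $error ) = fetch_with_reason( $ARGV[0] );
    die "GET $ARGV[0] failed: $error\n" if defined $error;
    print $content;
}
```

Passing a string without a scheme, such as "../whatever.txt", should come
back through the failure branch with an explanatory status line, which is
exactly the information LWP::Simple's get() discards.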

> Using a relative path (like above) worked fine until a week or so ago,

I don't believe that. "../whatever.txt" doesn't contain a server name,
so get wouldn't know which server to connect to. 

It is possible that older versions of LWP::Simple returned the contents
of the file if the argument looked like a filename instead of a URI -
in that case the script might have appeared to work but really did
something different than you thought.

	hp



------------------------------

Date: Mon, 17 Jan 2011 09:04:32 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: LWP::Simple - relative or absolute path?
Message-Id: <slrnij8mgu.105.tadmc@tadbox.sbcglobal.net>

jwcarlton <jwcarlton@gmail.com> wrote:

> I'm just curious which of these would be less of a
> server load (if there's a difference at all):
>
> # relative path
> $var = get("../whatever.txt");


This never contacts any server.

So it generates a server load of zero.


> # or absolute path
> $var = get("http://www.example.com/whatever.txt");


This generates a load on the server.


Zero is less than non-zero, so the first one generates less of a 
server load.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.


------------------------------

Date: Mon, 17 Jan 2011 10:39:16 -0500
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: LWP::Simple - relative or absolute path?
Message-Id: <m21v4bwndn.fsf@sherm.shermpendley.com>

jwcarlton <jwcarlton@gmail.com> writes:

> So, here's the question. Is there a speed/server load difference
> between using a relative path or an absolute path?

No. Relative paths are resolved by the client, so the server receives
requests for an absolute path in either case.
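
As a rough illustration of client-side resolution, the URI module can turn
a relative reference into an absolute URL before anything is sent. The base
URL below (the script's own location) and the sub name resolve_url are
hypothetical, chosen only for this sketch.

```perl
use strict;
use warnings;
use URI;

# Resolve a possibly relative reference against a base URL,
# the way a client does before it issues the request.
sub resolve_url {
    my ( $reference, $base ) = @_;
    return URI->new_abs( $reference, $base )->as_string;
}

# Hypothetical base: the URL of the script that wants the file.
my $base = 'http://www.example.com/cgi-bin/script.pl';

print resolve_url( '../whatever.txt', $base ), "\n";
# prints http://www.example.com/whatever.txt
```

The resulting string is a full URL of the kind LWP::Simple's get()
expects, so the server sees the same request either way.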

sherm--

-- 
Sherm Pendley
                                   <http://camelbones.sourceforge.net>
Cocoa Developer


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3259
***************************************
