[32932] in Perl-Users-Digest


Perl-Users Digest, Issue: 4209 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue May 6 21:09:31 2014

Date: Tue, 6 May 2014 18:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 6 May 2014     Volume: 11 Number: 4209

Today's topics:
    Re: grepping a list of patterns in a larger list <news@todbe.com>
    Re: grepping a list of patterns in a larger list (Tim McDaniel)
    Re: grepping a list of patterns in a larger list <gamo@telecable.es>
    Re: grepping a list of patterns in a larger list <arjenbax@googlemail.com>
    Re: grepping a list of patterns in a larger list <jurgenex@hotmail.com>
    Re: grepping a list of patterns in a larger list <jurgenex@hotmail.com>
    Re: grepping a list of patterns in a larger list <gamo@telecable.es>
    Re: grepping a list of patterns in a larger list <rweikusat@mobileactivedefense.com>
    Re: grepping a list of patterns in a larger list <manfred.lotz@arcor.de>
    Re: grepping a list of patterns in a larger list (Tim McDaniel)
    Re: grepping a list of patterns in a larger list (Tim McDaniel)
    Re: grepping a list of patterns in a larger list <rweikusat@mobileactivedefense.com>
    Re: grepping a list of patterns in a larger list (Tim McDaniel)
    Re: grepping a list of patterns in a larger list <rweikusat@mobileactivedefense.com>
        How to read from URL line-wise? <no.email@please.post>
    Re: How to read from URL line-wise? <jurgenex@hotmail.com>
    Re: How to read from URL line-wise? <hjp-usenet3@hjp.at>
    Re: How to read from URL line-wise? <no.email@please.post>
    Re: How to read from URL line-wise? <no.email@please.post>
    Re: How to read from URL line-wise? <*@eli.users.panix.com>
        slfjs <no.email@please.post>
    Re: slfjs <jimsgibson@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 05 May 2014 21:23:44 -0700
From: "$Bill" <news@todbe.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lk9o4l$ri$1@dont-email.me>

On 5/5/2014 20:17, Udaykumar Kunapuli wrote:
> Hi,
>
> I have 2 lists.
>
> As an example:
> @list1 = (1234, 345638, 3535, 93756387);
> @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN", "45234 Linux RUN", "3535 Linux DONE", "93756387 Linux DONE");
>
> I need to print out the list of elements in list2 where the first number part of the string does NOT exist in list1.
>
> I need to get a list3 from list2 where list3 looks like below
> @list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux RUN");
>
> None of the above first number parts in list3 exist in list1, which is how I want it to be.
>
> What would be the simplest way of doing this in Perl?

If list1 is reasonably small, I'd just grep list2 for each item in list1:

use strict;
use warnings;

my @list1 = (1234, 345638, 3535, 93756387);
my @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT",
   "6384643 Linux RUN", "45234 Linux RUN", "3535 Linux DONE",
     "93756387 Linux DONE"
);

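# Start with everything, then drop the elements whose leading number
# is one of the items in @list1.
my @list3 = @list2;
foreach my $re (@list1) {
	@list3 = grep { !/^\Q$re\E\b/ } @list3;
}
print "$_\n" for @list3;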

__END__


------------------------------

Date: Tue, 6 May 2014 05:17:14 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lk9r8q$kdn$1@reader1.panix.com>

In article <160a43d6-9d36-490b-ab8a-9f185b5c8513@googlegroups.com>,
Udaykumar Kunapuli  <ukunapul@gmail.com> wrote:
>I have 2 lists. 
>
>As an example:
>@list1 = (1234, 345638, 3535, 93756387);
>@list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN", "45234 Linux
>RUN", "3535 Linux DONE", "93756387 Linux DONE");
>
>I need to print out the list of elements in list2 where the first number part of the string does NOT
>exist in list1.
>
>I need to get a list3 from list2 where list3 looks like below
>@list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux RUN");
>
>None of the above first number parts in list3 exist in list1, which is how I want it to be. 
>
>What would be the simplest way of doing this in Perl?

The better sort of reply would note that this might be a "do my
homework for me" question and would ask what you've tried already.

So what have you tried already?  Or what have you thought about and
not known how to do?

I'll be kind enough, or unkind enough (in that, if it's a class,
someone providing you with a solution is hurting your learning), to
suggest that you consider a hash table:
- hash1, with key = each element of @list1 and value = 1 (the value
  need not be used so it doesn't matter)
Then go through each element of @list2, separate out the number at the
start of the element, and see whether it's in %hash1.
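
A minimal sketch of that approach, assuming the sample data from the
question; the names %hash1, $element and $number are only illustrative:

```perl
use strict;
use warnings;

my @list1 = (1234, 345638, 3535, 93756387);
my @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT",
             "6384643 Linux RUN", "45234 Linux RUN", "3535 Linux DONE",
             "93756387 Linux DONE");

# Lookup table: one key per element of @list1; the values are never used.
my %hash1 = map { $_ => 1 } @list1;

my @list3;
for my $element (@list2) {
    # Separate out the number at the start of the element.
    my ($number) = $element =~ /^(\d+)/;
    # Assumption: elements without a leading number are kept.
    push @list3, $element
        unless defined $number && exists $hash1{$number};
}

print "$_\n" for @list3;
```

Each lookup is a single hash access, so the cost grows with the size of
@list2 rather than with the product of both list sizes.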

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Tue, 06 May 2014 07:26:19 +0200
From: gamo <gamo@telecable.es>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lk9rq5$1ae$1@speranza.aioe.org>

On 06/05/14 05:17, Udaykumar Kunapuli wrote:
> Hi,
>
> I have 2 lists.
>
> As an example:
> @list1 = (1234, 345638, 3535, 93756387);
> @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN",
>   "45234 Linux RUN", "3535 Linux DONE", "93756387 Linux DONE");
>
> I need to print out the list of elements in list2 where the first number part of the string does NOT exist in list1.
>
> I need to get a list3 from list2 where list3 looks like below
> @list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux RUN");
>
> None of the above first number parts in list3 exist in list1, which is how I want it to be.
>
> What would be the simplest way of doing this in Perl?
>
> Thanks,
> Uday
>

for $j (@list2){
	$j =~ /(\d+)\s/;
	$n = $1;  # extract the number
	$ok=1;
	for $i (@list1){
		if ($i == $n){
			$ok =0;
			last;
		}	
	}
	if ($ok==1){
		push @list3, $j;
	}
}			

You ask for the simplest way.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Tue, 6 May 2014 01:03:45 -0700 (PDT)
From: ilovelinux <arjenbax@googlemail.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <956f0dac-484b-4fb0-8095-2ae2f0d559d2@googlegroups.com>


Maybe not the simplest, but to avoid programming a nested loop: create a
regular expression of the numbers in @list1 and grep @list2 for that RE.

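    my $re = join("|", map { quotemeta $_ } @list1);
    # Anchor the alternation so that only the whole leading number can
    # match, not a substring somewhere else in the line.
    my @list3 = grep { !/^(?:$re)\b/ } @list2;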


------------------------------

Date: Tue, 06 May 2014 01:05:25 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <2g5hm9lmjbmnj8gsn2218qu489t1ah8d8g@4ax.com>

Udaykumar Kunapuli <ukunapul@gmail.com> wrote:
[Please limit your line length to ~70 to 75 characters as has been
customary in Usenet for decades]

>I have 2 lists. 
>
>As an example:
>@list1 = (1234, 345638, 3535, 93756387);
>@list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN", "45234 Linux RUN", "3535 Linux DONE", "93756387 Linux DONE");
>
>I need to print out the list of elements in list2 where the first number part of the string does NOT exist in list1.
>
>I need to get a list3 from list2 where list3 looks like below
>@list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux RUN");
>
>None of the above first number parts in list3 exist in list1, which is how I want it to be. 

Asking if an item exists in a list is a strong indication that maybe
using the "exists" function will take you a long way towards a solution.

Put your first list into a hash such that you can use "exists", then
loop over your second list, isolating the number for each element and
checking if it "exists" in the hash.

jue


------------------------------

Date: Tue, 06 May 2014 01:08:41 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <a06hm9hdjhcf7ka89ns02kq4l7pfpq0i4e@4ax.com>

gamo <gamo@telecable.es> wrote:
>On 06/05/14 05:17, Udaykumar Kunapuli wrote:
>> Hi,
>>
>> I have 2 lists.
>>
>> As an example:
>> @list1 = (1234, 345638, 3535, 93756387);
>> @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN",
>>   "45234 Linux RUN", "3535 Linux DONE", "93756387 Linux DONE");
>>
>> I need to print out the list of elements in list2 where the first number part of the string does NOT exist in list1.
>>
>> I need to get a list3 from list2 where list3 looks like below
>> @list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux RUN");
>>
>> None of the above first number parts in list3 exist in list1, which is how I want it to be.
>>
>> What would be the simplest way of doing this in Perl?
>>
>> Thanks,
>> Uday
>>
>
>for $j (@list2){
>	$j =~ /(\d+)\s/;
>	$n = $1;  # extract the number
>	$ok=1;
>	for $i (@list1){
>		if ($i == $n){
>			$ok =0;
>			last;
>		}	
>	}
>	if ($ok==1){
>		push @list3, $j;
>	}
>}			
>
>You ask for the simplest way.

The simplest way is to use "exists" instead of that flag in a nested
loop.

jue


------------------------------

Date: Tue, 06 May 2014 11:33:44 +0200
From: gamo <gamo@telecable.es>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lkaa9n$vvq$1@speranza.aioe.org>

On 06/05/14 10:08, Jürgen Exner wrote:
> gamo <gamo@telecable.es> wrote:
>> On 06/05/14 05:17, Udaykumar Kunapuli wrote:
>>> Hi,
>>>
>>> I have 2 lists.
>>>
>>> As an example:
>>> @list1 = (1234, 345638, 3535, 93756387);
>>> @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN",
>>>   "45234 Linux RUN", "3535 Linux DONE", "93756387 Linux DONE");
>>>
>>> I need to print out the list of elements in list2 where the first number part of the string does NOT exist in list1.
>>>
>>> I need to get a list3 from list2 where list3 looks like below
>>> @list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux RUN");
>>>
>>> None of the above first number parts in list3 exist in list1, which is how I want it to be.
>>>
>>> What would be the simplest way of doing this in Perl?
>>>
>>> Thanks,
>>> Uday
>>>
>>
>> for $j (@list2){
>> 	$j =~ /(\d+)\s/;
>> 	$n = $1;  # extract the number
>> 	$ok=1;
>> 	for $i (@list1){
>> 		if ($i == $n){
>> 			$ok =0;
>> 			last;
>> 		}	
>> 	}
>> 	if ($ok==1){
>> 		push @list3, $j;
>> 	}
>> }			
>>
>> You ask for the simplest way.
>
> The simplest way is to use "exists" instead of that flag in a nested
> loop.
>
> jue
>

Ok, let's go:

for $i (@list1){
	$h{$i}=1;
}

for $j (@list2){
	$j =~ /(\d+)\s/;
	$n = $1;
	unless (defined $h{$n}){
		push @list3, $j;
	}
}	
%h = ();


-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Tue, 06 May 2014 15:58:37 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <87mwevar8i.fsf@sable.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> On 06/05/14 10:08, Jürgen Exner wrote:
>> gamo <gamo@telecable.es> wrote:
>>> On 06/05/14 05:17, Udaykumar Kunapuli wrote:
>>>> Hi,
>>>>
>>>> I have 2 lists.
>>>>
>>>> As an example:
>>>> @list1 = (1234, 345638, 3535, 93756387);
>>>> @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT", "6384643 Linux RUN",
>>>>   "45234 Linux RUN", "3535 Linux DONE", "93756387 Linux DONE");
>>>>
>>>> I need to print out the list of elements in list2 where the first number part of the string does NOT exist in list1.

[...]

>> The simplest way is to use "exists" instead of that flag in a nested
>> loop.
>>
>> jue
>>
>
> Ok, let's go:
>
> for $i (@list1){
> 	$h{$i}=1;
> }
>
> for $j (@list2){
> 	$j =~ /(\d+)\s/;
> 	$n = $1;
> 	unless (defined $h{$n}){
> 		push @list3, $j;
> 	}
> }	
> %h = ();

This can be simplified somewhat with map and grep:

%filter = map { $_, 1 } @list1;
@list3 = grep { /^(\S+)/, !$filter{$1}; } @list2;

The grep expression could also be written as

!$filter{(/^(\S+)/)[0]}

I'm unsure if this should be considered a good or a bad idea.



------------------------------

Date: Tue, 6 May 2014 17:07:03 +0200
From: Manfred Lotz <manfred.lotz@arcor.de>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <20140506170703.74397def@arcor.com>

On Mon, 5 May 2014 20:17:05 -0700 (PDT)
Udaykumar Kunapuli <ukunapul@gmail.com> wrote:

> Hi,
> 
> I have 2 lists. 
> 
> As an example:
> @list1 = (1234, 345638, 3535, 93756387);
> @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux
> EXIT", "6384643 Linux RUN", "45234 Linux RUN", "3535 Linux DONE",
> "93756387 Linux DONE");
> 
> I need to print out the list of elements in list2 where the first
> number part of the string does NOT exist in list1.
> 
> I need to get a list3 from list2 where list3 looks like below
> @list3 = ("375539 Linux DONE", "6384643 Linux RUN", "45234 Linux
> RUN");
> 
> None of the above first number parts in list3 exist in list1, which
> is how I want it to be. 
> 
> What would be the simplest way of doing this in Perl?
> 

I would do it this way. If @list1 is really large one should perhaps
think twice.

<---------------------------snip------------------------------>
#! /usr/bin/perl

use strict;
use warnings;

my @list1 = (1234, 345638, 3535, 93756387);
my @list2 = ("375539 Linux DONE", "1234 Linux DONE", "345638 Linux EXIT",
    "6384643 Linux RUN", "45234 Linux RUN", "3535 Linux DONE",
    "93756387 Linux DONE");


my %hash = map { $_ => 1 } @list1;
my @list3;

foreach my $li ( @list2 ) {
	my @words = split ' ', $li;    # ' ' also skips leading whitespace
	if ( not exists $hash{ $words[0] } ) {
		push @list3, $li;
	}
}

foreach my $li ( @list3 ) {
	print "$li\n";
}

<---------------------------snap------------------------------>




------------------------------

Date: Tue, 6 May 2014 18:44:16 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lkbai0$2qj$1@reader1.panix.com>

In article <lkaa9n$vvq$1@speranza.aioe.org>, gamo  <gamo@telecable.es> wrote:
>for $i (@list1){
>	$h{$i}=1;
>}

People should
    use strict;
    use warnings;
and declare all variables.

People have posted two ways to convert a list to a hash in which the
indexes are the members of the list.  Here's the version I usually use:

    @h{@list1} = (1) x @list1;
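
A small sketch of how that slice assignment behaves, assuming the @list1
from the original question; %m and $same exist only for the comparison:

```perl
use strict;
use warnings;

my @list1 = (1234, 345638, 3535, 93756387);

# Hash slice: assign a list of 1s, one per key.
my %h;
@h{@list1} = (1) x @list1;

# The map form posted elsewhere in the thread, for comparison.
my %m = map { $_ => 1 } @list1;

# Both hashes hold the same keys, each with the value 1.
my $same = join(',', sort keys %h) eq join(',', sort keys %m);
print $same ? "same keys\n" : "different keys\n";
print "1234 is in the list\n" if $h{1234};
print "9999 is not in the list\n" unless $h{9999};
```

Because every value is 1, a plain truth test such as $h{$n} is enough
here; exists is only needed when the values may be false or undef.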

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Tue, 6 May 2014 18:45:45 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lkbakp$gtl$1@reader1.panix.com>

In article <lkbai0$2qj$1@reader1.panix.com>,
Tim McDaniel <tmcd@panix.com> wrote:
>People have posted two ways to convert a list to a hash in which the
>indexes are the members of the list.

Sorry -- two people, both using something of the form
    %hash1 = map { $_ => 1 } @list1;

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Tue, 06 May 2014 20:04:38 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <87wqdyafuh.fsf@sable.mobileactivedefense.com>

tmcd@panix.com (Tim McDaniel) writes:
> In article <lkaa9n$vvq$1@speranza.aioe.org>, gamo  <gamo@telecable.es> wrote:
>>for $i (@list1){
>>	$h{$i}=1;
>>}
>
> People should
>     use strict;
>     use warnings;
> and declare all variables.
>
> People have posted two ways to convert a list to a hash in which the
> indexes are the members of the list.  Here's the version I usually use:
>
>     @h{@list1} = (1) x @list1;

Two similar other ways would be

++$_ for @h{@list}

s//1/ for @h{@list}

If one is willing to test for 'existence' instead of truth, any
expression can be used in front of the for, e.g.

// for @h{@list}

or

1 for @h{@list}

Actually, any operator which modifies its operand will do for this case, eg,

++@h{@list}


------------------------------

Date: Tue, 6 May 2014 20:31:44 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <lkbgrf$kk6$1@reader1.panix.com>

In article <87wqdyafuh.fsf@sable.mobileactivedefense.com>,
Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
>tmcd@panix.com (Tim McDaniel) writes:
>> ... to convert a list to a hash in which the
>> indexes are the members of the list.
>
>If one is willing to test for 'existence' instead of truth, any
>expression can be used in front of the for, e.g.
 ...
>1 for @h{@list}
>
>Actually, any operator which modifies its operand will do for this case, eg,
>
>++@h{@list}

$ perl -e 'use strict; use warnings; use Data::Dumper; my @list = (17, 19, 23); my %h; ++@h{@list}; print Dumper(\%h), "\n"'
$VAR1 = {
    '23' => 1,
    '19' => undef,
    '17' => undef
};
$ perl -e 'use strict; use warnings; use Data::Dumper; my @list = (17, 19, 23); my %h; 0 for @h{@list}; print Dumper(\%h), "\n"'
$VAR1 = {
    '23' => undef,
    '19' => undef,
    '17' => undef
};

OK, I don't get it.  What's going on here?  I don't think it could be
autovivification eo nomine.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Tue, 06 May 2014 21:45:26 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: grepping a list of patterns in a larger list
Message-Id: <87oazaab6h.fsf@sable.mobileactivedefense.com>

tmcd@panix.com (Tim McDaniel) writes:
> Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
>>tmcd@panix.com (Tim McDaniel) writes:
>>> ... to convert a list to a hash in which the
>>> indexes are the members of the list.
>>
>>If one is willing to test for 'existence' instead of truth, any
>>expression can be used in front of the for, e.g.
> ...
>>1 for @h{@list}
>>
>>Actually, any operator which modifies its operand will do for this case, eg,
>>
>>++@h{@list}
>
> $ perl -e 'use strict; use warnings; use Data::Dumper; my @list = (17, 19, 23); my %h; ++@h{@list}; print Dumper(\%h), "\n"'
> $VAR1 = {
>     '23' => 1,
>     '19' => undef,
>     '17' => undef
> };
> $ perl -e 'use strict; use warnings; use Data::Dumper; my @list = (17, 19, 23); my %h; 0 for @h{@list}; print Dumper(\%h), "\n"'
> $VAR1 = {
>     '23' => undef,
>     '19' => undef,
>     '17' => undef
> };
>
> OK, I don't get it.  What's going on here?  I don't think it could be
> autovivification eo nomine.

'Autovivification because of lvalue context', see pp_hslice in pp.c.
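
A short sketch of that effect, assuming the same @list as in the
one-liners above; $before and $after are only illustrative names. It
contrasts reading the slice with aliasing it in a for loop:

```perl
use strict;
use warnings;

my @list = (17, 19, 23);
my %h;

# Reading a slice in rvalue context creates nothing.
my @values = @h{@list};
my $before = keys %h;          # 0

# for aliases $_ to each slice element, which is lvalue context,
# so the keys spring into existence with undef values.
1 for @h{@list};
my $after = keys %h;           # 3

printf "before: %d keys, after: %d keys\n", $before, $after;
for my $key (@list) {
    printf "%s: exists=%s defined=%s\n", $key,
        (exists  $h{$key} ? "yes" : "no"),
        (defined $h{$key} ? "yes" : "no");
}
```

This is why such a hash only works with an exists test: the keys are
there, but every value is still undef.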



------------------------------

Date: Tue, 6 May 2014 21:23:14 +0000 (UTC)
From: kj <no.email@please.post>
Subject: How to read from URL line-wise?
Message-Id: <lkbjs1$qbn$1@reader1.panix.com>


First, profuse apologies for the original posting of this query,
which had gibberish ("sdfsdfsf") in the Subject: line.  I goofed.

---

I'm looking for the "moral equivalent" of the (fictitious) `openremote`
function below:

    my $handle = openremote( 'http://some.domain.org/huge.tsv' ) or die $!;
    while ( <$handle> ) {
        chomp;
        # etc.
        # do stuff with $_
    }
    close $handle;

IOW, I'm looking for a way to open a read handle to a remote file
so that I can read from it *line-by-line*.  (Typically this file
will be larger than I want to read all at once into memory.  IOW,
I want to avoid solutions based on stuffing the value returned by
LWP::Simple::get into an IO::String.)

I'm sure this is really basic stuff, but I have not been able to
find it after a lot of searching.

TIA!

kj


------------------------------

Date: Tue, 06 May 2014 15:12:02 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: How to read from URL line-wise?
Message-Id: <obnim95vr9g3e2n4e652d2c43tejuu8ndv@4ax.com>

kj <no.email@please.post> wrote:
>I'm looking for the "moral equivalent" of the (fictitious) `openremote`
>function below:
>
>    my $handle = openremote( 'http://some.domain.org/huge.tsv' ) or die $!;
[...]
>IOW, I'm looking for a way to open a read handle to a remote file
>so that I can read from it *line-by-line*.  (Typically this file
>will be larger than I want to read all at once into memory.  IOW,
>I want to avoid solutions based on stuffing the value returned into
>LWP::Simple::get into an IO::String.)
>
>I'm sure this is really basic stuff, but I have not been able to
>find it after a lot of searching.

I very much doubt that HTTP supports such a line-by-line retrieval. And
if line-by-line is not supported by the underlying protocol, then at the
very best you can only hope for a local simulation, but at that point
the resource has been retrieved in full already.

jue


------------------------------

Date: Wed, 7 May 2014 01:01:26 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: How to read from URL line-wise?
Message-Id: <slrnlmiqe6.8hl.hjp-usenet3@hrunkner.hjp.at>

On 2014-05-06 22:12, Jürgen Exner <jurgenex@hotmail.com> wrote:
> kj <no.email@please.post> wrote:
>>I'm looking for the "moral equivalent" of the (fictitious) `openremote`
>>function below:
>>
>>    my $handle = openremote( 'http://some.domain.org/huge.tsv' ) or die $!;
> [...]
>>IOW, I'm looking for a way to open a read handle to a remote file
>>so that I can read from it *line-by-line*.  (Typically this file
>>will be larger than I want to read all at once into memory.  IOW,
>>I want to avoid solutions based on stuffing the value returned into
>>LWP::Simple::get into an IO::String.)
>>
>>I'm sure this is really basic stuff, but I have not been able to
>>find it after a lot of searching.
>
> I very much doubt that HTTP supports such a line-by-line retrieval.

Not line-by-line (files don't support that either on most platforms),
but byte-ranges are supported by HTTP/1.1. Whether the server supports
it for the file is another question, but most servers do for files
stored in the file system (but not dynamically created content).

But I associate "line-by-line" with sequential access, not random
access, and you are of course always free to process the response in
little chunks as you receive it (see "Handlers" in LWP::UserAgent for a
standard way of doing this).

        hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Tue, 6 May 2014 23:02:48 +0000 (UTC)
From: kj <no.email@please.post>
Subject: Re: How to read from URL line-wise?
Message-Id: <lkbpmo$fhc$1@reader1.panix.com>

In <obnim95vr9g3e2n4e652d2c43tejuu8ndv@4ax.com> Jürgen Exner <jurgenex@hotmail.com> writes:

>kj <no.email@please.post> wrote:
>>I'm looking for the "moral equivalent" of the (fictitious) `openremote`
>>function below:
>>
>>    my $handle = openremote( 'http://some.domain.org/huge.tsv' ) or die $!;
>[...]
>>IOW, I'm looking for a way to open a read handle to a remote file
>>so that I can read from it *line-by-line*.  (Typically this file
>>will be larger than I want to read all at once into memory.  IOW,
>>I want to avoid solutions based on stuffing the value returned into
>>LWP::Simple::get into an IO::String.)
>>
>>I'm sure this is really basic stuff, but I have not been able to
>>find it after a lot of searching.

>I very much doubt that HTTP supports such a line-by-line retrieval. And
>if line-by-line is not supported by the underlying protocol, then at the
>very best you can only hope for a local simulation, but at that point
>the resource has been retrieved in full already.

I was under the impression that HTTP supported incremental downloads
(some fixed number of bytes at a time); if so, a client could easily
implement a line-by-line interface to that stream...  But now I
think I need to do some homework and review HTTP.

Thanks!

kj



------------------------------

Date: Tue, 6 May 2014 23:16:27 +0000 (UTC)
From: kj <no.email@please.post>
Subject: Re: How to read from URL line-wise?
Message-Id: <lkbqgb$cn4$1@reader1.panix.com>

In <slrnlmiqe6.8hl.hjp-usenet3@hrunkner.hjp.at> "Peter J. Holzer" <hjp-usenet3@hjp.at> writes:

>...you are of course always free to process the response in
>little chunks as you receive it (see "Handlers" in LWP::UserAgent for a
>standard way of doing this).

Thanks for this pointer!  This approaches what I'm after.  I'd
hoped to find a package (in some obscure corner of LWP) that already
implemented this line-oriented interface to the stream, but I guess
I'll have to write it myself.  (Conceptually it's not a hard thing
to do, but IME *robust* implementations of even simple tasks like
this one can take a lot more work than one would expect.)
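
A rough sketch of the buffering part, under stated assumptions: lines
end in "\n", the data is treated as bytes, and make_line_feeder is a
hypothetical helper name, not something LWP provides. The chunks are
simulated so the example runs without a network:

```perl
use strict;
use warnings;

# make_line_feeder is a hypothetical helper, not part of LWP: it returns
# a closure that accepts arbitrary chunks and calls $on_line once for
# each complete line, buffering any partial line until more data arrives.
sub make_line_feeder {
    my ($on_line) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        if (defined $chunk) {
            $buffer .= $chunk;
            # Hand over every complete line currently in the buffer.
            while ($buffer =~ s/\A([^\n]*)\n//) {
                $on_line->($1);
            }
        }
        elsif (length $buffer) {
            # undef signals end of input: flush an unterminated last line.
            $on_line->($buffer);
            $buffer = '';
        }
        return;
    };
}

my @lines;
my $feed = make_line_feeder(sub { push @lines, $_[0] });

# Simulated network chunks that split lines at awkward places.
$feed->($_) for "a\tb\nc", "\td\ne\t", "f";
$feed->(undef);

print "$_\n" for @lines;

# With LWP::UserAgent the same closure could be driven by a content
# callback (untested here, needs network access):
#   $ua->get($url, ':content_cb' => sub { $feed->($_[0]) });
#   $feed->(undef);
```

Only one partial line is held in memory at a time, which is the point
of not slurping the whole response.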

kj



------------------------------

Date: Tue, 6 May 2014 23:20:00 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: How to read from URL line-wise?
Message-Id: <eli$1405061828@qz.little-neck.ny.us>

In comp.lang.perl.misc, Jürgen Exner  <jurgenex@hotmail.com> wrote:
> kj <no.email@please.post> wrote:
>> IOW, I'm looking for a way to open a read handle to a remote file
>> so that I can read from it *line-by-line*.  (Typically this file
>> will be larger than I want to read all at once into memory.  IOW,
>> I want to avoid solutions based on stuffing the value returned into
> I very much doubt that HTTP supports such a line-by-line retrieval. And
> if line-by-line is not supported by the underlying protocol, then at the
> very best you can only hope for a local simulation, but at that point
> the resource has been retrived in full.already. 

HTTP does not support "line-by-line" retrieval. You can get stuff in
chunks smaller than the whole file, however, and use a small buffer
to emulate some sort of record based read. I've written HTTP read/write
code from bare sockets in Perl, it's certainly doable, but it's a
project you need a lot of time testing with: there can be a lot of
variation in the way things are returned depending on server
configuration.

Things to consider:

At a high level HTTP/1.1 has a "Range:" header that can be used to
request a fragment of a large resource. Most servers support returning
just a portion of a resource IF that resource is a static file on disk.
If it is a dynamic page, YMMV.

In HTTP/1.0, you don't get Range:, but you also don't have to deal with
"Transfer-Encoding: chunked" (more in a bit). You either have a
"Content-Length" header specifying the whole length of the result or you
work blind. In either scenario, you sysread off the socket until you get
your record separator or zero length read. In theory, you could read one
byte at a time and just let the kernel handle your buffering. That might
be slow, and can induce confusion between "bytes" and "characters" when
dealing with 21st century character encoding awareness.

In HTTP/1.1, besides the Content-Length or flying blind option, the
server can do its own break-into-useful-size bits. This results in a
Transfer-Encoding: chunked header on the response, and interleaved chunk
sizes in the body of the response. Again you can use sysread. This sort
of response is very common for dynamic content like CGI or compressed-on-
the-fly pages. (Chunk sizes I've observed often look like the output of
compressing a page in 4096 byte blocks and then sending the output as
an HTTP chunk.) Unless you understand the chunking protocol, the chunk
sizes will corrupt the body content.
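
To make the interleaving concrete, here is a minimal sketch of decoding
a chunked body. It assumes the whole body is already in one string and
ignores chunk extensions and trailers; decode_chunked is a hypothetical
name, and a real client would decode incrementally as data arrives:

```perl
use strict;
use warnings;

# decode_chunked is a hypothetical helper for illustration: it takes a
# complete chunked body as one string and returns the payload.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    my $pos  = 0;
    while (1) {
        # Each chunk starts with its size in hex, then CRLF.
        pos($raw) = $pos;
        $raw =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r\n/g
            or die "malformed chunk header at offset $pos\n";
        my $size = hex $1;
        $pos = pos($raw);
        last if $size == 0;            # a zero-size chunk ends the body
        die "truncated chunk\n" if $pos + $size > length $raw;
        $body .= substr($raw, $pos, $size);
        $pos  += $size + 2;            # skip the data and its trailing CRLF
    }
    return $body;
}

# Three chunks of 4, 5 and 14 (hex E) bytes, then the terminating chunk.
my $raw = "4\r\nWiki\r\n5\r\npedia\r\nE\r\n in\r\n\r\nchunks.\r\n0\r\n\r\n";
my $decoded = decode_chunked($raw);
print $decoded;
```

Treating $raw as the body without this step would leave the "4", "5"
and "E" size lines mixed into the content.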

The server will probably never compress-on-the-fly unless you add the
appropriate "Accept-Encoding" header, but if you are dealing with
truly large text files (as implied by the question), you do the network
a favor allowing the server to compress them. Then, of course, you need
to do chunked decompression, too.

And there is a whole level of insanity to doing SSL stuff on your own.
I punted that in my own code by using Net::SSLeay::Handle, whose docs
show an example of reading HTTPS line by line. That example does not
handle any of the complexity or subtlety of real HTTP/HTTPS. (In
particular, the lack of a Host: header breaks a large part of the modern
web.)

Elijah
------
would not be surprised if there is easy to find code for the OP's problem


------------------------------

Date: Tue, 6 May 2014 21:13:12 +0000 (UTC)
From: kj <no.email@please.post>
Subject: slfjs
Message-Id: <lkbj98$m1h$1@reader1.panix.com>



I'm looking for the "moral equivalent" of the (fictitious) `openremote`
function below:

    my $handle = openremote( 'http://some.domain.org/huge.tsv' ) or die $!;
    while ( <$handle> ) {
        chomp;
        # etc.
        # do stuff with $_
    }
    close $handle;

IOW, I'm looking for a way to open a read handle to a remote file
so that I can read from it *line-by-line*.  (Typically this file
will be larger than I want to read all at once into memory.  IOW,
I want to avoid solutions based on stuffing the value returned by
LWP::Simple::get into an IO::String.)

I'm sure this is really basic stuff, but I have not been able to
find it after a lot of searching.

TIA!

kj


------------------------------

Date: Tue, 06 May 2014 17:19:56 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: slfjs
Message-Id: <060520141719568174%jimsgibson@gmail.com>

In article <lkbj98$m1h$1@reader1.panix.com>, kj <no.email@please.post>
wrote:

> I'm looking for the "moral equivalent" of the (fictitious) `openremote`
> function below:
> 
>     my $handle = openremote( 'http://some.domain.org/huge.tsv' ) or die $!;
>     while ( <$handle> ) {
>         chomp;
>         # etc.
>         # do stuff with $_
>     }
>     close $handle;
> 
> IOW, I'm looking for a way to open a read handle to a remote file
> so that I can read from it *line-by-line*.  (Typically this file
> will be larger than I want to read all at once into memory.  IOW,
> I want to avoid solutions based on stuffing the value returned into
> LWP::Simple::get into an IO::String.)
> 
> I'm sure this is really basic stuff, but I have not been able to
> find it after a lot of searching.

You cannot read a file on a remote system line-by-line, barring NFS or
other file distribution system that mimics a local file system from a
remote system. The normal ways to read such a file are:

1. Download the remote file to your local system and read the copy.

2. Open a socket to the remote system and read the socket (but not
exactly a line-oriented protocol). This assumes a cooperative socket
server on the remote system that will read the file and provide its
contents to you over the connected socket.

FTP and HTTP are two protocols that provide these services. They both
can be used from a Perl program. See the Net::FTP and LWP::UserAgent
modules. They both assume a corresponding service on the remote
computer.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4209
***************************************

