[24519] in Perl-Users-Digest

Perl-Users Digest, Issue: 6699 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jun 16 18:10:41 2004

Date: Wed, 16 Jun 2004 15:10:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 16 Jun 2004     Volume: 10 Number: 6699

Today's topics:
        sorting text jamasd@hotmail.com
        sorting text jamasd@hotmail.com
        sorting text jamasd@hotmail.com
    Re: sorting text <noreply@gunnar.cc>
    Re: sorting text <postmaster@castleamber.com>
    Re: sorting text <raisin@delete-this-trash.mts.net>
    Re: sorting text <postmaster@castleamber.com>
    Re: sorting text <pinyaj@rpi.edu>
    Re: sorting text <noreply@gunnar.cc>
    Re: sorting text <noreply@gunnar.cc>
    Re: SSH and Math::Pari prereq on AIX5.1 <nospam-abuse@ilyaz.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 16 Jun 2004 13:30:22 -0700
From: jamasd@hotmail.com
Subject: sorting text
Message-Id: <3151a273.0406161230.1be6971b@posting.google.com>

Here is a sample of my data (each column is separated by tabs):

1234123	jaesdf	ytkyk	345234
1264345	ghgfdf	ghjhg	657658
3456765	sdasdf	ytkyk	456543
1231232	assffg	werwe	123454
5447454	asdqfr	ytkyk	254364

I am interested in creating a hash with two of the elements in the
list ("ytkyk" and "ghjhg"). I would like to create a program to read
only the third column and print the line (row) if it contains one of
the latter items. Can anyone help me write a program? Here is what I
have so far and I would like to create a more efficient program (I am
going to use it for writing a larger program later):

open( File, '<', 'file.txt' ) or die "$!\n";
  while ( <File> ) {
      next unless ( index($_, 'ytkyk') >= 0 );
      next unless ( index($_, 'ghjhg') >= 0 ); 
      print;
  }
  close( File );

Thank you very much.


------------------------------

Date: 16 Jun 2004 13:30:31 -0700
From: jamasd@hotmail.com
Subject: sorting text
Message-Id: <3151a273.0406161230.1e465c13@posting.google.com>

Here is a sample of my data (each column is separated by tabs):

1234123	jaesdf	ytkyk	345234
1264345	ghgfdf	ghjhg	657658
3456765	sdasdf	ytkyk	456543
1231232	assffg	werwe	123454
5447454	asdqfr	ytkyk	254364

I am interested in creating a hash with two of the elements in the
list ("ytkyk" and "ghjhg"). I would like to create a program to read
only the third column and print the line (row) if it contains one of
the latter items. Can anyone help me write a program? Here is what I
have so far and I would like to create a more efficient program (I am
going to use it for writing a larger program later):

open( File, '<', 'file.txt' ) or die "$!\n";
  while ( <File> ) {
      next unless ( index($_, 'ytkyk') >= 0 );
      next unless ( index($_, 'ghjhg') >= 0 ); 
      print;
  }
  close( File );

Thank you very much.


------------------------------

Date: 16 Jun 2004 13:37:39 -0700
From: jamasd@hotmail.com
Subject: sorting text
Message-Id: <3151a273.0406161237.77234f3a@posting.google.com>

Here is an example of the text I am running the program on (they are
separated by tabs):

1234123	jaesdf	ytkyk	345234
1264345	ghgfdf	ghjhg	657658
3456765	sdasdf	ytkyk	456543
1231232	assffg	werwe	123454
5447454	asdqfr	ytkyk	254364

I would like to create a hash that contains "ytkyk" and "ghjhg". The
program needs to read the hash and search only the third column for
similar text. If the column contains the text, the line (row) needs to
be printed.

Here is what I have so far. This program just reads the entire thing
and prints lines that match:

open( File, '<', 'location of file.txt' ) or die "$!\n";
  while ( <File> ) {
      next unless ( index($_, 'ytkyk') >= 0 );
      next unless ( index($_, 'ghjhg') >= 0 );
    print;
  }
  close( File );

Thank you very much


------------------------------

Date: Wed, 16 Jun 2004 22:42:14 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sorting text
Message-Id: <2jbpl2FvopdeU1@uni-berlin.de>

jamasd@hotmail.com wrote:
> Here is a sample of my data (each column is separated by tabs):
> 
> 1234123	jaesdf	ytkyk	345234
> 1264345	ghgfdf	ghjhg	657658
> 3456765	sdasdf	ytkyk	456543
> 1231232	assffg	werwe	123454
> 5447454	asdqfr	ytkyk	254364
> 
> I am interested in creating a hash with two of the elements in the
> list ("ytkyk" and "ghjhg"). I would like to create a program to read
> only the third column and print the line (row) if it contains one of
> the latter items. Can anyone help me write a program? Here is what I
> have so far and I would like to create a more efficient program (I am
> going to use it for writing a larger program later):
> 
> open( File, '<', 'file.txt' ) or die "$!\n";
>   while ( <File> ) {
>       next unless ( index($_, 'ytkyk') >= 0 );
>       next unless ( index($_, 'ghjhg') >= 0 ); 
>       print;
>   }
>   close( File );

What makes you believe that what you have is not efficient?

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Wed, 16 Jun 2004 15:50:38 -0500
From: John Bokma <postmaster@castleamber.com>
Subject: Re: sorting text
Message-Id: <40d0b2a1$0$210$58c7af7e@news.kabelfoon.nl>

Gunnar Hjalmarsson wrote:

> jamasd@hotmail.com wrote:
> 
>> Here is a sample of my data (each column is separated by tabs):
>>
>> 1234123    jaesdf    ytkyk    345234
>> 1264345    ghgfdf    ghjhg    657658
>> 3456765    sdasdf    ytkyk    456543
>> 1231232    assffg    werwe    123454
>> 5447454    asdqfr    ytkyk    254364
>>
>> I am interested in creating a hash with two of the elements in the
>> list ("ytkyk" and "ghjhg"). I would like to create a program to read
>> only the third column and print the line (row) if it contains one of
>> the latter items. Can anyone help me write a program? Here is what I
>> have so far and I would like to create a more efficient program (I am
>> going to use it for writing a larger program later):
>>
>> open( File, '<', 'file.txt' ) or die "$!\n";

my $filename = 'file.txt';
open my $fh, '<', $filename or die "Can't open '$filename' for reading: $!";

>>   while ( <File> ) {

      while ( <$fh> ) {

>>       next unless ( index($_, 'ytkyk') >= 0 );
          next unless index($_, 'ytkyk');

The >= 0 test can be replaced, since it's clear it's not the first
position. Even better, (I guess) check the string at the exact position

>>       next unless ( index($_, 'ghjhg') >= 0 );
>>       print;
>>   }
>>   close( File );

close $fh or die "Can't close '$filename' after reading: $!";

> What makes you believe that what you have is not efficient?

Maybe the OP forgot to explain the "sorting" part :-D.

-- 
John                               MexIT: http://johnbokma.com/mexit/
                            personal page:       http://johnbokma.com/
    Experienced Perl programmer available:     http://castleamber.com/
             Happy Customers: http://castleamber.com/testimonials.html


------------------------------

Date: Wed, 16 Jun 2004 15:54:17 -0500
From: Web Surfer <raisin@delete-this-trash.mts.net>
Subject: Re: sorting text
Message-Id: <MPG.1b3a6f715cbce0c9989832@news.mts.net>

[This followup was posted to comp.lang.perl.misc]

In article <3151a273.0406161230.1be6971b@posting.google.com>, 
jamasd@hotmail.com says...
> Here is a sample of my data (each column is separated by tabs):
> 
> 1234123	jaesdf	ytkyk	345234
> 1264345	ghgfdf	ghjhg	657658
> 3456765	sdasdf	ytkyk	456543
> 1231232	assffg	werwe	123454
> 5447454	asdqfr	ytkyk	254364
> 
> I am interested in creating a hash with two of the elements in the
> list ("ytkyk" and "ghjhg"). I would like to create a program to read
> only the third column and print the line (row) if it contains one of
> the latter items. Can anyone help me write a program? Here is what I
> have so far and I would like to create a more efficient program (I am
> going to use it for writing a larger program later):
> 
> open( File, '<', 'file.txt' ) or die "$!\n";
>   while ( <File> ) {
>       next unless ( index($_, 'ytkyk') >= 0 );
>       next unless ( index($_, 'ghjhg') >= 0 ); 
>       print;
>   }
>   close( File );
> 
> Thank you very much.
> 

### Try this untested code ###

#!/usr/bin/perl
use strict;
use warnings;

my ( $buffer , @fields , $filename , %hash1 );

$filename = "file.txt";
open(INPUT,"<$filename") or
        die("Can't open file \"$filename\" : $!\n");

%hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );

while ( $buffer = <INPUT> ) {
	chomp $buffer;
	@fields = split(/\t+/,$buffer);
	if ( 3 > @fields ) { # Ignore if fewer than 3 fields
		next;
	}
	unless ( exists $hash1{$fields[2]} ) {
		next;
	}
	print "$buffer\n";
}
close INPUT;
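
The same idea can be sketched with a lexical filehandle and a split on
single tabs. This is only a sketch, not the poster's code: the sub name
matching_rows, the %wanted hash and the in-memory sample data are
assumptions made for illustration, and splitting on /\t/ rather than
/\t+/ is a deliberate change so that an empty column does not shift the
third field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose third tab-separated
# column is a key of %$wanted. $fh is any readable filehandle.
sub matching_rows {
    my ( $fh, $wanted ) = @_;
    my @rows;
    while ( my $line = <$fh> ) {
        # Split on single tabs; a limit of 4 stops after the third column.
        my @fields = split /\t/, $line, 4;
        next if @fields < 3;    # too few columns
        # Strip the line ending in case the third column is the last one.
        ( my $third = $fields[2] ) =~ s/\r?\n\z//;
        push @rows, $line if exists $wanted->{$third};
    }
    return @rows;
}

# Sample rows from the original post, read through an in-memory filehandle
# (an assumption, so the sketch runs without file.txt).
my @sample = (
    "1234123\tjaesdf\tytkyk\t345234",
    "1264345\tghgfdf\tghjhg\t657658",
    "3456765\tsdasdf\tytkyk\t456543",
    "1231232\tassffg\twerwe\t123454",
    "5447454\tasdqfr\tytkyk\t254364",
);
my $data = join( "\n", @sample ) . "\n";

open my $fh, '<', \$data or die "Can't open in-memory data: $!";
my %wanted = ( ytkyk => 1, ghjhg => 1 );
print matching_rows( $fh, \%wanted );
close $fh;
```

Because the line is never chomped, the newline does not have to be added
back when printing.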


------------------------------

Date: Wed, 16 Jun 2004 16:00:22 -0500
From: John Bokma <postmaster@castleamber.com>
Subject: Re: sorting text
Message-Id: <40d0b4e9$0$203$58c7af7e@news.kabelfoon.nl>

Web Surfer wrote:

> [This followup was posted to comp.lang.perl.misc]
> 
> In article <3151a273.0406161230.1be6971b@posting.google.com>, 
> jamasd@hotmail.com says...
> 
>>Here is a sample of my data (each column is separated by tabs):
>>
>>1234123	jaesdf	ytkyk	345234

> while ( $buffer = <INPUT> ) {
> 	chomp $buffer;

Why? Now you have to add back the \n in the print.

> 	@fields = split(/\t+/,$buffer);
> 	if ( 3 > @fields ) { # Ignore if fewer than 3 fields
> 		next;

silly, the OP never specified that could happen. There are 4 fields btw, so
I would test for inequality, not less than.
Don't see any point in putting the constant to the left, btw. Silly C
coding convention IIRC.

-- 
John                               MexIT: http://johnbokma.com/mexit/
                            personal page:       http://johnbokma.com/
    Experienced Perl programmer available:     http://castleamber.com/
             Happy Customers: http://castleamber.com/testimonials.html


------------------------------

Date: Wed, 16 Jun 2004 17:16:47 -0400
From: Jeff 'japhy' Pinyan <pinyaj@rpi.edu>
Subject: Re: sorting text
Message-Id: <Pine.SGI.3.96.1040616171420.317727A-100000@vcmr-64.server.rpi.edu>

On Wed, 16 Jun 2004, John Bokma wrote:

>Web Surfer wrote:
>
>> 	if ( 3 > @fields ) { # Ignore if fewer than 3 fields
>
>silly, the OP never specified that could happen. There are 4 fields btw, so
>I would test for inequality, not less than.

Because it was the *third* field that contained the string the OP is
searching for.  Thus, skip any line that doesn't have enough fields.

>Don't see any point in putting the constant to the left, btw. Silly C 
>coding convention IIRC.

There's nothing wrong with it.  It's not "silly".  There is a point to it.
It stops you from accidentally writing = instead of == if you mean to do a
comparison.  Compare:

  if ($foo = 2) { ... }

to

  if (2 = $foo) { ... }

The coder *meant* to write ==, but only did =.  The first one is not an
error, and the if block is reached all the time.  The second one IS an
error.
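
The difference can be checked directly. A minimal sketch, assuming the
two snippets are compiled with string eval so that the compile-time error
can be caught instead of aborting the script; the helper name try_snippet
is hypothetical.

```perl
use strict;
use warnings;

# Compile and run a snippet with string eval. Returns the error text
# (empty string on success) followed by any warnings that were emitted.
sub try_snippet {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    local $@;
    my $ok = eval "my \$foo = 0; $code; 1";
    my $error = $ok ? '' : $@;
    return ( $error, @warnings );
}

# Variable on the left: legal Perl, so the typo goes through.
my ( $err1, @warn1 ) = try_snippet('if ($foo = 2) { }');

# Constant on the left: the same typo is rejected at compile time.
my ($err2) = try_snippet('if (2 = $foo) { }');

print "variable on the left: ", ( $err1 ? "error: $err1" : "compiles\n" );
print "  warning: $_" for @warn1;
print "constant on the left: ", ( $err2 ? "error: $err2" : "compiles\n" );
```

With warnings enabled, Perl may also flag the first form, but only as a
warning; the second form never gets as far as running.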

-- 
Jeff Pinyan         RPI Acacia Brother #734        RPI Acacia Corp Secretary
"And I vos head of Gestapo for ten     | Michael Palin (as Heinrich Bimmler)
 years.  Ah!  Five years!  Nein!  No!  | in: The North Minehead Bye-Election
 Oh.  Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)




------------------------------

Date: Wed, 16 Jun 2004 23:19:18 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sorting text
Message-Id: <2jbrqjF103ca7U1@uni-berlin.de>

John Bokma wrote:
> Gunnar Hjalmarsson wrote:
>> jamasd@hotmail.com wrote:
>>> Here is a sample of my data (each column is separated by tabs):
>>> 
>>> 
>>> 1234123    jaesdf    ytkyk    345234
>>> 1264345    ghgfdf    ghjhg    657658
>>> 3456765    sdasdf    ytkyk    456543
>>> 1231232    assffg    werwe    123454
>>> 5447454    asdqfr    ytkyk    254364
>>> 
>>> I am interested in creating a hash with two of the elements in
>>> the list ("ytkyk" and "ghjhg"). I would like to create a
>>> program to read only the third column and print the line (row)
>>> if it contains one of the latter items. Can anyone help me
>>> write a program? Here is what I have so far and I would like to
>>> create a more efficient program (I am going to use it for
>>> writing a larger program later):

<snip>

>>>       next unless ( index($_, 'ytkyk') >= 0 );
> 
>          next unless index($_, 'ytkyk');
> 
> The >= 0 test can be replaced, since it's clear it's not the first 
> position.

No, it can't. If the string is not found in $_, index() returns -1
which is a true value.
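
A quick check of the return values, as a minimal sketch. The sample line
is taken from the original post; the helper name contains is
hypothetical.

```perl
use strict;
use warnings;

my $line = "1234123\tjaesdf\tytkyk\t345234\n";

# index() returns the zero-based offset of the match, or -1 if there is none.
printf "present:        %d\n", index( $line, 'ytkyk' );
printf "absent:         %d\n", index( $line, 'ghjhg' );
printf "match at start: %d\n", index( $line, '1234' );

# -1 is true and 0 is false, so a bare truth test gets both edge cases
# wrong. Comparing against 0 is the reliable test.
sub contains {
    my ( $haystack, $needle ) = @_;
    return index( $haystack, $needle ) >= 0;
}

print "bare test on absent string is true\n" if index( $line, 'ghjhg' );
print "contains() on absent string is false\n"
    unless contains( $line, 'ghjhg' );
```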

>> What makes you believe that what you have is not efficient?
> 
> Maybe the OP forgot to explain the "sorting" part :-D.

Maybe. But it just struck me that the code will not print anything. I
would believe that this is what the OP meant to do:

     while ( <File> ) {
         print and next if index($_, 'ytkyk') >= 0;
         print and next if index($_, 'ghjhg') >= 0;
     }
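
The difference between the two loops can be sketched on the sample data.
This is an illustration only: the sub names both_required and
either_accepted and the in-memory list of lines are assumptions, not
anything from the original program.

```perl
use strict;
use warnings;

# Sample rows from the original post.
my @lines = (
    "1234123\tjaesdf\tytkyk\t345234\n",
    "1264345\tghgfdf\tghjhg\t657658\n",
    "3456765\tsdasdf\tytkyk\t456543\n",
    "1231232\tassffg\twerwe\t123454\n",
    "5447454\tasdqfr\tytkyk\t254364\n",
);

# The original logic: two chained "next unless" tests, so a line is kept
# only if it contains BOTH strings.
sub both_required {
    my @kept;
    for (@_) {
        next unless index( $_, 'ytkyk' ) >= 0;
        next unless index( $_, 'ghjhg' ) >= 0;
        push @kept, $_;
    }
    return @kept;
}

# The corrected logic: a line is kept if it contains EITHER string.
sub either_accepted {
    return grep { index( $_, 'ytkyk' ) >= 0 || index( $_, 'ghjhg' ) >= 0 } @_;
}

my @both   = both_required(@lines);
my @either = either_accepted(@lines);

printf "both required:   %d line(s)\n", scalar @both;
printf "either accepted: %d line(s)\n", scalar @either;
print @either;
```

No sample line contains both strings, which is why the original loop
stays silent.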

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Wed, 16 Jun 2004 23:29:58 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sorting text
Message-Id: <2jbsejFv8jppU1@uni-berlin.de>

Web Surfer wrote:
> jamasd@hotmail.com says:
>> Here is a sample of my data (each column is separated by tabs):
>> 
>> 1234123	jaesdf	ytkyk	345234
>> 1264345	ghgfdf	ghjhg	657658
>> 3456765	sdasdf	ytkyk	456543
>> 1231232	assffg	werwe	123454
>> 5447454	asdqfr	ytkyk	254364
>> 
>> I am interested in creating a hash with two of the elements in
>> the list ("ytkyk" and "ghjhg"). I would like to create a program
>> to read only the third column and print the line (row) if it
>> contains one of the latter items. Can anyone help me write a
>> program? Here is what I have so far and I would like to create a
>> more efficient program (I am going to use it for writing a larger
>> program later):
>> 
>> open( File, '<', 'file.txt' ) or die "$!\n";
>>   while ( <File> ) {
>>       next unless ( index($_, 'ytkyk') >= 0 );
>>       next unless ( index($_, 'ghjhg') >= 0 );
>>       print;
>>   }
>>   close( File );
> 
> ### Try this untested code ###
> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> my ( $buffer , @fields , $filename , %hash1 );
> 
> $filename = "file.txt";
> open(INPUT,"<$filename") or
>         die("Can't open file \"$filename\" : $!\n");
> 
> %hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );
> 
> while ( $buffer = <INPUT> ) {
> 	chomp $buffer;
> 	@fields = split(/\t+/,$buffer);
> 	if ( 3 > @fields ) { # Ignore if fewer than 3 fields
> 		next;
> 	}
> 	unless ( exists $hash1{$fields[2]} ) {
> 		next;
> 	}
> 	print "$buffer\n";
> }
> close INPUT;

Would creating a hash and involving the regex engine (through split())
be more efficient? What would a benchmark show?
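
One way to find out is the core Benchmark module. A minimal sketch,
assuming synthetic in-memory input (the five sample rows repeated); the
sub names by_index and by_hash are hypothetical, and the numbers will
vary with the data, the Perl version and the machine. Note that the two
subs are not strictly equivalent: index() matches anywhere in the line,
while the hash lookup checks only the third column.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input: the sample rows repeated many times (an assumption;
# real data may behave differently).
my @sample = (
    "1234123\tjaesdf\tytkyk\t345234\n",
    "1264345\tghgfdf\tghjhg\t657658\n",
    "3456765\tsdasdf\tytkyk\t456543\n",
    "1231232\tassffg\twerwe\t123454\n",
    "5447454\tasdqfr\tytkyk\t254364\n",
);
my @lines = (@sample) x 2000;

my %wanted = ( ytkyk => 1, ghjhg => 1 );

# Substring search anywhere in the line.
sub by_index {
    my $count = 0;
    for my $line (@lines) {
        $count++
            if index( $line, 'ytkyk' ) >= 0 || index( $line, 'ghjhg' ) >= 0;
    }
    return $count;
}

# Split on tabs and look the third column up in a hash.
sub by_hash {
    my $count = 0;
    for my $line (@lines) {
        my @fields = split /\t/, $line;
        $count++ if @fields > 2 && exists $wanted{ $fields[2] };
    }
    return $count;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, { 'index' => \&by_index, 'split_hash' => \&by_hash } );
```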

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Wed, 16 Jun 2004 15:34:57 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: SSH and Math::Pari prereq on AIX5.1
Message-Id: <cappb1$301r$1@agate.berkeley.edu>

[A complimentary Cc of this posting was sent to
W. Van Hooste
<wvanhooste@yahoo.com>], who wrote in article <495823bb.0406150153.5d7a5342@posting.google.com>:

> ==>During compilation of Math::Pari I notice that it completely
> ignores the fact i have no native C-compiler but use gcc...
> cc: unrecognized option `-qmaxmem=16384'
> cc: unrecognized option `-q32'
> cc: unrecognized option `-qlonglong'

Apparently, your Perl is misconfigured.  See README/INSTALL to find out how
(un)supported such a configuration is.  :-(

> Failed Test  Status Wstat Total Fail  Failed  List of failed
> -------------------------------------------------------------------------------
> t/00_Pari.t                 127    1   0.79%  109

I do not remember what 109 checks.  Can't check it for a couple of weeks either...

> ==>I did try force install and then installation of Crypt::Random
> (prereq) failed with PARI errors!

Crypt::Random was absolutely broken (w.r.t. PARI object creation)
until very recently.  Any appearance of it working was a pure
coincidence.

Hope this helps,
Ilya

P.S.  My mail has been extremely unreliable for the last two weeks.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6699
***************************************

