[23218] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 5439 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Sep 4 11:05:54 2003

Date: Thu, 4 Sep 2003 08:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 4 Sep 2003     Volume: 10 Number: 5439

Today's topics:
    Re: [newbie] list elements <postmaster@castleamber.com>
    Re: expression specific search and replace (qanda)
    Re: file upload script (JR)
    Re: Order of evaluation of expressions <abigail@abigail.nl>
        Output from external command under Win2K (Neil Palmer)
    Re: Perl Debugger, Windows NT and BSODs (Bohne)
        Small string problem <kraut@ukrmdotnet.cut>
    Re: Small string problem <shawn@magma.ca>
    Re: Small string problem <laocoon@fastmail.fm>
    Re: Text File Processing <raisin@SPAM-KILLER.mts.net>
    Re: Text File Processing <postmaster@castleamber.com>
    Re: Text File Processing (Tad McClellan)
    Re: What ever happened to comp.lang.perl ? <tcurrey@no.no.no.i.said.no>
    Re: What ever happened to comp.lang.perl ? <tcurrey@no.no.no.i.said.no>
        Why does perl think there's a regex? <no_thanks@bms.umist.ac.uk>
    Re: Why does perl think there's a regex? <abigail@abigail.nl>
    Re: Why does perl think there's a regex? <bharnish@technologist.com>
    Re:  <bwalton@rochester.rr.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 04 Sep 2003 16:33:49 +0200
From: John Bokma <postmaster@castleamber.com>
Subject: Re: [newbie] list elements
Message-Id: <3f574db1$0$186$58c7af7e@news.kabelfoon.nl>

Vlad Tepes wrote:

> Guest1 <guest1@deskpro192.internal.sanger.ac.uk> wrote:
> 
> 
>>If I have an array @a = [ 1 2 3 4 2 3 ];
>>how I can get just the elements that are not the same?
> 
> 
> I'll do as Gunnar, and assume you mean 
> 
>     @a = qw( 1 2 3 4 2 3 );
> 
>>and what is the best way to store these values?
> 
> 
> You could perhaps use the fact that hashes have unique keys.
> Try building a hash with the values from the array:
> 
>     map $hash{$_} = 1, @a; # this will be more efficient in later perls..

how about

	my %hash;
	@hash{@a} = ( "" ) x @a;

-- 
Kind regards,       feel free to mail: mail(at)johnbokma.com (or reply)
                     virtual home: http://johnbokma.com/  ICQ: 218175426
John                web site hints: http://johnbokma.com/websitedesign/



------------------------------

Date: 4 Sep 2003 07:45:36 -0700
From: fumail@freeuk.com (qanda)
Subject: Re: expression specific search and replace
Message-Id: <62b4710f.0309040645.6ca1abcf@posting.google.com>

Sorry for not being precise.

The expression I gave was an (obviously wrong) guess.

The pattern I want to look for is an uppercase letter C, followed by a
forward slash, followed by alphanumeric characters, the pattern can
start at the beginning of a field or in the middle and it ends at
whitespace following alphanumeric characters of the end of field
character.  The pattern can be in any field position such as field 1,
field 3, field n, etc and can be in 0 or more fields in one record.

The data itself is spread over 50,000 to 500,000 files, each
containing several hundred thousand records.  These could contain say
100,000 unique strings that match this pattern, for example
C/abc
C/Abc
C/1DE

Every occurance of C/abc should be replaced by string_1, every
occurance of
C/Abc should be replaced by string_2, etc.

I think that I want the following (in pseudocode) but would appreciate
an example of this
or something better considering the performance running against
millions of
records.  I would assume it makes sense to use a hash to store each
pattern
found and then a search and replace with a counter ...

    for all files matching file specification
        open a file
        read a record
        for each PATTERN in record
            if PATTERN exists in the pattern hash
                replace the part that matched with
string_patternNumber
            else
                add PATTERN to hash
            endif
        endfor
    endfor

This may be nonsense so feel free to beat me up for it!  However I
hope it
explains the problem a bit better.

Thanks.


------------------------------

Date: 4 Sep 2003 06:42:58 -0700
From: jrolandumuc@yahoo.com (JR)
Subject: Re: file upload script
Message-Id: <b386d54b.0309040542.7b94a380@posting.google.com>

bbass@hotmail.com (boris bass) wrote in message news:<65fcb898.0309032337.27d3a45f@posting.google.com>...
> the following script
> 
> --------------------------
> 
> #!/usr/local/bin/perl
> 
> ######use lib $0 =~ m#(.+)[/\\]#;
> use CGI qw(:standard);
> ######use CGI::Carp qw/fatalsToBrowser/;
> 
> print header();
> print start_html("File Upload Example");
> print strong("Version "),$CGI::VERSION,p;
> 
> print h1("File Upload Example"),
>     '',
>     strong(""),
>     p,
>     '',cite(''),'';
> 
> @types = ('count lines','count words','count characters');
> 
> # Start a multipart form.
> print start_multipart_form(),
>     "Enter the file to process:",
>     filefield('filename','',45),
>     br,
>     checkbox_group('count',\@types,\@types),
>     p,
>     reset,submit('submit','Process File'),
>     endform;
> 
> # Process the form if there is a file name entered
> if ($file = param('filename')) 
>   {
>     $filename = "Cabbage.gif";
>     $UPLOAD_PATH = "";
>     $UPLOAD_FILE = $UPLOAD_PATH.$filename;
> 
>     print "FILE NAME: $UPLOAD_PATH$filename<br>";
>     open (OUTFILE,">$UPLOAD_FILE") or die "Could not open $UPLOAD_FILE
> - $!\n";
>     binmode(OUTFILE);
> 
>     while(read($file,$buffer,1024))
>       {
>         print OUTFILE $buffer;
>       }
>    print "Finished - Filname: $filename, Buffer: $buffer<BR><BR>";
>    close OUTFILE;
> 
> }
> 
> print end_html;
>                         
> ---------------------------------
> 
> is not working for me; the form is printed ok, but param() does not
> return a filename.
> 
> what could possibly be wrong?
> 
> if somebody could post a script that works for them, it would be
> greatly appreciated

### I found this upload script on the web several years ago.  I don't
know
### if it's the best solution, but I do know that it's worked for over
### 500 file uploads, without a single error.  Good luck.  

#!/usr/bin/perl

$thisurl = $ENV{'SERVER_URL'}.$ENV{'SCRIPT_NAME'};  
$upload_dir = 'yourdir'; # location for uploaded files

main(); 

sub main {
   read_net_input();  
   if( $GLOBAL{'UPLOAD'} ) { handle_upload(); }
   elsif( !$ENV{'PATH_INFO'} || $ENV{'PATH_INFO'} eq '/' ) {
show_dir_content(); }
   else { start_download( $ENV{'PATH_INFO'} ); }
} 

sub print_header {
   print "Content-Type: text/html\n\n";   
}

sub urldecode {
   local($in) = @_; 
   local($i, @input_list); 
   @input_list = split(/&/,$in);
   foreach $i (@input_list) {
      $i =~ s/\+/ /g;      # Convert plus's to spaces
      # Convert %XX from hex numbers to alphanumeric
      $i =~ s/%(..)/pack("c",hex($1))/ge;
      # Split into key and value.
      $loc = index($i,"=");
      $key = substr($i,0,$loc);
      $val = substr($i,$loc+1);
      $GLOBAL{$key} = $val;
   }
}

sub read_net_input {
   local ($i, $loc, $key, $val, $input);
   local($f,$header, $header_body, $len, $buf); 
   if ($ENV{'REQUEST_METHOD'} eq "GET")
      { $input = $ENV{'QUERY_STRING'}; }
      elsif ($ENV{'REQUEST_METHOD'} eq "POST")
      {  
          $len = 0;
          $input = ''; 
          while( $len != $ENV{'CONTENT_LENGTH'} ) {
             $buf = ''; 
             $len += sysread(STDIN, $buf, $ENV{'CONTENT_LENGTH'});
             $input .= $buf; 
         }
      }
   if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/
) {
      $boundary = '--'.$1;  # please refer to RFC1867 
       @list = split(/$boundary/, $input); 
       $header_body = $list[1]; 
       $header_body =~ /\r\n\r\n|\n\n/; # separate header and body 
       $header = $`;        # front part
       $body   = $';        # rear part
       $body =~ s/\r\n$//;  # the last \r\n was put in by Netscape
       $GLOBAL{'FILE_CONTENT'} = $body;  
       $header =~ /filename=\"(.+)\"/; 
       $GLOBAL{'FILE_NAME'} = $1; 
       $GLOBAL{'FILE_NAME'} =~ s/\"//g; # remove "s
       $GLOBAL{'FILE_NAME'} =~ s/\s//g; # make sure no space(include
\n, \r..) in the file name
       for( $i=2; $list[$i]; $i++) { 
          $list[$i] =~ s/^.+name=$//; 
          $list[$i] =~ /\"(\w+)\"/; 
          $GLOBAL{$1} = $'; 
      }
       return 1; 
   }

    urldecode($input); 
    1;
}

sub read_file {
   local($fname) = @_;
   local($content);
   open(FILE, "<$fname") || return '';
   while(<FILE>)
   {
      $content .= $_;
   }
   close(FILE);
   $content; 
}

sub read_dir {
   local($target_dir) = @_ ; 
   local($filename, $dir_content); 
   return 0 if( !$target_dir ); 
   opendir(DIR, $target_dir) || return 0; 
   $target_dir =~ s/^\.\///;        # remove ./ 
   $target_dir =~ /(.+)\/(.+)\/$/;  # find out upper level 
   $GLOBAL{'UP_LEVEL'} = $1;        # save upper level as a global
   if( $target_dir ) {
      $dir_content = "..back\n\n\n";
   }
   while($filename = readdir(DIR)) {
      if( $filename =~ /^\.|^\#|~$/ ) { next; } # skip hidden files
         $dir_content .= "$target_dir$filename\n"; 
      }
      closedir(DIR);
      $dir_content;
} 

sub format_html_output {
   local($content) = @_; 
   local(@filelist, $formated_content, $up_level); 
   return 0 if (!$content); 
   @filelist = split(/\n/, $content); 
   $foo = 0;
   $formated_content = "<CENTER><TABLE cellspacing=5 cellpadding=5
border=1>\n";
   foreach $f (@filelist) {
      if( $f eq '..back' ) { 
         $up_level =  $GLOBAL{'UP_LEVEL'}; 
         $formated_content = "<a
href=".$thisurl.'/'.$up_level.">$f</a><br>\n<P><CENTER><TABLE
cellspacing=5 cellpadding=5 border=1>\n";
         next;
      }
      if( !$f ) {
         next;
      }
      if( -d $f ) {
         $f = "$f/";
      }
         $foo++;
         if ($foo eq 5) {
            $formated_content .= '<TD><a
href='.$thisurl.'/'.$f.">$f</a><TR>\n";
            $foo = 0;
         }
         else {
            $formated_content .= '<TD><a
href='.$thisurl.'/'.$f.">$f</a>\n";
         }
      }
      $formated_content .= "</TABLE></CENTER>\n";
      $formated_content; 
}

sub show_dir_content {
   local($dir) = @_; 
   local($files, $f_files); 
   $dir = './' if (!$dir); # default to cgi dir
   $files = read_dir($dir); 
   $f_files = format_html_output($files); 
   print_header(); 
   print "
<HTML>
   <HEAD>
      <TITLE>File UpLoad/DownLoad</TITLE>
   </HEAD>
   <BODY BGCOLOR=\#FFFFFF >
   <CENTER>
   <H2>File Upload Screen</H2>
      <FORM METHOD=\"POST\"  ENCTYPE=multipart/form-data >
         <TABLE>
            <TR>
               <TD WIDTH=80%>File Name: <INPUT TYPE=\"file\"
NAME=\"file\"  SIZE=50  >
               </TD>
            </TR>
         </TABLE>
         <TABLE>
            <TR>
               <TD WIDTH=20%><INPUT TYPE=submit NAME=UPLOAD
VALUE=Upload >
               </TD>
            <TR>
         </TABLE>
         <INPUT TYPE=HIDDEN NAME=CURRENT_DIR VALUE=\"$dir\">
      </FORM>
   <font size=-1>Default Directory: $upload_dir</font></center>
<HR>
<I>
<P>\&nbsp\;</P>
</I>
</FONT>
</BODY>
</HTML>
"; 
   exit;  
}

sub show_file_not_found {
   print_header(); 
   print "<TITLE>Not Found</TITLE><H1>Not Found</H1> The requested
object does not exist on this server. The link you followed is either
outdated, inaccurate, or the server has been instructed not to let you
have it. Connection closed by foreign host.\n"
; 
   exit;  
} 
 
sub show_upload_failed {
   local($reason) = @_; 
   print_header(); 
   print "<TITLE>Upload Failed</TITLE><H1>Upload Failed</H1> The
requested object was not uploaded to the server. <br> Reason :
$reason. The server may have decided not let you write to the
directory specified. Please contact the web master for this prob
lem. Connection closed by foreign host.\n"; 
   exit;  
}
 
sub show_upload_success {
   local($uploaded_file) = @_; 
   local(@status_list) ; 
   $file_stats = `ls -la $uploaded_file`; 
   @status_list = split(/\s+/,  $file_stats); # bug fix in v00.01 
   print_header(); 
   print "
<HTML>
   <HEAD>
      <TITLE>File UpLoaded</TITLE>
   </HEAD>
   <BODY BGCOLOR=\#FFFFFF >
      <H2><center>File Transfer Successful</center></H2> 
      <table align = center width = 35% cell padding = 10 border = 6>
         <tr>
            <td>
               <blockquote>
               <br />
               Remote File Name : <FONT COLOR=\#FF0000>
$GLOBAL{'FILE_NAME'} </FONT>
               File Name : $filename   
               Location  : $upload_dir 
               File Size : $status_list[4]
               <br />
               Local Time: $status_list[5] $status_list[6]
$status_list[7]
               <br />   
               <a href=\"$ENV{'SCRIPT_NAME'}\"> Back </a> 
               </blockquote>
            </td>
         </tr>
      </table>
      <br />
         <center> 
            <form action = 'myform.pl' method = 'get'>
               <input type = 'hidden' name = '$filename'>
               <input type = 'Submit' value = 'Preprocess this
file...'>
            </form>
         </center>
   </BODY>
</HTML>
"; 
   exit; 
}
 
sub handle_upload {
   if( !$GLOBAL{'FILE_NAME'} ) { show_file_not_found(); } 
      $filename   = $GLOBAL{'FILE_NAME'}; 
      $filename =~ s/.+\\([^\\]+)$|.+\/([^\/]+)$/\1/;     
      if( $GLOBAL{'UPLOAD_DIR'} =~ /CURRENT/ ) { # change upload dir
to current
         $GLOBAL{'CURRENT_DIR'} =~ s/\s//g; 
         $upload_dir = $GLOBAL{'CURRENT_DIR'}; 
      }
   $write_file = $upload_dir.$filename; 
   open(ULFD,">$write_file")  || show_upload_failed("$write_file 
$!");
   print ULFD $GLOBAL{'FILE_CONTENT'}; 
   close(ULFD); 
   show_upload_success($write_file); 
   1;
}


------------------------------

Date: 04 Sep 2003 14:01:12 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Order of evaluation of expressions
Message-Id: <slrnblehd7.fa9.abigail@alexandra.abigail.nl>

Anno Siegel (anno4000@lublin.zrz.tu-berlin.de) wrote on MMMDCLVI
September MCMXCIII in <URL:news:bj7ace$kha$1@mamenchi.zrz.TU-Berlin.DE>:
!!  Mark Jason Dominus <mjd@plover.com> wrote in comp.lang.perl.misc:
!! > In article <slrnbk10md.srs.abigail@alexandra.abigail.nl>,
!! > Abigail  <abigail@abigail.nl> wrote:
!! > >Where is its order of evaluation documented? Where in the documentation
!! > >does it say that:
!! > 
!! > Yes, that's what I would like to know.
!! > Or, if it doesn't say that, I would like to know that it doesn't.
!! > 
!! > Does anyone have any actual facts?
!! > 
!! > I wasn't able to find anything about it in the manuals, but the
!! > manuals are pretty badly organized on basic matters like this, so I'm
!! > not sure I was looking in the right places.
!!  
!!  Like you, I was never able to find a general commitment in the docs.
!!  While Perl documentation is huge, and a moving target, I think it's
!!  safe to say that it is silent about the point.  I mean, *someone*
!!  would have found it by now :)
!!  
!!  Only some (few) operators are described individually as having left-right
!!  evaluation order.  Some that come to mind are the short-circuiting boolean
!!  operators, the list- and comma operators (both ","), and (I think)
!!  assignment ("="), though I'm not entirely sure of the latter.

Where in the documentation do you read that about comma in list context?
Perlop says:

       Comma Operator

       Binary "," is the comma operator.  In scalar context it
       evaluates its left argument, throws that value away, then
       evaluates its right argument and returns that value.  This
       is just like C's comma operator.

       In list context, it's just the list argument separator,
       and inserts both its arguments into the list.

which is explicite for the scalar case, but doesn't say anything about
evaluation order in list context. Elsewhere in the same manual page,
it is written:

       In the absence of parentheses, the precedence of list
       operators such as "print", "sort", or "chmod" is either
       very high or very low depending on whether you are looking
       at the left side or the right side of the operator.  For
       example, in

           @ary = (1, 3, sort 4, 2);
           print @ary;         # prints 1324

       the commas on the right of the sort are evaluated before
       the sort, but the commas on the left are evaluated after.

which suggest that if there's an evaluation order, it's from right
to left!


Abigail
-- 
perl -we '$@="\145\143\150\157\040\042\112\165\163\164\040\141\156\157\164".
             "\150\145\162\040\120\145\162\154\040\110\141\143\153\145\162".
             "\042\040\076\040\057\144\145\166\057\164\164\171";`$@`'


------------------------------

Date: 4 Sep 2003 07:33:15 -0700
From: perl@fdisk.org (Neil Palmer)
Subject: Output from external command under Win2K
Message-Id: <5e8ce568.0309040633.6b75ecd3@posting.google.com>

I'm working on a script that makes a call to an external program that
produces a bunch of postscript and PDF files.  The external program
normally issues a bunch of informational messages to STDOUT, and I
want to capture them and display them from the script.  That should be
easily enough done:

#!c:/perl/bin/perl.exe -w

$| = 1;
my $execute = 'C:/Progra~1/Appdir/app.exe';
my $param1 = '-ID=customerID';
my $param2 = '-format=formattingcodes';
my $command = "$execute $param1 $param2";

open (PIPE, "$command |") || die "Cannot pipe command\n";
while (my $line = <PIPE>) {
        print "$line\n";
}
close(PIPE);

#process newly created PS and PDF here

However, depending on what exactly we're producing from a given run,
the external command can take up to an hour to run.  And from what
I've seen, the output doesn't get sent to the screen until the command
has finished running.

What I want to do is have any output that the external app might
produce be displayed as it is produced, rather than be held until
everything is finished, to serve as a visual feedback that the script
is in fact running as it should.  I don't care about STDERR, only
STDOUT.

Any suggestions?  Any other information I can provide?

ActiveState perl v5.6.1, build 631, running under Win2K


------------------------------

Date: 4 Sep 2003 06:23:34 -0700
From: simjesse@aol.com (Bohne)
Subject: Re: Perl Debugger, Windows NT and BSODs
Message-Id: <bfedec4.0309040523.7bdcfde0@posting.google.com>

> Does it happen on every machine you've tried?  Have you considered completely
> rebuilding the machine?

--> it happened on my and my collegues NT machines, however, not on
the WIN XP machines I got!

> So, I fear you're alone.  You have much sympathy, but I doubt you'll find
> an easy answer in the newsgroup.  If you do figure it out, please let us
> (and the archives) know.

--> in the meantime I upgraded Active Perl to 5.8.0 and have had only
one system crash since (which is a major improvement).
Even so, I am not holding my breath until I will have tested this for
some weeks...

Until then, thanks for you sympathy, that is much nicer than someone
taking the p... (not looking at anyone in particular!)


------------------------------

Date: Thu, 4 Sep 2003 15:39:35 +0100
From: "PeterT" <kraut@ukrmdotnet.cut>
Subject: Small string problem
Message-Id: <bj7ir8$gq$1@news.ox.ac.uk>

I'm trying to extract the numerical part and the letter part of a short
string.

i.e.

$mix = "6KL"  -->  $num = "6"  and $let = "KL"

I can't find a way to do it with split, or substr

anybody can give me a clue? ;-)


-- 
petert




------------------------------

Date: Thu, 04 Sep 2003 10:51:55 -0400
From: Shawn Corey <shawn@magma.ca>
Subject: Re: Small string problem
Message-Id: <Gb2cnZ3BbcL6zMqiXTWJkA@magma.ca>

PeterT wrote:

> I'm trying to extract the numerical part and the letter part of a short
> string.
> 
> i.e.
> 
> $mix = "6KL"  -->  $num = "6"  and $let = "KL"
> 
> I can't find a way to do it with split, or substr
> 
> anybody can give me a clue? ;-)
> 
> 

$mix =~ /(\d+)([A-Za-z]+)/;
$num = $1;
$let = $2;



------------------------------

Date: Thu, 4 Sep 2003 17:01:29 +0200
From: Lao Coon <laocoon@fastmail.fm>
Subject: Re: Small string problem
Message-Id: <bj7k49$i4b$02$1@news.t-online.com>

"PeterT" <kraut@ukrmdotnet.cut> wrote in news:bj7ir8$gq$1@news.ox.ac.uk:

> I'm trying to extract the numerical part and the letter part of a short
> string.
> 
> i.e.
> 
> $mix = "6KL"  -->  $num = "6"  and $let = "KL"
> 
> I can't find a way to do it with split, or substr
> 
> anybody can give me a clue? ;-)

Read  perldoc perlre

E.g.

my ($num,$let) = $mix =~ / ( \d+ ) ( [[:alpha:]]+ ) /x; 



------------------------------

Date: Thu, 4 Sep 2003 08:12:52 -0500
From: Barry Kimelman <raisin@SPAM-KILLER.mts.net>
Subject: Re: Text File Processing
Message-Id: <MPG.19c0f6528f3fea2d9897f8@news.mts.net>

[This followup was posted to comp.lang.perl.misc]

In article <7a78a207.0309031827.76b3f46d@posting.google.com>, Greg 
Carlill (gregcarlill@energex.com.au) says...
> Hi All,
> 
> I have a text file that contains column data like:
> 
>        730                 B13                    
>        730                 B33                    
>        730                 B53                    
>        730                 B73                    
>        800               B10-1                    
>        800               B30-1                    
>        800               B50-1                    
>        800               B70-1                    
>       
> and want to get a output text file like this.
> 
> 730	B13, B33, B53, B73
> 800	B10-1, B30-1, B50-1, B70-1 
> 
> Perl 5.005_02 on NT is all I have to do this. What is the best way to
> attack this. I have almost no knowledge of Perl but am willing to
> learn what I need.
> 
> Thanks in advance
> Greg
> 

Perl has a very nice feature called "hashes". Hashes are essentially 
arrays whose key can be any type of value, such as an integer, a 
character string, etc...


#!/usr/bin/perl -w

$filename = "mydatafile.txt";
open(INPUT,"<$filename") or die("open failed : $!\n");

%mydata = ();
while ( $buffer = <INPUT> ) {
	chomp $buffer;
	$buffer =~ s/^\s+//; # delete leading whitespace
	@fields = split(/\s+/,$buffer);
	if ( exists $mydata{$fields[0]} ) {
		$mydata{$fields[0]} .= ", " . $fields[1];
	}
	else {
		$mydata{$fields[0]} = $fields[1];
	}
}
close INPUT;

foreach $keyval ( keys %mydata ) {
	print "$keyval $mydata{$keyval}\n";
}

-- 
---------

Barry Kimelman
Winnipeg, Manitoba, Canada
email : bkimelman@hotmail.com


------------------------------

Date: Thu, 04 Sep 2003 16:27:07 +0200
From: John Bokma <postmaster@castleamber.com>
Subject: Re: Text File Processing
Message-Id: <3f574c1f$0$195$58c7af7e@news.kabelfoon.nl>

Barry Kimelman wrote:

> [This followup was posted to comp.lang.perl.misc]
> 
> In article <7a78a207.0309031827.76b3f46d@posting.google.com>, Greg 
> Carlill (gregcarlill@energex.com.au) says...
> 
>>Hi All,
>>
>>I have a text file that contains column data like:
>>
>>       730                 B13                    
>>       730                 B33                    
>>       730                 B53                    
>>       730                 B73                    
>>       800               B10-1                    
>>       800               B30-1                    
>>       800               B50-1                    
>>       800               B70-1                    
>>      
>>and want to get a output text file like this.
>>
>>730	B13, B33, B53, B73
>>800	B10-1, B30-1, B50-1, B70-1 
>>
>>Perl 5.005_02 on NT is all I have to do this. What is the best way to
>>attack this. I have almost no knowledge of Perl but am willing to
>>learn what I need.
>>
>>Thanks in advance
>>Greg
>>
> 
> 
> Perl has a very nice feature called "hashes". Hashes are essentially 
> arrays whose key can be any type of value, such as an integer, a 
> character string, etc...
> 
> 
> #!/usr/bin/perl -w
> 
> $filename = "mydatafile.txt";
> open(INPUT,"<$filename") or die("open failed : $!\n");
> 
> %mydata = ();
> while ( $buffer = <INPUT> ) {

IIRC defined($buffer = <INPUT>) ..

> 	chomp $buffer;
> 	$buffer =~ s/^\s+//; # delete leading whitespace
> 	@fields = split(/\s+/,$buffer);


how about push(@{$mydata{$fields[0]}}, $fields[1]);

I also would recommend using sensible names for the fields:

	my($number, $bcode);

	push(@{$mydata{$number}}, $bcode)
              if (($number, $bcode) =~ /^\s+(\d+)\s+(\S+)/';

the \S+ could be made more specific probably.

> 	if ( exists $mydata{$fields[0]} ) {
> 		$mydata{$fields[0]} .= ", " . $fields[1];
> 	}
> 	else {
> 		$mydata{$fields[0]} = $fields[1];
> 	}
> }
> close INPUT;

or die....

> 
> foreach $keyval ( keys %mydata ) {
> 	print "$keyval $mydata{$keyval}\n";

	print "$keyval ", join(", ", @{mydata{$keyval}), "\n";

> }
> 

Which seperates the data and the presentation layer.

-- 
Kind regards,       feel free to mail: mail(at)johnbokma.com (or reply)
                     virtual home: http://johnbokma.com/  ICQ: 218175426
John                web site hints: http://johnbokma.com/websitedesign/



------------------------------

Date: Thu, 4 Sep 2003 09:47:46 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Text File Processing
Message-Id: <slrnblek4i.67f.tadmc@magna.augustmail.com>

John Bokma <postmaster@castleamber.com> wrote:

>> while ( $buffer = <INPUT> ) {
> 
> IIRC defined($buffer = <INPUT>) ..


You do not recall correctly.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Thu, 4 Sep 2003 07:43:27 -0700
From: "Trent Curry" <tcurrey@no.no.no.i.said.no>
Subject: Re: What ever happened to comp.lang.perl ?
Message-Id: <bj7jai$f41$1@news.astound.net>

"Anno Siegel" <anno4000@lublin.zrz.tu-berlin.de> wrote in message
news:bj6qjb$984$2@mamenchi.zrz.TU-Berlin.DE...
> Trent Curry <tcurrey@no.no.no.i.said.no> wrote in comp.lang.perl.misc:
> > "Tad McClellan" <tadmc@augustmail.com> wrote in message
>
> [...]
>
> > > The secondary indication was your proclivity for transposing
> > > letters and other signs of carelessness in composition.
> > >
> > > Those 2 heuristics where enough for me, even without the jsut.
> >
> > Rubbish. Complete rubbish. You are try again to miss label me with non
>                                          ^^^^^
>
> Again, right?  Oh boy, this is hard to pull off, isn't it?

You make no sense.

http://dictionary.reference.com/search?q=again




------------------------------

Date: Thu, 4 Sep 2003 07:53:54 -0700
From: "Trent Curry" <tcurrey@no.no.no.i.said.no>
Subject: Re: What ever happened to comp.lang.perl ?
Message-Id: <bj7ju5$f96$1@news.astound.net>


"Brian McCauley" <nobull@mail.com> wrote in message
news:u9oey0aiva.fsf@wcl-l.bham.ac.uk...
> tadmc@augustmail.com (Tad McClellan) writes:
>
> > Trent Curry <tcurrey@no.no.no.i.said.no> wrote:
> > >
> > > So let me get this stright, you are assosiating me with another person
or
> >                      ^^^^^^^              ^
> > > persons becuase of a typographical error?
> >              ^^
> >
> > Yes I am, but it was only a tertiary indication.
> >
> > I was reasonably sure it was you before I inserted the typo in my post.
>
> If Tad were to conclude that all persons who make these typographical
> errors were one and the same then he would conclude that I was Trent.
> I'm glad to say I'm not.

I didn't know I did anything wrong. If you want to see me as what Tad has
tried to present me as, then there is little I can do to stop you. Just know
you will be accepting a lie.

And if you seriously believe that I am the one & only to make such a typo,
you should do a search, like:
http://groups.google.com/groups?as_q=jsut&safe=images&ie=UTF-8&oe=UTF-8&as_ugroup=comp.lang.perl.misc&lr=&num=100&hl=en

   or

http://www.google.com/search?num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&q=jsut+group%3Acomp.lang.perl.misc&sa=N&tab=gw

   or

http://www.metacrawler.com/_1_2SLTTUW08HU4EK__info.metac/dog/webresults.htm?&qkw=jsut&qcat=web&method=0&top=1&start=&nextid=&ver=20697

So kindly, take $tads_absurd_obsession and shove it into /dev/null and leave
me the hell alone. You have no right to target people like this, especially
when you have shown zero real proof to back up your claims, which just
reviles there is no substance or any real reason behind them.




------------------------------

Date: Thu, 04 Sep 2003 14:41:28 +0100
From: Chris <no_thanks@bms.umist.ac.uk>
Subject: Why does perl think there's a regex?
Message-Id: <3f574a4b$1@news.umist.ac.uk>

Dear all,

I have a (fairly complicated) script which unmasks a string (or protein 
sequence, for those who are interested) where it has previously been 
partially masked by Xs. BTW not all lines are masked.

For example this (all one line):

QUERY     438 
QDYPRYXXXXXXXXXXXXXXXXXXXXY-RVVCLRDNRRRD-----------K-------T 478

Is converted to this (also all one line):

QUERY      438 
QDYPRYVPGFVVTVVTTFVAGVLVFVY-RVVCLRDNRRRD-----------K-------T  478

This has been working well for a while, but today for some strings the 
masked version has also got some '*' in the string. e.g.:

szpo03656 61 
FYCSFLAAKLPSQLISKKIGPNRWIPTQMVLWSIVYICQYSLSG-T-A-LF*I-MRWLLG 116

When my script comes across the '*' it fails with the following error 
(where <BLAST> is the input file being read):

'Quantifier follows nothing before HERE mark in regex m/* << HERE / at 
unfilter_edit.pl line 214, <BLAST> line 43.'

My question is why does perl think there is a regex on line 214 when 
there isn't? I still get the error even if I replace the '*' with 
s/\*/-/ !!! What on earth is going on!

Any help appreciated.
Chris.


The script does a lot of pre-processing and parsing of the original 
file, but the key subroutine is shown below. It is passed the 'masked' 
string '$seq', an array reference to the unmasked string (cut up into 
one-letter chunks) '$aref' and a marker of where to start the comparisons.

sub unfilt {

	my ($seq, $aref, $seq_start) = @_ or return;
	
	print "RAW1: $seq\n";
	
	$seq =~ s/\*/-/;
	
	print "RAW2: $seq\n";
	
	my @split_seq = split //, $seq;
	
	foreach my $code (@split_seq) {
		if ($code eq '-') {     <--------- line 214
			print $code;
		}
		elsif ($code =~ /x/i) {
			$code = $aref->[$seq_start-1];
			print $code;
			++$seq_start;
		}
		elsif ($code !~ /$aref->[$seq_start-1]/i) {
			printf "position %d doesn't agree
			$code:$aref->[$seq_start-1]\n", $seq_start-1;
		}
		else {
			print $code;
			++$seq_start;
		}
	}
}

BTW using perl 5.6.1 built for i686-linux.



------------------------------

Date: 04 Sep 2003 14:06:19 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Why does perl think there's a regex?
Message-Id: <slrnblehmr.fa9.abigail@alexandra.abigail.nl>

Chris (no_thanks@bms.umist.ac.uk) wrote on MMMDCLVI September MCMXCIII in
<URL:news:3f574a4b$1@news.umist.ac.uk>:
__  
__  When my script comes across the '*' it fails with the following error 
__  (where <BLAST> is the input file being read):
__  
__  'Quantifier follows nothing before HERE mark in regex m/* << HERE / at 
__  unfilter_edit.pl line 214, <BLAST> line 43.'
__  
__  My question is why does perl think there is a regex on line 214 when 
__  there isn't? I still get the error even if I replace the '*' with 
__  s/\*/-/ !!! What on earth is going on!


Perl's line numbers aren't always correct. One particular place
this happens is in if() statements. You don't have a regexp on
line 214, but you do have one on line 222.


Abigail
-- 
perl -MLWP::UserAgent -MHTML::TreeBuilder -MHTML::FormatText -wle'print +(
HTML::FormatText -> new -> format (HTML::TreeBuilder -> new -> parse (
LWP::UserAgent -> new -> request (HTTP::Request -> new ("GET",
"http://work.ucsd.edu:5141/cgi-bin/http_webster?isindex=perl")) -> content))
=~ /(.*\))[-\s]+Addition/s) [0]'


------------------------------

Date: Thu, 04 Sep 2003 14:07:49 GMT
From: Brian Harnish <bharnish@technologist.com>
Subject: Re: Why does perl think there's a regex?
Message-Id: <pan.2003.09.04.14.08.08.733241@technologist.com>

-----BEGIN xxx SIGNED MESSAGE-----
Hash: SHA1

On Thu, 04 Sep 2003 14:41:28 +0100, Chris wrote:

> Dear all,
> 
> I have a (fairly complicated) script which unmasks a string (or protein 
> sequence, for those who are interested) where it has previously been 
> partially masked by Xs. BTW not all lines are masked.

You're supposed to make a sample script that can be ran and shows your
problem. The script you provided doesn't run, because all it is is a
subroutine that is never called. Usually when you try creating the smaller
script, you figure the problem out for yourself.

 - Brian
-----BEGIN xxx SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/V0dDiK/rA3tCpFYRAn7cAJwP/VdfMlUbLwXTizbgxhXyp+T6+ACdHGUg
wi6cUArF359LnEY+DOAPYYo=
=DKsc
-----END PGP SIGNATURE-----



------------------------------

Date: Sat, 19 Jul 2003 01:59:56 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: 
Message-Id: <3F18A600.3040306@rochester.rr.com>

Ron wrote:

> Tried this code get a server 500 error.
> 
> Anyone know what's wrong with it?
> 
> if $DayName eq "Select a Day" or $RouteName eq "Select A Route") {

(---^


>     dienice("Please use the back button on your browser to fill out the Day
> & Route fields.");
> }
 ...
> Ron

 ...
-- 
Bob Walton



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5439
***************************************


home help back first fref pref prev next nref lref last post