[11796] in Perl-Users-Digest


Perl-Users Digest, Issue: 5396 Volume: 8

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Apr 16 02:07:30 1999

Date: Thu, 15 Apr 99 23:00:20 -0700
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 15 Apr 1999     Volume: 8 Number: 5396

Today's topics:
        Boolean keyword search in HTML files - boose.pl (0/1) (Dmitry Epstein)
        Boolean keyword search in HTML files - boose.pl (1/1) (Dmitry Epstein)
    Re: checking filenames.. (Tad McClellan)
    Re: Core (Matthew Bafford)
    Re: Creating an empty file. <hove@ido.phys.ntnu.no>
    Re: Creating an empty file. <cassell@mail.cor.epa.gov>
    Re: flocking question - worried (Tad McClellan)
    Re: FREE Certifications Offered Online to perl programm <uri@home.sysarch.com>
    Re: ftp for Windows NT? <cassell@mail.cor.epa.gov>
    Re: guestbook with CGI script <jbc@shell2.la.best.com>
    Re: How to make a var defined in more than one package? <cassell@mail.cor.epa.gov>
    Re: How to make a var defined in more than one package? (Ronald J Kimball)
        Newbie needs help with pattern search and concatenation <greg_harrison@analog.com>
    Re: Newbie needs help with pattern search and concatena <ebohlman@netcom.com>
        preserving blank spaces <bluesrift@aol.com>
    Re: Question about Connection <cassell@mail.cor.epa.gov>
    Re: Scalars .. (Tad McClellan)
        Special: Digest Administrivia (Last modified: 12 Dec 98 (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 16 Apr 1999 04:17:14 GMT
From: mitiaNOSPAM@nwu.edu.invalid (Dmitry Epstein)
Subject: Boolean keyword search in HTML files - boose.pl (0/1)
Message-Id: <3716b699.205250869@news.acns.nwu.edu>

The attached script searches HTML files in subdirectories using a
query string with boolean syntax.  It will eventually be incorporated
into a site search CGI, but for now this is a stand-alone script that
can search files on your computer.

This is more than a few lines of code typically posted to this
newsgroup, but there are plenty of comments to aid comprehension.  I
am posting this code hoping to get your comments and criticism.  Some
may also find this useful for their own projects.

Dmitry

--
Remove NOSPAM and .invalid from mitiaNOSPAM@nwu.edu.invalid


------------------------------

Date: Fri, 16 Apr 1999 04:17:15 GMT
From: mitiaNOSPAM@nwu.edu.invalid (Dmitry Epstein)
Subject: Boolean keyword search in HTML files - boose.pl (1/1)
Message-Id: <3717b9cb.206068877@news.acns.nwu.edu>

#!/usr/local/bin/perl
######################################################################
## Boolean keyword search in HTML files
## Written by Dmitry Epstein <mitia@nwu.edu>
##
## Feel free to use and modify this code.  Comments and suggestions
## are welcome.
##
## This script searches HTML files in subdirectories.  The query string
## can contain words, phrases in double quotes "", wildcards * and ?,
## boolean operators & (AND), | (OR), ! (NOT), and brackets ( ).
##
## Examples:
##   austr* & !(kangaroo | koala)
##   "macintosh apple?" AND NOT computer?
##   *@aol.com *@ibm.net *@hotmail.com
## (default operator is assumed where none is specified)
##
## For each HTML file where the query string was matched, the script
## prints file name, title, total word count, and the number of hits
## for each word or phrase.
##
## The following variables should be set:
##   $default_op - default operator
##   $case_sens  - case sensitive?
##   $search_dir - search directory
##   @excluded   - subdirectories to exclude from search
##   @file_ext   - extensions of files to search
######################################################################
## This script works by translating the query string into an
## expression that can be evaluated by Perl, where the variables
## are the word counts to be found later.  Each keyword or phrase
## is searched in a file, and the number of counts is substituted
## into the expression.  The expression is then eval'ed.
######################################################################

use strict;

## Perl 5 required to run this script
require 5;

## A standard module distributed with Perl 5
use File::Find;

######################## Set variables below #########################
## Default operator: || or &&
my $default_op = '||';

## Case-sensitive search?
my $case_sens = 0;

## Directory to search (absolute or relative to the script).
##   No trailing /
my $search_dir = '/your/search/directory';

## Subdirectories of $search_dir to exclude from search.
##   No trailing /
my @excluded = qw(your excluded subdirectories);

## Search files with these extensions
my @file_ext = qw(html htm shtml);
######################################################################

## HTML entities
my %html_ent = (
  '&quot;'   => '"',
  '&amp;'    => '&',
  '&lt;'     => '<',
  '&gt;'     => '>',
  '&nbsp;'   => ' '
);

print 'Enter query string: ';
chomp(my $query_str = <STDIN>);

my %keywords;  ## keyword -> word count
my %patterns;  ## keyword -> search pattern

## Parse the search string and get an expression for later evaluation;
## construct %keywords and %patterns
my $expression = &parse_str($query_str);

## Form patterns to determine valid files and subdirectories
my $file_ext = '\.'.(join '$|\.', @file_ext).'$';
my $excluded = '^'.$search_dir.'/(?:'.(join '$|', @excluded).'$)';

my @results;  ## file name, title, word count, and report

## Traverse the directory structure, performing search in all valid files
find(\&search_file, $search_dir);

## Sort results by word count (descending)
@results = sort {$b->[2] <=> $a->[2]} @results;

foreach (@results) {
  print $_->[0], ': ', $_->[1], "\n",
       'Total word count: ', $_->[2], "\n",
       $_->[3], "\n";
}


## This subroutine parses the search string, checks syntax, puts keywords and
## phrases into %keywords, puts corresponding search patterns into %patterns,
## and returns a Perl expression with word counts for later evaluation.
sub parse_str {

  my $str = shift;

  ## Replace all AND, OR, and NOT with &, |, and ! respectively
  my $str1;
  while () {
    ## Ignore quoted
    if ($str =~ /\G\s*("[^"]*")/gc) {
      $str1 .= $1;
    }

    ## Process unquoted
    elsif ($str =~ /\G([^"]+)/gc) {
      my $chunk = $1;
      $chunk =~ s/\bAND\b/&/g;
      $chunk =~ s/\bOR\b/|/g;
      $chunk =~ s/\bNOT\b/!/g;
      $str1 .= $chunk;
    }
    else { last; }
  }
  $str = $str1;

  ## Parse the search string and construct a Perl expression.
  my $expr;
  my $is_op =  1;
  while () {
    ## & |
    if ($str =~ /\G\s*([&|])/gc) {
      $expr .= $1.$1;  ## Replace single & and | with && and ||
      $is_op = 1;
    }

    ## ! ( )
    elsif ($str =~ /\G\s*([!()])/gc) {
      if (!$is_op && $1 ne ')') {
        $expr .= $default_op;  ## insert default operator if none specified
        $is_op = 1;
      }
      $expr .= $1;
    }

    ## Words and quoted phrases
    elsif ($str =~ /\G\s*"([^"]+)"/gc ||
           $str =~ /\G\s*([^\s&|!()]+)/gc) {

      my $word = $1;
      my $pattern = $1;

      $keywords{$word} = 0;

      ## Translate the word/phrase into a search pattern and add to %patterns
      $pattern =~ tr/ \t//s;           ## Single spaces between words
      $pattern =~ s/^ | $//g;          ## No leading or trailing spaces
      $pattern =~ s/([^\w*?])/\\$1/g;  ## Quote all non-word chars...
      $pattern =~ s/([*?])/\\w$1/g;    ## ...except for wildcards * and ?
      ## Define the boundaries of the pattern: \b if the first/last
      ## character is a \w, \B otherwise.
      $pattern = (($pattern =~ /^\\\W/) ? '\B' : '\b').$pattern;
      $pattern .= ($pattern =~ /\\\W$/) ? '\B' : '\b';

      $patterns{$word} = $pattern;

      ## insert default operator if none specified
      if (!$is_op) {
        $expr .= $default_op;
      }
      $is_op = 0;

      $word =~ s/\\/\\\\/g;
      $word =~ s/'/\\'/g;
      ## This will evaluate to the number of counts for the word/phrase
      $expr .= "\$keywords{'$word'}";
    }
    else { last; }
  }

  ## Check the syntax by trying to eval
  eval $expr;
  die $@ if $@;

  return $expr;
}


## Search HTML files for keywords
sub search_file {

  my $full_path = $File::Find::name;

  ## Skip if the subdirectory is to be excluded
  $File::Find::prune = 1 if $full_path =~ /$excluded/o;

  ## Check if the file is valid
  if ( -f && /$file_ext/o && (open(FILE, $_) ||
       ! warn "Couldn't open $full_path: $!\n") ) {

    ## Read the whole file
    my $allfile = join '', <FILE>;
    close FILE;

    $allfile =~ tr/\n/ /;         ## Remove line breaks
    ## Extract the title
    my ($title) = $allfile =~ m!<title>(.*?)</title>!i;
    $title = $_ unless defined $title && $title =~ /\S/;
    $allfile =~ s/<[^>]*?>/ /g;   ## Remove HTML tags
                                  ## Translate or remove HTML entities
    $allfile =~ s/(&.*?;)/$html_ent{lc $1}/eg;
    $allfile =~ tr/ \t//s;        ## Remove extra spaces

    ## Search for each pattern
    while (my ($word, $pattern) = each %patterns) {
      my $cnt = 0;
      if ($case_sens) { $cnt++ while $allfile =~ /$pattern/g; }
      else            { $cnt++ while $allfile =~ /$pattern/ig; }
      $keywords{$word} = $cnt;
    }

    ## Evaluate the expression
    if (eval($expression)) {
      ## Calculate total word count
      my ($tcount, $report);
      while (my ($word, $count) = each %keywords) {
        if ($count) {
          $tcount += $count;
          $report .= "  '$word' = $count\n";
        }
      }
      push @results, [$full_path, $title, $tcount, $report];
    }
  }
}
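
The approach described in the header comments (turn the query into a
Perl expression over per-keyword hit counts, then eval it) can be shown
in isolation.  The following is a simplified sketch, not the script
above: the names wildcard_to_regex, count_hits, and %count are made up
for the example, it handles single words only, it assumes each keyword
starts and ends with a word character, and the expression is written by
hand rather than parsed from a query string.

```perl
use strict;
use warnings;

# Turn a keyword with * and ? wildcards into a regex source string.
# Hypothetical helper: everything except the two wildcards is escaped.
sub wildcard_to_regex {
    my ($word) = @_;
    my $re = '';
    for my $ch (split //, $word) {
        $re .= $ch eq '*' ? '\w*'
             : $ch eq '?' ? '\w?'
             :              quotemeta($ch);
    }
    return '\b' . $re . '\b';
}

# Count case-insensitive matches of one keyword in a text.
sub count_hits {
    my ($text, $word) = @_;
    my $re = wildcard_to_regex($word);
    my $n = () = $text =~ /$re/gi;   # list assignment in scalar context
    return $n;
}

my $text  = 'Australia has kangaroos; Austria has none.';
my %count = map { ($_ => count_hits($text, $_)) } qw(austr* kangaroo* koala);

# Hand-written equivalent of the query:  austr* & !(kangaroo* | koala)
my $expression =
    q{$count{'austr*'} && !($count{'kangaroo*'} || $count{'koala'})};

my $matched = eval $expression;
die $@ if $@;
print $matched ? "match\n" : "no match\n";
```

Because the counts live in a hash, the same expression string can be
evaluated again for each file after the counts are refreshed.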


------------------------------

Date: Fri, 16 Apr 1999 00:30:00 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: checking filenames..
Message-Id: <8ce6f7.ua9.ln@magna.metronet.com>

Terra Landry (terral@cyberplex.com) wrote:
: I have a script written that checks if a filename exists.. if it does, I
: want to add a sequential number to the end of the filename... how do I get
: the greatest sequential number from the filenames that already exist??

-----------------------
#!/usr/bin/perl -w
use strict;

my $filename = 'foo';  # the "base" part of the filename

opendir(DIR, '.') || die "could not open current directory  $!";

# get the list of numbers, determine largest
my $largest = 0;
foreach (grep s/^\Q$filename\E(\d+)$/$1/, readdir(DIR)) {
   $largest = $_ if $_ > $largest;
}
closedir(DIR);

print "$largest\n";
-----------------------
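
Once the largest number is known, building the next name is one more
step.  A minimal sketch, assuming the existing names are already in a
list; next_name is a hypothetical helper, not part of the script above:

```perl
use strict;
use warnings;

# Given a base name and a list of existing file names, return the base
# name with the next unused sequence number appended.
sub next_name {
    my ($base, @existing) = @_;
    my $largest = 0;
    for my $name (@existing) {
        # \Q...\E keeps characters such as "." in the base name literal
        next unless $name =~ /^\Q$base\E(\d+)$/;
        $largest = $1 if $1 > $largest;
    }
    return $base . ($largest + 1);
}

print next_name('foo', qw(foo1 foo2 foo10 foobar food7)), "\n";   # foo11
```

Note that checking and then creating is not atomic: if two processes
may run this at once, the create step still has to cope with the name
having been taken in the meantime.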



: I need an answer quick!!


   If you _want_ an answer quick, you might take a stab at
   posting to Usenet.

   If you _need_ an answer quick, then it would be Bad Business
   to rely on Usenet. 

   You should hire someone to help you solve the problem if
   you truly *need* quick response.

   Usenet is unreliable.

   It may be a day before you get an answer.

   You may never get an answer...


--
    Tad McClellan                          SGML Consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: Fri, 16 Apr 1999 03:35:44 GMT
From: dragons@dragons.duesouth.net (Matthew Bafford)
Subject: Re: Core
Message-Id: <slrn7hdb2k.g16.dragons@dragons.duesouth.net>

On Thu, 15 Apr 1999 20:27:56 -0400, Chris Lambrou <cglcomputer@earthlink.net>
lucked upon a computer, and thus typed in the following:
) One of our PERL scripts on our Web Server dumps the core, peridically.
) 1. When will PERL dump a core file?  And under what circumstances?

Only if there is a serious bug in Perl.  If Perl dumps for any reason
other than you explicitly telling it to, you need to upgrade immediately.
If the bug persists, then report it using the perlbug utility.

) 2. How can we identify the offending script?

Try them one by one?  Also, since a core dump is just that, try looking
through it for the name of your script.  Possibly the filename was
hanging around at dump time.
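
One way to look through a core file is to pull out the runs of
printable text, much as strings(1) does.  This is only a sketch:
printable_matches is a made-up helper, the file name 'core' in the
usage comment is an assumption, and there is no guarantee the script
name is in the dump at all.

```perl
use strict;
use warnings;

# Return runs of printable ASCII, at least $min characters long, that
# match $pattern.
sub printable_matches {
    my ($data, $pattern, $min) = @_;
    $min = 4 unless defined $min;
    my @runs = $data =~ /([\x20-\x7e]{$min,})/g;
    return grep { /$pattern/ } @runs;
}

# Usage sketch against a real file (path is an assumption):
#   open(my $fh, '<', 'core') or die "core: $!";
#   binmode $fh;
#   my $data = do { local $/; <$fh> };
#   print "$_\n" for printable_matches($data, qr/\.(?:pl|cgi)\b/);

# Demonstration on fake binary data:
my $fake = "\0\0\x01/home/web/cgi-bin/guestbook.pl\0\xff\xfeHOME=/root\0";
print "$_\n" for printable_matches($fake, qr/\.(?:pl|cgi)\b/);
```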

) Gracias.

HTH,

--Matthew


------------------------------

Date: 15 Apr 1999 11:01:00 +0200
From: Joakim Hove <hove@ido.phys.ntnu.no>
Subject: Re: Creating an empty file.
Message-Id: <k0niuays1f7.fsf@ido.phys.ntnu.no>

Steven Filipowicz <s.filipowicz@orades.nl> writes:

> I have this line in a program :
> 
> ------------------------------------------------------------------
> open (GUEST,">/opt/guide/web/count.pl") || die "Can't Open
> /opt/guide/web/count.pl $!\n";
> ------------------------------------------------------------------
> 
> But if this file isn't there it will not open offcourse.

Yes it will. As long as you have write-access to the directory where
you want to open a file the command above will work perfectly. If you
don't have write-access to the actual directory $! will contain
"Permission denied".

> So what I would like the program to do is, make the file, and set the
> permission to be able to write to it and then open it.

If you insist on first creating a file and setting the mode manually:

perldoc -f chmod

to change the mode of a file. The Unix command "touch foo" (man
touch) will create an empty file "foo", but I don't know of any perl
version of this.

HTH Joakim


-- 
=== Joakim Hove     www.phys.ntnu.no/~hove/    ======================
# Institutt for fysikk	(735) 93637 / 352 GF  |  Skoyensgate 10D    #
# N - 7034 Trondheim	hove@phys.ntnu.no     |	 N - 7030 Trondheim #
=====================================================================


------------------------------

Date: Thu, 15 Apr 1999 21:20:58 -0700
From: David Cassell <cassell@mail.cor.epa.gov>
Subject: Re: Creating an empty file.
Message-Id: <3716BAAA.BFE0E476@mail.cor.epa.gov>

Joakim Hove wrote:
> 
> Steven Filipowicz <s.filipowicz@orades.nl> writes:
> 
> > I have this line in a program :
> >
> > ------------------------------------------------------------------
> > open (GUEST,">/opt/guide/web/count.pl") || die "Can't Open
> > /opt/guide/web/count.pl $!\n";
> > ------------------------------------------------------------------
> >
> > But if this file isn't there it will not open offcourse.
> 
> Yes it will. As long as you have write-access to the directory where
> you want to open a file the command above will work perfectly. If you
> don't have write-access to the actual directory $! will contain
> "Permission denied".

Right.  If the file does not exist, open() creates it.  So...

> > So what I would like the program to do is, make the file, and set the
> > permission to be able to write to it and then open it.
> 
> If you insist on first creating a file and setting the mode manually:
> 
> perldoc -f chmod
> 
> to change the mode of a file. The Unix command "touch foo" (man
> touch) will create an empty file "foo", but I don't know of any perl
> version of this.

But you do!  Right up there, in between the lines of dashes.  
If you open() the file and don't print anything to it, what do you have?

BTW, if you want to open the file and create the permissions at the
*same* time, you might look into the sysopen() function.

use FileHandle;
sysopen FH, $pathname, O_RDWR|O_CREAT|O_EXCL, 0755
    or die "an agonizing death when $pathname $!";

And make sure you write that mode as 0xyz instead of xyz, so you
get octal.
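
To make the octal point concrete, and to show the empty-file case on
its own, here is a small sketch.  create_empty is a hypothetical helper
name, and the constants are imported from Fcntl here rather than via
FileHandle:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# A leading 0 makes a Perl numeric literal octal.
printf "0755 is %d decimal; 755 decimal is %o octal\n", 0755, 755;

# oct() converts a string such as "755" (say, from a config file).
printf "oct('755') == 0755: %s\n", oct('755') == 0755 ? 'yes' : 'no';

# Create an empty file with the given mode (subject to the umask).
# Returns false if the file already exists or cannot be created.
sub create_empty {
    my ($path, $mode) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, $mode)
        or return 0;
    close $fh;
    return 1;
}
```

A call such as create_empty($pathname, 0644) leaves a zero-length file
behind, and returns false on a second call because of O_EXCL.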

HTH,
David
-- 
David Cassell, OAO                               cassell@mail.cor.epa.gov
Senior Computing Specialist                          phone: (541) 754-4468
mathematical statistician                              fax: (541) 754-4716


------------------------------

Date: Thu, 15 Apr 1999 22:41:28 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: flocking question - worried
Message-Id: <o086f7.m89.ln@magna.metronet.com>

Daniel Beckham (danbeck@scott.net) wrote:
: ooh, can you do that?  That race condition has always bothered me.  I was 
: under the assumption that you had to unlock it first...


   Ack!

   The act of unlocking before the close is what *creates* 
   the race condition.

   If you just close, then there _is_ no race to worry about.
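
   A minimal sketch of that pattern, for illustration only
   (append_locked is a made-up name): take the lock, write, and
   let close() flush and release in one step.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one line under an exclusive lock.  There is deliberately no
# flock($fh, LOCK_UN): close() writes out the buffer and drops the lock.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "Can't open $path: $!";
    flock($fh, LOCK_EX)       or die "Can't lock $path: $!";
    print {$fh} $line, "\n";
    close($fh)                or die "Can't close $path: $!";
    return 1;
}
```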



: In article <slrn7ha4l9.j6n.billynospam@wing.mirror.bt.co.uk>, 
: > 
: > are there any fully portable solution to this problem? I'm currently not
: > bothering unlock files, but am letting close() do it for me.


--
    Tad McClellan                          SGML Consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: 16 Apr 1999 01:09:58 -0400
From: Uri Guttman <uri@home.sysarch.com>
Subject: Re: FREE Certifications Offered Online to perl programmers
Message-Id: <x7g161m9qw.fsf@home.sysarch.com>

>>>>> "l" == ludlow1435  <ludlow1435@my-dejanews.com> writes:

  l> To qualify as a Certified perl Programmer, you must pass the
  l> examination with a score of 2.75 or higher.  To be certified as a
  l> Master perl Programmer, you must obtain a score of 4.00 or higher.

this is a fairly bogus test. i just got a 4.25 (or 34/40 right) score so
i am certified as a master perl hacker. hell i BOUGHT for $3 a
certificate from the perl mongers at the last conference that said the
same thing.  some of the questions are just wrong like saying the
difference between use and require is compile time vs. runtime without
mentioning import. also a perl/tk question was asked which is stupid as
it is not needed to be fluent in perl. other questions were not clear or
the answer choices were ambiguous. on most of my wrong answers i
actually timed out since i wasn't paying attention to the tiny ticker in
the message bar.

they should have more real perl experts review the questions before they
infect the world with this test. and as always a simple 40 question test
does not gauge diddly about your real ability. i can impress interviewers
with my perl bullshit very easily. i don't need this certificate on my
resume. maybe some newbie types who want to get their foot in the door
would want it. real high level certifications like the bar or cpa exams
cover hours and days and real depth of knowledge. i just happened to
know the little things they asked here and wasn't tripped up by the
my/local, closures, and other things they asked about. how many times
are closures used in the real world? i have never coded one in 7+ years
of perl at work. not that it doesn't have its use but i never needed
it. i am surprised they didn't ask about formats which i also have never
used. in fact many of the questions were set up as tricks and/or poor
code. some ask you what will be printed with this code. why not just run
it through perl as you are writing it. better questions would be to ask
which statement/regex would do what is asked for.

so take this for the fun of it and if you are insecure enough in your
perl knowledge to want a certificate use this. (i suspect most u$hit certified
morons are just very insecure about their actual abilities that they
need that rag to bolster their ego and up their fees).

uri


-- 
Uri Guttman  -----------------  SYStems ARCHitecture and Software Engineering
uri@sysarch.com  ---------------------------  Perl, Internet, UNIX Consulting
Have Perl, Will Travel  -----------------------------  http://www.sysarch.com
The Best Search Engine on the Net -------------  http://www.northernlight.com


------------------------------

Date: Thu, 15 Apr 1999 21:24:25 -0700
From: David Cassell <cassell@mail.cor.epa.gov>
Subject: Re: ftp for Windows NT?
Message-Id: <3716BB79.1CF83A83@mail.cor.epa.gov>

ToddNKay wrote:
> 
> Hello all,
> Has anyone written a perl script for Windows NT that transfers a file from a
> user's machine to the web server executing the script?
> 
> Any suggestions or samples that could point me in the right direction?  I've
> written a similar script for UNIX, but even something as simple as opening
> the user's file for reading doesn't work on our NT server.

You might look at the Net::FTP module in CPAN.

If the docs that come with it don't help enough, you may want to look
at the FTP material in the Perl Cookbook.

HTH,
David
-- 
David Cassell, OAO                               cassell@mail.cor.epa.gov
Senior Computing Specialist                          phone: (541) 754-4468
mathematical statistician                              fax: (541) 754-4716


------------------------------

Date: 16 Apr 1999 03:00:17 GMT
From: John Callender <jbc@shell2.la.best.com>
Subject: Re: guestbook with CGI script
Message-Id: <3716a7c1$0$236@nntp1.ba.best.com>

Vegard Kesa <vkaasa@c2i.net> wrote:
> I'm trying to make a guestbook in perl. I've been using the book"CGI
> Programming on the world wide web, by O'Reilly and Associates. (the one with
> the mouse on the front).

For a somewhat simpler guestbook, or at least a different perspective,
you might try checking out the CGI tutorial I wrote, at:

http://www.lies.com/begperl/

It includes a guestbook script, with lots of hopefully-newbie-friendly
annotation.

-- 
John Callender
jbc@west.net
http://www.west.net/~jbc/


------------------------------

Date: Thu, 15 Apr 1999 20:20:38 -0700
From: David Cassell <cassell@mail.cor.epa.gov>
Subject: Re: How to make a var defined in more than one package?
Message-Id: <3716AC86.5B17F906@mail.cor.epa.gov>

sstarre@my-dejanews.com wrote:
> 
> If you're going to lecture me about not doing my research project on this
> topic PLEASE PLEASE just press NEXT and move on. Otherwise, a kind sentance
> or two pointing me in the right direction would be most appreciated.

I know we've given your posts a roasting before, but please don't take
it personally.  Everyone here gets a little lecture when they don't
'do the right thing'.  And you have done all the right things [except
for the 1st two lines above :-].  You have a good question on a hard
topic, with an explanatory subject line.
 
> I want to use:
> 
> #main.pl
>  use strict;
>  require './mypackage.pl';
>  my varx = 'a';
>  # I'd like to see this var from mypackage!

I know the 'varx' is a typo and you mean $varx here.  But you just
used 'my'.  Now if you have variables that you want to make *private*
to a package [or module - they're pretty much the same thing], then 
you declare them with 'my', and they can't be seen outside the 
lexical scope of the package.  So if you have your package as a separate
file, then that file is the lexical scope.

What you want to do is the other usual way of declaring variables
when you 'use strict'.  This will make $varx and $vary and @rray global:

use vars qw($varx $vary @rray);

> 
> #mypackage.pl
>  use strict;
>  #I wish I could use varx without saying my varx again!

And now you can.

   $main::varx = 3.1415926;
          ^^^^
And the `$' goes at the front.
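
Put together in one file, the idea might look like the sketch below.
The package name MyPackage and the two subs are made up for the
illustration and stand in for the separate mypackage.pl:

```perl
use strict;
use warnings;

package main;
use vars qw($varx);        # declares the package global $main::varx
$varx = 'a';

package MyPackage;         # hypothetical stand-in for mypackage.pl

# Under "use strict" another package reaches the variable through its
# fully qualified name.
sub get_varx { return $main::varx }
sub set_varx { $main::varx = shift; return }

package main;

print MyPackage::get_varx(), "\n";   # a
MyPackage::set_varx(3.1415926);
print "$varx\n";                     # 3.1415926
```

Had $varx been declared with my, the subs in the other package could
not have seen it by any name.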

> I've studied Camel sections on bless, and tie, all of which seem to be
> related to OO design. For example, tie is shown as
> 
>     tie variable, classname, list
> 
> and offers this for assistance: "CLASSNAME is the name of a class
> implementing objects of an appropriate type" [1] Can someone translate this
> using words like "packages" and "variable names"?

I think you might want to read through chapter 12 of the Perl Cookbook.
It has the clearest presentation on packages and libraries and modules
[guess what the chapter title is? :-] that I've seen.  YMMV.

>                                                   I'm not really big on OOD,
> seems like a lot of fluff...

A lot of it is hype (IMHO).  But it has some real advantages in the
proper place.  Still, you can work with packages and libraries without
having to immerse yourself in OOP.  Just remember, OOP spelled backwards
is 'POO'. 

>                               The solution with the fewest characters and
> lines is the best solution!

Not if you read the entries in the Obfuscated Perl Contest!  :-)
But seriously, sometimes a little OOP gives you the shortest, simplest
program.  Even if you don't write the OO but just apply it.  For 
example, the "Five Quick Hacks: Downloading Web Pages" article in the 
latest issue of The Perl Journal. All short, all sweet, all using 1 or 2
modules and some showing more OOD than others.

> 
> Cheers && Hugs,
> S

HTH,
David
-- 
David Cassell, OAO                               cassell@mail.cor.epa.gov
Senior Computing Specialist                          phone: (541) 754-4468
mathematical statistician                              fax: (541) 754-4716


------------------------------

Date: Thu, 15 Apr 1999 23:44:31 -0400
From: rjk@linguist.dartmouth.edu (Ronald J Kimball)
Subject: Re: How to make a var defined in more than one package?
Message-Id: <1dqbtms.rhfce61vgr8o0N@p54.tc2.state.ma.tiac.com>

<sstarre@my-dejanews.com> wrote:

> If you're going to lecture me about not doing my research project on this
> topic PLEASE PLEASE just press NEXT and move on. Otherwise, a kind sentance
> or two pointing me in the right direction would be most appreciated.

sigh...

> I've studied Camel sections on bless, and tie, all of which seem to be
> related to OO design.

Yep, that's what bless and tie are for.  Have you looked at the
documentation for my()?  That should be more helpful in answering your
question.

[Hint: don't use my() for this.]

-- 
 _ / '  _      /       - aka -
( /)//)//)(//)/(   Ronald J Kimball      rjk@linguist.dartmouth.edu
    /                                http://www.tiac.net/users/chipmunk/
        "It's funny 'cause it's true ... and vice versa."


------------------------------

Date: Fri, 16 Apr 1999 00:24:31 -0400
From: greg <greg_harrison@analog.com>
Subject: Newbie needs help with pattern search and concatenation snippet
Message-Id: <3716BB7F.A15816EC@analog.com>

Hi guys..

I'm trying to write a perl script to do some text file manipulation and
need a bit of help.

Here's what I am trying to do: Assuming I have read a file in using

    # read it in
        @lines = <FILE>;
        close(FILE);

I now need to traverse each line, see if it has a string in it that was
specified by the user.
If it does, I then need to check to see if the line ends with a
semicolon.  If it does, I want to leave the line intact and go on to the
next line.
If it doesn't, I need to join it with the following line.
I then want to check this new line (which has been formed
by the concatenation of the two) and if it still doesn't end with a
semicolon, then join it with the next line again....etc..etc..until it
finally ends with a semicolon. That's my main
question.  I'm trying to traverse the lines that have been read in with
         foreach $line (@lines)
         {
                   print "$i | $line" if $line =~ /$search_string/;
                    $i++;
                     .
                     .     # Here's where I need the pattern
                     .     # matching / concatenation code
          }

Once I have a line fully concatenated until it ends with a semicolon,
I will either dump it to stdout or to a file (as specified by user).

If anyone could help me with a perl snippet, I'd  appreciate it.
There's a lot more meat to this program than I am mentioning here
(command line args, verbose matching, output to
screen or file, etc) so the script is longer than I wish to post here.
If anyone wishes to see the full script, let me know and I will email it
to any interested party.

If you wish to email me, please replace the "_" in my email address with
a "." and it will get to me (I don't want a bot picking up my email addy
for spam lists :-P ) I'd appreciate CC-ing me when posting a reply so
that I'm sure to see it.

Thanks in advance..
Greg Harrison
greg_harrison@analog.com





------------------------------

Date: Fri, 16 Apr 1999 04:58:00 GMT
From: Eric Bohlman <ebohlman@netcom.com>
Subject: Re: Newbie needs help with pattern search and concatenation snippet
Message-Id: <ebohlmanFA9n4o.6uq@netcom.com>

greg <greg_harrison@analog.com> wrote:
: Here's what I am trying to do: Assuming I have read a file in using

:     # read it in
:         @lines = <FILE>;
:         close(FILE);

: I now need to traverse each line, see if it has a string in it that was
: specified by the user.
: If it does, I then need to check to see if the line ends with a
: semicolon.  If it does, I want to leave the line intact and go on to the
: next line.
: If it doesn't, I need to join it with the following line.
: I then want to check this new line (which has been formed
: by the concatenation of the two) and if it still doesn't end with a
: semicolon, then join it with the next line again....etc..etc..until it
: finally ends with a semicolon. That's my main

It sounds to me like the "basic unit" of your file is "a bunch of lines, 
the last of which ends with a semicolon."  If that's the case, you ought 
to take advantage of Perl's ability to read arbitrary chunks of text, 
rather than just lines, from files.  Something like:

open(INPUT,"mytext") or die "Couldn't open input: $!";
{
  local $/ = ";\n";   # read up to each semicolon + newline, not each line
  while (<INPUT>) {
    print if /$search_pattern/o;
  }
}
close INPUT;

Ought to work.  Depending on the complexity of your search pattern, you 
might want to translate embedded newlines in the text you read into 
spaces, or use the '/s' modifier on the match pattern if your search 
string includes '.'.
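
A self-contained sketch of the same idea, with the newline translation
included.  The helper name matching_records is made up, and an
in-memory filehandle stands in for the real input file:

```perl
use strict;
use warnings;

# Read semicolon-terminated records from $text and return those that
# match $pattern, with embedded line breaks collapsed to one space.
sub matching_records {
    my ($text, $pattern) = @_;
    open(my $in, '<', \$text) or die "Couldn't open in-memory file: $!";
    local $/ = ";\n";          # a "line" now ends at semicolon + newline
    my @found;
    while (my $record = <$in>) {
        my $terminated = chomp $record;   # chomp strips the current $/
        $record =~ s/\s*\n\s*/ /g;        # join the continuation lines
        $record .= ';' if $terminated;    # put the semicolon back
        push @found, $record if $record =~ /$pattern/;
    }
    close $in;
    return @found;
}

my $text = "int a;\nint b,\n    c;\nchar d;\n";
print "$_\n" for matching_records($text, qr/int/);
```

With the sample text this prints "int a;" and "int b, c;".  A trailing
chunk that never reaches a semicolon is returned as-is.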



------------------------------

Date: Fri, 16 Apr 1999 05:21:15 GMT
From: "Rob Bell" <bluesrift@aol.com>
Subject: preserving blank spaces
Message-Id: <fNzR2.89$C76.56@c01read02.service.talkway.com>

Within an html form textarea tag one may enter multiple spaces to
separate words or whatever.  I am pulling the content of this textarea
and printing it into a new html page and would like to preserve those
spaces which are normally collapsed by browsers when rendering html.  I
know this must involve the usage of &nbsp; yet a simple substitution of
&nbsp; for a blank space does not seem to work such as below:

$form{'THETEXTAREA'} =~ s/ /\&nbsp\;/g;

Please suggest any other method I could use.

Thank you
Rob Bell

Surf Usenet at home or on the road -- always at Talkway.
http://www.talkway.com



------------------------------

Date: Thu, 15 Apr 1999 20:24:45 -0700
From: David Cassell <cassell@mail.cor.epa.gov>
Subject: Re: Question about Connection
Message-Id: <3716AD7D.6AABDD26@mail.cor.epa.gov>

Abigail wrote:
> 
> James Hill (jrhill@writeme.com) wrote on MMLIII September MCMXCIII in
> <URL:news:37165D17.59AC@writeme.com>:
> `` How does one get perl to read an HTML file from another server than the
> `` one it resides on?  Thanks!!
> 
> There are many possibilties. You can for instance NFS mount the disk
> of the remote server, and then access it using open () and <>.
> 
> Alternatively, you could use UUCP on the remote server to copy the file
> to your server.
> 
> Other possibilities include FTP-by-mail and Gopher. No doubt someone
> will suggest to use LWP, but that's so boring. Everyone could do that.
> 
> I don't recommend smoke signals though, unless you only want to read
> the other file in daylight, and chances of fog are low.

Too funny.  But you forgot one: 
never underestimate the bandwidth of a station wagon full of mag tapes.

Now how do I clean all this splurted coffee off my keyboard?  In Perl,
of course.  I haven't been able to get the PSI::ESP::Telekinesis module
to compile on my Slowlaris box.

David
-- 
David Cassell, OAO                               cassell@mail.cor.epa.gov
Senior Computing Specialist                          phone: (541) 754-4468
mathematical statistician                              fax: (541) 754-4716


------------------------------

Date: Thu, 15 Apr 1999 23:48:37 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: Scalars ..
Message-Id: <lub6f7.m89.ln@magna.metronet.com>

Ganesh Srinivasan (gsriniva@ecn.purdue.edu) wrote:

: I am relatively new to Perl 


   Welcome!


: and was not able to
: figure this one out.


   *Tens of thousands* of people before you have been new to Perl.

   You don't really think that you would be the first to wonder
   about how to round numbers do you?


   In fact, a *whole lot* of people before you asked that question.

   The question got answered a corresponding bazillion times.

   Sometimes by experts.

   Sometimes by well-intentioned folks who had only hours or
   days more experience with Perl than the asker.

   These answers were quite frequently wrong.

   So then someone (usually more than one) would have to post
   a correction to the broken answer to save the original
   poster from getting even more confused.




   That whole scenario played out several dozen times a year
   for that question.



   Similar scenarios were played out for several dozen *other*
   frequent questions several dozen times a year.

   Time passes...



   Eventually someone noticed the huge waste of brain power
   (or busted a blood vessel at seeing the same exact question
   for the 47th time) that these Frequently Asked Questions
   were consuming while not advancing the state of knowledge,
   since they had all been answered satisfactorily already.

   So they spent many hours collecting all the Frequently
   Asked Questions together, getting the best of the
   answers for each question, and "publishing" it all in
   a series of files that is shipped with every copy of
   Perl.

   They played with their children less because they thought
   it was important to stop the futility of reanswering
   questions that have been answered adequately already.

   They gave up real chunks of their life to compile the list
   of Frequently Asked Questions and the best answer for each.


      But the very people that their sacrifice was meant to help
      ignore them.


   It is so sad. 

   I'll bet they feel like real idiots for wasting their time
   when they read posts like yours...


: I am reading a few numbers from a form and doing some calculations and
: spitting out the answers. Is there a way I can round off the
: answers that I get to, say, 3 decimal places or
: something?


   Perl FAQ, part 4:

      "Does perl have a round function?  
       What about ceil() and floor()?  
       Trig functions?"
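
   The approach that FAQ entry recommends for display rounding is
   printf/sprintf. A minimal sketch, assuming the goal is a string
   with a fixed number of decimal places (round_to is a hypothetical
   helper name, not a Perl builtin):

```perl
use strict;
use warnings;

# round_to is a hypothetical helper, not a Perl builtin.
# It formats $number with exactly $places digits after the decimal point.
sub round_to {
    my ($number, $places) = @_;
    return sprintf("%.${places}f", $number);
}

print round_to(3.14159265, 3), "\n";    # prints 3.142
```

   The result is a formatted string, so this suits output rather
   than further arithmetic. Values that sit exactly on a rounding
   boundary depend on the binary floating-point representation and
   may not round the way decimal intuition suggests.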


   Searching for "round" in the Subject headers for this
   newsgroup at

      http://www.dejanews.com/home_ps.shtml

   gets about 300 hits.

   ( I'm not trying to pick on you in particular here,
     it's just that I've been here to see all 300 of 
     them myself.

     Waste, waste, waste!!

     Reasking a question that has been answered is just
     tooooo much  :-(
   )

   Another great resource that gets you answers faster than
   posting: Usenet archives like Dejanews.


: Another thing that was holding me up:
: is there a way I can make the browser not display stuff from the cache
: but compare it every time with the network copy? This is because I was
: getting images displayed from the cache when
: the network copy had actually been changed.


   You do not have a Perl question there.


   You should ask about WWW things in a newsgroup that is associated
   in some way with WWW things. This isn't such a newsgroup.

      comp.infosystems.www.authoring.cgi
      comp.infosystems.www.browsers.mac
      comp.infosystems.www.browsers.misc
      comp.infosystems.www.browsers.ms-windows
      comp.infosystems.www.browsers.x
 

--
    Tad McClellan                          SGML Consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: 12 Dec 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Special: Digest Administrivia (Last modified: 12 Dec 98)
Message-Id: <null>


Administrivia:

Well, after 6 months, here's the answer to the quiz: what do we do about
comp.lang.perl.moderated? Answer: nothing. 

]From: Russ Allbery <rra@stanford.edu>
]Date: 21 Sep 1998 19:53:43 -0700
]Subject: comp.lang.perl.moderated available via e-mail
]
]It is possible to subscribe to comp.lang.perl.moderated as a mailing list.
]To do so, send mail to majordomo@eyrie.org with "subscribe clpm" in the
]body.  Majordomo will then send you instructions on how to confirm your
]subscription.  This is provided as a general service for those people who
]cannot receive the newsgroup for whatever reason or who just prefer to
]receive messages via e-mail.

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.

The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending Perl questions to the -request address; I don't have time to
answer them, even if I did know the answer.


------------------------------
End of Perl-Users Digest V8 Issue 5396
**************************************
