[22079] in Perl-Users-Digest


home	help	back	first	fref	pref	prev	next	nref	lref	last	post
Perl-Users Digest, Issue: 4301 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Dec 21 21:05:52 2002

Date: Sat, 21 Dec 2002 18:05:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 21 Dec 2002     Volume: 10 Number: 4301

Today's topics:
    Re: advice - gui-ish programming <goldbb2@earthlink.net>
    Re: Concatenation help for beginner at Perl (Tad McClellan)
    Re: Concatenation help for beginner at Perl (Anno Siegel)
    Re: Concatenation help for beginner at Perl <krahnj@acm.org>
    Re: incorporating off-site content (Anno Siegel)
    Re: OCR Module? <goldbb2@earthlink.net>
    Re: Performance for my roguelike <goldbb2@earthlink.net>
        Placement of use statements <jim.bloggs@eudoramail.com>
    Re: Placement of use statements <tassilo.parseval@post.rwth-aachen.de>
    Re: Placement of use statements <ian@WINDOZEdigiserv.net>
    Re: Placement of use statements <jim.bloggs@eudoramail.com>
        race condition and file locking (Jeff Mott)
    Re: race condition and file locking <goldbb2@earthlink.net>
    Re: race condition and file locking (Anno Siegel)
    Re: trapping connection timeout ... using sockets <goldbb2@earthlink.net>
        Use a variable on the right side of =~ ? (Peter Davis)
    Re: Use a variable on the right side of =~ ? (Anno Siegel)
    Re: Use a variable on the right side of =~ ? (Tad McClellan)
    Re: Use a variable on the right side of =~ ? <pd@world.std.com>
    Re: Use a variable on the right side of =~ ? <pd@world.std.com>
    Re: Wierd loop/chdir behavior <goldbb2@earthlink.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 21 Dec 2002 19:04:24 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: advice - gui-ish programming
Message-Id: <3E050187.860E84C6@earthlink.net>

kevin wrote:
> 
> Hey All,
> I'm totally fed up with the M$ platform, and I want to learn to
> program gui apps for linux.  I've used Delphi (pascal, don't laugh: it
> was required for a class) in the past to do M$ Winblows 95
> programming, but that was some time ago.  In my job as a sys admin,
> I've been using perl, and have grown to like this lang.  I'm curious
> if there is any way to write gui apps for x-windows using perl.

In addition to all of the numerous libraries that others have mentioned,
there is X11::Protocol.  This is both more and less portable -- it's
more portable due to being pure-perl, and works anywhere that IO::Socket
works (so you don't need to compile any extentions), but less portable
due to only working with X Windows, not with any other windowing system.

AFAIK, though, all types of *nix use X as their windowing system.

Beware, though, that all this is designed to do is implement a client
program for the low-level network protocol used to talk with an X
Windows server -- it's *not* a high-level library, not by any means.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: Sat, 21 Dec 2002 13:36:07 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Concatenation help for beginner at Perl
Message-Id: <slrnb09gl7.hmp.tadmc@magna.augustmail.com>

al <al@onetel.net.uk> wrote:

> just need the second
> field for each line collated by line number for right now.


-----------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

my %h;
while ( <DATA> ) {
   next unless /\S/;  # skip blank lines
   my($num, $field) = split /\s*,\s*/;

   if ( exists $h{$num} )
      { $h{$num} = "$field $h{$num}" }
   else
      { $h{$num} = $field }
}

foreach ( sort keys %h ) {
   print "$h{$_}   $_\n";
}

__DATA__
4Q246.1.1, yhwl[(], prep.|s5, prep.loc~,
4Q246.1.1, tr#, v-G11, v., <yr# to settle
4Q246.1.1, lpn, v-G10, v., <lpn to fall
4Q246.1.1, Mdq, prep., prep.locative, <Mdq before
4Q246.1.1, )ysrk, n-mse, p.obj., <)srk chair

4Q246.1.2, )kl[m], n-mse, vocative, <Klm king
4Q246.1.2, )ml(<m>[l], prep.|n-mse, prep.temporal|p.obj., <Ml( forever
4Q246.1.2, ht), p2, s., <ht( you
4Q246.1.2, zgr, adj-msa, adj.modifier, <zgr angry/distressed
4Q246.1.2, Kyn#w, c.conj.|n-mpc|s2, s., <yn# year

4Q246.1.3, ...,, ,
4Q246.1.3, Kwzx, n-msc|s2, , <wzx vision
4Q246.1.3, )lkw, c.conj.|adj-mse, adj.substantive,
4Q246.1.3, ht), p2, s., <ht) you
4Q246.1.3, d(, prep., prep.temporal, <d( until
4Q246.1.3, )ml(, n-mse, p.obj., <Ml( forever
-----------------------------------------



> #!/usr/bin/perl


You should ask for all the (automated) help you can get:

   use strict;
   use warnings;


> open(INFO, "$file" ) or die"Cannot open $file:$!\n";     
             ^     ^
             ^     ^  a useless use of quotes

   open(INFO, $file ) or die"Cannot open $file:$!\n";


>     foreach $line (<INFO>) {         


That is wasteful of memory.

   while ( $line = <INFO> ) {   # only one line at a time in memory


> 	my @lineOut = "";                   


Are you trying to ensure that @lineOut is empty?

If so, you're doing the wrong thing, @lineOut will contain
one element after that statement.

   my @lineOut;        # my() guarantees that it will be empty

   my @lineOut = ();   # if you insist on being explicit


> 	$fieldEval = @field[0];                                         
                     ^
                     ^

You should always enable warnings when developing Perl code.


> 	    foreach $word ($line){


What is the point of a loop that must always execute exactly
one time?


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 21 Dec 2002 21:29:01 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Concatenation help for beginner at Perl
Message-Id: <au2met$8fl$2@mamenchi.zrz.TU-Berlin.DE>

According to Tad McClellan <tadmc@augustmail.com>:
> al <al@onetel.net.uk> wrote:
> 
> > just need the second
> > field for each line collated by line number for right now.

You have the rare talent of finding a useful handle in a convoluted
problem description.  When I tried to answer this post, I got lost
in the various requirements and gave up.

> -----------------------------------------
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> my %h;
> while ( <DATA> ) {
>    next unless /\S/;  # skip blank lines
>    my($num, $field) = split /\s*,\s*/;
> 
>    if ( exists $h{$num} )
>       { $h{$num} = "$field $h{$num}" }
>    else
>       { $h{$num} = $field }
> }
> 
> foreach ( sort keys %h ) {
>    print "$h{$_}   $_\n";
> }

Short and sweet.  It can be shortened even more using an array instead
of a string to collect the partial fields:

    my %h;
    while ( <DATA> ) {
       next unless /\S/;  # skip blank lines
       my($num, $field) = split /\s*,\s*/;

       unshift @{ $h{ $num}}, $field;
    }

    foreach ( sort keys %h ) {
       print "@{ $h{$_}}   $_\n";
    }

This way, autovivification does the right thing without knowing whether
the key exists.  It does involve more technology, however.  The string
manipulation is easier to understand.

Anno



------------------------------

Date: Sun, 22 Dec 2002 02:03:42 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: Concatenation help for beginner at Perl
Message-Id: <3E051D2A.19182979@acm.org>

al wrote:
> 
> I am writing a program to convert a comma-delimited data file from a
> grammatical database into a compiled text, one which would be suitable
> for a chrestomathy.
> 
> Below I give a sample of my data, desired output, and my code.
> Ultimately, the data file will be much larger.  But, for now, I am
> using "4Q246.db" as a sample of the larger file.
> 
> In between each text, I want it to add three CRs, two blank lines,
> between the end of the one text and the title of the next.
> Ultimately, I would love it if they could all go into files according
> to their names, but I am pressed for time and just need the second
> field for each line collated by line number for right now.
> 
> DATA:
> 
> 4Q246.1.1, yhwl[(], prep.|s5, prep.loc~,
> 4Q246.1.1, tr#, v-G11, v., <yr# to settle
> 4Q246.1.1, lpn, v-G10, v., <lpn to fall
> 4Q246.1.1, Mdq, prep., prep.locative, <Mdq before
> 4Q246.1.1, )ysrk, n-mse, p.obj., <)srk chair
> 
> 4Q246.1.2, )kl[m], n-mse, vocative, <Klm king
> 4Q246.1.2, )ml(<m>[l], prep.|n-mse, prep.temporal|p.obj., <Ml( forever
> 4Q246.1.2, ht), p2, s., <ht( you
> 4Q246.1.2, zgr, adj-msa, adj.modifier, <zgr angry/distressed
> 4Q246.1.2, Kyn#w, c.conj.|n-mpc|s2, s., <yn# year
> 
> 4Q246.1.3, ...,, ,
> 4Q246.1.3, Kwzx, n-msc|s2, , <wzx vision
> 4Q246.1.3, )lkw, c.conj.|adj-mse, adj.substantive,
> 4Q246.1.3, ht), p2, s., <ht) you
> 4Q246.1.3, d(, prep., prep.temporal, <d( until
> 4Q246.1.3, )ml(, n-mse, p.obj., <Ml( forever
> 
> OUTPUT (right justified eventually):
> 
> )ysrk Mdq lpn tr# yhwl[(]   4Q246.1.1
> Kyn#w zgr ht) )ml(<m>[l] )kl[m]   4Q246.1.2,
> )ml( d( ht) )lkw Kwzx ...   4Q246.1.3


#!/usr/bin/perl
use warnings;
use strict;

my $file = '4Q246.db';
open INFO, $file or die "Cannot open $file: $!";
open OUT, '>>qa.txt' or die "Cannot open file 'qa.txt' $!";

my %data;
while ( <INFO> ) {
    next unless /\S/;
    my ( $ref, $form ) = split /\s*,\s*/;
    if ( exists $data{ $ref } ) {
        $data{ $ref } = "$form $data{ $ref }";
        }
    else {
        print OUT "@{[reverse %data]}\n" if %data;
        %data = ( $ref, $form );
        }
    }
print OUT "@{[reverse %data]}\n" if %data;

__END__



John
-- 
use Perl;
program
fulfillment


------------------------------

Date: 21 Dec 2002 22:33:14 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: incorporating off-site content
Message-Id: <au2q7a$9io$3@mamenchi.zrz.TU-Berlin.DE>

According to Tintin <me@privacy.net>:
> 
> "bashbrothers" <bashbrothers@cox.net> wrote in message
> news:ATQM9.63926$Y86.10777@news2.central.cox.net...
> > To put the original portion of an email or Usenet response before the
> quoted
> > part, as opposed to the more logical sequence of quoted portion first with
> > original following. This term is generally used pejoratively with the
> > implication that the offending person is a newbie, a Microsoft addict
> > (Microsoft mail tools produce a similar format by default), or a
> > garden-variety idiot.
> >
> > I think one or more of these applies to me, lol, thanks for your help and
> > sorry to offend.
> 
> The next thing you need to work on is quoting in context.

 ...but he gets points for a gracious response.  That makes up for something.

Anno


------------------------------

Date: Sat, 21 Dec 2002 16:17:02 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: OCR Module?
Message-Id: <3E04DA4E.A1B55EA9@earthlink.net>

Graham Knight wrote:
> 
> Would anyone happen to know of an perl OCR module that will accept
> .jpg or .gif images and spit out some text? I've been looking for one
> for a while now without much luck.

There isn't one AFAIK.  How would you go about doing OCR without perl?

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: Sat, 21 Dec 2002 16:56:51 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Performance for my roguelike
Message-Id: <3E04E3A3.7B551923@earthlink.net>

gibbering poster wrote:
[snip]
> for $x (0 .. $MAX_X) {
>     for $y (0 .. $MAX_Y) {
>         $grid[$x][$y] = ' '; # Init to walls
>     }
> }

It would be faster to do:
   my @grid = map [ (' ') x (1+$MAX_Y) ], 0 .. $MAX_X;


> And to print it:
> 
> for $x (0 .. $MAX_X) {
>     for $y (0 .. $MAX_Y) {
>         print "$grid[$x][$y]";
>     }
>     print "\n";
> }

And this would be faster as:

   print @$_, "\n" for @grid;


> Cheesy, yet simple to start....  Using Term::Readkey, and calls to
> screen clearing (cls, clear), I was able to get Very fast seamless
> screen refreshes...  but...
> 
> Once I got my dungeon generation algorithm the way I wanted it, I
> realized that the end node of my grid array could no longer be a
> scalar... It would need to keep track of players, monsters, traps, and
> items that might reside on a given tile. So, I created a Tile Object,
> and filled the grid array with Tiles...
> 
> In the Tile Object, I have a method something like:
> 
> sub get_tile_character {
>     my $this = shift;
>     return $this->{'display_character'};
> }
> 
> Since I've done that, My screen refreshes are about 10X slower.. still
> very fast, but slow enough that it looks annoying...
> 
> I was wondering if any of the folks in this group would have any
> suggestions on how I can speed this up...  Some ideas I've had yet not
> implemented...
>
> 1) Keep a local cache of the grid as a property of the main map
> object...  This would have end nodes as scalar characters like the old
> version, and would be completely reloaded each time the character
> moved.
>
> 2) Same as #1, but only update the parts that have changed

   use POSIX ();
   use Term::Cap;
   my $ospeed = do {
      (my $termios = POSIX::Termios->new)->getattr;
      $termios->getospeed();
   };
   my $term = Term::Cap->Tgetent( { OSPEED => $ospeed } );

   $term->Tgoto('cm', $col, $row, \*STDOUT);

> 3) Instead of printing 1 character at a time, buffer the whole line
> into a string of characters, and print that all at one time per line

The print function automatically performs line buffering on stdout,
unless you have disabled it by setting $|.

> 4) Same as #3, but put all the lines together into 1 string with
> newlines, and print that...

I would go with updating only those parts of the screen which need
updating.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: Sat, 21 Dec 2002 20:23:26 -0000
From: "doofus" <jim.bloggs@eudoramail.com>
Subject: Placement of use statements
Message-Id: <au2ijv$3tjfo$1@ID-150435.news.dfncis.de>

Should use statements be put at the top of a program, or just where they
are used?




------------------------------

Date: 21 Dec 2002 20:47:42 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@post.rwth-aachen.de>
Subject: Re: Placement of use statements
Message-Id: <au2k1e$9oe$1@nets3.rz.RWTH-Aachen.DE>

Also sprach doofus:

> Should use statements be put at the top of a program, or just where they
> are used?

Ordinary (that is, non-pragmatic) modules can be included anywhere you
like. You can even access their functionality in a line before the
use-statement. That's because the use() happens at compile time no
matter where you put it.

Pragmatic modules may behave differently. 'use strict' for instance is
block scoped so it matters where you put it.

However, it is common practice to assemble all the used modules on top of
your file. Their might be reasons not to do so like when there is a
special branch in your program that requires an additional module. In
such a situation you'd probably be using require() anyway since it
happens at runtime and would allow for loading a module only on demand.

Tassilo
-- 
$_=q!",}])(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus;})(rekcah{lrePbus;})(lreP{rehtonabus;})(rehtona{tsuJbus!;
$_=reverse;s/sub/(reverse"bus").chr(32)/xge;tr~\n~~d;eval;


------------------------------

Date: Sat, 21 Dec 2002 20:48:39 GMT
From: "Ian.H [dS]" <ian@WINDOZEdigiserv.net>
Subject: Re: Placement of use statements
Message-Id: <ssk90vcvqnsaubimm0i3po3a32ahouusg0@4ax.com>
Keywords: Remove WINDOZE to reply

-----BEGIN xxx SIGNED MESSAGE-----
Hash: SHA1

In a fit of excitement on Sat, 21 Dec 2002 20:23:26 -0000, "doofus"
<jim.bloggs@eudoramail.com> managed to scribble:

> Should use statements be put at the top of a program, or just where
> they are used?
> 

At the top.


Regards,

  Ian

-----BEGIN xxx SIGNATURE-----
Version: PGP Personal Privacy 6.5.3

iQA/AwUBPgTTpmfqtj251CDhEQI5CgCgvv1B86DEyFSEwgTWujCt8HfoYzcAoL+p
JPSihsLD2v90UmwO1ADGATWK
=kzuK
-----END PGP SIGNATURE-----

-- 
Ian.H  [Design & Development]
digiServ Network - Web solutions
www.digiserv.net  |  irc.digiserv.net  |  forum.digiserv.net
Scripting, Web design, development & hosting.


------------------------------

Date: Sat, 21 Dec 2002 20:51:43 -0000
From: "doofus" <jim.bloggs@eudoramail.com>
Subject: Re: Placement of use statements
Message-Id: <au2k91$3tgo2$1@ID-150435.news.dfncis.de>

Tassilo v. Parseval wrote:
> Also sprach doofus:
>
>> Should use statements be put at the top of a program, or just where
>> they are used?
>
> Ordinary (that is, non-pragmatic) modules can be included anywhere you
> like. You can even access their functionality in a line before the
> use-statement. That's because the use() happens at compile time no
> matter where you put it.
>
> Pragmatic modules may behave differently. 'use strict' for instance is
> block scoped so it matters where you put it.
>
> However, it is common practice to assemble all the used modules on
> top of your file. Their might be reasons not to do so like when there
> is a special branch in your program that requires an additional
> module. In such a situation you'd probably be using require() anyway
> since it happens at runtime and would allow for loading a module only
> on demand.

Excellent and incredibly speedy response!

That clears it up nicely.

Thank you... :-)

--
doofus




------------------------------

Date: 21 Dec 2002 13:12:13 -0800
From: jeffmott@twcny.rr.com (Jeff Mott)
Subject: race condition and file locking
Message-Id: <f9c0ce19.0212211312.9577de9@posting.google.com>

Could the following cause a race condition?

open(FH, '+<', $file) or open(FH, '+>', $file) or die $!;

Also, does flock need to be called immediately following the open to
be effective? Would I be able to do...

if ( condition ) {
  flock FH, LOCK_EX;
} else {
  flock FH, LOCK_SH;
}


------------------------------

Date: Sat, 21 Dec 2002 17:07:20 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: race condition and file locking
Message-Id: <3E04E618.88D20A53@earthlink.net>

Jeff Mott wrote:
> 
> Could the following cause a race condition?
> 
> open(FH, '+<', $file) or open(FH, '+>', $file) or die $!;

Yes -- it's possible for someone to open (and create) the file in
between the first open() call and the second -- this would result in the
second open() call truncating the file.  I would suggest you use a
single call to sysopen here:

   sysopen( FH, $file, O_RDWR|O_CREAT ) or die $!;

> Also, does flock need to be called immediately following the open to
> be effective?

No, you can do it any time that you like.  However, if you examine the
contents of FH in between opening it and locking it, then that
examination is subject to a race condition.

> Would I be able to do...
> 
> if ( condition ) {
>   flock FH, LOCK_EX;
> } else {
>   flock FH, LOCK_SH;
> }

Yes -- but don't make 'condition' dependent on what's in FH.  Or if you
do, remember that the examination was done under suspicious
circumstances, and you should re-examine the data after you've locked
the handle.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: 21 Dec 2002 22:00:31 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: race condition and file locking
Message-Id: <au2o9v$9io$1@mamenchi.zrz.TU-Berlin.DE>

According to Jeff Mott <jeffmott@twcny.rr.com>:
> Could the following cause a race condition?
> 
> open(FH, '+<', $file) or open(FH, '+>', $file) or die $!;

These are two distinct operations on the same file, so yes.

> Also, does flock need to be called immediately following the open to
> be effective?

The lock is only effective once you have it.  When you get it is your
business.  But see below.

>                Would I be able to do...
> 
> if ( condition ) {
>   flock FH, LOCK_EX;
> } else {
>   flock FH, LOCK_SH;
> }

You can do that.  There is no way to open a file and lock it in one go,
so in principle it doesn't matter whether you get your lock right away
or wait a while.  There is always a gap.  Of course, you're not allowed
to use the filehandle in any way before you lock it, so opening a file
prematurely doesn't make much sense.

Depending on what you intend to do, it may be advisable to seek to the
end before writing to a file.  Someone else may have have locked it and
written to it before you got the lock.  That applies whether you have
explicitly "done something else" or requested the lock immediately after
open().

The '+>>' (read/append) mode of open() sometimes offers a solution for
problems like yours.

Anno


------------------------------

Date: Sat, 21 Dec 2002 15:59:16 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: trapping connection timeout ... using sockets
Message-Id: <3E04D624.587FD5F8@earthlink.net>

Leo wrote:
> 
> Is there a way to get out of connection timeout using IO::Socket,
> if the host im trying to connect to is down or won't give you
> any data for some time , how will I do this?

It takes multiple steps.

   1/ Create a tcp socket, not connected to anything.
   2/ Set it in nonblocking mode.
   3/ Call ->connect on it, to tell it to initiate a connection.
   4/ Use IO::Select to wait for it to become writable (with a timeout).
   5/ Send your request data.
   6/ use IO::Select to wait for it to become readable (with a timeout).

If you don't have to send any data [eg, for a 'daytime' request], then
steps (4) and (5) can be ommited.

Remember check on the return value of each operation to be sure that the
step succeeds.

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: 21 Dec 2002 11:49:55 -0800
From: pd@world.std.com (Peter Davis)
Subject: Use a variable on the right side of =~ ?
Message-Id: <f6e2d38e.0212211149.19763cff@posting.google.com>

I'm trying to enhance my mail fetching script by including a simple
filtering capability.  Basically, I have a filter file like this:

from:somespammer
subject:some spam stuff

In my script, I read this into @spamfilt, and then attempt the
following:

        foreach $filt (@spamfilt) {
            ($tag, $val) = ($filt =~ /([^:]+)\:\s*([\s\S]*)/m);
            $isspam = 0;
            $testhdr = $head{$tag};
            print "Comparing $testhdr :: $val\n"; # debugging
            if ($testhdr =~ /$val/) {
                printf ("SPAM  %02d/%02d %-17s %-48s\n",
                        $msgmn, $msgdy,
                        substr($from, 0, 16),
                        substr($head{'subject'}, 0, 48));
                open OUT, ">$spampath/${nxtspammsgnum}";
                print OUT $msg;
                close OUT;
                $nxtspammsgnum += 1;
                $popbox->delete($i);
                $isspam = 1;
                last;
            }
        }

The print statement shows the right things being compared, but I
*never* get a match, though I can see the strings matching from the
print statement.

Do I need to quote $val in the if statement somehow?  Any clues to
what's going wrong here.  I'm running Cygwin Perl 5.6.1.

Originally, I had hardcoded things like:

  if ($head{'content-type'} =~ "euc-kr") ...


That worked just fine.  I'm trying to generalize it a bit, though.

Thanks very much.

-pd


------------------------------

Date: 21 Dec 2002 22:16:06 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Use a variable on the right side of =~ ?
Message-Id: <au2p76$9io$2@mamenchi.zrz.TU-Berlin.DE>

According to Peter Davis <pd@world.std.com>:

See the FAQ: perldoc -q 'variable to use in a regex'

> I'm trying to enhance my mail fetching script by including a simple
> filtering capability.  Basically, I have a filter file like this:
>
> from:somespammer
> subject:some spam stuff
> 
> In my script, I read this into @spamfilt, and then attempt the
> following:
> 
>         foreach $filt (@spamfilt) {
>             ($tag, $val) = ($filt =~ /([^:]+)\:\s*([\s\S]*)/m);
                                                      ^^^^
That character-class catches everything.  Replace it by "." and add
the /s modifier to make it more readable.

>             $isspam = 0;
>             $testhdr = $head{$tag};
>             print "Comparing $testhdr :: $val\n"; # debugging
>             if ($testhdr =~ /$val/) {

Here $val can contain any character from your liberal character class,
including those that have special meaning in a regex.  You will have to
use some form of quotemeta() to make the string regex-compatible.

However, what you do is a plain substring match.  That can be done
easier and more efficiently by index(): 

    if ( index( $testhdr, $val) >= 0 ) { 
        # etc...

[rest snipped]

Anno


------------------------------

Date: Sat, 21 Dec 2002 17:02:45 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Use a variable on the right side of =~ ?
Message-Id: <slrnb09sol.icc.tadmc@magna.augustmail.com>

Peter Davis <pd@world.std.com> wrote:

> I'm trying to enhance my mail fetching script by including a simple
> filtering capability.  


You should give very serious consideration to using a module
rather than reinvent a perfectly good (and existing) wheel.

I use Mail::Audit.


> Basically, I have a filter file like this:
> 
> from:somespammer
> subject:some spam stuff
> 
> In my script, I read this into @spamfilt, and then attempt the
> following:
> 
>         foreach $filt (@spamfilt) {
>             ($tag, $val) = ($filt =~ /([^:]+)\:\s*([\s\S]*)/m);
>             $isspam = 0;
>             $testhdr = $head{$tag};


What is in the %head hash?


>             print "Comparing $testhdr :: $val\n"; # debugging


If $val has whitespace at the end, you won't be able to see it...

   print "Comparing ($testhdr) :: ($val)\n";


>             if ($testhdr =~ /$val/) {


What if   $val = '...'  ?

That is likely to match when you don't want it to match.

   if ($testhdr =~ /\Q$val/) {

or, probably better:

   if ($testhdr =~ /\b\Q$val\E\b/) {

or, easier to understand:

   $val = quotemeta $val;
   if ($testhdr =~ /\b$val\b/) {

or:

   if ( index($testhdr, $val) >= 0 ) {


>                 open OUT, ">$spampath/${nxtspammsgnum}";


You should always, yes *always*, check the return value from open():

   open OUT, ">$spampath/${nxtspammsgnum}" or
      die "could not open '$spampath/$nxtspammsgnum'  $!";

The curly braces serve no purpose, so why clutter things up 
by including them?



> The print statement shows the right things being compared, 


Not if $val happens to have trailing whitespace...


> but I
> *never* get a match, though I can see the strings matching from the
> print statement.


Machines don't make mistakes, people do.  :-)


> Do I need to quote $val in the if statement somehow?  


I dunno. Maybe. It depends on what you want it to do.

> Any clues to
> what's going wrong here.


Apart from the whitespace thing, no.

You must have 2 things to analyse the behavior of a pattern match:

   the regex (along with the operator that uses the regex)

   the string that is to be matched against

We do not have either one, so we won't be able to help much.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sun, 22 Dec 2002 01:04:46 GMT
From: "pd" <pd@world.std.com>
Subject: Re: Use a variable on the right side of =~ ?
Message-Id: <Oc8N9.419196$%m4.128547@rwcrnsc52.ops.asp.att.net>

"Anno Siegel" <anno4000@lublin.zrz.tu-berlin.de> wrote in message
news:au2p76$9io$2@mamenchi.zrz.TU-Berlin.DE...
> According to Peter Davis <pd@world.std.com>:
>
> >         foreach $filt (@spamfilt) {
> >             ($tag, $val) = ($filt =~ /([^:]+)\:\s*([\s\S]*)/m);
>                                                       ^^^^
> That character-class catches everything.  Replace it by "." and add
> the /s modifier to make it more readable.

Thanks!  I had cobbled that code from some other script, and it seemed to
work.

> However, what you do is a plain substring match.  That can be done
> easier and more efficiently by index():
>
>     if ( index( $testhdr, $val) >= 0 ) {
>         # etc...

Excellent!  Thanks very much.

-pd






------------------------------

Date: Sun, 22 Dec 2002 01:13:01 GMT
From: "pd" <pd@world.std.com>
Subject: Re: Use a variable on the right side of =~ ?
Message-Id: <xk8N9.419306$WL3.120361@rwcrnsc54>


"Tad McClellan" <tadmc@augustmail.com> wrote in message
news:slrnb09sol.icc.tadmc@magna.augustmail.com...
> Peter Davis <pd@world.std.com> wrote:
>
> > I'm trying to enhance my mail fetching script by including a simple
> > filtering capability.
>
>
> You should give very serious consideration to using a module
> rather than reinvent a perfectly good (and existing) wheel.
>
> I use Mail::Audit.

I'm not familiar with Mail::Audit, but I'll take a look.  I am using
Net::POP3.  I wrote my own script for a few reasons:

1) It works with gnus, mutt, mh, etc.

2) It adds X-POP-Server and X-POP-UIDL, which my other script, nuke, can use
to mark certain messages for deletion next time I connect to the POP server.
That way, I can leave messages on the server, but selectively purge the ones
I don't want to read twice.

> What is in the %head hash?

Basically, it's keyed by message headers: subject, to, from, content-type,
etc.

>
>
> >             print "Comparing $testhdr :: $val\n"; # debugging
>
>
> If $val has whitespace at the end, you won't be able to see it...

Aha!  I was chomp'ing the values, but there was still a stray \r (Yes, I'm
on Windows).  Doing

$val =~ s/\r+$//;

seems to have cured the problem.  Thanks *very* much.

>    if ( index($testhdr, $val) >= 0 ) {

Yup, I like this one.

> >                 open OUT, ">$spampath/${nxtspammsgnum}";
>
>
> You should always, yes *always*, check the return value from open():

Agreed.

Thanks *very* much!

-pd





------------------------------

Date: Sat, 21 Dec 2002 15:53:55 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Wierd loop/chdir behavior
Message-Id: <3E04D4E3.D14E04C6@earthlink.net>

Steve Walker wrote:
> 
> Anybody have an idea what is going on here?
> 
> Background:
> 
>     goal - build a configuration file parser (yes, I know there is a
> package now but I didn't then) for use by a perl script.

Actually, there are a number of packages for parsing configuration
files. Some of this is because there are various different syntaxes for
how to write a config file, and some is because different people prefer
different APIs.

>     requirements -
>           1) since script will use 'find' on a directory that includes
> symbolic links and I don't want to include '-follow', paths configured
> to be searched should be globbed/converted to an absolute path before
> use.

1/ When you say it will use 'find', do you mean it will use the
File::Find module's function &find, or do you mean that the script will
use the external 'find' program?

2/ Why should a path being a symbolic link prevent find from working on
it?

3/ What does glob have to do with symbolic links?  It doesn't expand
them, you know.  Do you mean readlink()?

>           2) multiple target paths are to be considered
>
> Sample config:  (file name clarit.conf)
> 
> BuildPath=~/Data/build
> ActivePath=Data/active
> NewDataPath=~/epaa:~/epab:~/epac:~/epaa:~/epab:~/epac
> WebPath=~/clhtml

Hmm... these paths all have tildes in them -- afaik, only the shell, and
glob, directly understand what ~user/path and ~/path mean.  Is this why
you wanted to use glob() on the paths here?

sub tilde_expand($) {
   my $path = shift;
   $path =~ s/([^\w~])/\\$1/g;
   (glob($path))[0];
}

-- 
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r  coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 4301
***************************************

home	help	back	first	fref	pref	prev	next	nref	lref	last	post
[22079] in Perl-Users-Digest

Perl-Users Digest, Issue: 4301 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)Sat Dec 21 21:05:52 2002

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Dec 21 21:05:52 2002