[19781] in Perl-Users-Digest

Perl-Users Digest, Issue: 1976 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Oct 22 03:05:51 2001

Date: Mon, 22 Oct 2001 00:05:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <1003734310-v10-i1976@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Mon, 22 Oct 2001     Volume: 10 Number: 1976

Today's topics:
    Re: [off topic] Thanks to all people here :-) <dha@panix.com>
        Calculating abitrary root? <michael-a-mayo@att.net>
    Re: Calculating abitrary root? (Chas Friedman)
    Re: Config data not as text file but as module code --  <goldbb2@earthlink.net>
    Re: File I/O <Neal.Coombes@telus.net>
        IO::Socket broken on win  <f.galassi@e-mind.it>
    Re: lookingglass.pl <godoy@conectiva.com>
        LTRIM & RTRIM ? <ad2ndhand@yahoo.com>
    Re: LTRIM & RTRIM ? <rereidy@indra.com>
    Re: Need multiple matches in a regular expression <please@no.spam>
    Re: Need multiple matches in a regular expression (Tad McClellan)
    Re: Need multiple matches in a regular expression <thelma@alpha2.csd.uwm.edu>
    Re: perl algorithm <please@no.spam>
    Re: perl algorithm <bwalton@rochester.rr.com>
    Re: perl algorithm (F. Xavier Noria)
    Re: Printing tif files <Francis.Derive@wanadoo.fr>
        Problem with cpan db_file-1.78 / linux redhat 7.1 (Michael J Rogers)
    Re: Remove duplicates from a logfile <nospam_artd@speakeasy.net>
    Re: Remove duplicates from a logfile (Garry Williams)
    Re: Remove duplicates from a logfile <please@no.spam>
    Re: Remove duplicates from a logfile <drl7122@yahoo.com>
    Re: Remove duplicates from a logfile <uri@sysarch.com>
    Re: Remove duplicates from a logfile <drl7122@yahoo.com>
    Re: Skipping following lines if the same <andrew@rivendale.net>
    Re: Truncation of array through reference <rog@stanford.edu>
    Re: xsub - does anything work? <nospam-abuse@ilyaz.org>
    Re: xsub - does anything work? <joe+usenet@sunstarsys.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 22 Oct 2001 05:33:35 GMT
From: "David H. Adler" <dha@panix.com>
Subject: Re: [off topic] Thanks to all people here :-)
Message-Id: <slrn9t7btf.qc7.dha@panix2.panix.com>

In article <3BD2CDE5.EDDCCD12@archangelis.com>, olivier laurent wrote:
> 
> I know it could be considered "off topic" but I just wanted to
> thank all people posting answers on this NG.

Ah, if only all off-topic posts in clpmisc were of this nature... :-)

dha

-- 
David H. Adler - <dha@panix.com> - http://www.panix.com/~dha/
Free Randal Schwartz!  <http://www.rahul.net/jeffrey/ovs/>
(ok, maybe not free, but competitively priced!)


------------------------------

Date: Mon, 22 Oct 2001 04:16:57 GMT
From: "Michael A Mayo" <michael-a-mayo@att.net>
Subject: Calculating abitrary root?
Message-Id: <Z4NA7.135808$3d2.4092877@bgtnsc06-news.ops.worldnet.att.net>

How do I calculate an arbitrary root of a number?  For example, 3rd root (or
cube root)?

                -Mike




------------------------------

Date: Mon, 22 Oct 2001 05:00:41 GMT
From: friedman@math.utexas.edu (Chas Friedman)
Subject: Re: Calculating abitrary root?
Message-Id: <3bd3a79f.16165404@news.itouch.net>

On Mon, 22 Oct 2001 04:16:57 GMT, "Michael A Mayo"
<michael-a-mayo@att.net> wrote:

>How do I calculate an arbitrary root of a number?  For example, 3rd root (or
>cube root)?
>
>                -Mike
The exponentiation operator, **, will probably do what you want.
(E.g., print 8**(1/3);)
                            chas
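
A slightly fuller sketch of the same idea, wrapped in a helper. The name
nth_root is hypothetical, and the treatment of negative inputs is an
assumption that goes beyond the post: ** with a fractional exponent gives
NaN for a negative base, so odd roots of negative numbers are taken from
the absolute value and the sign is restored afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: real n-th root of $x using the ** operator.
sub nth_root {
    my ($x, $n) = @_;
    die "root index must be a nonzero integer\n"
        unless $n && $n == int($n);
    if ($x < 0) {
        # Even roots of negative numbers have no real value.
        die "no real even root of a negative number\n" if $n % 2 == 0;
        return -((-$x) ** (1 / $n));
    }
    return $x ** (1 / $n);
}

printf "%.6f\n", nth_root(8, 3);      # cube root of 8
printf "%.6f\n", nth_root(-27, 3);    # cube root of -27
printf "%.6f\n", nth_root(2, 2);      # square root of 2
```

Results are floating point, so compare them with a tolerance rather than
with ==.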


------------------------------

Date: Mon, 22 Oct 2001 03:00:20 -0400
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Config data not as text file but as module code -- Oops!
Message-Id: <3BD3C404.73C98DE4@earthlink.net>

Ralph Snart wrote:
> 
> On Sat, 20 Oct 2001 17:38:36 +0200, Markus Dehmann <markus.cl@gmx.de>
> wrote:
> >no, this isn't a metaphor! I start the script that uses the 20MB
> >module and wait a minute or so. But then, the script ends and I can
> >read "Getötet."  which is German and means "Killed.".
> >
> >So, the process is killed. And "Exporter" is not the problem, I found
> >out.
> >
> >With some test data, some KB, it works and the module way is much
> >faster than the data file way. But a module with >20MB source code
> >seems to be too much for the interpreter...? That's bad...
> 
> it's almost certainly a per-process limit on the system you're using.
> 
> when you try to use more memory than you're allowed, your program just
> gets killed.  that's the unix way.

I thought that the unix way to indicate that you're using more memory
than you are allowed is for malloc to return NULL.

[Well, actually for sbrk to fail, but no one sane uses sbrk directly]

If you're really getting *killed* from memory allocation, then it means
that you are using more memory than the system has virtual swap space
[or more memory than physical memory+swap space, whichever].

Either way, the intelligent solution is not to ask for more memory --
which may not be possible -- it's to use less in the first place!

Don't keep your entire config file in memory -- instead, store it in a
hash which is tied to a file, so that only the items which are currently
being operated on are kept in memory, not the entire thing.
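
A minimal sketch of a file-backed hash, assuming SDBM_File (a DBM backend
shipped with Perl) is acceptable; the post does not name a backend, and
DB_File or another DBM module could be tied the same way. The file name
and the item keys are made up for the demonstration. SDBM limits the size
of each key/value pair to roughly a kilobyte, so large values would need
a different backend.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                    # DBM backend bundled with Perl
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1); # scratch directory for the demo
my $base = "$dir/config";         # SDBM creates config.pag and config.dir

# Write phase: each assignment is stored on disk, not in a big Perl hash.
my %config;
tie(%config, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $base: $!";
$config{"item$_"} = "value $_" for 1 .. 1000;
untie %config;

# Read phase: only the entries that are looked up get fetched.
my %lookup;
tie(%lookup, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot tie $base: $!";
print "item42 => $lookup{item42}\n";
untie %lookup;
```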

-- 
"What does stupid old man mean pidgin talk?
Shampoo does not talk like a bird."


------------------------------

Date: Mon, 22 Oct 2001 03:16:45 GMT
From: "Neal E. Coombes" <Neal.Coombes@telus.net>
Subject: Re: File I/O
Message-Id: <3BD38FF2.9168D78F@telus.net>

Mark Jason Dominus wrote:
> >In the code below, $name never makes it into testfile.txt which contains
> >the text "some pattern" within it.
> 
> It works fine for me, as long as you are expecting 'some name' to
> overwrite part of the next line in the file.  For example:

Strange.

> Are you sure your pattern is matching?  I notice that you didn't show
> us the real code, so if your pattern were defective we would not
> notice that.

I actually tried to run that code, and it did not work for me.  Simply
didn't modify the file at all.  The code I was really trying to run is a
few posts earlier, I simplified the code by isolating the problem.

I've received much help on this newsgroup already for which I am
thankful.  I think I just have to learn a lot more about Perl yet.  I'm
on day 3 now, so I'm not too disappointed in myself, especially if it
works for you!  At least something works!

Thanks for yours, Tad's, and Tony's help.  I'll get there.

Neal


------------------------------

Date: Mon, 22 Oct 2001 06:38:58 GMT
From: Fe <f.galassi@e-mind.it>
Subject: IO::Socket broken on win 
Message-Id: <nja7ttogo3ga2mkog13bceh92u0ckiij71@4ax.com>

has anybody noticed that IO::Handle->blocking(0) doesn't work under
win? it always returns undef, leaving the handle blocking.
i guess it's ok because win doesn't support nonblocking fh (am i wrong?)
but i don't get why IO::Socket uses the broken inherited blocking() method
instead of overriding it, since sockets can be made nonblocking via
ioctl (which, according to perlport, is really a wrapper around
ioctlsocket) with FIONBIO set to 1 (e.g. POE does it right).
also i am missing a lot of error constants (even the ubiquitous
EWOULDBLOCK) and getting inconsistent behavior when multiplexing
(e.g. when a nonblocking connect fails with certain errors like host
unreachable, select reports that the socket got exceptions, but those are
meant to signal oob data only).
i fear the same problem applies to beos and os2, i am not sure about
mac.
worst of all, it seems no one has addressed the problem anywhere in the
official documentation.
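
A rough sketch of the kind of portable helper the post has in mind. The
name set_nonblocking is hypothetical and is not part of IO::Socket or
IO::Handle. The Windows branch follows the ioctl/FIONBIO approach
described above; the numeric value of FIONBIO used there is an assumption
taken from the Winsock headers, it applies to sockets only, and that
branch has not been exercised here. Elsewhere the usual fcntl with
O_NONBLOCK is used.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# Hypothetical helper: put a handle into non-blocking mode.
sub set_nonblocking {
    my ($fh) = @_;
    if ($^O eq 'MSWin32') {
        # Assumption: 0x8004667e is Winsock's FIONBIO; sockets only.
        my $on = pack 'L', 1;
        ioctl($fh, 0x8004667e, $on)
            or die "ioctl FIONBIO failed: $!";
    }
    else {
        my $flags = fcntl($fh, F_GETFL, 0)
            or die "fcntl F_GETFL failed: $!";
        fcntl($fh, F_SETFL, $flags | O_NONBLOCK)
            or die "fcntl F_SETFL failed: $!";
    }
    return 1;
}

# Demonstration on a pipe (non-Windows only): a read from an empty
# non-blocking pipe returns at once instead of hanging.
if ($^O ne 'MSWin32') {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    set_nonblocking($reader);
    my $got = sysread($reader, my $buf, 16);
    print "read would block\n"
        if !defined $got && ($!{EAGAIN} || $!{EWOULDBLOCK});
}
```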

probably i am missing many things, anyways i am wondering:
will internals be fixed in the future?
and/or should IO::Socket cover os dependent issues to assure
portability?
and/or should someone make a IO::NonBlocking module exporting a
portable set_nonblocking?
and/or should someone cover the issue in documentation perlport and/or
perlfaq 8 ?
and/or is the current situation just cool?

any reply/feedback would be appreciated, tnx

p.s. i am working on activestate 5.6.0/623, various m$ os's


------------------------------

Date: Sun, 21 Oct 2001 23:53:17 -0200
From: Jorge Godoy <godoy@conectiva.com>
Subject: Re: lookingglass.pl
Message-Id: <conectiva-linux.m3hessmodu.fsf@godoy.laptop>

"Steffen Müller" <tsee@gmx.net> writes:

> "Tad McClellan" <tadmc@augustmail.com> schrieb im Newsbeitrag
> news:slrn9t3pk0.2fg.tadmc@tadmc26.august.net...
> 
> [snip]
> 
> | Here?
> 
> or in alt.perl or in comp.lang.perl.moderated (which I doubt). I have been
> reading these ng's for a year or so at the most, so it's within that
> timeframe. I just tried to find it with google, but couldn't. I'm sorry.

If it's the same discussion I'm thinking about, it was in
clp.moderated and it was about whether array indices should start
at zero or one.



-- 
Godoy. <godoy@conectiva.com>

Solutions Developer       - Conectiva Inc. - http://en.conectiva.com
Desenvolvedor de Soluções - Conectiva S.A. - http://www.conectiva.com.br


------------------------------

Date: Mon, 22 Oct 2001 14:57:25 +0800
From: "ad2ndhand" <ad2ndhand@yahoo.com>
Subject: LTRIM & RTRIM ?
Message-Id: <9r0f39$qc9$1@coco.singnet.com.sg>

Any way I can do left trim and right trim in Perl? The closest thing I can
think of is chomp and chop, but those only take care of the back part, not
the front part.




------------------------------

Date: Mon, 22 Oct 2001 01:04:41 -0600
From: Ron Reidy <rereidy@indra.com>
Subject: Re: LTRIM & RTRIM ?
Message-Id: <3BD3C509.B7FDE2F4@indra.com>

ad2ndhand wrote:
> 
> Any way I can do left trim and right trim in Perl? The closest thing I can
> think of is chomp and chop, but those only take care of the back part, not
> the front part.
Look at regexes and the s/// (substitution) operator.
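
A minimal sketch of what that looks like in practice. The helper names
ltrim, rtrim and trim are hypothetical; each one works on a copy and
returns the trimmed string.

```perl
use strict;
use warnings;

# Hypothetical helpers built on the substitution operator.
sub ltrim { my ($s) = @_; $s =~ s/^\s+//; return $s }   # strip leading whitespace
sub rtrim { my ($s) = @_; $s =~ s/\s+$//; return $s }   # strip trailing whitespace
sub trim  { return ltrim(rtrim($_[0])) }                # strip both ends

my $padded = "   some text \t\n";
print '[', ltrim($padded), "]\n";
print '[', rtrim($padded), "]\n";
print '[', trim($padded),  "]\n";
```

To trim a variable in place, apply the two substitutions to it directly:
s/^\s+// and s/\s+$//.
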
-- 
Ron Reidy
Oracle DBA
Reidy Consulting, L.L.C.


------------------------------

Date: Mon, 22 Oct 2001 01:11:25 GMT
From: Andrew Cady <please@no.spam>
Subject: Re: Need multiple matches in a regular expression
Message-Id: <87vgh88or5.fsf@homer.cghm>

Thelma Lubkin <thelma@alpha2.csd.uwm.edu> writes:

>                      minimal change -- try this:
>                      if ($line =~ /Server passed checks{3}/) 

That will match  "Server passed checksss".  You must be thinking of:

/(?:Server passed checks){3}/

but that will only match if the string is repeated 3 times with nothing,
not even a space, in between (checksServer).

If you want to count the matches you can do:

my $i = 0;
$i++ while /Server passed checks/g;

To stop after 3:

my $i = 0;
$i++ while $i<3 && /Server passed checks/g;
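
The same counting can be done in a single expression by forcing the /g
match into list context and counting the results. This is a sketch, not
the poster's code: the helper name count_phrase and the sample log text
are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: number of non-overlapping occurrences of a phrase.
sub count_phrase {
    my ($text, $phrase) = @_;
    # "= () =" puts the match in list context; assigning that to a
    # scalar yields the number of matches found.
    my $count = () = $text =~ /\Q$phrase\E/g;
    return $count;
}

my $log = join '', map { "host$_: Server passed checks\n" } 1 .. 3;
print "everything is good\n"
    if count_phrase($log, 'Server passed checks') >= 3;
```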


------------------------------

Date: Mon, 22 Oct 2001 01:52:21 GMT
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Need multiple matches in a regular expression
Message-Id: <slrn9t6ncn.adu.tadmc@tadmc26.august.net>

Thelma Lubkin <thelma@alpha2.csd.uwm.edu> wrote:
>Linux_303 <linux_303@yahoo.com> wrote:
>
>
>: What I want to do is tell the script that if it finds 3
>: "Server passsed checks", everything is good

>: Can someone help?
>                     minimal change -- 


The OP asked for help.

That does not help. 


> try this:


Did you try it?

It does not work.


>                     if ($line =~ /Server passed checks{3}/) 


That matches:

   Server passed checksss

He wants to match the phrase 3 times. He does not want to match
3 's' characters.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 22 Oct 2001 02:07:38 GMT
From: Thelma Lubkin <thelma@alpha2.csd.uwm.edu>
Subject: Re: Need multiple matches in a regular expression
Message-Id: <9qvv1a$31u$1@uwm.edu>

Tad McClellan <tadmc@augustmail.com> wrote:
: Thelma Lubkin <thelma@alpha2.csd.uwm.edu> wrote:
:>Linux_303 <linux_303@yahoo.com> wrote:
:>
:>
:>: What I want to do is tell the script that if it finds 3
:>: "Server passsed checks", everything is good

:>: Can someone help?
:>                     minimal change -- 


: The OP asked for help.

: That does not help. 


:> try this:


: Did you try it?

: It does not work.


:>                     if ($line =~ /Server passed checks{3}/) 


: That matches:

:    Server passed checksss

: He wants to match the phrase 3 times. He does not want to match
: 3 's' characters.


         ...mea culpa.  I did test it, but I messed up the test as
         well--I replaced the phrase by a single 's'.  Worse yet,
         I neglected to notice that this, even if it worked as I
         thought it did, would succeed only if the
         characters were consecutive in the line, which is unlikely
         to be what OP wants.  That I realized soon after posting.
         So, I apologize to all.
                          --thelma

: -- 
:     Tad McClellan                          SGML consulting
:     tadmc@augustmail.com                   Perl programming
:     Fort Worth, Texas


------------------------------

Date: Mon, 22 Oct 2001 01:24:01 GMT
From: Andrew Cady <please@no.spam>
Subject: Re: perl algorithm
Message-Id: <87r8rw8o65.fsf@homer.cghm>

ivank@2xtreme.net (Ivan Kozik) writes:

> I have a tab delimited file that looks like this:

[...]

> What I'm trying to do is remove one of pairs of "identical" lines
> that have identical columns 1, 2, and 4.
>
> The third column should be completely ignored. In my sample of the
> file, there are two lines that look like this:
> 
> 100039  2129881 3e-28   100.000
> 2129881 100039  3e-30   100.000
>
> one of them should be removed by the script.

Why should one of these be removed?  They don't have identical columns
1, 2, and 4.  Only column 4 is identical.  You need to better outline
your goal.  It's not clear from this exactly what you're trying to do.


------------------------------

Date: Mon, 22 Oct 2001 01:33:35 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: perl algorithm
Message-Id: <3BD37785.FB42C148@rochester.rr.com>

Ivan Kozik wrote:
> I'm having trouble writing an algorithm in perl.
> I have a tab delimited file that looks like this:
 ...
> What I'm trying to do is remove one of pairs of "identical" lines that
> have identical columns 1, 2, and 4.
> The third column should be completely ignored. In my sample of the
> file, there are two lines that look like this:
> 
> 100039  2129881 3e-28   100.000
> 2129881 100039  3e-30   100.000
> 
> one of them should be removed by the script.
 ...
> If anyone could write something or give me an idea, i'd greatly
> appreciate it.
> 
> Ivan

Try:

while(<DATA>){
	($n1,$n2,$n3,$n4)=split ' ';
	push @file,$_;
	$h{join ':',(sort ($n1,$n2),$n4)}=$#file;
}
print "$file[$h{$_}]" for sort keys %h;
__END__
100039  2129881 3e-28   100.000
100039  128401  3e-28   100.000
100039  587509  3e-28   100.000
100039  169084  3e-28   100.000
100039  7442093 2.2e-20 80.000
100039  2499217 2.2e-20 80.000
100039  1167892 2.2e-20 80.000
100039  3914100 1.7e-16 61.947
100039  22649   1.7e-16 61.947
2129881 100039  3e-30   100.000
2129881 2129881 3e-30   100.000
2129881 128401  3e-28   100.000
2129881 587509  3e-28   100.000
2129881 169084  3e-28   100.000
2129881 7442093 2.2e-20 80.000
2129881 2499217 2.2e-20 80.000
2129881 1167892 2.2e-20 80.000
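
A variant of the same idea under "use strict", written as a sketch. The
helper name drop_mirrored is hypothetical. It assumes every line has four
whitespace-separated columns, and unlike the script above it keeps the
first line of each group (rather than the last) and preserves the input
order.

```perl
use strict;
use warnings;

# Hypothetical helper: drop later lines whose columns 1 and 2 (in either
# order) and column 4 match an earlier line; column 3 is ignored.
sub drop_mirrored {
    my %seen;
    return grep {
        my ($c1, $c2, undef, $c4) = split ' ';
        # Sorting the two IDs makes "A B" and "B A" produce the same key.
        my $key = join ':', (sort { $a <=> $b } $c1, $c2), $c4;
        !$seen{$key}++;
    } @_;
}

my @lines = (
    "100039\t2129881\t3e-28\t100.000\n",
    "100039\t128401\t3e-28\t100.000\n",
    "2129881\t100039\t3e-30\t100.000\n",   # mirror of the first line
);
print drop_mirrored(@lines);
```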

-- 
Bob Walton


------------------------------

Date: 22 Oct 2001 06:56:13 GMT
From: fxn@retemail.es (F. Xavier Noria)
Subject: Re: perl algorithm
Message-Id: <9r0fud$nu801@news1s.iddeo2.es>

On Mon, 22 Oct 2001 01:33:35 GMT, Bob Walton <bwalton@rochester.rr.com> wrote:

: while(<DATA>){
      #...
  }

Is there any benefit using DATA that way instead of a standard
filehandle to a file on disk with the data to process? If so, which is
the usual way to accomplish that? Would one wrap the original script
in a shell script or whatever that copies it on top of the data file
and then calls it?

-- fxn


------------------------------

Date: 22 Oct 2001 07:37:05 +0200
From: "Francis Derive" <Francis.Derive@wanadoo.fr>
To: "Thorbjørn Ravn Andersen" <thunderbear@bigfoot.com>
Subject: Re: Printing tif files
Message-Id: <B7F97D35-2F92E@193.248.254.237>

On Mon, Oct 22, 2001 2:59 AM, Thorbjørn Ravn Andersen
<mailto:thunderbear@bigfoot.com> wrote:
>Francis Derive wrote:
>> 
>> Bonjour !
>> 
>> It would be helpful a lot to automatically send to the printer these tif
>> image files from a perl script.
>
>Depends on your platform.

I should have said: at work it is an NT station.

>Under Unix convert to PostScript and submit that to a PostScript printer
>queue.
>
>Under Windows, I cannot help you.

I could try following the same idea as the one you gave me for Unix
stations.

Merci.
Francis.
>-- 
>  Thorbjørn Ravn Andersen           "...plus... Tubular Bells!"
>  http://bigfoot.com/~thunderbear
>






------------------------------

Date: 21 Oct 2001 19:23:01 -0700
From: mike@tristateweb.com (Michael J Rogers)
Subject: Problem with cpan db_file-1.78 / linux redhat 7.1
Message-Id: <6b231b17.0110211823.58b9c653@posting.google.com>

Hi,

I've got a fresh linux redhat 7.1 with most of the updated RPM's and a
Fresh perl 5.6.1 that I built from scratch...

I'm trying to install Sympa mailing list manager 3.2.1 and am having
trouble installing some of the Perl Modules needed...

When I try to install the first one db_file-1.78, here is what
happens:

Writing Makefile for DB_File
cp DB_File.pm blib/lib/DB_File.pm
AutoSplitting blib/lib/DB_File.pm (blib/lib/auto/DB_File)
cc -c -I/usr/local/BerkeleyDB/include -fno-strict-aliasing
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g   -DVERSION=\"1.78\"
-DXS_VERSION=\"1.78\" -fpic
-I/usr/local/lib/perl5/5.6.1/i586-linux/CORE -DmDB_Prefix_t=size_t
-DmDB_Hash_t=u_int32_t  version.c
version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible


I've been searching through the forums all day with no luck, can
anyone help or point me in the right direction?

Thanks in advance,

Michael Rogers
M&A Computer Services

Web Hosting, Co-location:  http://www.tristateweb.com


------------------------------

Date: Sun, 21 Oct 2001 21:05:13 -0400
From: "speakeasy" <nospam_artd@speakeasy.net>
Subject: Re: Remove duplicates from a logfile
Message-Id: <tt6sbba69m0161@corp.supernews.com>

Not sure if it's any better but I used a hash:

sub Parse_Log {
    my ($Logfile) = @_;
    open(LOGFILE, "< $Logfile") or die "Can't open $Logfile: $!";
    while (<LOGFILE>) {
        next if /^\#/;    # Skip any comments
        # Fields are separated by single spaces; the user agent runs to end of line
        my ($date, $time, $clientip, $username, $destip, $prot, $mthd,
            $file, $query, $status, $useragent)
            = /^(\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\/\S+) (\S+) (\S+) (.+)$/;
        next unless ($clientip);
        # %IPDB is a package global: one key per distinct client IP
        $IPDB{$clientip} = '' unless (defined($IPDB{$clientip}));
    }
    close(LOGFILE);
}

"jdcfan" <nospam@newsranger.com> wrote in message
news:C1KA7.37706$ev2.44251@www.newsranger.com...
> I wrote a piece of code to remove duplicate IP addresses from a log file.
It
> seems to work fine, but I think I may have done more than I needed to do.
>
> The log would contain data in this format:
> Date - Time - IP Address
> 09/21/2001 - 19:06.14 - xxx.xxx.xxx.xxx
>
> Here is the code I used:
>
> open LOGFILE, $logfile;
> @IPLogArray = <LOGFILE>;
> close (LOGFILE);
>
> @IPLogArrayCopy=@IPLogArray; # Copy the original array
>
> foreach $IPAddress (@IPLogArray)
> {
> $IPAddress=~s/(.*)-(.*)-\s//g; # Remove all other info except for IP
Address
> push (@ParsedIPLogArray, $IPAddress); # Add IP address into a new array
> }
>
> # The original total number of elements in the IP Log
> $totalIPLogElements=$#IPLogArrayCopy;
>
> for ($x=0; $x <= $#ParsedIPLogArray; $x++)
> {
> $hash{$ParsedIPLogArray[$x]}++; # Increment IP occurance counter
>
> if ($hash{$ParsedIPLogArray[$x]} > 1)
> {
> # Current no. of elements in the array and subtract the original total no.
> $elementNumber = $x + ($#IPLogArrayCopy-$totalIPLogElements);
> splice (@IPLogArrayCopy, $elementNumber, 1);
> }
> }
>
> print @IPLogArrayCopy;
>
> ---
>
> Is there a better way to do this?  It seems to me like I made it more
difficult
> than it should be.
>
> Thanks,
> Dan L.
>
>




------------------------------

Date: Mon, 22 Oct 2001 01:33:47 GMT
From: garry@ifr.zvolve.net (Garry Williams)
Subject: Re: Remove duplicates from a logfile
Message-Id: <slrn9t6trq.omc.garry@zfw.zvolve.net>

On Mon, 22 Oct 2001 00:48:34 GMT, jdcfan <nospam@newsranger.com> wrote:
> I wrote a piece of code to remove duplicate IP addresses from a log file.  It
> seems to work fine, but I think I may have done more than I needed to do.
> 
> The log would contain data in this format:
> Date - Time - IP Address
> 09/21/2001 - 19:06.14 - xxx.xxx.xxx.xxx



So, I assume that you want to only retain the first record for each
IP address?  

Maybe before going any further, you should read the FAQ: perlfaq4,
"How can I remove duplicate elements from a list or array?"  



> Here is the code I used:
> 
> open LOGFILE, $logfile;
> @IPLogArray = <LOGFILE>;
> close (LOGFILE);


I wonder why this cannot be done in one pass of the file?  Why read
the entire file into memory?  


> @IPLogArrayCopy=@IPLogArray;	# Copy the original array
> 
> foreach $IPAddress (@IPLogArray)
> {
> $IPAddress=~s/(.*)-(.*)-\s//g; # Remove all other info except for IP Address
> push (@ParsedIPLogArray, $IPAddress);	# Add IP address into a new array
> }


Your indentation is broken.  And StudlyCaps makes my eyes hurt.  This
is getting hard to follow.  


> # The original total number of elements in the IP Log
> $totalIPLogElements=$#IPLogArrayCopy; 


This comment is wrong or the code is wrong.


> for ($x=0; $x <= $#ParsedIPLogArray; $x++)
> {
> $hash{$ParsedIPLogArray[$x]}++; # Increment IP occurance counter
> 
> if ($hash{$ParsedIPLogArray[$x]} > 1)	
> {
> # Current no. of elements in the array and subtract the original total no.
> $elementNumber = $x + ($#IPLogArrayCopy-$totalIPLogElements);		
> splice (@IPLogArrayCopy, $elementNumber, 1);
> }
> }
> 
> print @IPLogArrayCopy;


Please fix the indentation!  


> Is there a better way to do this?  It seems to me like I made it more difficult
> than it should be.


Yes, if I understand that you simply want to print the first
occurrence of each unique IP address.  

Here's a way: 

  $ perl -ne 'print unless $seen{ (split)[4] }++' log_file

Hope this helps.  
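
For readers who prefer a script to a one-liner, here is a sketch of the
same technique. The helper name first_per_ip and the sample addresses are
made up. One deliberate difference, stated as an assumption about what is
wanted: lines with no fifth whitespace-separated field are dropped here,
whereas the one-liner would print the first such line.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first line seen for each IP address.
# Whitespace-split fields: date, '-', time, '-', IP (index 4).
sub first_per_ip {
    my %seen;
    return grep {
        my $ip = (split)[4];
        defined $ip && !$seen{$ip}++;
    } @_;
}

my @log = (
    "09/21/2001 - 19:06.14 - 192.0.2.1\n",
    "09/21/2001 - 19:07.30 - 192.0.2.7\n",
    "09/21/2001 - 19:09.02 - 192.0.2.1\n",   # repeat of the first address
);
print first_per_ip(@log);
```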

-- 
Garry Williams


------------------------------

Date: Mon, 22 Oct 2001 03:09:10 GMT
From: Andrew Cady <please@no.spam>
Subject: Re: Remove duplicates from a logfile
Message-Id: <87lmi48jax.fsf@homer.cghm>

jdcfan <nospam@newsranger.com> writes:

> I wrote a piece of code to remove duplicate IP addresses from a log
> file.  It seems to work fine, but I think I may have done more than
> I needed to do.

You really need to indent your code, or get a newsreader that doesn't
strip the indenting.  Reading this is torture.

> The log would contain data in this format:
> Date - Time - IP Address
> 09/21/2001 - 19:06.14 - xxx.xxx.xxx.xxx
> 
> Here is the code I used:
> 
> open LOGFILE, $logfile;

Check the return of open.

> @IPLogArray = <LOGFILE>;
> close (LOGFILE);
> 
> @IPLogArrayCopy=@IPLogArray;	# Copy the original array
> 
> foreach $IPAddress (@IPLogArray)
> {
> $IPAddress=~s/(.*)-(.*)-\s//g; # Remove all other info except for IP Address
> push (@ParsedIPLogArray, $IPAddress);	# Add IP address into a new array
> }

It would be much better to use strict and declare arrays such as
@ParsedIPLogArray before use.  Also, the camel notation is almost as
hard on the eyes as flat indenting, IMO.  How can you stand it?  At
the very least, don't put the type in the identifier name.  It's
annoying enough when people do it in C, but in perl the type is shown
by the first character so it's totally worthless.

It would make a lot more sense if instead of doing that copy and
making a third array, you do this:

@ParsedIPLogArray = @IPLogArray;

for (@ParsedIPLogArray) {
  s/.*-.*-\s//; # no (parens), since you weren't using them anyway
}

But anyway when you want to generate one array from another you use
map:

@ParsedIPLogArray = map { /.*-.*-\s(.*)/ } @IPLogArray;

I used the regex match operator (//) rather than the regex replace
operator (s///), since I don't think you actually wanted to do any
replaces in @IPLogArray.  This regex uses the parens to match the text
that we want to keep.  It's still a bad regex in that it will
backtrack to the end of the string twice.  You should use:

@IPs = map { /[^-]*-[^-]*-\s+(.*)/ } @IPLogArray;
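
A small runnable illustration of that last form, with made-up sample
lines. Two details differ from the line above and are assumptions of this
sketch: the pattern is anchored with ^, and it captures \S+ rather than
.* so that anything after the address is not included.

```perl
use strict;
use warnings;

my @log = (
    "09/21/2001 - 19:06.14 - 192.0.2.1\n",
    "not a log line\n",
    "09/21/2001 - 19:07.02 - 192.0.2.7\n",
);

# In list context a successful match returns its captures and a failed
# match returns an empty list, so malformed lines simply drop out.
my @ips = map { /^[^-]*-[^-]*-\s+(\S+)/ } @log;
print "$_\n" for @ips;
```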

> # The original total number of elements in the IP Log
> $totalIPLogElements=$#IPLogArrayCopy; 

$# is not the total number of elements in @IPLogArrayCopy, it's the
last index of the array.  @IPLogArrayCopy in scalar context is the
number of elements.
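
A short illustration of the difference, using a made-up three-element
array:

```perl
use strict;
use warnings;

my @entries = ('a', 'b', 'c');

my $last_index = $#entries;          # index of the final element
my $count      = @entries;           # array in scalar context: element count
my $also_count = scalar(@entries);   # the same, made explicit

print "last index: $last_index, count: $count\n";
```

For a non-empty array the count is always one more than the last index;
for an empty array $# is -1 and the count is 0.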

> for ($x=0; $x <= $#ParsedIPLogArray; $x++)
> {
>   $hash{$ParsedIPLogArray[$x]}++; # Increment IP occurance counter

Is that kind of comment really necessary?

>   if ($hash{$ParsedIPLogArray[$x]} > 1)	
>   {
>     # Current no. of elements in the array and 
>     # subtract the original total no.
>     $elementNumber = $x + ($#IPLogArrayCopy-$totalIPLogElements);
>     splice (@IPLogArrayCopy, $elementNumber, 1);
>   }
> }

Your hash already has the unique IPs, you don't have to go back into
the list for them.  Just use the hash.  Or assign the keys of the hash
back into the list.  Or better yet, assign to a hash in the first
place, when you're reading from LOGFILE.

> print @IPLogArrayCopy;
> 
> ---
> 
> Is there a better way to do this?  It seems to me like I made it
> more difficult than it should be.

Yes.  Here's the whole thing, with better error checking:

open LOG, "< $logfile" or die "can't open $logfile: $!";
my %seen;
while (<LOG>) {
  my $ip = (split / - /, $_, 3)[2];
  next unless $ip;
  print $ip unless $seen{$ip}++;
}
close LOG or warn "error closing $logfile: $!";

This uses split for the match.  Unlike the original, it doesn't assume
the match will succeed; it ignores the line if it doesn't.  In order
to be compatible with the original it will match the entire line
following the IP as part of the IP.

If you want to get all the IP's at once, you can use this:

open LOG, "< $logfile" or die "can't open $logfile: $!";
my %IPs = map { (split / - /)[2] => 1 } <LOG>;

Then you can print sort keys %IPs or whatever.  Note that this behaves
slightly differently than the other I wrote, just for readability.


------------------------------

Date: Sun, 21 Oct 2001 22:18:54 -0500
From: "jdcfan" <drl7122@yahoo.com>
Subject: Re: Remove duplicates from a logfile
Message-Id: <9r035p$6en$1@slb3.atl.mindspring.net>

"jdcfan" <nospam@newsranger.com> wrote in message
news:C1KA7.37706$ev2.44251@www.newsranger.com...
> I wrote a piece of code to remove duplicate IP addresses from a log file.
It
> seems to work fine, but I think I may have done more than I needed to do.

I'll try again, hopefully this works (different news service):

open LOGFILE, $logfile;
@IPLogArray = <LOGFILE>;
close (LOGFILE);

@IPLogArrayCopy=@IPLogArray;

foreach $IPAddress (@IPLogArray)
{
    $IPAddress=~s/(.*)-(.*)-\s//g;
    push (@ParsedIPLogArray, $IPAddress);
}

$totalIPLogElements=$#IPLogArrayCopy;

for ($x=0; $x <= $#ParsedIPLogArray; $x++)
{
    $hash{$ParsedIPLogArray[$x]}++;

     if ($hash{$ParsedIPLogArray[$x]} > 1)
    {
        $elementNumber = $x + ($#IPLogArrayCopy-$totalIPLogElements);
        splice (@IPLogArrayCopy, $elementNumber, 1);
    }
}

Dan L.




------------------------------

Date: Mon, 22 Oct 2001 03:52:13 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Remove duplicates from a logfile
Message-Id: <x7adyk49gn.fsf@home.sysarch.com>

>>>>> "j" == jdcfan  <drl7122@yahoo.com> writes:

  j> "jdcfan" <nospam@newsranger.com> wrote in message
  j> news:C1KA7.37706$ev2.44251@www.newsranger.com...
  >> I wrote a piece of code to remove duplicate IP addresses from a log file.
  j> It
  >> seems to work fine, but I think I may have done more than I needed to do.

  j> I'll try again, hopefully this works (different news service):

  j> open LOGFILE, $logfile;

did that open succeed? always check for that.

  j> @IPLogArray = <LOGFILE>;

yecch! studly caps suck. use underscores, they were created for this
purpose

and slurping here is probably a poor idea. log files can get very large.

  j> close (LOGFILE);

  j> @IPLogArrayCopy=@IPLogArray;

why the copy?

  j> foreach $IPAddress (@IPLogArray)

why not loop over the lines one at a time?

  j> {
  j>     $IPAddress=~s/(.*)-(.*)-\s//g;

why the grabbing? you don't use them for anything.

  j>     push (@ParsedIPLogArray, $IPAddress);
  j> }


  j> $totalIPLogElements=$#IPLogArrayCopy;

  j> for ($x=0; $x <= $#ParsedIPLogArray; $x++)

gack, a c style loop. not needed, just do a perl foreach loop.

is $x really needed below? i can't tell.

  j> {
  j>     $hash{$ParsedIPLogArray[$x]}++;

  j>      if ($hash{$ParsedIPLogArray[$x]} > 1)
  j>     {
  j>         $elementNumber = $x + ($#IPLogArrayCopy-$totalIPLogElements);
  j>         splice (@IPLogArrayCopy, $elementNumber, 1);
  j>     }
  j> }

i won't even begin to figure out that code. it is too late and i am
groggy. please try to make it clearer what you are attempting to do.

uri

-- 
Uri Guttman  ---------  uri@sysarch.com  ----------  http://www.sysarch.com
SYStems ARCHitecture and Stem Development ------ http://www.stemsystems.com
Search or Offer Perl Jobs  --------------------------  http://jobs.perl.org


------------------------------

Date: Mon, 22 Oct 2001 00:16:01 -0500
From: "jdcfan" <drl7122@yahoo.com>
Subject: Re: Remove duplicates from a logfile
Message-Id: <9r0a5i$kq9$1@slb4.atl.mindspring.net>

"Andrew Cady" <please@no.spam> wrote in message
news:87lmi48jax.fsf@homer.cghm...

> Also, the camel notation is almost as
> hard on the eyes as flat indenting, IMO.  How can you stand it?  At
> the very least, don't put the type in the identifier name.

Well, the indenting did not occur because of the posting service I was
using...  It was indented originally.  I have programmed in ASP, JAVA, and
C.  I was taught different naming conventions for each of those languages.
My perl code is probably a blend of all those conventions unfortunately.

> > Is there a better way to do this?  It seems to me like I made it
> > more difficult than it should be.
>
> Yes.  Here's the whole thing, with better error checking:

Thanks for your help...

Dan L.




------------------------------

Date: 22 Oct 2001 02:06:21 +0100
From: Andrew Wilson <andrew@rivendale.net>
Subject: Re: Skipping following lines if the same
Message-Id: <87wv1ozdo2.fsf@gandalf.rivendale.net>

Jasper McCrea <jasper@guideguide.com> writes:

> you could use:
> 
> my $last = '';
> @a = map { $last eq $_ ? () : ($last = $_) } @a;
> 
> unfortunately, if you have umpteen empty elements at the start, these
> will be included (not much of a problem if reading from a file). And
> it's only useful if you've slurped the whole file first.

You can eliminate the blank lines problem with this

my $last = $a[0].'a';
@a = map { $last eq $_ ? () : ($last = $_) } @a;

which just makes sure $last won't ever be the same as the first
element in the array.
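
Another way to sidestep the sentinel question, offered as a sketch rather
than as either poster's method: leave $last undefined until something has
been seen, and filter with grep. The helper name squeeze_repeats is
hypothetical. This also copes with an empty array, where $a[0] would be
undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of identical adjacent elements.
sub squeeze_repeats {
    my $last;    # undef means "nothing seen yet"
    return grep {
        my $keep = !defined($last) || $_ ne $last;
        $last = $_;
        $keep;
    } @_;
}

my @lines = ('', '', 'a', 'a', 'b', 'a');
my @squeezed = squeeze_repeats(@lines);
print scalar(@squeezed), " elements kept\n";
```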

cheers

Andrew


------------------------------

Date: Sun, 21 Oct 2001 23:33:37 -0700
From: Roger Levy <rog@stanford.edu>
Subject: Re: Truncation of array through reference
Message-Id: <Pine.GSO.4.33.0110212323200.23121-100000@elaine34.Stanford.EDU>

On 21 Oct 2001, Damian James wrote:

> On Sat, 20 Oct 2001 15:24:28 -0700, Roger Levy said:
> >
> >Yes, this I understand.  So would I be correct to say that @ and $# as
> >symbols are overloaded between dereferencing and name lookup?
> >
>
> No. Since @ARRAY exists, $#ARRAY is just $#ARRAY. There is no
> dereferencing involved. It would be more correct to say:
>
> $#$A, $#{$A}, $#{\@ARRAY}
>
> are all the same thing, in that they are all dereferencing a
> reference to @ARRAY ( and not caring whether that's $A or \@ARRAY),
> then using $# to get the last index.

Yes, that's exactly what I meant when I said that @ and $# are overloaded.
That is, they have two distinct uses: sometimes they function as
dereferencers (as in $#$A), and sometimes as name lookup symbols, _not_
as dereferencers (as in $#ARRAY).

Perhaps "overloaded" isn't quite accurate, however, if @ is not considered
a separable, individually meaningful part of @ARRAY from the perspective
of Perl syntax.

>
> $#ARRAY is the plain old, as advertised use of $# to get the last index
> of @ARRAY. No reference involved. Taking a reference to @ARRAY doesn't
> mean that it no longer exists, so long as it is in scope.
>
> At least, I think this is where you're confused. Maybe I misread?

I think I understand things now: from the perspective of a parser of Perl
code (either a computer or me :), when I see an @ symbol I don't
immediately know whether it is part of the name of an array or a
separate symbol that dereferences the scalar to its right.  Whatever
follows the @ resolves this.
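
To make the two readings concrete, here is a small sketch. The names
@ARRAY and $A follow the thread; the values and the other variable
names are made up for illustration.

```perl
use strict;
use warnings;

my @ARRAY = (10, 20, 30, 40, 50);
my $A     = \@ARRAY;              # $A holds a reference to @ARRAY

# Name lookup: $# is followed by an identifier, so no reference is involved.
my $by_name = $#ARRAY;

# Dereference: $# is followed by a scalar, or by a block yielding a reference.
my $by_ref   = $#$A;
my $by_block = $#{$A};
my $by_anon  = $#{ \@ARRAY };

print "last index: $by_name $by_ref $by_block $by_anon\n";

# Assigning to the last index through the reference truncates @ARRAY itself,
# because $A and @ARRAY refer to the same underlying array.
$#$A = 1;
print "after truncation: @ARRAY\n";

# The same two readings apply to @: by name, and by dereference.
my @copy_by_name = @ARRAY;
my @copy_by_ref  = @$A;
print "copies: @copy_by_name / @copy_by_ref\n";
```

All four expressions give the same last index, and the truncation
through $A shows up when @ARRAY is read by name afterwards.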

Thanks for the help,

Roger



------------------------------

Date: Mon, 22 Oct 2001 05:17:05 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: xsub - does anything work?
Message-Id: <9r0a4h$nhr$1@agate.berkeley.edu>

[A complimentary Cc of this posting was NOT sent to Joe Schaefer
<joe+usenet@sunstarsys.com>], who wrote in article
<m3d73g8utj.fsf@mumonkan.sunstarsys.com>:
> Did you try left-justifying the sample code?  There should be no 
> leading spaces in your function declaration.

You mean the first row or two, right?  IIRC, all the rest is free-style...

> Also, although it's
> not documented, you can use ANSI C declarations for functions 
> rather than the old K&R variety in the docs:

This did not work some time ago, and it was documented almost
immediately.  Only alpha versions (like 5.6.0) fall in the gap.

> You will no doubt run into other problems as well, but IMHO XS is
> less mysterious than the docs make it appear.

The docs were written when it *was* mysterious.  A lot of things/bugs were
simplified/fixed meanwhile.  Nobody had time to fix the docs.

Ilya


------------------------------

Date: 22 Oct 2001 01:49:28 -0400
From: Joe Schaefer <joe+usenet@sunstarsys.com>
Subject: Re: xsub - does anything work?
Message-Id: <m33d4c8brr.fsf@mumonkan.sunstarsys.com>

Ilya Zakharevich <nospam-abuse@ilyaz.org> writes:

> [A complimentary Cc of this posting was NOT sent to Joe Schaefer 
> <joe+usenet@sunstarsys.com>], who wrote in article 
> <m3d73g8utj.fsf@mumonkan.sunstarsys.com>: 
> > Did you try left-justifying the sample code?  There should be no 
> > leading spaces in your function declaration.
> 
> You mean the first row or two, right?  IIRC, all the rest is free-style...

Yes- the error message he's getting comes from this section of xsubpp:

    death ("Code is not inside a function"
           ." (maybe last function was ended by a blank line "
           ." followed by a a statement on column one?)")
        if $line[0] =~ /^\s/;
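
As a rough sketch of what that test reacts to: it only looks at
whether the first line of the current chunk starts with whitespace.
The helper name and the sample XS lines below are made up for
illustration and are not taken from xsubpp.

```perl
use strict;
use warnings;

# first_line_is_indented is a hypothetical helper; it applies the same
# /^\s/ test that the xsubpp excerpt above applies to $line[0].
sub first_line_is_indented {
    my @line = @_;
    return $line[0] =~ /^\s/ ? 1 : 0;
}

# Hypothetical XS function chunks, one array element per line.
my @flush    = ("int", "add(a, b)", "    int a", "    int b");
my @indented = ("  int", "add(a, b)", "    int a", "    int b");

# Only the first line matters to this test; later lines may be indented.
print first_line_is_indented(@flush)    ? "rejected\n" : "accepted\n";
print first_line_is_indented(@indented) ? "rejected\n" : "accepted\n";
```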

> > Also, although it's
> > not documented, you can use ANSI C declarations for functions 
> > rather than the old K&R variety in the docs:
> 
> This did not work some time ago, and it was documented almost
> immediately.  Only alpha versions (like 5.6.0) fall in the gap.

Sorry for not checking this- of course you are right.  It's in
neither the 5.00503 nor the 5.6.0 perlxs, but it is in 5.6.1.  It's
made cut and paste of .h files much less tedious now- thanks :)

-- 
Joe Schaefer    "The only thing necessary for the triumph of evil is for good
                                     men to do nothing."
                                               -- Edmund Burke



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 1976
***************************************

