[24151] in Perl-Users-Digest
Perl-Users Digest, Issue: 6345 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Mar 31 14:05:46 2004
Date: Wed, 31 Mar 2004 11:05:10 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 31 Mar 2004 Volume: 10 Number: 6345
Today's topics:
[NEWBIE] newline question <jan.biel@tu-clausthal.de>
Re: [NEWBIE] newline question <remorse@partners.org>
Re: [NEWBIE] newline question <noreply@gunnar.cc>
Re: [NEWBIE] newline question <noreply@gunnar.cc>
Re: [NEWBIE] newline question <nobull@mail.com>
Re: [NEWBIE] newline question <noreply@gunnar.cc>
Array from a string. <spikeywan@bigfoot.com.delete.this.bit>
Re: Array from a string. <1usa@llenroc.ude>
Re: Array from a string. <noreply@gunnar.cc>
Re: Array from a string. <spikeywan@bigfoot.com.delete.this.bit>
Re: Array from a string. <ittyspam@yahoo.com>
Re: Array from a string. <1usa@llenroc.ude>
Re: Array from a string. <spikeywan@bigfoot.com.delete.this.bit>
Re: Array from a string. <tadmc@augustmail.com>
Re: Choosing Perl/Python for my particular niche (Cameron Laird)
count files + dirs <simalt@totalise.co.uk>
Re: count files + dirs <ittyspam@yahoo.com>
Re: count files + dirs <tadmc@augustmail.com>
Re: count files + dirs <jurgenex@hotmail.com>
Re: count files + dirs <postmaster@castleamber.com>
Re: count files + dirs <tore@aursand.no>
Re: Is Perl supposed to use 100% of the CPU? <spamtrap@dot-app.org>
Re: Lost data on socket - Can we start over politely? <ThomasKratz@REMOVEwebCAPS.de>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 31 Mar 2004 19:25:38 +0200
From: "Jan Biel" <jan.biel@tu-clausthal.de>
Subject: [NEWBIE] newline question
Message-Id: <c4eupc$mqe$1@ariadne.rz.tu-clausthal.de>
Hello!
From some tutorials on the web I managed to create a Perl script which finds
and replaces certain occurrences in text files via regular expressions.
Then something happened which I cannot really explain, so I hope you can
clarify it for me.
The original perl script looks like this:
-------------------------------
$filein = 'a.txt';
$fileout = 'b.txt';
open(INFO, $filein);
open(INFO2, ">$fileout");
@lines = <INFO>;
grep(s/\n//g,@lines);
grep(s/ab/found/g,@lines);
print INFO2 @lines;
close(INFO);
close(INFO2);
--------------------------------
where a.txt is a file containing:
--------------------------------
a
b
c
--------------------------------
The resulting b.txt contains:
--------------------------------
abc
--------------------------------
So the second regular expression is ignored.
But if I write two Perl scripts, each executing only one of the regular
expressions, it works, with the result:
--------------------------------
foundc
--------------------------------
as expected.
What is the mystery here?
I hope this wasn't too confusing :)
Janbiel
------------------------------
Date: Wed, 31 Mar 2004 12:40:12 -0500
From: Richard Morse <remorse@partners.org>
Subject: Re: [NEWBIE] newline question
Message-Id: <remorse-BCDC7D.12401231032004@plato.harvard.edu>
In article <c4eupc$mqe$1@ariadne.rz.tu-clausthal.de>,
"Jan Biel" <jan.biel@tu-clausthal.de> wrote:
> -------------------------------
> $filein = 'a.txt';
> $fileout = 'b.txt';
>
> open(INFO, $filein);
> open(INFO2, ">$fileout");
>
> @lines = <INFO>;
@lines = ( 'a\n', 'b\n', 'c\n' );
> grep(s/\n//g,@lines);
@lines = ( 'a', 'b', 'c');
> grep(s/ab/found/g,@lines);
At this point, no entry in @lines contains 'ab', so the substitution never
occurs.
Try this:
#!/usr/bin/perl
# always use these next two lines
use strict;
use warnings;
my $filein = 'a.txt';
my $fileout = 'b.txt';
open(my $in, "<", $filein) or die("Can't open $filein: $!");
# slurp all of the data into one string, since you really don't
# care about newline separations
my $data;
{
    local $/;
    $data = <$in>;
}
close($in);
# remove any newline characters
$data =~ s/\n//g;
# change 'ab' to 'found'
$data =~ s/ab/found/g;
# save the data
open(my $out, ">", $fileout) or die("Couldn't open >$fileout: $!");
print $out $data, "\n";
close($out);
__END__
HTH,
Ricky
--
Pukku
------------------------------
Date: Wed, 31 Mar 2004 20:03:09 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: [NEWBIE] newline question
Message-Id: <c4f19e$2hd9e6$1@ID-184292.news.uni-berlin.de>
Jan Biel wrote:
> The original perl script looks like this:
use strict; # Make Perl help you detect errors
use warnings; # ditto
> $filein = 'a.txt';
> $fileout = 'b.txt';
my $filein = 'a.txt';
my $fileout = 'b.txt';
----^^
Declare variables with my()
> open(INFO, $filein);
> open(INFO2, ">$fileout");
open INFO, $filein or die $!;
open INFO2, "> $fileout" or die $!;
----------------------------^^^^^^^^^^
Check if file was successfully opened
> @lines = <INFO>;
my @lines = <INFO>;
> grep(s/\n//g,@lines);
That works, but it's clearer written as:
@lines = map { tr/\n//d; $_ } @lines;
Now it's time for reflection. :)
@lines is an array, and at this point, it contains three elements. You
seem to want to concatenate the elements to a string. That can be
done like this:
my $string = join '', @lines;
> grep(s/ab/found/g,@lines);
That takes one element at a time, and replaces occurrences of the
string 'ab'. None of the elements contains that string, so nothing happens.
You can apply the s/// operator to $string instead:
$string =~ s/ab/found/g;
> print INFO2 @lines;
print INFO2 "$string\n";
> close(INFO);
> close(INFO2);
> --------------------------------
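Putting the pieces together (join the lines, then substitute), a minimal
sketch of just the string handling, with the file I/O left out:

```perl
use strict;
use warnings;

# What @lines = <INFO> would hold for the a.txt in question:
my @lines = ( "a\n", "b\n", "c\n" );

my $string = join '', @lines;   # "a\nb\nc\n"
$string =~ s/\n//g;             # "abc"
$string =~ s/ab/found/g;        # "foundc"

print "$string\n";
```

This prints "foundc", the result the OP expected.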
HTH
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Wed, 31 Mar 2004 20:08:15 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: [NEWBIE] newline question
Message-Id: <c4f1j1$2hkaud$1@ID-184292.news.uni-berlin.de>
Richard Morse wrote:
> In article <c4eupc$mqe$1@ariadne.rz.tu-clausthal.de>,
> "Jan Biel" <jan.biel@tu-clausthal.de> wrote:
>>
>>open(INFO, $filein);
>>open(INFO2, ">$fileout");
>>
>>@lines = <INFO>;
>
> @lines = ( 'a\n', 'b\n', 'c\n' );
No, that wouldn't populate @lines with the same thing. This would:
@lines = ( "a\n", "b\n", "c\n" );
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: 31 Mar 2004 19:12:38 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: [NEWBIE] newline question
Message-Id: <u9wu506g2x.fsf@wcl-l.bham.ac.uk>
Gunnar Hjalmarsson <noreply@gunnar.cc> writes:
> > grep(s/\n//g,@lines);
>
> That works, but it's clearer written as:
>
> @lines = map { tr/\n//d; $_ } @lines;
That works, but it's not clearer. Using tr/// rather than s///
adds efficiency, not clarity. Using "map" where you really want "for",
instead of using "grep" where you really want "for", is a neutral change.
Adding a redundant assignment is just obfuscation.
Clearer would be something like
tr/\n//d for @lines;
Or
s/\n//g for @lines;
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
------------------------------
Date: Wed, 31 Mar 2004 20:27:34 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: [NEWBIE] newline question
Message-Id: <c4f2n8$2ig01u$1@ID-184292.news.uni-berlin.de>
Brian McCauley wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> writes:
>
>>>grep(s/\n//g,@lines);
>>
>>That works, but it's clearer written as:
>>
>> @lines = map { tr/\n//d; $_ } @lines;
>
> That works, but it's not clearer.
Well, I could argue, but I won't. Let's just agree that I should
better have used 'for'. :)
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Wed, 31 Mar 2004 15:31:45 +0100
From: "Richard S Beckett" <spikeywan@bigfoot.com.delete.this.bit>
Subject: Array from a string.
Message-Id: <c4eks5$9u5$1@newshost.mot.com>
I've written a nice little script, and it gets its array of data from the
command-line arguments.
If you enter something like this: myscript.pl one two "three four" five
then the @ARGV array will contain:
one
two
three four
five
Which is EXACTLY what I want.
Now, as people seem to have an aversion to DOS these days, I've added a GUI
to the script, so that running it with no arguments fires up the GUI
version.
I get my array from an entry box...
my $entrybox = $mw -> Entry(-textvariable => \$array) -> pack();
The problem is I can't use an array for the text variable, so I have to
convert the string $array into @array.
If I do this:
@array = split(' ', $array);
Entering: one two "three four" five
gives @array of:
one
two
"three
four"
five
I was just trying to sort this out when I realised it's an absolute
nightmare if I loop through @array and try to match /^\".*\"/ etc.
Is there an easy way to do this?
Thanks,
--
R.
GPLRank +79.699
------------------------------
Date: 31 Mar 2004 14:41:49 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude>
Subject: Re: Array from a string.
Message-Id: <Xns94BD62A5695F1asu1cornelledu@132.236.56.8>
"Richard S Beckett" <spikeywan@bigfoot.com.delete.this.bit> wrote in
news:c4eks5$9u5$1@newshost.mot.com:
> I was just trying to sort this out, when I realised it's an absolute
> nightmare if I loop through @array, and try to if /^\".*\"/ etc.
>
> Is there an easy way to do this?
Yes there is. It is called checking the FAQ list before posting:
perldoc -q inside
--
A. Sinan Unur
1usa@llenroc.ude (reverse each component for email address)
------------------------------
Date: Wed, 31 Mar 2004 17:35:07 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Array from a string.
Message-Id: <c4eojm$2idocb$1@ID-184292.news.uni-berlin.de>
Richard S Beckett wrote:
> If I do this:
>
> @array = split(' ', $array);
>
> Entering: one two "three four" five
>
> gives @array of:
>
> one
>
> two
>
> "three
>
> four"
>
> five
See the FAQ, as Sinan suggested.
A 'regexish' application of the FAQ answer may be:
my @array;
push @array, $+ while $array =~ /"([^"]+)"|(\S+)/g;
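For completeness, the FAQ entry that `perldoc -q inside` turns up can also
be handled with the core Text::ParseWords module; here is a small sketch
comparing it with the regex above, using the OP's sample input:

```perl
use strict;
use warnings;
use Text::ParseWords;   # core module

my $array = 'one two "three four" five';

# The regex approach:
my @array;
push @array, $+ while $array =~ /"([^"]+)"|(\S+)/g;

# Text::ParseWords does the quote handling for you (0 = strip the quotes):
my @words = quotewords('\s+', 0, $array);

print join('|', @words), "\n";   # one|two|three four|five
```

Both yield the four elements the OP wanted, with "three four" as a single
element.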
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Wed, 31 Mar 2004 16:46:36 +0100
From: "Richard S Beckett" <spikeywan@bigfoot.com.delete.this.bit>
Subject: Re: Array from a string.
Message-Id: <c4ep8f$bin$1@newshost.mot.com>
> > Is there an easy way to do this?
>
> Yes there is. It is called checking the FAQ list before posting:
How do you know I didn't?
> perldoc -q inside
Now, there's a word I would _never_ have associated with this problem,
thanks.
R.
------------------------------
Date: Wed, 31 Mar 2004 11:33:40 -0500
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: Array from a string.
Message-Id: <20040331113254.F19862@dishwasher.cs.rpi.edu>
On Wed, 31 Mar 2004, Richard S Beckett wrote:
> > > Is there an easy way to do this?
> >
> > Yes there is. It is called checking the FAQ list before posting:
>
> How do you know I didn't?
>
> > perldoc -q inside
>
> Now, there's a word I would _never_ have associated with this problem,
> thanks.
>
Frankly, neither would I. But perldoc -q split and perldoc -q
delimited would have given you the same thing. :-P
Paul Lalli
------------------------------
Date: 31 Mar 2004 16:40:31 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude>
Subject: Re: Array from a string.
Message-Id: <Xns94BD76C503D92asu1cornelledu@132.236.56.8>
"Richard S Beckett" <spikeywan@bigfoot.com.delete.this.bit> wrote in
news:c4ep8f$bin$1@newshost.mot.com:
>> > Is there an easy way to do this?
>>
>> Yes there is. It is called checking the FAQ list before posting:
>
> How do you know I didn't?
>
>> perldoc -q inside
>
> Now, there's a word I would _never_ have associated with this problem,
> thanks.
There are many ways of looking for what you need in the FAQ list. What I
gave you is a short-cut that one figures out after finding the entry for
the first time.
The first time I found that entry was by reading through perlfaq4:
DESCRIPTION
This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data
issues.
Hmmmm ... You would have found the answer had you looked at the table of
contents and then read perlfaq4.
--
A. Sinan Unur
1usa@llenroc.ude (reverse each component for email address)
------------------------------
Date: Wed, 31 Mar 2004 18:16:53 +0100
From: "Richard S Beckett" <spikeywan@bigfoot.com.delete.this.bit>
Subject: Re: Array from a string.
Message-Id: <c4euhp$dfo$1@newshost.mot.com>
> Hmmmm .. You would have found the answer had you looked at the table of
> contents and then read perlfaq4.
OK, it's a fair cop! :-) I'll try harder next time.
Thanks for the help.
--
R.
GPLRank +79.699
------------------------------
Date: Wed, 31 Mar 2004 12:39:10 -0600
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: Array from a string.
Message-Id: <slrnc6m42e.1fg.tadmc@magna.augustmail.com>
Paul Lalli <ittyspam@yahoo.com> wrote:
> On Wed, 31 Mar 2004, Richard S Beckett wrote:
>
>> > > Is there an easy way to do this?
>> >
>> > Yes there is. It is called checking the FAQ list before posting:
>>
>> How do you know I didn't?
>>
>> > perldoc -q inside
>>
>> Now, there's a word I would _never_ have associated with this problem,
>> thanks.
>>
>
> Frankly, neither would have I. But perldoc -q split and perldoc -q
> delimited would have given you the same thing. :-P
perldoc -q string
would do it too, and the OP might have thought of that search
term since he used it in the Subject: himself. :-)
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 31 Mar 2004 14:42:36 -0000
From: claird@lairds.com (Cameron Laird)
Subject: Re: Choosing Perl/Python for my particular niche
Message-Id: <106lm6sahuvb003@corp.supernews.com>
In article <4069F371.B4D08F58@doe.carleton.ca>,
Fred Ma <fma@doe.carleton.ca> wrote:
.
.
.
>Well, my bout with Perl took much, much more than an hour.
>It worked, though. It's probably not enough experience to
>get a good look at the strength of Perl. For example, I
>am a vim user (an editor), which is cryptic at first, but
>lets you fly when you get to know it. I'm not saying that
>all things cryptic are efficient in the end, just that a
>brief bout won't always uncover the strengths. As a
.
.
.
Absolutely: a brief bout is a poor guide to long-term
strengths. The consensus of our follow-ups, though, is
this: Perl and Python both have so many, and so
comparable, strengths "in the large", and they both have
such interesting cosmetics, that it *is* meaningful for
you to spend an hour or two and get a clear first
impression of each. Some things (people, cultures, foods, ...)
can't be known at a first glance. When restricting your
attention to Perl and Python, a first glance *is* helpful.
--
Cameron Laird <claird@phaseit.net>
Business: http://www.Phaseit.net
------------------------------
Date: Wed, 31 Mar 2004 15:40:58 +0100
From: "Simon" <simalt@totalise.co.uk>
Subject: count files + dirs
Message-Id: <406ad87b@primark.com>
Hi,
Is there a way to count files and directories as separate entities?
If I use the following program it correctly displays the files in a
directory but also it treats subdirectories as files.
Could I not somehow code it to know that say x are files and n are
subdirectories so I know what is what?
Thanks for your help; newbie in Perl.
Below is the test script.
#!/perl/bin/perl -w
use strict;
my ($count);
my $dir = 'd:/active/';
opendir DIR, $dir or die "Could not opendir $dir; Reason: $!";
my @files = grep !/^\.\.?$/ => readdir DIR;
$count=@files + 1;
closedir DIR;
foreach (@files)
{
print "File: $_ \n";
}
print "$count"-1;
result
========
D:\utils\Perl>perl read_dir.pl
File: open relay.txt
File: Server.txt
File: test
3
------------------
In the result, "test" is not a file but a directory, and I am trying to
distinguish it from the files, so it would correctly say 2 files and 1
subdirectory.
Much appreciated.
Simon
------------------------------
Date: Wed, 31 Mar 2004 09:59:24 -0500
From: Paul Lalli <ittyspam@yahoo.com>
Subject: Re: count files + dirs
Message-Id: <20040331095631.M19862@dishwasher.cs.rpi.edu>
On Wed, 31 Mar 2004, Simon wrote:
> Hi,
>
> Is there a way to count files and directories as separate entities?
>
> If I use the following program it correctly displays the files in a
> directory but also it treats subdirectories as files.
>
> Could I not somehow code it to know that say x are files and n are
> subdirectories so I know what is what?
>
perldoc -f -X
the -f and -d filetests tell you if a given directory entry is a file or
directory, respectively.
> Thanks for your help; newbie in Perl.
>
> Below is the test script.
> #!/perl/bin/perl -w
>
> use strict;
> my ($count);
>
> my $dir = 'd:/active/';
>
> opendir DIR, $dir or die "Could not opendir $dir; Reason: $!";
>
> my @files = grep !/^\.\.?$/ => readdir DIR;
>
>
>
> $count=@files + 1;
Why are you doing this? @files in scalar context gives the number of
elements in the array. You should not be adding one to it.
>
> closedir DIR;
>
>
> foreach (@files)
>
> {
>
> print "File: $_ \n";
>
>
> }
>
> print "$count"-1;
what the heck is this??
>
>
> result
> ========
> D:\utils\Perl>perl read_dir.pl
> File: open relay.txt
> File: Server.txt
> File: test
> 3
> ------------------
>
> In the result "test" is not a file but directory and I am trying to
> distinguish this from the files so it would correctly say 2 files and 1
> subdirectory.
>
> Much appreciated.
>
> Simon
------------------------------
Date: Wed, 31 Mar 2004 09:08:45 -0600
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: count files + dirs
Message-Id: <slrnc6lnnt.3ja.tadmc@magna.augustmail.com>
Simon <simalt@totalise.co.uk> wrote:
> Is there a way to count files and directories as separate entities?
perldoc -f -X
Be sure to re-read the docs for readdir() before applying a filetest
to the values that it returns.
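Combining the -f / -d filetests with the caveat above (readdir() returns
names relative to the directory, so prepend $dir before testing), Simon's
script might be adapted along these lines; 'd:/active/' is replaced here by
'.' for illustration:

```perl
use strict;
use warnings;

my $dir = '.';

opendir DIR, $dir or die "Could not opendir $dir; Reason: $!";
my @entries = grep !/^\.\.?$/ => readdir DIR;
closedir DIR;

# Prepend $dir, or -f and -d would test names in the current directory
my @files = grep { -f "$dir/$_" } @entries;
my @dirs  = grep { -d "$dir/$_" } @entries;

printf "%d files, %d subdirectories\n", scalar @files, scalar @dirs;
```

No "+ 1" fudging needed: an array in scalar context is its element count.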
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 31 Mar 2004 15:59:52 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: count files + dirs
Message-Id: <YVBac.23724$u_2.13373@nwrddc01.gnilink.net>
Simon wrote:
> Could I not somehow code it to know that say x are files and n are
> subdirectories so I know what is what?
perldoc -f -d
jue
------------------------------
Date: Wed, 31 Mar 2004 11:34:52 -0600
From: John Bokma <postmaster@castleamber.com>
Subject: Re: count files + dirs
Message-Id: <406b0193$0$24356$58c7af7e@news.kabelfoon.nl>
Paul Lalli wrote:
> On Wed, 31 Mar 2004, Simon wrote:
>>$count=@files + 1;
>
> Why are you doing this? @files in scalar context gives the number of
> elements in the array. You should not be adding one to it.
[snip]
>>print "$count"-1;
>
> what the heck is this??
Fix for the "Why are you doing this" :D
--
John personal page: http://johnbokma.com/
Freelance Perl / Java developer available - http://castleamber.com/
------------------------------
Date: Wed, 31 Mar 2004 20:59:33 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: count files + dirs
Message-Id: <pan.2004.03.31.18.55.54.364597@aursand.no>
On Wed, 31 Mar 2004 15:40:58 +0100, Simon wrote:
> Is there a way to count files and directories as separate entities?
As many others have pointed out, there are ways you can solve this
"manually". I would rather go for a solution where you use the excellent
File::Find::Rule module. Untested:
#!/usr/bin/perl
#
use strict;
use warnings;
use File::Find::Rule;
my $dir = 'd:/active';
my $depth = 1;
my @files = File::Find::Rule->maxdepth( $depth )->file()->in( $dir );
my @dirs = File::Find::Rule->maxdepth( $depth )->directory()->in( $dir );
Read the documentation for more information:
perldoc File::Find::Rule
> $count=@files + 1;
If you treat @files in scalar context, it will give you the number of
elements it holds.
--
Tore Aursand <tore@aursand.no>
"Every man usually has something he can do better than anyone else.
Usually it is reading his own handwriting." -- Unknown
------------------------------
Date: Wed, 31 Mar 2004 10:24:33 -0500
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: Is Perl supposed to use 100% of the CPU?
Message-Id: <eKednW3IE-0Cf_fdRVn-ig@adelphia.com>
Ron Bean wrote:
> The question is: Is it supposed to use so much of the CPU?
Perl, by itself, uses 0% CPU.
A badly-written Perl script, just like a badly-written program in any other
language, can use up 100% of your CPU and beg for more.
> Any ideas as to get my site speed back up?
Sure, three simple steps:
1. Figure out what script is swamping your server. If there is only one
script, it's obviously the culprit. If XP provides some means of listing
CPU usage for Perl scripts individually, do that. Otherwise, look to see if
any of the scripts on the system have been added or changed recently. As a
last resort, take scripts offline one at a time, observing the CPU load
each time. When removing one of them reduces the load, you've found the
culprit.
2. Profile the problem script to determine where it's slow. Take care of the
low-hanging fruit first; i.e. see what's changed recently. For example, if
the script was running fine, but a new function to calculate foo was added
and now it's bog-slow, then that function is what you should be looking at.
You could also use Devel::DProf to profile your code.
3. Fix it. This is an exercise left to the reader. ;-)
> Or am I looking at building a dedicated *nix box to run it?
Building one in hopes that it will magically make some unknown problem
disappear is pointless. Figure out what the problem is *first*, then figure
out what needs to be done to fix it.
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Wed, 31 Mar 2004 17:17:30 +0200
From: Thomas Kratz <ThomasKratz@REMOVEwebCAPS.de>
Subject: Re: Lost data on socket - Can we start over politely?
Message-Id: <406ae243.0@juno.wiesbaden.netsurf.de>
Vorxion wrote:
> In article <406a9005.0@juno.wiesbaden.netsurf.de>, Thomas Kratz wrote:
>
>>>In short, I think my code is simply lagging behind, and when it lags far
>>>enough, the rest of the data vanishes. I'm working on fixing that bit, and
>>>I'll also have to prioritize it so that when data is present, it stops
>>>working on processing its data internally and goes immediately back to
>>>reading from the socket. And I'm simply going to have it scarf up as much
>>>data as is present and basically parse and process when there is no
>>>actual communication going on. That should (hopefully) eliminate the
>>>problem.
>>
>>That was my guess too, but I couldn't confirm it :-). The problem could be
>>that the TCP buffers of the machine the server runs on are filled because
>>you are draining them too slowly, but the client is sending data anyway.
>>I'd use IO::Select on the client with a can_write(), which should take
>>care of that (provided you have control over the client code :-)
>
>
> The question then becomes whether a non-forking model is viable for
> multiple connections if it needs to do processing. I should think so.
That depends ;-) The questions are:
1. Do I need more than SOMAXCONN simultaneous connections?
2. Am I able to get the algorithm right for looping over the client
sockets without getting too busy on one of them?
Forking (and even threading) is surely easier to handle. Blocking will
bite you fatally when jumping between the connections in one process/thread.
> I
> think it's mostly the expense of doing a sysread() of -one- character
> about 18 times, and -then- getting bigger blocks, but still not big enough
> to make it keep pace--they were only as big as the packets, which were
> generally 110chars max.
>
> At which point I probably had far more overhead than it could tolerate when
> the client was spewing things forth without delay, even though it was
> writing a packet/line at a time. :)
>
> Boy, I never considered the client-side. You're basically saying that the
> can_write will actually get the socket equivalent of flow control from the
> remote end and only send when it won't get lost?
Exactly. It means that your client side send buffer has free space to
queue the data. Actually it only guarantees you can safely write 1 byte.
I just tested it with a slightly modified client:
use strict;
use warnings;
use IO::Socket;
use IO::Select;
$| = 1;
my $sock = IO::Socket::INET->new(
    PeerAddr => '235p008',
    PeerPort => 4016,
    Proto    => 'tcp',
) or die "couldn't create socket";
my $sel = IO::Select->new();
$sel->add($sock);
my $i = 0;
while ( $sock->connected() ) {
    if ( $sel->can_write(0.05) ) {
        last unless print $sock $i, 'x' x (100-length($i++)), "\n";
    } else {
        print "cannot write\n";
        sleep(1); # todo: sleep less with Time::HiRes
    }
}
works like a charm ;-) I throttled the server with a sleep at the end of the
foreach my $sock ( $sel->can_read(0.05) ) {
....
loop.
>
>
>>>Well, mine is multiplexing--it's meant to take in more than one connection
>>>at a time. Yours was written with a single connection in mind, although it
>>>demonstrates perfectly that buffering shouldn't be an issue, even at 1k
>>>block sizes. The multiplexing is what's complicated mine so much. :)
>>
>>No. Have you tried it? It handles exactly SOMAXCONN connections.
>>The only cause for using the $max_buf variable is preventing one client
>
>>from flooding the server with data and neglecting the other clients.
>
>>If the client data dripples in slowly, one could also use a $max_time
>>value (with Time:HiRes) for the maximum time the server processes one
>>socket at a time.
>
>
> Ah, that explains the limitation. Good idea, actually, and I think I'll
> keep that then. Actually, do you know if the buffer on the socket is one
> single pool, or if each connection to the port has its own buffer? That
> goes baack to my other statement about not wanting one attacker to DoS the
> whole socket by flooding one fd.
Would be a bloody useless concept otherwise, wouldn't it? ;-) Each
connection has its own buffers on both ends.
Do you worry about your own clients behaving badly due to user input, or
about someone else connecting to the server and flooding it with data?
The latter can easily be avoided by implementing identification of the
client or proper authorization of a user. Otherwise, drop the connection.
>
>
>>The foreach loop looks for all readable sockets and handles them. The only
>>thing left out is handling a specific timeout for connected client sockets
>>without incoming data (could be easily done via a lookup hash with the
>>stringified socket values as keys and the time value of the last action on
>>the socket as values, shouldn't be more than a few lines).
>
>
> Yes, I caught the fact that it would take multiple connections. I even
> tested it. However, the logging was not multiplexed, where all my protocol
> states must be. You weren't differentiating in the log between fd's, but
> it was really the proof that the buffering was fine that mattered to me.
> The rest is no problem. That was what had me really worried.
Yeah, I didn't care about the logging, left that for you ;-)
>
> I've got it halfway rolled into my code. I separated out the flow into two
> loops--one will go as fast and hard as it can to read data (I'm going to
> implement the max_buf now that I know why you had it), and the other is for
> processing the input, and at every possible pausing point it checks to see
> if there is more data to be read and will go back to reading as immediately
> as possible if there is. I "just" need to roll in the code that breaks up
> the packets from this large internal buffer where I'm storing huge hunks of
> code that aren't even analysed. Once I have that, it should hopefully be
> fine.
>
> Chances are, nothing in the intended use would stress it anywhere near what
> I have been putting it through. Then again, I don't like to take chances.
Be careful not to reinvent the wheel. This separation into two loops and
the "large internal buffer" do not seem necessary to me. Just limit the
socket processing to a maximum size per check (like I did with $max_buf)
and a maximum time (i.e. 1/10th of a second) and process it. You have to
do this anyway. Storing the data elsewhere will not take less time ;-)
Especially since the client can stop sending data if the server is
unable to process it.
An exception could be that you need a lot of data in one piece to be
able to process it.
>
> If I think back to how lousy NFS performance is if you have it set to less
> than 8192 byte packets, I should have realised exactly what was going on,
> probably. I just didn't think those 15 single-byte reads per packet were
> that expensive. And then the small packets on top of it. Ugh. That
> pretty much explains it all.
>
> I thank you SO much for your assistance, Thomas! You have no idea how much
> relief I feel at this point. I did a brief test of your read methodology
> rolled partly into the small sample I had here and it was working 100%
> and consistently. I'm rolling it into the real thing now, which is a wee bit
> more complicated.
Looking at your code, I have the impression that you made it more
complicated than it should be.
If you could specify what exactly your application should do, I'll perhaps
be able to give you a few design suggestions.
And if you want to concentrate on functionality, have a look at the POE
framework (http://poe.perl.org).
Thomas
--
open STDIN,"<&DATA";$=+=14;$%=50;while($_=(seek( #J~.> a>n~>>e~.......>r.
STDIN,$:*$=+$,+$%,0),getc)){/\./&&last;/\w| /&&( #.u.t.^..oP..r.>h>a~.e..
print,$_=$~);/~/&&++$:;/\^/&&--$:;/>/&&++$,;/</ #.>s^~h<t< ..~. ...c.^..
&&--$,;$:%=4;$,%=23;$~=$_;++$i==1?++$,:_;}__END__#....>>e>r^..>l^...>k^..
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6345
***************************************