[23540] in Perl-Users-Digest

Perl-Users Digest, Issue: 5748 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Nov 4 14:06:04 2003

Date: Tue, 4 Nov 2003 11:05:13 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 4 Nov 2003     Volume: 10 Number: 5748

Today's topics:
        Embedding Perl in Win32 Dll <software@werthmesstechnik.de>
    Re: grab the last few itmes of an array <abigail@abigail.nl>
    Re: grab the last few itmes of an array (Tad McClellan)
    Re: grab the last few itmes of an array (Roy Johnson)
        Intermittent Character Encoding Issues <dave@mo-seph.com>
    Re: Intermittent Character Encoding Issues <usenet@morrow.me.uk>
    Re: Making a range from numbers (Tad McClellan)
        Newbie gets Internal Server Error, among others (Jessica Smith)
        Newbie question - create a file <bluecat22@go.com>
    Re: Newbie question - create a file <ak+usenet@freeshell.org>
    Re: Newbie question - create a file <usenet@morrow.me.uk>
    Re: Newbie question - create a file <jurgenex@hotmail.com>
    Re: Newbie question - create a file <hexkid@hotpop.com>
    Re: Newbie question - create a file <HelgiBriem_1@hotmail.com>
    Re: Parsing Large Files <joeremovethis@tanga.com>
    Re: Parsing Large Files <usenet@morrow.me.uk>
    Re: Parsing Large Files <joeremovethis@tanga.com>
    Re: Parsing Large Files <xx087@freenet.carleton.ca>
    Re: Parsing Large Files <xx087@freenet.carleton.ca>
    Re: Parsing Large Files (Tad McClellan)
    Re: Parsing Large Files <joe@localhost.localdomain>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 4 Nov 2003 14:59:29 +0100
From: "Benedikt Feldhaus" <software@werthmesstechnik.de>
Subject: Embedding Perl in Win32 Dll
Message-Id: <bo8b9h$1aj6e5$1@ID-119432.news.uni-berlin.de>

Hello,

I wrote my first Win32 DLL that includes a Perl interpreter. I copied my
code from the perlembed documentation ("static PerlInterpreter
*my_perl"). Unfortunately my code is not portable to other machines: the DLL
terminates when the Perl code should be executed. I think I have to
include ExtUtils::Embed in the makefile, but I do not know how to do it.
I am using VC6 and Win98. Can anyone help?

Greetings, Jens




------------------------------

Date: 04 Nov 2003 11:37:24 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: grab the last few itmes of an array
Message-Id: <slrnbqf3rj.m7n.abigail@alexandra.abigail.nl>

asaik (tomorro@yesterday.org) wrote on MMMDCCXVII September MCMXCIII in
<URL:news:3FA761CC.8000202@yesterday.org>:
//  hello
//  is there a way I can make an arrayA from the last 3 items of arrayB 
//  where arrayB changes inside a loop?
//  e.g.
//  
//  foreach my $i ( @arrB ) {
//  	@arrA = the last 3 items of arrB
//  		$total = total the itmes in arrA
//  


In your code fragment, @arrB isn't being changed.

In general, to get the last 3 elements of an array, you'd use

    @arrB [-3 .. -1]

assuming @arrB contains at least 3 elements.
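
A minimal sketch of the slice above, with a guard for arrays that hold
fewer than three elements. The helper name last_n and the sample data are
made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $n elements of a list,
# or the whole list when it holds fewer than $n elements.
sub last_n {
    my ($n, @items) = @_;
    return @items if @items <= $n;
    return @items[-$n .. -1];
}

my @arrB  = (1 .. 10);
my @arrA  = last_n(3, @arrB);       # same result as @arrB[-3 .. -1]
my @short = last_n(3, 'a', 'b');    # too short: returned unchanged

# Total the items in @arrA, as the question asks.
my $total = 0;
$total += $_ for @arrA;

print "@arrA\n";     # 8 9 10
print "$total\n";    # 27
```

The guard matters because a negative-index slice on a shorter array would
yield undefined elements rather than a shorter list.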


But it's unclear to me what you are asking.


Abigail
-- 
perl -MLWP::UserAgent -MHTML::TreeBuilder -MHTML::FormatText -wle'print +(
HTML::FormatText -> new -> format (HTML::TreeBuilder -> new -> parse (
LWP::UserAgent -> new -> request (HTTP::Request -> new ("GET",
"http://work.ucsd.edu:5141/cgi-bin/http_webster?isindex=perl")) -> content))
=~ /(.*\))[-\s]+Addition/s) [0]'


------------------------------

Date: Tue, 4 Nov 2003 07:49:06 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: grab the last few itmes of an array
Message-Id: <slrnbqfbii.jbc.tadmc@magna.augustmail.com>

asaik <tomorro@yesterday.org> wrote:
> asaik wrote:

>> is there a way I can make an arrayA from the last 3 items of arrayB 
>> where arrayB changes inside a loop?


I doubt it, since the docs say "don't do that".

From the "Foreach Loops" section in perlsyn.pod:

   If any part of LIST is an array, C<foreach> will get very confused if
   you add or remove elements within the loop body, for example with
   C<splice>.   So don't do that.


>> foreach my $i ( @arrB ) {
>>     @arrA = the last 3 items of arrB
>>         $total = total the itmes in arrA


arrayB is not changing inside that loop.

I thought you said it _was_ changing...


> could I
>   push @arrA for (0..2) @arrB if @arrB > 2;


What happened when you tried it?


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 4 Nov 2003 09:07:31 -0800
From: rjohnson@shell.com (Roy Johnson)
Subject: Re: grab the last few itmes of an array
Message-Id: <3ee08638.0311040907.211d54ea@posting.google.com>

It looks like what you want to do is keep a moving subtotal. This
should be a pretty good method:

my @ar = (1..10);
my $moving_total = 0;
my @three = ();
for (@ar) {
    $moving_total += $_;
    push(@three, $_);
    $moving_total -= shift(@three) if @three > 3;
    print "$_: $moving_total\n";
}


------------------------------

Date: Tue, 4 Nov 2003 16:07:26 +0000 (UTC)
From: David Murray-Rust <dave@mo-seph.com>
Subject: Intermittent Character Encoding Issues
Message-Id: <slrnbqfjlu.5la.s0239182@gwendolyn.inf.ed.ac.uk>

Hi all,

Please excuse the long post, but this seems to be a subtle bug, which
I have been attacking for a while.

I'm having a problem with character encodings in perl 5.8. The overall
effect is that certain characters, in particular the UK pound symbol,
are turned into two characters, generally a capital A circumflex
followed by the correct character. This would appear to be a simple 
character encoding issue, but there are a few caveats:

- It only happens on one machine. Taking a disk image of the OS and
  running it on different hardware results in a system without the
  problem.

- It can be intermittent. Two separate instances of (apparently) the
  same problem have been found. The first happened about 1% of the
  time the code was run. The second happened every time the code was
  run.


More Detail:

The application is a web based content management system, running
under apache/mod-perl with a mysql back end. The machine in question
is running Slackware 9, Perl 5.8.1 and kernel 2.4.20.

The precise nature of the bug is that a character represented by \243
(163 decimal) in the iso-8859-1 character set is replaced by two
octets, \302\243, in some places. It appears that perl is converting
the data to a unicode representation and forgetting that it has done
this.

The first version of the bug was that after the line:

$contentList = [ join '', @$contentList ] unless $separate;

certain characters in the entries in @$contentList would be changed to
two-byte versions. This only happened about 1% of the time this code
was run. Changing the above line to be:

unless( $separate )
{
	my $tmp = "";
	foreach my $contentBit ( @$contentList )
	{
	   $tmp .= $contentBit;
	}
	$contentList = [ $tmp ];
}

made the problem go away. In this case, the data comes directly from
the mysql database. It has been verified that the string is encoded
correctly up until that line, and wrongly afterwards.


In the second version of the bug, the line:

return $return . $parent;

resulted in a string being returned where all the pound signs in
$return had been altered. If a different string to $parent is
appended, there is no problem. The current solution is:

my $tmpParent = encode( "iso-8859-1", $parent );
return $return . $tmpParent;

NOTE: the characters which are altered are those in $return, while the
string whose encoding I am playing with is in $parent.

In this case, there is data in $parent which comes via CGI, so I would
be able to believe an explanation along the lines of "$parent is
magically recognised as utf8, so when it is added to $return, $return
is converted to utf8 octets before they are joined", but I would find
this quite counterintuitive, since as I understand things perl uses
its own internal representation for strings, and should only need to
convert on the way in or out.

With respect to machine dependence, it happens on only one machine
which is running our software. To create a test platform, we took a
disk image of the system partition, loaded it onto a new machine and
compiled a new kernel which differed only in network card support.
This new machine did not exhibit the problem. As we were originally
running perl 5.8.0, we tried upgrading to 5.8.1, but this had no
effect.

So, to sum up,

- Can anyone explain what is going on here: the intermittent
  occurrences, the machine dependence and the general behaviour?

- Can anyone suggest a way to avoid these problems?

(For the record, I've read the perldoc on perlunicode and utf8, lurked
for a while, read google archives and read a fair amount on character
encodings)

Thanks to anyone who's made it this far for your time,
Dave Murray-Rust


------------------------------

Date: Tue, 4 Nov 2003 18:55:55 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Intermittent Character Encoding Issues
Message-Id: <bo8snr$qiv$1@wisteria.csv.warwick.ac.uk>


David Murray-Rust <dave@mo-seph.com> wrote:
> The first version of the bug was that after the line:
> 
> $contentList = [ join '', @$contentList ] unless $separate;
> 
> certain characters in the entries in @$contentList would be changed to
> two-byte versions. This only happened about 1% of the time this code
> was run. Changing the above line to be:
> 
> unless( $separate )
> {
> 	my $tmp = "";
> 	foreach my $contentBit ( @$contentList )
> 	{
> 	   $tmp .= $contentBit;
> 	}
> 	$contentList = [ $tmp ];
> }
> 
> made the problem go away. In this case, the data comes directly from
> the mysql database. It has been verified that the string is encoded
> correctly up until that line, and wrongly afterwards.

How perl stores the data internally should be considered none of your
business. (It is in fact either iso8859-1 or utf8 on ASCII machines,
with a flag set on each scalar to say which. It is easier, however, to
regard a text string as being a set of Unicode characters, and not
worry about how they are represented.) However, it may be that how it
is stored in your mysql database is confusing perl, if the code you
are using to interface to the database doesn't correctly decode the
data into perl's own encoding. In particular, if you use iso8859-1 you
may get bitten far more irregularly than if you use other encodings.

Decide on how you are going to encode text in the database: I
shall assume you wish to use iso8859-1. Now, every piece of textual
(as opposed to binary) data you write into the database should first
be converted from a sequence of characters into a sequence of octets,
using Encode::encode; and every piece of textual data you read back
should be converted from octets into character data using
Encode::decode. So, in the example above, you would write:

use Encode;    # exports decode and encode

my $tmp = "";
foreach my $contentBit (@$contentList) {
    $tmp .= decode "iso8859-1", $contentBit;
}
$contentList = [ $tmp ];

(assuming you didn't decode it closer to where it was read from the
database).

> In the second version of the bug, the line:
> 
> return $return . $parent;
> 
> resulted in a string being returned where all the pound signs in
> $return had been altered. If a different string to $parent is
> appended, there is no problem.

So what does $parent contain, which causes this problem? And what is
the result of
  use Encode qw/is_utf8/;
  warn is_utf8($parent) ? 
       "\$parent is chars internally" : 
       "\$parent is bytes internally";

?

> The current solution is:
> 
> my $tmpParent = encode( "iso-8859-1", $parent );
> return $return . $tmpParent;

This is almost certainly Wrong, as $tmpParent will here be considered
to be a string of octets rather than a sequence of characters. The
Right Answer is to make sure $return is considered to be a sequence of
characters as well.
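
A small sketch of what "decode at the border, encode on the way out" might
look like for the two strings in question. The sample values are invented,
and it is an assumption that $return arrives as ISO-8859-1 octets and
$parent as UTF-8 octets; the real application may differ.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed inputs: octets as they might arrive from the database (ISO-8859-1)
# and from the CGI layer (UTF-8). Both values are made up for illustration.
my $return_octets = "Cost: \xA3" . "5 ";    # \xA3 is the pound sign in ISO-8859-1
my $parent_octets = "caf\xC3\xA9";          # "cafe" with e-acute, in UTF-8

# Decode at the border so both values are character strings.
my $return = decode('iso-8859-1', $return_octets);
my $parent = decode('UTF-8',      $parent_octets);

# Plain character concatenation; no encoding guesswork is involved.
my $joined = $return . $parent;

# Encode once on the way out, in whichever encoding the consumer expects.
my $as_latin1 = encode('iso-8859-1', $joined);    # pound sign is the octet A3
my $as_utf8   = encode('UTF-8',      $joined);    # pound sign is the pair C2 A3

printf "latin1: %s\n", join ' ', map { sprintf '%02X', ord } split //, $as_latin1;
printf "utf8:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $as_utf8;
```

The C2 A3 pair in the UTF-8 output is the "capital A circumflex followed by
the correct character" symptom when those octets are later displayed as
ISO-8859-1.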

> In this case, there is data in $parent which comes via CGI, so I would
> be able to believe an explanation along the lines of "$parent is
> magically recognised as utf8, so when it is added to $return, $return
> is converted to utf8 octets before they are joined", but I would find
> this quite counterintuitive, since as I understand things perl uses
> its own internal representation for strings, and should only need to
> convert on the way in or out.

Yup. However, if the module you are using to talk to the database
and/or Apache hasn't been upgraded to 5.8 yet you will have to do
those conversions 'at the borders' by hand. Pushing an :encoding layer
onto your filehandles, perhaps with the 'open' pragma, may help
automate this; although you are using mod_perl, which relies on tied
filehandles: I don't know how well these play with PerlIO layers as
yet. You may want to write a custom 'print', 'readline' &c. that runs
all input through 'decode' and all output through 'encode'.

Another thing to watch out for is that if any of your locale variables
(LANG, LC_ALL, etc.) match /utf-?8/i then perl will assume all IO will
be in UTF8 until you disillusion it. This feature has been removed in
5.8.1, though, so it shouldn't be affecting your problem.

An alternative solution, if you can afford to treat all data as
'binary' rather than 'textual', is simply to put

  use bytes;

at the top of every file :).

Ben

-- 
   Although few may originate a policy, we are all able to judge it.
                                             - Pericles of Athens, c.430 B.C.
  ben@morrow.me.uk


------------------------------

Date: Tue, 4 Nov 2003 07:53:01 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Making a range from numbers
Message-Id: <slrnbqfbpt.jbc.tadmc@magna.augustmail.com>

Just in <goth1938@hotmail.com> wrote:

>> The "&foo()" form of subroutine call is certainly not deprecated, but
>> the "foo()" form should be preferred unless you know what effects the
>> leading "&" has and you need those effects.

> Care to expand on what the effects are?


Perl's subroutines are documented in

   perldoc perlsub

where it says:

           &NAME(LIST);   # Circumvent prototypes.


So I'm guessing the effect is to circumvent prototypes.
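
To make that concrete, here is a short sketch of the prototype effect, plus
the other effect perlsub documents for the "&NAME;" form (the current @_ is
made visible to the called subroutine). All sub names here are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical sub with a (\@) prototype: a plain call passes a reference
# to the array instead of flattening it.
sub count_args(\@) { return scalar @_ }

# Hypothetical subs for the second effect: called as "&name;" with no
# parentheses, the callee sees the caller's @_.
sub show_args { return join ',', @_ }
sub wrapper   { return &show_args; }

my @list = (10, 20, 30);

my $with_prototype    = count_args(@list);     # prototype applies: one argument
my $without_prototype = &count_args(@list);    # prototype circumvented: three arguments
my $shared            = wrapper('a', 'b');     # show_args sees wrapper's @_

print "$with_prototype $without_prototype $shared\n";    # 1 3 a,b
```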


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 4 Nov 2003 10:52:27 -0800
From: jessis@cobweb.net (Jessica Smith)
Subject: Newbie gets Internal Server Error, among others
Message-Id: <89ed09a8.0311041052.5f924b64@posting.google.com>

I'm using Mac OS X, and I'm trying to run a script I copied from a
tutorial. The script is online at
http://www.elanus.net/cgi/examples.cgi/view/ex_0302.txt

It's chmod'ed 755, just like it should be, and the sh-bang line is
correct. Yet, when I run the program in Terminal, I get:

 ./mail_form.cgi: =: command not found
 ./mail_form.cgi: =: command not found
 ./mail_form.cgi: =: command not found
 ./mail_form.cgi: =: command not found
 ./mail_form.cgi: =: command not found
 ./mail_form.cgi: line 18: syntax error near unexpected token `qw(:'
 ./mail_form.cgi: line 18: `use CGI qw(:standard);'
[Jessica-Smiths-Computer:/library/webServer/cgi-executables] jessicas%

When I hit it in Explorer, I get an Internal Server Error, and then my
error log produces this message:

[Tue Nov  4 13:21:06 2003] [error] (8)Exec format error: exec of
/Library/WebServer/CGI-Executables/mail_form.cgi failed
[Tue Nov  4 13:21:06 2003] [error] [client 192.168.7.33] Premature end
of script headers: /Library/WebServer/CGI-Executables/mail_form.cgi

I've searched newsgroups, Perl references, CGI references and 2
dead-tree Perl manuals, and I still have no idea what is causing the
problem.

Please help! Thanks.


------------------------------

Date: Tue, 4 Nov 2003 09:30:57 -0500
From: "Blue Cat" <bluecat22@go.com>
Subject: Newbie question - create a file
Message-Id: <bo8d6102ih@enews1.newsguy.com>

After toiling over "open" in the Perl docs and the Camel Book with no
success, I am asking for help:

How do I create a file named "dogs.txt" and write "My dog is a golden
retriever." into it?




------------------------------

Date: Tue, 4 Nov 2003 14:54:01 +0000 (UTC)
From: Andreas Kahari <ak+usenet@freeshell.org>
Subject: Re: Newbie question - create a file
Message-Id: <slrnbqffc8.gk8.ak+usenet@vinland.freeshell.org>

In article <bo8d6102ih@enews1.newsguy.com>, Blue Cat wrote:
> After toiling over "open" in the Perl docs and the Camel Book with no
> success, I am asking for help:
> 
> How do I create a file named "dogs.txt" and write "My dog is a golden
> retriever." into it?

open FH, ">dogs.txt" or die "Could not open file: $!";
print FH "My dog is a golden retriever\n";
close FH;


-- 
Andreas Kähäri


------------------------------

Date: Tue, 4 Nov 2003 14:57:07 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Newbie question - create a file
Message-Id: <bo8eo3$jhd$1@wisteria.csv.warwick.ac.uk>

"Blue Cat" <bluecat22@go.com> wrote:
> After toiling over "open" in the Perl docs and the Camel Book with no
> success, I am asking for help:
> 
> How do I create a file named "dogs.txt" and write "My dog is a golden
> retriever." into it?

open my $DOGS, "> dogs.txt" or die "can't open dogs.txt: $!";
print $DOGS "My dog is a golden retriever.";

What did you try, and in what way did it fail?

Ben

-- 
And if you wanna make sense / Whatcha looking at me for?          (Fiona Apple)
                            * ben@morrow.me.uk *


------------------------------

Date: Tue, 04 Nov 2003 14:57:57 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Newbie question - create a file
Message-Id: <V7Ppb.3630$n6.1113@nwrddc03.gnilink.net>

Blue Cat wrote:
> How do I create a file named "dogs.txt" and write "My dog is a golden
> retriever." into it?

Hi Blue

Quite simple

    use warnings;
    use strict;
    open my $F, ">dogs.txt" or die "Cannot open 'dogs.txt':$!\n";
    print $F "My dog is a golden retriever.\n";
    close $F;

jue




------------------------------

Date: 4 Nov 2003 15:01:49 GMT
From: Pedro <hexkid@hotpop.com>
Subject: Re: Newbie question - create a file
Message-Id: <bo8f0t$1aueih$1@ID-203069.news.uni-berlin.de>

Blue Cat wrote:
> After toiling over "open" in the Perl docs and the Camel Book with no
> success, I am asking for help:
>
> How do I create a file named "dogs.txt" and write "My dog is a golden
> retriever." into it?

Newbie answer:

$ cat dogs.pl
#!/usr/bin/perl
use strict;
use warnings;

open FILE, "> dogs.txt" or die "Cannot open dogs.txt: $!";
print FILE "My dog is a golden retriever\n";
close FILE;


$ cat dogs.txt
My dog is a golden retriever


HTH

-- 
I have a spam filter working.
To mail me include "urkxvq" (with or without the quotes)
in the subject line, or your mail will be ruthlessly discarded.


------------------------------

Date: Tue, 04 Nov 2003 15:20:36 +0000
From: Helgi Briem <HelgiBriem_1@hotmail.com>
Subject: Re: Newbie question - create a file
Message-Id: <nlgfqvo0rqb3kqi9gcugplae10i5rhm4bo@4ax.com>

On Tue, 4 Nov 2003 09:30:57 -0500, "Blue Cat" <bluecat22@go.com>
wrote:

>After toiling over "open" in the Perl docs and the Camel Book with no
>success, I am asking for help:
>
>How do I create a file named "dogs.txt" and write "My dog is a golden
>retriever." into it?

#!perl
use warnings;
use strict;
my $path = '/path/to/where/you/want/to/keep/file';
my $file = "$path/dogs.txt";
my $text = "My dog is a golden retriever.";

open OUT, ">", $file or die "Cannot open $file for writing:$!";
print OUT $text;
close OUT or die "Cannot close $file:$!";
__END__


------------------------------

Date: Tue, 04 Nov 2003 14:46:22 GMT
From: Jose Yimpho <joeremovethis@tanga.com>
Subject: Re: Parsing Large Files
Message-Id: <2ZOpb.76326$275.204916@attbi_s53>

Tad McClellan wrote:

> Jose Yimpho <swimmar@hotmail.com> wrote:
> 
>> Subject: Parsing Large Files
> 
> 
> I see nothing relating to large files in your post, so why
> did you say that there would be something relating to large
> files in your Subject?
> 

There's about 20,000 lines in the file.  I thought that was large?

> 
>> Perl newbie here.. I'm experienced with other languages, but this is
>> my first grapple with Perl + Regular Expressions, and I could use some
>> help or a starting point on this problem.
> 
> 
> You haven't told us enough to be of much help...

Sorry...

> 
> 
>> I have a text file that contains lines like what's at the bottom of
>> this message.
> 
> 
> To parse a file we need to know the rules that the file will follow.
> 
> What rules will the file follow?
> 
> 
>> Possible
> 
> 
> Which ones are optional?
> 
> Which ones are required?
> 
> 
>> entries are company name,
> 
> 
> Is that always the 1st line?

Yes

> 
> 
>> street address,
> 
> 
> Is that always the 2nd line?

Yes, the city, state, and zip are always the third line.

> 
> 
>> phone,
> 
> 
> Does that one always start with "Phone:" ?

Yes,  and the Fax number has Fax: in front of it.

> 
> 
>> email,
> 
> 
> Is that always the 5th line?

No, it's sometimes there.

> 
> 
>> url,
> 
> 
> (you know those aren't really URLs, right?)

Forgive me.

> 
> 
>> rep, membership type, business type, and major
>> products.
> 
> 
> Do those ones always have the something-ending-with-colon headings?

Yes

> 
> 
>> Business type: Accessories, Board games, Collectable card games,
>> Family
>> games, Magazines, Miniatures, Retailer, Roleplaying games, Video
>> games,
>> Wargames, Comic Books
> 
> 
> Even worse than the sample-with-no-spec approach to getting help
> is letting your newsreader break the data for you.
> 
> Is that all on one line in your Real Data?

No, not all on one line.  I don't think the newsreader broke any data (the
data is on multiple lines for each entity with a blank line in between
each entity). 

Also, something like the following is legal (the linebreaks are
intentional):

Business type: Accessories, Board Games, Books,
Other card games, Family 
Games, Magazines, Minatures
Major products: Wizkids Products; Wizards of the Coast 
Products; Reaper Minatures





> 
> 
> Maybe this will get you started:
> 
> ---------------------------
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> {  local $/ = '';  # enable paragraph mode
>    while ( <DATA> ) {
>       my($name, $street, $addr, $phone, $email) = /(.*)\n/g;
>       my($city, $state, $zip) = $addr =~ /(.*?)\s+([A-Z][A-Z])\s+(\d+)$/;
>       my($rep) = /^Business Representative:\s+(.*)/m;
> 
>       print "$name\n$street\n$city - $state - $zip\n$rep\n";
>       print "-----\n";
>    }
> }
> 
> __DATA__
> # your data here
> ---------------------------
> 
> 

Thanks, that will get me started.  Would appreciate any other help you could
give.  If there's anything I can answer, let me know.

With regards to the paragraph grouping, I tried something like this last
night:

$/ = '';
while <FILE>
{
        print;
        $count++;
}
print "\nNumber of paragraphs: $count\n";

It printed the file contents, and then: 'Number of paragraphs: 1', which
didn't seem right to me, as I was trying to count the number of paragraphs
(or blank lines) in the file.  Setting the $/ sets the 'splitter' to split
on all blank lines, right?  and each iteration of the while loop reads in
one section of the input (split by blank lines), right?  Not sure why it
was printing out a 1.  

Joe Laughlin


------------------------------

Date: Tue, 4 Nov 2003 15:14:19 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Parsing Large Files
Message-Id: <bo8fob$k59$1@wisteria.csv.warwick.ac.uk>


Jose Yimpho <joeremovethis@tanga.com> wrote:
> With regards to the paragraph grouping, I tried something like this last
> night:
> 
> $/ = '';
> while <FILE>
> {
>         print;
>         $count++;
> }
> print "\nNumber of paragraphs: $count\n";
> 
> It printed the file contents, and then: 'Number of paragraphs: 1', which
> didn't seem right to me, as I was trying to count the number of paragraphs
> (or blank lines) in the file.

Are the lines between your paragraphs truly blank? If they contain any
whitespace (in the case of Win32 files opened in binary mode this
includes the \r at the end of each line), then they will not be
counted as paragraph breaks by Perl.

Try

$/ = $\ = "";
while (<FILE>) {
      print "Line $.: |$_|";
}

to see what Perl considers each paragraph to contain. If your file
does have 'blank' lines with spaces in, and you want to get rid of
them, use

  perl -pi~ -e's/^\s+$//' file

 .

Ben

-- 
$.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
$x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
{$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t    # ben@morrow.me.uk
$J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.


------------------------------

Date: Tue, 04 Nov 2003 15:24:25 GMT
From: Jose Yimpho <joeremovethis@tanga.com>
Subject: Re: Parsing Large Files
Message-Id: <JwPpb.78022$9E1.357140@attbi_s52>

Ben Morrow wrote:

> 
> Jose Yimpho <joeremovethis@tanga.com> wrote:
>> With regards to the paragraph grouping, I tried something like this last
>> night:
>> 
>> $/ = '';
>> while <FILE>
>> {
>>         print;
>>         $count++;
>> }
>> print "\nNumber of paragraphs: $count\n";
>> 
>> It printed the file contents, and then: 'Number of paragraphs: 1', which
>> didn't seem right to me, as I was trying to count the number of
>> paragraphs (or blank lines) in the file.
> 
> Are the lines between your paragraphs truly blank? If they contain any
> whitespace (in the case of Win32 files opened in binary mode this
> includes the \r at the end of each line), then they will not be
> counted as paragraph breaks by Perl.
> 
> Try
> 
> $/ = $\ = "";
> while (<FILE>) {
>       print "Line $.: |$_|";
> }
> 
> to see what Perl considers each paragraph to contain. If your file
> does have 'blank' lines with spaces in, and you want to get rid of
> them, use
> 
>   perl -pi~ -e's/^\s+$//' file
> 
> .
> 
> Ben
> 

Yeah, I thought that too.  

In vi (in Redhat 9), I created a file similar to:

=============
Hello this

is a 

great file

and I am proud of it.
============

But I still got a paragraph count of one.




------------------------------

Date: 4 Nov 2003 15:32:38 GMT
From: Glenn Jackman <xx087@freenet.carleton.ca>
Subject: Re: Parsing Large Files
Message-Id: <slrnbqfhlt.82p.xx087@smeagol.ncf.ca>

Jose Yimpho <joeremovethis@tanga.com> wrote:
>  With regards to the paragraph grouping, I tried something like this last
>  night:
>  
>  $/ = '';
>  while <FILE>

syntax error:  should be: while (<FILE>)

>  {
>          print;
>          $count++;
>  }
>  print "\nNumber of paragraphs: $count\n";
>  
>  It printed the file contents, and then: 'Number of paragraphs: 1', which
>  didn't seem right to me, as I was trying to count the number of paragraphs
>  (or blank lines) in the file.  Setting the $/ sets the 'splitter' to split
>  on all blank lines, right?  and each iteration of the while loop reads in
>  one section of the input (split by blank lines), right?  Not sure why it
>  was printing out a 1.  

Are your blank lines truly empty, or do they have whitespace in them?  
For instance, if each line ends with "\r\n", and you're processing the
file on a unixy OS where "\n" is the end of line character, you don't
have any empty lines in the file.  Test this theory with:  $/="\r\n\r\n";
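
A quick way to see the effect without touching a real file is to read the
same text from in-memory filehandles, once with "\n" and once with "\r\n"
line endings. This is only a sketch: the sample text and the sub name
count_paragraphs are made up.

```perl
use strict;
use warnings;

# Count paragraph-mode records in a string, using an in-memory filehandle.
sub count_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';    # paragraph mode: records end at one or more empty lines
    my $count = 0;
    while (my $paragraph = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

my $unix = "Hello this\n\nis a\n\ngreat file\n";
my $dos  = "Hello this\r\n\r\nis a\r\n\r\ngreat file\r\n";

my $unix_count = count_paragraphs($unix);    # three paragraphs
my $dos_count  = count_paragraphs($dos);     # one: lines holding "\r" are not empty

# Stripping the carriage returns first restores the expected count.
(my $fixed = $dos) =~ s/\r$//mg;
my $fixed_count = count_paragraphs($fixed);

print "unix: $unix_count, dos: $dos_count, fixed: $fixed_count\n";
```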

-- 
Glenn Jackman
NCF Sysadmin
glennj@ncf.ca


------------------------------

Date: 4 Nov 2003 15:35:33 GMT
From: Glenn Jackman <xx087@freenet.carleton.ca>
Subject: Re: Parsing Large Files
Message-Id: <slrnbqfhrc.82p.xx087@smeagol.ncf.ca>

Jose Yimpho <joeremovethis@tanga.com> wrote:
>  In vi (in Redhat 9), I created a file similar to:
[...]
>  But I still got a paragraph count of one.

In vi, is your file format 'dos'?
    :set fileformat
If so, set it to 'unix' before you save.
    :set ff=unix
    :wq

-- 
Glenn Jackman
NCF Sysadmin
glennj@ncf.ca


------------------------------

Date: Tue, 4 Nov 2003 10:23:28 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Parsing Large Files
Message-Id: <slrnbqfkk0.jh4.tadmc@magna.augustmail.com>

Jose Yimpho <joeremovethis@tanga.com> wrote:

> I tried something like this last
          ^^^^^^^^^^^^^^
> night:
> 
> $/ = '';
> while <FILE>
> {


Please post *real* code.

Have you seen the Posting Guidelines that are posted here frequently?


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Tue, 04 Nov 2003 08:54:30 -0800
From: "Jose Yimpho" <joe@localhost.localdomain>
Subject: Re: Parsing Large Files
Message-Id: <pan.2003.11.04.16.54.28.529122@localhost.localdomain>

On Tue, 04 Nov 2003 15:14:19 +0000, Ben Morrow wrote:

> 
> Jose Yimpho <joeremovethis@tanga.com> wrote:
>> With regards to the paragraph grouping, I tried something like this last
>> night:
>> 
>> $/ = '';
>> while <FILE>
>> {
>>         print;
>>         $count++;
>> }
>> print "\nNumber of paragraphs: $count\n";
>> 
>> It printed the file contents, and then: 'Number of paragraphs: 1', which
>> didn't seem right to me, as I was trying to count the number of paragraphs
>> (or blank lines) in the file.
> 
> Are the lines between your paragraphs truly blank? If they contain any
> whitespace (in the case of Win32 files opened in binary mode this
> includes the \r at the end of each line), then they will not be
> counted as paragraph breaks by Perl.
> 
> Try
> 
> $/ = $\ = "";
> while (<FILE>) {
>       print "Line $.: |$_|";
> }
> 
> to see what Perl considers each paragraph to contain. If your file
> does have 'blank' lines with spaces in, and you want to get rid of
> them, use
> 
>   perl -pi~ -e's/^\s+$//' file
> 
> .
> 
> Ben

Aha, you were right.  What I wanted was

perl -pi~ -e 's/^\s+$/\n/' file

as I want to keep the newlines in between the different entries.
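
The difference between the two substitutions can be sketched in a few
lines. This mimics what "perl -p" does to each line, on a made-up
three-line input; filter_lines is a hypothetical helper, not part of the
one-liner.

```perl
use strict;
use warnings;

# Apply a per-line substitution the way "perl -p" would: each line,
# including its trailing newline, is edited and then collected.
sub filter_lines {
    my ($text, $replacement) = @_;
    my $out = '';
    for my $line (split /^/, $text) {    # split into lines, keeping newlines
        $line =~ s/^\s+$/$replacement/;
        $out .= $line;
    }
    return $out;
}

# Made-up input: two entries separated by a line holding a space and a tab.
my $input = "first entry\n \t\nsecond entry\n";

my $deleted = filter_lines($input, '');      # separator line vanishes entirely
my $blanked = filter_lines($input, "\n");    # separator line becomes truly empty

print $deleted, "-----\n", $blanked;
```

Because \s+ also matches the trailing newline, the empty replacement removes
the separator line altogether, which merges the two entries into one
paragraph; replacing with "\n" keeps an empty line that paragraph mode
recognises.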




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5748
***************************************

