[24523] in Perl-Users-Digest


Perl-Users Digest, Issue: 6703 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jun 18 06:05:45 2004

Date: Fri, 18 Jun 2004 03:05:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 18 Jun 2004     Volume: 10 Number: 6703

Today's topics:
        A neat trick to serialize arrays and hashes (J. Romano)
    Re: A neat trick to serialize arrays and hashes <matthew.garrish@sympatico.ca>
    Re: A neat trick to serialize arrays and hashes <usenet@morrow.me.uk>
    Re: A neat trick to serialize arrays and hashes <tassilo.parseval@rwth-aachen.de>
        Can't locate package AutoLoader for @File::List::ISA at <shahriar_mokhtarzad@pacbell.net>
    Re: parsing file name assigning extension to a variable <Joe.Smith@inwap.com>
    Re: pattern match problem <matthew.garrish@sympatico.ca>
    Re: pattern match problem <Joe.Smith@inwap.com>
    Re: pattern match problem <nospam@peng.nl>
    Re: pattern match problem <noreply@gunnar.cc>
        perl wrapper to limit stderr to first 1000 lines? <mhunter@berkeley.edu>
    Re: perl wrapper to limit stderr to first 1000 lines? <usenet@morrow.me.uk>
        Posting Guidelines for comp.lang.perl.misc ($Revision:  tadmc@augustmail.com
    Re: sorting text jamasd@hotmail.com
    Re: sorting text jamasd@hotmail.com
    Re: sorting text <noreply@gunnar.cc>
    Re: sorting text <Joe.Smith@inwap.com>
    Re: sorting text jamasd@hotmail.com
    Re: sorting text <noreply@gunnar.cc>
    Re: Why won't this split file script work? <max@NOSPAMkipness.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 17 Jun 2004 20:32:16 -0700
From: jl_post@hotmail.com (J. Romano)
Subject: A neat trick to serialize arrays and hashes
Message-Id: <b893f5d4.0406171932.376394cf@posting.google.com>

Dear Perl community,

   Today I invented a neat new trick that I thought I'd share with
everyone here.

   But before I continue, I'd like to point out to anyone out there
who thinks that my trick is "obvious to everyone but inexperienced
programmers" or that "it's not worth knowing because better approaches
exist" that some people enjoy learning a new simple trick, even if
they never get a chance to apply it.  Besides, sharing a trick that
was just discovered (even if most programmers already know about it)
has the benefit of educating any programmer who, for some reason or
another, happens to not be aware of that particular technique.  So if
you really must reply saying that you already knew this trick, instead
of saying how it didn't help you at all, how about sharing something
else that might be useful to someone in the Perl community?  That
would be much appreciated.

   Anyway, now that I'm off my soap box, here is what I discovered
this morning:

   The pack string "(w/a*)*" is useful for serializing arrays and
hashes -- that is, it can pack and unpack arrays and hashes to and
from a string.  Let me explain in more detail:

   I have an array, which holds the names of some animals:

      @a = ("dog", "cat", "bird", "camel", "giraffe");

I might want to serialize @a into a string for the purpose of storing
it off into a file so I can retrieve it later.  Well, I could use the
Data::Dumper module to create a string (and later use eval to turn that
string back into a reference whose contents I can assign to the array), but
that can get complicated if I don't have much experience using the
Data::Dumper module.

   Well, using the pack string "(w/a*)*" I can easily serialize the
array into a string like so:

      $string = pack("(w/a*)*", @a);

Now $string contains all the encoded information needed to reconstruct
the @a array.  So if I wanted to use $string to create a @b array that
was identical to the @a array, I can use unpack() with the same pack
string:

      @b = unpack("(w/a*)*", $string);

   Neat, doncha think?  This same technique also works with hashes:

      $string = pack("(w/a*)*", %ENV);
      %wow = unpack("(w/a*)*", $string);
      # The %wow hash is now an exact copy of %ENV
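To see what the packed string actually looks like, here is a small sketch (not from the original post) that dumps it in hex. In the template, "w" writes a BER-compressed unsigned integer (7 bits per byte, high bit set on every byte but the last), and "/a*" makes that integer the length of the string that follows, so each element is stored as a length prefix followed by its raw bytes:

```perl
use strict;
use warnings;

# Each element is stored as <BER-compressed length><raw bytes>.
my $packed = pack("(w/a*)*", "dog", "cat");
print unpack("H*", $packed), "\n";     # 03646f6703636174

# Lengths of 128 or more need more than one prefix byte: 200 = 1*128 + 72,
# so the prefix is 0x81 (continuation bit plus 1) followed by 0x48 (72).
my $long = pack("(w/a*)*", "x" x 200);
print unpack("H4", $long), "\n";       # 8148

# unpack() with the same template splits the string back into elements.
my @back = unpack("(w/a*)*", $packed);
print "@back\n";                       # dog cat
```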

   Now that we have a string representation of an array or hash, we
can save the string to a file, send it over a socket, or even encrypt
it using some encryption algorithm.

   This approach can even handle arrays (and hashes) that contain
scalars consisting of newlines, null-bytes, and other unprintable
characters!

   There are a few important items to point out:

1.  The serialized string will most likely contain
    non-printable characters, which may include some
    newline characters, even if no scalar in the 
    original array/hash contains a "\n" character.
    Because of this, you should use the binmode()
    function on any filehandle you plan to print the
    string out to.

2.  If the array or hash contains any numbers, they
    will be converted to their string representation.

3.  This technique only handles simple arrays and hashes.
    In other words, multi-dimensional arrays and hashes,
    lists of lists, and references are not handled
    correctly.  If you really want to serialize a
    complex structure such as one of these, I recommend
    using another approach, like taking advantage of
    the Data::Dumper module.  You CAN however, create
    an array of these serialized arrays, and serialize
    that array!

4.  The "w" in the pack string "(w/a*)*" allows for the
    encoding of any arbitrary-length string, even if it
    is longer than 0xffffffff bytes (4,294,967,295
    bytes).  But since "w" is only used for encoding
    non-negative integers, the "(w/a*)*" pack string
    cannot be used to encode arrays or hashes
    containing negative-length strings.  Fortunately,
    that's never been a problem for me.  :)

5.  I do not know if this trick can handle arrays
    and hashes containing Unicode strings.  My guess
    is that it can, but I haven't tested it so I can't
    say for sure.
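Items 1 and 2 can be seen together in a short sketch that writes a packed hash to a file and reads it back. The helper names save_hash() and load_hash() are made up for illustration, and the sketch assumes plain byte strings (see item 5):

```perl
use strict;
use warnings;

# Write a flat hash to $path as a "(w/a*)*" string. binmode() keeps the
# I/O layer from translating any "\n" bytes inside the packed data.
sub save_hash {
    my ($path, %h) = @_;
    open my $fh, '>', $path or die "can't write $path: $!";
    binmode $fh;
    print {$fh} pack("(w/a*)*", %h);
    close $fh or die "can't close $path: $!";
}

# Read the whole file back in one go and rebuild the hash.
sub load_hash {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't read $path: $!";
    binmode $fh;
    my $string = do { local $/; <$fh> };   # slurp mode
    close $fh;
    return unpack("(w/a*)*", $string);
}
```

For example, save_hash("animals.dat", %h) followed by my %copy = load_hash("animals.dat") restores keys and values containing newlines or null bytes intact; a numeric value such as 42 comes back as the string "42".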

   Anyway, that's my trick that I thought I would share with the rest
of you.  Have fun with it!

   -- Jean-Luc Romano


------------------------------

Date: Thu, 17 Jun 2004 23:51:57 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: A neat trick to serialize arrays and hashes
Message-Id: <tFtAc.36736$nY.1164005@news20.bellglobal.com>


"J. Romano" <jl_post@hotmail.com> wrote in message
news:b893f5d4.0406171932.376394cf@posting.google.com...
>

<snip explanation of packing>

>
>    There are a few important items to point out:
>
> 1.  The serialized string will most likely contain
>     non-printable characters, which may include some
>     newline characters, even if no scalar in the
>     original array/hash contains a "\n" character.
>     Because of this, you should use the binmode()
>     function on any filehandle you plan to print the
>     string out to.
>
> 2.  If the array or hash contains any numbers, they
>     will be converted to their string representation.
>
> 3.  This technique only handles simple arrays and hashes.
>     In other words, multi-dimensional arrays and hashes,
>     lists of lists, and references are not handled
>     correctly.  If you really want to serialize a
>     complex structure such as one of these, I recommend
>     using another approach, like taking advantage of
>     the Data::Dumper module.  You CAN however, create
>     an array of these serialized arrays, and serialize
>     that array!
>
> 4.  The "w" in the pack string "(w/a*)*" allows for the
>     encoding of any arbitrary-length string, even if it
>     is longer than 0xffffffff bytes (4,294,967,295
>     bytes).  But since "w" is only used for encoding
>     non-negative integers, the "(w/a*)*" pack string
>     cannot be used to encode arrays or hashes
>     containing negative-length strings.  Fortunately,
>     that's never been a problem for me.  :)
>
> 5.  I do not know if this trick can handle arrays
>     and hashes containing Unicode strings.  My guess
>     is that it can, but I haven't tested it so I can't
>     say for sure.
>

Sorry to rain on your parade, but with all the caveats don't you think it
would be better just to use the Storable module, especially since it's part
of the core distribution? Better techniques are worth noting for the simple
reason that they're better...

Matt




------------------------------

Date: Fri, 18 Jun 2004 04:35:55 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: A neat trick to serialize arrays and hashes
Message-Id: <catrfb$k5d$1@wisteria.csv.warwick.ac.uk>


Quoth jl_post@hotmail.com (J. Romano):
> 3.  This technique only handles simple arrays and hashes.
>     In other words, multi-dimensional arrays and hashes,
>     lists of lists, and references are not handled
>     correctly.  If you really want to serialize a
>     complex structure such as one of these, I recommend
>     using another approach, like taking advantage of
>     the Data::Dumper module.  You CAN however, create
>     an array of these serialized arrays, and serialize
>     that array!

...however, you can't then unserialize it, as the references have been
stringified and can't be converted back to refs. Yet another reason to
use Storable, which does this right...
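Storable is part of the core distribution, so the suggestion needs no extra install. A minimal sketch using its freeze() and thaw() functions on a nested structure that the "(w/a*)*" trick cannot round-trip (store(), nstore() and retrieve() do the same thing directly to a file):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Nested arrays and hashes: references inside references.
my %zoo = (
    animals => ["dog", "cat", "bird"],
    legs    => { dog => 4, bird => 2 },
    nested  => [[1, 2], [3, [4, 5]]],
);

my $frozen = freeze(\%zoo);   # a byte string; write it with binmode()
my $copy   = thaw($frozen);   # a new hash reference with independent copies

print "$copy->{animals}[2] has $copy->{legs}{bird} legs\n";   # bird has 2 legs
print "deepest: $copy->{nested}[1][1][1]\n";                  # deepest: 5
```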

Ben

-- 
perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
2047502190/'                                                 # ben@morrow.me.uk


------------------------------

Date: Fri, 18 Jun 2004 07:04:43 +0200
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: A neat trick to serialize arrays and hashes
Message-Id: <2jfbfdF116mtjU1@uni-berlin.de>

Also sprach J. Romano:

[...]

>    Well, using the pack string "(w/a*)*" I can easily serialize the
> array into a string like so:
> 
>       $string = pack("(w/a*)*", @a);
> 
> Now $string contains all the encoded information needed to reconstruct
> the @a array.  So if I wanted to use $string to create a @b array that
> was identical to the @a array, I can use unpack() with the same pack
> string:
> 
>       @b = unpack("(w/a*)*", $string);
> 
>    Neat, doncha think?  This same technique also works with hashes:
> 
>       $string = pack("(w/a*)*", %ENV);
>       %wow = unpack("(w/a*)*", $string);
>       # The %wow hash is now an exact copy of %ENV
> 
>    Now that we have a string representation of an array or hash, we
> can save the string to a file, send it over a socket, or even encrypt
> it using some encryption algorithm.

[...]

> 5.  I do not know if this trick can handle arrays
>     and hashes containing Unicode strings.  My guess
>     is that it can, but I haven't tested it so I can't
>     say for sure.

It can, but there's a slight drawback: You'll lose the UTF-8 flag when
unpacking the string:

    $ perl -MDevel::Peek -Mcharnames=:full
    Dump((unpack "(w/a*)*", pack "(w/a*)*", "\N{EURO-CURRENCY SIGN}123")[0]);
    ^D
    SV = PV(0x8139efc) at 0x8144cc4
      REFCNT = 1
      FLAGS = (TEMP,POK,pPOK)
      PV = 0x8140020 "\342\202\240123"\0
      CUR = 6
      LEN = 7
    $ perl -MDevel::Peek -Mcharnames=:full
    Dump("\N{EURO-CURRENCY SIGN}123");
    ^D
    SV = PV(0x8174280) at 0x814891c
      REFCNT = 1
      FLAGS = (POK,READONLY,pPOK,UTF8)
      PV = 0x81494a8 "\342\202\240123"\0 [UTF8 "\x{20a0}123"]
      CUR = 6
      LEN = 7
	    
That's a bit of a problem because you can't tell whether
"\342\202\240123" is just a sequence of bytes or whether it happens to
be a Unicode string.
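One possible workaround, not from the thread: encode every element to UTF-8 bytes before packing and decode after unpacking, using the core Encode module. The helper names pack_text() and unpack_text() are hypothetical, and the approach assumes every element is meant as text, since binary data would also be reinterpreted as characters:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Store each element as UTF-8 bytes, then decode on the way back out.
sub pack_text   { return pack("(w/a*)*", map { encode_utf8($_) } @_) }
sub unpack_text { return map { decode_utf8($_) } unpack("(w/a*)*", $_[0]) }

my @in     = ("\x{20a0}123", "plain", "");
my $packed = pack_text(@in);
my @out    = unpack_text($packed);
printf "%d characters, first is U+%04X\n", length $out[0], ord $out[0];
# 4 characters, first is U+20A0
```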

>    Anyway, that's my trick that I thought I would share with the rest
> of you.  Have fun with it!

Very nice, thank you. I'm quite a fan of pack/unpack and so I love every
trick involving those.

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


------------------------------

Date: Fri, 18 Jun 2004 06:11:12 GMT
From: "Shahriar" <shahriar_mokhtarzad@pacbell.net>
Subject: Can't locate package AutoLoader for @File::List::ISA at...
Message-Id: <4IvAc.74255$Db3.53789@newssvr29.news.prodigy.com>

Hi Folks,

I just installed *FILE-LIST* from ASP, and I get the error shown in the
Subject line. Does anyone know about this error? I am running ASP; the
version information is below:

This is perl, v5.8.3 built for MSWin32-x86-multi-thread
(with 8 registered patches, see perl -V for more detail)

Copyright 1987-2003, Larry Wall

Binary build 809 provided by ActiveState Corp. http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Feb  3 2004 00:28:51

Regards,

-shahriar




------------------------------

Date: Fri, 18 Jun 2004 07:59:28 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: parsing file name assigning extension to a variable
Message-Id: <whxAc.132899$Ly.58210@attbi_s01>

Alexander Heimann wrote:

> If anyone is interested. I am pasting the code that worked for the
> above problem. The main problem i was having was that when I was
> reading the directory i forgot to add
> next if $file =~/^\.\.?$/; after the while (defined($file =
> readdir(DIR))) to skip over the .

But what if someone creates a subdirectory in D:/D2?
The check on /^\.\.?$/ is for cases where files and subdirectories
will both be processed.

In your case, it is more robust to use
   next unless -f "$mydir/$file";
to skip anything that is not a plain file (which will skip '.' and '..').
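A minimal sketch of that suggestion, assuming the directory name is in $mydir as in the code above; plain_files() is a made-up helper name:

```perl
use strict;
use warnings;

# Return the names of the plain files in $mydir, sorted.
# The -f test skips '.', '..', subdirectories and other non-files in one go.
sub plain_files {
    my ($mydir) = @_;
    opendir my $dh, $mydir or die "can't opendir $mydir: $!";
    my @files;
    while (defined(my $file = readdir $dh)) {
        # readdir returns bare names, so the directory must be prepended
        next unless -f "$mydir/$file";
        push @files, $file;
    }
    closedir $dh;
    return sort @files;
}
```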

	-Joe


------------------------------

Date: Thu, 17 Jun 2004 21:15:51 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: pattern match problem
Message-Id: <8nrAc.34845$nY.1117702@news20.bellglobal.com>


"Michal Wojciechowski" <odyniec-usenet@odyniec.net> wrote in message
news:87wu262hhf.fsf@odyniec.odyniec.net...
> "Lex" <nospam@peng.nl> writes:
>
> [...]
>
> > look for <pre> and </pre> and erase all the <br> that you find
> > within it, no matter what you find. However: leave the rest! (
> > linebreaks etc.)
>
> [...]
>
> > $rec{'Text'} =~ s%<pre>(.*?)<br>(.*?)</pre>%<pre>$1 $2</pre>%gim;
>
> The above would work, if it could match overlapping occurrences. One
> solution is to use it in a loop, like:
>
>   while (s!<pre>(.*?)<br>(.*?)</pre>!<pre>$1 $2</pre>!sig) {}
>

Two quick things: you want foreach not while, and pre and break tags can
include style definitions etc., so best to check for <br[^>]*>.

foreach (s!<pre[^>]*>(.*?)<br[^>]*>(.*?)</pre>!<pre>$1 $2</pre>!sig) {}

I'd give my vote to Gunnar's method, though, as you could wind up doing many
passes over the file this way before you clear them all out (though what
<br> tags are doing inside <pre> tags eludes me at the moment).

Matt




------------------------------

Date: Fri, 18 Jun 2004 07:42:19 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: pattern match problem
Message-Id: <v1xAc.62123$eu.27793@attbi_s02>

Matt Garrish wrote:

> "Michal Wojciechowski" <odyniec-usenet@odyniec.net> wrote in message
> news:87wu262hhf.fsf@odyniec.odyniec.net...
> 
>>"Lex" <nospam@peng.nl> writes:
>>
>>[...]
>>
>>
>>>look for <pre> and </pre> and erase all the <br> that you find
>>>within it, no matter what you find. However: leave the rest! (
>>>linebreaks etc.)
>>
>>[...]
>>
>>
>>>$rec{'Text'} =~ s%<pre>(.*?)<br>(.*?)</pre>%<pre>$1 $2</pre>%gim;
>>
>>The above would work, if it could match overlapping occurrences. One
>>solution is to use it in a loop, like:
>>
>>  while (s!<pre>(.*?)<br>(.*?)</pre>!<pre>$1 $2</pre>!sig) {}
> 
> Two quick things: you want foreach not while, and pre and break tags can
> include style definitions etc., so best to check for <br[^>]*>.

No, foreach() will remove only the first <br>, not all of them.

The code below prints partial results so that you can see the
loop's actions.

unix% cat temp.pl
$string = "<pre>foo<br>bar<br>baz<br>xyzzy<br>quux</pre>";

$_ = $string;
while (s!<pre>(.*?)<br>(.*?)</pre>!<pre>$1 $2</pre>!sig) { print "Part:$_\n";}
print "End while(): $_\n";

$_ = $string;
print "Part:$_\n" while s!<pre>(.*?)<br>(.*?)</pre>!<pre>$1 $2</pre>!sig;
print "End 1 while: $_\n";

$_ = $string;
foreach (s!<pre[^>]*>(.*?)<br[^>]*>(.*?)</pre>!<pre>$1 $2</pre>!sig) { print "Part:$_\n";}
print "End foreach: $_\n";

unix% perl temp.pl
Part:<pre>foo bar<br>baz<br>xyzzy<br>quux</pre>
Part:<pre>foo bar baz<br>xyzzy<br>quux</pre>
Part:<pre>foo bar baz xyzzy<br>quux</pre>
Part:<pre>foo bar baz xyzzy quux</pre>
End while(): <pre>foo bar baz xyzzy quux</pre>
Part:<pre>foo bar<br>baz<br>xyzzy<br>quux</pre>
Part:<pre>foo bar baz<br>xyzzy<br>quux</pre>
Part:<pre>foo bar baz xyzzy<br>quux</pre>
Part:<pre>foo bar baz xyzzy quux</pre>
End 1 while: <pre>foo bar baz xyzzy quux</pre>
Part:1
End foreach: <pre>foo bar<br>baz<br>xyzzy<br>quux</pre>

	-Joe



------------------------------

Date: Fri, 18 Jun 2004 09:16:26 +0100
From: "Lex" <nospam@peng.nl>
Subject: Re: pattern match problem
Message-Id: <mBxAc.280746$Rc.8289988@news-reader.eresmas.com>


"Gunnar Hjalmarsson" <noreply@gunnar.cc> wrote in message
news:2jeo7nFv90knU1@uni-berlin.de...

> This problem appears to be rather limited,
> and under certain conditions, the OP's need may well be served through
> something like this:
>
>      $rec{'Text'} =~ s{(<pre.*?>.+?</pre>)}{
>          (my $rest = $1) =~ s/<br.*?>//gis;
>          $rest
>      }egis;
>
> To the OP: Please study the FAQ:
>
>      perldoc -q "remove HTML"
>
> and consider whether using the s/// operator like above would be
> 'safe' enough for your case.
>

Thanks a lot Gunnar!
It works like a charm.
It would be safe enough for my case as there is nothing more than <br> tags
within the <pre> and </pre> tags. I've got control over that, it's not
parsing just any html file you see. Just pieces of text from a database.
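For anyone who wants to try Gunnar's substitution on its own, here it is wrapped in a small function (strip_br_in_pre is a made-up name) with sample input. Note that it deletes each <br> outright, whereas the original attempt replaced it with a space:

```perl
use strict;
use warnings;

# Gunnar's substitution: for each <pre>...</pre> block (matched non-greedily,
# across newlines, case-insensitively), delete every <br...> tag inside it.
# Text outside <pre> blocks is left alone.
sub strip_br_in_pre {
    my ($text) = @_;
    $text =~ s{(<pre.*?>.+?</pre>)}{
        (my $rest = $1) =~ s/<br.*?>//gis;
        $rest
    }egis;
    return $text;
}

print strip_br_in_pre("one<br>\n<pre>foo<br>bar<BR />\nbaz</pre>two<br>"), "\n";
```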

Lex




------------------------------

Date: Fri, 18 Jun 2004 10:24:27 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: pattern match problem
Message-Id: <2jfn65F10ng9bU1@uni-berlin.de>

Lex wrote:
> Gunnar Hjalmarsson wrote:
>> 
>>     $rec{'Text'} =~ s{(<pre.*?>.+?</pre>)}{
>>         (my $rest = $1) =~ s/<br.*?>//gis;
>>         $rest
>>     }egis;
> 
> Thanks a lot Gunnar!
> It works like a charm.

Good.

> It would be safe enough for my case as there is nothing more than
> <br> tags within the <pre> and </pre> tags. I've got control over
> that, it's not parsing just any html file you see. Just pieces of
> text from a database.

That's what I suspected.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Fri, 18 Jun 2004 01:36:03 +0000 (UTC)
From: Mike Hunter <mhunter@berkeley.edu>
Subject: perl wrapper to limit stderr to first 1000 lines?
Message-Id: <slrncd4hke.1sk.mhunter@celeste.net.berkeley.edu>

Hi,

I have some cron jobs that can sometimes send out too much noise to stderr,
which in turn causes sendmail to do bad things :(  I'm trying to limit the
amount of stderr I see from those scripts without changing the scripts
themselves.  I am looking to write a perl wrapper that does something like this:

my $program = shift @ARGV;
my $args = join " ", @ARGV;

open PGMSTDOUT, "$program $args|" or die "blah!";

 .....somehow get the program's stdout into PGMSTDOUT

while (<PGMSTDOUT>)
{
	print $_;
}

my $n = 0;
my $error_line = <PGMSTDERR>;
while (<PGMSTDERR> && ($n < 1000))
{
	$error_line = <PGMSTDERR>;
	print STDERR $error_line;
	$n++;
}

The only similar advice I've seen on the web was here:

http://perlmonks.thepen.com/730.html

But I don't want to follow that approach because I don't want to create a file
on disk with all the STDERR stuff, I want to discard it.

Any help?  How do I *pipe* stderr to something without duping it to stdout?

Thanks,

Mike


------------------------------

Date: Fri, 18 Jun 2004 01:54:49 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: perl wrapper to limit stderr to first 1000 lines?
Message-Id: <cati19$dru$1@wisteria.csv.warwick.ac.uk>


Quoth mhunter@uclink.berkeley.edu:
> 
> I have some cron jobs that can sometimes send out too much noise to stderr,
> which in turn causes sendmail to do bad things :(  I'm trying to limit the
> amount of stderr I see from those scripts without changing the scripts
> themselves.  I am looking to write a perl wrapper that does something like this:
> 
> my $program = shift @ARGV;
> my $args = join " ", @ARGV;

What's the point of shifting @ARGV if you're just going to join
"$program " onto the beginning anyway?

> open PGMSTDOUT, "$program $args|" or die "blah!";

Don't do this: use three-arg open.
Use lexical file-handles.

open my $PGMSTDOUT, '-|', @ARGV or die "can't run $ARGV[0]: $!";

> while (<PGMSTDOUT>)
> {
> 	print $_;
> }
> 
> my $n = 0;

Perl provides the special variable $. for this job. See perldoc perlvar.

> my $error_line = <PGMSTDERR>;
> while (<PGMSTDERR> && ($n < 1000))

This is wrong: Perl does special magic with while (<>). What you mean is

while (<PGMSTDERR>) {
    $. > 1000 and last;

or

while (defined($_ = <PGMSTDERR>) and $. <= 1000) {

where the defined($_ = <PGMSTDERR>) part is what perl expands
while (<PGMSTDERR>) into.

> {
> 	$error_line = <PGMSTDERR>;

Presumably you are reading again because you lost the results when you
lost the magic while (<>); this will discard every other line, though.
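Putting these points together (list-form pipe open, a lexical handle, $. as the counter, one read per loop iteration), a sketch might look like this. first_lines() is a hypothetical helper; it reads the command's stdout, since a plain pipe open only captures stdout, and list-form pipe open assumes a Unix-like system. It keeps reading after the limit so the child is not killed by SIGPIPE:

```perl
use strict;
use warnings;

# Run @cmd without a shell and return at most $max lines of its stdout.
sub first_lines {
    my ($max, @cmd) = @_;
    open my $pipe, '-|', @cmd or die "can't run $cmd[0]: $!";
    my @lines;
    while (my $line = <$pipe>) {       # perl adds the defined() test here
        # $. is the number of the line just read from $pipe; later lines
        # are still read (and dropped) so the child never blocks
        push @lines, $line if $. <= $max;
    }
    close $pipe;                       # also waits for the child; status in $?
    return @lines;
}

print first_lines(3, $^X, '-le', 'print for 1..10');   # the lines 1, 2 and 3
```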

> 	print STDERR $error_line;
> 	$n++;
> }
>
> The only similar advice I've seen on the web was here:
> 
> http://perlmonks.thepen.com/730.html
> 
> But I don't want to follow that approach because I don't want to create a file
> on disk with all the STDERR stuff, I want to discard it.
> 
> Any help?  How do I *pipe* stderr to something without duping it to stdout?

If you simply want to discard all of stderr, use 2>/dev/null in the
command line.  If you want to grab stdout and stderr separately, you
will need to use IPC::Open3; you will also need to use IO::Select to
process the bits of each as they arrive, or you'll get deadlocks (you'll
be waiting for the end of stdout, the program will be blocking trying to
write something to stderr).
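If you do want the wrapper in Perl, here is a rough sketch of the IPC::Open3 plus IO::Select approach. run_limited() is a hypothetical name. It reads both pipes with sysread() as data arrives, passes stdout through untouched, forwards only the first 1000 lines of stderr, and keeps draining stderr after that so the child never blocks. It assumes a Unix-like system; under Windows, select() works only on sockets, so IO::Select cannot watch these pipes:

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use IO::Select;
use Symbol qw(gensym);

# Run @cmd, copy all of its stdout to $out_fh, copy only the first $max_err
# lines of its stderr to $err_fh, and return the command's exit status.
sub run_limited {
    my ($max_err, $out_fh, $err_fh, @cmd) = @_;

    my $child_err = gensym;    # a real glob, so stderr is NOT merged into stdout
    my $pid = open3(my $child_in, my $child_out, $child_err, @cmd);
    close $child_in;           # the command gets no input from us

    my $sel = IO::Select->new($child_out, $child_err);
    my ($errbuf, $err_lines) = ('', 0);

    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $n = sysread $fh, my $chunk, 8192;
            if (!$n) {                      # EOF (or read error): stop watching it
                $sel->remove($fh);
                next;
            }
            if ($fh == $child_out) {
                print {$out_fh} $chunk;
                next;
            }
            # stderr: forward complete lines up to the limit, drop the rest,
            # but keep reading so the child never blocks on a full pipe
            $errbuf .= $chunk;
            while ($errbuf =~ s/\A([^\n]*\n)//) {
                print {$err_fh} $1 if ++$err_lines <= $max_err;
            }
        }
    }
    # a final stderr line without a trailing newline
    print {$err_fh} $errbuf if length($errbuf) && $err_lines < $max_err;

    waitpid $pid, 0;
    return $? >> 8;
}

# As a cron wrapper:  wrapper.pl some_job --its-args
exit run_limited(1000, \*STDOUT, \*STDERR, @ARGV) if @ARGV;
```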

If all you want to do to stderr is grab the first bit, then try this
shell script (untested):

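#!/bin/sh

stderr=$(mktemp -t cron.XXXXXXXXXX)
stdout=$(mktemp -t cron.XXXXXXXXXX)
status=$(mktemp -t cron.XXXXXXXXXX)

# sed keeps reading after line 1000 (head would exit and the job would
# then die of SIGPIPE); the job's own exit status is saved in $status,
# since $? after a pipeline is that of its last command
{ "$@" 2>&1 >"$stdout"; echo $? >"$status"; } | sed -n '1,1000p' >"$stderr"
err=$(cat "$status")

cat "$stdout"
cat "$stderr" >&2

rm -f "$stdout" "$stderr" "$status"

exit $err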

__END__

Using temporary files makes avoiding deadlock a lot easier.

Ben

-- 
               We do not stop playing because we grow old; 
                  we grow old because we stop playing.
                            ben@morrow.me.uk


------------------------------

Date: Fri, 18 Jun 2004 02:22:15 -0500
From: tadmc@augustmail.com
Subject: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.5 $)
Message-Id: <5YqdnVBrc-i6BU_d4p2dnA@august.net>

Outline
   Before posting to comp.lang.perl.misc
      Must
       - Check the Perl Frequently Asked Questions (FAQ)
       - Check the other standard Perl docs (*.pod)
      Really Really Should
       - Lurk for a while before posting
       - Search a Usenet archive
      If You Like
       - Check Other Resources
   Posting to comp.lang.perl.misc
      Is there a better place to ask your question?
       - Question should be about Perl, not about the application area
      How to participate (post) in the clpmisc community
       - Carefully choose the contents of your Subject header
       - Use an effective followup style
       - Speak Perl rather than English, when possible
       - Ask perl to help you
       - Do not re-type Perl code
       - Provide enough information
       - Do not provide too much information
       - Do not post binaries, HTML, or MIME
      Social faux pas to avoid
       - Asking a Frequently Asked Question
       - Asking a question easily answered by a cursory doc search
       - Asking for emailed answers
       - Beware of saying "doesn't work"
       - Sending a "stealth" Cc copy
      Be extra cautious when you get upset
       - Count to ten before composing a followup when you are upset
       - Count to ten after composing and before posting when you are upset
-----------------------------------------------------------------

Posting Guidelines for comp.lang.perl.misc ($Revision: 1.5 $)
    This newsgroup, commonly called clpmisc, is a technical newsgroup
    intended to be used for discussion of Perl related issues (except job
    postings), whether it be comments or questions.

    As you would expect, clpmisc discussions are usually very technical in
    nature and there are conventions for conduct in technical newsgroups
    going somewhat beyond those in non-technical newsgroups.

    The article at:

        http://www.catb.org/~esr/faqs/smart-questions.html

    describes how to get answers from technical people in general.

    This article describes things that you should, and should not, do to
    increase your chances of getting an answer to your Perl question. It is
    available in POD, HTML and plain text formats at:

     http://mail.augustmail.com/~tadmc/clpmisc.shtml

    For more information about netiquette in general, see the "Netiquette
    Guidelines" at:

     http://andrew2.andrew.cmu.edu/rfc/rfc1855.html

    A note to newsgroup "regulars":

       Do not use these guidelines as a "license to flame" or other
       meanness. It is possible that a poster is unaware of things
       discussed here.  Give them the benefit of the doubt, and just
       help them learn how to post, rather than assume 

    A note about technical terms used here:

       In this document, we use words like "must" and "should" as
       they're used in technical conversation (such as you will
       encounter in this newsgroup). When we say that you *must* do
       something, we mean that if you don't do that something, then
       it's unlikely that you will benefit much from this group.
       We're not bossing you around; we're making the point without
       lots of words.

    Do *NOT* send email to the maintainer of these guidelines. It will be
    discarded unread. The guidelines belong to the newsgroup so all
    discussion should appear in the newsgroup. I am just the secretary that
    writes down the consensus of the group.

Before posting to comp.lang.perl.misc
  Must
    This section describes things that you *must* do before posting to
    clpmisc, in order to maximize your chances of getting meaningful replies
    to your inquiry and to avoid getting flamed for being lazy and trying to
    have others do your work.

    The perl distribution includes documentation that is copied to your hard
    drive when you install perl. Also installed is a program for looking
    things up in that (and other) documentation named 'perldoc'.

    You should either find out where the docs got installed on your system,
    or use perldoc to find them for you. Type "perldoc perldoc" to learn how
    to use perldoc itself. Type "perldoc perl" to start reading Perl's
    standard documentation.

    Check the Perl Frequently Asked Questions (FAQ)
        Checking the FAQ before posting is required in Big 8 newsgroups in
        general; there is nothing clpmisc-specific about this requirement.
        You are expected to do this in nearly all newsgroups.

        You can use the "-q" switch with perldoc to do a word search of the
        questions in the Perl FAQs.

    Check the other standard Perl docs (*.pod)
        The perl distribution comes with much more documentation than is
        available for most other newsgroups, so in clpmisc you should also
        see if you can find an answer in the other (non-FAQ) standard docs
        before posting.

    It is *not* required, or even expected, that you actually *read* all of
    Perl's standard docs, only that you spend a few minutes searching them
    before posting.

    Try doing a word-search in the standard docs for some words/phrases
    taken from your problem statement or from your very carefully worded
    "Subject:" header.

  Really Really Should
    This section describes things that you *really should* do before posting
    to clpmisc.

    Lurk for a while before posting
        This is very important and expected in all newsgroups. Lurking means
        to monitor a newsgroup for a period to become familiar with local
        customs. Each newsgroup has specific customs and rituals. Knowing
        these before you participate will help avoid embarrassing social
        situations. Consider yourself to be a foreigner at first!

    Search a Usenet archive
        There are tens of thousands of Perl programmers. It is very likely
        that your question has already been asked (and answered). See if you
        can find where it has already been answered.

        One such searchable archive is:

         http://groups.google.com/advanced_group_search

  If You Like
    This section describes things that you *can* do before posting to
    clpmisc.

    Check Other Resources
        You may want to check in books or on web sites to see if you can
        find the answer to your question.

        But you need to consider the source of such information: there are a
        lot of very poor Perl books and web sites, and several good ones
        too, of course.

Posting to comp.lang.perl.misc
    There can be 200 messages in clpmisc in a single day. Nobody is going to
    read every article. They must decide somehow which articles they are
    going to read, and which they will skip.

    Your post is in competition with 199 other posts. You need to "win"
    before a person who can help you will even read your question.

    These sections describe how you can help keep your article from being
    one of the "skipped" ones.

  Is there a better place to ask your question?
    Question should be about Perl, not about the application area
        It can be difficult to separate out where your problem really is,
        but you should make a conscious effort to post to the most
        applicable newsgroup. That is, after all, where you are the most
        likely to find the people who know how to answer your question.

        Being able to "partition" a problem is an essential skill for
        effectively troubleshooting programming problems. If you don't get
        that right, you end up looking for answers in the wrong places.

        It should be understood that you may not know that the root of your
        problem is not Perl-related (the two most frequent ones are CGI and
        Operating System related), so off-topic postings will happen from
        time to time. Be gracious when someone helps you find a better place
        to ask your question by pointing you to a more applicable newsgroup.

  How to participate (post) in the clpmisc community
    Carefully choose the contents of your Subject header
        You have 40 precious characters of Subject to win out and be one of
        the posts that get read. Don't waste them. Take care while
        composing them; they are the key that opens the door to getting an
        answer.

        Spend them indicating what aspect of Perl others will find if they
        should decide to read your article.

        Do not spend them indicating "experience level" (guru, newbie...).

        Do not spend them pleading (please read, urgent, help!...).

        Do not spend them on non-Subjects (Perl question, one-word
        Subject...)

        For more information on choosing a Subject see "Choosing Good
        Subject Lines":

         http://www.cpan.org/authors/id/D/DM/DMR/subjects.post

        Part of the beauty of newsgroup dynamics is that you can contribute
        to the community with your very first post! If your choice of
        Subject leads a fellow Perler to find the thread you are starting,
        then even asking a question helps us all.

    Use an effective followup style
        When composing a followup, quote only enough text to establish the
        context for the comments that you will add. Always indicate who
        wrote the quoted material. Never quote an entire article. Never
        quote a .signature (unless that is what you are commenting on).

        Intersperse your comments *following* each section of quoted text to
        which they relate. Unappreciated followup styles are referred to as
        "top-posting", "Jeopardy" (because the answer comes before the
        question), or "TOFU" (Text Over, Fullquote Under).

        Reversing the chronology of the dialog makes it much harder to
        understand (some folks won't even read it if written in that style).
        For more information on quoting style, see:

         http://web.presby.edu/~nnqadmin/nnq/nquote.html

    Speak Perl rather than English, when possible
        Perl is much more precise than natural language. Saying it in Perl
        instead will avoid misunderstanding your question or problem.

        Do not say: I have a variable with "foo\tbar" in it.

        Instead say: I have $var = "foo\tbar", or I have $var = 'foo\tbar',
        or I have $var = <DATA> (and show the data line).
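        To illustrate why the exact Perl matters, here is a small sketch
        (the variable names are made up for the example): the two strings
        below look the same when described in English, but double quotes
        turn \t into a TAB character while single quotes keep a literal
        backslash followed by "t".

```perl
use strict;
use warnings;

my $double = "foo\tbar";   # double quotes: \t becomes a real TAB character
my $single = 'foo\tbar';   # single quotes: backslash and 't' stay literal

printf "double: length %d\n", length $double;   # 7 characters
printf "single: length %d\n", length $single;   # 8 characters
print "double contains a TAB\n" if $double =~ /\t/;
print "single contains a backslash\n" if index( $single, '\\' ) >= 0;
```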

    Ask perl to help you
        You can ask perl itself to help you find common programming mistakes
        by doing two things: enable warnings (perldoc warnings) and enable
        "strict"ures (perldoc strict).

        You should not bother the hundreds/thousands of readers of the
        newsgroup without first seeing if a machine can help you find your
        problem. It is demeaning to be asked to do the work of a machine. It
        will annoy the readers of your article.

        You can look up any of the messages that perl might issue to find
        out what the message means and how to resolve the potential mistake
        (perldoc perldiag). If you would like perl to look them up for you,
        you can put "use diagnostics;" near the top of your program.
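        A minimal sketch of what strict and warnings catch. The variable
        names are hypothetical; the string eval and the local
        $SIG{__WARN__} handler are only there so the demo can capture the
        complaints and keep running. In a normal program you would simply
        put "use strict;" and "use warnings;" at the top and read the
        messages perl prints (adding "use diagnostics;" expands them with
        the longer perldiag explanations).

```perl
use strict;
use warnings;

# 1. strict: an undeclared (for example, misspelled) variable is a
#    compile-time error. The string eval lets us catch that error here.
my $ok = eval q{ use strict; $undeclared_var = 1; 1 };
my $strict_error = $ok ? '' : $@;
print "strict caught: $strict_error" if $strict_error;

# 2. warnings: using an undefined value is reported at run time.
my @caught;
{
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    my $never_set;
    my $text = "value: $never_set";   # triggers "Use of uninitialized value"
}
print "warning caught: $_" for @caught;
```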

    Do not re-type Perl code
        Use copy/paste or your editor's "import" function rather than
        attempting to type in your code. If you make a typo you will get
        followups about your typos instead of about the question you are
        trying to get answered.

    Provide enough information
        If you do the things in this item, you will have an Extremely Good
        chance of getting people to try and help you with your problem!
        These features are a really big bonus toward your question winning
        out over all of the other posts that you are competing with.

        First make a short (less than 20-30 lines) and *complete* program
        that illustrates the problem you are having. People should be able
        to run your program by copy/pasting the code from your article. (You
        will find that doing this step very often reveals your problem
        directly, leading to an answer much more quickly and reliably than
        posting to Usenet.)

        Describe *precisely* the input to your program. Also provide example
        input data for your program. If you need to show file input, use the
        __DATA__ token (perldata.pod) to provide the file contents inside of
        your Perl program.

        Show the output (including the verbatim text of any messages) of
        your program.

        Describe how you want the output to be different from what you are
        getting.

        If you have no idea at all of how to code up your situation, be sure
        to at least describe the 2 things that you *do* know: input and
        desired output.

    Do not provide too much information
        Do not just post your entire program for debugging. Most especially
        do not post someone *else's* entire program.

    Do not post binaries, HTML, or MIME
        clpmisc is a text only newsgroup. If you have images or binaries
        that explain your question, put them in a publicly accessible
        place (like a Web server) and provide a pointer to that location. If
        you include code, cut and paste it directly in the message body.
        Don't attach anything to the message. Don't post vcards or HTML.
        Many people (and even some Usenet servers) will automatically filter
        out such messages. Many people will not be able to easily read your
        post. Plain text is something everyone can read.

  Social faux pas to avoid
    The first two items below are symptoms of the many FAQs asked here in
    clpmisc. This happens so often that folks will assume that it is happening yet
    again. If you have looked but not found, or found but didn't understand
    the docs, say so in your article.

    Asking a Frequently Asked Question
        It should be understood that you may have missed the applicable FAQ
        when you checked, which is not a big deal. But if the Frequently
        Asked Question is worded similarly to your question, folks will assume
        that you did not look at all. Don't become indignant at pointers to
        the FAQ, particularly if it solves your problem.

    Asking a question easily answered by a cursory doc search
        If folks think you have not even tried the obvious step of reading
        the docs applicable to your problem, they are likely to become
        annoyed.

        If you are flamed for not checking when you *did* check, then just
        shrug it off (and take the answer that you got).

    Asking for emailed answers
        Emailed answers benefit one person. Posted answers benefit the
        entire community. If folks can take the time to answer your
        question, then you can take the time to go get the answer in the
        same place where you asked the question.

        It is OK to ask for a *copy* of the answer to be emailed, but many
        will ignore such requests anyway. If you munge your address, you
        should never expect (or ask) to get email in response to a Usenet
        post.

        Ask the question here, get the answer here (maybe).

    Beware of saying "doesn't work"
        This is a "red flag" phrase. If you find yourself writing that,
        pause and see if you can't describe what is not working without
        saying "doesn't work". That is, describe how it is not what you
        want.

    Sending a "stealth" Cc copy
        A "stealth Cc" is when you both email and post a reply without
        indicating *in the body* that you are doing so.

  Be extra cautious when you get upset
    Count to ten before composing a followup when you are upset
        This is recommended in all Usenet newsgroups. Here in clpmisc, most
        flaming sub-threads are not about any feature of Perl at all! They
        are most often for what was seen as a breach of netiquette. If you
        have lurked for a bit, then you will know what is expected and won't
        make such posts in the first place.

        But if you get upset, wait a while before writing your followup. I
        recommend waiting at least 30 minutes.

    Count to ten after composing and before posting when you are upset
        After you have written your followup, wait *another* 30 minutes
        before committing yourself by posting it. You cannot take it back
        once it has been said.

AUTHOR
    Tad McClellan <tadmc@augustmail.com> and many others on the
    comp.lang.perl.misc newsgroup.



------------------------------

Date: 17 Jun 2004 18:45:23 -0700
From: jamasd@hotmail.com
Subject: Re: sorting text
Message-Id: <3151a273.0406171745.5d83e854@posting.google.com>

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<2jedptF107b39U1@uni-berlin.de>...
> jamasd@hotmail.com wrote to comp.lang.perl.modules:
> > Here is a sample of my data (each column is separated by tabs):
> 
> What the h...  Yesterday you posted the same question to 
> comp.lang.perl.misc, and several people have helped you. Why are you 
> repeating the original question now in comp.lang.perl.modules??? (And 
> how is your question related to the use of Perl modules?)

I am new to the boards, and needed an answer fast. But I didn't realize
how helpful you guys are. Thank you. I am not sure if the program
above does this, because when I tried to run it, an error message came
about. However, I want a program that compares data in the hash with
data in the third column. If data in the hash matches a third column
of a row, it prints the row. Sorry for the misunderstanding.

Thanks again


------------------------------

Date: 17 Jun 2004 18:46:50 -0700
From: jamasd@hotmail.com
Subject: Re: sorting text
Message-Id: <3151a273.0406171746.4104539f@posting.google.com>

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<2jedptF107b39U1@uni-berlin.de>...
> jamasd@hotmail.com wrote to comp.lang.perl.modules:
> > Here is a sample of my data (each column is separated by tabs):
> 
> What the h...  Yesterday you posted the same question to 
> comp.lang.perl.misc, and several people have helped you. Why are you 
> repeating the original question now in comp.lang.perl.modules??? (And 
> how is your question related to the use of Perl modules?)


Also, I didn't realize there was a delay in posting (I didn't read the
notice shown after posting, which was stupid on my part). I was wondering
why it wasn't posting.


------------------------------

Date: Fri, 18 Jun 2004 04:06:27 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sorting text
Message-Id: <2jf11aF10anurU1@uni-berlin.de>

jamasd@hotmail.com wrote:
> Gunnar Hjalmarsson wrote:
>> What the h...  Yesterday you posted the same question to 
>> comp.lang.perl.misc, and several people have helped you. Why are
>> you repeating the original question now in
>> comp.lang.perl.modules??? (And how is your question related to
>> the use of Perl modules?)
> 
> I am new to the boards,

That's okay... Except that they are not "boards" - they are newsgroups
or Usenet groups.

Please study the posting guidelines for comp.lang.perl.misc:

    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

and maybe also this, for instance:

    http://www.islandnet.com/~tmc/html/articles/usentnws.htm

> and needed an answer fast.

That, on the other hand, is not an excuse for breaking Usenet
etiquette.

> I am not sure if the program above does this, because when I tried
> to run it, an error message came about.

See the replies in comp.lang.perl.misc.

> Also, I didnt realize there was a delay in posting

Hmm... I see that you are posting from Google Groups, and if you are
also reading there, it's correct that you need to allow for a delay of
several hours.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Fri, 18 Jun 2004 07:11:05 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: sorting text
Message-Id: <dAwAc.49355$Hg2.33024@attbi_s04>

jamasd@hotmail.com wrote:

> if ( 2 < @fields ) { # Ignore if less than 3 fields
> next;
> }

That test will ignore lines with 3, 4 or more fields; the
exact opposite of what you want. It will allow lines
with 1 or 2 fields, which is guaranteed to trigger a
"Use of uninitialized value" warning with {$fields[2]}.

I recommend that you use Perl idioms instead of C.

   next if @fields < 3; # Skip current line if less than 3 fields.

	-Joe
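To make the inverted logic concrete, here is a small sketch that runs
both tests against stand-in rows with 0 to 4 fields (the %skips hash and
the dummy 'x' fields are just for illustration). For every field count,
the posted test and the corrected idiom disagree.

```perl
use strict;
use warnings;

my %skips;   # field count => [ skipped by posted test, skipped by fixed test ]
for my $count ( 0 .. 4 ) {
    my @fields   = ('x') x $count;            # stand-in for a split line
    my $original = ( 2 < @fields ) ? 1 : 0;   # if ( 2 < @fields ) { next; }
    my $fixed    = ( @fields < 3 ) ? 1 : 0;   # next if @fields < 3;
    $skips{$count} = [ $original, $fixed ];
    printf "%d field(s): posted test skips=%s, fixed test skips=%s\n",
        $count, ( $original ? 'yes' : 'no' ), ( $fixed ? 'yes' : 'no' );
}
```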


------------------------------

Date: 18 Jun 2004 01:19:52 -0700
From: jamasd@hotmail.com
Subject: Re: sorting text
Message-Id: <3151a273.0406180019.5f52a43b@posting.google.com>

use strict;
use warnings;

my ( $buffer , @fields , $filename , %hash1 );

$filename = 'C:\Documents and Settings\vhlab\Desktop\doc.txt';
open(INPUT,"<$filename") or
    die("Can't open file \"$filename\" : $!\n");

%hash1 = ( "ytkyk" => 1 , "ghjhg" => 1 );

while ( $buffer = <INPUT> ) {
    chomp $buffer;
    @fields = split(/\t+/,$buffer);
    if ( 2 < @fields ) { # Ignore if less than 3 fields
        next;
    }
    unless ( exists $hash1{$fields[2]} ) {
        next;
    }
    print "$buffer\n";
}
close INPUT;

The code returned the message:
Use of uninitialized value in exists at C:\DOCUME~1\OWNER~1.NOT\LOCALS~1\Temp\dir3F.tmp\Untitled line 20, <INPUT> line 1.
Use of uninitialized value in exists at C:\DOCUME~1\OWNER~1.NOT\LOCALS~1\Temp\dir3F.tmp\Untitled line 20, <INPUT> line 2.
Use of uninitialized value in exists at C:\DOCUME~1\OWNER~1.NOT\LOCALS~1\Temp\dir3F.tmp\Untitled line 20, <INPUT> line 3.
Use of uninitialized value in exists at C:\DOCUME~1\OWNER~1.NOT\LOCALS~1\Temp\dir3F.tmp\Untitled line 20, <INPUT> line 4.
Use of uninitialized value in exists at C:\DOCUME~1\OWNER~1.NOT\LOCALS~1\Temp\dir3F.tmp\Untitled line 20, <INPUT> line 5.

I am new to this and would appreciate help. Sorry for the bad netiquette.


------------------------------

Date: Fri, 18 Jun 2004 10:44:25 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: sorting text
Message-Id: <2jfobjF10lkajU1@uni-berlin.de>

jamasd@hotmail.com wrote:
> 
>     if ( 2 < @fields ) { # Ignore if less than 3 fields
>         next;
>               }

Why haven't you corrected this yet, as has been suggested in several
replies in comp.lang.perl.misc?

> The code returned the message:
> Use of uninitialized value in exists at 
> C:\DOCUME~1\OWNER~1.NOT\LOCALS~1\Temp\dir3F.tmp\Untitled line 20,
> <INPUT> line 1.

<another 4 similar warning messages snipped>

That's because the faulty logic above is still there, combined with
the fact that the file you are reading contains five lines with two or
fewer columns (maybe blank lines...).

Just fix the above code!
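For reference, a minimal sketch of the loop with the test corrected as
suggested in this thread (next if @fields < 3). It assumes
tab-separated input, wraps the loop in a hypothetical filter_rows()
sub, and reads made-up sample rows from an in-memory filehandle so it
runs without the original doc.txt; in the real script you would open
the file instead.

```perl
use strict;
use warnings;

# Return the lines from $fh whose third tab-separated column is a key of %$wanted.
sub filter_rows {
    my ( $fh, $wanted ) = @_;
    my @matches;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /\t+/, $line;
        next if @fields < 3;                          # skip short or blank lines
        next unless exists $wanted->{ $fields[2] };   # third column must match
        push @matches, $line;
    }
    return @matches;
}

# Hypothetical sample data; with a real file you would write:
#   open my $fh, '<', $filename or die "Can't open '$filename': $!\n";
my $sample = "a\tb\tytkyk\tz\n"
           . "\n"
           . "c\td\tnomatch\n"
           . "only\ttwo\n"
           . "e\tf\tghjhg\n";
open my $fh, '<', \$sample or die "Can't open in-memory data: $!\n";

my %wanted = ( ytkyk => 1, ghjhg => 1 );
my @rows = filter_rows( $fh, \%wanted );
close $fh;
print "$_\n" for @rows;
```

Because short lines are skipped before $fields[2] is used, the
"Use of uninitialized value in exists" warnings go away.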

> Sorry for the bad netiquette.

No problem, as long as you don't repeat the mistakes next time. ;-)

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Fri, 18 Jun 2004 01:41:31 GMT
From: "Max" <max@NOSPAMkipness.com>
Subject: Re: Why won't this split file script work?
Message-Id: <fLrAc.397$9t7.264@newssvr16.news.prodigy.com>

I really appreciate everybody's input. I got the script working and learned
some very good programming tips.

Thanks,
Max

"Thomas Church" <tchurch@gmail.com> wrote in message
news:ea0cdb4c.0406171526.77acf2e6@posting.google.com...
> "Max" <max@NOSPAMkipness.com> wrote in message
> news:<mJlAc.2227$Pt.1153@newssvr19.news.prodigy.com>...
> > I can't seem to figure out what I'm doing wrong, or maybe I'm just
> > rushing as I need to split a 15000 line file into chunks.
>
> One other thought: you don't need to chomp unless you actually care about
> eliminating the newline. Since you add it back in again anyway when you
> print, you can simplify the loop to:  (untested)
>
> while (<IN>) {
>     next if ($. < $start);
>     last if ($. > $stop);
>     print OUT $_;
> }
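A small runnable sketch of the quoted loop, assuming a made-up ten-line
input and a hypothetical range of lines 3 to 5. It uses in-memory
filehandles instead of the IN and OUT files from the thread; $. holds
the current input line number, so the loop skips lines before $start and
stops reading after $stop, and each line keeps its own newline, so no
chomp is needed.

```perl
use strict;
use warnings;

my ( $start, $stop ) = ( 3, 5 );                  # hypothetical range
my $input = join '', map { "line $_\n" } 1 .. 10;  # stand-in for the big file

open my $in,  '<', \$input  or die "Can't read input: $!\n";
my $output = '';
open my $out, '>', \$output or die "Can't write output: $!\n";

while (<$in>) {
    next if $. < $start;     # before the range: skip
    last if $. > $stop;      # past the range: stop reading
    print {$out} $_;         # $_ still ends in "\n"
}
close $in;
close $out;
print $output;
```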




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6703
***************************************

