[31127] in Perl-Users-Digest


Perl-Users Digest, Issue: 2372 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Apr 28 06:09:45 2009

Date: Tue, 28 Apr 2009 03:09:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 28 Apr 2009     Volume: 11 Number: 2372

Today's topics:
    Re: F<utf8.pm> is evil (was: XML::LibXML UTF-8 toString <hjp-usenet2@hjp.at>
    Re: How to find duplicate in certain fields of array?? sln@netherlands.com
    Re: Is there a better way to convert foreign characters <hjp-usenet2@hjp.at>
    Re: Perl is too slow - A statement <bugbear@trim_papermule.co.uk_trim>
        Posting Guidelines for comp.lang.perl.misc ($Revision:  tadmc@seesig.invalid
    Re: Posting to perl.beginners via google groups <hjp-usenet2@hjp.at>
    Re: Problem in parsing from a pipe <whynot@pozharski.name>
    Re: Tk::Listbox greymausg@mail.com
    Re: unexplained warning message in m{...} regexp <devnull4711@web.de>
    Re: unexplained warning message in m{...} regexp sln@netherlands.com
    Re: unexplained warning message in m{...} regexp sln@netherlands.com
    Re: unexplained warning message in m{...} regexp <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 28 Apr 2009 11:03:29 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: F<utf8.pm> is evil (was: XML::LibXML UTF-8 toString() -vs- nodeValue())
Message-Id: <slrngvdhj4.hk.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-23 23:48, Eric Pozharski <whynot@pozharski.name> wrote:
> On 2009-04-22, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> On 2009-04-22 00:32, Eric Pozharski <whynot@pozharski.name> wrote:
>>> On 2009-04-20, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>>> On 2009-04-17 00:23, Eric Pozharski <whynot@pozharski.name> wrote:
>>>>> On 2009-04-15, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>>>>> On 2009-04-14 23:45, Eric Pozharski <whynot@pozharski.name> wrote:
> *SKIP*
>>>> * utf8::upgrade and utf8::downgrade aren't symmetric.
>>>
>>> I've noted, but...  F<encoding.pm> is wrong exactly how?
>>
>> It isn't "wrong". It is documented. But it is surprising and illogical
>> behaviour. A source of subtle bugs because the programmer most likely
>> won't think of that. And I think encoding.pm is full of such crannies. I
>> learned to avoid it pretty quickly.
>
> OK, let's leave it as a point to unnamed F<encoding.pm>'s dragons.
>
> *SKIP*
>>> So F<utf8.pm> utfizes symbols by accident.  At least that wasn't an
>>> intention.
>>
>> I'm not sure what you mean by "utfize", but if you mean: "Symbols can
>> contain all Unicode letters and digits, not just A-Z, a-z, 0-9", then
>> that's quite intentional, not an accident. But it is a logical
>> consequence of interpreting the source code as a sequence of Unicode
>> characters instead of a sequence of ASCII characters. So, as a
>> programmer you don't have to remember that "use utf8" decodes string
>> constants from UTF-8 *and* that it allows all Unicode letters and digits
>> in symbols *and* that the DATA stream has an ':encoding(utf8)' layer
>
> What seems to be undocumented BTW.  However, after your explanation, I
> think that it can't be any other way.
>
>> *and* whatever else may be affected. You have to remember one thing
>> only: Your source code consists of Unicode characters encoded in UTF-8
>> (or UTF-EBCDIC). Period. Nothing else. Clean and simple.
>
> I wasn't about what to remember.  I'm about "doing one thing".  I think,
> that neither F<utf8.pm> nor F<encoding.pm> do one thing.

What it does and what I have to remember is the same thing at the
interface level. I don't have to know how it is implemented. "use utf8"
may be implemented by sending carrier pigeons to the oracle of Delphi,
for all I care. But what "use utf8" *does* in an observable manner, is
exactly one thing: It turns my source code from a byte stream into a
character stream (encoded in UTF-8). 
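
A minimal sketch of that observable effect, assuming the script file
itself is saved as UTF-8. The variable names are made up for
illustration; the only thing being shown is that the same literal is
two bytes without the pragma and one character with it.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so the literal below is
# stored on disk as the two bytes 0xC3 0xA9.
my $as_bytes = do { no utf8;  "é" };   # source read as a byte stream
my $as_chars = do { use utf8; "é" };   # source read as characters

printf "without use utf8: length %d\n", length $as_bytes;   # 2
printf "with use utf8:    length %d\n", length $as_chars;   # 1
```

The pragma is lexically scoped, which is why the two do-blocks can
disagree inside one file.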


> (maybe I wasn't verbose enough this time)  English fits in 7bit
> encoding, whatever encoding it would have been.  It could be any other
> encoding (I did some reading about ASCII history (yes, I know wikipedia
> is a vague source)).  It could not be any other language.

German fits very nicely into 7 bits, too (we have 7 more letters, but
who needs an @, a \ or three sets of brackets? Or 33 control
characters?). So does Russian or Greek (if you only care about the
Russian language, you need only the Cyrillic alphabet, not both the
Cyrillic and the Latin alphabet).

>
>>> That seemingly contradicts my point of having an option.  Yes, but there
>>> must be something common for all.  By an accident -- it's English.
>>
>> Yes. English. Not ASCII. If you write Russian in ASCII I understand it
>> just as little as if you write it in Cyrillic.
>>
>> If you can write your programs in English, please do. Especially if you
>
> That "if" (the latter one) is somewhat offensive.

Firstly, this was a generic "you", I wasn't speaking about you
personally. But even if I was, I don't see why that should be offensive.
You were asserting several times that sometimes there is no choice. So
why do you think it is offensive if I agree that sometimes there is no
choice? If your employer or client insists on the local language, you
can't use English. The only option you have then is to quit or reject
the contract.


>> plan to make it open source. Almost every programmer in the world has at
>
> That "open" is somewhat offensive.

Again, I don't see why. Proprietary software is likely to be maintained
by programmers who speak the same language as the original programmer
(especially if the programmer was forced to use this language by company
policy). Free software OTOH will be maintained by people all over the
world.


>> least a basic grasp of English. But if for some reason you have to write
>
> "Quotation needed (tm)".  Or define "programmer".

Anybody who writes programs of more than trivial complexity on a regular
basis.


>>> My point isn't language mix;  I have no problem with this.
>>
>> I have. A program where all the identifiers, comments etc. are written
>> in Portuguese or Polish is hard to figure out if you don't speak the
>> language. That they use the Latin alphabet doesn't help much (except
>> that I have an intuitive (though very probably wrong) idea how to
>> pronounce them).
>
> And here we have another difference between us.  I look inside others'
> code mostly when I have problems with it, and sometimes when
> documentation is incomplete, or seemingly wrong, or there's no
> documentation at all.

So do I.

> I don't look inside out of pure curiosity.  And
> you know what?  I bet you know.  There are no comments.

But the subroutine and variable names are almost always at least related
to their meaning. Sometimes they are too short and cryptic, and
sometimes they are misleading, but usually you get a good impression of
what the programmer was trying to achieve from the identifiers alone,
without analysing the algorithm in detail. Just try one of these code
obfuscators one time which turn all subroutine names into "s0001",
"s0002", etc. and all variable names into "v0001", "v0002", etc. and
then try to understand the program. It is possible, of course, but it is
a *lot* harder.

> OK, read this (that depends on your context of course, it's possible
> you would get it even without I<-Mstrict> or I<-Mwarnings>):
>
> 	perl -Mutf8 -le '
> 	print "vvv";
> 	@OEM = qw/ 1 2 3 /;
> 	print "@ОЕМ";
> 	print "^^^";'
> 	vvv
>
> 	^^^

This is deliberately obfuscated. 
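
One way to see through this kind of obfuscation is to print the code
points of the two names. The sketch below assumes the second name in
the quoted one-liner is spelled with Cyrillic capitals that merely look
like Latin "OEM"; the helper name code_points is hypothetical, and the
Cyrillic string is written with escapes so the script does not depend
on the file encoding.

```perl
use strict;
use warnings;

# Return the code points of a string as "U+XXXX" labels.
sub code_points {
    my ($name) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $name;
}

my $latin    = "OEM";                      # Latin capital letters
my $cyrillic = "\x{41E}\x{415}\x{41C}";    # Cyrillic lookalikes

print "latin:    ", code_points($latin),    "\n";   # U+004F U+0045 U+004D
print "cyrillic: ", code_points($cyrillic), "\n";   # U+041E U+0415 U+041C
print "same name? ", ($latin eq $cyrillic ? "yes" : "no"), "\n";   # no
```

Under that assumption the two arrays in the one-liner are different
variables, which is why the second print shows an empty line.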

> (I'm still unclear)

Yes.

	hp


------------------------------

Date: Tue, 28 Apr 2009 01:05:20 -0700
From: sln@netherlands.com
Subject: Re: How to find duplicate in certain fields of array??
Message-Id: <8uddv4lhqeh59dlmq5536kc2suo6nlm7u5@4ax.com>

On Mon, 27 Apr 2009 17:54:00 -0400, somebody <some@body.com> wrote:

>In the __DATA__ below, the first and last rows contain duplicate names,
>i.e., WILLIAM in the first row, and HARRIS in the last row.  I need to
>check each of the 4 name fields (first, middle, last, and suffix) for
>duplicate names.  So, I've pushed the 4 name fields into a separate array
>named @array.  How do I check for duplicates within the 4 named fields?
>This is a bit more involved than finding duplicate array elements, since
>sub-strings are involved.
>
>-Thanks
>
>
>while ( <DATA> ) {
>
>  @row = split /\|/;
>
>  #Place all 4 name fields in array(first, middle, last, suffix).
>  for ($i=3; $i <= 6; $i++) {
>    push (@array, "$row[$i]");
>  }
>
>
>__DATA__
>CA|6299| |WILLIAM| |S|SMITH WILLIAM|
>DE|6209| |MANNY|J|MORALES|JR.|
>WA|1838| |KATHLEEN| |MCDONALD| |
>OH|3968| |JR|SCOTT HARRIS|HARRIS|JR| 
           ^^                     ^^
The only duplicates I see are 'JR' in the last record.
If your data is this screwed up you need to eyeball the
data and forget about a computer program.

Btw, your half assed attempt at the obligatory sample code
won't even compile. WTF do *you* think you should do huh?

-sln
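
For the question as asked, a minimal sketch follows. It assumes that
"duplicate" means the same whitespace-separated word occurring more
than once anywhere in fields 3 to 6 of a record; the sub name
duplicate_names is hypothetical, and the records are copied from the
sample data quoted above.

```perl
use strict;
use warnings;

# Return the words that occur more than once across fields 3..6
# (first, middle, last, suffix) of one pipe-delimited record.
sub duplicate_names {
    my ($line) = @_;
    chomp $line;
    my @row = split /\|/, $line;

    my %count;
    for my $field (grep { defined } @row[3 .. 6]) {
        $count{$_}++ for split ' ', $field;   # split field into words
    }
    return sort grep { $count{$_} > 1 } keys %count;
}

my @records = (
    'CA|6299| |WILLIAM| |S|SMITH WILLIAM|',
    'DE|6209| |MANNY|J|MORALES|JR.|',
    'WA|1838| |KATHLEEN| |MCDONALD| |',
    'OH|3968| |JR|SCOTT HARRIS|HARRIS|JR| ',
);

for my $record (@records) {
    my @dups = duplicate_names($record);
    print "$record => @dups\n" if @dups;
}
```

With this reading, the first record reports WILLIAM and the last one
reports HARRIS and JR; a different definition of "duplicate" (for
example, substring matches) would need a different test inside the loop.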


------------------------------

Date: Tue, 28 Apr 2009 11:56:19 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Is there a better way to convert foreign characters?
Message-Id: <slrngvdkm3.hk.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-23 19:38, Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
> Tim McDaniel wrote:
>> In article <751lqvF1647sbU1@mid.individual.net>, Gunnar Hjalmarsson
>> <noreply@gunnar.cc> wrote:
>>> ( $word = lc $value ) =~ tr/àâÀéèëêÉÊçÇîïôÔùû/aaaeeeeeecciioouu/;
>> 
>> I don't combine s///, tr///, or chomp with assignments -- personal 
>> idiom and I'm not familiar with the Perl effects.
>
> Finding out the effects is trivial, isn't it?
>
>> The above assigns the lowercase translation of $value to $word, and 
>> then does a tr/// on $word, right?
>
> Yes. So you do know about the effects, after all. ;-)
>
>> Then there should be no need for the capitalized characters in the 
>> tr///, because there shouldn't be any to match.
>
> That's true only if a suitable locale is enabled.

Or if the $value is a character string.

> If a programmer wants to do that kind of transliteration, there is a
> great chance that s/he doesn't care about any kind of i18n or l10n.

The simple fact that he does specifically operate on accented characters
shows that he *does* care.

If $value is a byte string and no locale is in effect, lc on a non-ASCII
string is poorly defined. If the string is in a multi-byte encoding lc
might convert a byte which happens to be part of a character, which is
almost certainly wrong. Also, tr almost certainly doesn't work as
intended.

In a single-byte encoding which is a superset of ASCII (e.g. ISO-8859-X)
the code works, because lc is a no-op on all accented characters. But I
still think this is unclean. You should convert to ASCII first and then
case-fold.

(of course I really think you should use character strings if you do
operations on characters, and not muck around with byte strings)
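
A minimal sketch of the character-string approach, assuming the input
arrives as UTF-8 encoded bytes. Note that this swaps the tr/// table
from the thread for a different technique: canonical decomposition with
the core module Unicode::Normalize, followed by removal of the
combining marks. The sub name to_ascii_lc is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Decode bytes to characters, strip accents, then lower-case.
sub to_ascii_lc {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);          # byte string -> character string
    (my $plain = NFD($chars)) =~ s/\p{Mn}//g;     # drop combining marks
    return lc $plain;                             # case-fold last
}

# "\xC3\x89t\xC3\xA9" is the UTF-8 encoding of the French word for summer.
print to_ascii_lc("\xC3\x89t\xC3\xA9"), "\n";     # ete
```

Letters that have no decomposition (for example the German sharp s)
pass through unchanged, so this is not a complete transliteration.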

	hp


------------------------------

Date: Tue, 28 Apr 2009 09:56:30 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: Perl is too slow - A statement
Message-Id: <2uOdnbhLC8wjWWvUnZ2dnUVZ8hBi4p2d@posted.plusnet>

Uri Guttman wrote:
>>>>>> "n" == neilsolent  <n@solenttechnology.co.uk> writes:
> 
>   >> > char *str = "This is a long sentence";
>   >> > printf ("%s", &str[10]);
>   >> > ----------------
>   >> > my $str = "This is a long sentence";
>   >>     print substr($str, 10);
> 
>   n> Running these "equivalent" bits of Perl an C in a tight loop (1000000
>   n> iterations) - shows on my machine:
> 
>   n> approx 0.3s run time for the C
>   n> approx 0.9s run time for Perl
> 
>   n> I think this is pretty good considering Perl is interpreted and (I
>   n> suspect) the example is deliberately picked to find something C is
>   n> faster at!
> 
> it also shows you have no clue about what is important these
> days. development time is way more expensive than running time. you can
> always get a faster computer but you rarely can speed up a development
> schedule.

Agreed; further, algorithm development gives
greater speedup than "cycle counting".

Jon Bentley, in a rather old book, shows a quicksort
in Basic on a TRS-80 outrunning a fully optimised
bubble sort on a Cray-1 (*)

    BugBear

(*) I did say the book was old!
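
The same point can be tried on one machine with the core Benchmark
module. This is only a sketch: the list size of 500 is an arbitrary
assumption, bubble_sort is a hypothetical helper written for the
comparison, and Perl's built-in sort is a merge sort rather than the
quicksort from the book.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Plain bubble sort on a copy of the argument list (numeric order).
sub bubble_sort {
    my @items = @_;
    for my $end (reverse 1 .. $#items) {
        for my $i (0 .. $end - 1) {
            @items[$i, $i + 1] = @items[$i + 1, $i]
                if $items[$i] > $items[$i + 1];
        }
    }
    return @items;
}

my @data = map { int rand 10_000 } 1 .. 500;

# Run each candidate for at least one CPU second and print a comparison.
cmpthese(-1, {
    bubble  => sub { my @sorted = bubble_sort(@data) },
    builtin => sub { my @sorted = sort { $a <=> $b } @data },
});
```

The gap between the two rows grows with the list size, which is the
effect the anecdote describes.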


------------------------------

Date: Tue, 28 Apr 2009 07:13:21 GMT
From: tadmc@seesig.invalid
Subject: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
Message-Id: <l0yJl.9100$im1.7890@nlpi061.nbdc.sbc.com>

Outline
   Before posting to comp.lang.perl.misc
      Must
       - Check the Perl Frequently Asked Questions (FAQ)
       - Check the other standard Perl docs (*.pod)
      Really Really Should
       - Lurk for a while before posting
       - Search a Usenet archive
      If You Like
       - Check Other Resources
   Posting to comp.lang.perl.misc
      Is there a better place to ask your question?
       - Question should be about Perl, not about the application area
      How to participate (post) in the clpmisc community
       - Carefully choose the contents of your Subject header
       - Use an effective followup style
       - Speak Perl rather than English, when possible
       - Ask perl to help you
       - Do not re-type Perl code
       - Provide enough information
       - Do not provide too much information
       - Do not post binaries, HTML, or MIME
      Social faux pas to avoid
       - Asking a Frequently Asked Question
       - Asking a question easily answered by a cursory doc search
       - Asking for emailed answers
       - Beware of saying "doesn't work"
       - Sending a "stealth" Cc copy
      Be extra cautious when you get upset
       - Count to ten before composing a followup when you are upset
       - Count to ten after composing and before posting when you are upset
-----------------------------------------------------------------

Posting Guidelines for comp.lang.perl.misc ($Revision: 1.9 $)
    This newsgroup, commonly called clpmisc, is a technical newsgroup
    intended to be used for discussion of Perl related issues (except job
    postings), whether it be comments or questions.

    As you would expect, clpmisc discussions are usually very technical in
    nature and there are conventions for conduct in technical newsgroups
    going somewhat beyond those in non-technical newsgroups.

    The article at:

        http://www.catb.org/~esr/faqs/smart-questions.html

    describes how to get answers from technical people in general.

    This article describes things that you should, and should not, do to
    increase your chances of getting an answer to your Perl question. It is
    available in POD, HTML and plain text formats at:

     http://www.rehabitation.com/clpmisc.shtml

    For more information about netiquette in general, see the "Netiquette
    Guidelines" at:

     http://andrew2.andrew.cmu.edu/rfc/rfc1855.html

    A note to newsgroup "regulars":

       Do not use these guidelines as a "license to flame" or other
       meanness. It is possible that a poster is unaware of things
       discussed here.  Give them the benefit of the doubt, and just
       help them learn how to post, rather than assume that they do 
       know and are being the "bad kind" of Lazy.

    A note about technical terms used here:

       In this document, we use words like "must" and "should" as
       they're used in technical conversation (such as you will
       encounter in this newsgroup). When we say that you *must* do
       something, we mean that if you don't do that something, then
       it's unlikely that you will benefit much from this group.
       We're not bossing you around; we're making the point without
       lots of words.

    Do *NOT* send email to the maintainer of these guidelines. It will be
    discarded unread. The guidelines belong to the newsgroup so all
    discussion should appear in the newsgroup. I am just the secretary that
    writes down the consensus of the group.

Before posting to comp.lang.perl.misc
  Must
    This section describes things that you *must* do before posting to
    clpmisc, in order to maximize your chances of getting meaningful replies
    to your inquiry and to avoid getting flamed for being lazy and trying to
    have others do your work.

    The perl distribution includes documentation that is copied to your hard
    drive when you install perl. Also installed is a program for looking
    things up in that (and other) documentation named 'perldoc'.

    You should either find out where the docs got installed on your system,
    or use perldoc to find them for you. Type "perldoc perldoc" to learn how
    to use perldoc itself. Type "perldoc perl" to start reading Perl's
    standard documentation.

    Check the Perl Frequently Asked Questions (FAQ)
        Checking the FAQ before posting is required in Big 8 newsgroups in
        general; there is nothing clpmisc-specific about this requirement.
        You are expected to do this in nearly all newsgroups.

        You can use the "-q" switch with perldoc to do a word search of the
        questions in the Perl FAQs.

    Check the other standard Perl docs (*.pod)
        The perl distribution comes with much more documentation than is
        available for most other newsgroups, so in clpmisc you should also
        see if you can find an answer in the other (non-FAQ) standard docs
        before posting.

    It is *not* required, or even expected, that you actually *read* all of
    Perl's standard docs, only that you spend a few minutes searching them
    before posting.

    Try doing a word-search in the standard docs for some words/phrases
    taken from your problem statement or from your very carefully worded
    "Subject:" header.

  Really Really Should
    This section describes things that you *really should* do before posting
    to clpmisc.

    Lurk for a while before posting
        This is very important and expected in all newsgroups. Lurking means
        to monitor a newsgroup for a period to become familiar with local
        customs. Each newsgroup has specific customs and rituals. Knowing
        these before you participate will help avoid embarrassing social
        situations. Consider yourself to be a foreigner at first!

    Search a Usenet archive
        There are tens of thousands of Perl programmers. It is very likely
        that your question has already been asked (and answered). See if you
        can find where it has already been answered.

        One such searchable archive is:

         http://groups.google.com/advanced_search

  If You Like
    This section describes things that you *can* do before posting to
    clpmisc.

    Check Other Resources
        You may want to check in books or on web sites to see if you can
        find the answer to your question.

        But you need to consider the source of such information: there are a
        lot of very poor Perl books and web sites, and several good ones
        too, of course.

Posting to comp.lang.perl.misc
    There can be 200 messages in clpmisc in a single day. Nobody is going to
    read every article. They must decide somehow which articles they are
    going to read, and which they will skip.

    Your post is in competition with 199 other posts. You need to "win"
    before a person who can help you will even read your question.

    These sections describe how you can help keep your article from being
    one of the "skipped" ones.

  Is there a better place to ask your question?
    Question should be about Perl, not about the application area
        It can be difficult to separate out where your problem really is,
        but you should make a conscious effort to post to the most
        applicable newsgroup. That is, after all, where you are the most
        likely to find the people who know how to answer your question.

        Being able to "partition" a problem is an essential skill for
        effectively troubleshooting programming problems. If you don't get
        that right, you end up looking for answers in the wrong places.

        It should be understood that you may not know that the root of your
        problem is not Perl-related (the two most frequent ones are CGI and
        Operating System related), so off-topic postings will happen from
        time to time. Be gracious when someone helps you find a better place
        to ask your question by pointing you to a more applicable newsgroup.

  How to participate (post) in the clpmisc community
    Carefully choose the contents of your Subject header
        You have 40 precious characters of Subject to win out and be one of
        the posts that gets read. Don't waste them. Take care while
        composing them, they are the key that opens the door to getting an
        answer.

        Spend them indicating what aspect of Perl others will find if they
        should decide to read your article.

        Do not spend them indicating "experience level" (guru, newbie...).

        Do not spend them pleading (please read, urgent, help!...).

        Do not spend them on non-Subjects (Perl question, one-word
        Subject...)

        For more information on choosing a Subject see "Choosing Good
        Subject Lines":

         http://www.cpan.org/authors/id/D/DM/DMR/subjects.post

        Part of the beauty of newsgroup dynamics is that you can contribute
        to the community with your very first post! If your choice of
        Subject leads a fellow Perler to find the thread you are starting,
        then even asking a question helps us all.

    Use an effective followup style
        When composing a followup, quote only enough text to establish the
        context for the comments that you will add. Always indicate who
        wrote the quoted material. Never quote an entire article. Never
        quote a .signature (unless that is what you are commenting on).

        Intersperse your comments *following* each section of quoted text to
        which they relate. Unappreciated followup styles are referred to as
        "top-posting", "Jeopardy" (because the answer comes before the
        question), or "TOFU" (Text Over, Fullquote Under).

        Reversing the chronology of the dialog makes it much harder to
        understand (some folks won't even read it if written in that style).
        For more information on quoting style, see:

         http://web.presby.edu/~nnqadmin/nnq/nquote.html

    Speak Perl rather than English, when possible
        Perl is much more precise than natural language. Saying it in Perl
        instead will avoid misunderstanding your question or problem.

        Do not say: I have variable with "foo\tbar" in it.

        Instead say: I have $var = "foo\tbar", or I have $var = 'foo\tbar',
        or I have $var = <DATA> (and show the data line).
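
        A short sketch of why the distinction matters: the English
        description fits both of the following variables, yet they hold
        different strings. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $double = "foo\tbar";   # \t is interpolated as one tab character
my $single = 'foo\tbar';   # backslash and "t" are kept literally

printf "double-quoted: %d characters\n", length $double;   # 7
printf "single-quoted: %d characters\n", length $single;   # 8
```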

    Ask perl to help you
        You can ask perl itself to help you find common programming mistakes
        by doing two things: enable warnings (perldoc warnings) and enable
        "strict"ures (perldoc strict).

        You should not bother the hundreds/thousands of readers of the
        newsgroup without first seeing if a machine can help you find your
        problem. It is demeaning to be asked to do the work of a machine. It
        will annoy the readers of your article.

        You can look up any of the messages that perl might issue to find
        out what the message means and how to resolve the potential mistake
        (perldoc perldiag). If you would like perl to look them up for you,
        you can put "use diagnostics;" near the top of your program.
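
        A minimal sketch of what "strict" catches, assuming a misspelled
        variable name as the mistake. The snippet is compiled with a
        string eval only so that the script can print the message
        instead of dying; in a real program the pragmas simply go at
        the top of the file.

```perl
use strict;
use warnings;

# $totl is a deliberate typo for $total. The eval'd code inherits
# "use strict" from this file, so the typo is a compile-time error.
my $snippet = 'my $total = 1; $totl = 2; 1;';

my $ok    = eval $snippet;
my $error = $@;

print "perl says: $error" unless $ok;
```

        The message starts with: Global symbol "$totl" requires explicit
        package name.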

    Do not re-type Perl code
        Use copy/paste or your editor's "import" function rather than
        attempting to type in your code. If you make a typo you will get
        followups about your typos instead of about the question you are
        trying to get answered.

    Provide enough information
        If you do the things in this item, you will have an Extremely Good
        chance of getting people to try and help you with your problem!
        These features are a really big bonus toward your question winning
        out over all of the other posts that you are competing with.

        First make a short (less than 20-30 lines) and *complete* program
        that illustrates the problem you are having. People should be able
        to run your program by copy/pasting the code from your article. (You
        will find that doing this step very often reveals your problem
        directly, leading to an answer much more quickly and reliably than
        posting to Usenet.)

        Describe *precisely* the input to your program. Also provide example
        input data for your program. If you need to show file input, use the
        __DATA__ token (perldata.pod) to provide the file contents inside of
        your Perl program.

        Show the output (including the verbatim text of any messages) of
        your program.

        Describe how you want the output to be different from what you are
        getting.

        If you have no idea at all of how to code up your situation, be sure
        to at least describe the 2 things that you *do* know: input and
        desired output.

    Do not provide too much information
        Do not just post your entire program for debugging. Most especially
        do not post someone *else's* entire program.

    Do not post binaries, HTML, or MIME
        clpmisc is a text only newsgroup. If you have images or binaries
        that explain your question, put them in a publicly accessible
        place (like a Web server) and provide a pointer to that location. If
        you include code, cut and paste it directly in the message body.
        Don't attach anything to the message. Don't post vcards or HTML.
        Many people (and even some Usenet servers) will automatically filter
        out such messages. Many people will not be able to easily read your
        post. Plain text is something everyone can read.

  Social faux pas to avoid
    The first two below are symptoms of lots of FAQ asking here in clpmisc.
    It happens so often that folks will assume that it is happening yet
    again. If you have looked but not found, or found but didn't understand
    the docs, say so in your article.

    Asking a Frequently Asked Question
        It should be understood that you may have missed the applicable FAQ
        when you checked, which is not a big deal. But if the Frequently
        Asked Question is worded similarly to your question, folks will assume
        that you did not look at all. Don't become indignant at pointers to
        the FAQ, particularly if it solves your problem.

    Asking a question easily answered by a cursory doc search
        If folks think you have not even tried the obvious step of reading
        the docs applicable to your problem, they are likely to become
        annoyed.

        If you are flamed for not checking when you *did* check, then just
        shrug it off (and take the answer that you got).

    Asking for emailed answers
        Emailed answers benefit one person. Posted answers benefit the
        entire community. If folks can take the time to answer your
        question, then you can take the time to go get the answer in the
        same place where you asked the question.

        It is OK to ask for a *copy* of the answer to be emailed, but many
        will ignore such requests anyway. If you munge your address, you
        should never expect (or ask) to get email in response to a Usenet
        post.

        Ask the question here, get the answer here (maybe).

    Beware of saying "doesn't work"
        This is a "red flag" phrase. If you find yourself writing that,
        pause and see if you can't describe what is not working without
        saying "doesn't work". That is, describe how it is not what you
        want.

    Sending a "stealth" Cc copy
        A "stealth Cc" is when you both email and post a reply without
        indicating *in the body* that you are doing so.

  Be extra cautious when you get upset
    Count to ten before composing a followup when you are upset
        This is recommended in all Usenet newsgroups. Here in clpmisc, most
        flaming sub-threads are not about any feature of Perl at all! They
        are most often for what was seen as a breach of netiquette. If you
        have lurked for a bit, then you will know what is expected and won't
        make such posts in the first place.

        But if you get upset, wait a while before writing your followup. I
        recommend waiting at least 30 minutes.

    Count to ten after composing and before posting when you are upset
        After you have written your followup, wait *another* 30 minutes
        before committing yourself by posting it. You cannot take it back
        once it has been said.

AUTHOR
    Tad McClellan and many others on the comp.lang.perl.misc newsgroup.

-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Tue, 28 Apr 2009 11:12:34 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Posting to perl.beginners via google groups
Message-Id: <slrngvdi43.hk.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-26 14:44, Eric Pozharski <whynot@pozharski.name> wrote:
> On 2009-04-25, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> On 2009-04-24 00:00, Eric Pozharski <whynot@pozharski.name> wrote:
>>> On 2009-04-22, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>
>> [postings to a moderated group via google groups vanish]
>>
>>>> The moderator might not even get messages posted through google groups.
>>>> For a moderated group, the newsserver where you post (Google Groups in
>>>> this case) needs to be set up to send all postings by email to the
>>>> moderator's address. The address is of course different for each
>>>> moderated group, although many hierarchies have a common pattern. If
>>>> Google Groups wasn't careful, they may send the submissions to
>>>> perl-beginners@moderators.isc.org or something like that ...
>>>
>>> Google Groups isn't a newsserver
>>
>> Of course it is. You can read and write usenet messages there and it
>> exchanges these messages with other news servers. So it is a
>> newsserver.
>
> There's no NNTP capable software on this host --

You are confusing "Usenet" and "NNTP". NNTP is only one of many
protocols to exchange Usenet messages. By far the most common these
days, but not the only one. So newsservers without NNTP capable software
are entirely possible. (But Google almost certainly does have NNTP
capable software on the cluster of hosts known as "Google Groups" - they
exchange News with other news servers and it would be crazy to use
anything other than NNTP for that. They just don't offer an NNTP
interface for NUAs - you have to use their web interface).

> According to RFC3977, newsserver MUST send a greeting (section 3.1.).
> Greetings are described in section 5.1.1.
>
> 	{5732:7} [0:0]$ telnet groups.google.com www

How do you get the crazy idea that RFC3977 has anything to say about
what should happen on the HTTP port? Ok, don't answer this question, it
was purely rhetorical.

	hp



------------------------------

Date: Tue, 28 Apr 2009 11:40:23 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Problem in parsing from a pipe
Message-Id: <slrngvdg8p.eco.whynot@orphan.zombinet>

On 2009-04-27, January Weiner <january.weiner@gmail.com> wrote:
> On 2009-04-24, Eric Pozharski <whynot@pozharski.name> wrote:
>> On 2009-04-22, January Weiner <january.weiner@gmail.com> wrote:
>>> On 2009-04-21, Eric Pozharski <whynot@pozharski.name> wrote:
>>>> Think 3-args B<open>, it's safer.
>>>
>>> Why?
>>
>> 	perl -wle '$cmd = q|rm -rf /; true|; open $fh, "$cmd|" or die $!'
>> 	Name "main::fh" used only once: possible typo at -e line 1.
>> 	rm: cannot remove root directory `/'
>
> Maybe I'm not too bright, but I don't get it :( Would you mind being a
> little more verbose? I mean, you can do the same with 3 arg open:
>
>   perl -wle '$cmd = q|rm -rf /; true|; open( $fh, "-|", "$cmd" ) or die $!'
>
> Where is the difference?  I understand that using open( $fh, $file )
> instead of open( $fh, "<$file" ) can in some cases lead to problems (if
> $file becomes ">something"), but in this particular case we are reading

Think C<"|something">.  With 2-arg B<open> that results in a "Can't open
bidirectional pipe" warning, while C<"something|"> gets opened as a pipe
for writing.  It is then run via the shell, with the output of
F<something> just going to I<STDOUT> uncaught.

> from a pipe anyway, and if the $cmd has been manipulated (and we were
> careless and haven't checked it) then the three-arg version will not be
> of any help.

Forget what I said.  3-arg B<open> is in no way safer in this regard.
Splitting on spaces wouldn't help in all cases (though, I suppose, in
most).  F<perlipc> suggests going B<fork>/B<exec> to avoid shell
invocation, which is what I do.
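
A minimal sketch of the no-shell approach, assuming the list form of
piped B<open> (which does the fork/exec internally) is acceptable in
place of a hand-written B<fork>/B<exec>.  The helper name
C<run_no_shell> and the hostile argument are made up for illustration;
the child is simply the running perl (C<$^X>) echoing its argument back.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell, return its output.
sub run_no_shell {
    my @cmd = @_;
    # List form of piped open: with a program plus at least one argument,
    # everything is handed to exec directly and no shell parses it.
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my $out = do { local $/; <$fh> };    # read all of the child's output
    close($fh) or die "$cmd[0] failed: status $?";
    return $out;
}

# Made-up hostile argument; ';' stays an ordinary character here.
my $arg = 'rm -rf /; true';
print run_no_shell($^X, '-e', 'print "got: $ARGV[0]\n"', $arg);
```

The child only ever sees the argument as one plain string.  Note that
with a single-element command list this falls back to ordinary 3-arg
B<open>, where the shell may be involved again.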

So, let me rephrase: 3-arg B<open> avoids misinterpreting redirection
metachars as mode specs, while it still goes through the shell for
pipes.  Beyond that, using 3-arg B<open> consistently (or constantly) is
just a matter of habit.

I've trusted Perl that much.  What a sad day.

> And anyway, I have always thought that preventing malicious input from the
> users should be happening on an altogether different level, starting with
> at least using taint mode -- am I wrong?

No and yes.  (Maybe I'm wrong again.)  A tainted string just indicates
that it hasn't been preprocessed, while the amount of preprocessing is
left to the coder's discretion.  I haven't fought taintedness a lot.  A
quite simple (but non-trivial, in my case) regexp removes taintedness.
Does that make a string safe?  Who knows, it depends on the task.
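
A rough sketch of what such a regexp looks like, assuming the task is
"accept a plain file name".  The helper name and the whitelist are
hypothetical; whether that whitelist is strict enough is exactly the
task-dependent part.  Under B<-T> the captured C<$1> is what comes out
untainted; without B<-T> the code runs the same but nothing is tainted
to begin with.

```perl
use strict;
use warnings;

# Hypothetical whitelist check: accept only a plain file name.
sub untaint_filename {
    my ($input) = @_;
    return unless defined $input;
    # Under taint mode, a capture from a successful match is how a value
    # gets untainted; the pattern decides what is allowed through.
    if ($input =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/) {
        return $1;
    }
    return;    # rejected: caller gets undef in scalar context
}

for my $candidate ('human_est.out', 'rm -rf /; true') {
    my $clean = untaint_filename($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: $candidate\n";
}
```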

>> Define "quite large".  As for the second, I think it's possible to go
>> through the pattern one match at a time (but not at 3AM).
>
> $ du -hs human_est.out
> 1.2G  human_est.out
> $ du -hs nr
> 3.5G  nr

Define "quite large".  (I think, sizes are in bytes).

	perl -wle '
	open $fh, "<", "/proc/$$/stat";
	print +(split / /, <$fh>)[22]'
	5775360

	time perl -wle '
	$x = " " x (512 * 1E6);
	open $fh, "<", "/proc/$$/stat";
	print +(split / /, <$fh>)[22]'
	Name "main::x" used only once: possible typo at -e line 2.
	1029783552

	real    1m51.687s
	user    0m3.588s
	sys     0m7.068s

	time perl -wle '
	$x = " " x (256 * 1E6);
	open $fh, "<", "/proc/$$/stat";
	print +(split / /, <$fh>)[22]'
	Name "main::x" used only once: possible typo at -e line 2.
	517783552

	real    0m11.334s
	user    0m1.788s
	sys     0m1.668s

I have only 512 MB of real memory.  However, looking at the B<time>
output I have to agree that loading even 1.2G (virtual memory provided)
would be quite exciting.  Remember, that's a string, not an array.
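
For contrast, a sketch of reading line by line and stopping at the first
hit, so memory use stays at roughly one line no matter how big the input
is.  C<first_match> is a made-up helper, and the in-memory filehandle
only stands in for a real multi-gigabyte file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line matching $re together with
# the number of lines read to get there.
sub first_match {
    my ($fh, $re) = @_;
    my $count = 0;
    while (my $line = <$fh>) {    # only one line is held at a time
        $count++;
        return ($line, $count) if $line =~ $re;
    }
    return (undef, $count);       # no match before end of input
}

# Stand-in data; a real run would open the large file instead.
my $data = join '', map { "record $_\n" } 1 .. 1000;
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my ($line, $read) = first_match($fh, qr/\Arecord 10\b/);
close($fh);
print "found after $read lines: $line";
```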

>> And a piece of advice.  If you're going to stay here, anytime think of
>> C<use File::Slurp;>, and find a good reason against.  Because sooner or
>> later, you'll be advised of it anyway.
>
> Maybe, but this is not going to happen. I want to stop reading a huge file
> after I have collected all the information that I need from it - why should
> I slurp 3.5 gb if I have what I need after reading 10k?

I wasn't talking about quitting the file.  I was talking about quitting
c.l.p.m.


-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: 28 Apr 2009 10:04:36 GMT
From: greymausg@mail.com
Subject: Re: Tk::Listbox
Message-Id: <75o2lkF18j0rcU1@mid.individual.net>

On 2009-04-27, Lamprecht <ch.l.ngre-nospam@online.de> wrote:
> greymausg@mail.com wrote:
>> I seem to be having a real problem with Tk::Listbox, wrote the script
>> 
>> http://sial.org/pbot/36228. 
>> 
>> Which fails with error:
>> 
>> no event type or button # or keysym at
>> /usr/lib/perl5/site_perl/5.10.0/i486-linux-thread-multi/Tk/Widget.pm
>
> This seems to be fixed in
> http://search.cpan.org/~srezic/Tk-804.028_501/
> See also:
> http://rt.cpan.org/Public/Bug/Display.html?id=38746
>
> Regards, Christoph

Seems OK now. Thanks


-- 
greymaus
 .
 .
 ...


------------------------------

Date: Tue, 28 Apr 2009 10:07:59 +0200
From: Frank Seitz <devnull4711@web.de>
Subject: Re: unexplained warning message in m{...} regexp
Message-Id: <75nrqrF17ntduU4@mid.individual.net>

Uri Guttman wrote:
> 
>   >> the latter doesn't have escaped braces since the string parser
>   >> removed them so the regex parser sees a quantifier. in the former
>   >> the \Q is done in the regex parser so you get a literal {1}.
>
>   FS> I know all that. I wanted to show how to put a literal "{" or "}"
>   FS> in a {}-delimited regex. "\{" doesn't work but "\Q{\E" does.
> 
> then consider my post an explanation to others why \Q\E works and \'s
> don't.

Your explanation sounds plausible. But why doesn't it help
to double the backslashes then:

say "a{1}" =~ m{\Aa\\{1\\}\z}? 1: 0;
__END__
0

Frank
-- 
Dipl.-Inform. Frank Seitz; http://www.fseitz.de/
Anwendungen für Ihr Internet und Intranet
Tel: 04103/180301; Fax: -02; Industriestr. 31, 22880 Wedel


------------------------------

Date: Tue, 28 Apr 2009 01:18:23 -0700
From: sln@netherlands.com
Subject: Re: unexplained warning message in m{...} regexp
Message-Id: <enedv4l4a07bkmj19fn4h6147mqrgu675q@4ax.com>

On Tue, 28 Apr 2009 10:07:59 +0200, Frank Seitz <devnull4711@web.de> wrote:

>Uri Guttman wrote:
>> 
>Your explanation sounds plausible. But why doesn't it help
>to double the backslashes then:
>
>say "a{1}" =~ m{\Aa\\{1\\}\z}? 1: 0;
>__END__
>0
>
>Frank

Why don't you print it and find out?
The answer is that \\ is the escaped escape, not the escaped delimiters {}.

my $rx = qr {\Aa\\{1\\}\z};
print $rx,"\n";

-sln



------------------------------

Date: Tue, 28 Apr 2009 01:35:10 -0700
From: sln@netherlands.com
Subject: Re: unexplained warning message in m{...} regexp
Message-Id: <rdfdv4th97drbon311pj5aha56vefp3err@4ax.com>

On Tue, 28 Apr 2009 01:18:23 -0700, sln@netherlands.com wrote:

>On Tue, 28 Apr 2009 10:07:59 +0200, Frank Seitz <devnull4711@web.de> wrote:
>> why doesn't it help
>>to double the backslashes then:
>>
>>say "a{1}" =~ m{\Aa\\{1\\}\z}? 1: 0;
>>__END__
>>0
>>
>>Frank
>

The *parser* un-escapes all delimiters before the string goes to the
regex engine. Below are 1..5 escapes on the {. Check the output: it's
always an even number of escapes after parsing. This makes it
*impossible* to escape the delimiter in the normal fashion.

my $rx = qr {\Aa\{1\}\z};
print $rx,"\n";
$rx = qr {\Aa\\{1\\}\z};
print $rx,"\n";
$rx = qr {\Aa\\\{1\\\}\z};
print $rx,"\n";
$rx = qr {\Aa\\\\{1\\\\}\z};
print $rx,"\n";
$rx = qr {\Aa\\\\\{1\\\\\}\z};
print $rx,"\n";

__END__

(?-xism:\Aa{1}\z)
(?-xism:\Aa\\{1\\}\z)
(?-xism:\Aa\\{1\\}\z)
(?-xism:\Aa\\\\{1\\\\}\z)
(?-xism:\Aa\\\\{1\\\\}\z)

-sln



------------------------------

Date: Tue, 28 Apr 2009 10:00:57 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: unexplained warning message in m{...} regexp
Message-Id: <97tic6-4t2.ln1@osiris.mauzo.dyndns.org>


Quoth Frank Seitz <devnull4711@web.de>:
> Uri Guttman wrote:
> > 
> >   >> the latter doesn't have escaped braces since the string parser
> >   >> removed them so the regex parser sees a quantifier. in the former
> >   >> the \Q is done in the regex parser so you get a literal {1}.
> >
> >   FS> I know all that. I wanted to show how to put a literal "{" or "}"
> >   FS> in a {}-delimited regex. "\{" doesn't work but "\Q{\E" does.
> > 
> > then consider my post an explanation to others why \Q\E works and \'s
> > don't.
> 
> Your explanation sounds plausible. But why doesn't it help
> to double the backslashes then:
> 
> say "a{1}" =~ m{\Aa\\{1\\}\z}? 1: 0;
> __END__
> 0

Because things aren't quite that simple :). The first (interpolation and
unescaping) pass over a regex isn't quite the same as for a qqish
string. It leaves regex-specific escapes like \A and \z alone (in a
qqish string, they'd become 'A' and 'z' with a warning), it leaves
double-backwhacks alone, and it has some special cases regarding
interpolation (for example, qr/$)/ will *not* interpolate the $) special
variable, as the parser presumes the '$' was a regex metachar rather
than an interpolated variable).
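
For what it's worth, here is a small sketch of spellings for a literal
brace that don't rely on a backslash directly in front of the delimiter,
so they sidestep the unescaping pass altogether.  The labels are made
up; the \Q...\E variant is the one already mentioned upthread, and the
last line simply switches to a non-brace delimiter.  Each is expected to
match.

```perl
use strict;
use warnings;

my $text = 'a{1}';

# Each pattern should match the literal string 'a{1}'.
my %pattern = (
    'character class' => qr{\Aa[{]1[}]\z},       # braces still balance
    'hex escape'      => qr{\Aa\x7B1\x7D\z},     # \x7B is '{', \x7D is '}'
    'quotemeta'       => qr{\Aa\Q{\E1\Q}\E\z},   # the \Q...\E trick
    'other delimiter' => qr/\Aa\{1\}\z/,         # '{' is no delimiter here
);

for my $name (sort keys %pattern) {
    printf "%-16s %s\n", $name,
        $text =~ $pattern{$name} ? 'matches' : 'fails';
}
```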

If you want to know what actually happens, you have to read toke.c. Then
you'll spend the next week trying to forget how horrifying it is :).

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2372
***************************************

