[28461] in Perl-Users-Digest

Perl-Users Digest, Issue: 9825 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Oct 9 18:10:15 2006

Date: Mon, 9 Oct 2006 15:10:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 9 Oct 2006     Volume: 10 Number: 9825

Today's topics:
    Re: Posting Guidelines for comp.lang.perl.misc ($Revisi <mgarrish@gmail.com>
    Re: Posting Guidelines for comp.lang.perl.misc ($Revisi <nospam-abuse@ilyaz.org>
    Re: QUERY_STRING parsing and '$value =~ tr/+/ /;' treat <hjp-usenet2@hjp.at>
        second substitution to work only on a found pattern <is@invalid.111>
    Re: second substitution to work only on a found pattern <someone@example.com>
    Re: Syntax for getting web page links <benmorrow@tiscali.co.uk>
    Re: Tricky regex (exclude some multiple characters) <solitude.standing@silent-force.info>
    Re: Tricky regex (exclude some multiple characters) <solitude.standing@silent-force.info>
    Re: Unit-testing and mock objects <eugene.morozov@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 9 Oct 2006 11:20:55 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.6 $)
Message-Id: <1160418055.229027.62770@h48g2000cwc.googlegroups.com>


Ilya Zakharevich wrote:

> [A complimentary Cc of this posting was sent to
> <tadmc@augustmail.com>], who wrote in article <45260435$0$47258$ae4e5890@news.nationwide.net>:

>
> In my opinion, the rudeness level on this newsgroup increased very
> much during the last 10 years; and most of this increase is, IMO, due
> to (some) KNOWLEDGEABLE REGULARS being INTENTIONALLY rude.  I do not
> know what strategy they have in mind when doing this; neither do I
> know whether their strategy works.
>

Yes, but it's an old rant. I'm sure I've fought with just about
everyone who posts here at one time or another either because I think
they're jerks for the way they post or they think I am. It's not a
friendly playground, but I rarely see questions go completely
unanswered.

What I do see is a lot of posters taking needless pot-shots at
beginners because they're on their own ego trip and beginners getting
all uppity when they feel their incompetence has been brought too much
to light. You haven't earned your stripes in clpm until you can sneer
down your nose when you post, it often seems. I personally try and
reserve that treatment for well-established trolls, but there are times
when your patience gets pushed too far even with the most well-meaning
beginner.

Alt.perl was a much happier place for beginners (or so I found some 6
or so years ago when I first picked up Perl), if not quite as
technically sound at times (although there were regulars there who
would correct the more outrageous posts), but somewhere in the last 6
years traffic there died off almost completely. CLPM is not the place
to get your feet wet in usenet, but being the only real source for perl
help you wade in at your peril.

For all that, all I wanted to say is that changing the posting
guidelines isn't going to help. If you really want to change the
culture you need to start singling out any instances of vulgarity you
come across until the tone changes (which means being a knowledgeable
regular poster, otherwise you'll just get shouted down).

Oh, and if I'm on your list... bite me!  : )

Matt



------------------------------

Date: Mon, 9 Oct 2006 22:03:39 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.6 $)
Message-Id: <egegvr$2jvd$1@agate.berkeley.edu>

[A complimentary Cc of this posting was sent to
Tad McClellan 
<tadmc@augustmail.com>], who wrote in article <slrneikevu.i70.tadmc@magna.augustmail.com>:
> >   One should not post rude replies even if you consider the message
> >   you reply to as violating these guidelines, as rude, as unbalanced,
> >   or as insane.  If one can't post a polite informative up-to-a-point
> >   reply, one should not post at all...

> I don't see that there is much difference in that from what we
> already have:

>    Do not use these guidelines as a "license to flame" or other
>    meanness. It is possible that a poster is unaware of things
>    discussed here.  Give them the benefit of the doubt, and just
>    help them learn how to post, rather than assume that they do
>    know and are being the "bad kind" of Lazy.

Well, I do :-(.  Let me repeat the "bird fly picture" as I see it: the
current guidelines are written as if the newbies are the most guilty
parties.  IMO, the other parties are much more guilty.

Yours,
Ilya


------------------------------

Date: Mon, 9 Oct 2006 23:37:06 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: QUERY_STRING parsing and '$value =~ tr/+/ /;' treatment
Message-Id: <slrneilg82.l3i.hjp-usenet2@yoyo.hjp.at>

On 2006-10-08 17:43, Yohan N Leder <ynl@nsparks.net> wrote:
> Parsing $ENV{'QUERY_STRING'} w/o CGI.pm, it's usual to see code treating 
> extracted values with a "$value =~ tr/+/ /;" for 'unwebification' of the 
> '+' signs.
>
> But, this kind of treatment will corrupt any QUERY_STRING content which 
> is not simple text (for example, I've done test on base64 data).
>
> So, how does CGI.pm handle this? How does it differentiate 
> QUERY_STRING values which have to be 'not treated' (e.g. a 
> base64 encoded image) from those which have to be 'treated' (e.g. 
> simple text)?

It doesn't. Why should it? It is perfectly possible to parse
$ENV{'QUERY_STRING'} without modifying it. Even if you want to use
"destructive" operators like tr///, you can always apply them to a copy:

$value= $ENV{'QUERY_STRING'};
$value =~ tr/+/ /;
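
As a fuller illustration of working on a copy, here is a minimal sketch of a hand-rolled parser. The helper name parse_query and the sample string are made up for this example, and it assumes the client sent standard form encoding, where a literal '+' (as in base64 data) arrives as %2B and so survives the tr/+/ / step.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): parse a query string into a hash.
# It only ever modifies its own copies, so the caller's string stays intact.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # then decode %XX escapes
        }
        $param{$name} = $value;                   # a repeated name overwrites
    }
    return %param;
}

# Made-up input; a CGI script would pass $ENV{'QUERY_STRING'} instead.
my $raw = 'text=hello+world&data=QUJD%2B%2F8%3D';
my %p   = parse_query($raw);

print "$p{text}\n";    # hello world
print "$p{data}\n";    # QUJD+/8=
print "$raw\n";        # the original string, unchanged
```

The order matters: translating '+' before decoding the %XX escapes is what keeps an encoded %2B from being turned into a space.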

BTW, I just noticed that the manual doesn't describe query_string()
correctly:

       You can also retrieve the unprocessed query string with
       query_string():

While CGI does leave $ENV{'QUERY_STRING'} unprocessed, query_string()
returns a changed version in the case of old isindex-style queries:

"aa+bb" is converted into "keywords=aa;keywords=bb".

	hp


-- 
   _  | Peter J. Holzer    | > Wieso sollte man etwas erfinden was nicht
|_|_) | Sysadmin WSR       | > ist?
| |   | hjp@hjp.at         | Was sonst wäre der Sinn des Erfindens?
__/   | http://www.hjp.at/ |	-- P. Einstein u. V. Gringmuth in desd


------------------------------

Date: Mon, 09 Oct 2006 19:02:37 +0200
From: "I.M. Postor" <is@invalid.111>
Subject: second substitution to work only on a found pattern
Message-Id: <dubqv3x7o1.ln2@abacus.mid.example.com>

Hello, 

I have some XML which is formatted by an XSL processor:


<c02 level="group">
  <did>
  <unitid>13-16</unitid>
  <unittitle>several weeklies</unittitle>
  <unitdate normal="1928/1931">1928-1931</unitdate>
  <unitid>13-16</unitid>
  </did>
  <c03 level="file">
  <unitid>13</unitid>
  <unittitle>
  <unitdate>1928</unitdate>
  </unittitle>
  </c03>
  <c03 level="file">
  <unitid>14</unitid>
  <unittitle>
  <unitdate>1929</unitdate>
  </unittitle>
  </c03>
  <c03 level="file">
  <unitid>15</unitid>
  <unittitle>
  <unitdate>1930</unitdate>
  </unittitle>
  </c03>
</c02>



but whenever there is a <c0X level="file"> element, for working
purposes I'd rather have flattened components:


<c02 level="group">
  <did>
  <unitid>13-16</unitid>
  <unittitle>several weeklies</unittitle>
  <unitdate normal="1928/1931">1928-1931</unitdate>
  <unitid>13-16</unitid>
  </did>
<c03 level="file"><unitid>13</unitid><unittitle><unitdate>1928</unitdate></unittitle></c03>
<c03 level="file"><unitid>14</unitid><unittitle><unitdate>1929</unitdate></unittitle></c03>
<c03 level="file"><unitid>15</unitid><unittitle><unitdate>1930</unitdate></unittitle></c03>
</c02>


<XXX level="file"> could be from <c01 level="file"> to <c12 level="file">, therefore:


while ($slurped_text =~ /(<(c0[1-9]|c1[012]) level="file">.*?<\/\2>)/sg) 
  { print "hello"; } #OK

while ($slurped_text =~ /(<(c0[1-9]|c1[012]) level="file">.*?<\/\2>)/sg) 
  {$1 =~ s/\n *//g; }
ERROR: Modification of a read-only value attempted 


How can I get the secondary substitution to work only on a found regex?
Or should I try another approach?


Cheers


------------------------------

Date: Mon, 09 Oct 2006 18:36:20 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: second substitution to work only on a found pattern
Message-Id: <EEwWg.16393$N4.15634@clgrps12>

I.M. Postor wrote:
> 
> I have some XML which is formatted by an XSL processor:
> 
> 
> <c02 level="group">
>   <did>
>   <unitid>13-16</unitid>
>   <unittitle>several weeklies</unittitle>
>   <unitdate normal="1928/1931">1928-1931</unitdate>
>   <unitid>13-16</unitid>
>   </did>
>   <c03 level="file">
>   <unitid>13</unitid>
>   <unittitle>
>   <unitdate>1928</unitdate>
>   </unittitle>
>   </c03>
>   <c03 level="file">
>   <unitid>14</unitid>
>   <unittitle>
>   <unitdate>1929</unitdate>
>   </unittitle>
>   </c03>
>   <c03 level="file">
>   <unitid>15</unitid>
>   <unittitle>
>   <unitdate>1930</unitdate>
>   </unittitle>
>   </c03>
> </c02>
> 
> 
> 
> but whenever there is a <c0X level="file"> element, for working
> purposes I'd rather have flattened components:
> 
> 
> <c02 level="group">
>   <did>
>   <unitid>13-16</unitid>
>   <unittitle>several weeklies</unittitle>
>   <unitdate normal="1928/1931">1928-1931</unitdate>
>   <unitid>13-16</unitid>
>   </did>
> <c03 level="file"><unitid>13</unitid><unittitle><unitdate>1928</unitdate></unittitle></c03>
> <c03 level="file"><unitid>14</unitid><unittitle><unitdate>1929</unitdate></unittitle></c03>
> <c03 level="file"><unitid>15</unitid><unittitle><unitdate>1930</unitdate></unittitle></c03>
> </c02>
> 
> 
> <XXX level="file"> could be from <c01 level="file"> to <c12 level="file">, therefore:
> 
> 
> while ($slurped_text =~ /(<(c0[1-9]|c1[012]) level="file">.*?<\/\2>)/sg) 
>   { print "hello"; } #OK
> 
> while ($slurped_text =~ /(<(c0[1-9]|c1[012]) level="file">.*?<\/\2>)/sg) 
>   {$1 =~ s/\n *//g; }
> ERROR: Modification of a read-only value attempted 
> 
> 
> How can I get the secondary substitution to work only on a found regex?
> Or should I try another approach?

$ perl -e'
my $slurped_text = <<XML;

<c02 level="group">
  <did>
  <unitid>13-16</unitid>
  <unittitle>several weeklies</unittitle>
  <unitdate normal="1928/1931">1928-1931</unitdate>
  <unitid>13-16</unitid>
  </did>
  <c03 level="file">
  <unitid>13</unitid>
  <unittitle>
  <unitdate>1928</unitdate>
  </unittitle>
  </c03>
  <c03 level="file">
  <unitid>14</unitid>
  <unittitle>
  <unitdate>1929</unitdate>
  </unittitle>
  </c03>
  <c03 level="file">
  <unitid>15</unitid>
  <unittitle>
  <unitdate>1930</unitdate>
  </unittitle>
  </c03>
</c02>

XML


print $slurped_text;
$slurped_text =~ s{(<(c0[1-9]|c1[012]) level="file">.*?</\2>)}
                  { ( my $x = $1 ) =~ s!\n *!!g; $x }seg;
print $slurped_text;
'

<c02 level="group">
  <did>
  <unitid>13-16</unitid>
  <unittitle>several weeklies</unittitle>
  <unitdate normal="1928/1931">1928-1931</unitdate>
  <unitid>13-16</unitid>
  </did>
  <c03 level="file">
  <unitid>13</unitid>
  <unittitle>
  <unitdate>1928</unitdate>
  </unittitle>
  </c03>
  <c03 level="file">
  <unitid>14</unitid>
  <unittitle>
  <unitdate>1929</unitdate>
  </unittitle>
  </c03>
  <c03 level="file">
  <unitid>15</unitid>
  <unittitle>
  <unitdate>1930</unitdate>
  </unittitle>
  </c03>
</c02>


<c02 level="group">
  <did>
  <unitid>13-16</unitid>
  <unittitle>several weeklies</unittitle>
  <unitdate normal="1928/1931">1928-1931</unitdate>
  <unitid>13-16</unitid>
  </did>
  <c03 level="file"><unitid>13</unitid><unittitle><unitdate>1928</unitdate></unittitle></c03>
  <c03 level="file"><unitid>14</unitid><unittitle><unitdate>1929</unitdate></unittitle></c03>
  <c03 level="file"><unitid>15</unitid><unittitle><unitdate>1930</unitdate></unittitle></c03>
</c02>
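
The same idea can be written as a standalone script. This is a minimal sketch, not the poster's exact code: the helper name flatten and the shortened sample data are invented for illustration. The point it shows is that $1 is read-only, so the inner substitution has to run on a copy, and the /e modifier lets the outer substitution use the result of that code as the replacement.

```perl
use strict;
use warnings;

# Shortened sample input, made up along the lines of the original post.
my $slurped_text = <<'XML';
<c02 level="group">
  <c03 level="file">
  <unitid>13</unitid>
  <unittitle>
  <unitdate>1928</unitdate>
  </unittitle>
  </c03>
</c02>
XML

# Hypothetical helper: strip each newline and the indentation that follows it.
sub flatten {
    my ($element) = @_;        # copy first; $1 itself cannot be modified
    $element =~ s/\n *//g;
    return $element;
}

# /s lets . match newlines, /e evaluates the replacement as Perl code,
# /g repeats for every matching element.
$slurped_text =~ s{(<(c0[1-9]|c1[012]) level="file">.*?</\2>)}{ flatten($1) }seg;

print $slurped_text;
```

On Perl 5.14 or later, the non-destructive /r modifier allows the replacement part to be written as $1 =~ s/\n *//gr, which returns the modified copy without needing a temporary variable.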




John
-- 
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall


------------------------------

Date: Mon, 9 Oct 2006 04:15:45 +0100
From: Ben Morrow <benmorrow@tiscali.co.uk>
Subject: Re: Syntax for getting web page links
Message-Id: <1grov3-nc8.ln1@osiris.mauzo.dyndns.org>


Quoth dysgraphia <ldolan@bigpond.net.au>:
> Hi, I'm using win xp and have the ActivePerl download.
> 
> This is my first attempt at a perl script. It tries to go to the 
> chessbase site and find the links to chess tournaments in 2006.
> 
> What I hope to do is have my script collect the links on the page
> listed under Events 2006 and put this collection of links into
> an Excel workbook ("C:\ChessEvents.xls"), spreadsheet ("Year2006")
> 
> If I progress I will make script to follow some of the links and 
> retrieve the information about that chess tournament.
> 
> I plan to put a button on the spreadsheet to fire the perl script
> via some VBA.
> 
> So far I have only got this. Any help to push me along
> would be appreciated!
> 
> #!/usr/bin/perl -w

    use warnings;

is better than -w.

> use LWP::UserAgent;
> use HTTP::Cookies;
> use LWP;
> use HTTP::Request::Common qw(POST GET);

You would probably be better off using LWP::Simple for this, or perhaps
WWW::Mechanize.

> use strict;
> use DBI;
> use IO::Dir;
> use LWP::Debug qw(-);
> # the chessbase page with list of events is at
> # http://www.chessbase.com/events/index.asp and find under Events 2006
> my $url = 'http://www.chessbase.com/events/index.asp';
> my $ua = LWP::UserAgent->new();
> $ua->agent("Mozilla/8.0");

Why? AFAIK, no current browser uses this User-Agent string. What makes
you think you need one at all?

> $ua->cookie_jar(HTTP::Cookies->new);
> my $res = $ua->request(new HTTP::Request GET => $url);

This looks OK, as far as it goes. What is your problem with the next
step?
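
For the next step, one possible approach (an assumption, since the original poster has not said what they tried) is to hand the fetched HTML to HTML::LinkExtor, which ships with the HTML-Parser distribution that LWP depends on. The helper name extract_links and the sample page below are made up; a real script would fetch the page first, as shown in the comment.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href of every <a> element in $html,
# made absolute against $base.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # With a base URL the values are URI objects; stringify them.
            push @links, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Made-up sample page. A real script would fetch it instead, for example:
#   use LWP::Simple qw(get);
#   my $html = get($url);
#   defined $html or die "Could not fetch $url\n";
my $html = <<'HTML';
<html><body>
<h2>Events 2006</h2>
<a href="/events/one.asp">First</a>
<a href="http://example.com/two">Second</a>
<img src="logo.png">
</body></html>
HTML

my @links = extract_links($html, 'http://www.chessbase.com/events/index.asp');
print "$_\n" for @links;
```

Picking out only the links that belong under "Events 2006" depends on how the real page is structured, which this sketch does not attempt to guess.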

Ben

-- 
               We do not stop playing because we grow old; 
                  we grow old because we stop playing.
                        benmorrow@tiscali.co.uk


------------------------------

Date: Mon, 09 Oct 2006 22:46:02 +0200
From: Amelia <solitude.standing@silent-force.info>
Subject: Re: Tricky regex (exclude some multiple characters)
Message-Id: <solitude.standing-069E1E.22455709102006@news.home.net.pl>

In article <1154033077.626906.77120@s13g2000cwa.googlegroups.com>,
 "Uncle_Fester" <tinthaut@hotmail.com> wrote:

> I want to test for "things that look more or less like real English
> words" from parsed hypertext.

    [...]

> How might I allow 'oo' and 'ee' and not 'ff' or '--' ?
> How might I exclude patterns like '_________' or '010101010101' ?
> 
> Any thoughts?

    That reminds me of a comical science-fiction story by Arthur C. Clarke,
 The Nine Billion Names of God. What the monks in the story demanded to
 compute is very similar to what you need - better be careful, read
 it first and think if you really want to know the complete answer ;-}.
 Tongue in cheek, but the story really can give you some clues as to how
 to solve your problem (boundary conditions very congruent). In case
 of emergency of supernatural, blame me, Lucy A. G. Faire ;-}.

-- 
<uri: https://hyperreal.info > Six of Chalices mirrors Nine of Wands.
The World is in the locus of Chaos. The Universe pretends to be Queen
of Pentacles, reversed. She feels like inverse of The Tower reflecting
on inverse of The Chariot. My myself until the dark is searching for...


------------------------------

Date: Mon, 09 Oct 2006 23:35:45 +0200
From: Amelia <solitude.standing@silent-force.info>
Subject: Re: Tricky regex (exclude some multiple characters)
Message-Id: <solitude.standing-1F1B8B.23354509102006@news.home.net.pl>

In article <1154033077.626906.77120@s13g2000cwa.googlegroups.com>,
 "Uncle_Fester" <tinthaut@hotmail.com> wrote:

> I want to test for "things that look more or less like real English
> words" from parsed hypertext.
> 
> I know that
> 
> while ($text =~ /([A-Za-z0-9_\'\-]+)/g )
> 
> will catch most of what I want most of the time.
> 
> The tricky bit is this :
> 
> How might I allow 'oo' and 'ee' and not 'ff' or '--' ?
> How might I exclude patterns like '_________' or '010101010101' ?
> 
> Any thoughts?

    You n{ee}d a reasonable boundaries for many a reason and because
 she told me so using tel{ee}mpathy.

    The number of characters of The Word! It implies and includes your
 constraint for no nonsense repetitions. They lead you to a clue or twain
 that RE yer s{ee}k, Lancelot, shall be recursive, in en{ss}ence o{ff}
 saying; what you wont to match (or *function*) is tightly related onto
 what you're going to match with (or *form*), known to evolutionary
 biology and linguistics and more (languages be recursive, but multiple 
 repetitions of the same pa{tt}ern ought to be rare except exceptions). 
 This for some reason reminds of me pattern recognition subsystem
 about a po{ss}ibility of including perl code in RE and mul tuple
 embe{dd} ed cond iterational ca{ll}s to eval. 
    LISP me before I su{ff}ocate. Make love not war on Afganistan.
 One more pu{ff}. You n{ee}d a regular expre{ss}ion *generator* wytch a 
 regular expre{ss}ion is. That kind a-think might be sentient.
 Be gentle in case youse encounter true alien intelligence, AI
 co{ll}ective co{ll}ects them whom we consider pliable ;-}.

    Wri{tt}en under influence of Mephisto by M{oo}nspell from album
 I{rr}eligious and some Lor{ee}na McKe{nn}i{tt}. Don't ki{ll}file me now,
 please...Wait for fu{ll}ness ;-}. Grok!
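
Taken more literally, the question quoted above can be approached by capturing candidate tokens with the original pattern and then rejecting the implausible ones with backreferences. The following is only a sketch under assumed rules: the helper name looks_like_word and the thresholds are made up, and doubled letters in general (including 'ff', which does occur in words such as "off") are allowed, which differs from what the poster asked for.

```perl
use strict;
use warnings;

# Hypothetical filter: true if $token plausibly looks like an English word.
sub looks_like_word {
    my ($token) = @_;
    return 0 unless $token =~ /[A-Za-z]/;     # needs at least one letter
    return 0 if $token =~ /([^A-Za-z])\1/;    # doubled non-letter: '--', '__'
    return 0 if $token =~ /(.)\1\1/;          # same character three times running
    return 0 if $token =~ /(..)\1\1/;         # same pair three times: 'xyxyxy'
    return 1;
}

# Made-up sample text.
my $text  = q{The moon feels good -- _________ 010101010101 xyxyxy don't well-known};
my @words = grep { looks_like_word($_) } $text =~ /([A-Za-z0-9_'\-]+)/g;

print "@words\n";    # The moon feels good don't well-known
```

Keeping the capture pattern loose and the rejection rules separate makes each rule easy to adjust, for instance by adding a dictionary lookup if the heuristics prove too permissive.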


------------------------------

Date: 9 Oct 2006 08:26:28 -0700
From: "jmv" <eugene.morozov@gmail.com>
Subject: Re: Unit-testing and mock objects
Message-Id: <1160407588.747542.176150@m7g2000cwm.googlegroups.com>


Ben Morrow wrote:
> [snip]
> in your test file. I can see this may be a pain if you have a sub that
> is imported all over the place and you want to mock it the same
> everywhere; it should be a small matter of programming to run through
> the symbol table and find everywhere this sub has been exported to...

Thanks, that was a very helpful answer. I'm new to using mock objects (my
usual unit-tests were much simpler before that project) and overriding
subs/modules, so I didn't figure that out myself.
Eugene
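
The advice being replied to is snipped, so the following is only a guess at the kind of override under discussion: replacing a sub in the package that calls it, for the duration of a test, with a localized glob assignment. The package My::Shop and its subs are invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical code under test. fetch_price() stands in for a sub that the
# package would normally have imported from another module.
package My::Shop;

sub fetch_price { die "would contact a real price service\n" }

sub total {
    my ($quantity) = @_;
    return $quantity * fetch_price();
}

package main;

{
    # Replace the sub in the namespace where it is called. 'local' restores
    # the original automatically when this block ends.
    no warnings 'redefine';
    local *My::Shop::fetch_price = sub { 5 };

    print My::Shop::total(3), "\n";    # 15
}

# Outside the block the real fetch_price() is back in place.
print "restored\n" unless eval { My::Shop::total(1); 1 };
```

Because an imported sub lives in each importing package's symbol table, the override has to target the package that makes the call, which is why a sub imported in many places needs one such assignment per importer.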



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9825
***************************************

