[24377] in Perl-Users-Digest
Perl-Users Digest, Issue: 6566 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat May 15 03:06:45 2004
Date: Sat, 15 May 2004 00:05:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 15 May 2004 Volume: 10 Number: 6566
Today's topics:
2 Thread questions (Charlie)
Re: 2 Thread questions <usenet@morrow.me.uk>
Dynamic content and search engines <redalert@wakproductions.com>
Re: Dynamic content and search engines <spamtrap@dot-app.org>
Re: Dynamic content and search engines <nospam@bigpond.com>
Re: Favorite Editor for Perl programming <jurgenex@hotmail.com>
Re: Favorite Editor for Perl programming <jwillmore@remove.adelphia.net>
Re: IO::socket question <mysympatico001@sympatico.ca>
Re: IO::socket question <usenet@morrow.me.uk>
lookahead or lookbehind (Aqua)
Re: lookahead or lookbehind <usenet@morrow.me.uk>
Newbie question: removing array reference information <wheelscribe@mac.com>
Re: Newbie question: removing array reference informati <usenet@morrow.me.uk>
Re: Operator precedence <eric-amick@comcast.net>
Re: Perldoc versus Man <jurgenex@hotmail.com>
Re: PPM 3.1 hanging on install, search <invalid-email@rochester.rr.com>
problem with string <tihana@kata.com>
Re: Random Integer <bmb@ginger.libs.uga.edu>
regex split conditional <nospam@nospam.net>
Re: regex split conditional <tadmc@augustmail.com>
Re: RegEx to delete // comments NOT in quotes: ( ' ) OR <abigail@abigail.nl>
Re: RegEx to delete // comments NOT in quotes: ( ' ) OR <NewsGroups@US-Webmasters.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 14 May 2004 19:57:59 -0700
From: cji_work@yahoo.com (Charlie)
Subject: 2 Thread questions
Message-Id: <1dc70d61.0405141857.22b2aba0@posting.google.com>
Hi folks,
I am using Apache/Perl to test a web application. I am quite new to
threads and am just learning. Here are my questions:
1. In one of my test cases, I am trying to test how many users can
access the web application to do the work at the same time. The code
that I have written is something like:
"
...
while($Inputs{users}--) {
my $thread = Thread->new(\&PlaceOrder);
push(@threads, $thread);
}
$_-> join foreach(@threads);
...
",
where I get the number of users, create one thread per user, have each
user do the same work, and join each thread.
It works very well when there are 30 - 50 users, but when I tried
100~200 users, it just does nothing and the test application ends
after some time.
What I want to know is: is there a limit on the number of threads that
I can create at the same time? Where can I find such info? Does it
depend on the Perl module I am using, or on the platform on which the
application is running?
2. I also looked at the Thread::Pool module at www.cpan.org. My feeling
is that the above code can be written using the thread pool, and I gave
it a try. The problem I find is that no matter what I do, there is
always only one thread; I cannot make multiple threads work. Can
someone tell me what went wrong?
"
...
my $pool = Thread::Pool->new(
{optimize => 'cpu',
do =>\&PlaceOrder(),
frequency => 100,
autoshutdown => 1,
workers => 10,
maxjobs => 50,
minjobs => 5,
},);
while($Inputs{users}--) {
$pool->job();
}
$pool->join;
"
From the doc, my understanding is that the "jobs" that I add are
actually the total number of threads that I want to create, and the
number of workers is the maximum number of threads running at any
given time. Am I right?
Help me if you can!
Thanks a lot
Charlie
------------------------------
Date: Sat, 15 May 2004 03:16:28 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: 2 Thread questions
Message-Id: <c8422c$etd$1@wisteria.csv.warwick.ac.uk>
Quoth cji_work@yahoo.com (Charlie):
> Hi folks,
> I am using Apache/Perl to test a web application. I am quite new to
> threads and am just learning. Here are my questions:
Firstly, are you using perl 5.8? Are you using ithreads? perl -v should say
    This is perl, v5.8.something ...
and perl -V should have a line saying
    Compile-time options: ... USE_ITHREADS ...
If not, you're out of luck unless you reinstall perl: 5005threads were
never very reliable.
> 1. In one of my test cases, I am trying to test how many users can
> access the web application to do the work at the same time. The code
> that I have written is something like:
> "
> ...
> while($Inputs{users}--) {
Why destroy your data?
    for (1..$Inputs{users}) {
> my $thread = Thread->new(\&PlaceOrder);
If you *are* using ithreads, you'd be better off using the new
threads.pm module: start with
    use threads;
instead of
    use Thread;
and here you want either
    my $thread = threads->create(\&PlaceOrder);
or
    my $thread = async { PlaceOrder() };
> push(@threads, $thread);
> }
> $_-> join foreach(@threads);
> ...
> ",
> where I get the number of users, create one thread per user, have each
> user do the same work, and join each thread.
> It works very well when there are 30 - 50 users, but when I tried
> 100~200 users, it just does nothing and the test application ends
> after some time.
More info please. What reports do you get in your error log? Do you get
a 500 error? Does perl crash?
> What I want to know is: is there a limit on the number of threads that
> I can create at the same time? Where can I find such info? Does it
> depend on the Perl module I am using, or on the platform on which the
> application is running?
It depends on your platform. On Linux, threads are a special case of
processes, so the max number of threads is bounded by the max number of
processes; there is no separate fixed limit on the number of threads in
a single process, although each ithread gets its own copy of the
interpreter, so memory use grows with every thread.
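As a minimal sketch of the threads.pm approach suggested above (assuming
a perl built with ithreads; the PlaceOrder body and the user count of 5
are placeholders, not the poster's code), creating and joining one
thread per user might look like this. The check on the return value of
threads->create is a defensive assumption, there so that a failed
creation is reported instead of silently skipped:

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the poster's PlaceOrder routine.
sub PlaceOrder {
    my ($user) = @_;
    return "order placed for user $user";
}

my $users = 5;    # assumed; the original reads this from $Inputs{users}

my @threads;
for my $user (1 .. $users) {
    # Pass the sub by reference; extra arguments are handed to it.
    my $thr = threads->create(\&PlaceOrder, $user);
    defined $thr or die "thread creation failed for user $user\n";
    push @threads, $thr;
}

# join waits for each thread and returns what the sub returned.
my @results = map { $_->join } @threads;
print "$_\n" for @results;
```

Raising $users step by step until creation fails is one rough way to
find the practical limit on a given machine.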
> 2. I also looked at the Thread::Pool module at www.cpan.org. My feeling
> is that the above code can be written using the thread pool,
This sounds likely, but I'm afraid I know nothing about it. I would have
thought that once you get the approach above to work then Thread::Pool
will work as well (possibly more efficiently).
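One detail in the original Thread::Pool call can be checked without
knowing the module at all: it passes do => \&PlaceOrder(), with
parentheses. In plain Perl, \&name is a reference to the sub, while
\&name() calls the sub on the spot and takes a reference to whatever it
returned. Whether this explains the single-thread behaviour is only a
guess; the sketch below (with a placeholder PlaceOrder) just shows the
difference:

```perl
use strict;
use warnings;

my $calls = 0;

# Placeholder routine that counts how often it is called.
sub PlaceOrder { $calls++; return "done" }

my $ref_to_sub       = \&PlaceOrder;      # no call happens here
my $calls_after_ref  = $calls;

my $ref_to_result    = \&PlaceOrder();    # the sub runs right now
my $calls_after_call = $calls;

print "ref to sub:    ", ref($ref_to_sub),    " (calls so far: $calls_after_ref)\n";
print "ref to result: ", ref($ref_to_result), " (calls so far: $calls_after_call)\n";
```

Code that expects a code reference would normally want the first form.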
Ben
--
Although few may originate a policy, we are all able to judge it.
- Pericles of Athens, c.430 B.C.
ben@morrow.me.uk
------------------------------
Date: Sat, 15 May 2004 01:48:21 GMT
From: "Winston Kotzan" <redalert@wakproductions.com>
Subject: Dynamic content and search engines
Message-Id: <FFepc.3749$Is7.2125@newssvr15.news.prodigy.com>
I would like to set up a CGI system to deliver dynamic pages on my website
using Perl. Is it possible to pass a parameter to my dynamic page script
without the question mark in the URL. For example, a search engine robot
may not index a page like:
http://www.wakproductions.com/cgi-bin/viewpage.cgi?page=/nukclock/index.htm
However, I'm thinking that I can fool most search engines into indexing a
URL like:
http://www.wakproductions.com/viewpage.cgi/nukclock/index.htm
Is this possible in Perl? If so, how can I grab the "/nukclock/index.htm"
parameter in the script?
Thanks.
--
Winston Kotzan
http://www.wakproductions.com/
------------------------------
Date: Fri, 14 May 2004 22:11:38 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: Dynamic content and search engines
Message-Id: <5dqdnZgrXPPB4Tjd4p2dnA@adelphia.com>
Winston Kotzan wrote:
> For example, a search engine robot may not index a page like:
>
> http://www.wakproductions.com/cgi-bin/viewpage.cgi?page=/nukclock/index.htm
What leads you to think it won't? A URL is a URL is a URL.
> However, I'm thinking that I can fool most search engines into indexing a
> URL like:
>
> http://www.wakproductions.com/viewpage.cgi/nukclock/index.htm
>
> Is this possible in Perl?
Of course. Perl will print any URL you ask it to print.
print 'http://www.example.com/viewpage.cgi/nukclock/index.html';
;-)
> If so, how can I grab the "/nukclock/index.htm"
> parameter in the script?
See the path_info() and path_translated() methods in the CGI.pm module.
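For illustration, here is a rough sketch of reading that extra path.
path_info() reports the value the web server puts in the PATH_INFO
environment variable, so the sketch reads a hash shaped like %ENV
directly and needs no module. The requested_page helper and the rule of
rejecting ".." components are assumptions added for the example, not
part of CGI.pm:

```perl
use strict;
use warnings;

# Hypothetical helper: return the path that follows the script name,
# or nothing if it is missing or tries to climb out with "..".
sub requested_page {
    my ($env) = @_;
    my $path = $env->{PATH_INFO};
    return unless defined $path && length $path;
    return if $path =~ m{(?:^|/)\.\.(?:/|$)};
    return $path;
}

# Simulated environment for a request to
# /viewpage.cgi/nukclock/index.htm; a real CGI script would pass \%ENV.
my %fake_env = (PATH_INFO => '/nukclock/index.htm');
my $page = requested_page(\%fake_env);
print "requested page: $page\n";
```

In a real script the returned path would still need to be mapped onto a
known set of pages before anything is opened.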
Frankly though, I think you're barking in the wrong forest. I see URLs with
query parameters in them returned from Google and other SEs all the time;
they obviously have no problems indexing them.
If you're not getting indexed or appear low in the rankings, there are more
likely causes. The first is simply that it takes time for the engines to
notice your site, so you need to be patient; your site won't appear the day
after you put it up, and probably not a week after either. Also, Google
uses links to your site to help determine its rankings, so if there are
none out there, your site will probably rank low.
If you're serving up tag soup instead of valid, error-free HTML, then search
engines might not be able to cope with the mess well enough to extract
links from it. Similarly, <meta ...> elements are ignored because of
frequent past abuses, so if you're using those in an attempt to compensate
for a lack of meaningful content, it won't work.
(Note - that last paragraph is generic advice, not a specific criticism of
your site, which I haven't visited.)
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Sat, 15 May 2004 12:26:13 +1000
From: Gregory Toomey <nospam@bigpond.com>
Subject: Re: Dynamic content and search engines
Message-Id: <1259752.k0LIbdYHHx@GMT-hosting-and-pickle-farming>
Winston Kotzan wrote:
> I would like to set up a CGI system to deliver dynamic pages on my website
> using Perl. Is it possible to pass a parameter to my dynamic page script
> without the question mark in the URL. For example, a search engine robot
> may not index a page like:
>
> http://www.wakproductions.com/cgi-bin/viewpage.cgi?page=/nukclock/index.htm
>
> However, I'm thinking that I can fool most search engines into indexing a
> URL like:
>
> http://www.wakproductions.com/viewpage.cgi/nukclock/index.htm
>
> Is this possible in Perl? If so, how can I grab the "/nukclock/index.htm"
> parameter in the script?
>
> Thanks.
>
>
> --
> Winston Kotzan
> http://www.wakproductions.com/
These issues are discussed in detail at www.webmasterworld.com
I have various sites, all in Perl. Indexing dynamic cgi is not a problem. If
you need to, use an Apache rewrite rule to make it a little more "user
friendly".
Apache passes various environment variables & it differs from installation
to installation. You can read these to formulate your URLs. You can see
their values with code like:
    my @thekeys = keys(%ENV);
    foreach my $key (sort @thekeys) {
        print "$key = $ENV{$key}\n<br>";
    }
But there IS a problem, often with PHP sites, in that they pass 'session id'
or similar as part of the URL.
ie "www.site.com/page=123&session_id=321123321123"
anything like this is unlikely to get indexed.
Commercial products (like www.vbulletin.com ) have changed their code to
overcome this problem.
gtoomey
------------------------------
Date: Sat, 15 May 2004 02:29:14 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Favorite Editor for Perl programming
Message-Id: <_ffpc.128488$G_.54099@nwrddc02.gnilink.net>
knowsitall wrote:
> On Fri, 14 May 2004 17:59:03 -0500, John Bokma wrote:
>>> editor
>
> forgot the echo command:
>
> echo '#!/usr/local/bin/perl' > file.pl
> echo 'use warnings;' >> file.pl
> echo 'use strict;' >> file.pl
Jürgen> 'Real programmers use "cat > a.out" '
Real programmers "cp /dev/audio a.out" and whistle into the mike.
[Randal L. Schwartz]
jue
------------------------------
Date: Sat, 15 May 2004 00:21:26 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: Favorite Editor for Perl programming
Message-Id: <pan.2004.05.15.04.21.15.686939@remove.adelphia.net>
On Sat, 15 May 2004 00:48:14 +0000, Edward Wijaya wrote:
>
> In unix/linux environment.
>
> Is X(Emacs) better than vi(M)?
> How does one fare to another?
> Or there is other better choice?
Use Google and find out.
Thanks for starting yet another pointless thread. Pointless ... because
this has been asked and answered before. And ... there's a ... FAQ on
this question.
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
If only I could be respected without having to be respectable.
------------------------------
Date: Fri, 14 May 2004 23:00:15 -0400
From: "B. W." <mysympatico001@sympatico.ca>
Subject: Re: IO::socket question
Message-Id: <zJfpc.40277$dr1.1055854@news20.bellglobal.com>
What I tried was:
$server =
IO::Socket::UNIX-new(LocalAddr=>"/script/file",Type=>SOCK_DGRAM, Listen=>5);
$client = IO::Socket::UNIX->new(PeerAddr=>"/script/file",
Type=>SOCK_DGRAM, Timeout=>10);
And I got "died at line 3", the value for $! is "unknown error".
Did I use the correct module? Should I use IO::Socket::INET?
thanks,
B.W.
"A. Sinan Unur" <1usa@llenroc.ude> wrote in message
news:Xns94E99099C76F4asu1cornelledu@132.236.56.8...
> "B. W." <mysympatico001@sympatico.ca> wrote in
news:Jm7pc.79721$FH5.1821934
> @news20.bellglobal.com:
>
> > Is it possible to open a socket for loopback on win32 system?
>
> What happened when you tried?
>
> --
> A. Sinan Unur
> 1usa@llenroc.ude (reverse each component for email address)
------------------------------
Date: Sat, 15 May 2004 03:23:33 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: IO::socket question
Message-Id: <c842fl$etd$2@wisteria.csv.warwick.ac.uk>
[please quote properly]
Quoth "B. W." <mysympatico001@sympatico.ca>:
> "A. Sinan Unur" <1usa@llenroc.ude> wrote in message
> news:Xns94E99099C76F4asu1cornelledu@132.236.56.8...
> > "B. W." <mysympatico001@sympatico.ca> wrote in
> news:Jm7pc.79721$FH5.1821934
> > @news20.bellglobal.com:
> >
> > > Is it possible to open a socket for loopback on win32 system?
> >
> > What happened when you tried?
>
> What I tried was:
>
> $server =
> IO::Socket::UNIX-new(LocalAddr=>"/script/file",Type=>SOCK_DGRAM, Listen=>5);
> $client = IO::Socket::UNIX->new(PeerAddr=>"/script/file",
> Type=>SOCK_DGRAM, Timeout=>10);
>
> And I got "died at line 3", the value for $! is "unknown error".
>
> Did I use the correct module? Should I use IO::Socket::INET?
Win32 doesn't have Unix-domain sockets, so yes, you could use
IO::Socket::INET with localhost for the {Local,Peer}Host. Note that
inet-domain sockets are rather different from Unix-domain: anyone (on
that machine, if Win32 implements binding sockets to interfaces
properly; on the whole network, if it doesn't) can connect, and you have
to specify a numerical port rather than a filename. You may also wish to
look at Win32::Pipe.
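A minimal loopback sketch along those lines, using only
IO::Socket::INET. It assumes TCP rather than the SOCK_DGRAM type in the
original attempt, binds to 127.0.0.1 only, and lets the system pick a
free port (LocalPort => 0); all of these are choices made for the
example:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listening socket bound to the loopback interface only.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # 0 asks the system for any free port
    Proto     => 'tcp',
    Listen    => 5,
) or die "cannot listen: $!";

my $port = $server->sockport;  # the port that was actually assigned

# Client in the same process; the connection waits in the listen queue.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
    Timeout  => 10,
) or die "cannot connect: $!";

my $conn = $server->accept or die "accept failed: $!";

print {$client} "hello\n";     # IO::Socket handles autoflush by default
my $line = <$conn>;
print "server received: $line";
```

A real server and client would normally be separate processes; they
share one here only to keep the example self-contained.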
Ben
--
It will be seen that the Erwhonians are a meek and long-suffering people,
easily led by the nose, and quick to offer up common sense at the shrine of
logic, when a philosopher convinces them that their institutions are not based
on the strictest morality. [Samuel Butler, paraphrased] ben@morrow.me.uk
------------------------------
Date: 14 May 2004 23:02:08 -0700
From: junk@dlink.org (Aqua)
Subject: lookahead or lookbehind
Message-Id: <55d7995c.0405142202.4a09729a@posting.google.com>
Group
I have a huge data file like this
========
fdp "abcd18"
$anchor "fig3","","",0<>
$text "Figure_fig3",68.0711,0,0,0,0,0<>
$gap 0,8,0,10.3<>
$bcolour White<>
$areas<>
fdp "abcd19"
$anchor "fig1","","",0<>
$text "Figure_fig1",40.1263,0,0,0,0,0<>
$gap 2,6,2,8<>
$bcolour White<>
$areas<>
fdp "abcd21"
$anchor "tab1","","",0<>
$text "Table_tab1",0,0,0,0,0,0<>
$gap 0,4.2175,0,7.42<>
$areas<>
========
I want to match the block containing the text "Figure_fig1", which
should start with fdp and end with $areas<>.
use strict;
open( INFILE, "<domtest.txt" ) or die "Cannot open: $!\n";
my @Line = <INFILE>;
my $FileLine = join( "", @Line );
$FileLine =~ /(fdp .*?(?!fdp).*text \"Figure_fig1\")/s;
close INFILE;
But my regexp always matches from the first fdp.
I would appreciate any help in this regard.
Regards
Dominic
------------------------------
Date: Sat, 15 May 2004 06:15:05 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: lookahead or lookbehind
Message-Id: <c84ch9$l4d$1@wisteria.csv.warwick.ac.uk>
Quoth junk@dlink.org (Aqua):
> Group
>
> I have a huge data file like this
> ========
> fdp "abcd18"
> $anchor "fig3","","",0<>
> $text "Figure_fig3",68.0711,0,0,0,0,0<>
> $gap 0,8,0,10.3<>
> $bcolour White<>
> $areas<>
>
> fdp "abcd19"
> $anchor "fig1","","",0<>
> $text "Figure_fig1",40.1263,0,0,0,0,0<>
> $gap 2,6,2,8<>
> $bcolour White<>
> $areas<>
>
> fdp "abcd21"
> $anchor "tab1","","",0<>
> $text "Table_tab1",0,0,0,0,0,0<>
> $gap 0,4.2175,0,7.42<>
> $areas<>
> ========
>
> I want to match the block containing the text "Figure_fig1", which
> should start with fdp and end with $areas<>.
>
> use strict;
> open( INFILE, "<domtest.txt" ) or die "Cannot open: $!\n";
> my @Line = <INFILE>;
> my $FileLine = join( "", @Line );
> $FileLine =~ /(fdp .*?(?!fdp).*text \"Figure_fig1\")/s;
> close INFILE;
#!/usr/bin/perl
use strict;
use warnings;
open my $IN, '<', 'domtest.txt' or die "cannot open domtest.txt: $!";
$/ = "\n\$areas<>\n";
my $FileLine;
while (<$IN>) {
    s/^\n//;
    $FileLine = $_, last if /Figure_fig1/;
}
# lexical FH will close when it goes out of scope
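If a single regex is still wanted, the negative lookahead has to be
applied at every character between "fdp" and the wanted text, not once.
The following is only a sketch of that idea on a shortened copy of the
sample data, held in a string instead of read from domtest.txt; for a
huge file the record-separator loop above is likely the better fit:

```perl
use strict;
use warnings;

# Shortened copy of the sample data from the question.
my $data = <<'END';
fdp "abcd18"
$anchor "fig3","","",0<>
$text "Figure_fig3",68.0711,0,0,0,0,0<>
$areas<>

fdp "abcd19"
$anchor "fig1","","",0<>
$text "Figure_fig1",40.1263,0,0,0,0,0<>
$gap 2,6,2,8<>
$areas<>

fdp "abcd21"
$anchor "tab1","","",0<>
$text "Table_tab1",0,0,0,0,0,0<>
$areas<>
END

# (?:(?!^fdp ).)*? advances one character at a time and refuses to
# step onto the start of another "fdp " line, so the match cannot
# begin in an earlier block.
my ($block) =
    $data =~ /(^fdp (?:(?!^fdp ).)*?\$text "Figure_fig1".*?\$areas<>)/ms;

print defined $block ? "$block\n" : "no match\n";
```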
Ben
--
If you put all the prophets, | You'd have so much more reason
Mystics and saints | Than ever was born
In one room together, | Out of all of the conflicts of time.
ben@morrow.me.uk The Levellers, 'Believers'
------------------------------
Date: Sat, 15 May 2004 02:55:57 GMT
From: Ben Miller <wheelscribe@mac.com>
Subject: Newbie question: removing array reference information
Message-Id: <BCCAF0EC.20CD6%wheelscribe@mac.com>
I'm working on a program which reads hyperlinks from a specific web page.
Using the WWW::Mechanize module, I read the page in question using regex to
scan for a particular URL string. Every instance of that string I find is
stored in an array. However, when I read the array, instead of the matched
text, I find a list of array references for example:
WWW::Mechanize::Link=ARRAY(0x90c160).
Regardless of what changes I make to how I read the information on the page
or even what information I look for I can't escape these array references.
Does anyone have a clue on how I can remove them or avoid them?
My code looks like:
#@links = $mech->find_all_links(text => 'a', text_regex => qr
/memory\/english\/products/i );
$mech is defined earlier as the absolute URL I'm scraping.
print @links;
Any help is much appreciated,
Ben
------------------------------
Date: Sat, 15 May 2004 03:02:30 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: Newbie question: removing array reference information
Message-Id: <c84186$e9l$1@wisteria.csv.warwick.ac.uk>
Quoth Ben Miller <wheelscribe@mac.com>:
> I'm working on a program which reads hyperlinks from a specific web page.
> Using the WWW::Mechanize module, I read the page in question using regex to
> scan for a particular URL string. Every instance of that string I find is
> stored in an array. However, when I read the array, instead of the matched
> text, I find a list of array references for example:
> WWW::Mechanize::Link=ARRAY(0x90c160).
>
> Regardless of what changes I make to how I read the information on the page
> or even what information I look for I can't escape these array references.
> Does anyone have a clue on how I can remove them or avoid them?
>
> My code looks like:
> #@links = $mech->find_all_links(text => 'a', text_regex => qr
> /memory\/english\/products/i );
>
> $mech is defined earlier as the absolute URL I'm scraping.
> print @links;
RTFM, in this case perldoc WWW::Mechanize, which says that
find_all_links returns a list of WWW::Mechanize::Link objects (as you
found). Read perldoc WWW::Mechanize::Link to find how to get at the
various properties of these links.
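To illustrate the general point without needing the module installed,
the sketch below uses a small made-up class, My::Link, built on a
blessed array. Printing such an object gives the Class=ARRAY(0x...)
form seen in the question; calling an accessor gives the data. The real
WWW::Mechanize::Link class does provide url() and text() methods, but
everything else here is hypothetical:

```perl
use strict;
use warnings;

package My::Link;    # hypothetical stand-in, not the real class

sub new  { my ($class, $url, $text) = @_; return bless [ $url, $text ], $class }
sub url  { return $_[0][0] }
sub text { return $_[0][1] }

package main;

my @links = (
    My::Link->new('/memory/english/products/a.html', 'Product A'),
    My::Link->new('/memory/english/products/b.html', 'Product B'),
);

# Stringifying an object shows its class and address, not its contents.
my $as_string = "$links[0]";
print "object printed directly: $as_string\n";

# Calling the accessor on each object yields the stored values.
my @urls = map { $_->url } @links;
print "url: $_\n" for @urls;
```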
Ben
--
Heracles: Vulture! Here's a titbit for you / A few dried molecules of the gall
From the liver of a friend of yours. / Excuse the arrow but I have no spoon.
(Ted Hughes, [ Heracles shoots Vulture with arrow. Vulture bursts into ]
/Alcestis/) [ flame, and falls out of sight. ] ben@morrow.me.uk
------------------------------
Date: Fri, 14 May 2004 21:38:30 -0400
From: Eric Amick <eric-amick@comcast.net>
Subject: Re: Operator precedence
Message-Id: <dqraa0lhpvjnlgbkqn5l9t4uphio179egm@4ax.com>
On Fri, 14 May 2004 19:04:37 GMT, David Frauzel <net.weathersongATnemo>
wrote:
>I had to work hard just to craft of an example of when operator
>precedence *does* actually take effect:
>
>print 0 < 0 || 1; # true
>print 0 < 0 or 1; # false
>
>But then how did Perl know to evaluate || before < in this statement,
It didn't. These are equivalent to
print((0 < 0) || 1);
(print(0 < 0)) or 1;
When list operators such as print have other operators to their right,
the list operators have very low precedence relative to the subsequent
operators; in fact, only the word-based logical operators (not, and, or,
xor) have lower precedence. As you can see, the less-than operator is
evaluated before || because of its higher precedence and because of the
left-to-right nature of all the logical operators. It is pure
coincidence that evaluating || first produces the same result; change
the statement to
print 0 < 0 || 100;
and you'll see what I mean.
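The two possible groupings can be compared directly by adding explicit
parentheses. This is just a small check of the point made above; the
variable names are made up for the example:

```perl
use strict;
use warnings;

# How Perl actually groups it: < binds tighter than ||.
my $as_parsed   = (0 < 0 || 100);      # (0 < 0) || 100
# The other grouping, as if || were evaluated first.
my $other_group = (0 < (0 || 100));    # 0 < 100

# With 1 on the right both groupings happen to give the same value,
# which is why the original example could not tell them apart.
my $one_parsed  = (0 < 0 || 1);
my $one_other   = (0 < (0 || 1));

print "with 100: parsed=$as_parsed other=$other_group\n";
print "with 1:   parsed=$one_parsed other=$one_other\n";
```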
--
Eric Amick
Columbia, MD
------------------------------
Date: Sat, 15 May 2004 02:31:10 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Perldoc versus Man
Message-Id: <Ohfpc.128489$G_.104094@nwrddc02.gnilink.net>
Sherif Zaroubi wrote:
> On Thu, 13 May 2004 23:51:34 -0500, Tad McClellan
> <tadmc@augustmail.com> wrote:
>> Why don't you have perldoc installed?
>>
>> It is part of a normal install of perl itself.
>
> On some rpm-based systems, the perl rpm and the perldoc rpm are 2
> different packages.
> I had that problem with: Suse, RedHat and Mandrake. (Don't remember
> which version).
Then kick your admin until he fixes the incomplete installation.
jue
------------------------------
Date: Sat, 15 May 2004 01:53:39 GMT
From: Bob Walton <invalid-email@rochester.rr.com>
Subject: Re: PPM 3.1 hanging on install, search
Message-Id: <40A57812.6070007@rochester.rr.com>
Stephen wrote:
...
> Just installed ActivePerl v5.8.3 on a Windows 2000 Server. I need to
> install the DBI and ODBC modules, however, when I try searching or
> downloading them, PPM just hangs. All the other features [query,
> repository, etc] work for me, at least I think they all do.
>
> How come this isn't working? It's a fresh install, everything went
> fine. I have ActivePerl v5.8.3 installed on my XP box as well, and it
> hangs too. I did manage, however, to somehow get the DBI and ODBC
> modules via PPM on my XP box, but now I no longer can. In case you're
> wondering, I remote into the Windows 2000 server.
...
> -Steve
>
Did you by chance install Perl v5.8.3 over top of the directories for an
older version of Perl (like 5.6.x)? If so, you might try wiping out
these directories entirely and then reinstalling V5.8.3. That fixed my
problems with very similar symptoms with PPM.
--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl
------------------------------
Date: Sat, 15 May 2004 08:59:07 +0200
From: <tihana@kata.com>
Subject: problem with string
Message-Id: <c84f37$sii$1@ls219.htnet.hr>
How do I do something like the BASIC command:
var$="123456789a"
1.) a$=mid$(var$,5,2) :: a$="56"
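The closest Perl built-in is substr, which counts positions from 0
where BASIC's MID$ counts from 1. The mid helper below is a made-up
wrapper that does the conversion; it is a sketch, not a standard
function:

```perl
use strict;
use warnings;

# Hypothetical helper mimicking BASIC's MID$(string, start, length),
# where start counts from 1; Perl's substr counts from 0.
sub mid {
    my ($string, $start, $length) = @_;
    return substr($string, $start - 1, $length);
}

my $var   = "123456789a";
my $piece = mid($var, 5, 2);    # same as substr($var, 4, 2)
print "$piece\n";               # prints 56
```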
------------------------------
Date: Fri, 14 May 2004 21:53:26 -0400
From: Brad Baxter <bmb@ginger.libs.uga.edu>
Subject: Re: Random Integer
Message-Id: <Pine.A41.4.58.0405142152340.5866@ginger.libs.uga.edu>
On Fri, 14 May 2004, Walter Roberson wrote:
> If a troll and a half can hook a reader and a half in a posting and a half,
> how many readers can six trolls hook in six postings?
Is that a trick question?
Brad
------------------------------
Date: Sat, 15 May 2004 04:04:56 GMT
From: "Jeff Thies" <nospam@nospam.net>
Subject: regex split conditional
Message-Id: <IFgpc.6585$zO3.3055@newsread2.news.atl.earthlink.net>
I have data that looks like this:
[DE-SOHT (7.5 LBS., 16"), $139.99; DE-HT (8.5 LBS., 30"), $139.99; DE-GB
(17.5 LBS., 40"), $184.99]
I want to split this on the commas, but not the commas that are enclosed
in the brackets:
[DE-SOHT (7.5 LBS., 16"), $139.99
^no ^ yes
I have a feeling I'll need to do a look ahead with :?, but I don't have
enough of a grasp to start work on that.
How do I do this?
Jeff
------------------------------
Date: Fri, 14 May 2004 23:25:07 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: regex split conditional
Message-Id: <slrncab6t3.f6a.tadmc@magna.augustmail.com>
Jeff Thies <nospam@nospam.net> wrote:
> I want to split this on the commas, but not the commas that are enclosed
> in the brackets:
> How do I do this?
perldoc -q split
How can I split a [character] delimited string except when inside
[character]? (Comma-separated files)
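Since the question mentions lookahead, here is one sketch of that
route; it is not necessarily what the FAQ entry recommends. It assumes
the parentheses are never nested and always balanced, and splits on a
comma only when the text that follows does not reach a ")" before
reaching a "(":

```perl
use strict;
use warnings;

# Sample data from the question, without the surrounding [ ].
my $data = q{DE-SOHT (7.5 LBS., 16"), $139.99; DE-HT (8.5 LBS., 30"), $139.99; DE-GB (17.5 LBS., 40"), $184.99};

# (?![^()]*\)) fails when a ")" comes next without an intervening "(",
# which is the case for a comma sitting inside a parenthesised group.
my @fields = split /,\s*(?![^()]*\))/, $data;

print "[$_]\n" for @fields;
```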
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 15 May 2004 02:23:45 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: RegEx to delete // comments NOT in quotes: ( ' ) OR (")???
Message-Id: <slrncaavpf.4ga.abigail@alexandra.abigail.nl>
W. D. (NewsGroups@US-Webmasters.com) wrote on MMMCMIX September MCMXCIII
in <URL:news:40A4579F.7D4@US-Webmasters.com>:
}} Hi Folks,
}}
}} I am about to ship myself to a mental hospital! Can't figure
}} out a regular expression to strip out comments that begin
}} with double slashes "//" but are not contained in quotation
}} marks, either single (') or double (").
Assuming that quoted strings can contain backslashed quotes, as in:
"This is a \"string\" that ends here --->"
one can use Regexp::Common (well, if it's not allowed, it's still
possible with Regexp::Common): (untested):
use Regexp::Common;
s/($RE{delimited}{-delim => q {'"}})|$RE{comment}{Portia}/$1||""/ge;
Abigail
--
my $qr = qr/^.+?(;).+?\1|;Just another Perl Hacker;|;.+$/;
$qr =~ s/$qr//g;
print $qr, "\n";
------------------------------
Date: Sat, 15 May 2004 00:52:24 -0500
From: "W. D." <NewsGroups@US-Webmasters.com>
Subject: Re: RegEx to delete // comments NOT in quotes: ( ' ) OR (")???
Message-Id: <40A5B018.6932@US-Webmasters.com>
Thanks, Abigail for your reply.
New RegEx below...
Abigail wrote:
>
> W. D. (NewsGroups@US-Webmasters.com) wrote on MMMCMIX September MCMXCIII
> in <URL:news:40A4579F.7D4@US-Webmasters.com>:
> }} Hi Folks,
> }}
> }} I am about to ship myself to a mental hospital! Can't figure
> }} out a regular expression to strip out comments that begin
> }} with double slashes "//" but are not contained in quotation
> }} marks, either single (') or double (").
>
> Assuming that quoted strings can contain backslashed quotes, as in:
>
> "This is a \"string\" that ends here --->"
>
> one can use Regexp::Common (well, if it's not allowed, it's still
> possible with Regexp::Common): (untested):
>
> use Regexp::Common;
>
> s/($RE{delimited}{-delim => q {'"}})|$RE{comment}{Portia}/$1||""/ge;
>
> Abigail
Here's what I've come up with using a trial and error process:
$TheCode = preg_replace('#^([^"\'\/]*)//[^"\']*[\n\r]$#mU', '$1',
$TheCode);
Applying this RegEx to the following text,
===========================================================================
// 1a. Some comments that should be trashed
// 2a. Some comments that should be trashed
// 1b. Some comments that should be trashed
// 2b. Some comments that should be trashed
// 1c. Some comments that should be trashed
// 2c. Some comments that should be trashed
3.
Code goes here;
5.
6. $SomeVar = '// These comments should be left alone!';
7. $SomeVar = ' // These comments should be left alone!';
8.
/* 9. Some more comments that will remain */
10.
"// 11. These comments should also be left alone "
" // 12. These comments should also be left alone "
" // 13. These comments should also be left alone "
# 14. Hash/Pound sign comments should remain for now
15.
16. More Code goes here
17.
{ 18. This code should stay // 19. This comment should go
20.
} 21.
"This is a \"string\" that ends here 22. --->"
" 23. This is a \"string\" that ends here --->"
// 24. Some comments that should be trashed
// 25. Some comments that should be trashed
// 26. Some comments that should be trashed
// 27. Some comments that should be trashed
// 28. Some comments that should be trashed
// 29. Some comments that should be trashed
===========================================================================
Produces:
===========================================================================
3.
Code goes here;
5.
6. $SomeVar = '// These comments should be left alone!';
7. $SomeVar = ' // These comments should be left alone!';
8.
/* 9. Some more comments that will remain */
10.
"// 11. These comments should also be left alone "
" // 12. These comments should also be left alone "
" // 13. These comments should also be left alone "
# 14. Hash/Pound sign comments should remain for now
15.
16. More Code goes here
17.
{ 18. This code should stay
20.
} 21.
"This is a \"string\" that ends here 22. --->"
" 23. This is a \"string\" that ends here --->"
===========================================================================
========================================================================
// Here is the breakdown of how this works:
// '          // Opening quote to contain the RegEx string
// #          // Opening RegEx delimiter. Using # because / is used in the match
// ^          // Start of a line (because of the m modifier)
// (          // Begin capture section 1
// [^"\'\/]   // Any character except these 3: " ' /
// *          // Zero or more of the previous
// )          // Close capture section 1
// //         // Must have the double slashes // on the line
// [^"\']     // Any character except these 2: " '
// *          // Zero or more of the previous
// [\n\r]     // End of line character
// $          // End of a line (because of the m modifier)
// #          // Closing RegEx delimiter
// m          // Multi-line string modifier
// U          // Ungreedy modifier
// '          // Closing quote for RegEx string
=============================================================================
This appears to strip out all // comments that are NOT contained in
quote marks, whether single or double.
However, it doesn't remove the line completely if there are only
blank spaces remaining after the RegEx operation. Oh, well. I guess
these can be removed with a subsequent RegEx.
Any comments or suggestions?
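For comparison, here is a Perl sketch of the match-strings-first idea
from Abigail's reply, written out by hand so that it needs no extra
module. It assumes quoted strings use backslash escapes and never span
lines, and the sample input is made up. The two follow-up substitutions
handle the leftover whitespace mentioned above; note that the last one
also removes lines that were blank to begin with:

```perl
use strict;
use warnings;

# Made-up sample input.
my $code = <<'END';
$a = '// keep this'; // drop this
$b = "say \"// keep\""; // and this
// whole-line comment
$c = 1;
END

# Try a quoted string first and put it back unchanged; otherwise
# match a // comment up to the end of the line and replace it with
# nothing.
(my $stripped = $code) =~
    s{("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|//[^\n]*}{defined $1 ? $1 : ''}ge;

$stripped =~ s/[ \t]+$//mg;     # trailing blanks left behind
$stripped =~ s/^[ \t]*\n//mg;   # lines that are now empty

print $stripped;
```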
--
Start Here to Find It Fast!™ ->
http://www.US-Webmasters.com/best-start-page/
$8.77 Domain Names -> http://domains.us-webmasters.com/
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6566
***************************************