[23179] in Perl-Users-Digest
Perl-Users Digest, Issue: 5400 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 20 21:06:05 2003
Date: Wed, 20 Aug 2003 18:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 20 Aug 2003 Volume: 10 Number: 5400
Today's topics:
Re: "Simple" regex advice (possibly '/' causing an issu <ian@WINDOZEdigiserv.net>
Automatically save a webpage as a text file (Chris Petersen)
Re: Automatically save a webpage as a text file <dreamguard@dreamguard.at>
Re: Automatically save a webpage as a text file <tdavis@gearbox.maem.umr.edu>
Re: column_info (Philip M. Gollucci)
Re: column_info (Philip M. Gollucci)
Re: dogma ...without the personal attacks <uri@stemsystems.com>
Re: dogma ...without the personal attacks <none@nobody.com>
Re: dogma ...without the personal attacks <none@nobody.com>
Re: dogma ...without the personal attacks <kkeller-spammmm@wombat.san-francisco.ca.us>
Re: dogma ...without the personal attacks (Sam Holden)
Re: Explain the method for a newbie <noreply@gunnar.cc>
Re: How can I put my email address on my website withou <flavell@mail.cern.ch>
Re: Module::Build is yet more broken... (Sam Holden)
Re: OS/2 line feed question <nospam-abuse@ilyaz.org>
Re: OS/2 line feed question <abuseonly@sgrail.org>
Re: perl zombies (aka ? the Platypus)
Re: perl zombies <scripts_you_know_the_drill_@hudsonscripting.com>
Re: perl zombies <minceme@start.no>
Re: perl zombies <none@nobody.com>
Re: Regular expression, getting href which is followed (Tad McClellan)
Re: Regular Expression <simon@unisolve.com.au>
UTF16 and Win32 again, was Re: OS/2 line feed question <flavell@mail.cern.ch>
Re: <bwalton@rochester.rr.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 20 Aug 2003 23:23:09 GMT
From: "Ian.H [dS]" <ian@WINDOZEdigiserv.net>
Subject: Re: "Simple" regex advice (possibly '/' causing an issue?)
Message-Id: <20030821002318.22fab4cf.ian@WINDOZEdigiserv.net>
On 20 Aug 2003 22:11:57 +0100 in
<message-id:u94r0cf2zm.fsf@wcl-l.bham.ac.uk>
Brian McCauley <nobull@mail.com> wrote:
> "Ian.H [dS]" <ian@WINDOZEdigiserv.net> writes:
>
> > Evening all =)
>
> Due to the propagation delays of Usenet, the timezone differences in
> an international forum and the fact that most people only read
> newsgroups a couple of times a day it is totally unsafe to assume that
> it is evening to all. :-)
heh.. can I say "mornin'" to some now? =)
>
>
> > #!/usr/bin/perl
>
> [ snip 11 lines of code that doesn't do anything that would set $_ ]
>
> > if (m#^$scan_path(.*?)->(.*)\s+Infection:\s+(.*)$#) {
>
> > Use of uninitialized value in pattern match (m//) at ./mail_scan.pl
> > line 13
>
> Well, you've done nothing to set $_ (so it'll be undef) then you do a
> m// with no =~ (so it'll act on $_, which, as I said, is undef).
>
Well that'll teach me.
After reading this.. just realised that I chopped the code and modified
it for this script from an IRC bot I coded.. but that runs inside a
permanent while() loop, thus giving me $_. This script doesn't, and so it's
not working, just as you say.
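A minimal sketch of the point Brian makes, assuming the text to scan arrives as an argument instead of in $_. The value of $scan_path, the sub name parse_report_line and the sample line are hypothetical; the pattern is the one quoted above, with \Q...\E added (my assumption) so that the path is matched literally.

```perl
use strict;
use warnings;

my $scan_path = '/var/spool/scan/';    # hypothetical value, not from the post

# Bind the match to an explicit variable with =~ rather than relying on $_.
sub parse_report_line {
    my ($line) = @_;
    return unless defined $line;       # avoids the "uninitialized value" warning
    if ($line =~ m#^\Q$scan_path\E(.*?)->(.*)\s+Infection:\s+(.*)$#) {
        return ($1, $2, $3);           # file, member, infection name
    }
    return;                            # no match: empty list
}

my @fields = parse_report_line('/var/spool/scan/msg1->attach.exe Infection: W32/Test');
print join('|', @fields), "\n";
```

Inside a while (<>) loop a bare m// works because the loop assigns each line to $_; outside one, something has to set $_ or the match has to be bound with =~.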
Thanks Brian.. square eyes aren't so good sometimes.
Back to the drawing board................. =)
Regards,
Ian
--
Ian.H [Design & Development]
digiServ Network - Web solutions
www.digiserv.net | irc.digiserv.net | forum.digiserv.net
Programming, Web design, development & hosting.
------------------------------
Date: 20 Aug 2003 16:44:24 -0700
From: petersen_cp@hotmail.com (Chris Petersen)
Subject: Automatically save a webpage as a text file
Message-Id: <d1bad213.0308201544.3d7fc5fb@posting.google.com>
OS: XP Pro with IE 6.0
Every morning I come in, open 2 websites, and save their content as
TXT files, one is a TAB file the other is a CSV file.
Example addresses:
http://somesite/reports/date.tab
http://somesite/reports/date.csv
I then save them as:
http://somesite/reports/date.tab.txt
http://somesite/reports/date.csv.txt
I would like to write a program to automate this, I was wondering
which language would be best, and maybe get a couple of quick and
dirty examples.
Thanks in advance
------------------------------
Date: Thu, 21 Aug 2003 02:03:20 +0200
From: "Wolfgang 'Dreamguard' Nagele" <dreamguard@dreamguard.at>
Subject: Re: Automatically save a webpage as a text file
Message-Id: <3f440c47$0$19888$91cee783@newsreader01.highway.telekom.at>
> I would like to write a program to automate this, I was wondering
> which language would be best, and maybe get a couple of quick and
> dirty examples.
as you describe i think you got shell access to that machine?
if so - just make a cronjob (google be your friend) and simply copy those
files with bash 'cp' command.
yours, dreamguard.
------------------------------
Date: Wed, 20 Aug 2003 19:58:31 -0500
From: Ted Davis <tdavis@gearbox.maem.umr.edu>
Subject: Re: Automatically save a webpage as a text file
Message-Id: <sf58kv07eagpueg1fdh8umakirqi2nfo9q@4ax.com>
On 20 Aug 2003 16:44:24 -0700, petersen_cp@hotmail.com (Chris
Petersen) wrote:
>OS: XP Pro with IE 6.0
>
>Every morning I come in, open 2 websites, and save their content as
>TXT files, one is a TAB file the other is a CSV file.
>
>Example addresses:
>http://somesite/reports/date.tab
>http://somesite/reports/date.csv
>
>I then save them as:
>http://somesite/reports/date.tab.txt
>http://somesite/reports/date.csv.txt
>
>I would like to write a program to automate this, I was wondering
>which language would be best, and maybe get a couple of quick and
>dirty examples.
>
>Thanks in advance
Either wget or Lynx will do this as a simple command
wget http://somesite/reports/date.tab
wget http://somesite/reports/date.csv
lynx -dump http://somesite/reports/date.tab > date.tab
lynx -dump http://somesite/reports/date.csv > date.csv
<http://unxutils.sourceforge.net/> and <http://lynx.isc.org/release/>
You can put the commands in a batch file and do the whole thing with
one click. If you log in each morning, you can put the batch file in
your startup folder, though you might want to check their dates before
downloading, even though double downloads (in case of reboot) would
not likely be a problem.
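Since this is a Perl group, here is a rough Perl equivalent, offered as a sketch and not as the one right answer. It assumes HTTP::Tiny (bundled with Perl since 5.14, so not available in 2003) and takes the report URLs on the command line; the sub names local_name and save_as_text are made up for the example. mirror() sends If-Modified-Since when the local file already exists, which covers the date check mentioned above.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Derive the local name used in the question: date.tab -> date.tab.txt
sub local_name {
    my ($url) = @_;
    my ($base) = $url =~ m{([^/]+)$}
        or die "no file name in $url\n";
    return "$base.txt";
}

# Fetch one URL into the current directory; dies on an HTTP failure.
sub save_as_text {
    my ($url) = @_;
    my $file = local_name($url);
    my $res  = HTTP::Tiny->new->mirror($url, $file);
    die "$url: $res->{status} $res->{reason}\n" unless $res->{success};
    return $file;
}

# Does nothing when no URLs are given.
print "saved ", save_as_text($_), "\n" for @ARGV;
```

Run it as, for example, perl fetch.pl http://somesite/reports/date.tab http://somesite/reports/date.csv (fetch.pl being whatever name you save it under).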
T.E.D. (tdavis@gearbox.maem.umr.edu - e-mail must contain "T.E.D." or my .sig in the body)
------------------------------
Date: 20 Aug 2003 15:26:42 -0700
From: pgollucci@ejpress.com (Philip M. Gollucci)
Subject: Re: column_info
Message-Id: <75e6245d.0308201426.1e4d5248@posting.google.com>
The relevant line from the .trace file with level at 12
T <- column_info(undef 'ejp' ...)= undef at AMSDbMaintenance.pm line 281
> You may also want to check with the DBD::Oracle documentation as well.
> I am not certain, but some methods are available with some drivers
> (DBD modules) and others are not. Again, I'm not sure about this, so
> you may want to double check me.
Yes you are correct about some being available in one driver and not
in others. The only reason I even asked this question is because
_OF_ the documentation.
Interestingly, I had already checked. The DBI documentation lists it.
The Oracle Documentation lists it. The ODBC documentation does _NOT_.
Quite the opposite of what I would have expected.
I've even looked through the code of DBI.
in ~1.32
sub column_info() {
shift->_not_implent(.....)
Around 1.37 this now calls _columns(). This must be an XS function,
because it was nowhere to be found.
At any rate, I've worked around it, but would still like to know what's going on.
SELECT table_name, column_name, data_type, nullable, char_col_decl_length
FROM all_tab_columns
WHERE table_name = UPPER(?)
AND owner = UPPER(?)
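For anyone wanting to reuse the workaround, a sketch of how the query might be wrapped, assuming $dbh is an already connected DBI handle for DBD::Oracle. The sub name oracle_columns is hypothetical, and the comma between nullable and char_col_decl_length is my assumption about what was intended.

```perl
use strict;
use warnings;

# $dbh is assumed to be a connected DBI database handle (DBD::Oracle).
sub oracle_columns {
    my ($dbh, $table, $owner) = @_;
    my $sql = <<'SQL';
SELECT table_name, column_name, data_type, nullable, char_col_decl_length
  FROM all_tab_columns
 WHERE table_name = UPPER(?)
   AND owner      = UPPER(?)
SQL
    # Slice => {} asks DBI for one hash reference per row;
    # the two trailing arguments fill the two placeholders in order.
    return $dbh->selectall_arrayref($sql, { Slice => {} }, $table, $owner);
}
```

Each row comes back as a hash keyed by column name (upper case by default with Oracle), so a caller could read $row->{COLUMN_NAME} and $row->{DATA_TYPE}.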
------------------------------
Date: 20 Aug 2003 15:32:18 -0700
From: pgollucci@ejpress.com (Philip M. Gollucci)
Subject: Re: column_info
Message-Id: <75e6245d.0308201432.33cfad84@posting.google.com>
I've now run into another problem.
Constraints and Indexes
my $constraints;
foreach my $table_nm (sort @table_nms) {
    $table_nm = lc $table_nm;
    my @key_names = $dbh->primary_key(undef, $cf->get('dbusername'), $table_nm);
    @key_names = sort map { lc } @key_names;    # map, not grep: lowercase every name
    push @$constraints, {
        name    => "$table_nm\_cst",
        columns => \@key_names
    };
}
If there is no primary key (CONSTRAINT - _cst), then I get the index (_idx).
I may be able to use
select index_name from user_indexes
to work around this, but one would hope there is a better way.
If you're wondering, next up are Referential Integrity, Sequences
and triggers. But I'll save those for tomorrow.
Thanks again/in advance.
------------------------------
Date: Wed, 20 Aug 2003 23:01:07 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: dogma ...without the personal attacks
Message-Id: <x7fzjwue6k.fsf@mail.sysarch.com>
>>>>> "SB" == Si Ballenger <shb> writes:
SB> There is a small clique of comp.lang.perl.misc "clerics" that
SB> like to beat up on nubies and other infidels that don't bow
SB> before them. When they resort to personal attacks, you know
SB> you've got them out of their safe little caves and have them on
SB> the run. Sooner or later they will give you their big "PLONK!",
SB> which is the equivalent of them rolling over like a dog,
SB> urinating on themselves, and then hiding under mommies skirt.
SB> Actually good entertainment. ;-)
and there is the other clique which doesn't care about professional
quality code or being correct or efficiency or good perl in general. you
can choose which side to be on. note that this second clique doesn't
teach perl professionally, doesn't attend or lecture at conferences,
doesn't write articles or tutorials for various publications, doesn't
write/edit/review books, etc. with your choice of the other clique i
would expect you to also get your medical advice from the radio call in
show or your financial advice from spam. both are very user friendly and
won't ever give you practical feedback or criticism.
programming is a career and a living for most (if not all) of the
regulars here. like most professions, experience matters. you want the
lawyer who has done your type of law and successfully, not some kid who
just watched law and order season 3 on dvd. the problem with programming
(and this group) is that that kid can also post answers here and there
is no public way of judging the quality of those answers except via
feedback from others. yet you would claim to use that dvd watching kid
just because he is nicer to you or lets you tell him how to plead your
case. that is a fool hiring a fool. go for it. just don't let me near
your resume.
programming is so easy to get into and make a hobby. it is not even hard to
find a job (at least when the market is hot) without massive experience
or degrees. there is a constant discussion over 'certification' in
programming (and perl in particular). would you rather use a CPA or your
cousin who knows how to run quicken to do your taxes?
so stop with your silly lambasting of the regulars here. the regulars
all know and respect each other and notice that we don't flame anyone
for a mistake or feedback or whatever. we all take proper critical
feedback as what it is and not personal attacks. only the weak spirited
and unprofessional take such replies personally. i have no problem with
anyone commenting on the technical aspects of my posts and code. i may
disagree with them and even say so but that is not personal. as they
said in the godfather, it is just business. coding is all about peer
review. code is for people, not computers. but that is too high a
concept for most newbies and amateur coders. i have been coding for 29
years now (24 of those as a paid professional) and i have seen and
written more code than most of you. i am hired for that experience. i
offer it here for free. you can take it or leave it but insulting me for
my technical comments marks you as a fool. and that is a personal
comment on you.
and dave adler's comments on dogma were right on target. the majority
choice is not right or wrong just based on that majority. dogma is bad
when it is not created from free choice. here the use of modules and
cpan is not issued from above and forced upon the perl community. it was
developed over 50 years ago and refined in the greater computer
community. the perl community just has adopted it and refined it even
further with modules and cpan. and there is nothing to stop you from not
using a module. just the voice of experience and reason will say it is a
bad thing and that isn't nice to hear when you think you know it all
(and really don't).
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: Wed, 20 Aug 2003 19:06:04 -0700
From: hudson <none@nobody.com>
Subject: Re: dogma ...without the personal attacks
Message-Id: <u4a8kvs09b9advdh556fplepe2cjt69j4p@4ax.com>
On Wed, 20 Aug 2003 21:55:17 GMT, shb*NO*SPAM*@comporium.net (Si
Ballenger) wrote:
>On Wed, 20 Aug 2003 01:31:33 -0700, hudson
><scripts_you_know_the_drill_@hudsonscripting.com> wrote:
>
>>I'm going to say it again...there is a whole lot of dogma and
>>mythology in this group.
>>
>>Leave out the personal attacks this time and try to have a serious
>>discussion about it...otherwise you are just degrading Perl......
>
>There is a small clique of comp.lang.perl.misc "clerics" that
>like to beat up on nubies and other infidels that don't bow
>before them. When they resort to personal attacks, you know
>you've got them out of their safe little caves and have them on
>the run. Sooner or later they will give you their big "PLONK!",
>which is the equivalent of them rolling over like a dog,
>urinating on themselves, and then hiding under mommies skirt.
>Actually good entertainment. ;-)
Thanks for the reply, Si...I realized I got baited and was probably
pretty good entertainment for everyone....oh well...
I like your description a lot...my thoughts exactly ;-)
------------------------------
Date: Wed, 20 Aug 2003 19:12:31 -0700
From: hudson <none@nobody.com>
Subject: Re: dogma ...without the personal attacks
Message-Id: <tha8kvcf4uj0ecs49p6c2jnj462c56tvl6@4ax.com>
if you think about it uri, you started this whole mess by calling me a
script kiddie and unexperienced when I posted some code and questioned
what you said.
------------------------------
Date: Wed, 20 Aug 2003 16:38:01 -0700
From: Keith Keller <kkeller-spammmm@wombat.san-francisco.ca.us>
Subject: Re: dogma ...without the personal attacks
Message-Id: <po01ib.122.ln@goaway.wombat.san-francisco.ca.us>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
In article <tha8kvcf4uj0ecs49p6c2jnj462c56tvl6@4ax.com>, hudson wrote:
> if you think about it uri, you started this whole mess by calling me a
> script kiddie and unexperienced when I posted some code and questioned
> what you said.
"Mom, hudson's growing!"
It would be considerate of you to not morph your address, so that
those of us with you in our killfiles can continue to enjoy
relative peace and quiet.
- --keith
- --
kkeller-mmmspam@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://wombat.san-francisco.ca.us/cgi-bin/fom
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org
iEYEARECAAYFAj9EBlgACgkQhVcNCxZ5ID9/kACeJPxJOGkb6EXVFS8H0S1ZKlof
PI8AoJ4EOZ+rc3rJB46onM9vBtDqfDLA
=M6/V
-----END PGP SIGNATURE-----
------------------------------
Date: 21 Aug 2003 00:04:17 GMT
From: sholden@flexal.cs.usyd.edu.au (Sam Holden)
Subject: Re: dogma ...without the personal attacks
Message-Id: <slrnbk8341.6sp.sholden@flexal.cs.usyd.edu.au>
On Wed, 20 Aug 2003 19:12:31 -0700, hudson <none@nobody.com> wrote:
> if you think about it uri, you started this whole mess by calling me a
> script kiddie and unexperienced when I posted some code and questioned
> what you said.
You said you wanted to be killfiled.
Could you stop posting from different fake addresses, so that the killfile
would actually get all of your blatherings.
Thanks.
--
Sam Holden
------------------------------
Date: Thu, 21 Aug 2003 00:26:15 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Explain the method for a newbie
Message-Id: <bi0sma$43mjl$1@ID-184292.news.uni-berlin.de>
Tad McClellan wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
>
>>this group is not intended for
>>people who "don't know anything of perl".
>
> Sure it is, _if_ they want to change that.
I stand corrected, Tad. I expressed myself very badly.
> (The OP did not appear to meet that predicate though...)
No, he indicated no such intention, which of course contributed to my
response...
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Wed, 20 Aug 2003 23:21:22 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: How can I put my email address on my website without attracting Spam?
Message-Id: <Pine.LNX.4.53.0308202311070.25055@lxplus095.cern.ch>
On Wed, Aug 20, Nick Kew inscribed on the eternal scroll:
> Talking of which, all the crap
> has a header "X-MailScanner: Found to be clean". Am I in danger of
> losing legit. email by filtering on that?
This is OT for both of the groups it's posted to, but with my deputy
postmaster hat on, we're cooking that line together with other
indicators before rejecting bogus anti-virus reports. (The actual
virus of course was rejected at our mailer from the very beginning,
thanks to its filename extension marking it as active content.
You've got MS to thank for making that email option unusable.)
But I must say, email with a header proclaiming it to be virus-clean
is about as convincing as UBE stating "this is not spam", or a
corroded spear dug up with a label stating 'this is a genuine bronze
age relic'. (Like Perl4, eh?).
------------------------------
Date: 20 Aug 2003 23:56:42 GMT
From: sholden@flexal.cs.usyd.edu.au (Sam Holden)
Subject: Re: Module::Build is yet more broken...
Message-Id: <slrnbk82lp.6sp.sholden@flexal.cs.usyd.edu.au>
On Wed, 20 Aug 2003 17:11:53 +0000 (UTC),
Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> [A complimentary Cc of this posting was sent to
> Sam Holden
><sholden@cs.usyd.edu.au>], who wrote in article <slrnbk5gr0.d2f.sholden@flexal.cs.usyd.edu.au>:
>> Simply because I can't see any reason why perl would want to do otherwise.
>>
>> If there are such systems then replace "Build.PL" with "Setup.PL" and
>> "Build" with "Build.PL" or whatever. The actual names of the files
>> are pretty much irrelevant to me.
>
> Same for me - as far the files differ by more than extension. But
> Module::Build does exactly this.
The filenames used are a pretty trivial thing to change.
>
>> If there is no platform independant way of specifying run the perl
>> script "foo" and pass it the argument "bar", then I guess some sort
>> of english description (better than mine, hopefully) would need to
>> be used.
>
> It is much easier to fix bugs/problems than document them.
I don't think there is a bug/problem. As I said, I don't know of a
platform which makes it impossible to run a program and pass it
two command line arguments which are short strings.
>
>> If there are a handful of wierdo architectures that are completely
>> different then examples of those can be used along with the
>> general case.
>
> This is not acceptable. I do not want my users to read 20-page README
> just to find out how to install things on a non-broken system. Too
> many Perl installations are broken; if non-broken ones give troubles
> too, the whole idea of supporting modules becomes a nightmare...
perl Setup.PL
perl Build.PL test
perl Build.PL install
is hardly 20 pages. If there exists a system where you can't run perl scripts
by name and give them arguments, then that weirdo platform would need
the corresponding three instructions. Maybe create a separate icon for
each action for systems with no command line, or a single icon which prompts
for the arguments in a platform-specific way.
>
>> If you have no problem with "perl Makefile.PL" as instructions
>> then I can't see how some name other than "Makefile" can be
>> a problem.
>
> See above.
Makefile.PL is OK, but Setup.PL and Build.PL won't work?
>
>> > You mean "currently" or "in foreseeable future"? I would not vouch in
>> > the second case...
>
>> Why would that ever change? perl5.8.0 still accepts ' as a package
>> seperator, why would backwards compatibility be broken so badly for
>> absolutely no gain whatsoever (as far as I can tell).
>
> Simple: suppose a filesystem does not allow files with empty extensions.
So use .PL on all the names. As I said the names of the files don't seem
to be a serious issue, that's something which is trivial to change.
But then what about a filesystem which doesn't allow capital letters in names?
Or a filesystem which doesn't allow .s in names? Or a file system which has
a limit of three letters for filenames?
My answer is that you make it work on as many as possible without harsh
restrictions and then specialise for those platforms which are different.
--
Sam Holden
------------------------------
Date: Wed, 20 Aug 2003 23:08:55 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: OS/2 line feed question
Message-Id: <bi0v27$2ihc$1@agate.berkeley.edu>
[A complimentary Cc of this posting was sent to
Alan J. Flavell
<flavell@mail.cern.ch>], who wrote in article <Pine.LNX.4.53.0308201904400.6361@lxplus089.cern.ch>:
> > and 2) The build from sources of 5.8.0 on OS/2 does not seem to
> > build/install perldoc and his dependencies.
>
> That seems *most* unfortunate.
While I did not check how `make install' works for several years (the
releases of Perl after 5.005* being very much broken on DOSISH
architectures), I would think this is the matter of not reading the
fine docs.
Hope this helps,
Ilya
------------------------------
Date: Wed, 20 Aug 2003 23:15:24 GMT
From: derek / nul <abuseonly@sgrail.org>
Subject: Re: OS/2 line feed question
Message-Id: <9008kvko74oh5ct44b2lkafuotrj50adlv@4ax.com>
On Wed, 20 Aug 2003 19:22:17 +0200, "Alan J. Flavell" <flavell@mail.cern.ch>
wrote:
>I have to admit I'm not familiar enough with OS/2 specifics, but I
>assume it expects to get text from external files that are in "DOS"
Win NT and OS/2 were the same code in V2.0, so I would expect the same crlf
treatment
>format, and as such it would adjust the external CRLF format into
>Perl's internal format (which on these platforms uses \012 to
>represent \n)
In the hex dumps that I have done with win32 there is still crlf in variables
rather than the \012. This is likely the same problem that I am having with my
win32 issue.
>. The perlport document certainly rolls it in with DOS
>and Windows platforms in -that- regard.
>
>On unix-ish systems in general, the platform-native text format does
>not use CR with its LF, and so -its- input/output routines provide no
>such adjustment.
>
>(Mac OS X might be different than other unix systems, I'm afraid I'm
>no more familiar with that than I am with OS/2)
OS X is a BSD clone so should have the same lf as unix.
Derek
------------------------------
Date: Wed, 20 Aug 2003 22:08:41 GMT
From: "David Formosa (aka ? the Platypus)" <dformosa@dformosa.zeta.org.au>
Subject: Re: perl zombies
Message-Id: <slrnbk7sba.6cf.dformosa@dformosa.zeta.org.au>
On Sat, 16 Aug 2003 12:47:57 GMT, Randal L. Schwartz
<merlyn@stonehenge.com> wrote:
[...]
> It is precisely for the google-hit-and-run reader. Which of these are better
> when viewed in a search engine?
>
> Question gets posted
> Troll answer gets posted
> Good answer gets posted
>
> Question gets posted
> Troll answer gets posted
> Troll answer gets challenged
> Troller responds
> Troll answer gets challenged
> Troller responds
> Good answer gets posted
I don't wish to be a pedent here but I wish to challenge your use of
the word "Troll" in this context. A troll is someone who knowlingly
posts missleading or incorrect infomation for the amusement value
there of. Trolls perposely sturs up trouble because they like sturing
up trouble.
Hudson like ansers don't come from knowligly posting incorrect
infomation, Hudson's answers come from ignorence. So its inaccurate
to class him as a troll.
--
Please excuse my spelling as I suffer from agraphia. See
http://dformosa.zeta.org.au/~dformosa/Spelling.html to find out more.
Free the Memes.
------------------------------
Date: Wed, 20 Aug 2003 18:40:54 -0700
From: hudson <scripts_you_know_the_drill_@hudsonscripting.com>
Subject: Re: perl zombies
Message-Id: <ki88kvofspf621vgpbu7nqfj07sq5o53lk@4ax.com>
>I don't wish to be a pedent here but I wish to challenge your use of
>the word "Troll" in this context. A troll is someone who knowlingly
>posts missleading or incorrect infomation for the amusement value
>there of. Trolls perposely sturs up trouble because they like sturing
>up trouble.
>
>Hudson like ansers don't come from knowligly posting incorrect
>infomation, Hudson's answers come from ignorence. So its inaccurate
>to class him as a troll.
christ, man...your spelling and english are poor or you are in the
sixth grade...how's that for pointing out ignorance?
here's with spell checking, but I still can't decode your thoughts
because of the poor grammer:
>I don't wish to be a pendant here but I wish to challenge your use of
>the word "Troll" in this context. A troll is someone who knowingly
>posts misleading or incorrect information for the amusement value
>there of. Trolls purposely stirs up trouble because they like stirring
>up trouble.
>
>Hudson like answers don't come from knowingly posting incorrect
>information, Hudson's answers come from ignorance. So its inaccurate
>to class him as a troll.
------------------------------
Date: Thu, 21 Aug 2003 00:30:53 +0000 (UTC)
From: Vlad Tepes <minceme@start.no>
Subject: Re: perl zombies
Message-Id: <bi13rt$sr9$1@troll.powertech.no>
hudson <scripts_you_know_the_drill_@hudsonscripting.com> wrote:
>> Hudson like ansers don't come from knowligly posting incorrect
>> infomation, Hudson's answers come from ignorence. So its inaccurate
>> to class him as a troll.
>
> christ, man...your spelling and english are poor or you are in the
> sixth grade...how's that for pointing out ignorance?
>
> here's with spell checking, but I still can't decode your thoughts
> because of the poor grammer:
It looks like you need to check your own spelling, grammar and
punctuation...
--
Vlad
------------------------------
Date: Wed, 20 Aug 2003 21:02:54 -0700
From: hudson <none@nobody.com>
Subject: Re: perl zombies
Message-Id: <l2h8kvsc7hogrj27g0ev6ju7lu63on2fkm@4ax.com>
On Thu, 21 Aug 2003 00:30:53 +0000 (UTC), Vlad Tepes
<minceme@start.no> wrote:
>It looks like you need to check your own spelling, grammar and
>punctuation...
well...I know...but his was really bad
------------------------------
Date: Wed, 20 Aug 2003 18:12:12 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Regular expression, getting href which is followed by img tag with specific src
Message-Id: <slrnbk802c.867.tadmc@magna.augustmail.com>
Fatted <fatted@yahoo.com> wrote:
> "Tad McClellan" <tadmc@augustmail.com> wrote in message
> news:slrnbk6u8t.73q.tadmc@magna.augustmail.com...
>> fatted <fatted@yahoo.com> wrote:
>
>> You should use a module that understands HTML for processing HTML data.
>
> Unfortunately I don't think that will help me with my problem,
Yes it will. That is why I suggested it.
> I want to
> extract the value of a href, for an <a> tag, preceding an <img> tag which
> has an attribute src with a specific value. I'm not sure what module does
> this. (I'm going to look again though!)
I understood what you wanted to do quite clearly, that's why the
code that I already posted does just what you describe above!
Did you run the program?
>> "lines" do not matter in HTML.
>
> Thanks for the reminder :)
But you are going to forget it again before you get to the
end of your followup...
> I
> first wanted to find the line
If you think of "lines" when processing HTML you aren't thinking
correctly, and it will hurt you at some point.
So don't do that. :-)
> which contained the <img src="importantimage.gif" (there just happens to be
> lots of tags on this line), and then try to find the preceding value of the
><a> tags href.
That is what my code does.
> I posted 1 line (at least that was the attempt), unfortunately Google groups
> did a bit of a hatchet job on it, and it got spread over 4 lines. Thats why
> I referred to one line :)
Yes I expected that that is what happened.
Have you seen the Posting Guidelines that are posted here frequently?
If you had said it "in Perl" then you could have conveyed your
actual data without "helpful" tools (attempting to) break it for you.
$html = '<a class="red" href="uninteresting.html" target="_new">'
. 'Not so exciting text</a><a href="equallyboring.html" '
. 'class = "blue"> ...';
>> If that _was_ really all on a single line, then it would still be
>> equivalent HTML, since most whitespace does not matter in HTML data.
>> This will NOT do what you asked, because it does not handle
>> arbitrary HTML, it handles only the one case that you have shown.
>
> You're right it won't do what I asked,
You're wrong, it *will* do what you asked.
Did you run the program?
It prints
IwantThis.html
isn't that what you wanted to be able to find?
But it will not work for real-world HTML, only for the specific
example of HTML that you posted. This legal HTML would break
it for instance:
<a class="green" href="Ido*NOT*wantThis.html">
<!-- src="importantimage.gif" -->
</a>
Whereas a Real HTML parser would not report that false positive.
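To make the false positive concrete, here is a sketch with a hypothetical naive pattern. It is not the snipped code from this thread, only an illustration of the kind of regex that the commented-out src fools; the sub name naive_href is made up.

```perl
use strict;
use warnings;

# Hypothetical naive matcher: capture the href of an <a> element whose
# body mentions src="importantimage.gif" anywhere before the closing tag.
sub naive_href {
    my ($html) = @_;
    my ($href) = $html =~ m{
        <a\b [^>]* \bhref="([^"]*)" [^>]* >   # opening tag with an href
        (?: (?!</a>) . )*?                    # body, stopping short of the end tag
        src="importantimage\.gif"
    }sx;
    return $href;
}

my $commented = '<a class="green" href="Ido*NOT*wantThis.html">' . "\n"
              . '<!-- src="importantimage.gif" -->' . "\n"
              . '</a>';

print naive_href($commented), "\n";    # prints Ido*NOT*wantThis.html
```

The pattern cannot tell a comment from an element, so it reports the link even though no <img> is present.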
> I think the google wrap, put you off.
No it didn't.
First, my code does exactly what you asked for with the data you gave.
(and if you modify the data to be all on one line, it will _still_
do the Right Thing.)
Did you run the program?
Secondly, the word-wrapping did *not* break anything, because the
HTML is equivalent whether wrapped or all on a single line.
Your code should be able to handle HTML, and line breaks don't matter
in HTML, so your code should be able to handle the data either way.
>> It would work correctly if I had used a module that understands
>> HTML data...
>
> See my first comment, but I'd be delighted to be proved wrong.
^^^^^^^^^^^^
I'll do that a little farther down.
> In the mean
> time, I'd still appreciate some tips on the regular expression...
Trying to accomplish what you want with regular expressions is the
path to madness. You can work on it for many days and it will
still be easily broken by legal HTML data.
I know, I've been doing this sort of thing for 13 years.
regexs are not sufficiently powerful for the job you need done.
You need a Real Parser.
[snip working code]
You can do it in less than 10 lines of code with HTML::Tree
http://search.cpan.org/author/SBURKE/HTML-Tree-3.17/
---------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
my $html = '
<a class="red" href="uninteresting.html" target="_new">Not so exciting
text</a><a href="equallyboring.html" class = "blue">yawn</a><a
class="green" href="IwantThis.html"><img border="0"
src="importantimage.gif" alt="MeMe"></a>
';
# $html =~ s/\n/ /g; # make it all on one line
my $tree = HTML::TreeBuilder->new();
$tree->parse($html);
$tree->eof;    # tell the parser the document is complete

# find elements containing: src="importantimage.gif"
foreach my $img ( $tree->look_down('src', 'importantimage.gif') ) {
    next unless $img->tag eq 'img';          # ensure the "src" attr was on
                                             # an <img> element
    next unless $img->parent->tag eq 'a';    # ensure parent is an <a> element
    my $href = $img->parent->attr('href');   # grab its "href" attr value
    print "$href\n";
}
$tree->delete;
---------------------------------------------------------
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Thu, 21 Aug 2003 08:52:40 +1000
From: Simon Taylor <simon@unisolve.com.au>
Subject: Re: Regular Expression
Message-Id: <bi0u6n$kgg$1@otis.netspace.net.au>
Frederik Aerts wrote:
> I want to do a find-replace method using VBScripts Regular Expressions.
> Here is what I want to do:
[other stuff snipped]
We sometimes do development in VBA and have found the JendaRex
perl regular expression library to be invaluable.
http://jenda.krynicky.cz/VB/JendaRex.html
JendaRex allows us to use perl re's to add significant expressive power
to VBA.
Hope this helps
Simon Taylor
------------------------------
Date: Thu, 21 Aug 2003 01:58:15 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: UTF16 and Win32 again, was Re: OS/2 line feed question
Message-Id: <Pine.LNX.4.53.0308210139480.9659@lxplus103.cern.ch>
On Wed, Aug 20, derek / nul inscribed on the eternal scroll:
> >format, and as such it would adjust the external CRLF format into
> >Perl's internal format (which on these platforms uses \012 to
> >represent \n)
>
> In the hex dumps that I have done with win32 there is still crlf in
> variables rather than the \012. This is likely the same problem
> that I am having with my win32 issue.
Sorry, on this thread so far I was talking about conventional
8-bit-character handling on DOS platforms. If you get crlf there,
then you must be using binmode() on your file handle.
Our other recent thread (re utf16) is still a mystery. Sure, the file
contains the 16-bit units x000d x000a, and when read in these are
still present internally; I agree with your findings there.
I found today that I could read the utf16LE files into Perl's native
Unicode text format by means of the following bizarre incantation:
   open IN, '<:encoding(utf16le):crlf:utf8', $infile or
       die "unable to open $infile: $!";
I think I vaguely understand why it appears to work, but I refuse to
believe this is what we're intended to do. Furthermore it seems to be
irreversible: nothing I've managed to do has produced correct newlines
on output. (I keep getting nonsense in the output file such as (hex)
000d 0a0d 0d00 0d00 000a - and variations of that kind - for what were
supposed to be just a pair of newlines, 000d 000a 000d 000a).
I've had a read around the subject and so far the impression I get is
that this maybe isn't mature on Win32 yet. So your original plan to
handle the external files in binmode and apply encoding layers by hand
separately seems to be a good plan, as far as I can see. At any rate
you'll have seen the studious lack of response we've got here. Next
port of call would be the archive of the perl unicode mailing list,
and the only hit I got from win32 and newline was
http://www.mail-archive.com/perl-unicode@perl.org/msg00734.html
But the handling of unicode data characters is fine. It's only the
newlines that got me baffled. (And yes, I *did* set the wide system
calls flag).
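For what it's worth, the plan mentioned above (handle the files in
binmode and apply the encoding by hand) might be sketched as follows.
This is only a sketch under assumptions: the sub names read_utf16le
and write_utf16le are made up for illustration, the file is assumed
to be UTF-16LE with CRLF line ends and small enough to slurp, and it
sidesteps the :crlf layer entirely rather than explaining its
behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Read a UTF-16LE file as raw bytes, then decode and fix newlines by hand.
sub read_utf16le {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "unable to open $path: $!";
    my $bytes = do { local $/; <$in> };    # slurp the whole file
    close $in;
    my $text = decode('UTF-16LE', $bytes);
    $text =~ s/^\x{FEFF}//;      # drop a leading byte-order mark, if any
    $text =~ s/\x0D\x0A/\n/g;    # CRLF -> Perl's internal newline
    return $text;
}

# Reverse direction: restore CRLF, encode by hand, write raw bytes.
sub write_utf16le {
    my ($path, $text) = @_;
    $text =~ s/\n/\x0D\x0A/g;    # internal newline -> CRLF
    open my $out, '>:raw', $path or die "unable to open $path: $!";
    print {$out} encode('UTF-16LE', $text);
    close $out or die "unable to close $path: $!";
    return 1;
}
```

Because both handles are opened with :raw, no layer gets a chance to
touch the 0d/0a bytes, so each newline should come out as the four
bytes 0d 00 0a 00 and nothing else.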
------------------------------
Date: Sat, 19 Jul 2003 01:59:56 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re:
Message-Id: <3F18A600.3040306@rochester.rr.com>
Ron wrote:
> Tried this code get a server 500 error.
>
> Anyone know what's wrong with it?
>
> if $DayName eq "Select a Day" or $RouteName eq "Select A Route") {
(---^
> dienice("Please use the back button on your browser to fill out the Day
> & Route fields.");
> }
...
> Ron
...
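With the missing parenthesis restored, the test might read as in the
sketch below. The dienice() here is only a hypothetical stand-in (the
poster's own version was not shown), and check_selection() is a
made-up wrapper so the snippet runs on its own. Running the real
script through "perl -c" from the command line would also report the
syntax error directly, rather than as a server 500.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's dienice(), which was not shown;
# here it simply dies with the message.
sub dienice {
    my ($msg) = @_;
    die "$msg\n";
}

# Made-up wrapper around the corrected condition.
sub check_selection {
    my ($DayName, $RouteName) = @_;
    # Note the "(" after "if", which the original line lacked.
    if ($DayName eq "Select a Day" or $RouteName eq "Select A Route") {
        dienice("Please use the back button on your browser to fill out "
              . "the Day & Route fields.");
    }
    return 1;
}
```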
--
Bob Walton
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5400
***************************************