[32521] in Perl-Users-Digest


Perl-Users Digest, Issue: 3786 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Sep 28 18:09:20 2012

Date: Fri, 28 Sep 2012 15:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 28 Sep 2012     Volume: 11 Number: 3786

Today's topics:
    Re: Database usage best practices <cwilbur@chromatico.net>
    Re: Database usage best practices <gogala.mladen@gmail.com>
    Re: Database usage best practices <rweikusat@mssgmbh.com>
    Re: Database usage best practices <rweikusat@mssgmbh.com>
    Re: Database usage best practices <rweikusat@mssgmbh.com>
    Re: Database usage best practices <kaz@kylheku.com>
    Re: Database usage best practices <rweikusat@mssgmbh.com>
    Re: Database usage best practices <gogala.mladen@gmail.com>
    Re: Database usage best practices <gogala.mladen@gmail.com>
    Re: Database usage best practices <rweikusat@mssgmbh.com>
    Re: Database usage best practices <kaz@kylheku.com>
    Re: Database usage best practices <kaz@kylheku.com>
    Re: Database usage best practices <kaz@kylheku.com>
        HTML::TableExtract w. perl 5.10 <markoriedelde@yahoo.de>
    Re: HTML::TableExtract w. perl 5.10 <ben@morrow.me.uk>
    Re: HTML::TableExtract w. perl 5.10 <markoriedelde@yahoo.de>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 27 Sep 2012 14:45:52 -0400
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: Database usage best practices
Message-Id: <87a9wbticf.fsf@new.chromatico.net>

>>>>> "RW" == Rainer Weikusat <rweikusat@mssgmbh.com> writes:

    RW> I strongly disagree with the opinion that it would be 'best
    RW> practice' to treat a RDBMS as 'dumb' system for storing
    RW> structured data in binary files and reimplement all the features
    RW> it already has in application code on top of it. 

Point the 1st: it depends very much on the DB.  If you're using MySQL,
the only sane path is to treat it as a dumb system and reimplement the
features you need on top of it. 

Point the 2nd: you're treating this as a black and white issue.  It's
not whether an ORM layer is saintly or wicked, it's what costs you pay
in terms of performance and maintainability versus what benefits you get
in terms of expressivity, error-resistance, and risk of change when you
include an ORM layer.

In my experience, if you have an application large enough to consider in
terms of MVC that uses a database as a data store in any but the most
trivial of manners, you are a moron if you *don't* use an ORM layer.  In
particular, this architecture allows new features in the application to
be developed entirely in software, and then once the desired behavior is
understood and stable, the logic can be pushed down into the database.
The risk and cost of change at levels higher than the ORM layer is much
lower than the risk and cost of change at levels below the ORM layer,
and this architecture offers a way to mitigate that risk.

    RW> This rather strikes me as 'back to the 1960!' idea which likely
    RW> comes from the fact that RDBMSes, originally supposed to enable
    RW> people to perform operations on datasets without having to learn
    RW> programming in some imperative language first, were so
    RW> successful that people who already know how to program in an
    RW> imperative language are more or less forced to use them but - of
    RW> course - they don't have the slightest interest in actually
    RW> learning how to so do efficiently, especially if this means
    RW> 'learning a second' (or even more than 'the second') other
    RW> programming language.

I'm sure waving your dick around like this is very pleasant, but please
tuck it back in your pants; people are pointing and laughing.

Charlton


-- 
Charlton Wilbur
cwilbur@chromatico.net


------------------------------

Date: Thu, 27 Sep 2012 22:38:38 +0000 (UTC)
From: Mladen Gogala <gogala.mladen@gmail.com>
Subject: Re: Database usage best practices
Message-Id: <pan.2012.09.27.22.37.45@gmail.com>

On Thu, 27 Sep 2012 14:45:52 -0400, Charlton Wilbur wrote:

> Point the 1st: it depends very much on the DB.  If you're using MySQL,
> the only sane path is to treat it as a dumb system and reimplement the
> features you need on top of it.

Why? MySQL also supports triggers and procedures, as far as I am aware.
Granted, there is a big difference between MySQL and DB2 or Oracle,
but it is far from being a dumb system.


> 
> Point the 2nd: you're treating this as a black and white issue.  It's
> not whether an ORM layer is saintly or wicked, it's what costs you pay
> in terms of performance and maintainability versus what benefits you get
> in terms of expressivity, error-resistance, and risk of change when you
> include an ORM layer.


How do you measure expressivity and error resistance? How would you even
define expressivity? I am not sure what the connection between risk
and ORM is, so I will not comment on that one. Performance gains are
measurable. There is no way around it: an ORM layer usually has a detrimental
effect on application performance when a database is involved.
Every DBA will tell you that, and I've been one for the last quarter of a
century. As opposed to "expressivity" (my spelling checker is flagging
the word!), the monetary cost of the lost performance can be measured.



> 
> In my experience, if you have an application large enough to consider in
> terms of MVC that uses a database as a data store in any but the most
> trivial of manners, you are a moron if you *don't* use an ORM layer.  In

Oh yes, an ORM layer is a good thing: it lets you program very quickly and
generates classes for you. An ORM is usually accompanied by an application
generator, like Groovy on Grails (Hibernate), which means that
projects will be done much more quickly. However, if you don't modify the
database layer, it means that the project is done too quickly.
Also, there is the problem of consistency: there is no guarantee that two
applications will treat the same table in the same way. The only way to
enforce the business rules consistently across applications is to
move them into the database.


> particular, this architecture allows new features in the application to
> be developed entirely in software, 

As opposed to MySQL which is developed in hardware?

> and then once the desired behavior is
> understood and stable, the logic can be pushed down into the database.
> The risk and cost of change at levels higher than the ORM layer is much
> lower than the risk and cost of change at levels below the ORM layer,
> and this architecture offers a way to mitigate that risk.

How do you measure the risk? Can you prove your statement? Why would the 
risk be significantly lower?



> I'm sure waving your dick around like this is very pleasant, but please
> tuck it back in your pants; people are pointing and laughing.
> 
> Charlton

This is childish, rude and completely unprovoked. If this was supposed to 
be funny, I fail to see the humor. I wonder whether responding to your 
post is worth the effort.



-- 
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com


------------------------------

Date: Fri, 28 Sep 2012 10:58:35 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Database usage best practices
Message-Id: <87r4pm8o50.fsf@sapphire.mobileactivedefense.com>

Charlton Wilbur <cwilbur@chromatico.net> writes:

[...]

> I'm sure waving your dick around like this is very pleasant, but please
> tuck it back in your pants; people are pointing and laughing.

I would like to add an additional metapoint: The opinions of worthless
people are worthless.



------------------------------

Date: Fri, 28 Sep 2012 11:50:37 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Database usage best practices
Message-Id: <87lifu8lqa.fsf@sapphire.mobileactivedefense.com>

Mladen Gogala <gogala.mladen@gmail.com> writes:
> On Thu, 27 Sep 2012 14:45:52 -0400, Charlton Wilbur wrote:
>> Point the 2nd: you're treating this as a black and white issue.  It's
>> not whether an ORM layer is saintly or wicked, it's what costs you pay
>> in terms of performance and maintainability versus what benefits you get
>> in terms of expressivity, error-resistance, and risk of change when you
>> include an ORM layer.
>
> How do you measure expressivity and error resistance? How would you even 
> define expressivity? I am not sure what is the connection between risk 
> and ORM, so I will not comment on this one.

I don't know what 'expressivity' is supposed to be but I have some
ideas of 'error resistance' for programs interacting with databases:
Usually, this means catching database errors and deciding on a
sensible recovery strategy. This is especially important for programs
which are supposed to run continuously without supervision. In Perl,
this is relatively easy: DBI supports defining a database error
handler. This handler can throw an exception (meaning, an exception
object containing information about the error, not something like
"Can't database!"). This exception can be caught at a suitable place
in the application code. Provided the problem seems transient, the
usually sensible way to deal with that is to disconnect from the
database, reconnect and re-execute the query at the front of the query
queue. This is not anyhow 'magically' tied to 'ORM layers' and not
really difficult to implement.
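
The recovery strategy described above (catch the error, disconnect,
reconnect, re-execute the query at the front of the queue) could be
sketched roughly as follows. This is an illustration in Python rather
than DBI code; TransientDbError, FlakyConnection and run_queue are
hypothetical names invented for the sketch, and the "connection" is a
stub that fails a configurable number of times.

```python
import collections


class TransientDbError(Exception):
    """Hypothetical exception object carrying details about the failure."""

    def __init__(self, message, query):
        super().__init__(message)
        self.query = query


class FlakyConnection:
    """Stand-in for a real database handle (assumption, not a real driver)."""

    failures_left = 1  # shared across reconnects so a retry can succeed

    def execute(self, query):
        if FlakyConnection.failures_left > 0:
            FlakyConnection.failures_left -= 1
            raise TransientDbError("connection reset", query)
        return "ok: " + query

    def close(self):
        pass


def run_queue(connect, queries, max_retries=3):
    """Run queries in order; on a transient error, reconnect and retry the head."""
    queue = collections.deque(queries)
    conn = connect()
    results = []
    retries = 0
    while queue:
        try:
            results.append(conn.execute(queue[0]))
        except TransientDbError:
            if retries >= max_retries:
                raise             # no longer looks transient, give up
            retries += 1
            conn.close()          # drop the broken connection
            conn = connect()      # reconnect; the query stays at the front
            continue
        queue.popleft()           # dequeue only after the query succeeded
        retries = 0
    conn.close()
    return results


print(run_queue(FlakyConnection, ["SELECT 1", "SELECT 2"]))
```

The point of keeping the failed query at the head of the queue is that
nothing is lost or reordered by a reconnect; the retry limit is an
added assumption so that a persistent failure still surfaces.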

In simpler cases, it is sufficient to fork the 'worker process' (this
job can also be performed by a special supervisor program) and do the
actual work in the child. Should the child hit a transient problem, it
exits with a non-zero exit code and after seeing that, the parent
simply forks again and the procedure starts over (I know this is
difficult to imagine but 'fork' is actually a useful primitive, not
just an encumbrance on the way to 'execute another program' ...).
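
A minimal sketch of this fork-and-restart arrangement, written in
Python for illustration and assuming a POSIX system: supervise and
demo_worker are hypothetical names, and the restart limit is an
addition not mentioned above.

```python
import os


def supervise(worker, max_restarts=5):
    """Fork a child to run worker(attempt); fork again while it exits non-zero.

    Returns the number of the attempt that finally succeeded.
    """
    for attempt in range(1, max_restarts + 1):
        pid = os.fork()
        if pid == 0:
            # Child: do the actual work and report the outcome as exit code.
            try:
                code = worker(attempt)
            except Exception:
                code = 1
            os._exit(code)
        # Parent: wait for the child and inspect how it ended.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            return attempt
    raise RuntimeError("worker kept failing after %d attempts" % max_restarts)


def demo_worker(attempt):
    """Hypothetical worker: pretends the first two attempts hit a transient problem."""
    return 0 if attempt >= 3 else 1


if __name__ == "__main__":
    print(supervise(demo_worker))
```

Each child starts from a clean copy of the parent's state, so whatever
went wrong in a failed attempt cannot leak into the next one.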

[...]

>> In my experience, if you have an application large enough to consider in
>> terms of MVC that uses a database as a data store in any but the most
>> trivial of manners, you are a moron if you *don't* use an ORM layer.  In
>
> Oh yes, ORM layer is a good thing, it enables to program very quickly and 
> generates classes for you. ORM is usually accompanied by an application 
> generator, like Groovy on Grails (Hibernate) which means that the 
> projects will be done much more quickly.

This is simply not true: Over the lifetime of a program, the amortized
implementation cost of every infrastructure facility contained in it
is zero provided that it actually works and works as it would need to
work. As a practical example, the largest Perl program I'm currently
working with is just about to break through the 10,000 LOC barrier
(I've done a couple of large ones in the past). This includes a
database interface layer designed to be suitable for this particular
program and writing that has maybe cost me something like a week
back in 2010. Presently, it amounts to 6% of the code (678 lines) and
something like this is by far too trivial to warrant downloading
seriously large amounts of unknown code 'from the internet'. In
exchange for this minor effort (and some assorted fixes/ enhancements
to the DBD::Pg module), this has been working 24x7 in dozens of
installations world-wide (which is not much in absolute terms but a
fair lot for a single person to support) without ever causing the
slightest headache to me or the people who use this code. In contrast
to that, 'the Java team' (larger than 1 person) uses Hibernate and
they've certainly meanwhile spent as much time as I spent writing this
code on debugging, investigating and working around bugs/ quirks in
the ORM. These chores are still ongoing and in exchange for that, they
got a technically inferior solution both from an 'error resilience'
and a user experience standpoint (of course, it can be argued that the
solution to any problem really ought to be 'download more stuff from
the internet' :-).


------------------------------

Date: Fri, 28 Sep 2012 12:41:27 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Database usage best practices
Message-Id: <87k3vewf14.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[Hibernate]

> in exchange for that, they got a technically inferior solution both
> from an 'error resilience' and a user experience standpoint (of
> course, it can be argued that the solution to any problem really
> ought to be 'download more stuff from the internet' :-).

Some details on this:

By default, the ORM which happens to be used here does not support
reconnecting to the database after an existing connection was severed
for whatever reason. Should this happen, the UI code becomes dead in
the water until the corresponding process is restarted. Meanwhile, this
default policy has been changed and the program survives TCP-related
malaises. But this didn't happen until it caused problems for
customers and understandably so: The theory behind using third-party
developed infrastructure code is that the people who did that are 'the
experts' in this domain and that a poor application programmer should
be happy that he can concentrate on his problems instead of having to
learn how to tame 'wild beasts' like a RDBMS. Because of this, people
who use this code naturally tend to leave 'the defaults' alone.

Also by default (or at least when being used in the 'obvious,
straight-forward way') the ORM performs database operations
synchronously and 'the user session' becomes unresponsive until they
are completed. In contrast to this, the Perl program I mentioned in
the previous posting has been designed as a 'traditional'
single-threaded, event-driven UNIX(*) process whose 'processing core'
is an I/O multiplexing loop (since it has lots of other stuff to do
than interact with the database). Because of this, it uses the
PostgreSQL asynchronous query processing interface. At the application
level, this means that a query is 'started' based on a query object
and some parameters for this particular invocation and once it has
been completed, a suitable 'continuation' will be invoked (while I
still think the way 'continuations' are supposed to work in scheme is
completely mad, the concept/ term itself is IMHO very useful). Since
Perl supports closures, this is an easy thing to do. The same can very
likely be accomplished with 'the ORM' as well, at the very least by
employing threading in a suitable way, but it wasn't done: The
difference is really that I started with designing how 'the program',
considering its purpose, was generally supposed to operate and then
started looking for already existing code which could be helpful for
accomplishing this instead of starting with "download what everybody
else uses" and then coding according to the properties it happens to have.
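
The general shape of 'start a query, keep multiplexing, invoke a
continuation on completion' might look like the following sketch. It is
written in Python for illustration and does not use the PostgreSQL
interface: a UNIX datagram socketpair whose far end upper-cases
requests stands in for the server, and AsyncDb, make_fake_server,
run_loop and demo are hypothetical names.

```python
import selectors
import socket


def make_fake_server(selector):
    """Return the client end of a socketpair whose peer upper-cases requests.

    Purely illustrative stand-in for a database server (assumption).
    """
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    def answer(sock):
        sock.send(sock.recv(4096).upper())

    selector.register(server, selectors.EVENT_READ, answer)
    return client


class AsyncDb:
    def __init__(self, selector, sock):
        self.sock = sock
        self.pending = []  # continuations, oldest query first
        selector.register(sock, selectors.EVENT_READ, self.on_result)

    def start_query(self, text, continuation):
        """Send the query and return immediately; the continuation runs later."""
        self.pending.append(continuation)
        self.sock.send(text.encode())

    def on_result(self, sock):
        """Called by the loop when a result is readable."""
        result = sock.recv(4096).decode()
        self.pending.pop(0)(result)


def run_loop(selector, db):
    """I/O multiplexing loop: dispatch ready sockets until no query is outstanding."""
    while db.pending:
        for key, _ in selector.select():
            key.data(key.fileobj)


def demo():
    selector = selectors.DefaultSelector()
    db = AsyncDb(selector, make_fake_server(selector))
    seen = []
    for user in ("alice", "bob"):
        # The closure captures 'user', so the continuation knows its context.
        db.start_query("select " + user,
                       lambda result, user=user: seen.append((user, result)))
    run_loop(selector, db)
    socks = [key.fileobj for key in selector.get_map().values()]
    selector.close()
    for sock in socks:
        sock.close()
    return seen


if __name__ == "__main__":
    print(demo())
```

Because start_query returns at once, the loop stays free to service
other descriptors while results are outstanding; the closure is what
carries the per-query context to the point where the result arrives.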

The moral of this is that 'complex abstraction layers' are only
really useful when the people using them already have the knowledge
and skills they are not supposed to need because of the abstraction
layer.

A note to the 'dicksizing' auto-responder: I understand that you are
usually motivated by showing off, however, this is not necessarily
true for everyone else. I try to avoid using my own experiences as
an example as hard as possible exactly to avoid soliciting this kind
of useless retort. But this isn't always possible and I'm not really
convinced that it is sensible.


------------------------------

Date: Fri, 28 Sep 2012 18:12:30 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Database usage best practices
Message-Id: <20120928105534.118@kylheku.com>

On 2012-09-27, Mladen Gogala <gogala.mladen@gmail.com> wrote:
> How do you measure expressivity and error resistance? How would you even 
> define expressivity?

Expressivity is the extent to which you can express the solution to a problem
in terms of only the symbols and language of the problem domain. Let us
call those relevant symbols.

So for instance if we're solving some problem in linguistics, but stuff like
"malloc(sizeof *node * n)" occurs in the solution, then that takes away
from the expressivity. The entity "node" perhaps represents something in the
problem domain, and perhaps n also (being the number of such things), but
malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
and so is the * multiplicative operator, because it's involved in a memory
management calculation which is irrelevant to the problem domain.

To obtain a measure of expressivity, we could count the total number of symbols
in the code, excluding any punctuation. That yields a denominator by which we
can divide the number of just the relevant symbols to obtain a fraction.
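
As a worked instance of this ratio, here is a small Python sketch for
the malloc expression above. The function name expressivity is made up
for illustration, and the symbol list reflects one reading of the
counting rule: parentheses are dropped as punctuation, the two uses of
'*' are counted as separate operator symbols, and "node" and "n" are
assumed to be the only symbols from the problem domain.

```python
def expressivity(symbols, relevant):
    """Fraction of the symbols that belong to the problem domain."""
    if not symbols:
        raise ValueError("no symbols to count")
    hits = sum(1 for s in symbols if s in relevant)
    return hits / len(symbols)


# Symbols of malloc(sizeof *node * n): the labels "deref*" and "mul*"
# distinguish the dereferencing '*' from the multiplicative '*'.
symbols = ["malloc", "sizeof", "deref*", "node", "mul*", "n"]
relevant = {"node", "n"}  # assumed to name things in the problem domain

print(expressivity(symbols, relevant))
```

With these assumptions two of the six symbols are relevant, giving a
fraction of one third.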

Expressivity is, intuitively, a very real, concrete concept which we can
readily recognize, and that tells us we can put a number on it, just like we
can put a number on how funny a joke is, or how moving is a symphony.


------------------------------

Date: Fri, 28 Sep 2012 19:48:37 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Database usage best practices
Message-Id: <87sja27zlm.fsf@sapphire.mobileactivedefense.com>

Kaz Kylheku <kaz@kylheku.com> writes:
> On 2012-09-27, Mladen Gogala <gogala.mladen@gmail.com> wrote:
>> How do you measure expressivity and error resistance? How would you even 
>> define expressivity?
>
> Expressivity is the extent to which you can express the solution to a problem
> in terms of only the symbols and language of the problem domain. Let us
> call those relevant symbols.
>
> So for instance if we're solving some problem in linguistics, but stuff like
> "malloc(sizeof *node * n)" occurs in the solution, then that takes away
> from the expressivity. The entity "node" perhaps represents something in the
> problem domain, and perhaps n also (being the number of such things), but
> malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
> and so is the * multiplicative operator, because it's involved in a memory
> management calculation which is irrelevant to the problem domain.

It is irrelevant 'to the problem domain' because 'the problem' is
different from the 'the solution'. Because of this, a problem can be
stated abstractly while any technical solution to the problem will
need to make use of existing tools suitable for solving problems. Eg,
let's assume the problem is "I want a coffee!". Neither the ground
coffee beans nor the filter nor the water kettle, the water itself,
the desk all of this is placed on, the material which was used to
construct this desk nor even the coffeepot I employ in order to drink
the result has any 'problem domain' relation to the problem
itself. They are just necessary (or helpful) parts of a technical
solution to it.

------------------------------

Date: Fri, 28 Sep 2012 19:16:19 +0000 (UTC)
From: Mladen Gogala <gogala.mladen@gmail.com>
Subject: Re: Database usage best practices
Message-Id: <pan.2012.09.28.19.15.27@gmail.com>

On Fri, 28 Sep 2012 11:50:37 +0100, Rainer Weikusat wrote:

> This is simply not true: Over the lifetime of a program, the amortized
> implementation cost of every infrastructure facility contained in it is
> zero provided that it actually works and works as it would need to work.

Rainer, I agree with you about the DB usage, I disagree about the ORM, 
but this is a Perl group. This is not the right place to debate databases 
and ORM implementations, at least not in my humble opinion. 



-- 
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com


------------------------------

Date: Fri, 28 Sep 2012 19:21:12 +0000 (UTC)
From: Mladen Gogala <gogala.mladen@gmail.com>
Subject: Re: Database usage best practices
Message-Id: <pan.2012.09.28.19.20.20@gmail.com>

On Fri, 28 Sep 2012 18:12:30 +0000, Kaz Kylheku wrote:

> Expressivity is the extent to which you can express the solution to a
> problem in terms of only the symbols and language of the problem domain.
> Let us call those relevant symbols.

Thanks for the nice explanation. I haven't heard about the expressivity 
before, my spelling checker is still flagging it out. What does 
expressivity show me and what can I use it for? May I conclude anything 
about the program unit being studied, based on its expressivity?



-- 
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com


------------------------------

Date: Fri, 28 Sep 2012 20:22:48 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Database usage best practices
Message-Id: <87obkq7y0n.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[...]

> let's assume the problem is "I want a coffee!". Neither the ground
> coffee beans nor

I was actually being too conservative: As demonstrated by
'  !Iaaceeffnotw', stating the problem already requires a lot of things
with absolutely no inherent relation to it. Not even the morphemes or
the sounds the spoken sentence would be composed of have any: All just
perfectly arbitrary conventions forced onto the poor users of language
because of 'technicalities' of the solution domain.



------------------------------

Date: Fri, 28 Sep 2012 19:26:03 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Database usage best practices
Message-Id: <20120928121201.275@kylheku.com>

On 2012-09-28, Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> Kaz Kylheku <kaz@kylheku.com> writes:
>> On 2012-09-27, Mladen Gogala <gogala.mladen@gmail.com> wrote:
>>> How do you measure expressivity and error resistance? How would you even 
>>> define expressivity?
>>
>> Expressivity is the extent to which you can express the solution to a problem
>> in terms of only the symbols and language of the problem domain. Let us
>> call those relevant symbols.
>>
>> So for instance if we're solving some problem in linguistics, but stuff like
>> "malloc(sizeof *node * n)" occurs in the solution, then that takes away
>> from the expressivity. The entity "node" perhaps represents something in the
>> problem domain, and perhaps n also (being the number of such things), but
>> malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
>> and so is the * multiplicative operator, because it's involved in a memory
>> management calculation which is irrelevant to the problem domain.
>
> It is irrelevant 'to the problem domain' because 'the problem' is
> different from the 'the solution'. Because of this, a problem can be
> stated abstractly while any technical solution to the problem will
> need to make use of existing tools suitable for solving problems. Eg,
> let's assume the problem is "I want a coffee!". Neither the ground
> coffee beans nor the filter nor the water kettle, the water itself,
> the desk all of this is placed on, the material which was used to
> construct this desk nor even the coffeepot I employ in order to drink
> the result has any 'problem domain' relation to the problem
> itself. They are just necessary (or helpful) parts of a technical
> solution to it.

Those things may be necessary and helpful, but they are not expressive.
(Well, they are "espressive", haha.)

The details of the process for brewing coffee are irrelevant to the problem
domain of "I would like a coffee".  The relevant concepts are what kind of
coffee I want: latte, cappuccino, black. How dark a roast, how much sugar, if
any and so on.

The barista at the coffee shop speaks more or less the relevant language of
this problem domain, as does any well-designed coffee dispensing automaton.
They hide the irrelevant details from the consumer, giving us an abstract
coffee-making interface whereby we can "program" the coffee that we want
in "coffee shop lingo".

In computing, we can similarly hide the necessary and helpful, but irrelevant
concepts.  Those are not always bits, bytes, pointers and memory management.

For instance "Low pass filter these samples with a cutoff frequency of
4000 Hz, rolling off at 18 dB per octave" is more expressive than constructing
a matrix full of coefficients and then explicitly munging the data
through it.


------------------------------

Date: Fri, 28 Sep 2012 20:54:54 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Database usage best practices
Message-Id: <20120928123950.836@kylheku.com>

On 2012-09-28, Mladen Gogala <gogala.mladen@gmail.com> wrote:
> On Fri, 28 Sep 2012 18:12:30 +0000, Kaz Kylheku wrote:
>
>> Expressivity is the extent to which you can express the solution to a
>> problem in terms of only the symbols and language of the problem domain.
>> Let us call those relevant symbols.
>
> Thanks for the nice explanation. I haven't heard about the expressivity 
> before, my spelling checker is still flagging it out. What does 
> expressivity show me and what can I use it for? May I conclude anything 
> about the program unit being studied, based on its expressivity?

Yes. For example, if you have sufficient expressivity, there isn't much
difference between the specification of a problem and the solution. And, of
course, correctness means that the solution implements the specification of the
problem, so if they are not so different, correctness is easier to verify.

So certain conclusions are easier to obtain from expressive code. But not
necessarily others. If you require conclusions about the details which
are hidden, that may be difficult to impossible. 

For instance, suppose you want to reach a conclusion about how much memory the
solution requires, but there is no explicit memory management. Then you have to
infer it somehow by knowing how it works "under the hood", or rely on
profiling tools.


------------------------------

Date: Fri, 28 Sep 2012 21:06:25 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Database usage best practices
Message-Id: <20120928135626.391@kylheku.com>

On 2012-09-28, Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>
> [...]
>
>> let's assume the problem is "I want a coffee!". Neither the ground
>> coffee beans nor
>
> I was actually being too conservative: As demonstrated by
> '  !Iaaceeffnotw', stating the problem already requires a lot of things
> with absolutely no inherent relation to it.

Well, you need a language. You need symbols, and the representation of
symbols and how they map to concepts is arbitrary.

But a solution using one set of symbols is isomorphic to another one using
different symbols for the same thing. (It can be symbol-for-symbol isomorphic.)

If you don't understand the conventions of the language, then an utterance may
look like gibberish to you, but that's not the same thing as lacking
inherent expressivity.

Expressivity *demands* reliance on material that has been internalized between
the originator of the message and the recipient: some common language,
common understanding of domain abstractions and so on.

Expressivity means putting less information in, and relying on "exformation".
(I didn't make up that word and I'm using it consistently with:
http://en.wikipedia.org/wiki/Exformation)

> Not even the morphemes or
> the sounds the spoken sentence would be composed have any: All just

For the purpose of expressivity, we don't look at morphemes unless they act as
independent symbols. Morphemes which are just fragments of the representation
of a symbol are uninteresting, because symbols are atoms, and could be
replaced by other atoms in a way that preserves meaning, if the replacement
is consistent.

The pattern of morphemes is valuable only to the extent that when it occurs
twice in the utterance (and in the same context), it refers to the same thing.


------------------------------

Date: Fri, 28 Sep 2012 01:38:41 +0200
From: Marko Riedel <markoriedelde@yahoo.de>
Subject: HTML::TableExtract w. perl 5.10
Message-Id: <871uhn11fi.fsf@yahoo.de>


Greetings to all.

The following issue does not occur with perl 5.12; unfortunately I have
to work with 5.10 at my installation and I don't have the administration
rights to upgrade my perl.

The version is:
    "This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi".

I am trying to use HTML::TableExtract on an ISO-8859-1 encoded file. The
extraction works, the data are precisely what I want, but I always get a
warning, namely that "Parsing of undecoded UTF-8 will give garbage when
decoding entities".

Is there anything other than turning warnings off locally that I can do
to suppress this warning? Or does this module not work with
latin1-encoded data? I also tried invoking utf8_mode(0), to no avail.

My version of HTML::Parser is 3.69 and of HTML::TableExtract 2.10.

Best regards,

Marko Riedel


------------------------------

Date: Fri, 28 Sep 2012 04:18:27 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: HTML::TableExtract w. perl 5.10
Message-Id: <3hhfj9-n4c2.ln1@anubis.morrow.me.uk>


Quoth Marko Riedel <markoriedelde@yahoo.de>:
> 
> I am trying to use HTML::TableExtract on an ISO-8859-1 encoded file. The
> extraction works, the data are precisely what I want, but I always get a
> warning, namely that "Parsing of undecoded UTF-8 will give garbage when
> decoding entities".
> 
> Is there anything other than turning warnings off locally that I can do
> to suppress this warning? Or does this module not work with
> latin1-encoded data? I also tried invoking utf8_mode(0), to no avail.

Please post a *minimal* example of a program which exhibits this
behaviour.

Ben



------------------------------

Date: Fri, 28 Sep 2012 23:03:37 +0200
From: Marko Riedel <markoriedelde@yahoo.de>
Subject: Re: HTML::TableExtract w. perl 5.10
Message-Id: <87bogplv12.fsf@yahoo.de>

Ben Morrow <ben@morrow.me.uk> writes:

> Quoth Marko Riedel <markoriedelde@yahoo.de>:
>> 
>> I am trying to use HTML::TableExtract on an ISO-8859-1 encoded file. The
>> extraction works, the data are precisely what I want, but I always get a
>> warning, namely that "Parsing of undecoded UTF-8 will give garbage when
>> decoding entities".
>> 
>> Is there anything other than turning warnings off locally that I can do
>> to suppress this warning? Or does this module not work with
>> latin1-encoded data? I also tried invoking utf8_mode(0), to no avail.
>
> Please post a *minimal* example of a program which exhibits this
> behaviour.
>
> Ben

Greetings.

I will work on that, it's not that easy as the program is complex. In
the meantime, does anyone know how to get HTML::Parser to output the
position and the value of the offending byte sequence? I installed it in
my home directory so I can modify the source if necessary. The warning
is easy to find.

This looks tricky. Like I mentioned, the code works fine with Perl 5.12.

Regards,

Marko


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3786
***************************************
