[86912] in tlhIngan-Hol
Re: The topic marker -'e'
daemon@ATHENA.MIT.EDU (Tracy Canfield)
Sun Nov 22 12:02:32 2009
In-Reply-To: <249d5b950911220828x176ad975s8f0decb2caedcc2c@mail.gmail.com>
Date: Sun, 22 Nov 2009 12:00:04 -0500
From: Tracy Canfield <toastrix@gmail.com>
To: tlhingan-hol@kli.org
Errors-to: tlhingan-hol-bounce@kli.org
Reply-to: tlhingan-hol@kli.org
2009/11/22 Steven Lytle <lytlesw@gmail.com>:
> It seems that your (or any) MT program should at least attempt to translate
> even ungrammatical utterances.
I actually do take a pass at them after marking them as ungrammatical.
It's still important to distinguish grammatical sentences from
ungrammatical ones: first, because you can be much more confident
about the intended overall meaning of the grammatical ones, and
second, because the grammatical ones are a lot less ambiguous - you
don't have to consider the possibility that a noun ending in -vaD or
-Daq could be the subject.
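
To make the second point concrete, here is a rough sketch of how the set of subject candidates shrinks once a sentence is assumed to be grammatical. The function name and the plain suffix check are assumptions for illustration only, not the actual program's code, and a real analyzer would need proper morphological analysis rather than a string test.

```python
# Illustrative only: these names are hypothetical, not the real program's.
# Nouns carrying these suffixes cannot be the subject of a grammatical sentence.
NON_SUBJECT_SUFFIXES = ("vaD", "Daq")

def subject_candidates(nouns, assume_grammatical):
    """Return the nouns that still have to be considered as the subject."""
    if not assume_grammatical:
        # Anything goes in an ungrammatical sentence, so keep every noun.
        return list(nouns)
    return [n for n in nouns if not n.endswith(NON_SUBJECT_SUFFIXES)]

print(subject_candidates(["ngemDaq", "Sor"], assume_grammatical=True))
print(subject_candidates(["ngemDaq", "Sor"], assume_grammatical=False))
```

With the grammatical assumption only "Sor" is left; without it both nouns stay in play, which is where the extra ambiguity comes from.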
On the current build, if you take a sentence like
mapum Sor
which I think we can all agree is awful, you get
* fall tree
The * marks it as ungrammatical, but the program makes a try at the
individual words without trying to establish any relationship between
them.
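
As a rough illustration of that fallback, the sketch below glosses each word on its own and prefixes the result with *. The tiny lexicon and the function name are made up for this example, and mapping "mapum" straight to "fall" is a shortcut that only mirrors the output shown above; it is not how the real program analyzes the verb prefix.

```python
# Hypothetical toy lexicon for illustration; not the real program's data.
GLOSSES = {"mapum": "fall", "pum": "fall", "Sor": "tree", "ngem": "forest"}

def gloss_fallback(sentence):
    """Gloss each word independently and flag the result as ungrammatical."""
    words = sentence.split()
    # Unknown words are passed through unchanged rather than dropped.
    glosses = [GLOSSES.get(word, word) for word in words]
    return "* " + " ".join(glosses)

print(gloss_fallback("mapum Sor"))  # * fall tree
```

Word order is left exactly as it was in the input, since no relationship between the words has been established.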
In contrast,
ngemDaq pum Sor
returns
The tree falls in the forest
with re-ordering, insertion of appropriate articles and prepositions,
etc. (Plus a gentle reminder on a different line that there are other
legitimate parses because "ngem" and "Sor" could be plural.)
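
The reminder about other parses could come from something like the sketch below, which lists every singular/plural combination for nouns that carry no plural suffix. This is only a guess at the idea, with hypothetical names; the real program may track number quite differently.

```python
from itertools import product

def number_readings(unmarked_nouns):
    """List every singular/plural assignment for nouns with no plural suffix."""
    options = ("singular", "plural")
    return [
        dict(zip(unmarked_nouns, combo))
        for combo in product(options, repeat=len(unmarked_nouns))
    ]

readings = number_readings(["ngem", "Sor"])
print(len(readings))  # 4
print(readings[0])    # the all-singular reading, used for the main translation
```

The first reading is the all-singular one that matches the translation shown, and the remaining three are the alternatives the reminder line would point to.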
While it might well be worth doing more re-ordering of the
ungrammatical sentences, it's a lower priority than trying to ensure
that if a sentence *is* grammatical, the program can handle it.