[98321] in tlhIngan-Hol
Re: [Tlhingan-hol] Certification Test Woes
daemon@ATHENA.MIT.EDU (d'Armond Speers, Ph.D.)
Tue Apr 1 12:06:11 2014
In-Reply-To: <533A6A31.1090206@gmx.de>
Date: Tue, 1 Apr 2014 10:05:54 -0600
From: "d'Armond Speers, Ph.D." <speersd@georgetown.edu>
To: tlhIngan-Hol List <tlhingan-hol@kli.org>
Errors-To: tlhingan-hol-bounces@kli.org
On Tue, Apr 1, 2014 at 1:26 AM, Lieven <levinius@gmx.de> wrote:
>
> Am 01.04.2014 04:39, schrieb d'Armond Speers, Ph.D.:
>
> questions on any given test by weight/content wouldn't interfere with
>> the randomization, making certain questions more or less likely to
>> appear on a test. I'm open to suggestions on whether this is an issue
>>
>
> You have probably thought about it already, but couldn't the weight of a
> question be tied to the number of words or syllables in the answer?
>
> e.g.
> translate "shoe" - answer = 1 point
> translate "where is the bathroom" - answer = 6 points
>
> This may not always work, but it could make some difference.
Well, if we were to go with an approach like this, I wouldn't count
syllables but morphemes.  There's no reason "bathroom" should have a higher
value than "loo".  The questions are (a) how do you reliably calculate this
value, and (b) what do you do with it?
For (a), how to reliably calculate the content value: the problem is
that we're talking about the content value of the expected answer, not the
question itself.  I didn't want every question to be a "translate
this sentence" type of question; we also have "fill in the blank" and other
types of direct questions ("what is the subject and object indicated by
this verb prefix?"), which are typically low-content-value answers. For
the translation-type questions, the student is free to translate however
they like, though the possibilities are still fairly limited in the Level
1 test. Just because I think they may use 4 morphemes in their answer
doesn't preclude them using 8 (with twice as many opportunities for
errors). And each question isn't just testing a single grammar point; some
are testing multiple topics at once. We do define the expected answer (as
a benefit to the one grading the test, not as a hard-and-fast right/wrong
test), so we could just use that as an estimate and call it good enough.
Or should we also take into account the number of topics associated with
each question? See, this is complicated.
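To make the "estimate from the expected answer" idea concrete, here's a
rough Python sketch.  It assumes morpheme boundaries in the stored answer
are pre-marked with hyphens -- a storage convention invented here for
illustration, not how the real test bank encodes anything:

```python
def content_value(expected_answer: str) -> int:
    """Estimate content value as the number of morphemes in the
    expected answer.  Assumes morpheme boundaries within each word
    are already marked with hyphens (a hypothetical convention)."""
    return sum(word.count("-") + 1 for word in expected_answer.split())

# "shoe" -> {waq}: one morpheme, one point.
print(content_value("waq"))                      # 1
# "Where is the bathroom?" -> {nuqDaq 'oH puchpa''e'}, segmented:
print(content_value("nuq-Daq 'oH puch-pa'-'e'"))  # 6
```

Of course this just pushes the hard part (deciding where the morpheme
boundaries are, and whether every topic a question tests should add to its
value) into the data-entry step.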
For (b), what to do with it: there are two general ways to approach
this. (Well, three, if you count the way we did it, which is to allow
randomization to take care of it.) You can make all expected answers have
the same amount of content (very hard, and doesn't permit direct
questions), or you can measure the content of expected answers and
establish a heuristic for how many questions of each content value to
include, ensuring that each test has the same total content value across
its 20 questions.  You would probably accomplish this by ensuring
that each topic listed in the guidelines for that level had the same number
of questions for each content value, which would mean greatly expanding the
size of the test bank. Defining that heuristic is an empirical question on
test design (I remember doing this stuff in college and it was tedious!),
so not my preference to undertake the task.
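For what it's worth, the second option is mechanically simple even if
calibrating it isn't.  A minimal Python sketch, where `bank` and `quota`
are hypothetical structures (not anything from the actual test system):

```python
import random

def build_test(bank, quota, seed=None):
    """Draw a test so every generated test has the same content-value
    profile.  `bank` maps content value -> list of question ids;
    `quota` maps content value -> how many questions of that value to
    draw.  Quota counts must sum to the test length (20 here)."""
    rng = random.Random(seed)
    test = []
    for value, count in quota.items():
        test.extend(rng.sample(bank[value], count))
    rng.shuffle(test)  # so questions aren't grouped by difficulty
    return test

# Toy bank: 30 questions at each of three content values.
bank = {v: [f"v{v}_q{i}" for i in range(30)] for v in (1, 2, 3)}
quota = {1: 8, 2: 7, 3: 5}   # 20 questions per test
# Every test drawn this way totals 8*1 + 7*2 + 5*3 = 37 content points.
test = build_test(bank, quota, seed=1)
```

The randomization still does its job within each value band; the quota is
what pins the total.  The cost, as noted above, is a much larger bank so
that each topic is represented at each content value.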
My preference was/is not to make all of the questions uniform. I could
easily come up with a test bank of 500 questions just asking to translate
each vocabulary term, but that's (a) boring and (b) not really measuring
their practical skill with the language. Some questions are directly about
grammar, some are translation (simultaneously evaluating their grasp of
grammar and vocabulary), and some address a specific topic, like the
distinction between {ghobe'} and {Qo'}. Honestly, when considering the
question of how to create questions that were meaningful, interesting, and
sufficiently varied to account for the range of language use, while
maintaining balance across the topics that we identified in the guidelines,
I didn't think it was practical to take the content value of the
expected answer into account as well, not without increasing the size of
the test bank considerably.  Building the bank was already a huge effort.
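One cheap sanity check that doesn't require redesigning anything: since a
single question can test several topics at once, you can at least tally
topic coverage on each drawn test.  A sketch, with `topics_by_question` as
a hypothetical mapping:

```python
from collections import Counter

def topic_coverage(test, topics_by_question):
    """Tally how often each guideline topic is exercised on a drawn
    test.  A question testing several topics counts once toward each
    of them."""
    counts = Counter()
    for q in test:
        counts.update(topics_by_question[q])
    return counts

# Toy example: two questions, one of them testing two topics at once.
topics = {"q1": ["verb prefixes"], "q2": ["verb prefixes", "vocabulary"]}
print(topic_coverage(["q1", "q2"], topics))
# Counter({'verb prefixes': 2, 'vocabulary': 1})
```

Flagging (or re-drawing) tests whose coverage is badly skewed would catch
the worst randomization outcomes without touching the question weights.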
Ah, memories. :) veqlargh is always in the details. Am I over-thinking
this? Is there a simpler way? And if not, is the original problem serious
enough to warrant this level of test re-design?
--Holtej
_______________________________________________
Tlhingan-hol mailing list
Tlhingan-hol@kli.org
http://mail.kli.org/mailman/listinfo/tlhingan-hol