[R] Off topic --- underdispersed (pseudo) binomial data.
Duncan Murdoch
murdoch.duncan at gmail.com
Fri Mar 26 10:43:29 CET 2021
On 25/03/2021 10:25 p.m., Rolf Turner wrote:
>
> On Fri, 26 Mar 2021 13:41:00 +1300
> Abby Spurdle <spurdle.a using gmail.com> wrote:
>
>> I haven't checked this, but I guess that the number of students that
>> *pass* a particular exam/subject, per semester would be like that.
>>
>> e.g.
>> Let's say you have a course in maximum likelihood, that's taught once
>> per year to 3rd year students, and a few postgrads.
>> You could count the number of passes, each year.
>>
>> If you assume a near-constant probability of passing in each
>> exam/semester: Then I would assume it would follow the distribution
>> that you're requesting.
>
> <SNIP>
>
> Thanks Abby. I've experimented (simulated) a wee bit and found
> that if I keep the numbers of students (undergrad and grad) exactly
> constant, then the results are underdispersed. However if the
> numbers are allowed to vary then the results are overdispersed.
>
> It seems that the universe is very reluctant to produce underdispersed
> pseudo-binomial data!
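
A minimal sketch of the kind of simulation described above (not Rolf's actual code). The class sizes, pass probabilities, the Poisson model for varying enrolment, and the helper name dispersion() are all assumptions made for illustration. Dispersion is measured here as the sample variance divided by the variance of a binomial with the same nominal size and mean.

```r
set.seed(42)

# Hypothetical values, chosen only for illustration
n_ug <- 40;   n_pg <- 10     # undergrad and postgrad class sizes
p_ug <- 0.6;  p_pg <- 0.95   # assumed pass probabilities
n_years <- 1e5               # number of simulated years
n_tot <- n_ug + n_pg

# Ratio of observed variance to the variance of Binomial(size, p_hat),
# where p_hat is estimated from the mean count
dispersion <- function(y, size) {
  p_hat <- mean(y) / size
  var(y) / (size * p_hat * (1 - p_hat))
}

# Case 1: class sizes held exactly constant
passes_fixed <- rbinom(n_years, n_ug, p_ug) + rbinom(n_years, n_pg, p_pg)
disp_fixed <- dispersion(passes_fixed, n_tot)

# Case 2: class sizes vary from year to year (Poisson enrolment, an assumption)
sizes_ug <- rpois(n_years, n_ug)
sizes_pg <- rpois(n_years, n_pg)
passes_vary <- rbinom(n_years, sizes_ug, p_ug) + rbinom(n_years, sizes_pg, p_pg)
disp_vary <- dispersion(passes_vary, n_tot)  # reference size = mean class size

c(fixed = disp_fixed, varying = disp_vary)
```

With these assumed values the fixed-size ratio should come out below 1 and the varying-size ratio above 1, matching the pattern reported above.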

I'd expect underdispersion to happen in competitive situations: if
subject A succeeds, that makes it less likely that other subjects will
also succeed.
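
One way to see this effect, as a sketch under assumed numbers: suppose a fixed number of prizes is shared among a pool of competitors, and we count the prize winners within a subgroup. That count is hypergeometric rather than binomial, and its variance is smaller than the binomial variance by the factor (N - n)/(N - 1). The pool size, prize count, and group size below are hypothetical.

```r
set.seed(1)

# Hypothetical contest: 20 competitors share exactly 6 prizes,
# and we count the prize winners among a group of 10 of them
n_total  <- 20
n_prizes <- 6
n_group  <- 10
n_sims   <- 1e5

# Winners in the group: draw n_group competitors from a pool containing
# n_prizes winners and n_total - n_prizes non-winners
successes <- rhyper(n_sims, m = n_prizes, n = n_total - n_prizes, k = n_group)

# Binomial variance with the same size and marginal success probability
p <- n_prizes / n_total
binom_var <- n_group * p * (1 - p)

# Ratio below 1 indicates underdispersion relative to the binomial;
# theory gives (n_total - n_group) / (n_total - 1)
ratio <- var(successes) / binom_var
c(simulated = ratio, theoretical = (n_total - n_group) / (n_total - 1))
```

Each competitor still has the same marginal chance of winning, but one success uses up a prize and so lowers the chance for the others, which is what pulls the variance below the binomial value.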

An extreme case is a contest winner. With some contests there will
always be exactly one winner (probably a little too underdispersed for
you), but others allow a small amount of variation.

Sports events that allow ties are one example. This page
https://en.wikipedia.org/wiki/List_of_ties_for_medals_at_the_Olympics
seems to indicate that speed skating had a lot of ties up until 1980.

Duncan Murdoch