[R] Off topic --- underdispersed (pseudo) binomial data.

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Fri Mar 26 10:43:29 CET 2021


On 25/03/2021 10:25 p.m., Rolf Turner wrote:
> 
> On Fri, 26 Mar 2021 13:41:00 +1300
> Abby Spurdle <spurdle.a using gmail.com> wrote:
> 
>> I haven't checked this, but I guess that the number of students that
>> *pass* a particular exam/subject, per semester would be like that.
>>
>> e.g.
>> Let's say you have a course in maximum likelihood, that's taught once
>> per year to 3rd year students, and a few postgrads.
>> You could count the number of passes, each year.
>>
>> If you assume a near-constant probability of passing in each
>> exam/semester: Then I would assume it would follow the distribution
>> that you're requesting.
> 
> <SNIP>
> 
> Thanks Abby.  I've experimented (simulated) a wee bit and found
> that if I keep the numbers of students (undergrad and grad) exactly
> constant, then the results are underdispersed.  However if the
> numbers are allowed to vary then the results are overdispersed.
> 
> It seems that the universe is very reluctant to produce underdispersed
> pseudo-binomial data!
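
The fixed-numbers case can be checked with a minimal sketch. The class sizes (30 undergrads, 10 postgrads) and pass probabilities (0.6 and 0.95) are made-up assumptions for illustration, not figures from the thread, and the helper names are hypothetical. It compares the sample variance of the yearly pass count with the variance of a binomial that has the same size and mean.

```python
import random
import statistics


def simulate_passes(n_under, n_grad, p_under, p_grad, years, rng):
    """Total passes per year when both class sizes are held fixed."""
    totals = []
    for _ in range(years):
        under = sum(rng.random() < p_under for _ in range(n_under))
        grad = sum(rng.random() < p_grad for _ in range(n_grad))
        totals.append(under + grad)
    return totals


def dispersion_ratio(counts, size):
    """Sample variance divided by the variance of a binomial with the same mean."""
    p_hat = statistics.mean(counts) / size
    return statistics.variance(counts) / (size * p_hat * (1 - p_hat))


rng = random.Random(2021)  # fixed seed so the run is reproducible

# Hypothetical numbers: 30 undergrads passing with probability 0.6,
# 10 postgrads passing with probability 0.95.
passes = simulate_passes(30, 10, 0.6, 0.95, years=20000, rng=rng)
ratio = dispersion_ratio(passes, size=40)

# Theoretical ratio for the same setup: sum of the two binomial variances
# over the variance of a single binomial with the pooled pass probability.
p_pooled = (30 * 0.6 + 10 * 0.95) / 40
theory = (30 * 0.6 * 0.4 + 10 * 0.95 * 0.05) / (40 * p_pooled * (1 - p_pooled))

print(f"simulated ratio: {ratio:.3f}, theoretical ratio: {theory:.3f}")
```

A ratio below 1 indicates underdispersion relative to the binomial. The sketch covers only the fixed-size case; how the class sizes are allowed to vary is not specified in the thread, so that case is left out.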

I'd expect underdispersion to happen in competitive situations:  if 
subject A succeeds, that makes it less likely that other subjects will 
also succeed.

An extreme case is a contest winner.  With some contests there will 
always be exactly one winner (a little too underdispersed for you, 
probably), but others allow a small amount of variation.
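
One way to see the competitive effect is a minimal sketch with a fixed number of prizes. The pool of 50 entrants, 10 prizes and tracked group of 20 are made-up assumptions, and `winners_in_group` is a hypothetical helper. Prizes are drawn without replacement, so each prize won by one member of the group leaves fewer for the rest. The count of winners in the group is then hypergeometric, with variance smaller than the matching binomial by the factor (N - n) / (N - 1).

```python
import random
import statistics


def winners_in_group(pool_size, n_prizes, group_size, rng):
    """Number of prize winners that fall inside the tracked group.

    Entrants are labelled 0..pool_size-1 and the tracked group is the
    first group_size labels. Winners are drawn without replacement.
    """
    winners = rng.sample(range(pool_size), n_prizes)
    return sum(w < group_size for w in winners)


rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical contest: 50 entrants, 10 prizes, 20 entrants tracked.
pool_size, n_prizes, group_size = 50, 10, 20
counts = [winners_in_group(pool_size, n_prizes, group_size, rng)
          for _ in range(20000)]

# Binomial variance if the 20 tracked entrants succeeded independently
# with the same marginal probability of winning.
p = n_prizes / pool_size
binom_var = group_size * p * (1 - p)

ratio = statistics.variance(counts) / binom_var
expected_ratio = (pool_size - group_size) / (pool_size - 1)

print(f"simulated ratio: {ratio:.3f}, hypergeometric ratio: {expected_ratio:.3f}")
```

When the tracked group is the whole pool, the count is always equal to the number of prizes and the variance is zero, which is the "always one winner" extreme.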

Sports events that allow ties are one example.  This page 
https://en.wikipedia.org/wiki/List_of_ties_for_medals_at_the_Olympics 
seems to indicate that speed skating had a lot of ties up until 1980.

Duncan Murdoch


