[Rd] Are r2dtable and C_r2dtable behaving correctly?
Martin Maechler
maechler at stat.math.ethz.ch
Fri Aug 25 18:06:53 CEST 2017
>>>>> Peter Dalgaard <pdalgd at gmail.com>
>>>>> on Fri, 25 Aug 2017 11:43:40 +0200 writes:
>> On 25 Aug 2017, at 10:30 , Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>
> [...]
>> https://stackoverflow.com/questions/37309276/r-r2dtable-contingency-tables-are-too-concentrated
>>
>>
>>> set.seed(1); system.time(tabs <- r2dtable(1e6, c(100, 100), c(100, 100))); A11 <- vapply(tabs, function(x) x[1, 1], numeric(1))
>>    user  system elapsed
>>   0.218   0.025   0.244
>>> table(A11)
>>
>>     34     35     36     37     38     39     40     41     42     43
>>      2     17     40    129    334    883   2026   4522   8766  15786
>>     44     45     46     47     48     49     50     51     52     53
>>  26850  42142  59535  78851  96217 107686 112438 108237  95761  78737
>>     54     55     56     57     58     59     60     61     62     63
>>  59732  41474  26939  16006   8827   4633   2050    865    340    116
>>     64     65     66     67
>>     38     13      7      1
>>>
>>
>> For a 2x2 table, there's really only one degree of freedom,
>> hence the above characterizes the full distribution for that
>> case.
>>
>> I would have expected to see all possible values in 0:100
>> instead of such a "normal-like" distribution with support only
>> in [34, 67].
> Hmm, am I missing a point here?
>> round(dhyper(0:100,100,100,100)*1e6)
>   [1]      0      0      0      0      0      0      0      0      0      0
>  [11]      0      0      0      0      0      0      0      0      0      0
>  [21]      0      0      0      0      0      0      0      0      0      0
>  [31]      0      0      0      1      4     13     43    129    355    897
>  [41]   2087   4469   8819  16045  26927  41700  59614  78694  95943 108050
>  [51] 112416 108050  95943  78694  59614  41700  26927  16045   8819   4469
>  [61]   2087    897    355    129     43     13      4      1      0      0
>  [71]      0      0      0      0      0      0      0      0      0      0
>  [81]      0      0      0      0      0      0      0      0      0      0
>  [91]      0      0      0      0      0      0      0      0      0      0
> [101]      0
No, you ain't, I was. :-(
Martin
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
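
For anyone who wants to repeat the comparison made above, here is a minimal sketch that puts the simulated A11 counts next to the hypergeometric expectation. It is not part of the original exchange. The names n_sim, observed, expected, mu and sigma are made up for illustration, and n_sim is set lower than the 1e6 draws used in the thread only to keep the run short.

```r
# Sketch only: compare simulated A11 counts from r2dtable() with the
# hypergeometric law that dhyper(0:100, 100, 100, 100) describes.
set.seed(1)
n_sim <- 1e5  # assumption: fewer draws than the 1e6 used in the thread
tabs  <- r2dtable(n_sim, c(100, 100), c(100, 100))
a11   <- vapply(tabs, function(x) x[1, 1], numeric(1))

# Observed counts on the full range 0:100 (values never drawn get count 0).
observed <- tabulate(a11 + 1, nbins = 101)

# Expected counts under the hypergeometric distribution.
expected <- n_sim * dhyper(0:100, m = 100, n = 100, k = 100)

# Theoretical mean and standard deviation of A11 for these margins.
N <- 200; K <- 100; k <- 100
mu    <- k * K / N
sigma <- sqrt(k * (K / N) * (1 - K / N) * (N - k) / (N - 1))

# Side-by-side view, restricted to cells with an expected count of at least 1.
keep <- expected >= 1
print(cbind(a11      = (0:100)[keep],
            observed = observed[keep],
            expected = round(expected[keep])))

# How far the observed extremes lie from the mean, in standard deviations.
print((range(a11) - mu) / sigma)
```

With a standard deviation of roughly 3.5 around a mean of 50, values far out in the tails of 0:100 have probabilities too small to show up in a simulation of this size, which is consistent with the narrow range seen in the table(A11) output.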