[R] Cochran-Mantel-Haenszel test
Peter Langfelder
peter.langfelder at gmail.com
Wed Sep 19 01:14:26 CEST 2012
Bert is correct that this is a statistics question, but I'll throw in
my 2 cents anyway. The CMH test is formulated for count data and makes
certain assumptions about the distribution of the observed values: each
cell entry is treated as a number of independent observations. Since
your data are not counts (the values are not integers), chances are
that the assumptions of the CMH test are not satisfied and you will
get incorrect p-values. This is also why using Sum instead of Avrg
changes the result: the test reads a larger table as more observations.
Without knowing a bit more about how the tag behaves (or, in statistical
terms, what the distribution of the depths is and whether it can be
approximated by one of the standard distributions used in statistics),
it is, to the best of my knowledge, impossible to do a meaningful
statistical test of the differences you want to study.
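Because the test treats every cell entry as a count of independent observations, multiplying all cells by a constant leaves every proportion unchanged but inflates M^2 and shrinks the p-value. A minimal sketch of this in R, assuming the Avrg values from the HG03.dat table posted below; the scale factor of 10 is arbitrary and chosen only for illustration:

```r
# Rebuild the HG03.dat summary posted in the thread (Avrg column only).
# expand.grid varies its first argument fastest, matching the posted row order.
HG03.dat <- expand.grid(
  Depth  = c(0, 1, 10, 50, 100, 200, 300, 500),
  Time   = c("day", "night"),
  Season = c("summ", "wint")
)
HG03.dat$Avrg <- c(
  # summer, day
  0.1702970, 0.2366337, 8.5990099, 26.7148515,
  4.1554455, 2.6346535, 16.5207921, 40.9722772,
  # summer, night
  5.7877551, 4.6755102, 9.7816327, 15.5489796,
  4.8122449, 0.8346939, 15.5734694, 42.9816327,
  # winter, day
  0.0000000, 0.0000000, 0.0000000, 0.0000000,
  1.1285714, 13.1571429, 0.0000000, 85.7142857,
  # winter, night
  1.7560000, 0.0120000, 0.0120000, 0.0320000,
  0.4200000, 2.0640000, 16.4560000, 79.2480000
)

# Time x Depth table stratified by Season, as in the original post
ct <- xtabs(Avrg ~ Time + Depth + Season, data = HG03.dat)

# Same proportions in every cell, but ten times the apparent sample size
m1  <- mantelhaen.test(ct)
m10 <- mantelhaen.test(ct * 10)

# M^2 grows with the scale factor and the p-value shrinks accordingly
c(original = unname(m1$statistic), scaled = unname(m10$statistic))
c(original = m1$p.value, scaled = m10$p.value)
```

Nothing about the tag's behaviour differs between the two calls, so a p-value that depends on an arbitrary choice of units (average percent vs. summed percent) cannot be taken at face value here.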
At the very least, instead of producing a summary table of the overall
time the tag spends at each depth, you should start with the raw
data. That way you can obtain some measure of how variable the
percentage of time spent at each depth is, and whether consecutive
intervals are dependent (if the tag is at depth 0 at time t, it may be
more likely to be at depth 0 or near 0 at time t+1, right?).
Peter
On Tue, Sep 18, 2012 at 3:46 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> This looks more like a statistics than an R issue. Try posting on
> stats.stackexchange.com, a statistics list, instead.
>
> Alternatively, talk to your local statistician (if there is one).
>
> -- Bert
>
> On Tue, Sep 18, 2012 at 3:02 PM, McPhie, Romney
> <Romney.McPhie at dfo-mpo.gc.ca> wrote:
>> Hello,
>>
>> I have some satellite tag time-at-depth (TAD) frequency data that I
>> would like some help with.
>>
>> The data was transmitted via satellite as percent time spent in each of
>> 7 depth bins (0m, 0-1m, 1-10m, 10-50m etc.), binned over 6-hour
>> intervals. I categorized each row of data corresponding to a date and
>> time into summer vs. winter, and day vs. night, and then summed and
>> averaged the given % for each depth bin. My data looks like this (for
>> one individual, HG03):
>>
>> HG03.dat
>> Season Time Depth Sum Avrg
>> 1 summ day 0 17.2 0.1702970
>> 2 summ day 1 23.9 0.2366337
>> 3 summ day 10 868.5 8.5990099
>> 4 summ day 50 2698.2 26.7148515
>> 5 summ day 100 419.7 4.1554455
>> 6 summ day 200 266.1 2.6346535
>> 7 summ day 300 1668.6 16.5207921
>> 8 summ day 500 4138.2 40.9722772
>> 9 summ night 0 283.6 5.7877551
>> 10 summ night 1 229.1 4.6755102
>> 11 summ night 10 479.3 9.7816327
>> 12 summ night 50 761.9 15.5489796
>> 13 summ night 100 235.8 4.8122449
>> 14 summ night 200 40.9 0.8346939
>> 15 summ night 300 763.1 15.5734694
>> 16 summ night 500 2106.1 42.9816327
>> 17 wint day 0 0.0 0.0000000
>> 18 wint day 1 0.0 0.0000000
>> 19 wint day 10 0.0 0.0000000
>> 20 wint day 50 0.0 0.0000000
>> 21 wint day 100 7.9 1.1285714
>> 22 wint day 200 92.1 13.1571429
>> 23 wint day 300 0.0 0.0000000
>> 24 wint day 500 600.0 85.7142857
>> 25 wint night 0 43.9 1.7560000
>> 26 wint night 1 0.3 0.0120000
>> 27 wint night 10 0.3 0.0120000
>> 28 wint night 50 0.8 0.0320000
>> 29 wint night 100 10.5 0.4200000
>> 30 wint night 200 51.6 2.0640000
>> 31 wint night 300 411.4 16.4560000
>> 32 wint night 500 1981.2 79.2480000
>>
>> I wanted to test whether the depth distribution differed significantly
>> between day and night (controlling for season) and between summer and
>> winter (controlling for time of day). I carried out a
>> Cochran-Mantel-Haenszel test, using the average percentage (Avrg) as the
>> cell values of a 2x2x8 contingency table.
>>
>>> ct<-xtabs(Avrg~Time+Depth+Season,data=HG03.dat)
>>> mantelhaen.test(ct)
>>
>> Cochran-Mantel-Haenszel test
>>
>> data: ct
>> Cochran-Mantel-Haenszel M^2 = 28.4548, df = 7, p-value = 0.0001818
>>
>>> ct<-xtabs(Avrg~Season+Depth+Time,data=HG03.dat)
>>> mantelhaen.test(ct)
>>
>> Cochran-Mantel-Haenszel test
>>
>> data: ct
>> Cochran-Mantel-Haenszel M^2 = 111.5986, df = 7, p-value < 2.2e-16
>>
>> However, I'm not sure if these results are valid, since my raw data are
>> already percentages, not counts. When I used Sum instead of Avrg as the
>> cell values, I obtained different results.
>>
>> I am at a loss on how to proceed. If anyone has any ideas, they would
>> be greatly appreciated.
>>
>> Thanks!
>> Romney
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>