[R] Cochran-Mantel-Haenszel test

Peter Langfelder peter.langfelder at gmail.com
Wed Sep 19 01:14:26 CEST 2012


Bert is correct that this is a statistics question, but I'll throw in
my 2 cents anyway. The CMH test is formulated for count data and makes
certain assumptions about the distribution of the observed values. Since
you don't have count data (your data are not integer), chances are
that the assumptions of the CMH test are not satisfied and you will
get incorrect p-values. Without knowing a bit more about how the tagged
animal behaves (or, in statistical terms, what the distribution of the
depths is and whether it can be approximated by one of the standard
distributions used in statistics) it is, to the best of my knowledge,
impossible to do a meaningful statistical test of the differences you
want to study.
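
To illustrate why non-count input is a problem, here is a minimal sketch
in R. It rebuilds the HG03.dat averages quoted below and runs
mantelhaen.test() on the Avrg table and on the same table multiplied by
an arbitrary factor of 10 (the factor is an assumption chosen only for
illustration). The proportions are identical in both tables, yet the
statistic grows with the scale, so the p-value depends on an arbitrary
choice of units rather than on an actual sample size.

```r
# Avrg column copied from the HG03.dat table quoted in the original post.
avrg <- c(
  0.1702970, 0.2366337, 8.5990099, 26.7148515,   # summ, day
  4.1554455, 2.6346535, 16.5207921, 40.9722772,
  5.7877551, 4.6755102, 9.7816327, 15.5489796,   # summ, night
  4.8122449, 0.8346939, 15.5734694, 42.9816327,
  0, 0, 0, 0,                                    # wint, day
  1.1285714, 13.1571429, 0, 85.7142857,
  1.7560000, 0.0120000, 0.0120000, 0.0320000,    # wint, night
  0.4200000, 2.0640000, 16.4560000, 79.2480000
)

HG03.dat <- data.frame(
  Season = rep(c("summ", "wint"), each = 16),
  Time   = rep(rep(c("day", "night"), each = 8), times = 2),
  Depth  = rep(c(0, 1, 10, 50, 100, 200, 300, 500), times = 4),
  Avrg   = avrg
)

# Time x Depth table, stratified by Season, as in the original call.
ct <- xtabs(Avrg ~ Time + Depth + Season, data = HG03.dat)

m1  <- mantelhaen.test(ct)        # percentages as given
m10 <- mantelhaen.test(ct * 10)   # same proportions, arbitrary rescaling

# Compare the two statistics and p-values side by side.
c(M2_original = unname(m1$statistic), M2_rescaled = unname(m10$statistic))
c(p_original = m1$p.value, p_rescaled = m10$p.value)
```

The same effect explains why using Sum instead of Avrg gave different
results: the test treats the cell totals as numbers of independent
observations, which percentages are not.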

At the very least, instead of producing a summary table of the overall
time the tagged animal spends at each depth, you should start with the raw
data. That way you can obtain some measure of how variable the
percentage of time spent at each depth is and whether there is some
serial dependence (if the animal is at depth 0 at time t, it may be more
likely to be at depth 0 or near 0 at time t+1, right?).
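
A rough sketch of what such a first look could be, in R. The raw 6-hour
records are not part of this thread, so the series below is simulated
and purely hypothetical (the names pct_bin, n and the AR coefficient 0.7
are assumptions made for illustration); with real data you would
substitute the per-interval percentages for one depth bin.

```r
set.seed(1)
n <- 200  # hypothetical number of consecutive 6-hour records

# Simulated stand-in for the raw data: an AR(1) series on the logit
# scale, mapped back to a percentage of time spent in one depth bin.
z <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
pct_bin <- 100 * plogis(z)

# Spread of the per-interval percentages, which a summed or averaged
# table hides completely.
summary(pct_bin)
sd(pct_bin)

# Lag-1 autocorrelation: a value well above 0 suggests that consecutive
# records are not independent observations.
r1 <- acf(pct_bin, lag.max = 1, plot = FALSE)$acf[2]
r1
```

If the lag-1 autocorrelation of the real series is clearly positive,
any test that treats the records as independent will overstate the
evidence.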

Peter

On Tue, Sep 18, 2012 at 3:46 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> This looks more like a statistics than an R issue. Try posting on
> stats.stackexchange.com, a statistics list, instead.
>
> Alternatively, talk to your local statistician (if there is one).
>
> -- Bert
>
> On Tue, Sep 18, 2012 at 3:02 PM, McPhie, Romney
> <Romney.McPhie at dfo-mpo.gc.ca> wrote:
>> Hello,
>>
>> I have some satellite tag time-at-depth (TAD) frequency data that I
>> would like some help with.
>>
>> The data was transmitted via satellite as percent time spent in each of
>> 7 depth bins (0m, 0-1m, 1-10m, 10-50m etc.), binned over 6-hour
>> intervals.  I categorized each row of data corresponding to a date and
>> time into summer vs. winter, and day vs. night, and then summed and
>> averaged the given % for each depth bin.  My data looks like this (for
>> one individual, HG03):
>>
>> HG03.dat
>>    Season  Time Depth    Sum       Avrg
>> 1    summ   day     0   17.2  0.1702970
>> 2    summ   day     1   23.9  0.2366337
>> 3    summ   day    10  868.5  8.5990099
>> 4    summ   day    50 2698.2 26.7148515
>> 5    summ   day   100  419.7  4.1554455
>> 6    summ   day   200  266.1  2.6346535
>> 7    summ   day   300 1668.6 16.5207921
>> 8    summ   day   500 4138.2 40.9722772
>> 9    summ night     0  283.6  5.7877551
>> 10   summ night     1  229.1  4.6755102
>> 11   summ night    10  479.3  9.7816327
>> 12   summ night    50  761.9 15.5489796
>> 13   summ night   100  235.8  4.8122449
>> 14   summ night   200   40.9  0.8346939
>> 15   summ night   300  763.1 15.5734694
>> 16   summ night   500 2106.1 42.9816327
>> 17   wint   day     0    0.0  0.0000000
>> 18   wint   day     1    0.0  0.0000000
>> 19   wint   day    10    0.0  0.0000000
>> 20   wint   day    50    0.0  0.0000000
>> 21   wint   day   100    7.9  1.1285714
>> 22   wint   day   200   92.1 13.1571429
>> 23   wint   day   300    0.0  0.0000000
>> 24   wint   day   500  600.0 85.7142857
>> 25   wint night     0   43.9  1.7560000
>> 26   wint night     1    0.3  0.0120000
>> 27   wint night    10    0.3  0.0120000
>> 28   wint night    50    0.8  0.0320000
>> 29   wint night   100   10.5  0.4200000
>> 30   wint night   200   51.6  2.0640000
>> 31   wint night   300  411.4 16.4560000
>> 32   wint night   500 1981.2 79.2480000
>>
>> I wanted to test whether significant differences existed between depth
>> in summer vs. winter, and day vs. night, controlling first for season
>> and then for time of day.  I carried out a Cochran-Mantel-Haenszel test,
>> using Average Frequency (Avrg) as the dependent variable (2x2x8
>> contingency table).
>>
>>> ct<-xtabs(Avrg~Time+Depth+Season,data=HG03.dat)
>>> mantelhaen.test(ct)
>>
>>         Cochran-Mantel-Haenszel test
>>
>> data:  ct
>> Cochran-Mantel-Haenszel M^2 = 28.4548, df = 7, p-value = 0.0001818
>>
>>> ct<-xtabs(Avrg~Season+Depth+Time,data=HG03.dat)
>>> mantelhaen.test(ct)
>>
>>         Cochran-Mantel-Haenszel test
>>
>> data:  ct
>> Cochran-Mantel-Haenszel M^2 = 111.5986, df = 7, p-value < 2.2e-16
>>
>> However, I'm not sure if these results are valid, since my raw data is
>> already in frequencies, not in counts.  When I used Sum as the dependent
>> variable, I obtained different results.
>>
>> I am at a loss on how to proceed.  If anyone has any ideas, they would
>> be greatly appreciated.
>>
>> Thanks!
>> Romney
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
