[R] data analysis
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Aug 7 05:20:35 CEST 2007
On Tue, 7 Aug 2007, ted.harding at nessie.mcc.ac.uk wrote:
> On 06-Aug-07 19:26:59, lamack lamack wrote:
>> Dear all, I have a factorial design where the
>> response is an ordered categorical response.
>>
>> treatment (two levels: 1 and 2)
>> time (four levels: 30, 60,90 and 120)
>> ordered response (0,1,2,3)
>>
>> could someone suggest a correct analysis or some references?
>
> For your data below, I would be inclined to start from here,
> which gives the counts for the different responses:
>
>
> Response
> --------------------
> Trt Time 0 1 2 3
> --------+--------------------+----
> Tr1 30 | 1 3 | 4
> 60 | 2 1 1 | 4
> 90 | 3 1 | 4
> 120 | 3 1 | 4
> --------+--------------------+---
> Tr2 30 | 2 2 | 4
> 60 | 3 1 | 4
> 90 | 3 1 | 4
> 120 | 1 2 1 | 4
> =================================
> Tr1 | 0 9 3 4 | 16
> --------+--------------------+---
> Tr2 | 1 10 2 3 | 16
> =================================
>
> This suggests that, if anything is happening there at all,
> it is a tendency for high response to occur at shorter times,
> and low response at longer times, with little if any difference
> between the treatments.
>
> To approach this formally, I would consider adopting a
> "re-randomisation" approach, re-allocating the outcomes at
> random in such a way as to preserve the marginal totals,
> and evaluating a statistic T, defined in terms of the counts
> and such as to be sensitive to the kind of effect you seek.
>
> Then situate the value of T obtained from the above counts
> within the distribution of T obtained by this re-randomisation.
>
> There must be, somewhere in R, routines which can perform this
> kind of constrained re-randomisation,but I'm not sufficiently
> familiar with that area of R to know for sure about them.
?r2dtable for 2D tables. But there is a classic way to do this without
using randomization and holding the time*treatment marginals fixed:
log-linear models.
> I hope other readers who know about this area in R can come
> up with suggestions!
However, that approach is not taking into account that the response is
ordered. First make sure the variables are factors: here in data frame
'dat'.
dat <- read.table("...", header=TRUE, colClasses="factor")
library(MASS)
summary(polr(response ~ time*treatment, data = dat))
suggests there is nothing very significant here, and dropping the
interaction
> summary(polr(response ~ time+treatment, data = dat))
Re-fitting to get Hessian
Call:
polr(formula = response ~ time + treatment, data = dat)
Coefficients:
Value Std. Error t value
time60 -1.7030709 1.0323027 -1.649779
time90 -2.1833059 1.0959290 -1.992196
time120 -2.7900588 1.1703586 -2.383935
treatment2 -0.8168075 0.7663541 -1.065836
shows a marginal effect of time:
> stepAIC(polr(response ~ time*treatment, data = dat))
selects a model with just 'time' as an explanatory variable.
> anova(polr(response ~ time, dat), polr(response ~ 1, dat))
Likelihood ratio tests of ordinal regression models
Response: response
Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
1 1 29 66.58130
2 time 26 59.68091 1 vs 2 3 6.900383 0.07514162
again suggests that the effect of time is marginal.
References: obviously this is covered in MASS (see the R FAQ).
>
> best wishes,
> Ted.
>
>> subject treatment time response
>> 1 1 30 3
>> 2 1 30 3
>> 3 1 30 1
>> 4 1 30 3
>> 5 1 60 3
>> 6 1 60 1
>> 7 1 60 1
>> 8 1 60 2
>> 9 1 90 2
>> 10 1 90 1
>> 11 1 90 1
>> 12 1 90 1
>> 13 1 120 2
>> 14 1 120 1
>> 15 1 120 1
>> 16 1 120 1
>> 17 2 30 3
>> 18 2 30 3
>> 19 2 30 1
>> 20 2 30 1
>> 21 2 60 1
>> 22 2 60 2
>> 23 2 60 1
>> 24 2 60 1
>> 25 2 90 1
>> 26 2 90 1
>> 27 2 90 1
>> 28 2 90 3
>> 29 2 120 1
>> 30 2 120 2
>> 31 2 120 0
>> 32 2 120 1
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list