[R] How to solve: Error with Anova {car} due to "deficient rank" ?

peter dalgaard pdalgd at gmail.com
Thu May 6 15:47:37 CEST 2010

On May 6, 2010, at 1:42 PM, Tal Galili wrote:

> Hi Joris,
> Thank you for taking the time to answer.
> This data is of a test done for 39 subjects (from 2 groups) over 12 weeks.
> And the questions I would like to answer are:
> 1) Did the test results change over time?
> 2) Did the group affect the test results?
> 3) Did the effect of time differ for each group?
> I understand that the general limitation of using repeated measures anova
> here is (obviously) that even if one gets a significant "effect" of time,
> the analysis doesn't give any clue as to how time influences the test (the
> same goes for the interaction term).
> But a more appropriate tool would probably be some sort of GAM lm, which is
> based on models I don't have much understanding of (yet).
> I am using this test since the researcher for whom I am doing the analysis
> asked me to use it (since this is what was done in the previous work on
> similar data, done by someone else).
> Due to the current stage of my ignorance, and the researcher's tendency
> towards this analysis - I am not sure how to proceed.

You may be able to get through with anova.mlm (little-a anova) and sphericity assumptions. However, I wouldn't trust the results.

These data are nowhere near normally distributed, and with the size of the data set and the pattern of many series of straight 4s, I don't think anyone has a chance of figuring out how this affects the p-values.

I'd rather do something like this (with the original "dat", before jittering):

First look at the average patterns per group:

> aggregate(dat[-1],dat[1],mean)
          DC week6 week7    week8    week9   week10   week11
1    control     4     4 4.000000 3.900000 3.900000 3.900000
2 head (20g)     4     4 3.894737 3.789474 3.736842 3.736842
    week12   week13   week14   week15   week16   week17
1 3.900000 3.900000 3.900000 3.850000 3.850000 3.750000
2 3.736842 3.684211 3.526316 3.421053 3.368421 3.315789
> matplot(t(aggregate(dat[-1],dat[1],mean)[-1]))

which looks promising and roughly linear. However, the slopes might differ between subjects and this would be the appropriate variation to gauge the mean slope differences against. So let's compute the individual slopes:

> slope <- apply(dat[-1],1,function(x)coef(lm(x~I(1:12)))[2])
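
For anyone who wants to try the slope step without the original data, here is a minimal sketch on made-up scores. The data frame "toy", the group sizes and the score probabilities are assumptions for illustration only; the layout (group factor DC in column 1, twelve weekly columns after it) is the only thing taken from the thread. It also checks the lm() slopes against the closed form cov(weeks, x)/var(weeks).

```r
# Hypothetical stand-in for "dat": column 1 is the group factor DC,
# columns 2..13 are the weekly scores week6 ... week17 (values are made up).
set.seed(1)
n <- 39
toy <- data.frame(
  DC = factor(rep(c("control", "head (20g)"), c(20, 19))),
  matrix(sample(3:4, n * 12, replace = TRUE, prob = c(0.1, 0.9)),
         nrow = n, dimnames = list(NULL, paste0("week", 6:17)))
)

weeks <- 1:12

# One least-squares fit per subject (row); [2] picks the slope coefficient.
slope_lm <- apply(toy[-1], 1, function(x) coef(lm(x ~ weeks))[2])

# The same slope in closed form, without calling lm() 39 times.
slope_cf <- apply(toy[-1], 1, function(x) cov(weeks, x) / var(weeks))

# Both routes should agree up to rounding error.
stopifnot(isTRUE(all.equal(unname(slope_lm), unname(slope_cf))))
```

Each subject is reduced to a single number, so the between-subject variation in slope becomes the yardstick for the group comparison.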

We can compare these between the groups with a t test:

> t.test(slope~dat$DC)

	Welch Two Sample t-test

data:  slope by dat$DC 
t = 1.6138, df = 27.189, p-value = 0.1181
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.01217805  0.10203819 
sample estimates:
   mean in group control mean in group head (20g) 
             -0.01800699              -0.06293706 
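
To read the interval: t.test() reports the first factor level minus the second, so the confidence interval is centred on mean(control) minus mean(head). A small sketch, using only the numbers printed above, checks this and rescales the per-week slope difference to the 11 week-to-week steps between week 6 and week 17. The variable names are mine, and the rescaling is just arithmetic on the printed output, not a new analysis.

```r
# Numbers copied from the t.test output above.
ci        <- c(-0.01217805, 0.10203819)
m_control <- -0.01800699
m_head    <- -0.06293706

# Difference in mean slope (score units per week), control minus head.
diff_means <- m_control - m_head

# The Welch interval is symmetric around that difference.
stopifnot(abs(mean(ci) - diff_means) < 1e-7)

# Implied difference in total change from week 6 to week 17 (11 steps).
over_period <- 11 * c(estimate = diff_means, lower = ci[1], upper = ci[2])
print(over_period)
```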

However, looking more carefully at the data, we realize that many slopes are exactly zero, so a nonparametric test might be in order. It doesn't change anything, though:

> wilcox.test(slope~dat$DC)

	Wilcoxon rank sum test with continuity correction

data:  slope by dat$DC 
W = 232, p-value = 0.09845
alternative hypothesis: true location shift is not equal to 0 

Warning message:
In wilcox.test.default(x = c(-2.36672330631823e-16, -2.36672330631823e-16,  :
  cannot compute exact p-value with ties
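
About the warning: with ties, wilcox.test() cannot compute the exact p-value and falls back to the normal approximation, so the p-value shown is already the approximate one. Asking for exact = FALSE gives the same result without the warning. The sketch below shows this on made-up slopes (the vectors "s" and "g" are assumptions, not the real data). It also adds a Monte Carlo permutation test on the difference in means, which is a technique I am swapping in here as one possible tie-tolerant check; it was not part of the analysis above.

```r
set.seed(2)
# Made-up slopes with many exact zeros, mimicking the tie pattern.
g <- factor(rep(c("control", "head (20g)"), c(20, 19)))
s <- c(sample(c(0, -0.02, -0.05), 20, replace = TRUE),
       sample(c(0, -0.05, -0.10), 19, replace = TRUE))

# Default call warns about ties and uses the normal approximation.
w_default <- suppressWarnings(wilcox.test(s ~ g))
# Requesting the approximation directly: same p-value, no warning.
w_approx  <- wilcox.test(s ~ g, exact = FALSE)
stopifnot(isTRUE(all.equal(w_default$p.value, w_approx$p.value)))

# Monte Carlo permutation test: shuffle group labels, recompute the
# difference in group means, and see how often it is at least as extreme.
obs    <- diff(tapply(s, g, mean))
perm   <- replicate(9999, diff(tapply(s, sample(g), mean)))
p_perm <- (1 + sum(abs(perm) >= abs(obs))) / (1 + length(perm))
```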
-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
