[R] summary.manova rank deficiency error + data

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Aug 13 10:34:55 CEST 2008


Pedro Mardones wrote:
> Dear R-users;
>
> Previously I posted a question about the problem of rank deficiency in
> summary.manova. As somebody suggested, I'm attaching a small part of
> the data set.
>
> #***************************************************
>
> "test" <-
>
> structure(.Data = list(structure(.Data = c(rep(1,3),rep(2,18),rep(3,10)),
> levels = c("1", "2", "3"),
> class = "factor")
>
> ,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
> 0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
> 0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)
>
> ,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
> 0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
> 0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)
>
> ,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
> 0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
> 0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)
>
> ,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
> 0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
> 0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)
>
> ,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
> 0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
> 0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
> )
> ,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
> ,row.names = seq(1:31)
> ,class = "data.frame"
> )
>
> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>
> #Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test),  :
>   residuals have rank 3 < 5
>
> #***************************************************
>
> What I don't understand is why SAS returns no errors using PROC GLM
> for the same data set. Is because PROC GLM doesn't take into account
> problems of rank deficiency? So, should I trust manova instead of PROC
> GLM output? I know it can be a touchy question but I would like to
> receive some insights.
> Thanks
> PM
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   
What you have here is extremely correlated data:

 > (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
Y1 Y2 Y3 Y4 Y5
Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
 > eigen(V)
$values
[1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12

$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 0.4503756 0.61213579 0.5204920 -0.3485941 0.1732681
[2,] 0.4491807 0.32333236 -0.1873653 0.5929444 -0.5540795
[3,] 0.4476157 0.01442094 -0.5498688 0.1272921 0.6934503
[4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
[5,] 0.4432397 -0.65052351 0.5378809 0.2840428 0.1017918

Notice the more than 9 orders of magnitude between the eigenvalues.

I think that what is happening is that what SAS calls MANOVA is actually 
looking at within-row contrasts, which effectively removes the largest 
eigenvalue. In R, the equivalent would be

 > anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
Analysis of Variance Table


Contrasts orthogonal to
~1

Df Wilks approx F num Df den Df Pr(>F)
(Intercept) 1 0.037 164.873 4 25 <2e-16 ***
GROUP 2 0.701 1.215 8 50 0.3098
Residuals 28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

or (this could be computationally more precice, but in fact it gives the 
same result)

 > anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



More information about the R-help mailing list