[R] summary.manova rank deficiency error + data
Pedro Mardones
mardones.p at gmail.com
Wed Aug 13 16:51:26 CEST 2008
Thanks for the reply. The SAS output is attached, but it seems to me that
it doesn't correspond to the within-row contrasts you suggested. By
the way, yes, the data are highly correlated; in fact, each row
corresponds to the first part of a signal vector. Thanks anyway.
PM
The GLM Procedure
Multivariate Analysis of Variance
E = Error SSCP Matrix
              y1            y2            y3            y4            y5
y1  0.0353518799   0.035256904  0.0351327804  0.0349749601  0.0347868018
y2   0.035256904  0.0351627227  0.0350395053  0.0348827098  0.0346956744
y3  0.0351327804  0.0350395053  0.0349173343  0.0347617352  0.0345760232
y4  0.0349749601  0.0348827098  0.0347617352  0.0346075203  0.0344233531
y5  0.0347868018  0.0346956744  0.0345760232  0.0344233531  0.0342409225
Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 28          y1          y2          y3          y4          y5
y1         1.000000    0.999992    0.999967    0.999921    0.999852
                         <.0001      <.0001      <.0001      <.0001
y2         0.999992    1.000000    0.999991    0.999963    0.999911
             <.0001                  <.0001      <.0001      <.0001
y3         0.999967    0.999991    1.000000    0.999990    0.999958
             <.0001      <.0001                  <.0001      <.0001
y4         0.999921    0.999963    0.999990    1.000000    0.999989
             <.0001      <.0001      <.0001                  <.0001
y5         0.999852    0.999911    0.999958    0.999989    1.000000
             <.0001      <.0001      <.0001      <.0001
The SAS System 10:33 Wednesday, August 13, 2008 8
H = Type III SSCP Matrix for group
              y1            y2            y3            y4            y5
y1  0.0023822408   0.002365848  0.0023471328  0.0023261249  0.0023030993
y2   0.002365848  0.0023495679  0.0023309816  0.0023101183  0.0022872511
y3  0.0023471328  0.0023309816  0.0023125426  0.0022918453  0.0022691608
y4  0.0023261249  0.0023101183  0.0022918453  0.0022713359  0.0022488593
y5  0.0023030993  0.0022872511  0.0022691608  0.0022488593  0.0022266141
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for group
E = Error SSCP Matrix
Characteristic             Characteristic Vector  V'EV=1
          Root  Percent           y1           y2           y3           y4           y5
    0.41840103    71.72    -7542.628    17131.814     5347.394   -31627.317    16700.100
    0.16496011    28.28    -4180.854    -4413.446    32096.035   -35545.204    12040.697
    0.00000001     0.00   -41004.875   107291.004   -95905.664    32641.189    -3028.470
    0.00000000     0.00     -416.226     -111.206      410.721      295.193     -171.953
    0.00000000     0.00   -14678.651     5787.997    54718.250   -69055.249    23218.580
MANOVA Test Criteria and F Approximations for the Hypothesis of No
Overall group Effect
H = Type III SSCP Matrix for group
E = Error SSCP Matrix
S=2 M=1 N=11
Statistic                    Value   F Value   Num DF   Den DF   Pr > F
Wilks' Lambda           0.60518744      1.37       10       48   0.2227
Pillai's Trace          0.43658228      1.40       10       50   0.2095
Hotelling-Lawley Trace  0.58336114      1.37       10   33.362   0.2385
Roy's Greatest Root     0.41840103      2.09        5       25   0.1000
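
As a cross-check on the SAS table above, the test criteria can be recomputed in R directly from the characteristic roots of E Inverse * H. This is only a sketch: the roots are typed in by hand from the printout (so they carry its 8-decimal rounding), and the variable names are made up for illustration.

```r
# Characteristic roots of E^{-1} H, copied by hand from the SAS printout
# (hypothetical variable names; values are rounded as printed)
roots <- c(0.41840103, 0.16496011, 0.00000001, 0, 0)

wilks     <- prod(1 / (1 + roots))     # Wilks' Lambda
pillai    <- sum(roots / (1 + roots))  # Pillai's Trace
hotelling <- sum(roots)                # Hotelling-Lawley Trace
roy       <- max(roots)                # Roy's Greatest Root, as SAS reports it

print(round(c(Wilks = wilks, Pillai = pillai,
              Hotelling = hotelling, Roy = roy), 6))
```

Only two roots are non-zero, which matches S=2 (the smaller of 5 responses and 2 group degrees of freedom), and the values should agree with the Value column up to the rounding of the printed roots. The Num DF of 10 is 5 responses times 2 group degrees of freedom, whereas the within-row-contrast table in the quoted reply has num Df 8, so the SAS analysis appears to use all five responses.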
On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
> Pedro Mardones wrote:
>>
>> Dear R-users;
>>
>> Previously I posted a question about the problem of rank deficiency in
>> summary.manova. As somebody suggested, I'm attaching a small part of
>> the data set.
>>
>> #***************************************************
>>
>> test <- data.frame(
>>   # group sizes are 3, 18 and 10 (31 rows in total)
>>   GROUP = factor(rep(1:3, c(3, 18, 10))),
>>   Y1 = c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
>>          0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
>>          0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733),
>>   Y2 = c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
>>          0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
>>          0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746),
>>   Y3 = c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
>>          0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
>>          0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826),
>>   Y4 = c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
>>          0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
>>          0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001),
>>   Y5 = c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
>>          0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
>>          0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
>> )
>>
>> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>>
>> # Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test), :
>> #   residuals have rank 3 < 5
>>
>> #***************************************************
>>
>> What I don't understand is why SAS returns no errors using PROC GLM
>> for the same data set. Is it because PROC GLM doesn't take into account
>> problems of rank deficiency? So, should I trust manova instead of PROC
>> GLM output? I know it can be a touchy question but I would like to
>> receive some insights.
>> Thanks
>> PM
>>
>
> What you have here is extremely correlated data:
>
>> (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
>             Y1          Y2          Y3          Y4          Y5
> Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
> Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
> Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
> Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
> Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
>> eigen(V)
> $values
> [1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12
>
> $vectors
>           [,1]        [,2]       [,3]       [,4]       [,5]
> [1,] 0.4503756  0.61213579  0.5204920 -0.3485941  0.1732681
> [2,] 0.4491807  0.32333236 -0.1873653  0.5929444 -0.5540795
> [3,] 0.4476157  0.01442094 -0.5498688  0.1272921  0.6934503
> [4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
> [5,] 0.4432397 -0.65052351  0.5378809  0.2840428  0.1017918
>
> Notice the more than 9 orders of magnitude between the eigenvalues.
>
> I think that what is happening is that what SAS calls MANOVA is actually
> looking at within-row contrasts, which effectively removes the largest
> eigenvalue. In R, the equivalent would be
>
>> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
> Analysis of Variance Table
>
>
> Contrasts orthogonal to
> ~1
>
>             Df Wilks approx F num Df den Df Pr(>F)
> (Intercept)  1 0.037  164.873      4     25 <2e-16 ***
> GROUP        2 0.701    1.215      8     50 0.3098
> Residuals   28
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> or (this could be computationally more precise, but in fact it gives the
> same result)
>
>> anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
>
>
> --
> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
>