[R] summary.manova rank deficiency error + data

Wed Aug 13 16:51:26 CEST 2008

Thanks for the reply. The SAS output is attached but seems to me that
doesn't correspond to the wihtin-row contrasts as you suggested. By
the way, yes the data are highly correlated, in fact each row
correspond to the first part of a signal vector. Thanks anyway....
PM

                                      The GLM Procedure
                              Multivariate Analysis of Variance
                                    E = Error SSCP Matrix
                   y1                y2                y3
  y4                y5
 y1      0.0353518799       0.035256904      0.0351327804
0.0349749601      0.0347868018
 y2       0.035256904      0.0351627227      0.0350395053
0.0348827098      0.0346956744
 y3      0.0351327804      0.0350395053      0.0349173343
0.0347617352      0.0345760232
 y4      0.0349749601      0.0348827098      0.0347617352
0.0346075203      0.0344233531
 y5      0.0347868018      0.0346956744      0.0345760232
0.0344233531      0.0342409225

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
      DF = 28             y1             y2             y3
y4             y5
      y1            1.000000       0.999992       0.999967
0.999921       0.999852
                                     <.0001         <.0001
<.0001         <.0001
      y2            0.999992       1.000000       0.999991
0.999963       0.999911
                      <.0001                        <.0001
<.0001         <.0001
      y3            0.999967       0.999991       1.000000
0.999990       0.999958
                      <.0001         <.0001
<.0001         <.0001
      y4            0.999921       0.999963       0.999990
1.000000       0.999989
                      <.0001         <.0001         <.0001
           <.0001
      y5            0.999852       0.999911       0.999958
0.999989       1.000000
                      <.0001         <.0001         <.0001         <.0001

The SAS System     10:33 Wednesday, August 13, 2008   8
                                      The GLM Procedure
                              Multivariate Analysis of Variance
                              H = Type III SSCP Matrix for group
                   y1                y2                y3
  y4                y5
 y1      0.0023822408       0.002365848      0.0023471328
0.0023261249      0.0023030993
 y2       0.002365848      0.0023495679      0.0023309816
0.0023101183      0.0022872511
 y3      0.0023471328      0.0023309816      0.0023125426
0.0022918453      0.0022691608
 y4      0.0023261249      0.0023101183      0.0022918453
0.0022713359      0.0022488593
 y5      0.0023030993      0.0022872511      0.0022691608
0.0022488593      0.0022266141

                 Characteristic Roots and Vectors of: E Inverse * H, where
                              H = Type III SSCP Matrix for group
                                    E = Error SSCP Matrix
Characteristic           Characteristic Vector  V'EV=1
          Root  Percent            y1            y2            y3
      y4            y5
    0.41840103    71.72     -7542.628     17131.814      5347.394
-31627.317     16700.100
    0.16496011    28.28     -4180.854     -4413.446     32096.035
-35545.204     12040.697
    0.00000001     0.00    -41004.875    107291.004    -95905.664
32641.189     -3028.470
    0.00000000     0.00      -416.226      -111.206       410.721
 295.193      -171.953
    0.00000000     0.00    -14678.651      5787.997     54718.250
-69055.249     23218.580

   MANOVA Test Criteria and F Approximations for the Hypothesis of No
Overall group Effect
                             H = Type III SSCP Matrix for group
                                    E = Error SSCP Matrix
                                     S=2    M=1    N=11
       Statistic                        Value    F Value    Num DF
Den DF    Pr > F
       Wilks' Lambda               0.60518744       1.37        10
   48    0.2227
       Pillai's Trace              0.43658228       1.40        10
   50    0.2095
       Hotelling-Lawley Trace      0.58336114       1.37        10
33.362    0.2385
       Roy's Greatest Root         0.41840103       2.09         5
   25    0.1000

On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
<p.dalgaard at biostat.ku.dk> wrote:
> Pedro Mardones wrote:
>>
>> Dear R-users;
>>
>> Previously I posted a question about the problem of rank deficiency in
>> summary.manova. As somebody suggested, I'm attaching a small part of
>> the data set.
>>
>> #***************************************************
>>
>> "test" <-
>>
>> structure(.Data = list(structure(.Data = c(rep(1,3),rep(2,18),rep(3,10)),
>> levels = c("1", "2", "3"),
>> class = "factor")
>>
>>
>> ,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
>>
>> 0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
>>
>> 0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)
>>
>>
>> ,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
>>
>> 0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
>>
>> 0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)
>>
>>
>> ,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
>>
>> 0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
>>
>> 0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)
>>
>>
>> ,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
>>
>> 0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
>>
>> 0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)
>>
>>
>> ,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
>>
>> 0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
>>
>> 0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
>> )
>> ,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
>> ,row.names = seq(1:31)
>> ,class = "data.frame"
>> )
>>
>> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>>
>> #Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test),
>>  :
>>  residuals have rank 3 < 5
>>
>> #***************************************************
>>
>> What I don't understand is why SAS returns no errors using PROC GLM
>> for the same data set. Is because PROC GLM doesn't take into account
>> problems of rank deficiency? So, should I trust manova instead of PROC
>> GLM output? I know it can be a touchy question but I would like to
>> receive some insights.
>> Thanks
>> PM
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> What you have here is extremely correlated data:
>
>> (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
> Y1 Y2 Y3 Y4 Y5
> Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
> Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
> Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
> Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
> Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
>> eigen(V)
> $values
> [1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12
>
> $vectors
> [,1] [,2] [,3] [,4] [,5]
> [1,] 0.4503756 0.61213579 0.5204920 -0.3485941 0.1732681
> [2,] 0.4491807 0.32333236 -0.1873653 0.5929444 -0.5540795
> [3,] 0.4476157 0.01442094 -0.5498688 0.1272921 0.6934503
> [4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
> [5,] 0.4432397 -0.65052351 0.5378809 0.2840428 0.1017918
>
> Notice the more than 9 orders of magnitude between the eigenvalues.
>
> I think that what is happening is that what SAS calls MANOVA is actually
> looking at within-row contrasts, which effectively removes the largest
> eigenvalue. In R, the equivalent would be
>
>> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
> Analysis of Variance Table
>
>
> Contrasts orthogonal to
> ~1
>
> Df Wilks approx F num Df den Df Pr(>F)
> (Intercept) 1 0.037 164.873 4 25 <2e-16 ***
> GROUP 2 0.701 1.215 8 50 0.3098
> Residuals 28
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> or (this could be computationally more precice, but in fact it gives the
> same result)
>
>> anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
>
>
> --
>  O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>
>