[BioC] Re: Manova nuances

Liaw, Andy andy_liaw at merck.com
Mon Nov 24 15:30:27 MET 2003


I did a bit of sanity check.  If anyone can tell me what I'm missing, I'll
be very grateful.

I did a small simulation to check the empirical level (type I error) of the
procedure as described by Dr. Baker, compared with the MANOVA tests
available in summary.manova().  The setup is as follows (R code shown
below):

o  Simple oneway layout with three groups.  n=50 in each group.
o  Response is 10, 20 or 30 dimensional, independent standard normal.
o  Repeat the simulation 1000 times and count the percent of times null is
rejected.

The result looks like this (took 9 minutes on my P3 933MHz laptop):

Dr. Baker's suggested procedure:
> apply(pval.pc, 2, function(x) mean(x <= 0.05))
   10    20    30 
0.439 0.638 0.810 

Pillai's trace from summary.manova:
> apply(pval.m, 2, function(x) mean(x <= 0.05))
   10    20    30 
0.039 0.052 0.055 

The problem with using minimum of p-values for univariate ANOVA on PCs, as I
see it, is the absense of multiplicity adjustment, which is taken care of in
the multivariate test.  I suppose one way to "fix" it is to use p.adjust().

Here's the R code for the simulation:

==============================================
n <- 150
p <- c(10, 20, 30)
x <- factor(rep(letters[1:3], each=n/3))
nreps <- 1000
pval.pc <- pval.m <- matrix(NA, nreps, length(p), 
                            dimnames=list(NULL, p))
system.time(
for (i in 1:length(p)) {
  for (j in 1:nreps) {
    y <- matrix(rnorm(n*p[i]), n, p[i])
    y.pc <- princomp(y)$scores
    pval.pc[j, i] <- min(sapply(summary(aov(y.pc ~ x)),
                                function(x) x["x", "Pr(>F)"]))
    pval.m[j, i] <- summary(manova(y ~ x))$stat["x", "Pr(>F)"]
  }
})
=============================================

Best,
Andy

> From: Stephen P. Baker
> 
> Michael,
> One of the matrices output by svd should have dimensions = k 
> by n or n by k where k is number of original variates and n 
> the number of chips, this would be a set of eigenvectors.  
> This matrix times the vector of expression levels from one 
> chip should produce a vector of length n with values for the 
> new components.  There should also be either a vector or 
> diagonal matrix ouput with values that are in descending 
> order; the first corresponds to the variation in the first 
> component, the second the next highest, etc., these should be 
> the eigenvalues.  Eigenvalues greater than 1 can be 
> interpreted as indicating that component accounts for a 
> significant portion of the variation in the original variables.
> 
> -.- -.. .---- .--. ..-.
> Stephen P. Baker, MScPH , PhD(ABD)                      (508) 856-2625
> Senior Biostatistician
> (775) 254-4885 fax
> Academic Computing Services
> Lecturer in Biostatistics , Graduate School of Biomedical 
> Sciences University of Massachusetts Medical School
> 55 Lake Avenue North                          
> stephen.baker at umassmed.edu
> Worcester, MA 01655  USA
> 
> ----- Original Message ----- 
> From: "Michael Benjamin" <msb1129 at bellsouth.net>
> To: "'Stephen P. Baker'" <stephen.baker at umassmed.edu>; 
> <bioconductor at stat.math.ethz.ch>
> Sent: Sunday, November 23, 2003 8:53 PM
> Subject: RE: [BioC] Re: Manova nuances
> 
> 
> > This sounds very reasonable.  I'm having a bit of trouble with the 
> > implementation.
> >
> > How do you solve for a variable?  I know u * diag(d) * t(v) 
> gets your 
> > original data, but how do you pick variables?  Just taking 
> the first 
> > four columns of u or v alone doesn't work--I tried.  There 
> must be a 
> > way to combine d, u, and v to represent the first few variables in 
> > low-dimensional space.
> >
> > In other words, after you do svd, then what?  What do you 
> compare? U? 
> > V?
> >
> > Thanks,
> > Mike
> >
> >
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf 
> Of Stephen 
> > P. Baker
> > Sent: Saturday, November 22, 2003 8:57 AM
> > To: Michael Benjamin; bioconductor at stat.math.ethz.ch
> > Subject: Re: [BioC] Re: Manova nuances
> >
> > The principal components are orthogonal and independent and measure 
> > different things so it makes no sense to compare them, like 
> comparing 
> > horizontal to vertical or heart rate to IQ.
> >
> > Treat the components like variables and perform the same 
> analysis on 
> > them with ANOVA that you would have with MANOVA.  Like 2 
> groups, do a 
> > t-test, 3
> > do ANOVA, whatever analysis is appropriate for your 
> experimental design.
> > If
> > ANY of these are significant, the MANOVA would have been 
> significant.
> >
> > Stephen
> > -.- -.. .---- .--. ..-.
> > Stephen P. Baker, MScPH , PhD(ABD)                      
> (508) 856-2625
> > Senior Biostatistician
> > (775) 254-4885 fax
> > Academic Computing Services
> > Lecturer in Biostatistics , Graduate School of Biomedical Sciences 
> > University of Massachusetts Medical School
> > 55 Lake Avenue North                          
> stephen.baker at umassmed.edu
> > Worcester, MA 01655  USA
> >
> > ----- Original Message -----
> > From: "Michael Benjamin" <msb1129 at bellsouth.net>
> > To: "'Baker, Stephen'" <Stephen.Baker at umassmed.edu>; "'Liaw, Andy'"
> > <andy_liaw at merck.com>; <bioconductor at stat.math.ethz.ch>
> > Sent: Friday, November 21, 2003 11:11 PM
> > Subject: RE: [BioC] Re: Manova nuances
> >
> >
> > > Can I do instead:
> > > comps1<-svd(teset[group1])$d
> > > comps2<-svd(teset[group2])$d
> > > t.test(comps1,comps2)
> > >
> > > Maybe I could just compare the top two or three components to one 
> > > another?
> > >
> > > Mike
> > >
> > > -----Original Message-----
> > > From: Baker, Stephen [mailto:Stephen.Baker at umassmed.edu]
> > > Sent: Friday, November 21, 2003 3:05 PM
> > > To: Liaw, Andy; bioconductor at stat.math.ethz.ch
> > > Cc: msb1129 at bellsouth.net
> > > Subject: RE: [BioC] Re: Manova nuances
> > >
> > > Andy et al.
> > >
> > > (Thanks for correcting my typo on the spelling of "principal").
> > >
> > > Yes I know that ANOVA of n principal components will result in n 
> > > p-values, however the SMALLEST p-value will be equivalent to a 
> > > multivariate test of his hypotheses on his data.
> > >
> > > MANOVA and univariate ANOVA on the principal components are
> > essentially
> > > equivalent in theory and quite similar in that both approaches 
> > > involve the characteristic roots and functions of the same design 
> > > and
> > covariance
> > > matrices.
> > >
> > > The equivalence is based on the fact that multivariate hypotheses 
> > > will be rejected only if the equivalent univariate 
> hypotheses do not 
> > > hold
> > for
> > > all variates (Morrison,1976).  Principle components simply 
> > > transforms the original variates into new variates which conserve 
> > > all the
> > original
> > > information. Michael Benjamin's problem is that he CANNOT 
> run MANOVA
> > as
> > > he has fewer cases than variates however my suggested 
> approach WOULD 
> > > work.
> > >
> > > With regards to Michael's request/need for a SINGLE SUMMARY 
> > > STATISTIC, he would use the minimum of the p-values for the 
> > > appropriate effect
> > from
> > > the univariate ANOVA's on the principal components as his single 
> > > p-value.  These are orthogonal tests and the minimum would be
> > equivalent
> > > to testing the same hypotheses with MANOVA on his dataset.
> > >
> > >
> > > The only caveat is that with K genes and n<K and he will 
> be able to
> > test
> > > his hypotheses on the first n principal components which 
> account for
> > the
> > > largest portions of the variation.  However, in my 20 years
> > experience,
> > > in most datasets the number of "significant" components (with 
> > > eigenvalues >1) is usually much smaller than the number 
> of variates.
> > It
> > > would be unusual for any real biological effect to not be 
> > > represented among one or more of the first n components 
> given n is 
> > > not too small.
> > In
> > > his case that's 35 and I think that's probably enough.
> > >
> > > Best wishes
> > > Stephen
> > >
> > > -.- -.. .---- .--. ..-.
> > > Stephen P. Baker, MScPH, PhD (ABD)            (508) 856-2625
> > > Sr. Biostatistician- Information Services
> > > Lecturer in Biostatistics                     (775) 254-4885 fax
> > > Graduate School of Biomedical Sciences
> > > University of Massachusetts Medical School, Worcester
> > > 55 Lake Avenue North
> > stephen.baker at umassmed.edu
> > > Worcester, MA 01655  USA
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Liaw, Andy [mailto:andy_liaw at merck.com]
> > > Sent: Friday, November 21, 2003 8:13 AM
> > > To: Baker, Stephen; bioconductor at stat.math.ethz.ch
> > > Cc: 'msb1129 at bellsouth.net'
> > > Subject: RE: [BioC] Re: Manova nuances
> > >
> > >
> > > > From: Stephen P. Baker [mailto:stephen.baker at umassmed.edu]
> > > >
> > > > Principle component analyses should reduce your data array to
> > >   ^^^^^^^^^
> > >   Principal
> > >
> > > > as many independent components as you have samples, and 
>  for each 
> > > > sample get a score for each dimension.  These will have 
> the same 
> > > > total information as the original data.  These can then be 
> > > > analysed separately with univariate anova but since these are 
> > > > "orthogonal" analyses, multiple comparisons adjustments 
> would not 
> > > > be needed.
> > >
> > > The analysis you described is quite different than MANOVA, so the 
> > > conclusion/interpretation would be quite different, too. MANOVA 
> > > treats the data as coming from multivariate normal 
> distribution, and 
> > > tests whether all groups have the same mean vector.  What you
> > described
> > > is n (number of samples) ANOVA analyses that gives n p-values.
> > >
> > > Cheers,
> > > Andy
> > > Andy Liaw, PhD
> > > Biometrics Research      PO Box 2000, RY33-300
> > > Merck Research Labs           Rahway, NJ 07065
> > > mailto:andy_liaw at merck.com        732-594-0820
> > >
> > >
> > >
> > > > -.- -.. .---- .--. ..-.
> > > > Stephen P. Baker, MScPH , PhD(ABD)                      (508)
> > 856-2625
> > > > Senior Biostatistician
> > > > (775) 254-4885 fax
> > > > Academic Computing Services
> > > > Lecturer in Biostatistics , Graduate School of 
> Biomedical Sciences 
> > > > University of Massachusetts Medical School 55 Lake Avenue North
> > > > stephen.baker at umassmed.edu
> > > > Worcester, MA 01655  USA
> > > > --------------------------------------------------------------
> > > > --------------
> > > > ----
> > > > Date: Fri, 21 Nov 2003 00:18:54 -0500
> > > > From: "Michael Benjamin" <msb1129 at bellsouth.net>
> > > > Subject: [BioC] Manova nuances
> > > > To: <bioconductor at stat.math.ethz.ch>
> > > > Message-ID: <003401c3afee$f7eff000$7a05fea9 at amd>
> > > > Content-Type: text/plain; charset="US-ASCII"
> > > >
> > > >
> > > > Anybody here using manova?  It's powerful and pretty 
> fast, but I'm 
> > > > finding that you can't have more variables than samples (limits 
> > > > its applicability to microarray research). Is there any 
> way around 
> > > > this? Assume
> > > >
> > > > dim(eset)
> > > >
> > > > 1200 35
> > > >
> > > > transeset<-t(eset)
> > > > fit<-manova(transeset ~ categories)
> > > > summary(fit)
> > > >
> > > > There is probably a complicated mathematical truth that 
> underlies 
> > > > this limitation--if anybody can shed some light, that would be 
> > > > great.
> > > >
> > > > Also, if anyone knows of a quick, free multivariate tool that 
> > > > summarizes all the tests into a single test statistic, 
> that would 
> > > > be much appreciated.
> > > >
> > > > Regards,
> > > > Michael Benjamin, MD
> > > > Emory University
> > > > Winship Cancer Institute
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at stat.math.ethz.ch 
> > > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor
> > > >
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch 
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
> >
> >
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch 
> https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor
>



More information about the Bioconductor mailing list