[BioC] Re: Manova nuances
Liaw, Andy
andy_liaw at merck.com
Mon Nov 24 15:30:27 MET 2003
I did a bit of sanity check. If anyone can tell me what I'm missing, I'll
be very grateful.
I did a small simulation to check the empirical level (type I error) of the
procedure as described by Dr. Baker, compared with the MANOVA tests
available in summary.manova(). The setup is as follows (R code shown
below):
o Simple oneway layout with three groups. n=50 in each group.
o Response is 10, 20 or 30 dimensional, independent standard normal.
o Repeat the simulation 1000 times and count the percent of times null is
rejected.
The result looks like this (took 9 minutes on my P3 933MHz laptop):
Dr. Baker's suggested procedure:
> apply(pval.pc, 2, function(x) mean(x <= 0.05))
10 20 30
0.439 0.638 0.810
Pillai's trace from summary.manova:
> apply(pval.m, 2, function(x) mean(x <= 0.05))
10 20 30
0.039 0.052 0.055
The problem with using minimum of p-values for univariate ANOVA on PCs, as I
see it, is the absense of multiplicity adjustment, which is taken care of in
the multivariate test. I suppose one way to "fix" it is to use p.adjust().
Here's the R code for the simulation:
==============================================
n <- 150
p <- c(10, 20, 30)
x <- factor(rep(letters[1:3], each=n/3))
nreps <- 1000
pval.pc <- pval.m <- matrix(NA, nreps, length(p),
dimnames=list(NULL, p))
system.time(
for (i in 1:length(p)) {
for (j in 1:nreps) {
y <- matrix(rnorm(n*p[i]), n, p[i])
y.pc <- princomp(y)$scores
pval.pc[j, i] <- min(sapply(summary(aov(y.pc ~ x)),
function(x) x["x", "Pr(>F)"]))
pval.m[j, i] <- summary(manova(y ~ x))$stat["x", "Pr(>F)"]
}
})
=============================================
Best,
Andy
> From: Stephen P. Baker
>
> Michael,
> One of the matrices output by svd should have dimensions = k
> by n or n by k where k is number of original variates and n
> the number of chips, this would be a set of eigenvectors.
> This matrix times the vector of expression levels from one
> chip should produce a vector of length n with values for the
> new components. There should also be either a vector or
> diagonal matrix ouput with values that are in descending
> order; the first corresponds to the variation in the first
> component, the second the next highest, etc., these should be
> the eigenvalues. Eigenvalues greater than 1 can be
> interpreted as indicating that component accounts for a
> significant portion of the variation in the original variables.
>
> -.- -.. .---- .--. ..-.
> Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625
> Senior Biostatistician
> (775) 254-4885 fax
> Academic Computing Services
> Lecturer in Biostatistics , Graduate School of Biomedical
> Sciences University of Massachusetts Medical School
> 55 Lake Avenue North
> stephen.baker at umassmed.edu
> Worcester, MA 01655 USA
>
> ----- Original Message -----
> From: "Michael Benjamin" <msb1129 at bellsouth.net>
> To: "'Stephen P. Baker'" <stephen.baker at umassmed.edu>;
> <bioconductor at stat.math.ethz.ch>
> Sent: Sunday, November 23, 2003 8:53 PM
> Subject: RE: [BioC] Re: Manova nuances
>
>
> > This sounds very reasonable. I'm having a bit of trouble with the
> > implementation.
> >
> > How do you solve for a variable? I know u * diag(d) * t(v)
> gets your
> > original data, but how do you pick variables? Just taking
> the first
> > four columns of u or v alone doesn't work--I tried. There
> must be a
> > way to combine d, u, and v to represent the first few variables in
> > low-dimensional space.
> >
> > In other words, after you do svd, then what? What do you
> compare? U?
> > V?
> >
> > Thanks,
> > Mike
> >
> >
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf
> Of Stephen
> > P. Baker
> > Sent: Saturday, November 22, 2003 8:57 AM
> > To: Michael Benjamin; bioconductor at stat.math.ethz.ch
> > Subject: Re: [BioC] Re: Manova nuances
> >
> > The principal components are orthogonal and independent and measure
> > different things so it makes no sense to compare them, like
> comparing
> > horizontal to vertical or heart rate to IQ.
> >
> > Treat the components like variables and perform the same
> analysis on
> > them with ANOVA that you would have with MANOVA. Like 2
> groups, do a
> > t-test, 3
> > do ANOVA, whatever analysis is appropriate for your
> experimental design.
> > If
> > ANY of these are significant, the MANOVA would have been
> significant.
> >
> > Stephen
> > -.- -.. .---- .--. ..-.
> > Stephen P. Baker, MScPH , PhD(ABD)
> (508) 856-2625
> > Senior Biostatistician
> > (775) 254-4885 fax
> > Academic Computing Services
> > Lecturer in Biostatistics , Graduate School of Biomedical Sciences
> > University of Massachusetts Medical School
> > 55 Lake Avenue North
> stephen.baker at umassmed.edu
> > Worcester, MA 01655 USA
> >
> > ----- Original Message -----
> > From: "Michael Benjamin" <msb1129 at bellsouth.net>
> > To: "'Baker, Stephen'" <Stephen.Baker at umassmed.edu>; "'Liaw, Andy'"
> > <andy_liaw at merck.com>; <bioconductor at stat.math.ethz.ch>
> > Sent: Friday, November 21, 2003 11:11 PM
> > Subject: RE: [BioC] Re: Manova nuances
> >
> >
> > > Can I do instead:
> > > comps1<-svd(teset[group1])$d
> > > comps2<-svd(teset[group2])$d
> > > t.test(comps1,comps2)
> > >
> > > Maybe I could just compare the top two or three components to one
> > > another?
> > >
> > > Mike
> > >
> > > -----Original Message-----
> > > From: Baker, Stephen [mailto:Stephen.Baker at umassmed.edu]
> > > Sent: Friday, November 21, 2003 3:05 PM
> > > To: Liaw, Andy; bioconductor at stat.math.ethz.ch
> > > Cc: msb1129 at bellsouth.net
> > > Subject: RE: [BioC] Re: Manova nuances
> > >
> > > Andy et al.
> > >
> > > (Thanks for correcting my typo on the spelling of "principal").
> > >
> > > Yes I know that ANOVA of n principal components will result in n
> > > p-values, however the SMALLEST p-value will be equivalent to a
> > > multivariate test of his hypotheses on his data.
> > >
> > > MANOVA and univariate ANOVA on the principal components are
> > essentially
> > > equivalent in theory and quite similar in that both approaches
> > > involve the characteristic roots and functions of the same design
> > > and
> > covariance
> > > matrices.
> > >
> > > The equivalence is based on the fact that multivariate hypotheses
> > > will be rejected only if the equivalent univariate
> hypotheses do not
> > > hold
> > for
> > > all variates (Morrison,1976). Principle components simply
> > > transforms the original variates into new variates which conserve
> > > all the
> > original
> > > information. Michael Benjamin's problem is that he CANNOT
> run MANOVA
> > as
> > > he has fewer cases than variates however my suggested
> approach WOULD
> > > work.
> > >
> > > With regards to Michael's request/need for a SINGLE SUMMARY
> > > STATISTIC, he would use the minimum of the p-values for the
> > > appropriate effect
> > from
> > > the univariate ANOVA's on the principal components as his single
> > > p-value. These are orthogonal tests and the minimum would be
> > equivalent
> > > to testing the same hypotheses with MANOVA on his dataset.
> > >
> > >
> > > The only caveat is that with K genes and n<K and he will
> be able to
> > test
> > > his hypotheses on the first n principal components which
> account for
> > the
> > > largest portions of the variation. However, in my 20 years
> > experience,
> > > in most datasets the number of "significant" components (with
> > > eigenvalues >1) is usually much smaller than the number
> of variates.
> > It
> > > would be unusual for any real biological effect to not be
> > > represented among one or more of the first n components
> given n is
> > > not too small.
> > In
> > > his case that's 35 and I think that's probably enough.
> > >
> > > Best wishes
> > > Stephen
> > >
> > > -.- -.. .---- .--. ..-.
> > > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625
> > > Sr. Biostatistician- Information Services
> > > Lecturer in Biostatistics (775) 254-4885 fax
> > > Graduate School of Biomedical Sciences
> > > University of Massachusetts Medical School, Worcester
> > > 55 Lake Avenue North
> > stephen.baker at umassmed.edu
> > > Worcester, MA 01655 USA
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Liaw, Andy [mailto:andy_liaw at merck.com]
> > > Sent: Friday, November 21, 2003 8:13 AM
> > > To: Baker, Stephen; bioconductor at stat.math.ethz.ch
> > > Cc: 'msb1129 at bellsouth.net'
> > > Subject: RE: [BioC] Re: Manova nuances
> > >
> > >
> > > > From: Stephen P. Baker [mailto:stephen.baker at umassmed.edu]
> > > >
> > > > Principle component analyses should reduce your data array to
> > > ^^^^^^^^^
> > > Principal
> > >
> > > > as many independent components as you have samples, and
> for each
> > > > sample get a score for each dimension. These will have
> the same
> > > > total information as the original data. These can then be
> > > > analysed separately with univariate anova but since these are
> > > > "orthogonal" analyses, multiple comparisons adjustments
> would not
> > > > be needed.
> > >
> > > The analysis you described is quite different than MANOVA, so the
> > > conclusion/interpretation would be quite different, too. MANOVA
> > > treats the data as coming from multivariate normal
> distribution, and
> > > tests whether all groups have the same mean vector. What you
> > described
> > > is n (number of samples) ANOVA analyses that gives n p-values.
> > >
> > > Cheers,
> > > Andy
> > > Andy Liaw, PhD
> > > Biometrics Research PO Box 2000, RY33-300
> > > Merck Research Labs Rahway, NJ 07065
> > > mailto:andy_liaw at merck.com 732-594-0820
> > >
> > >
> > >
> > > > -.- -.. .---- .--. ..-.
> > > > Stephen P. Baker, MScPH , PhD(ABD) (508)
> > 856-2625
> > > > Senior Biostatistician
> > > > (775) 254-4885 fax
> > > > Academic Computing Services
> > > > Lecturer in Biostatistics , Graduate School of
> Biomedical Sciences
> > > > University of Massachusetts Medical School 55 Lake Avenue North
> > > > stephen.baker at umassmed.edu
> > > > Worcester, MA 01655 USA
> > > > --------------------------------------------------------------
> > > > --------------
> > > > ----
> > > > Date: Fri, 21 Nov 2003 00:18:54 -0500
> > > > From: "Michael Benjamin" <msb1129 at bellsouth.net>
> > > > Subject: [BioC] Manova nuances
> > > > To: <bioconductor at stat.math.ethz.ch>
> > > > Message-ID: <003401c3afee$f7eff000$7a05fea9 at amd>
> > > > Content-Type: text/plain; charset="US-ASCII"
> > > >
> > > >
> > > > Anybody here using manova? It's powerful and pretty
> fast, but I'm
> > > > finding that you can't have more variables than samples (limits
> > > > its applicability to microarray research). Is there any
> way around
> > > > this? Assume
> > > >
> > > > dim(eset)
> > > >
> > > > 1200 35
> > > >
> > > > transeset<-t(eset)
> > > > fit<-manova(transeset ~ categories)
> > > > summary(fit)
> > > >
> > > > There is probably a complicated mathematical truth that
> underlies
> > > > this limitation--if anybody can shed some light, that would be
> > > > great.
> > > >
> > > > Also, if anyone knows of a quick, free multivariate tool that
> > > > summarizes all the tests into a single test statistic,
> that would
> > > > be much appreciated.
> > > >
> > > > Regards,
> > > > Michael Benjamin, MD
> > > > Emory University
> > > > Winship Cancer Institute
> > > >
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at stat.math.ethz.ch
> > > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor
> > > >
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
> >
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor
>
More information about the Bioconductor
mailing list