[BioC] Re: "validity" of p-values
Gordon Smyth
smyth at wehi.edu.au
Mon Sep 29 14:03:13 MEST 2003
At 06:27 AM 29/09/2003, Jenny Drnevich wrote:
>See below...
>
> >>However, have you seen: Chu, Weir, & Wolfinger. A systematic
> >> statistical linear modeling approach to oligonucleotide array
> >> experiments MATH BIOSCI 176 (1): 35-51 Sp. Iss. SI MAR 2002
> >>They advocate using the probe-level data in a linear mixed model.
> >> Assuming that each probe is an independent measure (which I know is not
> >> true because many of them overlap, but I'm ignoring this for now),
> >> using probe-level data gives 14-20 "replicates" per chip. We've based
> >> our analysis methods on this, and with two biological replicates per
> >> genetic line, and three genetic lines per phenotypic group, we've been
> >> able to detect as little as a 15% difference in gene expression at
> >> p=0.0001 (we only expect 2 FP and get 60 genes with p=0.0001).
> >
> > Mmmm. Getting very low p-values from just two biological replicates
> > doesn't lead you to question the validity of the p-values?? :)
>
>But we don't just have two biological replicates. We're interested in
>consistent gene expression differences between phenotype 1 and phenotype
>2. We looked at three different genetic lines showing phenotype 1 and
>three other lines that had phenotype 2.
If I understand correctly, you have 6 arrays on each phenotype, all
biologically independent.
> We made two biological replicates
>of each line, and the expression level of each gene was estimated by 14
>probes. By running a mixed-model ANOVA separately for each gene with
>phenotype, line (nested within phenotype), probe, and all second-order
>interactions, the phenotype comparison has around 120 df (or so, off the
>top of my head).
There are only 2 phenotypes, so the phenotype comparison has 1 df. I think
what you mean is that you have something like 120 df for estimating the
variability of repeated measurements at the probe level. But this isn't the
most important variance component for comparing phenotypes. Your model, if
I understand it, neglects any variance component at the array level even
though your treatments (the phenotypes) are applied at the array level. You
are in a way treating the probes as if they were separate arrays, and one
doesn't have to be a mathematical statistician to question the validity of
that.
> That's how we can detect a 15% difference in gene
>expression. As long as the statistical model is set up correctly, I never
>"question" the validity of p-values, although I might question the
>biological significance... :)
You should! A famous and true saying goes "All statistical models are
wrong, but some are useful." It is incumbent on you to understand how the
assumptions of your statistical model relate to reality and how sensitive
your conclusions are to these assumptions.
There are actually deep reasons why, in my opinion, none of the statistical
methods for small numbers of arrays can produce p-values which are
believable in an absolute sense (and this includes my own methods in the
limma package).
The real test would be to try out your method on some data sets where the
answers are known, for example to apply the methods to some replicate arrays
hybridized with RNA from the same source. My guess is that the method would
detect a lot of spurious differential expression.
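The arithmetic behind this sanity check is simple. If G genes are tested at cutoff alpha, a method with valid p-values applied to same-source replicate arrays should flag about G * alpha genes. (G = 20,000 below is an assumed figure, chosen only because it makes the poster's "2 expected FP at p = 0.0001" come out; it is not stated in the thread.)

```python
# Expected false positives under a valid null, vs. the observed count.
# G is an assumed illustrative number of genes tested, not from the thread.
G, alpha = 20_000, 1e-4
expected_fp = G * alpha
observed = 60                  # genes the poster reports at this cutoff
excess = observed - expected_fp
print(expected_fp, excess)     # 2.0 58.0
```

On true null data (replicate arrays from the same RNA source) any sizeable excess over G * alpha is direct evidence that the p-values are anti-conservative rather than that there is real differential expression.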
Gordon