[BioC] "validity" of p-values

Rafael A. Irizarry ririzarr at jhsph.edu
Sun Sep 28 18:23:05 MEST 2003


Remember that a p-value means "chance of seeing something at least as
extreme as what we saw, given the null".
If the null isn't true, then the p-value no longer means what
we think it means. Beware that many ANOVA models make assumptions about
normality that are hard to defend when studying microarray data. With
so few arrays we can't rely on the central limit theorem, so we are stuck
hoping the assumptions of normality hold, and they become part of the
null hypothesis. I think we are sometimes overoptimistic in thinking
the "statistical model is set up correctly".
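
To make the point about normality concrete, here is a minimal simulation
sketch. It is not the mixed model discussed in this thread. It assumes a
plain pooled two-sample t-test with two observations per group, and the
lognormal distribution and its parameters are arbitrary choices for
illustration. Both groups are always drawn from the same distribution, so
every rejection is a false positive, and the printed rates can be compared
with the nominal 5%.

```python
import math
import random


def t_test_p_df2(x, y):
    """Two-sided p-value of the pooled two-sample t-test, 2 observations per group.

    With 2 + 2 observations the statistic has 2 degrees of freedom, where the
    Student t CDF has the closed form F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    so the two-sided p-value is 1 - |t| / sqrt(t^2 + 2).
    """
    assert len(x) == 2 and len(y) == 2
    mx, my = sum(x) / 2, sum(y) / 2
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / 2                        # df = 2 + 2 - 2 = 2
    se = math.sqrt(pooled_var * (1 / 2 + 1 / 2))
    if se == 0:                                # degenerate case: no within-group spread
        return 1.0 if mx == my else 0.0
    t = (mx - my) / se
    return 1 - abs(t) / math.sqrt(t * t + 2)


def null_rejection_rate(draw, alpha=0.05, n_sim=20000, seed=1):
    """Fraction of simulated null data sets with p < alpha.

    `draw` takes a random.Random instance and returns one observation.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [draw(rng), draw(rng)]
        y = [draw(rng), draw(rng)]
        if t_test_p_df2(x, y) < alpha:
            rejections += 1
    return rejections / n_sim


# Normal data: the test's assumptions hold, so the rate should sit near alpha.
normal_rate = null_rejection_rate(lambda rng: rng.gauss(0, 1))
# Skewed data (assumed lognormal): the assumptions fail, so the rate may drift.
skewed_rate = null_rejection_rate(lambda rng: rng.lognormvariate(0, 1.5))

print(f"false positive rate, normal data: {normal_rate:.4f}")
print(f"false positive rate, skewed data: {skewed_rate:.4f}")
```

If the second rate differs from 0.05, then a reported "p < 0.05" does not
mean what the normal theory says it means, even though the null of "no
difference between groups" is true.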

... and then you have the multiple comparison problem!
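
As a rough illustration of the multiple comparison problem, the sketch
below computes the expected number of false positives at a fixed p-value
cutoff and applies two standard corrections, Bonferroni and
Benjamini-Hochberg. The p-values and the count of 20,000 tests are made up
for the example; the thread does not state how many genes were tested.

```python
def expected_false_positives(n_tests, threshold):
    """Expected number of p-values below `threshold` if every null is true."""
    return n_tests * threshold


def bonferroni(pvalues, alpha=0.05):
    """Reject p_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Finds the largest rank k with p_(k) <= k * alpha / m and rejects the
    k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


# Hypothetical numbers: 20,000 tests at a cutoff of p = 0.0001.
print(expected_false_positives(20000, 0.0001))

# Made-up p-values for eight tests.
pvals = [0.0001, 0.004, 0.015, 0.03, 0.2, 0.5, 0.7, 0.9]
print(sum(bonferroni(pvals)), "rejected by Bonferroni")
print(sum(benjamini_hochberg(pvals)), "rejected by Benjamini-Hochberg")
```

Both corrections take the p-values as given, so they only help if the
p-values were valid in the first place.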

-r

On Sun, 28 Sep 2003, Jenny Drnevich wrote:

> See below...
> 
> >>However, have you seen: Chu, Weir, & Wolfinger.  A systematic
> >> statistical linear modeling approach to oligonucleotide array
> >> experiments MATH BIOSCI 176 (1): 35-51 Sp. Iss. SI MAR 2002
> >>They advocate using the probe-level data in a linear mixed model.
> >> Assuming that each probe is an independent measure (which I know is not
> >> true because many of them overlap, but I'm ignoring this for now),
> >> using probe-level data gives 14-20 "replicates" per chip. We've based
> >> our analysis methods on this, and with two biological replicates per
> >> genetic line, and three genetic lines per phenotypic group, we've been
> >> able to detect as little as a 15% difference in gene expression at
> >> p=0.0001 (we only expect 2 FP and get 60 genes with p=0.0001).
> >
> > Mmmm. Getting very low p-values from just two biological replicates
> > doesn't lead you to question the validity of the p-values?? :)
> 
> But we don't just have two biological replicates. We're interested in
> consistent gene expression differences between phenotype 1 and phenotype
> 2. We looked at three different genetic lines showing phenotype 1 and
> three other lines that had phenotype 2. We made two biological replicates
> of each line, and the expression level of each gene was estimated by 14
> probes. By running a mixed-model ANOVA separately for each gene with
> phenotype, line (nested within phenotype), probe, and all second-order
> interactions, the phenotype comparison has around 120 df (or so, off the
> top of my head). That's how we can detect a 15% difference in gene
> expression. As long as the statistical model is set up correctly, I never
> "question" the validity of p-values, although I might question the
> biological significance... :)
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>


