[BioC] limma, subsets of design matrix and p-values

Mike Schaffer mschaff at bu.edu
Thu Mar 10 20:15:29 CET 2005


I've been running limma for a few months and have a question about the 
p-values calculated from a large set of data vs. just a subset.


All of my 2-color array data are relative to an untreated sample, and 
each comparison is replicated three times.
For example:
	Treatment1 vs. untreated x 3,
	Treatment2 vs. untreated x 3,
	Treatment3 vs. untreated x 3
	...etc.

So I have a large MA object of all the data that is normalized within 
arrays and a design matrix created by:

design <- modelMatrix(targets,ref="untreated")


My confusion stems from the fact that if I run eBayes on the entire 
dataset (code below), I get different p-values (but the same M values) 
for TreatmentX vs. untreated than if I fit only the subset of data for 
one treatment (e.g. just Treatment1 vs. untreated).

For example,

fit <- lmFit(MA,design=design)
eb <- eBayes(fit)
eb$p.value[1:10,1]


gives different p-values than if I first subset to the Treatment1 vs. 
untreated data and then run lmFit.
For example,

design <- modelMatrix(targets[1:3,],ref="untreated")
# the first three arrays include ALL of the Treatment1 vs. untreated data
fit <- lmFit(MA[,1:3],design=design)
eb <- eBayes(fit)
eb$p.value[1:10]


Am I incorrect to assume that the p-values should be the same 
regardless of how much data is included in the MA object, as long as 
the design matrix has no overlap between experiments (e.g. treatment1 
vs. untreated data is distinct from treatment2 vs. untreated data) -- 
aside from the fact they are all relative to an untreated sample?

Or is the moderated t-statistic based on ALL of the data in the MA 
object regardless of each experiment's relationship to others?
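
To illustrate what might be going on, here is a minimal base-R sketch 
(not limma itself) using simulated data for a single gene; the variable 
names and numbers are made up for illustration. With an ordinary linear 
model, the Treatment1 estimate is the same either way, but the full fit 
pools the residual variance over all nine arrays, so the standard error, 
the residual degrees of freedom and the p-value differ from the 
three-array fit. If lmFit pools the per-gene variance across arrays in 
the same way, and eBayes then moderates it, that would explain the 
discrepancy.

```r
# Hypothetical simulated log-ratios for ONE gene: 3 treatments x 3 replicates,
# each array being "treatment vs. untreated".
set.seed(1)
treatment <- factor(rep(c("Treatment1", "Treatment2", "Treatment3"), each = 3))
M <- rnorm(9, mean = rep(c(1, 0, -1), each = 3), sd = 0.5)

# Full fit: one coefficient per treatment, no intercept.
full_fit <- lm(M ~ 0 + treatment)
# Subset fit: only the three Treatment1 arrays.
sub_fit <- lm(M[1:3] ~ 1)

full_row <- summary(full_fit)$coefficients["treatmentTreatment1", ]
sub_row  <- summary(sub_fit)$coefficients["(Intercept)", ]

# Same estimate (the mean of the three Treatment1 log-ratios) ...
print(c(full = unname(full_row["Estimate"]), subset = unname(sub_row["Estimate"])))

# ... but different residual degrees of freedom: 9 - 3 = 6 vs. 3 - 1 = 2 ...
print(c(full = df.residual(full_fit), subset = df.residual(sub_fit)))

# ... and therefore different standard errors and p-values.
print(c(full = unname(full_row["Pr(>|t|)"]), subset = unname(sub_row["Pr(>|t|)"])))
```

This only mimics the ordinary t-statistic for a single gene; the 
moderated t-statistic from eBayes additionally borrows information 
across genes, which this sketch does not attempt to reproduce.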

I'd like to read in all the data and keep it in one large RG and MA 
object.  If I'm looking to determine the p-values for genes induced 
relative to untreated by just one of the treatments, should I use lmFit 
on the large MA object or should it be subset first (e.g. MA[,1:3]) 
before doing the linear fit?

Am I missing something?

Thanks, in advance, for any help.


