[BioC] Unexpected results using limma with numerical factor - regression intercept?

Gordon Smyth smyth at wehi.edu.au
Fri Aug 27 01:52:20 CEST 2004


At 11:11 PM 26/08/2004, Matthew  Hannah wrote:
>Sorry for getting things confused, I'd like to ask a statistician but we
>don't really have one:(
>But hopefully I'm improving...
>
>My mistakes were -
>1.I mixed up lmFit(limma) with lm.fit(stats) - sorry, 1 character's not
>a lot when you are in a hurry;)
>
>2.Being unsure of how lmFit handles factor c(1,2,3..) vs numeric
>c(1,2,3..) and thinking this example in the Limma user guide meant -ve
>numbers could be confused as dye swaps.
> > design <- c(-1,1,-1,1) #"The negative numbers in the design matrix
>indicate the dye-swaps."
> > fit <- lmFit(MA,design)
>sorry, no real excuse there, it just seemed logical when I typed
>it...doh
>
>Anyway after much searching I *think* I've found out how to use limma
>for (standard?) regression and might even be able to solve the original
>poster's query....
>
>lm() and (I guess) lmFit have an implied intercept term, so the design
>of the original poster's fit
> >design <- model.matrix(~-1 + TestScore1, pData(eset));
>removes the intercept and so forces the fit to go through zero?? For
>linear regression against a quantified variable you need to allow an
>intercept to be fitted (unless you have a reason to think it goes
>through zero?)??
>
>Maybe I'm wrong but here's my logic -
>num <- c(1,2,3,4,5,6,7) #measured growth...etc
>#Pearson and its p-value
>cor(exprs(esetgcrma)[1,], num)
>cor.test(exprs(esetgcrma)[1,], num)
>
>#lm - its summary() gives R-squared (the Pearson correlation squared) and
>#the same p-value as above, and also gives the coefficients
>summary(lm(exprs(esetgcrma)[1,]~num))
>
>#limma - produces the same coefficients as lm() above
>design <- model.matrix(~num)
>fit <- lmFit(esetgcrma,design)
>
>So limma produces the 'same' results as the Pearson correlation and lm(),
>but you can carry them on for further analysis.....

That's right.
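
For example, with a made-up numeric covariate standing in for TestScore1 (a
minimal sketch, not the original data), you can see directly what the two
formulas produce:

num <- c(1, 2, 3, 4, 5, 6, 7)

# with the implied intercept: columns "(Intercept)" and "num"
model.matrix(~ num)

# with -1 the intercept column is dropped, so the fitted line is
# forced through the origin
model.matrix(~ -1 + num)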

Remember that any function like lm() which accepts a formula, uses the 
formula to convert the data into response y and a design matrix X, and 
these quantities are then passed to lower-level functions. For example, 
lm() can accept a categorical factor which is converted into a numeric 
design matrix. Design matrices are always numeric. For efficiency and 
flexibility reasons, lmFit() assumes that you have already formed your 
design matrix. For a numeric x, the standard design matrix is simply design 
<- cbind(1,x).
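
As a sketch (re-using the esetgcrma object and num covariate from the code
above), the hand-built design is the same as the one model.matrix() returns,
apart from column names, and can be passed straight to lmFit():

library(limma)

design <- cbind(Intercept = 1, num = num)   # same columns as model.matrix(~ num)
fit <- lmFit(esetgcrma, design)

# the fit can then be carried on to the usual limma downstream steps,
# e.g. moderated t-statistics and a ranked gene list for the slope:
fit <- eBayes(fit)
topTable(fit, coef = "num")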

The other difference between lmFit() and lm() is in the storage of the 
results. lmFit() outputs a streamlined data object for which memory usage 
is directly proportional to the number of genes and the number of 
coefficients. The output from lm() requires memory proportional to the 
_square_ of the number of coefficients, and this is prohibitive for a large 
model and a large number of genes.
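
To make that concrete (a rough sketch, using the component names of limma's
MArrayLM output object rather than an exact accounting):

fit <- lmFit(esetgcrma, design)
dim(fit$coefficients)    # genes x coefficients
dim(fit$stdev.unscaled)  # genes x coefficients
length(fit$sigma)        # one residual standard deviation per gene
# a full lm() object for each gene would additionally carry residuals,
# fitted values, a QR decomposition and so on, which is far more than is
# needed when all you want are coefficients and standard errors.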

Gordon

>So is this correct, or should I go back to wearing the dunce hat in the
>corner?
>
>Cheers,
>Matt


