[BioC] limma with missing values
Hans-Ulrich Klein
h.klein at uni-muenster.de
Thu Mar 6 13:43:34 CET 2008
Dear all,
I am struggling with using limma in the case of many missing values.
Here is a small example:
df = data.frame(
g1=c(5,5,6,6,NA,NA),
g2=c(5,5,NA,NA,6,6),
g3=c(NA,NA,5,5,6,6),
treat=c("T1","T1","T2","T2","T3","T3"),
class=c("A","B","A","B","A","B"))
conts = list(treat="contr.sum", class="contr.treatment")
X = t(df[,1:3])
D = model.matrix( ~ class + treat, data=df, contrasts=conts)
Fit = lmFit(X, D)
These are the results:
> Fit$coefficients
(Intercept) classB treatT2 treatT3
g1 5 2.220446e-16 1 NA
g2 5 2.220446e-16 NA 1
g3 6 3.330669e-16 -1 NA
Suppose I am interessted in the comparison of T2-T1. The column
"treatT2" of "Fit$coefficients" looks fine for gene 1 and gene 2. In the
case of gene 3 the baseline level T1 can not be estimated and the
coefficient treatT2 seems to be the comparison of T2-T3. I find this
confusing when computing many thousand models and applying toptable(Fit,
coef="treatT2") afterwards.
I changed the design matrix by setting
conts = list(treat="contr.sum", class="contr.sum")
Now, I get the results:
> Fit$coefficients
(Intercept) class1 treat1 treat2
g1 6.0 -1.665335e-16 -1.0 NA
g2 5.5 -1.665335e-16 -0.5 NA
g3 5.0 -1.110223e-16 -1.0 NA
However, I expected:
Intercept treat1 treat2
g1 5.5 -0.5 0.5
g2 5.5 -0.5 NA
g3 5.5 NA -0.5
Is it appropriate to use limma in such scenarios? Did I something wrong
or misinterpreted the results? I think, replacing the NAs by mean
intensities would be a solution in this case.
Regards,
Hans-Ulrich
> sessionInfo()
R version 2.6.2 (2008-02-08)
x86_64-pc-linux-gnu
locale:
C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_2.12.0
loaded via a namespace (and not attached):
[1] rcompgen_0.1-17
More information about the Bioconductor
mailing list