[BioC] Limma: NA handling in contrasts.fit
Simon Anders
anders at ebi.ac.uk
Tue Mar 31 18:10:09 CEST 2009
Dear Gordon et al.
I've just noticed an oddity in the way how limma's contrasts.fit
function handles missing observations, and I wonder if it might be a bug.
Please have a look at the following test case:
I use the this design matrix:
> dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) )
> dm
intercept a b
[1,] 1 0 0
[2,] 1 0 1
[3,] 1 1 0
[4,] 1 1 1
Let's construct sample data for 5 genes and put in two NAs as follows:
> y <- t( replicate( 5, dm %*% runif(3) ) )
> y[ 1, 3:4 ] <- NA
> y
[,1] [,2] [,3] [,4]
[1,] 0.099221925 0.5628846 NA NA
[2,] 0.009771325 0.7748060 0.3977409 1.162776
[3,] 0.223688182 0.6330630 0.6791238 1.088499
[4,] 0.957762805 1.4338553 1.4259875 1.902080
[5,] 0.766597103 1.3905022 0.9947635 1.618669
If I fit this data with lmFit, I unsurprisingly get a warning that some
coefficients cannot be estimated:
> library( limma )
> fit <- lmFit( y, dm )
Warning message:
Partial NA coefficients for 1 probe(s)
The missing coefficient is the coefficient 'a' for the first gene. Note
that 'b' can be estimated:
> coefficients( fit )
intercept a b
[1,] 0.099221925 NA 0.4636627
[2,] 0.009771325 0.3879696 0.7650347
[3,] 0.223688182 0.4554356 0.4093748
[4,] 0.957762805 0.4682247 0.4760925
[5,] 0.766597103 0.2281664 0.6239051
I now ask 'contrast.fit' to boil the fit object down to only contain the
"b" coefficients. I should get the same coefficients, but only the "b"
column.
> fit2 <- contrasts.fit( fit, coefficients="b" )
> coefficients(fit2)
b
[1,] NA
[2,] 0.7650347
[3,] 0.4093748
[4,] 0.4760925
[5,] 0.6239051
Indeed, I get the same values, but the first value is now masked as 'NA'.
Is there a reason for this, or is this a bug?
Granted, in this example, the use of 'make.contrasts' is superfluous,
but in the following example, it is not. I place the NA, such that 'a'
cannot be estimated, and I get an NA in the contrast 'b-c':
> dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) )
> dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) )
> dm
intercept a b c
[1,] 1 0 0 0
[2,] 1 0 1 0
[3,] 1 1 0 0
[4,] 1 1 1 0
[5,] 1 0 0 1
[6,] 1 0 1 1
[7,] 1 1 0 1
[8,] 1 1 1 1
> y <- t( replicate( 5, dm %*% runif(4) ) )
> y[ 1, c(3,4,7,8) ] <- NA
> fit <- lmFit( y, dm )
Warning message:
Partial NA coefficients for 1 probe(s)
> coefficients( fit )
intercept a b c
[1,] 0.17989906 NA 0.66812435 0.54889675
[2,] 0.26932489 0.9461271 0.77956666 0.05345084
[3,] 0.38124957 0.0309537 0.58205980 0.26414381
[4,] 0.50592287 0.9680045 0.86566150 0.96073095
[5,] 0.01829934 0.1103156 0.09401078 0.02140402
> fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) )
> coefficients(fit2)
[,1]
[1,] NA
[2,] 0.72611582
[3,] 0.31791599
[4,] -0.09506945
[5,] 0.07260676
Thanks in advance for any help in getting around this, as I have a lot
of data with quite some missing values.
Best regards
Simon Anders
> sessionInfo()
R version 2.9.0 alpha (2009-03-24 r48212)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_2.17.12
+---
| Dr. Simon Anders, Dipl. Phys.
| European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
| office phone +44-1223-492680, mobile phone +44-7505-841692
| preferred (permanent) e-mail: sanders at fs.tum.de
More information about the Bioconductor
mailing list