[BioC] Limma: NA handling in contrasts.fit

Tue Mar 31 18:10:09 CEST 2009

Dear Gordon et al.

I've just noticed an oddity in the way how limma's contrasts.fit 
function handles missing observations, and I wonder if it might be a bug.

Please have a look at the following test case:

I use the this design matrix:

    > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) )
    > dm
         intercept a b
    [1,]         1 0 0
    [2,]         1 0 1
    [3,]         1 1 0
    [4,]         1 1 1

Let's construct sample data for 5 genes and put in two NAs as follows:

    > y <- t( replicate( 5, dm %*% runif(3) ) )
    > y[ 1, 3:4 ] <- NA
    > y
                [,1]      [,2]      [,3]     [,4]
    [1,] 0.099221925 0.5628846        NA       NA
    [2,] 0.009771325 0.7748060 0.3977409 1.162776
    [3,] 0.223688182 0.6330630 0.6791238 1.088499
    [4,] 0.957762805 1.4338553 1.4259875 1.902080
    [5,] 0.766597103 1.3905022 0.9947635 1.618669

If I fit this data with lmFit, I unsurprisingly get a warning that some 
coefficients cannot be estimated:

    > library( limma )
    > fit <- lmFit( y, dm )
    Warning message:
    Partial NA coefficients for 1 probe(s)

The missing coefficient is the coefficient 'a' for the first gene. Note 
that 'b' can be estimated:

    > coefficients( fit )
        intercept         a         b
    [1,] 0.099221925        NA 0.4636627
    [2,] 0.009771325 0.3879696 0.7650347
    [3,] 0.223688182 0.4554356 0.4093748
    [4,] 0.957762805 0.4682247 0.4760925
    [5,] 0.766597103 0.2281664 0.6239051

I now ask 'contrast.fit' to boil the fit object down to only contain the 
"b" coefficients. I should get the same coefficients, but only the "b" 
column.

    > fit2 <- contrasts.fit( fit, coefficients="b" )
    > coefficients(fit2)
                 b
    [1,]        NA
    [2,] 0.7650347
    [3,] 0.4093748
    [4,] 0.4760925
    [5,] 0.6239051

Indeed, I get the same values, but the first value is now masked as 'NA'.

Is there a reason for this, or is this a bug?

Granted, in this example, the use of 'make.contrasts' is superfluous, 
but in the following example, it is not. I place the NA, such that 'a' 
cannot be estimated, and I get an NA in the contrast 'b-c':

    > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) )
    > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) )
    > dm
         intercept a b c
    [1,]         1 0 0 0
    [2,]         1 0 1 0
    [3,]         1 1 0 0
    [4,]         1 1 1 0
    [5,]         1 0 0 1
    [6,]         1 0 1 1
    [7,]         1 1 0 1
    [8,]         1 1 1 1
    > y <- t( replicate( 5, dm %*% runif(4) ) )
    > y[ 1, c(3,4,7,8) ] <- NA
    > fit <- lmFit( y, dm )
    Warning message:
    Partial NA coefficients for 1 probe(s)
    > coefficients( fit )
          intercept         a          b          c
    [1,] 0.17989906        NA 0.66812435 0.54889675
    [2,] 0.26932489 0.9461271 0.77956666 0.05345084
    [3,] 0.38124957 0.0309537 0.58205980 0.26414381
    [4,] 0.50592287 0.9680045 0.86566150 0.96073095
    [5,] 0.01829934 0.1103156 0.09401078 0.02140402
    > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) )
    > coefficients(fit2)
                [,1]
    [1,]          NA
    [2,]  0.72611582
    [3,]  0.31791599
    [4,] -0.09506945
    [5,]  0.07260676

Thanks in advance for any help in getting around this, as I have a lot 
of data with quite some missing values.

Best regards
   Simon Anders

    > sessionInfo()
    R version 2.9.0 alpha (2009-03-24 r48212)
    x86_64-unknown-linux-gnu

    locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base 

    other attached packages:
    [1] limma_2.17.12

+---
| Dr. Simon Anders, Dipl. Phys.
| European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
| office phone +44-1223-492680, mobile phone +44-7505-841692
| preferred (permanent) e-mail: sanders at fs.tum.de