[BioC] Limma: NA handling in contrasts.fit

Wed Apr 1 01:44:05 CEST 2009

Hi Simon,

I understand your point, that this behaviour is inconsistent in limma.  It 
is however a deliberate decision rather than a bug.

The reasoning is that a contrast is defined by c'b where c is the contrast 
vector and b is the coef vector.  In matrix arithmetic in R, the answer is 
NA if any of the elements of c or b are NA, even if the NA is multiplied 
by a zero in the contrast.

In R arithmetic, Inf is a possible value, so that NA*0 must indeed be NA, 
because Inf*0 is possible and that is NA.  You could argue that limma does 
not allow fit coefficients to be Inf, and hence NA*0 should be zero in 
contrasts.fit.  I decided a long time ago that this might be nice in 
contrasts.fit, but would involve too much special-case programming which 
would be confusing and would slow down the function unsatifactorily.

You can work around it for your problem like this.  If you want to compute 
a contrast which involves only a subset of coefficients, simply subset to 
the coefficients you want before forming the contrast.  In your second 
example,

   > fit2 <- fit[,c(3,4)]
   > fit2 <- contrasts.fit( fit2, c(1,-1) )

etc does what you want.

I suspect it is impossible to program an entirely consistent behaviour in 
general when some coefs become NA.  For one thing, the meaning of 
remaining coefs is usually changed.  So I prefer to allow knowledgeable 
users to handle it as above.  I am open to other suggestions.  In my own 
microarray analysis, I always make sure this situation doesn't arise.

Best wishes
Gordon

On Tue, 31 Mar 2009, Simon Anders wrote:

> Dear Gordon et al.
>
> I've just noticed an oddity in the way how limma's contrasts.fit function 
> handles missing observations, and I wonder if it might be a bug.
>
> Please have a look at the following test case:
>
> I use the this design matrix:
>
>   > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) )
>   > dm
>        intercept a b
>   [1,]         1 0 0
>   [2,]         1 0 1
>   [3,]         1 1 0
>   [4,]         1 1 1
>
> Let's construct sample data for 5 genes and put in two NAs as follows:
>
>   > y <- t( replicate( 5, dm %*% runif(3) ) )
>   > y[ 1, 3:4 ] <- NA
>   > y
>               [,1]      [,2]      [,3]     [,4]
>   [1,] 0.099221925 0.5628846        NA       NA
>   [2,] 0.009771325 0.7748060 0.3977409 1.162776
>   [3,] 0.223688182 0.6330630 0.6791238 1.088499
>   [4,] 0.957762805 1.4338553 1.4259875 1.902080
>   [5,] 0.766597103 1.3905022 0.9947635 1.618669
>
> If I fit this data with lmFit, I unsurprisingly get a warning that some 
> coefficients cannot be estimated:
>
>   > library( limma )
>   > fit <- lmFit( y, dm )
>   Warning message:
>   Partial NA coefficients for 1 probe(s)
>
> The missing coefficient is the coefficient 'a' for the first gene. Note that 
> 'b' can be estimated:
>
>   > coefficients( fit )
>       intercept         a         b
>   [1,] 0.099221925        NA 0.4636627
>   [2,] 0.009771325 0.3879696 0.7650347
>   [3,] 0.223688182 0.4554356 0.4093748
>   [4,] 0.957762805 0.4682247 0.4760925
>   [5,] 0.766597103 0.2281664 0.6239051
>
> I now ask 'contrast.fit' to boil the fit object down to only contain the "b" 
> coefficients. I should get the same coefficients, but only the "b" column.
>
>   > fit2 <- contrasts.fit( fit, coefficients="b" )
>   > coefficients(fit2)
>                b
>   [1,]        NA
>   [2,] 0.7650347
>   [3,] 0.4093748
>   [4,] 0.4760925
>   [5,] 0.6239051
>
> Indeed, I get the same values, but the first value is now masked as 'NA'.
>
> Is there a reason for this, or is this a bug?
>
> Granted, in this example, the use of 'make.contrasts' is superfluous, but in 
> the following example, it is not. I place the NA, such that 'a' cannot be 
> estimated, and I get an NA in the contrast 'b-c':
>
>   > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) )
>   > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) )
>   > dm
>        intercept a b c
>   [1,]         1 0 0 0
>   [2,]         1 0 1 0
>   [3,]         1 1 0 0
>   [4,]         1 1 1 0
>   [5,]         1 0 0 1
>   [6,]         1 0 1 1
>   [7,]         1 1 0 1
>   [8,]         1 1 1 1
>   > y <- t( replicate( 5, dm %*% runif(4) ) )
>   > y[ 1, c(3,4,7,8) ] <- NA
>   > fit <- lmFit( y, dm )
>   Warning message:
>   Partial NA coefficients for 1 probe(s)
>   > coefficients( fit )
>         intercept         a          b          c
>   [1,] 0.17989906        NA 0.66812435 0.54889675
>   [2,] 0.26932489 0.9461271 0.77956666 0.05345084
>   [3,] 0.38124957 0.0309537 0.58205980 0.26414381
>   [4,] 0.50592287 0.9680045 0.86566150 0.96073095
>   [5,] 0.01829934 0.1103156 0.09401078 0.02140402
>   > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) )
>   > coefficients(fit2)
>               [,1]
>   [1,]          NA
>   [2,]  0.72611582
>   [3,]  0.31791599
>   [4,] -0.09506945
>   [5,]  0.07260676
>
>
> Thanks in advance for any help in getting around this, as I have a lot of 
> data with quite some missing values.
>
> Best regards
>  Simon Anders
>
>   > sessionInfo()
>   R version 2.9.0 alpha (2009-03-24 r48212)
>   x86_64-unknown-linux-gnu
>
>   locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
>   attached base packages:
>   [1] stats     graphics  grDevices utils     datasets  methods   base 
>
>   other attached packages:
>   [1] limma_2.17.12
>
>
> +---
> | Dr. Simon Anders, Dipl. Phys.
> | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
> | office phone +44-1223-492680, mobile phone +44-7505-841692
> | preferred (permanent) e-mail: sanders at fs.tum.de
>
>
>
>