[BioC] limma: strange subsetting fit[, i] of MArrayLM object design

Gordon K Smyth smyth at wehi.EDU.AU
Tue May 20 07:00:41 CEST 2014


On Wed, 14 May 2014, Maciek Sykulski wrote:

> Thank you Gordon, that makes sense.
>
> Also, I assume that
>    > topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N)
> creates a topTable ordered by a F-statistics for coefficients 2,3 only.

Yes, as documented.  sort.by="F" is the default in this case, so could be 
omitted.

> I just couldn't pinpoint the source code where F-statistics for all 
> coefficients (computed by eBayes(fit,robust=TRUE) is "trimmed" to 
> F-statistics for only selected ones.

This is done by topTableF().  Not sure why you need to examine the source 
code in this way.  Why not just read the documentation, which says:

"topTableF ranks genes on the basis of moderated F-statistics for one or 
more coefficients. If topTable is called and coef has two or more 
elements, then the specified columns will be extracted from fit and 
topTableF called on the result."

Best wishes
Gordon

> Best regards,
> Maciek Sykulski
>
>
> On Wed, May 14, 2014 at 9:36 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Let me clarify that further.
>>
>> fit[,j] should not subset the design matrix at all.  The effect of fit[,j]
>> is actually the same as:
>>
>>   contrasts.fit(fit, contrast=u)
>>
>> where u[j]=1 and u is otherwise zero.  The design matrix is kept intact,
>> but the subsetted object keeps track of which contrasts are represented in
>> the coefficients matrix by means of the contrasts matrix.
>>
>> For example, the following code will reproduce in limma 3.19.10 and
>> earlier or limma 3.21.2 and later:
>>
>>  > example(lmFit)
>>  > fit[,2]$design
>>      Grp1 Grp2vs1
>>   [1,]    1       0
>>   [2,]    1       0
>>   [3,]    1       0
>>   [4,]    1       1
>>   [5,]    1       1
>>   [6,]    1       1
>>  > fit[,2]$contrasts
>>              Contrast
>>   Coefficient Grp2vs1
>>       Grp1          0
>>       Grp2vs1       1
>>
>> Best wishes
>> Gordon
>>
>> On Wed, 14 May 2014, Gordon K Smyth wrote:
>>
>>  Dear Maciek,
>>>
>>> The second index of an MArrayLM object subsets by coefficients.  You will
>>> easily see by typing dim(fit) or colnames(fit) and so on that the columns
>>> refer to coefficients.
>>>
>>> You are right that the current subsetting of the design matrix is
>>> incorrect. I've fixed this now in bioc-devel.
>>>
>>> All the limma subsetting code has been rewritten in past 12 months and
>>> the treatment of the design matrix was overlooked.  In any case, subsetting
>>> a design matrix is not always a sensible thing to do, and the subsetted
>>> matrix is not normally used downstream.
>>>
>>> Please upgrade to the latest version of Bioconductor.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>
>>>
>>> On Mon, 12 May 2014, Maciek Sykulski wrote:
>>>
>>>  Dear Gordon, dear maintainers of limma package,
>>>>
>>>> While writing some code based on the R package limma, Bioconductor
>>>> version:
>>>> Release 2.13,
>>>> I’ve noticed some strange code in topTable, and in subsetting fit[,i] of
>>>> "MArrayLM" object (I’m not sure if this results in any bug).
>>>>
>>>> Assume a following execution:
>>>>
>>>>  fit <- lmFit(data,design,...)
>>>>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F")
>>>>>
>>>>
>>>>
>>>> specifically, in the function topTable, the line fit <- eBayes(fit[,
>>>> coef])
>>>> raises my concerns (see comment below in the code).
>>>>
>>>> topTable <-function(...) {
>>>>
>>>> [...]
>>>>
>>>> if (length(coef) > 1) {
>>>>
>>>>    coef <- unique(coef)
>>>>
>>>>    if (length(fit$coef[1, coef]) < ncol(fit))
>>>>
>>>>        fit <- eBayes(fit[, coef])
>>>>
>>>>        # ^ Above reevaluates eBayes without robust=TRUE
>>>>
>>>>        # even if we want a robust estimation
>>>>
>>>> [...]
>>>>
>>>> }
>>>>
>>>>
>>>> Second confusion is about subsetting of fit object fit[, coef].
>>>>
>>>> fit[i,] subsets fit on genes (rows). Then, what kind of subsetting does
>>>> fit[,coef] perform? Is it by samples, or by coefficients?
>>>>
>>>> After inspection I see that fit[,coef] subsets the fit$design matrix on
>>>> samples, not on coefficients (while it does subset on columns from
>>>> fit$coefficients).
>>>> Inspection of the code responsible for subsetting "MArrayLM" reveals:
>>>>
>>>> subsetListOfArrays <- function(object,i,j,IJ,IX,I,JX)
>>>> {
>>>>    [...]
>>>>        for(a in IJ) object[[a]] <- object[[a]][i,j,drop=FALSE]
>>>>        for(a in IX) object[[a]] <- object[[a]][i, ,drop=FALSE]
>>>>        for(a in I ) object[[a]] <- object[[a]][i]
>>>>        for(a in JX) object[[a]] <- object[[a]][j, ,drop=FALSE]
>>>>        #Shouldn`t this line above look more like this?:
>>>>        #for(a in JX) object[[a]] <- object[[a]][, j,drop=FALSE]
>>>>    [...]
>>>> }
>>>>
>>>>
>>>> Here is an example of a design matrix (rows correspond to samples,
>>>> columns
>>>> to fitted coefficients):
>>>>
>>>>  class(fit)
>>>>>
>>>> [1] "MArrayLM"
>>>> attr(,"package")
>>>> [1] "limma"
>>>>
>>>>> fit$design
>>>>>
>>>>    cluster.1 cluster.2 cluster.3 cluster.4
>>>> 1           1         1         0         0
>>>> 2           1         1         0         1
>>>> 3           1         1         0         1
>>>> [...]
>>>> 42          1         1         0         0
>>>> 43          1         1         0         0
>>>> 44          1         0         0         0
>>>> [...]
>>>>
>>>>
>>>> Now, what I’d expect from an operation which subsets coefficients from a
>>>> MArrayLM model
>>>> (as in topTable line fit <- eBayes(fit[, coef])) is this
>>>>
>>>>    cluster.2 cluster.3
>>>> 1           1         0
>>>> 2           1         0
>>>> 3           1         0
>>>> [...]
>>>>
>>>>
>>>> moreover fit[,2:3]$coefficients is modified exactly in this way.
>>>>
>>>> However, the resulting fit[,2:3]$design is a subset on samples:
>>>>
>>>>  fit[,2:3]$design
>>>>>
>>>>    cluster.1 cluster.2 cluster.3 cluster.4
>>>> 2           1         1         0         1
>>>> 3           1         1         0         1
>>>>
>>>>
>>>> Is this really what fit <- eBayes(fit[, coef]) inside topTable intends to
>>>> achieve? This may be relevant, since subsetting fit this way propagates
>>>> further inside eBayes function, where it causes classifyTestsF not to run
>>>> (in case of my data), because of fit[,coef]$design being singular.
>>>>
>>>> eBayes <- function (fit, proportion = 0.01, stdev.coef.lim = c(0.1, 4),
>>>>    trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1))
>>>> {
>>>>    [...]
>>>>    #!!! Below condition always evaluates to false
>>>>    # because fit$design was malformed before and
>>>>    # is.fullrank always returns false
>>>>    if (!is.null(fit$design) && is.fullrank(fit$design)) {
>>>>        F.stat <- classifyTestsF(fit, fstat.only = TRUE)
>>>>        fit$F <- as.vector(F.stat)
>>>>   [...]
>>>> }
>>>>
>>>>
>>>> I’m not pointing to any specific misbehavior/bug, because it seems that
>>>>
>>>> topTable(eBayes(fit,robust=TRUE), coef=2:3,sort.by="F",number=N)
>>>>
>>>> still somehow produces the same results as my modified:
>>>>
>>>> topTable.fixed(eBayes.fixed(fit,robust=TRUE),coef=2:3,sort.by
>>>> ="F",robust=TRUE)
>>>>
>>>>
>>>> and different from non-robust:
>>>>
>>>> topTable(eBayes(fit,robust=FALSE), coef=2:3,sort.by="F")
>>>>
>>>>
>>>> So the final question is:
>>>> Does fit[,coef] modifies the fit$design matrix properly, in alignment
>>>> with
>>>> later check is.fullrank(fit$design)?
>>>>
>>>>
>>>> Attached is a simple example R session with these issues.
>>>>
>>>> Thanks,
>>>> Maciek Sykulski
>>>>
>>>
>>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list