[BioC] limma: coefficient not estimable when dataset reduced by a few samples
James W. MacDonald
jmacdon at uw.edu
Fri Aug 3 16:12:55 CEST 2012
Hi Natasha,
On 8/3/2012 8:11 AM, Natasha Sahgal wrote:
> Dear List,
>
> I have array data for 42 tumorBT - tumorAT patients. Initial analysis
> (adding patient and batch covariates to the model) I managed to obtain a
> list of DE genes.
>
> However now, I need to reduce this set of 42 to 33 patients pairs and
> reanalyse the data. This time though it seems that it can't estimate the
> coefficient of one patient, would this not affect my results as NA
> coefficients are produced?
>
> What I do not understand is what has changed for this to happen now? How
> do I rectify it?
You don't give any information about the design matrix with all 42
patients, but I wonder how you were able to fit a paired analysis with a
batch effect in that case either. When you fit a paired analysis by
estimating a per-patient block effect, you by definition are already
estimating the maximum number of coefficients that you can with the
number of samples in hand. As an example:
> treat <- factor(rep(1:2, 42))
> batch <- factor(rep(1:2, each = 42))
> patient <- factor(rep(1:42, each= 2))
> is.fullrank(model.matrix(~treat+patient))
[1] TRUE
> is.fullrank(model.matrix(~treat+batch+patient))
[1] FALSE
So I don't see how you will be able to fit a paired analysis by blocking
on patient if you have a batch effect as well, regardless of the number
of patients.
However, you can always do a conventional paired analysis, by first
calculating the paired differences for each patient and then fitting the
model. Note that in this case the coefficient you are estimating is the
intra-pair difference, and you are testing that this difference is
different from zero (or in other words, the contrast is implicit in the
coefficient, so you don't need contrasts.fit()).
Best,
Jim
>
> Code:
> dim(tum.dat.red)
> 27555 66
>
> ## Overall Sample Descriptions
>> batch = as.factor(t.info.red$Batch)
> batch
> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> 1 2 2
> [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> Levels: 1 2
>
>
>> treat = droplevels(t.info.red$Status_Treatment)
>> treat = as.factor(treat, levels=c("Tumour_BT", "Tumour_AT"))
> treat
> [1] Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT
> [8] Tumour_AT Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT
> [15] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT
> [22] Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_BT Tumour_AT
> [29] Tumour_BT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_AT Tumour_AT
> [36] Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT Tumour_BT
> [43] Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT Tumour_AT
> [50] Tumour_AT Tumour_AT Tumour_AT Tumour_BT Tumour_BT Tumour_BT Tumour_BT
> [57] Tumour_BT Tumour_BT Tumour_BT Tumour_AT Tumour_AT Tumour_AT Tumour_AT
> [64] Tumour_AT Tumour_AT Tumour_AT
> Levels: Tumour_BT Tumour_AT
>
>
>> patient = as.factor(t.info.red$Trial.number)
> patient
> [1] 13 25 13 25 18 28 18 28 30 39 30 53 39 53 32 42 32 51 42 51 34 45 34
> 55 45
> [26] 55 54 54 37 57 37 57 35 52 35 52 17 7 12 10 14 11 16 15 14 16 15 17
> 7 12
> [51] 10 11 20 26 36 40 48 23 24 23 20 24 26 36 40 48
> 33 Levels: 7 10 11 12 13 14 15 16 17 18 20 23 24 25 26 28 30 32 34 35 36
> ... 57
>
>
> ## Deisgning model with BATCH effect
>> design.tum = model.matrix(~treat+batch+patient)
> ## Fit model& make contrasts
>> fit.tum<- lmFit(tum.dat.red, design.tum)
> Coefficients not estimable: patient57
> Warning message:
> Partial NA coefficients for 27555 probe(s)
>
>
> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] limma_3.12.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0
>
>
>
> Many Thanks in advance,
> Natasha
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list