[BioC] EdgeR Design matrix not of full rank. The following coefficients not estimable error
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jan 2 04:46:11 CET 2014
Dear Eugene,
On Sat, 21 Dec 2013, Eugene Bolotin wrote:
> Dear Gordon,
> I apologize if I was a bit unclear, I actually simplified my problem a
> little bit for the post so it would fit into this bio-conductor post. I
> actually have 10+ samples in batch 1157, but that batch does not contain
> any "tumor" samples. I have additional similar batches, some with and
> some without "tumor" samples. I want to remove batch-specific differences
> between all samples. edgeR however gives me the same error, no matter how
> many samples I have in the batch, but does not give me this error if I
> remove all batches which do not contain any "tumor" samples.
If the problem that you posted was not your real problem, then please post
your real problem.
But let me say that it will never be possible to estimate a batch effect
for a group of samples in which every sample also has its own unique
treatment condition, regardless of how many samples there are. To do so
would be to estimate n+1 parameters from n observations. It is a
universal rule of statistics that you cannot estimate more unknown
parameters than you have observations.
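
As a toy illustration of the point above (the sample labels and batch names
here are hypothetical, not the data from this thread), the following base-R
sketch builds a small design in which batch B contains no reference "tumor"
sample, so each of its samples is its own treatment level. The batch column
is then exactly the sum of those treatment columns, and the design matrix
loses one rank:

```r
# Hypothetical toy design, not the poster's data.
# Batch A: two reference ("tumor") samples plus two patients.
# Batch B: three patients and no reference sample.
batch <- factor(c("A", "A", "A", "A", "B", "B", "B"))
treatment <- factor(c("tumor", "tumor", "p1", "p2", "p3", "p4", "p5"),
                    levels = c("tumor", "p1", "p2", "p3", "p4", "p5"))

design <- model.matrix(~ batch + treatment)

n_coef      <- ncol(design)     # coefficients requested: 7
design_rank <- qr(design)$rank  # coefficients that can be estimated: 6

# The batch B indicator equals the sum of the indicators of the treatments
# that occur only in batch B, so the two cannot be separated.
only_in_b <- c("treatmentp3", "treatmentp4", "treatmentp5")
dependent <- all(design[, "batchB"] == rowSums(design[, only_in_b]))

print(c(coefficients = n_coef, rank = design_rank))
print(dependent)
```

Adding more samples of this kind to batch B adds one column per sample, so
the deficit of one rank remains however large the batch is.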
> Can I just take residuals of logged count data after performing the
> linear regression on the batch factor? Can I then feed the
> residuals into edgeR linear modeling? I want to compare how much each
> sample/patient/vector differs from average "tumor" sample. The batches
> are quite large with >10 samples each, and I have ~300 total samples.
No, you can't. edgeR only complains that the problem is non-estimable when
it is truly impossible to estimate all the parameters. Impossible means
impossible. If it were possible to work around the problem with an ad hoc
method such as the one you describe, then edgeR would already do that.
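
A small numerical sketch may help show why no workaround can succeed (all
numbers below are made up for illustration). Within a batch where every
sample has its own treatment, two different combinations of batch effect and
treatment effects produce exactly the same data, so any procedure, including
taking residuals on batch, must return the same answer for both:

```r
# Hypothetical log-scale effects for three samples in one batch,
# each sample being its own treatment.
treat_effects <- c(1.0, -2.0, 0.5)

# Scenario A: batch effect 3, treatment effects as above.
y_a <- 3 + treat_effects

# Scenario B: batch effect 5, every treatment effect lowered by 2.
y_b <- 5 + (treat_effects - 2)

# The observed values are the same under both scenarios.
same_data <- isTRUE(all.equal(y_a, y_b))

# Regressing on batch alone amounts to centring on the batch mean,
# which gives the same residuals whichever scenario is true.
res_a <- y_a - mean(y_a)
res_b <- y_b - mean(y_b)

print(same_data)
print(all.equal(res_a, res_b))
```

The residuals only recover the treatment effects up to an unknown shift,
which is the batch effect that cannot be estimated.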
Best wishes
Gordon
> Thanks a ton,
> Eugene
>
>
>
> On Sat, Dec 21, 2013 at 3:31 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Eugene,
>>
>> According to your design, Sample 31 is a unique treatment unto itself, and
>> also a unique batch unto itself. Obviously it is impossible to estimate
>> both the batch effect and the treatment effect from one sample. Hence the
>> error message.
>>
>> Best wishes
>> Gordon
>>
>> Date: Fri, 20 Dec 2013 16:49:43 -0800 (PST)
>>> From: "Eugene Bolotin [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, elbolotin at gmail.com
>>> Subject: [BioC] EdgeR Design matrix not of full rank. The following
>>> coefficients not estimable erroR
>>>
>>>
>>> Hi I have the following samples:
>>> batch
>>>  [1] 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 2055 1802 1802 2055
>>> [16] 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055 1802 1802
>>> [31] 1157 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 1802 2055 2055
>>> [46] 2055 2055 2055 2055 2055 2055 2055 2055 2055 2055
>>> Levels: 1157 1802 2055
>>> treatment
>>>  [1] TCGA-BR-6452 TCGA-BR-6453 tumor        TCGA-BR-6454 tumor
>>>  [6] TCGA-BR-6455 TCGA-BR-6456 TCGA-BR-6457 tumor        TCGA-BR-6458
>>> [11] tumor        TCGA-BR-6563 TCGA-BR-6565 TCGA-BR-6566 TCGA-BR-7196
>>> [16] TCGA-BR-7703 tumor        TCGA-BR-7704 tumor        TCGA-BR-7707
>>> [21] TCGA-BR-7715 tumor        TCGA-BR-7716 tumor        TCGA-BR-7717
>>> [26] tumor        TCGA-BR-7723 TCGA-CD-5804 TCGA-CG-4437 TCGA-CG-4441
>>> [31] TCGA-CG-4476 TCGA-CG-5716 TCGA-D7-6518 TCGA-D7-6519 TCGA-D7-6520
>>> [36] TCGA-D7-6521 TCGA-D7-6522 TCGA-D7-6524 TCGA-D7-6525 TCGA-D7-6526
>>> [41] TCGA-D7-6527 TCGA-D7-6528 TCGA-F1-6177 TCGA-F1-6875 TCGA-FP-7735
>>> [46] tumor        TCGA-FP-7829 tumor        TCGA-HF-7131 TCGA-HF-7132
>>> [51] TCGA-HF-7133 TCGA-HF-7134 TCGA-HF-7136 TCGA-IN-7806 tumor
>>> 44 Levels: TCGA-BR-6452 TCGA-BR-6453 TCGA-BR-6454 TCGA-BR-6455 ... tumor
>>>
>>>
>>>
>>>
>>>
>>> I want to compare each TCGA_X sample to the average mutant background.
>>> I know it is possible, because I was able to do it using standard commands.
>>> However, when I try to adjust for batch effects as follows:
>>> design=model.matrix(~batch+treatment)
>>> names(data.frame(design))
>>> group=treatment
>>> y=readDGE(files, path=wd, columns=c(1,2), group=group)
>>> #names(data.frame(design))
>>> design=model.matrix(~0+batch+treatment)
>>>
>>> names(data.frame(design))
>>> #rownames(design)=colnames(y)
>>> design
>>>
>>> y = estimateGLMCommonDisp(y, design, verbose=TRUE)
>>>>
>>> Error in glmFit.default(y, design = design, dispersion = dispersion,
>>> offset = offset, :
>>> Design matrix not of full rank. The following coefficients not
>>> estimable:
>>> treatmentTCGA-CG-4476
>>> As far as I can tell, it is because batch 1157 contains a normal
>>> sample but does not contain any tumor samples.
>>> Is there a way around that?
>>> Thanks,
>>> Eugene
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> sessionInfo()
>>>>
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] edgeR_3.4.2 limma_3.18.6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_3.0.2
>>>
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>>
>