[BioC] DEXSeq on two-exon genes: how to specify a formula without redundant terms
Alejandro Reyes
alejandro.reyes at embl.de
Fri May 24 14:19:07 CEST 2013
Dear Manikandan Narayanan,
Sorry for the delayed reply!
Thanks for your e-mail! It has led us to spot two minor bugs and
therefore modified the DEXSeq code in two aspects:
Firstly, redundant terms were removed in DEXSeq in a very naive way: a
lm.fit model was first fitted and the columns from the model matrix with
NA coefficients were removed from the model matrix, now a proper
function has been written for this purpose (with basically the code to
what you also proposed). Secondly, our "if" statement that you mentioned
was wrongly located within the function, which was causing your error
message. It has been relocated and now it seems to work for its purpose
( I tested with your model frame and formula).
You will find the changes in the last version of the svn (1.7.2). Let me
know if it works for you or if you have additional questions.
Best regards,
Alejandro Reyes
> Hi,
> Would someone be kind enough to clarify the two questions raised about DEXSeq in my earlier email below?
>
> Would attaching any further information help? I am working with DEXSeq 1.6.0, and here are the model frame and model matrices if it makes it easier. Thanks!
>
>> mf
> sample exon sizeFactor condition batch dispersion count
> 1 Untr_biorep1 E001 1.0517803 Untr biorep1 0.13523549 25
> 2 Untr_biorep1 E002 1.0517803 Untr biorep1 0.01656906 708
> 3 LPS_biorep1 E001 1.0010474 LPS biorep1 0.13523549 20
> 4 LPS_biorep1 E002 1.0010474 LPS biorep1 0.01656906 442
> 5 Untr_biorep2 E001 0.8920483 Untr biorep2 0.13523549 26
> 6 Untr_biorep2 E002 0.8920483 Untr biorep2 0.01656906 947
> 7 LPS_biorep2 E001 1.1090401 LPS biorep2 0.13523549 25
> 8 LPS_biorep2 E002 1.1090401 LPS biorep2 0.01656906 884
>
>> mm
> (Intercept) sampleLPS_biorep2 sampleUntr_biorep1 sampleUntr_biorep2
> 1 1 0 1 0
> 2 1 0 1 0
> 3 1 0 0 0
> 4 1 0 0 0
> 5 1 0 0 1
> 6 1 0 0 1
> 7 1 1 0 0
> 8 1 1 0 0
> conditionUntr batchbiorep2 exonE002 conditionUntr:exonE002
> 1 1 0 0 0
> 2 1 0 1 1
> 3 0 0 0 0
> 4 0 0 1 0
> 5 1 1 0 0
> 6 1 1 1 1
> 7 0 1 0 0
> 8 0 1 1 0
> batchbiorep2:exonE002
> 1 0
> 2 0
> 3 0
> 4 0
> 5 0
> 6 1
> 7 0
> 8 1
> attr(,"assign")
> [1] 0 1 1 1 2 3 4 5 6
> attr(,"contrasts")
> attr(,"contrasts")$sample
> [1] "contr.treatment"
>
> attr(,"contrasts")$condition
> [1] "contr.treatment"
>
> attr(,"contrasts")$batch
> [1] "contr.treatment"
>
> attr(,"contrasts")$exon
> [1] "contr.treatment"
>
>
>
>
>
>
>
> From: Narayanan, Manikandan (NIH/NIAID) [E]
> Sent: Thursday, May 16, 2013 11:14 AM
> To: bioconductor at r-project.org
> Subject: DEXSeq on two-exon genes: how to specify a formula without redundant terms
>
> Hi DEXSeq users/developers,
> I have used DEXSeq successfuly for genes with many exons and really like the diagnostic/visualization plots that come with it. Recently though, for genes with two testable exons, I am getting the "Underdetermined model; cannot estimate dispersions." error.
>
> I figure this is due to redundant terms in my formula as shown in PS below. So my questions are:
>
> 1) Is there a way to specify the formula count ~ sample + (condition + batch) * exon so that redundant terms 'condition + batch' are removed?
>
> 2) If not, is it safe to change ncol(mm) to qr(mm)$rank (i.e., rank of model matrix to remove redundant terms) in this piece of code in estimateExonDispersionsForModelFrame:
> if (nrow(mm) <= ncol(mm))
> stop("Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.")
>
> Would changing the code this way violate any assumptions of the DEXSeq model?
>
>
> Thank you,
> Mani
>
>
> PS: # condition + batch terms are redundant as sample term is already present!
>> formulaDispersion
> count ~ sample + (condition + batch) * exon
>
>> design(ecs)
> condition batch
> Untr_biorep1 Untr biorep1
> LPS_biorep1 LPS biorep1
> Untr_biorep2 Untr biorep2
> LPS_biorep2 LPS biorep2
>
>> colnames(model.matrix(formulaDispersion, mf))
> [1] "(Intercept)" "sampleLPS_biorep2" "sampleUntr_biorep1"
> [4] "sampleUntr_biorep2" "conditionUntr" "batchbiorep2"
> [7] "exonE002" "conditionUntr:exonE002" "batchbiorep2:exonE002"
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list