[BioC] DEXSeq on two-exon genes: how to specify a formula without redundant terms
Narayanan, Manikandan (NIH/NIAID) [E]
manikandan.narayanan at nih.gov
Thu May 16 17:14:26 CEST 2013
Hi DEXSeq users/developers,
I have used DEXSeq successfully for genes with many exons and really like the diagnostic/visualization plots that come with it. Recently, though, for genes with only two testable exons, I am getting the error "Underdetermined model; cannot estimate dispersions."
I figure this is due to redundant terms in my formula, as shown in the PS below. So my questions are:
1) Is there a way to specify the formula count ~ sample + (condition + batch) * exon so that redundant terms 'condition + batch' are removed?
2) If not, is it safe to change ncol(mm) to qr(mm)$rank (i.e., rank of model matrix to remove redundant terms) in this piece of code in estimateExonDispersionsForModelFrame:
    if (nrow(mm) <= ncol(mm))
        stop("Underdetermined model; cannot estimate dispersions. Maybe replicates have not been properly specified.")
Would changing the code this way violate any assumptions of the DEXSeq model?
Thank you,
Mani
PS: # condition + batch terms are redundant as sample term is already present!
> formulaDispersion
count ~ sample + (condition + batch) * exon
> design(ecs)
             condition   batch
Untr_biorep1      Untr biorep1
LPS_biorep1        LPS biorep1
Untr_biorep2      Untr biorep2
LPS_biorep2        LPS biorep2
> colnames(model.matrix(formulaDispersion, mf))
[1] "(Intercept)" "sampleLPS_biorep2" "sampleUntr_biorep1"
[4] "sampleUntr_biorep2" "conditionUntr" "batchbiorep2"
[7] "exonE002" "conditionUntr:exonE002" "batchbiorep2:exonE002"
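The redundancy can be reproduced outside DEXSeq with a toy model frame. The sketch below is only an illustration under assumptions: the layout of mf (one row per sample/exon pair) and the placeholder count column are made up here and are not DEXSeq's internal objects. It compares ncol(mm) with qr(mm)$rank, lists the aliased columns, and checks whether dropping the main effects from the formula helps.

```r
# Toy model frame for one gene with two exons: one row per (sample, exon) pair.
# This layout is an assumption for illustration, not DEXSeq's internal object.
design <- data.frame(
  sample    = c("Untr_biorep1", "LPS_biorep1", "Untr_biorep2", "LPS_biorep2"),
  condition = c("Untr", "LPS", "Untr", "LPS"),
  batch     = c("biorep1", "biorep1", "biorep2", "biorep2"),
  stringsAsFactors = TRUE
)
mf <- design[rep(seq_len(nrow(design)), times = 2), ]
mf$exon  <- factor(rep(c("E001", "E002"), each = nrow(design)))
mf$count <- seq_len(nrow(mf))  # placeholder counts; only the design matters here

formulaDispersion <- count ~ sample + (condition + batch) * exon
mm <- model.matrix(formulaDispersion, mf)

# Pivoted QR decomposition: linearly dependent columns are moved to the end.
q <- qr(mm)
aliased <- colnames(mm)[q$pivot[-seq_len(q$rank)]]

cat("rows:", nrow(mm), " columns:", ncol(mm), " rank:", q$rank, "\n")
cat("aliased columns:", paste(aliased, collapse = ", "), "\n")

# Dropping the main effects from the formula does not remove the redundancy:
# without a 'condition' (or 'batch') main effect, model.matrix() codes 'exon'
# with full indicator columns inside each interaction term.
formulaNoMain <- count ~ sample + exon + (condition + batch):exon
mm2 <- model.matrix(formulaNoMain, mf)
cat("no-main-effects formula, columns:", ncol(mm2),
    " rank:", qr(mm2)$rank, "\n")

# Keeping only the linearly independent columns gives a full-rank matrix.
mmFull <- mm[, q$pivot[seq_len(q$rank)], drop = FALSE]
cat("full-rank matrix, columns:", ncol(mmFull), "\n")
```

On this toy frame the quoted check compares 8 rows against 9 columns, while the rank is 7, so the two main-effect columns are the ones flagged as aliased. Whether substituting the rank is statistically appropriate for the dispersion estimation in DEXSeq is a separate question that this sketch does not answer.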