[BioC] DEXSeq continous variable in model
Alejandro Reyes
alejandro.reyes at embl.de
Fri Aug 1 11:29:56 CEST 2014
Dear Alex Gutteridge,
Thanks for your detailed e-mails and sorry for the delay in the reply.
About the fact that you do get significant hits if you specify factors
instead of your continuous variable, Is it a thresholding effect or the
exons that you detect when specifying the factorial design have large
p-values with the continuous design?
About the exon fold changes error, this function does not support
continuous variables. I am unsure how would a fold change be with a
continuous variable? I would maybe partition the "Age" into more than 2
factors based on e.g. quantiles or simply make plots like the one you
showed!
Bests,
Alejandro Reyes
> On 30-07-2014 10:13, Alex Gutteridge wrote:
>> On 29-07-2014 10:00, Alex Gutteridge wrote:
>>> Hi,
>>>
>>> I have a time course RNASeq experiment and I'd like to detect DEU
>>> between early and late stages. I am trying to use 'Age' as a
>>> continuous variable in my design, but I'm getting an error which I
>>> suspect is related to this e.g:
>>>
>>>> summary(sample.data.ipsc[,1:6])
>>> Sample CellType Subtype Donor Age
>>> Length:28 iPSC:28 iPSC :28 4555 :28 Min. : 2.000
>>> Class :character HBR : 0 D4721 : 0 1st Qu.: 4.000
>>> Mode :character L2 : 0 D4749 : 0 Median : 8.000
>>> L3 : 0 D6002 : 0 Mean : 7.821
>>> L4 : 0 D6005 : 0 3rd Qu.:10.500
>>> L5 : 0 D6008 : 0 Max. :14.000
>>> (Other): 0 (Other): 0
>>> Passage
>>> 42:3
>>> 48:7
>>> 49:5
>>> 52:6
>>> 56:7
>>>
>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>>> sampleData=sample.data.ipsc,
>>> design= ~ sample + exon + Age:exon,
>>> flattenedfile=flattenedFile)
>>>
>>>> BPPARAM = MulticoreParam(workers=24)
>>>
>>>> dxd = estimateSizeFactors(dxd)
>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", BPPARAM=BPPARAM)
>>> Error: 8346 errors; first error:
>>> Error in FUN(1:3[[1L]], ...): Non-factor in model frame
>>>
>>> For more information, use bplasterror(). To resume calculation, re-call
>>> the function and set the argument 'BPRESUME' to TRUE or wrap the
>>> previous call in bpresume().
>>>
>>> First traceback:
>>> 31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM =
>>> BPPARAM)
>>> 30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>> 29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>> 28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed,
>>> mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM),
>>> mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal
>>> else FALSE)
>>> 27: lapply(seq_len(cores), inner.do)
>>> 26: lapply(seq_len(cores), inner.do)
>>> 25: FUN(1:24[[1L]], ...)
>>> 24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
>>> 23: parallel:::sendMaster(...)
>>> 22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
>>> 21: tryCatch(expr, error = function(e)
>>>
>>> Can DEXSeq accept a model with a continuous variable? Does it make
>>> sense to do so? (I do the same thing with DESeq2 to detected DE and it
>>> works fine). Is this error due to that? Note that all the other steps
>>> seem to run fine and I can get results (though I don't have many
>>> significant hits - not sure if this is related or not). If not what is
>>> best practice? Just split the data set into 'early' and 'late' samples
>>> and run that as a factor?
>>>
>>> Alex Gutteridge
>>
>> To sort of answer my own question: cutting the age vector into two
>> bins (early <8 weeks and late >8 weeks) and using that factor seems to
>> work and give many significant hits:
>>
>>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2)
>>
>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>> sampleData=sample.data.ipsc,
>> design= ~ sample + exon + AgeBin:exon,
>> flattenedfile=flattenedFile)
>>
>>> BPPARAM = MulticoreParam(workers=12)
>>
>>> dxd = estimateSizeFactors(dxd)
>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>
>> Which makes me think that DEXSeq wasn't doing what I hoped with the
>> previous analysis (quite apart from the error). Is there any way to
>> rejig the design to cope with a continuous variable as opposed to a
>> factor? I *think* the underlying biological question makes sense: Are
>> there exons showing DEU that correlates with time?
>>
>> Alex Gutteridge
>
> I'm just going to bump this question one last time as I'd love to know
> if there's a better solution to what I have now. In short: I have a
> reasonable detailed time course RNASeq experiment (8 timepoints; 2-14
> weeks; 3-4 replicates at all time points except one) and would like to
> detect time dependent DEU using DEXSeq. Slicing the timecourse into two
> bins (at 8 weeks) and doing a straight early vs late comparison works
> and detects plenty of interesting stuff. Trying to model age as a
> continuous variable gives an error when determining fold changes and
> doesn't report any significant DEU.
>
> See first plot here for an example (exons 27/28 show strong DEU
> comparing <8 weeks to >8 week samples) shown with a DEXSeq plot:
>
> http://dx.doi.org/10.6084/m9.figshare.1124172
>
> But the 8 week split is entirely arbitrary, really time (Age) is a
> continuous variable and so I'd like to model it as such (maybe this is
> just OCD on my behalf, but I'm sure I must also be loosing some power by
> crudely splitting the timecourse like this; also DESeq2 lets me model
> the genewise DE with a continuous variable).
>
> The second plot from the figshare link shows the RPKMs of those two
> exons (27/28). On top of a background of increased gene-wise expression
> there is clearly a time dependent effect on the relative usage of these
> two exons, which it must be possible to model better than a crude split
> at 8w (black dashed line). Is there any trick to do this with DEXSeq?
>
> Alex Gutteridge
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list