[BioC] DEXSeq continous variable in model

Alejandro Reyes alejandro.reyes at embl.de
Fri Aug 1 11:29:56 CEST 2014


Dear Alex Gutteridge,

Thanks for your detailed e-mails and sorry for the delay in the reply.

About the fact that you do get significant hits if you specify factors 
instead of your continuous variable, Is it a thresholding effect or the 
exons that you detect when specifying the factorial design have large 
p-values with the continuous design?

About the exon fold changes error, this function does not support 
continuous variables.  I am unsure how would a fold change be with a 
continuous variable? I would maybe partition the "Age" into more than 2 
factors based on e.g. quantiles or simply make plots like the one you 
showed!

Bests,
Alejandro Reyes



> On 30-07-2014 10:13, Alex Gutteridge wrote:
>> On 29-07-2014 10:00, Alex Gutteridge wrote:
>>> Hi,
>>>
>>> I have a time course RNASeq experiment and I'd like to detect DEU
>>> between early and late stages. I am trying to use 'Age' as a
>>> continuous variable in my design, but I'm getting an error which I
>>> suspect is related to this e.g:
>>>
>>>> summary(sample.data.ipsc[,1:6])
>>>     Sample          CellType     Subtype       Donor         Age
>>>  Length:28          iPSC:28   iPSC   :28   4555   :28   Min.   : 2.000
>>>  Class :character             HBR    : 0   D4721  : 0   1st Qu.: 4.000
>>>  Mode  :character             L2     : 0   D4749  : 0   Median : 8.000
>>>                               L3     : 0   D6002  : 0   Mean   : 7.821
>>>                               L4     : 0   D6005  : 0   3rd Qu.:10.500
>>>                               L5     : 0   D6008  : 0   Max.   :14.000
>>>                               (Other): 0   (Other): 0
>>>  Passage
>>>  42:3
>>>  48:7
>>>  49:5
>>>  52:6
>>>  56:7
>>>
>>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>>>                                sampleData=sample.data.ipsc,
>>>                                design= ~ sample + exon + Age:exon,
>>>                                flattenedfile=flattenedFile)
>>>
>>>> BPPARAM = MulticoreParam(workers=24)
>>>
>>>> dxd = estimateSizeFactors(dxd)
>>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>>> dxd = estimateExonFoldChanges(dxd, fitExpToVar="Age", BPPARAM=BPPARAM)
>>> Error: 8346 errors; first error:
>>>   Error in FUN(1:3[[1L]], ...): Non-factor in model frame
>>>
>>> For more information, use bplasterror(). To resume calculation, re-call
>>>   the function and set the argument 'BPRESUME' to TRUE or wrap the
>>>   previous call in bpresume().
>>>
>>> First traceback:
>>>   31: estimateExonFoldChanges(dxd, fitExpToVar = "Age", BPPARAM =
>>> BPPARAM)
>>>   30: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>>   29: bplapply(testablegenes, geteffects, BPPARAM = BPPARAM)
>>>   28: mclapply(X = X, FUN = FUN, ..., mc.set.seed = BPPARAM$setSeed,
>>>           mc.silent = !BPPARAM$verbose, mc.cores = bpworkers(BPPARAM),
>>>           mc.cleanup = if (BPPARAM$cleanup) BPPARAM$cleanupSignal
>>> else FALSE)
>>>   27: lapply(seq_len(cores), inner.do)
>>>   26: lapply(seq_len(cores), inner.do)
>>>   25: FUN(1:24[[1L]], ...)
>>>   24: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
>>>   23: parallel:::sendMaster(...)
>>>   22: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
>>>   21: tryCatch(expr, error = function(e)
>>>
>>> Can DEXSeq accept a model with a continuous variable? Does it make
>>> sense to do so? (I do the same thing with DESeq2 to detected DE and it
>>> works fine). Is this error due to that? Note that all the other steps
>>> seem to run fine and I can get results (though I don't have many
>>> significant hits - not sure if this is related or not). If not what is
>>> best practice? Just split the data set into 'early' and 'late' samples
>>> and run that as a factor?
>>>
>>> Alex Gutteridge
>>
>> To sort of answer my own question: cutting the age vector into two
>> bins (early <8 weeks and late >8 weeks) and using that factor seems to
>> work and give many significant hits:
>>
>>> sample.data.ipsc$AgeBin = cut(sample.data.ipsc$Age,2)
>>
>>> dxd = DEXSeqDataSetFromHTSeq(countFiles,
>>                                sampleData=sample.data.ipsc,
>>                                design= ~ sample + exon + AgeBin:exon,
>>                                flattenedfile=flattenedFile)
>>
>>> BPPARAM = MulticoreParam(workers=12)
>>
>>> dxd = estimateSizeFactors(dxd)
>>> dxd = estimateDispersions(dxd, BPPARAM=BPPARAM)
>>> dxd = testForDEU(dxd, BPPARAM=BPPARAM)
>>
>> Which makes me think that DEXSeq wasn't doing what I hoped with the
>> previous analysis (quite apart from the error). Is there any way to
>> rejig the design to cope with a continuous variable as opposed to a
>> factor? I *think* the underlying biological question makes sense: Are
>> there exons showing DEU that correlates with time?
>>
>> Alex Gutteridge
>
> I'm just going to bump this question one last time as I'd love to know
> if there's a better solution to what I have now. In short: I have a
> reasonable detailed time course RNASeq experiment (8 timepoints; 2-14
> weeks; 3-4 replicates at all time points except one) and would like to
> detect time dependent DEU using DEXSeq. Slicing the timecourse into two
> bins (at 8 weeks) and doing a straight early vs late comparison works
> and detects plenty of interesting stuff. Trying to model age as a
> continuous variable gives an error when determining fold changes and
> doesn't report any significant DEU.
>
> See first plot here for an example (exons 27/28 show strong DEU
> comparing <8 weeks to >8 week samples) shown with a DEXSeq plot:
>
> http://dx.doi.org/10.6084/m9.figshare.1124172
>
> But the 8 week split is entirely arbitrary, really time (Age) is a
> continuous variable and so I'd like to model it as such (maybe this is
> just OCD on my behalf, but I'm sure I must also be loosing some power by
> crudely splitting the timecourse like this; also DESeq2 lets me model
> the genewise DE with a continuous variable).
>
> The second plot from the figshare link shows the RPKMs of those two
> exons (27/28). On top of a background of increased gene-wise expression
> there is clearly a time dependent effect on the relative usage of these
> two exons, which it must be possible to model better than a crude split
> at 8w (black dashed line). Is there any trick to do this with DEXSeq?
>
> Alex Gutteridge
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list