[BioC] how to determine what kind of dispersion to use

Tue Feb 18 04:55:19 CET 2014

Hello,

In general, the advice is to use tagwise dispersions (which are, by 
default, moderated by the dispersion trend according to the 
auto-determined prior df) unless that is not an option. The most common 
reason that one could be unable to use tagwise dispersions would be if 
the dataset had no biological replicates.
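
As a rough illustration of what "moderated by the trend according to the prior df" means, here is a small Python sketch. It is not edgeR code, and the function names are hypothetical. The moderated value is written as a df-weighted average of one gene's own dispersion estimate and the trended value. This is only a simplified approximation of edgeR's weighted-likelihood approach, not its actual calculation. The sketch also shows why the no-replicate case rules tagwise estimates out. With one sample per group, a one-way design has zero residual degrees of freedom, so a gene's own estimate carries no information.

```python
def residual_df(n_samples: int, n_groups: int) -> int:
    """Residual degrees of freedom for a one-way (group-only) design."""
    return n_samples - n_groups


def moderated_dispersion(own: float, trended: float,
                         df_resid: float, prior_df: float) -> float:
    """Shrink one gene's own dispersion estimate toward the trended value.

    Simplified df-weighted average; this approximates, but is not, edgeR's
    weighted-likelihood calculation. With df_resid == 0 the gene's own
    estimate gets no weight at all.
    """
    if df_resid < 0 or prior_df < 0:
        raise ValueError("degrees of freedom must be non-negative")
    if df_resid + prior_df == 0:
        raise ValueError("need residual df or prior df greater than zero")
    return (df_resid * own + prior_df * trended) / (df_resid + prior_df)


# Two groups of three replicates: 6 - 2 = 4 residual df.
df = residual_df(6, 2)
# prior_df=10 is an arbitrary illustrative value, not a claim about edgeR defaults.
print(moderated_dispersion(0.30, 0.10, df, prior_df=10))  # about 0.157

# One sample per group (no replicates): 0 residual df, own estimate ignored.
print(moderated_dispersion(0.30, 0.10, residual_df(2, 2), prior_df=10))  # 0.1
```

A larger prior df pulls every gene harder toward the trend. This is the sense in which the prior df controls how much gene-specific variation is trusted.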

With regard to testing the edgeR dispersion assumptions in a RIP-seq 
context, I would think that the most important assumption to test is 
whether the RIP-seq samples have the same dispersions as the regular 
RNA-seq control samples. This is important to look at since edgeR 
assumes that the dispersion for a given gene does not vary across 
samples or conditions. (This is analogous to doing a t-test with the
assumption of equal variance between groups.) I would recommend that 
you split your dataset into RIP-seq only and RNA-seq only and estimate 
dispersions on both. Then call plotBCV on both datasets and see if the 
common and trended dispersions look similar (you will probably want to 
use the same xlim and ylim arguments for both calls to plotBCV so the 
scales are comparable). However, even if the dispersions do differ
between the two groups, Gordon has replied previously that the test
will at worst be over-conservative, meaning that you might get some
false negatives but you should not get extra false positives.
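
The comparison suggested above is done in edgeR by estimating dispersions on each subset and calling plotBCV. plotBCV plots the biological coefficient of variation (BCV), the square root of the dispersion. As a rough numeric analogue, the Python sketch below computes a method-of-moments dispersion per gene from the negative binomial relation var = mu + phi * mu^2. It then averages the per-gene values within each group and reports sqrt(phi). This is not edgeR's estimator, which is likelihood-based. The counts are made up and assumed to come from equal library sizes, and the function names are hypothetical.

```python
import math
import statistics


def moment_dispersion(replicates):
    """Method-of-moments NB dispersion for one gene: (s^2 - mean) / mean^2.

    Uses var = mu + phi * mu^2 and assumes equal library sizes.
    Can be negative when the sample variance is below the mean.
    """
    m = statistics.mean(replicates)
    s2 = statistics.variance(replicates)  # sample variance (n - 1 denominator)
    return (s2 - m) / m ** 2


def common_bcv(genes):
    """Average the per-gene dispersions of a group and return sqrt(phi)."""
    phi = statistics.mean([moment_dispersion(g) for g in genes])
    return math.sqrt(max(phi, 0.0))  # clip negative averages to zero


# Hypothetical counts: two genes, four replicates per group.
rna = [[80, 120, 100, 100], [40, 60, 50, 50]]
rip = [[50, 150, 100, 100], [20, 80, 50, 50]]

print(round(common_bcv(rna), 3))  # 0.108
print(round(common_bcv(rip), 3))  # 0.434
```

In this toy example, the RIP replicates are several times more variable than the RNA-seq replicates. A gap of that size between the two plotBCV plots is the kind of difference to look for.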

However, I think the most important issue to look at with RIP-seq is 
probably the normalization factors. Generally, the assumption behind 
most normalization methods is that the "average" log fold change should
be zero, i.e. that most genes are not changing, and they differ in how
they compute this average (trimmed mean, quantile, etc.). However, you 
need to think carefully about what assumption you can make about the 
relationship between a RIP-seq sample and the matched RNA-seq sample. 
Remember that in general, high-throughput sequencing is not capable of 
absolute quantitation, since most sequencing methods produce the same 
quantity of reads regardless of the size of the input. Therefore, you 
cannot sidestep the issue of normalization, and you have to make some 
*a priori* assumption about how to normalize the samples. I'm not sure 
what that would be for RIP-seq, and it may depend on what question you 
want to ask. For example, if you are only going to be testing 
differential RIP pulldowns relative to expression level, the 
normalization between RIP and RNA-seq is not as important, because it 
will cancel out anyway.
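
The Python sketch below illustrates the two normalization points, using toy numbers rather than edgeR code. The helper `log_ratio_offset` is hypothetical: an unweighted, simplified version of the trimmed-mean-of-log-ratios idea. edgeR's calcNormFactors with method="TMM" also trims on abundance and weights genes by precision, so treat this as an illustration only.

Counts only measure each gene's share of the library. When a few genes absorb many reads in the RIP sample, the unchanged genes appear depleted. The trimmed mean recovers the true 8-fold enrichment here only because most of the toy genes are assumed to be unchanged, and that assumption may not hold for RIP-seq.

The last lines assume a single RIP-vs-RNA scaling factor `c` (hypothetical) shared by both conditions. They show that this factor cancels from the between-condition change in the RIP-over-RNA ratio.

```python
import math


def trimmed_mean(values, trim=0.3):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def log_ratio_offset(sample, reference, trim=0.3):
    """Trimmed mean of per-gene log2(sample share / reference share).

    Simplified, unweighted TMM-like factor; assumes most genes are unchanged.
    """
    s_tot, r_tot = sum(sample), sum(reference)
    ratios = [math.log2((s / s_tot) / (r / r_tot))
              for s, r in zip(sample, reference) if s > 0 and r > 0]
    return trimmed_mean(ratios, trim)


def interaction_lfc(rip_a, rna_a, rip_b, rna_b):
    """Change in one gene's log2(RIP / RNA) ratio from condition B to A."""
    return math.log2(rip_a / rna_a) - math.log2(rip_b / rna_b)


# Hypothetical counts: 10 genes, the first 2 truly enriched 8-fold in RIP.
rna = [100] * 10                  # total 1000 reads
rip = [800, 800] + [100] * 8      # total 2400 reads
raw = [math.log2((s / sum(rip)) / (r / sum(rna))) for s, r in zip(rip, rna)]
print(round(2 ** raw[2], 3))      # 0.417: an unchanged gene looks depleted
offset = log_ratio_offset(rip, rna)
print([round(2 ** (x - offset), 3) for x in raw[:3]])  # [8.0, 8.0, 1.0]

# A shared RIP-vs-RNA scaling factor cancels from the interaction.
c = 0.25
before = interaction_lfc(400, 100, 200, 100)
after = interaction_lfc(400 * c, 100, 200 * c, 100)
print(round(before, 6), round(after, 6))  # 1.0 1.0
```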

Hopefully this clarifies some of the issues you need to contend with. 
In general, I expect that edgeR and similar methods are suitable for 
use in analyzing RIP-seq data.

-Ryan

On Mon Feb 17 17:53:40 2014, J [guest] wrote:
>
> Hello listserv,
>
> Most analyses performed in edgeR rightfully assume that the data are not Poisson but instead follow a negative binomial (NB) distribution. This information is important when shrinking the dispersions; however, I was wondering: is there a graph or function in edgeR that I could use to determine which kind of dispersion (e.g. common or moderated tagwise) I need to apply in the exactTest function?
>
> I'm not doing a typical RNA-seq experiment (i.e. RIP-seq) so I would like to test which parts of the classic workflow are appropriate for what I'm doing. For instance, can I still use the same equation to figure out the prior.df, or will that not apply to RIP-seq?
>
> After doing some comparisons between the different functions and the arguments within them, I'm wondering if RIP-seq may pose a problem when trying to use the moderated dispersion, since the untagged IP will generally have fewer reads than the IP samples. Does that seem like a possibility?
>
> Also, for the dispersion argument in the exactTest() function, are there good rules of thumb for when to use "common", "trended", "tagwise" or "auto"?
>
> Thanks
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] LSD_2.5            ellipse_0.3-8      schoolmath_0.4
> [4] colorRamps_2.3     RColorBrewer_1.0-5 gtools_3.2.1
> [7] MASS_7.3-29        edgeR_3.4.2        limma_3.18.12
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor