[BioC] how to determine what kind of dispersion to use
Ryan
rct at thompsonclan.org
Tue Feb 18 04:55:19 CET 2014
Hello,
In general, the advice is to use tagwise dispersions (which are, by
default, moderated by the dispersion trend according to the
auto-determined prior df) unless that is not an option. The most common
reason that one could be unable to use tagwise dispersions would be if
the dataset had no biological replicates.
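As a rough illustration of the tagwise route, the sketch below runs the classic edgeR pipeline on simulated counts. The data, group labels and object names are made up for the example, and it uses estimateCommonDisp() followed by estimateTagwiseDisp(); other estimation functions in edgeR may differ in how the prior df is chosen.

```r
library(edgeR)

# Simulated counts (hypothetical): 1000 genes, 2 groups x 3 replicates.
set.seed(1)
ngenes <- 1000
mu <- 2^runif(ngenes, 3, 10)              # per-gene mean, same in all samples
counts <- matrix(rnbinom(ngenes * 6, mu = mu, size = 5), nrow = ngenes)
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Common dispersion first, then moderated tagwise dispersions.
y <- estimateCommonDisp(y)
y <- estimateTagwiseDisp(y)

# dispersion = "auto" (the default) picks the most complex dispersion
# stored in the object, which is the tagwise one here.
et <- exactTest(y, dispersion = "auto")
topTags(et)
```

With no biological replicates the estimation steps above have nothing to work with, which is the case where a dispersion value has to be supplied by other means.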
With regard to testing the edgeR dispersion assumptions in a RIP-seq
context, I would think that the most important assumption to test is
whether the RIP-seq samples have the same dispersions as the regular
RNA-seq control samples. This is important to look at since edgeR
assumes that the dispersion for a given gene does not vary across
samples or conditions. (This is analogous to doing a t-test with the
assumption of equal variance between groups.) I would recommend that
you split your dataset into RIP-seq only and RNA-seq only and estimate
dispersions on both. Then call plotBCV on both datasets and see if the
common and trended dispersions look similar (you will probably want to
use the same xlim and ylim arguments for both calls to plotBCV so the
scales are comparable). However, even if this is not the case, Gordon
has replied previously that if the dispersions are different in each
group, the test will at worst be over-conservative, meaning that you
might get some false negatives but you should not get extra false
positives.
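The comparison described above might look something like the following sketch. The counts are simulated, with the RIP-seq columns deliberately given a larger dispersion so that the two plots differ, and the sample layout and object names are hypothetical. It assumes an edgeR version that provides estimateDisp(); the separate estimateCommonDisp(), estimateTrendedDisp() and estimateTagwiseDisp() calls would be the alternative.

```r
library(edgeR)

# Simulated data (hypothetical): 2000 genes, 3 RNA-seq and 3 RIP-seq samples.
set.seed(2)
ngenes <- 2000
mu <- rep(c(20, 100, 500, 2000), length.out = ngenes)
rna <- matrix(rnbinom(ngenes * 3, mu = mu, size = 10), nrow = ngenes)
rip <- matrix(rnbinom(ngenes * 3, mu = mu, size = 4), nrow = ngenes)
counts <- cbind(rna, rip)
colnames(counts) <- c(paste0("RNA", 1:3), paste0("RIP", 1:3))
assay <- factor(rep(c("RNA", "RIP"), each = 3))

y <- DGEList(counts = counts, group = assay)

# Split by assay type, then normalize and estimate dispersions separately.
y.rna <- estimateDisp(calcNormFactors(y[, assay == "RNA"]))
y.rip <- estimateDisp(calcNormFactors(y[, assay == "RIP"]))

# Shared axis limits so the two BCV plots are directly comparable.
xlim <- range(aveLogCPM(y.rna), aveLogCPM(y.rip))
ylim <- c(0, sqrt(max(y.rna$tagwise.dispersion, y.rip$tagwise.dispersion)))

par(mfrow = c(1, 2))
plotBCV(y.rna, xlim = xlim, ylim = ylim, main = "RNA-seq only")
plotBCV(y.rip, xlim = xlim, ylim = ylim, main = "RIP-seq only")

# Numeric summary of the common dispersions.
c(RNA = y.rna$common.dispersion, RIP = y.rip$common.dispersion)
```

If the red (common) and blue (trend) lines sit at clearly different heights in the two panels, the equal-dispersion assumption is questionable for that dataset.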
That said, I think the most important issue to look at with RIP-seq is
probably the normalization factors. Generally, the assumption behind
most normalization methods is that the "average" log fold change should
be zero, i.e. that most genes are not changing, and the methods differ
in how they compute this average (trimmed mean, quantile, etc.). However, you
need to think carefully about what assumption you can make about the
relationship between a RIP-seq sample and the matched RNA-seq sample.
Remember that in general, high-throughput sequencing is not capable of
absolute quantitation, since most sequencing methods produce the same
quantity of reads regardless of the size of the input. Therefore, you
cannot sidestep the issue of normalization, and you have to make some
*a priori* assumption about how to normalize the samples. I'm not sure
what that would be for RIP-seq, and it may depend on what question you
want to ask. For example, if you are only going to be testing
differential RIP pulldowns relative to expression level, the
normalization between RIP and RNA-seq is not as important, because it
will cancel out anyway.
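One way to read the cancellation argument is as plain arithmetic on log ratios. The toy numbers below are invented, and the example assumes that the same scaling factor k between RIP and RNA-seq applies in both conditions, which is itself an assumption that would need checking for real data.

```r
# Hypothetical normalized abundances for a single gene.
rna <- c(A = 100, B = 200)   # expression in conditions A and B
rip <- c(A = 50,  B = 300)   # pulldown in conditions A and B

# Change in RIP enrichment relative to expression (B versus A), on the
# log2 scale. k rescales every RIP sample relative to every RNA-seq sample.
enrichment.change <- function(rip, rna, k = 1) {
  log.ratio <- log2(k * rip / rna)
  unname(log.ratio["B"] - log.ratio["A"])
}

change.k1  <- enrichment.change(rip, rna, k = 1)
change.k10 <- enrichment.change(rip, rna, k = 10)

# The two values agree: log2(k) is added to both log ratios and drops
# out of their difference.
all.equal(change.k1, change.k10)
```

A direct RIP versus RNA-seq comparison within one condition has no such difference to absorb k, so there the choice of normalization feeds straight into the result.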
Hopefully this clarifies some of the issues you need to contend with.
In general, I expect that edgeR and similar methods are suitable for
use in analyzing RIP-seq data.
-Ryan
On Mon Feb 17 17:53:40 2014, J [guest] wrote:
>
> Hello listserve,
>
> Most analysis performed in edgeR rightly assumes that the data are not Poisson and instead follow a negative binomial (NB) distribution. This matters when shrinking the dispersions. However, I was wondering whether there is a plot or function in edgeR that I could use to determine what kind of dispersion (i.e. common, moderated tagwise) I should supply to the exactTest function?
>
> I'm not doing a typical RNA-seq experiment (i.e. RIP-seq) so I would like to test which parts of the classic workflow are appropriate for what I'm doing. For instance, can I still use the same equation to figure out the prior.df, or will that not apply to RIP-seq?
>
> After comparing the different functions and their arguments, I'm wondering whether RIP-seq may pose a problem for the moderated dispersion, since the untagged IP will generally have fewer reads than the IP samples. Does that seem like a possibility?
>
> Also for the dispersion argument in the exactTest() function, are there good rules of thumb of when to use "common", "trended", "tagwise" or "auto"?
>
> Thanks
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] LSD_2.5 ellipse_0.3-8 schoolmath_0.4
> [4] colorRamps_2.3 RColorBrewer_1.0-5 gtools_3.2.1
> [7] MASS_7.3-29 edgeR_3.4.2 limma_3.18.12
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
> --
> Sent via the guest posting facility at bioconductor.org.