[BioC] A few Q's on using DEXSeq with mucho data

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Mar 8 19:31:36 CET 2012

Hi Simon,

Thanks for the detailed response.

I just wanted to clarify one bit about the call to
`fitDispersionFunction`, relating to the concern I raised in my email back
to myself, which is about how we are fitting the dispersion.

You say:

> If some of your conditions have much higher variability than the other
> conditions, they will in fact cause you to lose power even in comparisons
> which do not involve them. This would be an argument for subsetting before
> dispersion estimation.

I'm not so much concerned with, say, an experiment (or batch of
experiments) showing more variability in read counts for the same bin
in the same condition (although that is a possibility).

I was trying to convey my concern (maybe wrong) that the "fitted
dispersion" we end up using is a function of the mean read count for
the bin, where the mean is calculated from the expression of the gene
across all samples/conditions.

It could be that in one particular two-experiment comparison I want to
make, the expression of the exon is quite high in both samples. In
this case, the higher "averaged normalized count value" (x-axis of fig
2 in your pre-paper) would likely be associated w/ a lower dispersion
when doing the test, therefore increasing our power.

It could be, however, that in the rest of the conditions the gene (and
therefore the exon) is expressed at a lower level. The mean over all
samples would then be lower, so the dispersion for that bin would be
estimated at a higher value, decreasing the power of this comparison.

So -- I feel like I'd like to use all the data to calculate the "mean
normalized count" to dispersion line of best fit (again, Fig 2 in the
pre-paper) ... so far, so good. But for each pairwise test between
conditions I now want to perform, I'd set the `dispFitted` column for
the bin by using the mean expression of the bin only from the two
conditions under test.
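
To make that concrete, here is a rough sketch of what I have in mind. It
is in Python rather than R, and nothing in it is DEXSeq API: the function
names, the coefficients and the counts are all made up for illustration,
and the a0 + a1/mean shape of the fitted curve is my assumption about
what the dispersion-mean fit looks like.

```python
# Hypothetical sketch only; none of these names are DEXSeq functions.

def fitted_dispersion(mean_count, a0, a1):
    """Dispersion-mean trend, ASSUMED to have the form a0 + a1 / mean."""
    return a0 + a1 / mean_count

def mean_count(norm_counts, conditions, keep=None):
    """Mean normalized count of one bin, optionally restricted to some conditions."""
    vals = [c for c, cond in zip(norm_counts, conditions)
            if keep is None or cond in keep]
    return sum(vals) / len(vals)

# Made-up coefficients, standing in for a curve fitted on ALL the data
a0, a1 = 0.05, 2.0

# Made-up normalized counts for one bin: high in A and B, low in C and D
conditions = ["A", "A", "B", "B", "C", "C", "D", "D"]
norm_counts = [400.0, 420.0, 380.0, 400.0, 20.0, 30.0, 25.0, 25.0]

# What happens now: mean over every sample in the experiment
mu_all = mean_count(norm_counts, conditions)
# What I'd like for the A vs. B test: mean over those two conditions only
mu_ab = mean_count(norm_counts, conditions, keep={"A", "B"})

# Same fitted curve, evaluated at the two different means
disp_all = fitted_dispersion(mu_all, a0, a1)
disp_ab = fitted_dispersion(mu_ab, a0, a1)

print("mean (all samples):", mu_all, "-> fitted dispersion:", disp_all)
print("mean (A and B only):", mu_ab, "-> fitted dispersion:", disp_ab)
```

With a decreasing curve like this one, the bin gets a smaller fitted
dispersion for the A vs. B test when only the A and B samples go into the
mean, which is the effect I was trying to describe above.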

Sorry if that's not all that clear -- hopefully what I'm trying to do
makes sense -- but also (hopefully) it doesn't sound too boneheaded.

What do you think?


Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
