[Bioc-sig-seq] RNA-seq: test for within sample differential counts

Mon Oct 25 14:04:43 CEST 2010

Hi Sean,

yes, I agree it is not totally straightforward, otherwise such a package might already exist. Possibly the problem
of sequencing bias and GC content could be solved by normalizing against a run of genomic DNA? We do not have that yet,
though it might be an option for the future. But let's assume that all bias has been accounted for. Would a simple Poisson model be
suitable? If so, I could compute the probability of seeing N or fewer reads at a single base position, given that its neighborhood has
a mean coverage estimated from the data. Or is this assumption invalid even in the absence of sequencing bias?
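
As a minimal sketch of the calculation described above, assuming per-base coverage really is Poisson with a mean estimated from neighboring positions (real coverage is often overdispersed, so this is illustrative only). The function names `poisson_cdf` and `low_coverage_pvalue` and the example depths are hypothetical and not taken from any package:

```python
import math


def poisson_cdf(n: int, mean: float) -> float:
    """Return P(X <= n) for X ~ Poisson(mean).

    Each term is evaluated in log space so that large means do not
    underflow exp(-mean) to zero.
    """
    if mean < 0:
        raise ValueError("mean must be non-negative")
    if n < 0:
        return 0.0
    if mean == 0:
        return 1.0  # all probability mass sits at X = 0
    log_mean = math.log(mean)
    total = 0.0
    for k in range(n + 1):
        # log P(X = k) = k*log(mean) - mean - log(k!)
        total += math.exp(k * log_mean - mean - math.lgamma(k + 1))
    return min(total, 1.0)


def low_coverage_pvalue(depth_at_base: int, neighborhood_depths: list) -> float:
    """Lower-tail probability of the observed depth at one base.

    The Poisson mean is estimated as the average depth of the
    neighboring positions (an assumption of this sketch).
    """
    if not neighborhood_depths:
        raise ValueError("need at least one neighboring depth")
    mean = sum(neighborhood_depths) / len(neighborhood_depths)
    return poisson_cdf(depth_at_base, mean)


if __name__ == "__main__":
    # Hypothetical example: depth 2 at the base, depth 10 at its neighbors.
    print(low_coverage_pvalue(2, [10, 10, 10, 10, 10]))
```

With these made-up numbers the lower-tail probability is about 0.0028. Note that this tests a single position; combining many positions across a region would also require handling the dependence between neighboring bases, since one read covers several of them.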

Best
Michael  

On Oct 25, 2010, at 12:50 PM, Sean Davis wrote:

> 
> 
> On Mon, Oct 25, 2010 at 5:58 AM, Michael Dondrup <Michael.Dondrup at uni.no> wrote:
> Hi,
> 
> I need some statistical advice on the following problem. Given an RNA-seq experiment, I would like to assess
> the statistical significance of differential read counts >within< a sample. Suppose I have a sample with read counts
> for two (adjacent) regions out of all regions of the genome I am interested in, say gene A and intron B.
> 
> 
> Hi, Michael.  
> 
> Comparing two different regions directly in the same sample is a problem that introduces a few more sources of biological and technical variation than comparing the same region between samples.  Assume that the number of molecules for each of the two regions is identical in the cell.  First, the mappability of the two regions could be quite different, leading to differences in counts between the regions.  Second, the GC content or other sequence-level characteristics of the two regions could be different, leading to different efficiencies in the sequencing procedure, again leading to differences in counts between regions.  Third, the structure of the region in combination with an individual mapping method can also contribute to differences in read counts between two regions.  There may be clever ways to control for these issues, but I am not aware of fully general ways of doing so.  
> 
> Sean
>  
> I wish to detect whether region B has a significantly lower read count than A. The lengths of regions A and B are known to be different,
> so I think Fisher's exact test does not apply here. Region length should be taken into account, as I think that
> the more positions differ between regions, the more significant the result should be. I also have biological replicates,
> but these replicates have different numbers of reads.
> 
> Packages like DESeq and edgeR seem to be tailored to between-sample comparisons. So which method
> would you recommend for a within-sample comparison?
> 
> Thank you very much
> Michael
> 