[Bioc-sig-seq] RNA-seq: test for within sample differential counts

Mon Oct 25 17:05:59 CEST 2010

Hi Michael,

Recently I have submitted to BioC a library called rnaSeqMap, which 
includes the code we use locally
for problems like yours and similar.
It has the implementation of Aumann-Lindell algorithm for finding 
irreducible regions
that perhaps may be used here. It is not a purely statistical solution, 
it is more of data mining, but I would
look for the minimal level of coverage (mi), for which both your regions 
are irreducible - in order to compare
them on one sample.
That's the outline of the receipe:

1. Take one region, create SeqReads object on it.    rs <- 
newSeqReads(chr, st, en, strand)
2. Fill the object with data using rs <- addDataToReadset() if you have 
the reads data in your workspace,
or use addExperimentsToReadset() if you are connected to xmap database 
with the seq_reads table added.
3. Run   nd <- getCoverageFromRS(rs, 1) to get the NucleotideDistr 
object with coverage.
4. Then - in some sort of adaptive procedure do:  nd.reg <- 
findRegionsAsND(nd, mi, minsup)
to get the minimal mi, for which distr(nd.reg) is filled with mi on all 
the positions.

Then repeat the procedure for both regions and compare mi.

Alternatively one can think of some ttest comparing "discretized" with 
A-L algorithm values from distributions for both regions to make the 
comparison more "statistically correct", eg:

  nd.rbc1 <- regionBasedCoverage(nd1, steps, max_mi)
  d1 <- distribs(nd.rbc1)
  nd.rbc2 <- regionBasedCoverage(nd2, steps, max_mi)
  d2 <- distribs(nd.rbc2)
t.test(d1, d2)

As a side note: irreducible region in the sense of Aumann and Lindell 
(2003) has to start and end with coverage > mi
So if your region is not "filled on both sides" you may check a 
representative sub-region.
And all the functions except addExperimentsToReadset() do not require 
the database connection.

Do not know if this is what you precisely asked for - just my 
approximate solution :)

Cheers,
Michal

On 25/10/2010 11:58, Michael Dondrup wrote:
> Hi,
>
> I need some statistical advise for the following problem. Given an RNA-seq experiment I would like to assess
> statistical significance of differential read-counts>within<  a sample. Given a sample with read counts
> for two (adjacent) regions out of all all regions of the genome I am interested in, say gene A and intron B.
>
> I wish to detect if region B has a significantly lower read count than A, lengths of regions A and B are known to be different,
> so I think fisher's-exact test does not apply here. Region length should be taken into account for this, as I think that
> the more positions are different between regions, the more significant the result should be. I also have biol. replicates,
> but these replicates have different numbers of reads.
>
> Packages like DEseq and edgeR seem to be tailored to between samples comparison. So which method
> would you recommend for within sample comparison?
>
> Thank you very much
> Michael
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>