[Bioc-sig-seq] RNA-seq: test for within sample differential counts
michal at fgcz.ethz.ch
Mon Oct 25 17:05:59 CEST 2010
Recently I have submitted to BioC a library called rnaSeqMap, which
includes the code we use locally
for problems like yours and similar.
It has the implementation of Aumann-Lindell algorithm for finding
that perhaps may be used here. It is not a purely statistical solution,
it is more of data mining, but I would
look for the minimal level of coverage (mi), for which both your regions
are irreducible - in order to compare
them on one sample.
That's the outline of the receipe:
1. Take one region, create SeqReads object on it. rs <-
newSeqReads(chr, st, en, strand)
2. Fill the object with data using rs <- addDataToReadset() if you have
the reads data in your workspace,
or use addExperimentsToReadset() if you are connected to xmap database
with the seq_reads table added.
3. Run nd <- getCoverageFromRS(rs, 1) to get the NucleotideDistr
object with coverage.
4. Then - in some sort of adaptive procedure do: nd.reg <-
findRegionsAsND(nd, mi, minsup)
to get the minimal mi, for which distr(nd.reg) is filled with mi on all
Then repeat the procedure for both regions and compare mi.
Alternatively one can think of some ttest comparing "discretized" with
A-L algorithm values from distributions for both regions to make the
comparison more "statistically correct", eg:
nd.rbc1 <- regionBasedCoverage(nd1, steps, max_mi)
d1 <- distribs(nd.rbc1)
nd.rbc2 <- regionBasedCoverage(nd2, steps, max_mi)
d2 <- distribs(nd.rbc2)
As a side note: irreducible region in the sense of Aumann and Lindell
(2003) has to start and end with coverage > mi
So if your region is not "filled on both sides" you may check a
And all the functions except addExperimentsToReadset() do not require
the database connection.
Do not know if this is what you precisely asked for - just my
approximate solution :)
On 25/10/2010 11:58, Michael Dondrup wrote:
> I need some statistical advise for the following problem. Given an RNA-seq experiment I would like to assess
> statistical significance of differential read-counts>within< a sample. Given a sample with read counts
> for two (adjacent) regions out of all all regions of the genome I am interested in, say gene A and intron B.
> I wish to detect if region B has a significantly lower read count than A, lengths of regions A and B are known to be different,
> so I think fisher's-exact test does not apply here. Region length should be taken into account for this, as I think that
> the more positions are different between regions, the more significant the result should be. I also have biol. replicates,
> but these replicates have different numbers of reads.
> Packages like DEseq and edgeR seem to be tailored to between samples comparison. So which method
> would you recommend for within sample comparison?
> Thank you very much
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
More information about the Bioc-sig-sequencing