[Bioc-sig-seq] What to do when single reads make up a large percentage of counts?
Jenny Drnevich
drnevich at illinois.edu
Tue Aug 11 21:42:11 CEST 2009
Hi everyone,
I thought I would try this list before the general Bioconductor one
because my question pertains to NGS counts, although in reality it's
a general statistical theory question. I hope someone can help me or
point me in the right direction!

Typically, you cannot compare counts from different samples directly;
instead you have to adjust by the total number of counts obtained for
each sample, correct? This assumes that changes in the counts of
particular sequences will not substantially affect the total count...
but what if they might?

I'm helping a colleague with some data where they sequenced the
18-30 nt fraction of RNA to look for miRNAs; they got 1.1 to 2.1
million reads per sample, but these aligned to only 187 miRNAs! Some
of the miRNAs have up to 30% of all reads, which is a really large
percentage. Say a miRNA "X" that is 30% of the reads doubles its
count in another sample, but the counts for all other miRNAs are the
same. The new percentage of "X" in the second sample is not 60%, but
instead 46.15% (60/130), and the observed proportions of all the other
miRNAs are decreased by a factor of 0.77 (= 1/1.3). Is there any way
to correct for this? What do you do when the top 5 miRNAs make up 70%
of the counts??
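To make the arithmetic of the "X" example concrete, here is a minimal Python sketch (an illustration, not part of the original analysis). The absolute counts and the extra miRNA names "Y" and "Z" are hypothetical; only the 30% starting share of "X" and its doubling come from the scenario described above.

```python
# Made-up counts for the scenario above: miRNA "X" is 30% of reads in
# sample A and doubles in sample B, while the hypothetical miRNAs "Y"
# and "Z" keep identical raw counts in both samples.
sample_a = {"X": 300, "Y": 450, "Z": 250}   # total 1000
sample_b = {"X": 600, "Y": 450, "Z": 250}   # total 1300


def total_count_normalize(counts):
    """Divide each count by the sample's total count (proportion of reads)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


prop_a = total_count_normalize(sample_a)
prop_b = total_count_normalize(sample_b)

# Apparent fold change of each miRNA after total-count normalization.
fold = {name: prop_b[name] / prop_a[name] for name in sample_a}

print(f"X share in A: {prop_a['X']:.2%}, in B: {prop_b['X']:.2%}")
for name in ("Y", "Z"):
    # Raw counts are identical, yet normalization reports a drop of 1/1.3.
    print(f"{name}: raw counts equal, apparent fold change {fold[name]:.3f}")
```

With these numbers "X" goes from 30.00% to 46.15% of reads, "Y" and "Z" each appear to drop to about 0.769 of their original level despite unchanged raw counts, and "X" itself appears to rise only about 1.54-fold (2/1.3) rather than 2-fold.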
Thanks,
Jenny
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu