[BioC] down-expression and high-expression in single cell + amplification
Simon Anders
anders at embl.de
Mon Jun 18 09:53:16 CEST 2012
Hi Pap,
a few comments in addition to what has already been said:
- If you have very different library sizes, it is normal to see fewer
significant changes in one direction, namely genes going up from the
shallowly sequenced condition to the deeply sequenced one. This is
because, due to Poisson noise, your power depends on the absolute read
count. Hence, if a gene has many reads in the shallow condition and few
in the deep one, you have better power to say whether this is real than
in the opposite case, where the low count falls into the shallow library
and is therefore even lower.
- However, in your case the library sizes differ by less than a factor
of 2, which is usually not much of a problem, so it must be something
else. Maybe post an MA plot.
- I am worried that you used RSEM for quantification. RSEM infers
isoform abundances, i.e., each count value has a specific uncertainty
attached, due to the ambiguity in assigning reads that map to shared
exons. This uncertainty can be huge and can dramatically inflate false
positives if a subsequent test is not informed of it. DESeq is not
designed to work with RSEM output, and the uncertainty information gets
lost. (Actually, it isn't even calculated if you run RSEM in EM rather
than Bayes mode, IIRC.)
- I'm not convinced that removing PCR duplicates in RNA-Seq is a good
idea. If you have 50 bp single-end reads, you constrain the value range
of your counts to 0:50, i.e., you lose all the advantages in dynamic
range that RNA-Seq has over microarrays.
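To illustrate the power argument in the first comment, here is a minimal sketch in Python. It is a simplification, not DESeq's test: it assumes pure Poisson counts (no biological variance, which DESeq models with the negative binomial) and uses the exact conditional test for two Poisson counts, where, given n = k1 + k2, the count k1 is Binomial(n, s1/(s1+s2)) under the null. The size factors 0.5 and 1.0, the expression levels 10 and 40, and the function names are made up for illustration. The same 4-fold change gets a smaller p-value when the gene is high in the shallow library, because the smaller of the two counts, which dominates the Poisson noise, is then 10 rather than 5.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_two_sample_pvalue(k1, k2, s1, s2):
    """Two-sided exact test of equal normalized expression for Poisson counts k1, k2
    with size factors s1, s2: given n = k1 + k2, k1 ~ Binomial(n, s1 / (s1 + s2)) under H0.
    Sums the probabilities of all outcomes no more likely than the observed one."""
    n = k1 + k2
    p0 = s1 / (s1 + s2)
    observed = binom_pmf(k1, n, p0)
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))

# Hypothetical numbers: the shallow library has half the depth of the deep one,
# and the gene changes 4-fold between normalized levels 10 and 40.
s_shallow, s_deep = 0.5, 1.0
low, high = 10, 40

# Case A: high in the shallow condition, low in the deep one (raw counts 20 vs 10)
p_a = poisson_two_sample_pvalue(round(high * s_shallow), round(low * s_deep), s_shallow, s_deep)
# Case B: low in the shallow condition, high in the deep one (raw counts 5 vs 40)
p_b = poisson_two_sample_pvalue(round(low * s_shallow), round(high * s_deep), s_shallow, s_deep)
print(f"high in shallow: p = {p_a:.2g}")
print(f"high in deep:    p = {p_b:.2g}")
```

With real data, the negative-binomial dispersion adds variance on top of this, but the asymmetry between the two directions comes from the Poisson part shown here.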
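For the MA plot suggested in the second comment, the sketch below first estimates size factors with the median-of-ratios idea used by DESeq (each sample's factor is the median, over genes, of its count divided by that gene's geometric mean across samples), then computes per-gene M (log2 ratio) and A (mean log2 normalized count) values. It uses only the Python standard library; the toy count matrix and the function names are hypothetical, and genes with a zero count in any sample are simply excluded from the size-factor estimate. The ratio of the two size factors also shows directly how different the effective library sizes are; plot M against A with any plotting tool.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors. counts: list of genes, each a list of per-sample counts.
    Genes with a zero in any sample are skipped (their geometric mean would be zero)."""
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # log geometric mean of each gene across samples
    log_geo = [sum(math.log(c) for c in row) / n_samples for row in usable]
    return [
        math.exp(median([math.log(row[j]) - lg for row, lg in zip(usable, log_geo)]))
        for j in range(n_samples)
    ]

def ma_values(counts, sf, a=0, b=1):
    """(A, M) pairs comparing sample b to sample a after dividing by size factors.
    Genes with a zero count in either sample are left out (log of zero)."""
    pairs = []
    for row in counts:
        x, y = row[a] / sf[a], row[b] / sf[b]
        if x > 0 and y > 0:
            pairs.append((0.5 * (math.log2(x) + math.log2(y)), math.log2(y / x)))
    return pairs

# Hypothetical toy data: sample 2 sequenced twice as deeply; gene 5 truly up 4-fold.
counts = [[10, 20], [100, 200], [50, 100], [7, 14], [30, 240], [0, 5]]
sf = size_factors(counts)
ma = ma_values(counts, sf)
print("size factors:", [round(s, 3) for s in sf])
for avg, m in ma:
    print(f"A = {avg:5.2f}   M = {m:+.2f}")
```

Because the median is robust, the one strongly changed gene does not pull the size factors, and in a healthy MA plot the bulk of the genes should sit around M = 0.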
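The isoform-ambiguity point in the third comment can be seen in a deliberately tiny toy model, which is not RSEM's actual model (RSEM also models fragment lengths, read positions, and more). Two hypothetical isoforms share one long exon and each has a short unique exon; an EM algorithm splits the shared-exon reads between the isoforms in proportion to their current per-base rates. With equal unique-exon lengths, the split converges to the ratio of the unique reads, so moving a single unique read from one isoform to the other shifts 100 of the 300 shared reads. A count-based test that treats the resulting expected counts (202 vs. 101) as ordinary counts never learns that the split rests on three informative reads. The function name and all numbers are made up for illustration.

```python
def em_split(u1, u2, shared, len1, len2, len_shared, tol=1e-10, max_iter=100_000):
    """Toy EM for two isoforms that share one exon (hypothetical model).
    u1, u2: reads on the exon unique to each isoform; shared: reads on the shared exon.
    Returns the expected read counts (n1, n2) assigned to each isoform."""
    lam1 = lam2 = 1.0  # per-base read rates, arbitrary starting values
    for _ in range(max_iter):
        # E-step: split the shared reads in proportion to the current per-base rates
        q = lam1 / (lam1 + lam2)
        # M-step: per-base rate = reads assigned to the isoform / total isoform length
        new1 = (u1 + shared * q) / (len1 + len_shared)
        new2 = (u2 + shared * (1 - q)) / (len2 + len_shared)
        done = abs(new1 - lam1) < tol and abs(new2 - lam2) < tol
        lam1, lam2 = new1, new2
        if done:
            break
    q = lam1 / (lam1 + lam2)
    return u1 + shared * q, u2 + shared * (1 - q)

# Swap one unique read between the isoforms and watch the shared reads follow it.
for u1, u2 in [(2, 1), (1, 2)]:
    n1, n2 = em_split(u1, u2, shared=300, len1=50, len2=50, len_shared=1000)
    print(f"unique reads {u1}/{u2} -> expected counts {n1:.1f} / {n2:.1f}")
```

A Bayesian treatment would instead report a posterior for each isoform's count whose spread reflects how little the unique reads constrain the split; the point estimate alone hides that.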
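The saturation behind the last comment can be sketched as follows, assuming duplicates are defined as reads sharing the same start position and strand, so that duplicate removal keeps at most one read per start slot. Purely for illustration, the code assumes reads start uniformly at random among 50 slots (for example, the 50 start positions on one strand from which a 50 bp read would cover a given base) and compares a simulation with the expected number of occupied slots, M(1 - (1 - 1/M)^n). However many reads there really are, the deduplicated count cannot exceed 50. The function names are hypothetical.

```python
import random

def dedup_count(n_reads, n_positions, rng):
    """Reads left after duplicate removal when n_reads start uniformly at random
    among n_positions possible start slots: one read is kept per occupied slot."""
    return len({rng.randrange(n_positions) for _ in range(n_reads)})

def expected_dedup_count(n_reads, n_positions):
    """Expected number of distinct occupied slots (classic occupancy formula)."""
    return n_positions * (1 - (1 - 1 / n_positions) ** n_reads)

rng = random.Random(1)  # fixed seed so the simulation is reproducible
for n in (10, 50, 200, 1000, 10000):
    print(f"{n:6d} reads -> {dedup_count(n, 50, rng):2d} after dedup "
          f"(expected {expected_dedup_count(n, 50):.1f})")
```

Real libraries do not start reads uniformly, which makes saturation set in even earlier for highly expressed genes.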
Simon