[BioC] down-expression and high-expression in single cell + amplification

Mon Jun 18 09:53:16 CEST 2012

Hi Pap

a few comments in addition to what has already been said:

- If your library sizes are very different, it is normal to see fewer 
changes in the direction from the shallowly sequenced condition to the 
deeply sequenced one. This is because, due to Poisson noise, your power 
depends on the absolute read count. Hence, if a gene has many reads in 
the shallow condition and few in the deep one, you have better power to 
say whether this is real than in the opposite case.

- However, in your case, the library sizes differ by less than a factor 
of two, which is usually not much of a problem. It must be something else.

Maybe post an MA plot.

- I am worried that you used RSEM for quantification. RSEM infers 
isoform abundances, i.e., each count value has a specific uncertainty 
attached, due to the ambiguity in assigning reads that map to shared 
exons. This uncertainty can be huge, and it can dramatically inflate 
false positives if a subsequent test is not informed of it. DESeq is not 
designed to work with RSEM output, and the uncertainty information will 
get lost. (Actually, IIRC, it isn't even calculated if you run RSEM in 
EM rather than Bayes mode.)

- I'm not convinced that removing PCR duplicates in RNA-Seq is a good 
idea. If you have 50 bp single-end reads, you constrain the value range 
of your counts to 0:50, i.e., you lose all the advantages in dynamic 
range that RNA-Seq has over microarrays.
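To make the Poisson-noise point about power concrete, here is a minimal Python sketch. It is only an illustration, not what DESeq does internally (DESeq uses a negative binomial model): it compares two counts with an exact conditional binomial test, assuming hypothetical library sizes of 1M and 2M reads. The normalized twofold change is the same in both calls; only the absolute counts differ.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed one."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # small relative tolerance so outcomes tied with the observed one count
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def poisson_count_test(k1: int, k2: int, size1: float, size2: float) -> float:
    """Compare two Poisson counts from libraries of different size.
    Conditional on the total, k1 ~ Binomial(k1 + k2, size1 / (size1 + size2))
    under the null hypothesis of no change."""
    return binom_two_sided_p(k1, k1 + k2, size1 / (size1 + size2))


# Hypothetical library sizes: shallow = 1M reads, deep = 2M reads.
shallow, deep = 1e6, 2e6

# Both genes are twofold higher in the shallow library after normalizing
# (10/1M vs 10/2M, and 100/1M vs 100/2M); only the absolute counts differ.
p_low = poisson_count_test(10, 10, shallow, deep)
p_high = poisson_count_test(100, 100, shallow, deep)
print(f"10 vs 10 reads:   p = {p_low:.3g}")
print(f"100 vs 100 reads: p = {p_high:.3g}")
```

With 10 vs 10 raw reads the p-value stays well above 0.05, while 100 vs 100 gives a tiny p-value: the same fold change is far easier to detect at higher absolute counts.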
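To check how different the library sizes really are, one can compute median-of-ratios size factors, the normalization idea DESeq uses, rather than comparing raw totals. Below is a minimal Python sketch; the function name and the toy counts are made up, with the second sample sequenced exactly 1.8 times as deeply as the first.

```python
import math
from statistics import median


def size_factors(counts: list[list[int]]) -> list[float]:
    """Median-of-ratios size factors. counts[i][j] is the count of gene i
    in sample j; genes with a zero in any sample are left out because
    their geometric mean is zero."""
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # log of each gene's geometric mean across samples
    log_geo = [sum(math.log(c) for c in row) / n_samples for row in usable]
    return [
        math.exp(median(math.log(row[j]) - lg for row, lg in zip(usable, log_geo)))
        for j in range(n_samples)
    ]


# Made-up counts: sample 1 is sequenced exactly 1.8 times as deeply.
counts = [
    [10, 18],
    [50, 90],
    [200, 360],
    [0, 5],  # skipped: zero count in sample 0
]
sf = size_factors(counts)
print("size factors:", sf, "ratio:", sf[1] / sf[0])
```

A ratio of size factors below 2 corresponds to the "less than a factor of two" situation described above.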
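An MA plot shows, for every gene, the log2 fold change M between the two conditions against the average log2 normalized count A. A minimal Python sketch of computing those values, assuming you already have size factors for the two samples; the function name, the example counts and size factors, and the pseudocount of 0.5 are all hypothetical choices for illustration.

```python
import math


def ma_values(counts_a, counts_b, sf_a, sf_b, pseudo=0.5):
    """Return one (A, M) pair per gene for an MA plot.
    M = log2 fold change of normalized counts (B over A),
    A = mean of the two log2 normalized counts.
    The pseudocount is an arbitrary choice that keeps zero counts finite."""
    points = []
    for ka, kb in zip(counts_a, counts_b):
        qa = ka / sf_a + pseudo
        qb = kb / sf_b + pseudo
        m = math.log2(qb / qa)
        a = 0.5 * (math.log2(qa) + math.log2(qb))
        points.append((a, m))
    return points


# Hypothetical counts for three genes and size factors for two samples.
for a, m in ma_values([12, 300, 0], [30, 250, 4], sf_a=0.75, sf_b=1.33):
    print(f"A = {a:6.2f}   M = {m:6.2f}")
```

Plotting M against A (e.g. as a scatter plot) makes it easy to see whether the called genes sit mostly on one side, and the cloud widens at low A because of the Poisson noise mentioned above.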
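The ambiguity behind RSEM's estimates can be seen in a toy version of the expectation-maximization (EM) approach it is named after. This Python sketch is a heavy simplification, assuming equal isoform lengths and no fragment-length or quality model, and the function name is hypothetical. How reads on shared exons are split is determined only by the reads unique to each isoform; with no unique reads the likelihood is flat, and the output simply reflects the starting value. A count-based test downstream never sees that.

```python
def em_isoforms(compat, n_iso, init, n_iter=200):
    """Toy EM estimate of isoform proportions.
    compat holds, for each read, the set of isoforms it is compatible with.
    Simplifications: equal isoform lengths, no fragment-length or quality model."""
    theta = list(init)
    for _ in range(n_iter):
        expected = [0.0] * n_iso
        for iso_set in compat:
            total = sum(theta[i] for i in iso_set)
            for i in iso_set:
                # E-step: split the read in proportion to current abundances
                expected[i] += theta[i] / total
        # M-step: new proportions are the expected read shares
        theta = [e / len(compat) for e in expected]
    return theta


# 100 reads on exons shared by isoforms 0 and 1, and no unique reads:
shared_only = [{0, 1}] * 100
print(em_isoforms(shared_only, 2, [0.5, 0.5]))  # stays at the start value
print(em_isoforms(shared_only, 2, [0.9, 0.1]))  # stays at the start value (up to rounding)

# Same shared reads plus 30 reads unique to isoform 0 and 10 unique to isoform 1:
mixed = shared_only + [{0}] * 30 + [{1}] * 10
print(em_isoforms(mixed, 2, [0.1, 0.9]))  # converges to about [0.75, 0.25]
```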
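A quick simulation shows why duplicate removal compresses the dynamic range: once every (start position, strand) slot on a transcript is taken, extra depth adds nothing. This Python sketch assumes single-end reads drawn uniformly from a hypothetical 1,000 bp transcript with 50 bp reads, and it defines duplicates simply as reads sharing start and strand; real duplicate markers may use additional criteria.

```python
import random


def simulate_counts(n_reads, gene_len, read_len, seed=0):
    """Return (raw, deduplicated) counts for single-end reads drawn
    uniformly from a transcript of gene_len bases. Duplicates are defined
    as reads sharing start position and strand (a simplifying assumption)."""
    rng = random.Random(seed)
    reads = [
        (rng.randrange(gene_len - read_len + 1), rng.choice("+-"))
        for _ in range(n_reads)
    ]
    return len(reads), len(set(reads))


for depth in (100, 1_000, 10_000, 100_000):
    raw, dedup = simulate_counts(depth, gene_len=1_000, read_len=50)
    print(f"raw {raw:>7}  deduplicated {dedup:>5}")
```

Raw counts grow 1000-fold across these depths, but the deduplicated count can never exceed 2 × 951 = 1902 possible (start, strand) slots, so highly expressed genes all end up looking alike.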

   Simon