[BioC] zero rna-seq values AFTER normalisation in edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Aug 17 02:24:54 CEST 2014
Dear Nick N,
Thanks for using edgeR. You do, however, have some misunderstandings about
how normalization works and about what the cpm() function outputs.
> Date: Fri, 15 Aug 2014 14:23:09 +0100
> From: Nick N <feralmedic at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] zero rna-seq values AFTER normalisation in edgeR
>
> I am using edgeR to analyze RNA-Seq data. This is my script:
>
>
> library("edgeR")
[snip]
> d <- calcNormFactors(d)
> all_cpm=cpm(d, normalized.lib.size=TRUE)
[snip]
> I believe that the variable "all_counts" shall contain the normalized
> counts for each sample in each condition.
The cpm() function simply computes counts-per-million, which is a
ratio rather than a count.
> My understanding is also that edgeR adds pseudocounts BEFORE performing
> the library normalisation.
No it doesn't. Why would you think that? edgeR works with your data as
it actually is rather than trying to fudge it.
> Thus it is possible that some values revert to being zero after
> normalisation. But I thought that this would happen rarely. Yet in a
> recent dataset I find an improbably large number of zero values in
> "all_counts" which made me think that my understanding of how
> pseudocounts and normalisation work in edgeR might be incorrect. Can,
> please, somebody comment on this?
cpm() simply computes counts per million by dividing the counts by the
normalized library sizes, expressed in millions. A zero count therefore
always corresponds to a zero count-per-million. That seems pretty natural!
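To make the arithmetic concrete, here is a minimal base-R sketch of the same calculation done by hand. It does not call edgeR at all. The counts, library sizes and normalization factors are invented for illustration, and the names manual_cpm and eff.lib.size are hypothetical. The normalized library size is taken to be the library size multiplied by the normalization factor from calcNormFactors().

```r
# Base-R sketch of the counts-per-million arithmetic (no edgeR required).
# All numbers below are made up for illustration.
counts <- matrix(c(0, 10, 250,
                   5,  0, 400),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))
lib.size     <- c(sample1 = 2e6, sample2 = 4e6)    # hypothetical total reads
norm.factors <- c(sample1 = 0.8, sample2 = 1.25)   # hypothetical normalization factors

# Normalized (effective) library size for each sample
eff.lib.size <- lib.size * norm.factors

# Divide each column by its effective library size, then scale to "per million"
manual_cpm <- t(t(counts) / eff.lib.size) * 1e6
print(manual_cpm)

# Every zero count is still exactly zero after normalization
print(all(manual_cpm[counts == 0] == 0))
```

Nothing is added to the counts at any point, so a gene with no reads in a sample has a cpm of exactly zero in that sample, whatever the normalization factor is.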
Are you perhaps thinking of the use of prior.count when computing cpm or
logFC on the log-scale? The help page for the cpm() function tells you
that prior counts are not used when computing plain cpm values on the raw
scale.
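The difference between the two scales can be sketched in base R for a single sample. This is a simplified illustration and not edgeR's own code: the value prior.count = 2 is just an example, and edgeR's calculation also scales the prior count in proportion to library size, which a one-sample example glosses over. The variable names are hypothetical.

```r
# Simplified single-sample sketch: raw cpm versus log2-cpm with a prior count.
# Not edgeR's implementation; the numbers are made up.
counts       <- c(geneA = 0, geneB = 10, geneC = 250)
eff.lib.size <- 1.6e6   # hypothetical normalized library size
prior.count  <- 2       # illustrative value only

# Raw scale: no prior count, so a zero count stays at zero
raw_cpm <- counts / eff.lib.size * 1e6

# Log scale: a small prior count is added so that log2(0) is avoided
log_cpm <- log2((counts + prior.count) / (eff.lib.size + 2 * prior.count) * 1e6)

print(raw_cpm)
print(log_cpm)
```

On the raw scale geneA remains zero. On the log scale it gets a finite value only because of the prior count, which is the one place where something like a pseudocount enters.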
I wonder what source you are relying on for information about edgeR? The
most reliable source is the documentation that comes with edgeR.
Best wishes
Gordon