[BioC] zero rna-seq values AFTER normalisation in edgeR

Nick N feralmedic at gmail.com
Fri Aug 15 15:23:09 CEST 2014


I am using edgeR to analyze RNA-Seq data. This is my script:


library("edgeR")
#############################
#read in metadata & DGE
#############################
composite_samples <- read.csv(file="samples.csv",header=TRUE,sep=",")
counts <- readDGE(composite_samples$CountFiles)$counts
#############################
#Filter & Library Size Re-set
#############################
noint <- rownames(counts) %in% (c("no_feature", "ambiguous",
"too_low_aQual", "not_aligned", "alignment_not_unique"))
cpms <- cpm(counts)
keep <- rowSums(cpms>1)>=3 & !noint
counts <- counts[keep,]
colnames(counts) <- composite_samples$SampleName
d <- DGEList(counts=counts, group=composite_samples$Condition)
d$samples$lib.size <- colSums(d$counts)
#############################
#Normalisation
#############################
d <- calcNormFactors(d)
#############################
#Recording the normalized counts
#############################
all_cpm=cpm(d, normalized.lib.size=TRUE)
all_counts <- cbind(rownames(all_cpm), all_cpm)
colnames(all_counts)[1] <- "Ensembl.Gene.ID"
rownames(all_counts) <- NULL
#############################
#Estimate Dispersion
#############################
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
#############################
#Perform a test
#############################
de_ctl_mo_composite <- exactTest(d, pair=c("NY", "N"))


I believe that the variable "all_counts" shall contain the normalized
counts for each sample in each condition. My understanding is also that
edgeR adds pseudocounts BEFORE performing the library normalisation. Thus
it is possible that some values revert to being zero after normalisation.
But I thought that this would happen rarely. Yet in a recent dataset I find
an improbably large number of zero values in "all_counts" which made me
think that my understanding of how pseudocounts and normalisation work in
edgeR might be incorrect. Can, please, somebody comment on this?

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list