[BioC] Normalization of RNAseq data using ERCC?

Davis, Wade davisjwa at health.missouri.edu
Sat Feb 8 00:12:32 CET 2014


Hi Agnes,
I have used ERCC spike-ins in a large RNA-Seq study (600+ samples), and I would temper expectations for any approach based on them. The dynamic range of the spike-ins is large (as I recall, about 18 orders of magnitude on a base-2 scale), so unless you are sequencing quite deeply, you won't get high read counts for at least the bottom third of that range. I tried a number of different strategies for using that information to estimate the size factors, but was never comfortable with the results. The spike-ins themselves are subject to a great deal of sample-to-sample variability (due to pipetting variance, differences in library diversity, etc.), which makes them less appealing as a basis for normalization once you see the results: in some cases the samples differed by several fold. By the way, our depth was ~20M reads per sample.
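For readers who want to see what "using the spike-ins for the size factors" can mean in practice, here is a minimal sketch. It is not the procedure used in the study described above, which the message does not spell out. It assumes a DESeq-style median-of-ratios estimator restricted to the spike-in rows, written in Python with NumPy; the function name `spike_in_size_factors` and its arguments are hypothetical.

```python
import numpy as np

def spike_in_size_factors(counts, is_spike):
    """Median-of-ratios size factors computed from spike-in rows only.

    counts   : (features x samples) array of raw read counts
    is_spike : boolean mask over rows marking the ERCC spike-ins
    (Both names are illustrative assumptions, not from the original post.)
    """
    counts = np.asarray(counts, dtype=float)
    spikes = counts[np.asarray(is_spike, dtype=bool)]
    # Keep only spike-ins detected in every sample; the low end of the
    # concentration range is often zero at moderate sequencing depth.
    spikes = spikes[(spikes > 0).all(axis=1)]
    if spikes.shape[0] == 0:
        raise ValueError("no spike-in has nonzero counts in all samples")
    log_spikes = np.log(spikes)
    # The per-spike log geometric mean across samples is the reference.
    log_ref = log_spikes.mean(axis=1, keepdims=True)
    # Size factor = median ratio of each sample to that reference.
    return np.exp(np.median(log_spikes - log_ref, axis=0))
```

Because only the spike-ins that survive the nonzero filter contribute, the estimate at moderate depth rests on a small number of rows, so any pipetting error in the spike-in volume passes straight into the size factors. That is consistent with the instability described above.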

My experience agrees with that reported in the following paper, which uses some data from the SEQC study, and does consider the spike-ins in a complex background (i.e., spiked-in to a human sample at suggested concentrations). They also looked at large data sets.

http://life.scichina.com:8082/sciCe/EN/abstract/abstract510013.shtml

This paper (http://www.ncbi.nlm.nih.gov/pubmed/21816910) is more optimistic and may seem to contradict my comments and the paper above; however, a key difference is the sequencing depth in this second study. A glance at its Supplemental Table S2 shows that the average number of reads was 230M PER (human) SAMPLE! They also used paired-end reads.

I did find the spike-ins useful for computing an "empirical" false discovery rate (using the ERCC Set B) between groups. With reasonable sample sizes per group (n=8), the group mean fold changes were extremely close to 1 for those spike-ins, even though they were not used in the normalization procedure per se.
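As a rough illustration of that kind of check, the sketch below computes between-group log2 fold changes for spike-ins that are expected to be unchanged, plus the fraction of them that would be called "changed" at a given cutoff. The message does not say how the "empirical" rate was actually computed, so everything here is an assumption: the function names, the pseudocount, and the cutoff are hypothetical, and the input is taken to be counts that were already normalized by some other method.

```python
import numpy as np

def null_spike_log2fc(norm_counts, group, pseudo=0.5):
    """log2 ratio of group means for spike-ins expected to be 1:1.

    norm_counts : (spike-ins x samples) array of normalized counts
    group       : per-sample labels with exactly two distinct values
    pseudo      : pseudocount guarding against zero means (assumption)
    """
    norm_counts = np.asarray(norm_counts, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)
    if labels.size != 2:
        raise ValueError("expected exactly two groups")
    mean_a = norm_counts[:, group == labels[0]].mean(axis=1)
    mean_b = norm_counts[:, group == labels[1]].mean(axis=1)
    return np.log2((mean_b + pseudo) / (mean_a + pseudo))

def empirical_false_call_rate(log2fc, threshold=1.0):
    """Fraction of known-null spike-ins whose |log2FC| meets the cutoff."""
    return float(np.mean(np.abs(np.asarray(log2fc)) >= threshold))
```

If the normalization is behaving, the log2 fold changes for these spike-ins should cluster near 0 (a fold change of 1), and the fraction exceeding the cutoff gives a crude estimate of how often a truly unchanged feature would be flagged.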

I'd be happy to discuss more off the list, and point you to publications where I used them as a measure of false discovery.

Regards,
Wade


-----Original Message-----
From: Agnes Paquet [mailto:paquet at ipmc.cnrs.fr] 
Sent: Thursday, February 06, 2014 9:05 AM
To: bioconductor at r-project.org
Subject: [BioC] Normalization of RNAseq data using ERCC?

Dear List,

We have just started using ERCC spike-in controls in our RNAseq experiments. I have looked for recommended approaches on how to use the controls for normalization, but I couldn't find much information.

From what I have read, I am planning to use the spike-ins to estimate the sizeFactors in our differential analysis pipeline. Is there a better approach that we could use to normalize our data based on the spike-ins?

Can anyone recommend any paper covering that topic?

Thank you for your help,

Agnes



