[BioC] Single nucleotide based RNAseq normalization with edgeR

Tue Feb 8 00:18:07 CET 2011

Hi Jens,

On Feb/7/11 11:46 AM, Jens Georg wrote:
> Hi Gordon,
> thank you for your reply. The resolution of our ~100nt Solexa reads is
> too low to detect individual processing sites, so we want to
> investigate every single nucleotide individually ("single nucleotide
> based normalization"). That means that we count how often an individual
> nucleotide is covered by sequence reads. Of course, this approach will
> virtually increase the lib.size by a factor which depends on the length
> of the Solexa reads. As the lib.size is critical for the normalization, I
> am not sure if I should use the original read numbers for each library
> or the read numbers multiplied by the read length to adjust for the
> single nucleotide investigation.

Do you have reason to assume that these options are not essentially
equivalent, i.e. that the read length distributions differ between
lanes?
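
To illustrate why the two options should be equivalent when the read
length is the same in every lane, here is a minimal Python sketch. The
read start positions, READ_LENGTH and GENOME_LENGTH are made-up toy
values, and coverage() is a hypothetical helper, not an edgeR function.
It assumes every read has the same fixed length and lies entirely
within the reference.

```python
def coverage(read_starts, read_length, genome_length):
    """Per-nucleotide coverage: number of reads overlapping each position."""
    cov = [0] * genome_length
    for start in read_starts:
        # Each read covers read_length consecutive positions from its start.
        for pos in range(start, min(start + read_length, genome_length)):
            cov[pos] += 1
    return cov


# Toy data (hypothetical): 0-based read start positions in two libraries.
READ_LENGTH = 100
GENOME_LENGTH = 1000
lib_a_starts = [0, 50, 120, 400, 410, 700]
lib_b_starts = [10, 300, 305, 650]

cov_a = coverage(lib_a_starts, READ_LENGTH, GENOME_LENGTH)
cov_b = coverage(lib_b_starts, READ_LENGTH, GENOME_LENGTH)

# Option 1: library size = number of reads.
ratio_reads = len(lib_a_starts) / len(lib_b_starts)

# Option 2: library size = number of reads times read length,
# which equals the total per-nucleotide coverage.
ratio_nucleotides = sum(cov_a) / sum(cov_b)

print(ratio_reads, ratio_nucleotides)
```

With a common read length, both libraries are scaled by the same
constant, so the ratio of library sizes does not change.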

If that were the case, more thought would probably be needed on which
underlying uncontrolled physical, chemical or biological effect causes
it, and a suitable 'normalisation' approach should be derived from that.
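
One way to check whether the two options actually differ is to compare
the read length distributions per lane. The following Python sketch
uses made-up read lengths (lane_a, lane_b are hypothetical) and assumes
that reads in one lane were trimmed. It only shows the arithmetic and
is not a normalisation method.

```python
from statistics import mean

# Toy read lengths per lane (hypothetical values, in nucleotides).
lane_a = [100, 100, 100, 100, 100, 100]  # all reads full length
lane_b = [100, 100, 80, 60]              # some reads shorter

# Library size as number of reads.
ratio_reads = len(lane_a) / len(lane_b)

# Library size as total number of sequenced nucleotides.
ratio_nucleotides = sum(lane_a) / sum(lane_b)

# The two ratios differ exactly by the ratio of the mean read lengths.
length_factor = mean(lane_a) / mean(lane_b)

print(ratio_reads, ratio_nucleotides, ratio_reads * length_factor)
```

If the mean read lengths agree between lanes, length_factor is 1 and
the choice between the two library size definitions does not matter.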

		Best wishes
		Wolfgang

>
> I have two more questions regarding the normalization:
> 1. Are the norm factors calculated by the calcNormFactors( ) function
> automatically used for further steps like the estimateCommonDisp( )
> function?
> 2. Are the pseudocounts calculated by estimateCommonDisp( ) the
> normalized read counts?
>
> Many thanks
>
> Jens
>
>> Hi Jens,
>>
>> I don't know what you mean by single nucleotide based normalization;
>> however, the following comments may be helpful.
>>
>> edgeR automatically adjusts for library sizes, whether you include an
>> explicit normalization step or not. Normalization is a separate issue,
>> and is intended to deal with more subtle issues.
>>
>> Normalization, as edgeR does it, does not require replicates.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Fri, 04 Feb 2011 11:28:15 +0100
>>> From: Jens Georg <jens.georg at biologie.uni-freiburg.de>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] Single nucleotide based RNAseq normalization with
>>> edgeR?
>>> Message-ID: <4D4BD4BF.4010009 at biologie.uni-freiburg.de>
>>>
>>>
>>>
>>> Dear edgeR users and developers,
>>>
>>> we used Solexa sequencing in order to detect RNase E processing sites.
>>> Therefore we split an RNA sample and treated one half with RNase E
>>> prior to cDNA synthesis and sequencing. The libraries differ in size
>>> (1,918,953 and 1,208,586 reads respectively), which clearly necessitates
>>> a normalization step. Furthermore, we expect site-specific differences
>>> rather than differences in the accumulation of the full-length RNAs.
>>>
>>> So I want to ask whether it is appropriate to do a single nucleotide
>>> based normalization with edgeR, and whether you think a reliable basic
>>> normalization is possible without replicates.
>>>
>>> Thank you for your comments.
>>>
>>> Best regards
>>>
>>> Jens
>>

-- 

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber