[BioC] EdgeR norm.factors input

Tue Feb 11 15:38:03 CET 2014

Dear Gordon,

Thank you so much for your comments. This is exactly what I did for total read count (TC) normalization: I used norm.factors = 1.

Now my question. As I mentioned in my previous post, I would like to compare the performance of different normalization methods. I would also like to compare the results from normalized data with the results from raw count (RC) data, that is, without any normalization. Following our previous discussion, I skipped the normalization step for RC, but the results were the same for TC and RC. Should I use

norm.factors = 1/lib.size

for RC?

One more question: I have also considered the normalization method provided in the DESeq package. For this method, what should I supply as the correction factor (norm.factors)? I have worked out the relation between the scaling factor (sizeFactors) of the DESeq package and the correction factor (norm.factors) of edgeR, which is:

lib.size*norm.factors/mean(lib.size*norm.factors)=sizeFactors

Knowing lib.size and sizeFactors, I am trying to work out what norm.factors should be for the DESeq normalization method. This system has n unknowns but only n-1 independent equations. Let X=norm.factors=(X1,X2,...,Xn)^T, N=lib.size=(N1,N2,...,Nn) and S=sizeFactors=(S1,S2,...,Sn); then

X2=X1*(S2/S1)*(N1/N2)
.
.
.
Xn=X1*(Sn/S1)*(N1/Nn)

Here * denotes ordinary multiplication. I need one more condition to determine the unknowns (X1,X2,...,Xn). Do you happen to know whether there is an extra requirement that norm.factors needs to satisfy?
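
One candidate for the missing condition, offered as an assumption rather than a confirmed answer: the help page for calcNormFactors in edgeR says that its factors are adjusted to multiply to 1. If the same convention is imposed here, X is fixed by taking X proportional to S/N and dividing by its geometric mean. A minimal base R sketch, using made-up values for lib.size and sizeFactors:

```r
# Hypothetical library sizes and DESeq-style size factors for three samples
# (illustrative values only; the size factors are chosen to have mean 1).
lib.size    <- c(1e6, 2e6, 4e6)
sizeFactors <- c(0.5, 1.0, 1.5)

# From the relation above, norm.factors is proportional to S / N.
raw <- sizeFactors / lib.size

# Assumed extra condition: the factors multiply to 1,
# i.e. they are divided by their geometric mean.
norm.factors <- raw / exp(mean(log(raw)))

# Effective library sizes are then proportional to sizeFactors.
eff <- lib.size * norm.factors
```

With this choice, eff/mean(eff) reproduces sizeFactors exactly only when the size factors themselves average 1; otherwise the two agree up to a constant multiplier.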

Thank you!

Yanzhu

----------------------------------------------------------

edgeR always takes the total read count into account, so

   norm.factors = 1

is equivalent to total read count normalization.

Please read the section on normalization in the edgeR User's Guide.
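
As a rough illustration of the point, assuming (as described in the User's Guide) that the effective library size is lib.size * norm.factors, the following base R sketch uses made-up counts and library sizes and does not call edgeR itself:

```r
# Hypothetical library sizes for three samples and a toy 2-gene count matrix.
lib.size <- c(1e6, 2e6, 4e6)
counts   <- matrix(c(10, 20, 40,
                      5,  5,  5), nrow = 2, byrow = TRUE)

# With norm.factors = 1 the effective library size equals the total count.
norm.factors <- rep(1, length(lib.size))
eff.lib.size <- lib.size * norm.factors

# Counts per million relative to the effective library size:
# this is exactly total read count scaling.
cpm <- t(t(counts) / eff.lib.size) * 1e6
```

Here the first gene has the same scaled value in every sample, while the second gene, whose raw count is constant, decreases as the library size grows.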

Best wishes
Gordon

> Date: Mon, 10 Feb 2014 11:06:31 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR norm.factor input
>
>
> Dear Gordon,
>
> Thank you so much for your comments.
>
> One more question about the first question asked in my previous post 
> where I asked about how to supply the correct factor in the 
> normalization step.
>
> I would like to use the total read count normalization method to normalize 
> the data and then use edgeR to test my multi-factor models as in my 
> previous post. The total read count normalization is given as
>
> X_ij/(N_j/mean(N))=X_ij*mean(N)/N_j,
>
> where X_ij is the read count of gene i sample j, N_j is the library size 
> of sample j, and mean(N) is the mean of library sizes over all samples. 
> My question is: what should I enter for y$samples$norm.factors? Can I set 
> y$samples$norm.factors = N/mean(N), where N is the vector of library 
> sizes of all samples and mean(N) is the mean library size over all 
> samples? Or could you please give me some suggestions? Thank you!
>
>
>
> Yanzhu
>
> ---------------------------------------------------
>
>> Date: Fri,  7 Feb 2014 07:25:17 -0800 (PST)
>> From: "Yanzhu [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, mlinyzh at gmail.com
>> Subject: [BioC] EdgeR multi-factor testing questions
>>
>>
>> Dear Gordon,
>>
>> Thank you so much for your comments. I have updated my code and now get
>> different results for the TMM and upper-quartile normalization methods.
>>
>> I have two more questions regarding normalization. I have tried
>> different normalization methods and would like to compare their
>> performance. My questions are:
>>
>> 1. In the users' guide 2.5.6, it mentions that normalization takes the
>> form of correction factors that enter into the statistical model. Such
>> correction factors are usually computed internally by edgeR functions,
>> but it is also possible for a user to supply them. I would like to supply
>> the correction factors to edgeR; how can I do this?
>
> Just enter in your own values:
>
>  y$samples$norm.factors <- yourvalues
>
>> 2. I would also like to compare the testing results of normalized data
>> with the results of raw data (without normalizing the data). Could I
>> just skip the normalization step as below?
>
> Yes.
>
> Gordon
>
>> group<-paste(L,S,R,sep=".")
>> design<-model.matrix(~L+R+S+L:R+L:S+R:S+L:R:S)
>> y<-DGEList(counts=counts,group=group)
>> #y<-calcNormFactors(y,method="upperquartile",p=0.75) ##skip this step
>>
>> y<-estimateGLMCommonDisp(y,design)
>> y<-estimateGLMTagwiseDisp(y,design)
>>
>> fiteUQ_LRS<-glmFit(y,design,offset=offset)
>>
>> Thanks.
>>
>>
>> Yanzhu
>>
>>

 -- output of sessionInfo(): 

> sessionInfo() 
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
> 
