[BioC] edgeR calcNormFactors for paired counts

Sun May 25 00:41:19 CEST 2014

Hi Chris,

I think what you want to do here is normalize at the level of 
individuals. To that end, I would generate the full count matrix for 
each individual at the gene level (including all reads for each 
individual, not just ones that cover heterozygous loci) and use that to 
compute library sizes and normalization factors. Then I would propagate 
those library sizes and normalization factors to your allele count 
matrix. This will ensure that both alleles of each individual have the 
same normalization, and it will also ensure that all loci are normalized 
relative to the total RNA, which is not biased by where heterozygous 
loci happen to occur.
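
The steps above could be sketched roughly as follows in R. This is only 
an illustration under stated assumptions: the object names (gene_counts, 
allele_counts, y_total, y_allele) and the simulated data are placeholders 
that do not come from the original question, and the column layout (all 
maternal columns first, then all paternal columns) is assumed. The edgeR 
calls themselves (DGEList, calcNormFactors, estimateDisp, glmFit, glmLRT) 
are standard.

```r
library(edgeR)

set.seed(1)
n_genes <- 200
n_ind   <- 18
ind     <- paste0("ind", seq_len(n_ind))
genes   <- paste0("g", seq_len(n_genes))

# Placeholder input 1: gene-level counts per individual, using ALL reads.
gene_counts <- matrix(rnbinom(n_genes * n_ind, mu = 200, size = 10),
                      nrow = n_genes, dimnames = list(genes, ind))

# Placeholder input 2: allele-specific counts. Here we pretend that a
# quarter of the reads cover heterozygous loci and split them at random.
informative <- gene_counts %/% 4
maternal <- matrix(rbinom(length(informative), size = informative, prob = 0.5),
                   nrow = n_genes)
paternal <- informative - maternal
allele_counts <- cbind(maternal, paternal)
dimnames(allele_counts) <- list(genes,
                                c(paste0(ind, "_mat"), paste0(ind, "_pat")))

# Step 1: library sizes and normalization factors from the full counts.
y_total <- DGEList(counts = gene_counts)
y_total <- calcNormFactors(y_total)

# Step 2: propagate them to the allele-level object, so that both allele
# columns of an individual share the same values.
individual <- factor(rep(ind, times = 2), levels = ind)
allele     <- factor(rep(c("maternal", "paternal"), each = n_ind))
y_allele   <- DGEList(counts = allele_counts)
idx <- match(as.character(individual), rownames(y_total$samples))
y_allele$samples$lib.size     <- y_total$samples$lib.size[idx]
y_allele$samples$norm.factors <- y_total$samples$norm.factors[idx]

# Step 3: paired design (individual as a blocking factor), test allele.
design   <- model.matrix(~ individual + allele)
y_allele <- estimateDisp(y_allele, design)
fit      <- glmFit(y_allele, design)
lrt      <- glmLRT(fit, coef = "allelepaternal")
topTags(lrt)
```

Note that calcNormFactors is never run on the allele-level object, so the 
maternal:paternal ratio within an individual is not rescaled.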

-Ryan

On Sat May 24 14:10:25 2014, Christopher T Gregg wrote:
>
> Hi,
>
> We are examining the use of edgeR to analyze allele-specific count 
> data from RNASeq experiments. In these studies, each biological 
> replicate (n=18) has two columns: one with counts from the maternal 
> allele and the other with counts from the paternal allele for each 
> gene. Thus, the data is paired since these counts are parsed from the 
> data for each replicate. We wish to fit a glm to the data that 
> tests for a main effect of the allele (counts ~ replicate + allele) to 
> find genes that exhibit a significant allele expression bias.
>
> My question relates to how to best handle the normalization of the 
> counts in this case. edgeR applies calcNormFactors to the columns, 
> which disrupts the maternal:paternal count ratio for each gene in each 
> sample. We are grateful for advice on how to best manage the analysis 
> of this type of data.
>
> best wishes,
> Chris