[BioC] edgeR calcNormFactors for paired counts
Ryan
rct at thompsonclan.org
Sun May 25 00:41:19 CEST 2014
Hi Chris,
I think what you want to do here is normalize at the level of
individuals. To that end, I would generate the full count matrix for
each individual at the gene level (including all reads for each
individual, not just ones that cover heterozygous loci) and use that to
compute library sizes and normalization factors. Then I would propagate
those library sizes and normalization factors to your allele count
matrix. This will ensure that both alleles of each individual have the
same normalization, and it will also ensure that all loci are normalized
relative to the total RNA, which is not biased by where heterozygous
loci happen to occur.
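
A minimal sketch of that workflow follows, under stated assumptions. The
objects total_counts (genes x individuals, all reads) and allele_counts
(genes x 2*individuals, columns interleaved as maternal, paternal) are
hypothetical names and are filled with simulated data here purely so the
example runs; substitute your own matrices. The edgeR calls themselves
(DGEList, calcNormFactors, estimateDisp, glmFit, glmLRT) are standard.

```r
library(edgeR)

# --- Simulated stand-ins for real data (assumption, illustration only) ---
set.seed(1)
n_ind <- 18
n_genes <- 200
ind_names <- paste0("ind", seq_len(n_ind))
total_counts <- matrix(rnbinom(n_genes * n_ind, mu = 100, size = 5),
                       nrow = n_genes,
                       dimnames = list(paste0("gene", seq_len(n_genes)), ind_names))
# Only some reads cover heterozygous loci; split those between the alleles.
informative <- matrix(rbinom(length(total_counts), total_counts, 0.4), nrow = n_genes)
maternal <- matrix(rbinom(length(informative), informative, 0.5), nrow = n_genes)
paternal <- informative - maternal
allele_counts <- matrix(0L, n_genes, 2 * n_ind)
allele_counts[, seq(1, 2 * n_ind, by = 2)] <- maternal
allele_counts[, seq(2, 2 * n_ind, by = 2)] <- paternal
rownames(allele_counts) <- rownames(total_counts)
colnames(allele_counts) <- paste0(rep(ind_names, each = 2), c("_mat", "_pat"))

# --- Step 1: normalize at the level of individuals, using all reads ---
y_total <- calcNormFactors(DGEList(counts = total_counts))

# --- Step 2: propagate library sizes and norm factors to both alleles ---
idx <- rep(seq_len(n_ind), each = 2)  # maps each allele column to its individual
y_allele <- DGEList(counts = allele_counts,
                    lib.size = y_total$samples$lib.size[idx],
                    norm.factors = y_total$samples$norm.factors[idx])
# Do NOT call calcNormFactors() on y_allele; that would overwrite the
# per-individual factors with per-column ones.

# --- Step 3: paired design, testing the allele effect ---
individual <- factor(rep(ind_names, each = 2), levels = ind_names)
allele <- factor(rep(c("maternal", "paternal"), times = n_ind))
design <- model.matrix(~ individual + allele)
y_allele <- estimateDisp(y_allele, design)
fit <- glmFit(y_allele, design)
lrt <- glmLRT(fit, coef = "allelepaternal")
topTags(lrt)
```

Because both columns of an individual share one library size and one
normalization factor, their offsets in the GLM are identical, so the
maternal:paternal ratio within each individual is left untouched.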
-Ryan
On Sat May 24 14:10:25 2014, Christopher T Gregg wrote:
>
> Hi,
>
> We are examining the use of edgeR to analyze allele-specific count
> data from RNASeq experiments. In these studies, each biological
> replicate (n=18) has two columns: one with counts from the maternal
> allele and the other with counts from the paternal allele for each
> gene. Thus, the data is paired since these counts are parsed from the
> data for each replicate. We wish to fit a glm to the data that
> tests for a main effect of the allele (counts ~ replicate + allele) to
> find genes that exhibit a significant allele expression bias.
>
> My question relates to how to best handle the normalization of the
> counts in this case. edgeR applies calcNormFactors to the columns,
> which disrupts the maternal:paternal count ratio for each gene in each
> sample. We are grateful for advice on how to best manage the analysis
> of this type of data.
>
> best wishes,
> Chris