[BioC] edgeR prior.count

Gordon K Smyth smyth at wehi.EDU.AU
Wed Dec 4 05:14:06 CET 2013


Dear Karen,

> Date: Mon,  2 Dec 2013 10:55:38 -0800 (PST)
> From: "Karen [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, karenmenuz at hotmail.com
> Subject: [BioC] edgeR prior.count
>
>
> I recently used the edgeR package to analyze an RNA-Seq dataset, with 2 
> genotypes and 3 biological replicates each.

Please update to the current Bioconductor release (edgeR 3.4.1).

> After running exactTest, the logFC and logCPM are provided for each 
> gene. I am a bit confused about how exactly these values are calculated.

It may be that you are expecting things to be somewhat simpler than they 
actually are.  edgeR uses generalized linear models to compute 
statistically efficient estimates of logCPM and logFC values.  These 
involve an iterative computation for each gene that takes into account 
the dispersion value, library sizes and so on.  It's not just a matter of 
computing moderated counts and then taking averages or differences.

> 1) For logCPM, I assume that this is the average expression over all 
> samples. It is clearly not simply the averaged [counts/effective library 
> size for each sample].
>
> I understand that generally speaking the original counts (or the CPM? 
> instead) are moderated to avoid infinite values when taking logs of 
> samples/genes with zero counts/CPM, but I'm not quite sure that I can 
> figure out exactly how this is produced.

See ?aveLogCPM
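
As a rough illustration of the point above, the sketch below compares 
aveLogCPM() with the naive "log of the averaged CPM".  The count matrix 
is simulated and purely hypothetical, and the defaults of aveLogCPM() 
(prior count, dispersion) may differ between edgeR versions, so treat 
this as a sketch rather than a definitive recipe.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated data: 100 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(100 * 6, mu = 20, size = 5), nrow = 100)
y <- DGEList(counts = counts, group = rep(c("A", "B"), each = 3))
y <- calcNormFactors(y)

# Average log2-CPM per gene, as estimated by edgeR
alc <- aveLogCPM(y)

# Naive alternative: log2 of the mean of the per-sample CPM values
naive <- log2(rowMeans(cpm(y)))

# The two columns are similar but generally not identical
head(cbind(aveLogCPM = alc, naive = naive))
```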

> a) Is the same small value added to each gene for each sample or is the 
> added value different for different genes? How is prior.count 
> determined?

See ?predFC

As for determining the prior.count, you input the prior count yourself 
when you run exactTest, or else the default value is used.  The 
prior.count has no effect on the p-values.  It only affects the amount of 
moderation applied to the reported fold changes.
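
A quick way to check this for yourself is to run exactTest twice with 
different prior counts and compare the results.  The data below are 
simulated and hypothetical, and the two prior.count values are arbitrary 
choices for illustration.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated data: 200 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 10), nrow = 200)
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateCommonDisp(y)

# Same test, two different prior counts (values chosen arbitrarily)
et_small <- exactTest(y, prior.count = 0.125)
et_large <- exactTest(y, prior.count = 5)

# The p-values should agree between the two runs ...
all.equal(et_small$table$PValue, et_large$table$PValue)

# ... while the reported log fold changes differ
summary(et_large$table$logFC - et_small$table$logFC)
```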

> b) Are only genes that have a "0" in one sample moderated or are all 
> genes moderated by prior.count?

See ?predFC

> c) Is there a way to see the moderated CPM for each gene and sample and 
> not just the log (moderated CPM)?

See ?cpm
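
For example, assuming a version of edgeR in which cpm() accepts the log 
and prior.count arguments, something like the following gives the 
unmoderated CPM, the moderated log2-CPM, and the latter transformed back 
to the CPM scale.  The data are simulated and hypothetical, and the 
prior.count of 2 is just an illustrative choice.  As far as I understand 
the documented behaviour, the prior count is added to every count, not 
only to the zeros.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated data: 100 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(100 * 6, mu = 20, size = 5), nrow = 100)
counts[1, 1:3] <- 0   # force zero counts for gene 1 in the first group
y <- DGEList(counts = counts, group = rep(c("A", "B"), each = 3))
y <- calcNormFactors(y)

# Unmoderated counts per million (zeros stay zero)
raw_cpm <- cpm(y)

# Moderated log2-CPM; prior.count = 2 is an arbitrary illustrative value
log_cpm <- cpm(y, log = TRUE, prior.count = 2)

# Back-transform to obtain moderated CPM on the original scale
mod_cpm <- 2^log_cpm

# Compare the two for the gene with zero counts
rbind(raw = raw_cpm[1, ], moderated = mod_cpm[1, ])
```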

> 2) How is the logFC calculated? Is it based on moderated CPMs for each 
> lane? Does it take the ratio of the average moderated CPM for each 
> group?

The logFC is estimated by fitting a generalized linear model.  See 
?glmFit.  Note that a generalized linear model is used for the fold 
changes, even when using exactTest.
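
To see the connection, one can fit the two-group design explicitly with 
glmFit and compare the resulting log fold changes with those reported by 
exactTest.  This is only a sketch on simulated, hypothetical data, using 
the default prior count in both functions; I would expect the two sets 
of values to agree closely, but they need not be identical to the last 
digit.

```r
library(edgeR)

set.seed(1)
# Hypothetical simulated data: 200 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 10), nrow = 200)
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateCommonDisp(y)

# Explicit GLM route: intercept plus a B-versus-A coefficient
design <- model.matrix(~group)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)

# Classic two-group route
et <- exactTest(y)

# Compare the log2 fold changes from the two routes
cor(et$table$logFC, lrt$table$logFC)
head(cbind(exactTest = et$table$logFC, glmFit = lrt$table$logFC))
```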

Best wishes
Gordon

> Thank you!
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.2.4  limma_3.16.7
>
> --
> Sent via the guest posting facility at bioconductor.org.
