[BioC] edgeR prior.count
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Dec 4 05:14:06 CET 2013
Dear Karen,
> Date: Mon, 2 Dec 2013 10:55:38 -0800 (PST)
> From: "Karen [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, karenmenuz at hotmail.com
> Subject: [BioC] edgeR prior.count
>
>
> I recently used the EdgeR package to analyze a RNA-Seq dataset, with 2
> genotypes and 3 biological replicates each.
Please update to the current Bioconductor release (edgeR 3.4.1).
> After running the exacttest, the logFC and logCPM are provided for each
> gene. I am a bit confused about how exactly these values are calculated.
It may be that you are expecting things to be somewhat simpler than they
actually are. edgeR uses generalized linear models to compute
statistically efficient estimates of logCPM and logFC values. These
involve an iterative computation for each gene that takes into account
the dispersion value, library sizes and so on. It's not just a matter of
computing moderated counts and then taking averages or differences.
> 1) For logCPM, I assume that this is the average expression over all
> samples. It is clearly not simply the averaged [counts/effective library
> size for each sample].
>
> I understand that generally speaking the original counts (or the CPM?
> instead) are moderated to avoid infinite values when taking logs of
> samples/genes with zero counts/CPM, but I'm not quite sure that I can
> figure out exactly how this is produced.
See ?aveLogCPM
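
As a minimal sketch of how aveLogCPM differs from a naive average, the
example below uses simulated counts (hypothetical data, not from the
original post) and passes prior.count explicitly rather than relying on
a default. The naive average is included only for comparison.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 50 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(50 * 6, mu = 20, size = 5), nrow = 50)
counts[1, ] <- 0  # force one gene to have zero counts in every sample
group <- factor(c("A", "A", "A", "B", "B", "B"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Naive approach: log2 of the mean unmoderated CPM (-Inf for the all-zero gene)
naive_logcpm <- log2(rowMeans(cpm(y)))

# edgeR's average log2-CPM, with a prior count so that zeros stay finite
ave_logcpm <- aveLogCPM(y, prior.count = 2)

head(cbind(naive = naive_logcpm, aveLogCPM = ave_logcpm))
```

The two columns generally disagree, which is consistent with logCPM not
being a simple average of per-sample CPM values.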
> a) Is the same small value added to each gene for each sample or is the
> added value different for different genes? How is prior.count
> determined?
See ?predFC
As for determining the prior.count, you input the prior count yourself
when you run exactTest, or else the default value is used. The
prior.count has no effect on the p-values. It only affects the amount of
moderation applied to the reported fold changes.
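
The point about p-values can be checked with a small sketch like the one
below. The counts are simulated (hypothetical data), and the two
prior.count values are arbitrary choices for illustration.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 200 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(200 * 6, mu = 20, size = 5), nrow = 200)
rownames(counts) <- paste0("gene", 1:200)
group <- factor(c("A", "A", "A", "B", "B", "B"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateCommonDisp(y)

# Same test, two different amounts of fold-change moderation
et_small <- exactTest(y, prior.count = 0.125)
et_large <- exactTest(y, prior.count = 5)

# Compare p-values and the typical size of the reported log fold changes
pvalues_equal <- all.equal(et_small$table$PValue, et_large$table$PValue)
logfc_shrunk <- mean(abs(et_large$table$logFC)) < mean(abs(et_small$table$logFC))
```

If the description above holds, pvalues_equal should be TRUE while the
logFC values under the larger prior count are pulled towards zero.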
> b) Are only genes that have a "0" in one sample moderated or are all
> genes moderated by prior.count?
See ?predFC
> c) Is there a way to see the moderated CPM for each gene and sample and
> not just the log (moderated CPM)?
See ?cpm
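
A minimal sketch of what cpm offers, using simulated counts (hypothetical
data). Back-transforming the log values with 2^ is my own illustration of
one way to view moderated CPM on the original scale, not something
prescribed by the help page.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 20 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(20 * 6, mu = 20, size = 5), nrow = 20)
counts[1, 1] <- 0  # make sure at least one zero count is present
group <- factor(c("A", "A", "A", "B", "B", "B"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Unmoderated CPM per gene and sample (zero counts give zero CPM)
raw_cpm <- cpm(y)

# Moderated log2-CPM: a prior count is added before taking logs
log_cpm <- cpm(y, log = TRUE, prior.count = 2)

# Back-transform to get the moderated CPM on the unlogged scale
moderated_cpm <- 2^log_cpm
```

Every gene and every sample is affected by the prior count here, not
only the entries that were zero.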
> 2) How is the logFC calculated? Is it based on moderated CPMs for each
> lane? Does it take the ratio of the average moderated CPM for each
> group?
Generalized linear model. See ?glmFit. Note that a generalized linear
model is used for the fold changes, even when using the exactTest.
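
As a rough sketch of the GLM-based fold changes, the example below calls
predFC with a design matrix on simulated counts (hypothetical data). The
design ~group and the two prior.count values are assumptions made for
illustration only.

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 200 genes, 2 groups x 3 replicates
counts <- matrix(rnbinom(200 * 6, mu = 20, size = 5), nrow = 200)
group <- factor(c("A", "A", "A", "B", "B", "B"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateCommonDisp(y)

# Intercept plus a B-versus-A coefficient
design <- model.matrix(~group)

# Moderated log2 fold changes from the GLM, under light and heavy moderation
logfc_light <- predFC(y, design, prior.count = 0.125)
logfc_heavy <- predFC(y, design, prior.count = 5)

# Column 2 holds the B-versus-A log2 fold change for each gene
head(cbind(light = logfc_light[, 2], heavy = logfc_heavy[, 2]))
```

A larger prior.count shrinks the fold changes more strongly towards zero,
with the strongest effect on genes with low counts.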
Best wishes
Gordon
> Thank you!
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.2.4 limma_3.16.7