[BioC] edgeR -- gene expression variability
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Jan 4 01:04:10 CET 2012
Dear Miguel,
What you are doing seems correct. Although of course expecting to get
good estimates of genewise dispersions from just two libraries (one degree
of freedom) is a bit optimistic. edgeR tries to do the best that can be
done.
The edgeR manual tells you that the sqrt(dispersion) is the biological
coefficient of variation. Coefficient of variation means sd/mean rather
than variance. It is a more appropriate measure of variability than the
standard deviation for quantities that are strictly positive.
The reason why estimateTagwiseDisp() returns a limited number of distinct
dispersions is that it maximizes the tagwise dispersions on a grid of 200
possible dispersion values. estimateGLMTagwiseDisp() does something
similar, but adds an extra refinement step in which it interpolates a
cubic spline through the grid values and maximizes the spline. Hence the
dispersion values from estimateTagwiseDisp() are taken from a (largish)
set of preset values whereas those from estimateGLMTagwiseDisp() are
always different.
This has no major impact I think on a practical analysis. Nevertheless we
have modified estimateTagwiseDisp() on Bioc devel to work like
estimateGLMTagwiseDisp(), so in future they with behave in a directly
comparable way.
Please give sessionInfo() output so that we can see what versions of the
package you are using.
Best wishes
Gordon
> Date: Mon, 2 Jan 2012 13:40:59 +0100
> From: Miguel Gallach <miguel.gallach at vetmeduni.ac.at>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR -- gene expression variability
>
> Hi List,
>
> I am analyzing my RNA-Seq data with edgeR. The next is my experimental
> design:
>
>
> d.GLM
> An object of class "DGEList"
> $samples
> group lib.size norm.factors
> R4.Hot HotAdaptedHot 17409289 0.9881635
> R5.Hot HotAdaptedHot 17642552 1.0818144
> R9.Hot ColdAdaptedHot 20010974 0.8621807
> R10.Hot ColdAdaptedHot 14064143 0.8932791
> R4.Cold HotAdaptedCold 11968317 1.0061084
> R5.Cold HotAdaptedCold 11072832 1.0523857
> R9.Cold ColdAdaptedCold 22386103 1.0520949
> R10.Cold ColdAdaptedCold 17408532 1.0903311
>
>
> As you can see, R4 and R5 are replicates of the same biological group (Hot
> adapted), and the same is true for R9 and R10 (Cold adapted).
>
> I am interested in measuring for each gene its expression variability
> within a biological group (at each temperature) to discern genes that might
> be tightly regulated (or under stabilizing selection). The question in
> particular is: How can I get tagwise dispersion values for the pairs
> (R4.Hot + R5.Hot), (R9.Hot + R10.Hot), (R4.Cold + R5.Cold), (R9.Cold +
> R10.Cold). I assume that the square root of each tagwise dispersion value
> can be interpreted as the expression variance of the corresponding gene
> (i.e., biological variation), as I understood from the edgeR manual. Am I
> correct?
>
> I tried to calculate it like this:
>
> R4.R5.HC = edgeR_expressed_genes[,1:2]
> #I tell edgeR there is only one factor, two replicates
> group = factor(c("HC", "HC"))
> Hot.Hot = DGEList(counts = R4.R5.HC, group = group)
> Hot.Hot = calcNormFactors(Hot.Hot)
> Hot.Hot = estimateCommonDisp(Hot.Hot)
> Hot.Hot = estimateTagwiseDisp(Hot.Hot)
>
> (and similarly for (R9.Hot + R10.Hot), (R4.Cold + R5.Cold), (R9.Cold +
> R10.Cold)).
>
> What I don't understand is why I just got 20 different dispersion values
> for all genes:
>
> dim(table(Hot.Hot$tagwise.dispersion))
> [1] 20
>
> However, when I use the d.GLM dataset (i.e., the 8 samples for the 2x2
> factor design) I get one different dispersion value for each gene:
>
>> dim(table(d.GLM1$tagwise.dispersion))
> [1] 9418
>
>
> Why is this?
>
> Can I get gene expression variability in a better way to fulfill my aim?
>
>
> Thank you very much,
> Miguel Gallach
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list