[BioC] edgeR -- gene expression variability

Wed Jan 4 01:04:10 CET 2012

Dear Miguel,

What you are doing seems correct, although expecting to get good estimates 
of genewise dispersions from just two libraries (one degree of freedom) is 
a bit optimistic.  edgeR tries to do the best that can be done.

The edgeR manual tells you that the sqrt(dispersion) is the biological 
coefficient of variation.  Coefficient of variation means sd/mean rather 
than variance.  It is a more appropriate measure of variability than the 
standard deviation for quantities that are strictly positive.
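
To make the relation concrete, here is a minimal sketch in base R.  The 
dispersion value and the means are made up purely for illustration.  The 
second part assumes the usual negative binomial parametrization, 
var = mu + dispersion * mu^2, which is my reading of the edgeR user's guide 
rather than something stated in this thread.

```r
# Hypothetical dispersion value, chosen only for illustration.
dispersion <- 0.04

# The biological coefficient of variation (BCV) is sqrt(dispersion).
# It is a ratio (sd/mean), not a variance.
bcv <- sqrt(dispersion)

# Assumed NB variance function: var = mu + dispersion * mu^2.
# Dividing by mu^2 gives the squared CV of a count: 1/mu + dispersion.
mu <- c(10, 100, 10000)               # hypothetical expected counts
total_cv <- sqrt(1 / mu + dispersion)

print(bcv)
print(total_cv)
```

Under that assumption, total_cv approaches bcv as mu grows, because the 
1/mu term from counting noise shrinks while the biological part stays.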

The reason why estimateTagwiseDisp() returns a limited number of distinct 
dispersions is that it maximizes the likelihood for each tag over a grid of 
200 possible dispersion values.  estimateGLMTagwiseDisp() does something 
similar, but adds an extra refinement step in which it interpolates a 
cubic spline through the grid values and maximizes the spline.  Hence the 
dispersion values from estimateTagwiseDisp() are taken from a (largish) 
set of preset values, whereas those from estimateGLMTagwiseDisp() vary 
continuously and so are almost always different from gene to gene.
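
The difference between the two strategies can be sketched with a toy 
example in base R.  This is not edgeR's code: the objective function, the 
grid range and the five "genes" are all hypothetical, chosen only to show 
why a pure grid search yields repeated values while a spline refinement 
does not.

```r
# Toy stand-in for one gene's log-likelihood as a function of
# log-dispersion.  NOT edgeR's likelihood; for illustration only.
loglik <- function(log_disp, peak) -(log_disp - peak)^2

# 200 candidate values; the range and the log scale are assumptions.
grid <- seq(-6, 0, length.out = 200)

# Hypothetical true maxima for five genes.
peaks <- c(-3.01, -3.02, -2.50, -2.51, -1.237)

# (a) Grid search only: each gene gets the best-scoring grid point.
grid_est <- sapply(peaks, function(p) grid[which.max(loglik(grid, p))])

# (b) Grid plus refinement: fit a cubic spline through the grid
#     values, then maximize the spline itself.
spline_est <- sapply(peaks, function(p) {
  f <- splinefun(grid, loglik(grid, p))
  optimize(f, interval = range(grid), maximum = TRUE)$maximum
})

print(length(unique(grid_est)))    # number of distinct grid estimates
print(length(unique(spline_est)))  # number of distinct refined estimates
```

Genes whose maxima fall between the same pair of grid points share one 
estimate under (a), but receive separate estimates under (b).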

I think this has no major impact on a practical analysis.  Nevertheless we 
have modified estimateTagwiseDisp() on Bioc devel to work like 
estimateGLMTagwiseDisp(), so in future they will behave in a directly 
comparable way.

Please give sessionInfo() output so that we can see what versions of the 
package you are using.

Best wishes
Gordon

> Date: Mon, 2 Jan 2012 13:40:59 +0100
> From: Miguel Gallach <miguel.gallach at vetmeduni.ac.at>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR -- gene expression variability
>
> Hi List,
>
> I am analyzing my RNA-Seq data with edgeR. The next is my experimental
> design:
>
>
> d.GLM
> An object of class "DGEList"
> $samples
>                   group lib.size norm.factors
> R4.Hot     HotAdaptedHot 17409289    0.9881635
> R5.Hot     HotAdaptedHot 17642552    1.0818144
> R9.Hot    ColdAdaptedHot 20010974    0.8621807
> R10.Hot   ColdAdaptedHot 14064143    0.8932791
> R4.Cold   HotAdaptedCold 11968317    1.0061084
> R5.Cold   HotAdaptedCold 11072832    1.0523857
> R9.Cold  ColdAdaptedCold 22386103    1.0520949
> R10.Cold ColdAdaptedCold 17408532    1.0903311
>
>
> As you can see, R4 and R5 are replicates of the same biological group (Hot
> adapted), and the same is true for R9 and R10 (Cold adapted).
>
> I am interested in measuring for each gene its expression variability
> within a biological group (at each temperature) to discern genes that might
> be tightly regulated (or under stabilizing selection). The question in
> particular is: How can I get tagwise dispersion values for the pairs
> (R4.Hot + R5.Hot), (R9.Hot + R10.Hot), (R4.Cold + R5.Cold), (R9.Cold +
> R10.Cold). I assume that the square root of each tagwise dispersion value
> can be interpreted as the expression variance of the corresponding gene
> (i.e., biological variation), as I understood from the edgeR manual. Am I
> correct?
>
> I tried to calculate it like this:
>
> R4.R5.HC = edgeR_expressed_genes[,1:2]
> #I tell edgeR there is only one factor, two replicates
> group = factor(c("HC", "HC"))
> Hot.Hot = DGEList(counts = R4.R5.HC, group = group)
> Hot.Hot = calcNormFactors(Hot.Hot)
> Hot.Hot = estimateCommonDisp(Hot.Hot)
> Hot.Hot = estimateTagwiseDisp(Hot.Hot)
>
> (and similarly for (R9.Hot + R10.Hot), (R4.Cold + R5.Cold), (R9.Cold +
> R10.Cold)).
>
> What I don't understand is why I just got 20 different dispersion values
> for all genes:
>
> dim(table(Hot.Hot$tagwise.dispersion))
> [1] 20
>
> However, when I use the d.GLM dataset (i.e., the 8 samples for the 2x2
> factor design) I get one different dispersion value for each gene:
>
>> dim(table(d.GLM1$tagwise.dispersion))
> [1] 9418
>
>
> Why is this?
>
> Can I get gene expression variability in a better way to fulfill my aim?
>
>
> Thank you very much,
> Miguel Gallach
