[BioC] ask for your help about some questions of using edgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Sep 20 02:02:50 CEST 2011
Dear Zhenling,
I'll try to answer your questions as best I can. But the bottom line is
that there may simply be only a few differentially expressed miRs between
your two groups.
> Date: Sun, 18 Sep 2011 14:32:12 -0600
> From: Zhenling Peng <zhenling at ualberta.ca>
> To: bioconductor at r-project.org
> Subject: [BioC] ask for your help about some questions of using edgeR
>
> Dear Sir/Madam,
>
> Would you mind me disturbing you for a few minutes?
>
> I know edgeR can do comparison among multiple libraries. I did have 4
> libraries of miRNA, where two of them are normal (library1 and
> library2), the other two are abnormal (library3 and library4). And I'm
> trying to get the miRNA which expressed differently between the two
> groups: normal and abnormal. I utilized common dispersion, and the
> results are displayed at the end. When using edgeR, I have some
> questions as follows,
>
> 1) according to the results got from *summary(decideTestsDGE(de.com,
> p.value = 0.05))*, it seems that there is only 1 miRNA: RNA_de is
> expressed differently if I take 0.05 as the cut-off of the
> *adjust.p.val* (please see *line 20* and *line 12*).
This of course may simply be correct. There may be only a few
differentially expressed miRs in your data. The common dispersion
estimate appears to be reasonably small.
There is no reason why you need to restrict to a FDR (adjusted p-value) of
less than 0.05. You could easily examine the top three miRs.
There is something wrong with your topTags() output. This function does
not print column headings in the format that you have presented. You must
be using other R code that you haven't shown.
> While *line 16* indicates that there is a constriction on the counts in
> libraries. In fact, the counts in library 1 and 2 are not consistently
> greater than those in library 3 and 4. So I wondered how the package
> edgeR deals with this kind of inconsistent data. Is it still reliable to
> determine that the RNA_de is differently expressed between two groups?
> When the edgeR makes a comparison among multiple libraries, does the
> edgeR merge the libraries within the same group together before the
> comparison or not? if it does, how does it merge these libraries
> together?
>
> I tried to find the answer in the manual and the two papers to be
> implemented into edgeR, but I failed finally as my limited knowledge on
> statistics.
No, it doesn't simply merge libraries together. The whole point of having
biological replicate libraries is to measure the variation between them.
This gives an estimate of the underlying biological variation, against
which significance is assessed.
Think of it like a two-sample t-test, where you assess the separation of
two groups relative to variation within the two groups. You do not
require observations to all be the same, and you don't merge them
together. With common dispersion, edgeR is analogous to two-sample
t-tests where the same standard deviation estimate is used for every miR.
With tagwise dispersion, edgeR is analogous to two-sample t-tests with
different standard deviations estimated for each miR.
> So could you please give me a favor to explain those questions for me?2)
> according to the norm.factor for each library, and the value of the
> common.dispersion (*line 1-4*,* line10* and *11*), I think it's OK to
> use common dispersion for our data. What do you think about it?
> Actually, I'm not sure *when we can use * *common.dispersion*.
I generally recommend tagwise, if you have any reasonable number of
replicates. The choice has nothing to do with the norm.factors.
> And I tried tagwise dispersion with prion.n = 10 (20/(4-2)) also, but
> the results are very similar to the results got from common.dispersion.
> In addition, I made the pair-wise comparison between two libraries (one
> is normal, the other is abnormal), but the the value of the
> *common.dispersion*approximates to 0 (1.0E-16). So I doubt the
> reliability of the results when using common dispersion to make
> pair-wise comparison between libraries.
Well, if you try to do a comparison without replicates, it is impossible
to estimated biological variability, so edgeR puts it to zero. You are
correct that this is not reliable, but the problem is with the lack of
replicates, not with the use of common dispersion.
> In general, if we have multiple libraries and we only want to find the
> differently expressed miRNA/genes among groups, is their any other
> comparison except pair-wise comparison between groups? Would you mind
> giving me some suggestions?
I can't see what is wrong with the pair-wise comparison. You only have
two groups, so there is nothing else sensible that could be done.
> 3) From *line 19*, we can see that the miRNA---RNA_ad shows consistent
> and significant difference between groups, this is consistent with the
> P-value for this miRNA in *line 15*. While the ajust.p.val is equal to
> 1 (*line 15*, this ajust.p.val is based on the method "BH"). So I
> wondered whether we can use the P-value to analyze differential
> expression or not. If not, do we have to use ajust.p.val to make
> analysis? Is it always more reliable to use ajust.p.val instead of
> P-value to make analysis? Could you please give me some advice?
Well, if you simply use unadusted p-values, you would find about 5% of the
miRs to be signficantly different even from random data. You will be
guaranteed to find statistical significant differences even when there are
none. This would not be considered defensible in a scientific journal.
Best wishes
Gordon
> I'm very sorry to disturb you with so many naive questions. But I really
> confused about the unsatisfied results after using edgeR, and I'm not sure
> if I made any mistakes when using edgeR. Could you please give me a favor?
> Would you mind giving me any advice about my questions? Thank you very much!
>
> Best wishes,
> Zhenling Peng
> University of Alberta
>
>> dim(d)
> [1] 658 4
>> d
> An object of class "DGEList"
> $samples
> files group lib.size norm.factors
> library1 library1 normal 3184257 1.0025762
> .............................................line 1
> library2 library2 normal 2825126 0.9873665
> .............................................line 2
> library3 library3 abnormal 3955120 1.0476795
> .............................................line 3
> library4 library4 abnormal 4692333 0.9642192
> .............................................line 4
> $counts
> library1 library2 library3 library4
> RNA_aa 302440 274619 479102 403961
> ..............................................line 5
> RNA_ab 298597 283090 206178 331258
> ..............................................line 6
> RNA_ac 293963 218700 508874 573151
> ..............................................line 7
> RNA_bd 199355 168652 180300 262947
> ..............................................line 8
> RNA_ae 149868 131546 177037 201992
> ..............................................line 9
> 653 more rows ...
>
>> d <- estimateCommonDisp(d)
>> d$common.dispersion
> [1] 0.03584351
> ...............................................line 10
>> sqrt(d$common.dispersion)
> [1] 0.1893238
> ...............................................line 11
>
>> de.com <- exactTest(d)
> Comparison of groups: normal - abnormal
>> topTags(de.com, n=5)
> Comparison of groups: normal-abnormal
> logConc logFC P.Value
> adjust.p.val
>
> RNA_de -15.851627 1.7031386 5.969173e-07 0.0003927716
> .............line 12
> RNA_ea -16.861371 1.4352602 2.465201e-04 0.0554942582
> .............line 13
> RNA_cf -34.679446 30.6732173 2.833211e-03 0.3225116431
> .............line 14
> RNA_ad -3.690956 0.6652757 1.554169e-02 1.0000000000
> ..............line 15
This is not correct output from topTags().
>> detags.com <- rownames(topTags(de.com, n=5)$table)
>> d$counts[detags.com, ]
> library1 library2 library3 library4
> RNA_de 153 * 35 * *62 * 18
> ...............................................line 16
> RNA_ea 49 34 28 16
> ...............................................line 17
> RNA_cf 5 4 0 0
> ...............................................line 18
> RNA_ad 293963 218700 508874 573151
> ...............................................line 19
>
>> summary(decideTestsDGE(de.com, p.value = 0.05))
> ...............................................line 20
> [,1]
> -1 0
> 0 657
> 1 1
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list