[BioC] SAM output explanation and an FDR question

Thu Jan 12 02:46:53 CET 2006

I can answer a couple of your questions.

a) A test of differential expression for a particular gene uses the 
ratio of the difference in log expression to an estimate of how 
variable that difference should be in a random sample.  When the 
number of arrays is small, the estimate of variability is poor and can 
be improved by adding to it a constant that is computed using the data 
from all the genes.  That constant is s0.  In your output s0 = 0.1046, 
the 30% quantile of the per-gene s values.
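
As a rough illustration of the idea, here is a minimal sketch in Python (not the siggenes code). It assumes the one-class statistic has the form d = mean / (s + s0), where s is the per-gene standard error of the mean and s0 is taken as the 30% quantile of the s values, as the summary output reports. The function names and the toy data are hypothetical.

```python
import math
import statistics


def quantile(values, q):
    """Linear-interpolation quantile (q in [0, 1]) of a list of numbers."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def one_class_stats(log_ratios, s0_quantile=0.30):
    """Return (t, d, s0) for a list of genes, each a list of log ratios.

    t is the ordinary one-sample statistic mean / s.
    d is the moderated statistic mean / (s + s0), the assumed form here.
    """
    means = [statistics.mean(g) for g in log_ratios]
    # s: standard error of the mean, estimated separately for each gene
    s = [statistics.stdev(g) / math.sqrt(len(g)) for g in log_ratios]
    # s0: one constant computed from the s values of all genes
    s0 = quantile(s, s0_quantile)
    t = [m / se for m, se in zip(means, s)]
    d = [m / (se + s0) for m, se in zip(means, s)]
    return t, d, s0


# Toy data (made up): 4 genes x 4 arrays of log ratios.
log_ratios = [
    [0.10, 0.11, 0.09, 0.10],      # tiny effect, tiny estimated variance
    [1.20, 0.80, 1.50, 0.90],      # large positive effect
    [0.05, -0.40, 0.30, 0.10],     # mostly noise
    [-0.90, -1.30, -0.70, -1.10],  # large negative effect
]

t, d, s0 = one_class_stats(log_ratios)
for i, (ti, di) in enumerate(zip(t, d)):
    print(f"gene {i}: t = {ti:8.3f}   d = {di:8.3f}")
```

In this toy example the first gene has the largest |t| only because its variance estimate happens to be tiny. Adding s0 to the denominator pulls its d well below that of the two genes with large effects.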

b) There is no consensus about appropriate values for FDR, because 
what is appropriate depends both on the goal of the study (find a few 
interesting genes, or find all the important genes) and on pi0, the 
proportion of genes that are not differentially expressed.  We should 
also worry about FNR.  When pi0 is large (say over 90%) then FNR is 
negligible for FDR values typically chosen.  When pi0 is small (say 
less than 20%) then FDR is negligible, but FNR may be high.
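
The arithmetic behind this can be sketched in a few lines of Python. This is a toy two-group calculation, not anything siggenes computes. It reads FNR as the fraction of genes not called that are in fact differentially expressed, which is an assumption about the definition intended. The per-gene rates alpha and power and the function names are hypothetical. The last part checks that the FDR column in the quoted summary appears to equal p0 * False / Called.

```python
def error_rates(pi0, alpha, power):
    """Expected FDR and FNR in a simple two-group mixture.

    pi0   : proportion of genes that are not differentially expressed
    alpha : chance that a non-differential gene is called (hypothetical)
    power : chance that a differential gene is called (hypothetical)
    """
    false_pos = pi0 * alpha
    true_pos = (1 - pi0) * power
    false_neg = (1 - pi0) * (1 - power)
    true_neg = pi0 * (1 - alpha)
    fdr = false_pos / (false_pos + true_pos)   # among called genes
    fnr = false_neg / (false_neg + true_neg)   # among genes not called
    return fdr, fnr


def sam_fdr(p0, false, called):
    """FDR as it appears to be computed in the quoted summary table."""
    return p0 * false / called


# Same test, two very different values of pi0.
for pi0 in (0.95, 0.10):
    fdr, fnr = error_rates(pi0, alpha=0.01, power=0.60)
    print(f"pi0 = {pi0:.2f}   FDR = {fdr:.3f}   FNR = {fnr:.3f}")

# Rows of the quoted table: (p0, False, Called, reported FDR).
rows = [
    (0.833, 24511.25, 27472, 0.743),
    (0.833, 2863.875, 5314, 0.449),
    (0.833, 449.625, 1085, 0.345),
    (0.833, 53, 140, 0.315),
]
for p0, false, called, reported in rows:
    print(f"reported {reported:.3f}   recomputed {sam_fdr(p0, false, called):.3f}")
```

With the same alpha and power, the large-pi0 case gives a substantial FDR and a small FNR, and the small-pi0 case gives the reverse.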

--Naomi

At 05:32 PM 1/10/2006, Ettinger, Nicholas wrote:
>Hello all!
>
>(A) For a not so statistically gifted grad student, can someone either
>tell me or point me to a place where I can understand what exactly all
>the columns in my "summary(sam.out)" output mean? I understand Delta,
>cutlow, cutup and FDR, but I am not exactly sure about the others (s0,
>False, Called, j2, j1).  I looked through the siggenes vignette but this
>question is not addressed specifically.
>
> > summary(sam.out)
>
>SAM Analysis for the One-Class Case
>
>  s0 = 0.1046  (The 30 % quantile of the s values.)
>
>  Number of permutations: 16 (complete permutation)
>
>  MEAN number of falsely called genes is computed.
>
>   Delta     p0     False  Called    FDR  cutlow  cutup     j2     j1
>1    0.1  0.833  24511.25   27472  0.743  -0.130  1.459  26547  53751
>2    0.2  0.833  2863.875    5314  0.449  -1.066    Inf   5314  54676
>3    0.3  0.833   449.625    1085  0.345  -1.615    Inf   1085  54676
>4    0.4  0.833        53     140  0.315  -2.307    Inf    140  54676
>
>(B) A related question: is there any kind of consensus on "how much" FDR
>is "too much" FDR?  In other words, it is pretty well accepted that for
>small numbers of hypotheses, we want p < 0.05 to guide us as to whether a
>change is significant or not.  Has any kind of consensus evolved in a
>similar manner for FDR?  Any primary literature addressing this?
>
>Thank you!!
>---Nick

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111