[BioC] siggenes fc threshold

James W. MacDonald jmacdon at med.umich.edu
Thu Dec 13 15:18:28 CET 2007


Hi John,

John Lande wrote:
> I see your points, but the strange thing is that actually
> 
>> sam.out1
>   Delta    p0 False Called FDR
> 5  2.01 0.519     0    159   0
>> sam.out2
>   Delta p0 False Called FDR
> 5  2.01  0     1    161   0
> 
> Do you see? The same delta, but a higher number of significant genes
> for the higher FC. This makes no sense to me! We are not talking about
> the FDR, but about the crude results from the test!

Yes, but you have to think about what Holger has told you, and about the 
way the statistics are computed.

The denominator of the t-statistic you are computing is the sum of the 
standard error of the numerator (s_i) and a small constant (s_0). This 
small constant s_0 is computed using all the s_i values (to find out 
more about this see either the original Tusher paper or Holger's thesis).
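
To make that concrete, here is a rough base-R sketch of a SAM-style
statistic d_i = r_i / (s_i + s_0). It is not the siggenes code: the
helper d_stat_sketch() is a made-up name, the data are simulated, and
s_0 is simply assumed to be a fixed quantile of the s_i rather than
being chosen by the procedure described in the Tusher paper.

```r
# Sketch only: a SAM-style statistic d_i = r_i / (s_i + s_0).
# d_stat_sketch() is a hypothetical helper, not part of siggenes.
d_stat_sketch <- function(x, cl, s0_quantile = 0.5) {
  g0 <- x[, cl == 0, drop = FALSE]
  g1 <- x[, cl == 1, drop = FALSE]
  # numerator r_i: difference in group means for each gene (row)
  r <- rowMeans(g1) - rowMeans(g0)
  # s_i: per-gene standard error of that difference (unequal variances)
  s <- sqrt(apply(g1, 1, var) / ncol(g1) + apply(g0, 1, var) / ncol(g0))
  # ASSUMPTION: s_0 is a fixed quantile of the s_i; the real procedure
  # selects it differently.
  s0 <- unname(quantile(s, s0_quantile))
  list(d = r / (s + s0), r = r, s = s, s0 = s0)
}

set.seed(1)
x   <- matrix(rnorm(200 * 10), nrow = 200)  # 200 simulated genes, 10 arrays
cl  <- rep(c(0, 1), each = 5)
res <- d_stat_sketch(x, cl)
res$s0       # one constant, computed from the s_i of all genes supplied
head(res$d)
```

The point to notice is that s0 is a single number derived from every
gene passed in, so it changes whenever the set of genes changes.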

As Holger noted, when you add a fold change criterion, the genes are 
filtered _before_ you do any of these computations. Thus, you will have 
fewer genes when you use the larger fold change criterion. Since the s_0 
value is computed using the s_i values from the available genes, the 
denominator of your statistic is probably different in the two cases 
(because the s_0 value is likely to be different). So it is not 
surprising that the number of genes found significant will change as well.
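
A small simulation illustrates the effect. This is only a sketch under
stated assumptions, not what siggenes does internally: the data are
simulated, the filter on the mean difference is a stand-in for the
R.fold filter, and s_0 is assumed to be the median of the s_i of the
genes in the analysis.

```r
# Sketch only: filtering genes before computing s_0 changes the statistic
# of the genes that remain. Simulated data; none of this is siggenes code.
set.seed(42)
n_genes <- 1000
cl  <- rep(c(0, 1), each = 6)
sds <- rep(c(0.2, 1), each = n_genes / 2)   # quiet genes, then noisy genes
x   <- matrix(rnorm(n_genes * 12, sd = sds), nrow = n_genes)

r <- rowMeans(x[, cl == 1]) - rowMeans(x[, cl == 0])
s <- sqrt(apply(x[, cl == 1], 1, var) / 6 + apply(x[, cl == 0], 1, var) / 6)

# ASSUMPTION: s_0 = median of the s_i of whichever genes are analysed.
keep    <- abs(r) >= 1       # hypothetical filter on the mean difference
s0_all  <- median(s)         # s_0 from all genes
s0_filt <- median(s[keep])   # s_0 from the filtered genes only

d_all  <- r[keep] / (s[keep] + s0_all)    # same genes, s_0 from all genes
d_filt <- r[keep] / (s[keep] + s0_filt)   # same genes, s_0 from filtered set

c(s0_all = s0_all, s0_filt = s0_filt)
range(abs(d_all) - abs(d_filt))
```

The same genes get different statistics depending on which genes were
used for s_0. The direction and size of the change depend on the data,
and the number of genes called also depends on the permuted null values.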

Best,

Jim


> 
> On Dec 12, 2007 5:15 PM, Holger Schwender <holger.schw at gmx.de> wrote:
> 
>> Hi John,
>>
>> I am not sure, but this might be due to the fact that in siggenes the fold
>> change is used to filter out genes prior to the actual SAM analysis. Thus,
>> only the permuted values of the test statistics for the remaining genes,
>> i.e. genes with a fold change larger than R.fold (or smaller than
>> 1/R.fold), are used to estimate the null distribution and to compute d.bar,
>> i.e. the values of the test statistic expected under the null, instead of
>> using the permuted values of all genes. This might lead to these strange
>> results.
>>
>> Best,
>> Holger
>>
>>
>>
>>
>> -------- Original Message --------
>>> Date: Tue, 11 Dec 2007 19:20:25 +0100
>>> From: "John Lande" <john.lande77 at gmail.com>
>>> To: bioconductor at stat.math.ethz.ch
>>> Subject: [BioC] siggenes fc threshold
>>>
>>> Dear Bioconductors,
>>>
>>> I want to use siggenes and sam to find differentially regulated genes,
>>> but I have problems with the siggenes function, and possibly did not
>>> understand something properly.
>>> Here is an example that reproduces the problem:
>>>
>>> library(siggenes)
>>> data(golub)
>>> sam.out1 <- sam(golub, golub.cl, rand = 123, gene.names = golub.gnames[,3],
>>>                 med = TRUE, lambda = .5, method = d.stat, B = 5,
>>>                 R.fold = 1, delta = seq(0.01, 3, 0.5))
>>> sam.out2 <- sam(golub, golub.cl, rand = 123, gene.names = golub.gnames[,3],
>>>                 med = TRUE, lambda = .5, method = d.stat, B = 5,
>>>                 R.fold = 2, delta = seq(0.01, 3, 0.5))
>>>
>>> I use the parameter R.fold to set the minimum FC I want for my list of
>>> significant genes.
>>> The problem is this: when I run
>>>
>>>> sam.out1
>>> SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
>>>
>>>   Delta    p0 False Called     FDR
>>> 1  0.01 0.519  2950   3007 0.50933
>>> 2  0.51 0.519   478   1638 0.15151
>>> 3  1.01 0.519    38    839 0.02351
>>> 4  1.51 0.519     1    380 0.00137
>>> 5  2.01 0.519     0    159       0
>>> 6  2.51 0.519     0     74       0
>>>
>>>> sam.out2
>>> SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
>>>
>>>   Delta p0 False Called FDR
>>> 1  0.01  0    17    166   0
>>> 2  0.51  0    17    166   0
>>> 3  1.01  0    12    164   0
>>> 4  1.51  0     3    163   0
>>> 5  2.01  0     1    161   0
>>> 6  2.51  0     0    155   0
>>>
>>> You can see that at a delta of 2.51 the SAM run with the higher FC
>>> threshold (R.fold=2) calls more genes significant (155) than the run
>>> with R.fold=1 (74). To me this does not make much sense.
>>> By the way, I also tried SAM in Excel and do not have the same
>>> problem there. Furthermore, the dynamic range of delta is much lower.
>>> Do you have any idea?
>>>
>>> What am I doing wrong?
>>>
>>> best regards
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


