[BioC] Siggenes/SAM vs Excel

Fri Jul 4 16:41:46 CEST 2008

Thanks very much for your response. It helped a lot.

One of the things it points out to me (yet again) is that the
essential algorithm has two parts. First, it makes sure that dinky
variances don't create a bunch of "significant" genes out of small
variance alone.
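
To make that first part concrete, here is a minimal Python sketch (not siggenes or Excel SAM code) of an equal-variance t-like statistic with a fudge factor s0 added to the denominator. The function name moderated_t, the made-up expression values, and the choice s0=0.1 are all assumptions for illustration; SAM itself chooses s0 from the data.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(x, y, s0=0.0):
    """Equal-variance t-like statistic with fudge factor s0 in the denominator.

    With s0=0 this is the ordinary two-sample t-statistic (pooled variance).
    """
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    s = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(y) - mean(x)) / (s + s0)


# Made-up log-expression values (illustration only).
# "tight" gene: tiny difference, tiny variance.
tight_a, tight_b = [5.00, 5.01, 4.99], [5.05, 5.06, 5.04]
# "big" gene: large difference, proportionally larger variance.
big_a, big_b = [5.0, 5.4, 4.6], [7.0, 7.4, 6.6]

for name, a, b in [("tight", tight_a, tight_b), ("big", big_a, big_b)]:
    print(name, round(moderated_t(a, b), 2), round(moderated_t(a, b, s0=0.1), 2))
```

Both genes get essentially the same ordinary t-statistic, because the second is just a scaled-up copy of the first. Once s0 is added, the gene with the tiny variance drops well below the gene with the real difference.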

Second, SAM goes to substantial lengths to control the false
positive rate without
simply throwing all the significant genes out.

Regardless of tweaks in the defaults, any of Excel SAM, siggenes, or
samr from CRAN should do an excellent job
of identifying differentially expressed genes.

On my data, SAM certainly does a great job compared to a t-test.
However, like Guo of MAQC fame, I get better
concordance between replicates with a fold change and loose p-value
cutoff. This seems far too simple
to be better than SAM, but there you have it. Bioc's RankProd also
gives me good concordance. It does, however, order
its gene lists by straight fold change...

Perhaps SAM lists ordered by fold change and cut off at a certain
FDR would create optimal concordance. We do not really expect FDR to be
more reproducible than fold change, do we?
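
A rough Python sketch of what I mean, assuming each gene already comes with a log2 fold change and a q-value from whichever SAM implementation was run. The function name rank_by_fold_change, the dictionary keys, the 0.05 cutoff, and the numbers are all made up for illustration.

```python
def rank_by_fold_change(genes, fdr_cutoff=0.05):
    """Keep genes whose q-value passes the cutoff, then sort the survivors
    by absolute log2 fold change, largest first."""
    kept = [g for g in genes if g["q"] <= fdr_cutoff]
    return sorted(kept, key=lambda g: abs(g["log2fc"]), reverse=True)


# Hypothetical results table (values invented for this example).
genes = [
    {"id": "geneA", "log2fc": 0.4, "q": 0.001},
    {"id": "geneB", "log2fc": -2.1, "q": 0.03},
    {"id": "geneC", "log2fc": 3.0, "q": 0.20},
    {"id": "geneD", "log2fc": 1.2, "q": 0.04},
]

ranked_ids = [g["id"] for g in rank_by_fold_change(genes)]
print(ranked_ids)
```

Here geneC has the largest fold change but fails the FDR cutoff, and geneA has the smallest q-value but ends up last, since the q-value is only used as a filter and not for the ordering.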

Anyway, thanks for all your input into this list.

yours,

Tom

On Jul 3, 2008, at 11:04 PM, Holger Schwender wrote:

> Hi Tom,
>
> not sure how often I have answered this question here and in other  
> forums. But okay once again: The defaults of siggenes and Excel SAM  
> are a bit different:
>
> - In siggenes, a moderated Welch's t-statistic is computed by  
> default, whereas a moderated version of the ordinary t-statistic  
> assuming equal group variances is used in Excel SAM. Set  
> var.equal=TRUE in sam to use the ordinary t-statistic (only  
> necessary if the sizes of the groups differ).
>
> - In siggenes, the mean number of falsely called genes (warning:  
> this is *not* the expected number of false positives) is computed,  
> whereas the median number is used by Excel SAM. Set med=TRUE in sam  
> to use the median number.
>
> - In siggenes, a natural cubic spline based approach is used to
> estimate pi0, whereas Excel SAM uses an ad hoc estimate. Set
> lambda=0.5 in sam to use this ad hoc estimate.
>
> - Even though I have implemented the computation of the fudge  
> factor s0 exactly as described in the Excel SAM manual, the value  
> of s0 usually differs between siggenes and Excel. Not sure why.
>
> - In siggenes s0=0 is also a choice for the fudge factor. In the  
> old version of Excel SAM it is not. Not sure about the new version.
>
> - Have not found a description on how the q-values are estimated in  
> Excel SAM. The values of the q-values usually differ between  
> siggenes and Excel. In siggenes, the computation of the q-values is  
> implemented in virtually the same way as in John Storey's R package  
> qvalue such that q-values are typically the same. They only differ
> when there are tied p-values, since siggenes handles ties a bit
> differently (in my opinion more correctly) than John's function qvalue.
>
> - The same seed for the random number generator will not lead to  
> the same permutations of the response.
>
> Best,
> Holger
>
>
>
> -------- Original Message --------
>> Date: Thu, 3 Jul 2008 09:21:41 -0400
>> From: Thomas Hampton <Thomas.H.Hampton at Dartmouth.EDU>
>> To: "Holger Schwender" <holger.schw at gmx.de>
>> CC: bioc <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] Siggenes/SAM vs Excel
>
>> How similar is siggenes to the Stanford SAM with the Excel front
>> end?
>>
>> Thanks!
>>
>> Tom
>