[BioC] Siggenes/SAM vs Excel

Holger Schwender holger.schw at gmx.de
Mon Jul 7 05:00:27 CEST 2008


First of all, siggenes has nothing to do with Excel SAM and samr, so "siggenes (Excel) SAM" is wrong. I implemented the original version of siggenes (a much worse version of the current siggenes) for my diploma thesis at the University of Dortmund, Germany, whereas both Excel SAM and samr were written by the guys from Stanford who proposed SAM.

If you would like to order your genes by the fold changes, you are free to do so. The output of sam, say sam.out, provides all the information that you need. For example, sam.out@fold contains the fold changes, sam.out@d the test statistics, sam.out@qvalue the q-values, ... See ?SAM for all the slots sam.out has.
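
A minimal sketch of that reordering, written in Python only to show the logic (it is not siggenes code). It assumes the fold changes and q-values have already been extracted into plain lists, that the fold changes are positive ratios, and that "ordering by fold change" means ranking by the absolute log2 fold change. The function name, the 0.05 cutoff, and the example values are all hypothetical.

```python
from math import log2


def rank_by_fold_change(genes, fold, qvalue, q_cutoff=0.05):
    """Keep genes with q-value <= q_cutoff, then sort by |log2(fold change)|.

    Assumption: `fold` holds positive ratios, so 4.0 and 0.25 count as
    equally large changes in opposite directions.
    """
    kept = [(g, f, q) for g, f, q in zip(genes, fold, qvalue) if q <= q_cutoff]
    # Largest absolute log fold change first.
    return sorted(kept, key=lambda row: abs(log2(row[1])), reverse=True)


# Made-up values, for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD"]
fold = [4.0, 0.125, 1.5, 8.0]
qvalue = [0.01, 0.03, 0.02, 0.20]

for gene, f, q in rank_by_fold_change(genes, fold, qvalue):
    print(gene, f, q)
```

The q-value cutoff decides which genes are called; the fold change only decides their order within that list.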

Holger
 
-------- Original Message --------
> Date: Fri, 4 Jul 2008 10:41:46 -0400
> From: Thomas Hampton <Thomas.H.Hampton at Dartmouth.EDU>
> To: "Holger Schwender" <holger.schw at gmx.de>
> CC: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Siggenes/SAM vs Excel

> Thanks very much for your response. It helped a lot.
> 
> One of the things it points out to me (yet again) is that the  
> essential algorithm has two parts, one to
> make it so that dinky variances don't create a bunch of "significant"  
> genes out of small variance alone.
> 
> Second,  SAM goes to substantial lengths trying to control the false  
> positive rate without
> simply throwing all the significant genes out.
> 
> Regardless of tweaks in the defaults, either SAM (Excel) siggenes or  
> samr from CRAN should do an excellent job
> of identifying differentially expressed genes.
> 
> On my data, SAM certainly does a great job compared to a t test.  
> However, like Guo of MAQC fame, I get better
> concordance between replicates with a fold change and loose p value  
> cutoff. This seems far too simple
> to be better than SAM, but there you have it. Bioc's RankProd also  
> gives me good concordance. It does, however, order
> its gene lists by straight fold change...
> 
>   Perhaps SAM lists ordered by  fold change and cut off at a certain  
> FDR would create optimal concordance. We do not really expect FDR to be
> more reproducible than fold change, do we?
> 
> Anyway, thanks for all your input into this list.
> 
> yours,
> 
> Tom
> 
> 
> 
> 
> 
> On Jul 3, 2008, at 11:04 PM, Holger Schwender wrote:
> 
> > Hi Tom,
> >
> > not sure how often I have answered this question here and in other  
> > forums. But okay once again: The defaults of siggenes and Excel SAM  
> > are a bit different:
> >
> > - In siggenes, a moderated Welch's t-statistic is computed by  
> > default, whereas a moderated version of the ordinary t-statistic  
> > assuming equal group variances is used in Excel SAM. Set  
> > var.equal=TRUE in sam to use the ordinary t-statistic (only  
> > necessary if the sizes of the groups differ).
> >
> > - In siggenes, the mean number of falsely called genes (warning:  
> > this is *not* the expected number of false positives) is computed,  
> > whereas the median number is used by Excel SAM. Set med=TRUE in sam  
> > to use the median number.
> >
> > - In siggenes, a natural cubic spline based approach is used to  
> > estimate pi0, whereas Excel SAM uses an ad hoc estimate. Set  
> > lambda=0.5 in sam to use this ad hoc estimate.
> >
> > - Even though I have implemented the computation of the fudge  
> > factor s0 exactly as described in the Excel SAM manual, the value  
> > of s0 usually differs between siggenes and Excel. Not sure why.
> >
> > - In siggenes s0=0 is also a choice for the fudge factor. In the  
> > old version of Excel SAM it is not. Not sure about the new version.
> >
> > - I have not found a description of how the q-values are estimated in  
> > Excel SAM. The q-values usually differ between  
> > siggenes and Excel. In siggenes, the computation of the q-values is  
> > implemented in virtually the same way as in John Storey's R package  
> > qvalue, so the q-values are typically the same. They only differ  
> > when there are tied p-values, since siggenes handles ties a bit  
> > differently (in my opinion more correctly) than John's function qvalue.
> >
> > - The same seed for the random number generator will not lead to  
> > the same permutations of the response.
> >
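To illustrate the first point in the list above, here is a minimal Python sketch (not siggenes or Excel SAM code) of a moderated two-group statistic of the form d = (mean difference) / (s + s0), where s is either the Welch standard error or the pooled standard error that assumes equal group variances. The function name, argument names, and the sign convention (second group minus first) are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(x, y, s0=0.0, var_equal=False):
    """Moderated statistic for one gene: (mean(y) - mean(x)) / (s + s0).

    var_equal=False uses the Welch standard error,
    var_equal=True uses the pooled (equal-variance) standard error.
    s0 is the fudge factor added to the denominator.
    """
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    if var_equal:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        s = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        s = sqrt(v1 / n1 + v2 / n2)
    return (mean(y) - mean(x)) / (s + s0)


# Made-up expression values for a single gene.
balanced_x, balanced_y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0]
print(moderated_t(balanced_x, balanced_y, s0=0.1, var_equal=False))
print(moderated_t(balanced_x, balanced_y, s0=0.1, var_equal=True))
```

With equal group sizes the two standard errors coincide algebraically, which is why the choice only matters when the group sizes differ. A positive s0 shrinks the statistic most for genes whose s is tiny.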
> > Best,
> > Holger
> >
> >
> >
> > -------- Original Message --------
> >> Date: Thu, 3 Jul 2008 09:21:41 -0400
> >> From: Thomas Hampton <Thomas.H.Hampton at Dartmouth.EDU>
> >> To: "Holger Schwender" <holger.schw at gmx.de>
> >> CC: bioc <bioconductor at stat.math.ethz.ch>
> >> Subject: [BioC] Siggenes/SAM vs Excel
> >
> >> How similar is  siggenes to the Stanford SAM with the Excel front  
> >> end?
> >>
> >> Thanks!
> >>
> >> Tom
> >
> > --
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
