[BioC] Problem of using MultiClass SAM (siggenes package)
Holger Schwender
holger.schw at gmx.de
Sat Nov 10 13:44:47 CET 2007
Hi,
I do not actually know what causes these differences, since I do not know exactly what is implemented in the Excel version of SAM. What sam in siggenes does is compute an ordinary F-statistic, as implemented in mt.teststat.num.denum from the multtest package, and then add the fudge factor to the denominator of this test statistic. If Excel SAM also uses this statistic (and the fudge factors are equal), then at least the resulting values of the test statistic for the genes should be identical (if out is the output of sam, i.e. out <- sam(...), then out@d gives you these values). So, as a first step, you might check whether the fudge factors in siggenes and Excel SAM are equal (they probably are not, although I implemented them as described in the Excel SAM manual), and then compare the test scores returned by sam and by Excel SAM, taking the values of the fudge factors into account.
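To make this construction concrete, here is a minimal base-R sketch of a multiclass SAM-type score: the square root of the between-class mean square divided by the square root of the within-class mean square plus a fudge factor s0, so that with s0 = 0 the squared score equals the ordinary one-way ANOVA F-statistic. The function name multiclass_d and the exact scaling of numerator and denominator are assumptions for illustration; they are not copied from siggenes or multtest, which may scale these terms differently. The example also shows why the fudge factors should be compared first: the same data give different scores for different values of s0.

```r
# Sketch (assumption): SAM-type multiclass score d_i = r_i / (s_i + s0), where
# r_i^2 = between-class mean square and s_i^2 = within-class mean square, so
# that d_i^2 is the ordinary one-way ANOVA F-statistic when s0 = 0.
# The scaling of r_i and s_i is illustrative, not copied from siggenes/multtest.
multiclass_d <- function(X, cl, s0 = 0) {
  cl <- factor(cl)
  K <- nlevels(cl)             # number of classes
  n <- length(cl)              # number of samples
  n_k <- as.vector(table(cl))  # samples per class
  # genes x classes matrix of class means
  means <- matrix(sapply(levels(cl), function(k) rowMeans(X[, cl == k, drop = FALSE])),
                  nrow = nrow(X))
  grand <- rowMeans(X)
  ssb <- as.vector((means - grand)^2 %*% n_k)                    # between-class SS
  ssw <- rowSums((X - means[, as.integer(cl), drop = FALSE])^2)  # within-class SS
  r <- sqrt(ssb / (K - 1))
  s <- sqrt(ssw / (n - K))
  r / (s + s0)                 # fudge factor s0 added to the denominator
}

# Toy data: 5 genes, 9 samples in 3 classes of 3 (same design as in the question).
set.seed(1)
cl <- rep(1:3, each = 3)
X <- matrix(rnorm(5 * 9), nrow = 5)
X[1, cl == 3] <- X[1, cl == 3] + 2   # gene 1 is shifted in class 3

d0 <- multiclass_d(X, cl, s0 = 0)
F_aov <- apply(X, 1, function(y) oneway.test(y ~ factor(cl), var.equal = TRUE)$statistic)
all.equal(d0^2, unname(F_aov))       # TRUE: with s0 = 0 we recover the F-statistic

d_fudge <- multiclass_d(X, cl, s0 = 0.5)
round(cbind(s0_0 = d0, s0_0.5 = d_fudge), 3)   # same data, different scores
```

With real data, the analogous check would be to compare out@d with the scores exported from Excel SAM once the fudge factors agree.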
Another reason for the differences might be that Excel SAM uses the median number of falsely called genes, whereas siggenes uses the mean number by default. To use the median instead, set med=TRUE. However, this only influences the estimated value of the FDR, so it should not be the reason for the other differences.
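The difference between the mean and the median number of falsely called genes can be sketched as follows. This assumes a matrix of scores recomputed on B permuted label sets and, for simplicity, a single fixed cutoff instead of the Delta-based thresholds that SAM actually uses; the function names are hypothetical. The FDR is estimated as p0 times the number of falsely called genes divided by the number of called genes, which matches the printed siggenes output up to the rounding of p0. When the per-permutation counts are skewed, the mean and the median, and therefore the FDR, can differ considerably.

```r
# Simplified sketch (assumption): one fixed cutoff on a nonnegative multiclass
# score instead of SAM's Delta-based thresholds; function names are hypothetical.
# d_perm: genes x B matrix of scores computed from B permuted label sets.
false_called <- function(d_perm, cutoff, med = FALSE) {
  counts <- colSums(d_perm >= cutoff)   # genes called in each permutation
  if (med) median(counts) else mean(counts)
}

estimate_fdr <- function(d, d_perm, cutoff, p0 = 1, med = FALSE) {
  called <- sum(d >= cutoff)            # genes called with the observed labels
  if (called == 0) return(0)
  p0 * false_called(d_perm, cutoff, med) / called
}

# Toy example: 4 genes, 3 permutations with skewed counts (0, 0, 4).
d_perm <- cbind(c(0.5, 1.0, 1.5, 0.1),
                c(0.2, 0.3, 0.4, 0.1),
                c(2.1, 2.2, 2.3, 2.4))
d <- c(3.0, 2.5, 2.2, 0.4)              # 3 genes called at cutoff 2

false_called(d_perm, cutoff = 2)                  # mean:   1.333333
false_called(d_perm, cutoff = 2, med = TRUE)      # median: 0
estimate_fdr(d, d_perm, cutoff = 2)               # 0.4444444
estimate_fdr(d, d_perm, cutoff = 2, med = TRUE)   # 0
```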
The chosen permutations might also play a role. Note that even if you set the random number generator to the same seed in sam and in Excel SAM, you will not necessarily get the same permutations (the seed "only" lets you reproduce the results of two runs of sam). However, sam allows you to supply a permutation matrix (see the argument mat.samp in d.stat). So if it is possible to obtain the matrix of permuted class labels from Excel SAM, you can use this matrix in sam.
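Why the same seed does not imply the same permutations can be illustrated in base R: below, two equally valid procedures draw random permutations of the class labels from the same seed and produce different permutation matrices. The function names are hypothetical. The commented-out lines assume that the permutations exported from Excel SAM can be read into a B x n matrix of class labels (the file name is made up); check ?d.stat for the exact format that mat.samp expects.

```r
# Two hypothetical ways to draw B random permutations of the class labels cl.
# Both return a B x n matrix whose rows are permutations of cl.
perm_by_sample <- function(cl, B, seed) {
  set.seed(seed)
  t(replicate(B, sample(cl)))                    # shuffle with sample()
}

perm_by_keys <- function(cl, B, seed) {
  set.seed(seed)
  t(replicate(B, cl[order(runif(length(cl)))]))  # sort by random keys
}

cl <- rep(1:3, each = 3)                         # 3 classes, 3 samples each
p1 <- perm_by_sample(cl, B = 5, seed = 123)
p2 <- perm_by_keys(cl, B = 5, seed = 123)
identical(p1, p2)   # FALSE: same seed, different permutations

# Not run (assumption): reuse permutations exported from Excel SAM in sam().
# perm <- as.matrix(read.csv("excel_sam_permutations.csv", header = FALSE))
# out <- sam(dd, cl, mat.samp = perm)
```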
Best,
Holger
-------- Original Message --------
> Date: Sat, 10 Nov 2007 17:55:05 +0800
> From: "呂若陽" <davidlue7 at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Problem of using MultiClass SAM (siggenes package)
> Hello,
> I am using the siggenes package for a multiclass problem
> (R 2.6.0; Bioconductor 2.1; siggenes 1.2.11).
> My dataset consists of 9 samples in 3 classes, with 3 samples per class.
> There are 13915 genes in total.
> When I use siggenes to run multiclass SAM
> (code: samResults <- sam(dd, cl, B = 500, rand = 123)),
> the results are as follows:
>
> SAM Analysis for the Multi-Class Case with 3 Classes
>
>    Delta    p0     False Called      FDR
> 1    0.1 0.061 11893.018  13745 0.052440
> 2  913.6 0.061     0.538     59 0.000553
> 3 1827.2 0.061     0.084     12 0.000424
>
> I know that I can also use SAM for Excel to do this.
> However, the results are quite different:
>
> delta  # med false pos  90th perc false pos  # called   median FDR  90th perc FDR
> 0.099      2119.652318          2173.516349     13863  0.152899972    0.156785425
> 1.046      110.2554078           341.349479     10251  0.010755576    0.033299139
> 7.156                0          3.475098814      1166            0    0.002980359
> 14.86                0          0.473877111       166            0    0.002854681
> 61.90                0                    0         1            0              0
>
> Both analyses were run with 500 permutations and random seed 123.
> What am I doing wrong?
>
> I have read the manual (siggenes.pdf) several times,
> but the only information about the multiclass case is how to assign the grouping.
> Should I set more parameters when using sam? If so, how?
>
> Thank you very much for answering.
>
>
> Sincerely,
> Ruo Yang, Lu
>
>
> ===== ===== ===== ===== =====
> NTU Research Center For Medical Excellence
> Bioinformatics and Biostatistics Core
> TEL:(02)2312-3456#8685
> FAX:(02)3322-4179
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor