[BioC] SAM and siggenes: 4 questions

Holger Schwender holger.schw at gmx.de
Mon Mar 27 12:43:34 CEST 2006


Hi Mohammad,


> --- Original Message ---
> From: Mohammad Esad-Djou <shahrgol at web.de>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] SAM and siggenes: 4 questions
> Date: Sat, 25 Mar 2006 16:02:39 +0100
> 
> Hello, 
> I get the following table with the siggenes package:
>  
> 
>     Delta   p0      False   Called  FDR
> 1   5       0.259   2.625   21      0.0323
> 2   12.4    0.259   2.125   17      0.0323
> 3   19.8    0.259   1.75    14      0.0323
> 4   27.2    0.259   0.875   7       0.0323
> 5   34.5    0.259   0.25    2       0.0323
> 6   41.9    0.259   0.25    2       0.0323
> 7   49.3    0.259   0.25    2       0.0323
> 8   56.6    0.259   0.25    2       0.0323
> 9   64      0.259   0.25    2       0.0323
> 10  71.4    0.259   0       0       0
> 
> I have the following questions: 
> 
> 1. Can someone please give me a detailed description of the parameters
> (Delta, p0, False, Called, FDR)?
> 

Delta is the threshold described in the Tibshirani papers and in my tech
report: the distance between each of the two cut-off lines and the
45-degree line in the SAM plot.
p0 (actually pi_0) is the prior probability that a gene is not
differentially expressed.
False is the number of so-called falsely called genes if the specified
threshold Delta is used.
Called is the number of genes called differentially expressed if the
corresponding value of Delta is used as threshold.
FDR is the estimated False Discovery Rate, i.e. the error rate, if the
corresponding Delta is used as threshold, where
FDR = p0 * False / Called.
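
The relation FDR = p0 * False / Called can be checked against the table in
the question. What follows is only a minimal Python sketch, not siggenes
code: the function name estimated_fdr is made up for illustration, and
returning 0 when no gene is called is an assumption chosen to match the
last row of the table.

```python
def estimated_fdr(p0, false, called):
    """FDR = p0 * False / Called.

    Returns 0.0 when no gene is called (assumed convention, chosen to
    match the last row of the table; not taken from siggenes itself).
    """
    if called == 0:
        return 0.0
    return p0 * false / called


# (Delta, False, Called) copied from the table in the question.
rows = [
    (5.0, 2.625, 21),
    (12.4, 2.125, 17),
    (19.8, 1.75, 14),
    (27.2, 0.875, 7),
    (34.5, 0.25, 2),
    (71.4, 0.0, 0),
]
p0 = 0.259

for delta, false, called in rows:
    print(delta, estimated_fdr(p0, false, called))
```

In every row with Called > 0 the ratio False / Called is 0.125, which is why
the FDR column shows the same value of about 0.032 for all of these Delta
values.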

All these terms (though not my sometimes cryptic abbreviations in the above
table) are explained in the Tibshirani papers or my tech report. A more
detailed description of the siggenes implementation than the one in the
tech report is available in my diploma thesis

http://de.geocities.com/holgerschw/thesis.pdf


> 2. How are the parameters computed?
> 
Technical details on the siggenes implementation can be found in the tech
report or in the diploma thesis. A few minor things have changed since then,
but they hardly matter for the SAM analysis itself. For example, the way the
group labels are permuted has changed. If you are interested, see
?sam.dstat for a short description of the new way of permuting.


> 3. How can I interpret the results correctly?
> 

For example, choosing Delta = 27.2 leads in your analysis to 7
differentially expressed genes and an FDR of 0.0323, which, loosely
speaking, means that about 3% of the 7 genes called differentially expressed
are in fact not differentially expressed. Your task now is to find a
Delta value that provides a good balance between the number of identified
genes (Called in the above table) and the estimated FDR.
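
One possible way to make this balance concrete is to fix an upper bound on
the FDR and take the Delta that calls the most genes under that bound. This
is only a sketch of one selection rule, not something siggenes does for you:
the function name pick_delta and the bound of 0.05 are assumptions made for
illustration.

```python
# (Delta, Called, FDR) copied from the table in the question.
table = [
    (5.0, 21, 0.0323),
    (12.4, 17, 0.0323),
    (19.8, 14, 0.0323),
    (27.2, 7, 0.0323),
    (34.5, 2, 0.0323),
    (41.9, 2, 0.0323),
    (49.3, 2, 0.0323),
    (56.6, 2, 0.0323),
    (64.0, 2, 0.0323),
    (71.4, 0, 0.0),
]


def pick_delta(table, max_fdr):
    """Return the row that calls the most genes with FDR <= max_fdr.

    Rows that call no genes are ignored. Returns None if no row
    satisfies the bound. (Hypothetical helper, not part of siggenes.)
    """
    candidates = [row for row in table if row[1] > 0 and row[2] <= max_fdr]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[1])


best = pick_delta(table, max_fdr=0.05)  # 0.05 is an arbitrary example bound
delta, called, fdr = best
# Rough number of called genes expected to be false positives.
expected_false = fdr * called
print(delta, called, fdr, expected_false)
```

Because the estimated FDR is the same for every Delta that calls at least
one gene here, this rule simply picks the smallest Delta in the table.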

A note:

- Judging by the Delta values, it seems that you are using the gene
expression data on the original scale. It would be better to take the
base-2 logarithm of your data and pass the log_2-transformed data
to sam.
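
As an illustration of the transformation meant here, the following Python
sketch applies log_2 to a small matrix of intensities. The values are made
up, the function name log2_transform is hypothetical, and rejecting
non-positive values is an assumption about how one might handle them; none
of this is siggenes code.

```python
import math


def log2_transform(matrix):
    """Return a new matrix with log2 applied to every entry.

    Raises ValueError on non-positive entries, since the logarithm is
    only defined for strictly positive values.
    """
    transformed = []
    for row in matrix:
        if any(value <= 0 for value in row):
            raise ValueError("log2 needs strictly positive values")
        transformed.append([math.log2(value) for value in row])
    return transformed


# Made-up expression intensities: rows are genes, columns are arrays.
raw = [
    [128.0, 256.0, 1024.0],
    [64.0, 32.0, 2048.0],
]
logged = log2_transform(raw)
print(logged)
```

On the log_2 scale a difference of 1 corresponds to a two-fold change on
the original scale.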


> 4. I found no suitable literature on it (except 2 articles by Robert
> Tibshirani and 1 by Holger Schwender). How can I find more literature
> on it?
> 

The abovementioned diploma thesis.

HTH,
Holger

> 
> Thanks in advance,
> Mohammad Esad-Djou
> 
> 
> Interdisziplinäres Zentrum für Bioinformatik (IZBI)
> Universität Leipzig 
> Härtelstr. 16 - 18
> D-04107 Leipzig
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
