# [BioC] SAM siggenes number of permutations

Claus-Dieter Mayer claus at bioss.ac.uk
Thu Jan 31 12:52:41 CET 2008

Hi Holger,

I am not a SAM expert, so I accept the second point you mention. My
comment referred rather to a standard permutation test. I am not sure
whether I agree with your first point, though. In my understanding, a
p-value is the probability of "obtaining a result at least as extreme
as a given data point" (Wikipedia agrees with me on this), i.e., it is the
probability of being ">=" the observed value. So if there are only 20
possible ways to split the data into the two groups, one of them will
lead to the observed value, so the p-value will be at least 1/20.
Replacing the ">=" by a ">" in the calculation of the p-value will give
the wrong result (at least if the number of permutations is small).
In general, exact zeros should not occur for p-values in real-life
situations (mathematically you can of course construct situations where
certain values cannot be obtained under the null hypothesis); the zeros
you occasionally find in output are just extremely small numbers whose
first non-zero digit lies beyond the decimal places that are displayed.
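The argument about ">=" versus ">" can be checked with a minimal Python sketch. It assumes a plain difference-in-means statistic (not SAM's d statistic) and made-up values for one gene with 3 samples per group; `exact_perm_pvalues` is a hypothetical helper that enumerates all 20 splits, the observed one included:

```python
from itertools import combinations

def exact_perm_pvalues(x, n1):
    """Exact one-sided permutation p-values for a difference in means.

    The first n1 values of x are the observed group 1. Every way of choosing
    n1 of the pooled values as group 1 is enumerated (the observed labelling
    is one of them), and the p-value is returned twice: counting statistics
    '>=' the observed one, and counting only those '>' it.
    """
    n = len(x)
    observed = sum(x[:n1]) / n1 - sum(x[n1:]) / (n - n1)
    stats = []
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        g1 = [x[i] for i in range(n) if i in chosen]
        g2 = [x[i] for i in range(n) if i not in chosen]
        stats.append(sum(g1) / n1 - sum(g2) / (n - n1))
    p_ge = sum(s >= observed for s in stats) / len(stats)
    p_gt = sum(s > observed for s in stats) / len(stats)
    return len(stats), p_ge, p_gt

# Hypothetical expression values for one gene: 3 samples per group.
x = [9.1, 8.7, 9.5, 5.2, 4.8, 5.6]
print(exact_perm_pvalues(x, 3))  # (20, 0.05, 0.0)
```

Because the observed labelling is among the enumerated splits, the ">=" version can never fall below 1/20, whereas the ">" version returns exactly 0 when the observed split is the most extreme one.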

Best Wishes

Claus

Holger Schwender wrote:
> Hi Claus,
>
> this is not totally correct. If none of the permuted test scores is larger than the actual test score, then your p-value will be 0.
>
> Moreover, SAM does not use just the B permuted test scores of a particular gene to compute its p-value, but all mB permuted test scores of all m genes, so that the p-value of a gene is given by i/(mB) instead of i/B, where i is the number of more extreme permuted test scores and B is the number of permutations.
>
> Best,
> Holger
>
>
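Holger's second point, that SAM pools the permuted scores of all genes, can be illustrated with a small Python sketch. This is a simplified reading of his description, not the siggenes implementation: scores are compared by absolute value (a two-sided assumption), a tie counts as "at least as extreme", and `pooled_pvalues` and `per_gene_pvalues` are hypothetical helpers applied to made-up scores for m = 3 genes and B = 4 permutations.

```python
from bisect import bisect_left
from fractions import Fraction

def pooled_pvalues(observed, permuted):
    """Pooled p-values i / (m * B): each gene's observed score is compared
    with the permuted scores of *all* m genes (simplified sketch)."""
    pooled = sorted(abs(s) for row in permuted for s in row)
    total = len(pooled)  # m * B
    # bisect_left counts pooled scores strictly below |d|; the rest are >= |d|.
    return [Fraction(total - bisect_left(pooled, abs(d)), total) for d in observed]

def per_gene_pvalues(observed, permuted):
    """Per-gene p-values i / B, using only that gene's own B permuted scores."""
    return [Fraction(sum(abs(s) >= abs(d) for s in row), len(row))
            for d, row in zip(observed, permuted)]

# Hypothetical scores for m = 3 genes and B = 4 permutations.
observed = [3.0, 0.5, 1.2]
permuted = [[0.1, -0.4, 0.9, 1.5],
            [0.2, -2.5, 0.3, 0.7],
            [-0.6, 0.8, 3.4, 0.05]]
print(pooled_pvalues(observed, permuted))    # 1/12, 7/12, 1/4
print(per_gene_pvalues(observed, permuted))  # 0, 1/2, 1/4
```

Gene 1 has no permuted score of its own as extreme as its observed score, so i/B gives exactly 0, while the pooled version gives 1/12; pooling refines the resolution from 1/B to 1/(mB).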
> -------- Original Message --------
>
>> Date: Wed, 30 Jan 2008 15:27:25 +0000
>> From: Claus-Dieter Mayer <claus at bioss.ac.uk>
>> To: olivier armant <olivier.armant at itg.fzk.de>
>> CC: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] SAM siggenes number of permutations
>>
>
>
>> Dear Olivier,
>>
>> My guess is that you have 2 groups with 3 samples each, in which case
>> there are only 20 different possible permutations, and the software is
>> clever enough to realise that. In that case the calculation is exact,
>> but you will not find anything significant, as the smallest possible
>> p-value is 5% (1/20) for a one-sided and 10% (2/20) for a two-sided
>> test. The question of how large group sizes must be in order to apply
>> permutation tests was discussed on this list some time ago; have a look
>> at https://stat.ethz.ch/pipermail/bioconductor/2007-November/020110.html.
>>
>> Hope that helps,
>>
>> Claus
>>
>> olivier armant wrote:
>>
>>> Dear all,
>>>
>>> I am trying to run SAM on my data using siggenes in R 2.4.1 (I am a beginner).
>>>
>>> The function call I use (after creating the class vector) is:
>>> sam.out <- sam(data.gcrma, sam.c1, B=100, var.equal=TRUE, med=TRUE)
>>>
>>> It seems to work well, but I always get the message:
>>>  number of effective permutations=20
>>>
>>> Does it mean that only 20 permutations were done, whereas I asked for 100
>>> permutations with the argument B=100?
>>>
>>> I read in the SAM Excel package from Stanford that a precise FDR
>>> requires 1000 permutations! What do you think?
>>>
>>> Help would be welcome
>>>
>>>
>>> Olivier ARMANT PhD.
>>>
>>> Institute of Toxicology and Genetics
>>> Forschungszentrum Karlsruhe
>>> Hermann-von-Helmholtz-Platz 1
>>> D-76344 Eggenstein-Leopoldshafen
>>> Germany
>>>
>>>  tel: +49-7247-82-2560
>>>  fax: +49-7247-82-3354
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> --
>> ***********************************************************************************
>>  Dr Claus-D. Mayer                    | http://www.bioss.ac.uk
>>  Biomathematics & Statistics Scotland | email: claus at bioss.ac.uk
>>  Rowett Research Institute            | Telephone: +44 (0) 1224 716652
>>  Aberdeen AB21 9SB, Scotland, UK.     | Fax: +44 (0) 1224 715349
>>
>>
>
>
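Claus's explanation of the "number of effective permutations=20" message can be checked by enumeration. A minimal Python sketch, assuming a plain difference-in-means statistic and made-up values for one gene with 3 samples per group; `all_split_stats` and `effective_permutations` are hypothetical helpers, and the capping rule in `effective_permutations` is an assumption about the behaviour Claus describes, not siggenes code:

```python
from itertools import combinations
from math import comb

def all_split_stats(x, n1):
    """Difference in means for every way of assigning n1 of the values to group 1."""
    n = len(x)
    stats = []
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        g1 = [x[i] for i in range(n) if i in chosen]
        g2 = [x[i] for i in range(n) if i not in chosen]
        stats.append(sum(g1) / n1 - sum(g2) / (n - n1))
    return stats

def effective_permutations(B, n1, n2):
    """Hypothetical: the requested B, capped at the number of distinct splits."""
    return min(B, comb(n1 + n2, n1))

x = [9.1, 8.7, 9.5, 5.2, 4.8, 5.6]  # hypothetical data, first 3 = group 1
stats = all_split_stats(x, 3)
observed = stats[0]  # combinations() yields (0, 1, 2) first: the observed labelling
one_sided = sum(s >= observed for s in stats) / len(stats)
two_sided = sum(abs(s) >= abs(observed) for s in stats) / len(stats)
print(effective_permutations(100, 3, 3), one_sided, two_sided)  # 20 0.05 0.1
```

With 3 vs 3 samples there are only comb(6, 3) = 20 distinct splits, so asking for B=100 cannot yield more than 20 different permutations. The two-sided p-value cannot drop below 2/20 because the split that swaps the two groups gives the negated statistic, which ties with the observed one in absolute value.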
