[BioC] siggenes permutation count problem
James W. MacDonald
jmacdon at med.umich.edu
Sat Jan 7 14:33:35 CET 2006
paul.boutros at utoronto.ca wrote:
> Hello,
>
> I'm having some troubles interpreting how/why siggenes performed a certain
> number of permutations on my dataset. This is an affy dataset that was
> normalized by:
>
> data <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> eset <- expresso(data, normalize.method="constant", bgcorrect.method="none",
> pmcorrect.method="mas", summary.method="avgdiff");
>
> I realize that the normalization is a bit unusual: this study is actually
> testing a range of normalization methods. This is a two-class experiment with
> 3 arrays in each group:
>
>
>>eset;
>
> Expression Set (exprSet) with
> 22690 genes
> 6 samples
> phenoData object with 1 variables and 6 cases
> varLabels
> Group: read from file
>
>>design;
>
> [1] 1 1 0 1 0 0
>
>
> So to do a SAM-like analysis I used:
> SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
>
> And I expected there to be 6! = 720 total possible permutations. So I was
> surprised to get this output:
>
>>SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
>
>
> We're doing 20 complete permutations
>
>
> Why does siggenes think there are only 20 complete permutations to be used?
> Have I done something wrong, or is my understanding of how the permutations are
> done in error?
It's a combination of loose terminology and (possibly) a
misunderstanding on your part. There *are* 6! = 720 possible
permutations of the six arrays, but the ordering within each group
doesn't matter because we are simply comparing group means. What we
really want here are combinations, and there are only choose(6, 3) = 20
ways to pick three of the six samples for one group (see ?choose). If
you did all 720 permutations you would get only 20 distinct group
assignments, and hence only 20 unique t-statistics, each repeated
3! * 3! = 36 times.
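
To make the counting concrete, here is a minimal base-R sketch. It only illustrates the arithmetic and is not how siggenes enumerates the splits internally; the variable names are made up for the example.

```r
# Number of ways to choose which 3 of the 6 arrays get label 1
n_splits <- choose(6, 3)

# Enumerate the splits: each column of combn() holds the indices of
# the arrays assigned to group 1
idx <- combn(6, 3)

# Turn each split into a 0/1 class-label vector (one column per split)
labelings <- apply(idx, 2, function(i) {
  cl <- integer(6)
  cl[i] <- 1L
  cl
})

# Reordering arrays within a group leaves the label vector unchanged,
# so each split corresponds to 3! * 3! orderings of the six arrays
orderings_per_split <- factorial(3) * factorial(3)

n_splits                        # 20
orderings_per_split             # 36
n_splits * orderings_per_split  # 720, i.e. factorial(6)
```

Each column of `labelings` is one of the relabelings of `cl` that a "complete permutation" in this sense corresponds to.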
This terminology is a holdover from SAM, which AFAIK really did do the
permutations rather than combinations. However, that is very wasteful of
computing time, especially when the number of replicates gets large, so
siggenes rightly does the combinations and abuses terminology by calling
them 'complete permutations'.
Best,
Jim
>
> This is R 2.2.1 and siggenes 1.4.0 on WinXP.
>
> Paul
--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623