[BioC] siggenes permutation count problem

paul.boutros@utoronto.ca paul.boutros at utoronto.ca
Sat Jan 7 19:48:35 CET 2006


Hi Jim (and others who replied off-list),

Thank you -- when I saw the term "complete permutations", it didn't register in 
my head that it really meant combinations.  
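
For the archives, the two counts can be checked in base R. This is only a sketch of the arithmetic, using the design vector from my original post; the variable names are mine, and it is not meant to show how siggenes enumerates the relabelings internally.

```r
# Class labels from the original post: three arrays per group.
cl <- c(1, 1, 0, 1, 0, 0)
n  <- length(cl)     # 6 samples
n1 <- sum(cl == 1)   # 3 samples in group 1

# Orderings of all six samples: 6! = 720.
n.orderings <- factorial(n)

# Distinct ways to choose which samples form group 1: choose(6, 3) = 20.
n.splits <- choose(n, n1)

# Order within a group is irrelevant, so each split is reached by
# 3! * 3! = 36 of the 720 orderings.
stopifnot(n.orderings / (factorial(n1) * factorial(n - n1)) == n.splits)

# Enumerate the relabelings explicitly. Each column of combn() holds the
# sample indices assigned to group 1; convert each to a 0/1 label vector.
idx <- combn(n, n1)
relabelings <- t(apply(idx, 2, function(i) {
  z <- integer(n)
  z[i] <- 1L
  z
}))

dim(relabelings)   # one row per relabeling, one column per sample
```

The observed labelling is one of the rows of relabelings, so it is counted among the 20.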

Paul

Quoting "James W. MacDonald" <jmacdon at med.umich.edu>:

> paul.boutros at utoronto.ca wrote:
> > Hello,
> > 
> > I'm having some trouble interpreting how/why siggenes performed a certain
> > number of permutations on my dataset.  This is an affy dataset that was
> > normalized by:
> > 
> > data <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> > eset <- expresso(data, normalize.method="constant", bgcorrect.method="none",
> >                  pmcorrect.method="mas", summary.method="avgdiff");
> > 
> > I realize that the normalization is a bit unusual: this study is actually
> > testing a range of normalization methods.  This is a two-class experiment
> > with 3 arrays in each group:
> > 
> > 
> > > eset;
> > 
> > Expression Set (exprSet) with 
> >         22690 genes
> >         6 samples
> >                  phenoData object with 1 variables and 6 cases
> >          varLabels
> >                 Group: read from file
> > 
> > > design;
> > 
> > [1] 1 1 0 1 0 0
> > 
> > 
> > So to do a SAM-like analysis I used:
> > SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
> > 
> > And I expected there to be 6! = 720 total possible permutations.  So I was
> > surprised to get this output:
> > 
> > > SAM.data <- sam(data=eset, cl=design, var.equal=FALSE, B=1000);
> > 
> > 
> > We're doing 20 complete permutations
> > 
> > 
> > Why does siggenes think there are only 20 complete permutations to be used?
> > Have I done something wrong, or is my understanding of how the permutations
> > are done in error?
> 
> It's a combination of incorrect terminology and (possibly) a 
> misunderstanding on your part. First, there *are* 720 possible 
> permutations, but we don't care about the ordering within each group 
> since we are simply comparing group means. What we really want here are 
> combinations, and there are only 20 combinations when you have 6 samples 
> and you are choosing three for each group (see ?choose). If you did all 
> 720 permutations it would result in only 20 unique t-statistics with a 
> lot of replication.
> 
> This terminology is a holdover from SAM, which AFAIK really did do the 
> permutations rather than combinations. However, this is very wasteful of 
> computing time especially when the number of replicates gets large, so 
> siggenes rightly does the combinations and abuses terminology by calling 
> them 'complete permutations'.
> 
> Best,
> 
> Jim
> 
> 
> > 
> > This is R 2.2.1 and siggenes 1.4.0 on WinXP.
> > 
> > Paul
> > 
> 
> 
> -- 
> James W. MacDonald
> University of Michigan
> Affymetrix and cDNA Microarray Core
> 1500 E Medical Center Drive
> Ann Arbor MI 48109
> 734-647-5623
> 


