[BioC] Generating random gene lists: does sample/resample generate random sets
Thomas Hampton
Thomas.H.Hampton at Dartmouth.EDU
Wed Sep 10 22:40:32 CEST 2008
I would not have taken the curated list out. That strikes me as
a significant bias. Am I missing something?
Tom
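
To make the concern above concrete, here is a minimal sketch of drawing the random lists from the full universe, with the curated genes left in, and counting how many curated genes land in each random list. The sizes are toy values, the curated list is assumed to be stored as row indices, and the names fullUniverse, randomFull and overlap are illustrative only and do not come from the thread.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Toy stand-ins (assumptions): 500 features, curated list = 50 row indices.
nFeatures <- 500
myCuratedList <- sample.int(nFeatures, 50)

# Universe WITH the curated genes left in.
fullUniverse <- seq_len(nFeatures)

# 1000 random lists, each as long as the curated list,
# drawn without replacement from the full universe.
randomFull <- replicate(1000, sample(fullUniverse, length(myCuratedList)))

# Number of curated genes that end up in each random list.
# %in% drops the dimensions, so reshape before summing by column.
inCurated <- matrix(randomFull %in% myCuratedList, nrow = nrow(randomFull))
overlap <- colSums(inCurated)
summary(overlap)
```

Under this scheme every gene on the chip has the same chance of being drawn, so the random lists may share some genes with the curated list.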
On Sep 10, 2008, at 4:03 PM, Ochsner, Scott A wrote:
> Dear BioC,
>
> I would like feedback as to the appropriateness of the following
> procedure to produce a set of 1000 random gene lists, each list of
> length 2000. The idea is to use the set of random gene lists to
> assess how often random gene lists of size x can reproduce or
> improve the classification performance of
> myCuratedList.
>
>
> #remove myCuratedList from the universe of possible genes. The
> "eset" object is your standard ExpressionSet object.
>> length(myCuratedList)
> [1] 2000
>> Index<-setdiff(1:length(rownames(exprs(eset))),myCuratedList)
>> length(Index)
> [1] 20277
> #generate 1000 random gene lists using the genes in Index. The
> code for resample is taken from the help pages for sample.
>
>> randomMatrix<-replicate(1000,resample(Index,2000))
>> dim(randomMatrix)
> [1] 2000 1000
>
>
> I've verified that each column does not contain repeated genes as
> should be the case with resample without replacement.
>
> Is there a standard procedure for doing the above or is what I've
> done kosher?
>
>
> Scott A. Ochsner, Ph.D.
> NURSA Bioinformatics
> Molecular and Cellular Biology
> Baylor College of Medicine
> Houston, TX. 77030
> phone: 713-798-6227
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
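
The procedure in the quoted message can be sketched end to end as follows. This is an illustration under stated assumptions, not the poster's actual session: a plain matrix (exprsMat, a hypothetical name) stands in for exprs(eset), the sizes are toy values, and myCuratedList is assumed to hold row indices. If it held gene names instead, the setdiff would have to be taken against rownames(exprs(eset)) rather than against 1:n. The resample helper follows the example on the ?sample help page in recent R versions; the version copied in 2008 may have differed.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Stand-in for exprs(eset): 300 features x 6 samples (assumption).
exprsMat <- matrix(rnorm(300 * 6), nrow = 300,
                   dimnames = list(paste0("gene", 1:300), NULL))
myCuratedList <- 1:40  # hypothetical curated list, as row indices

# Helper as given in the examples of ?sample (recent R versions).
# Unlike sample(), it never treats a length-one x as 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Universe of row indices with the curated genes removed.
Index <- setdiff(seq_len(nrow(exprsMat)), myCuratedList)

# 100 random lists, one per column, drawn without replacement.
randomMatrix <- replicate(100, resample(Index, length(myCuratedList)))

dim(randomMatrix)

# Check 1: no gene is repeated within any single list.
noRepeats <- all(apply(randomMatrix, 2, anyDuplicated) == 0)

# Check 2: no curated gene appears in any list.
noCurated <- !any(randomMatrix %in% myCuratedList)
```

Both checks are expected to be TRUE here, since sample.int draws without replacement by default and Index excludes the curated rows.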