[BioC] Generating random gene lists: does sample/resample generate random sets

Thu Sep 11 10:18:57 CEST 2008

On Thu, Sep 11, 2008 at 12:28 AM, Ochsner, Scott A <sochsner at bcm.tmc.edu> wrote:
> Thomas,
>
> I wanted to assess the performance of random gene lists that do not have any overlap with myCuratedList, hence the step to remove the curated genes from the universe of possible genes prior to random gene selection.  If I leave the curated genes in, random lists could potentially be produced with significant
> similarity to myCuratedList.  I'm interested in the chance occurrence of unique gene lists with classification performance similar to that of myCuratedList.  I certainly have an open mind on this point if others can come up with good reasons why this may be a bad idea.
>
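One way to put a number on the concern that random lists drawn from the full universe might resemble myCuratedList: the overlap between one random list and the curated list follows a hypergeometric distribution. The sketch below assumes the universe is exactly the 20277 + 2000 = 22277 probes implied by the transcript further down; the threshold of 250 shared probes is an arbitrary illustration, not a value from the thread.

```r
# Assumed sizes: N probes in the universe (20277 non-curated + 2000 curated),
# m curated probes, k probes per random list.
N <- 22277
m <- 2000
k <- 2000

# Mean of the hypergeometric overlap between one random list (drawn from
# all N probes without replacement) and the curated list: k * m / N.
expectedOverlap <- k * m / N
expectedOverlap   # about 179.6 probes

# Central 95% range of the overlap for a single random list
overlapRange <- qhyper(c(0.025, 0.975), m = m, n = N - m, k = k)
overlapRange

# Probability that one random list shares 250 or more probes with the
# curated list (250 is an illustrative cutoff only)
p250 <- phyper(249, m = m, n = N - m, k = k, lower.tail = FALSE)
p250
```

Under these assumptions a random list is expected to share roughly 9% of its probes with the curated list, which is the baseline any "similarity" would have to be judged against.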

I agree with Tom here.  You must include all the genes that were
originally considered when you produced "myCuratedList".  Not doing so
precludes drawing any conclusions from the randomization results.
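A minimal sketch of drawing the random lists from the full universe, curated genes included. The probe IDs and the curated list here are simulated stand-ins (hypothetical names `universe`, `randomLists`); with real data the universe would presumably be `featureNames(eset)`.

```r
set.seed(1)

# Hypothetical stand-in for the probe universe of the ExpressionSet
universe <- sprintf("probe%05d", 1:22277)

# Hypothetical curated list: 2000 probe IDs from the same universe
myCuratedList <- sample(universe, 2000)

# 1000 random lists of the same size, drawn from the FULL universe
# (curated probes included), without replacement within each list
nLists <- 1000
listSize <- length(myCuratedList)
randomLists <- replicate(nLists, sample(universe, listSize))
dim(randomLists)   # 2000 x 1000

# Overlap of each random list with the curated list; its average should
# sit near 2000 * 2000 / 22277, i.e. about 180 probes
overlaps <- apply(randomLists, 2, function(x) sum(x %in% myCuratedList))
summary(overlaps)
```

Because each list is drawn from the same pool the curated list came from, the random lists form a fair null reference for it.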

Sean

> From: Thomas Hampton [mailto:Thomas.H.Hampton at Dartmouth.EDU]
> Sent: Wed 9/10/2008 3:40 PM
> To: Ochsner, Scott A
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Generating random gene lists: does sample/resample generate random sets
>
>
>
> I would not have taken the curated list out. That strikes me as
> a significant bias. Am I missing something?
>
> Tom
>
> On Sep 10, 2008, at 4:03 PM, Ochsner, Scott A wrote:
>
>> Dear BioC,
>>
>> I would like feedback as to the appropriateness of the following
>> procedure to produce a set of 1000 random gene lists, each list of
>> length 2000.  The idea is to use the set of random gene lists to
>> assess how often random gene lists of size x can reproduce or
>> improve the classification performance of
>> myCuratedList.
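"How often random gene lists reproduce or improve the classification performance" amounts to an empirical p-value. A minimal sketch, assuming each list has already been scored by the same classifier; the scores below are simulated placeholders, not results from the thread.

```r
set.seed(2)

# Simulated placeholders: one score (e.g. cross-validated accuracy) for
# myCuratedList and one per random gene list
observedScore <- 0.85
randomScores  <- rnorm(1000, mean = 0.70, sd = 0.05)

# Fraction of random lists that match or beat the curated list. Adding 1
# to numerator and denominator counts the curated list itself as one of
# the draws, so the estimate is never exactly zero.
B <- length(randomScores)
empiricalP <- (1 + sum(randomScores >= observedScore)) / (B + 1)
empiricalP
```

With 1000 random lists the smallest attainable value is 1/1001, which bounds how strong a conclusion the randomization can support.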
>>
>>
>> # Remove myCuratedList from the universe of possible genes. The
>> # "eset" object is a standard ExpressionSet object. myCuratedList is
>> # presumably row indices; probe IDs would make setdiff() remove nothing.
>>> length(myCuratedList)
>>  [1] 2000
>>> Index <- setdiff(1:length(rownames(exprs(eset))), myCuratedList)
>>> length(Index)
>>  [1] 20277
>> # Generate 1000 random gene lists using the genes in Index. The
>> # code for resample is taken from the help page for sample.
>>
>>> randomMatrix <- replicate(1000, resample(Index, 2000))
>>> dim(randomMatrix)
>>  [1] 2000 1000
>>
>>
>> I've verified that no column contains repeated genes, as should be
>> the case when resampling without replacement.
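A self-contained sketch of the procedure above on simulated indices, with `resample()` defined as in the examples of `?sample` and with the duplicate check made explicit. The 22277-probe universe and the random curated indices are assumptions for illustration only.

```r
set.seed(3)

# resample() as given in the examples of ?sample: unlike sample(), it
# never treats a length-one x as "sample from 1:x"
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical stand-in: 22277 row indices, 2000 of them "curated"
nProbes <- 22277
myCuratedList <- sample(nProbes, 2000)

# The exclusion step from the original procedure
Index <- setdiff(seq_len(nProbes), myCuratedList)

randomMatrix <- replicate(1000, resample(Index, 2000))

# Checks: no repeated probe within any list, and no curated probe anywhere
noDuplicates <- all(apply(randomMatrix, 2, anyDuplicated) == 0)
noOverlap    <- !any(randomMatrix %in% myCuratedList)
c(noDuplicates = noDuplicates, noOverlap = noOverlap)
```

`anyDuplicated()` returns 0 when a vector has no duplicates, so comparing each column's result to 0 is a compact per-list check.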
>>
>> Is there a standard procedure for doing the above or is what I've
>> done kosher?
>>
>>
>> Scott A. Ochsner, Ph.D.
>> NURSA Bioinformatics
>> Molecular and Cellular Biology
>> Baylor College of Medicine
>> Houston, TX. 77030
>> phone: 713-798-6227
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>