[BioC] Looking for strongly correlated gene expression data

Kim, K.I. K.I.Kim at tue.nl
Tue Mar 13 11:01:50 CET 2007


Let me explain in more detail. I am looking at multiple testing with
gene expression data.
In the usual two-group multiple testing setup, if we assume the true
null p-values are independent and, for example, 90% of the hypotheses
are truly null, then about 90% of the p-values are uniformly
distributed (for example, the "golub" dataset in the R multtest
package). But if there are strong correlations among the p-values (or
genes), we can't expect that. My guess is that under dependence the
histogram is curved rather than flat, even for the large p-values.
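
To make the contrast concrete, here is a minimal simulation sketch. It
is only an illustration under assumptions that are not part of the
setup above: every gene is null, the test statistics are z-scores
rather than two-group t-statistics, and the dependence comes from a
single shared latent factor that gives every pair of genes the same
correlation rho. The function name null_pvalue_histogram is made up for
this example.

```python
import random
from statistics import NormalDist


def null_pvalue_histogram(n_genes, rho, n_bins=10, seed=0):
    """Bin two-sided p-values of n_genes null z-scores.

    Assumed model (illustrative only): each z-score is marginally
    N(0, 1) and every pair of z-scores has correlation rho, induced by
    one latent factor shared by all genes.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    shared = rng.gauss(0.0, 1.0)  # latent factor common to all genes
    counts = [0] * n_bins
    for _ in range(n_genes):
        own = rng.gauss(0.0, 1.0)  # gene-specific noise
        z = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * own
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided p-value
        # p == 1.0 would index past the end, so clamp to the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    print("rho = 0.0:", null_pvalue_histogram(10000, 0.0, seed=1))
    print("rho = 0.9:", null_pvalue_histogram(10000, 0.9, seed=1))
```

With rho = 0 the bin counts should come out roughly equal. With a large
rho a single realization piles up in a few bins, and which bins those
are depends on the shared factor drawn for that run, so changing the
seed changes the shape.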

Actually, I am looking for gene expression datasets whose p-value
histograms look "very" different from the histograms expected under
the usual independence assumption, and I want to do multiple testing
on such datasets.
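
One possible way to rank candidate datasets by how "different" their
histogram looks is sketched below. This is an assumption on my side,
not an established criterion: it takes the p-values above a cutoff
(0.5 here, chosen arbitrarily), bins them, and computes a Pearson
chi-square statistic against a flat histogram. The name
upper_tail_flatness is hypothetical. Under dependence the statistic
does not follow a chi-square distribution, so it should be read only
as a descriptive score, where larger means less flat.

```python
def upper_tail_flatness(pvalues, lower=0.5, n_bins=10):
    """Pearson chi-square statistic of the p-values in (lower, 1]
    against a flat histogram with n_bins equal-width bins."""
    tail = [p for p in pvalues if p > lower]
    if not tail:
        raise ValueError("no p-values above the lower cutoff")
    width = (1.0 - lower) / n_bins
    counts = [0] * n_bins
    for p in tail:
        # p == 1.0 would index past the end, so clamp to the last bin.
        counts[min(int((p - lower) / width), n_bins - 1)] += 1
    expected = len(tail) / n_bins  # count per bin if the tail were flat
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    # Evenly spread p-values versus p-values crowded near 1.
    even = [(i + 0.5) / 1000 for i in range(1000)]
    crowded = [0.95 + 0.05 * (i + 0.5) / 1000 for i in range(1000)]
    print("even:   ", upper_tail_flatness(even))
    print("crowded:", upper_tail_flatness(crowded))
```

Computing this score for each dataset's p-values and sorting by it
would give a rough shortlist of datasets worth inspecting by eye.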

I also thought about downloading some gene expression files from a
large database and doing multiple testing on them, but then I would
need to preprocess the downloaded files, which takes some time.
Instead I was hoping to find an "easy" dataset in Bioconductor (already
preprocessed, like the "golub" dataset in the multtest package). If
there is no more convenient way, I may need to try NCBI GEO.

Thank you for your advice.

Kyung In.

-----Original Message-----
From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
Sent: Monday, March 12, 2007 2:05 PM
To: bioconductor at stat.math.ethz.ch
Cc: Kim, K.I.
Subject: Re: [BioC] Looking for strongly correlated gene expression data

On Monday 12 March 2007 08:33, Kim, K.I. wrote:
> Hi BioConductor Users,
>
> I am looking for gene expression data sets with very strong
> (positive or negative) correlation features, so that one cannot
> expect the true null p-values of those data sets to be independent
> and uniformly distributed.
>
> If anyone knows of such data sets, please let me know.

Kyung,

Could you simply test this in a bunch of datasets?  In particular,
could you download many (or all) of the datasets from NCBI GEO and test
your hypothesis that such datasets exist and in what proportion?  I may
be misunderstanding what you want to do, though.

Sean


