[BioC] Looking for strongly correlated gene expression data
Kim, K.I.
K.I.Kim at tue.nl
Tue Mar 13 11:01:50 CET 2007
Let me explain in more detail. I am working on multiple testing with
gene expression data.
In the usual two-group multiple testing set-up, if we assume that the
true null p-values are independent and that, for example, 90% of the
hypotheses are truly null, then around 90% of the p-values look
uniformly distributed, so the histogram is roughly flat for the large
p-values (for example, the "golub" dataset in the R multtest package).
But if there are strong correlations among the p-values (or genes), we
can't expect this. I guess that under dependence the histogram is curved
rather than flat, even for the large p-values.
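To make the contrast concrete, here is a minimal simulation sketch. It
is written in standard-library Python rather than R so that it does not
depend on any Bioconductor package. It assumes a simple one-factor model
that is not taken from any real dataset: every true-null z-score shares
one common random factor, so all pairs have correlation rho. The names
null_pvalues, histogram and max_deviation, and the values of m, rho and
the seed, are illustrative choices only.

```python
import math
import random
from statistics import NormalDist


def null_pvalues(m, rho, rng):
    """Two-sided p-values for m true-null z-scores with pairwise correlation rho.

    Assumed model: z_i = sqrt(rho) * shared + sqrt(1 - rho) * noise_i,
    so each z_i is marginally N(0, 1) and each p-value is marginally uniform.
    """
    shared = rng.gauss(0.0, 1.0)  # one factor common to all genes
    std_normal = NormalDist()
    pvals = []
    for _ in range(m):
        noise = rng.gauss(0.0, 1.0)
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
        pvals.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return pvals


def histogram(pvals, bins=10):
    """Counts of p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def max_deviation(counts):
    """Largest absolute gap between a bin count and the flat expectation."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) for c in counts)


rng = random.Random(2007)  # fixed seed so the run is repeatable
m = 2000                   # number of true-null genes (illustrative)

indep_counts = histogram(null_pvalues(m, 0.0, rng))
dep_counts = histogram(null_pvalues(m, 0.9, rng))

print("independent:", indep_counts, "max deviation", max_deviation(indep_counts))
print("correlated: ", dep_counts, "max deviation", max_deviation(dep_counts))
```

In this model each p-value is still uniform on its own, but within a
single simulated dataset the strongly correlated p-values pile up in a
few bins, so the histogram is far from flat. Real gene expression data
will have a more complicated correlation structure than one shared
factor.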
Actually, I am looking for gene expression datasets whose p-value
histograms look "very" different from those expected under the usual
independence assumption, and I want to do multiple testing on such
datasets.
I also thought about downloading some gene expression files from a large
database and then doing multiple testing, but then I would need to
preprocess the downloaded files, which would take some time. Instead I
hoped to find an "easy" dataset in Bioconductor (already preprocessed,
like the "golub" dataset in the multtest package). If there is no more
convenient way, then I may need to try NCBI GEO.
Thank you for your advice.
Kyung In.
-----Original Message-----
From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
Sent: Monday, March 12, 2007 2:05 PM
To: bioconductor at stat.math.ethz.ch
Cc: Kim, K.I.
Subject: Re: [BioC] Looking for strongly correlated gene expression data
On Monday 12 March 2007 08:33, Kim, K.I. wrote:
> Hi BioConductor Users,
>
> I am looking for gene expression data sets with very strong correlation
> features. (positive or negative) So, I hope I can't expect independent
> uniform distributions for true null p-values of those data sets.
>
> If anyone knows such data sets, please let me know?
Kyung,
Could you simply test this in a bunch of datasets? In particular, could
you download many (or all) of the datasets from NCBI GEO and test your
hypothesis that such datasets exist and in what proportion? I may be
misunderstanding what you want to do, though.
Sean