[BioC] Looking for strongly correlated gene expression data
Sean Davis
sdavis2 at mail.nih.gov
Tue Mar 13 11:12:35 CET 2007
On Tuesday 13 March 2007 06:01, Kim, K.I. wrote:
> Let me explain further. I am studying multiple testing with gene
> expression data.
> In the usual two-group multiple testing setup, if we assume the true null
> p-values are independent and, for example, 90% of the p-values are truly
> null, then around 90% of the p-values look uniformly distributed (for
> example, the "golub" dataset in the R multtest package). But if there are
> strong correlations among the p-values (or genes), we can't expect this.
> I would guess that under dependence the histogram is curved rather than
> flat, even for the large p-values.
>
> What I am actually looking for is gene expression datasets whose p-value
> histograms look "very" different from those seen under the usual
> independence assumption, so that I can do multiple testing on them.
>
> I also thought of downloading some gene expression files from a large
> database and doing the multiple testing on those, but then I would need
> to preprocess the downloaded files, which takes some time.
> Instead I hoped to find an "easy" dataset in Bioconductor (already
> preprocessed, like the "golub" dataset in the multtest package). If there
> is no more convenient way to do it, I may need to try NCBI GEO.
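
The premise in the quoted text (independent null p-values give a flat
histogram, strongly correlated ones do not) can be checked with a small
simulation before touching real data. The following is only a rough sketch in
plain Python, not GEOquery or multtest code. The one-factor correlation model,
the value rho = 0.9, the 5000 genes, the 10 bins and all function names are
illustrative assumptions, not anything taken from a real dataset.

```python
import math
import random


def null_p_values(m, rho, rng):
    """Two-sided p-values for m null z-scores with pairwise correlation rho.

    Assumed model: z_i = sqrt(rho) * shared + sqrt(1 - rho) * noise_i,
    so every z_i is marginally N(0, 1) but all genes share one factor.
    """
    shared = rng.gauss(0.0, 1.0)  # latent factor common to all genes
    pvals = []
    for _ in range(m):
        noise = rng.gauss(0.0, 1.0)
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
        # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return pvals


def histogram(pvals, bins=10):
    """Counts of p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1  # p == 1.0 goes in last bin
    return counts


def flatness(counts):
    """Pearson chi-square distance from a perfectly flat histogram."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(2007)  # fixed seed so the run is repeatable
independent = histogram(null_p_values(5000, 0.0, rng))
correlated = histogram(null_p_values(5000, 0.9, rng))

print("independent:", independent, round(flatness(independent), 1))
print("correlated: ", correlated, round(flatness(correlated), 1))
```

Each p-value is still marginally uniform in both runs. What changes under
strong correlation is the histogram of a single dataset, because all genes
move together with the shared factor, so the bin counts pile up in one part
of [0, 1] instead of spreading evenly.
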
Just sticking to the NCBI GEO idea (I have a not-so-hidden agenda as the author
of GEOquery), you can simply use the GDSs (curated GEO DataSets) from GEO. They
are already preprocessed and can easily be converted into Bioconductor objects
such as exprSets and used for t-testing. It would take only a few lines of code
to do what you are suggesting for as many GDSs as you like. So, before writing
off all the data in GEO, you might look at the GEOquery vignette to see whether
it serves your needs.
Sean