[BioC] Clustering of 30,000+ genes

Sean Davis sdavis2 at mail.nih.gov
Fri Sep 9 12:08:35 CEST 2011


Hi, January.

One common way of reducing the number of features is to keep only the
top X% of genes ranked by variance or coefficient of variation.  A
large percentage of genes are not even expressed in a given tissue
type, and another large percentage do not vary across a sample set, so
dropping them loses little.  This kind of filter does not use the
study-group labels.  You can use the genefilter package to perform
such filtering.
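
A minimal sketch of the variance filter described above, in base R
only.  The matrix `expr`, the helper `filter_by_variance` and the 10%
cutoff are hypothetical names and values chosen for illustration; this
is not genefilter's own interface (the package has its own functions
for this, such as varFilter).  It assumes genes in rows and samples in
columns.

```r
# Hypothetical example data: 1000 genes (rows) x 12 samples (columns).
set.seed(1)
expr <- matrix(rnorm(1000 * 12), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000),
                               paste0("s", 1:12)))

# Keep the top `frac` of genes ranked by per-gene variance.
# `filter_by_variance` is an illustrative helper, not a package function.
filter_by_variance <- function(x, frac = 0.1) {
  stopifnot(is.matrix(x), frac > 0, frac <= 1)
  v <- apply(x, 1, var)                      # variance of each row (gene)
  n_keep <- max(1, round(frac * nrow(x)))    # number of genes to retain
  keep <- order(v, decreasing = TRUE)[seq_len(n_keep)]
  x[sort(keep), , drop = FALSE]              # keep original row order
}

filtered <- filter_by_variance(expr, frac = 0.1)
dim(filtered)
```

As a rough guide to why this helps with memory: a full 30000 x 30000
correlation matrix of doubles needs 30000^2 * 8 bytes, about 7.2 GB,
while the same matrix for 3000 retained genes needs about 72 MB.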

Sean

On Wed, Sep 7, 2011 at 5:29 PM, January Weiner <january.weiner at gmail.com> wrote:
> Hello,
>
> I'm struggling with co-expression analysis, and for that I would like
> to try to cluster all the genes I have in my microarray set, including
> those which are not differentially expressed between the study groups.
> I am using CoXpress at the moment and will try my luck with GSCA as
> well, but both packages seem to have been laid out for 3000 rather
> than 30000 genes.
>
> How do you do that in R? I get errors about R not being able to
> allocate enough memory. Clearly, the amount of memory required to
> calculate all correlations the simple way might be a bit on the large
> side, but I can think of one or two tricks to get this done; I wonder
> whether it has been implemented already.
>
> Other than that -- how should I reasonably limit the number of genes
> to study? I don't want to bias the outcome of the analysis by
> selecting only genes that are DE, actually -- I would be very
> interested in genes that show differential co-expression, but no
> differences in expression.
>
> Kind regards,
>
> j.


