[BioC] Re: [S] Error in clustering procedure

Liaw, Andy andy_liaw at merck.com
Tue Sep 14 00:03:36 CEST 2004


> From: cstrato
> 
> "Dimension reduction" brings up another important issue:
> I had discussions with quite a few scientists who believe
> that dimension reduction is not allowed, since you are
> losing worthwhile information.

Eh?  By this logic, we shouldn't believe any conclusions drawn in any paper
that does not contain the rawest of raw data?  Part of data analysis is
summarizing data into the bare essentials (have you heard of `sufficient
statistics'? If not, it might be worth your while) and extracting useful
information from data that contain noise.  People who make statements like
that probably believe there's no such thing as noise in their data.  May God
have mercy on them.
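
To make the `sufficient statistics' remark concrete, here is a minimal
sketch in Python. It assumes a sample modelled as normally distributed, so
that the count, the sum and the sum of squares carry everything needed to
estimate the mean and variance. The function names and the simulated data
are made up for illustration only.

```python
import math
import random


def summarize(values):
    """Reduce a sample to the triple (n, sum, sum of squares)."""
    n = len(values)
    total = math.fsum(values)
    total_sq = math.fsum(v * v for v in values)
    return n, total, total_sq


def mean_and_variance(stats):
    """Recover the sample mean and unbiased variance from the summary alone."""
    n, total, total_sq = stats
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(1000)]  # simulated, not real data

stats = summarize(sample)            # three numbers instead of a thousand
mean, variance = mean_and_variance(stats)
print(stats)
print(mean, variance)
```

The thousand raw values are not needed again once the triple is stored. Note
that this textbook formula for the variance can lose precision when the mean
is large relative to the spread, so treat it as an illustration rather than
production code.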
 
> With respect to gene expression I believe that it makes
> sense to first filter out non-variant genes to reduce the
> number of dimensions.
> 
> But..., these people are using hierarchical clustering
> to cluster chemical compound libraries in "chemical space",
> and there are no compounds to eliminate.

Who are `these people' now?  Seems like you're changing the subject to one
that's probably off-topic for BioC.
 
> So, another question is, which method would be best to
> cluster about one million compounds in chemical space in
> order to be able to reduce the number of compounds used in
> screening by selecting only representative members of a
> certain cluster.

There's quite a bit of work done on this subject in the computational
chemistry literature.  The context is really quite different from gene
expression.   Molecules are clustered based on their chemical structures
(which are known), and those data are not measured (usually), but computed,
so there is no measurement error.  The goal is also quite different.  I have
not heard of anyone trying to find `representative genes' (but I'm not
familiar with bioinformatics--- maybe someone _would_ be interested in
that?).
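
For readers unfamiliar with that setting, here is a minimal sketch in Python
of one simple family of methods for picking representatives: a leader-style
pass using Tanimoto similarity on binary structural fingerprints. This is an
illustration under stated assumptions, not a recommendation for a
million-compound library. The fingerprints, compound names and the 0.5
threshold are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_representatives(fingerprints, threshold=0.5):
    """Leader-style selection.

    Compounds are visited in order. A compound joins the first existing
    leader it is at least `threshold` similar to; otherwise it becomes a
    new leader (representative) itself.
    """
    representatives = []   # names of the cluster leaders, in order found
    membership = {}        # compound name -> name of its leader
    for name, bits in fingerprints.items():
        for leader in representatives:
            if tanimoto(bits, fingerprints[leader]) >= threshold:
                membership[name] = leader
                break
        else:
            representatives.append(name)
            membership[name] = name
    return representatives, membership


# Hypothetical toy fingerprints: each set lists the bits that are switched on.
toy = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 5}),
    "cmpd_C": frozenset({10, 11, 12}),
    "cmpd_D": frozenset({10, 11, 12, 13}),
    "cmpd_E": frozenset({20, 21}),
}

reps, members = pick_representatives(toy, threshold=0.5)
print(reps)  # ['cmpd_A', 'cmpd_C', 'cmpd_E']
```

The result depends on the order in which compounds are visited, and each
compound is compared against every leader found so far, so the cost grows
with the number of clusters. Both points would need attention at real
library sizes.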

Andy
 
> Best regards
> Christian
> 
> michael watson (IAH-C) wrote:
> > -----Original Message-----
> > From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
> > 
> > 
> >>But MDS-like methods (note, not algorithms) are better for your stated
> >>purpose.
> > 
> > 
> > Hi
> > 
> > Just thinking out loud here, which can be a painful process...
> > 
> > So MDS/PCA is an exercise in dimension reduction.  Therefore, if we
> > reduce the dimensionality of the dataset to few(er) dimensions which
> > explain most of the variability, then order the data set by those
> > dimensions, then that will place together genes (in the list) which are
> > behaving similarly - is that what you are suggesting?
> > 
> >
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> 
>
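
The idea in the quoted message above (reduce the dimension, then order the
genes by the leading dimension) can be sketched in Python as follows. This is
a rough illustration only: it uses PCA rather than MDS in general, computes
just the first principal component by power iteration, and runs on a made-up
expression matrix with hypothetical gene names.

```python
import math


def first_pc_scores(rows, iterations=200):
    """Score each row on the first principal component (power iteration)."""
    n_cols = len(rows[0])
    col_means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    centered = [[r[j] - col_means[j] for j in range(n_cols)] for r in rows]

    # Arbitrary, non-uniform starting direction.
    v = [1.0 + 0.1 * j for j in range(n_cols)]
    for _ in range(iterations):
        # One step of v <- X^T (X v), followed by normalisation.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(n_cols)]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0.0:  # degenerate data (no variation): nothing to find
            break
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Made-up expression profiles: rows are genes, columns are four conditions.
genes = {
    "g_up1":   [1.0, 2.0, 3.0, 4.0],
    "g_down1": [4.0, 3.0, 2.0, 1.0],
    "g_up2":   [1.1, 2.1, 2.9, 4.2],
    "g_down2": [3.9, 3.1, 2.0, 0.8],
    "g_flat":  [2.5, 2.4, 2.6, 2.5],
}

names = list(genes)
scores = first_pc_scores([genes[n] for n in names])
order = [name for _, name in sorted(zip(scores, names))]
print(order)
```

Genes with similar profiles end up next to each other in `order`, with the
flat gene between the rising and falling ones. The sign of a principal
component is arbitrary, so the list may come out in either direction.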


