BTW, I got the paper you mentioned.
Best,
Alex
On 6/9/07, ssls sddd wrote:
>
> Dear Bill,
>
> I really appreciate your valuable suggestions.
>
> I revisited the manual for the MantelCorr package. The example on page 2
> says that the Golub training data consist of gene-expression values
> measured on 7,129 genes for 38 samples from Affymetrix Hgu6800 chips, and
> that k = 500 clusters are then selected. I just realized that the analysis
> clusters the 7,129 genes, NOT the 38 samples. My data consist of around
> 238,000 SNPs and 49 samples. So if I want to classify the 49 samples, do I
> need to transpose my data first?
>
> Thanks a lot!
>
> Alex
>
>
> On 6/9/07, William Shannon wrote:
> >
> > It depends on your goal for the analysis.
> >
> > If you want to find SNPs whose log2 ratios are similar across the
> > samples, then you are done with the analysis after k-means (though you
> > should read the literature on k-means for various ways to select the
> > optimal k). In this case you can extract the names of the SNPs in each
> > of the k clusters directly from the kmeans object.
> >
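As a minimal sketch of that last step, the following uses base R's `kmeans()` on simulated data rather than the MantelCorr wrapper. The matrix `x`, its row names, and the choice of k = 3 are all made up for illustration.

```r
# Toy stand-in for the real data: 200 "SNPs" (rows) x 6 "samples" (columns).
set.seed(1)
x <- matrix(rnorm(200 * 6), nrow = 200,
            dimnames = list(paste0("snp", 1:200), paste0("s", 1:6)))

# Cluster the rows; k = 3 is arbitrary here.
km <- kmeans(x, centers = 3, nstart = 10)

# km$cluster holds one cluster label per row of x, so splitting the
# row names by that label gives one list of SNP names per cluster.
snps_by_cluster <- split(rownames(x), km$cluster)

lengths(snps_by_cluster)      # cluster sizes
head(snps_by_cluster[[1]])    # first few SNP names in cluster 1
dim(km$centers)               # one mean vector (length 6) per cluster
```

With the real data the same `split()` call should apply unchanged, provided the matrix carries SNP identifiers as row names.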
> > If, however, you want to go one step further and see how these clusters
> > separate the samples, then you could try what we did a long time ago in
> > the paper cited below (I can email you a PDF on Monday if you can't
> > access it).
> >
> > In this paper we took the k-means cluster centers, sorted them by
> > their log2(ratio), and looked at how well they separated 2 (or maybe it
> > was 3) classes of skin samples.
> >
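One way to look at how cluster centers relate to sample classes is sketched below. This is not the procedure from the paper, only a minimal illustration under stated assumptions: the data are simulated, the two class labels (`A` and `B`) are hypothetical, and base R's `kmeans()` stands in for the MantelCorr wrapper. Each center is scored by the difference between its mean log2 ratio in the two classes, and the clusters are then sorted by that score.

```r
# Simulated log2 ratios: 300 SNPs x 10 samples, two hypothetical classes.
set.seed(3)
cls <- rep(c("A", "B"), each = 5)
x <- matrix(rnorm(300 * 10, sd = 0.3), nrow = 300,
            dimnames = list(paste0("snp", 1:300), paste0("s", 1:10)))
x[1:60, cls == "B"] <- x[1:60, cls == "B"] + 1   # planted class effect

# k-means on the SNPs; each row of km$centers is a mean profile over samples.
km <- kmeans(x, centers = 4, nstart = 20)

# Score each center by (mean in class B) - (mean in class A), then sort.
score <- rowMeans(km$centers[, cls == "B"]) - rowMeans(km$centers[, cls == "A"])
ord <- order(score, decreasing = TRUE)

round(score[ord], 2)   # clusters with scores far from 0 separate the classes
# heatmap(km$centers[ord, ], Rowv = NA, Colv = NA) would show the sorted centers
```

The scoring rule is a design choice for this sketch; with three classes, or with a different notion of separation, another summary of the centers would be needed.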
> > A. M. Bowcock, W. Shannon, F. Du, J. Duncan, K. Cao, K. Aftergut, J.
> > Catier, M. A. Fernandez-Vina, and A. Menter
> > *Insights into psoriasis and other inflammatory diseases from
> > large-scale gene expression studies*
> > Hum. Mol. Genet., August 1, 2001; 10(17): 1793 - 1805.
> >
> > Bill
> > ssls sddd wrote:
> >
> > Dear Bill,
> >
> > Thanks a lot for the suggestions. Yes, they are Affy SNP data.
> > I used the MantelCorr package, and it worked well. Specifically, the
> > commands I ran are:
> >
> > library(MantelCorr)
> > # x: data matrix with SNPs in rows and the 49 samples in columns
> > kmeans.result <- GetClusters(x, 500, 100)
> > DistMatrices.result <- DistMatrices(x, kmeans.result$clusters)
> > MantelCorrs.result <- MantelCorrs(DistMatrices.result$Dfull,
> >                                   DistMatrices.result$Dsubsets)
> > permuted.pval <- PermutationTest(DistMatrices.result$Dfull,
> >                                  DistMatrices.result$Dsubsets,
> >                                  100, 49, 0.05)
> > ClusterLists <- ClusterList(permuted.pval, kmeans.result$cluster.sizes,
> >                             MantelCorrs.result)
> > ClusterGenes <- ClusterGeneList(kmeans.result$clusters,
> >                                 ClusterLists$SignificantClusters, x)
> >
> > Can you suggest how to view the results? Is there a way to visualize
> > the clusters?
> >
> > Thanks a lot!
> >
> > Sincerely,
> >
> > Alex
> >
> > On 6/7/07, William Shannon wrote:
> > >
> > > You may want to consider k-means clustering. pvclust appears to be a
> > > hierarchical clustering algorithm (with subsequent p-value
> > > estimation), which is what is causing the problem.
> > >
> > > Hierarchical clustering uses a pairwise distance matrix to form the
> > > tree dendrogram. With N = 238804 this will require a matrix with
> > > N(N-1)/2, or about (238804^2)/2, elements. That's what causes the
> > > memory problem.
> > >
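To put numbers on the estimate above, here is a rough back-of-the-envelope calculation in R. It assumes each distance is stored as an 8-byte double (as in the vector that `dist()` returns) and ignores any working copies, so the real footprint would be higher.

```r
# Rough memory estimate for a pairwise distance matrix over N rows.
# Assumption: one 8-byte double per pair, no extra working copies.
N <- 238804
n_pairs  <- N * (N - 1) / 2        # lower triangle only
dist_gib <- n_pairs * 8 / 2^30     # size in GiB

# For comparison: the raw N x 49 data matrix that k-means works on.
data_mib <- N * 49 * 8 / 2^20      # size in MiB

cat(sprintf("pairs: %.3g, dist: %.0f GiB, data: %.0f MiB\n",
            n_pairs, dist_gib, data_mib))
```

That works out to roughly 2.85 x 10^10 distances, a bit over 200 GiB, against under 100 MiB for the data matrix itself.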
> > > K-means is not so memory-intensive. It will cluster the 238804 rows
> > > (I assume they are SNPs), and each cluster will be represented by a
> > > mean vector for the 49 variables.
> > >
> > > If, on the other hand, you want to cluster the 49 columns, you may
> > > need to transpose the data matrix and then run a hierarchical
> > > clustering, but I would look into kmeans first.
> > >
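Clustering the columns might look like the sketch below. It uses simulated data and base R only; the average-linkage method and the cut into 3 groups are arbitrary choices for illustration. Because `dist()` compares rows, the matrix is transposed first, and the resulting distance object covers only the 49 samples, so memory is not a concern in this direction.

```r
# Toy stand-in: 1000 rows (SNPs) x 49 columns (samples).
set.seed(2)
x <- matrix(rnorm(1000 * 49), nrow = 1000,
            dimnames = list(NULL, paste0("sample", 1:49)))

# dist() computes distances between ROWS, so transpose to compare samples.
d  <- dist(t(x))                  # 49 * 48 / 2 = 1176 pairwise distances
hc <- hclust(d, method = "average")

# Cut the tree into a hypothetical 3 groups of samples.
groups <- cutree(hc, k = 3)
table(groups)
# plot(hc) would draw the dendrogram of the 49 samples.
```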
> > > Bill Shannon
> > > Washington Univ. School of Medicine
> > >
> > >
> > > ssls sddd wrote:
> > >
> > > Dear List,
> > >
> > > I have a question about how to do clustering.
> > > My data consist of 49 columns (49 variables) and 238804 rows.
> > > I would like to do hierarchical clustering (unsupervised clustering
> > > and PCA). So far I have tried pvclust
> > > (www.is.titech.ac.jp/~shimo/prog/pvclust/),
> > > but R always fails with an error like "cannot allocate the memory".
> > >
> > > I am curious which other packages can perform the clustering
> > > analysis while being memory efficient.
> > >
> > > Also, is there any way to extract the features of each cluster?
> > >
> > > In other words, I would like to identify which features are
> > > responsible for classifying these variables (samples).
> > >
> > > Thanks a lot!
> > >
> > > Sincerely,
> > >
> > > Alex
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor@stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor