[R] New to R, trying to use agnes, but can't load my distance matrix

Bill.Venables at csiro.au Bill.Venables at csiro.au
Mon Jun 27 23:50:03 CEST 2011


The first problem is that you are using a character string as the first argument to agnes().

The help information for agnes says that its first argument, x, is


       x: data matrix or data frame, or dissimilarity matrix, depending
          on the value of the 'diss' argument.

Not a character string.  So first you have to read your data into R and hold it as a "data matrix or data frame".  Then you have a choice.  Either you can calculate your own distance matrix with it and then call agnes() with that as the first argument (and with diss = TRUE) or you can get agnes() to calculate the distance matrix for you, in which case you need to specify how, using the metric = argument.
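
As a rough sketch of the first route (read the file, then pass a dissimilarity object with diss = TRUE), something like the following might work. It assumes the file is tab-separated with one label followed by one lower-triangle row per line, as in the original post. read_lower_tri() is a hypothetical helper written for this illustration, not part of the cluster package, and the temporary file stands in for the real D10.dist.

```r
library(cluster)  # provides agnes()

# Hypothetical helper: read a labelled lower-triangular distance file
# (diagonal included) and return a "dist" object.
read_lower_tri <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  labels <- vapply(fields, `[`, character(1), 1)
  n <- length(fields)
  m <- matrix(0, n, n, dimnames = list(labels, labels))
  for (i in seq_len(n)) {
    vals <- as.numeric(fields[[i]][-1])  # drop the label
    stopifnot(length(vals) == i)         # row i must hold i distances
    m[i, seq_len(i)] <- vals
  }
  as.dist(m)  # as.dist() uses the lower triangle only
}

# Stand-in for the real file: the first four rows from the original post.
tmp <- tempfile(fileext = ".dist")
writeLines(c(
  "D1\t0",
  "D2\t0.608392\t0",
  "D3\t0.497451\t0.537662\t0",
  "D4\t0.634548\t0.393343\t0.537426\t0"
), tmp)

d <- read_lower_tri(tmp)

# x is now a dissimilarity object, so 'metric' and 'stand' are not needed.
fit <- agnes(d, diss = TRUE, method = "average")
```

For the real data the call would be read_lower_tri() on the actual file path, written with a forward slash after the drive letter (for example "E:/D10.dist", assuming that is where the file lives).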

With 10000 entities to cluster, your distance matrix will require

> 10000*9999/2
[1] 49995000

numbers to be stored at once.  I hope you are using a 64-bit OS!
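
To put a rough figure on that, here is a back-of-envelope calculation. It assumes each distance is held as an 8-byte double, which is how R stores numeric values, and it counts only a single copy of the distances; any working copies made during clustering would come on top of that.

```r
n <- 10000
n_pairs <- n * (n - 1) / 2   # number of pairwise distances
bytes   <- n_pairs * 8       # 8 bytes per double
mib     <- bytes / 2^20      # roughly 381 MiB for one copy
mib
```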

With such large numbers of entities to cluster, the usual advice is to try something more suited to the job.  clara() is designed for this kind of problem.
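
A minimal sketch of what a clara() call could look like follows. Note that clara() takes the data matrix or data frame itself rather than a precomputed distance matrix, and the number of clusters k has to be chosen in advance. The data here are simulated purely as a stand-in, and k = 2 and samples = 10 are arbitrary choices for the illustration.

```r
library(cluster)  # provides clara()

set.seed(1)

# Simulated stand-in for the real data: 10000 rows, 2 variables,
# drawn from two well-separated groups.
x <- rbind(
  matrix(rnorm(5000 * 2, mean = 0), ncol = 2),
  matrix(rnorm(5000 * 2, mean = 5), ncol = 2)
)

# clara() clusters sub-samples and then assigns every row to the
# nearest medoid, so it never builds the full distance matrix.
fit <- clara(x, k = 2, metric = "euclidean", samples = 10)

table(fit$clustering)  # cluster sizes
fit$medoids            # representative row for each cluster
```

A scatter plot coloured by cluster can then be drawn with plot(x, col = fit$clustering).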

It might be useful to keep in mind that R is not a package.  (Repeat: R is NOT a package - I cannot stress that strongly enough.)  It is a programming language. To use it effectively you really need to know something about how it works, first.  It might pay you to spend a little time getting used to the protocols, how to do simple things in R like reading in data and manipulating it, before you tackle such a large and potentially tricky clustering problem.

Bill Venables. 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Karen R. Khar
Sent: Monday, 27 June 2011 5:44 PM
To: r-help at r-project.org
Subject: [R] New to R, trying to use agnes, but can't load my distance matrix

Hi,

I'm mighty new to R. I'm using it on Windows. I'm trying to cluster using a
distance matrix I created from the data on my own and called it D10.dist. I
loaded the cluster package. Then tried the following command...

> agnes("E:D10.dist", diss = TRUE, metric = "euclidean", stand = FALSE,
> method = "average", par.method, keep.diss = n < 1000, keep.data = !diss)

And it responded...

Error in agnes("E:D10.dist", diss = TRUE, metric = "euclidean", stand = FALSE,  : 
  x is not and cannot be converted to class dissimilarity

D10.dist has the following data...

D1	0
D2	0.608392	0
D3	0.497451	0.537662	0
D4	0.634548	0.393343	0.537426	0
D5	0.558785	0.543399	0.632221	0.726633	0
D6	0.659483	0.701778	0.741425	0.668624	0.655914	0
D7	0.603012	0.659173	0.571776	0.687599	0.383712	0.683948	0
D8	0.611919	0.665357	0.526453	0.715093	0.457496	0.698213	0.317039	0
D9	0.41501	0.652117	0.552011	0.68969	0.485988	0.702738	0.42819	0.442598	0
D10	0.376512	0.600607	0.517857	0.673515	0.530421	0.667736	0.537025	0.48062	0.240559	0

I would appreciate any suggestions. Please assume I know virtually nothing
about R.

Thanks,
Karen

PS I'll eventually be using ~10,000 "species" to cluster. I'll need to have
within and between cluster distance info and I'll want a plot colored by
cluster. Is agnes the right R tool to use?

--
View this message in context: http://r.789695.n4.nabble.com/New-to-R-trying-to-use-agnes-but-can-t-load-my-ditance-matrix-tp3627154p3627154.html
Sent from the R help mailing list archive at Nabble.com.
