[R] K-Means Clustering

Don MacQueen macq at llnl.gov
Fri May 9 16:35:04 CEST 2008


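Unfortunately, your data is *not* numeric. That is what the first 
error message, " 'x' must be numeric", is telling you, and you should 
believe it. The warning from kmeans(), "NAs introduced by coercion", 
points the same way: when kmeans() converted your values to numbers, 
some of them could not be converted and became NA. The data might look 
numeric, but it isn't, which is why Ingmar mentioned you might have 
factors instead of numbers.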
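A small sketch with made-up values (not Jordan's data) shows why factors are a trap. A factor prints like numbers, but is.numeric() reports FALSE, and as.numeric() applied directly to it returns the internal level codes rather than the printed values. Converting to character first recovers the numbers.

```r
# Hypothetical example values, not taken from the original data
f <- factor(c("6", "8", "10", "2"))

print(f)          # prints like numbers, followed by a "Levels:" line
is.numeric(f)     # FALSE: a factor is integer codes plus a levels attribute
levels(f)         # "10" "2" "6" "8" (sorted as strings, not as numbers)

# Wrong: returns the level codes, not the printed values
as.numeric(f)     # 3 4 1 2

# Right: turn the labels into character first, then into numbers
as.numeric(as.character(f))   # 6 8 10 2
```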

Your challenge is to discover why. The "why" will depend on how you 
brought the data into R.
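As one possible "why", here is a sketch assuming the data came from a whitespace-separated text file read with read.table(). The file contents are hypothetical and inlined via the 'text' argument; the stray "7?" stands in for any non-numeric token. A single bad cell is enough to make its whole column non-numeric, and declaring the expected type with colClasses makes the import stop at the bad cell instead.

```r
# Hypothetical file contents; "7?" is the stray non-numeric token
txt <- "BMW Ford Infiniti\n6 8 2\n8 7 4\n8 2 7?"

d <- read.table(text = txt, header = TRUE)
str(d)             # BMW and Ford come in as int, Infiniti does not
sapply(d, class)   # Infiniti is "character" (a "factor" in R < 4.0.0)

# Declaring every column as numeric turns the silent problem into an error
strict <- tryCatch(
  read.table(text = txt, header = TRUE, colClasses = "numeric"),
  error = function(e) e
)
print(conditionMessage(strict))   # scan() complains about the "7?" entry
```

Converting such a data frame with as.matrix() then yields a character matrix, since a matrix can hold only one type.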

Assuming 'new' is a matrix (which it appears to be), here are some 
ways to find out more about your data object:
    is.numeric(new)
    is.factor(new)
    class(new)
    mode(new)
    str(new)
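For comparison, this is roughly what those checks report on a matrix that only looks numeric. The 2 x 2 matrix below is hypothetical. Note that is.factor() is FALSE for any matrix, so on its own it rules out little; mode() and str() are the revealing ones.

```r
# Hypothetical 2 x 2 matrix whose entries are stored as text, not numbers
m <- matrix(c("6", "8", "4", "7?"), nrow = 2,
            dimnames = list(NULL, c("BMW", "Ford")))

print(m, quote = FALSE)   # looks numeric at a glance

is.numeric(m)   # FALSE
is.factor(m)    # FALSE: a matrix is never a factor
class(m)        # "matrix" "array" in R >= 4.0.0 ("matrix" in older R)
mode(m)         # "character": this is the real problem
str(m)          # output starts with "chr", i.e. character storage
```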

I'd suggest taking another look at your input data and making very 
sure there are only numbers in it. If it was a text file you read 
into R with some function, inspect the text file carefully. Also, 
check the help pages for the method you used to load the data into R, 
and see if you can find out what kinds of things cause data to be 
interpreted as other than numeric.
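Once the data is in R, a sketch like the following can locate the entries that refuse to convert, so you know exactly what to look for in the input file. The matrix is hypothetical, and replacing "7?" with 7 is purely an assumption for illustration; only you can decide the correct value (or whether to drop those rows). After the fix, scale() and kmeans() run on a genuinely numeric matrix.

```r
# Hypothetical character matrix standing in for the imported data;
# "7?" plays the role of a stray non-numeric entry
m <- matrix(c("6", "8", "8", "7",
              "8", "7", "2", "4",
              "2", "4", "7?", "4"),
            nrow = 4, dimnames = list(NULL, c("BMW", "Ford", "Infiniti")))

# Convert entry by entry, keeping the matrix shape; entries that do not
# parse as numbers become NA (the coercion warning is suppressed on purpose)
num <- suppressWarnings(as.numeric(m))
dim(num) <- dim(m)
dimnames(num) <- dimnames(m)

# Locate the entries that failed to convert and show their original text
bad <- which(is.na(num) & !is.na(m), arr.ind = TRUE)
print(bad)      # row and column of each offending cell
print(m[bad])   # "7?"

# Assumed correction for illustration only; dropping the affected rows,
# e.g. num[complete.cases(num), ], is another option
num[bad] <- 7

# With a genuinely numeric matrix, scale() and kmeans() run normally
x <- scale(num)
set.seed(1)                    # kmeans() picks random starting centers
cl <- kmeans(x, centers = 2)
print(cl$cluster)
```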

-Don

At 12:12 AM -0700 5/9/08, Jordan van Rijn wrote:
>Hello,
>
>I am hoping you can help me with a question concerning kmeans clustering
>in R. I am working with the following data-set (abbreviated):
>
>
>        BMW Ford Infiniti Jeep Lexus Chrysler Mercedes Saab Porsche Volvo
>   [1,]   6    8        2    8     4        5        4    4       7     7
>   [2,]   8    7        4    6     4        1        6    7       8     5
>   [3,]   8    2        4    6     3        2        7    4       4     4
>   [4,]   7    4        4    6     6        1        6    3       5     5
>   [5,]   6    2        4    5     5        1        3    3       6     3
>   [6,]   6    7        3    6     5        1        8    4       8     2
>   [7,]   1    6        6    7     5        2        6    6       5     6
>   [8,]   3    6        6    4     5        1        4    2       1     1
>   [9,]   6    7        5    8     4        1        6    6       8     5
>  [10,]   6    7        5    9     3        1        2    5       1     8
>
>When I try to scale my data and perform kmeans clustering, I get the
>following errors:
>  new <- scale(new)
>Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
>>  cl <- kmeans(new, 4)
>Error in switch(nmeth, { : NA/NaN/Inf in foreign function call (arg 1)
>In addition: Warning message:
>In switch(nmeth, { : NAs introduced by coercion
>
>This is confusing to me since all of the data is numeric and there are
>no missing values. Is there something I need to do to my data to prepare
>it for kmeans? I have tried many matrix transformations but nothing has
>worked so far.
>
>Your help is much appreciated.
>
>Thanks,
>   jordan
>
>--
>   Jordan van Rijn
>   vanrijn9 at fastmail.fm
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062


