[R] Reg : null values in kmeans

Thu Dec 16 05:28:31 CET 2010

Have you tried something like the following?

> # put some data to cluster in a data.frame
> d <- data.frame(x1=log(1:50), x2=sqrt(1:50), x3=1/(1:50))
> # put NA's in rows 1 and 3
> d[1,1] <- d[3,3] <- NA
> # cluster the non-NA rows
> tmp <- kmeans(na.omit(d), 3) # 3 clusters
> # add cluster id vector to original dataset, aligned properly
> d$cluster <- rep(NA, nrow(d))
> d[names(tmp$cluster), "cluster"] <- tmp$cluster
> head(d)
         x1       x2        x3 cluster
1        NA 1.000000 1.0000000      NA
2 0.6931472 1.414214 0.5000000       3
3 1.0986123 1.732051        NA      NA
4 1.3862944 2.000000 0.2500000       3
5 1.6094379 2.236068 0.2000000       3
6 1.7917595 2.449490 0.1666667       3
> tail(d)
         x1       x2         x3 cluster
45 3.806662 6.708204 0.02222222       1
46 3.828641 6.782330 0.02173913       1
47 3.850148 6.855655 0.02127660       1
48 3.871201 6.928203 0.02083333       1
49 3.891820 7.000000 0.02040816       1
50 3.912023 7.071068 0.02000000       1

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  
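A variant of the example above, offered as a sketch rather than as part of the original reply. Instead of matching labels by row name, it builds a logical index with complete.cases() and writes kmeans's labels back through that index, so the alignment does not depend on the data carrying row names. The seed and the separate `cluster` vector are illustrative choices, not something the thread prescribes.

```r
# Sketch: cluster only the complete rows, then write the labels back through
# a logical index, so the alignment does not rely on row names.
set.seed(1)  # kmeans picks random starting centers; fix the seed to repeat a run

# Same toy data as above: NAs in rows 1 and 3
d <- data.frame(x1 = log(1:50), x2 = sqrt(1:50), x3 = 1 / (1:50))
d[1, 1] <- d[3, 3] <- NA

ok  <- complete.cases(d)                       # TRUE where a row has no NA
fit <- kmeans(d[ok, , drop = FALSE], centers = 3)

# One label per original row: NA for incomplete rows, cluster id otherwise
cluster <- rep(NA_integer_, nrow(d))
cluster[ok] <- fit$cluster
d$cluster <- cluster

table(d$cluster, useNA = "ifany")
```

complete.cases() also accepts a numeric matrix, so the first steps carry over unchanged when the data are not a data frame.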

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of raji sankaran
> Sent: Wednesday, December 15, 2010 7:43 PM
> To: Jannis
> Cc: r-help at r-project.org
> Subject: Re: [R] Reg : null values in kmeans
> 
> Hi Jannis,
> 
>   Thank you for answering my question. I saw the option called na.omit
> when I used nnet() to classify the Iris data. I wanted to know if there
> is a similar option in kmeans that can omit, or in some way handle, the
> null/NA values and still cluster the observations. Currently, kmeans
> throws an error for a dataset with NULL/NA values.
> 
> From your answer, I understand that handling NULL/NA values is not
> available in kmeans. Please correct me if I am wrong.
> 
> Thanks again :)
> 
> On Wed, Dec 15, 2010 at 6:50 PM, Jannis <bt_jannis at yahoo.de> wrote:
> 
> > I do not really understand your question. You can use kmeans, but
> > without the observations that include NA values (e.g. by deleting
> > whole rows of your observation matrix). If you want to keep the
> > information in the valid entries of those rows, I fear you need to
> > look for a clustering algorithm that can handle missing values. I
> > doubt that there is a kmeans version that can. You could also consider
> > inserting the means of all other observations into the gaps, though
> > this introduces bias as well.
> >
> >
> > Jannis
> >
> > Raji wrote:
> >
> >> Hi,
> >>
> >> I am using the k-means algorithm for clustering. My data contains a
> >> few null/NA values, and kmeans doesn't cluster with those values. Is
> >> there any option like na.omit which can skip these null values and
> >> cluster the remaining values?
> >>
> >> Thanks,
> >> Raji
> >>
> >>
> >
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
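Jannis's suggestion in the quoted reply, filling each gap with the mean of the other observations, can be sketched as follows. This is an illustration under stated assumptions, not a recommendation: `impute_mean` is a hypothetical helper written here (not part of base R or stats), and column-mean imputation pulls the imputed rows toward the center of the data, which is the bias Jannis mentions.

```r
# Sketch of column-mean imputation before kmeans.
# impute_mean is a hypothetical helper, not a base R or stats function.
set.seed(1)  # kmeans picks random starting centers; fix the seed to repeat a run

# Same toy data as in the reply above: NAs in rows 1 and 3
d <- data.frame(x1 = log(1:50), x2 = sqrt(1:50), x3 = 1 / (1:50))
d[1, 1] <- d[3, 3] <- NA

# Replace each NA with the mean of the observed values in its own column
impute_mean <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

d_imp <- impute_mean(d)
fit <- kmeans(d_imp, centers = 3)

# Every row, including the two that had gaps, now gets a cluster id
d$cluster <- fit$cluster
head(d)
```

Compared with dropping incomplete rows, this keeps every observation in the result, at the cost of clustering partly invented values.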