[R] Reg : null values in kmeans
William Dunlap
wdunlap at tibco.com
Thu Dec 16 05:28:31 CET 2010
Have you tried something like the following?
> # put some data to cluster in a data.frame
> d <- data.frame(x1=log(1:50), x2=sqrt(1:50), x3=1/(1:50))
> # put NA's in rows 1 and 3
> d[1,1] <- d[3,3] <- NA
> # cluster the non-NA rows
> tmp <- kmeans(na.omit(d), 3) # 3 clusters
> # add cluster id vector to original dataset, aligned properly
> d$cluster <- rep(NA, nrow(d))
> d[names(tmp$cluster), "cluster"] <- tmp$cluster
> head(d)
         x1       x2        x3 cluster
1        NA 1.000000 1.0000000      NA
2 0.6931472 1.414214 0.5000000       3
3 1.0986123 1.732051        NA      NA
4 1.3862944 2.000000 0.2500000       3
5 1.6094379 2.236068 0.2000000       3
6 1.7917595 2.449490 0.1666667       3
> tail(d)
         x1       x2         x3 cluster
45 3.806662 6.708204 0.02222222       1
46 3.828641 6.782330 0.02173913       1
47 3.850148 6.855655 0.02127660       1
48 3.871201 6.928203 0.02083333       1
49 3.891820 7.000000 0.02040816       1
50 3.912023 7.071068 0.02000000       1
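
The other idea raised in the quoted thread below, filling the gaps with the mean of the remaining observations, could be sketched roughly as follows. This is only an illustration under stated assumptions: impute_mean is a made-up helper name (not a base R function), set.seed(1) is there purely so the run is repeatable, and, as Jannis points out, mean substitution introduces bias of its own.

```r
# Sketch: replace each NA with its column mean, then cluster all rows.
set.seed(1)  # kmeans picks random starting centers; the seed value is arbitrary
d <- data.frame(x1 = log(1:50), x2 = sqrt(1:50), x3 = 1 / (1:50))
d[1, 1] <- d[3, 3] <- NA

# impute_mean is a hypothetical helper, not part of base R
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# apply the helper column by column; d itself keeps its NA's
d_imp <- as.data.frame(lapply(d, impute_mean))

fit <- kmeans(d_imp, centers = 3)

# no alignment step is needed here: every row was clustered
d$cluster <- fit$cluster
```

Unlike the na.omit() version, rows 1 and 3 now get a cluster id too, but that id depends partly on filled-in values rather than observed ones.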
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of raji sankaran
> Sent: Wednesday, December 15, 2010 7:43 PM
> To: Jannis
> Cc: r-help at r-project.org
> Subject: Re: [R] Reg : null values in kmeans
>
> Hi Jannis,
>
> Thank you for answering my question. I saw the option called na.omit when
> I used nnet() and tried to classify the Iris data with it. I wanted to know
> if there is a similar option available in kmeans that can omit, or in some
> way take into account, the null/NA values and cluster the observations.
> Currently, kmeans throws an error for a dataset with NULL/NA values.
>
> From your answer, I understand that the option of handling NULL/NA is
> not available with kmeans. Please correct me if I am wrong.
>
> Thanks again :)
>
> On Wed, Dec 15, 2010 at 6:50 PM, Jannis <bt_jannis at yahoo.de> wrote:
>
> > I do not really understand your question. You can use kmeans, but only
> > without the observations that include NA values (e.g. by deleting whole
> > rows in your observation matrix). If you want to keep the information in the
> > valid observations of those rows, I fear you need to look for a clustering
> > algorithm that can handle missing values. I doubt that there is a kmeans
> > version that can. Think about inserting means of all other observations into
> > the gaps, though this introduces bias as well.
> >
> >
> > Jannis
> >
> > Raji wrote:
> >
> >> Hi,
> >>
> >> I am using the k-means algorithm for clustering. My data contain a few
> >> null/NA values, and kmeans does not cluster with those values. Is there
> >> any option like na.omit that can skip these null values and cluster the
> >> remaining values?
> >>
> >> Thanks,
> >> Raji
> >
> >
>