[R] Reg : null values in kmeans

Thu Dec 16 13:28:53 CET 2010

Hi Raji,

I am fairly sure that kmeans in general cannot handle missing 
values, so most probably there won't be an option for this in R. Either 
you omit the observations with NAs, as William proposed, or you search for 
an algorithm that can handle missing values (I am not sure whether there is 
one). Another alternative would be to put the mean of the respective 
variable in the NA places. This, however, biases the results.
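
A rough sketch of the two workarounds (dropping incomplete rows versus filling in column means). The toy data built from iris, the choice of three centers and nstart = 10 are illustrative assumptions only, not something taken from your data:

```r
# Toy data: the four numeric iris columns with a few NAs injected (illustrative only)
set.seed(42)
x <- as.matrix(iris[, 1:4])
x[sample(length(x), 10)] <- NA

# Option 1: drop every row that contains at least one NA, then cluster
x_complete <- na.omit(x)
fit_omit <- kmeans(x_complete, centers = 3, nstart = 10)

# Map the cluster labels back to the original rows; dropped rows stay NA
cluster_omit <- rep(NA_integer_, nrow(x))
cluster_omit[complete.cases(x)] <- fit_omit$cluster

# Option 2: replace each NA with the mean of its column (this biases the result)
x_imputed <- x
for (j in seq_len(ncol(x_imputed))) {
  missing <- is.na(x_imputed[, j])
  x_imputed[missing, j] <- mean(x_imputed[, j], na.rm = TRUE)
}
fit_mean <- kmeans(x_imputed, centers = 3, nstart = 10)
```

With the first option the cluster vector is shorter than the original data, hence the mapping back via complete.cases().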

HTH
Jannis

raji sankaran wrote:
> Hi Jannis,
>
>   Thank you for answering my question. I saw the option called na.omit when
> I used nnet() and tried to classify the iris data with that. I wanted to know if
> there is a similar option available in kmeans which can omit or in some way
> consider the null/NA values and cluster the observations. Currently, kmeans
> throws an error for the dataset with NULL/NA values.
>
> From your answer, I understand that the option of handling NULL/NA is
> not available with kmeans. Please correct me if I am wrong.
>
> Thanks again :)
>
> On Wed, Dec 15, 2010 at 6:50 PM, Jannis <bt_jannis at yahoo.de> wrote:
>
>   
>> I do not really understand your question. You can use kmeans, but
>> without the observations that include the NA values (e.g. by deleting whole
>> rows in your observation matrix). If you want to keep the information in the
>> valid observations of those rows, I fear you need to look for a clustering
>> algorithm that can handle missing values. I doubt that there is a kmeans
>> version that can. Think about inserting means of all other observations into
>> the gaps, though this introduces bias as well.
>>
>>
>> Jannis
>>
>> Raji wrote:
>>
>>> Hi,
>>>
>>> I am using the k-means algorithm for clustering. My data contains a few
>>> null/NA values. kmeans doesn't cluster with those values. Is there any
>>> option like na.omit which can skip these null values and cluster the
>>> remaining values?
>>>
>>> Thanks,
>>> Raji