[R] How to deal with missing values when using Random Forest

David Winsemius dwinsemius at comcast.net
Sun Feb 26 02:26:39 CET 2012


On Feb 25, 2012, at 6:24 PM, kevin123 wrote:

> I am using the randomForest package to train and test a model.
> I aim to predict LengthOfStay.days:
>
>> library(randomForest)
>> model <- randomForest( LengthOfStay.days~.,data = training,
> + importance=TRUE,
> + keep.forest=TRUE
> + )
>
>
> *This is a small portion of the data frame:   *
>
> *data(training)*
>
> LengthOfStay.days CharlsonIndex.numeric DSFS.months
> 1                  0                   0.0         8.5
> 6                  0                   0.0         3.5
> 7                  0                   0.0         0.5
> 8                  0                   0.0         0.5
> 9                  0                   0.0         1.5
> 11                 0                   1.5         NaN
>
> *Error message*
>
> Error in na.fail.default(list(LengthOfStay.days = c(0, 0, 0, 0, 0,  
> 0,  :
>  missing values in object,

What part of that error message is unclear? Have you looked at the
randomForest help page? It tells you that the default na.action is
na.fail, which stops with an error when the data contain missing values.
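
To see where the missing values are before fitting, something like the following might help. It is only a sketch: the small data frame is a made-up stand-in for `training`, and `missing_per_column` and `result` are names chosen for illustration.

```r
# Hypothetical stand-in for `training`; the values are made up.
training <- data.frame(
  LengthOfStay.days     = c(0, 0, 0),
  CharlsonIndex.numeric = c(0, 0, 1.5),
  DSFS.months           = c(8.5, 3.5, NaN)
)

# Count missing entries per column; is.na() is TRUE for both NA and NaN.
missing_per_column <- colSums(is.na(training))
print(missing_per_column)

# na.fail() returns its argument unchanged when it is complete
# and signals an error otherwise; capture the message instead of stopping.
result <- tryCatch(na.fail(training), error = function(e) conditionMessage(e))
print(result)
```

Any column with a non-zero count is one that na.fail will object to.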

>
> I would greatly appreciate any help


It would seem that the way forward is to remove the cases with missing
values or to impute values.
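
A minimal sketch of both options follows. The data frame is a made-up stand-in for `training`, and `complete`, `imputed`, `fit_omit` and `fit_fix` are names chosen here for illustration. The randomForest calls are guarded so the base R part still runs if the package is not installed. na.roughfix, from the randomForest package, fills numeric columns with the column median and factor columns with the most frequent level, so treat it as a rough first pass rather than a considered imputation.

```r
# Hypothetical stand-in for `training`; the values are made up.
training <- data.frame(
  LengthOfStay.days     = c(0, 1, 0, 2, 3, 0, 5, 4, 6, 7),
  CharlsonIndex.numeric = c(0, 0, 0, 0, 0, 1.5, 0, 1.5, 0, 1.5),
  DSFS.months           = c(8.5, 3.5, 0.5, 0.5, 1.5, NaN, 2.5, NA, 4.5, 6.5)
)

# Option 1: remove the cases with missing values (NaN counts as missing).
complete <- na.omit(training)

# Option 2: impute. Here each numeric column's missing entries are
# replaced by that column's median, using base R only.
imputed <- training
for (col in names(imputed)) {
  x <- imputed[[col]]
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    imputed[[col]] <- x
  }
}

# The same two choices can be passed to randomForest() via na.action.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  fit_omit <- randomForest(LengthOfStay.days ~ ., data = training,
                           na.action = na.omit)
  fit_fix  <- randomForest(LengthOfStay.days ~ ., data = training,
                           na.action = na.roughfix)
}
```

Dropping rows is simplest when few cases are affected; if many rows have gaps, imputing keeps more of the data at the cost of adding some made-up values. The package also provides rfImpute() for proximity-based imputation of the predictors.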

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT


