[R] Mean or mode imputation for missing values

Petr PIKAL petr.pikal at precheza.cz
Wed Oct 12 15:48:38 CEST 2011


Hi

> 
> Yes thank you Gu…
> I am just trying to do this as a rough step and will try other
> imputation methods which are more appropriate later.
> I am just learning R, and was trying to write the for loop and
> if-statement by hand but something is going wrong…
> 
> This is what I have until now:
> 
> *****fake array:
> age<- c(5,8,10,12,NA)
> a<- factor(c("aa", "bb", NA, "cc", "cc"))
> b<- c("banana", "apple", "pear", "grape", NA)
> df_test <- data.frame(age=age, a=a, b=b)
> df_test$b<- as.character(df_test$b)
> 
> for (var in 1:ncol(df_test)) {
>    if (class(df_test$var)=="numeric") {

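var goes from 1 to 3, but df_test$var does not use that value: `$` looks 
for a column literally named "var", finds none and returns NULL, so 
neither of your conditions is ever TRUE. That is not what you intend.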
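
A small sketch of the difference, on a cut-down copy of the fake data 
(stringsAsFactors = FALSE is set explicitly here only so the result does 
not depend on the R version):

```r
df_test <- data.frame(age = c(5, 8, 10, 12, NA),
                      b = c("banana", "apple", "pear", "grape", NA),
                      stringsAsFactors = FALSE)
var <- 1

# `$` does not evaluate var; it looks for a column named "var"
is.null(df_test$var)     # TRUE
class(df_test$var)       # "NULL", so neither == test in the loop matches

# [[ ]] and [ , ] do evaluate var and select column 1
class(df_test[[var]])    # "numeric"
class(df_test[, var])    # "numeric"
```
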
You should use the [] selection operator instead. However, your Mode call 
still does not assign its result to anything:

for (var in 1:ncol(df_test)) {
    if (class(df_test[,var])=="numeric") {
        df_test[is.na(df_test[,var]), var] <- mean(df_test[,var], na.rm = TRUE)
    } else if (class(df_test[,var])=="character") {
        Mode(df_test[is.na(df_test[,var]),var], na.rm = TRUE)
    }
}

Warning message:
In max(xtab) : no non-missing arguments to max; returning -Inf

You can use debug(Mode) to see what is going on. I have no time to 
inspect it and do not see any obvious flaw.
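
One possible reading of the warning: Mode is handed only the missing 
elements, table() drops NA by default, so xtab is empty and max(xtab) 
has nothing to work on. The sketch below is one way to fill the gaps, not 
a tested general solution. mode_value is a hypothetical helper that is not 
part of the thread; it takes the first value on ties (an assumption, 
unlike the ">1 mode" marker in the original Mode), and factor columns are 
handled as well as character ones.

```r
# Hypothetical helper: most frequent non-missing value, as a string.
mode_value <- function(x) {
  xtab <- table(x)                  # table() drops NA by default
  names(xtab)[which.max(xtab)]      # first entry wins when counts tie
}

age <- c(5, 8, 10, 12, NA)
a <- factor(c("aa", "bb", NA, "cc", "cc"))
b <- c("banana", "apple", "pear", "grape", NA)
df_test <- data.frame(age = age, a = a, b = b, stringsAsFactors = FALSE)

for (var in seq_along(df_test)) {
  x <- df_test[[var]]
  miss <- is.na(x)
  if (!any(miss)) next              # nothing to impute in this column
  if (is.numeric(x)) {
    df_test[[var]][miss] <- mean(x, na.rm = TRUE)
  } else if (is.character(x) || is.factor(x)) {
    # compute the mode on the whole column, then assign it to the NA rows
    df_test[[var]][miss] <- mode_value(x)
  }
}

df_test
```

With the fake data, age gets the mean 8.75 and a gets "cc". Every value of 
b occurs once, so the tie rule picks "apple", the first name in the sorted 
table; whether that is acceptable depends on the data.
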

Regards
Petr



>       df_test$var[is.na(df_test$var)] <- mean(df_test$var, na.rm = TRUE)
>       } else if (class(df_test$var)=="character") {
>       Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE)
>       }
> }
> 
> Where 'Mode' is the function:
> 
> function (x, na.rm)
> {
>     xtab <- table(x)
>     xmode <- names(which(xtab == max(xtab)))
>     if (length(xmode) > 1)
>         xmode <- ">1 mode"
>     return(xmode)
> }
> 
> 
> It seems as if it is just ignoring the statements, though, without
> giving any error…Does anybody have any idea what is going on?
> 
> Thank you very much for all the great help!
> -f
> 
> 2011/10/11 Weidong Gu <anopheles123 at gmail.com>:
> > In your case, it may not be sensible to simply fill missing values by
> > mean or mode, as multiple imputation has become the norm these days. For
> > your specific question, na.roughfix in randomForest package would do
> > the work.
> >
> > Weidong Gu
> >
> > On Tue, Oct 11, 2011 at 8:11 AM, francesca casalino
> > <francy.casalino at gmail.com> wrote:
> >> Dear R experts,
> >>
> >> I have a large database made up of mixed data types (numeric,
> >> character, factor, ordinal factor) with missing values, and I am
> >> looking for a package that would help me impute the missing values
> >> using  either the mean if numerical or the mode if character/factor.
> >>
> >> I maybe could use replace like this:
> >> df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
> >> And go through all the many different variables of the datasets using
> >> mean or mode for each, but I was wondering if there was a faster way,
> >> or if a package existed to automate this (by doing 'mode' if it is a
> >> factor or character or 'mean' if it is numeric)?
> >>
> >> I have tried the package "dprep" because I wanted to use the function
> >> "ce.mimp", but unfortunately it is not available anymore.
> >>
> >> Thank you for your help,
> >> -francy
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> 