[R] Mean or mode imputation for missing values
Petr PIKAL
petr.pikal at precheza.cz
Wed Oct 12 15:48:38 CEST 2011
Hi
>
> Yes thank you Gu…
> I am just trying to do this as a rough step and will try other
> imputation methods which are more appropriate later.
> I am just learning R, and was trying to write the for loop and
> if statement by hand, but something is going wrong…
>
> This is what I have so far:
>
> ***** fake data frame:
> age<- c(5,8,10,12,NA)
> a<- factor(c("aa", "bb", NA, "cc", "cc"))
> b<- c("banana", "apple", "pear", "grape", NA)
> df_test <- data.frame(age=age, a=a, b=b)
> df_test$b<- as.character(df_test$b)
>
> for (var in 1:ncol(df_test)) {
> if (class(df_test$var)=="numeric") {
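var goes from 1 to 3, but df_test$var does not use that value: $ looks for a
column literally named "var", which does not exist, so it returns NULL.
class(NULL) is "NULL", so neither branch runs and no error is raised.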
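A minimal check of this point, on a cut-down two-column copy of the fake data (the smaller df_test is only for illustration): $ takes the name after it literally, while [ ] evaluates var as an index.

```r
# Cut-down copy of the fake data; stringsAsFactors = FALSE keeps b as
# character in every R version
df_test <- data.frame(age = c(5, 8, 10, 12, NA),
                      b = c("banana", "apple", "pear", "grape", NA),
                      stringsAsFactors = FALSE)
var <- 1

# `$` does not evaluate var: it looks for a column named "var"
print(is.null(df_test$var))    # TRUE
print(class(df_test$var))      # "NULL", so class(...) == "numeric" is FALSE

# `[` evaluates var, so this selects the first column
print(class(df_test[, var]))   # "numeric"
```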
You should use the [] selection operator instead. However, your Mode call
still does not assign its result back to df_test:
for (var in 1:ncol(df_test)) {
  if (class(df_test[,var])=="numeric") {
    df_test[is.na(df_test[,var]), var] <- mean(df_test[,var], na.rm = TRUE)
  } else if (class(df_test[,var])=="character") {
    Mode(df_test[is.na(df_test[,var]),var], na.rm = TRUE)
  }
}
Warning message:
In max(xtab) : no non-missing arguments to max; returning -Inf
You can use debug(Mode) to see what is going on. I have no time to
inspect it and do not see any obvious flaw.
Regards
Petr
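A minimal sketch of a loop that does fill every column, assuming the fake data quoted above. Beyond the [ ] fix, two things change: the mode is computed from the whole column rather than from its NA entries only (an all-NA subset leaves table() empty, which is a likely source of the max() warning), and the result is assigned back into the data frame. It also tests with is.numeric(), is.factor() and is.character() instead of comparing class(), so factor columns are covered too. The helper most_frequent() is hypothetical and, unlike the quoted Mode, breaks ties by taking the first value in table() order instead of returning ">1 mode".

```r
# Fake data from the quoted message; stringsAsFactors = FALSE keeps b as
# character in every R version
df_test <- data.frame(age = c(5, 8, 10, 12, NA),
                      a = factor(c("aa", "bb", NA, "cc", "cc")),
                      b = c("banana", "apple", "pear", "grape", NA),
                      stringsAsFactors = FALSE)

# Hypothetical helper: most frequent non-missing value. table() skips NA,
# and which.max() keeps the first of several tied values.
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

for (var in seq_len(ncol(df_test))) {
  x <- df_test[, var]                   # `[` evaluates var; `$var` would not
  miss <- is.na(x)
  if (!any(miss) || all(miss)) next     # nothing to fill, or nothing to learn from
  if (is.numeric(x)) {
    df_test[miss, var] <- mean(x, na.rm = TRUE)
  } else if (is.factor(x) || is.character(x)) {
    # mode of the whole column, assigned back into the missing cells
    df_test[miss, var] <- most_frequent(x)
  }
}
```

With this data, age gets the mean 8.75, the factor a gets "cc", and b, where all four fruits are tied, gets "apple" because that value comes first in table() order.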
> df_test$var[is.na(df_test$var)] <- mean(df_test$var, na.rm = TRUE)
> } else if (class(df_test$var)=="character") {
> Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE)
> }
> }
>
> Where 'Mode' is the function:
>
> Mode <- function (x, na.rm)
> {
> xtab <- table(x)
> xmode <- names(which(xtab == max(xtab)))
> if (length(xmode) > 1)
> xmode <- ">1 mode"
> return(xmode)
> }
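The quoted Mode function can be checked on its own. The sketch below binds it to the name Mode (the message shows only the function body) and calls it on the fake columns. On the NA entries alone, which is what the [ ]-based loop in the reply above passes in, table() has nothing to count, so max() produces the warning reported there. On the non-missing values of b, where all four fruits are tied, it returns the marker string ">1 mode" rather than a value that could be imputed.

```r
# Mode as quoted in the message, bound to a name so it can be called.
# Note that the na.rm argument is accepted but never used.
Mode <- function(x, na.rm) {
  xtab <- table(x)                      # table() drops NA by default
  xmode <- names(which(xtab == max(xtab)))
  if (length(xmode) > 1)
    xmode <- ">1 mode"
  return(xmode)
}

b <- c("banana", "apple", "pear", "grape", NA)
a <- factor(c("aa", "bb", NA, "cc", "cc"))

# Only the missing entries: the table is empty, so max() warns
msg <- tryCatch(Mode(b[is.na(b)]), warning = function(w) conditionMessage(w))
print(msg)

# All non-missing values of b are tied, so the marker string comes back
print(Mode(b[!is.na(b)]))

# The factor column has a single most frequent level
print(Mode(a))
```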
>
>
> It seems as if it is just ignoring the statements, without giving
> any error… Does anybody have any idea what is going on?
>
> Thank you very much for all the great help!
> -f
>
> 2011/10/11 Weidong Gu <anopheles123 at gmail.com>:
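> > In your case, it may not be sensible to simply fill missing values with
> > the mean or mode, as multiple imputation has become the norm these days.
> > For your specific question, na.roughfix() in the randomForest package
> > would do the work (note that it uses the column median, not the mean,
> > for numeric variables, and the most frequent level for factors).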
> >
> > Weidong Gu
> >
> > On Tue, Oct 11, 2011 at 8:11 AM, francesca casalino
> > <francy.casalino at gmail.com> wrote:
> >> Dear R experts,
> >>
> >> I have a large database made up of mixed data types (numeric,
> >> character, factor, ordinal factor) with missing values, and I am
> >> looking for a package that would help me impute the missing values
> >> using either the mean if numerical or the mode if character/factor.
> >>
> >> I could perhaps use replacement like this:
> >> df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
> >> and go through all the many variables of the dataset, using the
> >> mean or mode for each, but I was wondering whether there is a faster
> >> way, or whether a package exists to automate this (using the mode for
> >> factor or character variables and the mean for numeric ones).
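One way to automate this without an explicit loop, sketched in base R: apply a per-column rule with lapply() and write the results back with df[] <- so the data frame keeps its columns and their classes. impute_mean_mode() is a hypothetical name, not a function from any package; the tie rule (first value in table() order wins) and leaving all-NA columns untouched are assumptions of this sketch. Ordered factors are treated like other factors, and columns of other types (logical, Date, ...) are left as they are.

```r
# Hypothetical helper: mean for numeric columns, most frequent non-missing
# value for factor and character columns
impute_mean_mode <- function(df) {
  df[] <- lapply(df, function(x) {
    miss <- is.na(x)
    if (!any(miss) || all(miss)) return(x)   # nothing to fill, or nothing to learn from
    if (is.numeric(x)) {
      x[miss] <- mean(x, na.rm = TRUE)
    } else if (is.factor(x) || is.character(x)) {
      counts <- table(x)                     # NA values are not counted
      x[miss] <- names(counts)[which.max(counts)]
    }
    x
  })
  df
}

# Usage on the fake data from this thread
df <- data.frame(age = c(5, 8, 10, 12, NA),
                 a = factor(c("aa", "bb", NA, "cc", "cc")),
                 b = c("banana", "apple", "pear", "grape", NA),
                 stringsAsFactors = FALSE)
filled <- impute_mean_mode(df)
print(filled)
```

Because df[] <- keeps the data.frame wrapper, the factor column stays a factor and the imputed level must already be one of its levels, which is always true for a mode computed from the column itself.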
> >>
> >> I have tried the package "dprep" because I wanted to use the function
> >> "ce.mimp", but unfortunately it is not available anymore.
> >>
> >> Thank you for your help,
> >> -francy
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>