[R] how to substitute missing values (NAs) by the group means

hadley wickham h.wickham at gmail.com
Tue Jun 9 05:10:40 CEST 2009


On Mon, Jun 8, 2009 at 8:56 PM, Mao Jianfeng<jianfeng.mao at gmail.com> wrote:
> Dear R users,
>
> I am asking for help on how to replace missing values (NAs) with the mean
> of the group they belong to.
>
> My dummy data frame is:
>
>> df
>       group traits
> 1  BSPy01-10     NA
> 2  BSPy01-10    7.3
> 3  BSPy01-10    7.3
> 4  BSPy01-11    5.3
> 5  BSPy01-11    5.4
> 6  BSPy01-11    5.6
> 7  BSPy01-11     NA
> 8  BSPy01-11     NA
> 9  BSPy01-11    4.8
> 10 BSPy01-12    8.1
> 11 BSPy01-12    6.0
> 12 BSPy01-12    6.0
> 13 BSPy01-13    6.1
>
>
> I want to replace each "NA" with the mean of the group it belongs to.
> For example, replace the "NA" in the first record of traits with the
> mean of "BSPy01-10".

Here's yet another way, using the plyr package, http://had.co.nz/

library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
ddply(df, ~ group, transform, traits = impute.mean(traits))
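
To see what impute.mean does to a single group's values, here is a minimal sketch using the BSPy01-10 rows from the question. The all-NA vector is a hypothetical case added for illustration; it does not appear in the original data.

```r
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# traits for group BSPy01-10: the NA is replaced by the mean of the observed values
g10 <- impute.mean(c(NA, 7.3, 7.3))
print(g10)

# Hypothetical group with no observed values at all: there is nothing to
# average, so mean() returns NaN and every entry becomes NaN
all_missing <- impute.mean(c(NA_real_, NA_real_))
print(all_missing)
```

So a group that is entirely missing stays effectively missing (NaN rather than NA), which is probably what you want to know about before modelling.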

Or, if you want to make it a little more generic:

impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}
ddply(df, ~ group, transform, traits = impute(traits, mean))
ddply(df, ~ group, transform, traits = impute(traits, median))
ddply(df, ~ group, transform, traits = impute(traits, min))
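
If plyr is not available, roughly the same result can be sketched in base R with ave(). This is an alternative to the ddply calls, not a description of what ddply does internally. The data frame is rebuilt from the question so the example runs on its own, and the column name traits_imputed is a made-up name used here so the original column is kept for comparison.

```r
# Dummy data from the question, rebuilt so the example is self-contained
df <- data.frame(
  group = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
              times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = FALSE
)

impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}

# ave() applies FUN within each group and returns the values in the
# original row order; traits_imputed is a hypothetical column name
df$traits_imputed <- ave(df$traits, df$group,
                         FUN = function(x) impute(x, mean))
print(df)
```

Swapping mean for median or min in the FUN argument gives the other variants.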

Hadley


-- 
http://had.co.nz/



