[R] how to substitute missing values (NAs) by the group means
hadley wickham
h.wickham at gmail.com
Tue Jun 9 05:10:40 CEST 2009
On Mon, Jun 8, 2009 at 8:56 PM, Mao Jianfeng<jianfeng.mao at gmail.com> wrote:
> Dear R users,
>
> I am asking for help with replacing missing values (NAs) with the mean of
> the group they belong to.
>
> My dummy data frame is:
>
> > df
>        group traits
> 1  BSPy01-10     NA
> 2  BSPy01-10    7.3
> 3  BSPy01-10    7.3
> 4  BSPy01-11    5.3
> 5  BSPy01-11    5.4
> 6  BSPy01-11    5.6
> 7  BSPy01-11     NA
> 8  BSPy01-11     NA
> 9  BSPy01-11    4.8
> 10 BSPy01-12    8.1
> 11 BSPy01-12    6.0
> 12 BSPy01-12    6.0
> 13 BSPy01-13    6.1
>
>
> I want to replace each NA with the mean of the group it belongs to. For
> example, the NA in the first record of traits should be replaced by the
> mean of group "BSPy01-10".
Here's yet another way, using the plyr package, http://had.co.nz/
library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
ddply(df, ~ group, transform, traits = impute.mean(traits))
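For trying this out without plyr, a rough base R equivalent is sketched below. It rebuilds the example data frame from the original post and applies the same impute.mean within each group. Using ave() and the column name traits_imputed are choices made for this illustration, not part of the plyr approach above.

```r
# Example data reconstructed from the original post
df <- data.frame(
  group = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
              times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1)
)

# Replace NAs in a vector with the mean of its non-missing values
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave() applies FUN to traits within each level of group and
# returns the results in the original row order
df$traits_imputed <- ave(df$traits, df$group, FUN = impute.mean)
df
```

Unlike ddply, which returns the rows sorted by group, ave() keeps the original row order, so the result can be stored as a new column next to the raw values.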
Or, if you wanted to make it a little more generic:
impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}
ddply(df, ~ group, transform, traits = impute(traits, mean))
ddply(df, ~ group, transform, traits = impute(traits, median))
ddply(df, ~ group, transform, traits = impute(traits, min))
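One caveat may be worth checking, assuming a group could consist entirely of NAs (which is not the case in the example data): fun then receives an empty vector, so mean() returns NaN and min() returns Inf with a warning. The sketch below shows a hypothetical variant, named impute_safe here for illustration only, that leaves such a group untouched.

```r
# Hypothetical variant of impute() that skips groups with no observed values
impute_safe <- function(x, fun) {
  missing <- is.na(x)
  if (all(missing)) return(x)  # nothing to compute a summary from
  replace(x, missing, fun(x[!missing]))
}

impute_safe(c(NA, 7.3, 7.3), mean)       # the NA is replaced by the mean of the rest
impute_safe(c(NA_real_, NA_real_), min)  # returned unchanged, no warning
```

It should drop into the ddply calls in place of impute. Whether an all-NA group ought to stay NA or be filled some other way depends on the analysis.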
Hadley
--
http://had.co.nz/