[R] how to substitute missing values (NAs) by the group means
David Winsemius
dwinsemius at comcast.net
Tue Jun 9 04:39:44 CEST 2009
On Jun 8, 2009, at 9:56 PM, Mao Jianfeng wrote:
> Dear R users,
>
> I am asking for help on how to substitute missing values (NAs) with the
> mean of the group each one belongs to.
>
> My dummy data frame is:
>
>> df
>        group traits
> 1  BSPy01-10     NA
> 2  BSPy01-10    7.3
> 3  BSPy01-10    7.3
> 4  BSPy01-11    5.3
> 5  BSPy01-11    5.4
> 6  BSPy01-11    5.6
> 7  BSPy01-11     NA
> 8  BSPy01-11     NA
> 9  BSPy01-11    4.8
> 10 BSPy01-12    8.1
> 11 BSPy01-12    6.0
> 12 BSPy01-12    6.0
> 13 BSPy01-13    6.1
>
>
> I want to replace each NA with the mean of the group it belongs to. For
> example, the first traits record (NA) should be replaced by the mean of
> "BSPy01-10".
>
> I have tried to solve this problem with the doBy package, but failed. I
> am looking for a correct solution, whether it uses doBy or not.
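To make the examples easy to try, here is one way to rebuild the dummy data frame from the question. It assumes `group` is a factor, which was the default for character columns in data.frame() before R 4.0.0 (stringsAsFactors = TRUE makes that explicit); the object and column names are the ones used in the post.

```r
# Rebuild the dummy data frame from the question.
# Assumption: 'group' is stored as a factor, as data.frame() did by default
# before R 4.0.0.
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = TRUE
)
str(df)
```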
This should replace each NA with the mean of its group and keep the
non-NA values as they are:
# apply() turns df into a character matrix, hence the outer as.numeric()
as.numeric(apply(df, 1, function(x) ifelse(is.na(x[2]),
                 tapply(df$traits, df$group, mean, na.rm = TRUE)[x[1]],
                 x[2])))
[1] 7.300 7.300 7.300 5.300 5.400 5.600 5.275 5.275 4.800 8.100 6.000 6.000 6.100
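The apply() expression recomputes the whole tapply() for every row and round-trips the data through a character matrix. A minimal vectorized sketch of the same idea, assuming the data frame from the question (rebuilt here so the snippet runs on its own), computes the group means once and then picks, for each row, either its group mean or its original value. The name grp_means is only an illustrative helper.

```r
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = TRUE
)

# One mean per group level, ignoring NAs; the result is named by level.
grp_means <- tapply(df$traits, df$group, mean, na.rm = TRUE)

# Where traits is NA take the group mean, otherwise keep the value.
# as.character() makes the lookup go by group name, not factor code.
filled <- ifelse(is.na(df$traits), grp_means[as.character(df$group)], df$traits)
print(filled)
```

The result should agree with the values printed above, and df$traits <- filled replaces the column just as the apply() version would.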
Whether that is the "right solution" depends on your artistic standards.
If you accept that solution, you would execute:
df$traits <- <the above expression>
Another approach, which replaces only the NAs rather than the whole column:
df[is.na(df$traits), "traits"] <-
    tapply(df$traits, df$group, mean, na.rm = TRUE)[df[is.na(df$traits), "group"]]
>
>
> The commands I used and the output I got are as follows:
>
> library(doBy)
> df<-orderBy(~group,data=df) # succeeded
> f1<-function(x){m<-mean(x, na.ram=TRUE); x[is.na(x)]<-m; x} # succeeded
> datatraits<-lapplyBy(traits~group,data=df, FUN=f1(traits)) # failed
> errors: mean(x, na.ram = TRUE), can not find 'traits'.
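Two things go wrong in the doBy attempt. First, na.ram is a misspelling of na.rm; mean() passes unrecognized arguments into its ... and ignores them, so m is NA and nothing gets filled. Second, FUN = f1(traits) calls f1 immediately in the calling environment, where no object called traits exists, which matches the error message; FUN expects the function itself. Rather than guess at lapplyBy()'s exact interface, the sketch below swaps in base R's ave(), with the corrected f1 passed as FUN = f1 and the question's data rebuilt so it runs on its own.

```r
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1),
  stringsAsFactors = TRUE
)

# Corrected helper: na.rm (not na.ram), so mean() really skips the NAs.
f1 <- function(x) {
  m <- mean(x, na.rm = TRUE)
  x[is.na(x)] <- m
  x
}

# ave() splits traits by group, applies f1 to each piece and puts the
# results back in the original row order. Note FUN = f1, not f1(traits).
df$traits <- ave(df$traits, df$group, FUN = f1)
print(df)
```

Because ave() keeps the original row order, no orderBy() step is needed. A group whose values are all NA would come back as NaN, since there is nothing to average.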
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT