[R] Subset according to groups NA proportion within specific variables
Karl Ove Hufthammer
karl at huftis.org
Mon Feb 21 13:05:21 CET 2011
D. Alain wrote:
> Now I want to make a new dataframe df.sub comprising only cases pertaining
> to groups, where the overall proportion of NAs in either of the response
> variables y,z,w does not exceed 50%.
One simple example:
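library(plyr)
# Inside na.prop, 'x' is the data frame for one group (not the grouping column).
# 'missing' = share of rows in that group containing at least one NA.
na.prop = function(x) data.frame(x, missing = 1 - nrow(na.omit(x))/nrow(x))
newdf = ddply(df, .(x), na.prop)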
Now you can use ‘subset’ on ‘newdf’ to obtain the required rows.
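As a minimal sketch of that last step: the toy data below are invented for illustration, with ‘x’ assumed to be the grouping column and y, z, w the response variables. It also assumes that "proportion of NAs" means the share of rows in a group with an NA in at least one of y, z, w; if you mean the share of missing cells instead, the calculation would differ.

```r
library(plyr)

# Toy data (invented): group "a" has 1 of 4 rows with an NA,
# group "b" has 3 of 4 rows with an NA.
df <- data.frame(
  x = rep(c("a", "b"), each = 4),
  y = c(1, 2, NA, 4,  NA, NA, 7, 8),
  z = c(1, 2, 3, 4,   5, 6, NA, 8),
  w = c(1, 2, 3, 4,   5, 6, 7, 8)
)

# Attach to every row the share of incomplete rows in its group,
# looking only at the response columns y, z, w.
na.prop <- function(d) {
  data.frame(d, missing = 1 - nrow(na.omit(d[c("y", "z", "w")])) / nrow(d))
}
newdf <- ddply(df, .(x), na.prop)

# Keep only groups where at most 50% of the rows are incomplete,
# and drop the helper column again.
df.sub <- subset(newdf, missing <= 0.5, select = -missing)
```

Here only the rows of group "a" should remain in ‘df.sub’.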
(For very large data sets it may be better to not create an entire data
frame in ‘na.prop’, duplicating the data in ‘df’, but instead just return
the proportion.)
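A rough sketch of that lighter variant, under the same assumptions as before (invented toy data, ‘x’ as the grouping column, y, z, w as responses, and "proportion" read as the share of rows with at least one NA). The names ‘resp’, ‘props’ and ‘keep’ are only illustrative.

```r
library(plyr)

# Toy data (invented for illustration)
df <- data.frame(
  x = rep(c("a", "b"), each = 4),
  y = c(1, 2, NA, 4,  NA, NA, 7, 8),
  z = c(1, 2, 3, 4,   5, 6, NA, 8),
  w = c(1, 2, 3, 4,   5, 6, 7, 8)
)

resp <- c("y", "z", "w")

# One row per group: just the proportion of incomplete rows,
# without copying the group's data.
props <- ddply(df, .(x), function(d) {
  data.frame(missing = mean(!complete.cases(d[resp])))
})

# Groups that pass the 50% threshold, then the matching rows of 'df'
keep <- props$x[props$missing <= 0.5]
df.sub <- df[df$x %in% keep, ]
```

This way ‘props’ has only as many rows as there are groups, and the original data are subset directly.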
--
Karl Ove Hufthammer