[R] Subset according to groups NA proportion within specific variables

Karl Ove Hufthammer karl at huftis.org
Mon Feb 21 13:05:21 CET 2011


D. Alain wrote:

> Now I want to make a new dataframe df.sub comprising only cases pertaining
> to groups, where the overall proportion of NAs in either of the response
> variables y,z,w does not exceed 50%.

One simple example:

library(plyr)
# 'missing' is the proportion of rows in the group with at least one NA
na.prop = function(x) data.frame(x, missing = 1 - nrow(na.omit(x))/nrow(x))
newdf = ddply(df, .(x), na.prop)

Now you can use ‘subset’ on ‘newdf’ to obtain the required rows.
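
A minimal sketch of that ‘subset’ step. The toy data frame, its values and the 0.5 cutoff are made up for illustration; it assumes ‘x’ is the grouping variable and that ‘missing’ holds the share of rows in each group with at least one NA.

```r
library(plyr)

# Hypothetical toy data: grouping variable x, response variables y, z, w
df = data.frame(x = rep(c("a", "b"), each = 4),
                y = c(1, NA, 3, 4, NA, NA, NA, 8),
                z = c(1, 2, 3, 4, NA, NA, 7, 8),
                w = c(1, 2, 3, 4, 5, NA, NA, 8))

# Attach to every row the share of incomplete rows in its group
na.prop = function(d) data.frame(d, missing = 1 - nrow(na.omit(d)) / nrow(d))
newdf = ddply(df, .(x), na.prop)

# Keep only groups where at most half of the rows are incomplete,
# then drop the helper column again
df.sub = subset(newdf, missing <= 0.5, select = -missing)
```

If the 50% rule should instead apply to individual cells of y, z and w rather than to whole rows, only the expression inside ‘na.prop’ would need to change.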

(For very large data sets it may be better not to create an entire data
frame in ‘na.prop’, duplicating the data in ‘df’, but instead just return
the proportion.)
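
A rough sketch of that lighter variant, under the same assumptions as before (hypothetical toy data, ‘x’ as the grouping variable, a 0.5 cutoff). Here the function returns one number per group, and the original rows are then selected by group label instead of being copied.

```r
library(plyr)

# Hypothetical toy data: grouping variable x, response variables y, z, w
df = data.frame(x = rep(c("a", "b"), each = 4),
                y = c(1, NA, 3, 4, NA, NA, NA, 8),
                z = c(1, 2, 3, 4, NA, NA, 7, 8),
                w = c(1, 2, 3, 4, 5, NA, NA, 8))

# Return only the share of incomplete rows, one row per group
na.prop = function(d) data.frame(missing = 1 - nrow(na.omit(d)) / nrow(d))
props = ddply(df, .(x), na.prop)

# Labels of the groups that pass the threshold
keep = props$x[props$missing <= 0.5]

# Select the matching rows from the original data frame
df.sub = df[df$x %in% keep, ]
```

The intermediate result ‘props’ has one row per group, so its size no longer grows with the number of cases.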
 
-- 
Karl Ove Hufthammer


