[R] using mean substitution

Rolf Turner rolf.turner at xtra.co.nz
Tue Oct 18 00:24:51 CEST 2011


On 18/10/11 10:35, Michael Parent wrote:
> Hi, all,
>
> I'm running a Monte Carlo simulation with missing data. The data are arranged such that there are k columns and n rows over a set number of simulations (set to 10 right now so it runs fast while I set everything up). The data are integers, numbers 1-7 only (normal distribution).

     For CRYING OUT LOUD.  This sort of blithering nonsense makes me
     want to SCREAM!!!  The normal distribution is a continuous
     distribution.  It does not take on (exclusively) integer values.

> The simulations are set up and run without a hitch, including imposing NA missing values at a specified prevalence semi-randomly (there are not allowed to be any completely empty rows).
>
> I'd like to replace the missing values ("NA") with the mean for the non-missing items *on that row*.  I want to go through all the Monte Carlo simulation runs that I already did (so that I'm using the same data) and replace NA with the mean (e.g., if k=5 and a row has values of 3 3 NA 5 5, I want to put a 4 in for NA). I also want the imputed mean values to be rounded to the nearest integer.
>
> Does anyone have an idea for how I'd set that up? I feel like there's a fairly easy way to set up searching out those NAs and replacing them with the row mean that is not coming to me.

Let your matrix of values be "m".

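# Rounded mean of each row, ignoring the NAs.
rv <- round(apply(m, 1, mean, na.rm = TRUE))
# Row and column positions of every NA in m.
ij <- which(is.na(m), arr.ind = TRUE)
# Fill each NA with the rounded mean of the row it sits in.
m[ij] <- rv[ij[, 1]]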
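
As a rough sketch of how the three lines above might be applied to every stored simulation run: the helper name impute_row_mean and the list "sims" are made up for illustration, and it is only an assumption that the runs are kept as a list of matrices. The sketch also swaps in rowMeans() for apply(m, 1, mean, ...), which should give the same row means.

```r
# Hypothetical helper: replace each NA with the rounded mean of its own row.
impute_row_mean <- function(m) {
  rv <- round(rowMeans(m, na.rm = TRUE))   # one rounded mean per row
  ij <- which(is.na(m), arr.ind = TRUE)    # (row, column) position of each NA
  m[ij] <- rv[ij[, 1]]                     # first column of ij is the row index
  m
}

# Toy data; the first row is the example from the question (3 3 NA 5 5).
m <- matrix(c(3, 3, NA, 5, 5,
              1, NA, 2, NA, 7),
            nrow = 2, byrow = TRUE)
filled <- impute_row_mean(m)

# Assumed storage: one matrix per simulation run, collected in a list.
sims <- list(m, m)
sims_filled <- lapply(sims, impute_row_mean)
```

Note that round() in R rounds halves to the even digit, so a row mean of 2.5 becomes 2 rather than 3. If halves should always go up, floor(x + 0.5) is one possible substitute.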

     cheers,

         Rolf Turner


