[R] Missing at random

(Ted Harding) ted.harding at wlandres.net
Mon Jan 31 11:17:23 CET 2011

On 31-Jan-11 04:17:45, David Winsemius wrote:
> On Jan 30, 2011, at 10:01 PM, assaedi76 assaedi76 wrote:
>> R users:
>> Thanks in advance
>>  How to generate missing at random (MAR)?

  missidx <- sample(1:nrow(dfrm), nrow(dfrm)*frac)
  is.na(dfrm$measure) <- 1:nrow(dfrm) %in% missidx

>> assaedi76 at yahoo.com
>> Thanks

That solution is for (in "missing data language") MCAR
(Missing Completely At Random), i.e. the probability
of being missing does not depend on any of the variables
in the data.

For MAR (Missing At Random), the probability of being
missing may depend on the values of covariates but,
given those covariates, must not depend on the value
of the outcome variable.

So the way to generate MAR, for data where there are
covariates X1, X2, ... , Xk (and outcome Y) is to set
up a function P (could be anything) of some or all of
X1, X2, ... , Xk taking values in [0,1] (endpoints
included), and then set a "missing" variable Z to be
1 (missing) with probability given by the value of P
for that case, and 0 (not missing) otherwise.

So, if M is a data matrix with columns X1, ... , Xk , Y
where each row is a case, use apply() to evaluate the
function P() for each row in terms of (X1,X2,...,Xk).
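As a rough sketch of that step: the matrix M, its column
names and the logistic form of P used here are all made up
for illustration (P really could be anything with values
in [0,1]), so treat this as one possible setup rather than
the only way to do it.

```r
set.seed(1)                      # only to make the sketch reproducible
N  <- 200
X1 <- rnorm(N)
X2 <- rnorm(N)
Y  <- 1 + 2*X1 - X2 + rnorm(N)
M  <- cbind(X1 = X1, X2 = X2, Y = Y)   # illustrative data matrix

# Hypothetical choice of P: a logistic function of X1 and X2 only.
# It never looks at Y, which is what keeps the mechanism MAR.
P <- function(x) plogis(-1 + 1.5*x[["X1"]] - 0.5*x[["X2"]])

# Evaluate P row by row, passing only the covariate columns
p <- apply(M[, c("X1", "X2"), drop = FALSE], 1, P)
```

Passing only the covariate columns to apply() makes it
impossible for P to use Y by accident.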

You then get a vector p = c(p.1, p.2, ... , p.N) of
values of P for the N rows of M. At this point:

  Z <- 1*( runif(N) <= p )

creates a vector of 0s and 1s which marks the cases
to be made Missing At Random (1 = missing).
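
To actually blank out the outcome, one possibility is
sketched here. The data frame "dat", the single covariate
X1 and the logistic P are assumptions made purely for
illustration; since plogis() is vectorised, apply() is not
needed in this simple case.

```r
set.seed(2)                      # only to make the sketch reproducible
N  <- 1000
X1 <- rnorm(N)
Y  <- 1 + 2*X1 + rnorm(N)
dat <- data.frame(X1 = X1, Y = Y)      # illustrative data

p <- plogis(-1 + 1.5*dat$X1)     # hypothetical P(X1), values in (0,1)
Z <- 1*( runif(N) <= p )         # 1 = missing, 0 = not missing

dat$Y[Z == 1] <- NA              # set the outcome to NA where Z is 1

# Rough check: the proportion missing should be higher where X1 > 0,
# since this P increases with X1
miss_rate <- tapply(is.na(dat$Y), dat$X1 > 0, mean)
```

Here miss_rate holds the proportion of missing Y for
X1 <= 0 and for X1 > 0; under MCAR the two would be about
equal, under this MAR mechanism they should differ.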


E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Jan-11                                       Time: 10:17:20
------------------------------ XFMail ------------------------------
