[R] Not missing at random

Joshua Wiley jwiley.psych at gmail.com
Mon Jun 6 22:34:38 CEST 2011


Hi Blaz,

See below.

x <- matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
              1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
            nrow = 7, ncol = 7, byrow = TRUE) ####matrix

pMiss <- 30     ####percent of cases to sample (and make partly missing)

N <- dim(x)[1]   ####number of cases

#### I want to sample all cases with at least 1 value lower than 3,
#### so I have to find candidates
candidate <- which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3 |
  x[,7]<3)

## an easier way:
## find all cells with x < 3 and return their row and column indices,
## keep only the row indices, and drop the duplicates
candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
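Two other base-R idioms find the same candidate rows. This is a sketch, assuming `x` is the 7 x 7 matrix above (rebuilt here with `rep()` so the block runs on its own). One difference worth knowing: `unique(which(..., arr.ind = TRUE)[, "row"])` lists rows in order of first appearance when scanning column by column, while the two alternatives return them in increasing order. For sampling this makes no difference.

```r
## same 7 x 7 matrix as above: 1:5 repeated nine times, then 3, 3, 3, 4
x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)

## logical matrix: TRUE where a value is below the cutoff
below <- x < 3

## rows with at least one TRUE, via row sums of the logical matrix
cand_rowsums <- which(rowSums(below) > 0)

## the same test row by row with apply() and any()
cand_apply <- which(apply(below, 1, any))

## the which(arr.ind = TRUE) version, sorted so the three can be compared
cand_which <- sort(unique(which(below, arr.ind = TRUE)[, "row"]))
```

For this matrix all three give rows 1 to 6; row 7 (3, 4, 5, 3, 3, 3, 4) has no value below 3.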

idMiss <- sample(candidate, N * pMiss / 100)  #### I sampled cases
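One caveat with `sample(candidate, ...)`: if `candidate` ever contains a single row index k (with k >= 1), `sample()` treats it as `1:k` and can return rows that are not candidates at all. A safer pattern, sketched below with a hypothetical helper `sample_rows()`, samples positions within `candidate` via `sample.int()`. The sketch also rounds the size down explicitly with `floor()` so it is a whole number (7 * 30 / 100 is 2.1). Like `sample()` without replacement, it stops with an error if the size exceeds the number of candidates.

```r
## hypothetical helper: draw `size` elements of `v` without replacement,
## also when v has length 1
sample_rows <- function(v, size) {
  v[sample.int(length(v), size)]
}

## the pitfall: a single candidate row, say row 5
candidate <- 5
sample(candidate, 1)      # draws from 1:5, not necessarily 5
sample_rows(candidate, 1) # always 5

## the original setting: 30 % of 7 rows, rounded down to a whole number
N <- 7
pMiss <- 30
n_miss <- floor(N * pMiss / 100)  # 2
idMiss <- sample_rows(1:6, n_miss) # rows 1 to 6 are the candidates here
```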

## within the sampled cases (rows of x),
## find all values that are < 3 and set them to NA
x[idMiss, ][x[idMiss, ] < 3] <- NA
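The chained replacement `x[idMiss, ][x[idMiss, ] < 3] <- NA` works, but the same step can also be written as one logical index over the whole matrix, combining "value below 3" with "row was sampled" through `row()`. A sketch, assuming `idMiss` holds the sampled row numbers; here it is fixed by hand to rows 1 and 3 so the result is reproducible.

```r
x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)
idMiss <- c(1, 3)  # fixed for illustration; normally the result of sample()

## row(x) gives the row number of every cell, so the mask is TRUE only
## for cells that are below 3 AND lie in one of the sampled rows
mask <- x < 3 & row(x) %in% idMiss
x[mask] <- NA
```

Row 1 (1, 2, 3, 4, 5, 1, 2) becomes NA, NA, 3, 4, 5, NA, NA and row 3 (5, 1, 2, 3, 4, 5, 1) becomes 5, NA, NA, 3, 4, 5, NA; the other rows are left alone even when they contain values below 3.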

## If you are going to do this a lot, consider a function
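nmar <- function(x, op = "<", value = 3, p = 30) {
  op <- match.fun(op)  # e.g. "<" becomes the function `<`
  ## rows with at least one value satisfying op(x, value)
  candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
  ## sample p percent of all rows from the candidates; indexing into
  ## candidate avoids sample() treating a single row k as 1:k
  idMiss <- candidate[sample.int(length(candidate), floor(nrow(x) * p / 100))]
  ## within the sampled rows, set the qualifying values to NA
  x[idMiss, ][op(x[idMiss, ], value)] <- NA
  return(x)
}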

nmar(x)

## the advantage is that you can easily change p, the cutoff value,
## and the operator (e.g., "<", ">", "<=", etc.)
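When changing the operator or the cutoff, a quick check can confirm that only the intended cells were blanked. The sketch below uses a hypothetical helper, `check_nmar()` (not part of any package), that compares the original and the modified matrix: every new NA must sit in a cell where `op(orig, value)` is TRUE, nothing else may change, and within each affected row all such cells must be NA. It assumes the original matrix has no NAs, and it is illustrated with `op = ">"` and `value = 4` on rows fixed by hand.

```r
## hypothetical helper: TRUE if `miss` was derived from `orig` by blanking,
## in some set of rows, exactly the cells where op(orig, value) holds
check_nmar <- function(orig, miss, op = "<", value = 3) {
  op <- match.fun(op)
  hit <- op(orig, value)                # cells eligible to be blanked
  new_na <- is.na(miss) & !is.na(orig)  # cells that became NA
  rows <- unique(row(orig)[new_na])     # rows that were modified
  ## no ineligible cell was blanked, and nothing else changed
  ok_cells <- all(hit[new_na]) && identical(miss[!new_na], orig[!new_na])
  ## in every modified row, all eligible cells were blanked
  ok_rows <- all(new_na[rows, ] == hit[rows, ])
  ok_cells && ok_rows
}

x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)
y <- x
idMiss <- c(2, 5)                   # fixed for illustration
y[idMiss, ][y[idMiss, ] > 4] <- NA  # blank values above 4 in rows 2 and 5

check_nmar(x, y, op = ">", value = 4)  # TRUE
check_nmar(x, y, op = "<", value = 3)  # FALSE: a different mechanism
```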

Cheers,

Josh

On Sun, Jun 5, 2011 at 11:17 PM, Blaz Simcic <blazsimcic at yahoo.com> wrote:
>
>
> Hello!
>
> I would like to sample 30% of the cases (rows) that have at least one value
> lower than 3, and within the selected cases set all values lower than 3 to NA
> (NMAR: not missing at random). I managed to sample the cases, but I don't
> know how to set the values lower than 3 to NA.
>
> R code:
>
> x <-
> matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
>  nrow = 7, ncol=7, byrow=TRUE) ####matrix
>
> pMiss <- 30     ####percent of missing values
>
> N <- dim(x)[1]   ####number of cases
>
> #### I want to sample all cases with at least 1 value lower than 3,
> #### so I have to find candidates
> candidate<-which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3 |
> x[,7]<3)
>
> idMiss <- sample(candidate, N * pMiss / 100)    #### I sampled cases
>
> Now I'd like to set all values lower than 3 in the sampled cases to NA.
>
> Any suggestions?
>
> Thanks,
> Blaž
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


