[R] Not missing at random
Joshua Wiley
jwiley.psych at gmail.com
Mon Jun 6 22:34:38 CEST 2011
Hi Blaz,
See below.
x <-
matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
nrow = 7, ncol=7, byrow=TRUE) ####matrix
pMiss <- 30 ####percent of missing values
N <- dim(x)[1] ####number of cases
candidate <- which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3 |
x[,7]<3) #### I want to sample all cases with at least 1 value
#### lower than 3, so I have to find candidates
## easier to use this
## find all x < 3 and return their row and column indices
## select only row indices, and then find unique
candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
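To make that one-liner less opaque, here is a small sketch on a made-up 2 x 3 matrix. The names m, hits and rows exist only for this illustration. With arr.ind = TRUE, which() returns one line per matching cell, with columns named "row" and "col"; taking the "row" column and dropping duplicates leaves the rows that contain at least one match.

```r
# toy matrix, invented for illustration only
m <- matrix(c(1, 5, 4,
              3, 2, 1), nrow = 2, byrow = TRUE)

# one line per cell with a value below 3; columns are "row" and "col"
hits <- which(m < 3, arr.ind = TRUE)
hits

# a row can appear several times (row 2 matches twice), so keep each once
rows <- unique(hits[, "row"])
rows  # 1 2
```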
idMiss <- sample(candidate, N * pMiss / 100) #### I sampled cases
## within the sampled rows (the cases that will get missing values),
## find all values that are < 3 and set them to NA
x[idMiss, ][x[idMiss, ] < 3] <- NA
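Two details may be worth making explicit, so here is a hedged variant of the same steps. First, N * pMiss / 100 is 2.1 for this matrix, which is not a whole number, so the sketch rounds down with floor() to make the intended count obvious. Second, the draw is random, so a seed is set to make it repeatable. The seed value and the name nMiss are my own choices, not part of the original code.

```r
set.seed(42)  # arbitrary seed, only so the same rows are drawn each time

# same 49 values as the matrix above, written more compactly
x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)
pMiss <- 30

candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
nMiss <- floor(nrow(x) * pMiss / 100)  # 7 * 30 / 100 = 2.1, rounded down to 2
idMiss <- sample(candidate, nMiss)

x[idMiss, ][x[idMiss, ] < 3] <- NA

# quick check: how many rows now contain at least one NA
sum(rowSums(is.na(x)) > 0)
```

Row 7 has no value below 3, so it is never a candidate and should never receive an NA.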
## If you are going to do this a lot, consider a function
nmar <- function(x, op = "<", value = 3, p = 30) {
  op <- get(op)
  candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
  idMiss <- sample(candidate, nrow(x) * p / 100)
  x[idMiss, ][op(x[idMiss, ], value)] <- NA
  return(x)
}
nmar(x)
## has the advantage that you can easily change
## p, the cut off value, the operator (e.g., "<", ">", "<=", etc.)
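As an illustration of changing the operator, here is a sketch of a slightly more defensive variant. nmar2 is a hypothetical name, and the guards are assumptions about what you would want rather than something the function above does. It looks the operator up with match.fun(), rounds the number of rows down, never asks for more rows than there are candidates, and indexes into candidate instead of passing it to sample(), because sample() treats a single number n as 1:n.

```r
# nmar2 is a hypothetical name; the logic follows nmar() above
nmar2 <- function(x, op = "<", value = 3, p = 30) {
  op <- match.fun(op)  # accepts "<", ">", "<=", ... or a function
  candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
  # whole number of rows, capped at the number of candidates
  n <- min(length(candidate), floor(nrow(x) * p / 100))
  # draw positions, so a single candidate row is not read as 1:candidate
  idMiss <- candidate[sample.int(length(candidate), n)]
  rows <- x[idMiss, , drop = FALSE]  # stays a matrix even for one row
  rows[op(rows, value)] <- NA
  x[idMiss, ] <- rows
  x
}

x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)
set.seed(1)  # arbitrary seed for a repeatable draw
# in 50% of the rows (rounded down to 3), set every value >= 4 to NA
y <- nmar2(x, op = ">=", value = 4, p = 50)
y
```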
Cheers,
Josh
On Sun, Jun 5, 2011 at 11:17 PM, Blaz Simcic <blazsimcic at yahoo.com> wrote:
>
>
> Hello!
>
> I would like to sample 30 % of cases (with at least 1 value lower than 3 - in
> the row) and among them I want to set all values lower than 3 (within selected
> cases) as NA (NMAR- Not missing at random). I managed to sample cases, but I
> don’t know how to set values (lower than 3) as NA.
>
> R code:
>
> x <-
> matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
> nrow = 7, ncol=7, byrow=TRUE) ####matrix
>
> pMiss <- 30 ####percent of missing values
>
> N <- dim(x)[1] ####number of cases
>
> candidate<-which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3 |
> x[,7]<3) #### I want to sample all cases with at least 1 value lower than 3,
> so I have to find candidates
>
> idMiss <- sample(candidate, N * pMiss / 100) #### I sampled cases
>
> Now I'd like to set all values among sampled cases as NA.
>
> Any suggestion?
>
> Thanks,
> Blaž
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/