[R] Adding NA values in random positions in a dataframe
gunter.berton at gene.com
Fri Nov 29 22:00:04 CET 2013
An essentially identical approach that may be a tad clearer -- but
requires additional space -- first creates a logical vector for the
locations of the NA's in the unlisted data.frame. Further NA positions
are randomly added and then the augmented vector is used as a logical
matrix to index where the NA's should go in the data frame:
df <- data.frame(a = c(1:3,NA,4:6),
nr <- nrow(df); nc <- ncol(df)
p <- .3 ## desired total proportion of NA's
ina <- is.na(unlist(df)) ## logical vector, TRUE corresponds to NA positions
n2 <- floor(p*nr*nc) - sum(ina) ## number of new NA's
ina[sample(which(!ina), n2)] <- TRUE ## sample only from positions that are not already NA
df[matrix(ina, nrow=nr, ncol=nc)] <- NA ## using matrix indexing
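For anyone who wants to run the steps above end to end, here is a minimal sketch wrapped in a function. The function name add_random_na and the two-column toy data frame are made up for illustration (they are not from the thread), and the sketch assumes all columns are numeric. It draws new positions only from cells that are not already NA, and indexes through sample.int() to sidestep the way sample() expands a single number x into 1:x.

```r
## Hypothetical helper (not from the thread): raise the share of NA cells
## in a data frame to a target proportion p, keeping existing NAs in place.
add_random_na <- function(df, p) {
  nr <- nrow(df); nc <- ncol(df)
  ina <- is.na(unlist(df, use.names = FALSE))  # column-major order, as matrix() expects
  n_new <- floor(p * nr * nc) - sum(ina)       # number of NAs still needed
  if (n_new > 0) {
    free <- which(!ina)                        # positions that are not yet NA
    ## index into 'free' so a length-one 'free' is not expanded to 1:free
    ina[free[sample.int(length(free), n_new)]] <- TRUE
  }
  df[matrix(ina, nrow = nr, ncol = nc)] <- NA  # logical matrix indexing
  df
}

set.seed(1)                                    # only to make the illustration reproducible
toy <- data.frame(a = c(1:3, NA, 4:6),         # made-up data: 7 rows x 2 columns
                  b = c(NA, 10:15))
res <- add_random_na(toy, 0.3)
mean(is.na(res))                               # proportion of NA cells afterwards
```

Because of the floor(), the resulting proportion can fall slightly below p when p*nr*nc is not a whole number.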
On Fri, Nov 29, 2013 at 10:09 AM, arun <smartpink111 at yahoo.com> wrote:
> I used that because 10% of the values in the data were already NA.
> You are right. Sorry, ?match() is unnecessary. I was trying another solution with match() which didn't work out and forgot to check whether it was adequate or not.
> dat1[!is.na(dat1)][sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20))] <- NA
> Thanks for the reply. I don't get the 0.20 multiplied by the length of the non-NA values; where did you take it from?
> Furthermore, why do we have to use the function match? Wouldn't it be enough to use the sample function?
> On Thursday, November 28, 2013 12:57 PM, arun <smartpink111 at yahoo.com> wrote:
> One way would be:
> dat1 <- as.data.frame(matrix(sample(c(1:5,NA),50,replace=TRUE,prob=c(10,15,15,20,30,10)),ncol=5))
> dat1[!is.na(dat1)][ match( sample(seq(dat1[!is.na(dat1)]),length(dat1[!is.na(dat1)])*(0.20)),seq(dat1[!is.na(dat1)]))] <- NA
> # 0.28
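On where the 0.20 comes from: if roughly 10% of the cells are already NA, blanking 20% of the remaining 90% adds about 18 percentage points, for a total near 28%. That is consistent with the 0.28 shown above, though short of the requested 30%. To hit a target share p from a starting share p0, the fraction of non-NA cells to blank would be (p - p0)/(1 - p0). A small sketch of that arithmetic follows; the variable names are my own, and p0 is the approximate figure from the question, not a measured value.

```r
p0 <- 0.10                             # share of cells already NA (approximate)
p  <- 0.30                             # desired overall share of NA
frac_20 <- 0.20                        # fraction of non-NA cells blanked in the one-liner
total_20 <- p0 + frac_20 * (1 - p0)    # overall share after blanking 20% of non-NA cells
frac_needed <- (p - p0) / (1 - p0)     # fraction of non-NA cells to blank to reach p
total_needed <- p0 + frac_needed * (1 - p0)
c(total_20 = total_20, frac_needed = frac_needed, total_needed = total_needed)
```

In practice p0 would be computed from the data, for example as mean(is.na(dat1)), rather than assumed.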
> Hello, I'm quite new at R so I don't know which is the most efficient
> way to execute a function that I could write easily in other languages.
> This is my problem: I have a dataframe with a certain number of
> NAs (approximately 10%). I want to add other NA values in random
> positions of the dataframe until reaching an overall proportion of NA
> values of 30% (clearly the positions with NA values don't have to
> change). I tried looking at iterative functions in R such as apply or sapply
> but I can't actually figure out how to use them in this case. Thank you.
Genentech Nonclinical Biostatistics