[R] (no subject)
Dennis Murphy
djmuser at gmail.com
Sat May 7 15:30:49 CEST 2011
Hi:
To quote one of the sages of this list: 'Loops? We don't need no
steenking loops!!'.
Here's one way to do what you were asking, using a two-pass approach.
Generate some random data, then use sample() to pick 20 indices and set
those elements of the vector to NA. In the first pass, replace each missing
value with the value preceding it (ifelse() handles the case where the first
position is missing). A value whose predecessor was also missing is still NA
after that pass; the second pass replaces those remaining NAs with the mean
of the vector.
# Generate 100 random Poisson(10) values
x <- rpois(100, 10)
# Get the indices to set to NA
midx <- sample(length(x), 20)
# Replace x[midx] with NA
x[midx] <- NA
# If first value of x is NA, keep NA, else replace missing value
# by previous value
x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
# Replace remaining NAs with the vector's mean
x[is.na(x)] <- mean(x, na.rm = TRUE)
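The steps above give the vector mean to any missing value whose predecessor is also missing. If the goal is instead to carry the last observed value forward across a run of adjacent NAs, as the original question describes, a minimal sketch could look like the following. The helper name fill_forward is hypothetical, and using the mean of the observed values for leading NAs is an assumption standing in for the population mean mentioned in the question.

```r
# Hypothetical helper: carry the last observed value forward over runs of NAs
fill_forward <- function(x) {
    # Index of the most recent non-missing element at each position
    # (0 where nothing has been observed yet)
    last_obs <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
    # x[0] would silently drop elements, so index with NA instead
    last_obs[last_obs == 0L] <- NA
    filled <- x[last_obs]
    # Assumption: leading NAs get the mean of the observed values
    filled[is.na(filled)] <- mean(x, na.rm = TRUE)
    filled
}

x <- c(NA, 2, NA, NA, 8)
fill_forward(x)
# 5 2 2 2 8
```

The cummax() trick avoids an explicit loop: each position looks up the index of the latest non-missing value seen so far, so a run of NAs all point back to the same observed element.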
To repeat this for many samples, wrap the steps in a function and then
use the raply() function in plyr or the replicate() function in base R to
run it 1000 times and collect the results in a 1000 x 100 matrix:
hdimp <- function() {
    x <- rpois(100, 10)
    midx <- sample(length(x), 20)
    x[midx] <- NA
    x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}
library(plyr)
u <- raply(1000, hdimp)
An alternative is to use the replicate() function:
v <- t(replicate(1000, hdimp()))
The latter approach is about 20% faster in my tests.
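A rough way to check the timing and the shape of the result on your own machine is sketched below. The seed is arbitrary, the plyr comparison only runs if the package is installed, and the numbers will vary between systems, so this is an illustration and not a benchmark.

```r
# Same function as in the post, repeated so the block is self-contained
hdimp <- function() {
    x <- rpois(100, 10)
    midx <- sample(length(x), 20)
    x[midx] <- NA
    x[midx] <- x[ifelse(midx == 1L, NA, midx - 1)]
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}

set.seed(1)  # arbitrary seed, only for reproducibility

# Base R version: replicate() returns 100 x 1000, so transpose it
t_rep <- system.time(v <- t(replicate(1000, hdimp())))
dim(v)       # one row per simulated sample
anyNA(v)     # no missing values should remain after both passes

# plyr version, run only if the package is available
if (requireNamespace("plyr", quietly = TRUE)) {
    t_ply <- system.time(u <- plyr::raply(1000, hdimp))
    print(c(replicate = t_rep[["elapsed"]], raply = t_ply[["elapsed"]]))
}
```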
HTH,
Dennis
On Fri, May 6, 2011 at 2:32 PM, Nick Manginelli <themang99 at yahoo.com> wrote:
> I'm using the survey api. I am taking 1000 samples of size 100 and
> replacing 20 of those values with missing values. I'm trying to use
> sequential hot deck imputation, and thus I am trying to figure out how
> to replace each missing value with the value before it. Another thing I
> have to keep in mind is that if there are two missing values side by side,
> I need to replace both of those values with the value before. Also, if the
> first value in the sample of 100 is missing, I will replace it with the
> mean of the population. I'm pretty sure I have to write a loop, but if
> anyone can help me figure out how to write this I would appreciate it
> greatly.
> Thank you
>
>
> Nick Manginelli