[R] How to programme R to randomly replace some X values with Outliers
Joshua Wiley
jwiley.psych at gmail.com
Sun Oct 3 09:28:12 CEST 2010
Dear Hock Ann,
I am not sure of all your requirements, but this should at least get
you started. I show it by hand and also wrapped up in a function. In
the function I made two density plots that I thought might be
interesting to you, you can just delete those two lines of code if you
do not want them. The code follows below.
Cheers,
Josh
## X is pseudo-random numbers from the uniform distribution
## ~U(0, 10)
X <- runif(n = 100, min = 0, max = 10)
## We can check that X is about what we would expect
## The mean should be
(1/2) * (10 + 0)
mean(x = X)
## The variance should be
(1/12) * ((10 - 0)^2)
var(x = X)
## I am assuming norm(0, 1) to be representing the standard normal distribution
## Create a vectory of these numbers to be used in the formula for Y
## this is important since you will be create Y twice
## (with X and then X replaced with some outliers)
## You do not want R to regenerate the random normal values and change
things that way
Z <- rnorm(n = 100, mean = 0, sd = 1)
## Create Y from your formula, where Z = norm(0, 1)
## Y = 6 + 2X + norm(0, 1)
Y <- 6 + 2 * X + Z
## Now I am using sample() to randomly select some values
## between 1 and the length of X, these will be the positions
## of the elements of X to be replaced
toreplace <- sample(x = seq_along(X), size = 5, replace = FALSE)
## Now replace the X values
X[toreplace] <- runif(n = 5, min = 15, max = 20)
## Create Ynew based off updated X
Ynew <- 6 + 2 * X + Z
## Calculate the standard deviations of Y and Ynew
## and store in a named vector called "results"
results <- c("SD_Y" = sd(Y), "SD_Ynew" = sd(Ynew))
## print the results vector to screen to look at it
results
## Now if you wanted to do this many times
## and potentially change a few values easily
## we can put it in a function
## n is the number in each sample
## a and b are the min and max of the uniform distribution for X
## a.outlier and b.outlier are the same but for the outliers
## nreplace is how many values of X you want to replace
## reps is how many times you want to run it
## I have written the values to default to what you said in your emamil
## but obviously it would be easy to change any one of them
mysampler <- function(n = 100, a = 0, b = 10,
a.outlier = 15, b.outlier = 20,
nreplace = 5, reps = 1000) {
if(any(c(n, nreplace, reps) < 1)) {
stop("n, nreplace, and reps must all be at least 1")
}
results <- matrix(0, nrow = reps, ncol = 2,
dimnames = list(NULL, c("SD_Y", "SD_Ynew")))
for(i in 1:reps) {
X <- runif(n = n, min = a, max = b)
Z <- rnorm(n = n, mean = 0, sd = 1)
Y <- 6 + 2 * X + Z
toreplace <- sample(x = seq_along(X), size = nreplace, replace = FALSE)
X[toreplace] <- runif(n = nreplace, min = a.outlier, max = b.outlier)
Ynew <- 6 + 2 * X + Z
results[i, ] <- c(sd(Y), sd(Ynew))
}
dev.new()
par(mfrow = c(2, 1))
plot(density(results[,"SD_Y"]), xlim = range(results))
plot(density(results[,"SD_Ynew"]), xlim = range(results))
return(results)
}
## You might find the following documentation helpful
?runif # generate random values from uniform
?rnorm # from normal
?for # to do your simulation
On Sat, Oct 2, 2010 at 11:12 PM, Hock Ann Lim <lim_ha at yahoo.com> wrote:
> Dear experts,
> I am a beginner of R.
> I'm looking for experts to guide me how to do programming in R in order to
> randomly replace 5 observations in X explanatory variable with outliers drawn
> from U(15,20) in sample size n=100. The replacement subject to y < 15.
>
> The ultimate goal of my study is to compare the std of y with and without the
> presence of outliers based on average of 1000 simulation.
>
> Info :
> X~U(0,10)
> Y=6+2X+norm(0,1)
>
> Thank you.
>
> Hock Ann
>
>
>
> [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
More information about the R-help
mailing list