[R] How to programme R to randomly replace some X values with Outliers

Joshua Wiley jwiley.psych at gmail.com
Sun Oct 3 09:28:12 CEST 2010


Dear Hock Ann,

I am not sure of all your requirements, but this should at least get
you started.  I show it by hand and also wrapped up in a function.  In
the function I made two density plots that I thought might be
interesting to you, you can just delete those two lines of code if you
do not want them.  The code follows below.

Cheers,

Josh

## X is pseudo-random numbers from the uniform distribution
## ~U(0, 10)
X <- runif(n = 100, min = 0, max = 10)
## We can check that X is about what we would expect
## The mean should be
(1/2) * (10 + 0)
mean(x = X)
## The variance should be
(1/12) * ((10 - 0)^2)
var(x = X)

## I am assuming norm(0, 1) to be representing the standard normal distribution
## Create a vectory of these numbers to be used in the formula for Y
## this is important since you will be create Y twice
## (with X and then X replaced with some outliers)
## You do not want R to regenerate the random normal values and change
things that way
Z <- rnorm(n = 100, mean = 0, sd = 1)

## Create Y from your formula, where Z = norm(0, 1)
## Y = 6 + 2X + norm(0, 1)
Y <- 6 + 2 * X + Z

## Now I am using sample() to randomly select some values
## between 1 and the length of X, these will be the positions
## of the elements of X to be replaced
toreplace <- sample(x = seq_along(X), size = 5, replace = FALSE)

## Now replace the X values
X[toreplace] <- runif(n = 5, min = 15, max = 20)

## Create Ynew based off updated X
Ynew <- 6 + 2 * X + Z

## Calculate the standard deviations of Y and Ynew
## and store in a named vector called "results"
results <- c("SD_Y" = sd(Y), "SD_Ynew" = sd(Ynew))

## print the results vector to screen to look at it
results

## Now if you wanted to do this many times
## and potentially change a few values easily
## we can put it in a function
## n is the number in each sample
## a and b are the min and max of the uniform distribution for X
## a.outlier and b.outlier are the same but for the outliers
## nreplace is how many values of X you want to replace
## reps is how many times you want to run it
## I have written the values to default to what you said in your emamil
## but obviously it would be easy to change any one of them

mysampler <- function(n = 100, a = 0, b = 10,
                      a.outlier = 15, b.outlier = 20,
                      nreplace = 5, reps = 1000) {
  if(any(c(n, nreplace, reps) < 1)) {
    stop("n, nreplace, and reps must all be at least 1")
  }
  results <- matrix(0, nrow = reps, ncol = 2,
                    dimnames = list(NULL, c("SD_Y", "SD_Ynew")))
  for(i in 1:reps) {
    X <- runif(n = n, min = a, max = b)
    Z <- rnorm(n = n, mean = 0, sd = 1)
    Y <- 6 + 2 * X + Z
    toreplace <- sample(x = seq_along(X), size = nreplace, replace = FALSE)
    X[toreplace] <- runif(n = nreplace, min = a.outlier, max = b.outlier)
    Ynew <- 6 + 2 * X + Z
    results[i, ] <- c(sd(Y), sd(Ynew))
  }
  dev.new()
  par(mfrow = c(2, 1))
  plot(density(results[,"SD_Y"]), xlim = range(results))
  plot(density(results[,"SD_Ynew"]), xlim = range(results))
  return(results)
}

## You might find the following documentation helpful
?runif # generate random values from uniform
?rnorm # from normal
?for # to do your simulation


On Sat, Oct 2, 2010 at 11:12 PM, Hock Ann Lim <lim_ha at yahoo.com> wrote:
> Dear experts,
> I am a beginner of R.
> I'm looking for experts to guide me how to do programming in R in order to
> randomly replace 5 observations in X explanatory variable with outliers drawn
> from U(15,20) in sample size n=100. The replacement subject to y < 15.
>
> The ultimate goal of my study is to compare the std of y with and without the
> presence of outliers based on average of 1000 simulation.
>
> Info :
> X~U(0,10)
> Y=6+2X+norm(0,1)
>
> Thank you.
>
> Hock Ann
>
>
>
>        [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



More information about the R-help mailing list