[R] Simulate Correlated data from complex sample

Spencer Graves spencer.graves at pdf.com
Sun Dec 4 22:10:26 CET 2005


	  It may help to recall the following:

	  var(X) = var{E(X|School)} + E{var(X|School)}.

	  The first term here is the between-group covariance matrix, while the 
second is the within group covariance matrix.  Decide how you want to 
decompose var(X), and use mvrnorm{MASS} or rmvnorm{mvtnorm} for each, 
combining them as you've outlined below.

	  Does this make sense?
	  spencer graves

Doran, Harold wrote:

> Dear List:
> 
> I have created some code to simulate data from a complex sample where
> 5000 students are nested in 50 schools. My code returns a dataframe with
> a variable representing student achievement at a single time point. My
> actual code for creating this is below.
> 
> What I would like to do is generate a second column of data that is
> correlated with the first at .8 and has the same means within each
> school. So I do not think I can use mvrnorm or simulate() in the Matrix
> package, at least not in a way I can currently see.
> 
> A very basic example would be something like first create a vector (s1)
> and then generate a second one that is correlated with the first by some
> user-defined measure.
> 
> 
>>s1 <- rnorm(500, 2, 4)
> 
> 
> In my example below the variable I want to replicate is data$theta.
> 
> I think I could go through the exercise to write code that would so
> this, but I think there might be a smarter and easier function for doing
> so. I've used RSiteSearch() a bit, but the keywords I'm using aren't
> turning up results that I can use. I may be missing something very
> simple and transparent.
> 
> Any thoughts are much appreciated,
> Harold
> Ver 2.2
> Windows XP
> 
> 
> N   <- 5000 # Number of students
> J   <- 50   # Number of schools
> N_j <- N/J  # Number of students in each school
> a_g <- c(0,.5,1) # This is the growth vector
> 
> # Step 1 -- create psi for base grade
> rps  <- rep(N_j, J)
> v_gk <- rep(rnorm(J, 0, sqrt(.01) ), rps)
> v_gik <- rnorm(N, 0, sqrt(.99))
> 
> # Organize into a dataframe
> data  <- data.frame(schid = rep(1:J, rps), stuid = 1:N, cbind(v_gk,
> v_gik), psi = v_gk + v_gik  + a_g[1])
> 
> # Now create theta
> B_g   <- .95 # This is correlation between within-grade trait and
> vertical trait
> w_gk  <- 0   # fixed at zero for now
> data$w_gik <-rnorm(N, 0, sqrt(.0975))
> data$theta <- (B_g * data$psi) + w_gk + data$w_gik
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915




More information about the R-help mailing list