[R] simulate correlated binary, categorical and continuous variable

Burak Aydin burak235813 at hotmail.com
Mon Apr 2 03:06:40 CEST 2012


Hello David Duffy-2,
I see that you have just shown that using rmvnorm and then dichotomizing/categorizing
the variables should work. Thanks, but please take a look at this link:
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous
and this article:
Analysis by Categorizing or Dichotomizing Continuous Variables Is
Inadvisable: An Example from the Natural History of Unruptured Aneurysms,
by O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill and D.G.
Altman (2011).

Also, here is my explanatory code.
library(mvtnorm)
# target covariance matrix
sigm=matrix(c(0.12, 0.05, 0.02, 0.00,
              0.05, 1.24, 0.38, 0.00,
              0.02, 0.38, 2.38, 0.03,
              0.00, 0.00, 0.03, 0.16),
            ncol=4, byrow=TRUE)


mu=rep(0,4)

#simulated data
dat1 = rmvnorm(1000,mean=mu,sigma=sigm)

#difference between the covariance matrices before dichotomizing/categorizing
sigm-cov(dat1)
#difference between the means before dichotomizing/categorizing
means1=colMeans(dat1)
mu-means1

#dichotomization and categorization
#let's dichotomize the third variable
#I want to keep the mean the same (0.50)
dat2=dat1
dat2[,3]=ifelse(dat1[,3]>0.0,0,1)  #values above 0 are coded 0, all others 1

means2=colMeans(dat2)
mu-means2
#I kept the mean the same, but look at the difference in the cov matrices
sigm-cov(dat2)
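
To put a number on that last difference, here is a minimal sketch of what the covariance matrix should look like after the split, so the simulated result can be checked against it. It is an illustration under stated assumptions, not part of the code above: it assumes the data are exactly multivariate normal with mean 0, it draws the sample with a base-R Cholesky factor instead of rmvnorm so that no extra package is needed, and the seed, the sample size n, and the names expected, cross and max_abs_diff are made up for the example. For B = 1 when X3 <= 0, theory gives cov(Xj, B) = -sigma_j3 * dnorm(0) / sd(X3) and var(B) = 0.25.

```r
# Sketch: covariance expected after splitting variable 3 at 0.
set.seed(1)  # arbitrary seed, only for reproducibility

sigm <- matrix(c(0.12, 0.05, 0.02, 0.00,
                 0.05, 1.24, 0.38, 0.00,
                 0.02, 0.38, 2.38, 0.03,
                 0.00, 0.00, 0.03, 0.16),
               ncol = 4, byrow = TRUE)

n <- 200000  # large sample so that sampling error is small

# Base-R draw from N(0, sigm): rows of Z %*% chol(sigm) have covariance sigm
dat1 <- matrix(rnorm(n * 4), ncol = 4) %*% chol(sigm)

# Same coding as in the post: values above 0 become 0, all others become 1
dat2 <- dat1
dat2[, 3] <- ifelse(dat1[, 3] > 0, 0, 1)

# Theoretical covariances between each variable and the binary variable
s3 <- sqrt(sigm[3, 3])
cross <- -sigm[, 3] * dnorm(0) / s3

# Expected covariance matrix after the split
expected <- sigm
expected[3, ] <- cross
expected[, 3] <- cross
expected[3, 3] <- 0.25  # variance of a 0/1 variable with mean 0.5

round(expected, 3)
round(cov(dat2), 3)

# Largest gap between theory and simulation; should be close to 0
max_abs_diff <- max(abs(expected - cov(dat2)))
max_abs_diff
```

Under these assumptions the rows and columns for the untouched variables stay as they were, while every entry involving the third variable changes: its variance drops from 2.38 to 0.25, and its covariances shrink and change sign because values above 0 are coded 0. So the gap in sigm-cov(dat2) is systematic rather than sampling noise.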

--
View this message in context: http://r.789695.n4.nabble.com/simulate-correlated-binary-categorical-and-continuous-variable-tp4516433p4524882.html
Sent from the R help mailing list archive at Nabble.com.


