[R] Data Simulation in R
Uwe Ligges
ligges at statistik.uni-dortmund.de
Wed Jan 19 13:49:26 CET 2005
Doran, Harold wrote:
> Thanks. But, I think I am doing that. I use rm() and gc() as the code
> moves along. The datasets are stored as a list. Is there a way that I
> can save the list object and call each dataset within a list one at a
> time, or must the entire list be in memory at once?
The list is in memory - and must be to access its elements.
Either save the list elements to separate files, or even better make use
of a database.
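
A minimal sketch of the separate-files approach, assuming a current R in which saveRDS() and readRDS() are available (they postdate this thread). The directory, the file names, and the small N and sample.size are illustrative assumptions, not values from the original code:

```r
library(MASS)   # mvrnorm()

set.seed(1)                # assumption: fixed seed so the sketch is reproducible
N           <- 5           # illustrative only; the post uses 250
sample.size <- 100         # illustrative only; the post uses 5000
mu    <- c(100, 150, 200, 250)
Sigma <- matrix(80, 4, 4)
diag(Sigma) <- 400

# Hypothetical output location inside the session's temporary directory
out.dir <- file.path(tempdir(), "simdata")
dir.create(out.dir, showWarnings = FALSE)

# Write each dataset to its own file, so the full list never exists in memory
files <- character(N)
for (i in seq_len(N)) {
  dat <- as.data.frame(cbind(seq_len(sample.size),
                             mvrnorm(n = sample.size, mu, Sigma)))
  files[i] <- file.path(out.dir, sprintf("data_%03d.rds", i))
  saveRDS(dat, files[i])
  rm(dat)
}

# Later: load one dataset at a time, keep only a small summary, drop the rest
col.means <- matrix(NA_real_, N, 4)
for (i in seq_len(N)) {
  dat <- readRDS(files[i])
  col.means[i, ] <- colMeans(dat[, 2:5])
  rm(dat)
}
```

Only one data frame is held in memory at any point; the colMeans() call stands in for whatever analysis is run on each dataset.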
Uwe Ligges
> Harold
>
> -----Original Message-----
> From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de]
> Sent: Wednesday, January 19, 2005 5:51 AM
> To: Doran, Harold
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Data Simulation in R
>
> Doran, Harold wrote:
>
>
>>Dear List:
>>
>>A few weeks ago I posted some questions regarding data simulation and
>>received some very helpful comments, thank you. I have modified my
>>code accordingly and have made some progress.
>>
>>However, I now am facing a new challenge along similar lines. I am
>>attempting to simulate 250 datasets and then run the data through a
>>linear model. I use rm() and gc() as I move along to clean up the
>>workspace and preserve memory. However, my aim is to use sample sizes
>>of 5,000 and 10,000. By any measure this is a huge task.
>>
>>My machine has 2GB RAM and a 2.8 GHz Pentium 4 processor, running
>>Windows XP.
>>I have the following in the "target" section of the Windows shortcut
>>--max-mem-size=1812M
>>
>>With such large samples, R is unable to perform the analysis, at least
>>with the code I have developed. It halts when it runs out of memory. A
>>colleague subsequently constructed the simulation using another
>>software program with a similar computer and, while it took overnight
>>(and then some), the program produced the results desired.
>>
>>I am curious if it is the case that such large simulations are out of
>>the grasp of R or if my code is not adequately organized to perform
>>the simulation.
>>
>>I would appreciate any thoughts or advice.
>
>
>
> Don't hold all datasets (and results, if they are big) in the memory at
> the same time!!!
>
> Either generate them when you use them and delete them afterwards, or
> save them to disk and load them one by one for further analyses.
> Also, you might want to call gc() after you have removed large objects...
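
The "generate, use, delete" advice above could look roughly like the following sketch. It assumes a single score column and small illustrative sizes; the column name id, the object names, and the seed are hypothetical and not part of the original code:

```r
library(MASS)   # mvrnorm()
library(nlme)   # gls(), corAR1()

set.seed(1)                # assumption: fixed seed so the sketch is reproducible
N           <- 3           # illustrative only; the post uses 250
sample.size <- 50          # illustrative only; the post uses 5000
mu    <- c(100, 150, 200, 250)
Sigma <- matrix(80, 4, 4)
diag(Sigma) <- 400

# Preallocate the small result; the datasets themselves are never stored
coefs <- matrix(NA_real_, N, 2,
                dimnames = list(NULL, c("intercept", "slope")))

for (i in seq_len(N)) {
  # Generate one dataset with a unique id per subject
  wide <- data.frame(id = seq_len(sample.size),
                     mvrnorm(n = sample.size, mu, Sigma))
  # Four score columns become one column with a time index 1..4
  long <- reshape(wide, idvar = "id", varying = list(names(wide)[2:5]),
                  v.names = "score", direction = "long")
  fit <- gls(score ~ I(time - 1), data = long,
             correlation = corAR1(form = ~ 1 | id), method = "ML")
  coefs[i, ] <- coef(fit)   # keep only the two coefficients
  rm(wide, long, fit)       # discard the large objects before the next pass
}

apply(coefs, 2, sd)         # spread of intercepts and slopes across datasets
```

Memory use then depends on the size of one dataset and one fitted model rather than on N.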
>
> Uwe Ligges
>
>
>
>
>>Harold
>>
>>
>>
>>library(MASS)
>>library(nlme)
>>mu<-c(100,150,200,250)
>>Sigma<-matrix(c(400,80,80,80,80,400,80,80,80,80,400,80,80,80,80,400),4,4)
>>mu2<-c(0,0,0)
>>Sigma2<-diag(64,3)
>>sample.size<-5000
>>N<-250 #Number of datasets
>>#Take a single draw from VL distribution
>>vl.error<-mvrnorm(n=N, mu2, Sigma2)
>>
>>#Step 1 Create Data
>>#V1 is a unique ID for each simulated subject
>>Data <- lapply(seq(N), function(x)
>>as.data.frame(cbind(1:sample.size,mvrnorm(n=sample.size, mu, Sigma))))
>>
>>#Step 2 Add Vertical Linking Error
>>for(i in seq(along=Data)){
>>Data[[i]]$V6 <- Data[[i]]$V2
>>Data[[i]]$V7 <- Data[[i]]$V3 + vl.error[i,1]
>>Data[[i]]$V8 <- Data[[i]]$V4 + vl.error[i,2]
>>Data[[i]]$V9 <- Data[[i]]$V5 + vl.error[i,3]
>>}
>>
>>#Step 3 Restructure for Longitudinal Analysis
>>long <- lapply(Data, function(x) reshape(x, idvar="V1",
>>varying=list(names(x)[2:5], names(x)[6:9]),
>>v.names=c("score.1","score.2"), direction="long"))
>>
>>#####################
>>#Clean up Workspace
>>rm(Data,vl.error)
>>gc()
>>#####################
>>
>># Step 4 Run GLS
>>
>>glsrun1 <- lapply(long, function(x) gls(score.1~I(time-1), data=x,
>>correlation=corAR1(form=~1|V1), method='ML'))
>>
>># Extract intercepts and slopes
>>int1 <- sapply(glsrun1, function(x) coef(x)[1])
>>slo1 <- sapply(glsrun1, function(x) coef(x)[2])
>>
>>################
>>#Clean up workspace
>>rm(glsrun1)
>>gc()
>>
>>glsrun2 <- lapply(long, function(x) gls(score.2~I(time-1), data=x,
>>correlation=corAR1(form=~1|V1), method='ML'))
>>
>># Extract intercepts and slopes
>>int2 <- sapply(glsrun2, function(x) coef(x)[1])
>>slo2 <- sapply(glsrun2, function(x) coef(x)[2])
>>
>>
>>#Clean up workspace
>>rm(glsrun2)
>>gc()
>>
>>
>>
>># Print Results
>>
>>cat("Original Standard Errors","\n", "Intercept","\t",
>>sd(int1),"\n","Slope","\t","\t", sd(slo1),"\n")
>>
>>cat("Modified Standard Errors","\n", "Intercept","\t",
>>sd(int2),"\n","Slope","\t","\t", sd(slo2),"\n")
>>
>>rm(list=ls())
>>gc()
>>