[R] Reading mcmc/coda into a big.matrix efficiently

Guy W Cole gwc2124 at columbia.edu
Mon Jan 2 02:37:57 CET 2012


I'm trying to read CODA/mcmc files (see the coda package), as
generated by JAGS/WinBUGS/OpenBUGS, into a big.matrix. I can't load
the whole mcmc object produced by read.coda() into memory since I'm
using a laptop for this analysis (currently I'm unfunded).

Right now I'm doing it by creating the filebacked.big.matrix, reading
a chunk of data at a time from the chain file using read.table() with
"skip" and "nrows" set, and storing it into the big.matrix. While
this is memory efficient, the processing overhead seems to be related
to the size of the skip value, so the time required for each chunk
grows with the number of variables already read.

Any tips on how to do this faster / more efficiently? I'm using a
unix system, so a solution that uses grep/sed would be fine.

Here's some sample code of how I do it now:
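	library(bigmemory)
	# index file: variable name, first row, last row in the chain file
	index	= read.table("Big.CODAindex.txt", col.names = c("var","start","end"))
	n	= index[1,3] - index[1,2] + 1	# values per variable
	k	= nrow(index)			# number of variables
	X	= filebacked.big.matrix(nrow = n, ncol = k, backingfile = "Big.CODA.backing")
	for(i in 1:k) {
		# skip the rows of the first i-1 variables, then read n rows
		X[,i] = read.table("Big.CODAchain1.txt", skip = (i-1)*n, nrows = n)[,2]
		print(i)
		print(Sys.time())
	}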
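
One direction that might avoid the repeated skipping, as a sketch I
have not benchmarked on the real files: open the chain file once as a
connection and let scan() continue from where the previous call
stopped. fill_from_chain is a made-up helper name and the tiny data set
is invented for illustration; a plain matrix stands in for X here, but
the same column assignment should work on a filebacked.big.matrix.

```r
# Hypothetical helper: fill the columns of X from a CODA chain file,
# reading n lines per variable from a connection that stays open.
fill_from_chain <- function(X, chain_file, n, k) {
  con <- file(chain_file, open = "r")
  on.exit(close(con))
  for (i in seq_len(k)) {
    # scan() on an already-open connection resumes at the current
    # position, so earlier rows are not read again
    chunk <- scan(con, what = list(iter = integer(), value = numeric()),
                  nlines = n, quiet = TRUE)
    X[, i] <- chunk$value
  }
  X
}

# Invented example: 2 variables with 3 sampled values each
chain_file <- tempfile()
writeLines(c("1 0.1", "2 0.2", "3 0.3",
             "1 1.1", "2 1.2", "3 1.3"), chain_file)
X <- fill_from_chain(matrix(NA_real_, nrow = 3, ncol = 2),
                     chain_file, n = 3, k = 2)
```

This assumes every variable occupies exactly n consecutive rows, the
same assumption the skip = (i-1)*n loop makes.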

Also, here are the first few rows of the index and chain files, so you
can see the formatting. The index file tells you each variable's name
and the range of rows in the chain file containing the variable's
values. The chain file contains the iteration number the value was
taken from, and the value itself.

CODAindex.txt
	egu[1] 1 10000
	egu[2] 10001 20000
	egt[1] 20001 30000
	egt[2] 30001 40000
	ept[1] 40001 50000
	ept[2] 50001 60000
	...

CODAchain1.txt
	10001  -0.289963
	10011  -0.310657
	10021  -0.290596
	10031  -0.286273
	10041  -0.319877
	10051  -0.299019
	....
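
Reading fixed-size chunks by position relies on every variable having
the same number of rows and on the row ranges being contiguous. A
small sketch of how that could be checked from the index, using only
the six index rows shown above written to a temporary file (the
variable names len, same_length and contiguous are just for
illustration):

```r
# The six index rows shown above, written to a temporary file
index_file <- tempfile()
writeLines(c("egu[1] 1 10000", "egu[2] 10001 20000",
             "egt[1] 20001 30000", "egt[2] 30001 40000",
             "ept[1] 40001 50000", "ept[2] 50001 60000"), index_file)

index <- read.table(index_file, col.names = c("var", "start", "end"),
                    stringsAsFactors = FALSE)

# Number of chain-file rows belonging to each variable
len <- index$end - index$start + 1

# TRUE if all variables have the same number of rows
same_length <- all(len == len[1])

# TRUE if each range starts right after the previous one ends
contiguous <- all(index$start[-1] == index$end[-nrow(index)] + 1)
```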

Thanks in advance for any tips!

--Guy W. Cole
R version 2.14.0 (2011-10-31) x86_64-apple-darwin9.8.0


