[R] Data too big for a specific library package to handle

Kevin Parent ksparent at yahoo.com
Wed Aug 23 07:59:59 CEST 2017


I know there are ways around the 'can't allocate a vector of size x GB' errors, but I'm stumped. 
My raw data has >7 million rows and eight columns. That's not a problem in itself.
Using the confreq package (for configural frequency analysis), I run my data through the package's dat2fre function, which converts it to an object of class 'Pfreq'. (It looks like a data frame to me, but R treats it as a different class.) The result is smaller, a little under a million rows, with one column added: one row for every possible combination of values, with the new column holding a frequency count. In my case, 99% of the counts are 0.
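To show the shape of such a table on a toy scale, here is a minimal sketch in base R. It uses table() rather than dat2fre, so it only mimics the layout (one row per combination of factor levels plus a count column) and is not meant to describe how confreq builds a Pfreq object. The data frame toy and its columns f1, f2, f3 are made up for illustration.

```r
# Toy data: three factors with 3, 2 and 2 levels (hypothetical names)
set.seed(1)
n <- 20
toy <- data.frame(
  f1 = factor(sample(c("x", "y", "z"), n, replace = TRUE), levels = c("x", "y", "z")),
  f2 = factor(sample(c(FALSE, TRUE), n, replace = TRUE), levels = c(FALSE, TRUE)),
  f3 = factor(sample(c(FALSE, TRUE), n, replace = TRUE), levels = c(FALSE, TRUE))
)

# Number of possible combinations = product of the level counts
n_cells <- prod(sapply(toy, nlevels))

# Cross-tabulate all columns, then flatten to one row per combination
freq <- as.data.frame(table(toy))

nrow(freq)            # equals n_cells, whether or not a combination was observed
sum(freq$Freq)        # equals n, the number of raw rows
mean(freq$Freq == 0)  # share of combinations that never occur
```

The row count of the flattened table depends only on the number of levels per column, not on the number of raw observations, which is why a factor with many levels produces a table dominated by zero counts.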
However, this table is meaningless by itself; I need to run it through the package's CFA function for the main analysis, and when I do, I invariably get the 'can't allocate' error. CFA only accepts the Pfreq class as input.
I usually run 64-bit R under Linux, where I get the error. I also tried a Windows machine at work (I forget which version of Windows, but it runs 64-bit R) and still get the error.
The problem with most memory allocation workarounds is that what I'm doing creates a non-standard, library-specific data structure. Most workarounds are designed for very large vectors, data frames, lists, matrices, etc., not for very large 'Pfreqs'.
Any help?
The script below will simulate my data set with random data, but it takes several minutes to run and may eat up your resources until it's finished.
rm(list=ls(all=T))
require(confreq)
set.seed(1066)

observations <- as.factor(rep(replicate(60000, paste(sample(c(LETTERS,letters), sample(15)), collapse=''), simplify=vector), times=100))
source <- as.factor(c(rep('A',times=3000000), rep('B',times=3000000))) # observations come from one of two sources

factor.1 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.2 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.3 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.4 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.5 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.6 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.7 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))
factor.8 <- as.factor(replicate(6000000, sample(c(TRUE,FALSE),1)))

x <- data.frame(observations, source, factor.1, factor.2, factor.3, factor.4, factor.5, factor.6, factor.7, factor.8)
x <- dat2fre(x)
analysis <- CFA(x) # error: cannot allocate vector of size 2.1 Gb (the error message for the real data indicates 56 Gb)
_____
Kevin Parent, Ph.D
Korea Maritime University
Vice Chairman of Education and Training, Korea Toastmasters http://grou.ps/koreatoastmasters
Schoolmasters, http://grou.ps/schoolmasters/home