[R] lm Regression takes 24+ GB RAM - Error message

Milan Bouchet-Valat nalimilan at club.fr
Wed Mar 6 17:48:31 CET 2013


On Wednesday, 06 March 2013 at 08:31 -0800, Jonas125 wrote:
> The datatable (and the split obviously) only contain characters and numeric
> data.
> 
> I found that 4 regressions in a row work if I don't use the calculated
> columns as variables but 2 of the original columns instead.
> RAM usage stays below 3 GB!
> --> Why does R have such problems with the calculated columns? Their
> calculation is already done before the regression starts.
> 
> It's like this:
> Create the calculated columns:
> Dataset$ExtraColumn1 <- Dataset$ColumnA / Dataset$ColumnB
> Dataset$ExtraColumn2 <- Dataset$ColumnC / Dataset$ColumnD
> 
> Perform the split of the dataset, including the calculated columns (the
> criteria for the split have a hierarchy):
> Datasplit <- split(Dataset, paste(Dataset$ColumnE, Dataset$ColumnE))
> 
> Perform the regression on the split data:
> Regression1 <- lapply(Datasplit, function(d) lm(ExtraColumn1 ~ ExtraColumn2,
> d, na.action = na.omit, singular.ok = TRUE))
> 
> BTW: There are no NA values in the data source.
> 
> What is my mistake?
What's the value of length(Datasplit)? Have you tried running
regressions manually on Datasplit[[1]] and calling object.size() on the
result to see how large it is?
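
To make that check concrete, here is a minimal sketch. The data are made up
(I do not have your Dataset), the column names are taken from your post, and
the split uses a single grouping column for simplicity. The last step, keeping
only the coefficients, is just one possible way to save memory and assumes you
do not need the rest of each lm object.

```r
# Hypothetical stand-in for the real data; column names follow the post.
set.seed(1)
n <- 1000
Dataset <- data.frame(
  ColumnA = runif(n, 1, 10),
  ColumnB = runif(n, 1, 10),
  ColumnC = runif(n, 1, 10),
  ColumnD = runif(n, 1, 10),
  ColumnE = sample(letters[1:5], n, replace = TRUE)
)
Dataset$ExtraColumn1 <- Dataset$ColumnA / Dataset$ColumnB
Dataset$ExtraColumn2 <- Dataset$ColumnC / Dataset$ColumnD

# Simplified split on one column (assumption, not the original criteria).
Datasplit <- split(Dataset, Dataset$ColumnE)

# How many separate regressions will lapply() have to hold in memory?
n_groups <- length(Datasplit)
print(n_groups)

# Fit a single group by hand and measure the size of the result.
fit1 <- lm(ExtraColumn1 ~ ExtraColumn2, data = Datasplit[[1]])
size_one <- object.size(fit1)
print(size_one, units = "Kb")

# Very rough estimate of the memory needed to keep every fit.
est_total_bytes <- as.numeric(size_one) * n_groups
print(est_total_bytes)

# Possible alternative: keep only the coefficients of each fit.
coefs <- t(sapply(Datasplit, function(d) {
  coef(lm(ExtraColumn1 ~ ExtraColumn2, data = d))
}))
print(coefs)
```

If length(Datasplit) turns out to be very large, the size of one fit times
the number of groups gives a first idea of where the memory goes.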


Regards

> When I calculate the columns I might divide by zero (= Inf). Could that
> create the problem in the regression?
> 
> Thanks,
> Jonas
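
On the division-by-zero question, a small sketch with made-up numbers may
help. It assumes the column names from your post. As far as I know,
na.action = na.omit only drops NA/NaN rows, not Inf, so lm() should stop with
an error when an infinite value reaches the fit. Filtering with is.finite()
beforehand is one possible workaround, not necessarily the right one for your
data.

```r
# Hypothetical toy data; the second row divides by zero.
d <- data.frame(
  ColumnA = c(1, 2, 3, 4),
  ColumnB = c(2, 0, 4, 5),
  ColumnC = c(1, 2, 3, 5),
  ColumnD = c(1, 2, 2, 4)
)
d$ExtraColumn1 <- d$ColumnA / d$ColumnB  # 2 / 0 gives Inf
d$ExtraColumn2 <- d$ColumnC / d$ColumnD

# Check whether any non-finite values (Inf, -Inf, NaN, NA) were produced.
has_nonfinite <- any(!is.finite(d$ExtraColumn1)) ||
  any(!is.finite(d$ExtraColumn2))
print(has_nonfinite)

# na.omit does not remove Inf, so this fit is expected to fail.
res <- try(
  lm(ExtraColumn1 ~ ExtraColumn2, data = d, na.action = na.omit),
  silent = TRUE
)
failed <- inherits(res, "try-error")
print(failed)

# Keep only rows where both variables are finite, then fit again.
ok <- is.finite(d$ExtraColumn1) & is.finite(d$ExtraColumn2)
fit <- lm(ExtraColumn1 ~ ExtraColumn2, data = d[ok, ])
print(coef(fit))
```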


