[R-SIG-Mac] Multi-thread R processes performance on 8-core Mac-Pro

Douglas Bates bates at stat.wisc.edu
Fri Apr 20 14:59:32 CEST 2007

On 4/19/07, Marra, David <David.Marra at atkearney.com> wrote:
> Thank you to everyone who contributed to understanding the multi-core
> problem better. I took Elijah's advice and purchased a Leopard
> pre-release DVD and will post performance results here when it arrives
> and if the results are interesting (should be!). I'll also post the
> result from Simon's BLAS test in a few hours.
> In the meantime, there is a speed problem to solve. Appreciate advice
> anyone may have on potential approaches for speeding up the following
> function. Based on previous comments, fewer calls to memory may be
> important...
> results <- function(x){
> fit <- lm(Y ~
> get(cmb[x,1])+get(cmb[x,2])+get(cmb[x,3])+get(cmb[x,4])+get(cmb[x,5]),
> data=data1)
> list(R2=summary(fit)$adj.r.squared) }
> This call to lm function is nested in a parSapply function that iterates
> down the rows of the "cmb" matrix. Each row of cmb has, in the example
> above, 5 character values (such as "var1", "var2",..."var5")
> corresponding to variable names in the "data1" dataframe. The function
> iterates down the rows, generating regressions, each with a different
> combination of variables. (x just goes from 1 to whatever number of rows
> are in cmb.) Finally the function delivers the R2 for each combination.

Can you be more specific about what x is here?  What you write makes
it sound as if x is a single row but you wouldn't be able to do a
linear model fit on a single row.  It must be more than one row.

The immediate way to speed things up is to use lm.fit directly instead
of going through lm.  The lm function is a convenience function to
take a formula/data representation of a linear model along with
several optional arguments and create the model matrices.  In this
case you can create the model matrix for all the rows in a single
call, provided that it fits into memory, then farm out the individual
fits.  Also, the call to summary does a lot more than calculate an
adjusted R-squared.  You can calculate this single statistic directly
from the dimensions of the problem and the "effects" component of the
lm fit.
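
To make this concrete, here is a rough sketch of one way it could look.
It is only an illustration under assumptions: data1 and cmb below are
simulated stand-ins for the real objects, adj_r2 is a made-up helper
name, and each model is assumed to have an intercept and full column
rank.  The predictor matrix is built once, each fit selects columns
from it, and the adjusted R-squared comes from the squared "effects":
those after the first are the total sum of squares about the mean, and
those after the first rank are the residual sum of squares.

```r
# Sketch only: simulated stand-ins for the poster's data1 and cmb.
set.seed(1)
vars <- paste0("var", 1:8)
data1 <- as.data.frame(matrix(rnorm(200 * 8), 200, 8))
names(data1) <- vars
data1$Y <- with(data1, 1 + 2 * var1 - var3 + rnorm(200))

# Each row of cmb names one combination of 5 predictors.
cmb <- t(combn(vars, 5))

# Build the response and the full predictor matrix once, up front.
y <- data1$Y
X <- cbind("(Intercept)" = 1, as.matrix(data1[vars]))

# Hypothetical helper: adjusted R-squared for row i of cmb via lm.fit.
# Assumes an intercept in column 1 and a full-rank model matrix.
adj_r2 <- function(i) {
  fit <- lm.fit(X[, c("(Intercept)", cmb[i, ]), drop = FALSE], y)
  eff <- fit$effects
  n <- length(y)
  p <- fit$rank
  rss <- sum(eff[-seq_len(p)]^2)  # residual sum of squares
  tss <- sum(eff[-1]^2)           # total sum of squares about the mean
  1 - (rss / (n - p)) / (tss / (n - 1))
}

r2 <- vapply(seq_len(nrow(cmb)), adj_r2, numeric(1))
```

In the parallel setting the vapply call would presumably become the
parSapply call over the same row indices, with X, y and cmb made
available on the workers first.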

> Any speed-up ideas?
> David
