[R-SIG-Mac] Multi-thread R processes performance on 8-core Mac-Pro

Fri Apr 20 21:39:44 CEST 2007

>On 4/19/07, Marra, David <David.Marra at atkearney.com> wrote:
>> Thank you to everyone who contributed to understanding the multi-core
>> problem better. I took Elijah's advice and purchased a Leopard
>> pre-release DVD and will post performance results here when it arrives
>> and if the results are interesting (should be!). I'll also post the
>> result from Simon's BLAS test in a few hours.
>>
>> In the meantime, there is a speed problem to solve. Appreciate advice
>> anyone may have on potential approaches for speeding up the following
>> function. Based on previous comments, fewer calls to memory may be
>> important...
>>
>> results <- function(x) {
>>   fit <- lm(Y ~ get(cmb[x,1]) + get(cmb[x,2]) + get(cmb[x,3]) +
>>               get(cmb[x,4]) + get(cmb[x,5]),
>>             data = data1)
>>   list(R2 = summary(fit)$adj.r.squared)
>> }
>>
>> This call to lm function is nested in a parSapply function that iterates
>> down the rows of the "cmb" matrix. Each row of cmb has, in the example
>> above, 5 character values (such as "var1", "var2",..."var5")
>> corresponding to variable names in the "data1" dataframe. The function
>> iterates down the rows, generating regressions, each with a different
>> combination of variables. (x just goes from 1 to whatever number of rows
>> are in cmb.) Finally the function delivers the R2 for each combination.

>Can you be more specific about what x is here?  What you write makes
>it sound as if x is a single row but you wouldn't be able to do a
>linear model fit on a single row.  It must be more than one row.

>The immediate way to speed things up is to use lm.fit directly instead
>of going through lm.  The lm function is a convenience function to
>take a formula/data representation of a linear model along with
>several optional arguments and create the model matrices.  In this
>case you can create the model matrix for all the rows in a single
>call, provided that it fits into memory, then farm out the individual
>fits.  Also, the call to summary does a lot more than calculate an
>adjusted R-squared.  You can calculate this single statistic directly
>from the dimensions of the problem and the "effects" component of the
>lm fit.
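
A minimal sketch of the lm.fit suggestion above, under stated assumptions: the data frame below is simulated stand-in data, and the helper name adj_r2 and its arguments are hypothetical, not from the thread. It assumes all predictors are numeric and that an intercept is wanted as the first column of the model matrix. The adjusted R-squared is taken from the "effects" component and the dimensions of the problem, without calling lm or summary.

```r
# Simulated stand-in for data1 (hypothetical): response Y, predictors Var1..Var6.
set.seed(1)
n <- 200
data1 <- as.data.frame(matrix(rnorm(n * 6), n, 6,
                              dimnames = list(NULL, paste0("Var", 1:6))))
data1$Y <- 1 + 2 * data1$Var1 - data1$Var3 + rnorm(n)

# Adjusted R-squared computed from lm.fit() output only.
# vars: character vector of predictor names; response: name of the response.
adj_r2 <- function(vars, data, response = "Y") {
  # Model matrix built by hand: intercept column first, then the predictors.
  X <- cbind(Intercept = 1, as.matrix(data[, vars, drop = FALSE]))
  fit <- lm.fit(X, data[[response]])
  p <- fit$rank            # number of estimated coefficients
  n <- nrow(X)
  eff <- fit$effects       # Q'y from the QR decomposition, length n
  # Element 1 belongs to the intercept; elements 2..p carry the regression
  # sum of squares; the remaining n - p elements carry the residual SS.
  mss <- sum(eff[seq_len(p)[-1]]^2)
  rss <- sum(eff[-seq_len(p)]^2)
  r2 <- mss / (mss + rss)
  1 - (1 - r2) * (n - 1) / (n - p)
}

# Example call for one combination of variable names.
adj_r2(c("Var1", "Var3", "Var5"), data1)
```

The value should agree with summary(lm(...))$adj.r.squared for the same variables. Whether it is faster in a given setting would need to be timed.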

I will try to clarify.

The purpose of the function is to fit one lm model for each value of x
and extract its adjusted R2. If x runs over 1:500, that means 500 unique
models, each with a different combination of variables. The 500 unique
combinations of variable names are stored in cmb, one combination per
row. If there are 500 combinations of 4 variables each, the cmb matrix
has 500 rows and 4 columns. For example, row 29 might contain the
following 4 character values: "Var2", "Var7", "Var18", "Var30".
Literally just characters, a text file if you will. "Var2" would be in
the first column, "Var7" in the second, and so on through "Var30" in
the fourth. The function I would like to speed up, if that is possible,
gets the variable names from cmb and the data from data1. data1 is a
large data frame holding all the variables, Var1 to Var30, and their
data.
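
To make the layout concrete, here is a small sketch under stated assumptions: the data are simulated, only 8 variables are used instead of 30, and cmb is built with combn, which may not be how the real cmb was produced. reformulate is swapped in for the get() calls as one way to build the formula from a row of names, and plain sapply stands in for parSapply.

```r
# Simulated stand-in for data1 (hypothetical): response Y, predictors Var1..Var8.
set.seed(42)
n <- 100
data1 <- as.data.frame(matrix(rnorm(n * 8), n, 8,
                              dimnames = list(NULL, paste0("Var", 1:8))))
data1$Y <- data1$Var2 + 0.5 * data1$Var7 + rnorm(n)

# cmb: character matrix, one combination of 4 variable names per row.
cmb <- t(combn(paste0("Var", 1:8), 4))

# x is a single row index into cmb, not a row of data.
results <- function(x) {
  f <- reformulate(cmb[x, ], response = "Y")   # e.g. Y ~ Var1 + Var2 + Var3 + Var4
  summary(lm(f, data = data1))$adj.r.squared
}

# One adjusted R2 per row of cmb.
r2 <- sapply(seq_len(nrow(cmb)), results)

# Variable names of the best-scoring combination.
cmb[which.max(r2), ]
```

With the real data, cmb would have 500 rows and x would run over 1:500 in the same way.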

>>
>> Any speed-up ideas?
>>
>> David