[R] R versus SAS: lm performance

Arne.Muller@aventis.com Arne.Muller at aventis.com
Mon May 10 16:36:58 CEST 2004


Hello,

A colleague of mine has compared the runtime of a linear model + anova in SAS and S+. He got the same results, but SAS took a bit more than a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and it took 15 min. None of the machines ran out of memory, and I assume that all machines have similar hardware, but the S+ and SAS machines run Windows whereas the R machine runs Redhat Linux 7.2.

My question is whether I'm doing something technically wrong in calling the lm routine and, if not, how I can speed up the call to lm or use an alternative to lm. I'd like to run about 12,000 of these models in R (for a gene expression experiment, one model per gene), which would take far too long at 15 minutes per model.

I've run the following code in R (and S+):

> options(contrasts=c('contr.helmert', 'contr.poly'))

The 1st column is the value to be modeled, and the others are factors.

> names(df.gene1data) <- c("Va", "Ba", "Ti", "Do", "Ar", "Pr")
> df[c(1:2,1343:1344),]
           Va    Do  Ti  Ba Ar    Pr
1    2.317804 000mM 24h NEW  1     1
2    2.495390 000mM 24h NEW  2     1
8315 2.979641 025mM 04h PRG 83    16
8415 4.505787 000mM 04h PRG 84    16

This is a data frame with 1344 rows.

x <- Sys.time()
wlm <- lm(Va ~ Ba + Ti + Do + Pr +
               Ba:Ti + Ba:Do + Ba:Pr + Ti:Do + Ti:Pr + Do:Pr +
               Ba:Ti:Do + Ba:Ti:Pr + Ba:Do:Pr + Ti:Do:Pr +
               Ba:Ti:Do:Pr + (Ba:Ti:Do)/Ar,
          data = df, singular.ok = TRUE)
difftime(Sys.time(), x)

Time difference of 15.33333 mins
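
As a side note, the long formula above should be expressible more compactly with the `*` operator. The following is only a small sketch to check that both spellings expand to the same set of terms; the names `long`, `short` and `norm.terms` are made up for this check and no data is needed to run it.

```r
# The formula as used in the lm call above.
long <- Va ~ Ba + Ti + Do + Pr +
  Ba:Ti + Ba:Do + Ba:Pr + Ti:Do + Ti:Pr + Do:Pr +
  Ba:Ti:Do + Ba:Ti:Pr + Ba:Do:Pr + Ti:Do:Pr +
  Ba:Ti:Do:Pr + (Ba:Ti:Do)/Ar

# Compact spelling: '*' expands to all main effects and interactions.
short <- Va ~ Ba * Ti * Do * Pr + (Ba:Ti:Do)/Ar

# Helper (hypothetical name): expand a formula into its term labels and
# sort the variables inside each label so that "a:b" and "b:a" compare equal.
norm.terms <- function(f) {
  labs <- attr(terms(f), "term.labels")
  sort(sapply(strsplit(labs, ":", fixed = TRUE),
              function(v) paste(sort(v), collapse = ":")))
}

same.terms <- identical(norm.terms(long), norm.terms(short))
print(same.terms)
```

The two spellings may list terms of the same order in a different sequence, which could matter for the sequential sums of squares reported by anova if the design is unbalanced.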

> anova(wlm)
Analysis of Variance Table

Response: Va
             Df Sum Sq Mean Sq   F value    Pr(>F)    
Ba            2    0.1     0.1    0.4262  0.653133    
Ti            1    2.6     2.6   16.5055 5.306e-05 ***
Do            4    6.8     1.7   10.5468 2.431e-08 ***
Pr           15 5007.4   333.8 2081.8439 < 2.2e-16 ***
Ba:Ti         2    3.2     1.6    9.8510 5.904e-05 ***
Ba:Do         7    2.8     0.4    2.5054  0.014943 *  
Ba:Pr        30   80.6     2.7   16.7585 < 2.2e-16 ***
Ti:Do         4    8.7     2.2   13.5982 9.537e-11 ***
Ti:Pr        15    2.4     0.2    1.0017  0.450876    
Do:Pr        60   10.2     0.2    1.0594  0.358551    
Ba:Ti:Do      7    1.4     0.2    1.2064  0.296415    
Ba:Ti:Pr     30    5.6     0.2    1.1563  0.259184    
Ba:Do:Pr    105   14.2     0.1    0.8445  0.862262    
Ti:Do:Pr     60   14.8     0.2    1.5367  0.006713 ** 
Ba:Ti:Do:Pr 105   15.8     0.2    0.9382  0.653134    
Ba:Ti:Do:Ar  56   26.4     0.5    2.9434 2.904e-11 ***
Residuals   840  134.7     0.2                        

The corresponding SAS program from my colleague is:

proc glm data = "the name of the data set";

class B T D A P;

model V = B T D P B*T B*D B*P T*D T*P D*P B*T*D B*T*P B*D*P T*D*P B*T*D*P A(B*T*D);

run;

Note, V = Va, B = Ba, T = Ti, D = Do, P = Pr, A = Ar of the R-example
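
One possibility I have considered for the 12,000 genes, assuming every gene shares exactly the same design (same rows, same factors), is to pass a matrix response to lm so that all genes are fitted in a single call. The sketch below uses simulated data and a much smaller model; `design`, `Y`, `fit.all` and `fit.one` are made-up names, and I have not tried this on the real data.

```r
# Minimal sketch on simulated data (all names and sizes are made up).
set.seed(1)
n.obs   <- 48
n.genes <- 5

# A small, fully crossed design: 2 x 2 x 3 cells with 4 replicates each.
design <- data.frame(
  Ba = factor(rep(c("NEW", "PRG"), each = 24)),
  Ti = factor(rep(rep(c("04h", "24h"), each = 12), times = 2)),
  Do = factor(rep(rep(c("000mM", "025mM", "100mM"), each = 4), times = 4))
)

# One column of values per gene, rows in the same order as 'design'.
Y <- matrix(rnorm(n.obs * n.genes), nrow = n.obs,
            dimnames = list(NULL, paste("gene", 1:n.genes, sep = "")))

# With a matrix on the left-hand side, lm fits every column against the
# same model matrix in one call and returns an object of class "mlm".
fit.all <- lm(Y ~ Ba * Ti * Do, data = design)

# The first gene fitted on its own, for comparison.
fit.one <- lm(Y[, 1] ~ Ba * Ti * Do, data = design)

# Largest absolute difference between the two sets of coefficients.
max.diff <- max(abs(coef(fit.all)[, 1] - coef(fit.one)))
print(max.diff)
```

The per-gene anova tables would still have to be extracted from the joint fit, which this sketch does not cover.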

	kind regards + thanks a lot for your help,

	Arne



