[R] MCO: Timing using model.matrix method

Fri May 1 18:52:16 CEST 2009

Hi fellow users of R,

My research requires the simultaneous optimization of several response
functions.  Therefore, I am using the nsga2 method in the mco package,
which works beautifully.

However, I am running into a significant timing difference that is
causing me grief.  The details:

Say I have the following function to optimize, using the results of
the method rsm.

> library(rsm)
> #run the rsm analysis
> rsm.ave<-rsm(ave ~ SO(x1, x2, x3, x4), data=heli)
> rsm.sd<-rsm(log100s ~ SO(x1, x2, x3, x4), data=heli)
>
> #extract coefficients
> coef.ave<-rsm.ave$coefficients
> coef.sd<-rsm.sd$coefficients
>
> #function to optimize (maximize y1, minimize y2)
> opt.func<-function(x){
+ y<-numeric(2)
+
+ y[1]<--1*(coef.ave[1]+coef.ave[2]*x[1]+coef.ave[3]*x[2]+coef.ave[4]*x[3]+coef.ave[5]*x[4]+
+ coef.ave[6]*x[1]*x[2]+coef.ave[7]*x[1]*x[3]+coef.ave[8]*x[1]*x[4]+coef.ave[9]*x[2]*x[3]+
+ coef.ave[10]*x[2]*x[4]+coef.ave[11]*x[3]*x[4]+
+ coef.ave[12]*x[1]^2+coef.ave[13]*x[2]^2+coef.ave[14]*x[3]^2+coef.ave[15]*x[4]^2)
+
+ y[2]<-coef.sd[1]+coef.sd[2]*x[1]+coef.sd[3]*x[2]+coef.sd[4]*x[3]+coef.sd[5]*x[4]+
+ coef.sd[6]*x[1]*x[2]+coef.sd[7]*x[1]*x[3]+coef.sd[8]*x[1]*x[4]+coef.sd[9]*x[2]*x[3]+
+ coef.sd[10]*x[2]*x[4]+coef.sd[11]*x[3]*x[4]+
+ coef.sd[12]*x[1]^2+coef.sd[13]*x[2]^2+coef.sd[14]*x[3]^2+coef.sd[15]*x[4]^2
+ return(y)
+ }
>
> library(mco)
> print(system.time(nsga.res<-nsga2(opt.func, 4, 2, generations=150, popsize=100, cprob=0.20,
+ cdist=100, mprob=0.20, mdist=100,
+ lower.bounds=rep(-2, 4), upper.bounds=rep(2, 4))))
  user  system elapsed
  2.42    0.00    2.43

That is impressive, and is exactly the performance I am looking for.
However, it has the drawback that the function to be optimized has to
be written out by hand and cannot be built automatically from the rsm
object.  Also, it is hard on the eyes.
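For illustration, the hand-written polynomial can also be generated
from x without model.matrix.  This is only a sketch: the coefficient
values are placeholders, the names so.row and opt.func3 are made up,
and it assumes the coefficients arrive in the same order as in
opt.func above (intercept, linear, two-way interactions, pure
quadratic).

```r
# Placeholder coefficients (hypothetical values, NOT from the heli fit),
# ordered: intercept, 4 linear, 6 two-way interactions, 4 pure quadratic.
b.ave <- seq(0.1, 1.5, by = 0.1)
b.sd  <- rev(b.ave)

k  <- 4
ij <- combn(k, 2)   # index pairs: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)

# Build the 15-element second-order model row for a single point x
so.row <- function(x) c(1, x, x[ij[1, ]] * x[ij[2, ]], x^2)

# Same shape as opt.func: negate the first response to maximize it
opt.func3 <- function(x) {
  r <- so.row(x)
  c(-sum(r * b.ave), sum(r * b.sd))
}

opt.func3(c(0.5, -1, 1.5, 2))
```

With real fits, b.ave and b.sd would be replaced by coef.ave and
coef.sd, provided their ordering matches the row built by so.row.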

Another way of achieving this end is to use the model.matrix method,
which is advantageous in that it is completely general, and can easily
be automated.

> terms<-delete.response(terms(rsm.ave))
> opt.func2<-function(x, coef.ave, coef.sd, terms){
+ y<-numeric(2)
+ x.df<-data.frame(t(x))
+ names(x.df)<-all.vars(terms)
+ X<-model.matrix(terms, data=x.df)
+ #negate y1 to maximize it, matching opt.func above
+ y[1]<--1*crossprod(t(X), coef.ave)
+ y[2]<-crossprod(t(X),coef.sd)
+ return(y)
+ }
>
> print(system.time(nsga.res2<-nsga2(opt.func2, 4, 2, coef.ave=coef.ave, coef.sd=coef.sd, terms=terms,
+ generations=150, popsize=100, cprob=0.20,
+ cdist=100, mprob=0.20, mdist=100,
+ lower.bounds=rep(-2, 4), upper.bounds=rep(2, 4))))
  user  system elapsed
 59.42    0.00   60.48

My issue is self-evident:  using this method increased the run time
roughly 25-fold (about 60 seconds versus 2.4).  My question is: why?
If I time the individual components separately, nothing looks unusual.
My hunch is that the cause is some "interaction" between the
model.matrix and nsga2 methods.
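One way to check the per-call cost is to time many repeated
evaluations rather than a single one.  nsga2 presumably evaluates the
objective about once per individual per generation (on the order of
100 x 150 = 15,000 calls here, which is an estimate, not a documented
figure).  The sketch below uses a base-R second-order formula as a
stand-in for the rsm terms object so that it runs without rsm; the
names tt, mm.row and n.calls are made up for the example.

```r
# Stand-in for delete.response(terms(rsm.ave)): a full second-order
# model written in base R (assumption: SO() is avoided on purpose).
f  <- ~ (x1 + x2 + x3 + x4)^2 + I(x1^2) + I(x2^2) + I(x3^2) + I(x4^2)
tt <- terms(f)
x  <- c(0.5, -1, 1.5, 0)

# One evaluation the model.matrix way, as in opt.func2
mm.row <- function(x) {
  x.df <- data.frame(t(x))
  names(x.df) <- all.vars(tt)
  model.matrix(tt, data = x.df)
}

# Time repeated calls and report the average cost of one call
n.calls <- 1000
t.mm <- system.time(for (i in seq_len(n.calls)) mm.row(x))
print(t.mm["elapsed"] / n.calls)   # seconds per call
```

Multiplying the per-call figure by the estimated number of calls
gives a rough idea of how much of the total run time comes from
building the data frame and model matrix on every evaluation.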

Any ideas on how to speed this process up, or circumvent the issue altogether?

Thanks,

Corey