[R] model seleciton by leave-one-out cross-validation

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri May 11 17:35:34 CEST 2007


What is 'cv.glm'?  That's almost certainly where the time is going, and 
there are fast methods of LOO cross-validation for linear models (as 
distinct from glms).  If this is the function of that name from package 
boot, that comment certainly applies.

Do you know about the relationship between LOO CV and AIC in model 
selection?

On Fri, 11 May 2007, Àî¿¡½Ü wrote:

> Hi, all
>
> When I am using mle.cv(wle), I find a interesting problem: I can't do
> leave-one-out cross-validation with mle.cv(wle). I will illustrate the
> problem as following:
>
>> xx=matrix(rnorm(20*3),ncol=3)
>> bb=c(1,2,0)
>> yy=xx%*%bb+rnorm(20,0,0.001)+0
>> summary(mle.cv(yy~xx,split=nrow(xx)-1,monte.carlo=2*nrow(xx),verbose=T),
> num.max=1)[[1]]
> mle.cv: dimension of the split subsample set to default value =  9
> (Intercept)          xx1          xx2          xx3           cv
> 0.000000e+00 1.000000e+00 1.000000e+00 0.000000e+00 1.292513e-06
>
>
> So does anybody know how to do linear model selection by leave-one-out
> cross-validation? I've written one function, but it runs toooooo slow~~~
>
> Thanks firstly
>
> This is my super slow function:
>
>
> ####################
> ## function: rec.comb
> ## input: vec -- vector
> ## output: all possible combination from the elements of vec
> ####################
> rec.comb=function(vec)
> {
>  if(length(vec)==0){list(NULL)
>  }else { tmp=rec.comb(vec[-1])
>    tmp2=sapply(tmp,function(x)c(vec[1],x))
>    c(tmp,tmp2)
>  }
> }
>
> ####################
> ## function: CV1glm--CV1 using K fold CV
> ## input: y -- response vector; x -- predictors without intercept
> ## output: the vector whether each predictor should be selected. E.g.
> (0,1,0,1) means no intecept while var1 and var3 should be selected
> ####################
> CV1glm=function(y,x){
> n.var=ncol(x)
> n=nrow(x)
> comb=rec.comb(1: (n.var))
> n.comb=length(comb)
> pe=c()
>
> data=data.frame(y=y)
> glm=glm(y~1,data=data)
> pe[1]=cv.glm(data,glm)$delta[1]
> for(i in 2:n.comb){
>  data=data.frame(y=y,x=x[,comb[[i]]])
>  glm=glm(y~.,data=data)
>  pe[i]=cv.glm(data,glm)$delta[1]
> }
> pe1=c() ####################################without intercept
> pe1[1]=Inf
> for(i in 2:n.comb){
>  data=data.frame(y=y,x=x[,comb[[i]]])
>  glm=glm(y~.-1,data=data)
>  pe1[i]=cv.glm(data,glm)$delta[1]
> }
>
> var=rep(0,n.var)
> if(min(pe)<min(pe1)){
>  int=1
>  var[comb[[which(pe==min(pe))]]]=1
> }else{
>  int=0
>  var[comb[[which(pe1==min(pe1))]]]=1
> }
> c(int,var)
> }
>
>
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


More information about the R-help mailing list