[R] model selection by leave-one-out cross-validation
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri May 11 17:35:34 CEST 2007
What is 'cv.glm'? That's almost certainly where the time is going, and
there are fast methods of LOO cross-validation for linear models (as
distinct from glms). If this is the function of that name from package
boot, that comment certainly applies.
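For a plain linear model the leave-one-out residuals come out of a single
fit via the hat (leverage) values, so no refitting loop is needed at all.
A minimal sketch (the helper name loocv.lm and the simulated data are only
illustrative):

loocv.lm <- function(fit) {
    ## LOO residuals are e_i / (1 - h_i), where the h_i are the diagonal
    ## elements of the hat matrix (the leverages)
    h <- lm.influence(fit)$hat        # same as hatvalues(fit)
    mean((residuals(fit) / (1 - h))^2)
}

## for example
xx  <- matrix(rnorm(20 * 3), ncol = 3)
yy  <- drop(xx %*% c(1, 2, 0) + rnorm(20, 0, 0.001))
loocv.lm(lm(yy ~ xx))   # agrees with cv.glm()'s delta[1] at K = n, but from one fit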
Do you know about the relationship between LOO CV and AIC in model
selection?
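For linear models, leave-one-out CV and AIC are asymptotically equivalent
selection criteria (Stone, 1977), so an AIC search such as step() is usually
a far cheaper route to essentially the same answer. A rough sketch, assuming
your simulated xx and yy below are wrapped into a data frame:

dat  <- data.frame(y = yy, xx)          # columns y, X1, X2, X3
full <- lm(y ~ ., data = dat)
step(full, direction = "both", k = 2)   # k = 2 is the AIC penalty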
On Fri, 11 May 2007, 李俊杰 wrote:
> Hi, all
>
> When using mle.cv(wle), I found an interesting problem: I can't do
> leave-one-out cross-validation with mle.cv(wle). I will illustrate the
> problem as follows:
>
>> xx=matrix(rnorm(20*3),ncol=3)
>> bb=c(1,2,0)
>> yy=xx%*%bb+rnorm(20,0,0.001)+0
>> summary(mle.cv(yy~xx,split=nrow(xx)-1,monte.carlo=2*nrow(xx),verbose=T),
> num.max=1)[[1]]
> mle.cv: dimension of the split subsample set to default value = 9
> (Intercept) xx1 xx2 xx3 cv
> 0.000000e+00 1.000000e+00 1.000000e+00 0.000000e+00 1.292513e-06
>
>
> So does anybody know how to do linear model selection by leave-one-out
> cross-validation? I've written a function myself, but it runs far too slowly.
>
> Thanks in advance.
>
> This is my super slow function:
>
>
> ####################
> ## function: rec.comb
> ## input:  vec -- a vector
> ## output: a list of all possible subsets (combinations) of the elements of vec
> ####################
> rec.comb = function(vec)
> {
>     if (length(vec) == 0) {
>         list(NULL)                                    # only the empty subset
>     } else {
>         tmp  = rec.comb(vec[-1])                      # subsets without vec[1]
>         tmp2 = sapply(tmp, function(x) c(vec[1], x))  # the same subsets with vec[1] added
>         c(tmp, tmp2)
>     }
> }
>
> ####################
> ## function: CV1glm -- model selection by leave-one-out CV (cv.glm with default K = n)
> ## input:  y -- response vector; x -- predictor matrix without intercept
> ## output: indicator vector saying whether the intercept and each predictor
> ##         should be selected, e.g. (0,1,0,1) means no intercept, select var1 and var3
> ####################
> CV1glm = function(y, x) {
>     ## cv.glm() comes from package boot; with the default K = n it does LOO CV
>     n.var  = ncol(x)
>     n      = nrow(x)
>     comb   = rec.comb(1:n.var)               # all subsets of the predictors
>     n.comb = length(comb)
>
>     ## candidate models with an intercept
>     pe = c()
>     data  = data.frame(y = y)
>     fit   = glm(y ~ 1, data = data)
>     pe[1] = cv.glm(data, fit)$delta[1]
>     for (i in 2:n.comb) {
>         data  = data.frame(y = y, x = x[, comb[[i]]])
>         fit   = glm(y ~ ., data = data)
>         pe[i] = cv.glm(data, fit)$delta[1]
>     }
>
>     ## candidate models without an intercept
>     pe1 = c()
>     pe1[1] = Inf                              # the empty model keeps its intercept
>     for (i in 2:n.comb) {
>         data   = data.frame(y = y, x = x[, comb[[i]]])
>         fit    = glm(y ~ . - 1, data = data)
>         pe1[i] = cv.glm(data, fit)$delta[1]
>     }
>
>     ## return c(intercept indicator, one indicator per predictor)
>     var = rep(0, n.var)
>     if (min(pe) < min(pe1)) {
>         int = 1
>         var[comb[[which.min(pe)]]] = 1
>     } else {
>         int = 0
>         var[comb[[which.min(pe1)]]] = 1
>     }
>     c(int, var)
> }
>
>
>
>
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595