[R] running crossvalidation many times MSE for Lasso regression

Tue Oct 24 21:02:52 CEST 2023

Dear Rui,

Thank you very much for your response and your R code.

Best,

Sacha

On Tuesday, 24 October 2023 at 16:37:56 UTC+2, Rui Barradas <ruipbarradas using sapo.pt> wrote:

At 20:12 on 23/10/2023, varin sacha via R-help wrote:
> Dear R-experts,
> 
> Thank you all very much for your responses. Here are the error (and warning) messages produced at the end of my R code.
> 
> Many thanks for your help.
> 
> 
> Error in UseMethod("predict") :
>    no applicable method for 'predict' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"
>> mean(unlist(lst))
> [1] NA
> Warning message:
> In mean.default(unlist(lst)) :
>    argument is not numeric or logical: returning NA
> 
> On Monday, 23 October 2023 at 19:59:15 UTC+2, Ben Bolker <bbolker using gmail.com> wrote:
> 
>    For what it's worth it looks like spm2 is specifically for *spatial*
> predictive modeling; presumably its version of CV is doing something
> spatially aware.
> 
>    I agree that glmnet is old and reliable.  One might want to use a
> tidymodels wrapper to create pipelines where you can more easily switch
> among predictive algorithms (see the `parsnip` package), but otherwise
> sticking to glmnet seems wise.
> 
> On 2023-10-23 4:38 a.m., Martin Maechler wrote:
>>>>>>> Jin Li
>>>>>>>        on Mon, 23 Oct 2023 15:42:14 +1100 writes:
>>
>>        > If you are interested in other validation methods (e.g., LOO or n-fold)
>>        > with more predictive accuracy measures, the function, glmnetcv, in the spm2
>>        > package can be directly used, and some reproducible examples are
>>        > also available in ?glmnetcv.
>>
>> ... and once you open that can of w..:  the  glmnet package itself
>> contains a function  cv.glmnet()  which we (our students) use when teaching.
>>
>> What's the advantage of the spm2 package ?
>> At least the glmnet package is authored by the same people who originated and
>> first published (as in "peer reviewed" ..) these algorithms.
>>
>>
>>
>>        > On Mon, Oct 23, 2023 at 10:59 AM Duncan Murdoch <murdoch.duncan using gmail.com>
>>        > wrote:
>>
>>        >> On 22/10/2023 7:01 p.m., Bert Gunter wrote:
>>        >> > No error message shown. Please include the error message so that it is
>>        >> > not necessary to rerun your code. This might enable someone to see the
>>        >> > problem without running the code (e.g. downloading packages, etc.)
>>        >>
>>        >> And it's not necessarily true that someone else would see the same error
>>        >> message.
>>        >>
>>        >> Duncan Murdoch
>>        >>
>>        >> >
>>        >> > -- Bert
>>        >> >
>>        >> > On Sun, Oct 22, 2023 at 1:36 PM varin sacha via R-help
>>        >> > <r-help using r-project.org> wrote:
>>        >> >>
>>        >> >> Dear R-experts,
>>        >> >>
>>        >> >> Here below is my R code, which gives an error message. Can somebody help me fix this error?
>>        >> >> Really appreciate your help.
>>        >> >>
>>        >> >> Best,
>>        >> >>
>>        >> >> ############################################################
>>        >> >> # MSE CROSSVALIDATION Lasso regression
>>        >> >>
>>        >> >> library(glmnet)
>>        >> >>
>>        >> >>
>>        >> >>
>>        >> >> x1=c(34,35,12,13,15,37,65,45,47,67,87,45,46,39,87,98,67,51,10,30,65,34,57,68,98,86,45,65,34,78,98,123,202,231,154,21,34,26,56,78,99,83,46,58,91)
>>        >> >>
>>        >> >> x2=c(1,3,2,4,5,6,7,3,8,9,10,11,12,1,3,4,2,3,4,5,4,6,8,7,9,4,3,6,7,9,8,4,7,6,1,3,2,5,6,8,7,1,1,2,9)
>>        >> >>
>>        >> >> y=c(2,6,5,4,6,7,8,10,11,2,3,1,3,5,4,6,5,3.4,5.6,-2.4,-5.4,5,3,6,5,-3,-5,3,2,-1,-8,5,8,6,9,4,5,-3,-7,-9,-9,8,7,1,2)
>>        >> >> T=data.frame(y,x1,x2)
>>        >> >>
>>        >> >> z=matrix(c(x1,x2), ncol=2)
>>        >> >> cv_model=glmnet(z,y,alpha=1)
>>        >> >> best_lambda=cv_model$lambda.min
>>        >> >> best_lambda
>>        >> >>
>>        >> >>
>>        >> >> # Create a list to store the results
>>        >> >> lst<-list()
>>        >> >>
>>        >> >> # This statement does the repetitions (looping)
>>        >> >> for(i in 1 :1000) {
>>        >> >>
>>        >> >> n=45
>>        >> >>
>>        >> >> p=0.667
>>        >> >>
>>        >> >> sam=sample(1 :n,floor(p*n),replace=FALSE)
>>        >> >>
>>        >> >> Training =T [sam,]
>>        >> >> Testing = T [-sam,]
>>        >> >>
>>        >> >> test1=matrix(c(Testing$x1,Testing$x2),ncol=2)
>>        >> >>
>>        >> >> predictLasso=predict(cv_model, newx=test1)
>>        >> >>
>>        >> >>
>>        >> >> ypred=predict(predictLasso,newdata=test1)
>>        >> >> y=T[-sam,]$y
>>        >> >>
>>        >> >> MSE = mean((y-ypred)^2)
>>        >> >> MSE
>>        >> >> lst[i]<-MSE
>>        >> >> }
>>        >> >> mean(unlist(lst))
>>        >> >> ##################################################################
>>        >> >>
>>        >> >>
>>        >> >>
>>        >> >>
>>        >> >> ______________________________________________
>>        >> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>        >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>        >> >> PLEASE do read the posting guide
>>        >> http://www.R-project.org/posting-guide.html
>>        >> >> and provide commented, minimal, self-contained, reproducible code.
>>        >> >
>>        >>
>>
>>
>>        > --
>>        > Jin
>>        > ------------------------------------------
>>        > Jin Li, PhD
>>        > Founder, Data2action, Australia
>>        > https://www.researchgate.net/profile/Jin_Li32
>>        > https://scholar.google.com/citations?user=Jeot53EAAAAJ&hl=en
>>
Hello,

In your OP, the following two code lines are where that error comes from.

predictLasso=predict(cv_model, newx=test1)

ypred=predict(predictLasso,newdata=test1)

predictLasso already holds the predictions; it is the output of predict(). So
when you run the 2nd line above you are passing predict() a matrix, not a
fitted model, and the error is thrown.
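
A minimal sketch of that fix (not code from this thread): predict() is called once, on the fitted model, and its output is used directly as the predictions. The data below are simulated stand-ins, the names TT, n_rep and mse are illustrative, and refitting cv.glmnet() on the training rows in every repetition is one possible design, assumed here so that the held-out rows are never used for fitting.

```r
library(glmnet)

set.seed(2023)
# Simulated stand-in data (an assumption, not the data from the original post)
n  <- 45
TT <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
TT$y <- 1 + 2 * TT$x1 - TT$x2 + rnorm(n)

p     <- 0.667   # share of rows used for training, as in the original post
n_rep <- 100     # fewer repetitions than the original 1000, to keep it quick
mse   <- numeric(n_rep)

for (i in seq_len(n_rep)) {
  sam     <- sample(seq_len(n), floor(p * n))
  x_train <- as.matrix(TT[sam,  c("x1", "x2")])
  x_test  <- as.matrix(TT[-sam, c("x1", "x2")])

  # fit on the training rows only; cv.glmnet() chooses lambda by cross-validation
  fit <- cv.glmnet(x_train, TT$y[sam], alpha = 1, nfolds = 5)

  # predict() is applied to the fitted model, never to its own output
  ypred <- predict(fit, newx = x_test, s = "lambda.min")

  mse[i] <- mean((TT$y[-sam] - ypred)^2)
}

mean(mse)  # average held-out MSE over the repetitions
```

Storing the results in a preallocated numeric vector instead of a list also avoids the unlist() step at the end.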

After the several suggestions in this thread, don't you want something
like this instead of your for loop?

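library(glmnet)

# make the results reproducible
set.seed(2023)
# TT is assumed to be your data frame (called T in your post; T is also
# R's shorthand for TRUE, so a different name is safer)
# selecting the predictor columns by name is better than what you had
z <- TT[c("x1", "x2")] |> as.matrix()
y <- TT[["y"]]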
cv_model <- cv.glmnet(z, y, alpha = 1, type.measure = "mse")

best_lambda <- cv_model$lambda.min
best_lambda

# these two values should be the same, and they are
# index to minimum mse
(i <- cv_model$index[1])
which(cv_model$lambda == cv_model$lambda.min)

# these two values should be the same, and they are
# value of minimum mse
cv_model$cvm[i]
min(cv_model$cvm)

plot(cv_model)
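
If the aim is still to run the cross-validation many times, as the subject line says, here is a hedged sketch of one way to do it: cv.glmnet() assigns the folds at random, so repeated calls give slightly different minimum MSE values, which can then be averaged. The data are simulated stand-ins and the names x, y_sim, n_rep and min_mse are illustrative, not from this thread.

```r
library(glmnet)

set.seed(2023)
# Simulated stand-in data (an assumption; swap in your own matrix and response)
n     <- 45
x     <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y_sim <- 1 + 2 * x[, "x1"] - x[, "x2"] + rnorm(n)

n_rep <- 50  # number of repeated cross-validations (arbitrary choice)

# each call draws new random folds, so the minimum CV MSE varies a little
min_mse <- replicate(
  n_rep,
  min(cv.glmnet(x, y_sim, alpha = 1, type.measure = "mse")$cvm)
)

mean(min_mse)  # average of the minimum cross-validated MSE
sd(min_mse)    # spread caused by the random fold assignment
```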

Hope this helps,

Rui Barradas
