[R] MSE Cross-validation with factor interactions terms MARS regression

Peter Dalgaard pdalgd at gmail.com
Tue Oct 30 00:30:06 CET 2018


The two lines did the same thing, so it is little wonder that fixing them changed nothing.

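More likely, the culprit is that a is assigned in the global environment and then used in a prediction on a subset: a still has one entry per row of Wage, while the other predictors are taken from Testing, so the lengths no longer match.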
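To make the mechanism concrete, here is a minimal sketch of the same situation using lm() on made-up data rather than earth() on Wage, so that it runs in base R. The data frame d, the variable g, and the model are all invented for illustration. I would expect other fitting functions that build a model frame to hit the same length mismatch, but only the lm() case is shown here.

```r
set.seed(1)
d <- data.frame(x = rnorm(20), y = rnorm(20), grp = rep(c("u", "v"), 10))

# A copy of the grouping variable that lives outside the data frame,
# analogous to a <- as.factor(Wage$education) in the original post.
g <- as.factor(d$grp)

# g is not a column of d, so the formula picks it up from the
# enclosing environment. Fitting on the full data works.
fit <- lm(y ~ x * g, data = d)

# Predicting on a subset: x now has 5 rows, but g still has 20.
res <- tryCatch(predict(fit, newdata = d[1:5, ]), error = function(e) e)
print(inherits(res, "error"))

# Building the factor inside the data frame keeps everything aligned,
# because the subset carries its own rows of g.
d$g <- as.factor(d$grp)
fit2 <- lm(y ~ x * g, data = d)
pred2 <- predict(fit2, newdata = d[1:5, ])
print(length(pred2))
```

Keeping the factor as a column of the data frame means that whatever is passed as newdata brings along a copy of the right length.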

Also, 

- you are defining Training, but as far as I can tell, you're not using it: the model is fitted with data=Wage. Not likely to be an issue in itself, but wouldn't you want to fit on the Training set and evaluate on the Testing set?

- your model de facto contains both education as a numeric predictor and as.factor(education), as well as the interaction term age:as.factor(education), because age*a expands to age + a + age:a. Does that make sense modelling-wise?
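The two points above can be sketched as follows. The first lines list the terms the posted formula expands to. The rest is a hypothetical helper, holdout_mse() (my name, not something from earth or ISLR), that fits on the training rows only and scores on the held-out rows. It also averages the squared errors, mean((y - ypred)^2), whereas the posted mean(y-ypred)^2 squares the mean error instead. The demonstration uses lm() and the built-in mtcars data purely so that it runs without extra packages.

```r
# Terms implied by the posted formula: a and age:a sit alongside education.
labs <- attr(terms(wage ~ age + education + year + age * a), "term.labels")
print(labs)

# Repeated hold-out estimate of test MSE (hypothetical helper).
# fit_fun is any modelling function called as fit_fun(formula, data = ...)
# whose fitted object has a predict() method that accepts newdata.
holdout_mse <- function(formula, data, fit_fun = lm, p = 0.667, reps = 200) {
  n <- nrow(data)
  response <- all.vars(formula)[1]   # name of the left-hand-side variable
  mse <- numeric(reps)
  for (i in seq_len(reps)) {
    sam      <- sample(n, floor(p * n))
    training <- data[sam, ]
    testing  <- data[-sam, ]
    fit   <- fit_fun(formula, data = training)       # fit on training rows only
    ypred <- predict(fit, newdata = testing)         # evaluate on held-out rows
    mse[i] <- mean((testing[[response]] - ypred)^2)  # square first, then average
  }
  mse
}

set.seed(42)
res <- holdout_mse(mpg ~ wt + hp, data = mtcars, reps = 50)
print(mean(res))
```

With the earth package installed, passing fit_fun = earth and data = Wage, with a formula that uses only columns of Wage, should slot in the same way, though I have not run that here.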

-pd

> On 29 Oct 2018, at 23:50 , varin sacha via R-help <r-help using r-project.org> wrote:
> 
> Hi Bert,
> 
> Many thanks, I have fixed it but it still doesn't work.
> Best,
> 
> On Monday, 29 October 2018 at 22:07:26 UTC+1, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> 
> I did no analysis of your code or thought process, but noticed that you had the following two successive lines in your code:
> 
> 
> y=Testing$wage
> 
> y=Wage[-sam,]$wage
> 
> This obviously makes no sense, so maybe you should fix this first and then proceed.
> 
> -- Bert
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Mon, Oct 29, 2018 at 1:46 PM varin sacha via R-help <r-help using r-project.org> wrote:
>> 
>> Dear R-experts,
>> I am having trouble doing cross-validation with a MARS regression that includes an interaction term between a factor variable (education) and one continuous variable (age). How can I solve my problem?
>> 
>> My reproducible example is below.
>> 
>> #######
>> 
>> install.packages("ISLR")
>> 
>> library(ISLR)
>> 
>> install.packages("earth")
>> 
>> library(earth)
>> 
>> a<-as.factor(Wage$education)
>> 
>> # Create a list to store the results
>> 
>> lst<-list()
>> 
>> # This statement does the repetitions (looping)
>> 
>> for(i in 1 :200) {
>> 
>> n=dim(Wage)[1]
>> 
>> p=0.667
>> 
>> sam=sample(1 :n,floor(p*n),replace=FALSE)
>> 
>> Training =Wage [sam,]
>> 
>> Testing = Wage [-sam,]
>> 
>> mars5<-earth(wage~age+education+year+age*a, data=Wage)
>> 
>> ypred=predict(mars5,newdata=Testing)
>> 
>> y=Testing$wage
>> 
>> y=Wage[-sam,]$wage
>> 
>> MSE = mean(y-ypred)^2
>> 
>> MSE
>> 
>> lst[i]<-MSE
>> 
>> }
>> 
>> mean(unlist(lst))
>> 
>> summary(mars5)
>> 
>> #######
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



