# [R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression

peter dalgaard pdalgd at gmail.com
Sat Nov 1 19:35:10 CET 2014

You seem to be using bw.ucv to set the bandwidth for ksmooth. However, bw.ucv selects the bandwidth for estimating the _density_ of x. I see no reason to believe that the same bandwidth selection should be optimal or even consistent for a kernel smoother like ksmooth.
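A second, easily missed point: `bw.ucv()` returns the standard deviation of a Gaussian kernel (the same convention as `density()`), whereas `ksmooth()` scales its kernels so that their quartiles lie at ±0.25*`bandwidth`, which gives an effective standard deviation of about 0.37*`bandwidth`. The sketch below is an illustration, not part of the original thread; the helper `sd_to_ksmooth_bw()` and the simulated data are hypothetical. It converts a kernel standard deviation to `ksmooth()`'s scale and compares the fit with a hand-written Nadaraya-Watson estimate. Because `ksmooth()` ignores observations far out in the kernel's tails, the agreement is close rather than exact.

```r
# Hypothetical helper (assumption, not from the thread): convert a Gaussian
# kernel standard deviation h into the 'bandwidth' argument of stats::ksmooth(),
# whose kernels are scaled so their quartiles lie at +/- 0.25 * bandwidth.
sd_to_ksmooth_bw <- function(h) {
  # the quartiles of N(0, h^2) are at +/- qnorm(0.75) * h
  h * qnorm(0.75) / 0.25
}

# Simulated data from the model in the question (illustrative only)
set.seed(1)
n <- 200
X <- runif(n)
Y <- 1 - X + exp(-200 * (X - 0.5)^2) + rnorm(n, sd = 0.1)

h  <- 0.05                  # hypothetical kernel standard deviation
x0 <- c(0.25, 0.5, 0.75)    # points at which to evaluate the fit

fit <- ksmooth(X, Y, kernel = "normal",
               bandwidth = sd_to_ksmooth_bw(h), x.points = x0)

# Hand-written Nadaraya-Watson estimate with a N(0, h^2) kernel
nw_manual <- sapply(x0, function(x) {
  w <- dnorm(X - x, sd = h)
  sum(w * Y) / sum(w)
})

cbind(x = fit$x, ksmooth = fit$y, manual = nw_manual)
```

Without such a conversion, passing a standard-deviation bandwidth straight to `ksmooth()` makes the kernel roughly 2.7 times narrower than intended.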

Check out the KernSmooth package, in particular the dpik() and dpill() functions, and the book that the package supports (Wand and Jones, _Kernel Smoothing_, 1995).
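Following that suggestion, here is a minimal sketch (not from the thread; `nw_mse_once()` and the chosen sample sizes are hypothetical) that replaces `bw.ucv()` in the question's simulation with `KernSmooth::dpill()`, which selects a regression bandwidth using both `X` and `Y`. Two further assumptions: the fit is evaluated at the observed `X` through `x.points`, and the error is measured against the true curve m(x), which is the usual meaning of the MSE of an estimator. An average squared difference from the noisy `Y` is dominated by the noise variance 0.1^2 and does not shrink towards zero. Note that `dpill()` is designed for local linear regression, so using its bandwidth for the local-constant Nadaraya-Watson fit is a reasonable approximation, not an optimal choice.

```r
# Sketch, assuming KernSmooth is installed (it is a recommended package that
# ships with R). Mirrors the question's model, but uses the regression
# bandwidth selector dpill() instead of the density selector bw.ucv().
library(KernSmooth)

m <- function(x) 1 - x + exp(-200 * (x - 0.5)^2)   # true regression function

# Hypothetical helper: one replication, returning the average squared error
# of the NW fit against the true curve m(x) at the observed design points
nw_mse_once <- function(n) {
  X <- runif(n)
  Y <- m(X) + rnorm(n, sd = 0.1)
  h <- dpill(X, Y)                      # plug-in bandwidth (Gaussian kernel sd)
  if (!is.finite(h)) return(NA_real_)   # guard, in case the selector fails
  # ksmooth() puts its kernel quartiles at +/- 0.25 * bandwidth,
  # so convert the standard deviation h to that scale
  bw <- h * qnorm(0.75) / 0.25
  # evaluate at the observed X rather than ksmooth()'s default grid;
  # the results come back sorted by x, so compare with m(fit$x)
  fit <- ksmooth(X, Y, kernel = "normal", bandwidth = bw, x.points = X)
  mean((fit$y - m(fit$x))^2)
}

set.seed(2014)
sizes <- c(100, 200, 400, 800)
mse <- sapply(sizes, function(n) mean(replicate(100, nw_mse_once(n)), na.rm = TRUE))
data.frame(n = sizes, MSE = mse)
```

With a bandwidth chosen this way, the average error is expected to fall as n grows. `locpoly(X, Y, degree = 1, bandwidth = h)` from the same package gives the local linear fit that `dpill()` actually targets, evaluated on a grid.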

-pd

> On 01 Nov 2014, at 13:03 , Khulood Aljehani <aljehani-k at hotmail.com> wrote:
>
>
> Hello,
>
> I hope you can help me with a problem I have with the Nadaraya-Watson (NW) kernel regression estimator. I used simulated data and wrote a loop to calculate the NW estimator for the regression model
>
>     Y = 1 - X + exp(-200*(X - 0.5)^2) + E
>
> where
>     Y: the response variable,
>     X: the explanatory variable, drawn from Uniform(0, 1),
>     E: the error term, i.i.d. Normal with mean 0 and standard deviation 0.1.
>
> Then I calculate the MSE. But the MSE increases as the sample size increases. This is the program I wrote:
> n1=25
> set.seed(4455)
> E<-rnorm(n1,mean=0,sd=0.1)
> X<-runif(n1, min = 0, max = 1)
> mx=1-X+exp(-200*(X-0.5)^2)
> Y <- mx+E
> nrep <- 1000
>
> #---------------------------------------- Fixed NW
> mse_rep1 <- rep(NA_real_, 1500)   # one MSE per replication
> for (i in 1:1500) {
>   set.seed(i + 236)
>   E <- rnorm(n1, mean = 0, sd = 0.1)
>   X <- runif(n1, min = 0, max = 1)
>   mx <- 1 - X + exp(-200 * (X - 0.5)^2)
>   Y <- mx + E
>   # search interval for bw.ucv()
>   hmax <- 2 * sqrt(var(X)) * n1^(-1/5)
>   lower <- 0.01 * hmax
>   h <- bw.ucv(X, nb = 1000, lower = lower, upper = hmax, tol = 0.1 * lower)
>   # without x.points, ksmooth() evaluates the fit on max(100, n1)
>   # equally spaced points over range(X), not at the observed X
>   est1 <- ksmooth(X, Y, kernel = "normal", bandwidth = h)$y
>   mse_rep1[i] <- (n1^-1) * sum((Y - est1)^2)   # NA if est1 contains NA
> }
>
> # keep only the finite MSEs (replications with NA are dropped)
> MSE_rep1 <- mse_rep1[is.finite(mse_rep1)]
>
> # average of the first nrep (1000) finite MSEs
> MSE_fixedNW <- mean(MSE_rep1[1:nrep])
>
> At first I got NA values, so I ran 1500 replications and then kept 1000 without NA values.
> When I change the sample size to 50 or 100, the MSE decreases, but for sample sizes above 100 it increases. This is the main problem.
> I hope I have explained the problem clearly.
> Regards
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
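On the NA values mentioned in the question: when `x.points` is not supplied, `ksmooth()` evaluates the fit on `n.points = max(100, length(x))` equally spaced points covering `range(x)`. In the question's loop, `Y - est1` therefore pairs observations with unrelated grid points, and for `n1 < 100` it recycles `Y` so the sum has 100 terms rather than `n1`. With a small bandwidth, grid points that have no observation within the kernel's support are returned as `NA`. The toy data below (hypothetical, chosen to have a gap) illustrates both behaviours and shows that supplying `x.points = X` gives one fitted value per observation, sorted by `x`.

```r
# Illustrative toy data (assumption, not from the thread) with a large gap
X <- c(0, 0.1, 0.9, 1)
Y <- c(1, 0.9, 0.1, 0)

# Default: evaluated on max(100, length(X)) equally spaced points over range(X)
grid_fit <- ksmooth(X, Y, kernel = "normal", bandwidth = 0.01)
length(grid_fit$y)        # 100, not 4
sum(is.na(grid_fit$y))    # grid points with no data inside the kernel's support

# Evaluating at the observations: one value per X, returned in increasing order of X
data_fit <- ksmooth(X, Y, kernel = "normal", bandwidth = 0.01, x.points = X)
o <- order(X)
cbind(x = data_fit$x, fitted = data_fit$y, observed = Y[o])
```

To compare fitted and observed values element by element, both must be put in the same (sorted) order, as `Y[o]` does here.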