[R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression

peter dalgaard pdalgd at gmail.com
Sat Nov 1 19:35:10 CET 2014


You seem to be using bw.ucv to set the bandwidth for ksmooth. However, bw.ucv selects the bandwidth for estimating the _density_ of x. I see no reason to believe that the same bandwidth selection should be optimal or even consistent for a kernel smoother like ksmooth. 
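
To illustrate the distinction, here is a minimal sketch (not code from the original thread). It picks the bandwidth by leave-one-out cross-validation on the regression fit itself and prints it next to the bw.ucv value. The helpers nw() and loocv() are hypothetical names, the Nadaraya-Watson estimator is hand-rolled with a Gaussian kernel, and the bandwidth grid and sample size are arbitrary choices. Here h is the standard deviation of the Gaussian kernel, which is not the same scaling as the bandwidth argument of ksmooth (whose kernels have quartiles at +/- 0.25*bandwidth).

```r
# Gaussian-kernel Nadaraya-Watson estimate at points x0 (hypothetical helper)
nw <- function(x0, x, y, h) {
  w <- dnorm(outer(x0, x, "-") / h)     # weights: rows = x0, columns = data points
  drop(w %*% y) / rowSums(w)
}

# Leave-one-out CV score of the regression fit (not of the density of x)
loocv <- function(h, x, y) {
  w <- dnorm(outer(x, x, "-") / h)
  diag(w) <- 0                          # leave each point out of its own fit
  mean((y - drop(w %*% y) / rowSums(w))^2)
}

set.seed(4455)
n <- 200                                # arbitrary sample size for the illustration
x <- runif(n)
m <- 1 - x + exp(-200 * (x - 0.5)^2)    # regression function from the post
y <- m + rnorm(n, sd = 0.1)

h_grid <- seq(0.005, 0.2, length.out = 60)   # arbitrary search grid
cv     <- sapply(h_grid, loocv, x = x, y = y)
h_cv   <- h_grid[which.min(cv)]         # bandwidth tuned for the regression
h_ucv  <- bw.ucv(x)                     # bandwidth tuned for the density of x

print(c(h_cv = h_cv, h_ucv = h_ucv))

# Error against the true curve, evaluated at the observed x so that
# fitted values and targets line up one to one
mse_fit <- mean((nw(x, x, y, h_cv) - m)^2)
print(mse_fit)
```

When computing an error from ksmooth output, note that (per its help page) it returns the fit on an equally spaced grid of max(100, length(x)) points, sorted by x, unless x.points is supplied, so its $y does not line up element by element with Y.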

Check out the KernSmooth package, in particular the dpik() and dpill() functions, and the book that the package supports.
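
A rough sketch of how that suggestion might be applied (an illustration under stated assumptions, not code from the thread): dpill() is a plug-in bandwidth selector for local linear regression, so it is paired here with locpoly(degree = 1) rather than with the Nadaraya-Watson estimator. The helper one_rep(), the sample sizes, and the number of replications are arbitrary, and the error is measured against the true curve on a grid rather than against the noisy Y.

```r
library(KernSmooth)   # recommended package, included in standard R installations

# Regression function from the original post
m <- function(x) 1 - x + exp(-200 * (x - 0.5)^2)

# Hypothetical helper: simulate one data set of size n, fit a local linear
# smoother with a dpill() bandwidth, return the grid-average squared error
one_rep <- function(n) {
  x <- runif(n)
  y <- m(x) + rnorm(n, sd = 0.1)
  h <- dpill(x, y)                      # plug-in bandwidth for local linear regression
  if (!is.finite(h)) return(NA_real_)   # guard in case the selector fails
  fit <- locpoly(x, y, degree = 1, bandwidth = h,
                 gridsize = 201, range.x = c(0, 1))
  mean((fit$y - m(fit$x))^2, na.rm = TRUE)
}

set.seed(1)
sizes <- c(100, 400, 1600)              # arbitrary sample sizes
mse <- sapply(sizes, function(n) mean(replicate(50, one_rep(n)), na.rm = TRUE))
names(mse) <- sizes
print(mse)
```

With a bandwidth chosen for the regression problem, the average error should be expected to shrink as n grows; if it does not, the error computation itself is worth checking.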

-pd


> On 01 Nov 2014, at 13:03 , Khulood Aljehani <aljehani-k at hotmail.com> wrote:
> 
> 
> Hello,
> 
> I hope you can help me with a problem with the Nadaraya-Watson (NW) kernel regression estimator. I used simulated data and wrote a loop to compute the NW estimator for the regression model
> 
>   Y = 1 - X + exp(-200*(X-0.5)^2) + E
> 
> where
>   Y: the response variable,
>   X: the explanatory variable, from uniform(0,1),
>   E: the error term, i.i.d. from normal(0,0.1).
> 
> I then calculate the MSE, but the MSE increases as the sample size increases. This is the program I wrote:
> 
> n1=25
> set.seed(4455)
> E<-rnorm(n1,mean=0,sd=0.1)
> X<-runif(n1, min = 0, max = 1)
> mx=1-X+exp(-200*(X-0.5)^2)
> Y <- mx+E
> nrep <- 1000
> 
> #----------------------------------------Fixed NW
> mse_rep1<-c()
> for(i in 1:1500){
> set.seed(i+236)
> E<-rnorm(n1,mean=0,sd=0.1)
> X<-runif(n1, min = 0, max = 1)
> mx=1-X+exp(-200*(X-0.5)^2)
> Y <- mx+E
> hmax <- 2 * sqrt(var(X)) * n1^(-1/5) 
> lower = 0.01 * hmax              
> h<- bw.ucv(X,nb = 1000, lower=lower, upper=hmax, tol=0.1*lower)
> est1 <- ksmooth(X, Y, kernel = "normal", bandwidth = h)$y
> mse1<-(n1^-1)*sum((Y - est1)^2)
> 
> mse_rep1 <- cbind(mse_rep1,mse1)
> 
> dimnames(mse_rep1)<-list(c("MSE1"),paste("rep",1:i))
> 
> }
> library(functional)
> MSE_rep1<-mse_rep1[,apply(mse_rep1, 2, Compose(is.finite, any))]
> 
> # MSE_rep1 is a plain vector at this point, so mean() is used instead of apply()
> MSE_fixedNW <- mean(MSE_rep1[1:1000])     # average of the first 1000 finite MSE values
> 
> At first I got NA values, so I ran 1500 replications and then chose 1000 without NA values.
> When I change the sample size to 50 or 100 the MSE decreases, but for sample sizes above 100 the MSE increases. This is the main problem.
> I hope I was able to explain the problem clearly.
> Regards

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


