[R] MSE increased by increasing the sample size for Nadaraya-Watson kernel regression

Sat Nov 1 19:28:17 CET 2014

1. I am unfamiliar with the functional package.

2. I think the proper question is: Why do you expect the mse to
decrease with increasing sample size?
Example: the precision of an average (as an estimator of the
population mean) improves (its standard error gets smaller) as sample
size increases, but the mse, as an estimator of the population
variance, stays essentially constant.
Note: for nonparametric smoothers, mse also depends on the bandwidth
choice, and a default or data-driven bandwidth may change with the
sample size.
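
A rough sketch of the distinction in point 2, using the model from the
original post. It is an illustration under stated assumptions, not the
poster's method: the bandwidth rule h = 0.2 * n^(-1/5) and the helper
names m and one_rep are made up for this example. Note that ksmooth()
evaluates the fit on an evenly spaced grid unless x.points is supplied,
and returns the fitted values ordered by x, so the responses are
reordered before residuals are taken. resid_mse compares the fit with
the noisy Y; true_mse compares it with the true curve.

```r
# Sketch only: the bandwidth rule is an assumption, not the poster's bw.ucv choice.
m <- function(x) 1 - x + exp(-200 * (x - 0.5)^2)   # true regression function from the post

one_rep <- function(n, sigma = 0.1) {
  X <- runif(n)
  Y <- m(X) + rnorm(n, sd = sigma)
  h <- 0.2 * n^(-1/5)              # hypothetical rule-of-thumb bandwidth
  # Evaluate the NW fit at the design points instead of the default grid
  fit <- ksmooth(X, Y, kernel = "normal", bandwidth = h, x.points = X)
  ord <- order(X)                  # ksmooth returns fit$x and fit$y sorted by x
  c(resid_mse = mean((Y[ord] - fit$y)^2),     # fit vs. noisy responses
    true_mse  = mean((m(fit$x) - fit$y)^2))   # fit vs. true curve
}

set.seed(1)
sizes <- c(25, 100, 400)
# Average both criteria over 200 simulated data sets per sample size
res <- t(sapply(sizes, function(n) rowMeans(replicate(200, one_rep(n)))))
res <- data.frame(n = sizes, res)
print(res)
```

Under these assumptions one would expect true_mse to shrink as n grows,
while resid_mse settles near the error variance (0.1^2 = 0.01) rather
than going to zero.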

3. In future, please post in plain text, not html, as the posting
guide requests.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll

On Sat, Nov 1, 2014 at 5:03 AM, Khulood Aljehani <aljehani-k at hotmail.com> wrote:
>
> Hello
> I hope you can help me with a problem in Nadaraya-Watson (NW) kernel regression estimation.
> I used simulated data and wrote a loop to calculate the NW estimator for the regression model
>   Y = 1 - X + exp(-200*(X-0.5)^2) + E
> where
>   Y: the response variable
>   X: the explanatory variable, from uniform(0,1)
>   E: the error term, i.i.d. from normal(0,0.1)
> Then I calculate the MSE, but the MSE increases as the sample size increases. This is the program I wrote:
> n1=25
> set.seed(4455)
> E<-rnorm(n1,mean=0,sd=0.1)
> X<-runif(n1, min = 0, max = 1)
> mx=1-X+exp(-200*(X-0.5)^2)
> Y <- mx+E
> nrep <- 1000
>
> #----------------------------------------Fixed NW
> mse_rep1<-c()
> for(i in 1:1500){
> set.seed(i+236)
> E<-rnorm(n1,mean=0,sd=0.1)
> X<-runif(n1, min = 0, max = 1)
> mx=1-X+exp(-200*(X-0.5)^2)
> Y <- mx+E
> hmax <- 2 * sqrt(var(X)) * n1^(-1/5)
> lower = 0.01 * hmax
> h<- bw.ucv(X,nb = 1000, lower=lower, upper=hmax, tol=0.1*lower)
> est1 <- ksmooth(X, Y, kernel = "normal", bandwidth = h)$y
> mse1<-(n1^-1)*sum((Y - est1)^2)
>
> mse_rep1 <- cbind(mse_rep1,mse1)
>
> dimnames(mse_rep1)<-list(c("MSE1"),paste("rep",1:i))
>
> }
> library(functional)
> MSE_rep1<-mse_rep1[,apply(mse_rep1, 2, Compose(is.finite, any))]
>
> MSE_fixedNW<- apply(MSE_rep1[1:1000], 1, mean)     #calculate the average of the 1000 MSE
>
> I got NA values at first, so I ran 1500 replications and then chose 1000 without NA values.
> When I change the sample size to 50 or 100 the MSE decreases, but above 100 the MSE increases. This is the main problem.
> I hope I was able to explain the problem clearly.
> Regards
>
>         [[alternative HTML version deleted]]