[R] Leave One Out Cross Validation

muddz mlaique at ryerson.ca
Sat Jun 20 19:18:00 CEST 2009


Hi David,

Thanks and I apologize for the lack of clarity.

##n is defined as the length of xdat
n<-length(xdat)

#I defined 'k' as the Gaussian kernel function
k<-function(v) {1/sqrt(2*pi)*exp(-v^2/2)} #GAUSSIAN kernal

#I believe ypred in my case, was the leave one out estimator (I think its
the variable x, in other words xdat or independent variable). I could be
wrong, but from the text of Davidson and MacKinnon(pg. 686 "Cross
Validation function"), thats what I got out of it. 

> cvhfunc<-function(y,x,h)    {
>     ypred<-0
>     for (i in 1:n){   
>        for (j in 1:n){
>        if (j!=i){
>             ypred<-ypred +    (y[i]*k( (x[j]-x[i])/h ) )/
                                       k( (x[j]-x[i])/h )

#I believe ypred in my case, was the leave one out estimator (I think its
the variable x, in other words xdat or independent variable). I could be
wrong, but from the text of Davidson and MacKinnon, thats what I got out of
it. 

#I totally hear you on the fact that ypred will just equal y[i], however it
should not be the case because for each x[i], I should be running all x[j]
except for when x[i]=x[j]. And I should be mulitiplying the numerator of
ypred (y[i]*k( (x[j]-x[i])/h ). However, its not doing that. 

#I believe CV should be the following: It is used to determine optimal
bandwidth. Steps:
(1)The process involves running x, [j] times for each x[i], except for when
x[j]=x[i]. This is similiar to drawing a histogram and finding how how
large/small the bins should be. ypred should be a vector of nx1 
(2) The second step is similiar to a variance measure, where take the
difference of y and (1), square, and than sum for all n. 

>                 }
>                   }   }
>     ypred     # not sure what that will do inside the function. If  
> it's there for debugging you may want to print(ypred)
>
>      cvh<-0
>      cvh<-cvh+((1/n)*(ydat-ypred)^2    


# ypred is a scalar, while y is a vector, so cvh will be a vector. Is  
that what you want?

>                              }
# I was not sure with this> " ypred     # not sure what that will do inside
the function. If  
it's there for debugging you may want to print(ypred)". 


#As a test run with h=.2; If test values of h= 1.4,15,30,50 show different
and good results (i.e no NaN, 
#massive small numbers, etc) than ideal to run a loop of h values to find
min CV(h)
> test2<-cvhfunc(ydat,xdat,.2);test2


Thanks.
-Mudasir

>                              }

David Winsemius wrote:
> 
> 
> On Jun 19, 2009, at 7:45 PM, muddz wrote:
> 
>>
>> Hi Uwe,
>>
>> My apologies.
>>
>> Please if I can be guided what I am doing wrong in the code. I  
>> started my
>> code as such:
>>
>> # ypred is my leave one out estimator of x
> 
> Estimator of x? Really?
> 
>> cvhfunc<-function(y,x,h)    {
>>     ypred<-0
>>     for (i in 1:n){   #how did you define "n"? It's not in your  
>> parameter list
>>        for (j in 1:n){
>>        if (j!=i){
>>             ypred<-ypred +    (y[i]*k( (x[j]-x[i])/h ) )/
>                                        k( (x[j]-x[i])/h )
> 
> At this point you are also using "k" as a function. Have you at any  
> point defined "k"?
> 
> Also multiplication of the ratio of two identical numbers would  
> generally give a result of y[i] for that second term. Unless, of  
> course, any x[j] = x[i] in which case you will throw an error and  
> every thing will grind to a halt.
> 
> It might help if you now explained what you think "CV" should be.
> 
>>                 }
>>                   }   }
>>     ypred     # not sure what that will do inside the function. If  
>> it's there for debugging you may want to print(ypred)
>>
>> #CVh is a
> 
> # Yes? CVh is a ....what ?
> 
>>      cvh<-0
>>      cvh<-cvh+((1/n)*(y-ypred)^2    # n again. R will still not know  
>> what that is.
>>      cvh
> 
> # ypred is a scalar, while y is a vector, so cvh will be a vector. Is  
> that what you want?
> 
>>                              }
> 
>> test2<-cvhfunc(ydat,xdat,.2);test2
>>
>> #I was experimenting with the following data:
>> library(datasets)
>> data(faithful)
>> ydat<-faithful$eruptions;ydat;plot(ydat);par(new=T)
>> xdat<-faithful$waiting;xdat;plot(xdat,col="blue")
>>
>> # I want to minimize the CV function with respect to h. Thanks.
>>
>>
>>
>>
>> Uwe Ligges-3 wrote:
>>>
>>> See the posting guide:
>>> If you provide commented, minimal, self-contained, reproducible code
>>> some people may be willing to help on the list.
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>
>>> muddz wrote:
>>>> Hi All,
>>>>
>>>> I have been trying to get this LOO-Cross Validation method to work  
>>>> on R
>>>> for
>>>> the past 3 weeks but have had no luck. I am hoping someone would  
>>>> kindly
>>>> help
>>>> me.
>>>>
>>>> Essentially I want to code the LOO-Cross Validation for the 'Local
>>>> Constant'
>>>> and 'Local Linear' constant estimators. I want to find optimal h,
>>>> bandwidth.
>>>>
>>>> Thank you very much!
>>>> -M
>>>>
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Leave-One-Out-Cross-Validation-tp24025738p24120380.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: http://www.nabble.com/Leave-One-Out-Cross-Validation-tp24025738p24127218.html
Sent from the R help mailing list archive at Nabble.com.




More information about the R-help mailing list