[R] Leave One Out Cross Validation
muddz
mlaique at ryerson.ca
Sat Jun 20 19:18:00 CEST 2009
Hi David,
Thanks and I apologize for the lack of clarity.
##n is defined as the length of xdat
n<-length(xdat)
#I defined 'k' as the Gaussian kernel function
k<-function(v) {1/sqrt(2*pi)*exp(-v^2/2)} #GAUSSIAN kernal
#I believe ypred in my case, was the leave one out estimator (I think its
the variable x, in other words xdat or independent variable). I could be
wrong, but from the text of Davidson and MacKinnon(pg. 686 "Cross
Validation function"), thats what I got out of it.
> cvhfunc<-function(y,x,h) {
> ypred<-0
> for (i in 1:n){
> for (j in 1:n){
> if (j!=i){
> ypred<-ypred + (y[i]*k( (x[j]-x[i])/h ) )/
k( (x[j]-x[i])/h )
#I believe ypred in my case, was the leave one out estimator (I think its
the variable x, in other words xdat or independent variable). I could be
wrong, but from the text of Davidson and MacKinnon, thats what I got out of
it.
#I totally hear you on the fact that ypred will just equal y[i], however it
should not be the case because for each x[i], I should be running all x[j]
except for when x[i]=x[j]. And I should be mulitiplying the numerator of
ypred (y[i]*k( (x[j]-x[i])/h ). However, its not doing that.
#I believe CV should be the following: It is used to determine optimal
bandwidth. Steps:
(1)The process involves running x, [j] times for each x[i], except for when
x[j]=x[i]. This is similiar to drawing a histogram and finding how how
large/small the bins should be. ypred should be a vector of nx1
(2) The second step is similiar to a variance measure, where take the
difference of y and (1), square, and than sum for all n.
> }
> } }
> ypred # not sure what that will do inside the function. If
> it's there for debugging you may want to print(ypred)
>
> cvh<-0
> cvh<-cvh+((1/n)*(ydat-ypred)^2
# ypred is a scalar, while y is a vector, so cvh will be a vector. Is
that what you want?
> }
# I was not sure with this> " ypred # not sure what that will do inside
the function. If
it's there for debugging you may want to print(ypred)".
#As a test run with h=.2; If test values of h= 1.4,15,30,50 show different
and good results (i.e no NaN,
#massive small numbers, etc) than ideal to run a loop of h values to find
min CV(h)
> test2<-cvhfunc(ydat,xdat,.2);test2
Thanks.
-Mudasir
> }
David Winsemius wrote:
>
>
> On Jun 19, 2009, at 7:45 PM, muddz wrote:
>
>>
>> Hi Uwe,
>>
>> My apologies.
>>
>> Please if I can be guided what I am doing wrong in the code. I
>> started my
>> code as such:
>>
>> # ypred is my leave one out estimator of x
>
> Estimator of x? Really?
>
>> cvhfunc<-function(y,x,h) {
>> ypred<-0
>> for (i in 1:n){ #how did you define "n"? It's not in your
>> parameter list
>> for (j in 1:n){
>> if (j!=i){
>> ypred<-ypred + (y[i]*k( (x[j]-x[i])/h ) )/
> k( (x[j]-x[i])/h )
>
> At this point you are also using "k" as a function. Have you at any
> point defined "k"?
>
> Also multiplication of the ratio of two identical numbers would
> generally give a result of y[i] for that second term. Unless, of
> course, any x[j] = x[i] in which case you will throw an error and
> every thing will grind to a halt.
>
> It might help if you now explained what you think "CV" should be.
>
>> }
>> } }
>> ypred # not sure what that will do inside the function. If
>> it's there for debugging you may want to print(ypred)
>>
>> #CVh is a
>
> # Yes? CVh is a ....what ?
>
>> cvh<-0
>> cvh<-cvh+((1/n)*(y-ypred)^2 # n again. R will still not know
>> what that is.
>> cvh
>
> # ypred is a scalar, while y is a vector, so cvh will be a vector. Is
> that what you want?
>
>> }
>
>> test2<-cvhfunc(ydat,xdat,.2);test2
>>
>> #I was experimenting with the following data:
>> library(datasets)
>> data(faithful)
>> ydat<-faithful$eruptions;ydat;plot(ydat);par(new=T)
>> xdat<-faithful$waiting;xdat;plot(xdat,col="blue")
>>
>> # I want to minimize the CV function with respect to h. Thanks.
>>
>>
>>
>>
>> Uwe Ligges-3 wrote:
>>>
>>> See the posting guide:
>>> If you provide commented, minimal, self-contained, reproducible code
>>> some people may be willing to help on the list.
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>
>>> muddz wrote:
>>>> Hi All,
>>>>
>>>> I have been trying to get this LOO-Cross Validation method to work
>>>> on R
>>>> for
>>>> the past 3 weeks but have had no luck. I am hoping someone would
>>>> kindly
>>>> help
>>>> me.
>>>>
>>>> Essentially I want to code the LOO-Cross Validation for the 'Local
>>>> Constant'
>>>> and 'Local Linear' constant estimators. I want to find optimal h,
>>>> bandwidth.
>>>>
>>>> Thank you very much!
>>>> -M
>>>>
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Leave-One-Out-Cross-Validation-tp24025738p24120380.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
View this message in context: http://www.nabble.com/Leave-One-Out-Cross-Validation-tp24025738p24127218.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list