[R] script problem to obtain pairs of overlap values

Tue Jul 11 21:11:53 CEST 2006

Dear all,

I wrote some code to estimate the overlap between two kernel density
estimates. The script must estimate the overlap between each pair of
columns of a data frame. With S sampled species (columns) in my data
frame, I want to obtain S(S-1)/2 pairwise overlap values between species.
However, the code is not well written (only one overlap value is
produced) and I can't find the solution.
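
As a quick check of the pair count, the unordered pairs of columns can be enumerated with combn(). This is only an illustrative sketch; the names S, pairs and n_pairs are assumptions introduced for the example.

```r
# Illustration only: enumerate all unordered pairs of S columns.
S <- 5
pairs <- combn(S, 2)      # 2 x n matrix, one column per pair (i, j) with i < j
n_pairs <- ncol(pairs)    # number of pairs

# The count agrees with the formula S(S-1)/2.
stopifnot(n_pairs == S * (S - 1) / 2)
```

Each column of pairs holds the indices of one pair of species, so it could be used to drive the pairwise calculation.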

To illustrate the calculations, I use the data frame "tdon" and the
bandwidth vector "h", which was estimated in another part of the script.

tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11), sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13), sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))

> h

[1] 1.047 2.973 0.887 1.520 2.955

Here is the code:

for (i in 1:(nbcol-1)) {   # nbcol <- ncol(tdon)
    tdon1 <- tdon[,i]
    tdon11 <- subset(tdon1, tdon1!="NA")
    fctk1 <- function(x)
        {density(tdon11, bw=h[i], kernel="gaussian")$y}

    for (j in (i+1):nbcol) {
        tdon2 <- tdon[,j]
        tdon21 <- subset(tdon2, tdon2!="NA")
        fctk2 <- function(x)
            {density(tdon21, bw=h[j], kernel="gaussian")$y}

        diffctk <- function(x)
            {abs(fctk1(x)-fctk2(x))}

        intctk <- approxfun(diffctk(x), rule=2)
        int <- integrate(diffctk, -Inf, Inf)$value
        overlap <- 1 - 0.5*int
    }
}

The use of "approxfun" to integrate the difference between the estimated
density values (my "diffctk" function) was suggested by Thomas Lumley,
but I'm not sure whether I have applied it correctly or whether it is
the right solution for my problem.
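
A minimal sketch of one way to compute the overlap for a single pair, assuming both densities are evaluated on a shared grid so that their difference is defined point by point. The helper name kernel_overlap, the grid size n and the padding of 4 bandwidths are illustrative assumptions, not part of the original script. The integral is taken with the trapezoidal rule on the grid, swapped in here for approxfun() plus integrate().

```r
# Sketch only: kernel_overlap() is a hypothetical helper.
# It returns 1 - 0.5 * integral of |f1 - f2| for two Gaussian kernel densities.
kernel_overlap <- function(x1, x2, h1, h2, n = 512) {
  x1 <- x1[!is.na(x1)]                      # drop missing values
  x2 <- x2[!is.na(x2)]
  pad <- 4 * max(h1, h2)                    # assumed padding to cover the tails
  lo <- min(x1, x2) - pad
  hi <- max(x1, x2) + pad
  # Same from, to and n, so both densities share the same evaluation points
  d1 <- density(x1, bw = h1, kernel = "gaussian", from = lo, to = hi, n = n)
  d2 <- density(x2, bw = h2, kernel = "gaussian", from = lo, to = hi, n = n)
  ad <- abs(d1$y - d2$y)                    # |f1 - f2| on the grid
  # Trapezoidal rule over the grid
  int <- sum(diff(d1$x) * (ad[-1] + ad[-length(ad)]) / 2)
  1 - 0.5 * int
}

# Example with the first two species and bandwidths from the post
sp1 <- c(5, 9, NA, 5, 11)
sp2 <- c(4, 2, 4, NA, 11)
ov12 <- kernel_overlap(sp1, sp2, 1.047, 2.973)
```

Evaluating both densities on the same points matters because density() otherwise chooses a different grid for each sample, so subtracting the two $y vectors would compare values at different x positions.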

I need "overlap" to be a vector of length 10 containing the overlap
values for all pairs.
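
One possible way to keep every pair instead of only the last one is to preallocate the result vector and fill one slot per (i, j) combination. The sketch below shows only the storage pattern; pair_stat is a hypothetical stand-in for the real overlap calculation, and res, nm and k are names assumed for the example.

```r
# Data frame from the post
tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11), sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13), sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))

# Hypothetical stand-in for the overlap between two columns
# (absolute difference of means, used only to have something to store).
pair_stat <- function(a, b) abs(mean(a, na.rm = TRUE) - mean(b, na.rm = TRUE))

nbcol <- ncol(tdon)
res <- numeric(nbcol * (nbcol - 1) / 2)   # one slot per unordered pair
nm  <- character(length(res))             # labels such as "sp.1-sp.2"
k <- 0                                    # running index into res

for (i in 1:(nbcol - 1)) {
  for (j in (i + 1):nbcol) {
    k <- k + 1
    res[k] <- pair_stat(tdon[[i]], tdon[[j]])   # store, do not overwrite
    nm[k]  <- paste(names(tdon)[i], names(tdon)[j], sep = "-")
  }
}
names(res) <- nm
```

With five columns, res has length 10, and each element is labelled with the pair of species it belongs to.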

Any help or advice on improving this code would be appreciated.

With kind regards,

    Rogério