[R] script problem to obtain pairs of overlap values

Rogério Rosa da Silva rogeriorosas at gmail.com
Tue Jul 11 21:11:53 CEST 2006


I wrote some code to estimate the overlap between two kernel density estimates.
The script should estimate the overlap between each pair of columns of a data frame.
With S sampled species (columns) in my data frame, I want to obtain
S(S-1)/2 pairwise overlap values between species.
However, the code is not well written (only one overlap value is
produced) and I can't find the solution.

To illustrate the calculations, I use the data frame "tdon" and the
bandwidth vector "h", which was estimated in another part of the script.

tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11), sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13), sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))

> h

[1] 1.047 2.973 0.887 1.520 2.955

Here is the code:

for (i in 1:(nbcol-1)) # nbcol<-ncol(tdon)
    tdon11<- subset(tdon1,tdon1!="NA")
    {density (tdon11, bw=h[i], kernel="gaussian")$y}

for (j in (i+1):nbcol)
    tdon21<- subset(tdon2,tdon2!="NA")
    {density (tdon21, bw=h[j], kernel="gaussian")$y}


        intctk<- approxfun (diffctk(x), rule=2)
        int<- integrate(diffctk,-Inf,Inf)$value
             overlap<- 1 - 0.5* int

The use of "approxfun" to integrate the difference in the estimated
density values (my "diffctk" function) was suggested by Thomas Lumley,
but I'm not sure whether I have found the solution or whether it is correct for my problem.
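
For reference, a minimal sketch of the approxfun idea for a single pair, using the formula from the code above (overlap = 1 - 0.5 * integral of |f1 - f2|). It assumes both densities are evaluated on a shared range padded by four bandwidths, and it integrates over that finite range rather than from -Inf to Inf, because rule=2 extends the edge values indefinitely. The names x1, x2, bw1, bw2, absdiff and overlap12 are illustrative and not part of the original script.

```r
# Two samples and their bandwidths taken from the post (sp.1 and sp.2, NAs dropped)
x1 <- c(5, 9, 5, 11);  bw1 <- 1.047
x2 <- c(4, 2, 4, 11);  bw2 <- 2.973

# Shared evaluation range, padded by 4 bandwidths (an assumed cutoff)
pad <- 4 * max(bw1, bw2)
lo  <- min(x1, x2) - pad
hi  <- max(x1, x2) + pad

# Both densities are evaluated on the same grid of x values
d1 <- density(x1, bw = bw1, kernel = "gaussian", from = lo, to = hi)
d2 <- density(x2, bw = bw2, kernel = "gaussian", from = lo, to = hi)

# Turn each estimate into a function, then take the absolute difference
f1 <- approxfun(d1$x, d1$y, rule = 2)
f2 <- approxfun(d2$x, d2$y, rule = 2)
absdiff <- function(x) abs(f1(x) - f2(x))

# stop.on.error = FALSE returns the best estimate instead of aborting;
# res$message should be inspected to confirm convergence
res <- integrate(absdiff, lower = lo, upper = hi,
                 subdivisions = 2000L, stop.on.error = FALSE)
overlap12 <- 1 - 0.5 * res$value
overlap12
```

The integrand is piecewise linear, so integrate() may need more than the default 100 subdivisions; the value 2000 is a guess, not a tuned setting.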

I need "overlap" to be a vector of length 10, containing
all pairwise overlap values.
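
One possible way to get that vector, sketched under some assumptions: pairs are enumerated with combn(), NAs are dropped per column, and each integral is approximated with the trapezoid rule on a shared grid (a simplification to keep the sketch short, swapped in for approxfun with integrate). The helper pair_overlap and its arguments are hypothetical names, not part of the original script.

```r
# Example data and bandwidths copied from the post
tdon <- data.frame(sp.1 = c(5, 9, NA, 5, 11),
                   sp.2 = c(4, 2, 4, NA, 11),
                   sp.3 = c(5, 4, 2, 6, 13),
                   sp.4 = c(3, 11, NA, 5, 3),
                   sp.5 = c(2, 5, 2, 9, 9))
h <- c(1.047, 2.973, 0.887, 1.520, 2.955)

# Hypothetical helper: overlap between columns i and j of `dat`
pair_overlap <- function(i, j, dat, bw) {
  x1 <- dat[[i]][!is.na(dat[[i]])]
  x2 <- dat[[j]][!is.na(dat[[j]])]
  pad <- 4 * max(bw[i], bw[j])            # assumed tail cutoff
  lo  <- min(x1, x2) - pad
  hi  <- max(x1, x2) + pad
  d1 <- density(x1, bw = bw[i], kernel = "gaussian", from = lo, to = hi)
  d2 <- density(x2, bw = bw[j], kernel = "gaussian", from = lo, to = hi)
  ad <- abs(d1$y - d2$y)                  # same x grid for both densities
  m  <- length(ad)
  dx <- d1$x[2] - d1$x[1]
  int <- dx * (sum(ad) - 0.5 * (ad[1] + ad[m]))   # trapezoid rule
  1 - 0.5 * int
}

# All S(S-1)/2 column pairs, one per column of `pairs`
pairs <- combn(ncol(tdon), 2)
overlap <- apply(pairs, 2, function(p) pair_overlap(p[1], p[2], tdon, h))
names(overlap) <- paste(names(tdon)[pairs[1, ]], names(tdon)[pairs[2, ]], sep = "-")
overlap
```

The pairs come out in the same order as a nested loop over i < j, so the first element is the sp.1/sp.2 overlap and the last is sp.4/sp.5.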

Any help or advice on improving this code will be appreciated.

With kind regards,
