[R] strange behavior of lchoose in combinatorics problem
Gonçalo Ferraz
gferraz29 at gmail.com
Mon Jun 27 06:35:35 CEST 2016
Stefan,
I am sorry that I wasn’t more careful writing my question. I misrepresented my problem when I wrote “the sum of the probabilities in lpvec should be <=1, but it is not. The sum is something on the order of 1.48e-13.”
It is absolutely right that lpvec doesn’t contain probabilities, it contains log-probabilities that are negative numbers. Also, clearly, the value of 1.48e-13 is <=1!
What I meant is that I wanted to add probabilities in the log-probability space and that I expected a negative value as a result. Instead, I am getting a positive value.
You are also right that if I wanted to test for a positive association I should add the probabilities of k>=J to obtain a p-value. This is clear to me, but I want a metric of the strength of association between animals that takes higher values when the association is stronger. A final result in the log-probability space is useful because I want to use the strength of association to build networks of individuals.
Finally, your idea of using Fisher’s exact test is very appealing for the simplicity and availability of a working function in R, but I am having trouble interpreting the result, or the result is different from that of my formula. Here’s a simple example:
Say N=2, N1=1, N2=1, and J=1. In this case, if the individuals are completely independent of each other, I expect to see them together with a probability of 0.5. That is the output from my function, in probability space. If I run Fisher’s exact test on the corresponding contingency table, however, I get a p-value of 1. This is what I did:
fisher.test(rbind(c(J,N1-J),c(N2-J,N-N1-N2+J)))
Why should this be, when the Fisher’s exact test is giving me the probability of obtaining the observed arrangement under the null hypothesis of no association between the animals. Am I interpreting the output incorrectly. Thank you for answer, it is great to find an even simpler way of addressing the problem.
Best,
Gonçalo
> On Jun 26, 2016, at 9:35 AM, Stefan Evert <stefanML at collocations.de> wrote:
>
> Why do you want to do this? Why not simply use Fisher's exact test?
>
> N <- 2178
> N1 <- 165
> N2 <- 331
> J <- 97
> ct <- rbind(c(J, N1-J), c(N2-J, N-N1-N2+J))
> fisher.test(ct)
>
> Background explanation:
>
> - Your formula computes the log hypergeometric probability for a contingency table as ct above, but with k instead of J.
>
> - It does so in an unnecessarily complicated way: three terms would be enough (cf. the equation at http://www.collocations.de/AM/section3.html; with C1=N1, C2=N-C1, R1=N2).
>
> - If you want to test for a positive association between the two animals, you should be adding up the probabilities for k >= J to obtain a p-value, rather than k <= J (what would this sum of probabilities tell you?).
>
> - lpvec doesn't contain probabilities, but log probabilities. What sense would there be in adding those up? In any case, you should obtain a negative value because all the individual logs are negative.
>
> Best,
> Stefan
>
>
>
>> On 25 Jun 2016, at 16:13, Gonçalo Ferraz <gferraz29 at gmail.com> wrote:
>>
>> I am working on interactions between animals, studying whether animal 1 is attracted to animal 2 (or vice-versa). I looked for the two animals in N=2178 sampling occasions, finding animal 1 a total of N1=165 times, and animal 2 a total of N2=331 times. In J=97 occasions, I saw both animals at the same time.
>>
>> The more frequently I see the two animals in the same sampling occasion, the more I will believe that they are attracted to each other. Therefore, I want to calculate the probability of finding J<=97 when the two animals are moving around independently of each other. The higher this probability, the stronger the attraction.
>>
>> Following Veech (Journal of Biogeography 2014, 41: 1029-1035) I compute the log probability of obtaining a number n of encounters between animals as ‘lpn’ in the function lveech:
>>
>
More information about the R-help
mailing list