[R] Joiny probability with R
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Feb 19 17:54:02 CET 2011
On Sat, Feb 19, 2011 at 11:30 AM, Ted Harding <ted.harding at wlandres.net> wrote:
> On 19-Feb-11 14:48:53, David Winsemius wrote:
>> On Feb 19, 2011, at 5:47 AM, danielepippo wrote:
>>> Hi,
>>> I have two vector with the marginal distribution like this:
>>>> a
>>> [1] -0.419 -0.364 -0.159 -0.046 -0.010 -0.002 0.000 0.000 0.000
>>>> b
>>> [1] 0.125 0.260 0.270 0.187 0.097 0.041 0.014 0.004 0.001
>>>
>>> How can I calculate the joint distribution with R?
>>
>> Marginal distributions do not uniquely determine the joint
>> distribution, Furthermore, the first one doesn't even look like a
>> distribution or even a density. Densities are positive and CDFs range
>> from 0 to 1. (The second on could be a discrete density for some set
>> of values.) So you need to explain where those numbers come from and
>> why you think we should apply sort of "distributional" assumptions
>> about them.
>> --
>> David Winsemius, MD
>> West Hartford, CT
>
> Following up on David's reply (I was in the midst of thinking about
> this when he responded):
>
> I strongly suspect that the values of 'a' have had their signs
> changed. If we simply make them all positive:
>
> a <- c(0.419,0.364,0.159,0.046,0.010,0.002,0.000,0.000,0.000)
>
> then now sum(a) = 1 and these can be probabilities. Also, with
> the values given, sum(b) = 0.999; so (in the relatively least
> disturbing way) I shall increase the largest value by 0.001:
>
> b <- c(0.125,0.260,0.271,0.187,0.097,0.041,0.014,0.004,0.001)
>
> (0.270 -> 0.271). Now also sum(b) = 1.
>
> Now, as a somewhat trivial example to illustrate the non-uniqueness
> that David points out, I shall invent a very arbitrary set of
> probabilities P = {P[i,j]} such that they have the given marginal
> distributions.
>
> Since we have the marginals given to 3 decimal places, now consider
> allocating 1000 pairs of random numbers (x,y). Then we pick out the
> smallest 419 of the x's to go into the bottom class of 'a'
> (proportion = 419/1000 = 0.419), then the 364 next smallest for the
> second class of 'a', and so on. And similarly for the numbers 'y'
> to go into the classes of 'b'.
>
> Then the inequalities which define these marginal divisions can be
> applied jointly to produce the joint divisions. In the first instance
> the results will be computed as counts
>
> Therefore:
>
> X0 <- runif(1000) ; X<- sort(X0)
> Y0 <- runif(1000) ; Y<- sort(Y0)
>
> A <- cumsum(c(419,364,159, 46, 10, 2, 0, 0, 0))
> B <- cumsum(c(125,260,271,187, 97, 41, 14, 4, 1))
>
> Xdiv <- numeric(8)
> Ydiv <- numeric(8)
>
> for(i in (1:8)){
> Xdiv[i] <- 0.5*(max(X[1:A[i]])+min(X[(A[i]+1):A[i+1]]))
> Ydiv[i] <- 0.5*(max(Y[1:B[i]])+min(Y[(B[i]+1):B[i+1]]))
> Xdiv[is.na(Xdiv)] <- 1.0
> Ydiv[is.na(Ydiv)] <- 1.0
> }
> Xdiv <-c(0,Xdiv,1)
> Ydiv <- c(0,Ydiv,1)
>
> N <- matrix(0,nrow=9,ncol=9)
>
> for(i in (1:9)){
> for(j in (1:9)){
> N[i,j] <- sum((Xdiv[i]<X0)&(X0<=Xdiv[i+1])&
> (Ydiv[j]<Y0)&(Y0<=Ydiv[j+1]))
> }
> }
>
> N
> rowSums(N)
> colSums(N)
>
> Here is one typical result (depending on the random numbers) for N:
>
> N
> # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
> # [1,] 54 113 112 76 36 20 7 1 0
> # [2,] 44 90 101 75 35 13 3 2 1
> # [3,] 20 40 42 28 21 6 2 0 0
> # [4,] 5 13 13 6 5 2 1 1 0
> # [5,] 2 3 3 1 0 0 1 0 0
> # [6,] 0 1 0 1 0 0 0 0 0
> # [7,] 0 0 0 0 0 0 0 0 0
> # [8,] 0 0 0 0 0 0 0 0 0
> # [9,] 0 0 0 0 0 0 0 0 0
> rowSums(N)
> # [1] 419 364 159 46 10 2 0 0 0 ## Agreeing with (A)
> colSums(N)
> # [1] 125 260 271 187 97 41 14 4 1 ## Agreeing with (B)
>
> And here is another:
>
> # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
> # [1,] 52 109 113 89 35 15 3 3 0
> # [2,] 45 95 108 50 40 16 9 0 1
> # [3,] 22 44 33 33 16 9 1 1 0
> # [4,] 4 9 14 13 4 1 1 0 0
> # [5,] 2 3 3 1 1 0 0 0 0
> # [6,] 0 0 0 1 1 0 0 0 0
> # [7,] 0 0 0 0 0 0 0 0 0
> # [8,] 0 0 0 0 0 0 0 0 0
> # [9,] 0 0 0 0 0 0 0 0 0
> rowSums(N)
> # [1] 419 364 159 46 10 2 0 0 0
> colSums(N)
> # [1] 125 260 271 187 97 41 14 4 1
>
> Then N/1000 gives the joint probability distribution, and
> rowSums(N) and colSums(N) give the marginal distributions.
Here is another way to demonstrate two different joint probability
distributions with the same marginals using R:
set.seed(123)
a <- c(-0.419, -0.364, -0.159, -0.046, -0.01, -0.002, 0, 0, 0)
b <- c(0.125, 0.26, 0.27, 0.187, 0.097, 0.041, 0.014, 0.004, 0.001)
# tabs will be a list of two tables with same marginals
tabs <- lapply(r2dtable(2, 1000 * a / sum(a), 1000 * b / sum(b)), "/",
1000); tabs
# rowtotals are the same for both tables
lapply(tabs, rowSums)
# column totals are the same for both tables
lapply(tabs, colSums)
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
More information about the R-help
mailing list