[R] Joiny probability with R

Sat Feb 19 17:54:02 CET 2011

On Sat, Feb 19, 2011 at 11:30 AM, Ted Harding <ted.harding at wlandres.net> wrote:
> On 19-Feb-11 14:48:53, David Winsemius wrote:
>> On Feb 19, 2011, at 5:47 AM, danielepippo wrote:
>>> Hi,
>>>    I have two vector with the marginal distribution like this:
>>>> a
>>> [1] -0.419 -0.364 -0.159 -0.046 -0.010 -0.002  0.000  0.000  0.000
>>>> b
>>> [1] 0.125 0.260 0.270 0.187 0.097 0.041 0.014 0.004 0.001
>>>
>>> How can I calculate the joint distribution with R?
>>
>> Marginal distributions do not uniquely determine the joint
>> distribution, Furthermore, the first one doesn't even look like a
>> distribution or even a density. Densities are positive and CDFs range
>> from 0 to 1. (The second on could be a discrete density for some set
>> of values.) So you need to explain where those numbers come from and
>> why you think we should apply sort of "distributional" assumptions
>> about them.
>> --
>> David Winsemius, MD
>> West Hartford, CT
>
> Following up on David's reply (I was in the midst of thinking about
> this when he responded):
>
> I strongly suspect that the values of 'a' have had their signs
> changed. If we simply make them all positive:
>
>  a <- c(0.419,0.364,0.159,0.046,0.010,0.002,0.000,0.000,0.000)
>
> then now sum(a) = 1 and these can be probabilities. Also, with
> the values given, sum(b) = 0.999; so (in the relatively least
> disturbing way) I shall increase the largest value by 0.001:
>
>  b <- c(0.125,0.260,0.271,0.187,0.097,0.041,0.014,0.004,0.001)
>
> (0.270 -> 0.271). Now also sum(b) = 1.
>
> Now, as a somewhat trivial example to illustrate the non-uniqueness
> that David points out, I shall invent a very arbitrary set of
> probabilities P = {P[i,j]} such that they have the given marginal
> distributions.
>
> Since we have the marginals given to 3 decimal places, now consider
> allocating 1000 pairs of random numbers (x,y). Then we pick out the
> smallest 419 of the x's to go into the bottom class of 'a'
> (proportion = 419/1000 = 0.419), then the 364 next smallest for the
> second class of 'a', and so on. And similarly for the numbers 'y'
> to go into the classes of 'b'.
>
> Then the inequalities which define these marginal divisions can be
> applied jointly to produce the joint divisions. In the first instance
> the results will be computed as counts
>
> Therefore:
>
>  X0 <- runif(1000) ; X<- sort(X0)
>  Y0 <- runif(1000) ; Y<- sort(Y0)
>
>  A <- cumsum(c(419,364,159, 46, 10,  2,  0,  0,  0))
>  B <- cumsum(c(125,260,271,187, 97, 41, 14,  4,  1))
>
>  Xdiv <- numeric(8)
>  Ydiv <- numeric(8)
>
>  for(i in (1:8)){
>    Xdiv[i] <- 0.5*(max(X[1:A[i]])+min(X[(A[i]+1):A[i+1]]))
>    Ydiv[i] <- 0.5*(max(Y[1:B[i]])+min(Y[(B[i]+1):B[i+1]]))
>    Xdiv[is.na(Xdiv)] <- 1.0
>    Ydiv[is.na(Ydiv)] <- 1.0
>  }
>  Xdiv <-c(0,Xdiv,1)
>  Ydiv <- c(0,Ydiv,1)
>
>  N <- matrix(0,nrow=9,ncol=9)
>
>  for(i in (1:9)){
>    for(j in (1:9)){
>      N[i,j] <- sum((Xdiv[i]<X0)&(X0<=Xdiv[i+1])&
>                    (Ydiv[j]<Y0)&(Y0<=Ydiv[j+1]))
>    }
>  }
>
> N
> rowSums(N)
> colSums(N)
>
> Here is one typical result (depending on the random numbers) for N:
>
>  N
>  #       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
>  #  [1,]   54  113  112   76   36   20    7    1    0
>  #  [2,]   44   90  101   75   35   13    3    2    1
>  #  [3,]   20   40   42   28   21    6    2    0    0
>  #  [4,]    5   13   13    6    5    2    1    1    0
>  #  [5,]    2    3    3    1    0    0    1    0    0
>  #  [6,]    0    1    0    1    0    0    0    0    0
>  #  [7,]    0    0    0    0    0    0    0    0    0
>  #  [8,]    0    0    0    0    0    0    0    0    0
>  #  [9,]    0    0    0    0    0    0    0    0    0
>  rowSums(N)
>  # [1] 419 364 159  46  10   2   0   0   0 ## Agreeing with (A)
>  colSums(N)
>  # [1] 125 260 271 187  97  41  14   4   1 ## Agreeing with (B)
>
> And here is another:
>
>  #       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
>  #  [1,]   52  109  113   89   35   15    3    3    0
>  #  [2,]   45   95  108   50   40   16    9    0    1
>  #  [3,]   22   44   33   33   16    9    1    1    0
>  #  [4,]    4    9   14   13    4    1    1    0    0
>  #  [5,]    2    3    3    1    1    0    0    0    0
>  #  [6,]    0    0    0    1    1    0    0    0    0
>  #  [7,]    0    0    0    0    0    0    0    0    0
>  #  [8,]    0    0    0    0    0    0    0    0    0
>  #  [9,]    0    0    0    0    0    0    0    0    0
>  rowSums(N)
>  # [1] 419 364 159  46  10   2   0   0   0
>  colSums(N)
>  # [1] 125 260 271 187  97  41  14   4   1
>
> Then N/1000 gives the joint probability distribution, and
> rowSums(N) and colSums(N) give the marginal distributions.

Here is another way to demonstrate two different joint probability
distributions with the same marginals using R:

set.seed(123)

a <- c(-0.419, -0.364, -0.159, -0.046, -0.01, -0.002, 0, 0, 0)
b <- c(0.125, 0.26, 0.27, 0.187, 0.097, 0.041, 0.014, 0.004, 0.001)

# tabs will be a list of two tables with same marginals
tabs <- lapply(r2dtable(2, 1000 * a / sum(a), 1000 * b / sum(b)), "/",
1000); tabs

# rowtotals are the same for both tables
lapply(tabs, rowSums)

# column totals are the same for both tables
lapply(tabs, colSums)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com