[R] Joiny probability with R

(Ted Harding) ted.harding at wlandres.net
Sat Feb 19 17:30:13 CET 2011


On 19-Feb-11 14:48:53, David Winsemius wrote:
> On Feb 19, 2011, at 5:47 AM, danielepippo wrote:
>> Hi,
>>    I have two vector with the marginal distribution like this:
>>> a
>> [1] -0.419 -0.364 -0.159 -0.046 -0.010 -0.002  0.000  0.000  0.000
>>> b
>> [1] 0.125 0.260 0.270 0.187 0.097 0.041 0.014 0.004 0.001
>>
>> How can I calculate the joint distribution with R?
> 
> Marginal distributions do not uniquely determine the joint  
> distribution, Furthermore, the first one doesn't even look like a  
> distribution or even a density. Densities are positive and CDFs range  
> from 0 to 1. (The second on could be a discrete density for some set  
> of values.) So you need to explain where those numbers come from and  
> why you think we should apply sort of "distributional" assumptions  
> about them.
> --
> David Winsemius, MD
> West Hartford, CT

Following up on David's reply (I was in the midst of thinking about
this when he responded):

I strongly suspect that the values of 'a' have had their signs
changed. If we simply make them all positive:

  a <- c(0.419,0.364,0.159,0.046,0.010,0.002,0.000,0.000,0.000)

then now sum(a) = 1 and these can be probabilities. Also, with
the values given, sum(b) = 0.999; so (in the relatively least
disturbing way) I shall increase the largest value by 0.001:

  b <- c(0.125,0.260,0.271,0.187,0.097,0.041,0.014,0.004,0.001)

(0.270 -> 0.271). Now also sum(b) = 1.

Now, as a somewhat trivial example to illustrate the non-uniqueness
that David points out, I shall invent a very arbitrary set of
probabilities P = {P[i,j]} such that they have the given marginal
distributions.

Since we have the marginals given to 3 decimal places, now consider
allocating 1000 pairs of random numbers (x,y). Then we pick out the
smallest 419 of the x's to go into the bottom class of 'a'
(proportion = 419/1000 = 0.419), then the 364 next smallest for the
second class of 'a', and so on. And similarly for the numbers 'y'
to go into the classes of 'b'.

Then the inequalities which define these marginal divisions can be
applied jointly to produce the joint divisions. In the first instance
the results will be computed as counts

Therefore:

  X0 <- runif(1000) ; X<- sort(X0)
  Y0 <- runif(1000) ; Y<- sort(Y0)

  A <- cumsum(c(419,364,159, 46, 10,  2,  0,  0,  0))
  B <- cumsum(c(125,260,271,187, 97, 41, 14,  4,  1))

  Xdiv <- numeric(8)
  Ydiv <- numeric(8)

  for(i in (1:8)){
    Xdiv[i] <- 0.5*(max(X[1:A[i]])+min(X[(A[i]+1):A[i+1]]))
    Ydiv[i] <- 0.5*(max(Y[1:B[i]])+min(Y[(B[i]+1):B[i+1]]))
    Xdiv[is.na(Xdiv)] <- 1.0
    Ydiv[is.na(Ydiv)] <- 1.0
  }
  Xdiv <-c(0,Xdiv,1)
  Ydiv <- c(0,Ydiv,1)

  N <- matrix(0,nrow=9,ncol=9)

  for(i in (1:9)){
    for(j in (1:9)){
      N[i,j] <- sum((Xdiv[i]<X0)&(X0<=Xdiv[i+1])&
                    (Ydiv[j]<Y0)&(Y0<=Ydiv[j+1]))
    }
  }

N
rowSums(N)
colSums(N)

Here is one typical result (depending on the random numbers) for N:

  N
  #       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
  #  [1,]   54  113  112   76   36   20    7    1    0
  #  [2,]   44   90  101   75   35   13    3    2    1
  #  [3,]   20   40   42   28   21    6    2    0    0
  #  [4,]    5   13   13    6    5    2    1    1    0
  #  [5,]    2    3    3    1    0    0    1    0    0
  #  [6,]    0    1    0    1    0    0    0    0    0
  #  [7,]    0    0    0    0    0    0    0    0    0
  #  [8,]    0    0    0    0    0    0    0    0    0
  #  [9,]    0    0    0    0    0    0    0    0    0
  rowSums(N)
  # [1] 419 364 159  46  10   2   0   0   0 ## Agreeing with (A)
  colSums(N)
  # [1] 125 260 271 187  97  41  14   4   1 ## Agreeing with (B)

And here is another:

  #       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
  #  [1,]   52  109  113   89   35   15    3    3    0
  #  [2,]   45   95  108   50   40   16    9    0    1
  #  [3,]   22   44   33   33   16    9    1    1    0
  #  [4,]    4    9   14   13    4    1    1    0    0
  #  [5,]    2    3    3    1    1    0    0    0    0
  #  [6,]    0    0    0    1    1    0    0    0    0
  #  [7,]    0    0    0    0    0    0    0    0    0
  #  [8,]    0    0    0    0    0    0    0    0    0
  #  [9,]    0    0    0    0    0    0    0    0    0
  rowSums(N)
  # [1] 419 364 159  46  10   2   0   0   0
  colSums(N)
  # [1] 125 260 271 187  97  41  14   4   1

Then N/1000 gives the joint probability distribution, and
rowSums(N) and colSums(N) give the marginal distributions.



--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Feb-11                                       Time: 16:30:08
------------------------------ XFMail ------------------------------



More information about the R-help mailing list