[R] Bivariate normal integral

Thu Apr 19 20:00:48 CEST 2012

On Apr 19, 2012, at 16:08 , juliane0212 wrote:

> hello,
> 
> I'm trying to improve the speed of my calculation but didn't get to a
> satisfying result.
> 
> It's about the numerical Integration of a bivariate normal distribution.
> 
> The code I'm currently using
> 
>          x <- 
> qnorm(seq(.Machine$double.xmin,c(1-2*.Machine$double.eps),by=0.01),
> mean=0,sd=1)
>          rho <- 0.5
> 
> integral <-   function(rho,x1){
>                           m1 <- length(x1)-1
>                           integral <- matrix(0,ncol=m1,nrow=m1)
>                           for (i in c(1:m1)){
>                                for (j in c(1:m1)){
>                                     integral[i,j] <-
> pmvnorm(lower=c(x1[i],x1[j]), upper=c( x1[c(i+1)], x1[c(j+1)]),
> corr=matrix(c(1,rho,rho,1),ncol=2))
>                                } }
>                            return(integral)}
> 
>     integral(rho,x)
> 
> I need all these values separated as in my calculation, but currently it's
> much too slow...
> 
> Has anyone an idea how to improve it?

Well, you could start by losing those superfluous calls to c()... Not going to help much except readability, though.

There's no easy solution, I think (unless someone happens to have done this before and implemented it in some package somewhere). pmvnorm() doesn't vectorize, so you're stuck with the loops one way or another. So you need to prepare for some serious work

- one idea is to pick apart the algorithm in pmvnorm down to the relevant .Fortran call and then implement a loop in C or Fortran that calls that code. 

- another idea is to rethink the situation. What kind of accuracy are you looking for? You are effectively differencing the 2-dimensional distribution function at a grid of points that are equidistant when transformed by pnorm(). Suppose you work out the density of (pnorm(X),pnorm(Y)), which is a distribution on the unit square. Then you could form an equidistant 100x100 grid and the values you need are the integrals over each grid cell.
However, the density at the midpoint of each grid cell should be a pretty good approximation to the cell integral if you just multiply by the cell area. If that isn't good enough, there are finite-difference formulas that could improve the approximation (usually by several orders of magnitude in dx and dy) using the nearby values on the grid. 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com