[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Sat Mar 3 21:26:43 CET 2012

Wow, David,

thank you for these sources, which I just screened. bagplot looks most
promising to me. I found it in the package ‘aplpack’ as well as in the R
Graph Gallery
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112

Ellipses are not exactly what I am heading for. I am looking for a 2D
equivalent of plotting 1D standard deviation boundaries. In other words
how to plot SD boundaries on a 2D scatter plot.
So I started searching for contour/boundaries etc. instead of ellipse
leading me to Pascal Hänggi:
http://www.mail-archive.com/r-help@r-project.org/msg24013.html

To describe it in an image: I want to cut the density mountain above the
scatter plot (see demo(bivar) in the rgl package) in a way so that the
part of the mountain, that covers 68% of the data on the x-y-plane below
it (+-1 SD) is removed. Then I'd like to project the edge that results
from the cut to the x-y-plane below the mountain. This should be the 2d
equivalent of 1s SD boundaries.

I think this might be achieved as well by Hänggis code as by the
function of Forester. Unfortunately they result in slightly different
boundaries which shouldn't be the case. And I did not figure out which
one is correct if one is correct at all (!?).

Can anyone explain the difference?

I compared them with this code:

# parameters:
n<-100

# generate samples:
set.seed(138813)
x<-rnorm(n); y<-rnorm(n)
a<-list("x"=x,"y"=y) # input for Foresters function which is appended at
the very end

# estimate non-parameteric density surface via kernel smoothing
library(MASS)
den<-kde2d(x, y, n=n)

z <- array()
for (i in 1:n){
        z.x <- max(which(den$x < x[i]))
        z.y <- max(which(den$y < y[i]))
        z[i] <- den$z[z.x, z.y]
}

# store class/level borders of confidence interval in variables
confidence.border <- quantile(z, probs=0.05, na.rm = TRUE)# 0.05
corresponds to 0.95 in draw.contour

plot(x,y)
draw.contour(a, alpha=0.95)
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)

###########################################
## drawcontour.R
## Written by J.D. Forester, 17 March 2008
###########################################

## This function draws an approximate density contour based on
empirical, bivariate data.

##change testit to FALSE if sourcing the file
testit=TRUE

draw.contour<-function(a,alpha=0.95,plot.dens=FALSE, line.width=2,
line.type=1, limits=NULL, density.res=800,spline.smooth=-1,...){
  ## a is a list or matrix of x and y coordinates (e.g.,
a=list("x"=rnorm(100),"y"=rnorm(100)))
  ## if a is a list or dataframe, the components must be labeled "x" and "y"
  ## if a is a matrix, the first column is assumed to be x, the second y
  ## alpha is the contour level desired
  ## if plot.dens==TRUE, then the joint density of x and y are plotted,
  ## otherwise the contour is added to the current plot.
  ## density.res controls the resolution of the density plot

  ## A key assumption of this function is that very little probability
mass lies outside the limits of
  ## the x and y values in "a". This is likely reasonable if the number
of observations in a is large.

  require(MASS)
  require(ks)
  if(length(line.width)!=length(alpha)){
    line.width <- rep(line.width[1],length(alpha))
  }

  if(length(line.type)!=length(alpha)){
    line.type <- rep(line.type[1],length(alpha))
  }

  if(is.matrix(a)){
    a=list("x"=a[,1],"y"=a[,2])
  }
  ##generate approximate density values
  if(is.null(limits)){
    limits=c(range(a$x),range(a$y))
  }
  f1<-kde2d(a$x,a$y,n=density.res,lims=limits)

  ##plot empirical density
  if(plot.dens) image(f1,...)

  if(is.null(dev.list())){
    ##ensure that there is a window in which to draw the contour
    plot(a,type="n",xlab="X",ylab="Y")
  }

  ##estimate critical contour value
  ## assume that density outside of plot is very small

  zdens <- rev(sort(f1$z))
  Czdens <- cumsum(zdens)
  Czdens <- (Czdens/Czdens[length(zdens)])
  for(cont.level in 1:length(alpha)){
    ##This loop allows for multiple contour levels
    crit.val <- zdens[max(which(Czdens<=alpha[cont.level]))]

    ##determine coordinates of critical contour
    b.full=contourLines(f1,levels=crit.val)
    for(c in 1:length(b.full)){
      ##This loop is used in case the density is multimodal or if the
desired contour
      ##  extends outside the plotting region

b=list("x"=as.vector(unlist(b.full[[c]][2])),"y"=as.vector(unlist(b.full[[c]][3])))

      ##plot desired contour
      line.dat<-xspline(b,shape=spline.smooth,open=TRUE,draw=FALSE)
      lines(line.dat,lty=line.type[cont.level],lwd=line.width[cont.level])
    }
  }
}

##############################
##Test the function
##############################

##generate data
if(testit){
  n=10000
  a<-list("x"=rnorm(n,400,100),"y"=rweibull(n,2,100))

draw.contour(a=a,alpha=c(0.95,0.5,0.05),line.width=c(2,1,2),line.type=c(1,2,1),plot.dens=TRUE,
xlab="X", ylab="Y")
}

Am 03.03.12 19:56, schrieb David Winsemius:
> 
> On Mar 3, 2012, at 12:34 PM, drflxms wrote:
> 
>> Thanx a lot Greg for the hint and letting me not alone with this!
>>
>> Tried ellipse and it works well. But I am looking for something more
>> precise. The ellipse fits the real border to the nearest possible
>> ellipse. I want the "real" contour, if possible.
>>
>> Meanwhile I found an interesting function named draw.contour
>> http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
>> provided by "Forester", an "Assistant Professor in the Department of
>> Fisheries". - I love it how statistics brings people from completely
>> different specialities together. - I am a neurologist.
>>
>> This function results not exactly in the same curve, I got with the code
>> I posted last, but is in deed very close by (parallel). So I think both
>> solutions are probably true and differ only because of rounding and
>> method (..!?).
>>
>> Still I am not completely happy with
>>
>> 1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
>> symbolic, formula driven solution instead.
>> 2. me not comprehending the code completely.
>>
>> While 2. will be hopefully solved soon by me delving into the code, 1
>> remains...
>>
> 
> You could take a look at answers to similar questions on StackOverflow:
> 
> http://stackoverflow.com/questions/6655268/ellipse-containing-percentage-of-given-points-in-r
> 
> 
> http://stackoverflow.com/questions/7055848/plot-ellipse-bounding-a-percentage-of-points
> 
> 
> http://stackoverflow.com/questions/7718569/how-to-plot-90-confidence-bands-with-locfit-in-r
> 
> 
> http://stackoverflow.com/questions/7511889/ellipse-representing-horizontal-and-vertical-error-bars-with-r
> 
> 
> Searching on "plot confidence ellipse" in the rhelp archives should be
> similarly productive.
> 
> There's also a non-parametric 2d boundary plotting method called a
> bagplot and I think it's implemented in a package by the same name.
>