[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution

Sat Mar 3 18:34:32 CET 2012

Thanx a lot Greg for the hint and letting me not alone with this!

Tried ellipse and it works well. But I am looking for something more
precise. The ellipse fits the real border to the nearest possible
ellipse. I want the "real" contour, if possible.

Meanwhile I found an interesting function named draw.contour
http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
provided by "Forester", an "Assistant Professor in the Department of
Fisheries". - I love it how statistics brings people from completely
different specialities together. - I am a neurologist.

This function results not exactly in the same curve, I got with the code
I posted last, but is in deed very close by (parallel). So I think both
solutions are probably true and differ only because of rounding and
method (..!?).

Still I am not completely happy with

1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
symbolic, formula driven solution instead.
2. me not comprehending the code completely.

While 2. will be hopefully solved soon by me delving into the code, 1
remains...

Best, Felix

Am 03.03.12 17:28, schrieb Greg Snow:
> Look at the ellipse package (and the ellipse function in the package)
> for a simple way of showing a confidence region for bivariate data on
> a plot (a 68% confidence interval is about 1 SD if you just want to
> show 1 SD).
> 
> On Sat, Mar 3, 2012 at 7:54 AM, drflxms <drflxms at googlemail.com> wrote:
>> Dear all,
>>
>> I created a bivariate normal distribution:
>>
>> set.seed(138813)
>> n<-100
>> x<-rnorm(n); y<-rnorm(n)
>>
>> and plotted a scatterplot of it:
>>
>> plot(x,y)
>>
>> Now I'd like to add the 2D-standard deviation.
>>
>> I found a thread regarding plotting arbitrary confidence boundaries from
>> Pascal Hänggi
>> http://www.mail-archive.com/r-help@r-project.org/msg24013.html
>> which cites the even older thread
>> http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
>>
>> As I am unfortunately only a very poor R programmer, the code of Pascal
>> Hänggi is a myth to me and I am not sure whether I was able to translate
>> the recommendation of Brain Ripley in the later thread (which provides
>> no code) into the the correct R code. Brain wrote:
>>
>> You need a 2D density estimate (e.g. kde2d in MASS) then compute the
>> density values at the points and draw the contour of the density which
>> includes 95% of the points (at a level computed from the sorted values
>> via quantile()). [95% confidence interval was desired in thread instead
>> of standard deviation...]
>>
>> So I tried this...
>>
>> den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
>> the distributions x and y (see above), a z-value is assigned to every
>> combination of x and y.
>>
>> # create a sorted vector of z-values (instead of the matrix stored
>> inside the den object
>> den.z <-sort(den$z)
>>
>> # set desired confidence border to draw and store it in variable
>> confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE)
>>
>> # draw a line representing confidence.border on the existing scatterplot
>> par(new=TRUE)
>> contour(den, levels=confidence.border, col = "red", add = TRUE)
>>
>> Unfortunately I doubt very much this is correct :( In fact I am sure
>> this is wrong, because the border for probs=0.05 is drawn outside the
>> values.... So please help and check.
>> Pascal Hänggis code seems to work, but I don't understand the magic he
>> does with
>>
>> pp <- array()
>> for (i in 1:1000){
>>        z.x <- max(which(den$x < x[i]))
>>        z.y <- max(which(den$y < y[i]))
>>        pp[i] <- den$z[z.x, z.y]
>> }
>>
>> before doing the very same as I did above:
>>
>> confidencebound <- quantile(pp, 0.05, na.rm = TRUE)
>>
>> plot(x, y)
>> contour(den, levels = confidencebound, col = "red", add = TRUE)
>>
>>
>> My problems:
>>
>> 1.) setting probs=0.6827 is somehow a dirty trick which I can only use
>> by simply knowing that this is the percentage of values inside +-1sd
>> when a distribution is normal. Is there a way doing this with "native"
>> sd function?
>> sd(den.z) is not correct, as den.z is in contrast to x and y not normal
>> any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
>> this example instead of the desired 0.6827.
>>
>> 2.) I would like to have code that works with any desired confidence.
>> Unfortunately setting probs to the desired confidence would probably be
>> wrong (?!) as it relates to den.z instead of x and y, which are the
>> underlying distributions I am interested in. To put it short I want the
>> confidence of x/y and not of den.z.
>>
>>
>> I am really completely stuck. Please help me out of this! Felix
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 
>