[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution
Greg Snow
538280 at gmail.com
Sat Mar 3 22:14:04 CET 2012
To further explain. If you want contours of a bivariate normal, then
you want ellipses. The density for a bivariate normal (with 0
correlation to keep things simple, but the theory will extend to
correlated cases) is proportional to exp( -1/2 ( x1^2/v1 + x2^2/v2 )
so a contour of the distribution will be all points such that x1^2/v1
+ x2^2/v2 = c for some constant c (each c will give a different
contour), but that is the definition of an ellipse (well divide both
sides by c so that the right side is 1 to get the canonical form).
The ellipse function in the ellipse package chooses c from the chi
squared distribution (since if x1 and x2 are normally distributed with
mean 0 (or have the mean subtracted), then x1^2/v1 +x2^2/v2 is chi
squared distributed with 2 degrees of freedom.
So if you really want to you can try to approximate the contours in
some other way, but any decent approach will just converge to the
ellipse.
On Sat, Mar 3, 2012 at 1:26 PM, drflxms <drflxms at googlemail.com> wrote:
> Wow, David,
>
> thank you for these sources, which I just screened. bagplot looks most
> promising to me. I found it in the package ‘aplpack’ as well as in the R
> Graph Gallery
> http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112
>
> Ellipses are not exactly what I am heading for. I am looking for a 2D
> equivalent of plotting 1D standard deviation boundaries. In other words
> how to plot SD boundaries on a 2D scatter plot.
> So I started searching for contour/boundaries etc. instead of ellipse
> leading me to Pascal Hänggi:
> http://www.mail-archive.com/r-help@r-project.org/msg24013.html
>
> To describe it in an image: I want to cut the density mountain above the
> scatter plot (see demo(bivar) in the rgl package) in a way so that the
> part of the mountain, that covers 68% of the data on the x-y-plane below
> it (+-1 SD) is removed. Then I'd like to project the edge that results
> from the cut to the x-y-plane below the mountain. This should be the 2d
> equivalent of 1s SD boundaries.
>
> I think this might be achieved as well by Hänggis code as by the
> function of Forester. Unfortunately they result in slightly different
> boundaries which shouldn't be the case. And I did not figure out which
> one is correct if one is correct at all (!?).
>
> Can anyone explain the difference?
>
> I compared them with this code:
>
> # parameters:
> n<-100
>
> # generate samples:
> set.seed(138813)
> x<-rnorm(n); y<-rnorm(n)
> a<-list("x"=x,"y"=y) # input for Foresters function which is appended at
> the very end
>
> # estimate non-parameteric density surface via kernel smoothing
> library(MASS)
> den<-kde2d(x, y, n=n)
>
> z <- array()
> for (i in 1:n){
> z.x <- max(which(den$x < x[i]))
> z.y <- max(which(den$y < y[i]))
> z[i] <- den$z[z.x, z.y]
> }
>
> # store class/level borders of confidence interval in variables
> confidence.border <- quantile(z, probs=0.05, na.rm = TRUE)# 0.05
> corresponds to 0.95 in draw.contour
>
> plot(x,y)
> draw.contour(a, alpha=0.95)
> par(new=TRUE)
> contour(den, levels=confidence.border, col = "red", add = TRUE)
>
>
> ###########################################
> ## drawcontour.R
> ## Written by J.D. Forester, 17 March 2008
> ###########################################
>
> ## This function draws an approximate density contour based on
> empirical, bivariate data.
>
> ##change testit to FALSE if sourcing the file
> testit=TRUE
>
> draw.contour<-function(a,alpha=0.95,plot.dens=FALSE, line.width=2,
> line.type=1, limits=NULL, density.res=800,spline.smooth=-1,...){
> ## a is a list or matrix of x and y coordinates (e.g.,
> a=list("x"=rnorm(100),"y"=rnorm(100)))
> ## if a is a list or dataframe, the components must be labeled "x" and "y"
> ## if a is a matrix, the first column is assumed to be x, the second y
> ## alpha is the contour level desired
> ## if plot.dens==TRUE, then the joint density of x and y are plotted,
> ## otherwise the contour is added to the current plot.
> ## density.res controls the resolution of the density plot
>
> ## A key assumption of this function is that very little probability
> mass lies outside the limits of
> ## the x and y values in "a". This is likely reasonable if the number
> of observations in a is large.
>
> require(MASS)
> require(ks)
> if(length(line.width)!=length(alpha)){
> line.width <- rep(line.width[1],length(alpha))
> }
>
> if(length(line.type)!=length(alpha)){
> line.type <- rep(line.type[1],length(alpha))
> }
>
> if(is.matrix(a)){
> a=list("x"=a[,1],"y"=a[,2])
> }
> ##generate approximate density values
> if(is.null(limits)){
> limits=c(range(a$x),range(a$y))
> }
> f1<-kde2d(a$x,a$y,n=density.res,lims=limits)
>
> ##plot empirical density
> if(plot.dens) image(f1,...)
>
> if(is.null(dev.list())){
> ##ensure that there is a window in which to draw the contour
> plot(a,type="n",xlab="X",ylab="Y")
> }
>
> ##estimate critical contour value
> ## assume that density outside of plot is very small
>
> zdens <- rev(sort(f1$z))
> Czdens <- cumsum(zdens)
> Czdens <- (Czdens/Czdens[length(zdens)])
> for(cont.level in 1:length(alpha)){
> ##This loop allows for multiple contour levels
> crit.val <- zdens[max(which(Czdens<=alpha[cont.level]))]
>
> ##determine coordinates of critical contour
> b.full=contourLines(f1,levels=crit.val)
> for(c in 1:length(b.full)){
> ##This loop is used in case the density is multimodal or if the
> desired contour
> ## extends outside the plotting region
>
> b=list("x"=as.vector(unlist(b.full[[c]][2])),"y"=as.vector(unlist(b.full[[c]][3])))
>
> ##plot desired contour
> line.dat<-xspline(b,shape=spline.smooth,open=TRUE,draw=FALSE)
> lines(line.dat,lty=line.type[cont.level],lwd=line.width[cont.level])
> }
> }
> }
>
> ##############################
> ##Test the function
> ##############################
>
> ##generate data
> if(testit){
> n=10000
> a<-list("x"=rnorm(n,400,100),"y"=rweibull(n,2,100))
>
> draw.contour(a=a,alpha=c(0.95,0.5,0.05),line.width=c(2,1,2),line.type=c(1,2,1),plot.dens=TRUE,
> xlab="X", ylab="Y")
> }
>
> Am 03.03.12 19:56, schrieb David Winsemius:
>>
>> On Mar 3, 2012, at 12:34 PM, drflxms wrote:
>>
>>> Thanx a lot Greg for the hint and letting me not alone with this!
>>>
>>> Tried ellipse and it works well. But I am looking for something more
>>> precise. The ellipse fits the real border to the nearest possible
>>> ellipse. I want the "real" contour, if possible.
>>>
>>> Meanwhile I found an interesting function named draw.contour
>>> http://quantitative-ecology.blogspot.com/2008/03/plotting-contours.html
>>> provided by "Forester", an "Assistant Professor in the Department of
>>> Fisheries". - I love it how statistics brings people from completely
>>> different specialities together. - I am a neurologist.
>>>
>>> This function results not exactly in the same curve, I got with the code
>>> I posted last, but is in deed very close by (parallel). So I think both
>>> solutions are probably true and differ only because of rounding and
>>> method (..!?).
>>>
>>> Still I am not completely happy with
>>>
>>> 1. the numeric solution of setting to 68% to get 1SD. I'd prefer a
>>> symbolic, formula driven solution instead.
>>> 2. me not comprehending the code completely.
>>>
>>> While 2. will be hopefully solved soon by me delving into the code, 1
>>> remains...
>>>
>>
>> You could take a look at answers to similar questions on StackOverflow:
>>
>> http://stackoverflow.com/questions/6655268/ellipse-containing-percentage-of-given-points-in-r
>>
>>
>> http://stackoverflow.com/questions/7055848/plot-ellipse-bounding-a-percentage-of-points
>>
>>
>> http://stackoverflow.com/questions/7718569/how-to-plot-90-confidence-bands-with-locfit-in-r
>>
>>
>> http://stackoverflow.com/questions/7511889/ellipse-representing-horizontal-and-vertical-error-bars-with-r
>>
>>
>> Searching on "plot confidence ellipse" in the rhelp archives should be
>> similarly productive.
>>
>> There's also a non-parametric 2d boundary plotting method called a
>> bagplot and I think it's implemented in a package by the same name.
>>
>
--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
More information about the R-help
mailing list