[R] contour for plotting confidence interval on scatter plot of bivariate normal distribution
drflxms
drflxms at googlemail.com
Sat Mar 3 15:54:30 CET 2012
Dear all,
I created a bivariate normal distribution:
set.seed(138813)
n<-100
x<-rnorm(n); y<-rnorm(n)
and plotted a scatterplot of it:
plot(x,y)
Now I'd like to add the 2D-standard deviation.
I found a thread regarding plotting arbitrary confidence boundaries from
Pascal Hänggi
http://www.mail-archive.com/r-help@r-project.org/msg24013.html
which cites the even older thread
http://tolstoy.newcastle.edu.au/R/help/03b/5384.html
As I am unfortunately only a very poor R programmer, the code of Pascal
Hänggi is a myth to me and I am not sure whether I was able to translate
the recommendation of Brain Ripley in the later thread (which provides
no code) into the the correct R code. Brain wrote:
You need a 2D density estimate (e.g. kde2d in MASS) then compute the
density values at the points and draw the contour of the density which
includes 95% of the points (at a level computed from the sorted values
via quantile()). [95% confidence interval was desired in thread instead
of standard deviation...]
So I tried this...
den<-kde2d(x, y, n=n) #as I chose n to be the same as during creating
the distributions x and y (see above), a z-value is assigned to every
combination of x and y.
# create a sorted vector of z-values (instead of the matrix stored
inside the den object
den.z <-sort(den$z)
# set desired confidence border to draw and store it in variable
confidence.border <- quantile(den.z, probs=0.6827, na.rm = TRUE)
# draw a line representing confidence.border on the existing scatterplot
par(new=TRUE)
contour(den, levels=confidence.border, col = "red", add = TRUE)
Unfortunately I doubt very much this is correct :( In fact I am sure
this is wrong, because the border for probs=0.05 is drawn outside the
values.... So please help and check.
Pascal Hänggis code seems to work, but I don't understand the magic he
does with
pp <- array()
for (i in 1:1000){
z.x <- max(which(den$x < x[i]))
z.y <- max(which(den$y < y[i]))
pp[i] <- den$z[z.x, z.y]
}
before doing the very same as I did above:
confidencebound <- quantile(pp, 0.05, na.rm = TRUE)
plot(x, y)
contour(den, levels = confidencebound, col = "red", add = TRUE)
My problems:
1.) setting probs=0.6827 is somehow a dirty trick which I can only use
by simply knowing that this is the percentage of values inside +-1sd
when a distribution is normal. Is there a way doing this with "native"
sd function?
sd(den.z) is not correct, as den.z is in contrast to x and y not normal
any more. So ecdf(den.z)(sd(den.z)) results in a percentile of 0.5644 in
this example instead of the desired 0.6827.
2.) I would like to have code that works with any desired confidence.
Unfortunately setting probs to the desired confidence would probably be
wrong (?!) as it relates to den.z instead of x and y, which are the
underlying distributions I am interested in. To put it short I want the
confidence of x/y and not of den.z.
I am really completely stuck. Please help me out of this! Felix
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Nachrichtenteil als Anhang
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120303/ba420fff/attachment.pl>
More information about the R-help
mailing list