[R] Histograms, density, and relative frequencies
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Wed Jul 7 20:36:05 CEST 2004
On Wed, 2004-07-07 at 18:29, Bret Collier wrote:
> R-users,
> I have been using R for about 1 year, and I have run across a
> couple of graphics problems that I am not quite sure how to address. I have
> read up on the email threads regarding the differences between density and
> relative frequencies (count/sum(count)) on the R list, and I am hoping that
> someone could provide me with some advice/comments concerning my
> approach. I will admit that some of the underlying mathematics of the
> density discussion are beyond my current understanding, but I am looking
> into it.
>
> I have a data set (600,000 obs) used to parameterize a probabilistic causal
> model where each obs is a population response for one of 2 classes (either
> regs1 or regs2). I have been attempting to create 1 marginal probability
> plot with 2 lines (one for each class). Using my rather rough code, I
> created a plot that seems to adhere to the commonly used (although from
> what I can understand wrong) relative frequency histogram approach.
>
> My rough code looks like this:
>
> bk <- c(0, .05, .1, .15, .2, .25,.3, .35, 1)
> par(mfrow=c(1, 1))
> fawn1 <- hist(MFAWNRESID[regs1], plot=F, breaks=bk)
> fawn2 <- hist(MFAWNRESID[regs2], plot=F, breaks=bk)
> count1 <- fawn1$counts/sum(fawn1$counts)
> count2 <- fawn2$counts/sum(fawn2$counts)
> b <- c(0, .05, .1, .15, .2, .25, .3, .35)
> plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40), pch=".", bty="l")
> lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
> lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
> axis(side=1, at=c(0, .05, .1, .15, .2, .25, .3, .35))
Have you considered density() and plot.density() by any chance?
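A minimal sketch of what that could look like, overlaying one kernel density estimate per class. The vectors x1 and x2 are simulated stand-ins for MFAWNRESID[regs1] and MFAWNRESID[regs2], which are not available here, and the from/to limits are only an assumption about the range of the data.

```r
# Hypothetical stand-ins for MFAWNRESID[regs1] and MFAWNRESID[regs2]
set.seed(1)
x1 <- rbeta(1000, 2, 15)
x2 <- rbeta(1000, 3, 15)

# Kernel density estimates, evaluated on [0, 1] (assumed data range)
d1 <- density(x1, from = 0, to = 1)
d2 <- density(x2, from = 0, to = 1)

# Plot the first class, then add the second as a dashed line
plot(d1, xlim = c(0, 0.5), ylim = range(0, d1$y, d2$y),
     lty = 1, lwd = 2, bty = "l", main = "")
lines(d2, lty = 2, lwd = 2)
```

Note that the y-axis here is a density, not a proportion per bin, so the values will not match count1 and count2.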
> Using the above, I get frequency values for regs1 that look like this
> (which is the same as output for my probabilistic model):
> > count1
> [1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
> [6] 4.698426e-03 4.488942e-04 4.322685e-05
I would tend to use the term proportion rather than frequency.
> First, count1 is the frequency of occurrence within range 0-0.05, but when
> plotted is the value at b=0 and does not really represent the range? Are
> there any suggestions on a technique to approach this?
You can plot it at the mid-points of the bins, like hist() does. fawn1$mids
would give you these values.
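As a rough sketch, using the same breaks as above but with simulated data (x is a hypothetical stand-in for MFAWNRESID[regs1]), the proportions can be plotted against the bin mid-points like this:

```r
bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)

# Hypothetical stand-in for MFAWNRESID[regs1]; values lie in (0, 1)
set.seed(1)
x <- rbeta(1000, 2, 15)

h <- hist(x, breaks = bk, plot = FALSE)
prop <- h$counts / sum(h$counts)   # proportion of observations per bin

# h$mids holds the centre of each bin: 0.025, 0.075, ..., 0.325, 0.675
plot(h$mids, prop, type = "b", xaxt = "n", bty = "l",
     xlab = "bin mid-point", ylab = "proportion")
axis(side = 1, at = bk)
```

Because the last bin runs from 0.35 to 1, its mid-point is 0.675, so that point sits well to the right of the others.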
> Next: Using the above code, the x-axis values end at 0.35, but the axis
> continues (because bk ends at 1)? While there is the chance of occurrence
> out past .35, it is low and I want to extend the lines to about .35 and
> clip the x-axis. But, I have been unable to figure out how to clip. Could
> someone point me in the correct direction?
In your plot() call, set xlim=c(0, 0.35). If by 'clipping' you mean
truncating the density, then you probably need to re-adjust your
proportions so that they sum to 1.
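A minimal sketch of the second option, using the count1 values quoted above. It assumes that truncating means dropping the last bin (0.35 to 1) and rescaling the remaining seven proportions; the names keep, renorm and mids are only for illustration.

```r
# Proportions for regs1 as quoted above (count1)
prop <- c(1.213378e-01, 3.454324e-01, 3.365343e-01, 1.580839e-01,
          3.342101e-02, 4.698426e-03, 4.488942e-04, 4.322685e-05)

# Drop the last bin (0.35 to 1) and rescale so the rest sum to 1
keep   <- prop[1:7]
renorm <- keep / sum(keep)

# Mid-points of the seven bins between 0 and 0.35
mids <- seq(0.025, 0.325, by = 0.05)

plot(mids, renorm, type = "l", lwd = 2, xlim = c(0, 0.35), bty = "l",
     xlab = "bin mid-point", ylab = "proportion (truncated at 0.35)")
```

Since the dropped bin holds only about 0.004% of the observations, the rescaled values differ very little from the originals.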