[R] Unexpected behavior from hist()
David Carlson
dcarlson at tamu.edu
Thu Jun 13 17:56:05 CEST 2013
Density means that the AREAS of the bars add to 1, not the HEIGHTS
of the bars. You probably have intervals that are narrower than 1,
so individual heights can exceed 1. E.g.:
> set.seed(42)
> x <- rpois(1000, 5)/100
> info <- hist(x, prob=TRUE)
> info
$breaks
[1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13
$counts
[1] 42 88 151 177 178 131 97 70 43 14 6 2 1
$density
[1] 4.2 8.8 15.1 17.7 17.8 13.1 9.7 7.0 4.3 1.4 0.6 0.2 0.1
$mids
 [1] 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.105 0.115
[13] 0.125
$xname
[1] "x"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
> diff(info$breaks)*info$density # Areas of each bar
 [1] 0.042 0.088 0.151 0.177 0.178 0.131 0.097 0.070 0.043 0.014 0.006 0.002
[13] 0.001
> sum(diff(info$breaks)*info$density) # Sum of the areas
[1] 1
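
As a rough sketch of where those densities come from (same simulated
data as above; the names width, relfreq and manual_density are only
illustrative): each density is the bin's relative frequency divided
by the bin width, so with bins 0.01 wide the heights come out 100
times larger than the proportions.

```r
set.seed(42)
x <- rpois(1000, 5) / 100
info <- hist(x, plot = FALSE)              # compute the bins without drawing

width   <- diff(info$breaks)               # bin widths
relfreq <- info$counts / sum(info$counts)  # proportion of observations per bin

# density = relative frequency / bin width
manual_density <- relfreq / width
all.equal(manual_density, info$density)

# A height exceeds 1 whenever a bin's proportion is larger than its width
max(info$density) > 1

# The proportions themselves (not the densities) are what sum to 1
sum(relfreq)
```

With bins wider than 1 the opposite happens: the densities are
smaller than the proportions.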
-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, June 13, 2013 10:36 AM
To: Mohamed Badawy
Cc: r-help at r-project.org
Subject: Re: [R] Unexpected behavior from hist()
Hi,
On Thu, Jun 13, 2013 at 11:13 AM, Mohamed Badawy
<mbadawy at pm-engr.com> wrote:
> Hi... I'm still a beginner in R. While doing some curve-fitting
> with a raw data set of length 22,000, here is what I had:
>
>
>
>> hist(y,col="red")
>
> gives me the frequency histogram, 13 total rectangles, highest is
> near 5000.
>
You don't provide a reproducible example, so here's some fake data:
somedata <- runif(1000)
> Now
>
>> hist(y,prob=TRUE,col="red",ylim=c(0,1.5))
>
> gives me the density (probability?) histogram, same number of
> rectangles, but the highest rectangle is obviously higher than 1,
> how can this be?!!!
Because you misread the help. Using freq=FALSE (equivalent to
prob=TRUE, which is a legacy option), you are getting:
freq: logical; if 'TRUE', the histogram graphic is a representation
      of frequencies, the 'counts' component of the result; if
      'FALSE', probability densities, component 'density', are
      plotted (so that the histogram has a total area of one).
      Defaults to 'TRUE' _if and only if_ 'breaks' are equidistant
      (and 'probability' is not specified).
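
To illustrate the "if and only if" clause, here is a small sketch
with deliberately unequal breaks (the break points are made up for
the example and are not from the original question). With unequal
bin widths the 'equidist' component is FALSE, so plotting would
default to densities, and the total area is still one.

```r
set.seed(1)
somedata <- runif(1000)

# Hypothetical, deliberately unequal breaks covering the range of the data
h <- hist(somedata, breaks = c(0, 0.1, 0.5, 1), plot = FALSE)

h$equidist                        # FALSE: the bins are not all the same width
sum(diff(h$breaks) * h$density)   # total area of the bars
```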
It sounds like what you actually want is:
somehist <- hist(somedata, plot=FALSE)
somehist$counts <- somehist$counts/sum(somehist$counts)
plot(somehist)
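
A slightly expanded sketch of the same idea, assuming the fake
uniform data from above: it checks that the rescaled heights sum
to 1 and relabels the y axis, since plot() would otherwise still
call it "Frequency". The seed and axis label are my own choices.

```r
set.seed(1)
somedata <- runif(1000)

somehist <- hist(somedata, plot = FALSE)
# Replace the counts with proportions of the total
somehist$counts <- somehist$counts / sum(somehist$counts)

sum(somehist$counts)        # the bar heights now sum to 1
max(somehist$counts) <= 1   # so no single bar can be taller than 1

# Bins are equidistant, so plot() draws the (rescaled) counts component
plot(somehist, ylab = "Relative frequency", col = "red")
```

Note that the 'density' component is left untouched here; only the
heights used for the frequency-style plot are rescaled.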
> P.S. I had to post this thread via email as it got rejected as I
> posted it from Nabble, reason was "Message rejected by filter rule
> match"
Nabble is not the R-help mailing list. Posting via email is the
correct thing to do.
Sarah
--
Sarah Goslee
http://www.functionaldiversity.org
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.