[R] ggplot2 histograms... a subtle error found
Brian Diggs
diggsb at ohsu.edu
Mon Aug 2 22:41:39 CEST 2010
On 7/28/2010 5:04 PM, Mike Williamson wrote:
> Hello all,
>
> I have a peculiar and particular bug that I stumbled across with
> ggplot2. I cannot seem to replicate it with anything other than my specific
> data set.
>
> Here is the problem:
>
> - when I try to plot a histogram, allowing for ggplot2 to decide the
> binwidths itself, I get the following error:
> - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
> adjust this.
> - Error: position_stack requires constant width
>
> My code is simply:
>
> ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()
>
> or
>
> qplot(myDataSet$myVarOI)
>
> If I go ahead and set the binwidth to some value, then the plot can be
> made without problems.
>
> The problem is with the specific data that it is trying to plot. I
> suspect it is trying to create bins of different sizes, from the error
> code. Here are the basics of my data set:
>
> - length: 1936 entries
> - 1906 unique entries
> - stats:
>        Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>   3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10
>
>
>
> I cannot imagine this can be solved without my specifically uploading
> the actual data. If I simply attach it, will it be received by r-help?
> Hadley, if you're interested, would you like me to send you the data
> directly to you?
I can reproduce it with generic data. The problem is one of floating-point
precision (rounding error, rather than underflow in the strict sense).
ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) + geom_histogram()
#stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#Error during wrapup: position_stack requires constant width
When ggplot2 verifies the widths before stacking (the default position
for histograms), it computes the widths from the minimum and maximum
values for each bin. However, because the width of the bins (0.28) is
much smaller than the magnitude of the edges (6.8e+09), the subtraction
loses precision to rounding and the widths don't all come out equal:
# in ggplot2::collide
with(data, xmax - xmin)
# [1] 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988
#[10] 0.2799988 0.2800007 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988 0.2800007
#[19] 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988
#[28] 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988
unique(with(data, xmax - xmin))
#[1] 0.2799988 0.2800007
So ggplot2 concludes the widths are not equal and gives the error you
see. I don't think this is a bug; you are operating at the edge of what
floating-point precision will allow, and seem to have crossed that
edge in this case. (I suppose ggplot2 could carry the information that
the bins were created with equal widths and then not have to check that
later, but that seems like unnecessary overhead.)
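The effect can be seen without ggplot2 at all. The following is only a
rough sketch in base R, not ggplot2's internal code; the names offset,
width, xmin, xmax and the tolerance rule are made up for illustration.
Doubles near 6.8e+09 are spaced about 9.5e-07 apart, so each bin edge is
rounded to that grid and the differences between edges cannot all agree.

```r
# Illustration only: equal-width bins whose edges sit near 6.8e+09.
# This is NOT ggplot2's code; every name below is hypothetical.
offset <- 6.8e+09
width  <- 0.28
k      <- 0:31

xmin <- offset + width * k        # left bin edges
xmax <- offset + width * (k + 1)  # right bin edges

# Widths recomputed from the rounded edges
widths <- xmax - xmin
print(unique(widths), digits = 10)

# An exact comparison sees more than one distinct width
n_distinct <- length(unique(widths))

# A comparison that allows for rounding at the magnitude of the edges
# (the factor of 2 is an arbitrary safety margin, an assumption here)
tol <- 2 * .Machine$double.eps * max(abs(xmax))
same_within_tol <- diff(range(widths)) <= tol
```

With these inputs the exact check finds several widths while the
tolerant check treats them as equal, which is the distinction behind the
error message.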
There is a workaround, though.
ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) +
geom_histogram(position="identity")
gives what you want and does not require the widths to be equal. If you
had more than one group, position="stack" and position="identity" would
be quite different, but they are equivalent for one group, so you can
get away with switching one for the other in this case.
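To see why the two positions agree for a single group, here is a small
base R sketch of what stacking does to bar heights. The helpers
stack_heights and identity_heights are hypothetical and written only for
this illustration; they are not how ggplot2 implements its positions.

```r
# Hypothetical helpers, for illustration only (not ggplot2's code).
# Each row is one bar: 'bin' says which bin it sits in, 'count' is its height.
stack_heights <- function(bin, count) {
  ymax <- ave(count, bin, FUN = cumsum)  # running total within each bin
  data.frame(bin = bin, ymin = ymax - count, ymax = ymax)
}
identity_heights <- function(bin, count) {
  data.frame(bin = bin, ymin = 0, ymax = count)  # every bar starts at zero
}

# One group: one bar per bin, so nothing sits underneath any bar
bin1   <- c(1, 2, 3)
count1 <- c(2, 5, 1)
one_group_same <- isTRUE(all.equal(stack_heights(bin1, count1),
                                   identity_heights(bin1, count1)))

# Two groups: the second group's bars are lifted by the first group's counts
bin2   <- c(1, 2, 3, 1, 2, 3)
count2 <- c(2, 5, 1, 4, 0, 3)
two_groups_same <- isTRUE(all.equal(stack_heights(bin2, count2),
                                    identity_heights(bin2, count2)))
```

Here one_group_same is TRUE and two_groups_same is FALSE, which is why
the swap is safe only when there is a single group.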
> Regards,
> Mike
--
Brian Diggs
Senior Research Associate, Department of Surgery, Oregon Health &
Science University