[R] comparative density estimates

Michael Friendly friendly at yorku.ca
Thu Mar 23 20:25:53 CET 2006

I have two series of events over time, and I want to construct a graph of
the relative frequency/density of these events that allows their
distributions to be sensibly compared.  The events are the milestone items
in my project on milestones in the history of data visualization [1], and
I want to compare trends in Europe vs. North America.

I decided to use a graph of two overlaid density estimates with rug plots,
but then the question arises of how to choose the bandwidth (BW) for the
two series so that they can be sensibly compared, because the range of time
and the total frequency differ for the two series.  To avoid clutter on
this list, I've placed the data and R code

I have two versions of this graph, one selecting an optimal BW for each
series and the other using the adjust= argument of density() to
approximately equate the BW to the value determined for the whole series
combined.  The two (done with SAS) are shown at



The densities in the first are roughly equivalent to the R code
d1 <- density(sub1, from=1500, to=1990, bw="sj", adjust=1)
d2 <- density(sub2, from=1500, to=1990, bw="sj", adjust=1)

the second to
d1 <- density(sub1, from=1500, to=1990, bw="sj", adjust=2.5)
d2 <- density(sub2, from=1500, to=1990, bw="sj", adjust=0.75)
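
As a minimal sketch of the second approach, the common bandwidth could be
computed from the pooled data and passed to density() as a number, rather
than tuned by hand through adjust=.  The sub1 and sub2 vectors below are
simulated placeholders, not the milestones data, and the pooled
Sheather-Jones bandwidth is only one possible choice of common value.

```r
# Simulated stand-ins for the two series (hypothetical, NOT the milestones data)
set.seed(1)
sub1 <- runif(150, 1500, 1990)   # longer, more extensive series
sub2 <- runif(60, 1800, 1990)    # shorter, more recent series

# One Sheather-Jones bandwidth chosen from the pooled data ...
bw_common <- bw.SJ(c(sub1, sub2))

# ... and applied unchanged to both series
d1 <- density(sub1, bw = bw_common, from = 1500, to = 1990)
d2 <- density(sub2, bw = bw_common, from = 1500, to = 1990)

# Implied adjust= factors relative to each series' own SJ bandwidth
adj1 <- bw_common / bw.SJ(sub1)
adj2 <- bw_common / bw.SJ(sub2)

# Overlay the two estimates, with a rug for each series
plot(d1, ylim = range(d1$y, d2$y), main = "", xlab = "Year")
lines(d2, lty = 2)
rug(sub1)
rug(sub2, side = 3)
```

Here adj1 and adj2 play the role of the adjust= values in the second pair
of calls above: each is the factor that rescales a series' own bandwidth
to the pooled one.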

The second graph seems to me to undersmooth the more extensive data
from Europe and oversmooth the data from North America.

- any comments or suggestions?
- are there other methods I should consider?

I did find overlap.Density() in the DAAG package, but perversely, it uses
a bw= argument to select a B&W/grayscale plot.


[1] http://www.math.yorku.ca/SCS/Gallery/milestone/

Michael Friendly     Email: friendly at yorku.ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
