[R] comparative density estimates

Michael Friendly friendly at yorku.ca
Fri Mar 24 15:04:50 CET 2006

Thanks, Achim

The cdplot is quite interesting too, though it answers a slightly
different question and seems to finesse the bandwidth question
(maybe not a bad thing).

Here's a similar plot, fleshed out as my others:

Where <- factor(c(rep("North America", length(sub1)),
                   rep("Europe", length(sub2))))
Year <- c(sub1, sub2)
cdplot(Where ~ Year, bw = "sj")

cdplot(Where ~ Year, bw = "sj", col=gray.colors(2,start=.7),
	main="Milestones: Place of development")
abline(v= ref, lty=3, col="blue")
laby <- 0.6 + 0.05 * c(0, 1, 2, 3, 5, 3, 5, 2)
text(labx, laby, labels=txt1, cex=1.2, xpd=TRUE)
rug(sub1, quiet=TRUE, col="red", side=3)
rug(sub2, quiet=TRUE)

This also solves the little problem I had with offsetting
the two rug plots (so as not to rely on color).

But I wonder why my main= title does not appear.

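For anyone without the milestones data, here is a minimal sketch of the same
kind of plot on simulated years. The values of sub1 and sub2 below are made up
(they are not the real milestone dates), and the reference lines and text
labels are left out because ref, labx and txt1 come from the original script.
It only illustrates the two rugs being separated by axis (side=3 for the top,
side=1 for the bottom) rather than by color alone.

```r
## Simulated stand-ins for the real data (assumption, not the milestone years)
set.seed(1)
sub1 <- runif(40, 1800, 1990)    # plays the role of North America
sub2 <- runif(120, 1500, 1990)   # plays the role of Europe

Where <- factor(c(rep("North America", length(sub1)),
                  rep("Europe", length(sub2))))
Year <- c(sub1, sub2)

## cdplot() invisibly returns the estimated conditional density function(s)
cd <- cdplot(Where ~ Year, bw = "sj", col = gray.colors(2, start = 0.7))

## One rug per group, on opposite axes
rug(sub1, side = 3, col = "red", quiet = TRUE)   # top axis
rug(sub2, side = 1, quiet = TRUE)                # bottom axis
```

The returned function can be evaluated directly, e.g. cd$Europe(1900) gives the
estimated European proportion at 1900 for the simulated data.
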

Achim Zeileis wrote:

> Michael,
> very nice and interesting plots!
> One alternative idea to compare the proportion of milestone items
> (that does not really answer the bandwidth question) in Europe and North
> America might be a conditional density plot. After running your R
> source code, you could do:
>   where <- factor(c(rep("North America", length(sub1)),
>                     rep("Europe", length(sub2)))) 
>   year <- c(sub1, sub2)
>   cdplot(where ~ year, bw = "sj")
> showing the decrease in the European proportion.
> Internally, this first computes the unconditional density as in
>   plot(density(year, bw = "sj"))
> and then the density for Europe with the same bandwidth.
> Best wishes,
> Z
> On Thu, 23 Mar 2006 14:25:53 -0500 Michael Friendly wrote:
>>I have two series of events over time and I want to construct a graph
>>of the relative frequency/density of these events that allows their
>>distributions to be sensibly compared.  The events are the milestone
>>items in my project on milestones in the history of data
>>visualization [1], and I want to compare trends in Europe vs. North
>>America.
>>I decided to use a graph of two overlaid density estimates with rug
>>plots, but then the question arises of how to choose the bandwidth
>>(BW) for the two series to allow them to be sensibly compared, because
>>the range of time and total frequency differ for the two series.  To
>>avoid clutter on this list, I've placed the data and R code
>>I have two versions of this graph, one selecting an optimal BW for
>>each separately and the other using the adjust= argument of density()
>>to approximately equate the BW to the value determined for the whole
>>series combined.  The two versions (done with SAS) are shown at
>>The densities in the first are roughly equivalent to the R code
>>d1 <- density(sub1, from=1500, to=1990, bw="sj", adjust=1)
>>d2 <- density(sub2, from=1500, to=1990, bw="sj", adjust=1)
>>the second to
>>d1 <- density(sub1, from=1500, to=1990, bw="sj", adjust=2.5)
>>d2 <- density(sub2, from=1500, to=1990, bw="sj", adjust=0.75)
>>The second graph seems to me to undersmooth the more extensive data
>>from Europe and oversmooth the data from North America.
>>- any comments or suggestions?
>>- are there other methods I should consider?
>>I did find overlap.Density() in the DAAG package, but perversely, it
>>uses a bw= argument to select a B&W/grayscale plot.
>>[1] http://www.math.yorku.ca/SCS/Gallery/milestone/
>>Michael Friendly     Email: friendly at yorku.ca 
>>Professor, Psychology Dept.
>>York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
>>4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
>>Toronto, ONT  M3J 1P3 CANADA

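Following Achim's description of what cdplot does internally, the conditional
proportion can also be sketched by hand: estimate the pooled density, estimate
the Europe density with the same bandwidth, and take the weighted ratio. The
sketch below again uses simulated years, and the names bw_all, d_all, d_eu,
p_eu, cond_eu, adj1 and adj2 are only illustrative. The last part shows one way
to get adjust= factors that bring each series' own SJ bandwidth to the pooled
value, which is the idea behind the second version of the graph.

```r
## Simulated stand-ins for the real data (assumption, not the milestone years)
set.seed(1)
sub1 <- runif(40, 1800, 1990)    # plays the role of North America
sub2 <- runif(120, 1500, 1990)   # plays the role of Europe
year <- c(sub1, sub2)

## One bandwidth, chosen from the pooled data, used for both curves
bw_all <- bw.SJ(year)
d_all <- density(year, bw = bw_all, from = 1500, to = 1990)
d_eu  <- density(sub2, bw = bw_all, from = 1500, to = 1990)

## P(Europe | year) = P(Europe) * f(year | Europe) / f(year)
p_eu <- length(sub2) / length(year)
cond_eu <- p_eu * d_eu$y / d_all$y

plot(d_all$x, cond_eu, type = "l", ylim = c(0, 1),
     xlab = "Year", ylab = "Estimated proportion: Europe")

## adjust= factors that rescale each series' own SJ bandwidth to bw_all
adj1 <- bw_all / bw.SJ(sub1)
adj2 <- bw_all / bw.SJ(sub2)
d1 <- density(sub1, from = 1500, to = 1990, bw = "sj", adjust = adj1)
d2 <- density(sub2, from = 1500, to = 1990, bw = "sj", adjust = adj2)
```

With these factors d1$bw and d2$bw both equal bw_all, so the two overlaid
curves are smoothed to the same degree.
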
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
