[R] Graphics and LaTeX documents with the same font [double-Y-axis graphs]

Michael Friendly friendly at yorku.ca
Sat Sep 29 19:41:40 CEST 2007


hadley wickham wrote:
> On 9/29/07, hadley wickham <h.wickham at gmail.com> wrote:
>> On 9/29/07, Michael Friendly <friendly at yorku.ca> wrote:
>>> hadley wickham wrote:
>>>> I was interested to see that you have code for drawing scatterplots
>>>> with multiple y-axes.  As far as I know the only legitimate use for a
>>>> double-axis plot is to confuse or mislead the reader (and this is not
>>>> a very ethical use case).  Perhaps you have a counter-example?
>>>>
>>>> Hadley
>>>>
>>> While it is true that the double-Y-axis graph is generally considered
>>> sinful, it can be used effectively to show the relation of two time
>>> series in ways that other graphs can't do as well.
>>>
>>> For one striking example,
>>> a political, presentation graphic, see:
>>> http://www.math.yorku.ca/SCS/Gallery/images/commonsenserevolution6.pdf
>>> described on my Graphical Excellence page,
>>> http://www.math.yorku.ca/SCS/Gallery/excellence.html
>>> I found it easy to excuse the sin by the 'wow effect' produced by the
>>> graph.
>> While I agree that the double y-axis plot can be used to compare two
>> time series, I'm not sure whether or not it actually is effective.
>> The appearance of the display is so critically dependent on the
>> relative scales of the axes, that it is easy to draw the wrong
>> conclusion.  Why not use a scatterplot or path plot (i.e. connect
>> subsequent observations with edges) if you want to understand the
>> relationship between two variables?
> 
> To compare the scatterplot vs double axis plot, I used graphclick
> (http://www.arizona-software.ch/graphclick/) to digitise the graphic,
> to get the following dataset:
> 
> csr <- structure(list(year = c(1985, 1986, 1987, 1988, 1989, 1990, 1991,
> 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
> 2003, 2004, 2005, 2006), deaths = c(1, 1, 7, 5, 12, 3, 7, 5,
> 4, 6, 8, 19, 26, 20, 42, 41, 45, 41, 27, 52, 67, 50), income = c(NA,
> 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
> 8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA
> )), .Names = c("year", "deaths", "income"), row.names = c(NA,
> -22L), class = "data.frame")
> 
> and produce the attached graphic (I'm not sure if the attachment will
> make it to r-help, but the code should be reproducible on any system):
> 
> library(ggplot2)
> ggplot(csr, aes(x=deaths, y=income)) +
> geom_path(colour="grey80") + geom_point()
> 
> # or without connecting lines
> ggplot(csr, aes(x=deaths, y=income)) + geom_point()
> 
> I find this graph much easier to interpret - one can see outliers, the
> suggestion of non-linearity etc.  It would also be easy to add the
> political party with colour or shape.
> 
> I'm not sure if it's a good idea to include the line or not - the
> gestalt principle of connectedness makes it very difficult to
> interpret the points as separate objects even when the line connecting
> them is so faint.
> 
> Hadley
> 

Thanks for trying this, Hadley, because the comparison
is instructive in terms of the difference between the
communication goals of analysis and presentation graphs.

Actually, one should regard income as the independent variable and
deaths as the response, so what you want is

ggplot(csr, aes(y=deaths, x=income)) +
  geom_path(colour="grey80") + geom_point()

but, instead of (or in addition to) geom_path, a bolder loess smooth
would show the trend better.
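
A minimal sketch of the loess suggestion, assuming ggplot2 is installed.
The csr values are copied from the listing quoted earlier; the helper
name csr_ok and the choice of a black smooth line are illustrative
assumptions, not part of the original code.

```r
library(ggplot2)

# Data digitised earlier in the thread (values copied from the csr listing).
csr <- data.frame(
  year   = 1985:2006,
  deaths = c(1, 1, 7, 5, 12, 3, 7, 5, 4, 6, 8, 19, 26, 20, 42, 41, 45, 41,
             27, 52, 67, 50),
  income = c(NA, 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
             8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA)
)

# Assumption: drop years with no income figure so that the smoother
# only sees complete (income, deaths) pairs.
csr_ok <- csr[!is.na(csr$income), ]

# Income on x (independent), deaths on y (response); the faint path is
# kept and a darker loess smooth is drawn over it.
p <- ggplot(csr_ok, aes(x = income, y = deaths)) +
  geom_path(colour = "grey80") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              colour = "black") +
  geom_point()

print(p)
```

Dropping the geom_path() layer gives the "instead of" variant; with so
few points the shape of the smooth will depend on the loess span.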

This does indeed show the inverse, non-linear relation between
welfare income and deaths more directly, along with a few outliers.
Good for an analysis graph, but it fails the Interocular Traumatic
Test for a presentation graph: the message should hit you between
the eyes.
Even with the use of color/shape to represent the party in power,
the stark message of the original is lost: When the Mike
Harris Conservatives came to power in Ontario in June 1995, they slashed
welfare payments, and the number of deaths of homeless people
increased dramatically. This trend continued under the McGuinty
Liberals, elected in Oct 2003.  It's particularly poignant that the
bars for deaths are made from the names of the homeless who died
(and sad to see the number of John/Jane Does among them).

To explore this further, I added a column for party to the
csr data frame, but the transitions between parties occurred
in different months, and one would need a separate data frame
to represent that precisely.

   year deaths income        party
1  1985      1     NA      Liberal
2  1986      1   8572      Liberal
3  1987      7     NA      Liberal
4  1988      5     NA      Liberal
5  1989     12   9264      Liberal
6  1990      3  10071          NDP
7  1991      7  10338          NDP
8  1992      5  10687          NDP
9  1993      4  10666          NDP
10 1994      6  10666          NDP
11 1995      8   9907 Conservative
12 1996     19   8141 Conservative
13 1997     26   8059 Conservative
14 1998     20   7997 Conservative
15 1999     42   7874 Conservative
16 2000     41   7648 Conservative
17 2001     45   7484 Conservative
18 2002     41   7319 Conservative
19 2003     27   7135      Liberal
20 2004     52   7135      Liberal
21 2005     67   7011      Liberal
22 2006     50     NA      Liberal
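
One way the party column and the separate data frame might be set up,
as a rough sketch assuming ggplot2 is installed. The party labels
follow the listing above. The terms data frame is hypothetical: it
holds only the two changes of government whose months are given in this
message (June 1995 and Oct 2003), and the day of the month is a
placeholder because only the month is stated.

```r
library(ggplot2)

# Data digitised earlier in the thread (values copied from the csr listing).
csr <- data.frame(
  year   = 1985:2006,
  deaths = c(1, 1, 7, 5, 12, 3, 7, 5, 4, 6, 8, 19, 26, 20, 42, 41, 45, 41,
             27, 52, 67, 50),
  income = c(NA, 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
             8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA)
)

# Party in power by calendar year, matching the listing:
# 5 Liberal years, 5 NDP, 8 Conservative, then 4 Liberal.
csr$party <- rep(c("Liberal", "NDP", "Conservative", "Liberal"),
                 times = c(5, 5, 8, 4))

# Hypothetical separate data frame for the changes of government.
# Only the month is known from the text; day 01 is a placeholder.
terms <- data.frame(
  party = c("Conservative", "Liberal"),
  start = as.Date(c("1995-06-01", "2003-10-01"))
)

# Fractional year of each transition, usable as an x position for a
# vertical marker on a yearly time axis.
terms$year_frac <- as.numeric(format(terms$start, "%Y")) +
  (as.numeric(format(terms$start, "%m")) - 1) / 12

# Scatterplot with party shown by colour (rows without income dropped).
p <- ggplot(csr[!is.na(csr$income), ],
            aes(x = income, y = deaths, colour = party)) +
  geom_point()

print(p)
```

Keeping the transitions in their own data frame avoids forcing a single
party label onto a year in which the government changed.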

-Michael

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA