[R] Thinking about using two y-scales on your plot?

Richard Cotton Richard.Cotton at hsl.gov.uk
Mon Apr 7 12:24:23 CEST 2008

```
thegeologician wrote:
>
> A plot of the actual temperature during a year (or thousands of years,
> as people in palaeoclimate-studies are rather used to) is just so much
> more intuitive, than some correlation-coefficients or such. I know I'm
> largely speaking to statisticians in this forum, but in Earth Sciences,
> most people aren't... I see the use of correlation coefficients and
> -plots in proofing that an apparent correlation is "real", but the first
> question upon presenting any statistic analysis is always "What does the
> DATA look like?".
>

Agreed - the data itself is much easier to get to grips with than
correlation coefficients.

thegeologician wrote:
>
> Of course, these plots could be plotted separately with a common x-axis,
> it's just a matter of saving space and of being used to that kind of
> graph. I can't imagine anyone being falsely lead to a thought like "oh
> gosh, the temperature is much higher/bigger/more than the
> precipitation!" - that makes no sense. I do see the point in graphs
> where values are plotted together, whose possible interaction with each
> other might lead to wrong conclusions. Then, it might not be obvious
> that one is drawing a senseless conclusion.
>

I think in the temperature/precipitation case, whether to draw multiple
y-axes or not is a fairly minor decision.  Few readers would seriously
assume that temperature and precipitation values can be compared directly.
The point is that the plot makes it look that way - so the reader has to
engage their brain to tell themselves "ignore the obvious comparisons between
the lines that I perceive".  This is clearly not a desirable trait in a graph.

I've concocted an example to show that it's possible to mislead unwary
readers by changing the scale of one of the y-axes.

This uses the nottem temperature dataset built into R, and some made-up
precipitation data.

#Generate some precipitation data
precipitation <- 30 + runif(240, 5, 10) * sin(seq(pi/6, 40*pi, pi/6) + pi/4) +
  rnorm(240, 0, 3)
pts <- ts(precipitation, start=1920, frequency=12)

#First plot, correlation is apparent
plot(nottem)
par(new=TRUE)
plot(pts, axes=FALSE, col="blue", ylab="")
Axis(side=4)

#Second plot, scale changing makes it appear that precipitation does not
#vary with temperature.
plot(nottem)
par(new=TRUE)
plot(pts, axes=FALSE, col="blue", ylab="", ylim=c(0,10000))
Axis(side=4)

I'm willing to concede that the attempt at misleading the audience is pretty
artificial, and not very subtle.  A more dangerous case would be the
opposite situation - making a correlation become visible on a plot where
none really exists, by fiddling with axis transformations (you could use a
log scale on the second y-axis, or any other transformation you wished).

I suspect that the popularity of multiple y-axes arose from a greater need
to save space in paper-based journals, but in the age of electronic
documents, is space saving really that important?

-----
Regards,
Richie.

Mathematical Sciences Unit
HSL

```
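On the space-saving question raised above, here is a minimal sketch of the alternative mentioned in the thread: the same two series drawn as stacked panels with a common x-axis, so neither line can be rescaled against the other. It reuses the made-up precipitation series from the example. The `set.seed(1)` call and the names `both`, `temperature` and `precipitation` are additions for illustration only and are not part of the original message.

```r
# Assumption: a fixed seed, only so the made-up data is reproducible
set.seed(1)

# Same made-up precipitation series as in the example above
precipitation <- 30 + runif(240, 5, 10) * sin(seq(pi/6, 40*pi, pi/6) + pi/4) +
  rnorm(240, 0, 3)
pts <- ts(precipitation, start = 1920, frequency = 12)

# Bind the two monthly series into one multivariate time series
both <- cbind(temperature = nottem, precipitation = pts)

# One panel per series, sharing the time axis; each panel keeps its own y-axis
plot(both, main = "Temperature and precipitation, separate panels")
```

For a multi-column time series, `plot()` defaults to `plot.type = "multiple"`, so each series gets its own panel and y-range without any second axis to tune.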