[R] Subject: Re: how to include bar values in a barplot?

Ted.Harding at manchester.ac.uk
Wed Aug 8 23:52:47 CEST 2007

Greg, I'm going to join issue with you here! Not that I'll go near
advocating "Excel-style" graphics (abominable, and the Patrick Burns
URL which you cite is remarkable in its restraint). Also, I'm aware
that this is potential flame-war territory -- again, I want to avoid
that too.

However, this is the second time you have intervened on this theme
(previously Mon 6 August), along with John Kane on Wed 1 August and
again today on similar lines, and I think it's time an alternative
point of view was presented, to counteract (I hope usefully) what
seems to be a draconianly prescriptive approach to the presentation
of information.

On 07-Aug-07 21:37:50, Greg Snow wrote:
> Generally adding the numbers to a graph accomplishes 2 things:
> 1) it acts as an admission that your graph is a failure

Generally, I disagree. Different elements in a display serve different
purposes, according to the psychological aspects of visual perception.

Sizes, proportions, colours etc. of shapes (bars in a histogram, the
marks representing points in a scatterplot, ... ) are interpreted, so
to speak, "intuitively" -- the resulting perception is formed by
processes which are hard to ascertain consciously, and the overall
effect can only be ascertained by looking at it, and noting what
impression one has formed. They stimulate mental responses in the
domain of perception of spatial relationships.

Numbers, and text, on the other hand, while still shapes from the
optical point of view, up to the point of their impact on the retina,
provoke different perceptions. They are interpreted "analytically"
stimulating mental responses in the domains of language and number.

There is no Law whatever which requires that the two must be separated.

It may be that adding any annotation to a graph or diagram will
interfere with the "intuitive" interpretation that the diagram is
intended to stimulate, with no associated benefit.

It may be that presenting numerical/textual information within a
graphical/diagrammatic context will interfere with the "analytic"
interpretation which is desired, with no associated benefit.

In such cases, it is clearly (and as a matter of fact to be decided
in each case) better to separate the two aspects.

It may, however, be that both can be combined in such a way that
each enhances the other; and also the simultaneous perception of
both aspects induces a "cartesian-product" richness of interpretation
where each element of the graphical presentation combines with
each element of the textual/numerical presentation to generate
a perception which could not possibly have been realised if they
had been presented separately. This, too, is a matter to be decided
in each case.

On that basis, if a graph without numbers fails to stimulate a
desired impression which could have been stimulated by adding the
numbers to the graph, then the graph without numbers is a failure.
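
For the barplot case that started this thread, a minimal sketch in
base R of the combined form might look as follows. The data and the
names `counts` and `mids` are made up for illustration; the one thing
relied on is that barplot() returns the x-coordinates of the bar
midpoints, which text() can then use to place each value.

```r
# Illustrative data only; these counts are not from the thread.
counts <- c(A = 12, B = 30, C = 21)

# barplot() returns the x-coordinates of the bar midpoints.
# Leave some headroom above the tallest bar for its label.
mids <- barplot(counts, ylim = c(0, max(counts) * 1.15),
                ylab = "Count")

# Write each value just above its bar (pos = 3 means "above").
text(x = mids, y = counts, labels = counts, pos = 3)
```

Whether the labels help or merely clutter is, as argued above, to be
judged by looking at the result in each case.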

> 2) it converts the graph into a poorly laid out table (with a
> colorful and distracting background)
> In general it is better to find an appropriate graph that does
> convey the information that is intended or if a table is more
> appropriate, then replace it with a well laid out table (or both).

There is an implication here that the information conveyed by a graph,
and the information conveyed by a table, are mutually exclusive.
And that it then follows: Thou Shalt Not Allow The One To Corrupt
The Other. While this has the appearance of a Law, it is (for reasons
I have sketched above) a Law which is not *generally* applicable.

> Remember that the role of tables is to look up specific values
> and the role of graphs is to give a good overview.

I would agree with this only to the following extent:

Tables allow *only* the look-up of values.
Graphs (modulo the capacity of the eye/brain to more or less precisely
judge relative magnitudes) only allow a "good overview".

I would not agree that these are their exclusive roles.

The role of Hamlet is to agonise over revenge for his father's death.
The role of Ophelia is to embody the "love interest" in the play.

This does not imply that there should be parallel performances of
"Hamlet" on two different stages, with the audience trooping from
one to the other according to which character is currently at the
centre of the action. It actually works better when they're all up
there at once, interacting!

> The books by William Cleveland and Tufte have a lot of good advice
> on these issues.

Since you mention Tufte, I commend the admiring discussion in his
book "The Visual Display of Quantitative Information", Chapter 1
(Graphical Excellence), section "Narrative Graphics of Space and
Time" (pp. 40-41 in the edition which I have) of Minard's graphical
representation of what happened to Napoleon's army in the course
of its advance on, and retreat from, Moscow.

An impression of the original can be formed from the rather small
version displayed on Tufte's website at the top of
The version in the book is much clearer.

Here we see the two aspects of "intuitive" and spatial perception,
and textual/numeric "analytical" perception, happily combined on
the one display in such a way that the two interact richly.

Overlaid on the geographical pathway of the army is a broad band,
like a river (with branches), whose breadth at any point represents
the surviving numbers of the army. The advancing part is cross-hatched,
the retreating part is solid black. Place-names and rivers are
marked in text. Every so often, the numerical values of the surviving
numbers are written in at the positions they apply to:
422,000 -> 400,000 -> 175,000 -> 145,000 -> 121,000 -> 100,000 [MOSCOW].

Then, on the retreat:
[MOSCOW] 100,000 -> 96,000 -> 87,000 -> 55,000 -> 37,000 -> 24,000
-> 20,000 -> 50,000 [picking up 30,000 out of an original 50,000
who'd peeled off from the original advance early on and were now
in retreat] -> 28,000 -> 12,000 -> 14,000 -> 8,000 -> 4,000 -> 10,000.

(The increments in the final leg are due to gathering up other
remnants in retreat).
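
As a rough sketch only (it makes no attempt at Minard's geography or
his band of varying width), the figures quoted above can be drawn in
base R with each value written next to its own mark. The variable
names are mine, and the horizontal axis is simply the order in which
the numbers are listed above, which is an assumption of the sketch
and not a distance or a date.

```r
# Surviving troop numbers as quoted above, in the order given.
advance <- c(422000, 400000, 175000, 145000, 121000, 100000)
retreat <- c(100000, 96000, 87000, 55000, 37000, 24000, 20000,
             50000, 28000, 12000, 14000, 8000, 4000, 10000)

# Moscow (100,000) ends the advance and begins the retreat; count it once.
survivors <- c(advance, retreat[-1])
stage <- seq_along(survivors)  # order of mention only, not distance or date

# Vertical bars: grey for the advance, black for the retreat.
plot(stage, survivors, type = "h", lwd = 8, lend = 1,
     col = ifelse(stage <= length(advance), "grey60", "black"),
     xlab = "Stage (order of mention)", ylab = "Survivors",
     ylim = c(0, max(survivors) * 1.1))

# Annotate every bar with its value, as Minard does along the band.
text(stage, survivors,
     labels = formatC(survivors, format = "d", big.mark = ","),
     pos = 3, cex = 0.6)
```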

Along the retreating arm, selected points are linked to a graph
below the main graphic which shows -- as a graph -- the temperature
(the final ingredient in the disaster) in degrees C (decreasing fairly
steadily from 0degC to -20degC).

The graph itself is also annotated with the value of the temperature
at each relevant point, along with the date, and linked to the
"army graphic" by a line.

This is a complex (but, after a few minutes' thought, clear) combination
of graphical and textual/numerical information. It succeeds brilliantly
in its intention, which would have been unachievable if any principle
that graphical and numerical information should be separated had been
adhered to.

If such a principle had been adopted, then (at most) places on the graphic would be
marked with say letters "A", "B", "C", and on other pages would
be tables associating with each letter the residual size of the army,
the date, the temperature, and the placename. Nothing more distracting,
in terms of expecting the user to reconstruct the impression intended
to be conveyed, can be imagined.

One can, with an "editor's eye", criticise some details of the
implementation of Minard's design. The hatching on the "advancing"
section interferes with the legibility of the placenames on it
(but of course Minard would not have had nice easy colour backgrounds
available to him in 1861). The "typeface" is poorly legible in itself.
The orientations of many of the numerical annotations are so variable
that it requires unnecessary effort to read them. But these are details
which can be (at least now) put right, with enhanced clarity, thus
vindicating even more strongly the original concept. They are issues
of detail in style and implementation.

For a re-working which does attend to such details in a modern style,


and then see how attention to such details improves the effect.

> Before asking how to get R to produce a graph that looks like one
> from a spreadsheet, you should study:
> http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html and
> some of the links from there.  You may also want to run the following
> in R:
>> library(fortunes)
>> fortune(120)
> In general I like OpenOffice, my one main complaint is that when faced
> with the decision between doing something right or the same way as
> microsoft, they have not always made the right decision.

If anything, there should be a Law: Thou Shalt Not Even Think Of
Producing A Graph That Looks Like Anything From A Spreadsheet.
At any rate, not until spreadsheets give you much finer control
and choice of the details of their graphics.

> Hope this gives you something to think about,

It did indeed! I would add that graphics I produce myself (with
or without numeric/textual annotations) are hand-crafted. On this
approach, even R's good graphical output is treated as "draft".
The ultimate end result is composed directly from the numerical
data associated with the elements in the graphic, as exported
from R. It takes time, of course.
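
One possible way of getting such numbers out of R, offered purely as
a sketch: the data, the column names and the use of a temporary CSV
file are assumptions for illustration, not a description of the
workflow above. barplot() with plot = FALSE computes the bar
midpoints without drawing anything, and these can be written out
alongside the heights for use in whatever tool composes the final
graphic.

```r
# Illustrative data only.
counts <- c(A = 12, B = 30, C = 21)

# plot = FALSE: compute the bar midpoints without drawing the plot.
mids <- barplot(counts, plot = FALSE)

# One row per bar: its label, horizontal position and height.
bar_data <- data.frame(label  = names(counts),
                       x      = as.vector(mids),
                       height = as.vector(counts))

# Write to a temporary file; substitute a real path as needed.
out <- tempfile(fileext = ".csv")
write.csv(bar_data, out, row.names = FALSE)
```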

Whether to add such annotations, and, if so, how; and whether and
how to embellish the graphics with colour, etc., are decided at
the time in terms of the information which it is desired to
communicate, and evaluated by trying to look at it with a "new
eye", to judge what another viewer's impression might be.

In short, it is a matter of careful and thoughtful *design*.

Where, of course, "thoughtful" means "thinking about it" -- one
thing that spreadsheets inhibit, because

a) Even if you do think about it, you're not going to find it easy
   to implement the results of your thoughts (if they're any good);
b) Spreadsheets readily induce the naive (especially beginning)
   user into the habit of trusting that the writers of spreadsheet
   software have thought through all those nasty implementation
   technicalities and have created an "expert system" which looks
   after drawing the graph according to best practice and with all
   necessary sophistication. Look! Isn't it clever!!

This habit, once (all too easily) acquired, is difficult to kick.
Patrick Burns's deliberate use of "addiction" is apt.

Best wishes,

E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Aug-07                                       Time: 22:40:19
------------------------------ XFMail ------------------------------
