[R] reshaping of data for barplot2

Marc Schwartz MSchwartz at MedAnalytics.com
Wed Nov 24 21:58:00 CET 2004


On Wed, 2004-11-24 at 19:24 +0100, Jean-Louis Abitbol wrote:
> Dear All,
> 
> I have the following data coming out of:
> 
> s <- with(final,
>            summarize(norm, llist(gtt,fdiab),
>                      function(norm) {
>                       n <- sum(!is.na(norm))
>                       s <- sum(norm, na.rm=T)
>                       binconf(s, n)
>                      }, type='matrix')
> )
> i.e.:
> 
>      gtt fdiab   norm.norm  norm.norm2  norm.norm3
> 18    PL    No  3.70370370  0.18997516 18.28346593
> 19    PL   Yes  3.57142857  0.18319034 17.71219774
> 13    TT1   No  9.09090909  3.59221932 21.15923917
> 14    TT1  Yes  1.81818182  0.09326054  9.60577606
> ...
> 10  HIGH    No 26.53061224 16.21128213 40.26228897
> 11  HIGH   Yes 10.00000000  4.66428345 20.14946472
> 
> I would like to reshape the data so that I can barplot2 treatments (gtt)
> with 2 side-by-side bars for fdiab Yes/No, and add CIs.
> 
> Various attempts have been unsuccessful, as I have understood neither the
> logic of 'beside' nor the nature of the structures to be passed to
> barplot2. I don't have enough know-how with reshape and transpose either.
> 
> Needless to say, Dotplot works great with this kind of data, but some
> "Authority" requests side-by-side bars with CIs.
> 
> Thanks for any help.

Jean-Louis,

For an easy example, see the examples in the barplot2 help page, which use
the VADeaths dataset. The dataset looks like:

> VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0

Now use:

barplot2(VADeaths)

This will yield a stacked bar plot with 4 bars (one for each column in
the matrix). Each bar consists of 5 stacked sections, each section
representing one row's value in that column.

Now try:

barplot2(VADeaths, beside = TRUE)

This now yields 4 groups of bars, with one group for each column. Each
group then consists of 5 bars, one bar for each row value.

Hopefully, that gives you some insight into how the matrix structure
interacts with the 'beside' argument.
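To see the shapes involved without installing gplots, here is a minimal sketch that uses base R's barplot() as a stand-in for barplot2() (assumed here to treat a matrix 'height' and the 'beside' argument the same way). It checks the midpoints returned in each mode: one per column when stacked, and a rows x columns matrix when beside = TRUE.

```r
# Sketch: base R barplot() stands in for gplots::barplot2() (assumption:
# both treat a matrix 'height' and the 'beside' argument the same way).
pdf(NULL)  # null graphics device, so nothing is drawn to the screen

# Stacked (the default): one bar per column of VADeaths.
mp.stacked <- barplot(VADeaths)

# Side by side: one group per column, one bar per row within each group.
mp.beside <- barplot(VADeaths, beside = TRUE)

invisible(dev.off())

length(mp.stacked)  # 4 bar midpoints, one per column
dim(mp.beside)      # 5 rows (age groups) x 4 columns (groups)
```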

For your data above, I read the rows shown into a data frame called
'df', which looks like:

> df
   gtt fdiab norm.norm  norm.norm2 norm.norm3
1   PL    No  3.703704  0.18997516  18.283466
2   PL   Yes  3.571429  0.18319034  17.712198
3  TT1    No  9.090909  3.59221932  21.159239
4  TT1   Yes  1.818182  0.09326054   9.605776
5 HIGH    No 26.530612 16.21128213  40.262289
6 HIGH   Yes 10.000000  4.66428345  20.149465
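If you want to reproduce what follows without the original 'final' data, the six rows above can be typed in directly. A minimal sketch; stringsAsFactors = TRUE is an assumption meant to mimic how read.table() turned character columns into factors at the time (the default became FALSE in R 4.0.0):

```r
# The six rows printed above, entered by hand.
df <- data.frame(
  gtt   = c("PL", "PL", "TT1", "TT1", "HIGH", "HIGH"),
  fdiab = c("No", "Yes", "No", "Yes", "No", "Yes"),
  norm.norm  = c(3.703704, 3.571429, 9.090909, 1.818182, 26.530612, 10),
  norm.norm2 = c(0.18997516, 0.18319034, 3.59221932, 0.09326054,
                 16.21128213, 4.66428345),
  norm.norm3 = c(18.283466, 17.712198, 21.159239, 9.605776,
                 40.262289, 20.149465),
  stringsAsFactors = TRUE  # assumption: character columns become factors
)

str(df)  # 6 obs. of 5 variables; gtt and fdiab are factors
```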


To follow the VADeaths example above, you need to reshape each of the
required columns into a matrix with 3 columns (one per 'gtt' group), as
follows:

height <- matrix(df$norm.norm, ncol = 3)
ci.l <- matrix(df$norm.norm2, ncol = 3)
ci.u <- matrix(df$norm.norm3, ncol = 3)
bars <- matrix(df$fdiab, ncol = 3)

Now, 'height' looks like:

> height
         [,1]     [,2]     [,3]
[1,] 3.703704 9.090909 26.53061
[2,] 3.571429 1.818182 10.00000

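ci.l, ci.u and bars will, of course, have the same 2 x 3 layout: one
column per 'gtt' group, with the 'No' values in row 1 and the 'Yes'
values in row 2.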
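Because matrix() fills column by column, this reshaping only works if 'df' is sorted by 'gtt' with the 'No' row before the 'Yes' row in each group. A quick sketch to check that layout, rebuilding 'df' from the values shown above (stringsAsFactors = TRUE is assumed, as read.table() would have produced factors). Note that matrix() turns the factor 'fdiab' into a character matrix, which is why 'bars' holds labels rather than codes.

```r
df <- data.frame(
  gtt   = c("PL", "PL", "TT1", "TT1", "HIGH", "HIGH"),
  fdiab = c("No", "Yes", "No", "Yes", "No", "Yes"),
  norm.norm  = c(3.703704, 3.571429, 9.090909, 1.818182, 26.530612, 10),
  norm.norm2 = c(0.18997516, 0.18319034, 3.59221932, 0.09326054,
                 16.21128213, 4.66428345),
  norm.norm3 = c(18.283466, 17.712198, 21.159239, 9.605776,
                 40.262289, 20.149465),
  stringsAsFactors = TRUE
)

# Column-wise fill: column j = j-th gtt group, rows = No / Yes.
height <- matrix(df$norm.norm,  ncol = 3)
ci.l   <- matrix(df$norm.norm2, ncol = 3)
ci.u   <- matrix(df$norm.norm3, ncol = 3)
bars   <- matrix(df$fdiab,      ncol = 3)  # factor becomes a character matrix

bars                                  # row 1 all "No", row 2 all "Yes"
all(ci.l <= height & height <= ci.u)  # each estimate lies inside its CI
```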


So, now you could use barplot2 as follows:

mp <- barplot2(height, plot.ci = TRUE, 
               ci.l = ci.l, ci.u = ci.u, beside = TRUE,
               names.arg = bars)


Note that I save the bar midpoints in 'mp'.
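If gplots is not available, a similar plot can be sketched with base R: barplot() for the bars and arrows() for the CI whiskers. This is a swapped-in technique, not barplot2 itself; the 10% head room in 'ylim' and the whisker cap size are arbitrary choices. It also shows why saving 'mp' is useful: the whiskers and group labels are placed at those midpoints.

```r
df <- data.frame(
  gtt   = c("PL", "PL", "TT1", "TT1", "HIGH", "HIGH"),
  fdiab = c("No", "Yes", "No", "Yes", "No", "Yes"),
  norm.norm  = c(3.703704, 3.571429, 9.090909, 1.818182, 26.530612, 10),
  norm.norm2 = c(0.18997516, 0.18319034, 3.59221932, 0.09326054,
                 16.21128213, 4.66428345),
  norm.norm3 = c(18.283466, 17.712198, 21.159239, 9.605776,
                 40.262289, 20.149465),
  stringsAsFactors = TRUE
)

# Same 2 x 3 reshaping as above: columns = gtt groups, rows = No / Yes.
height <- matrix(df$norm.norm,  ncol = 3)
ci.l   <- matrix(df$norm.norm2, ncol = 3)
ci.u   <- matrix(df$norm.norm3, ncol = 3)
bars   <- matrix(df$fdiab,      ncol = 3)

pdf(NULL)  # null device so the sketch runs non-interactively
# Base barplot() knows nothing about the CIs, so leave room for ci.u.
mp <- barplot(height, beside = TRUE, names.arg = as.vector(bars),
              ylim = c(0, 1.1 * max(ci.u)))
# Vertical whisker from ci.l to ci.u at each bar midpoint;
# angle = 90 gives flat caps, code = 3 draws a cap at both ends.
arrows(x0 = mp, y0 = ci.l, x1 = mp, y1 = ci.u,
       angle = 90, code = 3, length = 0.05)
# Group labels centred under each pair of bars.
mtext(side = 1, at = colMeans(mp), text = unique(as.character(df$gtt)),
      line = 3)
invisible(dev.off())

dim(mp)  # 2 x 3, one midpoint per bar
```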

Now you can go back and add the group labels below the bars. First,
break out the unique values of 'gtt', keeping their order intact, by
using matrix():

labels <- matrix(df$gtt, ncol = 2, byrow = TRUE)
mtext(side = 1, at = colMeans(mp), text = labels[, 1], line = 3)

Note that I use 'byrow = TRUE' in the call to matrix() so that the
values fill the matrix row by row. Thus, each column contains the group
labels in their original order, and 'labels' looks like:

> labels
     [,1]   [,2]  
[1,] "PL"   "PL"  
[2,] "TT1"  "TT1" 
[3,] "HIGH" "HIGH"

So we just use the first column above in the call to mtext().
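A short sketch of why 'byrow = TRUE' matters, using a copy of the 'gtt' column. Without it, matrix() fills column by column and the first column no longer lists each group once. For this layout, unique() gives the same labels more directly.

```r
gtt <- factor(c("PL", "PL", "TT1", "TT1", "HIGH", "HIGH"))

# byrow = TRUE: row i holds the two copies of the i-th group label.
labels <- matrix(gtt, ncol = 2, byrow = TRUE)
labels[, 1]  # "PL" "TT1" "HIGH"

# Without byrow, the matrix is filled column by column.
matrix(gtt, ncol = 2)[, 1]  # "PL" "PL" "TT1"

# Same result as labels[, 1] when each group appears as a consecutive pair.
unique(as.character(gtt))  # "PL" "TT1" "HIGH"
```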


So that should do it, and it can be extended to your full dataset if the
format is consistent with what you have above (changing 'ncol = 3' in
the matrix() calls to the number of 'gtt' groups).
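If the rows are not guaranteed to be sorted that way, one order-independent alternative is tapply(), which places each value by its (fdiab, gtt) combination and sets the row and column names. A minimal sketch; 'to.matrix' is a hypothetical helper, and setting the 'gtt' levels to their order of appearance is an assumption so that the groups are not sorted alphabetically. Base barplot() uses the column names as group labels when 'names.arg' is omitted; barplot2() is assumed to behave likewise.

```r
df <- data.frame(
  gtt   = c("PL", "PL", "TT1", "TT1", "HIGH", "HIGH"),
  fdiab = c("No", "Yes", "No", "Yes", "No", "Yes"),
  norm.norm  = c(3.703704, 3.571429, 9.090909, 1.818182, 26.530612, 10),
  norm.norm2 = c(0.18997516, 0.18319034, 3.59221932, 0.09326054,
                 16.21128213, 4.66428345),
  norm.norm3 = c(18.283466, 17.712198, 21.159239, 9.605776,
                 40.262289, 20.149465),
  stringsAsFactors = TRUE
)
# Factor levels fix the column order: keep order of first appearance.
df$gtt <- factor(df$gtt, levels = unique(as.character(df$gtt)))

# Hypothetical helper: one cell per fdiab x gtt combination. Each cell
# holds exactly one value here, so sum() simply returns it.
to.matrix <- function(data, column) {
  tapply(data[[column]], list(data$fdiab, data$gtt), sum)
}

height <- to.matrix(df, "norm.norm")   # rows No/Yes, columns PL/TT1/HIGH
ci.l   <- to.matrix(df, "norm.norm2")
ci.u   <- to.matrix(df, "norm.norm3")
height
```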


One final (and important) note. There is another approach that can be
used here: keep your data in its initial state and specify the 'space'
argument explicitly in the call to barplot2. This is actually less work
than what we did above. In this case, we use the 'space' argument to
group the bars explicitly, which is, in effect, what the 'beside'
argument does internally.
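The claim that 'space' reproduces what 'beside' does can be checked by comparing bar midpoints. A sketch with base barplot() (barplot2 is assumed to lay out bars the same way): a 2 x 3 matrix with beside = TRUE uses a default space of c(0, 1), which expands to a gap of one bar width before each pair, exactly as rep(c(1, 0), 3) does for a plain vector.

```r
h <- c(3.703704, 3.571429, 9.090909, 1.818182, 26.530612, 10)

pdf(NULL)  # null device so the sketch runs non-interactively
# (a) 2 x 3 matrix with beside = TRUE (default space is c(0, 1)).
mp.matrix <- barplot(matrix(h, ncol = 3), beside = TRUE)
# (b) Plain vector with an explicit gap of one bar width before each pair.
mp.vector <- barplot(h, space = rep(c(1, 0), 3))
invisible(dev.off())

as.vector(mp.matrix)  # 1.5 2.5 4.5 5.5 7.5 8.5
as.vector(mp.vector)  # same positions
colMeans(matrix(mp.vector, ncol = 3))  # 2 5 8: centre of each pair
```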

We use each column from 'df' directly and set the 'space' argument to
c(1, 0) repeated once for each of the 3 groups, so that each pair of
bars is preceded by a gap one bar wide and the two bars within a pair
touch. Note that here we need to define the colors explicitly, since
barplot2 uses 'grey' by default when 'height' is a vector (as does
barplot). We also need to convert the factor df$fdiab to a character
vector with as.vector(); otherwise the numeric factor codes will be used
as the bar labels.
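The difference between a factor's internal codes and its labels is easy to see directly. A small sketch using the same values as 'df$fdiab':

```r
fdiab <- factor(c("No", "Yes", "No", "Yes", "No", "Yes"))

as.integer(fdiab)  # 1 2 1 2 1 2: the internal codes
as.vector(fdiab)   # "No" "Yes" "No" "Yes" "No" "Yes": the labels
levels(fdiab)      # "No" "Yes": code 1 means "No", code 2 means "Yes"
```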

The sequence then goes like this:

mp <- barplot2(df$norm.norm, plot.ci = TRUE, 
               ci.l = df$norm.norm2, ci.u = df$norm.norm3,
               space = rep(c(1, 0), 3),
               col = rep(c("red", "yellow"), 3),
               names.arg = as.vector(df$fdiab))

Now, as before, we create the 'labels' matrix.

labels <- matrix(df$gtt, ncol = 2, byrow = TRUE)

Now use mtext() for the group labels. Note that since 'height' was a
vector and not a matrix, 'mp' will not be a 2 x 3 matrix either; it
simply holds one midpoint per bar. Thus, we need to convert 'mp' to a
matrix with one column per group to get the group midpoints:

mtext(side = 1, at = colMeans(matrix(mp, ncol = 3)), 
      text = labels[, 1], line = 3)


Which approach you take is up to you. As you see, either one will work.

HTH,

Marc Schwartz
<Will be away from e-mail for a while. Will check back later>



