[R] FW: Bubble plots

hadley wickham h.wickham at gmail.com
Sat Aug 2 15:24:33 CEST 2008


On Sat, Aug 2, 2008 at 8:10 AM, Frank E Harrell Jr
<f.harrell at vanderbilt.edu> wrote:
> Cody Hamilton wrote:
>>
>> Is there a way to create a 'bubble plot' in R?
>>
>> For example, if we define the following data frame containing the level of
>> y observed for 5 patients at three time points:
>>
>> time<-c(rep('time 1',5),rep('time 2',5),rep('time 3',5))
>> y<-c('a','b','c','d','a','b','c','a','d','a','a','a','b','c','d')
>> D<-data.frame(cbind(y,time))
>>
>> I would like to display the percentage of subjects in each level of y at
>> each time point as a bubble whose size is proportional to the percentage of
>> subjects in the given level of y at the given time point.  Thus, in the case
>> of the data frame above the plot would have the levels of y
>> ('a','b','c','d') on the y-axis and the levels of time ('time 1','time 2',
>> 'time 3') on the x-axis with four bubbles above each time point (e.g. the
>> size of the bubble in the bottom left corner of the plot would be
>> proportional to the percentage of patients with y='a' at time='time 1').
>>
>> I am running R 2.7.1 under windows.
>>
>> Regards,
>>   -Cody
>>
>
> The xYplot function in the Hmisc package can do that.  It may be more
> elegant using ggplot2.

It's certainly possible to do it with ggplot2:

tab <- prop.table(table(D), margin = 2)
df <- as.data.frame(tab, responseName = "freq")
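
To check what the tabulation step produces, here is a minimal base R sketch. It rebuilds the example data from the original question (using data.frame(y, time) directly rather than cbind(), which is an assumption of convenience, not the poster's code) and confirms that margin = 2 normalises within each time point:

```r
# Rebuild the example data from the original question.
time <- c(rep("time 1", 5), rep("time 2", 5), rep("time 3", 5))
y <- c("a", "b", "c", "d", "a", "b", "c", "a", "d", "a", "a", "a", "b", "c", "d")
D <- data.frame(y, time)

# table(D) has the levels of y as rows and the time points as columns;
# margin = 2 divides each column by its total, so every time point sums to 1.
tab <- prop.table(table(D), margin = 2)

# One row per (y, time) combination, with the proportion in column "freq".
df <- as.data.frame(tab, responseName = "freq")

print(colSums(tab))
print(df)
```

With this particular data set every time point has the same split, so freq takes only two distinct values.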

library(ggplot2)
qplot(y, time, data = df, size = freq)
qplot(y, time, data = df, size = freq) + scale_area()
qplot(y, time, data = df, size = freq) + scale_area(to=c(1,5))
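
In later ggplot2 releases scale_area() is no longer available, and scale_size_area() appears to be the closest replacement. The following is a sketch under that assumption, not the code from this thread. It also puts time on the x-axis and y on the y-axis, as the original question asked; max_size = 10 is an arbitrary illustrative choice.

```r
library(ggplot2)

# Rebuild the example data and the per-time-point proportions.
time <- c(rep("time 1", 5), rep("time 2", 5), rep("time 3", 5))
y <- c("a", "b", "c", "d", "a", "b", "c", "a", "d", "a", "a", "a", "b", "c", "d")
D <- data.frame(y, time)
df <- as.data.frame(prop.table(table(D), margin = 2), responseName = "freq")

# Bubble plot: point area (not radius) is proportional to freq,
# and a frequency of zero would map to a point of size zero.
p <- ggplot(df, aes(x = time, y = y, size = freq)) +
  geom_point() +
  scale_size_area(max_size = 10)

print(p)
```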

But I wouldn't recommend it - you're trying to visualise an important
number (frequency) using a perceptual mapping (size) that humans
aren't very good at judging.  Why not do a scatterplot of frequency vs time?

qplot(time, freq, data=df, colour = y)

There are only a few different values of freq for this example, so a
little jittering helps:

qplot(time, freq, data=df, colour = y, geom="jitter")

Since you have time on the x-axis it's common to use a line plot:

df$time <- as.numeric(gsub("time ", "", df$time))
qplot(time, freq, data=df, colour = y, geom="line")

although again you have an overplotting problem, which you could solve
with jittering:

qplot(time, freq, data=df, colour = y, geom="line", position="jitter")
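
For readers on a recent ggplot2, the jittered line plot can be sketched with ggplot() and position_jitter(). This assumes a ggplot2 version where position_jitter() accepts a seed argument; the jitter height of 0.01 is an arbitrary illustrative value, and width = 0 keeps the time points aligned.

```r
library(ggplot2)

# Rebuild the example data and the per-time-point proportions.
time <- c(rep("time 1", 5), rep("time 2", 5), rep("time 3", 5))
y <- c("a", "b", "c", "d", "a", "b", "c", "a", "d", "a", "a", "a", "b", "c", "d")
D <- data.frame(y, time)
df <- as.data.frame(prop.table(table(D), margin = 2), responseName = "freq")

# Turn "time 1", "time 2", ... into the numbers 1, 2, ... so that
# the points for each level of y can be joined by a line.
df$time <- as.numeric(gsub("time ", "", df$time))

# Jitter vertically only, so overlapping lines become distinguishable;
# the seed makes the jitter reproducible between runs.
p <- ggplot(df, aes(x = time, y = freq, colour = y)) +
  geom_line(position = position_jitter(width = 0, height = 0.01, seed = 1))

print(p)
```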

Hadley

-- 
http://had.co.nz/