[R] boxplots with multiple numerical variables

Marc Schwartz mschwartz at medanalytics.com
Sat Mar 15 02:29:02 CET 2003


>-----Original Message-----
>From: r-help-bounces at stat.math.ethz.ch 
>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Thomas Lumley
>Sent: Friday, March 14, 2003 4:36 PM
>To: Rishabh Gupta
>Cc: r-help at stat.math.ethz.ch
>Subject: Re: [R] boxplots with multiple numerical variables
>
>
>On Fri, 14 Mar 2003, Rishabh Gupta wrote:
>
>> Hi all,
>>    I have a question regarding the boxplot function. The data I am
>> working on has 1 grouping variable (G) and many numerical
>> variables (V1, V2, V3, V4, Vx, etc). What I would like to do is create
>> a boxplot where the Y-axis represents the numerical values of variables
>> V1...Vx (all the variables have the same range). The X-axis needs to
>> represent the G-V combination. So suppose the possible values for G
>> are a, b and c. Then along the x-axis there would be a boxplot for
>> each of the combinations:
>>
>>   V1Ga, V1Gb, V1Gc, V2Ga, V2Gb, V2Gc, V3Ga, V3Gb, V3Gc,.....VxGa,
>> VxGb, VxGc, etc, i.e.
>>   all values of V1 where the G values are a, all values of V1 where
>> the G values are b, etc. In addition, if possible, it would be nice if
>> each G value had a different colour on the plot so that they
>> could be seen more clearly.
>>
>> I'm not sure whether such a function already exists within R or
>> whether it would have to be written. Either way, I would appreciate it
>> very much if somebody could help and give me some advice as to how I
>> can achieve this.
>>
>>
>
>I'm going to work with a data frame that has two variables and 
>a binary grouping factor
>
>df<-data.frame(x1=rnorm(100),x2=rnorm(100),g=rep(0:1,50))
>
>
>There are at least two ways to do this.  boxplot() will take a
>list of vectors and do boxplots of them, so we can split()
>each of the vectors
>   lapply(df[,1:2], function(v) split(v, df$g))
>and then combine them into a single list with do.call("c",)
>and then boxplot() them. That is:
>   boxplot(do.call("c",lapply(df[,1:2],function(v) split(v,df$g))))
>This labels the x-axis "x1.0", "x1.1", "x2.0", "x2.1"
>
>
>We can also do the opposite: combine the vectors into a single 
>variable, add a new factor indicating which vector each 
>observation came from, and use boxplot() with a formula.
>   ddf<-reshape(df,varying=list(x=c("x1","x2")),direction="long")
>   boxplot(x1~interaction(time,g),data=ddf)
>This labels the x-axis "1.0" "2.0" "1.1" "2.1"
>
>
>	-thomas
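To see what each of Thomas' two approaches actually hands to boxplot(), the sketch below rebuilds his df and asks boxplot() for its summary with plot = FALSE instead of drawing, then prints the box labels. The set.seed() value and the names lst, bp1 and bp2 are illustrative choices, not part of the original post; it assumes a reasonably recent R.

```r
# Rebuild Thomas' example data: two numeric variables and a 0/1 grouping factor.
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), g = rep(0:1, 50))

# Approach 1: split each variable by g, then flatten the nested list with c().
lst <- do.call("c", lapply(df[, 1:2], function(v) split(v, df$g)))
bp1 <- boxplot(lst, plot = FALSE)  # compute the boxes without drawing them
print(bp1$names)                   # variable-major order: x1 groups, then x2

# Approach 2: stack x1 and x2 into one column, indexed by the 'time' column.
ddf <- reshape(df, varying = list(x = c("x1", "x2")), direction = "long")
bp2 <- boxplot(x1 ~ interaction(time, g), data = ddf, plot = FALSE)
print(bp2$names)                   # first factor (time) varies fastest
```

The two approaches order the boxes differently: the list version keeps each variable's groups together, while interaction(time, g) keeps each group's variables together.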


After seeing Thomas' reply, I realized that my prior approach was not
correct.

Building on Thomas' reply, and to address the color and box grouping
you asked about, the following cycles through three colors, one for
each letter group. It also places the boxes for each variable side by
side in a, b, c order. This incorporates my prior thought on boxplot
grouping and x axis labeling.

I adjusted Thomas' df to contain three numeric variables (v1, v2, v3)
and a grp column with three groups: a (red), b (yellow), and c (green).

So:

# length.out = 100 gives exactly one group label per row
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), 
                 v3 = rnorm(100), grp = rep(letters[1:3], length.out = 100))
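A note on building the grp column: rep(letters[1:3], 100) produces 300 labels, and data.frame() silently recycles the 100-value numeric columns to match, so each v1 value appears three times, once with each of a, b and c. The sketch below compares that with rep(..., length.out = 100), which gives exactly one label per row. The object names too_long and one_per_row are illustrative.

```r
set.seed(1)  # arbitrary seed for reproducibility

# 300 labels against 100 values: data.frame() recycles the numeric column.
too_long <- data.frame(v1 = rnorm(100), grp = rep(letters[1:3], 100))
nrow(too_long)          # 300 rows; rows 1-100 repeat as 101-200 and 201-300

# length.out = 100 cuts the a, b, c cycle off after 100 labels.
one_per_row <- data.frame(v1 = rnorm(100),
                          grp = rep(letters[1:3], length.out = 100))
nrow(one_per_row)       # 100 rows
table(one_per_row$grp)  # a: 34, b: 33, c: 33
```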

# Define 'at' argument positions. The boxplot() call below generates
# 9 boxes, which by default are centered at x = 1:9.
# Using 'at', you can reposition them.
# Here the center box for each variable
# is at x = 2, 5 and 8, and the other two boxes are offset by 0.5
# before and after the center box.

at = c(1.5, 2.0, 2.5, 4.5, 5.0, 5.5, 7.5, 8.0, 8.5)
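The same nine positions can be computed rather than typed: repeat each variable's center once per group, then add the within-group offsets. This is only a restatement of the layout in the comments for the three-variable, three-group case; the names centers and offsets are illustrative.

```r
# Center of each variable's group of boxes.
centers <- c(2, 5, 8)
# Offsets of the a, b and c boxes relative to each center.
offsets <- c(-0.5, 0, 0.5)

# Each center repeated once per group; 'offsets' recycles across variables.
at <- rep(centers, each = length(offsets)) + offsets
print(at)
```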

# Generate plots as per Thomas' example, cycling colors and box
# positions. Set boxwex for thin boxes. Do not draw axes.
# Split by df$grp (this data frame has no column named 'g').

boxplot(do.call("c", lapply(df[, 1:3], function(v) split(v, df$grp))), 
        col = c("red", "yellow", "green"), boxwex = 0.2, at = at, 
        axes = FALSE)

# Draw y axis
axis(2)

# Draw x axis, creating letter labels below each box
axis(1, labels = rep(letters[1:3], 3), at = at)

# Now label each group of 3 boxes with the varname in the middle
# of each grouping.

mtext(side = 1, line = 3, c("V1", "V2", "V3"), at = c(2, 5, 8))

# Draw box around the whole plot

box()
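The single three-color col vector works because boxplot() recycles col across the boxes in order, and the flattened list is ordered variable by variable with a, b, c inside each. The sketch below pairs each box name with the color it receives. It rebuilds the data frame with one group label per row (length.out = 100 is an assumption about the intended layout), and the names cols, boxes and mapping are illustrative.

```r
set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100),
                 grp = rep(letters[1:3], length.out = 100))
cols <- c("red", "yellow", "green")

# The same list of nine vectors that the boxplot() call receives.
boxes <- do.call("c", lapply(df[, 1:3], function(v) split(v, df$grp)))

# col is recycled along the boxes, so box i gets cols[(i - 1) %% 3 + 1].
mapping <- data.frame(box = names(boxes),
                      col = rep(cols, length.out = length(boxes)),
                      stringsAsFactors = FALSE)
print(mapping)
```

If the number of colors did not match the number of groups, the recycling would drift across variables and the colors would no longer identify the groups.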

The values for 'at', 'col' and 'boxwex' would of course need to be
adjusted for your actual numbers of variables and groups, as would the
range of columns (df[, 1:3]) in the lapply() call taken from Thomas' example.
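As a sketch of that adjustment, the hypothetical helpers grouped_positions() and grouped_boxplot() below derive 'at', the axis labels and the variable-name positions from the data instead of hard-coding them. They keep the layout used above (centers 3 units apart starting at 2, boxes 0.5 apart around each center); the function names and the step and gap arguments are illustrative, not part of base R, and cols is assumed to have one color per group.

```r
# Hypothetical helper: x positions for n_vars variables x n_groups groups,
# using the layout above (centers 2, 2 + gap, ...; boxes 'step' apart).
grouped_positions <- function(n_vars, n_groups, step = 0.5, gap = 3) {
  centers <- seq(from = 2, by = gap, length.out = n_vars)
  # Offsets symmetric around each center, e.g. -0.5, 0, 0.5 for 3 groups.
  offsets <- (seq_len(n_groups) - (n_groups + 1) / 2) * step
  # Repeat each center once per group; 'offsets' recycles across variables.
  list(at = rep(centers, each = n_groups) + offsets, centers = centers)
}

# Hypothetical helper: one colored box per (variable, group) combination.
# 'cols' should contain one color per group level.
grouped_boxplot <- function(df, vars, group, cols, boxwex = 0.2) {
  g <- factor(df[[group]])
  pos <- grouped_positions(length(vars), nlevels(g))
  # Same split-and-flatten step as Thomas' example, for any set of columns.
  boxes <- do.call("c", lapply(df[vars], function(v) split(v, g)))
  boxplot(boxes, col = cols, boxwex = boxwex, at = pos$at, axes = FALSE)
  axis(2)
  axis(1, labels = rep(levels(g), length(vars)), at = pos$at)
  mtext(side = 1, line = 3, vars, at = pos$centers)
  box()
  invisible(pos)
}

# Example: four variables and four groups (simulated data).
set.seed(1)  # arbitrary seed for reproducibility
demo <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100),
                   v4 = rnorm(100), grp = rep(letters[1:4], length.out = 100))
pos <- grouped_boxplot(demo, paste0("v", 1:4), "grp",
                       cols = c("red", "yellow", "green", "blue"))
```

With gap = 3 and step = 0.5 there is room for roughly five groups per variable; for more groups, increase gap or reduce step and boxwex so neighbouring variables stay visually separate.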

Sorry for my confusion earlier, and I hope that this helps.

Regards,

Marc Schwartz


