[R] boxplots with multiple numerical variables

Rishabh Gupta rg117 at yahoo.co.uk
Sat Mar 15 02:56:29 CET 2003


Hi,
 I just tried both of the solutions that you provided. They are perfect, exactly what I was
looking for. Many thanks for your help, I appreciate it.

Rishabh
 --- Marc Schwartz <mschwartz at medanalytics.com> wrote:
> >-----Original Message-----
> >From: r-help-bounces at stat.math.ethz.ch 
> >[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Thomas Lumley
> >Sent: Friday, March 14, 2003 4:36 PM
> >To: Rishabh Gupta
> >Cc: r-help at stat.math.ethz.ch
> >Subject: Re: [R] boxplots with multiple numerical variables
> >
> >
> >On Fri, 14 Mar 2003, Rishabh Gupta wrote:
> >
> >> Hi all,
> >>    I have a question regarding the boxplot function. The data I am
> >> working on has 1 grouping variable (G) and many numerical variables
> >> (V1, V2, V3, V4, Vx, etc.). What I would like to do is create a
> >> boxplot where the Y-axis represents the numerical values of variables
> >> V1...Vx (all the variables have the same range). The X-axis needs to
> >> represent the G-V combination. So suppose the possible values for G
> >> are a, b and c. Then along the x-axis there would be a boxplot for
> >> each of the combinations:
> >>
> >>   V1Ga, V1Gb, V1Gc, V2Ga, V2Gb, V2Gc, V3Ga, V3Gb, V3Gc, ..., VxGa,
> >>   VxGb, VxGc, etc.
> >>
> >> i.e. all values of V1 where the G value is a, all values of V1 where
> >> the G value is b, etc. In addition, if possible, it would be nice if
> >> each G value had a different colour on the plot so that they
> >> could be seen more clearly.
> >>
> >> I'm not sure whether such a function already exists within R or
> >> whether it would have to be written. Either way, I would appreciate it
> >> very much if somebody could help and give me some advice as to how I
> >> can achieve this.
> >>
> >
> >I'm going to work with a data frame that has two variables and 
> >a binary grouping factor
> >
> >df<-data.frame(x1=rnorm(100),x2=rnorm(100),g=rep(0:1,50))
> >
> >
> >There are at least two ways to do this.  boxplot() will take a
> >list of vectors and do boxplots of them, so we can split()
> >each of the vectors
> >   lapply(df[,1:2], function(v) split(v, df$g))
> >and then combine them into a single list with do.call("c",)
> >and then boxplot() them. That is:
> >   boxplot(do.call("c",lapply(df[,1:2],function(v) split(v,df$g))))
> >This labels the x-axis "x1.0" "x1.1", "x2.0", "x2.1"
> >
> >
> >We can also do the opposite: combine the vectors into a single 
> >variable, add a new factor indicating which vector each 
> >observation came from, and use boxplot() with a formula.
> >   ddf<-reshape(df,varying=list(x=c("x1","x2")),direction="long")
> >   boxplot(x1~interaction(time,g),data=ddf)
> >This labels the x-axis "1.0" "2.0" "1.1" "2.1"
> >
> >
> >	-thomas
> 
> 
> After seeing Thomas' reply, I realized that my prior approach is not
> correct.
> 
> Building on Thomas' reply and to address color and box groupings if
> you desire, the following will cycle through three colors, one for
> each letter group. It will also group each variable based upon the a,
> b and c groups. This incorporates my prior thought on boxplot
> groupings and x axis labeling.
> 
> I adjusted Thomas' df to contain a grp column with three grps: a
> (red), b (yellow), and c (green).  
> 
> So:
> 
> df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), 
>                  v3 = rnorm(100),
>                  grp = rep(letters[1:3], length.out = 100))
> 
> # Define 'at' argument positions. There will be 9 boxes generated
> # by the boxplot() call, normally centered at x = 1:9.
> # Using 'at', you can reconfigure these.
> # Note that the center boxes for each variable
> # are at x = 2, 5 and 8. The other two boxes are offset by 0.5
> # before and after the center box.
> 
> at = c(1.5, 2.0, 2.5, 4.5, 5.0, 5.5, 7.5, 8.0, 8.5)
> 
> # Generate plots as per Thomas' example, cycling colors and box
> # positions. Set boxwex for thin boxes. Do not draw axes.
> 
> boxplot(do.call("c",lapply(df[,1:3],function(v) split(v,df$grp))), 
>         col = c("red", "yellow", "green"), boxwex = 0.2, at = at, 
>         axes = FALSE)
> 
> # Draw y axis
> axis(2)
> 
> # Draw x axis, creating letter labels below each box
> axis(1, labels = rep(letters[1:3], 3), at = at)
> 
> # Now label each group of 3 boxes with the varname in the middle
> # of each grouping.
> 
> mtext(side = 1, line = 3, c("V1", "V2", "V3"), at = c(2, 5, 8))
> 
> # Draw box around the whole plot
> 
> box()
> 
> The values for 'at', 'col' and 'boxwex' would of course need to be
> adjusted for your actual number of variables, as would the range of
> columns in the lapply() call that Thomas had first incorporated.
> 
> Sorry for my confusion earlier and hope that this helps.
> 
> Regards,
> 
> Marc Schwartz
> 
>  
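
For reference, the 'at' positions that Marc typed by hand can also be computed for any number of variables and groups. The following is only a sketch under my own assumptions: box_positions() and its arguments (step, offset) are hypothetical names and not part of R or of either reply. The defaults were chosen so that 3 variables and 3 groups reproduce the positions used above.

```r
# Hypothetical helper (not from the thread): compute boxplot 'at'
# positions for n_vars variables, each split into n_groups groups.
# 'step' is the distance between variable centres, 'offset' the
# distance between neighbouring boxes within one variable.
box_positions <- function(n_vars, n_groups, step = 3, offset = 0.5) {
  centers <- (seq_len(n_vars) - 1) * step + 2            # 2, 5, 8, ...
  shifts  <- (seq_len(n_groups) - (n_groups + 1) / 2) * offset
  # outer() gives one column per variable; as.vector() reads it
  # column by column, i.e. all groups of v1, then all groups of v2, ...
  list(at = as.vector(outer(shifts, centers, "+")), centers = centers)
}

# Example data with the same layout as in Marc's reply.
df <- data.frame(v1 = rnorm(99), v2 = rnorm(99), v3 = rnorm(99),
                 grp = rep(letters[1:3], 33))
vars <- c("v1", "v2", "v3")

# One list element per variable/group combination, as in Thomas' reply.
boxes <- do.call("c", lapply(df[vars], function(v) split(v, df$grp)))
pos   <- box_positions(length(vars), length(unique(df$grp)))
```

Here boxes can be passed to boxplot() with at = pos$at, and pos$centers can be used as the 'at' argument of mtext() for the variable labels. With many groups per variable, 'step' would need to be increased so that neighbouring variables do not overlap.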
