[R] boxplots with multiple numerical variables
Rishabh Gupta
rg117 at yahoo.co.uk
Sat Mar 15 02:56:29 CET 2003
Hi,
I just tried both of the solutions that you provided. They are perfect, exactly what I was
looking for. Many thanks for your help, I appreciate it.
Rishabh
--- Marc Schwartz <mschwartz at medanalytics.com> wrote:
> >-----Original Message-----
> >From: r-help-bounces at stat.math.ethz.ch
> >[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Thomas Lumley
> >Sent: Friday, March 14, 2003 4:36 PM
> >To: Rishabh Gupta
> >Cc: r-help at stat.math.ethz.ch
> >Subject: Re: [R] boxplots with multiple numerical variables
> >
> >
> >On Fri, 14 Mar 2003, [iso-8859-1] Rishabh Gupta wrote:
> >
> >> Hi all,
> >> I have a question regarding the boxplot function. The data I am
> >> working on has 1 grouping variable (G) and many numerical
> >> variables (V1, V2, V3, V4, Vx, etc). What I would like to do is
> >> create a boxplot where the Y-axis represents the numerical values
> >> of variables V1...Vx (all the variables have the same range). The
> >> X-axis needs to represent the G-V combination. So suppose the
> >> possible values for G are a, b and c. Then along the x-axis there
> >> would be a boxplot for each of the combinations:
> >>
> >> V1Ga, V1Gb, V1Gc, V2Ga, V2Gb, V2Gc, V3Ga, V3Gb, V3Gc,.....VxGa,
> >> VxGb, VxGc, etc
> >>
> >> i.e. all values of V1 where the G value is a, all values of V1
> >> where the G value is b, etc. In addition, if possible, it would be
> >> nice if each G value had a different colour on the plot so that
> >> they could be seen more clearly.
> >>
> >> I'm not sure whether such a function already exists within R or
> >> whether it would have to be written. Either way, I would appreciate
> >> it very much if somebody could help and give me some advice as to
> >> how I can achieve this.
> >>
> >
> >I'm going to work with a data frame that has two variables and
> >a binary grouping factor
> >
> >df<-data.frame(x1=rnorm(100),x2=rnorm(100),g=rep(0:1,50))
> >
> >
> >There are at least two ways to do this. boxplot() will take a
> >list of vectors and do boxplots of them, so we can split()
> >each of the vectors
> > lapply(df[,1:2], function(v) split(v, df$g))
> >and then combine them into a single list with do.call("c",)
> >and then boxplot() them. That is:
> > boxplot(do.call("c",lapply(df[,1:2],function(v) split(v,df$g))))
> >This labels the x-axis "x1.0" "x1.1", "x2.0", "x2.1"
> >
> >
> >We can also do the opposite: combine the vectors into a single
> >variable, add a new factor indicating which vector each
> >observation came from, and use boxplot() with a formula.
> > ddf<-reshape(df,varying=list(x=c("x1","x2")),direction="long")
> > boxplot(x1~interaction(time,g),data=ddf)
> >This labels the x-axis "1.0" "2.0" "1.1" "2.1"
> >
> >
> > -thomas
>
>
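The box order that each of the two approaches above produces can be checked without drawing anything, by inspecting the names directly. This is only a minimal sketch built on the example data frame from the message; the seed and the names `parts` and `b` are assumptions added here for illustration.

```r
set.seed(1)  # seed chosen only so the example is reproducible
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), g = rep(0:1, 50))

# Approach 1: split each variable by g, then flatten into one list.
# The list names become the x-axis labels.
parts <- do.call("c", lapply(df[, 1:2], function(v) split(v, df$g)))
names(parts)

# Approach 2: long format plus a formula. plot = FALSE returns the
# box statistics (including the labels) without drawing.
ddf <- reshape(df, varying = list(x = c("x1", "x2")), direction = "long")
b <- boxplot(x1 ~ interaction(time, g), data = ddf, plot = FALSE)
b$names
```

The first approach orders the boxes variable by variable, the second group by group, which matters when assigning colours or 'at' positions.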
> After seeing Thomas' reply, I realized that my prior approach is not
> correct.
>
> Building on Thomas' reply and to address color and box groupings if
> you desire, the following will cycle through three colors, one for
> each letter group. It will also group each variable based upon the a,
> b and c groups. This incorporates my prior thought on boxplot
> groupings and x axis labeling.
>
> I adjusted Thomas' df to contain a grp column with three groups: a
> (red), b (yellow), and c (green).
>
> So:
>
> df <- data.frame(v1 = rnorm(100), v2 = rnorm(100),
>                  v3 = rnorm(100),
>                  grp = rep(letters[1:3], length.out = 100))
>
> # Define 'at' argument positions. There will be 9 boxes generated
> # by the boxplot() call, normally centered at 1:9 on the x axis.
> # Using 'at', you can reposition these.
> # Note that the center boxes for each variable
> # are at x = 2, 5 and 8. The other two boxes are offset by 0.5
> # before and after the center box.
>
> at = c(1.5, 2.0, 2.5, 4.5, 5.0, 5.5, 7.5, 8.0, 8.5)
>
> # Generate plots as per Thomas' example, cycling colors and box
> # positions. Set boxwex for thin boxes. Do not draw axes.
>
> boxplot(do.call("c",lapply(df[,1:3],function(v) split(v,df$grp))),
>         col = c("red", "yellow", "green"), boxwex = 0.2, at = at,
>         axes = FALSE)
>
> # Draw y axis
> axis(2)
>
> # Draw x axis, creating letter labels below each box
> axis(1, labels = rep(letters[1:3], 3), at = at)
>
> # Now label each group of 3 boxes with the varname in the middle
> # of each grouping.
>
> mtext(side = 1, line = 3, c("V1", "V2", "V3"), at = c(2, 5, 8))
>
> # Draw box around the whole plot
>
> box()
>
> The values for 'at', 'col' and 'boxwex' would of course need to be
> adjusted for your actual number of variables, as would the range of
> columns in the lapply() call that Thomas had first incorporated.
>
> Sorry for my confusion earlier and hope that this helps.
>
> Regards,
>
> Marc Schwartz
>
>
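Since 'at', 'col' and the labels have to be adjusted by hand for a different number of variables or groups, one option is to compute them. The following is a rough sketch under stated assumptions, not part of either reply: `cluster_at()` is a hypothetical helper, and the defaults (first cluster centred at 2, clusters 3 units apart, boxes 0.5 apart) are chosen only to reproduce the positions used above.

```r
# Hypothetical helper: x positions for n_var clusters of n_grp boxes.
# Cluster centres start at 'first' and sit 'gap' units apart; boxes in
# a cluster are 'step' units apart and centred on the cluster centre.
cluster_at <- function(n_var, n_grp, first = 2, gap = 3, step = 0.5) {
  centres <- first + gap * (seq_len(n_var) - 1)
  offsets <- (seq_len(n_grp) - (n_grp + 1) / 2) * step
  as.vector(outer(offsets, centres, "+"))
}

set.seed(1)  # seed chosen only so the example is reproducible
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100),
                 grp = rep(letters[1:3], length.out = 100))

vars <- c("v1", "v2", "v3")
grps <- sort(unique(as.character(df$grp)))
cols <- c("red", "yellow", "green")  # one colour per group

at <- cluster_at(length(vars), length(grps))
# Middle of each cluster, used to place the variable names
centres <- colMeans(matrix(at, nrow = length(grps)))

boxes <- do.call("c", lapply(df[, vars], function(v) split(v, df$grp)))
boxplot(boxes, col = rep(cols, length(vars)), boxwex = 0.2,
        at = at, axes = FALSE)
axis(2)
axis(1, labels = rep(grps, length(vars)), at = at)
mtext(side = 1, line = 3, toupper(vars), at = centres)
box()
```

With three variables and three groups, `cluster_at(3, 3)` gives the same positions as the hand-written 'at' vector; for other sizes only `vars` and `cols` need to change.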