[R] construct boxplots from data with varying column widths

Rory Campbell-Lange rory at campbell-lange.net
Sun Jul 17 06:47:24 CEST 2011


On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
> From: David Winsemius <dwinsemius at comcast.net>
> On Jul 16, 2011, at 12:15 PM, Rory Campbell-Lange wrote:
> >On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
> >>On Jul 16, 2011, at 11:19 AM, Rory Campbell-Lange wrote:
> >>
> >>>I'm an R beginner, and I would like to construct a set of boxplots
> >>>showing database function runtimes.
> >
> >>>I can easily reformat the base data to provide it to R in a format
> >>>such as:
> >>>
> >>>function1,12.5
> >>>function1,13.11
> >>>function1,35.2
> >
> I would have guessed you would get an error, but maybe if ave() is
> given no grouping factor it just returns a grand mean.

You are correct, and my apologies for posting this question both here
and on stackoverflow.

> Try instead one of these:
> 
> aggregate(data2, data2$function. , FUN=mean)
> 
> tapply(data2$runtime, data2$function. , FUN=mean)

The two above error because of 'by' 

    > aggregate(data2, data2$dbfunc , FUN=mean)
    Error in aggregate.data.frame(data2, data2$dbfunc, FUN = mean) : 
      'by' must be a list

I tried to construct a list of names for the 'by' clause and tried
again:
    
    > funcnames <- levels(data2$dbfunc)
    > aggregate(data2, funcnames , FUN=mean)
   
but that causes the same error.
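
For reference, a minimal sketch of calls that avoid the 'by' error. The
toy data below is made up for illustration; only the column names
(dbfunc, runtime) come from this thread. levels() returns a character
vector, not a list, and each element of 'by' needs one entry per row of
the data, so wrapping the grouping column itself in list() is what
aggregate() expects. Only the runtime column is aggregated here, since
taking the mean of the factor column is not meaningful.

```r
# Toy data, invented for illustration; column names follow the post.
data2 <- data.frame(
  dbfunc  = factor(c("fn_a", "fn_a", "fn_b", "fn_b")),
  runtime = c(10, 20, 100, 300)
)

# 'by' is a list with one grouping vector per element, each as long as
# the number of rows in the data being aggregated.
means_by <- aggregate(data2["runtime"],
                      by = list(dbfunc = data2$dbfunc),
                      FUN = mean)

# The formula interface expresses the same grouping more compactly.
means_formula <- aggregate(runtime ~ dbfunc, data = data2, FUN = mean)

# tapply() takes the grouping factor directly and returns a named vector.
means_tapply <- tapply(data2$runtime, data2$dbfunc, FUN = mean)

print(means_by)
print(means_formula)
print(means_tapply)
```

Each of these yields one mean per dbfunc level, which is roughly what an
SQL 'group by' with avg() would give.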

> data2$grpmean <- ave( data2$runtime, data2$function. , FUN=mean)
> 
> The last one adds a column in the dataframe and could be useful for
> identifying items that are some particular distance away from their
> group mean.

I failed initially to see the purpose of adding the grpmean column.
However, I think I now 'get it' -- it allows one to filter.

    a. build data frame
                      dbfunc runtime 
        1 fn_slot03_byperson  38.083 
        2 fn_slot03_byperson  32.396 
        3 fn_slot03_byperson  41.246 
        4 fn_slot03_byperson  92.904 
        5 fn_slot03_byperson 130.512 
        6 fn_slot03_byperson 113.853 

    b. add groupmean

       data2$grpmean <- ave(data2$runtime, data2$dbfunc. , FUN=mean)
                      dbfunc runtime grpmean
        1 fn_slot03_byperson  38.083 41.8108
        2 fn_slot03_byperson  32.396 41.8108
        3 fn_slot03_byperson  41.246 41.8108
        4 fn_slot03_byperson  92.904 41.8108
        5 fn_slot03_byperson 130.512 41.8108
        6 fn_slot03_byperson 113.853 41.8108

    c. filter by grpmean where grpmean over 150 ms

       data3 <- data2[data2$grpmean > 150,]

    d. attempt to plot
      
       boxplot(runtime ~ dbfunc, data3)
       This produces a set of circles for each function, rather than the
       box and whisker plot I'm expecting.
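
Steps a. to d. can be sketched end to end as follows. This is a sketch
under assumptions, not a diagnosis of the circles: the data is invented,
and the droplevels() step is an addition that is not in the steps above.
It removes factor levels left empty by the filter, so that boxplot()
does not reserve a slot for functions that were filtered out. The
grouping column is written as data2$dbfunc, with no trailing dot, to
match the column name in the data frame.

```r
# a. build data frame (toy values, invented for illustration)
data2 <- data.frame(
  dbfunc  = factor(rep(c("fn_fast", "fn_slow"), each = 5)),
  runtime = c(10, 12, 11, 13, 14, 160, 180, 150, 200, 210)
)

# b. add the per-function mean as a new column
data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN = mean)

# c. keep only rows whose group mean is over 150 ms
data3 <- data2[data2$grpmean > 150, ]

# Assumed extra step: drop factor levels that no longer have any rows.
data3$dbfunc <- droplevels(data3$dbfunc)

# d. one box per remaining function
boxplot(runtime ~ dbfunc, data = data3)
```

Checking table(data3$dbfunc) before plotting shows how many rows each
function contributes to its box.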

I'm not sure how to 'fold' the results to get the equivalent of an SQL
'group by' in the results.

Thanks very much for your help, and my apologies for the cross-posting
on stackoverflow
(http://stackoverflow.com/questions/6720036/r-summarise-data-frame-with-repeating-rows-into-boxplots)

Rory


