[R] Fwd: construct boxplots from data with varying column widths

David Winsemius dwinsemius at comcast.net
Sat Jul 16 19:27:12 CEST 2011




From: David Winsemius <dwinsemius at comcast.net>

On Jul 16, 2011, at 12:15 PM, Rory Campbell-Lange wrote:

> On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
>>
>> On Jul 16, 2011, at 11:19 AM, Rory Campbell-Lange wrote:
>>
>>> I'm an R beginner, and I would like to construct a set of boxplots
>>> showing database function runtimes.
>
>>> I can easily reformat the base data to provide it to R in a format
>>> such as:
>>>
>>> function1,12.5
>>> function1,13.11
>>> function1,35.2
>>> ...
>
>> That is definitely to be preferred. Read that into R and show us the
>> results of str on your R data object.
>
> Thanks for your suggestion.
>
>> str(data2)
>   'data.frame':   1940170 obs. of  2 variables:
>    $ function.: Factor w/ 127 levels "fn_activities01_list",..: 102 102 102 102 102 102 102 102 102 102 ...
>    $ runtime  : num  38.1 32.4 41.2 92.9 130.5 ..
>
>> head(data2)
>              function. runtime
>   1 fn_slot03_byperson  38.083
>   2 fn_slot03_byperson  32.396
>   3 fn_slot03_byperson  41.246
>   4 fn_slot03_byperson  92.904
>   5 fn_slot03_byperson 130.512
>   6 fn_slot03_byperson 113.853
>
>   tmp <- data2[data2$dbfunc=='fn_slot03_byperson',]
>> length(tmp$runtime)
>   [1] 24004
>> ave(tmp$runtime)[1]
>   [1] 41.8108

I would have guessed you would get an error, but when ave() is given
no grouping factor it applies the function to the whole vector, so
every element is just the grand mean of tmp$runtime.

Try instead one of these:

aggregate(data2$runtime, by=list(data2$function.), FUN=mean)

tapply(data2$runtime, data2$function. , FUN=mean)

data2$grpmean <- ave( data2$runtime, data2$function. , FUN=mean)

The last one adds a column to the dataframe and could be useful for
identifying items that are some particular distance away from their
group mean.
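
As a rough sketch of how the three calls compare, here they are on a
small made-up data frame. The toy values, the 'dbfunc' label given to
the aggregate() grouping column, and the 'cutoff' threshold are all
assumptions for illustration and are not taken from your data:

```r
# Toy stand-in for data2 (values are invented for illustration only)
data2 <- data.frame(
  function. = factor(c("fn_a", "fn_a", "fn_a", "fn_b", "fn_b", "fn_b")),
  runtime   = c(10, 20, 30, 100, 110, 180)
)

# One mean per function, returned as a named vector
grp_means <- tapply(data2$runtime, data2$function., FUN = mean)

# The same means as a data frame; note that 'by' has to be a list
agg <- aggregate(data2$runtime, by = list(dbfunc = data2$function.), FUN = mean)

# The group mean repeated on every row of its group
data2$grpmean <- ave(data2$runtime, data2$function., FUN = mean)

# Rows whose runtime is more than 'cutoff' away from their group mean
cutoff <- 40  # hypothetical threshold, pick whatever suits your data
far <- data2[abs(data2$runtime - data2$grpmean) > cutoff, ]
```

tapply() and aggregate() give one value per function, while ave()
returns a vector as long as the input, which is what makes the
row-by-row comparison in the last step possible.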


-- 


David Winsemius, MD
West Hartford, CT

