[R] adding additional information to histogram
jim at bitwrit.com.au
Fri Jan 27 12:30:45 CET 2012
On 01/27/2012 10:07 PM, Raphael Bauduin wrote:
> On Fri, Jan 27, 2012 at 9:51 AM, Jim Lemon<jim at bitwrit.com.au> wrote:
>> On 01/27/2012 03:12 AM, Raphael Bauduin wrote:
>>> I am a beginner with R, and I think the answer to my question will
>>> seem obvious, but after searching and trying without success I've
>>> decided to post to the list.
>>> I am working with data loaded from a CSV file with these fields:
>>> order_id, item_value
>>> As an order can have multiple items, an order_id may be present
>>> multiple times in the CSV.
>>> I managed to compute the total value and the number of items for each
>>> order:
>>> oli<- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
>>> orders_values <- tapply(oli$item_value, oli$order_id, sum)
>>> items_per_order <- tapply(oli$item_value, oli$order_id, length)
>>> I then can display the histogram of the order values:
>>> hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)
>>> Now on this histogram, I would like to display the average number of
>>> items of the orders in each group (defined with the breaks).
>>> So for the bar of orders with value 0 to 10, I'd like to display the
>>> average number of items of these orders.
>> Hi Raph,
>> As this looks a tiny bit like homework, I'll only provide suggestions. You
> This is absolutely not a homework :-)
> I'm learning R to try to get some info out of data from an e-commerce website.
>> have the value and number of items for each order. What you need to do is to
>> match them in groups. In order to do that, you want a factor that will show
>> the group for each value-items pair. The "cut" function will give you such a
>> factor, using the breaks above. You seem to understand the *apply functions,
>> so you can use one of these to return the mean number of items for each
>> value group. Alternatively, you could use the factor in the "by" function to
>> get the mean number of items.
>> You should now have a factor that can be sent to "table" to get the number
>> of orders in each value range, and a vector of the corresponding mean
>> numbers of items in each value grouping. Why, you could even use the same
>> trick to calculate the mean price of the orders in each value grouping...
>> I would use "barplot" to display all this information, as it is a bit easier
>> to place the mean number of items on the bars (if you check the return value
>> for barplot).
> Your suggestions helped me get the info I wanted. I still need to
> fine-tune it as I currently generate 2 barplots.
> Here's what I've done, in case it can help someone in the future:
> # assigns to each entry of orders_values the range to which it belongs,
> # according to the breaks passed in the second argument
> order_value_range<-cut(orders_values, c(10*0:20, 800))
> #count number of orders in each range:
> orders_number_per_range=tapply(orders_values, order_value_range, length)
> #equivalent to table(order_value_range)
> average_number_of_item_per_order_in_range<- tapply(items_per_order,
> order_value_range, mean)
> barplot(average_number_of_item_per_order_in_range, ylab="Items number", xlab="Order value")
> barplot(orders_number_per_range, ylab="Number of orders", xlab="Order value")
> The next step: combine the two barplots in one.
> Thanks already for your help!
Okay, what you want to do is to draw one barplot, then use the text
function (or boxed.labels in plotrix) to put the values of items per
order over or (better for not distorting the height relationship) on the
bars. In the barplot function, you can get the x positions of the bars
from the return value, and of course, you know the heights of the bars...
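
A minimal sketch of that approach, assuming made-up data in place of the real orders (the random values, the choice to plot the order counts as bar heights, and the names counts, mean_items_per_range, bar_x and keep are illustrative only, not taken from the original data):

```r
# Hypothetical data standing in for the real orders
set.seed(1)
orders_values   <- runif(200, min = 1, max = 199)
items_per_order <- 1 + rpois(200, lambda = orders_values / 40)

# Same breaks as in the thread: 0-10, 10-20, ..., 190-200, 200-800
order_value_range <- cut(orders_values, c(10 * 0:20, 800))

# Number of orders and mean number of items in each value range
counts <- table(order_value_range)
mean_items_per_range <- tapply(items_per_order, order_value_range, mean)

# barplot() returns the x positions of the bar midpoints
bar_x <- barplot(counts, ylab = "Number of orders", xlab = "Order value",
                 cex.names = 0.6)

# Write the mean item count on each non-empty bar, at half its height,
# so the labels do not distort the height relationship between bars
keep <- as.vector(!is.na(mean_items_per_range))
text(bar_x[keep], as.vector(counts)[keep] / 2,
     labels = round(mean_items_per_range[keep], 1), cex = 0.6)
```

Empty ranges get no label, since tapply returns NA for them. boxed.labels from plotrix could replace text here if the labels need a background to stay readable.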