# [R] Calculated mean value based on another column bin from dataframe.

David Winsemius dwinsemius at comcast.net
Wed Apr 6 23:20:52 CEST 2011

```On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:

> Dear Henrique Dallazuanna,
>
> Thank you very much for your suggestion.
>
> It is obvious that your method is better than me.
>
> Is it possible to use cut, table,by etc? Whether there is some
> aggregate function in R can do this?
>
> Thanks.
>
> On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna
> <wwwhsd at gmail.com> wrote:
>> Try this:
>>
>> fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
>> 1)], '>=', e1 = dat[,1])
>> mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
>>
>> On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com
>> > wrote:
>>> Dear list,
>>>
>>> I have a dataframe with two column as fellow.
>>>
>>>       V1      V2
>>>  0.15624 0.94567
>>>  0.26039 0.66442
>>>  0.16629 0.97822
>>>  0.23474 0.72079
>>>  0.11037 0.83760
>>>  0.14969 0.91312
>>>
>>> I want to get the column V2 mean value based on the bin of column of
>>> V1. I write the code as fellow. It works, but I think this is not
>>> the
>>> elegant way. Any suggestions?
>>>
>>> ran<-seq(0,0.5,0.05)
>>> mm<-NULL
>>> for (i in c(1:(length(ran)-1)))
>>> {
>>>    fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
>>>    m<-mean(dat[fil,2])
>>>    mm<-c(mm,m)
>>> }
>>> mm
>>>
>>> Here is the first 20 lines of my data.
>>>
>>> structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
>>> 0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
>>> 0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
>>> 0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
>>> 0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
>>> 0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
>>> )), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
>>> "data.frame")
>>>
>>> ______________________________________________

Here is how I would have done it with findInterval and tapply which is
very similar to using a `cut` and `table` approach:

> dat\$grp <- findInterval(dat\$V1, seq(0,0.5,0.05) )
> tapply(dat\$V2, dat\$grp, mean)
1         2         3         4         5         6         8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------

You do not get exactly the same form of the result as with Henrique's
method. His yields:
> mm
[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450
0.7269900       NaN
[8] 0.6978900       NaN       NaN       NaN

####----------------

labeled. (I'm wasn't completely sure the second to last word in the
prior sentence was a real word, but several dictionaries seem to think
so.):

> dat\$grp2 <- cut(dat\$V1 , breaks=ran)
> tapply(dat\$V2, dat\$grp2, mean)
(0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
0.9252300  0.8836100  0.9135429  0.9213600  0.8493450  0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
NA  0.6978900         NA         NA

>

David Winsemius, MD
West Hartford, CT

```