[R] Calculated mean value based on another column bin from dataframe.
David Winsemius
dwinsemius at comcast.net
Wed Apr 6 23:20:52 CEST 2011
On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:
> Dear Henrique Dallazuanna,
>
> Thank you very much for your suggestion.
>
> It is obvious that your method is better than me.
>
> Is it possible to use cut, table,by etc? Whether there is some
> aggregate function in R can do this?
>
> Thanks.
>
> On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna
> <wwwhsd at gmail.com> wrote:
>> Try this:
>>
>> fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
>> 1)], '>=', e1 = dat[,1])
>> mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
>>
>> On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com
>> > wrote:
>>> Dear list,
>>>
>>> I have a dataframe with two column as fellow.
>>>
>>>> head(dat)
>>> V1 V2
>>> 0.15624 0.94567
>>> 0.26039 0.66442
>>> 0.16629 0.97822
>>> 0.23474 0.72079
>>> 0.11037 0.83760
>>> 0.14969 0.91312
>>>
>>> I want to get the column V2 mean value based on the bin of column of
>>> V1. I write the code as fellow. It works, but I think this is not
>>> the
>>> elegant way. Any suggestions?
>>>
>>> dat<-read.table("dat.txt",head=F)
>>> ran<-seq(0,0.5,0.05)
>>> mm<-NULL
>>> for (i in c(1:(length(ran)-1)))
>>> {
>>> fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
>>> m<-mean(dat[fil,2])
>>> mm<-c(mm,m)
>>> }
>>> mm
>>>
>>> Here is the first 20 lines of my data.
>>>
>>>> dput(head(dat,20))
>>> structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
>>> 0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
>>> 0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
>>> 0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
>>> 0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
>>> 0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
>>> )), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
>>> "data.frame")
>>>
>>> ______________________________________________
Here is how I would have done it with findInterval and tapply which is
very similar to using a `cut` and `table` approach:
> dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
> tapply(dat$V2, dat$grp, mean)
1 2 3 4 5 6 8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------
You do not get exactly the same form of the result as with Henrique's
method. His yields:
> mm
[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450
0.7269900 NaN
[8] 0.6978900 NaN NaN NaN
####----------------
The cut approach would yield this, which is more informatively
labeled. (I'm wasn't completely sure the second to last word in the
prior sentence was a real word, but several dictionaries seem to think
so.):
> dat$grp2 <- cut(dat$V1 , breaks=ran)
> tapply(dat$V2, dat$grp2, mean)
(0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
NA 0.6978900 NA NA
>
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list