[R] Calculate mean/var by ID

Adaikalavan Ramasamy a.ramasamy at imperial.ac.uk
Fri Sep 12 17:00:23 CEST 2008


AFAIK, tapply() summarises only one variable at a time (apart from the 
grouping variable). It might be better to use split() here:

    df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                     value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                     Seg = c(2, 2, 2, 4, 4, 1, 1, 1, 1) )

    df.s <- split( df, df$ID )

    out <- sapply( df.s, function(m){
                     c( mu=mean(m$value), var=var(m$value),
                        min=min(m$Seg), max=max(m$Seg) ) })
    out <- t(out)
    out
              mu      var min max
    111 4.333333 4.333333   2   2
    138 6.000000 4.666667   1   1
    178 5.000000 8.000000   4   4

You could also have used range() here instead of calculating min and max 
separately but naming the resulting columns becomes a bit tricky.
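
As a rough sketch of the range() variant (not part of the code above): 
range() returns an unnamed vector of length two, so one possible way around 
the naming issue is to label it with setNames() before combining. The 
object name out2 is only illustrative.

```r
# Same example data as in the post
df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                 Seg = c(2, 2, 2, 4, 4, 1, 1, 1, 1))

# range() returns c(min, max) without names, so label the pair explicitly
out2 <- sapply(split(df, df$ID), function(m) {
  c(mu = mean(m$value), var = var(m$value),
    setNames(range(m$Seg), c("min", "max")))
})
out2 <- t(out2)   # one row per ID, columns mu, var, min, max
out2
```

This should give the same table as the min()/max() version, at the cost of 
one extra setNames() call.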

Regards, Adai

PS: If you do a dput() on a subset of the data, you can get a simple 
reproducible example that other R users can easily read in.
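
For instance, a minimal sketch of the dput() idea, assuming the data are 
already in a data frame called df (sub, txt and back are just illustrative 
names). The exact text that dput() prints differs between R versions, so it 
is not reproduced here.

```r
df <- data.frame(ID = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                 Seg = c(2, 2, 2, 4, 4, 1, 1, 1, 1))

# Take a small subset and print R code that recreates it
sub <- head(df, 4)
dput(sub)

# Pasting that printed text back into R rebuilds the object;
# the round trip is simulated here with capture.output() and parse()
txt  <- capture.output(dput(sub))
back <- eval(parse(text = txt))
all.equal(sub, back)
```

The printed structure(...) text can be pasted straight into an email, and 
readers can assign it to a name of their choice.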



Julia Liu wrote:
> Adai,
> 
> Thank you so much for your help. I like your code the best. :) So simple. I have another question though, if you don't mind. I'd like to include another variable in "res". This variable defines the segmentation of each person (ranges, say, from 1 to 4). 
>  ID   value   Seg
> 111     5      2
> 111     6      2
> 111     2      2
> 178     7      4
> 178     3      4
> 138     3      1
> 138     8      1
> 138     7      1
> 138     6      1
> 
> How to do this? Thank you so much for the help.
> Sincerely
> Julia
> 
> --- On Thu, 9/11/08, Adaikalavan Ramasamy <a.ramasamy at imperial.ac.uk> wrote:
> From: Adaikalavan Ramasamy <a.ramasamy at imperial.ac.uk>
> Subject: Re: [R] Calculate mean/var by ID
> To: "Jorge Ivan Velez" <jorgeivanvelez at gmail.com>
> Cc: "liujb" <liujulia7 at yahoo.com>, r-help at r-project.org
> Date: Thursday, September 11, 2008, 10:28 PM
> 
> A slight variation of what Jorge has proposed is:
> 
>     f <- function(x) c( mu=mean(x), var=var(x) )
> 
>     do.call( "rbind", tapply( df$value, df$ID, f ) )
> 
>              mu      var
>    111 4.333333 4.333333
>    138 6.000000 4.666667
>    178 5.000000 8.000000
> 
> Regards, Adai
> 
> 
> 
> Jorge Ivan Velez wrote:
>> Dear Julia,
>> Try also
>>
>> x=read.table(textConnection("ID    value
>> 111     5
>> 111     6
>> 111     2
>> 178     7
>> 178     3
>> 138     3
>> 138     8
>> 138     7
>> 138     6"),header=TRUE)
>>  closeAllConnections()
>> attach(x)
>>
>> do.call(rbind,tapply(value,ID, function(x){
>> res=c(mean(x,na.rm=TRUE),var(x,na.rm=TRUE))
>> names(res)=c('Mean','Variance')
>> res
>> }
>> )
>> )
>>
>> HTH,
>>
>> Jorge
>>
>>
>>
>>
>> On Thu, Sep 11, 2008 at 1:45 PM, liujb <liujulia7 at yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> I have a data set that looks like this.
>>> ID    value
>>> 111     5
>>> 111     6
>>> 111     2
>>> 178     7
>>> 178     3
>>> 138     3
>>> 138     8
>>> 138     7
>>> 138     6
>>> .
>>> .
>>> .
>>>
>>> I'd like to calculate the mean and var for each object identified by the
>>> ID.
>>> I can in theory just loop through the whole thing..., but is there an easier
>>> way/command which lets me calculate the mean/var by ID?
>>>
>>> Thanks,
>>> Julia
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Calculate-mean-var-by-ID-tp19440461p19440461.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>


