[R] Calculate mean/var by ID
Adaikalavan Ramasamy
a.ramasamy at imperial.ac.uk
Fri Sep 12 17:00:23 CEST 2008
AFAIK, tapply() only works on one variable at a time (apart from the
grouping variable). It might be better to use split() here:
df <- data.frame(ID    = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                 Seg   = c(2, 2, 2, 4, 4, 1, 1, 1, 1) )
df.s <- split( df, df$ID )
out <- sapply( df.s, function(m){
         c( mu=mean(m$value), var=var(m$value),
            min=min(m$Seg), max=max(m$Seg) ) })
out <- t(out)
          mu      var min max
111 4.333333 4.333333   2   2
138 6.000000 4.666667   1   1
178 5.000000 8.000000   4   4
You could also have used range() here instead of calculating min and max
separately, but naming the resulting columns then becomes a bit tricky.
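
One possible way around the naming issue, as a minimal sketch rather than the only option: range() returns an unnamed vector c(min, max), so the labels can be attached with setNames() before combining. The object name out2 is just an illustrative choice, and the data frame is the same toy example as above.

```r
# Same toy data as in the split() example above.
df <- data.frame(ID    = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                 Seg   = c(2, 2, 2, 4, 4, 1, 1, 1, 1))

# range() gives c(min, max) without names; setNames() labels the two
# elements so they end up as column names after the transpose.
out2 <- t(sapply(split(df, df$ID), function(m) {
  c(mu = mean(m$value), var = var(m$value),
    setNames(range(m$Seg), c("min", "max")))
}))
out2
```

This should give the same four columns (mu, var, min, max) as the version with separate min() and max() calls.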
Regards, Adai
PS: If you do a dput() on a subset of the data, you can get a simple
reproducible example that other R users can easily read in.
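
To illustrate the dput() suggestion, here is a small sketch using the toy data from above. The names txt and df_back are only for illustration; in practice you would paste the printed text into your e-mail, and the reader would assign it to an object.

```r
df <- data.frame(ID    = c(111, 111, 111, 178, 178, 138, 138, 138, 138),
                 value = c(5, 6, 2, 7, 3, 3, 8, 7, 6),
                 Seg   = c(2, 2, 2, 4, 4, 1, 1, 1, 1))

# Print an R expression that recreates the first three rows.
dput(head(df, 3))

# What the reader effectively does: evaluate the pasted text to rebuild
# the data frame. Here the text is captured instead of being copied by hand.
txt     <- capture.output(dput(head(df, 3)))
df_back <- eval(parse(text = txt))
df_back
```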
Julia Liu wrote:
> Adai,
>
> Thank you so much for your help. I like your code the best. :) So simple. I have another question though, if you don't mind. I'd like to include another variable in "res". This variable defines the segmentation of each person (ranges, say, from 1 to 4).
> ID value Seg
> 111 5 2
> 111 6 2
> 111 2 2
> 178 7 4
> 178 3 4
> 138 3 1
> 138 8 1
> 138 7 1
> 138 6 1
>
> How to do this? Thank you so much for the help.
> Sincerely
> Julia
>
> --- On Thu, 9/11/08, Adaikalavan Ramasamy <a.ramasamy at imperial.ac.uk> wrote:
> From: Adaikalavan Ramasamy <a.ramasamy at imperial.ac.uk>
> Subject: Re: [R] Calculate mean/var by ID
> To: "Jorge Ivan Velez" <jorgeivanvelez at gmail.com>
> Cc: "liujb" <liujulia7 at yahoo.com>, r-help at r-project.org
> Date: Thursday, September 11, 2008, 10:28 PM
>
> A slight variation of what Jorge has proposed is:
>
> f <- function(x) c( mu=mean(x), var=var(x) )
>
> do.call( "rbind", tapply( df$value, df$ID, f ) )
>
>           mu      var
> 111 4.333333 4.333333
> 138 6.000000 4.666667
> 178 5.000000 8.000000
>
> Regards, Adai
>
>
>
> Jorge Ivan Velez wrote:
>> Dear Julia,
>> Try also
>>
>> x=read.table(textConnection("ID value
>> 111 5
>> 111 6
>> 111 2
>> 178 7
>> 178 3
>> 138 3
>> 138 8
>> 138 7
>> 138 6"),header=TRUE)
>> closeAllConnections()
>> attach(x)
>>
>> do.call(rbind,tapply(value,ID, function(x){
>> res=c(mean(x,na.rm=TRUE),var(x,na.rm=TRUE))
>> names(res)=c('Mean','Variance')
>> res
>> }
>> )
>> )
>>
>> HTH,
>>
>> Jorge
>>
>>
>>
>>
>> On Thu, Sep 11, 2008 at 1:45 PM, liujb <liujulia7 at yahoo.com> wrote:
>>
>>> Hello,
>>>
>>> I have a data set that looks like this.
>>> ID value
>>> 111 5
>>> 111 6
>>> 111 2
>>> 178 7
>>> 178 3
>>> 138 3
>>> 138 8
>>> 138 7
>>> 138 6
>>> .
>>> .
>>> .
>>>
>>> I'd like to calculate the mean and var for each object identified by the
>>> ID.
>>> I can in theory just loop through the whole thing..., but is there an easier
>>> way/command which lets me calculate the mean/var by ID?
>>>
>>> Thanks,
>>> Julia
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Calculate-mean-var-by-ID-tp19440461p19440461.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>