[R] Summing by index

David Winsemius dwinsemius at comcast.net
Fri Jul 30 20:46:04 CEST 2010


On Jul 30, 2010, at 2:41 PM, steven mosher wrote:

> # build a sample data frame illustrating the problem
> ids<-c(rep(1234,5),rep(5436,3),rep(7864,4))
> years<-c(seq(1990,1994,by=1),seq(1991,1993,by=1),seq(1990,1993,by=1))
> data<-seq(14,25,by=1)
> data[6]<-NA
> DF<-data.frame(Id=ids,Year=years,Data=data)
> DF
>     Id Year Data
> 1  1234 1990   14
> 2  1234 1991   15
> 3  1234 1992   16
> 4  1234 1993   17
> 5  1234 1994   18
> 6  5436 1991   NA
> 7  5436 1992   20
> 8  5436 1993   21
> 9  7864 1990   22
> 10 7864 1991   23
> 11 7864 1992   24
> 12 7864 1993   25
>
> # The result wanted is the sum of DF$Data by DF$Id: collect the sum of $Data
> # for each $Id.
> # the  result would take the form
> #  Id, sum  for each Id
> # Try using BY
> result<-by(DF$Data,INDICES=Data$Id,FUN=sum,na.rm=T)

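The INDICES argument refers to Data$Id rather than DF$Id, which is why the error below reports a zero-length vector. Try instead:

result<-by(DF$Data,INDICES=DF$Id,FUN=sum,na.rm=TRUE)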
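
For reference, a minimal self-contained sketch of the corrected call, rebuilding the sample data from the quoted message. The names `vals` and `sums` are illustrative and not from the original post; tapply() is shown only as an equivalent that returns a plain named vector.

```r
# Rebuild the sample data frame from the quoted message
ids   <- c(rep(1234, 5), rep(5436, 3), rep(7864, 4))
years <- c(seq(1990, 1994, by = 1), seq(1991, 1993, by = 1), seq(1990, 1993, by = 1))
vals  <- seq(14, 25, by = 1)   # 'vals' is an illustrative name
vals[6] <- NA
DF <- data.frame(Id = ids, Year = years, Data = vals)

# The index is DF$Id: one entry per row, so it has the same length as DF$Data
result <- by(DF$Data, INDICES = DF$Id, FUN = sum, na.rm = TRUE)
print(result)

# Equivalent with tapply(), which returns a named vector of sums
sums <- tapply(DF$Data, DF$Id, sum, na.rm = TRUE)
print(sums)
# 1234 5436 7864
#   80   41   94
```

The three sums add up to 215, the single total the quoted message gets further down when the index is empty.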

-- 
David.
> Error in names(IND) <- deparse(substitute(INDICES))[1L] :
>  'names' attribute [1] must be the same length as the vector [0]
> idx<-as.list(Data$Id)
>
>
> idx2<-list(1234,1234,1234,1234,1234,5436,5436,5436,7864,7864,7864,7864)
> result<-by(DF$Data,INDICES=idx,FUN=sum,na.rm=T)
> result
> [1] 215
> result<-by(DF$Data,INDICES=idx2,FUN=sum,na.rm=T)
> Error in tapply(1L:12L, list(1234, 1234, 1234, 1234, 1234, 5436, 5436,  :
>  arguments must have same length
>> idx
> list()
>> idx[1]
> [[1]]
> NULL
>
>> idx2
> [[1]]
> [1] 1234
>
> [[2]]
> [1] 1234
>
> [[3]]
> [1] 1234
>
> [[4]]
> [1] 1234
>
> [[5]]
> [1] 1234
>
> [[6]]
> [1] 5436
>
> [[7]]
> [1] 5436
>
> [[8]]
> [1] 5436
>
> [[9]]
> [1] 7864
>
> [[10]]
> [1] 7864
>
> [[11]]
> [1] 7864
>
> [[12]]
> [1] 7864
>
> aggregate(DF$Data, by=idx2,sum,na.rm=T)
> Error in aggregate.data.frame(as.data.frame(x), ...) :
>  arguments must have same length
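
The `by` argument of aggregate() expects a list of grouping vectors, each as long as the data. list(1234, 1234, ...) is instead a list of twelve length-1 elements, so it is read as twelve separate grouping variables, hence the length error. A hedged sketch of the two usual forms follows; the data frame is rebuilt so the block runs on its own, and `agg` and `agg2` are illustrative names.

```r
# Same sample data as in the quoted message, written compactly
DF <- data.frame(
  Id   = c(rep(1234, 5), rep(5436, 3), rep(7864, 4)),
  Year = c(1990:1994, 1991:1993, 1990:1993),
  Data = c(14:18, NA, 20:25)
)

# 'by' is a list holding ONE grouping vector with one entry per row
agg <- aggregate(DF$Data, by = list(Id = DF$Id), FUN = sum, na.rm = TRUE)
print(agg)    # columns: Id, x

# Formula interface; its default na.action (na.omit) drops the NA row first
agg2 <- aggregate(Data ~ Id, data = DF, FUN = sum)
print(agg2)   # columns: Id, Data
```

Both calls return a data frame with one row per Id, which matches the "Id, sum for each Id" shape asked for.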
>
> ################################
>
> The instruction that the INDICES must have the same length is confusing me.
> The number of distinct indices will always be less than the number of rows
> because the indices are repeated; we want to sum over multiple instances of
> each index to collect the sum by index.
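
On the length rule: the index needs one entry per data value (12 here), and it is only the number of distinct values (3 here) that is smaller. A small sketch under the same sample data, with illustrative names `vals`, `by_id` and `bad`:

```r
ids  <- c(rep(1234, 5), rep(5436, 3), rep(7864, 4))
vals <- c(14:18, NA, 20:25)

length(ids) == length(vals)   # TRUE: one index entry per data value
length(unique(ids))           # 3: the number of groups

# split() cuts the data into one piece per distinct index; sum each piece
by_id <- sapply(split(vals, ids), sum, na.rm = TRUE)
print(by_id)

# A list of 12 length-1 elements is treated as 12 grouping variables,
# each of the wrong length, so tapply() refuses it
bad <- tryCatch(tapply(vals, as.list(ids), sum, na.rm = TRUE),
                error = function(e) e)
inherits(bad, "error")        # TRUE
```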
>

David Winsemius, MD
West Hartford, CT


