[R] aggregate function with a dataframe for both "x" and "by"

David Winsemius dwinsemius at comcast.net
Thu Oct 6 05:47:39 CEST 2011


On Oct 5, 2011, at 7:45 PM, Eva Powers wrote:

> I have 2 dataframes.  "mydata" contains numerical data. "mybys" contains
> information on the "group" each row of the data is in.  I wish to aggregate
> each column in mydata using the corresponding column in mybys.

Corresponding in what sense? By column position?

>
> Please see the example below.  What is a more elegant or "better" way to
> accomplish this task?
>
> mydata = data.frame(testvar1=c(1,3,5,7,8,3,5,NA,4,5,7,9),
>                     testvar2=c(11,33,55,77,88,33,55,NA,44,55,77,99) )
>
> mybys = data.frame(mbn1=c('red','blue',1,2,NA,'big',1,2,'red',1,NA,12),
>                    mbn2=c('wet','dry',99,95,NA,'damp',95,99,'red',99,NA,NA),
>                    stringsAsFactors=FALSE)
>
> myaggs <- data.frame(matrix(data=NA, nrow=nrow(mydata), ncol=ncol(mydata)))
>
> for (i in 1:ncol(mydata)) {
>   temp <- aggregate(mydata[i], by = as.list(mybys[i]), FUN=sum, na.rm=TRUE)
>   rownums <- match(mybys[,i], temp[,1])
>   myaggs[,i] <- temp[rownums,2]
> }
> myaggs
>
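
One way to avoid the explicit loop, offered as a sketch rather than a tested recommendation: compute the per-group sums with tapply(), index them back to the original rows, and apply that to each pair of columns with Map(). The helper name group_sum is made up for this example, and it assumes "corresponding" means pairing columns by position. Rows whose group is NA come back as NA, which should match what your loop produces.

```r
mydata <- data.frame(
  testvar1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
  testvar2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99)
)
mybys <- data.frame(
  mbn1 = c('red', 'blue', 1, 2, NA, 'big', 1, 2, 'red', 1, NA, 12),
  mbn2 = c('wet', 'dry', 99, 95, NA, 'damp', 95, 99, 'red', 99, NA, NA),
  stringsAsFactors = FALSE
)

# group_sum is a hypothetical helper, not part of base R.
# For each element of x it returns the sum of x within that element's group.
group_sum <- function(x, g) {
  sums <- tapply(x, g, sum, na.rm = TRUE)   # one sum per non-NA group
  as.vector(sums[match(g, names(sums))])    # NA groups match nothing, so NA
}

# Pair the i-th column of mydata with the i-th column of mybys
myaggs <- as.data.frame(Map(group_sum, mydata, mybys))
myaggs
```

The result columns take their names from mydata (testvar1, testvar2) rather than X1, X2.
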
> Finally, how do I convert and use "mybys" to factors, so that I can tell R
> that the NA values form a group?
>
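
To have NA treated as a group of its own, one option is addNA(), which adds NA as an explicit factor level; factor(x, exclude = NULL) behaves similarly. A sketch using only the first column of your example, with a made-up helper name (group_sum_na):

```r
x <- c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9)
g <- c('red', 'blue', 1, 2, NA, 'big', 1, 2, 'red', 1, NA, 12)

f <- addNA(factor(g))   # NA is now a level rather than a missing value
levels(f)

# group_sum_na is a hypothetical helper, not part of base R.
group_sum_na <- function(x, g) {
  f <- addNA(factor(g))
  sums <- tapply(x, f, sum, na.rm = TRUE)   # includes a sum for the NA level
  as.vector(sums[as.integer(f)])            # factor codes index the sums
}

res <- group_sum_na(x, g)
res
```

Here the two rows with a missing group get the total of the NA group instead of NA.
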
> I tried substituting this line above:
>
> temp <- aggregate(mydata[,i], by = as.list(mybys[,i]), FUN=sum, na.rm=T)
>
> ... but get the error message:
> "Error in aggregate.data.frame(as.data.frame(x), ...) :
>  arguments must have same length"
>
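
The error most likely arises because mybys[,i] is a plain vector, so as.list() splits it into 12 list elements of length 1, while aggregate() expects every element of 'by' to be as long as the data. Wrapping the vector in list() instead should work. A small sketch with the first column of your example:

```r
x <- c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9)
mybys <- data.frame(
  mbn1 = c('red', 'blue', 1, 2, NA, 'big', 1, 2, 'red', 1, NA, 12),
  stringsAsFactors = FALSE
)

length(as.list(mybys[1]))     # one element holding the whole column
length(as.list(mybys[, 1]))   # one element per row, which aggregate() rejects

# list() keeps the vector whole as a single grouping variable
temp <- aggregate(x, by = list(mybys[, 1]), FUN = sum, na.rm = TRUE)
temp
```

Note that aggregate() still drops the rows whose group is NA.
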


David Winsemius, MD
West Hartford, CT


