[R] basic question
Marc Schwartz
MSchwartz at MedAnalytics.com
Thu Apr 21 18:11:33 CEST 2005
On Thu, 2005-04-21 at 16:31 +0100, jose silva wrote:
> I know this question is very simple, but I cannot figure it out
> I have the data frame:
> test<- data.frame(year=c(2000,2000,2001,2001),x=c(54,41,90,15), y=c(29,2,92,22), z=c(26,68,46,51))
> test
>   year  x  y  z
> 1 2000 54 29 26
> 2 2000 41  2 68
> 3 2001 90 92 46
> 4 2001 15 22 51
> I want to sum the vectors x, y and z within each year (2000 and 2001) to obtain this:
>
>   year   x   y  z
> 1 2000  95  31 94
> 2 2001 105 114 97
> I tried tapply but it did not work (or probably I did it wrong)
>
> Any suggestions?
tapply() is typically used against a single vector, subsetting by one or
more factors.
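To illustrate the single-vector case, here is a minimal sketch using the poster's data. The names x.sums and all.sums are only illustrative, and the sapply() wrapper is one possible workaround for several columns, not something from the original question:

```r
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# tapply() sums one vector (x) within each level of year
x.sums <- tapply(test$x, test$year, sum)
x.sums

# One possible workaround for several columns: call tapply() once per
# column; sapply() collects the results into a matrix (years in rows)
all.sums <- sapply(test[, -1], function(col) tapply(col, test$year, sum))
all.sums
```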
In this case, since you want to get the colSums for more than one column
in the data frame, there are a few options:
1. Use by():
> by(test[, -1], test$year, colSums)
test$year: 2000
 x  y  z
95 31 94
------------------------------------------------------
test$year: 2001
  x   y   z
105 114  97
2. Use aggregate():
> aggregate(test[, -1], list(Year = test$year), sum)
  Year   x   y  z
1 2000  95  31 94
2 2001 105 114 97
3. Use split() and then lapply():
> test.s <- split(test, test$year)
> test.s
$"2000"
  year  x  y  z
1 2000 54 29 26
2 2000 41  2 68

$"2001"
  year  x  y  z
3 2001 90 92 46
4 2001 15 22 51
> lapply(test.s, function(x) colSums(x[, -1]))
$"2000"
 x  y  z
95 31 94

$"2001"
  x   y   z
105 114  97
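If the list from option 3 needs to end up as a single data frame like the one requested, one possible sketch is to bind the per-year sums back together with do.call() and rbind(). The names sums, sums.mat and result are only illustrative:

```r
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

# Option 3 as above: split by year, then column sums per piece
test.s <- split(test, test$year)
sums <- lapply(test.s, function(x) colSums(x[, -1]))

# Stack the named vectors into a matrix; the list names
# ("2000", "2001") become the row names
sums.mat <- do.call(rbind, sums)

# Move the years from the row names into a regular column
result <- data.frame(year = as.numeric(rownames(sums.mat)), sums.mat)
rownames(result) <- NULL
result
```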
Which you choose may depend upon how you need the output structured for
subsequent use.
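As a rough way to see the difference in structure, the sketch below stores each result and checks its class. The names res.by, res.agg and res.list are only illustrative:

```r
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29, 2, 92, 22),
                   z = c(26, 68, 46, 51))

res.by   <- by(test[, -1], test$year, colSums)
res.agg  <- aggregate(test[, -1], list(Year = test$year), sum)
res.list <- lapply(split(test, test$year),
                   function(x) colSums(x[, -1]))

class(res.by)    # an object of class "by", indexed by year
class(res.agg)   # a data frame, one row per year
class(res.list)  # a plain list, one named vector per year

# Pulling out the 2001 sums from each structure
res.by[["2001"]]
res.agg[res.agg$Year == 2001, ]
res.list[["2001"]]
```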
See ?by, ?aggregate, ?lapply and ?split for more information.
HTH,
Marc Schwartz