[R] categorical data
Marc Schwartz (via MN)
mschwartz at mn.rr.com
Wed Aug 9 21:04:18 CEST 2006
On Wed, 2006-08-09 at 18:07 +0200, Christian Oswald wrote:
> Dear List,
>
> I need a grouped list with two sorts of categorical data. I have a
> data.frame like this.
>   year cat.   b   c
> 1 2006   a1 125 212
> 2 2006   a2 256 212
> 3 2005   a1  14  12
> 4 2004   a3 565 123
> 5 2004   a2 156 789
> 6 2005   a1   1 456
> 7 2003   a2 786 123
> 8 2003   a1 421 569
> 9 2002   a2 425 245
>
> I need a list with the sum of b and c for every year and every cat (a1,
> a2 or a3) in this year. I had used the tapply function to build the sum
> for every year or every cat. How can I combine the two grouping values?
Christian,
Is this what you want (using DF as your data.frame)?
> aggregate(DF[, c("b", "c")],
            by = list(Year = DF$year, Cat = DF$cat.),
            sum)
  Year Cat   b   c
1 2003  a1 421 569
2 2005  a1  15 468
3 2006  a1 125 212
4 2002  a2 425 245
5 2003  a2 786 123
6 2004  a2 156 789
7 2006  a2 256 212
8 2004  a3 565 123
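For anyone who wants to run the call above, here is a minimal sketch that rebuilds the data frame from the rows in the original post. The construction via data.frame() and the result name `res` are my assumptions; the column names and values are taken from the post.

```r
# Rebuild the example data from the rows quoted in the original post
# (the name DF follows the reply; how the poster created it is unknown)
DF <- data.frame(
  year = c(2006, 2006, 2005, 2004, 2004, 2005, 2003, 2003, 2002),
  cat. = c("a1", "a2", "a1", "a3", "a2", "a1", "a2", "a1", "a2"),
  b    = c(125, 256, 14, 565, 156, 1, 786, 421, 425),
  c    = c(212, 212, 12, 123, 789, 456, 123, 569, 245)
)

# Sum b and c within each observed combination of year and category
res <- aggregate(DF[, c("b", "c")],
                 by = list(Year = DF$year, Cat = DF$cat.),
                 sum)
print(res)
```

The two 2005/a1 rows are the only duplicated combination, so they collapse into a single row and the result has one row fewer than the input.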
You can also reorder the results by Year and Cat:
> DF.result <- aggregate(DF[, c("b", "c")],
                         by = list(Year = DF$year, Cat = DF$cat.),
                         sum)
> DF.result[order(DF.result$Year, DF.result$Cat), ]
  Year Cat   b   c
4 2002  a2 425 245
1 2003  a1 421 569
5 2003  a2 786 123
6 2004  a2 156 789
8 2004  a3 565 123
2 2005  a1  15 468
3 2006  a1 125 212
7 2006  a2 256 212
Note that tapply() can only handle one 'X' vector at a time, whereas
aggregate() can handle multiple 'X' columns in one call. For example:
> tapply(DF$b, list(DF$year, DF$cat.), sum)
      a1  a2  a3
2002  NA 425  NA
2003 421 786  NA
2004  NA 156 565
2005  15  NA  NA
2006 125 256  NA
will give you the sum of 'b' for each combination of Year and Cat as a
2d table, but I suspect this is not the output format you want. You
also get NAs in the cells for combinations that do not occur in your
data.
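If you do start from the tapply() table, one possible way back to a long layout is sketched below. It rebuilds DF from the posted rows so that it runs on its own; the names `tab` and `long`, and the choice of as.table() plus as.data.frame() for the conversion, are my assumptions rather than anything from the original question.

```r
# Rebuild the example data from the rows quoted in the original post
DF <- data.frame(
  year = c(2006, 2006, 2005, 2004, 2004, 2005, 2003, 2003, 2002),
  cat. = c("a1", "a2", "a1", "a3", "a2", "a1", "a2", "a1", "a2"),
  b    = c(125, 256, 14, 565, 156, 1, 786, 421, 425),
  c    = c(212, 212, 12, 123, 789, 456, 123, 569, 245)
)

# 2d table of sums of b; naming the list elements labels the dimensions
tab <- tapply(DF$b, list(Year = DF$year, Cat = DF$cat.), sum)

# Turn the table into one row per cell, then drop the empty (NA) cells
long <- as.data.frame(as.table(tab), responseName = "b")
long <- long[!is.na(long$b), ]
print(long)
```

This only covers 'b'; 'c' would need a second tapply() call, which is why aggregate() is the more direct route here.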
HTH,
Marc Schwartz