[R] Summarize by two-column factor, retaining original factors
Matt Crawford
mcrawford at gmail.com
Fri Feb 24 17:18:36 CET 2006
I am having trouble with the following. I have a data.frame like
this, where x and y are variables that I want to do calculations on:
Name  Year   x   y
ab    2001  15   3
ab    2001  10   2
ab    2002  12   8
ab    2003   7  10
dv    2002  10  15
dv    2002   3   2
dv    2003   1  15
Before I do all the other things I need to do with this data, I need
to summarize or collapse the data by name and year. I've found that I
can do things like
nameyear <- interaction(dataframe$Name, dataframe$Year)
dataframe$nameyear <- nameyear
tapply(dataframe$x, dataframe$nameyear, sum)
tapply(dataframe$y, dataframe$nameyear, sum)
and then bind those together.
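
For reference, here is a self-contained sketch of that approach on the example data. The object name `sums` and the use of `drop = TRUE` are assumptions added for illustration; they are not part of my original code.

```r
# Rebuild the example data; the name `dataframe` follows the post.
dataframe <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Combined Name/Year factor. drop = TRUE (an assumption, not in the post)
# leaves out combinations that never occur, such as dv in 2001.
dataframe$nameyear <- interaction(dataframe$Name, dataframe$Year, drop = TRUE)

# Sum x and y within each Name/Year cell, then bind the results column-wise.
# `sums` is a hypothetical name for the combined result.
sums <- cbind(
  x = tapply(dataframe$x, dataframe$nameyear, sum),
  y = tapply(dataframe$y, dataframe$nameyear, sum)
)
sums
```

The row names of `sums` are the combined labels (for example "ab.2001"), so Name and Year are no longer available as separate columns, which is the limitation described next.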
But my problem is that I need to somehow retain the original Names in
my collapsed dataset, so that later I can do analyses with the Name
factors. All I can think of is something like
tapply(dataframe$Name,dataframe$nameyear, somefunction?)
but nothing seems to work.
I'm actually trying to convert a SAS program, and I can't get out of
that mindset. There, it's a simple Proc Means, By Name Year.
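
One possible way to get a similar by-group summary while keeping Name and Year as ordinary columns is base R's `aggregate()`. This is only a sketch under the assumption that sums are the wanted statistic; the object name `collapsed` is hypothetical, and this is not claimed to be an exact equivalent of the SAS step.

```r
# Rebuild the example data; the name `dataframe` follows the post.
dataframe <- data.frame(
  Name = c("ab", "ab", "ab", "ab", "dv", "dv", "dv"),
  Year = c(2001, 2001, 2002, 2003, 2002, 2002, 2003),
  x    = c(15, 10, 12, 7, 10, 3, 1),
  y    = c(3, 2, 8, 10, 15, 2, 15)
)

# Sum x and y within each Name/Year combination. The grouping variables
# come back as their own columns, so Name is retained in the result.
# `collapsed` is a hypothetical name for the summarized data frame.
collapsed <- aggregate(cbind(x, y) ~ Name + Year, data = dataframe, FUN = sum)
collapsed

# Name can then be used as a factor in later steps,
# for example the mean of the yearly x totals per Name.
tapply(collapsed$x, collapsed$Name, mean)
```

Because the result is a regular data.frame with one row per Name/Year combination that occurs in the data, no separate step is needed to recover Name from a combined factor.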
Thanks for any help or suggestions on the right way to go about this.
Matt Crawford