[R] aggregate data.frame by one column
Guo Wei-Wei
wwguocn at gmail.com
Fri Jun 30 04:54:47 CEST 2006
Hi, everyone,
I have a data.frame named "eva" like this:
IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001   2   5   4   4   5   4
114 114001   2   4   4   4   4   4
114 114001   2   4  NA  NA  NA  NA
112 112002   2   3   3   6   2   6
112 112002   2   1   1   3   4   4
112 112003   2   6   6   6   5   6
112 112003   2   5   7   6   6   6
112 112003   2   6   6   6   4   5
114 114004   2   2   3   3   2   4
114 114004   2   5   3   4   4   2
114 114004   2  NA  NA  NA  NA  NA
113 113005   2   5   5   6   6   5
113 113005   2   7   7   4   7   6
111 111006   2   5   7   7   7   7
112 112007   2   7   7   7   2   2
112 112007   2   6   6   6   1   2
112 112007   2   7   6   6   2   2
111 111008   2   4   1   3   1   4
111 111008   2   3   1   5   3   2
This is only a small part of the whole data. "PARTNO" is a numeric
variable, and I want to use it as the grouping variable to aggregate
the other variables. What I want to get looks like this:
IND PARTNO NUM VC1 EO1 EO2 EO3 EO4 EO5
114 114001   3   2 4.3   4   4 4.5   4
112 112002   2   2   2   2 4.5   3   5
112 112003   3   2 5.7 6.3   6   5 5.7
114 114004   3   2 3.5   3 3.5   3   3
113 113005   2   2   6   6   5 6.5 5.5
111 111006   1   2   5   7   7   7   7
112 112007   3   2 6.7 6.3 6.3 1.7   2
111 111008   2   2 3.5   1   4   2   3
"NUM" is a newly added variable giving the number of cases (rows)
in each "PARTNO" group.
I have two questions about this operation.
The first is how to create the new variable "NUM". I have no idea
how to do this.
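A minimal sketch of one way the count might be obtained, using only a few rows of the example above so that it runs on its own. It assumes that tabulating the grouping column with table() is acceptable; the names "eva_small" and "num" are made up for illustration and are not part of the real data.

```r
# A few rows of the example data, rebuilt so the snippet is self-contained
eva_small <- read.table(header = TRUE, text = "
IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001 2 5 4 4 5 4
114 114001 2 4 4 4 4 4
114 114001 2 4 NA NA NA NA
112 112002 2 3 3 6 2 6
112 112002 2 1 1 3 4 4
111 111006 2 5 7 7 7 7
")

# Count the rows in each PARTNO group; a row still counts as a case
# even when its EO values are NA
num <- as.data.frame(table(PARTNO = eva_small$PARTNO), responseName = "NUM")
num
```

Note that PARTNO comes back as a factor here, so it may need converting before it is merged with other per-group results.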
The second is how to average the other variables by group. Where there
are NAs, I want the mean to be computed over the remaining cases. For
example, in group 114004 the variable "EO1" has the values 2, 5, and
NA, so the result should be 3.5. What I have done is
> aggregate(eva[,-2], by=eva[,-2], mean)
But it seems that "aggregate" cannot handle this because of the NAs.
Since the NAs are not a small part of the data, I cannot use imputation
methods. I'm not sure whether my call is right.
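For comparison, a sketch of what the intended computation might look like on a few rows of the example. It assumes the grouping should use only "IND" and "PARTNO" (rather than every column except "PARTNO"), and it relies on aggregate() passing extra arguments through to FUN, so that na.rm = TRUE reaches mean(). The names "eva_small" and "means" are made up for illustration.

```r
# A few rows of the example data, including the group with an all-NA row
eva_small <- read.table(header = TRUE, text = "
IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114004 2 2 3 3 2 4
114 114004 2 5 3 4 4 2
114 114004 2 NA NA NA NA NA
111 111006 2 5 7 7 7 7
")

# Group by IND and PARTNO only; na.rm = TRUE is handed on to mean(),
# which then averages over the non-missing values in each group
means <- aggregate(eva_small[, c("VC1", "EO1", "EO2", "EO3", "EO4", "EO5")],
                   by = list(IND = eva_small$IND, PARTNO = eva_small$PARTNO),
                   FUN = mean, na.rm = TRUE)
means
```

On these rows, group 114004 gives a mean of 3.5 for "EO1", which matches the desired table above.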
Does anyone have any suggestions on these two problems? Thanks in advance!