[R] aggregate data.frame by one column
Andrew Robinson
A.Robinson at ms.unimelb.edu.au
Fri Jun 30 05:25:04 CEST 2006
Hi Wei-Wei,
try this:
eva.agg <- aggregate(x = list(
                       VC1 = eva$VC1,
                       EO1 = eva$EO1,
                       EO2 = eva$EO2,
                       EO3 = eva$EO3,
                       EO4 = eva$EO4,
                       EO5 = eva$EO5
                     ),
                     by = list(PARTNO = eva$PARTNO),
                     FUN = mean, na.rm = TRUE)
# aggregate() returns a data frame; the group counts are in its column "x"
eva.agg$NUM <- aggregate(eva$PARTNO, list(eva$PARTNO), length)$x
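
For anyone who wants to check the idea without the full data, here is a minimal self-contained sketch. It uses only a few rows and columns copied from the example in the question, and it attaches the counts with merge() on PARTNO rather than by position. That is my substitution for illustration, not a requirement; the name eva.num is made up for this sketch.

```r
# A few rows and columns copied from the example data in the question
eva <- data.frame(
  PARTNO = c(114001, 114001, 114001, 112002, 112002),
  VC1    = c(2, 2, 2, 2, 2),
  EO1    = c(5, 4, 4, 3, 1),
  EO2    = c(4, 4, NA, 3, 1)
)

# Group means; na.rm = TRUE is passed on to mean(), so NA values
# are dropped within each group instead of making the mean NA
eva.agg <- aggregate(x = eva[, c("VC1", "EO1", "EO2")],
                     by = list(PARTNO = eva$PARTNO),
                     FUN = mean, na.rm = TRUE)

# Number of rows per group; for a vector input, aggregate()
# names the result column "x"
eva.num <- aggregate(x = eva$PARTNO,
                     by = list(PARTNO = eva$PARTNO),
                     FUN = length)

# Join counts and means on the key, then rename "x" to "NUM"
eva.agg <- merge(eva.num, eva.agg, by = "PARTNO")
names(eva.agg)[names(eva.agg) == "x"] <- "NUM"

print(eva.agg)
```

Note that NUM counts all rows in a group, including rows where some of the EO variables are NA.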
Cheers
Andrew
On Fri, Jun 30, 2006 at 10:54:47AM +0800, Guo Wei-Wei wrote:
> Hi, everyone,
>
> I have a data.frame named "eva" like this:
>
> IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
> 114 114001 2 5 4 4 5 4
> 114 114001 2 4 4 4 4 4
> 114 114001 2 4 NA NA NA NA
> 112 112002 2 3 3 6 2 6
> 112 112002 2 1 1 3 4 4
> 112 112003 2 6 6 6 5 6
> 112 112003 2 5 7 6 6 6
> 112 112003 2 6 6 6 4 5
> 114 114004 2 2 3 3 2 4
> 114 114004 2 5 3 4 4 2
> 114 114004 2 NA NA NA NA NA
> 113 113005 2 5 5 6 6 5
> 113 113005 2 7 7 4 7 6
> 111 111006 2 5 7 7 7 7
> 112 112007 2 7 7 7 2 2
> 112 112007 2 6 6 6 1 2
> 112 112007 2 7 6 6 2 2
> 111 111008 2 4 1 3 1 4
> 111 111008 2 3 1 5 3 2
>
> This is only a small part of the whole data. "PARTNO" is a numeric variable
> and I want to use it as a grouping variable to aggregate the other variables.
> What I want to get looks like this:
>
> IND PARTNO NUM VC1 EO1 EO2 EO3 EO4 EO5
> 114 114001 3 2 4.3 4 4 4.5 4
> 112 112002 2 2 2 2 4.5 3 5
> 112 112003 3 2 5.7 6.3 6 5 5.7
> 114 114004 3 2 3.5 3 3.5 3 3
> 113 113005 2 2 6 6 5 6.5 5.5
> 111 111006 1 2 5 7 7 7 7
> 112 112007 3 2 6.7 6.3 6.3 1.7 2
> 111 111008 2 2 3.5 1 4 2 3
>
> "NUM" is a newly added variable which gives the number of cases
> in each group defined by "PARTNO".
>
> I have two questions on this manipulation.
>
> The first is how to get the newly added variable "NUM". I have no idea
> how to do this.
>
> The second is how to average the other variables by group. If there are
> "NA" values, I want the average to be computed over the remaining cases.
> For example, the variable "EO1" has values of 2, 5, and "NA" for
> PARTNO 114004. What I have done is
>
> > aggregate(eva[,-2], by=eva[,-2], mean)
>
> But it seems that because there are "NA"s, "aggregate" cannot process the data.
> Because the "NA" values are not a small part of the data, I cannot use
> imputation methods. I'm not sure whether my operation is right.
>
> Does anyone have any suggestion on the two problems? Thanks in advance!
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
--
Andrew Robinson
Department of Mathematics and Statistics Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au http://www.ms.unimelb.edu.au