[R] Create variables with common values for each group
Chuck Cleland
ccleland at optonline.net
Tue Jun 20 11:02:04 CEST 2006
Stephan Lindner wrote:
> Dear all,
>
> Sorry, this is surely very basic, but I searched the internet at
> length and just couldn't find a solution.
>
> The problem is to create new variables in a data frame which
> contains both individual and group variables, such as the mean age
> for a household. My data frame:
>
>
>
> df
>
>        hhid h.age
> 1  10010020    23
> 2  10010020    23
> 3  10010126    42
> 4  10010126    60
> 5  10010142    20
> 6  10010142    49
> 7  10010142    52
> 8  10010150    18
> 9  10010150    51
> 10 10010150    28
>
>
> where hhid is the same number for all members of a household and
> h.age is the age of each household member.
>
> I tried tapply, by(), and aggregate. The best I could get was:
>
> by(df, df$hhid, function(subset) rep(mean(subset$h.age, na.rm = TRUE), nrow(subset)))
>
> df$hhid: 10010020
> [1] 23 23
> ------------------------------------------------------------
> df$hhid: 10010126
> [1] 51 51
> ------------------------------------------------------------
> df$hhid: 10010142
> [1] 40.33333 40.33333 40.33333
> ------------------------------------------------------------
> df$hhid: 10010150
> [1] 32.33333 32.33333 32.33333
>
>
> Now I would basically only have to stack up the mean values, and
> this is where I'm stuck. The function aggregate works nicely, and I
> could then loop, but I was wondering whether there is a better way to
> do that.
You could use aggregate() and then merge() the result with df.
Something like this:
> df.agg <- aggregate(df$h.age, list(hhid = df$hhid), mean)
> names(df.agg)[2] <- "mean.age"
> merge(df, df.agg)
       hhid h.age mean.age
1  10010020    23 23.00000
2  10010020    23 23.00000
3  10010126    42 51.00000
4  10010126    60 51.00000
5  10010142    20 40.33333
6  10010142    49 40.33333
7  10010142    52 40.33333
8  10010150    18 32.33333
9  10010150    51 32.33333
10 10010150    28 32.33333
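
A one-step alternative may be ave() from base R, which returns a vector the
same length as its input with each element replaced by its group's summary.
The sketch below is only an illustration: it rebuilds df by hand from the
values in the post (storing hhid as numeric is an assumption) and passes
na.rm = TRUE to mean() to mirror the by() call in the question.

```r
# Example data retyped from the post; hhid stored as numeric is an assumption
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# ave() computes FUN within each level of the grouping variable and
# returns the result aligned with the original rows.
# FUN has to be passed by name, since unnamed arguments are treated
# as further grouping variables.
df$mean.age <- ave(df$h.age, df$hhid,
                   FUN = function(x) mean(x, na.rm = TRUE))

df
```

Unlike merge(), which by default sorts the result by the key column, this
leaves the rows of df in their original order.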
> My end result should look like this (assigning mean.age to the data frame):
>
>
>
>        hhid h.age mean.age
> 1  10010020    23    23.00
> 2  10010020    23    23.00
> 3  10010126    42    51.00
> 4  10010126    60    51.00
> 5  10010142    20    40.33
> 6  10010142    49    40.33
> 7  10010142    52    40.33
> 8  10010150    18    32.33
> 9  10010150    51    32.33
> 10 10010150    28    32.33
>
>
>
> Cheers, and thanks a lot,
>
>
> Stephan Lindner
>
>
>
>
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894