[R] Aggregation across two variables in data.table

Michael Haenlein haenlein at escpeurope.eu
Thu Dec 14 08:48:29 CET 2017


Dear all,

I have a data.frame that includes a series of demographic variables for a
set of respondents plus a dependent variable (Theta). For example:

   Age                                Education       Marital Familysize Income                Housing    Theta
1:  50                         Associate degree      Divorced          4   70K+    Owned with mortgage 9.147777
2:  65                          Bachelor degree       Married          1 10-15K Owned without mortgage 7.345036
3:  33                          Bachelor degree       Married          2 30-40K    Owned with mortgage 7.974937
4:  69                          Bachelor degree Never married          1   70K+    Owned with mortgage 7.733053
5:  54 Some college, less than college graduate Never married          3 30-40K                 Rented 7.648642
6:  35                         Associate degree     Separated          2 10-15K                 Rented 7.496411

My objective is to calculate the average of Theta within each pair of
demographic variables, i.e., for every combination of their levels.

For 1 demographic this is straightforward:

Demo_names <- c("Age", "Education", "Marital", "Familysize", "Income",
                "Housing")
# One list slot per demographic variable
means1 <- vector("list", length(Demo_names))
for (i in seq_along(Demo_names)) {
  Demo_tmp <- Demo_names[i]
  # by= takes the column name stored in Demo_tmp
  means1[[i]] <- data_tmp[, list(mean(Theta)), by = Demo_tmp]
}

Is there an easy way to extend this logic to more than 1 variable? I know
how to do this manually, e.g.,
data_tmp[,list(mean(Theta)),by=list(Marital, Education)]

But I don't know how to integrate this into a loop.
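
One possible way to put the two-variable case into a loop is sketched
below. It relies on combn() to enumerate the pairs and on by= accepting a
character vector of column names. The toy table, its values, and the names
pairs, means2 and mean_Theta are made up for illustration and are not part
of the original data.

```r
library(data.table)

# Toy stand-in for data_tmp (values are invented for illustration only)
data_tmp <- data.table(
  Marital   = c("Married", "Married", "Divorced", "Divorced"),
  Education = c("Bachelor", "Associate", "Bachelor", "Bachelor"),
  Housing   = c("Rented", "Owned", "Rented", "Rented"),
  Theta     = c(1, 3, 5, 7)
)
Demo_names <- c("Marital", "Education", "Housing")

# All unordered pairs of demographic names, one pair per list element
pairs <- combn(Demo_names, 2, simplify = FALSE)

# Each pair is a character vector of length 2 and can be handed to by=
means2 <- lapply(pairs, function(pair) {
  data_tmp[, list(mean_Theta = mean(Theta)), by = pair]
})

# Label each result with the pair it belongs to, e.g. "Marital_Education"
names(means2) <- vapply(pairs, paste, character(1), collapse = "_")
```

With the six demographics above, combn() would yield 15 pairs, and a result
such as means2[["Marital_Education"]] should match what the manual call
with by=list(Marital, Education) returns, apart from the column name.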

Thanks,

Michael
