[R-sig-ME] Repeated measures in Census Tract Data:

Tue May 2 20:25:33 CEST 2023

Hi all,

I am returning to multilevel models after a two-year program in ML techniques.

What I'm doing is nothing fancy: I just want to account for the fact that different census tract variables (such as median income, median house value, population density, percent of residents living in the same house as a year ago, income disparity, percent white/black/Asian, etc.) will be correlated.

There is 1 GEOID per row (sample of first four rows with reduced columns):
```
df <- structure(
  list(
    kfr_pooled_pooled_p25 = c(0.45727569, 0.51709598, 0.45559084, 0.42194119),
    GEOID = c(6073000100, 6073000201, 6073000202, 6073000300),
    lognorm_crime = c(3.25809653802148, 3.66356164612965,
                      4.49980967033027, 4.00733318523247),
    median_hh_ = c(138879, 88125, 76658, 68679),
    income_gin = c(0.533, 0.5175, 0.459, 0.4416),
    per_white = c(0.907856450048497, 0.877313590692755,
                  0.857551739321885, 0.852452758159954),
    per_black = c(0, 0.005288207297726, 0.000880669308675, 0.061462111089903)
  ),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame")
)
```

I was surprised to get this error ("number of levels of each grouping factor must be < number of observations") when specifying a null model:
```
library(lme4)
null = lmer(opportunity_score ~ 1 + (1|GEOID), df, REML = FALSE, control = lmerControl(optimizer = "bobyqa"))
```
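A minimal sketch of the condition the error message describes, in base R only. The data frame `toy` is a hypothetical stand-in built from the four sample rows above; the point is just that with one row per GEOID the number of grouping levels equals the number of observations.

```r
# Hypothetical stand-in for the tract data: one row per GEOID
# (values copied from the four sample rows shown earlier).
toy <- data.frame(
  GEOID = c(6073000100, 6073000201, 6073000202, 6073000300),
  kfr_pooled_pooled_p25 = c(0.45727569, 0.51709598, 0.45559084, 0.42194119)
)

n_obs    <- nrow(toy)                 # number of observations (rows)
n_levels <- length(unique(toy$GEOID)) # number of levels of the grouping factor

# The message requires n_levels < n_obs; with one row per tract they are equal.
one_row_per_group <- n_levels == n_obs
print(c(n_obs = n_obs, n_levels = n_levels))
print(one_row_per_group)
```

If this prints `TRUE` for the full dataset as well, every tract appears exactly once, which matches the situation described in the error.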

That brought me to this Stack Overflow post with Ben (Bolker): https://stackoverflow.com/questions/19713228/lmer-error-grouping-factor-must-be-number-of-observations

It seems I was simply mistaken: I thought that (1|GEOID) would account for the correlation among the different variables within a GEOID. So is the only way to account for the dependence among these values to create a long dataset and then build the model from that? For example, if you wanted to build bivariate models regressing an opportunity score on each individual variable, would you have to filter the long data down to one variable at a time (creating many different datasets)?
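To make the long layout concrete, here is a rough sketch in base R of what is meant. It assumes `kfr_pooled_pooled_p25` plays the role of the opportunity score (an assumption for illustration only), and the names `wide`, `long`, `predictors`, and `slopes` are made up for the example. It is not meant as the recommended analysis, only as a picture of the reshaping and of the per-variable filtering being asked about.

```r
# Hypothetical wide data: one row per GEOID, values from the sample rows above.
wide <- data.frame(
  GEOID = c("6073000100", "6073000201", "6073000202", "6073000300"),
  kfr_pooled_pooled_p25 = c(0.45727569, 0.51709598, 0.45559084, 0.42194119),
  lognorm_crime = c(3.25809653802148, 3.66356164612965,
                    4.49980967033027, 4.00733318523247),
  median_hh_ = c(138879, 88125, 76658, 68679),
  income_gin = c(0.533, 0.5175, 0.459, 0.4416)
)

predictors <- c("lognorm_crime", "median_hh_", "income_gin")

# Stack the predictors: one row per (GEOID, variable) pair.
# ASSUMPTION: kfr_pooled_pooled_p25 is used as the outcome here.
long <- do.call(rbind, lapply(predictors, function(v) {
  data.frame(
    GEOID    = wide$GEOID,
    outcome  = wide$kfr_pooled_pooled_p25,
    variable = v,
    value    = wide[[v]]
  )
}))

# Each GEOID now appears once per stacked variable.
print(table(long$GEOID))

# The "filter to one variable at a time" step, done with split() rather than
# by hand-building separate datasets: one bivariate lm() slope per variable.
slopes <- sapply(split(long, long$variable), function(d) {
  coef(lm(outcome ~ value, data = d))[["value"]]
})
print(slopes)
```

Using `split()` keeps everything in a single long data frame, so the many different datasets exist only temporarily inside the loop.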

Also, as a follow-up question: while researching this, I came across this tutorial (https://www.rensvandeschoot.com/tutorials/lme4/) and was surprised that pupils are never added to the first model on that page (popularity dataset; I would have attached a screenshot of the output, but it made the email too big). Even so, van de Schoot writes: "If we look at the summary output we see under the Random Effects that the residual variance on the class level 0.7021 and residual variance on the first level (pupil level) is 1.2218."

I'm surprised that van de Schoot is essentially saying that the residual variance is accounted for solely at the pupil level. Isn't there a lot of error that cannot be explained by pupil differences alone? I ask because I'm also thinking about my own application with census tracts.
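For reference, the two variances quoted from the tutorial can be turned into shares of the total with a line of arithmetic. This is only a sketch using the two quoted numbers; the variable names are made up for the example.

```r
# Variance components as quoted from the tutorial output.
var_class <- 0.7021  # class-level variance
var_pupil <- 1.2218  # first-level (pupil-level) residual variance

total_var   <- var_class + var_pupil
share_class <- var_class / total_var  # proportion of variance between classes
share_pupil <- var_pupil / total_var  # proportion left at the first level

print(round(c(class = share_class, pupil = share_pupil), 2))
```

The second share covers everything that the class term does not pick up, which is the part my question is about.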

Thanks!

James
