Dear Matthew,

I recommend aggregating the data into one record per healthcare facility,
as you did when calculating the outcome variable. Because PopCov is computed
per facility, it has no variability left at the patient level. With a dataset
this large, fitting it on patient-level rows forces the residual error term
close to zero, which is probably why the model fails.
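
A minimal sketch of that aggregation in base R, assuming a toy stand-in
for DISP. The column Age, the summary name AgeMean, and all simulated
values are made up for illustration; only Station and PopCov come from
your message.

```r
# Toy stand-in for DISP: one row per patient, PopCov constant within Station.
set.seed(1)
n_fac    <- 20
fac_size <- sample(5:15, n_fac, replace = TRUE)
DISP <- data.frame(
  Station = rep(seq_len(n_fac), times = fac_size),
  Age     = rnorm(sum(fac_size), mean = 50, sd = 10)  # hypothetical patient-level predictor
)
DISP$PopCov <- rnorm(n_fac)[DISP$Station]             # one outcome value per facility

# Collapse to one record per facility. The patient-level predictor becomes a
# facility-level summary; PopCov keeps its single value per facility.
fac <- aggregate(cbind(PopCov, Age) ~ Station, data = DISP, FUN = mean)
names(fac)[names(fac) == "Age"] <- "AgeMean"

# With one row per facility, an ordinary regression is enough.
fit <- lm(PopCov ~ AgeMean, data = fac)
summary(fit)
```

Categorical patient-level variables such as race/ethnicity would enter
the same way, as facility-level proportions.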

Another option is to use an outcome variable at the patient level.
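
A rough sketch of what that could look like with lme4, on simulated
data. PatientOutcome, Age, AgeMean and AgeWithin are hypothetical names,
not variables from your dataset. Splitting a predictor into its facility
mean and the within-facility deviation is one common way to look at both
levels at once.

```r
library(lme4)

# Simulated data: 30 facilities with 40 patients each (illustrative only).
set.seed(2)
n_fac <- 30
n_per <- 40
d <- data.frame(Station = factor(rep(seq_len(n_fac), each = n_per)))
d$Age <- rnorm(nrow(d), mean = 50, sd = 10)
u <- rnorm(n_fac, sd = 1)  # facility-level random intercepts
d$PatientOutcome <- 2 + 0.05 * d$Age + u[as.integer(d$Station)] + rnorm(nrow(d))

# Facility mean (level 2) and within-facility deviation (level 1) of Age.
d$AgeMean   <- ave(d$Age, d$Station)
d$AgeWithin <- d$Age - d$AgeMean

# The outcome now varies within facilities, so the residual variance is estimable.
fit <- lmer(PatientOutcome ~ AgeWithin + AgeMean + (1 | Station), data = d)
summary(fit)
```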

Best regards,

On Tue, 7 Jul 2020 at 00:19, Matthew Boden <matthew.t.boden using gmail.com> wrote:

> Good afternoon,
> I am looking for advice regarding a multi-level model I am trying to
> implement using lme4. My two-level random-effects model won’t run, perhaps
> due to one or two issues.
> Background: Level 1 is patients, which are clustered in healthcare
> facilities (‘Station’). The outcome is a continuous variable (‘PopCov’)
> that is calculated at the facility level, and is thus a Level 2 variable
> that does not vary at the patient level.
> The aim of this analysis is to examine whether PopCov is predicted by (a)
> patient-level (e.g., race/ethnicity, age, symptom severity), and (b)
> facility-level variables (e.g., overall racial/ethnic composition, average
> age). It is important to examine factors such as race/ethnicity at both
> patient and facility levels because patients with different racial/ethnic
> backgrounds tend to differ in terms of age, symptom severity, etc.
> Each record/row in my data is a patient, with facility-level variables
> (including PopCov) having identical values among patients within a given
> facility.
> An error is thrown when I run a basic model.
> A1 <- lmer(PopCov ~ (1 | Station), data = DISP)
> Error in fn(nM$xeval()) : Downdated VtV is not positive definite
> I obtain the same error when I add to the model either a patient-level or
> facility-level predictor.
> An internet search suggested that I have complete separation of my data
> and/or poorly scaled variables.
> I assume this issue has to do with the fact that the outcome is a level 2
> variable. Perhaps compounding the issue is the large and unbalanced nature
> of the data. I have ~6 million patients clustered in ~1000 healthcare
> facilities. Individual facilities have anywhere from 100 to 30000 patients
> clustered in them.
> I could use some advice regarding how to specify the model to predict a
> facility-level variable (level 2) from both patient (level 1) and
> facility-level (level 2) variables with these data.
> Thank you in advance.
> Matt
