[R-sig-ME] [EXTERNAL] Non-Normal and Heteroskedastic Residuals in Longitudinal Model Due to Non-Normal DV - Percentile Bootstrap Sufficient, or Wild Bootstrap Needed?

Philippi, Tom tom_philippi @ending from np@@gov
Thu Jun 14 19:54:05 CEST 2018


David--

I apologize in advance for not answering your precise question, but no one
else has responded, and this response might be more helpful than nothing.

If I understand your frequency data, nearly half of your observations are
tied at the extreme value of 91.  No transform is going to make that
distribution approximately normal.  Without rather large sample sizes, most
forms of bootstrapping will not produce confidence intervals with nominal
and symmetric coverage.  Further, modeling changes in the _mean_ of such
values can obscure or misrepresent how the distribution changes over time.
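
One rough way to check the coverage concern is by simulation. The sketch
below is only an i.i.d. analogue, not the longitudinal mixed model: it draws
samples with a point mass at 91 plus a left tail, builds percentile bootstrap
intervals for the mean, and tallies how often the interval misses on each
side. The generator r_qol(), the geometric shape of the tail, the helper
coverage(), and the sample sizes are all assumptions made for illustration.

```r
set.seed(2018)

# Hypothetical generator: point mass at the ceiling (91) mixed with a left
# tail. The geometric tail is an illustrative assumption, not a fitted model.
r_qol <- function(n, p_ceiling = 0.45) {
  at_ceiling <- rbinom(n, 1, p_ceiling) == 1
  tail_vals  <- pmax(25, 90 - rgeom(n, prob = 0.15))
  ifelse(at_ceiling, 91, tail_vals)
}

# Monte Carlo stand-in for the true mean of the generator
true_mean <- mean(r_qol(1e6))

# Coverage of the percentile bootstrap CI for the mean, with misses split
# by side so that asymmetric coverage is visible.
coverage <- function(n, n_sim = 500, n_boot = 300, level = 0.95) {
  alpha <- 1 - level
  miss_low <- 0   # interval lies entirely above the true mean
  miss_high <- 0  # interval lies entirely below the true mean
  for (i in seq_len(n_sim)) {
    y <- r_qol(n)
    boot_means <- replicate(n_boot, mean(sample(y, replace = TRUE)))
    ci <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
    miss_low  <- miss_low  + (true_mean < ci[1])
    miss_high <- miss_high + (true_mean > ci[2])
  }
  c(coverage  = 1 - (miss_low + miss_high) / n_sim,
    miss_low  = miss_low / n_sim,
    miss_high = miss_high / n_sim)
}

# Compare a small and a larger sample size
print(rbind(n20 = coverage(20), n200 = coverage(200)))
```

Nominal, symmetric coverage would put "coverage" near 0.95 with the two
miss rates roughly equal; how far a given design departs from that is what
the simulation is meant to show.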

If you are primarily interested in the fixed effects, would quantile
regression perhaps address your questions of interest?  I don't know
"quality of life", but in my field, when I have oddly-distributed response
variables, I'm almost always interested in more than the mean, as the
temporal changes are more than a simple shift of the entire distribution.
For your example data, if 45% of the responses were 91, then longitudinal
trends in a mean are driven by a mixture of changes in that fraction plus
shifts in the length or width of the tail of lower values.  Quantile
regression on the lower quantiles (the median in the above data is 90)
might be more informative, as well as more applicable to such data.  If
subjects either converge on high scores over time, or start out with high
scores but then diverge as some fraction of subjects accumulate health
problems and have their scores decline over time, quantile regression might
better characterize such changes.
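
The arithmetic behind that point can be checked directly from the frequency
table in the original post (quoted below). This sketch uses base R only; the
helper wquantile() is a name introduced here for illustration. It splits the
overall mean into the ceiling fraction and the mean of the lower tail, and
reads off a few quantiles.

```r
# QOL values and proportions copied from the frequency table in the
# original post; nothing here is new data.
qol  <- c(25, 30, 32, 34, 38, 41, 45:91)
prop <- c(
  0.000308261, 0.000308261, 0.000308261, 0.000616523, 0.000308261,
  0.000308261, 0.000308261, 0.000308261, 0.000308261, 0.000616523,
  0.000616523, 0.000616523, 0.000308261, 0.000308261, 0.001541307,
  0.000616523, 0.001233046, 0.000616523, 0.000924784, 0.000308261,
  0.000924784, 0.000924784, 0.001849568, 0.001541307, 0.003082614,
  0.001849568, 0.00215783,  0.002466091, 0.004007398, 0.002466091,
  0.004007398, 0.002466091, 0.003699137, 0.006781751, 0.004932183,
  0.006781751, 0.006165228, 0.007090012, 0.007706535, 0.008631319,
  0.010789149, 0.015104809, 0.014488286, 0.01541307,  0.020345253,
  0.025893958, 0.03298397,  0.036066585, 0.053020962, 0.064426634,
  0.080147966, 0.088779285, 0.452219482
)
stopifnot(length(qol) == length(prop))

# Weighted quantile: smallest score whose cumulative proportion reaches tau
wquantile <- function(x, w, tau) x[which(cumsum(w) / sum(w) >= tau)[1]]

# Mixture view of the mean: ceiling fraction times 91, plus the remaining
# fraction times the mean of the lower tail
p_ceiling    <- prop[qol == 91] / sum(prop)
tail_mean    <- weighted.mean(qol[qol < 91], prop[qol < 91])
overall_mean <- weighted.mean(qol, prop)
mixture_mean <- p_ceiling * 91 + (1 - p_ceiling) * tail_mean

print(c(p_ceiling = p_ceiling, tail_mean = tail_mean,
        overall_mean = overall_mean, mixture_mean = mixture_mean))

# Lower quantiles, the median, and an upper quantile
taus <- c(0.10, 0.25, 0.50, 0.75)
print(setNames(sapply(taus, function(tau) wquantile(qol, prop, tau)), taus))
```

The same overall mean can arise from quite different combinations of
p_ceiling and tail_mean, which is why a trend in the mean alone cannot
separate a change in the ceiling fraction from a shift in the tail.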

I have used lqmm with longitudinal data on limpet sizes with fixed plots as
random effects, and am exploring it for temporal trends in water quality.
The vignette for lqmm uses the Orthodont data from nlme, and includes the
equivalent of (1 + time | subject) as a random effect.  lqmm includes a
bootstrap function for objects of class lqm or lqmm.  I have yet to
simulate highly skewed or mixture model WQ data to see if (when)
bootstrapped confidence intervals have reasonable coverage, but that is in
the queue for this fall.
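
A minimal sketch of what such a fit might look like, using the Orthodont
data rather than the QOL data. It assumes the lqmm and nlme packages are
installed. The argument names follow my reading of the lqmm documentation
and should be checked against ?lqmm and ?boot.lqmm. The choice of
tau = 0.25, the centering of age, and R = 50 bootstrap replicates are
illustrative assumptions, not recommendations.

```r
# Sketch only: runs if the lqmm and nlme packages are available.
if (requireNamespace("lqmm", quietly = TRUE) &&
    requireNamespace("nlme", quietly = TRUE)) {
  library(lqmm)

  # Orthodont growth data from nlme, converted to a plain data frame
  ortho <- as.data.frame(nlme::Orthodont)
  ortho$age_c <- ortho$age - 11   # center age to ease interpretation

  # Lower-quartile regression with a random intercept and random slope
  # per subject, the analogue of (1 + time | subject); pdDiag assumes
  # uncorrelated random effects.
  fit <- lqmm(fixed = distance ~ age_c, random = ~ age_c, group = Subject,
              covariance = "pdDiag", tau = 0.25, nK = 7, type = "normal",
              data = ortho)
  print(coef(fit))

  # Bootstrap standard errors and confidence intervals for the fit
  print(summary(fit, method = "boot", R = 50, seed = 52))
}
```

For the QOL data, the corresponding call would swap in QOL, time, and ID,
with tau set to whichever lower quantiles are of interest.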

Also, perhaps the real experts on this list can chime in on the form of
your model.  While I understand mixed models with a linear term for time as
both a fixed effect and a within-subject random effect, I'm not clear on how
to interpret linear and quadratic fixed-effect terms combined with only a
linear within-subject term, especially if subjects differ in starting or
drop-out times.

My apologies for not directly answering your question.  And certainly your
mileage will vary.

Tom

"To do science is to search for repeated patterns, not simply to accumulate
facts..."   --Robert MacArthur 1972, Geographical Ecology

"Statistical methods of analysis are intended to aid the interpretation of
data that are subject to appreciable haphazard variability"    --Cox &
Hinkley 1974; Theoretical Statistics

On Mon, Jun 11, 2018 at 6:59 AM, David Jones <david.tn.jones using gmail.com>
wrote:

> I am looking to model quality of life (QOL) as a DV over time. The DV
> shows strong negative skew. I am wondering about the best way to
> handle this (more detail below). Frequency distribution of QOL and
> example code are also at the end of this message.
>
> Many participants just say that their quality of life is great, and
> thus there is a ceiling effect with many values clustered at the
> highest value. While the distribution resembles y=e^x, I have not been
> able to fit a distribution via GLMM that results in normally
> distributed and homoskedastic residuals (including gamma and inverse
> gaussian). A number of DV transformations have not worked either
> (e.g., log, exponential, Box-Cox), in large part because of the large
> proportion of values at the maximum level of QOL, which creates a
> spike at the end of the distribution. I could try zero-inflated models
> by transforming the dv (multiply by -1 and put the starting value at
> 0), but even then there will still be a disproportionate number of
> values clustered at one end.
>
> My question: I am particularly interested in fixed effects parameters
> from a longitudinal model, and was thinking of testing these
> parameters by using percentile bootstrap CIs via confint(). However,
> the residuals from a lmer model are both non-normal and
> heteroskedastic - will percentile bootstrap of beta coefficients
> address this, or can only the wild bootstrap address these issues (as
> it is targeted to residuals)? I have a basic understanding of the
> bootstrap but am not an expert regarding its use in linear models.
>
> Many thanks!
>
>
>
>
> # Example lmer code
> library(lme4)  # provides lmer()
> model <- lmer(QOL ~ poly(time, 2) + (time | ID), data = dataset, REML = FALSE)
>
>
> # Frequency distribution
>
> QOL    valid_percent
> 25    0.000308261
> 30    0.000308261
> 32    0.000308261
> 34    0.000616523
> 38    0.000308261
> 41    0.000308261
> 45    0.000308261
> 46    0.000308261
> 47    0.000308261
> 48    0.000616523
> 49    0.000616523
> 50    0.000616523
> 51    0.000308261
> 52    0.000308261
> 53    0.001541307
> 54    0.000616523
> 55    0.001233046
> 56    0.000616523
> 57    0.000924784
> 58    0.000308261
> 59    0.000924784
> 60    0.000924784
> 61    0.001849568
> 62    0.001541307
> 63    0.003082614
> 64    0.001849568
> 65    0.00215783
> 66    0.002466091
> 67    0.004007398
> 68    0.002466091
> 69    0.004007398
> 70    0.002466091
> 71    0.003699137
> 72    0.006781751
> 73    0.004932183
> 74    0.006781751
> 75    0.006165228
> 76    0.007090012
> 77    0.007706535
> 78    0.008631319
> 79    0.010789149
> 80    0.015104809
> 81    0.014488286
> 82    0.01541307
> 83    0.020345253
> 84    0.025893958
> 85    0.03298397
> 86    0.036066585
> 87    0.053020962
> 88    0.064426634
> 89    0.080147966
> 90    0.088779285
> 91    0.452219482
