[R-sig-ME] [EXTERNAL] Non-Normal and Heteroskedastic Residuals in Longitudinal Model Due to Non-Normal DV - Percentile Bootstrap Sufficient, or Wild Bootstrap Needed?

David Jones david.tn.jones using gmail.com
Fri Jun 15 01:47:05 CEST 2018


Hi Tom,

Thank you for your detailed follow-up. You are correct that many of
the observations are at the extreme value. I am fortunate to have a
fairly large sample (~700 participants with roughly 8 timepoints
each), and I was hoping that bootstrapping might come to the
rescue. That said, as you suggest, it is a tricky situation.

I had not considered quantile regression, and a mixed-effects quantile
approach might be a great way to get at this. I am very grateful for
this overall suggestion, as well as for the specific things to look for
in the lqmm vignette (and how they correspond to nlme). It is a difficult
analytic situation and your input has been very helpful.

David

On Thu, Jun 14, 2018 at 12:54 PM, Philippi, Tom <tom_philippi using nps.gov> wrote:
> David--
>
> I apologize in advance for not answering your precise question, but no one
> else has responded, and this response might be more helpful than nothing.
>
> If I understand your frequency data, nearly half of your observations are
> tied at the extreme value of 91.  No transform is going to make that
> distribution approximately normal.  Without rather large sample sizes, most
> forms of bootstrapping will not produce confidence intervals with nominal
> and symmetric coverage.  Further, modeling changes in the _mean_ of such
> values can obscure or misrepresent how the distribution changes over time.
>
> If you are primarily interested in the fixed effects, would quantile
> regression perhaps address your questions of interest?  I don't know
> "quality of life", but in my field, when I have oddly-distributed response
> variables, I'm almost always interested in more than the mean, as the
> temporal changes are more than a simple shift of the entire distribution.
> For your example data, if 45% of the responses were 91, then longitudinal
> trends in a mean are driven by a mixture of changes in that fraction plus
> shifts in the length or width of the tail of lower values.  Quantile
> regression on the lower quantiles (the median in the above data is 90) might
> be more informative, as well as more applicable to such data.  If subjects
> either converge on high scores over time, or start out with high scores but
> then diverge as some fraction of subjects accumulate health problems and
> have their scores decline over time, quantile regression might better
> characterize such changes.
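Tom's point that a trend in the mean can hide two quite different kinds of change can be illustrated with toy numbers (hypothetical values, not David's data). In the base-R sketch below, both follow-up scenarios raise the mean by the same 0.6 points, but the 10th percentile and the median show whether the gain came from more people reaching the ceiling or from the lower tail moving up.

```r
# Toy QOL distributions (hypothetical values; 91 plays the role of the ceiling)
baseline          <- c(rep(91, 45), rep(85, 35), rep(70, 20))
more_at_ceiling   <- c(rep(91, 55), rep(85, 25), rep(70, 20))  # 10 people move from 85 up to 91
better_lower_tail <- c(rep(91, 45), rep(85, 35), rep(73, 20))  # lowest group improves by 3 points

# Mean, two quantiles, and the fraction of responses at the ceiling
summarise_qol <- function(x) {
  c(mean       = mean(x),
    q10        = quantile(x, 0.10, names = FALSE),
    median     = quantile(x, 0.50, names = FALSE),
    prop_at_91 = mean(x == 91))
}

tab <- rbind(baseline          = summarise_qol(baseline),
             more_at_ceiling   = summarise_qol(more_at_ceiling),
             better_lower_tail = summarise_qol(better_lower_tail))
print(tab)
# Both scenarios have mean 85.3 (up from 84.7), but only the median moves in
# the first scenario and only the 10th percentile moves in the second.
```

Quantile regression models such quantiles directly as functions of time, instead of summarising them after the fact.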
>
> I have used lqmm with longitudinal data on limpet sizes with fixed plots as
> random effects, and am exploring it for temporal trends in water quality.
> The vignette for lqmm uses the Orthodont data from nlme, and includes the
> equivalent of (1 + time | subject) as a random effect.  lqmm includes a
> bootstrap function for objects of class lqm or lqmm.  I have yet to simulate
> highly skewed or mixture model WQ data to see if (when) bootstrapped
> confidence intervals have reasonable coverage, but that is in the queue for
> this fall.
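A minimal sketch of the kind of model Tom describes, using the same Orthodont data he mentions. It assumes the lqmm and nlme packages are installed; the three quantiles and the small bootstrap size are illustrative choices, not recommendations. In lqmm the fixed part, the random part and the grouping factor are separate arguments, so `random = ~ age` with `group = Subject` plays the role of `(1 + age | Subject)` in lme4 syntax.

```r
library(lqmm)
data(Orthodont, package = "nlme")

# Fixed effect of age at three quantiles; random intercept and age slope per
# Subject. covariance = "pdSymm" lets the intercept and slope correlate, as
# (1 + age | Subject) does in lme4.
fit <- lqmm(fixed = distance ~ age,
            random = ~ age,
            group = Subject,
            covariance = "pdSymm",
            tau = c(0.1, 0.5, 0.9),
            data = Orthodont)

coef(fit)  # fixed-effect estimates at each quantile

# Bootstrap standard errors and intervals; R = 50 keeps the example fast,
# but a real analysis would need many more replicates.
summary(fit, R = 50)
```

Tom's caveat about bootstrap coverage for highly skewed or mixture-like data applies to these intervals as well.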
>
> Also, perhaps the real experts on this list can chime in on the form of your
> model.  While I understand mixed models with a linear term for time as both a
> fixed effect and a within-subject random effect, I'm not clear what it means
> to have linear and quadratic fixed-effect terms but only a linear
> within-subject term, especially if subjects differ in starting or drop-out
> times.
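The distinction Tom raises can be made concrete in lme4 syntax. In the model David posted, the quadratic term is fixed only: every participant shares one curvature and participants differ only in their intercepts and linear slopes. Letting the curvature vary by participant adds a random quadratic term. A sketch using lme4's built-in sleepstudy data as a stand-in (Days for time, Subject for ID), with raw polynomial terms so the fixed and random parts refer to the same variables:

```r
library(lme4)

# Same fixed curvature for everyone; intercepts and linear slopes vary by Subject
m_lin_re <- lmer(Reaction ~ Days + I(Days^2) + (1 + Days | Subject),
                 data = sleepstudy, REML = FALSE)

# Curvature also varies by Subject (adds a random quadratic term).
# On a dataset this small it may report a singular fit or a convergence warning.
m_quad_re <- lmer(Reaction ~ Days + I(Days^2) + (1 + Days + I(Days^2) | Subject),
                  data = sleepstudy, REML = FALSE)

# Likelihood-ratio comparison of the two random-effects structures (both ML fits).
# The p-value is conservative because the null variance lies on the boundary.
anova(m_lin_re, m_quad_re)
```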
>
> My apologies for not directly answering your question.  And certainly your
> mileage will vary.
>
> Tom
>
> "To do science is to search for repeated patterns, not simply to accumulate
> facts..."   --Robert MacArthur 1972, Geographical Ecology
>
> "Statistical methods of analysis are intended to aid the interpretation of
> data that are subject to appreciable haphazard variability"    --Cox &
> Hinkley 1974; Theoretical Statistics
>
> On Mon, Jun 11, 2018 at 6:59 AM, David Jones <david.tn.jones using gmail.com>
> wrote:
>>
>> I am looking to model quality of life (QOL) as a DV over time. The DV
>> shows strong negative skew. I am wondering about the best way to
>> handle this (more detail below). Frequency distribution of QOL and
>> example code are also at the end of this message.
>>
>> Many participants just say that their quality of life is great, and
>> thus there is a ceiling effect with many values clustered at the
>> highest value. While the distribution resembles y=e^x, I have not been
>> able to fit a distribution via GLMM that results in normally
>> distributed and homoskedastic residuals (including gamma and inverse
>> Gaussian). A number of DV transformations have not worked either
>> (e.g., log, exponential, Box-Cox), in large part because of the large
>> proportion of values at the maximum level of QOL, which creates a
>> spike at the end of the distribution. I could try zero-inflated models
>> by transforming the DV (multiplying by -1 and shifting it so that the
>> maximum sits at 0), but even then there will still be a
>> disproportionate number of values clustered at one end.
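The reflection described here maps the ceiling to zero: with 91 as the maximum observed score, the new variable is 91 - QOL, so the spike of ceiling responses becomes a spike of zeros and larger values mean lower quality of life. A small base-R illustration with made-up scores:

```r
# Made-up QOL scores; 91 is taken as the scale maximum, as in the table below
qol <- c(91, 91, 90, 88, 91, 75, 84, 91, 62, 89)
max_qol <- 91

# Reflect and shift: ceiling responses become 0, lower QOL becomes larger
qol_deficit <- max_qol - qol
qol_deficit
mean(qol_deficit == 0)  # share of observations in the spike at zero: 0.4
```

The reflection moves the spike but does not shrink it, so the point mass still has to be modelled explicitly.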
>>
>> My question: I am particularly interested in fixed effects parameters
>> from a longitudinal model, and was thinking of testing these
>> parameters by using percentile bootstrap CIs via confint(). However,
>> the residuals from an lmer model are both non-normal and
>> heteroskedastic - will percentile bootstrap of beta coefficients
>> address this, or can only the wild bootstrap address these issues (as
>> it is targeted to residuals)? I have a basic understanding of the
>> bootstrap but am not an expert regarding its use in linear models.
>>
>> Many thanks!
>>
>>
>>
>>
>> # Example lmer code
>> library(lme4)
>> model <- lmer(QOL ~ poly(time, 2) + (time | ID), data = dataset,
>>               REML = FALSE)
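One detail bears directly on the question: in lme4, `confint(method = "boot")` calls `bootMer()`, whose default is a parametric bootstrap that simulates new responses from the fitted model, i.e. with Gaussian errors of constant variance. Its percentile intervals therefore inherit the normality and homoskedasticity assumptions rather than relaxing them. One nonparametric alternative (a case or cluster bootstrap, not the wild bootstrap) resamples whole participants and refits. The sketch below contrasts the two; it uses lme4's sleepstudy data as a stand-in for the hypothetical `dataset`, and `case_boot()` is a helper written for illustration, not an lme4 function.

```r
library(lme4)

# Stand-in data: lme4's sleepstudy (Subject plays ID, Days plays time).
# Raw polynomial terms keep each coefficient's meaning fixed across resamples;
# poly(time, 2) would recompute its orthogonal basis from each resampled dataset.
form <- Reaction ~ Days + I(Days^2) + (Days | Subject)
fit  <- lmer(form, data = sleepstudy, REML = FALSE)

# (1) confint(method = "boot") runs a parametric bootstrap by default:
#     responses are simulated from the fitted Gaussian model.
set.seed(1)
ci_param <- confint(fit, parm = "beta_", method = "boot",
                    boot.type = "perc", nsim = 200)

# (2) Hypothetical helper: case (cluster) bootstrap that resamples whole
#     subjects with replacement, refits, and takes percentile intervals.
#     `id` must name the grouping variable used in `formula`.
case_boot <- function(formula, data, id, B = 200, level = 0.95) {
  ids <- unique(data[[id]])
  est <- replicate(B, {
    draw <- sample(ids, length(ids), replace = TRUE)
    # Relabel so a subject drawn twice counts as two distinct subjects
    boot_data <- do.call(rbind, lapply(seq_along(draw), function(k) {
      d <- data[data[[id]] == draw[k], , drop = FALSE]
      d[[id]] <- k
      d
    }))
    boot_data[[id]] <- factor(boot_data[[id]])
    fixef(suppressMessages(lmer(formula, data = boot_data, REML = FALSE)))
  })
  alpha <- (1 - level) / 2
  t(apply(est, 1, quantile, probs = c(alpha, 1 - alpha)))
}

set.seed(1)
ci_case <- case_boot(form, sleepstudy, id = "Subject")
ci_param
ci_case
```

Resampling participants keeps each person's rows together, so within-person correlation and any differences in variance between people are carried into every resample. It does not remove Tom's concern about coverage when nearly half the data sit at one value, and B = 200 is far too few for a final analysis.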
>>
>>
>> # Frequency distribution
>>
>> QOL    valid_proportion
>> 25    0.000308261
>> 30    0.000308261
>> 32    0.000308261
>> 34    0.000616523
>> 38    0.000308261
>> 41    0.000308261
>> 45    0.000308261
>> 46    0.000308261
>> 47    0.000308261
>> 48    0.000616523
>> 49    0.000616523
>> 50    0.000616523
>> 51    0.000308261
>> 52    0.000308261
>> 53    0.001541307
>> 54    0.000616523
>> 55    0.001233046
>> 56    0.000616523
>> 57    0.000924784
>> 58    0.000308261
>> 59    0.000924784
>> 60    0.000924784
>> 61    0.001849568
>> 62    0.001541307
>> 63    0.003082614
>> 64    0.001849568
>> 65    0.00215783
>> 66    0.002466091
>> 67    0.004007398
>> 68    0.002466091
>> 69    0.004007398
>> 70    0.002466091
>> 71    0.003699137
>> 72    0.006781751
>> 73    0.004932183
>> 74    0.006781751
>> 75    0.006165228
>> 76    0.007090012
>> 77    0.007706535
>> 78    0.008631319
>> 79    0.010789149
>> 80    0.015104809
>> 81    0.014488286
>> 82    0.01541307
>> 83    0.020345253
>> 84    0.025893958
>> 85    0.03298397
>> 86    0.036066585
>> 87    0.053020962
>> 88    0.064426634
>> 89    0.080147966
>> 90    0.088779285
>> 91    0.452219482
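Tom's reading of this table (nearly half of all observations at 91, median 90) can be checked directly from the tabulated proportions. A base-R sketch that also computes the mean and two lower quantiles; the numbers are transcribed from the table, and the small `tab_quantile()` helper is written here for illustration:

```r
# Scores and proportions transcribed from the frequency table above
qol <- c(25, 30, 32, 34, 38, 41, 45:91)
p <- c(
  # QOL 25, 30, 32, 34, 38, 41
  0.000308261, 0.000308261, 0.000308261, 0.000616523, 0.000308261, 0.000308261,
  # QOL 45-54
  0.000308261, 0.000308261, 0.000308261, 0.000616523, 0.000616523,
  0.000616523, 0.000308261, 0.000308261, 0.001541307, 0.000616523,
  # QOL 55-64
  0.001233046, 0.000616523, 0.000924784, 0.000308261, 0.000924784,
  0.000924784, 0.001849568, 0.001541307, 0.003082614, 0.001849568,
  # QOL 65-74
  0.00215783, 0.002466091, 0.004007398, 0.002466091, 0.004007398,
  0.002466091, 0.003699137, 0.006781751, 0.004932183, 0.006781751,
  # QOL 75-84
  0.006165228, 0.007090012, 0.007706535, 0.008631319, 0.010789149,
  0.015104809, 0.014488286, 0.01541307, 0.020345253, 0.025893958,
  # QOL 85-91
  0.03298397, 0.036066585, 0.053020962, 0.064426634, 0.080147966,
  0.088779285, 0.452219482
)
stopifnot(length(qol) == length(p))
p <- p / sum(p)  # guard against rounding in the transcribed proportions

# Lowest score whose cumulative proportion reaches `prob`
cum_p <- cumsum(p)
tab_quantile <- function(prob) qol[which(cum_p >= prob)[1]]

qol_summary <- c(prop_at_ceiling = p[qol == 91],
                 mean   = sum(qol * p),
                 q10    = tab_quantile(0.10),
                 q25    = tab_quantile(0.25),
                 median = tab_quantile(0.50))
qol_summary
```

The mean (about 87.05) sits below the median (90), which in turn sits one point below the ceiling where about 45% of the responses pile up.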
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>


