[R-sig-ME] [EXTERNAL] Non-Normal and Heteroskedastic Residuals in Longitudinal Model Due to Non-Normal DV - Percentile Bootstrap Sufficient, or Wild Bootstrap Needed?
David Jones
david.tn.jones at gmail.com
Fri Jun 15 01:47:05 CEST 2018
Hi Tom,
Thank you for your detailed follow up. You are correct that many of
the observations are at the extreme value. I am fortunate to have a
fairly large sample (~700 participants with roughly 8 timepoints
each), and I would be hopeful that bootstrapping could come to the
rescue. That being said, it's a tricky situation as you suggest.
I had not considered quantile regression, and a mixed quantile
approach might be a great way to get at this. I am very grateful for
this overall suggestion as well as the specifics to look for in the
lqmm vignette (and how it corresponds to nlme). It is a difficult
analytic situation and your input has been very helpful.
David
On Thu, Jun 14, 2018 at 12:54 PM, Philippi, Tom <tom_philippi using nps.gov> wrote:
> David--
>
> I apologize in advance for not answering your precise question, but no one
> else has responded, and this response might be more helpful than nothing.
>
> If I understand your frequency data, nearly half of your observations are
> tied at the extreme value of 91. No transform is going to make that
> distribution approximately normal. Without rather large sample sizes, most
> forms of bootstrapping will not produce confidence intervals with nominal
> and symmetric coverage. Further, modeling changes in the _mean_ of such
> values can obscure or misrepresent changes over time.
>
> If you are primarily interested in the fixed effects, would quantile
> regression perhaps address your questions of interest? I don't know
> "quality of life", but in my field, when I have oddly-distributed response
> variables, I'm almost always interested in more than the mean, as the
> temporal changes are more than a simple shift of the entire distribution.
> For your example data, if 45% of the responses were 91, then longitudinal
> trends in a mean are driven by a mixture of changes in that fraction plus
> shifts in the length or width of the tail of lower values. Quantile
> regression on the lower quantiles (the median in the above data is 90) might
> be more informative, as well as more applicable to such data. If subjects
> either converge on high scores over time, or start out with high scores but
> then diverge as some fraction of subjects accumulate health problems and
> have their scores decline over time, quantile regression might better
> characterize such changes.
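
A minimal base-R sketch of the point above, using the frequency table posted at the end of this thread. It checks the quoted median and splits the overall mean into the share of responses at the ceiling plus the mean of the lower tail. The helper name wq and the variable names are illustrative only, not part of any package.

```r
# QOL scores and proportions, copied from the frequency table in the thread.
qol <- c(25, 30, 32, 34, 38, 41, 45:91)
prop <- c(
  0.000308261, 0.000308261, 0.000308261, 0.000616523, 0.000308261,
  0.000308261, 0.000308261, 0.000308261, 0.000308261, 0.000616523,
  0.000616523, 0.000616523, 0.000308261, 0.000308261, 0.001541307,
  0.000616523, 0.001233046, 0.000616523, 0.000924784, 0.000308261,
  0.000924784, 0.000924784, 0.001849568, 0.001541307, 0.003082614,
  0.001849568, 0.00215783,  0.002466091, 0.004007398, 0.002466091,
  0.004007398, 0.002466091, 0.003699137, 0.006781751, 0.004932183,
  0.006781751, 0.006165228, 0.007090012, 0.007706535, 0.008631319,
  0.010789149, 0.015104809, 0.014488286, 0.01541307,  0.020345253,
  0.025893958, 0.03298397,  0.036066585, 0.053020962, 0.064426634,
  0.080147966, 0.088779285, 0.452219482
)
stopifnot(length(qol) == length(prop))

p <- prop / sum(prop)   # renormalise to guard against rounding in the table
cum <- cumsum(p)

# Hypothetical helper: smallest score whose cumulative proportion reaches tau.
wq <- function(tau) qol[which(cum >= tau)[1]]

med <- wq(0.5)          # 90, the median quoted above
lower <- sapply(c(0.10, 0.25), wq)

# Mean as a mixture: share at the ceiling times 91, plus the lower tail.
p_max <- p[qol == 91]
mean_below <- sum(qol[qol < 91] * p[qol < 91]) / sum(p[qol < 91])
mean_all <- sum(qol * p)
mean_mix <- p_max * 91 + (1 - p_max) * mean_below

print(c(median = med, q10 = lower[1], q25 = lower[2]))
print(c(mean_all = mean_all, mean_mix = mean_mix, p_max = p_max))
```

Because mean_all is p_max * 91 + (1 - p_max) * mean_below, a change in the mean over time cannot by itself show whether the ceiling share or the lower tail moved.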
>
> I have used lqmm with longitudinal data on limpet sizes with fixed plots as
> random effects, and am exploring it for temporal trends in water quality.
> The vignette for lqmm uses the Orthodont data from nlme, and includes the
> equivalent of (1 + time | subject) as a random effect. lqmm includes a
> bootstrap function for objects of class lqm or lqmm. I have yet to simulate
> highly skewed or mixture model WQ data to see if (when) bootstrapped
> confidence intervals have reasonable coverage, but that is in the queue for
> this fall.
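
A rough sketch of the kind of lqmm fit described above, assuming the lqmm package is installed. It uses the Orthodont data from nlme as a stand-in, with a random intercept and slope for age within Subject. The quantile (tau = 0.25), the covariance structure, the number of quadrature nodes and the number of bootstrap replicates are arbitrary illustrative choices, not recommendations, and the argument names should be checked against the lqmm documentation.

```r
library(lqmm)

# Orthodont ships with nlme; convert the groupedData object to a plain data frame.
data(Orthodont, package = "nlme")
ortho <- as.data.frame(Orthodont)

# Lower-quartile mixed model: fixed effect of age, random intercept and slope
# within Subject (the analogue of (1 + age | Subject) in lme4 notation).
fit_q <- lqmm(fixed = distance ~ age, random = ~ age, group = Subject,
              covariance = "pdSymm", tau = 0.25, nK = 7, type = "normal",
              data = ortho)
print(coef(fit_q))

# Bootstrap via the package's boot() function for fitted lqmm objects.
# R = 50 keeps the example quick; real use would need far more replicates.
boot_q <- boot(fit_q, R = 50, seed = 1)
print(summary(boot_q))
```

Whether intervals obtained this way have reasonable coverage for data with a large spike at the ceiling is exactly the open question raised above; the sketch only shows the mechanics.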
>
> Also, perhaps the real experts on this list can chime in on the form of your
> model. While I understand mixed models with linear terms for time as both a
> fixed effect and a within-subject random effect, I'm not clear on what it
> means to have linear and quadratic fixed-effect terms but only a linear
> within-subject term, especially if subjects differ in starting or drop-out
> times.
>
> My apologies for not directly answering your question. And certainly your
> mileage will vary.
>
> Tom
>
> "To do science is to search for repeated patterns, not simply to accumulate
> facts..." --Robert MacArthur 1972, Geographical Ecology
>
> "Statistical methods of analysis are intended to aid the interpretation of
> data that are subject to appreciable haphazard variability" --Cox &
> Hinkley 1974; Theoretical Statistics
>
> On Mon, Jun 11, 2018 at 6:59 AM, David Jones <david.tn.jones using gmail.com>
> wrote:
>>
>> I am looking to model quality of life (QOL) as a DV over time. The DV
>> shows strong negative skew. I am wondering about the best way to
>> handle this (more detail below). Frequency distribution of QOL and
>> example code are also at the end of this message.
>>
>> Many participants just say that their quality of life is great, and
>> thus there is a ceiling effect with many values clustered at the
>> highest value. While the distribution resembles y=e^x, I have not been
>> able to fit a distribution via GLMM that results in normally
>> distributed and homoskedastic residuals (including gamma and inverse
>> gaussian). A number of DV transformations have not worked either
>> (e.g., log, exponential, Box-Cox), in large part because of the large
>> proportion of values at the maximum level of QOL, which creates a
>> spike at the end of the distribution. I could try zero-inflated models
>> by transforming the dv (multiply by -1 and put the starting value at
>> 0), but even then there will still be a disproportionate number of
>> values clustered at one end.
>>
>> My question: I am particularly interested in fixed effects parameters
>> from a longitudinal model, and was thinking of testing these
>> parameters by using percentile bootstrap CIs via confint(). However,
>> the residuals from a lmer model are both non-normal and
>> heteroskedastic - will percentile bootstrap of beta coefficients
>> address this, or can only the wild bootstrap address these issues (as
>> it is targeted to residuals)? I have a basic understanding of the
>> bootstrap but am not an expert regarding its use in linear models.
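
A minimal sketch of percentile bootstrap intervals for the fixed effects via confint() in lme4, for reference. It uses the sleepstudy data that ships with lme4 as a stand-in for the QOL data, with the same model form as the example at the end of this message; nsim = 200 is an arbitrary small value chosen for speed. As far as I know, method = "boot" relies on bootMer(), whose default is a parametric bootstrap that simulates new responses from the fitted Gaussian model, so these intervals still lean on the distributional assumptions in question.

```r
library(lme4)

set.seed(1)

# Stand-in data: Reaction over Days within Subject (sleepstudy, from lme4).
fit <- lmer(Reaction ~ poly(Days, 2) + (Days | Subject),
            data = sleepstudy, REML = FALSE)

# Percentile bootstrap intervals for the fixed effects only ("beta_").
ci <- confint(fit, parm = "beta_", method = "boot",
              boot.type = "perc", nsim = 200, quiet = TRUE)
print(ci)
```

A bootstrap that does not simulate from the fitted model (for example, resampling whole subjects) would be a different procedure and is not what this call does.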
>>
>> Many thanks!
>>
>> # Example lmer code
>> library(lme4)
>> model <- lmer(QOL ~ poly(time, 2) + (time | ID), data = dataset, REML = FALSE)
>>
>>
>> # Frequency distribution
>>
>> QOL valid_percent
>> 25 0.000308261
>> 30 0.000308261
>> 32 0.000308261
>> 34 0.000616523
>> 38 0.000308261
>> 41 0.000308261
>> 45 0.000308261
>> 46 0.000308261
>> 47 0.000308261
>> 48 0.000616523
>> 49 0.000616523
>> 50 0.000616523
>> 51 0.000308261
>> 52 0.000308261
>> 53 0.001541307
>> 54 0.000616523
>> 55 0.001233046
>> 56 0.000616523
>> 57 0.000924784
>> 58 0.000308261
>> 59 0.000924784
>> 60 0.000924784
>> 61 0.001849568
>> 62 0.001541307
>> 63 0.003082614
>> 64 0.001849568
>> 65 0.00215783
>> 66 0.002466091
>> 67 0.004007398
>> 68 0.002466091
>> 69 0.004007398
>> 70 0.002466091
>> 71 0.003699137
>> 72 0.006781751
>> 73 0.004932183
>> 74 0.006781751
>> 75 0.006165228
>> 76 0.007090012
>> 77 0.007706535
>> 78 0.008631319
>> 79 0.010789149
>> 80 0.015104809
>> 81 0.014488286
>> 82 0.01541307
>> 83 0.020345253
>> 84 0.025893958
>> 85 0.03298397
>> 86 0.036066585
>> 87 0.053020962
>> 88 0.064426634
>> 89 0.080147966
>> 90 0.088779285
>> 91 0.452219482
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>