[R-sig-ME] Distribution family for non-negative lower and upper bound values

Sat Dec 26 00:30:31 CET 2015

1. Yes, rescaling seems reasonable to me, though I would use the equations specified in the papers I cited rather than the one you described.

2. The short answer is that x1 and x2 are, respectively, the minimum and maximum possible values your variable can take on. Note that I am talking about possible values, not observed values. This is because the beta distribution can be applied when your variable has both theoretical lower and upper bounds, such that all observations must fall in the closed interval [x1, x2]. This is why the beta distribution works well for data values that are proportions. If your data do not fall in the open interval (0, 1), then you rescale them to do so before running the analysis, and subsequently back-transform parameter estimates, predictions, and other quantities of interest to the original scale of the data.
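
As a rough illustration of the rescaling described in point 2, here is a minimal sketch. It is written in Python rather than R, and the function names (rescale_to_unit, squeeze, back_transform) are hypothetical, chosen only for this example. The first step maps values from the theoretical bounds [x1, x2] onto [0, 1]. The second applies the adjustment proposed by Smithson & Verkuilen (2006), (y * (N - 1) + 0.5) / N, which pulls exact 0s and 1s into the open interval (0, 1). The last maps a value on the unit scale back to the original scale.

```python
def rescale_to_unit(y, lower, upper):
    """Map values from the theoretical bounds [lower, upper] onto [0, 1]."""
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    return [(v - lower) / (upper - lower) for v in y]


def squeeze(y01, n=None):
    """Smithson & Verkuilen (2006) adjustment: (y * (N - 1) + 0.5) / N.

    Moves exact 0s and 1s into the open interval (0, 1).
    N defaults to the number of values supplied (assumed here to be the sample size).
    """
    n = len(y01) if n is None else n
    return [(v * (n - 1) + 0.5) / n for v in y01]


def back_transform(y01, lower, upper):
    """Map values on the unit scale back to the original [lower, upper] scale."""
    return [lower + v * (upper - lower) for v in y01]


# Hypothetical data for a response with theoretical bounds 1 and 3.
response2 = [1.0, 2.0, 3.0]
unit = rescale_to_unit(response2, lower=1.0, upper=3.0)
adjusted = squeeze(unit)
print(unit)
print(adjusted)
print(back_transform(unit, lower=1.0, upper=3.0))
```

Note that with bounds of 1 and 3, dividing by 3 alone would map the data onto [1/3, 1] rather than [0, 1], which is why the lower bound is subtracted first.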

Steven J. Pierce, Ph.D.

Associate Director

Center for Statistical Training & Consulting (CSTAT)

Michigan State University

From: Gitu wa Mbui [mailto:gitumbui at gmail.com] 
Sent: Friday, December 25, 2015 5:54 PM
To: Steven J. Pierce <pierces1 at msu.edu>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Distribution family for non-negative lower and upper bound values

Alain, Steven,

Thanks both for the lead and for the papers. A few clarifications:

1. SP>> Consider a mixed effects variant of the beta regression model, as discussed in the papers below.

I assume that you agree with the rescaling approach then? I should have mentioned that I will be comparing several models. The ideal package would be gamm4; however, it doesn't fit the betar family. (The gamm package does, but comparing models is compromised.)

2. AZ>>You could try a beta distribution, which can be used when your data is between x1 and x2.

Not sure I understand 'when your data is between x1 and x2'. What do x1 and x2 refer to?

In any case, as recommended in your book for beginners to GAMM, the gamm4 package is ideal when comparing models (I have 500 models to compare). It doesn't fit the beta family, though. Is there a workaround?

3. AZ>>All in all this sounds like an MCMC job. I haven't tried SabreR...maybe it can do a beta distribution.

I haven't tried these two before.

Kind regards,

Gitu

On Sat, Dec 26, 2015 at 1:57 AM, Steven J. Pierce <pierces1 at msu.edu> wrote:

Gitu,

Consider a mixed effects variant of the beta regression model, as discussed in the papers below.

Smithson, M., & Verkuilen, J. (2006). A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54-71. doi:10.1037/1082-989X.11.1.54

Zimprich, D. (2010). Modeling change in skewed variables using mixed beta regression models. Research in Human Development, 7(1), 9-26. doi:10.1080/15427600903578136

Steven J. Pierce, Ph.D.
Associate Director
Center for Statistical Training & Consulting (CSTAT)
Michigan State University

-----Original Message-----
From: Gitu wa Mbui [mailto:gitumbui at gmail.com]
Sent: Wednesday, December 23, 2015 8:54 PM
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Distribution family for non-negative lower and upper bound values

I am running generalized additive mixed models on two response variables
separately. Values in response 1 are non-negative and bounded between 1-2,
while response 2 is also non-negative and bounded between 1-3.

In choosing the distribution for response 1, I have subtracted 1 (to
rescale to between 0-1) and logit transformed before fitting the models
with gaussian family.

As for response 2 (non-negative values between 1-3), I have divided the
values by 3 so as to rescale to between 0-1, before logit transforming and
fitting with gaussian family.

Does this sound like a good approach? If not, what are the alternatives,
considering:
- responses 1&2 are not proportions
- I am using lme4 version (gamm4) which is limited on the number of
families that can be fit
- histograms of both responses are pretty flat (non-skewed, and they don't look
anywhere near a normal distribution)

~ Gitu
