[R] Regression model with proportional dependent variable
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Tue Apr 12 08:45:04 CEST 2011
On Mon, 11 Apr 2011, ty ty wrote:
> Hello, dear experts. I don't have much experience in building
> regression models, so I apologize if this question is too simple or
> not very interesting.
> Currently I'm working on a model that has to predict the proportion of
> the debt repaid by the debtor in some period of time. So the
> dependent variable can be any number between 0 and 1, with a very high
> probability of 0 (if there is no payment); if there are some
> payments, it is very likely to be 1 (all debt paid), although it can be
> any number from 0 to 1.
> Not having much knowledge in this area, I can't think of any
> appropriate model and wasn't able to find much on the Internet. Can
> anyone give me some ideas about possible models, any information
> online, and some R functions and packages that can implement them?
> Thank you in advance for any help.
Beta regression is one way to model proportions in the open unit
interval (0, 1). It is available in R in the package "betareg":
http://CRAN.R-project.org/package=betareg
http://www.jstatsoft.org/v34/i02/
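
As a minimal sketch of the basic case, the following fits a beta regression to the GasolineYield data that ships with "betareg" (the data set used for illustration in the paper linked above). The particular choice of regressors is only illustrative and not a recommendation for the debt data.

```r
library(betareg)

# Proportion of crude oil converted to gasoline; all values lie strictly in (0, 1)
data("GasolineYield", package = "betareg")

# Mean model with the default logit link and a constant precision parameter
fit <- betareg(yield ~ batch + temp, data = GasolineYield)
summary(fit)

# Fitted mean proportions; with a logit link these stay inside (0, 1)
head(predict(fit, type = "response"))
```
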
If 0 and 1 can occur, some authors have suggested scaling the response so
that 0 and 1 are avoided. See the paper linked above for an example. If,
however, there are many 0s and/or 1s, one might want to take a hurdle or
inflation-type approach. One such approach is implemented in the "gamlss"
package:
http://CRAN.R-project.org/package=gamlss
http://www.jstatsoft.org/v23/i07/
http://www.gamlss.org/
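
Both options can be sketched on simulated data. Everything about the data below (variable names y and x, sample size, proportions of 0s and 1s) is a hypothetical assumption made for illustration. The rescaling shown is the transformation (y * (n - 1) + 0.5) / n discussed in the betareg paper; the inflation model uses the BEINF (zero-and-one inflated beta) family that comes with "gamlss". Which parameters receive regressors is a modelling choice, not something prescribed here.

```r
library(gamlss)

set.seed(123)
n <- 300
x <- rnorm(n)

# Hypothetical response: roughly 30% exact 0s, 20% exact 1s, rest in (0, 1)
u <- runif(n)
y <- rbeta(n, 2, 2)
y[u < 0.3] <- 0
y[u > 0.8] <- 1
d <- data.frame(y = y, x = x)

# Option 1: squeeze the response into the open interval (0, 1)
d$y_sq <- (d$y * (n - 1) + 0.5) / n
range(d$y_sq)

# Option 2: zero-and-one inflated beta model.
# In BEINF, nu and tau govern the point masses at 0 and 1, respectively.
fit_inf <- gamlss(y ~ x, nu.formula = ~ x, tau.formula = ~ x,
                  family = BEINF, data = d,
                  control = gamlss.control(trace = FALSE))
summary(fit_inf)
```

With many boundary values, as in the debt data described, the rescaling merely moves the point masses just inside the interval, so the inflation model is probably the more natural of the two.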
The hurdle approach can be implemented using separate building blocks.
First, a binary regression model that captures whether the dependent
variable is greater than 0 (i.e., crosses the hurdle): glm(I(y > 0) ~ ...,
family = binomial). Second, a beta regression for only the observations in
(0, 1) that crossed the hurdle: betareg(y ~ ..., subset = y > 0 & y < 1). A
recent technical report introduces such a family of models along with many
further techniques (specialized residuals and regression diagnostics) that
are not yet available in R:
http://arxiv.org/abs/1103.2372
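
The two building blocks might be combined along the following lines. This is only a sketch on simulated data: the variable names, coefficients, and the single regressor x are hypothetical, and the simulated response contains 0s but no exact 1s. If exact 1s occur as well, they would need a further component (for example an analogous binary model for y == 1), which is not shown here.

```r
library(betareg)

set.seed(1)
n <- 500
x <- rnorm(n)

# Hypothetical data: 'paid' marks whether any payment was made at all
paid <- rbinom(n, 1, plogis(-0.5 + x))
mu   <- plogis(0.3 + 0.5 * x)   # mean proportion among payers
phi  <- 10                      # precision of the beta distribution
y    <- ifelse(paid == 1, rbeta(n, mu * phi, (1 - mu) * phi), 0)
d    <- data.frame(y = y, x = x)

# Part 1: binary model for crossing the hurdle (y > 0)
hurdle_fit <- glm(I(y > 0) ~ x, family = binomial, data = d)

# Part 2: beta regression for observations strictly inside (0, 1)
beta_fit <- betareg(y ~ x, data = d, subset = y > 0 & y < 1)

# Overall expected proportion: P(y > 0) * E(y | y > 0)
p_pos      <- predict(hurdle_fit, newdata = d, type = "response")
mu_pos     <- predict(beta_fit, newdata = d, type = "response")
expected_y <- p_pos * mu_pos
summary(expected_y)
```

Because the two parts are estimated separately, each can use its own set of regressors and can be checked with the usual tools for glm and betareg objects.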
Best,
Z
> Ihor.