[R] GLM Gamma Regression error message in R
David Winsemius
dwinsemius at comcast.net
Tue Oct 12 14:43:16 CEST 2010
On Oct 12, 2010, at 3:51 AM, Stratos Laskarides wrote:
> Dear Madam/Sir
>
> This may be quite a long shot...
>
> By way of intro, I am a masters student in actuarial science at the
> University of Cape Town, and I am doing a project in R on some
> healthcare cost data. During my coding in R I encountered an error
> message, which I then googled, but I am still unable to resolve the
> issue.
>
> I would like to please ask if and how it is possible to resolve the
> problem raised by the error message "Error: NA/NaN/Inf in foreign
> function call (arg 1) In addition: Warning message: step size
> truncated due to divergence" in R?
>
> As for some background on my specific data and research problem at
> hand, I am fitting a gamma regression model to 13 000 lines of
> insurance claims data, which will be regressed against categorical
> variables such as Age Band, Gender, and Region.
>
> Perhaps my problem arises because the data set is too large and the
> iteratively reweighted least squares algorithm therefore cannot
> converge, in which case I perhaps need another GLM type. Or maybe the
> categorical explanatory variables can take on too many values (e.g.
> there are 15 Age Bands, 5 Regions).
>
> Any insights you could provide would be much appreciated.
You are asking the right questions. Most probably some particular
stratum of categorical variables has a small number of informative
events or is pathologically distributed (from the perspective of your
model structure). This is especially likely when you enter interaction
terms. Tabular investigation may disclose a suspect and point the way
to "nail down" the culprit.
What are the descriptive stats on your outcome variable stratified by
age and region?
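
A minimal sketch of that kind of tabular check, run here on simulated
data because the real claims file is not available. The data frame
claims_df and its columns age_band, region and claims are hypothetical
names, not the poster's actual variables:

```r
# Simulated stand-in for the claims data; every name here is hypothetical.
set.seed(1)
n <- 13000
claims_df <- data.frame(
  age_band = factor(sample(1:15, n, replace = TRUE)),
  region   = factor(sample(paste0("R", 1:5), n, replace = TRUE)),
  claims   = rgamma(n, shape = 2, rate = 0.002)
)

# Cell counts: look for sparse or empty age-by-region strata.
cell_n <- xtabs(~ age_band + region, data = claims_df)
print(cell_n)

# Per-stratum summaries of the outcome, including a count of zero claims.
cell_stats <- aggregate(
  claims ~ age_band + region, data = claims_df,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                      min = min(x), zeros = sum(x == 0))
)
print(head(cell_stats))
```

Cells with very few rows, a minimum of zero, or a mean far out of line
with their neighbours would be the first places to look.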
One option that immediately presents itself is modeling age as a
continuous variable with a spline representation. I have quite a bit
of experience working with actuaries and I do know the dominant
analytic strategy is cutting data into discrete categories. However,
this is a pretty small dataset and you should be prepared to argue in
favor of the more powerful strategy of keeping continuous variables
continuous.
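
One way this could look, again on simulated data with hypothetical
variable names. The natural spline with 4 degrees of freedom and the
log link are my assumptions for illustration (the default link for
Gamma() in R is the inverse link), not something taken from the
original model:

```r
library(splines)  # part of the standard R distribution

# Simulated data; age, gender and claims are hypothetical names.
set.seed(2)
n      <- 2000
age    <- runif(n, 18, 80)
gender <- factor(sample(c("F", "M"), n, replace = TRUE))
mu     <- exp(6 + 0.02 * (age - 18) + 0.1 * (gender == "M"))
claims <- rgamma(n, shape = 2, rate = 2 / mu)   # mean of each draw is mu
dat    <- data.frame(claims, age, gender)

# Age enters as a natural cubic spline instead of 15 discrete bands.
fit <- glm(claims ~ ns(age, df = 4) + gender,
           family = Gamma(link = "log"), data = dat)
print(summary(fit))
```

The spline spends 4 parameters on age where 15 age bands would spend
14, which leaves more data per estimated coefficient.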
Another issue: how are you handling the often statistically
pathological zero claims that almost always occur in healthcare claims
data? What does plot(density(claims)) look like? A gamma model is
going to have real difficulty with the typical sort of health claims
distribution. Are you prepared to model using zero-inflated or
zero-adjusted models?
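
As a rough illustration only, here is a two-part (hurdle-style) sketch
on simulated data: a logistic model for whether any claim occurs, then
a gamma model fitted to the positive claims alone. This is one simple
stand-in for the zero-adjusted idea, not a recommendation of a specific
method; all names and the log link are assumptions. Note that glm()
with the Gamma family does not accept non-positive responses, so zeros
have to be handled somehow before a gamma fit.

```r
# Simulated claims with a point mass at zero; all names are hypothetical.
set.seed(3)
n         <- 5000
region    <- factor(sample(paste0("R", 1:5), n, replace = TRUE))
has_claim <- rbinom(n, 1, 0.6)
claims    <- has_claim * rgamma(n, shape = 2, rate = 0.002)
dat       <- data.frame(claims, region, any_claim = as.integer(claims > 0))

# How many zeros are there, and what does the distribution look like?
print(mean(dat$claims == 0))
plot(density(dat$claims))

# Part 1: probability of having any claim at all.
fit_any <- glm(any_claim ~ region, family = binomial, data = dat)

# Part 2: size of the claim, given that it is positive.
fit_pos <- glm(claims ~ region, family = Gamma(link = "log"),
               data = subset(dat, claims > 0))
```

The expected claim for a stratum would then be the product of the
fitted probability from part 1 and the fitted mean from part 2.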
--
David.
>
> Thank you ever so much.
>
> Kind regards
> Stratos Laskarides
> South Africa
--
David Winsemius, MD
West Hartford, CT