[R] GLM Gamma Regression error message in R

Tue Oct 12 14:43:16 CEST 2010

On Oct 12, 2010, at 3:51 AM, Stratos Laskarides wrote:

> Dear Madam/Sir
>
> This may be quite a long shot...
>
> By way of intro, I am a masters student in actuarial science at the
> University of Cape Town, and I am doing a project in R on some healthcare
> cost data. During my coding in R I encountered an error message, which I
> then googled, but I am still unable to resolve the issue.
>
> I would like to please ask if and how it is possible to resolve the problem
> raised by the error message "Error: NA/NaN/Inf in foreign function call (arg
> 1) In addition: Warning message: step size truncated due to divergence" in
> R?
>
> As for some background on my specific data and research problem at hand, I
> am fitting a gamma regression model to 13 000 lines of insurance claims
> data, which will be regressed against categorical variables such as Age
> Band, Gender, and Region.
>
> Perhaps my problem arises because the data set is too large and the
> iteratively reweighted least squares algorithm therefore cannot converge, in
> which case I perhaps need another GLM type. Or maybe the categorical
> explanatory variables can take on too many values (e.g. there are 15 Age
> Bands, 5 Regions).
>
> Any insights you could provide would be much appreciated.

You are asking the right questions. Most probably some particular
stratum of the categorical variables has a small number of informative
events or is pathologically distributed (from the perspective of your
model structure). This is especially likely when you enter interaction
terms. Tabular investigation may disclose a suspect and point the way
to "nail down" the culprit.

What are the descriptive stats on your outcome variable stratified by  
age and region?
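
A minimal sketch of that kind of tabular check, using simulated data because
the original data set is not shown. The data frame `dat` and the column names
`claims`, `age_band`, and `region` are assumptions made for illustration only.

```r
# Hypothetical stand-in for the claims data (column names are assumed).
set.seed(1)
n <- 1000
dat <- data.frame(
  age_band = factor(sample(paste0("A", 1:15), n, replace = TRUE)),
  region   = factor(sample(paste0("R", 1:5), n, replace = TRUE)),
  claims   = rgamma(n, shape = 2, rate = 0.002)
)

# Number of observations in each age band x region cell.
# Cells with very few observations are the first suspects.
cell_n <- with(dat, table(age_band, region))
print(cell_n)

# Descriptive statistics of the outcome within each stratum.
strata_stats <- aggregate(
  claims ~ age_band + region, data = dat,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                      min = min(x), max = max(x))
)
print(head(strata_stats))
```

Strata with a tiny `n`, a minimum of zero, or an extreme maximum are the
places to look first.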

One option that immediately presents itself is modeling age as a  
continuous variable with a spline representation. I have quite a bit  
of experience working with actuaries and I do know the dominant  
analytic strategy is cutting data into discrete categories. However,  
this is a pretty small dataset and you should be prepared to argue in  
favor of the more powerful strategy of keeping continuous variables  
continuous.
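
As an illustration of the spline idea, the sketch below fits a gamma GLM with
a natural cubic spline for age via `ns()` from the `splines` package that
ships with R. The data are simulated, the column names are hypothetical, and
the log link and `df = 4` are assumptions chosen for the example rather than
anything stated in this thread (the default link for `Gamma()` is the
inverse).

```r
library(splines)  # part of the standard R distribution

# Hypothetical data with age kept continuous (all names are assumed).
set.seed(2)
n <- 2000
dat <- data.frame(
  age    = runif(n, 18, 85),
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  region = factor(sample(paste0("R", 1:5), n, replace = TRUE))
)
mu <- exp(6 + 0.0005 * (dat$age - 40)^2)        # smooth, nonlinear age effect
dat$claims <- rgamma(n, shape = 2, rate = 2 / mu)

# Gamma regression with a natural spline in age instead of 15 age bands.
fit_spline <- glm(claims ~ ns(age, df = 4) + gender + region,
                  family = Gamma(link = "log"), data = dat)
print(summary(fit_spline))

# Predicted mean claim across ages for one reference gender and region.
new_dat <- data.frame(
  age    = seq(20, 80, by = 10),
  gender = factor("F", levels = levels(dat$gender)),
  region = factor("R1", levels = levels(dat$region))
)
pred <- predict(fit_spline, newdata = new_dat, type = "response")
print(cbind(new_dat["age"], predicted_mean = pred))
```

The spline spends 4 parameters on age where 15 age bands would spend 14,
which leaves fewer sparse cells for the fitting algorithm to trip over.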

Another issue: how are you handling the often statistically
pathological zero claims that almost always occur in healthcare claims
data? What does plot(density(claims)) look like? A gamma model is
going to have real difficulty with the typical sort of health claims
distribution. Are you prepared to model using zero-inflated or
zero-adjusted models?
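
One simple way to separate the zeros from the positive claims is a two-part
model: a logistic regression for whether any claim occurs, and a gamma
regression for the size of the positive claims. The sketch below is only an
illustration of that idea on simulated data; the column names, the log link,
and the choice of a two-part model over other zero-adjusted approaches are
all assumptions.

```r
# Hypothetical claims data with a point mass at zero (names are assumed).
set.seed(3)
n <- 2000
dat <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  region = factor(sample(paste0("R", 1:5), n, replace = TRUE))
)
any_claim  <- rbinom(n, 1, 0.6)
dat$claims <- any_claim * rgamma(n, shape = 1.5, rate = 0.001)

# How many zeros are there, and what does the distribution look like?
prop_zero <- mean(dat$claims == 0)
print(prop_zero)
plot(density(dat$claims), main = "Density of claims")

# Part 1: probability of a non-zero claim.
fit_any <- glm(I(claims > 0) ~ gender + region,
               family = binomial, data = dat)

# Part 2: size of the claim, fitted to positive claims only.
# (The Gamma family in glm() rejects non-positive responses.)
fit_pos <- glm(claims ~ gender + region,
               family = Gamma(link = "log"),
               data = subset(dat, claims > 0))

# Expected claim = P(claim > 0) * E[claim | claim > 0].
expected <- predict(fit_any, newdata = dat, type = "response") *
            predict(fit_pos, newdata = dat, type = "response")
print(summary(expected))
```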

-- 
David.

>
> Thank you ever so much.
>
> Kind regards
> Stratos Laskarides
> South Africa
-- 

David Winsemius, MD
West Hartford, CT