[R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"

Fri Aug 29 20:36:07 CEST 2014

On Aug 29, 2014, at 9:46 AM, Nick Livingston wrote:

> Thank you for your responses. 
> 
> Since my previous attempt to manually truncate my DV didn't work, I'm very interested in trying again using the zerotrun() function. However, I attempted to install "countreg" but received the following notification:
> 
>         Warning in install.packages :
>               unable to access index for repository http://R-Forge.R-project.org/bin/macosx/contrib/3.0
> 
>               package ‘countreg’ is available as a source package but not as a binary
> 
>          Warning in install.packages :
>               package ‘countreg’ is not available (for R version 3.0.3)
> 
> I received the same message when attempting to install it in version 3.1.0, and the latest version, 3.1.1. Am I missing something?

Apparently understanding that R-forge is not CRAN.  If the package has any compiled code, then you need to have the proper tools installed. See the R-Mac-FAQ if those are needed. I generally download the source file and install from disk. 

install.packages("~/countreg_0.1-1.tar.gz", repo=NULL, type="source")

(That reported success on a SL Mac R 3.1.0 machine. And there did not appear to be any compiled code so the Mac "tool chain" was not needed. The other way to check for that possibility is to look at the DESCRIPTION file.)

-- 
David.

> 
> Thank you again. I appreciate your input.
> 
> -Nick
> --------------------------------------------
> On Fri, 8/29/14, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
> 
> Subject: Re: [R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"
> To: "peter dalgaard" <pdalgd at gmail.com>
> Cc: "Nick Livingston" <nlivingston at ymail.com>, r-help at r-project.org
> Date: Friday, August 29, 2014, 5:26 AM
> 
> On Fri, 29 Aug 2014,
> peter dalgaard wrote:
> 
>> 
> I'm no expert on hurdle models, but it seems that you
> are unaware that 
>> the negative binomial
> and the truncated negative binomial are quite 
>> different things.
> 
> Yes. You can replicate the truncated count part
> of the hurdle model with 
> the zerotrunc()
> function from the "countreg" package. The package
> is not 
> yet on CRAN but can be easily
> installed from R-Forge.
> 
>> -pd
>> 
>> 
>> On 29 Aug 2014, at
> 05:57 , Nick Livingston <nlivingston at ymail.com>
> wrote:
>> 
>>> I have
> sought consultation online and in person, to no avail. I
> hope someone
>>> on here might have
> some insight. Any feedback would be most welcome.
>>> 
>>> I am
> attempting to plot predicted values from a two-component
> hurdle model
>>> (logistic [suicide
> attempt yes/no] and negative binomial count [number of
>>> attempts thereafter]). To do so, I
> estimated each component separately using
>>> glm (MASS). While I am able to
> reproduce hurdle results for the logit
>>> portion in glm, estimates for the
> negative binomial count component are
>>> different.
>>> 
>>> Call:
>>> 
> hurdle(formula = Suicide. ~ Age + gender + Victimization *
> FamilySupport |
>>> Age + gender +
> Victimization * FamilySupport, dist = "negbin",
> link =
>>> "logit")
>>> 
>>> Pearson
> residuals:
>>>      Min   
>   1Q  Median      3Q     Max
>>> -0.9816 -0.5187 -0.4094  0.2974 
> 5.8820
>>> 
>>> 
> Count model coefficients (truncated negbin with log
> link):
>>>                    
>                          
>    Estimate Std. Error z value
>>> Pr(>|z|)
>>> 
> (Intercept)                          -0.29150 
>   0.33127  -0.880   0.3789
>>> Age                       
>               0.17068   
> 0.07556   2.259   0.0239
>>> *
>>> gender   
>                          
>    0.28273   
> 0.31614   0.894   0.3712
>>> Victimization               
>          1.08405   
> 0.18157   5.971 2.36e-09
>>> 
> ***
>>> FamilySupport           
>           0.33629   
> 0.29302   1.148   0.2511
>>> Victimization:FamilySupport -0.96831 
>   0.46841  -2.067   0.0387 *
>>> Log(theta)                 
>           0.12245   
> 0.54102   0.226   0.8209
>>> Zero hurdle model coefficients
> (binomial with logit link):
>>>      
>                                        
>     Estimate Std. Error z value
>>> 
> Pr(>|z|)
>>> (Intercept)       
>                
>    -0.547051   0.215981  -2.533 
> 0.01131
>>> *
>>> 
> Age                                 
>    -0.154493   0.063994  -2.414
>>> 0.01577 *
>>> 
> gender                             
>    -0.030942   0.284868  -0.109 
> 0.91350
>>> Victimization         
>                
> 1.073956   0.338015   3.177 
> 0.00149
>>> **
>>> 
> FamilySupport                   
>    -0.380360   0.247530  -1.537 
> 0.12439
>>> 
> Victimization\:FamilySupport 
> -0.813329   0.399905  -2.034  0.04197 *
>>> ---
>>> Signif.
> codes:  0 '***' 0.001 '**' 0.01 '*'
> 0.05 '.' 0.1 ' ' 1
>>> 
>>> Theta: count
> = 1.1303
>>> Number of iterations in
> BFGS optimization: 23
>>> 
> Log-likelihood: -374.3 on 25 Df
>>>> 
> summary(logistic)
>>> 
>>> 
>>> 
>>> 
>>> Call:
>>> glm(formula = SuicideBinary ~ Age +
> gender = Victimization * FamilySupport,
>>> family = "binomial")
>>> 
>>> Deviance
> Residuals:
>>>      Min   
>    1Q   Median       3Q 
>     Max
>>> -1.9948  -0.8470 
> -0.6686   1.1160   2.0805
>>> 
>>> 
> Coefficients:
>>>              
>                                    
> Estimate Std. Error z value
>>> 
> Pr(>|z|)
>>> (Intercept)       
>                   -0.547051   0.215981 
> -2.533  0.01131 *
>>> Age       
>                            
> -0.154493   0.063994  -2.414  0.01577
>>> *
>>> gender   
>                            
> -0.030942   0.284868  -0.109  0.91350
>>> Victimization               
>      
>    1.073956   0.338014   3.177 
> 0.00149
>>> **
>>> 
> FamilySupport                     
> -0.380360   0.247530  -1.537  0.12439
>>> Victimization:FamilySupport 
> -0.813329   0.399904  -2.034  0.04197 *
>>> ---
>>> Signif.
> codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>>> 
>>> (Dispersion
> parameter for binomial family taken to be 1)
>>> 
>>>  
>    Null deviance: 452.54  on 359  degrees of
> freedom
>>> Residual deviance: 408.24 
> on 348  degrees of freedom
>>>    (52 observations deleted
> due to missingness)
>>> AIC: 432.24
>>> 
>>> Number of
> Fisher Scoring iterations: 4
>>> 
>>>> summary(Count1)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Call:
>>> 
> glm(formula = NegBinSuicide ~ Age + gender + Victimization *
> FamilySupport,
>>> family =
> negative.binomial(theta = 1.1303))
>>> 
>>> Deviance
> Residuals:
>>>      Min   
>    1Q   Median       3Q 
>     Max
>>> -1.6393  -0.4504 
> -0.1679   0.2350   2.1676
>>> 
>>> 
> Coefficients:
>>>              
>                                
>    Estimate Std. Error t value
>>> Pr(>|t|)
>>> 
> (Intercept)                           
> 0.60820    0.13779   4.414 2.49e-05
>>> ***
>>> Age   
>                                   0.08836 
>   0.04189   2.109   0.0373
>>> *
>>> gender   
>                               0.10983   
> 0.17873   0.615   0.5402
>>> Victimization               
>           0.73270    0.10776   6.799
> 6.82e-10
>>> ***
>>> FamilySupport               
>         0.10213   
> 0.15979   0.639   0.5241
>>> 
> Victimization:FamilySupport   -0.60146   
> 0.24532  -2.452   0.0159 *
>>> ---
>>> Signif.
> codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>>> 
>>> (Dispersion
> parameter for Negative Binomial(1.1303) family taken to
> be
>>> 0.4549082)
>>> 
>>>  
>    Null deviance: 76.159  on 115  degrees of
> freedom
>>> Residual deviance: 35.101 
> on 104  degrees of freedom
>>>    (296 observations deleted
> due to missingness)
>>> AIC: 480.6
>>> 
>>> Number of
> Fisher Scoring iterations: 15
>>> 
>>> 
>>> 
> Alternatively, if there is a simpler way to plot hurdle
> regression output, or if anyone is award of another means of
> estimating NB models (I haven't had much luck with vglm
> from VGAM either), I would be happy to hear about that as
> well. I'm currently using the "visreg"
>>> package for plotting.
>>> 
>>> Thanks!
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> ______________________________________________
>>> R-help at r-project.org
> mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal,
> self-contained, reproducible code.
>> 
>> -- 
>> Peter Dalgaard,
> Professor,
>> Center for Statistics,
> Copenhagen Business School
>> Solbjerg
> Plads 3, 2000 Frederiksberg, Denmark
>> 
> Phone: (+45)38153501
>> Email: pd.mes at cbs.dk  Priv:
> PDalgd at gmail.com
>> 
>> 
> ______________________________________________
>> R-help at r-project.org
> mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal,
> self-contained, reproducible code.
>> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA