[R-sig-Geo] ME function for negative binomial

Roger Bivand Roger.Bivand at nhh.no
Wed Feb 15 12:38:53 CET 2012


On Tue, 14 Feb 2012, HJ Kil wrote:

> Sorry for not providing all the information appropriately and thank you so
> much for your comments.
> I would like to share my dataset, but, because of the confidentiality issue,
> I could not share it.

So then you will have to provide code demonstrating the problem with a 
data set that is available, say one shipped with one of the packages, 
perhaps in ade4 or similar. Have you also looked at advice given in:

http://www.bio.umontreal.ca/numecolR/

especially chapter 7?

Roger

> All my explanatory variables are continuous variables and my dependent
> variable almost shows exactly the same histogram distribution as this link:
> (http://www.ats.ucla.edu/stat/sas/dae/nbreg.htm)
> When I run the base negative binomial model without using "ME function," as
> you see, I have to expand the # of convergence (maxit=100 option in glm.nb)
> to be converged.
> Without this, I could not converge the model appropriately.
> Is it possible that, if the base model requires more # of convergence to
> find the initial theta, this feature is also reflected to the ME function?
>
> Regarding your comment, I have a couple of clarifying questions.
> "The tests" in the below comment refer both "moran.test" and "lm.morantest"
> (not ME function)?
> So, do you mean, although ME function can be used for normal, binomial,
> poisson and negative binomial, the above two tests may not be appropriate
> for binomial, poisson and negative binomial distribution?
>
> In addition, if I received these "not converged" warnings, could you teach
> me some strategies that I may use to make the model be converged?
> Again, thank you so much for your answer.
> HJ
>
>
> -----Original Message-----
> From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
> Sent: Monday, February 13, 2012 2:31 AM
> To: HJ Kil
> Cc: r-sig-geo at r-project.org
> Subject: Re: [R-sig-Geo] ME function for negative binomial
>
> On Sun, 12 Feb 2012, HJ Kil wrote:
>
>> Dear all,
>>
>>
>>
>> First of all, thank you so much for reading this long mail (Please
>> read this if you are familiar with the "ME" function).
>
>
> Please rework this question with data that everyone can see. There is no
> chance at all of seeing what is wrong without your casting this in terms of
> a standard data set, or making your own data available. The tests you are
> using have not been established for non-normal variables, so apparent
> results on GLM residuals may not be correctly sized.
>
> Warnings and lack of convergence are always potentially serious and must
> always be checked out (to see what the cause is).
>
> Roger
>
>>
>> I need your help and comments to conduct an analysis with "ME"
>> function of the package spdep.
>>
>> I am trying to run a regression (a count dependent variable and 7-8
>> explanatory variables) with about 2000 observations (adjacent census
>> tracts in an urban area).
>>
>> Basically, I am trying to look at how these areas' demographics
>> aspects (7-8 explanatory variables) are related to a certain social
>> aspect (my dependent count variable).
>>
>> To do this, I used the below script (Italic: parts of results):
>>
>>
>>
>> ### Load all libraries: I skipped all the libraries that I loaded
>>
>> ### Read shape data file made/joined by ArcGis and a queen matrix
>> (gal) made by Geoda
>>
>> myshp <- readShapeSpatial("a.shp", ID="ID")
>>
>> mygal <- read.gal("a.GAL")
>>
>>
>>
>> #Test symmetry of the matrix: Result was "TRUE"
>>
>> print(is.symmetric.nb(mygal))
>>
>> [1] TRUE
>>
>>
>>
>> #Change nb to listw
>>
>> mylistw <- nb2listw(mygal)
>>
>>
>>
>> #Run a poisson model as a base: including offset since I would like to
>> know "density"
>>
>> base.poisson <- glm(dep ~ a + b + c + d + e + f + g,
>> offset(log(TOT_POP)), data=myshp, family="poisson")
>>
>>
>>
>> #Run dispersion test to check whether a negative binomial model is better:
>> RESULT showed over-dispersed.
>>
>> dispersiontest(base.poisson)
>>
>>        Overdispersion test
>>
>> data:  base.poisson
>>
>> z = 8.711, p-value < 2.2e-16
>>
>> alternative hypothesis: true dispersion is greater than 1
>>
>> sample estimates:
>>
>> dispersion
>>
>>  3.123331
>>
>>
>>
>> #Run a negative binomial regression as a base by using "glm.nb"
>>
>> base.nb <- glm.nb(dep ~ a + b + c + d + e + f + g +
>> offset(log(TOTAL_POP)), data=myshp, maxit=100)
>>
>> summary(base.nb)
>>
>>
>>
>> #Based on the result of the above, I found that "initial theta" was
>> "1.174928547"
>>
>> #I used the other way of running a negative binomial with the above value:
>>
>> base.nb2 <- glm(dep ~  a + b + c + d + e + f + g +
>> offset(log(TOTAL_POP)), data=myshp,
>> family=negative.binomial(1.174928547), maxit=100)
>>
>> summary(base.nb)
>>
>>
>>
>> # I found that both results are VERY SIMILAR (decimal differences)
>>
>>
>>
>> # Conducting "moran.test" and "lm.morantest" to supplementary check
>> the existence of the spatial autocorrelation in the residuals before
>> moving to the eigenfiltering
>>
>> moran.test(residuals.glm(base.nb), mylistw)
>>
>>        Moran's I test under randomisation
>>
>> Moran I statistic standard deviate = 10.835, p-value < 2.2e-16
>>
>> alternative hypothesis: greater
>>
>> sample estimates:
>>
>> Moran I statistic       Expectation          Variance
>>
>>     0.1372647104     -0.0004995005      0.0001616635
>>
>>
>>
>> moran.test(residuals.glm(base.nb2), mylistw)
>>
>>        Moran's I test under randomisation
>>
>> Moran I statistic standard deviate = 10.837, p-value < 2.2e-16
>>
>> alternative hypothesis: greater
>>
>> sample estimates:
>>
>> Moran I statistic       Expectation          Variance
>>
>>     0.1372900049     -0.0004995005      0.0001616636
>>
>>
>>
>> lm.morantest(base.nb, mylistw)
>>
>>   Global Moran's I for regression residuals
>>
>> Moran I statistic standard deviate = 16.465, p-value < 2.2e-16
>>
>> alternative hypothesis: greater
>>
>> sample estimates:
>>
>> Observed Moran's I        Expectation           Variance
>>
>>      0.2056132825      -0.0028246948       0.0001602612
>>
>>
>>
>> lm.morantest(base.nb2, mylistw)
>>
>>     Global Moran's I for regression residuals
>>
>> Moran I statistic standard deviate = 16.4668, p-value < 2.2e-16
>>
>> alternative hypothesis: greater
>>
>> sample estimates:
>>
>> Observed Moran's I        Expectation           Variance
>>
>>      0.2056358254      -0.0028246666       0.0001602612
>>
>>
>>
>> ## I do not know whether both tests are OK for a negative binomial
>> model, it seems that they all showed that there existed somewhat
>> significant spatial autocorrelations on the residuals.
>>
>>
>>
>> #Run "ME" to identify whether the model needs eigenvectors and, if
>> needed, how many eigenvectors are needed.
>>
>> ### FIRST with base.nb2 (glm above)
>>
>> myme.nb2 <- ME(dep ~ a + b + c + d + e + f + g, data=myshp, family =
>> negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
>> alpha=0.05, verbose=TRUE, nsim=99)
>>
>> Error in ME(a ~ b + c + d + e +  :
>>
>>   base correlation larger than alpha
>>
>> In addition: Warning message:
>>
>> glm.fit: algorithm did not converge
>>
>>
>>
>> ## SECOND with base.nb1 (glm.nb)
>>
>> ## To do this, I copied and pasted the new "ME.nb" command made by
>> Pablo (Feb. 03):
>> http://www.mail-archive.com/r-sig-geo@r-project.org/msg03967.html
>>
>> myme.nb <- ME.nb (dep ~ a + b + c + d + e + f + g +
>> offset(log(TOTAL_POP), data=myshp, listw=mylistw, alpha=0.05, nsim=99,
>> verbose =TRUE)
>>
>> Error in ME.nb(base.nb2, data = myshp, listw = mylistw, alpha = 0.05,  :
>>
>>  base correlation larger than alpha
>>
>> In addition: Warning messages:
>>
>> 1: glm.fit: algorithm did not converge
>>
>> 2: glm.fit: algorithm did not converge
>>
>> 3: In glm.nb(formula, data = data) : alternation limit reached
>>
>>
>>
>> ### I TRIED AFTER CHANGING "stdev" OPTIONS BOTH, BUT THE RESULTS WERE
>> THE SAME.
>>
>> ### WHEN I INCREASED THE ALPHA LEVEL TO 0.4 IN BOTH, THE RESULTS WERE
>> THE SAME.
>>
>> ### HOWEVER, WHEN I INCREASED THE ALPHA LEVEL TO 4.7 IN BOTH, I
>> FINALLY GOT THE BELOW RESULT.
>>
>> myme.nb2 <- ME (dep~ a + b + c + d + e + f + g, data=myshp, family =
>> negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
>> alpha=0.48, stdev=TRUE, verbose=TRUE, nsim=99)
>>
>> eV[,135], I: 3.805725e-09 ZI: -0.1491594, pr(ZI): 0.4407139
>>
>> eV[,18], I: 1.253316e-10 ZI: -0.02069758, pr(ZI): 0.4917434
>>
>> There were 50 or more warnings (use warnings() to see the first 50)
>>
>> (all warnings: glm.fit: algorithm did not converge)
>>
>>
>>
>> myme.nb1 <- ME.nb (REV_NUM ~ RACE_H_6_B + AGE_H_8_B + EDU_H_7_TO +
>> INC_H_GINI + POV_PE + AVE_HOU_IN + GOV_REV + offset(log(TOTAL_POP)),
>> data=myshp, listw=mylistw, alpha=0.48, nsim=999, verbose =TRUE)
>>
>> eV[,15], I: 2.775260e-08 ZI:
>>
>> NA, pr(ZI): 0.451
>>
>> eV[,17], I: 1.302856e-08 ZI: NA, pr(ZI): 0.486
>>
>> There were 50 or more warnings (use warnings() to see the first 50)
>>
>> (warnings:  glm.fit: algorithm did not converge
>>
>> In glm.nb(formula, data = data) : alternation limit reached
>>
>> In glm.nb(Y ~ iX) : alternation limit reached)
>>
>>
>>
>> ### Based on the result, if the warnings (glm.fit: algorithm did not
>> converge) are not serious (particularly for the lower alpha level at
>> which "base correlation larger than alphs showed"), I decided to stick
>> to a negative binomial (non-spatial) model for my statistics.
>>
>>
>>
>> Here are some questions to the groups related to this work:
>>
>> First, is there any way that I can make the algorithm be converged?
>> (like
>> maxit=100 in glm.nb).
>>
>> Particularly, if Pablo can read this, It would appreciate if you can
>> give me some suggestion for the modification of your program.
>>
>>
>>
>> Second, could you let me know if you find any mistakes on my script?
>> In addition, does my conclusion (using a non-spatial negative
>> binomial) seem to be OK?
>>
>> Or, if you think the warning is very serious, what alternative could
>> you suggest? Do you think a spatial poisson may be an alternative?
>> (considering spatial autocorrelation is more important than over
>> dispersion.)
>>
>> (When I conduct the same procedure (the ME function) with the poisson
>> model above, I did not get any warnings or errors with "ME" function.)
>>
>>
>>
>> Third, regarding the differences of Moran's I among "moran.test,"
>> "lm.morantest," and "ME,"
>>
>> If the Moran's I from the "ME" function is correct for glm, I think
>> that both "moran.test" and "lm.morantest" are somewhat risky for both
>> poisson or negative binomial, at least a model with "offset" (like mine).
>>
>> (Actually, I also tried to use the ME function with the poisson base
>> model above (though my data seems overdispersed). Then, the result
>> requires me to add about 20 eigenvectors into my model at alpha 0.2,
>> so that I "fitted" the eigenvectors from the "ME" results to my base
>> poisson model. Then, I double-checked the spatial poisson regression
>> with both "moran.test" and "lm.morantest," and they still showed
>> somewhat significant spatial autocorrelations in the model.)
>>
>> Could anyone explain the reason of these differences?
>>
>>
>>
>> Fourth, this is somewhat different topic, but I heard that Bayesian is
>> another option for these non-normal distributed model. However, I also
>> read a couple of articles (e.g., Gelman(2008)), that it may be
>> dangerous to use it since it requires "prior distribution or belief"
>> incorporated in the model. Is it also applied to spatial statistics?
>> For me, I have never use the Bayesian yet and it seems to me that I
>> can use it for now with my dataset.
>>
>>
>>
>> I would be very appreciated for any comments.
>>
>> Thanks,
>>
>> HJ
>>
>>
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Roger Bivand
> Department of Economics, NHH Norwegian School of Economics, Helleveien 30,
> N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list