[R-sig-Geo] ME function for negative binomial
HJ Kil
hyeonjongkil at ucla.edu
Tue Feb 14 19:38:27 CET 2012
Sorry for not providing all the information appropriately and thank you so
much for your comments.
I would like to share my dataset, but, because of the confidentiality issue,
I could not share it.
All my explanatory variables are continuous variables and my dependent
variable almost shows exactly the same histogram distribution as this link:
(http://www.ats.ucla.edu/stat/sas/dae/nbreg.htm)
When I run the base negative binomial model without using "ME function," as
you see, I have to expand the # of convergence (maxit=100 option in glm.nb)
to be converged.
Without this, I could not converge the model appropriately.
Is it possible that, if the base model requires more # of convergence to
find the initial theta, this feature is also reflected to the ME function?
Regarding your comment, I have a couple of clarifying questions.
"The tests" in the below comment refer both "moran.test" and "lm.morantest"
(not ME function)?
So, do you mean, although ME function can be used for normal, binomial,
poisson and negative binomial, the above two tests may not be appropriate
for binomial, poisson and negative binomial distribution?
In addition, if I received these "not converged" warnings, could you teach
me some strategies that I may use to make the model be converged?
Again, thank you so much for your answer.
HJ
-----Original Message-----
From: Roger Bivand [mailto:Roger.Bivand at nhh.no]
Sent: Monday, February 13, 2012 2:31 AM
To: HJ Kil
Cc: r-sig-geo at r-project.org
Subject: Re: [R-sig-Geo] ME function for negative binomial
On Sun, 12 Feb 2012, HJ Kil wrote:
> Dear all,
>
>
>
> First of all, thank you so much for reading this long mail (Please
> read this if you are familiar with the "ME" function).
Please rework this question with data that everyone can see. There is no
chance at all of seeing what is wrong without your casting this in terms of
a standard data set, or making your own data available. The tests you are
using have not been established for non-normal variables, so apparent
results on GLM residuals may not be correctly sized.
Warnings and lack of convergence are always potentially serious and must
always be checked out (to see what the cause is).
Roger
>
> I need your help and comments to conduct an analysis with "ME"
> function of the package spdep.
>
> I am trying to run a regression (a count dependent variable and 7-8
> explanatory variables) with about 2000 observations (adjacent census
> tracts in an urban area).
>
> Basically, I am trying to look at how these areas' demographics
> aspects (7-8 explanatory variables) are related to a certain social
> aspect (my dependent count variable).
>
> To do this, I used the below script (Italic: parts of results):
>
>
>
> ### Load all libraries: I skipped all the libraries that I loaded
>
> ### Read shape data file made/joined by ArcGis and a queen matrix
> (gal) made by Geoda
>
> myshp <- readShapeSpatial("a.shp", ID="ID")
>
> mygal <- read.gal("a.GAL")
>
>
>
> #Test symmetry of the matrix: Result was "TRUE"
>
> print(is.symmetric.nb(mygal))
>
> [1] TRUE
>
>
>
> #Change nb to listw
>
> mylistw <- nb2listw(mygal)
>
>
>
> #Run a poisson model as a base: including offset since I would like to
> know "density"
>
> base.poisson <- glm(dep ~ a + b + c + d + e + f + g,
> offset(log(TOT_POP)), data=myshp, family="poisson")
>
>
>
> #Run dispersion test to check whether a negative binomial model is better:
> RESULT showed over-dispersed.
>
> dispersiontest(base.poisson)
>
> Overdispersion test
>
> data: base.poisson
>
> z = 8.711, p-value < 2.2e-16
>
> alternative hypothesis: true dispersion is greater than 1
>
> sample estimates:
>
> dispersion
>
> 3.123331
>
>
>
> #Run a negative binomial regression as a base by using "glm.nb"
>
> base.nb <- glm.nb(dep ~ a + b + c + d + e + f + g +
> offset(log(TOTAL_POP)), data=myshp, maxit=100)
>
> summary(base.nb)
>
>
>
> #Based on the result of the above, I found that "initial theta" was
> "1.174928547"
>
> #I used the other way of running a negative binomial with the above value:
>
> base.nb2 <- glm(dep ~ a + b + c + d + e + f + g +
> offset(log(TOTAL_POP)), data=myshp,
> family=negative.binomial(1.174928547), maxit=100)
>
> summary(base.nb)
>
>
>
> # I found that both results are VERY SIMILAR (decimal differences)
>
>
>
> # Conducting "moran.test" and "lm.morantest" to supplementary check
> the existence of the spatial autocorrelation in the residuals before
> moving to the eigenfiltering
>
> moran.test(residuals.glm(base.nb), mylistw)
>
> Moran's I test under randomisation
>
> Moran I statistic standard deviate = 10.835, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Moran I statistic Expectation Variance
>
> 0.1372647104 -0.0004995005 0.0001616635
>
>
>
> moran.test(residuals.glm(base.nb2), mylistw)
>
> Moran's I test under randomisation
>
> Moran I statistic standard deviate = 10.837, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Moran I statistic Expectation Variance
>
> 0.1372900049 -0.0004995005 0.0001616636
>
>
>
> lm.morantest(base.nb, mylistw)
>
> Global Moran's I for regression residuals
>
> Moran I statistic standard deviate = 16.465, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Observed Moran's I Expectation Variance
>
> 0.2056132825 -0.0028246948 0.0001602612
>
>
>
> lm.morantest(base.nb2, mylistw)
>
> Global Moran's I for regression residuals
>
> Moran I statistic standard deviate = 16.4668, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Observed Moran's I Expectation Variance
>
> 0.2056358254 -0.0028246666 0.0001602612
>
>
>
> ## I do not know whether both tests are OK for a negative binomial
> model, it seems that they all showed that there existed somewhat
> significant spatial autocorrelations on the residuals.
>
>
>
> #Run "ME" to identify whether the model needs eigenvectors and, if
> needed, how many eigenvectors are needed.
>
> ### FIRST with base.nb2 (glm above)
>
> myme.nb2 <- ME(dep ~ a + b + c + d + e + f + g, data=myshp, family =
> negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
> alpha=0.05, verbose=TRUE, nsim=99)
>
> Error in ME(a ~ b + c + d + e + :
>
> base correlation larger than alpha
>
> In addition: Warning message:
>
> glm.fit: algorithm did not converge
>
>
>
> ## SECOND with base.nb1 (glm.nb)
>
> ## To do this, I copied and pasted the new "ME.nb" command made by
> Pablo (Feb. 03):
> http://www.mail-archive.com/r-sig-geo@r-project.org/msg03967.html
>
> myme.nb <- ME.nb (dep ~ a + b + c + d + e + f + g +
> offset(log(TOTAL_POP), data=myshp, listw=mylistw, alpha=0.05, nsim=99,
> verbose =TRUE)
>
> Error in ME.nb(base.nb2, data = myshp, listw = mylistw, alpha = 0.05, :
>
> base correlation larger than alpha
>
> In addition: Warning messages:
>
> 1: glm.fit: algorithm did not converge
>
> 2: glm.fit: algorithm did not converge
>
> 3: In glm.nb(formula, data = data) : alternation limit reached
>
>
>
> ### I TRIED AFTER CHANGING "stdev" OPTIONS BOTH, BUT THE RESULTS WERE
> THE SAME.
>
> ### WHEN I INCREASED THE ALPHA LEVEL TO 0.4 IN BOTH, THE RESULTS WERE
> THE SAME.
>
> ### HOWEVER, WHEN I INCREASED THE ALPHA LEVEL TO 4.7 IN BOTH, I
> FINALLY GOT THE BELOW RESULT.
>
> myme.nb2 <- ME (dep~ a + b + c + d + e + f + g, data=myshp, family =
> negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
> alpha=0.48, stdev=TRUE, verbose=TRUE, nsim=99)
>
> eV[,135], I: 3.805725e-09 ZI: -0.1491594, pr(ZI): 0.4407139
>
> eV[,18], I: 1.253316e-10 ZI: -0.02069758, pr(ZI): 0.4917434
>
> There were 50 or more warnings (use warnings() to see the first 50)
>
> (all warnings: glm.fit: algorithm did not converge)
>
>
>
> myme.nb1 <- ME.nb (REV_NUM ~ RACE_H_6_B + AGE_H_8_B + EDU_H_7_TO +
> INC_H_GINI + POV_PE + AVE_HOU_IN + GOV_REV + offset(log(TOTAL_POP)),
> data=myshp, listw=mylistw, alpha=0.48, nsim=999, verbose =TRUE)
>
> eV[,15], I: 2.775260e-08 ZI:
>
> NA, pr(ZI): 0.451
>
> eV[,17], I: 1.302856e-08 ZI: NA, pr(ZI): 0.486
>
> There were 50 or more warnings (use warnings() to see the first 50)
>
> (warnings: glm.fit: algorithm did not converge
>
> In glm.nb(formula, data = data) : alternation limit reached
>
> In glm.nb(Y ~ iX) : alternation limit reached)
>
>
>
> ### Based on the result, if the warnings (glm.fit: algorithm did not
> converge) are not serious (particularly for the lower alpha level at
> which "base correlation larger than alphs showed"), I decided to stick
> to a negative binomial (non-spatial) model for my statistics.
>
>
>
> Here are some questions to the groups related to this work:
>
> First, is there any way that I can make the algorithm be converged?
> (like
> maxit=100 in glm.nb).
>
> Particularly, if Pablo can read this, It would appreciate if you can
> give me some suggestion for the modification of your program.
>
>
>
> Second, could you let me know if you find any mistakes on my script?
> In addition, does my conclusion (using a non-spatial negative
> binomial) seem to be OK?
>
> Or, if you think the warning is very serious, what alternative could
> you suggest? Do you think a spatial poisson may be an alternative?
> (considering spatial autocorrelation is more important than over
> dispersion.)
>
> (When I conduct the same procedure (the ME function) with the poisson
> model above, I did not get any warnings or errors with "ME" function.)
>
>
>
> Third, regarding the differences of Moran's I among "moran.test,"
> "lm.morantest," and "ME,"
>
> If the Moran's I from the "ME" function is correct for glm, I think
> that both "moran.test" and "lm.morantest" are somewhat risky for both
> poisson or negative binomial, at least a model with "offset" (like mine).
>
> (Actually, I also tried to use the ME function with the poisson base
> model above (though my data seems overdispersed). Then, the result
> requires me to add about 20 eigenvectors into my model at alpha 0.2,
> so that I "fitted" the eigenvectors from the "ME" results to my base
> poisson model. Then, I double-checked the spatial poisson regression
> with both "moran.test" and "lm.morantest," and they still showed
> somewhat significant spatial autocorrelations in the model.)
>
> Could anyone explain the reason of these differences?
>
>
>
> Fourth, this is somewhat different topic, but I heard that Bayesian is
> another option for these non-normal distributed model. However, I also
> read a couple of articles (e.g., Gelman(2008)), that it may be
> dangerous to use it since it requires "prior distribution or belief"
> incorporated in the model. Is it also applied to spatial statistics?
> For me, I have never use the Bayesian yet and it seems to me that I
> can use it for now with my dataset.
>
>
>
> I would be very appreciated for any comments.
>
> Thanks,
>
> HJ
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics, Helleveien 30,
N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list