[R-sig-Geo] ME function for negative binomial

Mon Feb 13 11:30:40 CET 2012

On Sun, 12 Feb 2012, HJ Kil wrote:

> Dear all,
>
>
>
> First of all, thank you so much for reading this long mail (Please read 
> this if you are familiar with the "ME" function).

Please rework this question with data that everyone can see. There is no 
chance at all of seeing what is wrong without your casting this in terms 
of a standard data set, or making your own data available. The tests you 
are using have not been established for non-normal variables, so apparent 
results on GLM residuals may not be correctly sized.

Warnings and lack of convergence are always potentially serious and must 
always be checked out (to see what the cause is).

Roger

>
> I need your help and comments to conduct an analysis with "ME" function of
> the package spdep.
>
> I am trying to run a regression (a count dependent variable and 7-8
> explanatory variables) with about 2000 observations (adjacent census tracts
> in an urban area).
>
> Basically, I am trying to look at how these areas' demographics aspects (7-8
> explanatory variables) are related to a certain social aspect (my dependent
> count variable).
>
> To do this, I used the below script (Italic: parts of results):
>
>
>
> ### Load all libraries: I skipped all the libraries that I loaded
>
> ### Read shape data file made/joined by ArcGis and a queen matrix (gal) made
> by Geoda
>
> myshp <- readShapeSpatial("a.shp", ID="ID")
>
> mygal <- read.gal("a.GAL")
>
>
>
> #Test symmetry of the matrix: Result was "TRUE"
>
> print(is.symmetric.nb(mygal))
>
> [1] TRUE
>
>
>
> #Change nb to listw
>
> mylistw <- nb2listw(mygal)
>
>
>
> #Run a poisson model as a base: including offset since I would like to know
> "density"
>
> base.poisson <- glm(dep ~ a + b + c + d + e + f + g, offset(log(TOT_POP)),
> data=myshp, family="poisson")
>
>
>
> #Run dispersion test to check whether a negative binomial model is better:
> RESULT showed over-dispersed.
>
> dispersiontest(base.poisson)
>
>        Overdispersion test
>
> data:  base.poisson
>
> z = 8.711, p-value < 2.2e-16
>
> alternative hypothesis: true dispersion is greater than 1
>
> sample estimates:
>
> dispersion
>
>  3.123331
>
>
>
> #Run a negative binomial regression as a base by using "glm.nb"
>
> base.nb <- glm.nb(dep ~ a + b + c + d + e + f + g + offset(log(TOTAL_POP)),
> data=myshp, maxit=100)
>
> summary(base.nb)
>
>
>
> #Based on the result of the above, I found that "initial theta" was
> "1.174928547"
>
> #I used the other way of running a negative binomial with the above value:
>
> base.nb2 <- glm(dep ~  a + b + c + d + e + f + g + offset(log(TOTAL_POP)),
> data=myshp, family=negative.binomial(1.174928547), maxit=100)
>
> summary(base.nb)
>
>
>
> # I found that both results are VERY SIMILAR (decimal differences)
>
>
>
> # Conducting "moran.test" and "lm.morantest" to supplementary check the
> existence of the spatial autocorrelation in the residuals before moving to
> the eigenfiltering
>
> moran.test(residuals.glm(base.nb), mylistw)
>
>        Moran's I test under randomisation
>
> Moran I statistic standard deviate = 10.835, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Moran I statistic       Expectation          Variance
>
>     0.1372647104     -0.0004995005      0.0001616635
>
>
>
> moran.test(residuals.glm(base.nb2), mylistw)
>
>        Moran's I test under randomisation
>
> Moran I statistic standard deviate = 10.837, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Moran I statistic       Expectation          Variance
>
>     0.1372900049     -0.0004995005      0.0001616636
>
>
>
> lm.morantest(base.nb, mylistw)
>
>   Global Moran's I for regression residuals
>
> Moran I statistic standard deviate = 16.465, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Observed Moran's I        Expectation           Variance
>
>      0.2056132825      -0.0028246948       0.0001602612
>
>
>
> lm.morantest(base.nb2, mylistw)
>
>     Global Moran's I for regression residuals
>
> Moran I statistic standard deviate = 16.4668, p-value < 2.2e-16
>
> alternative hypothesis: greater
>
> sample estimates:
>
> Observed Moran's I        Expectation           Variance
>
>      0.2056358254      -0.0028246666       0.0001602612
>
>
>
> ## I do not know whether both tests are OK for a negative binomial model, it
> seems that they all showed that there existed somewhat significant spatial
> autocorrelations on the residuals.
>
>
>
> #Run "ME" to identify whether the model needs eigenvectors and, if needed,
> how many eigenvectors are needed.
>
> ### FIRST with base.nb2 (glm above)
>
> myme.nb2 <- ME(dep ~ a + b + c + d + e + f + g, data=myshp, family =
> negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
> alpha=0.05, verbose=TRUE, nsim=99)
>
> Error in ME(a ~ b + c + d + e +  :
>
>   base correlation larger than alpha
>
> In addition: Warning message:
>
> glm.fit: algorithm did not converge
>
>
>
> ## SECOND with base.nb1 (glm.nb)
>
> ## To do this, I copied and pasted the new "ME.nb" command made by Pablo
> (Feb. 03): http://www.mail-archive.com/r-sig-geo@r-project.org/msg03967.html
>
> myme.nb <- ME.nb (dep ~ a + b + c + d + e + f + g + offset(log(TOTAL_POP),
> data=myshp, listw=mylistw, alpha=0.05, nsim=99, verbose =TRUE)
>
> Error in ME.nb(base.nb2, data = myshp, listw = mylistw, alpha = 0.05,  :
>
>  base correlation larger than alpha
>
> In addition: Warning messages:
>
> 1: glm.fit: algorithm did not converge
>
> 2: glm.fit: algorithm did not converge
>
> 3: In glm.nb(formula, data = data) : alternation limit reached
>
>
>
> ### I TRIED AFTER CHANGING "stdev" OPTIONS BOTH, BUT THE RESULTS WERE THE
> SAME.
>
> ### WHEN I INCREASED THE ALPHA LEVEL TO 0.4 IN BOTH, THE RESULTS WERE THE
> SAME.
>
> ### HOWEVER, WHEN I INCREASED THE ALPHA LEVEL TO 4.7 IN BOTH, I FINALLY GOT
> THE BELOW RESULT.
>
> myme.nb2 <- ME (dep~ a + b + c + d + e + f + g, data=myshp, family =
> negative.binomial(1.174928547), offset=log(TOTAL_POP), listw=mylistw,
> alpha=0.48, stdev=TRUE, verbose=TRUE, nsim=99)
>
> eV[,135], I: 3.805725e-09 ZI: -0.1491594, pr(ZI): 0.4407139
>
> eV[,18], I: 1.253316e-10 ZI: -0.02069758, pr(ZI): 0.4917434
>
> There were 50 or more warnings (use warnings() to see the first 50)
>
> (all warnings: glm.fit: algorithm did not converge)
>
>
>
> myme.nb1 <- ME.nb (REV_NUM ~ RACE_H_6_B + AGE_H_8_B + EDU_H_7_TO +
> INC_H_GINI + POV_PE + AVE_HOU_IN + GOV_REV + offset(log(TOTAL_POP)),
> data=myshp, listw=mylistw, alpha=0.48, nsim=999, verbose =TRUE)
>
> eV[,15], I: 2.775260e-08 ZI:
>
> NA, pr(ZI): 0.451
>
> eV[,17], I: 1.302856e-08 ZI: NA, pr(ZI): 0.486
>
> There were 50 or more warnings (use warnings() to see the first 50)
>
> (warnings:  glm.fit: algorithm did not converge
>
> In glm.nb(formula, data = data) : alternation limit reached
>
> In glm.nb(Y ~ iX) : alternation limit reached)
>
>
>
> ### Based on the result, if the warnings (glm.fit: algorithm did not
> converge) are not serious (particularly for the lower alpha level at which
> "base correlation larger than alphs showed"), I decided to stick to a
> negative binomial (non-spatial) model for my statistics.
>
>
>
> Here are some questions to the groups related to this work:
>
> First, is there any way that I can make the algorithm be converged? (like
> maxit=100 in glm.nb).
>
> Particularly, if Pablo can read this, It would appreciate if you can give me
> some suggestion for the modification of your program.
>
>
>
> Second, could you let me know if you find any mistakes on my script? In
> addition, does my conclusion (using a non-spatial negative binomial) seem to
> be OK?
>
> Or, if you think the warning is very serious, what alternative could you
> suggest? Do you think a spatial poisson may be an alternative? (considering
> spatial autocorrelation is more important than over dispersion.)
>
> (When I conduct the same procedure (the ME function) with the poisson model
> above, I did not get any warnings or errors with "ME" function.)
>
>
>
> Third, regarding the differences of Moran's I among "moran.test,"
> "lm.morantest," and "ME,"
>
> If the Moran's I from the "ME" function is correct for glm, I think that
> both "moran.test" and "lm.morantest" are somewhat risky for both poisson or
> negative binomial, at least a model with "offset" (like mine).
>
> (Actually, I also tried to use the ME function with the poisson base model
> above (though my data seems overdispersed). Then, the result requires me to
> add about 20 eigenvectors into my model at alpha 0.2, so that I "fitted" the
> eigenvectors from the "ME" results to my base poisson model. Then, I
> double-checked the spatial poisson regression with both "moran.test" and
> "lm.morantest," and they still showed  somewhat significant spatial
> autocorrelations in the model.)
>
> Could anyone explain the reason of these differences?
>
>
>
> Fourth, this is somewhat different topic, but I heard that Bayesian is
> another option for these non-normal distributed model. However, I also read
> a couple of articles (e.g., Gelman(2008)), that it may be dangerous to use
> it since it requires "prior distribution or belief" incorporated in the
> model. Is it also applied to spatial statistics? For me, I have never use
> the Bayesian yet and it seems to me that I can use it for now with my
> dataset.
>
>
>
> I would be very appreciated for any comments.
>
> Thanks,
>
> HJ
>
>
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no