Hello everyone,
I have some binary, spatially autocorrelated data I would like to run autologistic regression on. I hope to incorporate both ordinary covariates (environmental predictors) and a spatial autocovariate in the model, ideally with a second-order neighbourhood structure. Since my computing skills are limited, I am wondering if anyone has composed an algorithm for this purpose, and would be willing to share their work, or at least their insights.
My data come with three complications:
1. I do not have response values for all the sites within my study area. In order to evaluate the autocovariate term, I therefore need to be able to simulate the response value at unsurveyed sites with a Gibbs sampler or modified Gibbs sampler, as described in Augustin et al. 1996 (Journal of Applied Ecology 33: 339-347) and 1998 (Environmetrics 9: 175-196).
2. My study area amounts to continental Africa south of 6 degrees North; both response and environmental data have a spatial resolution of 0.5 degrees longitude by latitude. While I can obviously place these data in a rectangular lattice system, I would like to limit predictions to terrestrial areas (i.e. exclude ocean on either side of the continent). All environmental variables are coded zero in ocean areas, and I gather this could cause problems with certain parameter estimation techniques. I am therefore seeking an algorithm that will allow me to specify a study area within the presumably rectangular lattice used to define coordinates for the Gibbs sampler.
3. The number of environmental covariates to include in the model is potentially large. I have 61 candidate predictors. I intend to reduce the number of covariates by first running ordinary logistic regression with stepwise variable selection, but preliminary analyses suggest that I may nonetheless be left with 20-25 predictors to include in the autologistic model.
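To make the first two points concrete, here is a rough sketch of the kind of routine I have in mind. It is written in Python purely for illustration (I would need the equivalent in S or R), and everything in it is an assumption on my part: the simple unweighted mean over the eight surrounding cells (not necessarily the weighting used by Augustin et al.), the names `land`, `surveyed`, `linear_pred` and `gamma`, and the toy 3 x 4 grid.

```python
import math
import random

# Second-order neighbourhood on a lattice: the 8 cells surrounding a site.
SECOND_ORDER = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)]

def autocovariate(y, land, r, c):
    """Mean response over the land neighbours of cell (r, c).

    Ocean cells and cells outside the lattice are skipped, so their
    zero-coded values never enter the calculation. Returns 0.0 when
    the cell has no land neighbour.
    """
    total, count = 0, 0
    for dr, dc in SECOND_ORDER:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(y) and 0 <= cc < len(y[0]) and land[rr][cc]:
            total += y[rr][cc]
            count += 1
    return total / count if count else 0.0

def gibbs_sweep(y, land, surveyed, linear_pred, gamma, rng):
    """One pass over the unsurveyed land cells.

    Each such cell is redrawn from its conditional probability given the
    current state of its neighbours. Surveyed cells and ocean cells are
    left untouched. `linear_pred` holds the environmental part of the
    linear predictor per cell; `gamma` is the autocovariate coefficient.
    """
    for r in range(len(y)):
        for c in range(len(y[0])):
            if land[r][c] and not surveyed[r][c]:
                eta = linear_pred[r][c] + gamma * autocovariate(y, land, r, c)
                p = 1.0 / (1.0 + math.exp(-eta))
                y[r][c] = 1 if rng.random() < p else 0
    return y

# Toy lattice: True marks land, False marks ocean.
land = [[True,  True,  False, False],
        [True,  True,  True,  False],
        [False, True,  True,  True]]
# True marks cells with an observed response.
surveyed = [[True,  True,  False, False],
            [True,  False, True,  False],
            [False, True,  False, True]]
# Observed responses, with arbitrary starting values at unsurveyed cells.
y = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0]]
# Hypothetical environmental linear predictor (all zero here).
linear_pred = [[0.0] * 4 for _ in range(3)]

# Work on a copy so the observed grid is preserved.
filled = gibbs_sweep([row[:] for row in y], land, surveyed,
                     linear_pred, gamma=1.0, rng=random.Random(1))
```

The land mask is what I mean by "specifying a study area": the lattice stays rectangular, but only land cells are ever read or updated. In practice the sweep would be repeated many times, alternating with re-estimation of the coefficients.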
My primary concern with these models is prediction, although I would also like to get a sense of the relative importance of different environmental predictor variables. Since I will need to build models for more than 1600 species of bird with access to only limited computing power, I am willing to compromise on the accuracy of parameter estimation in exchange for computing speed. I gather, for example, that parameter estimation techniques involving Markov chain Monte Carlo provide more accurate estimates of standard error but are much more complex and lengthy in their computation than the method of maximising the pseudo-likelihood.
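As far as I understand it, maximising the pseudo-likelihood amounts to fitting an ordinary logistic regression in which the autocovariate is simply one more predictor column. A minimal sketch of that idea follows, again in Python for illustration only. The function names `fit_logistic` and `log_lik`, the plain gradient-ascent fitting, the step size and the eight made-up data rows are all my own assumptions; a real analysis would use a proper GLM routine.

```python
import math

def log_lik(beta, X, y):
    """Logistic log-likelihood; beta[0] is the intercept."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        total += yi * eta - math.log(1.0 + math.exp(eta))
    return total

def fit_logistic(X, y, steps=5000, rate=0.1):
    """Maximise the log-likelihood by plain gradient ascent.

    X: list of predictor rows (no intercept column); y: list of 0/1.
    Returns [intercept, slope_1, ..., slope_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            p = 1.0 / (1.0 + math.exp(-eta))
            grad[0] += yi - p
            for j, x in enumerate(row):
                grad[j + 1] += (yi - p) * x
        # Step along the average gradient.
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta

# Made-up rows: [environmental predictor, autocovariate of the site].
X = [[0.1, 0.00], [0.2, 0.25], [0.4, 0.50], [0.5, 0.25],
     [0.6, 0.75], [0.8, 0.50], [0.9, 1.00], [0.3, 0.75]]
y = [0, 0, 1, 0, 1, 1, 1, 0]

# beta[2] is the coefficient on the autocovariate.
beta = fit_logistic(X, y)
```

With 20-25 environmental predictors the rows would simply be longer; the autocovariate column would be recomputed from the current responses (observed or simulated) before each refit.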
I work in S-PLUS 6.0 on Windows 2000 and Windows XP. I have not previously worked in R, but understand the two systems are similar enough. So if you have any advice on the above or could help me put together an appropriate algorithm in either S or R, I would much appreciate hearing from you. I can be reached at jana.mcpherson@zoology.oxford.ac.uk
Best wishes,
Jana
~~~~~~~~~~~~~~~~~~~~
Jana M. McPherson (Schulz)
Department of Zoology, University of Oxford
South Parks Road, Oxford OX1 3PS, United Kingdom
Current address: 598 Huron Street, Toronto ONT, Canada
Tel: +1-416-929 2858
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~