[R-sig-Geo] gridding data prior to spatial analyses

Jonathan Greenberg jgrn at illinois.edu
Tue Apr 1 23:51:14 CEST 2014


Daniel:

I'm not sure why you would grid the data first, unless you plan to
use the model to predict or interpolate your response variable
(biomass) spatially. If you do plan to do that (i.e. create
continuous maps of biomass from spatial predictors), then yes, you
need continuous maps of each predictor variable. You probably do want
to account for autocorrelation in your dataset since, as you
mentioned, your data are clumped, but this can all be done using the
data as points.
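
As a rough illustration of working with the data as points, the
sketch below builds row-standardised k-nearest-neighbour weights from
great-circle distances and computes Moran's I on them. It is written
in plain Python only to keep it dependency-free; in R the spdep
functions knearneigh(), knn2nb(), nb2listw() and moran.test() cover
the same steps. The coordinates, values, k = 2 and the helper names
(haversine_km, knn_weights, morans_i) are all made up for the example
and are not taken from Daniel's data.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km: mean Earth radius


def knn_weights(points, k):
    """Row-standardised k-nearest-neighbour weights as {i: {j: 1/k}}."""
    weights = {}
    for i, p in enumerate(points):
        dists = sorted((haversine_km(p, q), j)
                       for j, q in enumerate(points) if j != i)
        weights[i] = {j: 1.0 / k for _, j in dists[:k]}
    return weights


def morans_i(values, weights):
    """Moran's I of `values` under the given spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(w * dev[i] * dev[j]
              for i, nbrs in weights.items() for j, w in nbrs.items())
    s0 = sum(w for nbrs in weights.values() for w in nbrs.values())
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Hypothetical points: three in the North Sea, three in the central Atlantic.
points = [(55.0, 3.0), (55.5, 3.5), (56.0, 2.5),
          (30.0, -40.0), (31.0, -41.0), (29.0, -39.0)]
# Hypothetical values, e.g. biomass or residuals from a non-spatial model.
values = [10.0, 11.0, 9.0, 1.0, 2.0, 1.5]

weights = knn_weights(points, k=2)
moran = morans_i(values, weights)
print(round(moran, 3))
```

A value of Moran's I well above zero indicates that nearby points
carry similar values, which is the situation a spatial error model is
meant to handle. No gridding is involved at any step: the neighbour
structure comes straight from the point coordinates.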

--j

On Tue, Apr 1, 2014 at 2:50 PM, Jones, Daniel O.B. <dj1 at noc.ac.uk> wrote:
> Dear all,
> I have a reasonably large database (~2500 points) of biomass values (the response variable) with associated positional information (lat / long) in the Atlantic. I want to look at potential environmental explanatory variables. I have several environmental datasets associated with each point (e.g. temperature, salinity, oxygen, organic carbon etc.). The data are spatially patchy: some locations (e.g. the North Sea) have a lot of data in a small area, while other areas (e.g. the central Atlantic) have sparse data. I wanted to use spatial simultaneous autoregressive error modelling (errorsarlm in the spdep package) in R to assess how the biomass varies with each of the potential explanatory variables. In many other analyses I have seen, the data are gridded prior to analysis. This leads to several questions:
> 1) Should I grid the data? This dramatically reduces the number of available observations from around 2500 to around 150 (geometric mean biomass in 5 degree grid cells), but solves the problem of unequal data distribution. Are there any references that provide a recommendation for this?
> 2) If I grid the data, should I grid at a higher resolution, i.e. with lots of smaller cells (e.g. 1 degree)? This will result in sparse coverage (i.e. lots of holes) and a lower number of observations per cell, but will increase the accuracy and precision of the environmental data (which can vary dramatically over a 5 degree grid) and will increase the number of cells in the analysis (presumably increasing statistical power).
> 3) If I grid the data, should I pick a minimum number of observations per cell and exclude the cells that do not meet this criterion? Other papers exclude grid cells where the number of observations is lower than a set value (determined, for example, by assessing how relative standard errors decrease with the number of observations).
> I would greatly appreciate any advice from someone familiar with these analyses, particularly if you know of any published papers that back up the approach.
> Many thanks, Daniel
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL  61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007


