[R] fit simple surface to 2d data?
Roger.Bivand at nhh.no
Sat Jul 7 13:47:48 CEST 2001
On Fri, 6 Jul 2001, Prof Brian Ripley wrote:
> On Fri, 6 Jul 2001, george young wrote:
> > I have an array of floating-point measurements on a square (5 by 5) 2d grid.
> > The data are nominally constant, and somewhat noisy.
> > I need to find any significant spatial trend, e.g. bigger on the
> > left, bigger in the middle, etc. I have many thousands of these data sets
> > that need to be scanned for 'interesting' spatial variations, selecting the
> > datasets that are beyond some criterion of flatness.
> > My thought was to fit a 2nd-order polynomial with least-squares or some
> > such metric, and scan for coefficients bigger than some cutoff. I think
> > a parabolic surface is probably as complex a surface as the small amount of data merits.
> > Is there functionality in R that would be appropriate?
> Trend surfaces in package spatial do that, and I would rather do an anova,
> which Roger Bivand has kindly contributed.
> > Is there some other approach anyone would suggest for the general task?
> > I'm not very experienced in data crunching, so any suggestion would
> > be appreciated.
> That's more or less what I would do, the anova bit being the difference.
Yes, this feels like a trend surface problem, but could it not also be
treated as a classification problem? Given that there are thousands of
replications of the 25 grid values, maybe clara() in the cluster package or
one of the many other classifiers could pull out a much smaller number of
classes for which the surfaces could be calculated?
Unfortunately, clara() wouldn't be using the distance information at all:
it would treat the 25 grid values as plain variables with no spatial layout.
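A minimal sketch of the clustering idea, assuming the data sets are stored
one per row in a matrix with 25 columns in expand.grid(1:5, 1:5) order. The
matrix `grids` and the simulated flat and tilted data are hypothetical
stand-ins, not the poster's measurements:

```r
library(cluster)  # recommended package shipped with R; provides clara()

set.seed(1)
g <- expand.grid(x = 1:5, y = 1:5)  # grid coordinates, x varies fastest

# Hypothetical data: 500 flat grids and 500 grids that rise from left to right
flat   <- matrix(rnorm(500 * 25, sd = 0.1), ncol = 25)
tilted <- matrix(rnorm(500 * 25, sd = 0.1), ncol = 25)
tilted <- sweep(tilted, 2, 0.5 * (g$x - 3), "+")  # add a slope in x to each row
grids  <- rbind(flat, tilted)

cl <- clara(grids, k = 2)  # k = 2 is an assumption made for this toy example
table(cl$clustering)       # number of data sets assigned to each class
cl$medoids                 # one representative 25-value grid per class
```

Each row of cl$medoids is an observed grid, so a surface could then be
fitted to a handful of medoids rather than to every data set.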
Another cut might be to compute a localised Moran's I_i or the Getis-Ord
G_i, yielding local measures of spatial autocorrelation for each of the
grid points and cluster those? This would be especially relevant if the
process generating the z values at the grid locations is known to exhibit
positive spatial dependence (values close to each other on the grid
are more alike than spatially distant values). If there is no spatial
dependence, trend surface won't help much either!
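As a rough illustration of the local Moran's I_i, here is a base R sketch.
It assumes rook neighbours (grid cells sharing an edge) and row-standardised
weights, and uses I_i = (z_i - mean(z)) / m2 * sum_j w_ij (z_j - mean(z))
with m2 = sum_i (z_i - mean(z))^2 / n. The helper local.moran() and the
simulated z values are hypothetical:

```r
# Rook-neighbour weights on the 5 x 5 grid: cells one step apart share an edge
g <- expand.grid(x = 1:5, y = 1:5)
W <- (as.matrix(dist(g, method = "manhattan")) == 1) * 1
W <- W / rowSums(W)  # row-standardise so that each row sums to 1

# Local Moran's I_i for one vector of 25 grid values (hypothetical helper)
local.moran <- function(z, W) {
  d  <- z - mean(z)           # deviations from the grid mean
  m2 <- sum(d^2) / length(z)  # second moment of the deviations
  as.vector(d / m2 * (W %*% d))
}

set.seed(1)
z.noise <- runif(25)                      # no spatial structure
z.trend <- 0.2 * g$x + runif(25, 0, 0.1)  # values grow with x

I.noise <- local.moran(z.noise, W)
I.trend <- local.moran(z.trend, W)

round(matrix(I.trend, 5, 5), 2)  # rows index x, columns index y
mean(I.noise)
mean(I.trend)
```

With row-standardised weights the mean of the I_i equals the global Moran's
I, so mean(I.trend) gives a single summary of spatial dependence per grid,
while the 25 I_i values are what would be clustered.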
anova() on the trend surfaces could do the testing against "some
criterion of flatness", taking the order 0 (constant) surface as the flat
baseline,
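> library(spatial)  # provides surf.ls() and the anova method for its fits
> x <- 1:5
> y <- 1:5
> z <- runif(25)  # pure noise, not seeded, so the table differs on each run
> x.g <- expand.grid(x,y)[,1]
> y.g <- expand.grid(x,y)[,2]
> anova(surf.ls(0, x.g, y.g, z), surf.ls(2, x.g, y.g, z))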
Analysis of Variance Table
Model 1: surf.ls(np = 0, x = x.g, y = y.g, z = z)
Model 2: surf.ls(np = 2, x = x.g, y = y.g, z = z)
  Res.Df Res.Sum Sq Df  Sum Sq F value  Pr(>F)
1     24    2.07349
2     19    1.28882  5 0.78467  2.3136 0.08421
but maybe if a classifier were trained to distinguish grids using "some
criterion of flatness", incoming data could be sorted into flat/not flat
for further exploration. One of the issues I would watch with trend
surfaces is the influence of outlying z values, to which a classification
approach might be less sensitive.
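To run the anova comparison above over many data sets, something like the
following sketch might serve. It swaps surf.ls() for an ordinary lm() fit of
the full quadratic polynomial in x and y, which gives the same 24 and 19
residual degrees of freedom as the table above. The helper flat.p(), the
matrix `grids`, the planted trend and the 0.001 cutoff are all assumptions
made for illustration:

```r
g <- expand.grid(x = 1:5, y = 1:5)  # grid coordinates, x varies fastest

# p-value of the F test: constant surface against full quadratic surface
flat.p <- function(z, g) {
  d    <- data.frame(g, z = z)
  fit0 <- lm(z ~ 1, data = d)
  fit2 <- lm(z ~ x + y + I(x^2) + I(x * y) + I(y^2), data = d)
  anova(fit0, fit2)[2, "Pr(>F)"]
}

set.seed(1)
n.sets <- 200
# Hypothetical data: one data set per row, nominally constant plus noise
grids <- matrix(rnorm(n.sets * 25, mean = 10, sd = 0.1), ncol = 25)
grids[1, ] <- grids[1, ] + 0.3 * (g$x - 3)  # plant a trend in data set 1

p <- apply(grids, 1, flat.p, g = g)  # one p-value per data set
flagged <- which(p < 0.001)          # cutoff chosen arbitrarily here
flagged
```

With thousands of data sets some flat grids will fall below any fixed cutoff
by chance alone, so the cutoff needs to be chosen with the number of tests
in mind.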
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.