[R] fit simple surface to 2d data?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Jul 7 15:50:35 CEST 2001
Well, I would say that the definition of 'interesting' makes this into
a supervised pattern recognition problem, and that using clustering
techniques for such problems is a classic error. What one needs is some
way to put in the prior information, and that needs model-based clustering
techniques, at least.
A close analogy: people studying automated screening of mammograms are
trying to pick up signs of (pre)-cancer, not the many benign variations in
breast tissue. Yet clustering techniques have been proposed frequently
(and those I have studied are not at all successful).
On Sat, 7 Jul 2001, Roger Bivand wrote:
> On Fri, 6 Jul 2001, Prof Brian Ripley wrote:
> > On Fri, 6 Jul 2001, george young wrote:
> > > I have an array of floating-point measurements on a square (5 by 5) 2d grid.
> > > The data are nominally constant, and somewhat noisy.
> > > I need to find any significant spatial trend, e.g. bigger on the
> > > left, bigger in the middle, etc. I have many thousands of these data sets
> > > that need to be scanned for 'interesting' spatial variations, selecting the
> > > datasets that are beyond some criterion of flatness.
> > >
> > > My thought was to fit a 2'nd order polynomial with least-squares or some
> > > such metric, and scan for coefficients bigger than some cutoff. I think
> > > a parabolic surface is probably as complex a surface as the small amount of data merits.
> > >
> > > Is there functionality in R that would be appropriate?
> > Trend surfaces in package spatial do that, and I would rather do an anova,
> > which Roger Bivand has kindly contributed.
> > > Is there some other approach anyone would suggest for the general task?
> > > I'm not very experienced in data crunching, so any suggestion would
> > > be appreciated.
> > That's more or less what I would do, the anova bit being the difference.
> Yes, this feels like trend surface, but I'm not sure that it isn't a
> classification problem? Given that there are thousands of replications of
> the 25 grid values, maybe clara() in the cluster package or one of the
> many other classifiers could pull out a much smaller number of classes for
> which the surfaces could be calculated?
> Clara wouldn't be using the distance information at all, unfortunately.
> Another cut might be to compute a localised Moran's I_i or the Getis-Ord
> G_i, yielding local measures of spatial autocorrelation for each of the
> grid points and cluster those? This would be especially relevant if the
> process generating the z values at the grid locations is known to exhibit
> positive spatial dependence (values close to each other on the grid
> are more alike than spatially distant values). If there is no spatial
> dependence, trend surface won't help much either!
> anova() on the trend surfaces could do this testing against "some
> criterion of flatness", like the 0 order surface,
> > library(spatial)
> > x <- 1:5
> > y <- 1:5
> > z <- runif(25)
> > x.g <- expand.grid(x,y)[,1]
> > y.g <- expand.grid(x,y)[,2]
> > anova(surf.ls(0, x.g, y.g, z), surf.ls(2, x.g, y.g, z))
> Analysis of Variance Table
> Model 1: surf.ls(np = 0, x = x.g, y = y.g, z = z)
> Model 2: surf.ls(np = 2, x = x.g, y = y.g, z = z)
>   Res.Df Res.Sum Sq Df  Sum Sq F value  Pr(>F)
> 1     24    2.07349
> 2     19    1.28882  5 0.78467  2.3136 0.08421
> but maybe if a classifier was trained to distinguish grids using "some
> criterion of flatness", incoming data could be sorted into flat/not flat
> for further exploration. One of the issues I would watch with trend
> surface is the influence of outlying z values, something a classification
> approach might not be affected by to the same extent.
One could do robust fitting (and we do on brain images). *But* outliers
will correspond to non-flatness here.
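
For readers who want to try the flat-versus-quadratic comparison discussed
above on many grids, here is a minimal sketch. It is not the code from this
thread: it swaps in base R's lm() and anova() for surf.ls() so that the
p-value can be pulled out by column name. The helper name flatness_pvalue and
the simulated data are assumptions made only for illustration.

```r
# Hypothetical helper: p-value of the F-test comparing a flat (order-0)
# surface with a full quadratic (order-2) surface fitted to one grid.
flatness_pvalue <- function(z, x.g, y.g) {
  flat <- lm(z ~ 1)
  quad <- lm(z ~ x.g + y.g + I(x.g^2) + I(x.g * y.g) + I(y.g^2))
  anova(flat, quad)[2, "Pr(>F)"]
}

# Coordinates of the 5 by 5 grid, one row per grid point.
grid <- expand.grid(x = 1:5, y = 1:5)

# Simulated data, for illustration only.
set.seed(1)
flat_z   <- rnorm(25, mean = 10, sd = 0.1)   # nominally constant, noisy
tilted_z <- flat_z + 0.2 * grid$x            # adds a linear trend in x

p_flat   <- flatness_pvalue(flat_z,   grid$x, grid$y)
p_tilted <- flatness_pvalue(tilted_z, grid$x, grid$y)
```

With thousands of data sets stored as rows of a matrix Z (again an assumed
layout), something like apply(Z, 1, flatness_pvalue, x.g = grid$x, y.g = grid$y)
would give one p-value per grid, which could then be compared against
whatever criterion of flatness is chosen.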
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595