[R-sig-Geo] model fitting of randomly generated data in spatstat

Rolf Turner r.turner at auckland.ac.nz
Thu Apr 2 02:02:51 CEST 2015


On 02/04/15 03:09, Robert Leaf wrote:

> I was generating some data for analysis and was curious to see if we could
> fit a “MatClust” model using the function *spatstat*::kppm to some of our
> observed data. As a first cut, and to see if we get values that conform to
> our expectations, I fit models to simulated data and was curious about the
> results. I am hoping that the group can help me understand the departures
> from expectations.
>
> Is it reasonable that the kppm function should return parameter values
> that are similar to those that generated the data?

Sure, provided that the values used to generate the data are not too bizarre.

>
> We are not getting values that are anywhere close to what we would expect.

That appears to be because you are using *bizarre* parameter values to 
generate your data.  The algorithms used by kppm() can be expected to 
return far-out results unless the data to which kppm() is applied have 
at least *some* reasonable prospect of conforming to the model that is 
being fitted.

> library(*spatstat*)

What are those asterisks doing in that call???  That cannot have been 
the call that you actually used.


> (point.vals <- rMatClust(kappa = 2, r = 2, mu = 2000)) # generate random points
>
> if (point.vals$n > 0) { # some realizations of the model return .ppp variables with no data

I was initially bewildered by this --- the expected number of points is 
4000, so how could you possibly get zero points? I asked.  Finally I saw
the light; with kappa = 2 you will get zero parent points, and hence an 
empty pattern about 13.5% of the time.  I.e. kappa = 2 is just plain 
silly-small.
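
The 13.5% figure can be checked with a line of arithmetic. This is only a rough sketch in base R: it assumes the number of parent points is Poisson with mean kappa times the window area (area 1 for the default unit square), and it ignores any parents that rMatClust() may place outside the window. The variable names are illustrative.

```r
kappa <- 2      # parent intensity from the original call
mu    <- 2000   # mean number of offspring per parent
area  <- 1      # assumed: default window is the unit square

# Probability of no parent points at all: exp(-kappa * area)
p_empty <- dpois(0, lambda = kappa * area)

# Mean number of points in the pattern
expected_points <- kappa * mu * area

print(round(p_empty, 4))
print(expected_points)
```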

Using "r = 2" (these days the syntax is "scale = 2") means that you 
are forming clusters in discs of radius 2 .... in the unit square!!!
(You are using the default window.) This makes no sense to me.

Setting mu = 2000 means you are generating an average of 2000 points in
each such disc.  I really don't think this is a realistic value for a 
Matérn cluster process.

Your simulated pattern (if it is not empty) will have the appearance of 
having arisen from a very high intensity Poisson process.  Fitting a 
Matérn cluster process to such a pattern results in ill-determined 
parameter values.
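
One way to see why the pattern looks Poisson: no two points of the unit square are more than sqrt(2) apart, so a disc of radius 2 centred on any parent inside the window covers the whole window, and that parent's offspring are spread uniformly over it. The base R sketch below just spells out that arithmetic; it assumes the default unit-square window and offspring placed uniformly in the disc, and the variable names are illustrative.

```r
radius   <- 2         # cluster radius from the original call
max_dist <- sqrt(2)   # largest distance between two points of the unit square

# TRUE means a disc centred anywhere in the window contains the whole window
covers_window <- radius > max_dist

# Share of one parent's offspring that land inside the window
# (parent inside the window, window area 1, disc area pi * radius^2)
disc_area      <- pi * radius^2
frac_in_window <- 1 / disc_area

print(covers_window)
print(round(frac_in_window, 4))
```

With clusters this much larger than the window, individual clusters cannot be distinguished, which is consistent with the ill-determined estimates.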

Try:

library(spatstat)
set.seed(42)
# Moderate number of small, well separated clusters in the unit square
X <- rMatClust(kappa = 20, scale = 0.04, mu = 5)
fit <- kppm(X ~ 1, "MatClust")
fit

....

Fitted cluster parameters:
       kappa       scale
22.37058543  0.04168089
Mean cluster size:  4.514857 points

The estimated parameters are reasonably commensurate with those used
to generate the pattern.
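
To put numbers on "reasonably commensurate", the relative errors can be computed from the printed output. The sketch below copies the printed estimates by hand rather than extracting them from the fitted object, and it assumes the default unit-square window, so that kappa times the mean cluster size gives the implied intensity.

```r
# True values used in the simulation and the estimates printed by kppm()
true_par <- c(kappa = 20, scale = 0.04, mu = 5)
est_par  <- c(kappa = 22.37058543, scale = 0.04168089, mu = 4.514857)

# Relative error of each estimate, as a percentage
rel_err <- (est_par - true_par) / true_par
print(round(100 * rel_err, 1))

# Implied intensity (points per unit area): kappa * mu
implied_intensity <- unname(est_par["kappa"] * est_par["mu"])
print(implied_intensity)
```

The implied intensity comes out very close to 101, against 100 for the true parameters, which suggests that the product kappa * mu is pinned down well while the split between kappa and mu is less certain.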

<SNIP>

cheers,

Rolf Turner

P.S. If your chosen parameter values (kappa = 2, mu = 2000) were 
selected in imitation of parameter estimates obtained from fitting a 
Matérn cluster model to real data, then I would suggest that you should 
probably *not* fit such a model to those data.

In modelling it is important to try fitting *appropriate* models to data 
sets.  Otherwise the results you get may well be meaningless.

R. T.

-- 
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619


