[R-sig-Geo] Point pattern analysis

Fri Feb 20 18:26:05 CET 2009

Hi Michael,
A couple of thoughts. Many of the statistical methods are geared toward
describing the pattern over a region. So methods like the K-function
will describe the global covariance function over a range of spatial
scales. Here you are looking at very local phenomena: I am standing in
Rotterdam Central with my crackberry, and I want to know what the best
Italian restaurant within a kilometer is. One issue: is Euclidean
distance a good proxy for walking distance? Nearest-neighbor searching
might better be done on Manhattan distance, or even with a quick
all-shortest-paths calculation on the street network inside the
one-kilometer circle. A nice way to think about these problems might be
in terms of graphs (as in graph theory). Rather than point process
ideas, you could look at some of the social network models.
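
A minimal sketch of the distance-metric point, in plain Python rather
than R. The coordinates are hypothetical and assumed to be metres on a
projected grid, and Manhattan distance is only a rough stand-in for
walking distance when the streets roughly follow the grid axes. It shows
that the same one-kilometer query can return a different set of places
depending on the metric:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    """City-block distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def within(origin, places, radius, dist):
    """Return (distance, name) pairs no farther than radius, nearest first."""
    hits = [(dist(origin, xy), name) for name, xy in places.items()]
    return sorted((d, name) for d, name in hits if d <= radius)

# Hypothetical restaurant coordinates (metres), user at the origin.
places = {"A": (600.0, 600.0), "B": (0.0, 900.0), "C": (-300.0, 200.0)}
user = (0.0, 0.0)

by_euclid = within(user, places, 1000.0, euclidean)
by_manhattan = within(user, places, 1000.0, manhattan)
```

Here restaurant A is inside the circle by straight-line distance but
drops out under Manhattan distance, and the order of the remaining
places changes as well.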

Also inherent in this is ranking the restaurants: in what order should
they be listed? If distance, quality, price, ... all play a part in my
search criteria, then the ordering should reflect those weightings. So a
neat application might be that I want to know (if I were younger) which
cheap Italian restaurants are within walking distance of night clubs
with live music, and to get some ranking of those pairs weighted by
distance and price. The top ten pairs would then be displayed in the
view, with lines connecting them, and the lines weighted by the pair
rankings.
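
One way the pair ranking could be sketched, in plain Python. The
function name, the field names, the equal weights and the linear scoring
rule are all assumptions made for illustration; any monotone combination
of distance and price would serve the same purpose:

```python
import math
from itertools import product

def rank_pairs(restaurants, clubs, max_walk, w_dist=0.5, w_price=0.5, top=10):
    """Score every (restaurant, club) pair within walking range.

    Lower score is better. Distance is scaled by max_walk and price by
    the most expensive restaurant, so both terms lie between 0 and 1.
    """
    max_price = max(r["price"] for r in restaurants)
    scored = []
    for r, c in product(restaurants, clubs):
        d = math.hypot(r["x"] - c["x"], r["y"] - c["y"])
        if d > max_walk:
            continue  # not within walking distance, drop the pair
        score = w_dist * (d / max_walk) + w_price * (r["price"] / max_price)
        scored.append((score, r["name"], c["name"]))
    scored.sort()
    return scored[:top]

# Hypothetical data: coordinates in metres, price in euros.
restaurants = [
    {"name": "Roma", "x": 0.0, "y": 0.0, "price": 20.0},
    {"name": "Napoli", "x": 100.0, "y": 0.0, "price": 30.0},
    {"name": "Pisa", "x": 2000.0, "y": 0.0, "price": 10.0},
]
clubs = [
    {"name": "Jazz", "x": 0.0, "y": 300.0},
    {"name": "Blues", "x": 100.0, "y": 50.0},
]
ranking = rank_pairs(restaurants, clubs, max_walk=500.0)
```

The scores could then be mapped to line widths when drawing the
connecting lines between each restaurant and club.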

So what you want seems to be more along the lines of heuristics for
picking the best in a limited query. "Statistics" might help a little if
you exploit the local nature of the queries, i.e. build your rankings
normalized by all Italian restaurants in Rotterdam, and highlight the
ones that are spatial outliers (really good). LISAs (local indicators of
spatial association) might be a good approach for this. Again, these are
just my musings, but in this context point pattern analysis may not have
that much to offer.
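
A rough sketch of the normalization idea, in plain Python. Note that
this is a plain z-score against the city-wide ratings, not a LISA
statistic (which would also need a spatial weights matrix); the ratings
and the threshold of 1.5 are made up for illustration:

```python
from statistics import mean, pstdev

def standardise(ratings):
    """Return each rating as a z-score relative to the whole group."""
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All ratings identical: nothing stands out.
        return {name: 0.0 for name in ratings}
    return {name: (r - mu) / sigma for name, r in ratings.items()}

# Hypothetical ratings for all Italian restaurants in one city.
ratings = {"A": 6.0, "B": 7.0, "C": 6.5, "D": 9.5, "E": 6.0}
z = standardise(ratings)

# Flag the restaurants that are well above the city-wide average.
standouts = sorted(name for name, score in z.items() if score > 1.5)
```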

Nicholas

-------------------------------------------------------------------------------

Dear Virgilio / Adrian,

I do have data regarding districts, cities and provinces. I could easily
ask for all the restaurants in Rotterdam or in the province. But for
Location-Based Services, users tend to be interested in local / nearby
points of interest. I'm looking for techniques that allow users to make
better decisions about the restaurants. Aggregating the data and doing
area data analysis might be useful for users who have no real idea what
place they're looking for, for example tourists who want to know which
city has the most / best restaurants.

> but if you allow them to zoom in then you probably
> want to show the individual locations of the restaurants.
>

Yeah, on a more detailed level, I want to analyse the nearby
restaurants. Let's say someone's looking for nearby Italian restaurants,
for example within a 1 kilometer radius. The user is presented with 10
restaurants. To move beyond pinpoints on a Google map, I want to analyse
these restaurants. I'm interested in showing more information about
these 10 restaurants than just their location (which is already obvious
from the pinpoints). Let's say I have another point pattern dataset
which contains the locations of ATMs. I could use the *nncross* function
in spatstat to return the nearest ATM for each of the 10 restaurants
that match my criteria. Using arrows to point to the ATMs would clearly
help users to determine whether restaurant 1 is better than restaurant
10. Any ideas / tips regarding analysing / comparing two point patterns?
So far, I've played with the nndist and nncross functions provided by
spatstat.
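
To make the nearest-ATM idea concrete, here is a brute-force sketch in
plain Python of the kind of result nncross gives (for each point in one
pattern, the distance to and the index of its nearest neighbour in the
other pattern). This is not spatstat code: the function name and the
coordinates are hypothetical, and the indices are 0-based, whereas R
counts from 1:

```python
import math

def nearest_cross(xs, ys):
    """For each point in xs, return (distance, index) of the nearest point in ys."""
    out = []
    for x in xs:
        d, j = min(
            (math.hypot(x[0] - y[0], x[1] - y[1]), j) for j, y in enumerate(ys)
        )
        out.append((d, j))
    return out

# Hypothetical coordinates in metres.
restaurants = [(0.0, 0.0), (100.0, 100.0)]
atms = [(30.0, 40.0), (100.0, 160.0), (500.0, 500.0)]

result = nearest_cross(restaurants, atms)
```

Each (distance, index) pair gives the end point and length of the arrow
to draw from a restaurant to its nearest ATM.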

> This can be more complex because then you may want to produce a map
> based on the rating, and then the rating becomes the response variable
> in your model...

Isn't this what smooth.ppp is trying to accomplish?

> Yes, you can compare the spatial distributions of different types of
> restaurants.

Could you give me an example of such a comparison? Do you mean
estimating one surface for Italian restaurants and one for Greek
restaurants, and showing them next to each other, as with split.ppp?
Or different outputs, such as tabular comparisons?

Basically, I'm restricted to the screen size of either the iPhone or the
Nokia N95 8GB, because my research involves developing for one of those
two phones, as they both utilise A-GPS.
My dataset has ratings for food / interior / service and a general
rating. What type of analysis would best suit my dataset in this case?
Four kernel density estimates, compared with each other?

> Please have a look at Part V of the e-book 'Analysing Spatial Point
> Patterns in R' (version 3) available at
> <http://www.csiro.au/resources/Spatial-Point-Patterns-in-R.html> which
> contains a detailed description of how to analyse such data in the package
> 'spatstat' using both exploratory tools and formal statistical models.
>

I've read through Part V and other sections of the e-book. I want to
utilise visualisation techniques and exploratory techniques. Modelling,
and thereby forming statistical models, goes beyond the scope of my
research. Given these limitations, are there any other papers /
techniques / R packages I should consider? My dataset is clearly a point
pattern dataset. I might be able to get some other point pattern
datasets as well. I've looked through the packages mentioned under the
section *point pattern analysis* at
http://cran.r-project.org/web/views/Spatial.html.

Thanks very much for the great input so far!

Kind regards,

Michel

2009/2/17 Virgilio Gomez Rubio <Virgilio.Gomez at uclm.es>

> Hi,
>
>
> > Thanks :) Actually, I'm busy with developing a Location-Based Service
> > (a restaurant finder to be precise) utilising SDA. The goal of my
> > research is to integrate SDA in an LBS. For this purpose, I've
> > gathered about 13,000 unique restaurants in the Netherlands and would
> > like to use 3 SDA techniques that enhance the restaurant finder either
> > visually and/or analytically. The motivation behind my research is to
> > start a discussion on how SDA can be used inside LBSs to enhance the
> > services. In this case, to enable users to make better decisions about
> > nearby restaurants. One thing that popped into my mind was to use kernel
> > density estimation and overlay it on the Google/Microsoft map to allow
> > users to easily grasp the proximity of restaurants.
>
> Perhaps it would be better if you aggregated your data and considered
> municipalities in The Netherlands. I guess that area-level maps are
> easier to understand. What I mean is that your users will find it more
> meaningful to hear that there are, say, 20 Indian restaurants in
> Nijmegen than that the intensity of Indian restaurants has a peak in
> the centre of Nijmegen. Regional maps will be helpful if you have a whole
> map of the country, but if you allow them to zoom in then you probably
> want to show the individual locations of the restaurants.
>
> >
> >         Depending on the number of different types of restaurants,
> >         you may want to estimate a different surface for each type.
> >         Basically, you may consider a multivariate point pattern, so
> >         that you estimate a different surface for each type and you
> >         compare them to see if they are similar or not. This will
> >         address the question of whether the spatial distribution of
> >         different types of restaurants is the same or not.
> >
> > This is quite interesting. Would this allow me to estimate a surface
> > for let's say Italian restaurants vs Greek restaurants? I have ratings
>
> Yes, you can compare the spatial distributions of different types of
> restaurants.
>
> >  for each restaurant. So a user might want to ask "Where can I find
> > good Italian restaurants in the South?" Where good is any rating above
> > a 7.0 for example.
>
> This can be more complex because then you may want to produce a map
> based on the rating, and then the rating becomes the response variable
> in your model...
>
> >
> >         You may also want to compute bivariate K-functions (see
> >         'k12hat' in splancs; 'Kmulti' in spatstat) to detect
> >         differences between the spatial distributions of types of
> >         restaurants. This will give you a partial answer to
> >         Question 2.
> >
> > Would this mean that a kmulti analysis should be applied for each
> > restaurant type and thus each subset I wish to test?
>
> You will need to consider each pair of restaurant types at a time.
>
> >
> >         Have you considered testing whether a certain type of
> >         restaurant tends to appear around a particular area of the
> >         city? For example, are Chinese restaurants clustered around
> >         Chinatown?
> >
> > This is something I'm looking for as well. Considering the fact that
> > I'm in the process of developing such an LBS, it would be something
> > along the lines of: A user takes out his mobile phone. He starts the
> > application and the application acquires a position fix. When
> > this is done, a user might want to know: "What type of restaurant is
> > typical for my current location or current neighbourhood?" So,
> > analysing whether a certain type of restaurant tends to appear around
> > the CURRENT area of the city. Is this possible?
>
> Yes, I guess that you can make a buffer of, say, 300 m around the user's
> location and then display your results based on the restaurants included
> in that buffer.
>
> > Overall, thanks very much for your reply. I'm really excited about
> > using these SDA techniques and am very grateful for your quick reply.
> > I'll look up a copy of the papers you mentioned and will read through
> > them as soon as I can. When I've successfully analysed the dataset
> > with some SDA techniques I can begin the process of constructing the
> > appropriate architecture for the LBS. I'll definitely keep you guys
> > posted if you're interested.
>
> That would be good. And if you get free vouchers let us know as well!! :)
>
> Best,
>
> Virgilio
>
>
