stopdetection-vignette

Importance of stop detection

This vignette shows how the stopdetection package can be used to segment timestamped trajectories into sequences of stops and tracks (alternatively known as stay points and trajectories).

Living beings tend to have movement behavior that intersperses time spent in motion with time spent at rest. Often it’s interesting to distinguish between the two states of “moving” and “stopped,” because we want to investigate only one or the other, because we want to aggregate over a state, or because we want to calculate statistics for each state independently.

Humans tend to be quite good at subjectively distinguishing between states. You can probably list the different places you’ve been today without much difficulty, for example, because you are able to combine the aspects of space, time and motive. Raw location data, on the other hand, must be in some way analyzed in order to differentiate between these states. This package uses a very simple algorithm to cluster trajectories.

Stop detection parameters

Distance (\(\theta_{D}\))

We can think of a stop as a place where our location didn’t change very much for a certain length of time. How much and how long depend very much on the goals of the researcher. While the same concepts apply across varied disciplines, this vignette is written from the context of human mobility studies on transportation. If a person is at home for many hours, their location may differ slightly from hour to hour, but their subjective place, home, has not changed. Because a person’s location may change while the stop of interest remains the same, we arrive at the first necessary parameter in our algorithm, \(\theta_{D}\). This reflects the distance (in meters) that someone may range from a stop point before they are considered to be no longer at that stop.
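To make the role of \(\theta_{D}\) concrete, here is a minimal sketch of the distance check in base R. The helper `haversine_m()` and the variable names are our own illustration, not part of the stopdetection API, and the package may compute distances differently. The coordinates are taken from the example data shown later in this vignette.

```r
# Great-circle (haversine) distance in meters between WGS84 points.
# Uses a mean Earth radius, so the result is an approximation.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

theta_d <- 200  # meters (hypothetical choice for illustration)

# Initiating location of a candidate stop and two later locations
origin_lat <- 52.07211
origin_lon <- 5.123721
later_lat  <- c(52.07212, 52.07788)
later_lon  <- c(5.123699, 5.122714)

d <- haversine_m(origin_lat, origin_lon, later_lat, later_lon)

# TRUE for the first location (a few meters away),
# FALSE for the second (several hundred meters away)
within_stop <- d <= theta_d
within_stop
```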

The smaller \(\theta_{D}\) is, the more closely locations must be grouped together to be called a stop. Below is an example of stop clustering under a \(\theta_{D}\) of 100 (red), 200 (blue) or 500 (green) meters.

Time (\(\theta_{T}\))

Now consider a person who is walking from their home to the grocery store. They are in the “moving” state, but come to a stop light where they must wait to cross the street. During this time, their location does not change, or changes very little, but in our context, we would not consider this person to be at a “stop.” This demonstrates the need for the second parameter in the algorithm, \(\theta_{T}\). We use it to place a lower limit on the length of time that someone must stay within \(\theta_{D}\) meters of an initiating location before that location will be considered a stop.
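The interplay of the two parameters can be sketched in a few lines of base R. This is a simplified illustration under our own assumptions, not the package’s implementation: the functions `planar_m()` and `label_stops()` are hypothetical, time is given in seconds, and the toy trajectory is invented to mimic the walk described above (a stay at home, a short wait at a stop light, and a stay at the store).

```r
# Approximate distance in meters (equirectangular projection);
# adequate for the short distances involved here.
planar_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  x <- (lon2 - lon1) * to_rad * cos((lat1 + lat2) / 2 * to_rad)
  y <- (lat2 - lat1) * to_rad
  radius * sqrt(x^2 + y^2)
}

# Label each location as "stopped" or "moving".
# A run of locations becomes a stop when every location stays within
# theta_d meters of the initiating location for at least theta_t seconds.
label_stops <- function(lat, lon, time_s, theta_d, theta_t) {
  n <- length(lat)
  state <- rep("moving", n)
  i <- 1
  while (i <= n) {
    j <- i
    # Extend the run while the next location is still close to location i
    while (j < n &&
           planar_m(lat[i], lon[i], lat[j + 1], lon[j + 1]) <= theta_d) {
      j <- j + 1
    }
    if (time_s[j] - time_s[i] >= theta_t) {
      state[i:j] <- "stopped"
      i <- j + 1          # continue after the stop
    } else {
      i <- i + 1          # too short: try the next initiating location
    }
  }
  state
}

# Toy trajectory: home (4 locations), walking, a 60 s wait at a light,
# more walking, then the store (4 locations)
lat <- c(rep(52.0720, 4), 52.0750, 52.0780, 52.0810, 52.0810, 52.0840,
         rep(52.0870, 4))
lon <- rep(5.1237, length(lat))
time_s <- c(0, 120, 240, 360, 420, 480, 540, 600, 660, 720, 840, 960, 1080)

state <- label_stops(lat, lon, time_s, theta_d = 200, theta_t = 300)
data.frame(time_s, lat, state)
```

With these settings the wait at the light stays in the “moving” state because it lasts less than \(\theta_{T}\), while the stays at home and at the store are labelled as stops.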

Using stopdetection

Data

Timestamped trajectories of latitude and longitude coordinates can come from many sources, but the stop detection algorithm in this package has been built to function best on streams of location data coming from individual persons.

The following data set contains daily movement behavior for one person over a span of two weeks. It contains WGS84 latitude and longitude coordinates collected every few minutes.

data("loc_data_2019")

Importantly, this package works with the library data.table. This has been done in order to improve performance when working with large data sets, or when calling the functions in this package repeatedly, as one might within a simulation study.

It’s easy to go from a data.frame object to a data.table object using setDT().

library(data.table)
setDT(loc_data_2019)
loc_data_2019
#>        latitude longitude           timestamp
#>           <num>     <num>              <POSc>
#>     1: 52.07211  5.123721 2019-11-01 00:02:46
#>     2: 52.07211  5.123721 2019-11-01 00:04:59
#>     3: 52.07211  5.123721 2019-11-01 00:08:02
#>     4: 52.07211  5.123721 2019-11-01 00:10:15
#>     5: 52.07211  5.123721 2019-11-01 00:13:07
#>    ---                                       
#> 21907: 52.07212  5.123699 2019-11-14 23:48:23
#> 21908: 52.07212  5.123699 2019-11-14 23:50:28
#> 21909: 52.07212  5.123699 2019-11-14 23:53:23
#> 21910: 52.07212  5.123699 2019-11-14 23:56:23
#> 21911: 52.07212  5.123699 2019-11-14 23:59:23
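If the original data.frame should be left untouched, data.table also offers as.data.table(), which returns a converted copy instead of converting in place. The short sketch below contrasts the two on a small, hand-made data.frame; the object names and the UTC time zone are assumptions made for illustration.

```r
library(data.table)

# Small stand-in for a location data.frame
df <- data.frame(
  latitude  = c(52.07211, 52.07211),
  longitude = c(5.123721, 5.123721),
  timestamp = as.POSIXct(c("2019-11-01 00:02:46", "2019-11-01 00:04:59"),
                         tz = "UTC")
)

# as.data.table() returns a new data.table; df is still a plain data.frame
dt_copy <- as.data.table(df)
is.data.table(df)

# setDT() converts df itself by reference, without copying the data
setDT(df)
is.data.table(df)
```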

stopFinder

Initial stop clusters can be identified using stopFinder(). This requires the named parameters thetaD and thetaT, as described above. The data.table supplied as the first argument will be modified by reference, and columns will be added to it, including state and state_id.

stopFinder(loc_data_2019, thetaD = 200, thetaT = 300)[]
#>        latitude longitude           timestamp stop_initiation_idx  timedif
#>           <num>     <num>              <POSc>               <int>    <num>
#>     1: 52.07211  5.123721 2019-11-01 00:02:46                   1  66.5265
#>     2: 52.07211  5.123721 2019-11-01 00:04:59                   1 158.2015
#>     3: 52.07211  5.123721 2019-11-01 00:08:02                   1 157.9520
#>     4: 52.07211  5.123721 2019-11-01 00:10:15                   1 152.5980
#>     5: 52.07211  5.123721 2019-11-01 00:13:07                   1 163.8395
#>    ---                                                                    
#> 21907: 52.07212  5.123699 2019-11-14 23:48:23               21707 138.6700
#> 21908: 52.07212  5.123699 2019-11-14 23:50:28               21707 150.0025
#> 21909: 52.07212  5.123699 2019-11-14 23:53:23               21707 177.5640
#> 21910: 52.07212  5.123699 2019-11-14 23:56:23               21707 180.0490
#> 21911: 52.07212  5.123699 2019-11-14 23:59:23               21707  90.0665
#>        state_id   state
#>           <int>  <char>
#>     1:        1 stopped
#>     2:        1 stopped
#>     3:        1 stopped
#>     4:        1 stopped
#>     5:        1 stopped
#>    ---                 
#> 21907:      285 stopped
#> 21908:      285 stopped
#> 21909:      285 stopped
#> 21910:      285 stopped
#> 21911:      285 stopped
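Because the table is modified by reference, it can be worth taking an explicit copy with data.table’s copy() before labelling if the unmodified data are still needed. The sketch below uses a small hand-made table with state and state_id columns as a stand-in for labelled output; the values are invented, and the same by-reference behavior applies to any data.table.

```r
library(data.table)

# Toy stand-in for a table that already carries state labels
labelled <- data.table(
  timestamp = as.POSIXct("2019-11-01 00:00:00", tz = "UTC") + 60 * (0:5),
  state_id  = c(1L, 1L, 1L, 2L, 2L, 3L),
  state     = c("stopped", "stopped", "stopped", "moving", "moving", "stopped")
)

# copy() makes an independent table; a plain assignment such as
# x <- labelled would only create a second name for the same data
original <- copy(labelled)

# Adding a column by reference changes `labelled` but not `original`
labelled[, checked := TRUE]
"checked" %in% names(original)

# Number of locations in each state event
per_state <- labelled[, .N, by = .(state_id, state)]
per_state
```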

returnStateEvents

Once the initial stops have been generated, it is possible to use the returnStateEvents() function to extract a data.table containing one row per event. Both stop and move events are annotated with the state and state_id, the begin_time and end_time, and the number of locations belonging to the state. For move states, the raw distance traveled is included (the sum of all distances between points). For stop states, the mean latitude and longitude coordinates are included.

events <- returnStateEvents(loc_data_2019)
events[]
#>      state_id   state  meanlat  meanlon          begin_time            end_time
#>         <int>  <char>    <num>    <num>              <POSc>              <POSc>
#>   1:        1 stopped 52.07212 5.123761 2019-11-01 00:02:46 2019-11-01 08:05:39
#>   2:        2  moving       NA       NA 2019-11-01 08:05:55 2019-11-01 08:06:27
#>   3:        3 stopped 52.07788 5.122714 2019-11-01 08:06:42 2019-11-01 08:11:29
#>   4:        4  moving       NA       NA 2019-11-01 08:12:00 2019-11-01 08:15:24
#>   5:        5 stopped 52.08902 5.109717 2019-11-01 08:15:40 2019-11-01 08:24:10
#>  ---                                                                           
#> 281:      281 stopped 52.07211 5.123767 2019-11-14 16:45:02 2019-11-14 19:02:28
#> 282:      282  moving       NA       NA 2019-11-14 19:02:43 2019-11-14 19:11:46
#> 283:      283 stopped 52.08177 5.138043 2019-11-14 19:12:02 2019-11-14 19:57:11
#> 284:      284  moving       NA       NA 2019-11-14 19:57:40 2019-11-14 20:08:32
#> 285:      285 stopped 52.07213 5.123719 2019-11-14 20:08:47 2019-11-14 23:59:23
#>      raw_travel_dist stop_id move_id n_locations
#>                <num>   <int>   <int>       <int>
#>   1:              NA       1      NA         471
#>   2:        158.2833      NA       1           2
#>   3:              NA       2      NA          21
#>   4:       1253.8918      NA       2          13
#>   5:              NA       3      NA          36
#>  ---                                            
#> 281:              NA     180      NA         115
#> 282:       2171.3438      NA     102          33
#> 283:              NA     181      NA          65
#> 284:       1911.7921      NA     103          38
#> 285:              NA     182      NA         205
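Since the events table is an ordinary data.table, event durations and per-state summaries follow directly from begin_time and end_time. The sketch below rebuilds the first five rows of the output above by hand so that it runs on its own; the object name `events_demo` and the UTC time zone are assumptions.

```r
library(data.table)

# First five events from the output above, re-entered by hand
events_demo <- data.table(
  state_id   = 1:5,
  state      = c("stopped", "moving", "stopped", "moving", "stopped"),
  begin_time = as.POSIXct(c("2019-11-01 00:02:46", "2019-11-01 08:05:55",
                            "2019-11-01 08:06:42", "2019-11-01 08:12:00",
                            "2019-11-01 08:15:40"), tz = "UTC"),
  end_time   = as.POSIXct(c("2019-11-01 08:05:39", "2019-11-01 08:06:27",
                            "2019-11-01 08:11:29", "2019-11-01 08:15:24",
                            "2019-11-01 08:24:10"), tz = "UTC")
)

# Duration of every event in minutes
events_demo[, duration_min := as.numeric(difftime(end_time, begin_time,
                                                  units = "mins"))]

# Keep only the stops, or summarise time spent per state
stops_only     <- events_demo[state == "stopped"]
total_by_state <- events_demo[, .(total_min = sum(duration_min)), by = state]
total_by_state
```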

Merging close stops

It may be useful to merge successive stops whose locations lie close together. Consider the situation in which multiple stops have been identified within a single building, all with the same semantic meaning. Merging is controlled by another distance parameter, which reflects how far apart the centroids of two stops may be while still being merged. This doesn’t have to be the same as the distance parameter set during the stop detection algorithm.
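The idea can be sketched in base R as follows. This is an illustration under our own assumptions rather than the package’s merging routine: each stop centroid is compared only with the centroid of the stop directly before it, the helper `planar_m()` and the name `theta_merge` are hypothetical, and the centroid values are illustrative.

```r
# Approximate distance in meters (equirectangular projection)
planar_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  x <- (lon2 - lon1) * to_rad * cos((lat1 + lat2) / 2 * to_rad)
  y <- (lat2 - lat1) * to_rad
  radius * sqrt(x^2 + y^2)
}

# Stop centroids in chronological order (illustrative values)
centroids <- data.frame(
  stop_id = 1:4,
  meanlat = c(52.07212, 52.07211, 52.07788, 52.08902),
  meanlon = c(5.123761, 5.123767, 5.122714, 5.109717)
)

theta_merge <- 200  # meters; need not equal the detection distance

# Distance from each centroid to the one before it
n <- nrow(centroids)
gap_m <- planar_m(centroids$meanlat[-n], centroids$meanlon[-n],
                  centroids$meanlat[-1], centroids$meanlon[-1])

# Start a new merged stop whenever the gap exceeds theta_merge
centroids$merged_id <- cumsum(c(TRUE, gap_m > theta_merge))
centroids
```

Here the first two stops share a merged_id because their centroids are only about a meter apart, while the remaining stops keep ids of their own. A fuller implementation would also recompute the centroid of a merged stop before comparing it with the next one.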

Merging or excluding short tracks and errors

Often tracks consisting of only one point, lasting for only a few seconds, or covering very little distance are actually errors, rather than tracks, and interrupt what would otherwise be a single contiguous stop. These sets of locations can be handled either by merging them with a stop, or by excluding them. Short tracks may be excluded on the basis of time, using the max_time parameter, distance (in meters) using the max_dist parameter, or total number of locations involved, using the max_locs parameter.
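As a rough sketch of how such thresholds might be applied, the base R snippet below flags short tracks in a small table of move events. It rests on several assumptions that the text above does not confirm: max_time is taken to be in seconds, the comparisons are inclusive, and a track is flagged only when it falls under all three thresholds at once. The durations, distances and location counts are taken from the move events in the returnStateEvents output earlier in this vignette.

```r
# Move events re-entered by hand; duration_s is end_time minus begin_time
moves <- data.frame(
  move_id         = c(1L, 2L, 102L, 103L),
  duration_s      = c(32, 204, 543, 652),
  raw_travel_dist = c(158.2833, 1253.8918, 2171.3438, 1911.7921),
  n_locations     = c(2L, 13L, 33L, 38L)
)

max_time <- 600   # assumed to be seconds
max_dist <- 2000  # meters
max_locs <- 20    # number of locations

# Assumption: a track counts as "short" only if it is under every threshold
moves$short_track <- with(
  moves,
  duration_s <= max_time &
    raw_travel_dist <= max_dist &
    n_locations <= max_locs
)
moves
```

Under these assumptions the first two tracks are flagged and the last two are kept, since each of the latter exceeds at least one threshold.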

Most often, these steps will be carried out together, as removing or merging short tracks will tend to leave two consecutive stops with very close centroids.

mergingCycle(loc_data_2019,
             thetaD = 200,
             small_track_action = "exclude",
             max_time = 600,
             max_dist = 2000,
             max_locs = 20)
returnStateEvents(loc_data_2019)[]
#>      state_id    state  meanlat  meanlon          begin_time
#>         <int>   <char>    <num>    <num>              <POSc>
#>   1:        1  stopped 52.07212 5.123760 2019-11-01 00:02:46
#>   2:       NA excluded       NA       NA 2019-11-01 08:05:55
#>   3:       NA excluded       NA       NA 2019-11-01 08:05:55
#>   4:        2  stopped 52.07798 5.122445 2019-11-01 08:06:42
#>   5:        3  stopped 52.08889 5.109546 2019-11-01 08:15:40
#>  ---                                                        
#> 196:      194  stopped 52.07211 5.123737 2019-11-13 18:20:59
#> 197:      195   moving       NA       NA 2019-11-14 19:02:43
#> 198:      196  stopped 52.08177 5.138043 2019-11-14 19:12:02
#> 199:      197   moving       NA       NA 2019-11-14 19:57:40
#> 200:      198  stopped 52.07213 5.123719 2019-11-14 20:08:47
#>                 end_time raw_travel_dist stop_id move_id n_locations
#>                   <POSc>           <num>   <int>   <int>       <int>
#>   1: 2019-11-01 08:05:39              NA       1      NA         471
#>   2: 2019-11-14 16:44:47              NA      NA      NA         255
#>   3: 2019-11-14 16:44:47              NA      NA      27         255
#>   4: 2019-11-01 08:11:29              NA       2      NA          21
#>   5: 2019-11-01 08:24:10              NA       3      NA          36
#>  ---                                                                
#> 196: 2019-11-14 19:02:28              NA     142      NA        1300
#> 197: 2019-11-14 19:11:46        2171.344      NA      53          33
#> 198: 2019-11-14 19:57:11              NA     143      NA          65
#> 199: 2019-11-14 20:08:32        1911.792      NA      54          38
#> 200: 2019-11-14 23:59:23              NA     144      NA         205