The hot spot data, or the Moderate Resolution Imaging
Spectroradiometer (MODIS) hot spot data, is usually taken from satellite
for burning fire detection. An example of this kind of data is the *Himawari-8
Wild Fire product* provided by the Japan Aerospace Exploration
Agency (JAXA).

This vignette shows you how to cluster satellite hot spots, detect
ignition points and reconstruct fire movement using the
`spotoroo`

package.

Load the `spotoroo`

package using the
`library()`

function.

In order to use the spatiotemporal clustering algorithm provided by
the `spotoroo`

package, the satellite hot spot data needs to
be stored in a `list`

which have at least three fields: the
**observed time**, the **longitude** and the
**latitude**.

The **observed time** needs to be an object inherited
from one of these classes:

`Date`

`POSIXlt`

`POSIXct`

`numeric`

Besides, the **longitude** and the
**latitude** needs to be in `numeric`

.

Using the built-in dataset `hotspots`

as an example, it is
a `data.frame`

with a `lon`

(**longitude**) `numeric`

column, a
`lat`

(**latitude**) `numeric`

column
and a `obsTime`

(**observed time**)
`POSIXct`

column.

```
str(hotspots)
#> 'data.frame': 1070 obs. of 3 variables:
#> $ lon : num 147 146 143 149 142 ...
#> $ lat : num -37.5 -37.9 -37.8 -37.4 -37.1 ...
#> $ obsTime: POSIXct, format: "2020-02-01 05:20:00" "2020-01-02 06:30:00" ...
```

This dataset contains 1070 selected hot spots in Victoria (Australia)
during the 2019-2020 Australian bushfire season. More details about this
dataset can be found by using the function
`help(hotspots)`

A common way to get a better understanding of the hot spot data is to
visualize it. This package provides a function
`plot_vic_map()`

(package `sf`

needs to be
installed) to draw a map of Victoria. It returns a `ggplot`

object, so new layers could be added onto it. Here we use
`geom_point()`

to draw red dots for hot spots.

From the map, we observe that there are approximately \(4\) clusters, but we don’t know if they can be further broken down into more clusters. Our goal is to cluster these hot spots into fires in a temporal and spatial manner.

```
library(ggplot2)
if (requireNamespace("sf", quietly = TRUE)) {
plot_vic_map() +
geom_point(data = hotspots, aes(lon, lat), col = "red")
}
```

If you would like to visualize the hot spots in other areas, you
might be able to find the spatial data from the
`rnaturalearth`

package and make a similar map using the
`geom_sf()`

and `ggthemes::theme_map()`

function.
An example is given below.

`hotspot_cluster()`

is the main function of this
package. In most of the cases, you only need to use this function to
perform the spatiotemporal clustering algorithm.

This function has \(11\) arguments, which can be divided into four categories:

Arguments | Description |
---|---|

`hotspots` |
the object that contains the dataset |

`lon` |
the name of the longitude column |

`lat` |
the name of the latitude column |

`obsTime` |
the name of the observed time column |

The first four arguments are pretty straight forward, you only need to provide the dataset object and its corresponding column names.

Arguments | Description |
---|---|

`activeTime` |
the time tolerance |

`adjDist` |
the distance tolerance |

`minPts` |
the minimum number of hot spots |

`minTime` |
the minimum length of time |

These four arguments control the clustering process.

`activeTime`

could be interpreted as**the time a fire can stay smouldering but undetectable by satellite before flaring up again**. For example, if`activeTime`

\(= 24\), then the time tolerance is \(24\) time indexes.`adjDist`

could be interpreted as**the maximum intra-cluster distance between a hot spot and its nearest hot spot**. For example, if`adjDist`

\(= 3\), then the distance tolerance is \(3\) km.*However, in some very special cases, the intra-cluster distance between a hot spot and its nearest hot spot will exceed this threshold*. You can learn more about this parameter and the algorithm by using the`help(hotspot_cluster)`

function.`minPts`

is**the minimum number of hot spots in a cluster**. For example, if`minPts`

is \(4\), then any cluster with less than \(4\) hot spots will be treated as noise.`minTime`

is**the minimum length of time of a cluster**. For example, if`minTime`

is \(3\), then any cluster lasts shorter than \(3\) time indexes will be treated as noise.

In practice, we usually don’t have knowledge about the parameters
`activeTime`

and `adjDist`

, but in general, the
number of clusters will decrease when you increase these two
parameters.

Comparing the clustering results under different settings is an available method to determine these two parameters. We will show how to do this in the section Additional topic: Choice of parameters.

In terms of `minPts`

and `minTime`

, they depend
on your personal preference of noise reduction. And, the number of
clusters will often decrease when you increase these two parameters.

A sensible choice of these two parameters is `minPts`

\(\in [3,10]\) and `minTime`

\(\in [1~hour, 12~hours]\)

Arguments | Description |
---|---|

`ignitionCenter` |
method of the calculation of the ignition points |

For a cluster, if `ignitionCenter`

is “mean”, the centroid
of the earliest hot spots will be used as the ignition point, if
`ignitionCenter`

is “median”, the median longitude and the
median latitude of these hot spots will be used as the ignition
point.

In usual, there is no significant difference between these two
methods, so we recommend to set
`ignitionCenter = "mean"`

.

Arguments | Description |
---|---|

`timeUnit` |
the unit of time |

`timeStep` |
the number of time unit one time index contains |

Due to the design of the algorithm, the observed time needs to be
transformed to discrete time index. Available time units are “d” (days),
“h” (hours), “m” (minutes), “s” (seconds) and “n” (numeric).
`timeUnit = "n"`

will only be accepted when the observed time
is already a `numeric`

vector.

For example, if `timeUnit`

is “h” and
`timeStep`

is \(2\), then
the difference between time index \(1\)
and time index \(2\) is \(2\) hours.

With specifications of all the arguments, the
`hotspot_cluster()`

can be used as below. Generally, you need
to use an object to catch the return.

```
result <- hotspot_cluster(hotspots = hotspots,
lon = "lon",
lat = "lat",
obsTime = "obsTime",
activeTime = 24,
adjDist = 3000,
minPts = 4,
minTime = 3,
ignitionCenter = "mean",
timeUnit = "h",
timeStep = 1)
#>
#> ──────────────────────────────── SPOTOROO 0.1.4 ────────────────────────────────
#>
#> ── Calling Core Function : `hotspot_cluster()` ──
#>
#> ── "1" time index = 1 hour
#> ✔ Transform observed time → time indexes
#> ℹ 970 time indexes found
#>
#> ── activeTime = 24 time indexes | adjDist = 3000 meters
#> ✔ Cluster
#> ℹ 16 clusters found (including noise)
#>
#> ── minPts = 4 hot spots | minTime = 3 time indexes
#> ✔ Handle noise
#> ℹ 6 clusters left
#> ℹ noise proportion : 0.935 %
#>
#> ── ignitionCenter = "mean"
#> ✔ Compute ignition points for clusters
#> ℹ average hot spots : 176.7
#> ℹ average duration : 131.9 hours
#>
#> ── Time taken = 0 mins 1 sec for 1070 hot spots
#> ℹ 0.001 secs per hot spot
#>
#> ────────────────────────────────────────────────────────────────────────────────
```

Messages produced by this function tell you some important
information of the clustering results such as the number of discrete
time indexes, the number of clusters and the proportion of noise. If you
would like to silent this function, you could wrap this function with
the `suppressMessages()`

function like the example given
below.

The `hotspot_cluster()`

function returns a
`spotoroo`

object, which is actually a `list`

contains a `data.frame`

called `hotpsots`

, a
`data.frame`

called `ignition`

and a
`list`

called `setting`

.

If you evaluate the `result`

or print it, you will get a
concise description of the clustering results. Here, according to the
output, we know there are \(6\)
clusters in the clustering results.

You can access these two `data.frame`

s in the usual
way.

```
head(result$hotspots, 2)
#> lon lat obsTime timeID membership noise distToIgnition
#> 1 147.46 -37.46000 2020-02-01 05:20:00 809 -1 TRUE 0
#> 2 146.48 -37.93999 2020-01-02 06:30:00 90 -1 TRUE 0
#> distToIgnitionUnit timeFromIgnition timeFromIgnitionUnit
#> 1 m 0 hours h
#> 2 m 0 hours h
head(result$ignition, 2)
#> membership lon lat obsTime timeID obsInCluster
#> 1 1 149.30 -37.77 2019-12-29 13:10:00 1 146
#> 2 2 146.72 -36.84 2020-01-08 01:40:00 229 165
#> clusterTimeLen clusterTimeLenUnit
#> 1 116.1667 hours h
#> 2 148.3333 hours h
```

The `hotspots`

dataset contains information of each hot
spot. Particularly, the `membership`

column is the membership
label column. \(-1\) represents
noise.

The `ignition`

dataset contains information of each
cluster. Similarly, the `membership`

column is the membership
label column. And the `lon`

and `lat`

are the
coordinate information of the ignition points.

If you would like to extract a subset of clusters from the results or
merge the `hotspots`

and `ignition`

dataset, you
could use the function `extract_fire()`

.

You could choose to extract all clusters along with noise points by
setting `cluster = "all"`

and `noise = TRUE`

. This
will merge the `hotspots`

and `ignition`

dataset.

```
# Merge the `hotspots` and `ignition` dataset
merged_result <- extract_fire(result, cluster = "all", noise = TRUE)
```

You could also only extract a subset of clusters without any noise by
providing a vector of membership labels to the argument
`cluster`

and set `noise = FALSE`

. This will merge
the `hotspots`

and `ignitoin`

dataset but
filtering out the noise points and selecting needed clusters.

In principal, the parameters `activeTime`

and
`adjDist`

are determined using the professional knowledge of
fire behaviour, but in practice, we generally don’t know much about
them.

In the rest of the section, we will show one of the methods to choose proper values for these two parameters.

We first set `minPts = 4`

and `minTime = 3`

.
You could set different values for `minPts`

and
`minTime`

if you like.

We then could do a grid search for `activeTime`

and
`adjDist`

in a sensible range. Here, we set
`adjDist`

\(\in\)
`[500,1000,1500,2000,2500,3000,3500,4000]`

and
`activeTime`

\(\in\)
`[6,12,18,24,30,36,42,48]`

. For each pair of
`activeTime`

and `adjDist`

, we record the
proportion of noise as the metric for comparison.

The following code does the calculation. It may takes around 10 minutes to run. You could have a try if you like.

```
# NOT RUN
# NOTICE: MAY TAKE AROUND 10 MINS TO RUN THIS CODE BLOCK
noise_prop <- c()
for (adjDist in seq(500, 4000, 500)) {
for (activeTime in seq(6, 48, 6)) {
result <- suppressMessages(hotspot_cluster(hotspots = hotspots,
lon = "lon",
lat = "lat",
obsTime = "obsTime",
activeTime = activeTime,
adjDist = adjDist,
minPts = 4,
minTime = 3,
ignitionCenter = "mean",
timeUnit = "h",
timeStep = 1))
noise_prop <- c(noise_prop, mean(result$hotspots$noise))
}
}
tab <- expand.grid(activeTime = seq(6, 48, 6),
adjDist = seq(500, 4000, 500))
tab$noise_prop <- noise_prop
```

With the proportion of noise, we could make two line plots to reveal
the relationships between proportion of noise, `adjDist`

and
`activeTime`

.

It works like the scree plot used in principal component analysis. We want to keep clusters separate without introducing too much noise.

In the first plot, most of the significant drops of proportion of
noise are observed when `adjDist`

less than 2500 metres.
Therefore, `adjDist = 2500`

is a reasonable choice.

```
ggplot(tab) +
geom_line(aes(adjDist, noise_prop, color = as.factor(activeTime))) +
ylab("Noise Propotion") +
labs(col = "activeTime") +
theme_minimal() +
scale_x_continuous(breaks = seq(500, 4000, 500))
```

In the second plot, most of the significant drops of proportion of
noise are observed when `activeTime`

less than 24 hours.
Therefore, `activeTime = 24`

is a reasonable choice.

The package provides some useful functions to explore the clustering results.

You could make a brief summary of the clustering results.

```
summary_spotoroo(result)
#>
#> ──────────────────────────────── SPOTOROO 0.1.4 ────────────────────────────────
#>
#> ── Calling Core Function : `summary_spotoroo()` ──
#>
#> CLUSTERS: ALL
#> OBSERVATIONS: 1070
#> FROM: 2019-12-29 13:10:00
#> TO: 2020-02-07 22:50:00
#>
#> ── Clusters
#> ℹ Number of clusters: 6
#>
#> Observations in cluster
#> Min. 1st Qu. Mean 3rd Qu. Max.
#> 111.0 131.0 176.7 233.2 256.0
#> Duration of cluster (hours)
#> Min. 1st Qu. Mean 3rd Qu. Max.
#> 111.2 118.2 131.9 146.1 148.3
#>
#> ── Hot spots (excluding noise)
#> ℹ Number of hot spots: 1060
#>
#> Distance to ignition points (m)
#> Min. 1st Qu. Mean 3rd Qu. Max.
#> 0.0 2840.3 5058.2 6981.6 13452.7
#> Time from ignition (hours)
#> Min. 1st Qu. Mean 3rd Qu. Max.
#> 0.0 25.2 62.5 98.2 148.3
#>
#> ── Noise
#> ℹ Number of noise points: 10 (0.93 %)
#>
#> ────────────────────────────────────────────────────────────────────────────────
```

Or make a brief summary of a subset of clusters by providing a vector
of membership labels to the `cluster`

argument.

You could produce a plot of the clustering results. There are three types of plots, which are “def” (default), “mov” (fire movement) and “timeline” (timeline).

The fire movement is calculated from the `get_fire_mov()`

function.

If you have a background `ggplot`

object, you can let the
function plots onto it.

```
if (requireNamespace("sf", quietly = TRUE)) {
plot_spotoroo(result, type = "mov", bg = plot_vic_map(), step = 6)
}
```

More details about the usage of this function can be found by using
the `help(plot_spotoroo)`

function.