[R-sig-Geo] Point Interpolation and Analysis
Roger Bivand
Roger@B|v@nd @end|ng |rom nhh@no
Tue Mar 15 12:14:56 CET 2022
On Tue, 15 Mar 2022, sownal chand wrote:
> Hello Sir/madam,
>
> I have been trying to interpolate this data and make an animation by
> following the codes given by Devan using Dr Timo's Dialect analysis in
> Germany,
> https://github.com/devanmcg/IntroRangeR/blob/master/12_RasGIS/Interpolation.R
> .
> Is there a way in which these code could be used for Mapping in
> Pacific Region. Below is the edited version with errors
Please never use any code just "found" on the internet. This code seems to
be part of a course written partly in 2018, here from 2020. Lots of
packages have changed in the years that have passed, especially with
regard to coordinate reference systems and their representation. Never use
a package (pacman) that does things you do not control (here it will
install lots of tidyverse packages that are not needed for the essential
sf package to run).
Do not use pipes %>% unless you are sure that all of the steps are
correct; take everything in the smallest possible steps and check the
intermediate output after each step. Build from simple steps, steps that
you know how to explain to yourself and (if you also teach) to students or
to your research collaborators. In that way you base your work not on
possibly outdated code found somewhere, but on your own understanding. Try
to avoid overwriting objects - if you use pipes and overwrite the input
object, it is really hard to find out where things went wrong.
One remark - if the code you are looking at is tested often, like the
examples on the help pages of functions, methods and datasets in CRAN
packages, you are on safer ground. If the code has never been re-run even
on its own data, new versions of R and the packages used may change the
results - this applies in particular to tidyverse packages, which are
often updated.
For coordinate reference systems, avoid all beginning with "+proj= ...",
because these Proj.4 strings are no longer reliable if you need accuracy
better than about 120m. You will see that the course materials you refer
to have not been updated to reflect the need (in most cases) to use
coordinate reference systems with known authority. Your use of "EPSG:3832"
uses the authority of EPSG to assert the specification of the CRS; it also
means that the datum transformation (say from or to the common
geographical CRS - WGS84 - EPSG:4326) giving 2m accuracy is also asserted
by an authority. Using "+proj= ..." leaves the user unsure about the
accuracy of any transformation (typically known as ballpark accuracy).
For visualisation, please avoid using ggplot2 unless you use this package
often (daily). For thematic mapping, tmap and mapsf are to be preferred,
because they are written for making maps. Do not simplify/generalise
coastlines unless you really need to do so. You can use tmap and mapview
to view thematic maps interactively - as you zoom in, you see artefacts
created by line generalization.
My guess is that your aim is to visualise 13 meteorological station annual
average temperatures 1965-2018 after interpolation by year across the
study area. I'm unsure whether you should be interpolating between islands
across the ocean, and 13 stations (I assume that annual temperatures are
transformed to sea level) is a very small number for interpolation. I'm
also uncertain whether the input data are reliable (the values for
Koronivia and Labasa seem to be identical for all years). If the data set
is just to learn from, OK, but probably drawing conclusions from any
results is not very secure.
Hope this clarifies,
Roger
>
> #packages
> if (!require("pacman")) install.packages("pacman")
> pacman::p_load(tidyverse, sf, kknn)
> #
> # Create a shorthand reference to Fiji Islands CRS
> crs_etrs = "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m
> +no_defs"
> #
> # Fiji geo data
> # download
> #Draft map of Fiji
> world <- rnaturalearth::ne_countries(scale = "Large", returnclass = "sf")
>
> Fiji <- ggplot(data=world) +
> geom_sf() +
> coord_sf(
> crs = 3832, # https://epsg.io/3832
> xlim = c(2984265.06, 3539584.72), # limits are taken from projected
> bounds
> ylim = c(-2224162.41, -1811688.88) # of EPSG:3832
> )+ theme_bw()
>
> #Extract Fiji Boundary data
> Fiji <- FJI %>%
> filter(iso_a3 == "FJI")
> plot(Fiji)
>
>
>
> Fiji <- Fiji %>%
> group_by(iso_a3) %>%
> summarize()
> # simplify
> Fiji %>%
> st_simplify(preserveTopology = T, dTolerance = 0.01)
>
> #load point data from the data file stored
> data_53YearTemp <- read.csv("C://Users/Sownal/Documents/53YearsTemp.csv")
> View(data_53YearTemp)
>
> data$long <- as.numeric(data$Longitude)
> data$lat <- as.numeric(data$Latitude)
>
> #remove NA valuues in the spatial Data Frame
> data_53YearTemp <- na.omit(data_53YearTemp)
>
> #convert 53YearTemp to a Dataframe
> #spatial points dataframe
> data.sp1 <- SpatialPointsDataFrame(data_53YearTemp[,c(3,4)],
> data_53YearTemp[,-c(3,4)])
> View(data.sp1)
> #point data to sf
> data.sp1 %>%
> st_as_sf(coords = c("lng", "lat"),
> crs = "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84
> +units=m +no_defs ")
>
>
> #reproject data to the specificed crs or epsg country code
> # CRS: ETRS
> etrs <- "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m
> +no_defs "
> Fiji %>%
> st_transform(etrs)
> data.sp_train %>%
> st_transform(etrs)
>
> #remove NA valuues in the spatial Data Frame
> data_53YearTemp <- na.omit(data_53YearTemp)
>
> #cliping dataframe to buffer Fiji
> Fiji_buffered <- Fiji %>%
> st_buffer(dist = 10000) # in meters, 10km buffer
> data.sp1 <- data.sp1[Fiji_buffered, ]
>
> # specify grid width in pixels
> width_in_pixels <- 300
> # dx is the width of a grid cell in meters
> dx <- ceiling( (st_bbox(Fiji_buffered)["xmax"] -
> st_bbox(Fiji_buffered)["xmin"]) / width_in_pixels)
> # dy is the height of a grid cell in meters
> # because we use quadratic grid cells, dx == dy
> dy <- dx
> # calculate the height in pixels of the resulting grid
> height_in_pixels <- floor( (st_bbox(Fiji_buffered)["ymax"] -
> st_bbox(Fiji_buffered)["ymin"]) / dy)
>
> grid <- st_make_grid(Fiji_buffered,
> cellsize = dx,
> n = c(width_in_pixels, height_in_pixels),
> what = "centers"
> )
>
> rm(dx, dy, height_in_pixels, Fiji_buffered)
>
> plot(grid)
> # View grid
> ggplot() +
> geom_sf(data=grid) +
> geom_sf(data=Fiji, fill=NA, color="blue", lwd=2)
> # Prepare data for interpolation
> #
> # Create tibble of the TempData sf object
> TEmp_input <- data_53YearTemp %>%
> tibble::as.tibble(data_53YearTemp = .$pronunciation_id,
> lon = st_coordinates(.)[, 4],
> lat = st_coordinates(.)[, 3]) %>%
> select(Year, lon, lat)
> #training data set for the interpolations
> data.sp_train <- SpatialPointsDataFrame(data_53YearTemp[,c(3,4)],
> data_53YearTemp[,-c(3,4)])
> View(data.sp_train)
>
>
>
> # config
> k <- 1000 # "k" for k-nearest-neighbour-interpolation
>
> # specify function which is executed for each tile of the grid
> computeGrid <- function(grid, data.sp1, knn) {
> # create empty result data frame
> Temp_result <- data.frame(Temp = as.factor(NA),
> lon = st_coordinates(grid)[, 1],
> lat = st_coordinates(grid)[, 2])
> # run KKNN
> Temp_kknn <- kknn::kknn(data.sp1_train ~ .,
> train = data.sp1_train,
> test = Temp_result,
> kernel = "gaussian",
> k = knn)
> # bring back to result data frame
> # only retain the probability of the dominant dialect at that grid cell
> Temp_result %>%
> # extract the interpolated dialect at each grid cell with the
> # kknn::fitted function
> mutate(Temp = fitted(Temp_kknn),
> # only retain the probability of the interpolated dialect,
> # discard the other 7
> prob = apply(Temp_kknn$prob,
> 1,
> function(x) max(x)))
> return(Temp_result)
> }
>
> # specify the number of cores below (adapt if you have fewer cores or
> # want to reserve some computation power to other stuff)
> #registerDoParallel(cores = 5)
>
> # specify number of batches and resulting size of each batch (in grid cells)
> no_batches <- 60 # increase this number if you run into memory problems
>
> batch_size <- ceiling(length(grid) / no_batches)
>
> # PARALLEL COMPUTATION
> start.time <- Sys.time()
> Temp_result <- foreach(
> batch_no = 1:no_batches,
> # after each grid section is computed, rbind the resulting df into
> # one big dialects_result df
> .combine = rbind,
> # the order of grid computation doesn't matter: this speeds it up even
> more
> .inorder = FALSE) %dopar% {
> # compute indices for each grid section, depending on batch_size and
> current
> # batch
> start_idx <- (batch_no - 1) * batch_size + 1
> end_idx <- batch_no * batch_size
> # specify which section of the grid to interpolate, using regular
> # subsetting
> grid_batch <- grid[start_idx:ifelse(end_idx > length(grid),
> length(grid),
> end_idx)]
> # apply the actual computation to each grid section
> df <- computeGrid(grid_batch, data.sp1_train, k)
> }
> end.time <- Sys.time()
> time.taken <- end.time - start.time
> time.taken
>
>
> # convert resulting df back to sf object, but do not remove raw geometry
> cols
> Temp_raster <- st_as_sf(data.sp_train,
> coords = c("lon", "lat"),
> crs = etrs,
> remove = F)
> # clip raster to Fiji again
> Temp_raster <- Temp_raster[Fiji, ]
> rm(Temp_result, k)
>
> Sample data is also provided. Any assistance would be greatly appreciated.
>
> Yours sincerely
> sownalc
>
--
Roger Bivand
Emeritus Professor
Department of Economics, Norwegian School of Economics,
Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
More information about the R-sig-Geo
mailing list