[R-sig-Geo] Point Interpolation and Analysis

Roger Bivand Roger@B|v@nd @end|ng |rom nhh@no
Tue Mar 15 12:14:56 CET 2022


On Tue, 15 Mar 2022, sownal chand wrote:

> Hello Sir/madam,
>
> I have been trying to interpolate this data and make an animation by
> following the codes given by Devan using Dr Timo's Dialect analysis in
> Germany,
> https://github.com/devanmcg/IntroRangeR/blob/master/12_RasGIS/Interpolation.R
> .
> Is there a way in which these code could be used for Mapping in
> Pacific Region. Below is the edited version with errors

Please never use any code just "found" on the internet. This code seems to 
be part of a course written partly in 2018, here from 2020. Lots of 
packages have changed in the years that have passed, especially with 
regard to coordinate reference systems and their representation. Never use 
a package (pacman) that does things you do not control (here it will 
install lots of tidyverse packages that are not needed for the essential 
sf package to run).

Do not use pipes %>% unless you are sure that all of the steps are 
correct; take everything in the smallest possible steps and check the 
intermediate output after each step. Build from simple steps, steps that 
you know how to explain to yourself and (if you also teach) to students or 
to your research collaborators. In that way you base your work not on 
possibly outdated code found somewhere, but on your own understanding. Try 
to avoid overwriting objects - if you use pipes and overwrite the input 
object, it is really hard to find out where things went wrong.

One remark - if the code you are looking at is tested often, like the 
examples on the help pages of functions, methods and datasets in CRAN 
packages, you are on safer ground. If the code has never been re-run even 
on its own data, new versions of R and the packages used may change the 
results - this applies in particular to tidyverse packages, which are 
often updated.

For coordinate reference systems, avoid all beginning with "+proj= ...", 
because these Proj.4 strings are no longer reliable if you need accuracy 
better than about 120m. You will see that the course materials you refer 
to have not been updated to reflect the need (in most cases) to use 
coordinate reference systems with known authority. Your use of "EPSG:3832" 
uses the authority of EPSG to assert the specification of the CRS; it also 
means that the datum transformation (say from or to the common 
geographical CRS - WGS84 - EPSG:4326) giving 2m accuracy is also asserted 
by an authority. Using "+proj= ..." leaves the user unsure about the 
accuracy of any transformation (typically known as ballpark accuracy).

For visualisation, please avoid using ggplot2 unless you use this package 
often (daily). For thematic mapping, tmap and mapsf are to be preferred, 
because they are written for making maps. Do not simplify/generalise 
coastlines unless you really need to do so. You can use tmap and mapview 
to view thematic maps interactively - as you zoom in, you see artefacts 
created by line generalization.

My guess is that your aim is to visualise 13 meteorological station annual 
average temperatures 1965-2018 after interpolation by year across the 
study area. I'm unsure whether you should be interpolating between islands 
across the ocean, and 13 stations (I assume that annual temperatures are 
transformed to sea level) is a very small number for interpolation. I'm 
also uncertain whether the input data are reliable (the values for 
Koronivia and Labasa seem to be identical for all years). If the data set 
is just to learn from, OK, but probably drawing conclusions from any 
results is not very secure.

Hope this clarifies,

Roger

>
> #packages
> if (!require("pacman")) install.packages("pacman")
> pacman::p_load(tidyverse, sf, kknn)
> #
> # Create a shorthand reference to Fiji Islands CRS
> crs_etrs = "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m
> +no_defs"
> #
> # Fiji geo data
> # download
> #Draft map of Fiji
> world <- rnaturalearth::ne_countries(scale = "Large", returnclass = "sf")
>
> Fiji <- ggplot(data=world) +
>  geom_sf() +
>  coord_sf(
>    crs = 3832, # https://epsg.io/3832
>   xlim = c(2984265.06, 3539584.72), # limits are taken from projected
> bounds
>   ylim = c(-2224162.41, -1811688.88)  # of EPSG:3832
>  )+ theme_bw()
>
> #Extract Fiji Boundary data
> Fiji <- FJI %>%
>  filter(iso_a3 == "FJI")
> plot(Fiji)
>
>
>
> Fiji <- Fiji %>%
>  group_by(iso_a3) %>%
>  summarize()
> # simplify
> Fiji %>%
>  st_simplify(preserveTopology = T, dTolerance = 0.01)
>
> #load point data from the data file stored
> data_53YearTemp <- read.csv("C://Users/Sownal/Documents/53YearsTemp.csv")
> View(data_53YearTemp)
>
> data$long <- as.numeric(data$Longitude)
> data$lat <- as.numeric(data$Latitude)
>
> #remove NA valuues in the spatial Data Frame
> data_53YearTemp <- na.omit(data_53YearTemp)
>
> #convert 53YearTemp to a Dataframe
> #spatial points dataframe
> data.sp1 <- SpatialPointsDataFrame(data_53YearTemp[,c(3,4)],
> data_53YearTemp[,-c(3,4)])
> View(data.sp1)
> #point data to sf
> data.sp1 %>%
>  st_as_sf(coords = c("lng", "lat"),
>           crs = "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84
> +units=m +no_defs ")
>
>
> #reproject data to the specificed crs or epsg country code
> # CRS: ETRS
> etrs <- "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m
> +no_defs "
> Fiji %>%
>  st_transform(etrs)
> data.sp_train %>%
>  st_transform(etrs)
>
> #remove NA valuues in the spatial Data Frame
> data_53YearTemp <- na.omit(data_53YearTemp)
>
> #cliping dataframe to buffer Fiji
> Fiji_buffered <- Fiji %>%
>  st_buffer(dist = 10000) # in meters, 10km buffer
> data.sp1 <- data.sp1[Fiji_buffered, ]
>
> # specify grid width in pixels
> width_in_pixels <- 300
> # dx is the width of a grid cell in meters
> dx <- ceiling( (st_bbox(Fiji_buffered)["xmax"] -
>                  st_bbox(Fiji_buffered)["xmin"]) / width_in_pixels)
> # dy is the height of a grid cell in meters
> # because we use quadratic grid cells, dx == dy
> dy <- dx
> # calculate the height in pixels of the resulting grid
> height_in_pixels <- floor( (st_bbox(Fiji_buffered)["ymax"] -
>                              st_bbox(Fiji_buffered)["ymin"]) / dy)
>
> grid <- st_make_grid(Fiji_buffered,
>                     cellsize = dx,
>                     n = c(width_in_pixels, height_in_pixels),
>                     what = "centers"
> )
>
> rm(dx, dy, height_in_pixels, Fiji_buffered)
>
> plot(grid)
> # View grid
> ggplot() +
>  geom_sf(data=grid) +
>  geom_sf(data=Fiji, fill=NA, color="blue", lwd=2)
> # Prepare data for interpolation
> #
> # Create tibble of the TempData sf object
> TEmp_input <- data_53YearTemp %>%
>  tibble::as.tibble(data_53YearTemp = .$pronunciation_id,
>         lon = st_coordinates(.)[, 4],
>         lat = st_coordinates(.)[, 3]) %>%
>  select(Year, lon, lat)
> #training data set for the interpolations
> data.sp_train <- SpatialPointsDataFrame(data_53YearTemp[,c(3,4)],
> data_53YearTemp[,-c(3,4)])
> View(data.sp_train)
>
>
>
> # config
> k <- 1000 # "k" for k-nearest-neighbour-interpolation
>
> # specify function which is executed for each tile of the grid
> computeGrid <- function(grid, data.sp1, knn) {
>  # create empty result data frame
>  Temp_result <- data.frame(Temp = as.factor(NA),
>                                lon = st_coordinates(grid)[, 1],
>                                lat = st_coordinates(grid)[, 2])
>  # run KKNN
>  Temp_kknn <- kknn::kknn(data.sp1_train ~ .,
>                              train = data.sp1_train,
>                              test = Temp_result,
>                              kernel = "gaussian",
>                              k = knn)
>  # bring back to result data frame
>  # only retain the probability of the dominant dialect at that grid cell
>  Temp_result %>%
>    # extract the interpolated dialect at each grid cell with the
>    # kknn::fitted function
>    mutate(Temp = fitted(Temp_kknn),
>           # only retain the probability of the interpolated dialect,
>           # discard the other 7
>           prob = apply(Temp_kknn$prob,
>                        1,
>                        function(x) max(x)))
>  return(Temp_result)
> }
>
> # specify the number of cores below (adapt if you have fewer cores or
> # want to reserve some computation power to other stuff)
> #registerDoParallel(cores = 5)
>
> # specify number of batches and resulting size of each batch (in grid cells)
> no_batches <- 60 # increase this number if you run into memory problems
>
> batch_size <- ceiling(length(grid) / no_batches)
>
> # PARALLEL COMPUTATION
> start.time <- Sys.time()
> Temp_result <- foreach(
>  batch_no = 1:no_batches,
>  # after each grid section is computed, rbind the resulting df into
>  # one big dialects_result df
>  .combine = rbind,
>  # the order of grid computation doesn't matter: this speeds it up even
> more
>  .inorder = FALSE) %dopar% {
>    # compute indices for each grid section, depending on batch_size and
> current
>    # batch
>    start_idx <- (batch_no - 1) * batch_size + 1
>    end_idx <- batch_no * batch_size
>    # specify which section of the grid to interpolate, using regular
>    # subsetting
>    grid_batch <- grid[start_idx:ifelse(end_idx > length(grid),
>                                        length(grid),
>                                        end_idx)]
>    # apply the actual computation to each grid section
>    df <- computeGrid(grid_batch, data.sp1_train, k)
>  }
> end.time <- Sys.time()
> time.taken <- end.time - start.time
> time.taken
>
>
> # convert resulting df back to sf object, but do not remove raw geometry
> cols
> Temp_raster <- st_as_sf(data.sp_train,
>                            coords = c("lon", "lat"),
>                            crs = etrs,
>                            remove = F)
> # clip raster to Fiji again
> Temp_raster <- Temp_raster[Fiji, ]
> rm(Temp_result, k)
>
> Sample data is also provided. Any assistance would be greatly appreciated.
>
> Yours sincerely
> sownalc
>

-- 
Roger Bivand
Emeritus Professor
Department of Economics, Norwegian School of Economics,
Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list