[R-sig-Geo] Point Interpolation and Analysis

Tue Mar 15 16:18:59 CET 2022

Hello sir

Thank you for your detailed email and yes this clarifies alot of things. I
hope to learn by building slowly but accurately.

I will try to write new code following the latest information provide by R
&Cran documentation and if I get stuck would request for assistance.

Thanks for the comments and hope to improve from there.

Yours faithfully
Sownalc

On Tue, Mar 15, 2022, 23:14 Roger Bivand <Roger.Bivand using nhh.no> wrote:

> On Tue, 15 Mar 2022, sownal chand wrote:
>
> > Hello Sir/madam,
> >
> > I have been trying to interpolate this data and make an animation by
> > following the codes given by Devan using Dr Timo's Dialect analysis in
> > Germany,
> >
> https://github.com/devanmcg/IntroRangeR/blob/master/12_RasGIS/Interpolation.R
> > .
> > Is there a way in which these code could be used for Mapping in
> > Pacific Region. Below is the edited version with errors
>
> Please never use any code just "found" on the internet. This code seems to
> be part of a course written partly in 2018, here from 2020. Lots of
> packages have changed in the years that have passed, especially with
> regard to coordinate reference systems and their representation. Never use
> a package (pacman) that does things you do not control (here it will
> install lots of tidyverse packages that are not needed for the essential
> sf package to run).
>
> Do not use pipes %>% unless you are sure that all of the steps are
> correct; take everything in the smallest possible steps and check the
> intermediate output after each step. Build from simple steps, steps that
> you know how to explain to yourself and (if you also teach) to students or
> to your research collaborators. In that way you base your work not on
> possibly outdated code found somewhere, but on your own understanding. Try
> to avoid overwriting objects - if you use pipes and overwrite the input
> object, it is really hard to find out where things went wrong.
>
> One remark - if the code you are looking at is tested often, like the
> examples on the help pages of functions, methods and datasets in CRAN
> packages, you are on safer ground. If the code has never been re-run even
> on its own data, new versions of R and the packages used may change the
> results - this applies in particular to tidyverse packages, which are
> often updated.
>
> For coordinate reference systems, avoid all beginning with "+proj= ...",
> because these Proj.4 strings are no longer reliable if you need accuracy
> better than about 120m. You will see that the course materials you refer
> to have not been updated to reflect the need (in most cases) to use
> coordinate reference systems with known authority. Your use of "EPSG:3832"
> uses the authority of EPSG to assert the specification of the CRS; it also
> means that the datum transformation (say from or to the common
> geographical CRS - WGS84 - EPSG:4326) giving 2m accuracy is also asserted
> by an authority. Using "+proj= ..." leaves the user unsure about the
> accuracy of any transformation (typically known as ballpark accuracy).
>
> For visualisation, please avoid using ggplot2 unless you use this package
> often (daily). For thematic mapping, tmap and mapsf are to be preferred,
> because they are written for making maps. Do not simplify/generalise
> coastlines unless you really need to do so. You can use tmap and mapview
> to view thematic maps interactively - as you zoom in, you see artefacts
> created by line generalization.
>
> My guess is that your aim is to visualise 13 meteorological station annual
> average temperatures 1965-2018 after interpolation by year across the
> study area. I'm unsure whether you should be interpolating between islands
> across the ocean, and 13 stations (I assume that annual temperatures are
> transformed to sea level) is a very small number for interpolation. I'm
> also uncertain whether the input data are reliable (the values for
> Koronivia and Labasa seem to be identical for all years). If the data set
> is just to learn from, OK, but probably drawing conclusions from any
> results is not very secure.
>
> Hope this clarifies,
>
> Roger
>
> >
> > #packages
> > if (!require("pacman")) install.packages("pacman")
> > pacman::p_load(tidyverse, sf, kknn)
> > #
> > # Create a shorthand reference to Fiji Islands CRS
> > crs_etrs = "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84
> +units=m
> > +no_defs"
> > #
> > # Fiji geo data
> > # download
> > #Draft map of Fiji
> > world <- rnaturalearth::ne_countries(scale = "Large", returnclass = "sf")
> >
> > Fiji <- ggplot(data=world) +
> >  geom_sf() +
> >  coord_sf(
> >    crs = 3832, # https://epsg.io/3832
> >   xlim = c(2984265.06, 3539584.72), # limits are taken from projected
> > bounds
> >   ylim = c(-2224162.41, -1811688.88)  # of EPSG:3832
> >  )+ theme_bw()
> >
> > #Extract Fiji Boundary data
> > Fiji <- FJI %>%
> >  filter(iso_a3 == "FJI")
> > plot(Fiji)
> >
> >
> >
> > Fiji <- Fiji %>%
> >  group_by(iso_a3) %>%
> >  summarize()
> > # simplify
> > Fiji %>%
> >  st_simplify(preserveTopology = T, dTolerance = 0.01)
> >
> > #load point data from the data file stored
> > data_53YearTemp <- read.csv("C://Users/Sownal/Documents/53YearsTemp.csv")
> > View(data_53YearTemp)
> >
> > data$long <- as.numeric(data$Longitude)
> > data$lat <- as.numeric(data$Latitude)
> >
> > #remove NA valuues in the spatial Data Frame
> > data_53YearTemp <- na.omit(data_53YearTemp)
> >
> > #convert 53YearTemp to a Dataframe
> > #spatial points dataframe
> > data.sp1 <- SpatialPointsDataFrame(data_53YearTemp[,c(3,4)],
> > data_53YearTemp[,-c(3,4)])
> > View(data.sp1)
> > #point data to sf
> > data.sp1 %>%
> >  st_as_sf(coords = c("lng", "lat"),
> >           crs = "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84
> > +units=m +no_defs ")
> >
> >
> > #reproject data to the specificed crs or epsg country code
> > # CRS: ETRS
> > etrs <- "+proj=merc +lon_0=150 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m
> > +no_defs "
> > Fiji %>%
> >  st_transform(etrs)
> > data.sp_train %>%
> >  st_transform(etrs)
> >
> > #remove NA valuues in the spatial Data Frame
> > data_53YearTemp <- na.omit(data_53YearTemp)
> >
> > #cliping dataframe to buffer Fiji
> > Fiji_buffered <- Fiji %>%
> >  st_buffer(dist = 10000) # in meters, 10km buffer
> > data.sp1 <- data.sp1[Fiji_buffered, ]
> >
> > # specify grid width in pixels
> > width_in_pixels <- 300
> > # dx is the width of a grid cell in meters
> > dx <- ceiling( (st_bbox(Fiji_buffered)["xmax"] -
> >                  st_bbox(Fiji_buffered)["xmin"]) / width_in_pixels)
> > # dy is the height of a grid cell in meters
> > # because we use quadratic grid cells, dx == dy
> > dy <- dx
> > # calculate the height in pixels of the resulting grid
> > height_in_pixels <- floor( (st_bbox(Fiji_buffered)["ymax"] -
> >                              st_bbox(Fiji_buffered)["ymin"]) / dy)
> >
> > grid <- st_make_grid(Fiji_buffered,
> >                     cellsize = dx,
> >                     n = c(width_in_pixels, height_in_pixels),
> >                     what = "centers"
> > )
> >
> > rm(dx, dy, height_in_pixels, Fiji_buffered)
> >
> > plot(grid)
> > # View grid
> > ggplot() +
> >  geom_sf(data=grid) +
> >  geom_sf(data=Fiji, fill=NA, color="blue", lwd=2)
> > # Prepare data for interpolation
> > #
> > # Create tibble of the TempData sf object
> > TEmp_input <- data_53YearTemp %>%
> >  tibble::as.tibble(data_53YearTemp = .$pronunciation_id,
> >         lon = st_coordinates(.)[, 4],
> >         lat = st_coordinates(.)[, 3]) %>%
> >  select(Year, lon, lat)
> > #training data set for the interpolations
> > data.sp_train <- SpatialPointsDataFrame(data_53YearTemp[,c(3,4)],
> > data_53YearTemp[,-c(3,4)])
> > View(data.sp_train)
> >
> >
> >
> > # config
> > k <- 1000 # "k" for k-nearest-neighbour-interpolation
> >
> > # specify function which is executed for each tile of the grid
> > computeGrid <- function(grid, data.sp1, knn) {
> >  # create empty result data frame
> >  Temp_result <- data.frame(Temp = as.factor(NA),
> >                                lon = st_coordinates(grid)[, 1],
> >                                lat = st_coordinates(grid)[, 2])
> >  # run KKNN
> >  Temp_kknn <- kknn::kknn(data.sp1_train ~ .,
> >                              train = data.sp1_train,
> >                              test = Temp_result,
> >                              kernel = "gaussian",
> >                              k = knn)
> >  # bring back to result data frame
> >  # only retain the probability of the dominant dialect at that grid cell
> >  Temp_result %>%
> >    # extract the interpolated dialect at each grid cell with the
> >    # kknn::fitted function
> >    mutate(Temp = fitted(Temp_kknn),
> >           # only retain the probability of the interpolated dialect,
> >           # discard the other 7
> >           prob = apply(Temp_kknn$prob,
> >                        1,
> >                        function(x) max(x)))
> >  return(Temp_result)
> > }
> >
> > # specify the number of cores below (adapt if you have fewer cores or
> > # want to reserve some computation power to other stuff)
> > #registerDoParallel(cores = 5)
> >
> > # specify number of batches and resulting size of each batch (in grid
> cells)
> > no_batches <- 60 # increase this number if you run into memory problems
> >
> > batch_size <- ceiling(length(grid) / no_batches)
> >
> > # PARALLEL COMPUTATION
> > start.time <- Sys.time()
> > Temp_result <- foreach(
> >  batch_no = 1:no_batches,
> >  # after each grid section is computed, rbind the resulting df into
> >  # one big dialects_result df
> >  .combine = rbind,
> >  # the order of grid computation doesn't matter: this speeds it up even
> > more
> >  .inorder = FALSE) %dopar% {
> >    # compute indices for each grid section, depending on batch_size and
> > current
> >    # batch
> >    start_idx <- (batch_no - 1) * batch_size + 1
> >    end_idx <- batch_no * batch_size
> >    # specify which section of the grid to interpolate, using regular
> >    # subsetting
> >    grid_batch <- grid[start_idx:ifelse(end_idx > length(grid),
> >                                        length(grid),
> >                                        end_idx)]
> >    # apply the actual computation to each grid section
> >    df <- computeGrid(grid_batch, data.sp1_train, k)
> >  }
> > end.time <- Sys.time()
> > time.taken <- end.time - start.time
> > time.taken
> >
> >
> > # convert resulting df back to sf object, but do not remove raw geometry
> > cols
> > Temp_raster <- st_as_sf(data.sp_train,
> >                            coords = c("lon", "lat"),
> >                            crs = etrs,
> >                            remove = F)
> > # clip raster to Fiji again
> > Temp_raster <- Temp_raster[Fiji, ]
> > rm(Temp_result, k)
> >
> > Sample data is also provided. Any assistance would be greatly
> appreciated.
> >
> > Yours sincerely
> > sownalc
> >
>
> --
> Roger Bivand
> Emeritus Professor
> Department of Economics, Norwegian School of Economics,
> Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
> e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

	[[alternative HTML version deleted]]