--- title: "Fitting many curves using rTPC" author: "Daniel Padfield" output: rmarkdown::html_vignette date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Fitting many curves using rTPC} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- #### A brief example of how to fit many models to multiple TPCs using rTPC, nls.multstart and the tidyverse. *** ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", #tidy.opts=list(width.cutoff=60), #tidy=TRUE, fig.align = 'center' ) ``` ```{r runtimer, include=FALSE} runstart <- lubridate::now() ``` ## Things to consider - Go through all the other **Things to consider** sections in the other vignettes. - Think carefully about how to do model selection on the whole dataset. - If you care about specific parameters (e.g. optimum temperature or activation energy) then you might want to pick a single "best" model for the whole dataset. - If you care about the overall fit and its shape, then you might want to do pick a "best" model on each individual curve. - You should read resources about model selection to inform your decisions. *** ```{r setup, message=FALSE} # load packages library(rTPC) library(nls.multstart) library(broom) library(tidyverse) ``` In the final part of the general pipeline, we demonstrate how multiple models can be fitted to multiple TPCs. Instead of picking all `r length(rTPC::get_model_names())` model formulations to demonstrate this approach, we will use only 2 models in this example: **gaussian_1987()** and **sharpeschoolhigh_1981()**. We can demonstrate the fitting of multiple curves by modelling all 60 TPCs from the example dataset of **rTPC** curve from the example dataset **rTPC**. These TPCs are of respiration and photosynthesis of the aquatic algae, *Chlorella vulgaris*. These algae differed in their growth temperature, `growth_temp`, and how long they had been grown at that temperature, `process`, either for ~100 or ~10 generations. Using a similar approach to `vignette('fit_many_models')`, models can be fitted to each curve using list columns and **purrr::map()** to fit and store multiple models in a data frame. When fitting lots of models at once, it is useful to know the progress the code as it may take a long time to run.The R package **purrr** now has progress bars built in; you can add a progress bar to the **map()** function using the argument *.progress*. To make this work, we need to only do a single **map()** call, so we will write a function that fits both models that we will then pass to **purrr**. This is how we _strongly_ recommend fitting multiple models using **map()**. ```{r fit_multiple_models, warning=FALSE, message=FALSE} # load in data data("chlorella_tpc") d <- chlorella_tpc # write function to fit both models fit_TPCs <- function(d, ...) { # get start values and fit gaussian start_vals <- rTPC::get_start_vals( d$temp, d$rate, model_name = 'gaussian_1987' ) mod_gaussian <- nls.multstart::nls_multstart( rate ~ rTPC::gaussian_1987(temp = temp, rmax, topt, a), data = d, iter = c(4, 4, 4), start_lower = start_vals - 10, start_upper = start_vals + 10, lower = rTPC::get_lower_lims(d$temp, d$rate, model_name = 'gaussian_1987'), upper = rTPC::get_upper_lims(d$temp, d$rate, model_name = 'gaussian_1987'), supp_errors = 'Y', convergence_count = FALSE, ... ) # get start values and fit sharpe schoolfield start_vals <- rTPC::get_start_vals( d$temp, d$rate, model_name = 'sharpeschoolhigh_1981' ) # fit model mod_ss <- nls.multstart::nls_multstart( rate ~ rTPC::sharpeschoolhigh_1981(temp = temp, r_tref, e, eh, th, tref = 20), data = d, iter = c(3, 3, 3, 3), start_lower = start_vals - 10, start_upper = start_vals + 10, lower = rTPC::get_lower_lims( d$temp, d$rate, model_name = 'sharpeschoolhigh_1981' ), upper = rTPC::get_upper_lims( d$temp, d$rate, model_name = 'sharpeschoolhigh_1981' ), supp_errors = 'Y', convergence_count = FALSE, ... ) return(tibble::tibble( gaussian = list(mod_gaussian), sharpeschoolhigh = list(mod_ss) )) } # fit two chosen model formulation in rTPC d_fits <- nest(d, data = c(temp, rate)) %>% mutate( mods = map( data, ~ fit_TPCs(d = .x), .progress = TRUE ) ) %>% unnest(mods) ``` ```{r, echo=FALSE} cli::cli_text(cli::col_green(strrep("█", 30)), " 100% | ETA: 0s") ``` Like previous vignettes, the predictions of each model can be estimated using **broom::augment()**. To do this, we first create a new list column containing high resolution temperature values by taking the `min` and `max` of each curve. Next we stack the models and finally we get the new predictions using the **map2()**, which allows us to use both `fit` and `new_data` list columns. After unnesting the `preds` column, we are then left with high resolution predictions for each curve. As this code covers a lot of steps, each line of the code is commented. ```{r preds} # create new list column of for high resolution data d_preds <- mutate(d_fits, new_data = map(data, ~tibble(temp = seq(min(.x$temp), max(.x$temp), length.out = 100)))) %>% # get rid of original data column select(., -data) %>% # stack models into a single column, with an id column for model_name pivot_longer(., names_to = 'model_name', values_to = 'fit', c(gaussian,sharpeschoolhigh)) %>% # create new list column containing the predictions # this uses both fit and new_data list columns mutate(preds = map2(fit, new_data, ~augment(.x, newdata = .y))) %>% # select only the columns we want to keep select(curve_id, growth_temp, process, flux, model_name, preds) %>% # unlist the preds list column unnest(preds) glimpse(d_preds) ``` We can then plot the predictions of each curve using **ggplot2**. ```{r plot, fig.height = 12, fig.width=8} # plot ggplot(d_preds) + geom_line(aes(temp, .fitted, col = model_name)) + geom_point(aes(temp, rate), d) + facet_wrap(~curve_id, scales = 'free_y', ncol = 6) + theme_bw() + theme(legend.position = 'none') + scale_color_brewer(type = 'qual', palette = 2) + labs(x = 'Temperature (ºC)', y = 'Metabolic rate', title = 'All fitted thermal performance curves', subtitle = 'gaussian in green; sharpeschoolhigh in orange') ``` The traits of each thermal performance curve can also easily be calculated. ```{r params} # stack models and calculate extra params d_params <- pivot_longer(d_fits, names_to = 'model_name', values_to = 'fit', c(gaussian,sharpeschoolhigh)) %>% mutate(params = map(fit, calc_params, .progress = TRUE)) %>% select(curve_id, growth_temp, process, flux, model_name, params) %>% unnest(params) glimpse(d_params) ``` ```{r tot_time, include=FALSE} tot_time <- lubridate::as.duration(lubridate::now() - runstart) ``` [Built in `r tot_time`s]{style="opacity: 0.1;font-size: small;"}