\documentclass[a4paper,10pt]{article} %\VignetteIndexEntry{Hierarchical logistic regression with package HOF} %\VignetteDepends{vegan} %\VignetteEngine{knitr::knitr} \usepackage[utf8x]{inputenc} \usepackage[T1]{fontenc} \usepackage[sort&compress]{natbib} \usepackage{fullpage} \title{Hierarchical species response curves in package eHOF} \author{Florian Jansen} \begin{document} \maketitle \begin{abstract} \noindent This is an example session to show how to use enhanced hierarchical logistic regression modeling in R with package \textit{eHOF}. Only a few possibilities and applications can be covered. Use the usual help functions for further information. After the installation of package eHOF you can access this PDF \footnote{$ $Id: eHOF.Rnw $ $ processed with eHOF \Sexpr{packageDescription("eHOF", field="Version")} in \Sexpr{R.version.string} on \today} with vignette("eHOF") \end{abstract} \section{Preparations} <>= suppressPackageStartupMessages(library(vegdata)) tmp <- tempdir() options(tv_home = tmp) dir.create(tmp) dir.create(file.path(tmp, 'Species')) dir.create(file.path(tmp, 'Popup')) dir.create(file.path(tmp, 'Data')) file.copy(from = file.path(path.package("vegdata"), 'tvdata', 'Popup'), to = tmp, recursive = TRUE) file.copy(from = file.path(path.package("vegdata"), 'tvdata', 'Species'), to = tmp, recursive = TRUE) file.copy(from = file.path(path.package("vegdata"), 'tvdata', 'Data'), to = tmp, recursive = TRUE) @ <>= library(eHOF) @ \subsection{Load vegetation data} Hierarchical logistic regressions can be used in many fields. We will use vegetation data and we can use package \texttt{vegdata} \citep{Jansen2010} to load vegetation data from a Turboveg \citep{Hennekens2001} database. Taxon names can be evaluated \cite{Jansen2010} if they are referenced with an appropriate taxonomic reference list \citep{Jansen2008}. Within this sample session we will use the build-in vegetation dataset of package \texttt{vegdata} from the floodplain of river Elbe \citep{Leyer2007} and a dataset delivered with \texttt{eHOF} from arable land of North-Eastern Germany with measured pH. <<4site.echo, results='hide', message=FALSE>>= library(vegdata) db <- 'elbaue' site <- tv.site(db) veg <- tv.veg(db, taxval = FALSE, spcnames = 'Numbers') obs <- tv.obs(db) # taxa <- tax(unique(obs$TaxonUsageID), verbose=TRUE) taxa <- tax('all') names(veg) <- sub('.0', '', names(veg), fixed=TRUE) names(veg) <- taxa$LETTERCODE[match(names(veg), taxa$TaxonUsageID)] @ Normally you will use the capabilities of the vegdata package and the taxonomic reference list of your Turboveg installation to replace species numbers with shortletters or real names. For the sake of CRAN policy (not to download foreign data) we set tax=FALSE and replace taxnumbers manually. \subsection{Cover transformation} If you want to use cover values as performance, it might be better to use cover transformed abundance values instead of the original percentage cover. <<5veg.2, results='hide'>>= veg.sqrt <- tv.veg(db, cover.transform='sqrt', tax=FALSE, spcnames='Numbers') names(veg.sqrt) <- sub('.0', '', names(veg.sqrt), fixed=TRUE) names(veg.sqrt) <- taxa$LETTERCODE[match(names(veg.sqrt), taxa$TaxonUsageID)] @ or even presence-absence information only. <<5veg.3, results='hide'>>= veg.pa <- tv.veg(db, cover.transform='pa', tax=FALSE, spcnames='Numbers') names(veg.pa) <- sub('.0', '', names(veg.pa), fixed=TRUE) names(veg.pa) <- taxa$LETTERCODE[match(names(veg.pa), taxa$TaxonUsageID)] @ \section{Modeling hierarchical logistic regressions} Introduced by Ramenskij in the early 20th century and named direct gradient analysis by \citet{Whittaker1967} is modeling species responses along environmental gradients still a frequent task in vegetation ecology. Several algorithms are available to map the performance of a species along changing ecological conditions. Often it is difficult to decide which level of complexity is needed to get an adequate simplification of the data. In 1993 Huisman, Olff and Fresco introduced a set of hierarchical models to combine the wish for simple and easy to interpret response models with the need to catch different kinds of niche types and species response data \citep{Huisman1993}. Species responses along measured environmental gradients are cross-sections of the species hyperniche. Even if we assume, that the physiological niche of a species should be simple (e.g. unimodal or with a specific threshold) we can not expect, that realized niches should be that simple too. Nevertheless, due to the generally high number of hidden gradients in field data and the resulting unbalanced datasets, it seems to be advisable not to use a modeling technique like Generalised Additive Models (GAM) but to restrict ourselves to a more conservative set of model types which can be interpreted afterwards. \subsection{Modeltypes} Huisman, Olff and Fresco suggested five response shapes \cite{Huisman1993}. Additional to these we added two bimodal model types so that we get 7 hierarchical model types in total (see Fig. \ref{fig:HOFModels}). \begin{figure} \begin{center} <>= data(acre) sel <- c('ELYMREP', 'VEROPES', 'CONSREG', 'DESUSOP', 'VEROARV', 'ARTE#VU', 'ACHIMIL') mo <- HOF(acre[match(sel, names(acre))], acre.env$PH_KCL, M=1, bootstrap=NULL) par(mar=c(4,4,1,1)+.1) autolayout(7) par(mar=c(4,4,1,1)+.1) for(i in 1:7) plot(mo[[i]], model = eHOF.modelnames[i], marginal ='n') @ \end{center} \caption{ The layout plot shows the seven model types of enhanced Hierarchical logistic regression modeling. } \label{fig:HOFModels} \end{figure} The seven models are of increasing complexity. Maybe the most important model type is number I: a flat response, that means there is no significant trend along the gradient for that species. It is our null hypothesis and ensures that only species with a clear response will be modelled with one of the further model types. Shape II is monotone sigmoidal with a top at one end of the gradient, III is monotone sigmoidal with a plateau below the maximal upper abundance value. Model type IV is the canonical form of species response, a unimodal symmetric model, V is a unimodal skewed model and model types VI and VII have two optima, VI with tops being equal. \subsection{How to use function HOF?} For all species in a vegetation data frame above a specific frequency threshold (10 by default) all six model shapes are modelled and stored in a HOF object. The most appropriate model type is evaluated only at the moment this object is printed, summarized or plotted. <>= mods <- HOF(veg, site$MGL, M=100, family = poisson, bootstrap = NULL) @ <>= mods @ Printing the output of a HOF object with more than one species will give a matrix of deviances for all modelled species along all model types and thereafter the model type suggested as most appropriate by selected test criteria (Aikaike Information Criterion corrected for small sample size by default). \begin{figure} \begin{center} <<7sqrt, results='hide', warning=FALSE>>= mods.sq <- HOF(veg.sqrt, site$MGL, M=10, family= poisson, freq.limit=10, bootstrap=NULL) plot(mods.sq) @ \end{center} \caption{ Most adequate model types for all species with at least 10 occurrences, square-root cover performance. } \label{fig:sqrtModels} \end{figure} \begin{figure} \begin{center} <<7pres-abs, results='hide', warning=FALSE>>= mods.pa <- HOF(veg.pa, site$MGL, M=1, bootstrap=NULL) plot(mods.pa) @ \end{center} \caption{ Most adequate model types for all species of the Elbaue dataset with at least 10 occurrences, presence-absence information. } \label{fig:paModels} \end{figure} Depending on the chosen performance measure HOF modeling will lead to different species response shapes. \section{Model parameters} Restricting models to predefined shapes offers the possibility to derive ecologically interpretable model characteristics like "the optimum", species niches etc. (see Fig. \ref{fig:ParaRep}). Package eHOF contains functions to compute parameters for the different model types. <>= Rrep <- taxa$LETTERCODE[taxa$TaxonName == 'Ranunculus repens'] Para(mods.pa[[Rrep]]) @ \begin{figure} \begin{center} <>= plot(mods.pa[[Rrep]], para=TRUE, onlybest=FALSE) @ \end{center} \caption{ Hierarchical logistic regression models for \textit{Ranunculus repens} presence/absence within the Elbaue dataset with plotted parameters of the most adequate model. } \label{fig:ParaRep} \end{figure} \begin{figure} \begin{center} <>= mod.sqrt <- eHOF:::HOF.default(veg.sqrt[[Rrep]], site$MGL, M=10, family=poisson, bootstrap = 10) plot(mod.sqrt, marginal='point', para=TRUE, onlybest=FALSE, newdata=seq(min(mod.sqrt$range), max(mod.sqrt$range), length.out=10000) ) Para(mod.sqrt) @ \end{center} \caption{ Hierarchical logistic regression models for \textit{Ranunculus repens} square root cover within the Elbaue dataset with plotted parameters of the most adequate model. } \label{fig:paraModels} \end{figure} % \subsection{Turnover} % % <>= % data(mtf) % mods <- HOF(mtf, mtf.env$Altitude, M=max(mtf), family=poisson, bootstrap=NULL) % plot(turnover(mods)) % @ \bibliographystyle{plainnat} \bibliography{bibdata} \end{document}