---
title: "Searching for temporal trends in phenotypes or rates"
author: "Silvia Castiglione, Carmela Serio, Pasquale Raia"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{search.trend}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
if (!requireNamespace("rmarkdown", quietly = TRUE) ||
     !rmarkdown::pandoc_available()) {
   warning(call. = FALSE, "Pandoc not found, the vignettes is not built")
   knitr::knit_exit()
}

if (!requireNamespace("plotrix", quietly = TRUE)) {
   warning(call. = FALSE, "plotrix not found, the vignettes is not built")
   knitr::knit_exit()
}

if (!requireNamespace("kableExtra", quietly = TRUE)) {
   warning(call. = FALSE, "kableExtra not found, the vignettes is not built")
   knitr::knit_exit()
}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

require(RRphylo)
options(knitr.kable.NA = '',rmarkdown.html_vignette.check_title = FALSE)
load("st-data.Rda")
```

## Index
1. [search.trend basics](#basics)
2. [Temporal trends on the entire tree](#tree)
3. [Temporal trends at clade level](#nodes)
4. [Multivariate data](#multi)
5. [Guided examples](#examples)

## search.trend basics{#basics}
The branch-wise estimation of phenotypic evolutionary rates and the computation of ancestral states at each node make [RRphylo](RRphylo.html) especially suitable to study temporal trends in phenotypic means and absolute rates applying to the entire phylogeny or just on a part of it. This is particularly true as the RRphylo method is specifically meant to work with phylogenies of extinct species, taking full advantage from fossil information. The function `search.trend` (Castiglione et al. 2019) is designed to explore the domain of macroevolution, addressing patterns such as Cope's rule, or to test other specific research questions as to whether the rate of evolution increased/decreased in a particular tree or clade within it. Although possible in principle, running `search.trend` without extinct species in the tree makes little sense.   

`search.trend` takes an object produced by `RRphylo`, which is necessary to retrieve branch-wise rate estimates and ancestral characters at nodes. By default, the function searches for significant temporal trends in phenotypic means and absolute evolutionary rates as applying to the entire phylogeny. If a specific hypothesis about one or more clades is tested, the clades presumed to experience trended evolution, in either phenotypes, rates or both, must be indicated by the argument `node`. If more than one clade is under testing, the function performs a pairwise comparison of trends also between the clades.

## Temporal trends on the entire tree{#tree}
To search for temporal trends in phenotypes occurring on the entire phylogeny, the function computes the regression slopes between phenotypic values at nodes and tips and their age, meant as the distance from the tree root. The same is implemented to detect temporal trends in absolute rates, but absolute rates are first rescaled in the 0-1 range and logged. Since a significant relationship between phenotypes (absolute rates) and age is possible even under the Brownian Motion (BM from now on), significance is assessed comparing the real slope to a family of 100 slopes (the number can be specified by setting the argument `nsim` within the function) generated by simulating phenotypes according to the BM process (*BMslopes*). 

```{r,echo=FALSE,message=FALSE,warning=FALSE,fig.dim=c(8,4),fig.align='center',out.width='100%',dpi=220}
par(mfrow=c(1,2),mar=c(4,3,3,1))
plot(NA,ylim=range(rts[,1]),xlim=range(rootD),
     mgp=c(1.8,0.5,0),
     xlab="distance from the root",ylab="rescaled rates",main="Simulation for trend\nin absolute rates")
polygon(c(dfk$rootD,rev(dfk$rootD)),c(dfk$fk,rev(dfk$fk.min)),col = rgb(0.5, 0.5, 0.5,
                                                                        0.3), border = NA)
points(rootD,rts[,1],cex=1.5,bg=rts$col,
       col="black",pch=21)
abline(lm(rts[,1]~rootD),lwd=2,col="#ff00ff")

legend("topleft",legend=c("nodes","tips","brownian range"),fill=c("red","green","#dadad9"),bty="n")

#### DRIFT
plot(NA,ylim=range(phen[,1]),xlim=range(rootP),
     mgp=c(1.8,0.5,0),
     xlab="distance from the root",ylab="rescaled phenotypes",main="Simulation for trend\nin mean phenotypes")
polygon(c(dfk2$rootP,rev(dfk2$rootP)),c(dfk2$fk,rev(dfk2$fk.min)),col = rgb(0.5, 0.5, 0.5,
                                                                        0.3), border = NA)
points(rootP,phen[,1],cex=1.5,bg=phen$col,
       col="black",pch=21)
abline(lm(phen[,1]~rootP),lwd=2,col="#ff00ff")

# rbind(data.frame(st.rates[[3]],dev=NA),data.frame(st.phen[[2]][,1:3,drop=FALSE],spread=NA,st.phen[[2]][,4,drop=FALSE]))->res
# 
# colnames(res)[4:5]<-c("spread","dev")
# rownames(res)<-c("rescaled absolute rate regression","phenotypic regression")
```
```{r,eval=FALSE,message=FALSE,warning=FALSE}
search.trend(RR=RR,y=y)->ST
```
```{r,echo=FALSE,message=FALSE,warning=FALSE}
require(kableExtra)

knitr::kable(res,digits=3,align="c") %>%
kable_styling(full_width = F, position = "center")  

```

As in the table above, `search.trend` results for phenotypic (`$phenotypic.regression`) and absolute rate (`$rate.regression`) regressions both include: the actual regression slope and its p-value (**p.real**), and the p-value derived by contrasting the real slope to the *BMslopes* (**p.random**). The user wants to look at **p.random** to see if a real trend applies. 

The output of `search.trend` also includes two metrics specifically designed to quantify the magnitude of the phenotypic/absolute rates deviation from BM. The **spread** represents the deviation from BM produced by a trend in absolute rates. It is calculated as the ratio between the range of phenotypic values and the range of such values halfway along the tree height, divided to the same metric value generated under BM. **spread** is 1 under BM.
The **dev** quantifies the deviation from BM produced by a trend in phenotypic means. It represents the deviation of the phenotypic mean from the root value in terms of the number of standard deviations of the trait distribution. **dev** is zero under the BM.

Therefore, the example above produced significant results for both phenotypic and absolute rates temporal trends (**p.random** are both > 0.95). Absolute rates increase over time is significant (**p.real** < 0.05) and some nine times as much (**spread** = 9.380) as expected under BM. The increase in phenotypic means through time is significant (**p.real** < 0.05) and significantly different from the BM simulation, describing a deviation in mean phenotype more than one half of the standard deviation of the phenotype (**dev** = 0.681).


## Temporal trends at clade level{#nodes}
A peculiarity of `search.trend` is that it can test whether individual clades follow a different trend in phenotypic means (absolute rates) over time, as compared to the rest of the tree. 

In the case of phenotypic trend, individual clades are tested for the hypothesis that trend intensity (slope) does not depart from BM and whether the marginal means of the regression values differ from the means calculated for the rest of the tree. In the case of a trend in absolute rates the actual regression slope depends on the relative position (age) of the focal nodes respective to the root, given the exponential nature of phenotypic variance change in time. Because of this, estimated marginal means and regression slopes for the rate versus age regression of the focal clade are contrasted directly to the same metrics generated for the rest of the tree. The comparisons for both phenotypes (estimated marginal means) and rates regression (estimated marignal means and regression slopes) are implemented by using the functions `emmeans` and `emtrends`  in package emmeans (Lenth 2020).

If more than one clade is under testing, the function performs the pairwise comparisons of trend intensity between clades. For absolute rates regression, such comparison is performed by computing differences in estimated marginal means and regression slopes as described above. Trends in phenotypic means are compared by computing the difference in regression slopes between the clades and contrasting such value to a random distribution of differences generated under BM. The difference in estimated marginal means of phenotype versus age regression between clades is also returned. 

```{r,echo=FALSE,message=FALSE,warning=FALSE,fig.dim=c(8,4),fig.align='center',out.width='100%',dpi=220}
require(plotrix)

par(mfrow=c(1,2),mar=c(4,3,3,1))
plot(max(ages)-ages[-which(names(ages)%in%c(desn,desn2))],rates[-which(names(rates)%in%c(desn,desn2))],
     ylim=range(rates),xlim=c(max(ages),min(ages)),
     pch=21,bg="gray80",cex=1.5,mgp=c(1.8,0.5,0),
     xlab="age",ylab="absolute rates",main="Simulation for trend\nin absolute rates")
abline(lm(rates~ages),lwd=3,col="gray20")
points(max(ages)-ages[which(names(ages)%in%desn)],rates[which(names(rates)%in%desn)],
       xlim=c(max(ages),min(ages)),
       pch=21,bg="slateblue2",cex=1.5)
ablineclip(lm(rates[which(names(rates)%in%desn)]~I(max(ages)-ages[which(names(ages)%in%desn)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn)]),
           col="black",lwd=4.5)
ablineclip(lm(rates[which(names(rates)%in%desn)]~I(max(ages)-ages[which(names(ages)%in%desn)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn)]),
           col="slateblue2",lwd=3)
points(max(ages)-ages[which(names(ages)%in%desn2)],rates[which(names(rates)%in%desn2)],
       xlim=c(max(ages),min(ages)),
       pch=21,bg="#ae9eff",cex=1.5)
ablineclip(lm(rates[which(names(rates)%in%desn2)]~I(max(ages)-ages[which(names(ages)%in%desn2)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn2)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn2)]),
           col="black",lwd=4.5)
ablineclip(lm(rates[which(names(rates)%in%desn2)]~I(max(ages)-ages[which(names(ages)%in%desn2)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn2)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn2)]),
           col="#ae9eff",lwd=3)
legend("topleft",legend=c("node 225","node 319","entire tree"),fill=c("slateblue2","#ae9eff","gray20"),bty="n",x.intersp = .5)

plot(max(ages)-ages[-which(names(ages)%in%c(desn,desn2))],phen2[-which(names(phen2)%in%c(desn,desn2))],
     ylim=range(phen2),xlim=c(max(ages),min(ages)),
     pch=21,bg="gray80",cex=1.5,mgp=c(1.8,0.5,0),
     xlab="age",ylab="phenotype",main="Simulation for trend\nin mean phenotypes")
abline(lm(phen2~ages),lwd=3,col="gray20")
points(max(ages)-ages[which(names(ages)%in%desn)],phen2[which(names(phen2)%in%desn)],
       xlim=c(max(ages),min(ages)),
       pch=21,bg="aquamarine3",cex=1.5)
ablineclip(lm(phen2[which(names(phen2)%in%desn)]~I(max(ages)-ages[which(names(ages)%in%desn)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn)]),
           col="black",lwd=4.5)
ablineclip(lm(phen2[which(names(phen2)%in%desn)]~I(max(ages)-ages[which(names(ages)%in%desn)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn)]),
           col="aquamarine3",lwd=3)

points(max(ages)-ages[which(names(ages)%in%desn2)],phen2[which(names(phen2)%in%desn2)],
       xlim=c(max(ages),min(ages)),
       pch=21,bg="aquamarine",cex=1.5)
ablineclip(lm(phen2[which(names(phen2)%in%desn2)]~I(max(ages)-ages[which(names(ages)%in%desn2)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn2)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn2)]),
           col="black",lwd=4.5)
ablineclip(lm(phen2[which(names(phen2)%in%desn2)]~I(max(ages)-ages[which(names(ages)%in%desn2)])),
           x1=max(max(ages)-ages[which(names(ages)%in%desn2)]),
           x2=min(max(ages)-ages[which(names(ages)%in%desn2)]),
           col="aquamarine",lwd=3)
legend("topleft",legend=c("node 225","node 319","entire tree"),fill=c("aquamarine3","aquamarine","gray20"),bty="n",x.intersp = .5)

cbind(data.frame(node=names(STrates[[5]]),do.call(rbind,STrates[[5]])),
      data.frame(node=names(STphen[[4]]),do.call(rbind,STphen[[4]])))->res2
```
```{r,eval=FALSE,message=FALSE,warning=FALSE}
search.trend(RR=RR,y=y,node=c(225,319))->ST
```
```{r,echo=FALSE,message=FALSE,warning=FALSE}
rownames(res2)<-NULL
knitr::kable(res2,digits=3,align="c") %>%
kable_styling(full_width = F, position = "center") %>%
column_spec(6, border_right = TRUE) %>%
add_header_above(c("Trend in absolute rates" = 6, "Trend in phenotypic means" = 5))

```

Same as for the entire tree, `search.trend` returns the results for phenotypic (`$node.phenotypic.regression`) and absolute rate (`$node.rate.regression`) regressions over time at each node under testing. For trends in absolute rates, significance level for both differences in estimated marginal means (**emm.difference** and **p.emm**) and regression slopes (**slope.node**,**slope.others**, and **p.slope**) between each clade and the rest of the tree are reported. Results for trend in phenotypic means through each node include significance for the comparison of the real slope to the random slopes (**slope** and **p.slope**), and the significance for the difference in estimated marginal means between each clade and the rest of the tree (**emm.difference** and **p.emm**).

In the example above, the absolute rates versus age regression through the clade subtended by node 225 (just node 225 from now on) produced significant and negative difference in the estimated marginal means of the clade as compared to the rest of the tree (**p.emm** < 0.05) but no significant difference in regression slopes (**p.slope** > 0.05). Similarly, the estimated marginal means of regression through node 319 are significantly lower than the rest of the tree (**emm.difference** < 0 and **p.emm** < 0.05), and the difference in slopes is negative and significant (**p.slope** < 0.05). As for the trend in phenotypic means, this is positive and significantly higher than BM for both clades (**slope** > 0 and **p.slope** > 0.975). Both clades also show significantly larger estimated marginal means of phenotypes versus age regression than the rest of the tree (**emm.difference** > 0 and **p.emm** < 0.05). 

Please note, the dark gray line in both figures represents the whole tree, inclusive of both clades under testing.

```{r,echo=FALSE,message=FALSE,warning=FALSE}
knitr::kable(STrates[[6]][[2]],digits=3,align="c") %>%
kable_styling(full_width = F, position = "center") %>%
add_header_above(c("Comparison of trends in absolute rates" = 7))

knitr::kable(STphen[[6]][[1]],digits=3,align="c") %>%
kable_styling(full_width = F, position = "center") %>%
add_header_above(c("Comparison of trends in phenotypic means" = 8))

```

If more than one clade is indicated, the results also include pairwise comparison of trends between clades (`$group.comparison`). The comparison of trends in absolute rates is performed in the same way as the comparison of a single clade against the rest of the tree. Thus, results consist of differences in both estimated marginal means (**emm.difference**) and slopes (**slope.group_1**,**slope.group_2**) between clades with respective p.values (**p.emm** and **p.slope** respectively). For the comparison of phenotypic trends between clades, the differences in regression slopes (**slope.difference**) and estimated marginal means (**emm.difference**) between clades are returned. In this case the p-value for the slope difference (**p.slope**) is computed through randomization.

In the example above, the comparison of trends in absolute rates at nodes 225 and 319 is not significant for either slope or estimated marginal means difference (both **p.emm** and **p.slope** > 0.05). On the other hand, trends in phenotypic means significantly differ in estimate marginal means (**p.emm** < 0.05), but not in slope (**p.slope** > 0.05).

## Multivariate data and multiple RRphylo output{#multi}
When applied on multivariate data, `search.trend` treats each phenotypic and rate (derived by `RRphylo`) component independently. Also, it performs the whole set of analyses on a multivariate vector of phenotypes (rates), derived by computing the L2-norm of the individual phenotypes (rates) at each branch (the same as in multivariate RRphylo).

When applying `search.trend` on a [multiple RRphylo](RRphylo.html#predictor) output, the possible effect of an additional predictor is incorporated in the computation

The [multiple regression version of RRphylo](RRphylo.html#predictor) allows to incorporate the effect of an additional predictor in the computation of evolutionary rates without altering the ancestral character estimation. Thus, when a multiple `RRphylo` output is fed to `search.trend`, the predictor effect is accounted for on the absolute evolutionary rates, but not on the phenotype. However, in some situations the user might want to ‘factor out’ the predictor effect on phenotypes as well. Under the latter circumstance, by setting the argument `x1.residuals = TRUE`, the residuals of the response to predictor regression are used as to represent the phenotype.
 
## Guided examples {#examples}

```{r,out.width='100%',fig.dim=c(8,7),message=FALSE,dpi=220}
require(ape)
# load the RRphylo example dataset including Cetaceans tree and data
data("DataCetaceans")
DataCetaceans$treecet->treecet # phylogenetic tree
DataCetaceans$masscet->masscet # logged body mass data
DataCetaceans$brainmasscet->brainmasscet # logged brain mass data
DataCetaceans$aceMyst->aceMyst # known phenotypic value for the most recent 
                               # common ancestor of Mysticeti


par(mar=c(0,0,0,1))
plot(ladderize(treecet,FALSE),show.tip.label = FALSE,edge.color = "gray40")
plotinfo<-get("last_plot.phylo",envir =ape::.PlotPhyloEnv)
nodelabels(text="",node=128,frame="circle",bg="red",cex=0.5)
nodelabels(text="Mystacodon",node=128,frame="n",bg="w",cex=1,adj=c(-0.1,0.5),font=2)
range(plotinfo$yy[which(treecet$tip.label%in%tips(treecet,128))])->yran128
rect(plotinfo$x.lim[2]+0.4,yran128[1],plotinfo$x.lim[2]+0.7,yran128[2],col="red",border="red")
range(plotinfo$yy[which(treecet$tip.label%in%tips(treecet,142))])->yran142
rect(plotinfo$x.lim[2]+0.4,yran142[1],plotinfo$x.lim[2]+0.7,yran142[2],col="blue",border="blue")
mtext(c("Mysticeti","Odontoceti"), side = 4,line=-0.5,at=c(sum(yran128)/2,sum(yran142)/2),
      cex=1.5,adj=0.5,col=c("red","blue"))
```

```{r,eval=FALSE,message=FALSE,dpi=220}
# check the order of your data: best if data vectors
# are sorted in the same order of the species on the phylogeny
masscet[match(treecet$tip.label,names(masscet))]->masscet

## Example 1: search.trend by setting values at internal nodes
# Set the body mass of Mysticetes ancestor (Mystacodon selenensis) 
# as known value at node and perform RRphylo on the vector of (log) body mass
RRphylo(tree=treecet,y=masscet,aces=aceMyst)->RR

# Perform search.trend on the RR object and (log) body mass by indicating Mysticeti as focal clade
search.trend(RR=RR,y=masscet,node=as.numeric(names(aceMyst)))


## Example 2: search.trend on multiple regression version of RRphylo
# cross-reference the phylogenetic tree and body and brain mass data. Remove from both the tree and
# vector of body sizes the species whose brain size is missing
drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi

# check the order of your data: best if
# data vectors (i.e. masscet and brainmasscet) are sorted
# in the same order of the species on the phylogeny
masscet.multi[match(treecet.multi$tip.label,names(masscet.multi))]->masscet.multi
brainmasscet[match(treecet.multi$tip.label,names(brainmasscet))]->brainmasscet

# perform RRphylo on tree and body mass data
RRphylo(tree=treecet.multi,y=masscet.multi)->RRmass.multi

# create the predictor vector: retrieve the ancestral character estimates 
# of body size at internal nodes from the RR object ($aces) and collate them
# to the vector of species' body sizes to create
c(RRmass.multi$aces[,1],masscet.multi)->x1.mass

# perform the multiple regression version of RRphylo by using
# the brain size as variable and the body size as predictor
RRphylo(treecet.multi,y=brainmasscet,x1=x1.mass)->RRmulti

# Perform search.trend on the multiple RR object to inspect the effect of body 
# size on absolute rates temporal trend only
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,clus=cc)

# Perform search.trend on the multiple RR object to inspect the effect of body
# size on trends in both absolute evolutionary rates and phenotypic values 
# (by using brain versus body mass regression residuals)
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,x1.residuals=TRUE,
             clus=cc)

```


## References
Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019). Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PloS one, 14: e0210101. doi.org/10.1371/journal.pone.0210101

Lenth, R. (2020). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.4.4. https://CRAN.R-project.org/package=emmeans