--- title: "Simulating Tracks" author: "Inge Wortel" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Simulating Tracks} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(dpi=72) ``` # Introduction Simulating artifical tracks using mathematical or statistical models can help interpret the data from a tracking experiment. This tutorial will explain how to use simulation methods provided with the package. The simulation methods provided in the package are: - `brownianTrack()` : a random walk in n dimensions. - `beaucheminTrack()` : a special version of a random walk model specifically designed to describe T cell motion in an uninflamed lymph node. - `bootstrapTrack()` : a data-driven simulation, where step speeds and turning angles are sampled from those observed in experimental data of interest. # Datasets First load the package: ```{r pack, warning = FALSE, message = FALSE} library( celltrackR ) library( ggplot2 ) ``` ```{r savepar, echo = FALSE} # Save current par() settings oldpar <- par( no.readonly =TRUE ) ``` The package contains a dataset of T cells imaged in a mouse cervical lymph node using two-photon microscopy. We will here use this dataset as an example of how we can compare real tracking data with data from a simulation model. While the original data contained 3D coordinates, we'll use the 2D projection on the XY plane (see the vignette on [preprocessing of the package datasets](./data-QC.html) for details). The dataset consists of 199 tracks of individual cells in a tracks object: ```{r Tdata} str( TCells, list.len = 3 ) ``` Each element in this list is a track from a single cell, consisting of a matrix with $(x,y)$ coordinates and the corresponding measurement timepoints: ```{r showTdata} head( TCells[[1]] ) ``` # 1 Modelling brownian motion ### 1.1 A simple random walk We can use the function `brownianTrack()` to simulate a simple random walk (here in 3 dimensions): ```{r brownian1} brownian <- brownianTrack( nsteps = 20, dim = 3 ) str(brownian) ``` Note that this returns a single track matrix, which we can turn into a track object by using `wrapTrack()`: ```{r plotSingleBrownian} plot( wrapTrack(brownian) ) ``` We can also simulate multiple tracks at once using the `simulateTracks()` function. Do this and compare to the T cell tracks: ```{r plotBrownian, fig.width = 7 } brownian.tracks <- simulateTracks( 10, brownianTrack( nsteps = 20, dim = 3 ) ) par(mfrow=c(1,2)) plot( brownian.tracks, main = "simulated random walk" ) plot( normalizeTracks( TCells ), main = "real T cell data" ) ``` ### 1.2 Matching displacement to data In this simulation, the displacement vector for each step is sampled randomly in each dimension from a normal distribution of $\mu=0, \sigma=1$. To match the average displacement observed in the T cell data: ```{r matchDisp, fig.width = 7 } # get displacement vectors step.displacements <- t( sapply( subtracks(TCells,1), displacementVector ) ) # get mean and sd of displacement in each dimension step.means <- apply( step.displacements, 2, mean ) step.sd <- apply( step.displacements, 2, sd ) # simulate brownian motion with the same statistics brownian.tracks.matched <- simulateTracks( 10, brownianTrack( nsteps = 20, dim = 3, mean = step.means, sd = step.sd ) ) # compare displacement distributions data.displacement <- sapply( subtracks( TCells,1), displacement ) matched.displacement <- sapply( subtracks( brownian.tracks.matched, 1 ), displacement ) df <- data.frame( disp = data.displacement, data = "TCells" ) df2 <- data.frame( disp = matched.displacement, data = "model" ) df <- rbind( df, df2 ) ggplot( df, aes( x = data, y = disp ) ) + geom_boxplot() + theme_classic() # Plot new simulation versus real data par(mfrow=c(1,2)) plot( brownian.tracks.matched, main = "simulated random walk" ) plot( normalizeTracks( TCells ), main = "real T cell data" ) ``` > Note: this is not the best method for matching a Brownian model to data; the best method would be to use the diffusion coefficient as estimated using Furth's equation. An example of this method is given later in this vignette. ### 1.3 A biased random walk We can bias movement in a certain direction by setting the mean in each dimension: ```{r biasedWalk} # simulate brownian motion with bias brownian.tracks.bias <- simulateTracks( 10, brownianTrack( nsteps = 20, dim = 3, mean = c(1,1,1) ) ) plot( brownian.tracks.bias, main = "biased random walk" ) ``` # 2 The Beauchemin model of lymphocyte migration Aside from a simple random walk, the package also implements a slightly different model proposed by [Beauchemin et al (2007)](https://pubmed.ncbi.nlm.nih.gov/17442932/). In contrast to the simple random walk, this model has a tuneable amount of persistence, and the cell pauses briefly between each step as it needs to reorient itself, which takes some time. For these reasons, the Beauchemin model generates tracks that are a bit more realistic than a simple random walk. ```{r beauchemin} beauchemin.tracks <- simulateTracks( 10, beaucheminTrack(sim.time=20) ) plot( beauchemin.tracks ) ``` See `?beaucheminTrack` for further details. # 3 Bootstrapping method for simulating migration The final simulation method provided in the package is `bootstrapTrack()`, which does not assume an underlying migration model but samples speeds and turning angles from a real dataset. For example, we can do this for the T cell data: ```{r bootstrap} bootstrap.tracks <- simulateTracks( 10, bootstrapTrack( nsteps = 20, TCells ) ) plot( bootstrap.tracks ) ``` If all is as should be, the distribution of speed and turning angles should now match the real data closely. Check this: ```{r distributionComp, fig.width = 7, warning = FALSE, message = FALSE } # Simulate more tracks to reduce noice bootstrap.tracks <- simulateTracks( 50, bootstrapTrack( nsteps = 20, TCells ) ) # Compare step speeds in real data to those in bootstrap data real.speeds <- sapply( subtracks( TCells,1 ), speed ) bootstrap.speeds <- sapply( subtracks( bootstrap.tracks,1), speed ) dspeed <- data.frame( tracks = c( rep( "data", length( real.speeds ) ), rep( "bootstrap", length( bootstrap.speeds ) ) ), speed = c( real.speeds, bootstrap.speeds ) ) # Same for turning angles real.angles <- sapply( subtracks( TCells,2 ), overallAngle, degrees = TRUE ) bootstrap.angles <- sapply( subtracks( bootstrap.tracks,2), overallAngle, degrees = TRUE ) dangle <- data.frame( tracks = c( rep( "data", length( real.angles ) ), rep( "bootstrap", length( bootstrap.angles ) ) ), angle = c( real.angles, bootstrap.angles ) ) # plot pspeed <- ggplot( dspeed, aes( x = tracks, y = speed ) ) + geom_violin( color = NA, fill = "gray" ) + geom_boxplot( width = 0.3 ) + theme_classic() pangle <- ggplot( dangle, aes( x = tracks, y = angle ) ) + geom_violin( color = NA, fill = "gray" ) + geom_boxplot( width = 0.3 ) + theme_classic() gridExtra::grid.arrange( pspeed, pangle, ncol = 2 ) ``` # 4 Example: Comparing data with models ### 4.1 Mean square displacement plot To compare two different models with the real T cell data, we make a mean square displacement plot. To remove the effect of noise, we simulate a greater number of tracks than before: ```{r msdComp, fig.width=6} # Simulate more tracks brownian.tracks <- simulateTracks( 50, brownianTrack( nsteps = 20, dim = 3, mean = step.means, sd = step.sd ) ) bootstrap.tracks <- simulateTracks( 50, bootstrapTrack( nsteps = 20, TCells ) ) msd.data <- aggregate( TCells, squareDisplacement, FUN = "mean.se" ) msd.data$data <- "data" msd.brownian <- aggregate( brownian.tracks, squareDisplacement, FUN = "mean.se" ) msd.brownian$data <- "brownian" msd.bootstrap <- aggregate( bootstrap.tracks, squareDisplacement, FUN = "mean.se" ) msd.bootstrap$data <-"bootstrap" msd <- rbind( msd.data, msd.brownian, msd.bootstrap ) ggplot( msd, aes( x = i, y = mean, ymin = lower, ymax = upper, color = data, fill = data ) ) + geom_ribbon( color= NA, alpha = 0.2 ) + geom_line() + labs( x = "t (steps)", y = "square displacement" ) + scale_x_log10(limits= c(NA,10) ) + scale_y_log10() + theme_bw() ``` Both the random walk model and the bootstrapped tracks underestimate the MSD as observed in real data -- even though the bootstrapped data at least matches speed and turning angle distributions well. This suggests there is more directional persistence in the real data than those models assume. In the next section, we will check this by making the autocorrelation plot. ### 4.2 Persistence: autocovariance plot To check for directional persistence, we generate an autocovariance plot: ```{r acovComp, fig.width=6} # compute autocorrelation acor.data <- aggregate( TCells, overallDot, FUN = "mean.se" ) acor.data$data <- "data" acor.brownian <- aggregate( brownian.tracks, overallDot, FUN = "mean.se" ) acor.brownian$data <- "brownian" acor.bootstrap <- aggregate( bootstrap.tracks, overallDot, FUN = "mean.se" ) acor.bootstrap$data <-"bootstrap" acor <- rbind( acor.data, acor.brownian, acor.bootstrap ) ggplot( acor, aes( x = i, y = mean, ymin = lower, ymax = upper, color = data, fill = data ) ) + geom_ribbon( color= NA, alpha = 0.2 ) + geom_line() + labs( x = "dt (steps)", y = "autocovariance" ) + scale_x_continuous(limits= c(0,10) ) + theme_bw() ``` Indeed, the autocovariance drops less steeply for the real T cell data, which indicates that there is a directional persistence in the T cell data that is not captured by the brownian and bootstrapping models. Thus, even when a model captures some aspects of the walk statistics in the data, it may still behave differently in other respects. # 5 Fitting models on the MSD In the above, we saw that a Brownian motion model underestimated the MSD of the real T-cell dataset. Here, we show how we can do better by fitting models on the MSD. ### 5.1 Before we start: dimensionality and imaging windows Before we start, let's look at the data we are fitting in more detail. If we look at the dimensions of the T-cell dataset: ```{r bb} boundingBox( TCells ) ``` we see that the imaging window is limited to roughly 200 x 200 x 50 $\mu$m; thus, it is smaller in the z-dimension. A limited imaging window like this can cause artefacts in the MSD curve because cells can not be followed after they leave the imaging window. Thus, for larger $\Delta t$, we underestimate the mean (squared) displacement because the faster cells have left the imaging window and do not end up in our data. In our case, this will be particularly problematic in the z-dimension. Thus, for now we will not fit the 3D data, but instead use the 2D projection on the xy-plane: ```{r project} Txy <- projectDimensions( TCells, c("x","y") ) ``` Let's look at the MSD curve we get: ```{r msdXY, fig.width = 6} # Compute MSD msdData <- aggregate( Txy, squareDisplacement, FUN = "mean.se" ) # Scale time axis: by default, this is in number of steps; convert to minutes. tau <- timeStep( Txy ) / 60 # in minutes msdData$dt <- msdData$i * tau # plot ggplot( msdData, aes( x = dt, y = mean ) ) + geom_ribbon( alpha = 0.3, aes( ymin = lower, ymax = upper ) ) + geom_line() + labs( x = expression( Delta*"t (min)" ), y = expression( "displacement"^2*"("*mu*"m"^2*")") ) + coord_cartesian( xlim = c(0,NA), ylim = c(0,NA), expand = FALSE ) + theme_bw() ``` Two things are worth noting here: 1. the larger $\Delta$t, the higher the uncertainty (SE); this is because at larger $\Delta$t, we have fewer subtracks of that length, and thus less data to compute a mean on; 2. the MSD curve has a weird shape. This is actually the case because of the limited imaging window problem described above. We will therefore use only the first part of the curve (the first 5 minutes), where we have enough data and the imaging window problem remains manageable. ```{r fitThreshold} fitThreshold <- 5 # minutes ``` Now, let's see how we can fit the two simulation models: `brownianTrack` and `beaucheminTrack` on this MSD curve. ### 5.2 Fitting Brownian motion based on the diffusion coefficient The `brownianTrack` function takes 4 arguments: the number of steps to simulate, the number of dimensions to simulate in, and the mean and sd of the step sizes $(\Delta x, \Delta y, \Delta z) in each dimension. The number of steps depends on the time we wish to simulate, the dimension is 2 (since we projected the T cells on the xy plane) and the mean of an unbiased random walk is always 0. That leaves the standard deviation. For Brownian motion, the dimension-wise variance of the step size is related to the "diffusion coefficient" $D$ as: $$\text{var} = 2D\tau$$ with $\tau$ the step duration. For the standard deviation, we get: $$\sigma_\text{dim} = \sqrt{2D\tau}$$ we already know the $\tau$ of our data, and can obtain $D$ by fitting Furth's formula for the mean squared displacement $\langle d^2 \rangle$: $$\langle d^2 \rangle (\Delta t) = 2Dn_d \big( \Delta t - P(1-e^{-\Delta t/P} ) )$$ Here, $D$ is the diffusion coefficient, $P$ the persistence time, $\Delta t$ the time resolution we are looking at, and $n_d$ the number of dimensions. We can fit this function using R's `nls()` on the MSD data to get $D$ and $P$: ```{r furthFit} fuerthMSD <- function( dt, D, P, dim ){ return( 2*dim*D*( dt - P*(1-exp(-dt/P) ) ) ) } # Fit this function using nls. We fit only on the data where # dt < fitThreshold (see above), and need to provide reasonable starting # values or the fitting algorithm will not work properly. model <- nls( mean ~ fuerthMSD( dt, D, P, dim = 2 ), data = msdData[ msdData$dt < fitThreshold, ], start = list( D = 10, P = 0.5 ), lower = list( D = 0.001, P = 0.001 ), algorithm = "port" ) D <- coefficients(model)[["D"]] # this is now in units of um^2/min P <- coefficients(model)[["P"]] # persistence time in minutes D P ``` Note that we will not use $P$. However, we fit Furth's formula with a non-zero persistence time $P$ because we would otherwise underestimate $D$. Now, given $D$, simulate some tracks: ```{r fittedBrownian, fig.width = 5, fig.height = 5} # simulate tracks using the estimated D. We simulate with time unit seconds # to match the data, so divide D by 60 to go from um^2/min to um^2/sec. # The dimension-wise variance of brownian motion is 2*D*tau (with tau the step # duration), so use this to parametrise the model: tau <- timeStep( Txy ) brownianScaled <- function( nsteps, D, tau, dim = 2 ){ tr <- brownianTrack( nsteps = nsteps, dim = 2, sd = sqrt( 2*(D/60)*tau ) ) tr[,"t"] <- tr[,"t"]*tau # scale time return(tr) } brownian.tracks <- simulateTracks( 1000, brownianScaled( nsteps = maxTrackLength(Txy ), D = D, tau = tau ) ) plot( brownian.tracks[1:10], main = "brownian motion") ``` Finally, get MSD and compare it to data: ```{r msdFittedBrownian, fig.width = 6} # get msd of the simulated tracks and scale time to minutes msdBrownian <- aggregate( brownian.tracks, squareDisplacement, FUN = "mean.se" ) tau <- timeStep( brownian.tracks ) / 60 # in minutes msdBrownian$dt <- msdBrownian$i * tau # Get Furth MSD using fitted P and D for comparison msdFuerth <- data.frame( dt = msdBrownian$dt, mean = fuerthMSD( msdBrownian$dt, D, P, dim = 2 )) # plot ggplot( msdBrownian, aes( x = dt, y = mean) ) + geom_point( data = msdData, shape = 21, aes( fill = dt < fitThreshold ), color = "blue", show.legend = FALSE ) + geom_line( data = msdFuerth, color = "red", lty = 2 ) + geom_line( )+ scale_fill_manual( values = c( "TRUE" = "blue", "FALSE" = "transparent" ) ) + scale_x_log10( expand = c(0,0) ) + scale_y_log10( expand = c(0,0) ) + labs( color = NULL, x = expression( Delta*"t (min)"), y = expression( "displacement"^2*"("*mu*"m"^2*")") , fill = NULL ) + theme_bw() ``` Here, we see the data in blue, where the colored circles are the data the fit is based on. The red line is the Furth's MSD, the black line is the Brownian model. The Brownian model does not describe the data well especially on short time scales, because it does not contain the persistence that the Furth formula does. Nevertheless, this fit is better than the one with matched displacements on short time scales. Next, let's look at a slightly more complicated model. ### 5.3 Fitting the Beauchemin model We now use a similar approach to fit the model proposed by [Beauchemin et al (2007)](https://pubmed.ncbi.nlm.nih.gov/17442932/), which was designed for T-cell migration in the lymph node. The MSD of this model can be derived analytically [(Textor et al (2013), proposition 12)](https://doi.org/10.1186/1471-2105-14-S6-S10) : $$\langle d^2 \rangle (\Delta t)= 2Mt - 2Mt_\text{free} \cdot \begin{cases} \frac{1}{3} & t \geq t_\text{free} \\ \frac{1}{3} \big(t/t_\text{free}\big)^3 - \big(t/t_\text{free}\big)^2 + \big(t/t_\text{free}\big) & t < t_\text{free} \end{cases}$$ where $M$ is the motility coefficient, which should be the same as our diffusion coefficient $D$ (see above). $M$ is related to the model parameters [(Textor et al (2013), equation 2)](https://doi.org/10.1186/1471-2105-14-S6-S10): $$M = \frac{v_\text{free}^2\cdot t_\text{free}^2}{6(t_\text{free}+t_\text{pause})}$$ We should note here that the Beauchemin has three parameters ($v_\text{free}$, $t_\text{free}$, $t_\text{pause}$), but that these are underdefined based on MSD: multiple parameter combinations can give the exact same MSD (see [Beauchemin et al (2007)](https://pubmed.ncbi.nlm.nih.gov/17442932/), [Textor et al (2013)](https://doi.org/10.1186/1471-2105-14-S6-S10)). This means we cannot fit all three parameters based on the MSD. Here, we'll set $t_\text{pause}$ to the literature value of 0.5 minute, and fit the other parameters $v_\text{free}$ and $t_\text{free}$ based on the MSD to show how this works: ```{r fittedBeauchemin} # analytical formula beauchemin.msd <- function( x, M=60, t.free=2, t.pause=0.5, dim=2 ){ v.free <- sqrt( 6 * M * (t.free+t.pause) ) / t.free M <- (v.free^2 * t.free^2) / 6 / (t.free + t.pause) multiplier <- rep(1/3, length(x)) xg <- x[x