---
title: "Reading, Converting, and Filtering Tracking Data"
author: "Inge Wortel"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reading, Converting, and Filtering Tracking Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(dpi=72)
```

# Introduction

The package implements a special data structure, the tracks object, to allow rapid computation of different analysis metrics on cell tracks. This tutorial will show how to load tracking data, how to deal with tracking objects, how to filter and subset data, and how to convert between track objects and other datastructures. 

# 1 Reading in data

First load the package:

```{r pack, warning = FALSE, message = FALSE}
library( celltrackR )
library( ggplot2 )
```

```{r, echo = FALSE}
# Save current par() settings
oldpar <- par( no.readonly =TRUE )
```


### 1.1 Input data format

Tracking data is usually stored as a table, with columns indicating the cellid, time, and coordinates of each measured point. Here we have an example in the file "t-cells.txt", which we can read in as a normal dataframe:

```{r}
d <- read.table( system.file("extdata", "t-cells.txt", package="celltrackR" ) )
str(d)
head(d)
```

The result is a normal dataframe, where here we have an index for the time point in the first column, cell id in the second column, the actual time (in seconds) in the third column, and $(x,y,z)$ coordinates in columns 4:6. 


# 1.2 Directly reading in data as a tracks object

While we can read tracks as a dataframe by using R's basic function `read.table()`, the function `read.tracks.csv()` allows to read in data directly as a *tracks object*, a special data structure designed for efficient handling of tracking data.

Applying this to the same file as before:

```{r fig.width=6, fig.height = 6}
t <- read.tracks.csv( system.file("extdata", "t-cells.txt", package="celltrackR" ), 
		      header = FALSE, 
                      id.column = 2, time.column = 3, pos.columns = 4:6 )
plot(t)
```

where we have to specify `header=FALSE` because the file does not contain any column headers. Note that `read.tracks.csv()` also works with non-csv text files, as long as the data is organised with separate columns for track id, time index, and coordinates. See the documentation at `?read.tracks.csv` for details.

These tracks are of T cells imaged in the cervical lymph node of a healthy mouse; they are the raw data from which the `TCells` dataset in the package was obtained. See the [vignette on preprocessing the package datasets](./data-QC.html) for details.


# 2 The tracks object


### 2.1 The tracks object data structure

The *tracks object* is a special datastructure that allows efficient handling of track datasets. As an example, we will use the tracks loaded in the previous section.

A tracks object has the form of a list, where each element of the list is a track of a single cell:

```{r}
# Structure of the TCells object
str( t, list.len = 3 )

# This object is both a list and a "tracks" object
is.list( t )
is.tracks( t )

# The first element is the track of the first cell in the data:
head( t[[1]] )
```

Each track in the tracks object is a matrix with coordinates at different timepoints for each cell. The cell id is no longer a column in this matrix, as tracks belonging to different cells are stored in different elements of the tracks object list. 

### 2.2 Subsetting data

Note that we can subset the track matrix of an individual track using the double square brackets:

```{r}
# Get the first track
t1 <- t[[1]]
str(t1)

# This is no longer a tracks object, but a matrix
is.tracks( t1 )
is.matrix( t1 )
```

If we now want to plot this track, the plotting method for tracks will not work because this is not recognized as a tracks object. We can use the frunction `wrapTrack()` to "pack" this matrix back into a tracks object:

```{r fig.width =7}
par( mfrow=c(1,2) )
plot( t1, main = "Plotting matrix directly" )
plot( wrapTrack( t1 ), main = "After using wrapTrack()" )
```

Note that we can also achieve this by subsetting with single instead of double brackets:

```{r}
# Get the first track
t1b <- t[1]
str(t1b)

# This remains a track object
is.tracks( t1b )
```

In the same way, we can also subset multiple tracks at once

```{r}
# Get the first and the third track
t13 <- t[c(1,3)]
str(t13)
```

Note that the track ids are strings that do not always correspond to the index of the track in the dataset. If we want the ones with ids 1 and 3, we can subset using the track name as a character string:

```{r}
# Get tracks with ids 1 and 3
t13b <- t[c("1","3")]
str(t13b)
```

### 2.3 Using tracks objects in combination with R's lapply and sapply

Because tracks objects are lists, we can make use of R's `lapply()` and `sapply()` functions to compute metrics or manipulate tracks efficiently.

For example, if we want to compute the speed of each track, we simply use:

```{r}
speeds <- sapply( t, speed )
head(speeds)
```

Note that `sapply()` applies the `speed()` function to each *matrix* in the track list (analogous to subsetting with double brackets). Thus, the `speed()` function sees an individual track matrix, not a tracks object.

Or we can use `lapply()` to manipulate each track in the dataset with some custom function, keeping separate tracks as separate list elements. For example, suppose we wish to remove all data after a given timepoint:

```{r}
# Function to remove all data after given timepoint
# x must be a single track matrix, which is what this function will
# receive from lapply
removeAfterT <- function( x, time.cutoff ){
  
  # Filter out later timepoints
  x2 <- x[ x[,"t"] <= time.cutoff, ]
  
  # Return the new matrix, or NULL if there are no timepoints before the cutoff
  if( nrow(x2) == 0 ){
    return(NULL)
  } else {
    return(x2)
  }
}

# Call function on each track using lapply
filtered.t <- lapply( t, function(x) removeAfterT( x, 200 ) )

# Remove any tracks where NULL was returned
filtered.t <- filtered.t[ !sapply( filtered.t, is.null )]
```

Note that `lapply()` returns list but not a tracks object:

```{r}
str(filtered.t, list.len = 3 )
is.list( filtered.t )
is.tracks( filtered.t )
```

We can fix this by calling `as.tracks()`.

```{r}
filtered.t <- as.tracks( filtered.t )
is.tracks( filtered.t )

str(filtered.t, list.len = 1)
```

We now have a new tracks object, which contains only tracks that had coordinates at $t<200$.

### 2.4 Built-in filtering/subsetting functions

The package contains several built-in functions to filter and subset tracks.

The function `filterTracks()` can be used to select tracks with a certain property. For example, to select all tracks with at least 15 steps (16 datapoints):

```{r}
# The filtering function must return TRUE or FALSE for each track given to it
my.filter <- function(x){
  return( nrow(x) > 15 )
}

# Filter with this function using filterTracks
long.tracks <- filterTracks( my.filter, t )

# check the minimum track length; # steps = number of coordinates minus 1
min( sapply( long.tracks, nrow ) - 1 )
```

The function `selectTracks()` selects tracks based on upper and lower bounds of a certain measure. For example, we can get the fastest half of the T cells:

```{r fig.width = 7, fig.height = 4.5 }

# Filter with this function using filterTracks
median.speed <- median( sapply( t, speed ) )
fast.tracks <- selectTracks( t, speed, median.speed, Inf )

# these should have a higher mean speed
c( "all tracks" = mean( sapply( t, speed ) ),
   "fastest half" = mean( sapply( fast.tracks, speed ) ) )
```

Another option is to filter not tracks, but timepoints within those tracks. Using the function `subsample()`, we can adjust the time resolution of the data by keeping e.g. only every $k^{th}$ timepoint:

```{r, fig.width = 7, fig.height = 3.5 }
# Lower resolution
lower.res <- subsample( t, k = 2 )

# Plot the result; plot just one track to see the result more clearly
par(mfrow=c(1,2))
plot( t[1], main = "Original data")
plot( lower.res[1], main = "Lower resolution" )
```

### 2.5 Extracting subtracks

The package also contains functions to extract parts of tracks. For example, use `subtracks()` to extract subtracks of a given length:

```{r}
subtrack.nsteps <- 2
t.2steps <- subtracks( t, subtrack.nsteps )
str( t.2steps, list.len = 3 )
```

Note that these subtracks overlap:

```{r}
# Last step of the first subtrack and first step of the second are equal
t.2steps[c(1,2)]
```

We can prevent this by adjusting the `overlap` argument to 0, or even to negative values so that space is left between the subtracks:

```{r}
t.2steps.b <- subtracks( t, subtrack.nsteps, overlap = 0 )

# No longer any overlap
t.2steps.b[c(1,2)]
```

An alternative to `subtracks()` is `prefixes()`, which returns only the first subtrack of a given length from each track:

```{r}
t.prefixes <- prefixes( t, subtrack.nsteps )

# these subtracks come from different cells
t.prefixes[c(1,2)]

```

If we want to extract subtracks starting at a specific timepoint, use `subtracksByTime()`:

```{r}
# Check which timepoints occur in the dataset
tp <- timePoints(t)
tp

# Extract all subtracks starting from the third timepoint
t.sbytime <- subtracksByTime( t, tp[3], subtrack.nsteps )

t.sbytime[c(1,2)]
```


# 3 Converting between tracks objects and other data structures

We can convert between tracks, regular R lists, and dataframes using `as.tracks()`, `as.list()`, or `as.data.frame()`:

```{r}
# Original tracks object
str( t, list.len = 3 )

# Converted to dataframe
t.df <- as.data.frame(t)
str( t.df )

# Converted to list (note class at the bottom)
t.list <- as.list(t)
str( t.list, list.len = 3 )

# Convert list back to tracks
str( as.tracks( t.list ), list.len = 3 )

# Convert dataframe to tracks
str( as.tracks( t.df ), list.len = 3 )

```

Note that the method `as.tracks.data.frame()` contains arguments `id.column`, `time.column`, and `pos.columns` to specify where information is stored, just like `read.tracks.csv`.

For help, see `?as.list.tracks`, `?as.data.frame.tracks`, `?as.tracks.data.frame`, or `as.tracks.list`.

```{r, echo = FALSE}
# Reset par() settings
par(oldpar)
```