This vignette illustrates the usage of the `SNPknock` package for creating knockoffs of discrete Markov chains and hidden Markov models (Sesia, Sabatti, and Candès 2019). For simplicity, we will use synthetic data.

The `SNPknock` package also provides a simple interface to the genotype imputation software `fastPhase`, which can be used to fit hidden Markov models for genotype data. Since `fastPhase` is not available as an R package, this particular functionality of `SNPknock` cannot be demonstrated here. A tutorial showing how to use a combination of `SNPknock` and `fastPhase` to create knockoffs of genotype data can be found here.

First, we verify that the `SNPknock` package can be loaded.
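The code chunk for this step does not appear in the text. A minimal sketch of such a check is given below; the variable name `has_snpknock` is introduced here for illustration only, and the block simply reports whether the package is installed rather than failing.

```r
# Check whether SNPknock is installed before attaching it.
# `has_snpknock` is an illustrative name, not part of the package.
has_snpknock <- requireNamespace("SNPknock", quietly = TRUE)

if (has_snpknock) {
  library(SNPknock)
} else {
  message("SNPknock is not installed; the package-specific steps cannot be run.")
}
```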

We define a Markov chain model with 50 variables, each taking one of 5 possible values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries.

```
p = 50  # Number of variables in the model
K = 5   # Number of possible states for each variable
# Marginal distribution for the first variable
pInit = rep(1/K, K)
# Create p-1 transition matrices
Q = array(stats::runif((p-1)*K*K), c(p-1, K, K))
for(j in 1:(p-1)) {
  # Add the identity so that each state is most likely to persist
  Q[j,,] = Q[j,,] + diag(rep(1,K))
  # Normalize each row so that it sums to one
  Q[j,,] = Q[j,,] / rowSums(Q[j,,])
}
```
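As a sanity check on this construction, the sketch below rebuilds the transition matrices and confirms two properties that follow from the code: every row of every `Q[j,,]` sums to one, and the diagonal entry is the largest in its row (because the identity is added before normalizing). The seed and the names `row_sums` and `diag_is_row_max` are assumptions made for this illustration.

```r
set.seed(2024)  # arbitrary seed, chosen only to make the sketch reproducible

p <- 50  # number of variables
K <- 5   # number of states per variable

# Same construction as in the vignette
Q <- array(stats::runif((p - 1) * K * K), c(p - 1, K, K))
for (j in 1:(p - 1)) {
  Q[j, , ] <- Q[j, , ] + diag(rep(1, K))
  Q[j, , ] <- Q[j, , ] / rowSums(Q[j, , ])
}

# Sum over the third dimension: one row sum per (matrix, current state) pair
row_sums <- apply(Q, c(1, 2), sum)
print(range(row_sums))

# In every matrix, each diagonal entry should be the maximum of its row
diag_is_row_max <- all(sapply(1:(p - 1), function(j) {
  all(diag(Q[j, , ]) >= apply(Q[j, , ], 1, max) - 1e-12)
}))
print(diag_is_row_max)
```

With this layout, `Q[j, l, k]` is the probability of moving from state `l` at variable `j` to state `k` at variable `j+1`, with states indexed from 1 inside the array.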

We can sample 100 independent observations of this Markov chain using the `SNPknock` package and store them in a matrix `X`. The first 5 rows and 10 columns of `X` are shown below.
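The code for this step is not shown in the text. The sketch below is one plausible way to obtain such a sample: it assumes that the package function `sampleDMC(pInit, Q, n)` is the intended call, and it falls back to a hypothetical base-R helper, `sample_chain`, when `SNPknock` is not installed. The seed is arbitrary, so the values printed need not match the output shown below.

```r
set.seed(1234)  # arbitrary seed for this sketch

p <- 50
K <- 5
pInit <- rep(1 / K, K)
Q <- array(stats::runif((p - 1) * K * K), c(p - 1, K, K))
for (j in 1:(p - 1)) {
  Q[j, , ] <- Q[j, , ] + diag(rep(1, K))
  Q[j, , ] <- Q[j, , ] / rowSums(Q[j, , ])
}

# Hypothetical base-R helper (not part of SNPknock): samples n chains,
# coding the states as 0, ..., K-1 as in the printed output.
sample_chain <- function(pInit, Q, n) {
  p <- dim(Q)[1] + 1
  K <- length(pInit)
  X <- matrix(0L, n, p)
  X[, 1] <- sample.int(K, n, replace = TRUE, prob = pInit) - 1L
  for (j in 2:p) {
    for (i in 1:n) {
      # Row of Q[j-1,,] selected by the previous state (shifted to 1-based)
      X[i, j] <- sample.int(K, 1, prob = Q[j - 1, X[i, j - 1] + 1L, ]) - 1L
    }
  }
  X
}

X <- if (requireNamespace("SNPknock", quietly = TRUE)) {
  SNPknock::sampleDMC(pInit, Q, n = 100)  # assumed to be the intended call
} else {
  sample_chain(pInit, Q, n = 100)
}
print(X[1:5, 1:10])
```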

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    0    0    3    2    3    3    2    0    0     2
## [2,]    3    3    3    4    2    3    1    2    3     2
## [3,]    3    1    1    1    3    3    1    3    2     3
## [4,]    3    0    3    3    3    0    1    1    1     2
## [5,]    4    1    1    4    3    3    2    2    2     3
```

Above, each row of `X` contains an independent realization of the Markov chain.

A knockoff copy of `X` can be sampled as follows.
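The code for this step is not shown in the text. A minimal sketch follows, assuming that the package functions `sampleDMC(pInit, Q, n)` and `knockoffDMC(X, pInit, Q)` are the intended calls; the name `Xk` and the flag `ok` are introduced here for illustration. The block is skipped when `SNPknock` is not installed, and the seed is arbitrary, so the printed values need not match the output shown below.

```r
if (requireNamespace("SNPknock", quietly = TRUE)) {
  set.seed(1234)  # arbitrary seed for this sketch

  # Rebuild the model so that the block is self-contained
  p <- 50
  K <- 5
  pInit <- rep(1 / K, K)
  Q <- array(stats::runif((p - 1) * K * K), c(p - 1, K, K))
  for (j in 1:(p - 1)) {
    Q[j, , ] <- Q[j, , ] + diag(rep(1, K))
    Q[j, , ] <- Q[j, , ] / rowSums(Q[j, , ])
  }

  # Sample the original data, then its knockoff copy (assumed calls)
  X  <- SNPknock::sampleDMC(pInit, Q, n = 100)
  Xk <- SNPknock::knockoffDMC(X, pInit, Q)
  print(Xk[1:5, 1:10])

  # The knockoffs should have the same shape and state space as X
  ok <- all(dim(Xk) == dim(X)) && all(Xk %in% 0:(K - 1))
} else {
  message("SNPknock is not installed; skipping the knockoff step.")
  ok <- NA
}
```

The knockoff sampler takes the same model parameters (`pInit` and `Q`) that were used to generate `X`, since knockoffs are constructed from the known distribution of the variables.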

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    3    3    3    3    3    3    2    0    0     2
## [2,]    3    3    3    3    0    0    2    3    2     2
## [3,]    0    1    1    2    3    1    1    4    3     2
## [4,]    3    3    3    4    0    1    1    1    1     1
## [5,]    0    0    1    2    3    3    2    2    4     2
```

If you want to see how to use `SNPknock` to create knockoffs of genotype data, see the genotypes vignette.

If you want to learn about using `SNPknock` for large genome-wide association studies (Sesia et al. 2019), see https://msesia.github.io/knockoffzoom/

Sesia, M., E. Katsevich, S. Bates, E. Candès, and C. Sabatti. 2019. “Multi-Resolution Localization of Causal Variants Across the Genome.” *bioRxiv*. Cold Spring Harbor Laboratory. https://doi.org/10.1101/631390.

Sesia, M., C. Sabatti, and E. J. Candès. 2019. “Gene Hunting with Hidden Markov Model Knockoffs.” *Biometrika* 106:1–18. https://doi.org/10.1093/biomet/asy033.