Semi-parametric G×E Interaction via Bayesian Variable Selection

Many complex diseases are known to be affected by interactions between genetic variants and environmental exposures, beyond the main genetic and environmental effects. Existing Bayesian methods for gene-environment (G×E) interaction studies are challenged by the high dimensionality of the data and the complexity of environmental influences. We have developed a novel and powerful semi-parametric Bayesian variable selection method that accommodates linear and nonlinear G×E interactions simultaneously (Ren et al., 2019). Furthermore, the proposed method conducts structural identification within the Bayesian framework by distinguishing nonlinear interactions from the main-effects-only case. Spike-and-slab priors are imposed at both the individual and group levels to shrink the coefficients of irrelevant main and interaction effects to exactly zero. The Markov chain Monte Carlo (MCMC) algorithms for the proposed and alternative methods are implemented efficiently in C++.
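To make the "exactly zero" shrinkage concrete, here is a minimal base-R sketch of drawing coefficients from a spike-and-slab mixture. It is only an illustration of the general idea, not the prior hierarchy used inside the package; the names `p`, `pi0`, `tau` and `q` and their values are assumptions chosen for the example.

```r
# Illustrative spike-and-slab draw (NOT the package's actual prior hierarchy).
set.seed(1)
p   <- 1000  # number of coefficients (hypothetical)
pi0 <- 0.1   # prior inclusion probability (hypothetical)
tau <- 2     # standard deviation of the normal slab (hypothetical)

# Individual level: gamma_j = 1 selects the slab, gamma_j = 0 the point mass at zero
gamma <- rbinom(p, size = 1, prob = pi0)
beta  <- gamma * rnorm(p, mean = 0, sd = tau)

# Coefficients assigned to the spike are exactly zero, not merely small
n_zero    <- sum(beta == 0)
n_nonzero <- sum(beta != 0)
cat("exact zeros:", n_zero, " nonzero:", n_nonzero, "\n")

# Group level: a single indicator switches a whole block of q coefficients on or off
q          <- 5
g_gamma    <- rbinom(1, size = 1, prob = pi0)
beta_group <- g_gamma * rnorm(q, mean = 0, sd = tau)
```

In this sketch, an individual-level indicator removes a single effect, while a group-level indicator removes a block of coefficients together.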

- BVCfit() integrates five different models for G×E Bayesian variable selection.
- Generic functions BVSelection(), predict() and plot() keep the workflow simple (see ‘Examples’).
- Highly efficient C++ implementation of the MCMC algorithms.

- To install from GitHub, run these two lines of code in R:

```
install.packages("devtools")
devtools::install_github("jrhub/spinBayes")
```

- Released versions of spinBayes are available on CRAN and can be installed within R via

`install.packages("spinBayes")`
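If the installation step is part of a script, it can be convenient to install only when the package is missing. The following is a small sketch using base R; `is_installed()` and `ensure_installed()` are hypothetical helper names introduced here, not functions from spinBayes.

```r
# Hypothetical helpers for conditional installation (not part of spinBayes).

# TRUE if the package's namespace can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Install from CRAN only when the package is not yet available
ensure_installed <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg)
  }
  invisible(is_installed(pkg))
}
```

Calling `ensure_installed("spinBayes")` then leaves an existing installation untouched.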

```
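library(spinBayes)
data(gExp.L)
# hold out a random fifth of the samples as a test set
test = sample((1:nrow(X2)), floor(nrow(X2)/5))
# fit the default model on the training samples
spbayes = BVCfit(X2[-test,], Y2[-test,], Z2[-test,], E2[-test,], clin2[-test,])
spbayes
# identify the selected effects
selected = BVSelection(spbayes)
selected
# predict on the held-out samples and report the prediction error
pred = predict(spbayes, X2[test,], Z2[test,], E2[test,], clin2[test,], Y2[test,])
pred$pmse
# c(pred$y.pred)
## plot the varying effects
plot(spbayes)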
```
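The example above draws a random hold-out set and reports `pred$pmse`. Reading `pmse` as the mean squared prediction error on the held-out rows is an assumption based on its name. The sketch below shows that quantity on simulated data, with `lm()` as a stand-in model rather than BVCfit(), and adds `set.seed()` so the random split is reproducible. All data and variable names here are made up for the illustration.

```r
# Hold-out prediction error on simulated data; lm() is a stand-in for the fitted model.
set.seed(123)                       # makes the random split reproducible
n   <- 200
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Same split rule as in the example above: one fifth of the rows are held out
test <- sample(seq_len(n), floor(n / 5))

# Fit on the training rows, predict on the held-out rows
fit    <- lm(y ~ x, data = dat[-test, ])
y_pred <- predict(fit, newdata = dat[test, ])

# Mean squared difference between observed and predicted held-out responses
pmse <- mean((dat$y[test] - y_pred)^2)
pmse
```

Without a fixed seed, each run holds out different rows, so the reported error varies from run to run.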

```
# same workflow without structural identification (structural=FALSE)
data(gExp.L)
test = sample((1:nrow(X2)), floor(nrow(X2)/5))
spbayes = BVCfit(X2[-test,], Y2[-test,], Z2[-test,], E2[-test,], clin2[-test,], structural=FALSE)
spbayes
selected = BVSelection(spbayes)
selected
pred = predict(spbayes, X2[test,], Z2[test,], E2[test,], clin2[test,], Y2[test,])
pred$pmse
# c(pred$y.pred)
```

```
# non-sparse variant: structural=TRUE, sparse=FALSE
data(gExp.L)
test = sample((1:nrow(X2)), floor(nrow(X2)/5))
spbayes = BVCfit(X2[-test,], Y2[-test,], Z2[-test,], E2[-test,], clin2[-test,], structural=TRUE, sparse=FALSE)
spbayes
selected = BVSelection(spbayes)
selected
pred = predict(spbayes, X2[test,], Z2[test,], E2[test,], clin2[test,], Y2[test,])
pred$pmse
# c(pred$y.pred)
```

- Added a generic function plot() for plotting identified varying effects.
- Updated the documentation.

This package implements the methods proposed in

- Ren, J., Zhou, F., Li, X., Chen, Q., Zhang, H., Ma, S., Jiang, Y., Wu, C. (2019) Semi-parametric Bayesian variable selection for gene-environment interactions.
*Statistics in Medicine*, 39: 617–638. https://doi.org/10.1002/sim.8434