Title: The T-Rex selector for fast high-dimensional variable selection with FDR control
Description: The package performs fast variable selection in large-scale high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level.
Paper: The package is based on the paper
J. Machkour, M. Muma, and D. P. Palomar, “The terminating-random experiments selector: Fast high-dimensional variable selection with false discovery rate control,” arXiv preprint arXiv:2110.06048, 2022. (https://doi.org/10.48550/arXiv.2110.06048)
Note: The T-Rex selector performs terminated-random experiments (T-Rex) using the T-LARS algorithm (provided by the ‘tlars’ R package) and fuses the selected active sets of all random experiments to obtain a final set of selected variables. The T-Rex selector provably controls the false discovery rate (FDR), i.e., the expected fraction of false positives among all selected variables, at the user-defined target level while maximizing the number of selected variables, thereby achieving a high true positive rate (TPR), i.e., power. The T-Rex selector can be applied in fields such as genomics and financial engineering, or in any other field that requires a fast, FDR-controlling variable/feature selection method for large-scale high-dimensional settings.
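For reference, the two quantities named above can be written out as follows. The symbols are notation assumed here for illustration, not taken from the package documentation: $V$ is the number of selected variables that are not truly active, $R$ is the total number of selected variables, $S$ is the number of selected true active variables, and $p_1$ is the number of true active variables.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[ \frac{V}{\max(R,\, 1)} \right],
\qquad
\mathrm{TPR} = \mathbb{E}\!\left[ \frac{S}{\max(p_1,\, 1)} \right]
```

The $\max(\cdot, 1)$ terms only guard against division by zero when nothing is selected or when there are no true active variables. Controlling the FDR at a target level $\alpha$ means $\mathrm{FDR} \le \alpha$.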
In the following sections, we show you how to install and use the package.
Before installing the ‘TRexSelector’ package, you need to install the required ‘tlars’ package. You can install the ‘tlars’ package from CRAN (stable version) or GitHub (developer version) with:
# Option 1: Install stable version from CRAN
install.packages("tlars")
# Option 2: Install developer version from GitHub
install.packages("devtools")
devtools::install_github("jasinmachkour/tlars")
Then, you can install the ‘TRexSelector’ package from CRAN (stable version) or GitHub (developer version) with:
# Option 1: Install stable version from CRAN
install.packages("TRexSelector")
# Option 2: Install developer version from GitHub
install.packages("devtools")
devtools::install_github("jasinmachkour/TRexSelector")
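Because ‘tlars’ has to be present before ‘TRexSelector’, it can help to check what is already installed before running the commands above. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper name introduced here and is not part of either package.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# requireNamespace() returns FALSE (without an error) for missing packages.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# 'tlars' must be available before 'TRexSelector' can be installed.
to_install <- missing_packages(c("tlars", "TRexSelector"))
if (length(to_install) > 0) {
  message("Still missing: ", paste(to_install, collapse = ", "))
} else {
  message("Both packages are installed.")
}
```

The helper only reports missing packages and does not install anything, so it is safe to run repeatedly.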
You can open the help pages with:
library(TRexSelector)
help(package = "TRexSelector")
?trex
?random_experiments
?lm_dummy
?add_dummies
?add_dummies_GVS
?FDP
?TPP
# etc.
To cite the package ‘TRexSelector’ in publications use:
citation("TRexSelector")
This section illustrates the basic usage of the ‘TRexSelector’ package to perform FDR-controlled variable selection in large-scale high-dimensional settings based on the T-Rex selector.
library(TRexSelector)
# Setup
n <- 75 # number of observations
p <- 150 # number of variables
num_act <- 3 # number of true active variables
beta <- c(rep(1, times = num_act), rep(0, times = p - num_act)) # coefficient vector
true_actives <- which(beta > 0) # indices of true active variables
num_dummies <- p # number of dummy predictors (also referred to as dummies)
# Generate Gaussian data
set.seed(123)
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
y <- X %*% beta + stats::rnorm(n)
# Seed
set.seed(1234)
# Numerical zero
eps <- .Machine$double.eps
# Variable selection via T-Rex
res <- trex(X = X, y = y, tFDR = 0.05, verbose = FALSE)
selected_var <- which(res$selected_var > eps)
paste0("True active variables: ", paste(as.character(true_actives), collapse = ", "))
#> [1] "True active variables: 1, 2, 3"
paste0("Selected variables: ", paste(as.character(selected_var), collapse = ", "))
#> [1] "Selected variables: 1, 2, 3"
So, for a preset target FDR of 5%, the T-Rex selector has selected all true active variables and there are no false positives in this example.
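To make "no false positives" and "all true active variables" concrete, the realized false discovery proportion (FDP) and true positive proportion (TPP) of a single run can be computed directly from the two index sets. The sketch below uses plain base R; `fdp_from_sets` and `tpp_from_sets` are hypothetical helper names and are not the package's own `FDP` and `TPP` functions (see `?FDP` and `?TPP` for those).

```r
# Hypothetical helpers (not the package's FDP()/TPP() functions):
# compute the realized proportions from index sets.
fdp_from_sets <- function(selected, true_actives) {
  num_false <- length(setdiff(selected, true_actives))
  num_false / max(length(selected), 1) # max() avoids division by zero
}

tpp_from_sets <- function(selected, true_actives) {
  num_true <- length(intersect(selected, true_actives))
  num_true / max(length(true_actives), 1)
}

# Index sets reported in the example output above
true_actives <- c(1, 2, 3)
selected_var <- c(1, 2, 3)

fdp_from_sets(selected_var, true_actives) # 0: no false positives
tpp_from_sets(selected_var, true_actives) # 1: all true actives selected
```

Note that FDR control is a statement about the expectation of the FDP over repeated experiments, so a single run can have an FDP above or below the target level.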
Note that users have to choose the target FDR according to the requirements of their specific applications.
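One way to get a feel for this choice is to rerun the selector on the same data for a few candidate target levels and compare how many variables are selected. The sketch below is an illustration only: the grid of target FDR values is arbitrary and assumed here, and it uses only the `trex` arguments and the `selected_var` output shown in the example above. If ‘TRexSelector’ is not installed, the counts simply stay `NA`.

```r
# Same simulated data as in the basic usage example
set.seed(123)
n <- 75
p <- 150
num_act <- 3
beta <- c(rep(1, times = num_act), rep(0, times = p - num_act))
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
y <- X %*% beta + stats::rnorm(n)

# Candidate target FDR levels (arbitrary values chosen for illustration)
target_fdrs <- c(0.05, 0.1, 0.2)
num_selected <- rep(NA_integer_, length(target_fdrs))

# Only call the selector if the package is available
if (requireNamespace("TRexSelector", quietly = TRUE)) {
  for (i in seq_along(target_fdrs)) {
    set.seed(1234) # same seed for every run, as in the example above
    res <- TRexSelector::trex(X = X, y = y, tFDR = target_fdrs[i], verbose = FALSE)
    num_selected[i] <- sum(res$selected_var > .Machine$double.eps)
  }
}

# One row per target level with the number of selected variables
data.frame(tFDR = target_fdrs, num_selected = num_selected)
```

A looser target level generally allows more selections at the cost of a higher tolerated share of false positives, so the appropriate value depends on how costly a false discovery is in the application.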
For more information and some examples, please check the GitHub-vignette.
T-Rex paper: https://doi.org/10.48550/arXiv.2110.06048
TRexSelector package (stable version): CRAN-TRexSelector.
TRexSelector package (developer version): GitHub-TRexSelector.
README file: GitHub-readme.
Vignette: GitHub-vignette.
tlars package: CRAN-tlars and GitHub-tlars.