GIGWA Example

Khaled Al-Shamaa

2024-09-18

QBMS

This R package helps breeders link their data systems to their analytic pipelines, a crucial step in digitizing breeding processes. It supports querying and retrieving phenotypic and genotypic data from systems like EBS, BMS, BreedBase, and GIGWA (using BrAPI calls). Extra helper functions support environmental data sources, including TerraClimate and the FAO HWSDv2 soil database.

GIGWA

GIGWA is a web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data. It filters variants not only by their features, including functional annotations, but also by genotype patterns. Data storage relies on MongoDB, which offers good scalability. GIGWA can handle multiple databases and may be deployed in either single-user or multi-user mode. It also exports data in a wide range of popular formats.

BrAPI

The Breeding API (BrAPI) project is an effort to enable interoperability among plant breeding databases. BrAPI is a standardized RESTful web service API specification for communicating plant breeding data. This community-driven standard is free for anyone interested in plant breeding data management to use.
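To make the idea of a standardized response more concrete, the sketch below mocks the envelope that BrAPI v2 responses share: a `metadata` element carrying `pagination` details and a `result` element carrying the data. The list stands in for an already parsed JSON response, the sample values are made up, and `has_next_page()` is a hypothetical helper rather than a QBMS function.

```r
# Mock of a parsed BrAPI v2 response (as it might look after JSON decoding).
# All values are invented for illustration.
response <- list(
  metadata = list(
    pagination = list(currentPage = 0, pageSize = 2, totalCount = 5, totalPages = 3),
    status = list(),
    datafiles = list()
  ),
  result = list(
    data = list(
      list(sampleDbId = "s1", sampleName = "ind1"),
      list(sampleDbId = "s2", sampleName = "ind3")
    )
  )
)

# Hypothetical helper: TRUE while more pages remain (currentPage is zero-based)
has_next_page <- function(resp) {
  p <- resp$metadata$pagination
  (p$currentPage + 1) < p$totalPages
}

# Pull one field out of every record on the current page
sample_names <- vapply(response$result$data, function(x) x$sampleName, character(1))
print(sample_names)
print(has_next_page(response))
```

QBMS hides this paging and unpacking behind its retrieval functions, so the sketch is only meant to show what sits underneath a BrAPI call.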

Example

# load the QBMS library
library(QBMS)

# The public GIGWA testing server requires no authentication. If your GIGWA server
# requires authentication, make sure the no_auth parameter is set to FALSE.
# IMPORTANT NOTE: QBMS requires GIGWA version 2.4.1 or higher
set_qbms_config("https://gigwa.southgreen.fr/gigwa/", 
                time_out = 300, engine = "gigwa", no_auth = TRUE)

# If login is required, then you can use your GIGWA account (interactive mode)
# or pass your GIGWA username and password as parameters (batch mode)
# login_gigwa()
# login_gigwa("gigwadmin", "nimda")

# list existing databases in the current GIGWA server
gigwa_list_dbs()

# select a database by name
gigwa_set_db("Sorghum-JGI_v1")

# list all projects in the selected database
gigwa_list_projects()

# select a project by name
gigwa_set_project("Nelson_et_al_2011")

# list all runs in the selected project
gigwa_list_runs()

# select a specific run by name
gigwa_set_run("run1")

# get a list of all samples in the selected run
samples <- gigwa_get_samples()

# show the first 6 individuals in the list of samples
head(samples)

# query the variants (e.g., SNP markers) in the selected run
# that match the given criteria:
# - max_missing: maximum missing ratio (by sample) [0-1] (default is 1 for 100%)
# - min_maf: minimum Minor Allele Frequency (MAF) [0-1] (default is 0 for 0%)
# - start: start position of region (zero-based, inclusive) (e.g., 19750802)
# - end: end position of region (zero-based, exclusive) (e.g., 19850125)
# - referenceName: reference sequence name (e.g., '6H' in the Barley LI-AM)
# - samples: a subset of sample names (default is NULL, which retrieves all samples)
marker_matrix <- gigwa_get_variants(max_missing = 0.2, 
                                    min_maf = 0.05, 
                                    start = 100000,
                                    end = 500000,
                                    samples = c("ind1", "ind3", "ind7"))

# The data are returned as a data.frame. The first 4 columns describe attributes of the SNP:
# - rs#: variant name
# - alleles: reference allele / alternative allele
# - chrom: chromosome name
# - pos: position in bp
# The remaining columns hold one sample each, with SNP values coded numerically as
# 0, 1, and 2 for reference, heterozygous, and alternative/minor alleles.
head(marker_matrix)

# switch to another database, project, and run to demonstrate metadata retrieval
gigwa_set_db("DIVRICE_NB")
gigwa_set_project("refNB")
gigwa_set_run("03052022")

# get the metadata associated with the samples in the current active run
metadata <- gigwa_get_metadata()

View(metadata)
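Once a marker matrix in the layout described above is available (four attribute columns followed by one column per sample), per-marker quality statistics can be recomputed locally. The sketch below uses a small made-up data frame instead of a live GIGWA query, and assumes that the codes 0, 1, and 2 count copies of the alternative allele and that missing calls appear as `NA`.

```r
# Toy marker matrix mirroring the described layout; all values are invented
marker_matrix <- data.frame(
  `rs#`   = c("snp1", "snp2", "snp3"),
  alleles = c("A/G", "C/T", "G/T"),
  chrom   = c("1", "1", "2"),
  pos     = c(100500L, 250000L, 499999L),
  ind1    = c(0, 1, 2),
  ind3    = c(0, NA, 2),
  ind7    = c(2, 1, 2),
  check.names = FALSE
)

# Genotype calls start at column 5; the first four columns are marker attributes
geno <- as.matrix(marker_matrix[, -(1:4)])
rownames(geno) <- marker_matrix[["rs#"]]

# Share of missing calls per marker
missing_rate <- rowMeans(is.na(geno))

# Alternative allele frequency: each call contributes 0, 1, or 2 copies out of 2
# (assumption: the numeric code counts alternative alleles)
alt_freq <- rowMeans(geno, na.rm = TRUE) / 2

# Minor allele frequency is the smaller of the two allele frequencies
maf <- pmin(alt_freq, 1 - alt_freq)

summary_table <- data.frame(marker = rownames(geno), missing_rate, maf,
                            row.names = NULL)
print(summary_table)
```

A table like this makes it easy to apply stricter thresholds after download without querying the server again.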

Enhanced Allele Matrix Retrieval

The following functions use the efficient BrAPI v2.1 /allelematrix calls and require GIGWA version 2.6 or higher. According to benchmark tests, this update makes QBMS allele matrix retrieval more than 10 times faster.

# Configure your GIGWA connection
set_qbms_config("https://gigwa.southgreen.fr/gigwa/", 
                time_out = 300, engine = "gigwa", no_auth = TRUE)

# Select a database by name
gigwa_set_db("Sorghum-JGI_v1")

# Select a project by name
gigwa_set_project("Nelson_et_al_2011")

# Select a specific run by name
gigwa_set_run("run1")

# Get the list of all samples in the selected project
germplasmNames <- gigwa_get_samples()

# Get the list of all sequences in the selected project
chroms <- gigwa_get_sequences()

### Get Variants Info (Geno Map) ###############################################

?gigwa_get_markers
geno_map <- gigwa_get_markers(start = 0,
                              end = 1234567
                              # , chrom = c("Sb01", "Sb02")     # chroms[1:3]
                              )

### Get Marker Matrix ##########################################################

?gigwa_get_allelematrix
geno_data <- gigwa_get_allelematrix(start = 0,                  # default is 0
                                    end = 1234567,              # default is "" -> "ref:0-"
                                    snps = geno_map$`rs#`       # optional
                                    # , chrom = "Sb01"          # c("Sb01", "Sb07")
                                    # , samples = germplasmNames # gigwa_get_samples()
                                    # , snps_pageSize = 10000
                                    # , samples_pageSize = 100
                                    )
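The `snps_pageSize` and `samples_pageSize` arguments suggest that the allele matrix is fetched in rectangular pages. As a rough planning aid, the sketch below estimates how many page requests a retrieval might need, assuming pages tile the matrix along both the marker and the sample dimension (as BrAPI v2.1 allelematrix pagination allows). `count_allelematrix_pages()` is a hypothetical helper, not part of QBMS, and its default page sizes are copied from the commented-out arguments above rather than from confirmed package defaults.

```r
# Hypothetical helper: estimated number of page requests needed to cover
# an n_snps x n_samples allele matrix under two-dimensional paging
count_allelematrix_pages <- function(n_snps, n_samples,
                                     snps_pageSize = 10000,
                                     samples_pageSize = 100) {
  stopifnot(n_snps >= 0, n_samples >= 0,
            snps_pageSize > 0, samples_pageSize > 0)
  ceiling(n_snps / snps_pageSize) * ceiling(n_samples / samples_pageSize)
}

# 25,000 markers in pages of 10,000 -> 3 marker pages
# 250 samples in pages of 100       -> 3 sample pages
n_pages <- count_allelematrix_pages(n_snps = 25000, n_samples = 250)
print(n_pages)  # 9
```

Larger pages mean fewer requests but heavier responses, so the best values depend on the server and the network.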