Bioactivity APIs

Center for Computational Toxicology and Exposure

Introduction

This vignette explores the CCTE Bioactivity APIs.

NOTE: Please see the introductory vignette for an overview of the ccdR package and initial setup instructions, including API key storage.

Data for the Bioactivity APIs comes from ToxCast’s invitrodb.

US EPA’s Toxicity Forecaster (ToxCast) program makes in vitro medium- and high-throughput screening assay data publicly available for prioritization and hazard characterization of thousands of chemicals.

The ToxCast pipeline (tcpl) is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, invitrodb. ToxCast assays comprise Tier 2-3 of the new Computational Toxicology Blueprint and employ automated chemical screening technologies to evaluate the effects of chemical exposure on living cells and biological macromolecules, such as proteins (Thomas et al., 2019). More information on the ToxCast program can be found at https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast.

This flexible analysis pipeline is capable of efficiently processing and storing large volumes of data. The diverse data, received in heterogeneous formats from numerous vendors, are transformed to a standard computable format and loaded into the tcpl database by vendor-specific R scripts. Once the data are loaded into the database, ToxCast uses the generalized processing functions provided in tcpl to process, normalize, model, qualify, and visualize the data.

Figure 1: Conceptual overview of the ToxCast Pipeline functionality

Functions

Several ccdR functions are used to access the CCTE Bioactivity API data.

Bioactivity Assay Resource

Users may search for a specific assay or retrieve all available assays that have data.

Get annotation by aeid

get_annotation_by_aeid() retrieves annotation for a specific assay endpoint id (aeid).

res_dt <- get_annotation_by_aeid(AEID = "891", API_key = apikey, Server = url)
# optionally perform this unnest, apply names_repair = "unique" to give a unique column name
# note - the gene column may be an array of multiple genes rather than just one, meaning this step may not work
#res_dt <- res_dt |> tidyr::unnest_wider(col = c("citation", "gene", "assayList"), names_repair = "unique")

Get all assay annotations

get_all_assays() retrieves annotations for all available assays. Optionally, the user can unnest the “citation”, “gene”, and “assayList” columns wider so that each element has its own column.

res_dt <- get_all_assays(API_key = apikey, Server = url)
# optionally perform the following unnest, apply names_repair = "unique" to give a unique column name
# note - the gene column may be an array of multiple genes rather than just one, meaning this step may not work
#res_dt <- res_dt |> tidyr::unnest_wider(col = c("citation", "gene", "assayList"), names_repair = "unique")
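The note above warns that the gene column may hold several genes for one assay endpoint, in which case widening may not work. One possible workaround is to flatten such a list column into a single delimited string per row. The sketch below uses base R on a toy table; the gene symbols and the list-column layout are illustrative assumptions and may not match the structure the API actually returns.

```r
# Toy stand-in for an annotation table (hypothetical values, not API output)
toy <- data.frame(aeid = c(891L, 892L), stringsAsFactors = FALSE)

# A list column in which one row holds two genes
toy$gene <- list("GENE_A", c("GENE_B", "GENE_C"))

# Count the genes per row to spot entries that would not widen cleanly
toy$n_genes <- lengths(toy$gene)

# Collapse each list element into one semicolon-delimited string
toy$gene_flat <- vapply(toy$gene, paste, character(1), collapse = "; ")

print(toy[, c("aeid", "n_genes", "gene_flat")])
```

Checking n_genes first shows whether any rows hold more than one gene before deciding between flattening and widening.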

Bioactivity Data Resource

Several resources are available for retrieving bioactivity data by a variety of identifier types (e.g., DTXSID, aeid).

Get summary data

get_bioactivity_summary() retrieves, by aeid, a summary of the number of active hits compared to the total number tested for both multiple- and single-concentration screening.

res_dt <- get_bioactivity_summary(AEID = "891", API_key = apikey, Server = url)
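Because the summary pairs active hits with totals tested, a hit rate can be derived from it. The following is a minimal sketch on a toy table. The column names (activeMc, totalMc, activeSc, totalSc) and all values are assumptions made for illustration, so check names(res_dt) on a real result before adapting it.

```r
# Toy stand-in for a summary table; column names and values are assumptions
summary_toy <- data.frame(
  aeid     = c(891L, 892L),
  activeMc = c(120, 45),
  totalMc  = c(1000, 900),
  activeSc = c(10, 0),
  totalSc  = c(200, 0)
)

# Hit rate = active / total; return NA where nothing was tested
hit_rate <- function(active, total) {
  ifelse(total > 0, active / total, NA_real_)
}

summary_toy$hit_rate_mc <- hit_rate(summary_toy$activeMc, summary_toy$totalMc)
summary_toy$hit_rate_sc <- hit_rate(summary_toy$activeSc, summary_toy$totalSc)

print(summary_toy)
```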

Get data

get_bioactivity_details() can retrieve all available multiple concentration data by assay endpoint id (aeid), sample id (spid), Level 4 ID (m4id), or chemical DTXSID. Examples for each are provided below:

By spid
res_dt <- get_bioactivity_details(SPID = "TP0001055F12", API_key = apikey, Server = paste0(url, "/data"))
By m4id
res_dt <- get_bioactivity_details(m4id = 739695, API_key = apikey, Server = paste0(url, "/data"))
By DTXSID
res_dt <- get_bioactivity_details(DTXSID = "DTXSID7020182", API_key = apikey, Server = paste0(url, "/data"))
By aeid
res_dt <- get_bioactivity_details(AEID = "891", API_key = apikey, Server = paste0(url, "/data"))
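Each call above retrieves data for a single identifier. To gather results for several identifiers, the call can be wrapped in a loop that continues when one request fails. The sketch below assumes a hypothetical helper, collect_details(), which is not part of ccdR, and uses a stub in place of the real API call so that it runs offline.

```r
# Hypothetical helper (not part of ccdR): apply a fetch function to each
# identifier and keep going if an individual request fails.
collect_details <- function(ids, fetch) {
  results <- lapply(ids, function(id) {
    tryCatch(
      fetch(id),
      error = function(e) {
        warning("Request failed for ", id, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- ids
  # Drop identifiers whose request failed
  Filter(Negate(is.null), results)
}

# Offline stub standing in for a real API call
stub_fetch <- function(id) {
  if (id == "bad") stop("not found")
  data.frame(aeid = id, stringsAsFactors = FALSE)
}

details <- suppressWarnings(
  collect_details(c("891", "bad", "892"), stub_fetch)
)
print(names(details))
```

With ccdR, the stub could be replaced by a function such as function(id) get_bioactivity_details(AEID = id, API_key = apikey, Server = paste0(url, "/data")), reusing the call shown above.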

Conclusion

This vignette covered a variety of functions that access different types of data found in the Bioactivity endpoints of the CCTE APIs. We encourage the reader to explore the data accessible through these endpoints and to work with it to get a better understanding of what data is available. Additional endpoints and corresponding functions exist, and we encourage the user to explore these as well.