Using bubbleHeatmap to Plot Nightingale Metabolomics Data

library(bubbleHeatmap)
#> Loading required package: grid

Abstract

We present bubbleHeatmap, an R plotting package based on the grid system, which combines elements of a bubble plot and a heatmap to conveniently display two numerical variables (represented by color and size), grouped by categorical variables on the x and y axes. This is a useful alternative to a forest plot when the data can be grouped in two dimensions, such as predictors x outcomes. We also demonstrate the particular advantage of this plot type over a traditional forest plot for visualising the 225 metabolic measures produced by the automated, high-throughput NMR-based platform developed by Nightingale Health. Nightingale metabolomic profiles are already available for several large biobanks and global cohorts (including recently released data on 100K individuals in UK Biobank) and have been used in over 300 peer-reviewed publications. We therefore expect the figure template included in this package to be of ongoing general interest.

The bubbleHeatmap Plotting Function

The bubbleHeatmap function provided in this package returns a graphical object (grob) representing a plot consisting of a rectangular grid with squares containing colored “bubbles”. The color and size of the bubbles are scaled according to the values of two variables, which are passed to the function as two numeric matrices with identical dimensions. The row and column names of the matrices can be added to the plot to label the grid. There are also options to add plot and axis titles and legends, to label a row or column of plots in a multi-plot figure, and to change the color scale and grid/bubble sizes. Legends can be built into the plot or drawn separately using bubbleHMLegends(). Additional edits to individual elements (including graphical parameters) can easily be made before the plot is drawn.

All plots are drawn in a viewport with a 5 x 5 layout grid, and a given object is always positioned in the same cell. This allows multiple plots to be easily aligned for combining in a single figure. The positions of the elements in the layout grid (column, row) are: PlotTitle (3, 1), XTitle (3, 2), LeftLabelsTitle (2, 3), TopLabels (3:4, 3), YTitle (1, 4), LeftLabels (2, 4), PlotGrid (3, 4), RowBracket/RowTitle (4, 4), Legends (5, 4), ColBracket/ColTitle (3, 5).
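The cell positions listed above can be visualised with plain grid calls. This is only an illustrative sketch of a 5 x 5 layout using the (column, row) positions from the text; it does not use bubbleHeatmap itself, and the equal cell sizes are an assumption made for simplicity.

```r
library(grid)

# Element positions as (column, row), copied from the list above.
cells <- list(
  PlotTitle       = list(col = 3,   row = 1),
  XTitle          = list(col = 3,   row = 2),
  LeftLabelsTitle = list(col = 2,   row = 3),
  TopLabels       = list(col = 3:4, row = 3),
  YTitle          = list(col = 1,   row = 4),
  LeftLabels      = list(col = 2,   row = 4),
  PlotGrid        = list(col = 3,   row = 4),
  RowBracket      = list(col = 4,   row = 4),
  Legends         = list(col = 5,   row = 4),
  ColBracket      = list(col = 3,   row = 5)
)

# Draw an outline and a label in each occupied cell of a 5 x 5 layout.
# Equal-sized cells are an assumption; the real plots size cells to content.
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 5, ncol = 5)))
for (nm in names(cells)) {
  pushViewport(viewport(layout.pos.col = cells[[nm]]$col,
                        layout.pos.row = cells[[nm]]$row))
  grid.rect(gp = gpar(col = "grey60", fill = NA))
  grid.text(nm, gp = gpar(fontsize = 7))
  popViewport()
}
popViewport()
```

Because every element has a fixed cell, two plots drawn with the same layout line up even when one of them omits, say, the row bracket.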

#Simulate data
names <- list(paste0("leftLabels", 1:6), paste0("topLabels", 1:10))
colorMat <- matrix(rnorm(60), nrow=6, ncol=10, dimnames = names)
sizeMat <- matrix(abs(rnorm(60)), nrow=6, ncol=10, dimnames = names)

#Create sample plot tree containing all elements
tree <- bubbleHeatmap(colorMat, sizeMat, treeName = "example",
             leftLabelsTitle = "leftLabelsTitle", showRowBracket = TRUE,
             rowTitle = "rowTitle", showColBracket = TRUE, colTitle = "colTitle",
             plotTitle = "plotTitle", xTitle = "xTitle", yTitle = "yTitle",
             legendTitles = c("legendTitles[1]", "legendTitles[2]"))
#> Warning: 
#> bubbleHeatmap must be drawn on a graphics device with gradient
#>             fill support (e.g. cairo_pdf or png(type = 'cairo')) in order to
#>             render the color legend correctly.
#> 
#>             This warning will be shown  once per session and may be disabled
#>             by setting options('bubbleLegends.device.warning' = FALSE)
#Draw plot
grid.newpage()
grid.draw(tree)
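Editing an element before drawing follows the usual grid pattern of addressing a child grob by name. The sketch below uses a small stand-in gTree built directly with grid, because the child names inside a bubbleHeatmap tree are not listed here; the names "demo", "title" and "bubble" are hypothetical. For a real tree, grid's childNames() can be used to inspect which names are available.

```r
library(grid)

# Stand-in gTree with two named children (names are hypothetical).
demo <- gTree(
  name = "demo",
  children = gList(
    textGrob("Title", y = 0.9, name = "title"),
    circleGrob(r = 0.2, name = "bubble")
  )
)

# List the children that can be addressed by name.
childNames(demo)

# Change graphical parameters of one child; the original tree is not modified.
edited <- editGrob(demo, gPath("title"),
                   gp = gpar(col = "red", fontface = "bold"))

grid.newpage()
grid.draw(edited)
```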

The Nightingale Plot Templates

Nightingale Metabolomics Platform

Nightingale Health is a Finnish company which has developed a fully automated platform for deriving over 200 quantitative metabolomics measures from a single blood serum sample using NMR technology, at a cost comparable to standard lipid clinical chemistry. These measures include the concentration and composition of 14 lipoprotein subclasses, apolipoproteins, fatty acids, amino acids and glycolysis-related metabolites. Over 1M samples have already been processed, including biobank participants, population cohorts and clinical studies, with capacity to process an additional 250K samples per year. Over 100K profiles of UK Biobank participants have recently been released, with data on the remaining individuals to follow. The wide availability of standardised Nightingale datasets in global cohorts facilitates collaborative research and is contributing to the identification of new biomarkers across a range of diseases.

CETP Dataset

The data included in this package represent associations between Nightingale metabolic measures and a genetic risk score (GRS) for cholesteryl ester transfer protein (CETP) in the China Kadoorie Biobank (CKB), a prospective study of Chinese adults from 10 distinct areas of China. The full design of the CKB cohort and the methods and results of this study have been previously published.

CETP transfers esterified cholesterol from HDL to apolipoprotein B-containing lipoproteins in exchange for triglycerides. This process is a key component of the atheroprotective reverse cholesterol transport pathway, which modulates the return of excess cholesterol from peripheral cells such as macrophages to the liver, where it can be redistributed or excreted. Reduced CETP activity leads to higher levels of HDL cholesterol (HDL-C), and the inverse correlation between HDL-C and risk of atherosclerotic disease has prompted interest in CETP as a drug target. In this analysis, five CETP SNP variants were selected on the basis of previously reported associations with HDL cholesterol and CETP activity. Genotyping data for these variants and conventional lipid biochemistry measures were available for a subset of 17,854 CKB participants selected for a CVD case-control study. These data were used to generate a CETP GRS weighted by HDL-C association, derived internally with 100-fold cross-validation. The Nightingale Health metabolomics panel, consisting of 225 metabolomics measures, was available for 4,657 of these individuals, and association with the GRS was assessed by linear regression with adjustment for age and sex, stratified by geographical region. Both in generating the GRS and in the metabolomics association analysis, outcomes were standardized by rank inverse normal transformation, stratified by region, after adjustment for age and sex.
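For readers unfamiliar with the rank inverse normal transformation, a minimal base R sketch follows. It is not the code used in the published analysis: the offset of 0.5 in the rank-to-quantile step is one common convention and is an assumption here, the data are simulated, and the column names region and value are hypothetical. The prior adjustment for age and sex described above is omitted.

```r
# Rank inverse normal transformation: map ranks to standard normal quantiles.
# The (rank - 0.5) / n convention is an assumption; other offsets are in use.
rint <- function(x) {
  n <- sum(!is.na(x))
  r <- rank(x, na.last = "keep", ties.method = "average")
  qnorm((r - 0.5) / n)
}

# Simulated, skewed outcome measured in two hypothetical regions.
set.seed(1)
d <- data.frame(
  region = rep(c("north", "south"), each = 50),
  value  = rexp(100),
  stringsAsFactors = FALSE
)

# Apply the transformation separately within each region (stratified).
d$value_rint <- ave(d$value, d$region, FUN = rint)

# Within each region the transformed values are centred on zero.
tapply(d$value_rint, d$region, mean)
```

Transforming within strata puts every outcome on the same standard-normal scale in each region, which is what makes the estimates for different metabolic measures comparable in a single figure.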

The sample dataset provided includes the estimate, standard error, p-value, and -log10(p-value) for the association of each of the 225 metabolomics traits with the CETP GRS, scaled to a 10 mg/dL higher level of HDL-C.
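The relationship between these four columns can be sketched in base R as follows. The numbers are made up, the column names estimate, se and pvalue are hypothetical, and the two-sided normal approximation is an assumption; the published analysis may have derived its p-values differently.

```r
# Made-up estimates and standard errors for three traits.
res <- data.frame(
  estimate = c(0.10, -0.30, 0.02),
  se       = c(0.05,  0.06, 0.04)
)

# Two-sided p-value from a normal approximation (an assumption).
res$z      <- res$estimate / res$se
res$pvalue <- 2 * pnorm(-abs(res$z))

# -log10(p) grows as the evidence strengthens, which suits bubble size.
res$negLog10P <- -log10(res$pvalue)
res
```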

Workflow

Load Data

The ntngale225 and ntngale249 data frames contain the plot groupings and row/column names needed to arrange a Nightingale results dataset into the 10 plots that make up the figure. The user's dataset should include a column with either the UK Biobank or China Kadoorie Biobank variable name, so that the data can be matched to the template using the merge_template function.

myData <- merge_template(cetp, "ckb_id")
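Conceptually, matching results to a template is a keyed join on the variable name. The base R sketch below illustrates the idea with a made-up three-row template; it is not the implementation of merge_template, the variable names and groupings are hypothetical, and keeping unmatched template rows (all.x = TRUE) is an assumption about the desired behaviour.

```r
# Hypothetical template: one row per measure, with its place in the figure.
template <- data.frame(
  ckb_id    = c("var_a", "var_b", "var_c"),
  plotGroup = c("G1", "G1", "G2"),
  rowName   = c("r1", "r2", "r1"),
  colName   = c("c1", "c1", "c1"),
  stringsAsFactors = FALSE
)

# Hypothetical results keyed by the same variable name, in a different order.
results <- data.frame(
  ckb_id   = c("var_c", "var_a"),
  estimate = c(0.2, -0.1),
  stringsAsFactors = FALSE
)

# Join on the key; measures without results keep their slot with NA values.
merged <- merge(template, results, by = "ckb_id", all.x = TRUE)
merged
```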

One-step Figure

The metabFigure function wraps the formatData, multiPlotInput, bubbleHeatmapList and metabFigurePlot functions listed below, applying default settings to build the standard figure in one step. Alternatively, see below for the individual functions.

metabTree <- metabFigure(myData)

Format Data

The formatData function splits a dataset according to the value of plotGroup and reshapes each subset into two matrices, one containing the values of colorValue and the other the values of sizeValue, with rows and columns defined by rowName and colName. It then orders the list of plots and the rows and columns of each matrix. The Nightingale sorting orders are built in and will override other settings when nightingale = TRUE.

gridData <- formatData(myData, colorValue="estimate", sizeValue = "negLog10P", 
                       nightingale = TRUE)
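The split-and-reshape step can be pictured with a small base R sketch. This is an illustration of the idea only, not the formatData implementation: the toy data are made up, the helper toMatrix is hypothetical, and no sorting is applied.

```r
# Toy long-format results: one row per (plotGroup, rowName, colName).
long <- data.frame(
  plotGroup = c("A", "A", "A", "A", "B", "B"),
  rowName   = c("r1", "r1", "r2", "r2", "r1", "r1"),
  colName   = c("c1", "c2", "c1", "c2", "c1", "c2"),
  estimate  = c(0.1, -0.2, 0.3, 0.0, 0.5, -0.4),
  negLog10P = c(1, 2, 3, 0.5, 4, 1.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: place one value column into a rowName x colName matrix.
toMatrix <- function(d, value) {
  rows <- unique(d$rowName)
  cols <- unique(d$colName)
  m <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
  m[cbind(d$rowName, d$colName)] <- d[[value]]
  m
}

# One pair of matrices per plot group.
groups    <- split(long, long$plotGroup)
colorList <- lapply(groups, toMatrix, value = "estimate")
sizeList  <- lapply(groups, toMatrix, value = "negLog10P")

colorList$A
```

Each pair colorList[[i]] and sizeList[[i]] has identical dimensions and dimnames, which is the input shape bubbleHeatmap() expects.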

Prepare Input for Plotting Function (multiPlotInput)

multiPlotInput() returns a list of lists of the input arguments/settings for bubbleHeatmap(), allowing multiple plots to be generated at once. It generates a single set of legends based on the data across all plots. Settings for the Nightingale plots are built-in and are requested using nightingale = TRUE.

treeInput <- multiPlotInput(colorList=gridData$colorList, 
                            sizeList=gridData$sizeList, 
                            nightingale=TRUE, legendHeight=8)
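A shared legend only makes sense if every plot uses the same scale limits. The sketch below shows one way such limits could be pooled across a list of matrices in base R. It is an illustration, not the multiPlotInput implementation: the toy matrices are made up, and the choice of color limits symmetric about zero is an assumption.

```r
# Toy color and size matrices for two plots.
colorList <- list(
  a = matrix(c(-0.4, 0.1, 0.3, 0.2), nrow = 2),
  b = matrix(c(0.6, -0.1), nrow = 1)
)
sizeList <- list(
  a = matrix(c(1, 2, 3, 0.5), nrow = 2),
  b = matrix(c(4, 1.5), nrow = 1)
)

# Pool values over all plots so that one legend describes every panel.
# Symmetric color limits around zero are an assumption.
maxAbs      <- max(abs(unlist(colorList)), na.rm = TRUE)
colorLimits <- c(-1, 1) * maxAbs
sizeLimits  <- range(unlist(sizeList), na.rm = TRUE)

colorLimits
sizeLimits
```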

bubbleHeatmapList

bubbleHeatmapList is a convenient wrapper for generating a list of plots in a single call. It takes a single argument, a list with two elements: $trees, a list of argument lists used to create plots with bubbleHeatmap(), and $legends, a list of arguments used to (optionally) create legends with bubbleHMLegends(). The output of multiPlotInput() is formatted in this way and can be passed directly to bubbleHeatmapList().

treeList <- bubbleHeatmapList(treeInput)

Nightingale Figure Function

The metabFigurePlot function takes a list of plot trees and a legend tree, and combines and arranges them to produce a single tree representing the Nightingale plot figure style. Like the individual bubbleHeatmap trees, it can be edited or drawn using grid.draw().

metabTree <- metabFigurePlot(treeList)
grid.newpage()
grid.draw(metabTree)