MetaIntegrator Real Example on GEO Data

Winston Haynes

2020-02-25

1. Get Public Gene Expression Data

Download Data

For this example, we will work through a small gene expression meta-analysis of systemic lupus erythematosus (SLE). We have identified public datasets that we will download from GEO for this analysis.

Label samples

All samples need to be assigned labels in the $class vector, 1 for ‘disease’ or 0 for ‘control’.
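As a minimal sketch of what such a label vector looks like, the base R snippet below builds a named 0/1 vector from a phenotype table. The table `pheno`, its `condition` column, and the sample names are hypothetical toy values, not taken from the GEO datasets.

```r
# Toy phenotype table; sample names and the 'condition' column are hypothetical.
pheno <- data.frame(
  condition = c("SLE", "healthy", "SLE", "healthy"),
  row.names = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)

# 1 for 'disease', 0 for 'control'
class_labels <- ifelse(pheno$condition == "SLE", 1, 0)

# Name each label by its sample so it can be matched to expression columns
names(class_labels) <- rownames(pheno)

print(class_labels)
```

In a MetaIntegrator dataset object, a vector of this shape would be stored in the $class slot.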

2. Run Meta-Analysis

sleMetaAnalysis <- runMetaAnalysis(sleData, runLeaveOneOutAnalysis = FALSE, maxCores = 1)

3. Identify Gene Signature

Filter Genes

Set the criteria that determine which genes are included in the disease signature.
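To illustrate the idea, the base R sketch below filters a toy table of per-gene results by effect size and false discovery rate. The data frame, its column names, and the thresholds are assumptions made for this example, not the package's internal structure or defaults.

```r
# Toy per-gene meta-analysis summary; column names and values are hypothetical.
results <- data.frame(
  gene       = c("G1", "G2", "G3", "G4"),
  effectSize = c(1.4, -1.2, 0.3, 1.8),
  fdr        = c(0.01, 0.03, 0.001, 0.20),
  stringsAsFactors = FALSE
)

# Illustrative thresholds (assumptions, not package defaults)
effect_thresh <- 1
fdr_thresh    <- 0.05

# Keep genes with a large enough absolute effect and a small enough FDR
keep <- abs(results$effectSize) >= effect_thresh & results$fdr <= fdr_thresh

# Split the surviving genes by direction of change
up_genes   <- results$gene[keep & results$effectSize > 0]
down_genes <- results$gene[keep & results$effectSize < 0]

print(up_genes)
print(down_genes)
```

Here G3 is dropped for its small effect size and G4 for its high FDR, leaving one up-regulated and one down-regulated gene.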

Calculate Meta Score

Once you have identified a gene signature, you can calculate a score for each sample based on the geometric mean of the up-regulated genes minus the geometric mean of the down-regulated genes. This score will be elevated in SLE patients compared to healthy controls.

This score can now be used to examine the results. Most of the functions in the following sections compute this score internally.
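The score described above can be sketched in a few lines of base R. The expression matrix and gene names here are hypothetical, and the package's own implementation may differ in details such as scaling or handling of missing genes.

```r
# Toy expression matrix (genes x samples); values are hypothetical and positive.
# matrix() fills column by column, so each row of numbers below is one sample.
expr <- matrix(
  c(4, 16, 2,    # sample S1
    1,  1, 8),   # sample S2
  nrow = 3,
  dimnames = list(c("UP1", "UP2", "DOWN1"), c("S1", "S2"))
)

# Geometric mean of a vector of positive values
geom_mean <- function(x) exp(mean(log(x)))

up_genes   <- c("UP1", "UP2")
down_genes <- "DOWN1"

# Per-sample score: geometric mean of up genes minus geometric mean of down genes
score <- apply(expr[up_genes, , drop = FALSE], 2, geom_mean) -
  apply(expr[down_genes, , drop = FALSE], 2, geom_mean)

print(score)
```

In this toy data, S1 has high expression of the up-regulated genes and scores above zero, while S2 scores below zero.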

4. Examine Results

Visualize gene effect sizes

We can visualize the effect sizes for all genes in the signature.

MetaScore Classification Performance

Receiver operating characteristic (ROC) curves and precision-recall (PRC) curves can be used to demonstrate the classification performance of the MetaScore.
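For intuition about what a ROC curve summarizes, the sketch below computes the area under the ROC curve (AUC) from scores and 0/1 labels using the rank-based (Mann-Whitney) formula. The scores and labels are toy values, and `auc_rank` is a hypothetical helper written for this example, not a MetaIntegrator function.

```r
# Toy scores and class labels (1 = disease, 0 = control); values are hypothetical.
scores <- c(0.9, 0.8, 0.3, 0.6, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)

# AUC as the probability that a random case scores higher than a random control
auc_rank <- function(scores, labels) {
  r  <- rank(scores)            # average ranks handle ties
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc <- auc_rank(scores, labels)
print(auc)
```

An AUC of 1 means every case outscores every control; 0.5 corresponds to random ordering.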

Multiple ROC Curves

Draw multiple ROC curves.

Draw multiple ROC curves with a summary ROC curve that represents an overall ROC estimate.

Draw multiple ROC curves with a pooled ROC curve that represents a moving average ROC.

Understand Sample Phenotypes

Violin plot

With a violin plot, you can drill into subgroups within datasets to observe differences between populations, with the individual samples called out.

Forest plot

Forest plots allow us to examine individual genes across studies.
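A forest plot typically shows one effect size per study with its confidence interval, plus a pooled summary. As a rough sketch of how such a summary can be formed, the snippet below computes an inverse-variance weighted (fixed-effect) average. This is a simplification chosen for illustration and is not necessarily the model MetaIntegrator fits; the study names, effect sizes, and standard errors are hypothetical.

```r
# Toy per-study effect sizes and standard errors for one gene (hypothetical).
effects <- c(studyA = 1.0, studyB = 2.0)
se      <- c(studyA = 0.5, studyB = 1.0)

# Inverse-variance weights: more precise studies count more
w <- 1 / se^2

# Fixed-effect pooled estimate and its standard error
pooled    <- sum(w * effects) / sum(w)
pooled_se <- sqrt(1 / sum(w))

# Approximate 95% confidence interval for the pooled estimate
ci <- pooled + c(-1.96, 1.96) * pooled_se

print(pooled)
print(ci)
```

The pooled estimate sits closer to studyA because its smaller standard error gives it four times the weight of studyB.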

Advanced Analyses

immunoStates Deconvolution

immunoStates is a tool for estimating immune cell proportions from gene expression profiles. immunoStatesMeta() in MetaIntegrator estimates cell proportions and then uses them, in place of genes, as input for a downstream meta-analysis.

immunoStates can also correct the underlying gene expression data for differences in cell proportions.

LINCS tools

LINCS tools lets users compare disease gene expression signatures with perturbation expression signatures identified by the LINCS consortium. lincsTools() generates a broad report covering many different classes of molecules. lincsCorrelate() handles one particular case: looking for a drug whose gene expression profile reverses the SLE profile. Note that this requires downloading a large amount of data, so the first execution will be slow.
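The idea of a "reversing" profile can be illustrated with a small base R sketch: correlate the disease signature with each perturbation signature and pick the most negative correlation. All gene names, drug names, and values below are hypothetical, and the sketch does not reproduce how lincsCorrelate() scores or ranks candidates.

```r
# Toy disease signature: per-gene effect sizes (hypothetical).
disease <- c(G1 = 2, G2 = 1, G3 = -1, G4 = -2)

# Toy perturbation signatures (genes x drugs); values are hypothetical.
perturb <- cbind(
  drugA = c(-2, -1,  1,  2),   # mirrors the disease profile
  drugB = c( 1,  2, -2, -1),   # resembles the disease profile
  drugC = c(-1, -2,  2,  1)    # partially reverses it
)
rownames(perturb) <- names(disease)

# Pearson correlation of the disease signature with each drug signature
cors <- apply(perturb, 2, function(p) cor(disease, p))

# The strongest reversal is the most negative correlation
best_reverser <- names(which.min(cors))

print(round(cors, 2))
print(best_reverser)
```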

Impute sex

Based on known marker genes, impute sex of samples. This can be useful for identifying sample labeling errors.
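A minimal sketch of the idea, assuming XIST (typically high in females) and RPS4Y1 (Y-linked, typically high in males) as the markers. The marker genes and decision rule the package actually uses may differ, and the expression values and recorded labels below are hypothetical.

```r
# Toy log-expression of two sex-linked marker genes (genes x samples).
# matrix() fills column by column, so each row of numbers below is one sample.
expr <- matrix(
  c(9.5,  3.1,    # S1
    2.8, 10.2,    # S2
    9.9,  2.5),   # S3
  nrow = 2,
  dimnames = list(c("XIST", "RPS4Y1"), c("S1", "S2", "S3"))
)

# Simple rule: call a sample female when XIST exceeds RPS4Y1, otherwise male
imputed <- ifelse(expr["XIST", ] > expr["RPS4Y1", ], "female", "male")

# Recorded annotations (hypothetical); S3 is deliberately mislabeled
recorded <- c(S1 = "female", S2 = "male", S3 = "male")

# Samples whose imputed sex disagrees with the annotation
mismatch <- names(imputed)[imputed != recorded]

print(imputed)
print(mismatch)
```

Samples flagged this way are candidates for a labeling error and worth checking against the original study metadata.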

COCONUT

COCONUT is a separate R package for correcting batch effects to merge multiple datasets into a single dataset. This is a wrapper function to call COCONUT on a MetaIntegrator object.

Pathway Analysis

Pathway analysis is commonly performed to provide biological interpretation for experiments. This is a wrapper function for deapathways, an R package for performing pathway analysis.
NOTE: This functionality will be added in a future update to MetaIntegrator.