Project Status: Active. The project has reached a stable, usable state and is being actively developed.


Version 1.0.0

Simulate Expression Data from ‘igraph’ Networks

This package provides functions to simulate continuous data (e.g., gene expression) from a sigma covariance matrix derived from a graph structure in ‘igraph’ objects. It is intended to extend ‘mvtnorm’ to take ‘igraph’ structures rather than sigma matrices as input, so that simulated data correctly account for pathway relationships and correlations. The package offers a versatile statistical framework for simulating correlated gene expression data from biological pathways by sampling from a multivariate normal distribution derived from a graph structure. The simulated expression profiles are intended to resemble log-transformed and normalised data from microarray and RNA-Seq experiments.
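The idea of sampling from a multivariate normal distribution whose covariance comes from a graph can be sketched with ‘igraph’ and ‘mvtnorm’ alone. This is a minimal illustration, not the package's own algorithm: the toy pathway, the variable names, and the decay rule (correlation equal to `rho` raised to the shortest-path length) are all assumptions made for the example.

```r
library(igraph)
library(mvtnorm)

# Hypothetical toy pathway: A -> B -> C, with B -> D branching off.
edges <- rbind(c("A", "B"), c("B", "C"), c("B", "D"))
graph <- graph_from_edgelist(edges, directed = TRUE)

# Shortest-path length between every pair of genes, ignoring edge direction.
path_length <- distances(graph, mode = "all")

# Assumed decay rule for illustration only: correlation = rho^(path length).
# The diagonal is rho^0 = 1, so sigma is a correlation matrix.
rho <- 0.8
sigma <- rho^path_length

# Draw 100 samples; rmvnorm() returns samples in rows and genes in columns.
set.seed(1)
expr <- rmvnorm(100, mean = rep(0, nrow(sigma)), sigma = sigma)
colnames(expr) <- rownames(sigma)

# Inspect the sample correlations between genes.
round(cor(expr), 2)
```

With this rule, genes joined by an edge have a higher target correlation than genes two steps apart. The rule yields a valid covariance matrix for this small tree-shaped graph, but that is not guaranteed for arbitrary graphs.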


Network analysis of molecular biological pathways is important for insights into biology and medical genetics. Gene expression profiles capture the regulatory state of a cell and can be used to analyse complex molecular states with genome-scale data. Biological pathways are more than sets of genes involved in functions: they are rich in information about the relationships defined by pathway structure.

Methods to infer biological pathways and gene regulatory networks from gene expression data can be tested on simulated datasets using this framework. This also allows for pathway structures to be considered as a confounding variable when simulating gene expression data to test the performance of genomics analyses.

This package enables the generation of simulated gene expression datasets containing pathway relationships from a known underlying network. These simulated datasets can be used to evaluate various bioinformatics methodologies, including statistical and network inference procedures.

Network analysis techniques play an important role in understanding biological pathways and interpreting genomics studies. Modelling biological pathways allows the evaluation of gene regulatory network inference techniques (which so far rely on experimental validation or resampling). This technique also enables modelling datasets with correlated pathway structures to assess whether other genomics analysis techniques perform as expected against the background of complex pathways.


To install the latest release from CRAN:

install.packages("graphsim")

To install the stable release of this package from GitHub:

# install.packages("devtools")
devtools::install_github("TomKellyGenetics/graphsim", ref = "master")

To get the development version of this package from GitHub:

# install.packages("devtools")
devtools::install_github("TomKellyGenetics/graphsim", ref = "dev")


Please see the vignettes for demonstrations of this package on examples of simple simulated networks and the Reactome pathway TGF-β receptor signaling activates SMADs (R-HSA-2173789). A manuscript with further details has been submitted for peer review.
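As a quick orientation before reading the vignettes, a minimal usage sketch might look like the following. The toy edge list is hypothetical, and the call to `generate_expression()` with the `cor`, `mean`, and `dist` arguments reflects my understanding of the package documentation; check `?generate_expression` for the arguments supported by your installed version.

```r
library(igraph)
library(graphsim)

# Hypothetical toy pathway: A and B both act on C, which acts on D.
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"))
graph <- graph_from_edgelist(edges, directed = TRUE)

# Simulate 100 samples for the 4 genes in the graph.
# Argument names are assumed from the package documentation.
set.seed(1)
expr <- generate_expression(100, graph, cor = 0.8, mean = 0, dist = TRUE)

# The result is expected to be a numeric matrix with one dimension
# for the genes and the other for the samples.
dim(expr)
```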


To cite the graphsim package in publications use:

S. Thomas Kelly and Michael A. Black (2020). graphsim: Simulate Expression data from iGraph networks. R package version 1.0.0. doi:10.5281/zenodo.1313986

A BibTeX entry for LaTeX users is

@Manual{,
    title = {{graphsim}: Simulate Expression data from {iGraph} networks},
    author = {S. Thomas Kelly and Michael A. Black},
    year = {2020},
    note = {R package version 1.0.0},
    url = {},
    doi = {10.5281/zenodo.1313986},
}

Please also acknowledge the manuscript describing use of this package once it is published. It is currently available as a preprint.

@article {Kelly2020.03.02.972471,
    author = {Kelly, S Thomas and Black, Michael A},
    title = {graphsim: An R package for simulating gene expression data from graph structures of biological pathways},
    elocation-id = {2020.03.02.972471},
    year = {2020},
    doi = {10.1101/2020.03.02.972471},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {},
    eprint = {},
    journal = {bioRxiv}
}

Contributions and Bug Reports

Please submit issues on GitHub to report problems or suggest features. Pull requests to the dev branch on GitHub are also welcome to add features or correct problems. Please see the contributor guide for more details.