[BioC] Pathview+GAGE workflow on RNA-seq data pathway analysis and visualization (was: Pathview published in Bioinformatics)
Luo Weijun
luo_weijun at yahoo.com
Wed Jul 24 19:09:57 CEST 2013
Hi Krishna,
Yes, pathview is definitely useful for RNA-seq data analysis. Actually pathview can be used to visualize a variety of user data as long as they can be mapped to pathways.
I am not sure about cufflinks output format. But below is an example workflow on RNA-seq data using tophat + Bioconductor facilities. The raw reads data were downloaded from http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1147/, and then mapped using tophat. This is the example data used in the RNA-seq section of the R/Bioconductor course at http://www.bioconductor.org/help/course-materials/2013/SeattleMay2013/. They worked with reads mapped to chromosome 14 only, here I worked with all reads/genes for a practical pathway analysis. Attached PDF file includes a few example pathview output graphs from this analysis.
This is just a representative and concise workflow on pathway analysis and visualization with RNA-seq data. There are lots of issues like RNA-seq data quality assessment and pre-processing etc are not covered or detailed here. HTH.
Weijun Luo
##package installation within R if you haven't done so, you will need to work with R 3.0 or newer version
source("http://bioconductor.org/biocLite.R")
biocLite(c("pathview", "gage", "Rsamtools", "TxDb.Hsapiens.UCSC.hg19.knownGene")
##extract exon regions by gene (i.e. Annotation of known gene models)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
##count the reads mapped to genes/exons by tophat
#used the files "accepted_hits.bam" in tophat output directory, but you need to rename them follow your sample names
library(Rsamtools)
fls <- list.files("/path/to/your/tophat_out/", pattern="bam$", full.names =T)
bamfls <- BamFileList(fls)
flag <- scanBamFlag(isNotPrimaryRead=FALSE, isProperPair=TRUE)
param <- ScanBamParam(flag=flag)
gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
ignore.strand=TRUE, single.end=FALSE, param=param)
###extract, normalize and process the read counts into log2 expression matrix
expdat=assay(gnCnt)
libsizes=colSums(expdat)
size.factor=libsizes/exp(mean(log(libsizes)))
expdat.norm=t(t(expdat)/size.factor)
dim(expdat.norm)
sel.rn=rowSums(expdat.norm) != 0
expdat.norm=expdat.norm[sel.rn,]
dim(expdat.norm)
range(expdat.norm)
#log2 transformation
idx=expdat.norm==0
expdat.norm[expdat.norm==0]=0.1
range(expdat.norm)
expdat.norm=log2(expdat.norm)
range(expdat.norm)
###Pathway analysis using GAGE followed by visualization with Pathview
library(gage)
ref.idx=5:8
samp.idx=1:4
data(kegg.gs)
expdat.kegg.p <- gage(expdat.norm, gsets = kegg.gs,
ref = ref.idx, samp = samp.idx, compare ="unpaired")
#differential expression: log2 ratio or fold change
expdat.d= expdat.norm[, samp.idx]-rowMeans(expdat.norm[, ref.idx])
#up-regulated pathways visualized by pathview
sel <- expdat.kegg.p$greater[, "q.val"] < 0.1 & !is.na(expdat.kegg.p$greater[,
"q.val"])
path.ids <- rownames(expdat.kegg.p$greater)[sel]
path.ids2 <- substr(path.ids, 1, 8)
library(pathview)
pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = expdat.d[,
1:2], pathway.id = pid, species = "hsa"))
#down-regulated pathways visualized by pathview
sel.l <- expdat.kegg.p$less[, "q.val"] < 0.1 & !is.na(expdat.kegg.p$less[,
"q.val"])
path.ids.l <- rownames(expdat.kegg.p$less)[sel.l]
path.ids.l2 <- substr(path.ids.l, 1, 8)
pv.out.list.l <- sapply(path.ids.l2, function(pid) pathview(gene.data = expdat.d[,
1:2], pathway.id = pid, species = "hsa"))
--------------------------------------------
On Wed, 7/24/13, km <srikrishnamohan at gmail.com> wrote:
Subject: Re: [BioC] Pathview published in Bioinformatics
Cc: "Ed" <edforum at gmail.com>, "BioC Help" <bioconductor at r-project.org>
Date: Wednesday, July 24, 2013, 12:39 AM
Dear All,
Is this package useful to do pathway analysis with
RNA-seq based expression data - say for eg. gene
expression analysis results from tophat/cuflinks pipeline ?
Regards,
Krishna Mohan
On Fri, Jun 28, 2013
wrote:
A little more info. You
may download the KGML data file for Propanoate metabolism
pathway. Gene nodes
correspond to node entries with type="gene", while
white nodes
correspond to entries with type="ortholog",
ortholog genes which are
not mapped for a particular KEGG species.
These ortholog gene
nodes are still show are in both KEGG and Graphviz view,
except no gene data
can be mapped when calling pathview function with species =
"hsa" (or
other KEGG species). However, gene data can still be mapped
to these nodes when
species = "ko". This is relevant for ortholog gene
data or
metagenomic data.
HTH.
----- Original Message -----
To: Ed <edforum at gmail.com>
Cc: BioC Help <bioconductor at r-project.org>
Sent: Thursday, June 27, 2013 6:32 PM
Subject: Re: [BioC] Pathview published in Bioinformatics
Yes, you can replace
get gene symbols instead of EC numbers by setting
kegg.native = T, same.layer =
F for KEGG view. However, only those enzyme nodes/genes
which are present in
the KGML data file. These nodes are marked in green in
original KEGG graph for
particular species, for example Propanoate
metabolism - Homo sapiens:
http://www.genome.jp/kegg-bin/show_pathway?map=hsa00640
In fact for Graphviz
view (kegg.native = F), the pathway graph are also limited
to genes which are
presented in KGML data file. Hence you may see some
inconsistence for metabolic
pathway graphs between KEGG view and Graphviz view.
Unfortunately, there is
little we are limited by the KGML source data files.
________________________________
From: Ed <edforum at gmail.com>
Sent: Thursday, June 27, 2013 12:04 AM
Subject: Re: [BioC] Pathview published in Bioinformatics
so if I understand and tried correctly,
I cannot simply replace the EC in the graph (like Propanoate
metabolism) with gene symbol. If I need to do so, I need to
set kegg.native=F, which means it will change the graph
structure of Propanoate Metoblism. Am I right?
wrote:
Hi Nick,
>Yes, you can get
gene symbol instead of EC (or original KEGG node labels).
If you want a native KEGG view, set kegg.native
= T, same.layer = F when you call pathview function.
Otherwise, simply set set kegg.native
= F for a Graphviz view. As example outputs, you may compare
the gene symbols
in Figure 1b and Figure 2 vs Figure 1a in the vignette.
HTH.Weijun
>
>
>________________________________
> From: Ed <edforum at gmail.com>
>Sent: Tuesday, June 25, 2013 9:10 PM
>Subject: Re: [BioC] Pathview published in
Bioinformatics
>
>
>
>Hi Dr. Luo,
>
>
[[elided Yahoo spam]]
>
>
>I am just wondering if you can change the EC into gene
symbol for the enzyme in KEGG pathway?
>
>
>Thanks,
>
>
>Nick
>
>
>
>
>
>
>
wrote:
>
>BioC List,
>>The pathview package was recently published in
>>Bioinformatics:
>>http://bioinformatics.oxfordjournals.org/content/early/2013/06/11/bioinformatics.btt285.full
>>
>>Pathview is an R/Bioconductor package for pathway
based data
>>integration and visualization. It maps and renders a
wide variety of biological
>>data on relevant pathway graphs.
>>
>>The package is available through Bioconductor and
R-Forge:
>>http://bioconductor.org/packages/release/bioc/html/pathview.html
>>http://pathview.r-forge.r-project.org/
>>Please try it out and let me know if you have any
[[elided Yahoo spam]]
>>Weijun Luo
>>
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at r-project.org
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list