[BioC] Pathview+GAGE workflow on RNA-seq data pathway analysis and visualization (was: Pathview published in Bioinformatics)

Luo Weijun luo_weijun at yahoo.com
Wed Jul 24 19:09:57 CEST 2013


Hi Krishna,
Yes, pathview is definitely useful for RNA-seq data analysis. Actually pathview can be used to visualize a variety of user data as long as they can be mapped to pathways.

I am not sure about cufflinks output format. But below is an example workflow on RNA-seq data using tophat + Bioconductor facilities. The raw reads data were downloaded from http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1147/, and then mapped using tophat. This is the example data used in the RNA-seq section of the R/Bioconductor course at http://www.bioconductor.org/help/course-materials/2013/SeattleMay2013/. They worked with reads mapped to chromosome 14 only, here I worked with all reads/genes for a practical pathway analysis. Attached PDF file includes a few example pathview output graphs from this analysis.

This is just a representative and concise workflow on pathway analysis and visualization with RNA-seq data. There are lots of issues like RNA-seq data quality assessment and pre-processing etc are not covered or detailed here. HTH.
Weijun Luo

##package installation within R if you haven't done so, you will need to work with R 3.0 or newer version 
source("http://bioconductor.org/biocLite.R")
biocLite(c("pathview", "gage", "Rsamtools", "TxDb.Hsapiens.UCSC.hg19.knownGene") 

##extract exon regions by gene (i.e. Annotation of known gene models)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")

##count the reads mapped to genes/exons by tophat 
#used the files "accepted_hits.bam" in tophat output directory, but you need to rename them follow your sample names
library(Rsamtools)
fls <- list.files("/path/to/your/tophat_out/", pattern="bam$", full.names =T)
bamfls <- BamFileList(fls)
flag <- scanBamFlag(isNotPrimaryRead=FALSE, isProperPair=TRUE)
param <- ScanBamParam(flag=flag)
gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
    ignore.strand=TRUE, single.end=FALSE, param=param)

###extract, normalize and process the read counts into log2 expression matrix
expdat=assay(gnCnt)
libsizes=colSums(expdat)
size.factor=libsizes/exp(mean(log(libsizes)))
expdat.norm=t(t(expdat)/size.factor)
dim(expdat.norm)
sel.rn=rowSums(expdat.norm) != 0
expdat.norm=expdat.norm[sel.rn,]
dim(expdat.norm)
range(expdat.norm)
#log2 transformation
idx=expdat.norm==0
expdat.norm[expdat.norm==0]=0.1
range(expdat.norm)
expdat.norm=log2(expdat.norm)
range(expdat.norm)

###Pathway analysis using GAGE followed by visualization with Pathview
library(gage)
ref.idx=5:8
samp.idx=1:4
data(kegg.gs)
expdat.kegg.p <- gage(expdat.norm, gsets = kegg.gs, 
    ref = ref.idx, samp = samp.idx, compare ="unpaired")
#differential expression: log2 ratio or fold change
expdat.d= expdat.norm[, samp.idx]-rowMeans(expdat.norm[, ref.idx])
#up-regulated pathways visualized by pathview
sel <- expdat.kegg.p$greater[, "q.val"] < 0.1 & !is.na(expdat.kegg.p$greater[, 
    "q.val"])
path.ids <- rownames(expdat.kegg.p$greater)[sel]
path.ids2 <- substr(path.ids, 1, 8)
library(pathview)
pv.out.list <- sapply(path.ids2, function(pid) pathview(gene.data = expdat.d[, 
    1:2], pathway.id = pid, species = "hsa")) 
#down-regulated pathways visualized by pathview
sel.l <- expdat.kegg.p$less[, "q.val"] < 0.1 & !is.na(expdat.kegg.p$less[, 
    "q.val"])
path.ids.l <- rownames(expdat.kegg.p$less)[sel.l]
path.ids.l2 <- substr(path.ids.l, 1, 8)
pv.out.list.l <- sapply(path.ids.l2, function(pid) pathview(gene.data = expdat.d[, 
1:2], pathway.id = pid, species = "hsa"))


--------------------------------------------
On Wed, 7/24/13, km <srikrishnamohan at gmail.com> wrote:

 Subject: Re: [BioC] Pathview published in Bioinformatics

 Cc: "Ed" <edforum at gmail.com>, "BioC Help" <bioconductor at r-project.org>
 Date: Wednesday, July 24, 2013, 12:39 AM

 Dear All,

 Is this package useful to do pathway analysis  with
 RNA-seq  based expression  data - say for eg. gene
 expression analysis results from tophat/cuflinks pipeline ?


 Regards,
 Krishna Mohan


 On Fri, Jun 28, 2013

 wrote:





 A little more info. You

 may download the KGML data file for Propanoate metabolism
 pathway. Gene nodes

 correspond to node entries with type="gene", while
 white nodes

 correspond to entries with type="ortholog",
 ortholog genes which are

 not mapped for a particular KEGG species.

 These ortholog gene

 nodes are still show are in both KEGG and Graphviz view,
 except no gene data

 can be mapped when calling pathview function with species =
 "hsa" (or

 other KEGG species). However, gene data can still be mapped
 to these nodes when

 species = "ko". This is relevant for ortholog gene
 data or

 metagenomic data.

 HTH.





 ----- Original Message -----



 To: Ed <edforum at gmail.com>

 Cc: BioC Help <bioconductor at r-project.org>

 Sent: Thursday, June 27, 2013 6:32 PM

 Subject: Re: [BioC] Pathview published in Bioinformatics



 Yes, you can replace

 get gene symbols instead of EC numbers by setting
 kegg.native = T, same.layer =

 F for KEGG view. However, only those enzyme nodes/genes
 which are present in

 the KGML data file. These nodes are marked in green in
 original KEGG graph for

 particular species, for example Propanoate

 metabolism - Homo sapiens:

 http://www.genome.jp/kegg-bin/show_pathway?map=hsa00640

 In fact for Graphviz

 view (kegg.native = F), the pathway graph are also limited
 to genes which are

 presented in KGML data file. Hence you may see some
 inconsistence for metabolic

 pathway graphs between KEGG view and Graphviz view.
 Unfortunately, there is

 little we are limited by the KGML source data files.





 ________________________________

 From: Ed <edforum at gmail.com>



 Sent: Thursday, June 27, 2013 12:04 AM

 Subject: Re: [BioC] Pathview published in Bioinformatics







 so if I understand and tried correctly,



 I cannot simply replace the EC in the graph (like Propanoate
 metabolism) with gene symbol. If I need to do so, I need to
 set kegg.native=F, which means it will change the graph
 structure of Propanoate Metoblism. Am I right?









 wrote:



 Hi Nick,

 >Yes, you can get

 gene symbol instead of EC (or original KEGG node labels).
  If you want a native KEGG view, set kegg.native

 = T, same.layer = F when you call pathview function.
 Otherwise, simply set set kegg.native

 = F for a Graphviz view. As example outputs, you may compare
 the gene symbols

 in Figure 1b and Figure 2 vs Figure 1a in the vignette.
 HTH.Weijun

 >

 >

 >________________________________

 > From: Ed <edforum at gmail.com>



 >Sent: Tuesday, June 25, 2013 9:10 PM

 >Subject: Re: [BioC] Pathview published in
 Bioinformatics

 >

 >

 >

 >Hi Dr. Luo,

 >

 >

[[elided Yahoo spam]]

 >

 >

 >I am just wondering if you can change the EC into gene
 symbol for the enzyme in KEGG pathway?

 >

 >

 >Thanks,

 >

 >

 >Nick

 >

 >

 >

 >

 >

 >

 >


 wrote:

 >

 >BioC List,

 >>The pathview package was recently published in

 >>Bioinformatics:

 >>http://bioinformatics.oxfordjournals.org/content/early/2013/06/11/bioinformatics.btt285.full


 >> 

 >>Pathview is an R/Bioconductor package for pathway
 based data

 >>integration and visualization. It maps and renders a
 wide variety of biological

 >>data on relevant pathway graphs.

 >> 

 >>The package is available through Bioconductor and
 R-Forge:

 >>http://bioconductor.org/packages/release/bioc/html/pathview.html

 >>http://pathview.r-forge.r-project.org/

 >>Please try it out and let me know if you have any

[[elided Yahoo spam]]

 >>Weijun Luo

 >>

 >>

 >>_______________________________________________

 >>Bioconductor mailing list

 >>Bioconductor at r-project.org

 >>https://stat.ethz.ch/mailman/listinfo/bioconductor

 >>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

 >>

 >

 >

 >



 _______________________________________________

 Bioconductor mailing list

 Bioconductor at r-project.org

 https://stat.ethz.ch/mailman/listinfo/bioconductor

 Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


More information about the Bioconductor mailing list