[Bioc-devel] Making GOstats pathway analysis using data from both a and b chip s in one go.

James W. MacDonald jmacdon at med.umich.edu
Tue Aug 22 19:18:26 CEST 2006


Robert Gentleman wrote:
> Hi,
>    Basically the problem with combining data from multiple chips is that 
> you need to be pretty careful about duplicate probes for single genes 
> (since GO maps at the EntrezGene level that is the level of resolution 
> you have to work with and double counting some genes and not others is a 
> pretty bad idea) and next it is getting the Universe right.  This is 
> also incredibly important, and is straightforward for a single chip (and 
> that is what has been done in GOstats) and less straightforward for 
> anything else.  You will need to do some programming to create your own 
> Universe and solve a few other problems (the GOHyper has a universe 
> argument that should work, let us know if it does not).
> 
>   Finally as Wolfgang recently said, this approach (hypergeometric 
> testing) is probably not as good as Gene Set Enrichment (there are two 
> Bioc packages for doing this, one is called Category the other is called 
> sigPathway

And yet another is called safe.

Best,

Jim


> 
>   best wishes
>     Robert
> 
> 
> Peder.Worning at astrazeneca.com wrote:
> 
>>Dear Bioconductor list,
>>
>>I have been using the GOstats package to make pathway analysis of Affymetrix
>>data from the mouse MOE430a chips, and that works fine. Now I want to make
>>an analysis which includes the data from both the a and b chips in one go is
>>that possible?
>>
>>My problem comes when I use the GOHyperG function where I have to make
>>either lib="moe430a" or lib="moe430b". Is there a way to incude annotation
>>from both chips in the same analysis?
>>
>>Here is the R code I have used:
>>****************************************************************************
>>************************************** 
>>library(GOstats)
>>library(moe430a)
>>library(moe430b)
>>library
>>
>>indat <- read.delim("ASM+CSM+12X2+LPS_A.SAMROC.ann.molcl.logP.tab.txt",
>>header = T, sep = "\t", blank.lines.skip = T)
>>indata <- indat[!(substr(as.character(indat[,1]),1,3) %in% "AFF"),] # remove
>>AFF probesets
>>
>>newdata_ASMup_01_pr02 <- indata[indata[,10] >= 2 & indata[,2] >=0.2,]
>>
>># GO pathway analysis from here
>>
>>############################################################################
>>#############################################
>>newdata <- newdata_ASMup_01_pr02 # Here is the dataset chosen everything
>>from here must be repeated for a new dataset ####
>>newprobeset <- as.character(newdata[,1])
>>newprobeset_a <- as.character(newdata[newdata[,61]=="A" ,1])
>>newprobeset_b <- as.character(newdata[newdata[,61]=="B" ,1])
>>
>>gNll_a = as.character(unlist(mget(newprobeset_a, moe430aLOCUSID)))
>>gNsLL_a <- unique(unlist(mget(newprobeset_a, env=moe430aLOCUSID,
>>ifnotfound=NA)))
>>gNll_b = as.character(unlist(mget(newprobeset_b, moe430bLOCUSID)))
>>gNsLL_b <- unique(unlist(mget(newprobeset_b, env=moe430bLOCUSID,
>>ifnotfound=NA)))
>>
>>gNll <- c(gNll_a,gNll_b)
>>gNsLL <- unique(c(gNsLL_a,gNsLL_b))
>>
>>
>># The three GO ontologies using only data from the a-chip
>>gGhyp_BP <- GOHyperG(gNsLL_a, lib="moe430a", what="BP")
>>gGO_BP <- makeGOGraph(gNll_a, Ontology="BP")
>>
>>gGhyp_CC <- GOHyperG(gNsLL_a, lib="moe430a", what="CC")
>>gGO_CC <- makeGOGraph(gNll_a, Ontology="CC")
>>
>>gGhyp_MF <- GOHyperG(gNsLL_a, lib="moe430a", what="MF")
>>gGO_MF <- makeGOGraph(gNll_a, Ontology="MF")
>>
>>gGhyp_BP.pv <- gGhyp_BP$pv[nodes(gGO_BP)]
>>gg_BP = sort(gGhyp_BP.pv[gGhyp_BP.pv < 0.05])
>>
>>gGhyp_CC.pv <- gGhyp_CC$pv[nodes(gGO_CC)]
>>gg_CC = sort(gGhyp_CC.pv[gGhyp_CC.pv < 0.05])
>>
>>gGhyp_MF.pv <- gGhyp_MF$pv[nodes(gGO_MF)]
>>gg_MF = sort(gGhyp_MF.pv[gGhyp_MF.pv < 0.05])
>>
>>gg_BP.terms <- getGOTerm(names(gg_BP))[["BP"]]
>>gg_CC.terms <- getGOTerm(names(gg_CC))[["CC"]]
>>gg_MF.terms <- getGOTerm(names(gg_MF))[["MF"]]
>>
>>ggt_BP = as.character(unlist(gg_BP.terms))
>>numCh_BP = nchar(ggt_BP)
>>ggt2_BP = substr(ggt_BP, 1, 50)
>>ggt3_BP = paste("BP: ",ggt2_BP, ifelse(numCh_BP > 50, "..", ""), sep="")
>>
>>ggt_CC = as.character(unlist(gg_CC.terms))
>>numCh_CC = nchar(ggt_CC)
>>ggt2_CC = substr(ggt_CC, 1, 50)
>>ggt3_CC = paste("CC: ",ggt2_CC, ifelse(numCh_CC > 50, "..", ""), sep="")
>>
>>ggt_MF = as.character(unlist(gg_MF.terms))
>>numCh_MF = nchar(ggt_MF)
>>ggt2_MF = substr(ggt_MF, 1, 50)
>>ggt3_MF = paste("MF: ",ggt2_MF, ifelse(numCh_MF > 50, "..", ""), sep="")
>>
>>
>>##get counts
>>gg_BP.counts <- gGhyp_BP$intCounts[names(gg_BP)]
>>gg_CC.counts <- gGhyp_CC$intCounts[names(gg_CC)]
>>gg_MF.counts <- gGhyp_MF$intCounts[names(gg_MF)]
>>gg_BP.allcounts <- gGhyp_BP$goCounts[names(gg_BP)]
>>gg_CC.allcounts <- gGhyp_CC$goCounts[names(gg_CC)]
>>gg_MF.allcounts <- gGhyp_MF$goCounts[names(gg_MF)]
>>
>>ggMat_BP = matrix(c(names(gg_BP.terms), ggt3_BP, signif(gg_BP,3),
>>gg_BP.counts, gg_BP.allcounts),
>>    byrow=FALSE, nc=5, dimnames=list(1:length(gg_BP), c("GO ID",
>>   "Term", "p-value","No of Genes","Total number")))
>>
>>ggMat_CC = matrix(c(names(gg_CC.terms), ggt3_CC, signif(gg_CC,3),
>>gg_CC.counts, gg_CC.allcounts),
>>    byrow=FALSE, nc=5, dimnames=list(1:length(gg_CC), c("GO ID",
>>   "Term", "p-value","No of Genes","Total number")))
>>
>>ggMat_MF = matrix(c(names(gg_MF.terms), ggt3_MF, signif(gg_MF,3),
>>gg_MF.counts, gg_MF.allcounts),
>>    byrow=FALSE, nc=5, dimnames=list(1:length(gg_MF), c("GO ID",
>>   "Term", "p-value","No of Genes","Total number")))
>>
>>newMat <- rbind(ggMat_BP,ggMat_CC,ggMat_MF)
>>
>>This is what I have done and it works beautifully, but how do I make the
>>analysis using both chips, because making the analysis seaprate of the A and
>>the B chip data doesn't make sense. It means that the B chip data is somehow
>>useless in this context. I have the same problem with my human data.
>>
>>Some technical details: R is version 2.3.1 and GOstats is version 1.6.0 and
>>the annotation files are moe430a_1.12.0 and moeb430_1.12.0
>>
>>Sincerly yours
>>
>>Peder
>>
>>________________________________________________
>>Peder Worning, PhD
>>
>>Senior Scientist, Bioinformatics
>>AstraZeneca R&D Lund
>>S-221 87 Lund, Sweden
>>
>>Tel:     +46 46 33 72 93
>>Fax:    +46 46 33 71 64
>>Email:  peder.worning at astrazeneca.com
>>
>>_______________________________________________
>>Bioc-devel at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
> 
> 


-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.



More information about the Bioc-devel mailing list