[BioC] topGO using de novo assembled transcriptome
oystercow
ian.mcdowell at gmail.com
Wed Nov 9 05:33:52 CET 2011
Hi all,
> gene.table <- read.table("/Users/oystercow/Desktop/11:07:2011workfolder/p-value_for_topGO_5d_1d_all", header = TRUE, row.names=1)
> genelist_topGO_5d_1d_all <- as.numeric(gene.table$p.value)
> names(genelist_topGO_5d_1d_all) <- as.character(row.names(gene.table))
#My geneList looks good, just like the example, e.g.:
> head(genelist_topGO_5d_1d_all)
comp0_c0_seq1 comp0_c0_seq10 comp0_c0_seq2 comp0_c0_seq3 comp0_c0_seq4 comp0_c0_seq5
1.742075e-03 3.160000e-159 1.453968e-02 9.230000e-06 3.300000e-14 1.710000e-65
#Yet when I try to define and use the topDiffGenes function, the results are unexpected
> topDiffGenes <- function(allScore) {
+ return(allScore < 0.01)
+ }
> sum(topDiffGenes(genelist_topGO_5d_1d_all))
[1] NA
#this should be <58819, and certainly not 'NA'
> length(topDiffGenes(genelist_topGO_5d_1d_all))
[1] 58819
#this is the total number of IDs, contigs in my case
> head(topDiffGenes(genelist_topGO_5d_1d_all))
comp0_c0_seq1 comp0_c0_seq10 comp0_c0_seq2 comp0_c0_seq3 comp0_c0_seq4 comp0_c0_seq5
TRUE TRUE FALSE TRUE TRUE TRUE
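A possible cause of the NA, assuming (not confirmed from the output shown) that some entries in the p.value column are missing: any comparison with NA yields NA, and sum() returns NA as soon as one element is NA. A minimal base-R sketch with hypothetical contig names and p-values:

```r
# Hypothetical p-values standing in for the real table; one is missing (NA).
pvals <- c(comp_a = 0.0017, comp_b = NA, comp_c = 0.0145, comp_d = 9.23e-06)

topDiffGenes <- function(allScore) {
  return(allScore < 0.01)
}

n_missing <- sum(is.na(pvals))                    # how many scores are missing
n_raw     <- sum(topDiffGenes(pvals))             # NA, because NA < 0.01 is NA
n_narm    <- sum(topDiffGenes(pvals), na.rm = TRUE)  # counts only the TRUE entries

# One option, if dropping unscored contigs is acceptable for the analysis:
pvals_clean <- pvals[!is.na(pvals)]
n_sig <- sum(topDiffGenes(pvals_clean))
```

Running sum(is.na(genelist_topGO_5d_1d_all)) on the real vector would show whether this assumption holds.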
#If you think my error came from:
> genelist_topGO_5d_1d_all <- as.numeric(gene.table$p.value)
#and that I should instead import the p-values as character (as suggested in a previous posting, https://stat.ethz.ch/pipermail/bioconductor/2007-November/020045.html), I tried that as well:
> genelist_topGO_5d_1d_all_2 <- as.character(gene.table$p.value)
> names(genelist_topGO_5d_1d_all_2) <- as.character(row.names(gene.table))
> head(genelist_topGO_5d_1d_all_2)
comp0_c0_seq1 comp0_c0_seq10 comp0_c0_seq2 comp0_c0_seq3 comp0_c0_seq4 comp0_c0_seq5
"0.001742075" "3.16e-159" "0.014539683" "9.23e-06" "3.3e-14" "1.71e-65"
> sum(topDiffGenes(genelist_topGO_5d_1d_all_2))
[1] NA
> length(topDiffGenes(genelist_topGO_5d_1d_all_2))
[1] 58819
#Same results, and even worse, the comparisons are now wrong:
> head(topDiffGenes(genelist_topGO_5d_1d_all_2))
comp0_c0_seq1 comp0_c0_seq10 comp0_c0_seq2 comp0_c0_seq3 comp0_c0_seq4 comp0_c0_seq5
TRUE FALSE FALSE FALSE FALSE FALSE
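The pattern in the character version is consistent with R comparing strings rather than numbers: when a character vector is compared with a number, the number is converted to a string ("0.01") and the comparison is done character by character. A small sketch using three of the values printed above:

```r
# Three of the p-values from the output above, stored as strings.
p_chr <- c("0.001742075", "3.16e-159", "9.23e-06")

# 0.01 is coerced to "0.01", so this is a string comparison:
# "3..." and "9..." sort after "0...", whatever the exponent says.
chr_cmp <- p_chr < 0.01

# Converting back to numeric restores the intended comparison.
num_cmp <- as.numeric(p_chr) < 0.01
```

Here chr_cmp flags only the first value, while num_cmp flags all three, so keeping the scores numeric looks like the safer choice.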
I would like to do this:
> GOdata <- new("topGOdata", ontology = "BP",
+ allGenes = genelist_topGO_5d_1d_all,
+ geneSel = topDiffGenes(genelist_topGO_5d_1d_all),
+ annot = annFUN.GO2genes,
+ GO2genes = as.list(read.table("~/Desktop/annot_readyforR.annot", header = FALSE, sep = "\t")))
#using my own annotations; the file "~/Desktop/annot_readyforR.annot" looks like this:
comp517_c0_seq1 GO:0015850
comp517_c0_seq1 GO:0015665
comp517_c0_seq1 GO:0031224
comp517_c0_seq1 GO:0015291
comp517_c0_seq1 GO:0012501
comp517_c0_seq1 GO:0030001
comp1970_c0_seq1 GO:0004000
comp1970_c0_seq1 GO:0003676
comp1970_c0_seq1 GO:0031981
comp1970_c0_seq1 GO:0016553
comp1970_c0_seq1 GO:0019221
comp1970_c0_seq1 GO:0010467
comp1964_c0_seq1 GO:0005488
comp1964_c0_seq2 GO:0005488
...
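Note that as.list() on the data frame returned by read.table() gives a list of two columns, not a mapping between contigs and GO terms. The topGO documentation describes annFUN.gene2GO, which takes a gene2GO argument: a named list with one element per gene holding its GO IDs. A base-R sketch of building such a list from a two-column file follows; the inline text is a hypothetical stand-in for the real file (assumed to be tab-separated), and the column names contig and go_id are made up for illustration:

```r
# Hypothetical stand-in for a few rows of the two-column annotation file.
annot_text <- "comp517_c0_seq1\tGO:0015850
comp517_c0_seq1\tGO:0015665
comp1964_c0_seq1\tGO:0005488
comp1964_c0_seq2\tGO:0005488"

annot <- read.table(text = annot_text, header = FALSE, sep = "\t",
                    col.names = c("contig", "go_id"),
                    stringsAsFactors = FALSE)

# Named list: one element per contig, each a character vector of GO IDs.
gene2GO <- split(annot$go_id, annot$contig)
```

Such a list could then be tried with annot = annFUN.gene2GO, gene2GO = gene2GO in the topGOdata call; whether that suits this data set would need to be checked against the topGO vignette.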
The error message for the new("topGOdata", ...) call above is:
Error in checkSlotAssignment(object, name, value) :
assignment of an object of class "logical" is not valid for slot "geneSelectionFun" in an object of class "topGOdata"; is(value, "function") is not TRUE
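The message says the geneSelectionFun slot needs a function, whereas topDiffGenes(genelist_topGO_5d_1d_all) is the logical vector that the function returns. A minimal base-R sketch of the distinction, using hypothetical scores; the topGO call in the comment is not run here and leaves the annotation arguments out:

```r
topDiffGenes <- function(allScore) {
  return(allScore < 0.01)
}

# Hypothetical scores, only to show the two object types.
scores <- c(comp_a = 0.0017, comp_b = 0.0145)

fun_ok  <- is.function(topDiffGenes)          # TRUE: a function object
call_ok <- is.function(topDiffGenes(scores))  # FALSE: a logical vector

# Passing the function itself, without calling it (requires topGO, not run):
# GOdata <- new("topGOdata", ontology = "BP",
#               allGenes = genelist_topGO_5d_1d_all,
#               geneSel = topDiffGenes)  # annotation arguments omitted
```

topGO then applies the function to the scores itself, so the selection does not need to be computed beforehand.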
Any suggestions? topGO seems well streamlined for microarray data, but any hints on using it with "self-annotated" transcriptome data would be very welcome.
Thanks,
Ian McDowell
University of Rhode Island