[BioC] [devteam-bioc] Getting GO ids for gene names in Plasmodium falciparum

Martin Morgan mtmorgan at fhcrc.org
Wed Oct 9 05:46:01 CEST 2013


On 10/08/2013 08:21 PM, Ipsita Sinha wrote:
> Thanks for the tip. Seems to be working.
>
> I now have the nested list of GO ids for Plasmodium falciparum.
> I am now looking at summarizing them and have two questions.
>
> Is there a GO slim database for Plasmodium falciparum? There doesn't appear
> to be one.
> There are multiple evidence codes for each GO id for each gene, and at this
> point it is difficult to separate them. I am trying to get the CC part of
> the GO database and bin the genes by location, and I am thinking a GO slim
> would reduce the granularity?

I'm not sure which evidence codes would be useful (they are enumerated at
http://www.geneontology.org/GO.evidence.shtml). In general, after mapping
your ids to GO with

   mapped = select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")

you can subset the result to just the rows you're interested in with something like

   codesILike = c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
   good = subset(mapped, (ONTOLOGY %in% "CC") & (EVIDENCE %in% codesILike))
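
To see what that filter keeps, here is a small self-contained sketch that does not need org.Pf.plasmo.db installed. The data frame is a made-up stand-in for 'mapped': the gene ids and their annotations are invented, and it assumes select() returns the columns SYMBOL, GO, EVIDENCE and ONTOLOGY, as the subset() call above implies.

```r
## Hypothetical stand-in for
##   select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")
## gene ids and annotations are invented, not real P. falciparum data
mapped <- data.frame(
    SYMBOL   = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"),
    GO       = c("GO:0005634", "GO:0005737", "GO:0005634", "GO:0003674",
                 "GO:0020011"),
    EVIDENCE = c("IDA", "IEA", "ISS", "IDA", "IDA"),
    ONTOLOGY = c("CC", "CC", "CC", "MF", "CC"),
    stringsAsFactors = FALSE)

## keep cellular-component rows backed by experimental evidence codes
codesILike <- c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
good <- subset(mapped, (ONTOLOGY %in% "CC") & (EVIDENCE %in% codesILike))
good
```

Here the IEA (inferred from electronic annotation) and ISS rows are dropped because those codes are not in codesILike, and the MF row is dropped by the ontology test; whether excluding computational evidence is appropriate depends on the analysis.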

I'm not sure what your 'nested list' looks like or what you're aiming for, but

   with(good, split(SYMBOL, GO))

would give you a list, named by GO id, of the SYMBOLs annotated to each term.
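
A sketch of that split() in both directions, again on invented data shaped like the filtered 'good' table (hypothetical gene ids; only the SYMBOL and GO columns are assumed). Splitting GO by SYMBOL instead gives each gene's terms, which may be the more convenient shape for putting genes into location bins. sapply() is used for the counts since it is available on the R 3.0.1 shown in the session info.

```r
## Hypothetical filtered annotation (invented gene ids)
good <- data.frame(
    SYMBOL = c("GENE_A", "GENE_B", "GENE_C", "GENE_C"),
    GO     = c("GO:0005634", "GO:0005634", "GO:0020011", "GO:0005737"),
    stringsAsFactors = FALSE)

## GO id -> genes annotated to that term
byGO <- with(good, split(SYMBOL, GO))
## number of distinct genes per GO term
sapply(byGO, function(g) length(unique(g)))

## gene -> its GO ids, e.g. for assigning each gene to location bins
byGene <- with(good, split(GO, SYMBOL))
byGene
```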

I don't have any special insight into P. falciparum GO slims.
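
On the nested list mentioned above: assuming it is the structure produced by as.list(org.Pf.plasmoGO[mapped_genes]) in the original post (one element per gene, each a list of GO entries with GOID, Evidence and Ontology fields), something like the following flattens it into a data frame with the same columns as the select() result, so the subset() and split() calls apply unchanged. The list here is invented for illustration; the select() route is simpler, since it returns a data frame directly.

```r
## Hypothetical nested list mimicking as.list(org.Pf.plasmoGO[mapped_genes]);
## gene ids and annotations are invented
xx <- list(
    GENE_A = list(
        "GO:0005634" = list(GOID = "GO:0005634", Evidence = "IDA", Ontology = "CC"),
        "GO:0003674" = list(GOID = "GO:0003674", Evidence = "ND",  Ontology = "MF")),
    GENE_B = list(
        "GO:0020011" = list(GOID = "GO:0020011", Evidence = "IEA", Ontology = "CC")))

## one small data frame per gene (mapped genes only, so no NA entries),
## pulling each field out of every GO entry
rows <- lapply(names(xx), function(gene) {
    terms <- xx[[gene]]
    data.frame(
        SYMBOL   = gene,
        GO       = unname(vapply(terms, `[[`, character(1), "GOID")),
        EVIDENCE = unname(vapply(terms, `[[`, character(1), "Evidence")),
        ONTOLOGY = unname(vapply(terms, `[[`, character(1), "Ontology")),
        stringsAsFactors = FALSE)
})
## stack them into a single data frame
df <- do.call(rbind, rows)
df
```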

Martin

>
> Thank you for your help. Again, sorry for the confusion. It's a steep learning
> curve for me, as I have only been looking at bioinformatics in general for the
> past month.
>
> Ipsita
>
>
> On 8 October 2013 19:14, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>     On 10/08/2013 01:09 AM, Maintainer wrote:
>
>
>         I have a list of Plasmodium falciparum gene names obtained from the
>         PlasmoDB website.
>
>         I am trying to get the associated GO ids in order to bin the genes into
>         housekeeping versus non-housekeeping genes, and also by function and
>         process.
>
>         I have installed org.Pf.plasmo.db using biocLite.
>
>
>     I'm guessing you have keys like
>
>      > ids <- head(keys(org.Pf.plasmo.db, "SYMBOL"))
>      > ids
>     [1] "PF3D7_0100100" "PF3D7_0100200" "PF3D7_0100300" "PF3D7_0100400"
>     [5] "PF3D7_0100500" "PF3D7_0100600"
>
>     and what you want to do is create your own vector 'ids' and then
>
>        select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")
>
>     Martin
>
>
>         I have tried to use this example:
>
>            library(org.Pf.plasmo.db)
>            x <- org.Pf.plasmoGO
>            # Get the ORF identifiers that are mapped to a GO ID
>            mapped_genes <- mappedkeys(x)
>            # Convert to a list
>            xx <- as.list(x[mapped_genes])
>            if (length(xx) > 0) {
>                # Try the first one; print() explicitly, since inside
>                # if {} only the last value would be auto-printed
>                got <- xx[[1]]
>                print(got[[1]][["GOID"]])
>                print(got[[1]][["Ontology"]])
>                print(got[[1]][["Evidence"]])
>            }
>
>         It doesn't provide a way to supply my own gene names; it appears to
>         be a premapped set of gene names. As a result, I decided to use the
>         example to get all mappings in the list xx.
>
>         Unfortunately, I am unable to iterate through the list to turn it
>         into a data frame so that I can meaningfully divide up the data.
>
>         Secondly, is there a way to query a database directly from R to get
>         the associated GO ids, where the input would be a gene name?
>
>         Sorry to sound confused. I am pretty new to R and bioconductor.
>
>
>            -- output of sessionInfo():
>
>         R version 3.0.1 (2013-05-16)
>         Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>         locale:
>         [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
>         [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
>         attached base packages:
>         [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
>         other attached packages:
>          [1] org.Pf.plasmo.db_2.9.0 BiocInstaller_1.10.3   GO.db_2.9.0            hgu95av2.db_2.9.0      org.Hs.eg.db_2.9.0
>          [6] RSQLite_0.11.4         DBI_0.2-7              AnnotationDbi_1.22.6   Biobase_2.20.1         BiocGenerics_0.6.0
>
>         loaded via a namespace (and not attached):
>          [1] digest_0.6.3       grid_3.0.1         gtable_0.1.2       IRanges_1.18.4     plyr_1.8           proto_0.3-10
>          [7] RColorBrewer_1.0-5 reshape2_1.2.2     stats4_3.0.1       stringr_0.6.2      tools_3.0.1
>
>         --
>         Sent via the guest posting facility at bioconductor.org.
>
>
>
>     --
>     Computational Biology / Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N.
>     PO Box 19024 Seattle, WA 98109
>
>     Location: Arnold Building M1 B861
>     Phone: (206) 667-2793
>
>




