[BioC] [devteam-bioc] Getting GO ids for genenames in plasmodium falciparum
Martin Morgan
mtmorgan at fhcrc.org
Wed Oct 9 05:46:01 CEST 2013
On 10/08/2013 08:21 PM, Ipsita Sinha wrote:
> Thanks for the tip. Seems to be working.
>
> I now have the nested list of GO ids for Plasmodium falciparum.
> I am now looking at summarizing them and have two questions.
>
> Is there a GO slim database for Plasmodium falciparum? There doesn't appear to be one.
> There are multiple evidence levels for each GO id for each gene, and at this
> point it is difficult to divide them. I am trying to get the CC part of the GO
> database and bin the genes based on location, and I am thinking the GO slim will
> reduce the granularity?

I'm not sure which evidence codes would be useful (they are enumerated at
http://www.geneontology.org/GO.evidence.shtml). In general, after mapping
your ids to GO

    mapped = select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")

you can subset the map to the rows you're interested in with something like

    codesILike = c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
    good = subset(mapped, (ONTOLOGY %in% "CC") & (EVIDENCE %in% codesILike))

I'm not sure what your 'nested list' looks like or what you're aiming for, but

    with(good, split(SYMBOL, GO))

would give you a list with one element per GO id, each holding the
corresponding SYMBOLs.

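To see the subset-and-split steps without the annotation package installed,
here is a minimal base R sketch. It assumes select() returns a data frame
with the columns SYMBOL, GO, EVIDENCE and ONTOLOGY; the gene names and rows
below are made up for illustration and are not real P. falciparum
annotations.

```r
# Toy stand-in for the data frame returned by select(); all rows are invented.
mapped <- data.frame(
  SYMBOL   = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  GO       = c("GO:0005737", "GO:0006412", "GO:0005737",
               "GO:0020011", "GO:0005634"),
  EVIDENCE = c("IDA", "IEA", "IEA", "IDA", "EXP"),
  ONTOLOGY = c("CC", "BP", "CC", "CC", "CC"),
  stringsAsFactors = FALSE
)

# Keep only cellular-component rows backed by the chosen evidence codes.
codesILike <- c("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")
good <- subset(mapped, (ONTOLOGY %in% "CC") & (EVIDENCE %in% codesILike))

# One list element per GO id, holding the genes annotated to it.
byGO <- with(good, split(SYMBOL, GO))

# The reverse view: one list element per gene, holding its GO ids.
bySymbol <- with(good, split(GO, SYMBOL))
```

Genes whose only annotations fail the filter (geneB here) drop out of both
lists, so compare names(bySymbol) against your original ids if you need to
know which genes were left unbinned.
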
I don't have any special insight into P. falciparum GO slims.
Martin
>
> Thank you for your help. Again sorry for the confusion. It's a steep learning
> curve for me as I have only been looking at bioinformatics in general for the
> past month.
>
> Ipsita
>
>
> On 8 October 2013 19:14, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 10/08/2013 01:09 AM, Maintainer wrote:
>
>
> I have a list of Plasmodium falciparum gene names obtained from the
> PlasmoDB website.
>
> I am trying to get the associated GO:IDs in order to bin the genes into
> housekeeping versus non-housekeeping genes,
> and also by function and process.
>
> I have installed the org.Pf.plasmo.db using biocLite.
>
>
> I'm guessing you have keys like
>
> > ids <- head(keys(org.Pf.plasmo.db, "SYMBOL"))
> > ids
> [1] "PF3D7_0100100" "PF3D7_0100200" "PF3D7_0100300" "PF3D7_0100400"
> [5] "PF3D7_0100500" "PF3D7_0100600"
>
> and what you want to do is create your own vector 'ids' and then
>
> select(org.Pf.plasmo.db, ids, "GO", keytype="SYMBOL")
>
> Martin
>
>
> I have tried to use this example:
>
> x <- org.Pf.plasmoGO
> # Get the ORF identifiers that are mapped to a GO ID
> mapped_genes <- mappedkeys(x)
> # Convert to a list
> xx <- as.list(x[mapped_genes])
> if(length(xx) > 0) {
>     # Try the first one
>     got <- xx[[1]]
>     got[[1]][["GOID"]]
>     got[[1]][["Ontology"]]
>     got[[1]][["Evidence"]]
> }
>
> It doesn't provide an opportunity to create a column and enter my own
> gene names. It appears to be a premapped set of gene names. As a result I
> decided to use the example to get all mappings in the list xx.
>
> Unfortunately, I am unable to iterate through the list to turn it into
> a data frame so that I can meaningfully divide up the data.
>
> Secondly, is there a way to query a database directly from R to
> get the associated GO:ID, where the input would be a gene name?
>
> Sorry to sound confused. I am pretty new to R and bioconductor.
>
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United
> States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] org.Pf.plasmo.db_2.9.0 BiocInstaller_1.10.3 GO.db_2.9.0
> hgu95av2.db_2.9.0 org.Hs.eg.db_2.9.0
> [6] RSQLite_0.11.4 DBI_0.2-7
> AnnotationDbi_1.22.6 Biobase_2.20.1 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] digest_0.6.3 grid_3.0.1 gtable_0.1.2
> IRanges_1.18.4 plyr_1.8 proto_0.3-10
> [7] RColorBrewer_1.0-5 reshape2_1.2.2 stats4_3.0.1
> stringr_0.6.2 tools_3.0.1
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793