[BioC] Custom GeneSetCollection PFAM
Martin Morgan
mtmorgan at fhcrc.org
Sat Jun 7 02:06:57 CEST 2014
Hi Fabian --
On 06/06/2014 06:28 AM, Fabian Grammes wrote:
> Dear all
>
> I'm working with an unsupported organism. However I have a table
> of PFAM annotations for my genes and would like to make a
> GeneSetCollection out of it (to use it later for hypergeometric testing
> etc...)
>
> So how would I get a data set like this:
>
> gene_id pfam_id
> XLOC_000002 PF00354
> XLOC_000002 PF13385
> XLOC_000005 PF10523
> XLOC_000005 PF13385
> XLOC_000007 PF00013
> XLOC_000007 PF02791
> XLOC_000007 PF13385
>
> into a GeneSetCollection ??
I read in your data
df <- read.csv(textConnection("gene_id,pfam_id
XLOC_000002,PF00354
XLOC_000002,PF13385
XLOC_000005,PF10523
XLOC_000005,PF13385
XLOC_000007,PF00013
XLOC_000007,PF02791
XLOC_000007,PF13385"), stringsAsFactors=FALSE, row.names=NULL)
then split it into groups based on pfam identifier
sets <- split(df$gene_id, df$pfam_id)
then created one gene set for each pfam id, and collected the sets into a collection
library(GSEABase)
gsc <- GeneSetCollection(Map(function(pid, gids) {
    GeneSet(gids, setName=pid, collectionType=PfamCollection(pid))
}, names(sets), sets))
resulting in
> gsc
GeneSetCollection
  names: PF00013, PF00354, ..., PF13385 (5 total)
  unique identifiers: XLOC_000007, XLOC_000002, XLOC_000005 (3 total)
  types in collection:
    geneIdType: NullIdentifier (1 total)
    collectionType: PfamCollection (1 total)
> gsc[["PF13385"]]
setName: PF13385
geneIds: XLOC_000002, XLOC_000005, XLOC_000007 (total: 3)
geneIdType: Null
collectionType: Pfam
  ids: PF13385 (1 total)
details: use 'details(object)'
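For the hypergeometric testing mentioned in the question, a minimal sketch using only base R might look like the following. It works on the plain list of sets rather than on the GeneSetCollection itself (geneIds(gsc) should give an equivalent named list). The names universe and selected are hypothetical stand-ins for your background genes and your genes of interest, and the choice of a one-sided over-representation test is an assumption, not something fixed by the data.

```r
# Rebuild the example gene-to-Pfam table from this thread.
df <- data.frame(
    gene_id = c("XLOC_000002", "XLOC_000002", "XLOC_000005", "XLOC_000005",
                "XLOC_000007", "XLOC_000007", "XLOC_000007"),
    pfam_id = c("PF00354", "PF13385", "PF10523", "PF13385",
                "PF00013", "PF02791", "PF13385"),
    stringsAsFactors = FALSE
)
sets <- split(df$gene_id, df$pfam_id)

# Hypothetical inputs: all annotated genes as the background, and an
# arbitrary pair of genes standing in for e.g. differentially expressed genes.
universe <- unique(df$gene_id)
selected <- c("XLOC_000002", "XLOC_000005")

# One-sided over-representation test for each Pfam set:
# P(X >= k) when drawing length(selected) genes from the universe.
pvals <- vapply(sets, function(gids) {
    gids <- intersect(gids, universe)
    k <- length(intersect(gids, selected))  # selected genes in the set
    m <- length(gids)                       # genes in the set
    n <- length(universe) - m               # genes outside the set
    phyper(k - 1, m, n, length(selected), lower.tail = FALSE)
}, numeric(1))

print(pvals)
```

With real data you would probably also want to correct the p-values for multiple testing, for instance with p.adjust(pvals, method="BH").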
Hope that helps,
Martin
>
> Thanks, kind regards
>
> Fabian
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793