[BioC] Custom GeneSetCollection PFAM

Martin Morgan mtmorgan at fhcrc.org
Sat Jun 7 02:06:57 CEST 2014


Hi Fabian --

On 06/06/2014 06:28 AM, Fabian Grammes wrote:
> Dear all
>
> I'm working with an unsupported organism. However I have a table
> of PFAM annotations for my genes and would like to make a
> GeneSetCollection out of it (to use it later for hypergeometric testing
> etc...)
>
> So how would I get a data set like this:
>
> gene_id	pfam_id
> XLOC_000002 PF00354
> XLOC_000002 PF13385
> XLOC_000005 PF10523
> XLOC_000005 PF13385
> XLOC_000007 PF00013
> XLOC_000007 PF02791
> XLOC_000007 PF13385
>
> into a GeneSetCollection ??

I read in your data

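     # strip.white=TRUE drops the leading spaces if this block is pasted
     # with its indentation intact
     df <- read.csv(textConnection("gene_id,pfam_id
     XLOC_000002,PF00354
     XLOC_000002,PF13385
     XLOC_000005,PF10523
     XLOC_000005,PF13385
     XLOC_000007,PF00013
     XLOC_000007,PF02791
     XLOC_000007,PF13385"), stringsAsFactors=FALSE, row.names=NULL,
         strip.white=TRUE)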

then split it into groups based on pfam identifier

     sets <- split(df$gene_id, df$pfam_id)

then created one gene set for each pfam id, and gathered the sets into a collection

     library(GSEABase)
     # one GeneSet per pfam id, named by that id and typed as a PfamCollection
     gsc <- GeneSetCollection(Map(function(pid, gids) {
         GeneSet(gids, setName=pid, collectionType=PfamCollection(pid))
     }, names(sets), sets))

resulting in

     > gsc
     GeneSetCollection
       names: PF00013, PF00354, ..., PF13385 (5 total)
       unique identifiers: XLOC_000007, XLOC_000002, XLOC_000005 (3 total)
       types in collection:
         geneIdType: NullIdentifier (1 total)
         collectionType: PfamCollection (1 total)
     > gsc[["PF13385"]]
     setName: PF13385
     geneIds: XLOC_000002, XLOC_000005, XLOC_000007 (total: 3)
     geneIdType: Null
     collectionType: Pfam
       ids: PF13385 (1 total)
     details: use 'details(object)'
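
Since you mention hypergeometric testing, here is a rough sketch of a one-sided over-representation test per pfam id, using only base R's phyper(). The universe of tested genes, the list of selected genes, and the helper name pfam_enrichment() are all made up for illustration; the gene sets are written out as a plain named list (I believe geneIds(gsc) returns the same kind of list from the collection, but check that against your GSEABase version).

```r
# Gene sets as a named list of character vectors (same content as the toy
# example in this thread).
sets <- list(
    PF00013 = "XLOC_000007",
    PF00354 = "XLOC_000002",
    PF02791 = "XLOC_000007",
    PF10523 = "XLOC_000005",
    PF13385 = c("XLOC_000002", "XLOC_000005", "XLOC_000007"))

# ASSUMPTION: hypothetical universe (all genes tested) and hypothetical
# list of selected genes; replace both with your own.
universe <- sprintf("XLOC_%06d", 1:10)
selected <- c("XLOC_000002", "XLOC_000005", "XLOC_000009")

# Hypothetical helper: one-sided hypergeometric test for each set.
pfam_enrichment <- function(sets, selected, universe) {
    selected <- intersect(selected, universe)
    N <- length(universe)   # genes in the universe
    n <- length(selected)   # genes drawn (selected)
    res <- lapply(sets, function(ids) {
        ids <- intersect(ids, universe)
        K <- length(ids)                        # set members in the universe
        k <- length(intersect(ids, selected))   # set members that were selected
        # P(X >= k) for X ~ Hypergeometric(K in set, N - K not in set, n drawn)
        p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
        data.frame(size = K, overlap = k, p.value = p)
    })
    out <- do.call(rbind, res)
    out$pfam_id <- names(sets)
    out$p.adj <- p.adjust(out$p.value, method = "BH")  # multiple-testing correction
    out[order(out$p.value), ]
}

result <- pfam_enrichment(sets, selected, universe)
print(result)
```

With only a handful of genes the p-values mean little, of course; the point is just the bookkeeping of set size, overlap, and universe. What you use as the universe matters a lot, so choose it deliberately (for example, all genes that could have been selected in your experiment).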

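One more note, on reading a real table from a file instead of inline text: the sketch below assumes a whitespace- or tab-delimited file with a header line, as in your message. The file path is a stand-in (a temporary file is written so the example runs on its own), and one row is repeated on purpose to show why dropping duplicate gene/pfam pairs before splitting may be worthwhile.

```r
# ASSUMPTION: 'path' stands in for your own annotation file; a temporary
# file holding the rows from the question is written here so the sketch
# runs as is.
path <- tempfile(fileext = ".txt")
writeLines(c(
    "gene_id\tpfam_id",
    "XLOC_000002 PF00354",
    "XLOC_000002 PF13385",
    "XLOC_000005 PF10523",
    "XLOC_000005 PF13385",
    "XLOC_000007 PF00013",
    "XLOC_000007 PF02791",
    "XLOC_000007 PF13385",
    "XLOC_000007 PF13385"),   # last row repeated on purpose
    path)

# The default sep = "" splits on any run of spaces or tabs.
df <- read.table(path, header = TRUE, stringsAsFactors = FALSE)

# Drop repeated gene/pfam pairs so each gene appears once per set.
df <- unique(df)

# One character vector of gene ids per pfam id.
sets <- split(df$gene_id, df$pfam_id)

# Number of genes annotated with each domain.
set_sizes <- vapply(sets, length, integer(1))
print(set_sizes)
```

The resulting sets list has the same shape as the one used to build the collection, so the remaining steps stay unchanged.
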
Hope that helps,

Martin

>
> Thanks, kind regards
>
> Fabian


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


