[BioC] goseq/nullp with non-native identifiers

Tue Sep 4 07:59:14 CEST 2012

Hi Ravi,

You can use your own length data and GO categories by passing them in explicitly (bias.data needs one value per gene, in the same order as gene.vector):
pwf=nullp(gene.vector,bias.data=lengthData)
go=goseq(pwf,gene2cat=GOmap)
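
A minimal sketch of preparing bias.data so that it lines up with gene.vector. It assumes both objects are named by the same gene IDs. The gene names and lengths below are made up, and the nullp() call is commented out because it needs goseq installed.

```r
# Hypothetical toy inputs; gene IDs and lengths are made up.
gene.vector <- c(geneA = 1, geneB = 0, geneC = 1)   # 1 = DE, 0 = not DE
lengthData  <- c(geneC = 1500, geneD = 800, geneA = 2100, geneB = 950)

# Index by name: this both reorders the lengths and drops genes
# (here geneD) that are not in gene.vector.
bias <- lengthData[names(gene.vector)]

# Genes missing from lengthData would appear as NA; report them.
if (any(is.na(bias))) {
  warning(sum(is.na(bias)), " gene(s) in gene.vector have no length value")
}

# With goseq loaded (not run here):
# pwf <- nullp(gene.vector, bias.data = bias)
```

Indexing by name rather than relying on the order returned by median(width(...)) avoids silently pairing a gene with another gene's length.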

Cheers,
Alicia

On 3/09/12 8:00 PM, "bioconductor-request at r-project.org"
<bioconductor-request at r-project.org> wrote:

> Date: Sun, 2 Sep 2012 09:39:48 -0400
> From: Ravi Karra <ravi.karra at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] goseq/nullp with non-native identifiers
> Message-ID: <1446F9C1-DB8C-4F0F-BB7A-ABE4AA47A64A at gmail.com>
> Content-Type: text/plain
> 
> Hello, 
> 
> I am trying to use goseq to find enriched GO terms for zebrafish RNA-seq data
> and am looking for advice on manually providing gene length information and GO
> annotation to goseq. My RNA-seq data is mapped to danRer7 Ensembl gene
> IDs. Unfortunately, danRer7 does not appear to be supported by goseq's
> built-in length data for Ensembl gene IDs.
> 
>> supportedGenomes () [68,]
>         db   species      date                  name AvailableGeneIDs
> 68 danRer7 Zebrafish Jul. 2010 Sanger Institute Zv9
> 
>> pwf = nullp(gene.vector, "danRer7", "ensGene")
> Error in getlength(names(DEgenes), genome, id) :
>   Length information for genome danRer7 and gene ID ensGene is not in the
> geneLenDataBase database.  You will have to specify bias.data manually.
> 
> I would like to manually supply the gene length information by:
> 
>> zv9txs = makeTranscriptDbFromBiomart(biomart = "ensembl",
> +     dataset = "drerio_gene_ensembl")
>> txsByGene=transcriptsBy(zv9txs,"gene")
>> lengthData=median(width(txsByGene))
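
One caveat on the length calculation quoted above: the ranges returned by transcriptsBy() cover each transcript's full genomic extent, so width() includes introns rather than giving the spliced transcript length that read counts mainly scale with. A commonly used alternative is the total length of each gene's merged exons, e.g. sum(width(reduce(exonsBy(zv9txs, by = "gene")))) with GenomicFeatures. The base-R sketch below uses hypothetical coordinates and a hypothetical helper name (union_length) only to illustrate what that merged-exon length computes; it is not a replacement for the Bioconductor call.

```r
# Hypothetical helper: total bases covered by the union of possibly
# overlapping intervals, using 1-based closed coordinates.
union_length <- function(start, end) {
  if (length(start) == 0) return(0)
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_s <- start[1]
  cur_e <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_e + 1) {
      # Overlapping or adjacent interval: extend the current block.
      cur_e <- max(cur_e, end[i])
    } else {
      # Gap: close the current block and start a new one.
      total <- total + (cur_e - cur_s + 1)
      cur_s <- start[i]
      cur_e <- end[i]
    }
  }
  total + (cur_e - cur_s + 1)
}

# Hypothetical exon table: one row per exon, possibly shared by isoforms.
exons <- data.frame(
  gene  = c("g1", "g1", "g1", "g2"),
  start = c(1, 50, 300, 10),
  end   = c(100, 150, 400, 19),
  stringsAsFactors = FALSE
)

# Merged-exon length per gene.
len <- sapply(split(exons, exons$gene),
              function(d) union_length(d$start, d$end))
```

For g1 the merged exons cover 251 bp, whereas a transcript spanning positions 1 to 400 would have a width() of 400.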
> 
> and the GO data (using biomaRt):
> 
>> zv9 = useDataset("drerio_gene_ensembl", mart = useMart("ensembl"))
>> GOmap = getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id",
> +     "go_id"), values = gene.universe, mart = zv9)
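
getBM() returns a data frame with one row per (gene, GO term) pair. goseq's documentation describes gene2cat as either such a two-column data frame or a named list mapping each gene to a vector of categories. Genes without annotation may come back with an empty go_id, so a little cleanup can help. A minimal base-R sketch, using made-up IDs in place of the real query result:

```r
# Hypothetical stand-in for the getBM() result (IDs are made up).
GOmap <- data.frame(
  ensembl_gene_id = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  go_id           = c("GO:0005634", "GO:0003677", "", "GO:0005634", "GO:0005634"),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry a GO term, then drop duplicate pairs.
GOmap <- unique(GOmap[GOmap$go_id != "", ])

# Optional named-list form: one character vector of GO IDs per gene.
gene2cat <- split(GOmap$go_id, GOmap$ensembl_gene_id)

# Either object could then be passed on (not run here, requires goseq):
# go <- goseq(pwf, gene2cat = GOmap)
```

Genes that end up with no GO terms (geneB here) simply drop out of the mapping.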
> 
> How can I input this GO data and gene length data into the nullp function of
> goseq to calculate a probability weighting function?
> 
> Thanks and sessionInfo() below,
> 
> Ravi
> 
>> sessionInfo ()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
>  [1] GenomicFeatures_1.8.3  AnnotationDbi_1.18.1   Biobase_2.16.0         GenomicRanges_1.8.13
>  [5] IRanges_1.14.4         BiocGenerics_0.2.0     goseq_1.8.0            geneLenDataBase_0.99.9
>  [9] BiasedUrn_1.04         biomaRt_2.12.0
> 
> loaded via a namespace (and not attached):
>  [1] Biostrings_2.24.1  bitops_1.0-4.1     BSgenome_1.24.0    DBI_0.2-5          grid_2.15.1
>  [6] hwriter_1.3        lattice_0.20-10    Matrix_1.0-6       mgcv_1.7-20        nlme_3.1-104
> [11] RCurl_1.91-1       Rsamtools_1.8.6    RSQLite_0.11.1     rtracklayer_1.16.3 ShortRead_1.14.4
> [16] stats4_2.15.1      tools_2.15.1       XML_3.9-4          zlibbioc_1.2.0
>> 
