[BioC] goseq/nullp with non-native identifiers
Alicia Oshlack
alicia.oshlack at mcri.edu.au
Tue Sep 4 07:59:14 CEST 2012
Hi Ravi,
You can supply your own length data and GO categories directly:
pwf=nullp(gene.vector,bias.data=lengthData)
go=goseq(pwf,gene2cat=GOmap)
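
A minimal sketch of the preparation these two calls rely on, in case it helps. As far as I know, bias.data has to be a numeric vector in the same order as gene.vector, and gene2cat can be either a two-column data frame (gene ID, category) or a list named by gene ID. The gene IDs, lengths and GO terms below are made-up stand-ins for illustration only, not real zebrafish data, and the goseq calls are left commented so the snippet runs in base R:

```r
# Toy stand-ins for the objects in this thread (all values are hypothetical).
gene.vector <- c(ENSDARG01 = 1, ENSDARG02 = 0, ENSDARG03 = 1, ENSDARG04 = 0)
lengthData  <- c(ENSDARG03 = 2100, ENSDARG01 = 1500,
                 ENSDARG05 = 900,  ENSDARG02 = 3200)
GOmap <- data.frame(
  ensembl_gene_id = c("ENSDARG01", "ENSDARG01", "ENSDARG02",
                      "ENSDARG03", "ENSDARG04"),
  go_id           = c("GO:0000001", "GO:0000002", "GO:0000001",
                      "", "GO:0000003"),
  stringsAsFactors = FALSE
)

# Reorder the lengths to match gene.vector; genes without a length become NA.
bias <- unname(lengthData[names(gene.vector)])

# biomaRt typically returns an empty go_id for unannotated genes; drop those rows.
GOmap <- GOmap[GOmap$go_id != "", ]

# Optional: the list form of gene2cat (gene ID -> character vector of GO IDs).
gene2cat <- split(GOmap$go_id, GOmap$ensembl_gene_id)

# With goseq loaded, the two calls above would then be:
# pwf <- nullp(gene.vector, bias.data = bias)
# go  <- goseq(pwf, gene2cat = gene2cat)
```

Matching by name rather than by position guards against lengthData and gene.vector being in different orders, which is easy to get wrong when the lengths come from a separate annotation source.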
Cheers,
Alicia
On 3/09/12 8:00 PM, "bioconductor-request at r-project.org"
<bioconductor-request at r-project.org> wrote:
> Date: Sun, 2 Sep 2012 09:39:48 -0400
> From: Ravi Karra <ravi.karra at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] goseq/nullp with non-native identifiers
> Message-ID: <1446F9C1-DB8C-4F0F-BB7A-ABE4AA47A64A at gmail.com>
> Content-Type: text/plain
>
> Hello,
>
> I am trying to use goseq to find enriched GO terms for zebrafish RNA-seq data
> and am looking for advice on manually providing gene length information and GO
> annotation to goseq. My RNA-seq data is mapped to danRer7 Ensembl gene
> IDs. Unfortunately, danRer7 does not appear to be supported by goseq's
> built-ins for Ensembl gene IDs.
>
>> supportedGenomes()[68,]
>         db   species      date                 name AvailableGeneIDs
> 68 danRer7 Zebrafish Jul. 2010 Sanger Institute Zv9
>
>> pwf = nullp(gene.vector, "danRer7", "ensGene")
> Error in getlength(names(DEgenes), genome, id) :
> Length information for genome danRer7 and gene ID ensGene is not in the
> geneLenDataBase database. You will have to specify bias.data manually.
>
> I would like to manually supply the gene length information by:
>
>> zv9txs = makeTranscriptDbFromBiomart(biomart = "ensembl",
>>                                      dataset = "drerio_gene_ensembl")
>> txsByGene = transcriptsBy(zv9txs, "gene")
>> lengthData = median(width(txsByGene))
>
> and the GO data (using biomaRt):
>
>> zv9 = useDataset("drerio_gene_ensembl", mart = useMart("ensembl"))
>> GOmap = getBM(filters = "ensembl_gene_id",
>>               attributes = c("ensembl_gene_id", "go_id"),
>>               values = gene.universe, mart = zv9)
>
> How can I pass this GO data and gene length data to goseq's nullp function
> to calculate a probability weighting function?
>
> Thanks and sessionInfo() below,
>
> Ravi
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicFeatures_1.8.3  AnnotationDbi_1.18.1   Biobase_2.16.0         GenomicRanges_1.8.13
> [5] IRanges_1.14.4         BiocGenerics_0.2.0     goseq_1.8.0            geneLenDataBase_0.99.9
> [9] BiasedUrn_1.04         biomaRt_2.12.0
>
> loaded via a namespace (and not attached):
>  [1] Biostrings_2.24.1  bitops_1.0-4.1     BSgenome_1.24.0    DBI_0.2-5          grid_2.15.1
>  [6] hwriter_1.3        lattice_0.20-10    Matrix_1.0-6       mgcv_1.7-20        nlme_3.1-104
> [11] RCurl_1.91-1       Rsamtools_1.8.6    RSQLite_0.11.1     rtracklayer_1.16.3 ShortRead_1.14.4
> [16] stats4_2.15.1      tools_2.15.1       XML_3.9-4          zlibbioc_1.2.0