[BioC] Using Biomart other than Ensembl
James W. MacDonald
jmacdon at uw.edu
Wed Feb 26 21:29:43 CET 2014
Hi Daniela,
Please don't take things off-list (i.e., use Reply-all).
On 2/26/2014 3:20 PM, Daniela Moré wrote:
> Hi Jim,
>
> Actually, this first step will give me the read counts through
> summarizeOverlaps before the DE analysis.
>
> More specifically, I'm choosing a gene model to make a TranscriptDb
> using makeTranscriptDbFromBiomart (page 7).
> I'm following the attached documentation from the last
> Bioconductor summer course in Brazil (to which the page number refers).
If you prefer NCBI identifiers, you can use makeTranscriptDbFromUCSC()
instead.
library(GenomicFeatures)
tx <- makeTranscriptDbFromUCSC("bosTau6", "refGene")
Should do the trick.
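For the counting step you mentioned, here is a rough sketch rather than tested code. It assumes the package versions in your sessionInfo() (in other releases these functions may be named differently or live in other packages), and count_refgene_reads() and the BAM file names are made up for illustration. It also checks that the chromosome names in your BAM files match the names used in the annotation, since summarizeOverlaps() only counts reads on sequences whose names match.

```r
# Sketch only; count_refgene_reads() is a hypothetical helper, not part of any package.
count_refgene_reads <- function(bam_paths, genome = "bosTau6",
                                tablename = "refGene") {
    # Attached inside the function so that defining it needs no packages.
    library(GenomicFeatures)  # makeTranscriptDbFromUCSC(), exonsBy()
    library(Rsamtools)        # BamFileList()

    # Gene model from the UCSC refGene track (downloads from UCSC).
    tx <- makeTranscriptDbFromUCSC(genome, tablename)
    # Exons grouped by gene; the list names are the gene identifiers.
    ebg <- exonsBy(tx, by = "gene")

    bams <- BamFileList(bam_paths)

    # Sequence names must agree between the annotation and the alignments.
    shared <- intersect(seqlevels(ebg), seqlevels(seqinfo(bams[[1]])))
    if (length(shared) == 0L)
        stop("no chromosome names shared between the BAM header and the annotation")

    # Returns a SummarizedExperiment; the count matrix is assays(result)$counts.
    summarizeOverlaps(ebg, bams, mode = "Union")
}

# Hypothetical usage:
# se <- count_refgene_reads(c("sample1.bam", "sample2.bam"))
# head(assays(se)$counts)
```

If the check stops with an error, the reads were most likely aligned to a FASTA file with a different chromosome naming scheme, and the names on one side would need to be converted before counting.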
Best,
Jim
>
> Thank you in advance
>
> Daniela
>
>
> On Wed, Feb 26, 2014 at 4:55 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
> Hi Daniela,
>
> On 2/26/2014 2:29 PM, Daniela Moré [guest] wrote:
>
> Hi guys,
> I'm new to R and Bioconductor packages, so my question may
> sound a little basic, but I really could not figure out how
> to use a database from NCBI in biomaRt.
> I'm working on RNA-Seq reads to perform DE analysis and I'm
> interested in the Bos taurus database from NCBI, version UMD3.1.
>
>
> I think you will need to give more information here. What exactly
> are you trying to do? Have you already done the DE analysis, and
> now are simply trying to annotate the results? If so, what type of
> gene/transcript IDs do you have?
>
> Best,
>
> Jim
>
>
>
> So my question is: how do I choose the bovine UMD3.1 from NCBI
> in biomaRt? Or would the best way to solve this be to perform
> the alignment using the Ensembl version?
>
> Just to be clear, I can't find any NCBI databases when I type:
>
> library("biomaRt")
> listMarts()
>
> If I take a look at "ensembl" [ensembl=useMart("ensembl")]
> I can see the btaurus_gene_ensembl dataset. However, as I
> aligned my reads against an NCBI version, when I tried to count the
> reads it did not work (because they have different identifiers,
> I guess). The manual shows a short example using a wormDb, but
> it did not help much.
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] DESeq2_1.2.10 RcppArmadillo_0.4.000.2 Rcpp_0.11.0
> Rsamtools_1.14.3 Biostrings_2.30.1 GenomicRanges_1.14.4
> [7] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.24.0 BSgenome_1.30.0 Biobase_2.22.0
> DBI_0.2-7 GenomicFeatures_1.14.2 RColorBrewer_1.0-5
> [7] RCurl_1.95-4.1 RSQLite_0.11.4 XML_3.98-1.1 annotate_1.40.0
> biomaRt_2.18.0 bitops_1.0-6
> [13] genefilter_1.44.0 grid_3.0.2 lattice_0.20-24
> locfit_1.5-9.1 rtracklayer_1.22.3 splines_3.0.2
> [19] stats4_3.0.2 survival_2.37-7 tools_3.0.2 xtable_1.7-1
> zlibbioc_1.8.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099