[Bioc-devel] Txdb Issues - all exon names are NA's ?

Arora, Sonali sarora at fredhutch.org
Wed Sep 23 02:43:56 CEST 2015


I was following Mike's RNAseq workflow from here
http://www.bioconductor.org/help/workflows/rnaseqGene/

and it had exon_name's - but that's probably because the txdb is made 
from an Ensembl GTF (GRCh37.75)

Thanks for the clarification Herve and Marc!

Sonali.
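
As a rough illustration of the point made in the quoted replies below (exon ranges are derived only from the exonStarts and exonEnds columns, so there is nothing to fill exon_name with), here is a minimal base R sketch. The row is hypothetical: the transcript ID and coordinates are invented for illustration, and the helper split_coords is not part of any package. It assumes the UCSC table convention of 0-based start coordinates, and it is not how GenomicFeatures actually builds the TxDb.

```r
# Hypothetical knownGene-style row; ID and coordinates are made up.
row <- data.frame(
  name       = "uc000aaa.1",    # transcript ID (hypothetical), not an exon name
  chrom      = "chr1",
  strand     = "+",
  exonStarts = "100,300,500,",  # comma-terminated, 0-based starts
  exonEnds   = "200,400,600,",  # comma-terminated ends
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a comma-terminated string into integers.
# strsplit() drops the empty piece after the trailing comma.
split_coords <- function(x) as.integer(strsplit(x, ",", fixed = TRUE)[[1]])

starts <- split_coords(row$exonStarts)
ends   <- split_coords(row$exonEnds)

# One row per exon. The source row carries no per-exon name,
# so exon_name can only be NA.
exons <- data.frame(
  seqnames  = row$chrom,
  start     = starts + 1L,      # convert 0-based start to 1-based
  end       = ends,
  strand    = row$strand,
  exon_name = NA_character_,
  stringsAsFactors = FALSE
)
print(exons)
```

A TxDb built from a source that does carry exon identifiers, such as the GTF used in the workflow, can populate exon_name, which would explain the difference.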


On 9/22/2015 5:39 PM, Marc Carlson wrote:
> Herve is right. UCSC doesn't give us this information. And actually, 
> I think it's pretty rare to see exon names from anybody. So it's 
> weird to me that they are a default return value for this method.
>
>   Marc
>
> On Tue, Sep 22, 2015 at 5:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>     Hi Sonali,
>
>     UCSC doesn't provide names for the exons of their gene models.
>     See the table where this data is coming from:
>
>
>     https://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
>
>     The exon information is all coming from the exonStarts and exonEnds
>     columns. No exon names!
>
>     H.
>
>     PS: Maybe this would better be asked on the support site.
>
>
>     On 09/22/2015 04:44 PM, Arora, Sonali wrote:
>
>           Hi everyone,
>
>         I was trying to get the exons by gene from a txdb object, but I
>         get NA's for all exon_name's. Please advise.
>
>          > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>          > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>          > ebg2 <- exonsBy(txdb, by="gene")
>          >
>          > ebg2
>         GRangesList object of length 23459:
>         $1
>         GRanges object with 15 ranges and 2 metadata columns:
>                 seqnames               ranges strand   |  exon_id
>                    <Rle>            <IRanges> <Rle>   | <integer>
>             [1]    chr19 [58858172, 58858395]      -   | 250809
>             [2]    chr19 [58858719, 58859006]      -   | 250810
>             [3]    chr19 [58859832, 58860494]      -   | 250811
>             [4]    chr19 [58860934, 58862017]      -   | 250812
>             [5]    chr19 [58861736, 58862017]      -   | 250813
>             ...      ...                  ...    ... ...  ...
>            [11]    chr19 [58868951, 58869015]      -   | 250821
>            [12]    chr19 [58869318, 58869652]      -   | 250822
>            [13]    chr19 [58869855, 58869951]      -   | 250823
>            [14]    chr19 [58870563, 58870689]      -   | 250824
>            [15]    chr19 [58874043, 58874214]      -   | 250825
>                   exon_name
>                 <character>
>             [1]        <NA>
>             [2]        <NA>
>             [3]        <NA>
>             [4]        <NA>
>             [5]        <NA>
>             ...         ...
>            [11]        <NA>
>            [12]        <NA>
>            [13]        <NA>
>            [14]        <NA>
>            [15]        <NA>
>
>         $10
>         GRanges object with 2 ranges and 2 metadata columns:
>                seqnames               ranges strand | exon_id exon_name
>            [1]     chr8 [18248755, 18248855]      + |  113603     <NA>
>            [2]     chr8 [18257508, 18258723]      + |  113604     <NA>
>
>         ...
>         <23457 more elements>
>         -------
>         seqinfo: 93 sequences (1 circular) from hg19 genome
>          > testgr <- unlist(ebg2)
>          > table(is.na(mcols(testgr)$exon_name))
>
>            TRUE
>         272776
>          > sessionInfo()
>         R version 3.2.2 RC (2015-08-09 r68965)
>         Platform: x86_64-w64-mingw32/x64 (64-bit)
>         Running under: Windows 7 x64 (build 7601) Service Pack 1
>
>         locale:
>         [1] LC_COLLATE=English_United States.1252
>         [2] LC_CTYPE=English_United States.1252
>         [3] LC_MONETARY=English_United States.1252
>         [4] LC_NUMERIC=C
>         [5] LC_TIME=English_United States.1252
>
>         attached base packages:
>         [1] stats4    parallel  stats     graphics  grDevices utils
>         [7] datasets  methods   base
>
>         other attached packages:
>         [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
>         [2] GenomicFeatures_1.21.29
>         [3] AnnotationDbi_1.31.18
>         [4] Biobase_2.29.1
>         [5] GenomicRanges_1.21.28
>         [6] GenomeInfoDb_1.5.16
>         [7] IRanges_2.3.21
>         [8] S4Vectors_0.7.18
>         [9] BiocGenerics_0.15.6
>
>         loaded via a namespace (and not attached):
>           [1] XVector_0.9.4              zlibbioc_1.15.0
>           [3] GenomicAlignments_1.5.17   BiocParallel_1.3.52
>           [5] tools_3.2.2                SummarizedExperiment_0.3.9
>           [7] DBI_0.3.1                  lambda.r_1.1.7
>           [9] futile.logger_1.4.1        rtracklayer_1.29.27
>         [11] futile.options_1.0.0       bitops_1.0-6
>         [13] RCurl_1.95-4.7             biomaRt_2.25.3
>         [15] RSQLite_1.0.0              Biostrings_2.37.8
>         [17] Rsamtools_1.21.17          XML_3.98-1.3
>
>
>     -- 
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fredhutch.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>
>     _______________________________________________
>     Bioc-devel at r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

-- 
Thanks and Regards,
Sonali
Office: C2-169
http://tinyurl.com/sonali-hb-calendar

