[Bioc-devel] Txdb Issues - all exon names are NA's ?

Marc Carlson mrjc42 at gmail.com
Wed Sep 23 08:32:25 CEST 2015


Works for me.

 Marc


On Tue, Sep 22, 2015 at 6:03 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

> Hi Marc,
>
> On 09/22/2015 05:39 PM, Marc Carlson wrote:
>
>> Herve is right. UCSC doesn't give us this information. And actually, I
>> think it's pretty rare to see exon names from anybody. So it's weird
>> to me that they are a default return value for this method.
>>
>
> Ensembl does provide exon names/ids so any TxDb object that was created
> with makeTxDbFromBiomart("ensembl", ...) should have them:
>
>   library(GenomicFeatures)
>   txdb <- makeTxDbFromBiomart("ensembl", dataset="celegans_gene_ensembl")
>   exonsBy(txdb, use.names=TRUE)$Y74C9A.2a.2
>   # GRanges object with 4 ranges and 3 metadata columns:
>   #       seqnames         ranges strand |   exon_id          exon_name exon_rank
>   #          <Rle>      <IRanges>  <Rle> | <integer>        <character> <integer>
>   #   [1]        I [10413, 10585]      + |         1  WBGene00022276.e1         1
>   #   [2]        I [11618, 11689]      + |         6  WBGene00022276.e6         2
>   #   [3]        I [14951, 15160]      + |        11 WBGene00022276.e11         3
>   #   [4]        I [16473, 16842]      + |        14 WBGene00022276.e14         4
>   #   -------
>   #   seqinfo: 7 sequences (1 circular) from an unspecified genome
>
> Note that the *By() extractors don't let the user choose which column
> to return at the moment so that's why it was decided (a long time ago)
> to return exon internal ids *and* names (better more than less).
>
> H.
>
>
>>    Marc
>>
>> On Tue, Sep 22, 2015 at 5:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>>     Hi Sonali,
>>
>>     UCSC doesn't provide names for the exons of their gene models.
>>     See the table where this data is coming from:
>>
>>
>>
>> https://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
>>
>>     The exon information is all coming from the exonStarts and exonEnds
>>     columns. No exon names!
>>
>>     H.
>>
>>     PS: Maybe this would better be asked on the support site.
>>
>>
>>     On 09/22/2015 04:44 PM, Arora, Sonali wrote:
>>
>>            Hi everyone,
>>
>>         I was trying to get the exons by gene from a TxDb object, but I
>>         get NAs for every exon_name. Please advise.
>>
>>           > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>           > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>           > ebg2 <- exonsBy(txdb, by="gene")
>>           >
>>           > ebg2
>>         GRangesList object of length 23459:
>>         $1
>>         GRanges object with 15 ranges and 2 metadata columns:
>>                  seqnames               ranges strand   |   exon_id
>>                     <Rle>            <IRanges>  <Rle>   | <integer>
>>              [1]    chr19 [58858172, 58858395]      -   |    250809
>>              [2]    chr19 [58858719, 58859006]      -   |    250810
>>              [3]    chr19 [58859832, 58860494]      -   |    250811
>>              [4]    chr19 [58860934, 58862017]      -   |    250812
>>              [5]    chr19 [58861736, 58862017]      -   |    250813
>>              ...      ...                  ...    ... ...       ...
>>             [11]    chr19 [58868951, 58869015]      -   |    250821
>>             [12]    chr19 [58869318, 58869652]      -   |    250822
>>             [13]    chr19 [58869855, 58869951]      -   |    250823
>>             [14]    chr19 [58870563, 58870689]      -   |    250824
>>             [15]    chr19 [58874043, 58874214]      -   |    250825
>>                    exon_name
>>                  <character>
>>              [1]        <NA>
>>              [2]        <NA>
>>              [3]        <NA>
>>              [4]        <NA>
>>              [5]        <NA>
>>              ...         ...
>>             [11]        <NA>
>>             [12]        <NA>
>>             [13]        <NA>
>>             [14]        <NA>
>>             [15]        <NA>
>>
>>         $10
>>         GRanges object with 2 ranges and 2 metadata columns:
>>                 seqnames               ranges strand | exon_id exon_name
>>             [1]     chr8 [18248755, 18248855]      + |  113603      <NA>
>>             [2]     chr8 [18257508, 18258723]      + |  113604      <NA>
>>
>>         ...
>>         <23457 more elements>
>>         -------
>>         seqinfo: 93 sequences (1 circular) from hg19 genome
>>           > testgr <- unlist(ebg2)
>>           > table(is.na(mcols(testgr)$exon_name))
>>
>>
>>             TRUE
>>         272776
>>           > sessionInfo()
>>         R version 3.2.2 RC (2015-08-09 r68965)
>>         Platform: x86_64-w64-mingw32/x64 (64-bit)
>>         Running under: Windows 7 x64 (build 7601) Service Pack 1
>>
>>         locale:
>>         [1] LC_COLLATE=English_United States.1252
>>         [2] LC_CTYPE=English_United States.1252
>>         [3] LC_MONETARY=English_United States.1252
>>         [4] LC_NUMERIC=C
>>         [5] LC_TIME=English_United States.1252
>>
>>         attached base packages:
>>         [1] stats4    parallel  stats     graphics  grDevices utils
>>         [7] datasets  methods   base
>>
>>         other attached packages:
>>         [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
>>         [2] GenomicFeatures_1.21.29
>>         [3] AnnotationDbi_1.31.18
>>         [4] Biobase_2.29.1
>>         [5] GenomicRanges_1.21.28
>>         [6] GenomeInfoDb_1.5.16
>>         [7] IRanges_2.3.21
>>         [8] S4Vectors_0.7.18
>>         [9] BiocGenerics_0.15.6
>>
>>         loaded via a namespace (and not attached):
>>            [1] XVector_0.9.4              zlibbioc_1.15.0
>>            [3] GenomicAlignments_1.5.17   BiocParallel_1.3.52
>>            [5] tools_3.2.2                SummarizedExperiment_0.3.9
>>            [7] DBI_0.3.1                  lambda.r_1.1.7
>>            [9] futile.logger_1.4.1        rtracklayer_1.29.27
>>         [11] futile.options_1.0.0       bitops_1.0-6
>>         [13] RCurl_1.95-4.7             biomaRt_2.25.3
>>         [15] RSQLite_1.0.0              Biostrings_2.37.8
>>         [17] Rsamtools_1.21.17          XML_3.98-1.3
>>
>>
>>     --
>>     Hervé Pagès
>>
>>     Program in Computational Biology
>>     Division of Public Health Sciences
>>     Fred Hutchinson Cancer Research Center
>>     1100 Fairview Ave. N, M1-B514
>>     P.O. Box 19024
>>     Seattle, WA 98109-1024
>>
>>     E-mail: hpages at fredhutch.org
>>     Phone: (206) 667-5791
>>     Fax: (206) 667-1319
>>
>>
>>     _______________________________________________
>>     Bioc-devel at r-project.org mailing list
>>     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>

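Regarding the point in the quoted thread that the *By() extractors don't let
the user choose which columns to return: the sketch below is one possible
workaround, not something proposed in the thread. It assumes the
TxDb.Hsapiens.UCSC.hg19.knownGene package from the original report is
installed. The flat exons() extractor accepts a columns argument, and the
all-NA exon_name column can be dropped from an exonsBy() result after the
fact. The variable names (ex, flat, ebg_clean) are made up for illustration.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# exons() lets the caller pick the metadata columns explicitly.
ex <- exons(txdb, columns = c("exon_id", "exon_name"))

# For a UCSC-based TxDb every exon_name is expected to be NA.
all(is.na(mcols(ex)$exon_name))

# exonsBy() always returns both exon_id and exon_name, so drop the
# unwanted column on the flattened ranges and rebuild the list.
ebg <- exonsBy(txdb, by = "gene")
flat <- unlist(ebg, use.names = FALSE)
mcols(flat)$exon_name <- NULL
ebg_clean <- relist(flat, ebg)
```

Going through unlist() and relist() keeps the grouping by gene intact while
changing only the metadata columns.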

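To illustrate why a UCSC-based TxDb has nothing to put in exon_name: the
knownGene table stores exons only as two comma-separated coordinate strings,
exonStarts and exonEnds. The base-R sketch below parses one such record. The
record itself, the helper parse_coords() and the exon_df layout are all
hypothetical and only meant to show the idea; this is not how GenomicFeatures
builds a TxDb internally.

```r
# Hypothetical knownGene-style record; name and coordinates are made up.
rec <- list(
  name       = "uc_example.1",
  chrom      = "chr1",
  strand     = "+",
  exonStarts = "100,300,500,",
  exonEnds   = "200,400,600,"
)

# Split a comma-separated coordinate string into integers.
# strsplit() does not return an empty field for the trailing comma.
parse_coords <- function(x) {
  as.integer(strsplit(x, ",", fixed = TRUE)[[1]])
}

exon_df <- data.frame(
  chrom     = rec$chrom,
  start     = parse_coords(rec$exonStarts) + 1L,  # UCSC starts are 0-based
  end       = parse_coords(rec$exonEnds),
  strand    = rec$strand,
  exon_name = NA_character_,  # the table has no field to fill this from
  stringsAsFactors = FALSE
)

print(exon_df)
```

Each exon ends up with coordinates and a strand but no identifier of its own,
which matches the all-NA exon_name column in the original report.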

