[Bioc-devel] Txdb Issues - all exon names are NA's ?
Marc Carlson
mrjc42 at gmail.com
Wed Sep 23 08:32:25 CEST 2015
Works for me.
Marc
On Tue, Sep 22, 2015 at 6:03 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
> Hi Marc,
>
> On 09/22/2015 05:39 PM, Marc Carlson wrote:
>
>> Hervé is right. UCSC doesn't give us this information. And actually, I
>> think it's pretty rare to see exon names from anybody. So it's weird
>> to me that they are a default return value for this method.
>>
>
> Ensembl does provide exon names/ids so any TxDb object that was created
> with makeTxDbFromBiomart("ensembl", ...) should have them:
>
> library(GenomicFeatures)
> txdb <- makeTxDbFromBiomart("ensembl", dataset="celegans_gene_ensembl")
> exonsBy(txdb, use.names=TRUE)$Y74C9A.2a.2
> # GRanges object with 4 ranges and 3 metadata columns:
> #       seqnames         ranges strand |   exon_id          exon_name exon_rank
> #          <Rle>      <IRanges>  <Rle> | <integer>        <character> <integer>
> #   [1]        I [10413, 10585]      + |         1  WBGene00022276.e1         1
> #   [2]        I [11618, 11689]      + |         6  WBGene00022276.e6         2
> #   [3]        I [14951, 15160]      + |        11 WBGene00022276.e11         3
> #   [4]        I [16473, 16842]      + |        14 WBGene00022276.e14         4
> # -------
> # seqinfo: 7 sequences (1 circular) from an unspecified genome
>
> Note that the *By() extractors don't let the user choose which column
> to return at the moment so that's why it was decided (a long time ago)
> to return exon internal ids *and* names (better more than less).
>
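For a TxDb built from UCSC, where every exon_name is NA, the column can be dropped after the fact. A minimal sketch, assuming the TxDb.Hsapiens.UCSC.hg19.knownGene package from the original post is installed. The names `flat` and `ebg_noname` are only illustrative, and the unlist()/relist() round trip is one possible way to do this, not something GenomicFeatures prescribes:

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg2 <- exonsBy(txdb, by = "gene")

# Flatten to a single GRanges so the metadata columns can be edited in one place.
flat <- unlist(ebg2, use.names = FALSE)

# Drop exon_name only when it carries no information (all NA, as with UCSC).
if (all(is.na(mcols(flat)$exon_name))) {
  mcols(flat)$exon_name <- NULL
}

# Regroup the exons by gene, using the original list as the skeleton.
ebg_noname <- relist(flat, ebg2)
```

If a flat set of exons is enough, exons(txdb, columns="exon_id") lets you pick the metadata columns directly.
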
> H.
>
>
>> Marc
>>
>> On Tue, Sep 22, 2015 at 5:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>> Hi Sonali,
>>
>> UCSC doesn't provide names for the exons of their gene models.
>> See the table where this data is coming from:
>>
>>
>>
>> https://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
>>
>> The exon information is all coming from the exonStarts and exonEnds
>> columns. No exon names!
>>
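To make this concrete, here is a rough base-R sketch of how exon ranges can be recovered from those two columns alone. The row below is made up for illustration (it is not a real knownGene record), and `parse_coords` is a hypothetical helper. It assumes the usual UCSC table convention of comma-separated, 0-based start positions:

```r
# One hypothetical knownGene-style row; the coordinates are invented.
row <- list(
  name       = "tx1",           # hypothetical transcript name
  chrom      = "chr1",
  strand     = "+",
  exonStarts = "100,300,500,",  # 0-based starts, with a trailing comma
  exonEnds   = "200,400,600,"   # ends (exclusive in 0-based terms)
)

# Split a comma-separated coordinate string into an integer vector.
# strsplit() does not emit an empty element for the trailing comma.
parse_coords <- function(x) as.integer(strsplit(x, ",", fixed = TRUE)[[1]])

starts <- parse_coords(row$exonStarts)
ends   <- parse_coords(row$exonEnds)

# Convert to 1-based closed intervals, as used by GRanges.
exons <- data.frame(
  chrom  = row$chrom,
  start  = starts + 1L,
  end    = ends,
  strand = row$strand,
  stringsAsFactors = FALSE
)
print(exons)
```

Nothing in the row identifies an individual exon, so there is nothing to put in an exon_name column.
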
>> H.
>>
>> PS: Maybe this would be better asked on the support site.
>>
>>
>> On 09/22/2015 04:44 PM, Arora, Sonali wrote:
>>
>> Hi everyone,
>>
>> I was trying to get the exons by gene from a txdb object but I get NA's
>> for all exon_name's. Please advise.
>>
>> > library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>> > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>> > ebg2 <- exonsBy(txdb, by="gene")
>> >
>> > ebg2
>> GRangesList object of length 23459:
>> $1
>> GRanges object with 15 ranges and 2 metadata columns:
>> seqnames ranges strand | exon_id
>> <Rle> <IRanges> <Rle> | <integer>
>> [1] chr19 [58858172, 58858395] - | 250809
>> [2] chr19 [58858719, 58859006] - | 250810
>> [3] chr19 [58859832, 58860494] - | 250811
>> [4] chr19 [58860934, 58862017] - | 250812
>> [5] chr19 [58861736, 58862017] - | 250813
>> ... ... ... ... ... ...
>> [11] chr19 [58868951, 58869015] - | 250821
>> [12] chr19 [58869318, 58869652] - | 250822
>> [13] chr19 [58869855, 58869951] - | 250823
>> [14] chr19 [58870563, 58870689] - | 250824
>> [15] chr19 [58874043, 58874214] - | 250825
>> exon_name
>> <character>
>> [1] <NA>
>> [2] <NA>
>> [3] <NA>
>> [4] <NA>
>> [5] <NA>
>> ... ...
>> [11] <NA>
>> [12] <NA>
>> [13] <NA>
>> [14] <NA>
>> [15] <NA>
>>
>> $10
>> GRanges object with 2 ranges and 2 metadata columns:
>> seqnames ranges strand | exon_id exon_name
>> [1] chr8 [18248755, 18248855] + | 113603 <NA>
>> [2] chr8 [18257508, 18258723] + | 113604 <NA>
>>
>> ...
>> <23457 more elements>
>> -------
>> seqinfo: 93 sequences (1 circular) from hg19 genome
>> > testgr <- unlist(ebg2)
>> > table(is.na(mcols(testgr)$exon_name))
>>
>>
>> TRUE
>> 272776
>> > sessionInfo()
>> R version 3.2.2 RC (2015-08-09 r68965)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats4 parallel stats graphics grDevices utils
>> [7] datasets methods base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
>> [2] GenomicFeatures_1.21.29
>> [3] AnnotationDbi_1.31.18
>> [4] Biobase_2.29.1
>> [5] GenomicRanges_1.21.28
>> [6] GenomeInfoDb_1.5.16
>> [7] IRanges_2.3.21
>> [8] S4Vectors_0.7.18
>> [9] BiocGenerics_0.15.6
>>
>> loaded via a namespace (and not attached):
>> [1] XVector_0.9.4 zlibbioc_1.15.0
>> [3] GenomicAlignments_1.5.17 BiocParallel_1.3.52
>> [5] tools_3.2.2 SummarizedExperiment_0.3.9
>> [7] DBI_0.3.1 lambda.r_1.1.7
>> [9] futile.logger_1.4.1 rtracklayer_1.29.27
>> [11] futile.options_1.0.0 bitops_1.0-6
>> [13] RCurl_1.95-4.7 biomaRt_2.25.3
>> [15] RSQLite_1.0.0 Biostrings_2.37.8
>> [17] Rsamtools_1.21.17 XML_3.98-1.3
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fredhutch.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>