[Bioc-sig-seq] Why are there non-informative names() from GenomicFeatures:::exonsBy(...)?

Wed Aug 4 01:33:11 CEST 2010

Hi,

I'm seeing if I can transition some of my code from my
(previously-mentioned) GenomeAnnotations package to use
GenomicFeatures, so I have a few questions about how certain things
should be done w/ GenomicFeatures.

I'll start with this one:

Shouldn't the GRangesList returned by exonsBy(txdb, 'tx') have more
informative names than:
"1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
?

I've made a TranscriptDb from the 'ensembl' gene annos, and I'd like
to reconcile which "transcript exons" belong to which transcripts, but
it seems there's no direct link between the GRangesList returned by
exonsBy(..., 'tx') and the GRanges object returned by
transcripts(txdb).

Shouldn't there be? One would expect a 1:1 map, no? The length of the
exonsBy/GRangesList is the same as transcripts/GRanges object, so ...
yes :-).

I mean:

R> txdb <- makeTranscriptDbFromUCSC(genome='hg18', tablename='ensGene')
R> txdb
TranscriptDb object:
| Db type: TranscriptDb
| Data source: UCSC
| Genome: hg18
| UCSC Table: ensGene
| Type of Gene ID: Ensembl gene ID
| Full dataset: yes
| transcript_nrow: 63280
| exon_nrow: 276075
| cds_nrow: 225373
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-08-03 16:50:06 -0400 (Tue, 03 Aug 2010)
| GenomicFeatures version at creation time: 1.0.6
| RSQLite version at creation time: 0.9-2

R> xcripts <- transcripts(txdb)
R> xcripts
GRanges with 63280 ranges and 2 elementMetadata values
        seqnames               ranges strand   |     tx_id         tx_name
           <Rle>            <IRanges>  <Rle>   | <integer>     <character>
    [1]     chr1     [  1873,   3533]      +   |      1730 ENST00000404059
    [2]     chr1     [ 20229,  20366]      +   |      1732 ENST00000408384
...

R> xcripts.exons <- exonsBy(txdb, 'tx')
R> head(names(xcripts.exons))
[1] "1" "2" "3" "4" "5" "6"

I feel like names(xcripts.exons) should give me something like:
"ENST00000404059" "ENST00000408384" "ENST00000359752" .... (in correct
order, of course)

I have the same confusion about the lack of informative names returned
from exonsBy(txdb, 'gene') -- I feel like it should set the names in
the same way that transcriptsBy(txdb, 'gene') does.

So, I wonder whether this is just an oversight, or whether the names
are left out on purpose and I have to rethink the way I approach these
problems. Should I be running findOverlaps() between my xcripts.exons
GRangesList and my xcripts GRanges, or something? And if so, why is
that better?
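
To make the kind of lookup I'm after concrete, here is a minimal sketch
in plain R. It rests on an assumption I have not confirmed: that the
names on the exonsBy(txdb, 'tx') list are the internal tx_id values
shown in the elementMetadata of transcripts(txdb). The data below are
toy stand-ins (the third ID/name pair is made up), not real output from
the txdb.

```r
# Toy stand-in for the tx_id / tx_name columns of transcripts(txdb).
# The third row is hypothetical, added only to show reordering.
tx.meta <- data.frame(
  tx_id   = c(1730L, 1732L, 1L),
  tx_name = c("ENST00000404059", "ENST00000408384", "ENST_HYPOTHETICAL"),
  stringsAsFactors = FALSE
)

# Toy stand-in for names(xcripts.exons), ASSUMED to be internal tx_ids
exon.list.names <- c("1", "1730", "1732")

# For each list name, find the row with the same tx_id,
# then pull out the transcript name stored in that row
idx <- match(as.integer(exon.list.names), tx.meta$tx_id)
informative.names <- tx.meta$tx_name[idx]
```

If that assumption holds, then something along the lines of
names(xcripts.exons) <- informative.names would give me what I want
without any overlap computation, but it feels like the package should
do this for me.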

R> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-28 r52408)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RSQLite_0.9-2         DBI_0.2-5             GenomicFeatures_1.1.6 GenomicRanges_1.1.20
[5] IRanges_1.7.15

loaded via a namespace (and not attached):
[1] Biobase_2.9.0      biomaRt_2.5.1      Biostrings_2.17.27 BSgenome_1.17.6    RCurl_1.4-3
[6] rtracklayer_1.9.4  tools_2.12.0       XML_3.1-0

Thanks,
-steve
-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact