[BioC] bug in GenomicRanges gaps?
Blanchette, Marco
MAB at stowers.org
Fri Sep 6 02:14:00 CEST 2013
Hi,
Is this a bug of normal behavior... (I suspect it's a bug). I am trying to compute the intergenic regions using GenomicFeatures txdb and GenomicRanges gaps() and the GRanges return contains entry spanning all chromosomes on both strand... On a mock GRanges, I don't get this weird behavior (See below). Any suggestions?
thanks
> library(GenomicFeatures)
### Downloading the txdb from UCSC
> txdb <- makeTranscriptDbFromUCSC(genome="dm3",
tablename="ensGene")
### Getting the transcripts for each genes
> allTx <- transcriptsBy(txdb, by='gene')
### Geting the GRanges for each genes
> genes <- unlist(range(allTx))
### Removing the strand to perform the reduce operation
> strand(genes) <- '*'
### Now, let's find the gaps
> genic <- reduce(genes.noStrand)
### Because of teh reduce, there should be no overlaping genic regions
> length(findOverlaps(genic,ignoreSelf=TRUE))
[1] 0
### Finding the intergenic regions
> intergenic <- gaps(genic)
### The difference of non-overlaping ranges should return non-overlaping gaps, but...
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 46300
### What's funky is that I am getting gaps spanning the full lenght of each chromosomes
### Why???
> chrs.width <- seqlengths(seqinfo(intergenic))
> intergenic[width(intergenic) == gene2chrsLength]
GRanges with 30 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2L [1, 23011544] +
[2] chr2L [1, 23011544] -
[3] chr2R [1, 21146708] +
[4] chr2R [1, 21146708] -
[5] chr3L [1, 24543557] +
... ... ... ...
[26] chrXHet [1, 204112] -
[27] chrYHet [1, 347038] +
[28] chrYHet [1, 347038] -
[29] chrUextra [1, 29004656] +
[30] chrUextra [1, 29004656] -
---
seqlengths:
chr2L chr2R chr3L chr3R ... chrXHet chrYHet chrUextra
23011544 21146708 24543557 27905053 ... 204112 347038 29004656
### On Fake data
> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 0
> intergenic
GRanges with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 2L [ 1, 19] *
[2] 2L [ 81, 199] *
[3] 2L [276, 349] *
[4] 3L [ 1, 19] *
[5] 3L [ 81, 199] *
[6] 3L [276, 349] *
---
seqlengths:
2L 3L
NA NA
--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
More information about the Bioconductor
mailing list