[BioC] bug in GenomicRanges gaps?
Blanchette, Marco
MAB at stowers.org
Fri Sep 6 02:52:08 CEST 2013
Little bit more investigating let me realize that it happen when the seqinfo has seqLengths populated, check this out
> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
+ ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
### Adding some chromosome width
> seqlengths(seqinfo(genes)) <- c(500,700)
### Computing the gaps
> intergenic <- gaps(genes)
### Now I am getting overlapping gaps!
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 32
### Take a look at ranges 1,2, 7 and 8, they are spanning the full chromosome. Why... Expected?
> intergenic[c(1,2,7,8)]
GRanges with 4 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 2L [1, 500] +
[2] 2L [1, 500] -
[3] 3L [1, 700] +
[4] 3L [1, 700] -
---
seqlengths:
2L 3L
500 700
Thanks
--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
On Sep 5, 2013, at 7:14 PM, "Blanchette, Marco" <MAB at stowers.org> wrote:
> Hi,
>
> Is this a bug of normal behavior... (I suspect it's a bug). I am trying to compute the intergenic regions using GenomicFeatures txdb and GenomicRanges gaps() and the GRanges return contains entry spanning all chromosomes on both strand... On a mock GRanges, I don't get this weird behavior (See below). Any suggestions?
>
> thanks
>
>> library(GenomicFeatures)
> ### Downloading the txdb from UCSC
>> txdb <- makeTranscriptDbFromUCSC(genome="dm3",
> tablename="ensGene")
>
> ### Getting the transcripts for each genes
>> allTx <- transcriptsBy(txdb, by='gene')
>
> ### Geting the GRanges for each genes
>> genes <- unlist(range(allTx))
>
> ### Removing the strand to perform the reduce operation
>> strand(genes) <- '*'
>
> ### Now, let's find the gaps
>> genic <- reduce(genes.noStrand)
>
> ### Because of teh reduce, there should be no overlaping genic regions
>> length(findOverlaps(genic,ignoreSelf=TRUE))
> [1] 0
>
> ### Finding the intergenic regions
>> intergenic <- gaps(genic)
>
> ### The difference of non-overlaping ranges should return non-overlaping gaps, but...
>> length(findOverlaps(intergenic,ignoreSelf=TRUE))
> [1] 46300
>
> ### What's funky is that I am getting gaps spanning the full lenght of each chromosomes
> ### Why???
>> chrs.width <- seqlengths(seqinfo(intergenic))
>> intergenic[width(intergenic) == gene2chrsLength]
> GRanges with 30 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr2L [1, 23011544] +
> [2] chr2L [1, 23011544] -
> [3] chr2R [1, 21146708] +
> [4] chr2R [1, 21146708] -
> [5] chr3L [1, 24543557] +
> ... ... ... ...
> [26] chrXHet [1, 204112] -
> [27] chrYHet [1, 347038] +
> [28] chrYHet [1, 347038] -
> [29] chrUextra [1, 29004656] +
> [30] chrUextra [1, 29004656] -
> ---
> seqlengths:
> chr2L chr2R chr3L chr3R ... chrXHet chrYHet chrUextra
> 23011544 21146708 24543557 27905053 ... 204112 347038 29004656
>
>
> ### On Fake data
>> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
> ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
>> length(findOverlaps(intergenic,ignoreSelf=TRUE))
> [1] 0
>> intergenic
> GRanges with 6 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] 2L [ 1, 19] *
> [2] 2L [ 81, 199] *
> [3] 2L [276, 349] *
> [4] 3L [ 1, 19] *
> [5] 3L [ 81, 199] *
> [6] 3L [276, 349] *
> ---
> seqlengths:
> 2L 3L
> NA NA
>
> --
> Marco Blanchette, Ph.D.
> Assistant Investigator
> Stowers Institute for Medical Research
> 1000 East 50th St.
>
> Kansas City, MO 64110
>
> Tel: 816-926-4071
> Cell: 816-726-8419
> Fax: 816-926-2018
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list