[BioC] bug in GenomicRanges gaps?
Cook, Malcolm
MEC at stowers.org
Fri Sep 6 02:53:58 CEST 2013
Marco,
genes.noStrand is unset when you first reference it, so it is hard to know what you are trying to do.
Does your definition of intergenic want to ignore strand (i.e. can the same locus be genic on one strand and integenic on another?)
I'm guessing the problem you are witnessing is that the strand _is_ not '*' when you thought you had set it to '*'. Check this!
Also, does your definition of intergenic want to include non-genic segments that are the very beginning/end of a chromosome (and therefore only have a gene on one-side and thus arguably are not truly inter-genic). If so, your approach is not going to include these chromosome-terminal regions. In which case, you might try:
genic.gr<-unlist(transcriptsBy(tr.db,'gene'),use.names=FALSE)
strand(genic.gr)<-'*'
genic.gr<-reduce(genic.gr)
chrom.gr<-as(seqinfo(tr.db),'GRanges')
intergenic.gr<-setdiff(chrom.gr,genic.gr)
stopifnot(0==length(findOverlaps(intergenic.gr,ignoreSelf=TRUE)))
~ malcolm_cook at stowers.org
________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Blanchette, Marco
Sent: Thursday, September 05, 2013 7:14 PM
To: BioConductor
Subject: [BioC] bug in GenomicRanges gaps?
Hi,
Is this a bug of normal behavior... (I suspect it's a bug). I am trying to compute the intergenic regions using GenomicFeatures txdb and GenomicRanges gaps() and the GRanges return contains entry spanning all chromosomes on both strand... On a mock GRanges, I don't get this weird behavior (See below). Any suggestions?
thanks
> library(GenomicFeatures)
### Downloading the txdb from UCSC
> txdb <- makeTranscriptDbFromUCSC(genome="dm3",
tablename="ensGene")
### Getting the transcripts for each genes
> allTx <- transcriptsBy(txdb, by='gene')
### Geting the GRanges for each genes
> genes <- unlist(range(allTx))
### Removing the strand to perform the reduce operation
> strand(genes) <- '*'
### Now, let's find the gaps
> genic <- reduce(genes.noStrand)
### Because of teh reduce, there should be no overlaping genic regions
> length(findOverlaps(genic,ignoreSelf=TRUE))
[1] 0
### Finding the intergenic regions
> intergenic <- gaps(genic)
### The difference of non-overlaping ranges should return non-overlaping gaps, but...
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 46300
### What's funky is that I am getting gaps spanning the full lenght of each chromosomes
### Why???
> chrs.width <- seqlengths(seqinfo(intergenic))
> intergenic[width(intergenic) == gene2chrsLength]
GRanges with 30 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2L [1, 23011544] +
[2] chr2L [1, 23011544] -
[3] chr2R [1, 21146708] +
[4] chr2R [1, 21146708] -
[5] chr3L [1, 24543557] +
... ... ... ...
[26] chrXHet [1, 204112] -
[27] chrYHet [1, 347038] +
[28] chrYHet [1, 347038] -
[29] chrUextra [1, 29004656] +
[30] chrUextra [1, 29004656] -
---
seqlengths:
chr2L chr2R chr3L chr3R ... chrXHet chrYHet chrUextra
23011544 21146708 24543557 27905053 ... 204112 347038 29004656
### On Fake data
> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 0
> intergenic
GRanges with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 2L [ 1, 19] *
[2] 2L [ 81, 199] *
[3] 2L [276, 349] *
[4] 3L [ 1, 19] *
[5] 3L [ 81, 199] *
[6] 3L [276, 349] *
---
seqlengths:
2L 3L
NA NA
--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list