[BioC] distanceToNearest in GenomicRanges
James W. MacDonald
jmacdon at uw.edu
Mon Feb 11 19:08:11 CET 2013
Hi Tom,
On 2/11/2013 11:35 AM, Tom Oates wrote:
> Hi
> I am very much a learner in R in general and GenomicRanges in particular.
> I am struggling to find documentation to help me get my head around
> distanceToNearest in GenomicRanges.
> If I have a GRanges object:
>
> GRanges with 6 ranges and 4 metadata columns:
> seqnames ranges strand |
> <Rle> <IRanges> <Rle> |
> [1] 10 [ 96723746, 96723747] - |
> [2] 7 [ 13641170, 13641171] + |
> [3] 16 [ 17772801, 17772802] - |
> [4] 3 [ 88173502, 88173503] - |
> [5] 13 [106979682, 106979683] + |
> [6] 9 [104393139, 104393140] + |
>
> (You will notice that all the regions are only dinucleotides, and I have
> removed the metadata.)
>
> I have a second GRanges object containing Ensembl rat transcripts, as below:
> 39549 ranges and 2 metadata columns:
>       seqnames        ranges strand |     tx_id            tx_name
>          <Rle>     <IRanges>  <Rle> | <integer>        <character>
>   [1]        1 [5473, 16844]      + |         1 ENSRNOT00000044270
>   [2]        1 [5526, 16968]      + |         2 ENSRNOT00000049921
>   [3]        1 [5526, 16968]      + |         3 ENSRNOT00000051735
>   [4]        1 [5598, 13520]      + |         4 ENSRNOT00000034630
>   [5]        1 [8268, 16850]      + |         5 ENSRNOT00000044505
>   [6]        1 [8316, 17577]      + |         6 ENSRNOT00000042693
>   [7]        1 [8884, 16850]      + |         7 ENSRNOT00000044187
>   [8]        1 [8956, 9955]       + |         8 ENSRNOT00000041082
>   [9]        1 [9055, 17351]      + |         9 ENSRNOT00000050254
>
>
> If I invoke:
> xx <- distanceToNearest(diff.cpgs.gr, rat.transcripts, ignore.strand = FALSE)
>
> xx
> DataFrame with 1133 rows and 3 columns
> queryHits subjectHits distance
> <integer> <integer> <integer>
> 1 1 7752 0
> 2 2 32166 11946
> 3 3 14678 25377
> 4 4 24286 66747
> 5 5 10609 34242
> 6 6 37076 122683
> 7 7 35184 0
> 8 8 34180 45561
> 9 9 19351 50156
> ... ... ... ...
> etc
>
> How would I then use the xx output to get information (e.g. tx_id,
> tx_name) about the feature that the function has identified as nearest?
> I would be happy to supply more information if required.
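The subjectHits column gives the row of your transcript GRanges object
that is nearest to the query range listed in the same row of queryHits,
so you can use it to index the transcripts directly. I assume your
'diff.cpgs.gr' object actually has more than the 6 ranges shown, since xx
has 1133 rows. Here is an example using your six ranges (with UCSC-style
'chr' seqnames and '*' strand) and the TxDb.Mmusculus.UCSC.mm10.knownGene
package: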
> x
GRanges with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr10 [ 96723746, 96723747] *
[2] chr7 [ 13641170, 13641171] *
[3] chr16 [ 17772801, 17772802] *
[4] chr3 [ 88173502, 88173503] *
[5] chr13 [106979682, 106979683] *
[6] chr9 [104393139, 104393140] *
---
> y <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
> xx <- distanceToNearest(x, y, ignore.strand = FALSE)
> xx
DataFrame with 6 rows and 3 columns
queryHits subjectHits distance
<integer> <integer> <integer>
1 1 4514 100935
2 2 45653 0
3 3 19383 0
4 4 34197 0
5 5 14383 0
6 6 54212 8108
> y[xx[,2],]
GRanges with 6 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr10 [ 96617001, 96622811] + | 33419 uc007gww.2
[2] chr7 [ 13623967, 13670807] + | 21400 uc012ezp.1
[3] chr16 [ 17759663, 17779206] + | 48288 uc007ylz.1
[4] chr3 [ 88171560, 88177785] - | 10107 uc008puf.2
[5] chr13 [106963757, 107022114] - | 43288 uc007rue.1
[6] chr9 [104361832, 104385031] + | 29956 uc009rhp.1
---
seqlengths:
chr1 chr2 ... chrUn_JH584304
195471971 182113224 ... 114452
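In later GenomicRanges releases distanceToNearest returns a Hits object rather than the DataFrame shown above, so the indices are read with the queryHits() and subjectHits() accessors and the distances with mcols(). Below is a minimal, self-contained sketch of the same lookup on that newer API. It assumes a recent GenomicRanges install, and the ranges, tx_id values and tx_name strings are hypothetical toy data, not your rat transcripts:

```r
suppressPackageStartupMessages(library(GenomicRanges))

## Hypothetical query: two dinucleotide regions on chr1
query <- GRanges("chr1", IRanges(start = c(100, 5000), width = 2), strand = "+")

## Hypothetical subject: three "transcripts" with tx_id / tx_name metadata columns
subject <- GRanges("chr1",
                   IRanges(start = c(50, 1000, 6000), end = c(150, 2000, 7000)),
                   strand = "+",
                   tx_id = 1:3,
                   tx_name = c("txA", "txB", "txC"))

hits <- distanceToNearest(query, subject, ignore.strand = FALSE)

## subjectHits(hits) is the row of 'subject' nearest to each query in queryHits(hits)
nearest <- subject[subjectHits(hits)]

## Put query index, transcript annotation and distance side by side
result <- data.frame(query    = queryHits(hits),
                     tx_id    = nearest$tx_id,
                     tx_name  = nearest$tx_name,
                     distance = mcols(hits)$distance,
                     stringsAsFactors = FALSE)
print(result)
```

The key step, subject[subjectHits(hits)], is the same indexing that y[xx[,2],] performs above with the older DataFrame return value.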
Best,
Jim
> Tom
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099