[BioC] Determining an overlapping annotation data subset(overlap/overlaps)
alex lam (RI)
alex.lam at bbsrc.ac.uk
Mon Aug 6 18:13:47 CEST 2007
Hi Stephen,
Don't know if it does what you want (and it isn't a one-liner), but here
it is anyway:
> a<-data.frame(id=1:4, start=seq(10, 40, 10), end=seq(15, 45, 10))
> b<-data.frame(id=5:8, start=c(11,24,44,55), end=c(14,26,45,57))
> a # large sequence features
  id start end
1  1    10  15
2  2    20  25
3  3    30  35
4  4    40  45
> b # smaller sequence features
  id start end
1  5    11  14
2  6    24  26
3  7    44  45
4  8    55  57
> bool.matrix <- NULL
> for (i in 1:nrow(b)) {
+   bool.matrix <- rbind(bool.matrix, b$start[i] >= a$start & b$end[i] <= a$end)
+ }
> colnames(bool.matrix)<-a$id
> rownames(bool.matrix)<-b$id
> bool.matrix
      1     2     3     4
5  TRUE FALSE FALSE FALSE
6 FALSE FALSE FALSE FALSE
7 FALSE FALSE FALSE  TRUE
8 FALSE FALSE FALSE FALSE
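Each row of the matrix is a feature from b and each column an interval from a; TRUE means the feature lies entirely inside the interval. The same matrix can probably be built without a loop, and the matching rows of a pulled out from its column sums. A minimal sketch, assuming the a and b data frames from above (the names `contained` and `hits` are introduced here for illustration only):

```r
a <- data.frame(id = 1:4, start = seq(10, 40, 10), end = seq(15, 45, 10))
b <- data.frame(id = 5:8, start = c(11, 24, 44, 55), end = c(14, 26, 45, 57))

# contained[i, j] is TRUE when feature b[i, ] lies entirely inside interval a[j, ]
contained <- outer(b$start, a$start, ">=") & outer(b$end, a$end, "<=")
dimnames(contained) <- list(as.character(b$id), as.character(a$id))

# Keep the rows of a whose column holds at least one TRUE,
# i.e. the intervals that contain at least one feature from b
hits <- a[colSums(contained) > 0, ]
print(hits)
```

With the example data this keeps the intervals with id 1 and 4. Note that outer() builds the full nrow(b) x nrow(a) matrix, so it is only sensible for modestly sized tables.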
Cheers,
Alex
------------------------------------
Alex Lam
Roslin Institute (Edinburgh)
Roslin
Midlothian EH25 9PS
Great Britain
Phone +44 131 5274471
Web http://www.roslin.ac.uk
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Stephen
Montgomery
Sent: 06 August 2007 13:52
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Determining an overlapping annotation data
subset(overlap/overlaps)
Hello Bioconductor -
Apologies, as this is a fairly rookie bioinformatics-based R question, but
I am trying to determine whether there is an R one-liner to extract the
subset of a data frame whose rows contain annotation stored in another
data frame (for example, extracting genomic intervals which contain
certain features/annotation).
Such that:
If I have dataframe "A" possessing an "id", "start", and "end", and
dataframe "B" also possessing an "id", "start", and "end", the output is
all the rows of A which contain an entry of B, i.e. with B$start and
B$end both lying between A$start and A$end.
I have tried my own fairly uninformed variants like this, to no avail:
A[length(B[B$start <= A$end & B$end >= A$start]) > 0,]
I fear the solution will be trivial, but as yet it has eluded me. :/
Thanks for any help! (Theoretically, I can also see doing this in its
own function by creating a vector of counts for each member of "A" and
then reporting those that are non-zero, but I was wondering if there was
a more succinct and likely more efficient way.)
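The count-based idea described above might look roughly like the sketch below. The toy values in A and B are illustrative assumptions, not data from this message, and `within_counts` and `overlap_counts` are hypothetical names. It also shows that the condition in the attempted one-liner (B$start <= A$end & B$end >= A$start) tests for any overlap, which is looser than full containment:

```r
# Illustrative toy data (assumed values, not from the original question)
A <- data.frame(id = 1:4, start = c(10, 20, 30, 40), end = c(15, 25, 35, 45))
B <- data.frame(id = 5:8, start = c(11, 24, 44, 55), end = c(14, 26, 45, 57))

# For each row of A, count the entries of B lying fully within [start, end]
within_counts <- vapply(seq_len(nrow(A)), function(j) {
  sum(B$start >= A$start[j] & B$end <= A$end[j])
}, integer(1))

# Same idea with the looser "any overlap" condition
overlap_counts <- vapply(seq_len(nrow(A)), function(j) {
  sum(B$start <= A$end[j] & B$end >= A$start[j])
}, integer(1))

# Report the rows of A with a non-zero count
print(A[within_counts > 0, ])   # containment
print(A[overlap_counts > 0, ])  # any overlap
```

On these values the containment test keeps ids 1 and 4, while the overlap test also keeps id 2, because the B entry 24-26 runs past the end of the A interval 20-25.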
Thanks again,
Stephen
Stephen Montgomery, B.A.Sc., Ph.D.
Postdoctoral Researcher, Team 16
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA
Phone: 44-1223-834244 (ext 7297)
Skype: stephen.b.montgomery