[BioC] Determining an overlapping annotation data subset (overlap/overlaps)
Stephen Montgomery
sm8 at sanger.ac.uk
Mon Aug 6 14:52:27 CEST 2007
Hello Bioconductor -
Apologies, as this is a fairly rookie bioinformatics-based R question, but I
am trying to determine whether there is an R one-liner to extract the subset
of a data frame whose rows contain annotation that is stored in another
data frame (for example, extracting the genomic intervals which contain
certain features/annotation).
Such that:
If I have data frame "A" with columns "id", "start", and "end", and
data frame "B" also with columns "id", "start", and "end", the output is
all the rows of A which contain an entry of B (B$start, B$end) within
A$start and A$end.
I have tried my own fairly uninformed variants like this, to no avail:
A[length(B[B$start <= A$end & B$end >= A$start]) > 0,]
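As far as I can tell, the attempt above fails for two reasons: the
comparisons pair rows of A and B elementwise (with recycling) rather than
testing every A row against every B row, and length(...) > 0 collapses to a
single TRUE/FALSE, so it selects either all rows of A or none. A minimal
sketch of the per-row test follows. It assumes all intervals lie on the same
sequence (so "id" is not used for matching) and uses the same overlap
condition as the attempt; the toy values and the name `hits` are
illustrative only.

```r
# Toy data (hypothetical values, for illustration only)
A <- data.frame(id    = c("a1", "a2", "a3"),
                start = c(1, 100, 200),
                end   = c(50, 150, 250))
B <- data.frame(id    = c("b1", "b2"),
                start = c(10, 240),
                end   = c(20, 260))

# For each row i of A, ask whether ANY interval in B overlaps
# [A$start[i], A$end[i]]; vapply guarantees one logical per row of A
hits <- vapply(seq_len(nrow(A)), function(i) {
  any(B$start <= A$end[i] & B$end >= A$start[i])
}, logical(1))

# Keep only the rows of A that are hit by at least one B interval
A[hits, ]
```

If strict containment is wanted instead of overlap, the condition inside
any() would become B$start >= A$start[i] & B$end <= A$end[i].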
I fear the solution will be trivial but as yet it has eluded me. :/
Thanks for any help! (Theoretically, I can also see doing this in its
own function by creating a vector of counts for each member of "A" and
then reporting those that are non-zero, but I was wondering if there was
a more succinct and likely more efficient way.)
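The count-vector idea can be sketched without an explicit loop by building
an all-pairs overlap matrix with outer(). This is only a sketch under the
same assumptions (one shared sequence, "id" not used for matching, overlap
rather than strict containment); the toy values and the names `ov` and
`counts` are illustrative only.

```r
# Toy data (hypothetical values, for illustration only)
A <- data.frame(id = c("a1", "a2", "a3"),
                start = c(1, 100, 200), end = c(50, 150, 250))
B <- data.frame(id = c("b1", "b2"),
                start = c(10, 240), end = c(20, 260))

# nrow(A) x nrow(B) logical matrix: ov[i, j] is TRUE when
# A$end[i] >= B$start[j] and A$start[i] <= B$end[j], i.e. the two overlap
ov <- outer(A$end, B$start, ">=") & outer(A$start, B$end, "<=")

# Number of B intervals overlapping each row of A
counts <- rowSums(ov)

# Report the rows of A with a non-zero count
A[counts > 0, ]
```

Note that the matrix has nrow(A) * nrow(B) entries, so this is only
practical when both data frames are of moderate size.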
Thanks again,
Stephen
Stephen Montgomery, B.A.Sc., Ph.D.
Postdoctoral Researcher, Team 16
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA
Phone: 44-1223-834244 (ext 7297)
Skype: stephen.b.montgomery