[BioC] Determining an overlapping annotation data subset (overlap/overlaps)
Herve Pages
hpages at fhcrc.org
Tue Aug 7 03:01:34 CEST 2007
Herve Pages wrote:
> Hi Stephen,
>
>> A <- data.frame(start=(1:5)*10L, end=(4:8)*10L)
>> A
>   start end
> 1    10  40
> 2    20  50
> 3    30  60
> 4    40  70
> 5    50  80
>
>> B <- data.frame(start=c(31L, 39L, 80L), end=c(60L, 40L, 84L))
>> B
>   start end
> 1    31  60
> 2    39  40
> 3    80  84
>
> You can create a logical vector with one element per row of A: for each
> A-row it says whether there is any B-row inside it:
>
> contains_a_Brow <- mapply(function(Astart, Aend) any(Astart <= B$start & B$end <= Aend),
>                           A$start, A$end)
This will be TRUE for A-rows that have at least one B-row within their limits.
To select the A-rows that _overlap_ at least one B-row, use:
contains_a_Brow <- mapply(function(Astart, Aend) any(Astart <= B$end & B$start <= Aend),
                          A$start, A$end)
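A minimal sketch of the same overlap test without the mapply() loop, assuming closed intervals (inclusive endpoints, as in the <= comparisons above): base R's outer() builds an nrow(A)-by-nrow(B) logical matrix in one step, and rowSums() turns it into a per-row count of overlapping B-rows. The names overlap, n_overlaps, overlaps_a_Brow and overlaps_mapply are hypothetical.

```r
# Example data from the thread
A <- data.frame(start = (1:5) * 10L, end = (4:8) * 10L)
B <- data.frame(start = c(31L, 39L, 80L), end = c(60L, 40L, 84L))

# Loop-based overlap test, as in the reply above
overlaps_mapply <- mapply(function(Astart, Aend) any(Astart <= B$end & B$start <= Aend),
                          A$start, A$end)

# Vectorized version: overlap[i, j] is TRUE when A-row i and B-row j
# share at least one position (closed intervals, inclusive endpoints)
overlap <- outer(A$start, B$end, "<=") & outer(A$end, B$start, ">=")

# Number of B-rows overlapping each A-row; keep A-rows with a non-zero count
n_overlaps <- rowSums(overlap)
overlaps_a_Brow <- n_overlaps > 0

# Both approaches select the same A-rows
stopifnot(identical(overlaps_a_Brow, overlaps_mapply))
A[overlaps_a_Brow, ]
```

With this example data every A-row overlaps exactly two B-rows, so all five rows are kept. The comparison matrix needs memory proportional to nrow(A) * nrow(B), so for genome-scale tables a dedicated interval-overlap tool (for example the Bioconductor IRanges package) is likely to scale better.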
H.
>
> Then use this logical vector to subset A:
>
> A[contains_a_Brow, ]
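For the containment question, a sketch of the "vector of counts" idea from the original question, assuming the same A and B as above: outer() marks every (A-row, B-row) pair where the B interval lies entirely inside the A interval, rowSums() counts them, and a non-zero count gives the same logical vector as the mapply() version. The names inside, n_inside and contains_mapply are hypothetical.

```r
# Example data from the thread
A <- data.frame(start = (1:5) * 10L, end = (4:8) * 10L)
B <- data.frame(start = c(31L, 39L, 80L), end = c(60L, 40L, 84L))

# inside[i, j] is TRUE when B-row j lies entirely within A-row i
inside <- outer(A$start, B$start, "<=") & outer(A$end, B$end, ">=")

# Count of contained B-rows per A-row (here 1, 1, 2, 0, 0)
n_inside <- rowSums(inside)

# Subset A to the rows holding at least one B-row
contains_a_Brow <- n_inside > 0
A[contains_a_Brow, ]

# Cross-check against the loop-based containment test
contains_mapply <- mapply(function(Astart, Aend) any(Astart <= B$start & B$end <= Aend),
                          A$start, A$end)
stopifnot(identical(contains_a_Brow, contains_mapply))
```

For this data the first three A-rows are kept: 10-40 and 20-50 each hold 39-40, and 30-60 holds both 31-60 and 39-40.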
>
> Cheers,
> H.
>
> Stephen Montgomery wrote:
>> Hello Bioconductor -
>>
>> Apologies, as this is a fairly rookie bioinformatics-based R question, but I
>> am trying to determine whether there is an R one-liner to extract the subset
>> of a data frame whose intervals contain annotation stored in another data
>> frame (for example, extracting genomic intervals that contain certain
>> features/annotation).
>>
>> That is:
>> if data frame "A" has columns "id", "start", and "end", and data frame
>> "B" also has "id", "start", and "end", the output should be all the rows
>> of A that contain an entry of B (B$start, B$end) within A$start and
>> A$end.
>>
>> I have tried my own fairly uninformed variants like this, to no avail:
>> A[length(B[B$start <= A$end & B$end >= A$start]) > 0,]
>> I fear the solution will be trivial but as yet it has eluded me. :/
>>
>> Thanks for any help! (In principle, I could also do this in a separate
>> function by creating a vector of counts for each row of "A" and then
>> reporting those that are non-zero, but I was wondering if there was a
>> more succinct and likely more efficient way.)
>>
>> Thanks again,
>> Stephen
>>
>> Stephen Montgomery, B.A.Sc., Ph.D.
>> Postdoctoral Researcher, Team 16
>> Wellcome Trust Sanger Institute
>> Hinxton, Cambridge CB10 1SA
>> Phone: 44-1223-834244 (ext 7297)
>> Skype: stephen.b.montgomery
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor