[R] [BioC] comparing two tables
David Winsemius
dwinsemius at comcast.net
Tue Oct 25 18:01:07 CEST 2011
On Oct 25, 2011, at 10:40 AM, Assa Yeroslaviz wrote:
> Hi all,
>
> @Martin - thanks for the help, it works very well.
>
> @David - sorry for the misunderstanding. I will see to it that it won't
> happen again.
> BTW, unfortunately your function is not working.
> It is partially my error, as I gave no regions with overlaps, but even after
> changing them it still doesn't work.
>
> Here is the new data with an overlap in the third gene:
>
> genetable <- rd.txt("name chr start end str accession Length
>
> gen1 4 646752 646838 + MI0005806 86
> gen12 2L 243035 243141 - MI0005821 106
> gen3 2L 159838 159928 + MI0005813 90
> gen7 2L 1831685 1831799 - MI0011290 114
> gen4 2L 2737568 2737661 + MI0017696 93")
> loctable <- rd.txt("Chr Start End length
>
> 4 136532 138654 2122
> 3 139870 141970 2100
> 2L 157838 160440 2602
> X 160834 162966 2132
> 4 204040 208536 4496")
>
> But I still get:
>> apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
> [1] FALSE FALSE FALSE FALSE FALSE
You just want to pass the start and end columns of genetable:
> # Helper function
> inregion <- function(vec, locs) {
+   any( apply(locs, 1, function(x) vec["start"] > x[1] & vec["end"] <= x[2]) ) }
> # Test the function
> inregion(genetable[2, ], loctable[, c("Start", "End")])
[1] FALSE
> # [1] FALSE
>
> apply(genetable[, 3:4], 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
[1] FALSE FALSE TRUE FALSE FALSE
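
The all-FALSE result from the whole-table call is consistent with how apply() handles a data frame: it works on as.matrix(X), and a frame that mixes text and numeric columns becomes a character matrix, so the comparisons inside inregion are made on strings. A minimal sketch of that effect follows. It assumes the tables are rebuilt with data.frame() (rd.txt is not reproduced here) and keeps only the columns needed; the variable names other than genetable, loctable and inregion are illustrative.

```r
# Rebuild the two tables from the post with data.frame(); rd.txt is not used.
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568),
  end   = c(646838, 243141, 159928, 1831799, 2737661),
  stringsAsFactors = FALSE
)
loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 160440, 162966, 208536),
  stringsAsFactors = FALSE
)

# apply() operates on as.matrix(X): mixed column types give a character
# matrix, while the two coordinate columns alone stay numeric.
whole_is_character <- is.character(as.matrix(genetable))
coords_are_numeric <- is.numeric(as.matrix(genetable[, c("start", "end")]))

# Comparing strings is not the same as comparing numbers.
string_cmp  <- "9" > "10"   # compared character by character
numeric_cmp <- 9 > 10       # compared as numbers

# Same helper as in the post; note that it does not look at the chromosome.
inregion <- function(vec, locs) {
  any(apply(locs, 1, function(x) vec["start"] > x[1] & vec["end"] <= x[2]))
}

# Passing only the numeric columns keeps every comparison numeric.
res <- apply(genetable[, c("start", "end")], 1, inregion,
             locs = loctable[, c("Start", "End")])
print(res)
```

Selecting the columns by name rather than by position (3:4) gives the same result here and does not depend on the column order of the table.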
(I really wish that you would stop crossposting. I am only following
your bad practice because you posted my code on BioC.)
--
David
>
> for the single queries I get TRUE:
>
>> inregion(genetable[3, ], loctable[, c("Start", "End")])
> [1] TRUE
>
> Do you have any idea how I can fix this problem?
>
> Thanks and again sorry for the trouble.
>
> Assa
>
> On Tue, Oct 25, 2011 at 15:48, Martin Morgan <mtmorgan at fhcrc.org>
> wrote:
>
>> On 10/25/2011 03:42 AM, Assa Yeroslaviz wrote:
>>
>>> Hi everybody,
>>>
>>> I would like to know whether it is possible to compare two tables for
>>> certain parameters.
>>> I have these two tables:
>>> gene table
>>> name chr start end str accession Length
>>> gen1 4 646752 646838 + MI0005806 86
>>> gen12 2L 243035 243141 - MI0005821 106
>>> gen3 2L 159838 159928 + MI0005813 90
>>> gen7 2L 1831685 1831799 - MI0011290 114
>>> gen4 2L 2737568 2737661 + MI0017696 93
>>> ...
>>>
>>> localization table:
>>> Chr Start End length
>>> 4 136532 138654 2122
>>> 3 139870 141970 2100
>>> 2L 157838 158440 602
>>> X 160834 162966 2132
>>> 4 204040 208536 4496
>>> ...
>>>
>>> I would like to check whether a specific gene lies within a certain region.
>>> For example, I want to see if gene 3 on chromosome 2L lies within the region
>>> given in the second table.
>>>
>>
>> Hi Assa --
>>
>> In Bioconductor, use the GenomicRanges package. Create two GRanges objects
>>
>> genes = with(genetable, GRanges(chr, IRanges(start, end), str,
>>     accession=accession, Length=Length))
>> locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))
>>
>> then
>>
>> olaps = findOverlaps(genes, locations)
>>
>> queryHits(olaps) and subjectHits(olaps) index each gene with all locations
>> it overlaps. The definition of 'overlap' is flexible, see ?findOverlaps.
>>
>> Martin
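
A minimal sketch of the GenomicRanges suggestion, run on the five-row tables from the later message in this thread. It assumes the GenomicRanges package is installed from Bioconductor. The data.frame() calls, the gene metadata column, the hits table and the choice of type = "within" (gene entirely inside a region) are assumptions made for illustration, not part of the advice quoted above.

```r
library(GenomicRanges)  # Bioconductor package; assumed to be installed

genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568),
  end   = c(646838, 243141, 159928, 1831799, 2737661),
  str   = c("+", "-", "+", "-", "+"),
  stringsAsFactors = FALSE
)
loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 160440, 162966, 208536),
  stringsAsFactors = FALSE
)

# One GRanges object per table; the gene name travels as a metadata column.
genes <- GRanges(seqnames = genetable$chr,
                 ranges   = IRanges(genetable$start, genetable$end),
                 strand   = genetable$str,
                 gene     = genetable$name)
locations <- GRanges(seqnames = loctable$Chr,
                     ranges   = IRanges(loctable$Start, loctable$End))

# type = "within" keeps only genes that lie entirely inside a location.
# Chromosome matching is handled by findOverlaps itself.
olaps <- findOverlaps(genes, locations, type = "within")

# Translate the hit indices back into gene names and location row numbers.
hits <- data.frame(gene   = genes$gene[queryHits(olaps)],
                   region = subjectHits(olaps))
print(hits)
```

Unlike a comparison on start and end alone, this only reports a gene when it sits on the same chromosome as the region.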
>>
>>
>>
>>> What I would like to do is something like this:
>>> 1. check if the gene lies on a specific chromosome
>>> 1.a if no - go to the next line
>>> 1.b if yes - go to 2
>>> 2. check if the start position of the gene is bigger than the start position
>>> of the localization table AND if it is smaller than the end position (if it
>>> lies between the start and end positions in the localization table)
>>> 2.a if no - go to the next gene
>>> 2.b if yes - give it to me.
>>>
>>> I was having difficulties doing it without running into three
>>> interleaved
>>> conditional loops (if).
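
The two steps listed above can be sketched in base R without nested if statements: merge() pairs each gene with every region on the same chromosome (step 1), and a single logical filter keeps the pairs where the gene lies inside the region (step 2). This is a minimal sketch under stated assumptions: the tables are rebuilt with data.frame() from the five-row example elsewhere in this thread, "inside" is taken to mean that both gene start and gene end fall strictly within the region, and the names pairs and inside are illustrative.

```r
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568),
  end   = c(646838, 243141, 159928, 1831799, 2737661),
  stringsAsFactors = FALSE
)
loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 160440, 162966, 208536),
  stringsAsFactors = FALSE
)

# Step 1: pair each gene with every region on the same chromosome.
# Genes on chromosomes with no region drop out of the result.
pairs <- merge(genetable, loctable, by.x = "chr", by.y = "Chr")

# Step 2: keep the pairs where the gene lies inside the region.
# R column names are case-sensitive, so start/Start and end/End are distinct.
inside <- pairs[pairs$start > pairs$Start & pairs$end < pairs$End, ]
print(inside[, c("name", "chr", "start", "end", "Start", "End")])
```

To test only the gene start against the region, as in the literal wording of step 2, replace pairs$end < pairs$End with pairs$start < pairs$End. For large tables the merge can produce many rows per chromosome, which is one reason an interval-based tool may be preferable.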
>>>
>>> I would appreciate any help.
>>>
>>> Thanks
>>>
>>> Assa
>>>
>>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT