[BioC] [R] comparing two tables

David Winsemius dwinsemius at comcast.net
Tue Oct 25 18:01:07 CEST 2011


On Oct 25, 2011, at 10:40 AM, Assa Yeroslaviz wrote:

> Hi all,
>
> @Martin - thanks for the help, it works very well.
>
> @David - sorry for the misunderstanding. I will see to it that it won't
> happen again.
> BTW, unfortunately your function is not working.
> It is partially my error, as I gave no regions with overlaps, but even after
> changing them it just doesn't fit.
>
> Here is the new data with an overlap in the third gene:
>
> genetable <- rd.txt("name     chr     start     end     str     accession     Length
>
> gen1     4     646752     646838     +     MI0005806     86
> gen12     2L     243035     243141     -     MI0005821     106
> gen3     2L     159838     159928     +     MI0005813     90
> gen7     2L     1831685     1831799     -     MI0011290     114
> gen4     2L     2737568     2737661     +     MI0017696     93")
> loctable <- rd.txt("Chr     Start     End     length
>
> 4     136532     138654     2122
> 3     139870     141970     2100
> 2L     157838     160440     2602
> X     160834     162966     2132
> 4     204040     208536     4496")
>
> But I still get:
>> apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
> [1] FALSE FALSE FALSE FALSE FALSE

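You just want to pass the start and end columns of genetable. apply() converts
its data frame argument to a matrix, and with the character columns included
every value becomes a character string, so the comparisons are no longer
numeric.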

 > # Helper function
 > inregion <- function(vec, locs) {
+        any( apply(locs, 1, function(x) vec["start"]>x[1] & vec["end"]<=x[2])) }
 > # Test the function
 > inregion(genetable[2, ], loctable[, c("Start", "End")])
[1] FALSE
 > # [1] FALSE
 >
 > apply(genetable[, 3:4], 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
[1] FALSE FALSE  TRUE FALSE FALSE
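
For anyone reading this in the archive, the difference between the two apply()
calls can be reproduced in a self-contained way. The following is only a
sketch under some assumptions: it uses base R's read.table(text = ...) in
place of the rd.txt helper (which is not defined in this thread), and the
names `bad` and `good` are purely illustrative.

```r
# Rebuild the two tables from the message; read.table(text = ...) stands in
# for the rd.txt helper, which is not defined in this thread (assumption).
genetable <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "name chr start end str accession Length
gen1 4 646752 646838 + MI0005806 86
gen12 2L 243035 243141 - MI0005821 106
gen3 2L 159838 159928 + MI0005813 90
gen7 2L 1831685 1831799 - MI0011290 114
gen4 2L 2737568 2737661 + MI0017696 93")
loctable <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "Chr Start End length
4 136532 138654 2122
3 139870 141970 2100
2L 157838 160440 2602
X 160834 162966 2132
4 204040 208536 4496")

# Helper from the message, unchanged apart from layout.
inregion <- function(vec, locs) {
  any(apply(locs, 1, function(x) vec["start"] > x[1] & vec["end"] <= x[2]))
}

# apply() runs as.matrix() on its input. With character columns present the
# result is a character matrix; with only start/end it stays numeric.
is.character(as.matrix(genetable))                    # TRUE
is.numeric(as.matrix(genetable[, c("start", "end")])) # TRUE

# Whole table: positions are compared as strings, so the result is unreliable.
bad <- apply(genetable, 1,
             function(x) inregion(x, loctable[, c("Start", "End")]))

# Numeric columns only: positions are compared as numbers.
good <- apply(genetable[, c("start", "end")], 1,
              function(x) inregion(x, loctable[, c("Start", "End")]))
good
```

Selecting the columns by name rather than by position (3:4) is a small design
choice that keeps the call working if the column order changes. The single-row
call inregion(genetable[3, ], ...) behaves differently from the whole-table
apply() because a one-row data frame keeps its numeric columns numeric.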

(I really wish that you would stop crossposting. I am only following
your bad practice because you posted my code on BioC.)

-- 
David
>
> for the single queries I get TRUE:
>
>> inregion(genetable[3, ], loctable[, c("Start", "End")])
> [1] TRUE
>
> Do you have any idea how I can fix this problem?
>
> Thanks and again sorry for the trouble.
>
> Assa
>
> On Tue, Oct 25, 2011 at 15:48, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>> On 10/25/2011 03:42 AM, Assa Yeroslaviz wrote:
>>
>>> Hi everybody,
>>>
>>> I would like to know whether it is possible to compare two tables for
>>> certain parameters.
>>> I have these two tables:
>>> gene table
>>> name     chr     start     end     str     accession     Length
>>> gen1     4     646752     646838     +     MI0005806     86
>>> gen12     2L     243035     243141     -     MI0005821     106
>>> gen3     2L     159838     159928     +     MI0005813     90
>>> gen7     2L     1831685     1831799     -     MI0011290     114
>>> gen4     2L     2737568     2737661     +     MI0017696     93
>>> ...
>>>
>>> localization table:
>>> Chr     Start     End     length
>>> 4     136532     138654     2122
>>> 3     139870     141970     2100
>>> 2L     157838     158440     602
>>> X     160834     162966     2132
>>> 4     204040     208536     4496
>>> ...
>>>
>>> I would like to check whether a specific gene lies within a certain region.
>>> For example I want to see if gene 3 on chromosome 2L lies within the region
>>> given in the second table.
>>>
>>
>> Hi Assa --
>>
>> In Bioconductor, use the GenomicRanges package. Create two GRanges objects
>>
>> genes = with(genetable, GRanges(chr, IRanges(start, end), str,
>>                                 accession=accession, Length=Length))
>> locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))
>>
>> then
>>
>> olaps = findOverlaps(genes, locations)
>>
>> queryHits(olaps) and subjectHits(olaps) index each gene with all locations
>> it overlaps. The definition of 'overlap' is flexible, see ?findOverlaps.
>>
>> Martin
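
The GenomicRanges suggestion above can be tried on the data from this thread
roughly as follows. This is a sketch rather than a definitive recipe: it
assumes the Bioconductor package GenomicRanges is installed (the code is
skipped otherwise), it uses the later version of the location table in which
the 2L region ends at 160440, and type = "within" is one possible choice of
overlap definition, picked here because the question asks for genes lying
inside a region. The result name `hits` is illustrative.

```r
# Tables from the thread, reduced to the columns needed here.
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568),
  end   = c(646838, 243141, 159928, 1831799, 2737661),
  str   = c("+", "-", "+", "-", "+"),
  stringsAsFactors = FALSE
)
loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 160440, 162966, 208536),
  stringsAsFactors = FALSE
)

hits <- NULL
# Only run the Bioconductor part when the package is available.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GenomicRanges))

  genes <- GRanges(genetable$chr,
                   IRanges(genetable$start, genetable$end),
                   strand = genetable$str,
                   name = genetable$name)
  locations <- GRanges(loctable$Chr, IRanges(loctable$Start, loctable$End))

  # type = "within": a gene counts only if it lies entirely inside a location.
  # Chromosome (seqnames) matching is handled by findOverlaps itself.
  olaps <- findOverlaps(genes, locations, type = "within")

  # Pair each matching gene with the row number of the location containing it.
  hits <- data.frame(gene = genes$name[queryHits(olaps)],
                     location = subjectHits(olaps))
}
hits
```

With this data only gen3 should be reported, paired with the third row of the
location table. Other overlap definitions are described in ?findOverlaps.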
>>
>>
>>
>>> What I would like to do is something like:
>>> 1. check if the gene lies on a specific chromosome
>>> 1.a if no - go to the next line
>>> 1.b if yes - go to 2
>>> 2. check if the start position of the gene is bigger than the start position
>>> of the localization table AND if it is smaller than the end position (if it
>>> lies between the start and end positions in the localization table)
>>> 2.a if no - go to the next gene
>>> 2.b if yes - give it to me.
>>>
>>> I was having difficulties doing it without running into three nested
>>> conditional loops (if).
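
The two steps listed above (same chromosome, then position inside the region)
can also be written in base R without explicit loops. The following is a
minimal sketch, not the method used elsewhere in this thread: it pairs every
gene with every region on the same chromosome via merge() and then filters the
pairs. It assumes the whole gene must lie inside the region and reuses the
comparisons from the inregion helper (start > Start and end <= End); the names
`pairs` and `inside` are illustrative.

```r
# Tables from the thread, reduced to the columns needed here.
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568),
  end   = c(646838, 243141, 159928, 1831799, 2737661),
  stringsAsFactors = FALSE
)
loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 160440, 162966, 208536),
  stringsAsFactors = FALSE
)

# Step 1: keep only gene/region pairs that share a chromosome.
# R is case-sensitive, so start/end (gene) and Start/End (region) stay distinct.
pairs <- merge(genetable, loctable, by.x = "chr", by.y = "Chr")

# Step 2: keep the pairs where the gene lies inside the region.
inside <- pairs[pairs$start > pairs$Start & pairs$end <= pairs$End, ]
inside[, c("name", "chr", "start", "end", "Start", "End")]
```

Because merge() builds all same-chromosome pairs, this is fine for small
tables but can grow large for genome-scale data, where an interval-based
approach is likely the better fit.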
>>>
>>> I would appreciate any help.
>>>
>>> Thanks
>>>
>>> Assa
>>>
>>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>

David Winsemius, MD
West Hartford, CT


