[R] comparing two tables
David Winsemius
dwinsemius at comcast.net
Tue Oct 25 15:27:47 CEST 2011
On Oct 25, 2011, at 6:42 AM, Assa Yeroslaviz wrote:
> Hi everybody,
>
> I would like to know whether it is possible to compare two tables for
> certain parameters.
> I have these two tables:
> gene table
> name chr start end str accession Length
> gen1 4 646752 646838 + MI0005806 86
> gen12 2L 243035 243141 - MI0005821 106
> gen3 2L 159838 159928 + MI0005813 90
> gen7 2L 1831685 1831799 - MI0011290 114
> gen4 2L 2737568 2737661 + MI0017696 93
> ...
>
> localization table:
> Chr Start End length
> 4 136532 138654 2122
> 3 139870 141970 2100
> 2L 157838 158440 602
> X 160834 162966 2132
> 4 204040 208536 4496
> ...
>
> I would like to check whether a specific gene lies within a certain
> region. For example, I want to see whether gene 3 on chromosome 2L lies
> within the region given in the second table.
>
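rd.txt <- function(txt, header=TRUE, ...) {
  # Close only the connection opened here; closeAllConnections()
  # would also close any unrelated open connections
  con <- textConnection(txt)
  on.exit(close(con))
  read.table(con, header=header, ...) }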
# Data input
genetable <- rd.txt("name chr start end str accession Length
gen1 4 646752 646838 + MI0005806 86
gen12 2L 243035 243141 - MI0005821 106
gen3 2L 159838 159928 + MI0005813 90
gen7 2L 1831685 1831799 - MI0011290 114
gen4 2L 2737568 2737661 + MI0017696 93")
loctable <- rd.txt("Chr Start End length
4 136532 138654 2122
3 139870 141970 2100
2L 157838 158440 602
X 160834 162966 2132
4 204040 208536 4496")
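# Helper function (compares positions only; the chromosome is not checked)
inregion <- function(vec, locs) {
  # as.numeric() is needed because apply() passes each row as character
  s <- as.numeric(vec[["start"]]); e <- as.numeric(vec[["end"]])
  any(s > locs[, 1] & e <= locs[, 2]) }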
# Test the function
inregion(genetable[2, ], loctable[, c("Start", "End")])
# [1] FALSE
apply(genetable, 1, function(x) inregion(x, loctable[, c("Start",
"End")]) )
#[1] FALSE FALSE FALSE FALSE FALSE
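The helper above looks at positions only. A minimal sketch of a variant that also requires the chromosome to match (step 1 of the original question) might look like the following. The names `genes`, `regions` and `in_region_chr` are made up for this illustration, the accession and Length columns are left out for brevity, and the last gene row ("genX") is hypothetical: it is not in the posted data and is added only so that one row matches.

```r
# The two tables from the post, plus one HYPOTHETICAL gene ("genX",
# not part of the original data) placed inside the 2L region.
genes <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
name  chr start   end     str
gen1  4   646752  646838  +
gen12 2L  243035  243141  -
gen3  2L  159838  159928  +
gen7  2L  1831685 1831799 -
gen4  2L  2737568 2737661 +
genX  2L  157900  158000  +")

regions <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Chr Start  End    length
4   136532 138654 2122
3   139870 141970 2100
2L  157838 158440 602
X   160834 162966 2132
4   204040 208536 4496")

# TRUE when the gene is on the same chromosome as, and lies inside, at
# least one region (start > Start and end <= End, as in inregion()).
in_region_chr <- function(chr, start, end, regions) {
  any(as.character(regions$Chr) == as.character(chr) &
      start > regions$Start & end <= regions$End)
}

# One logical value per gene; no explicit nested if/for needed.
hit <- mapply(in_region_chr, genes$chr, genes$start, genes$end,
              MoreArgs = list(regions = regions), USE.NAMES = FALSE)

genes[hit, ]   # only the hypothetical genX row is returned
```

With the five genes actually posted, `hit` is FALSE throughout, which agrees with the output above.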
The logical vector can be used to extract rows from genetable, but it
seems pointless to offer code that produces an empty data frame.
(Wouldn't it have been more sensible to offer a test case that had a
combination that satisfied your requirements?)
I'm guessing that this facility is already implemented in one or
more Bioconductor functions.
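One candidate, offered as a sketch rather than a tested recommendation, is findOverlaps() from the Bioconductor package GenomicRanges with type = "within". This assumes GenomicRanges is installed. The object names are made up for the illustration, and "genX" is again a hypothetical gene that is not in the posted data. Note that "within" treats both interval ends as inclusive, which differs slightly from the strict start > Start test used earlier.

```r
library(GenomicRanges)  # assumption: installed from Bioconductor

genes <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
name  chr start   end     str
gen1  4   646752  646838  +
gen12 2L  243035  243141  -
gen3  2L  159838  159928  +
gen7  2L  1831685 1831799 -
gen4  2L  2737568 2737661 +
genX  2L  157900  158000  +")   # genX is HYPOTHETICAL, added for the demo

regions <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Chr Start  End    length
4   136532 138654 2122
3   139870 141970 2100
2L  157838 158440 602
X   160834 162966 2132
4   204040 208536 4496")

# Genomic intervals; chromosome names become the sequence names.
gr_genes   <- GRanges(seqnames = genes$chr,
                      ranges   = IRanges(genes$start, genes$end),
                      strand   = genes$str)
gr_regions <- GRanges(seqnames = regions$Chr,
                      ranges   = IRanges(regions$Start, regions$End))

# Genes lying entirely inside a region on the same chromosome.
hits       <- findOverlaps(gr_genes, gr_regions, type = "within")
within_idx <- unique(queryHits(hits))

genes[within_idx, ]
```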
--
David.
> What I would like to do is something like this:
> 1. check if the gene lies on a specific chromosome
> 1.a if no - go to the next line
> 1.b if yes - go to 2
> 2. check if the start position of the gene is bigger than the start
> position of the localization table AND if it is smaller than the end
> position (i.e. if it lies between the start and end positions in the
> localization table)
> 2.a if no - go to the next gene
> 2.b if yes - give it to me.
>
> I was having difficulties doing it without running into three
> interleaved conditional loops (if).
>
> I would appreciate any help.
>
> Thanks
>
> Assa
David Winsemius, MD
West Hartford, CT