[R] merge function in R?
David Winsemius
dwinsemius at comcast.net
Sat Aug 14 01:29:58 CEST 2010
Neither you nor your responder has continued the email chain very
well, so let me put things back together:
On Aug 13, 2010; 03:54pm fishkbob wrote (subj = merge function in R?):
>>> So I have a bunch of c(start,end) points and want to consolidate
>>> them into as few c(start,end) as possible.
>>>
>>> For example:
>>> sample start end
>>> A 5 10
>>> B 7 18
>>> C 1 4
>>> D 16 20
>>>
>>> I'd want the function to return the two distinct sets (1,4) and
>>> (5,20)
>>>
>>> Is there an R function that already does this?
>>> or should I write my own? (how would I go about that?)
> In an effort to be helpful but not copying the prior message on
> Aug 13, 2010; 06:46pm JesperHybel wrote:
>> I think it would be helpful if you could clarify your question -
>> do you want distinct sets - maybe use
>>
>> unique()
>>
>> but why (5,20) when it's (5,10) in the row in your example? What
>> criteria do you want the function to select the "sets" by and what
>> kind of output do you need?
>>
>> Maybe it's just me who doesn't get the question..sr
On Aug 13, 2010, at 7:01 PM, fishkbob wrote:
>
> I too think I worded it incorrectly...
>
> so the second two columns of the matrix are the start and end of an
> interval. However, because some of the intervals overlap, I want to
> limit the number of intervals I have to deal with.
>
> So therefore,
> (5 10) should merge with (7 18) making (5 18)
> and then (5 18) should merge with (16 20) giving (5 20)
> whereas (1 4) has no overlap with any other interval and is therefore
> left on its own
>
> Ideal output would just be a collapsing of the matrix
> sample start end
> # 5 20
> # 1 4
>
> I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me
> a c(1:4,5:20)
> However, I have to do this on a very large dataset and the numbers are
> more like
> c(100542:100782,598322:598821,...)
>
> any help would be appreciated
> thanks
> --
> View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html
> Sent from the R help mailing list archive at Nabble.com.
Nabble is where I saw all of this, but Nabble is not r-help:
I suggest you sort your rows by the "start" variable and then examine
where the breaks would remain by looking at the prior values of "end":
> dd <- read.table(text = "sample start end
+ A 5 10
+ B 7 18
+ C 1 4
+ D 16 20", header = TRUE)
> dd[order(dd$start), ]
  sample start end
3      C     1   4
1      A     5  10
2      B     7  18
4      D    16  20
> ndd <- dd[order(dd$start), ]
> ndd$inprior <- c(NA, ndd$end[-nrow(ndd)] >= ndd$start[-1])
> ndd
  sample start end inprior
3      C     1   4      NA
1      A     5  10   FALSE
2      B     7  18    TRUE
4      D    16  20    TRUE
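
One possible way to finish the job from here, offered as a sketch rather
than a tested solution for your real data: compare each "start" with the
largest "end" seen in any earlier row (not only the immediately prior
one, so that an interval nested inside a longer earlier one is still
caught), start a new group wherever that test fails, and then take the
minimum start and maximum end within each group. The function name
merge_intervals is made up for illustration, and the code assumes a data
frame with numeric columns named start and end.

```r
# Hypothetical helper (not from any package): collapse overlapping
# intervals in a data frame with numeric columns `start` and `end`.
merge_intervals <- function(dd) {
  ndd <- dd[order(dd$start), ]
  n <- nrow(ndd)
  if (n == 0) return(data.frame(start = numeric(0), end = numeric(0)))
  # Largest end among all earlier rows; -Inf for the first row.
  prior_max_end <- c(-Inf, cummax(ndd$end)[-n])
  # A new group begins wherever start lies beyond every earlier end.
  grp <- cumsum(ndd$start > prior_max_end)
  data.frame(start = as.vector(tapply(ndd$start, grp, min)),
             end   = as.vector(tapply(ndd$end,   grp, max)))
}

dd <- read.table(text = "sample start end
A 5 10
B 7 18
C 1 4
D 16 20", header = TRUE)

merged <- merge_intervals(dd)
merged
#   start end
# 1     1   4
# 2     5  20
```

Because it only sorts once and then uses vectorized cumulative
operations, this should scale to long inputs far better than expanding
every interval with the ":" operator and calling unique(). Note that
intervals that merely touch (an end equal to the next start) are merged,
matching the >= test used above.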
--
David Winsemius, MD
West Hartford, CT