[R] a difficult situation, how to do this using base function.
Bert Gunter
bgunter.4567 at gmail.com
Sat Jul 22 02:41:17 CEST 2017
1. Please always reply to the list, especially here so that others can
see your clarification.
2. What should happen if your match.start value exceeds all of the
cumulative sums? You seem to imply that this cannot happen.
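To make point 2 concrete, here is a minimal sketch that assumes the cumulative range lengths are already in a numeric vector. The names `cum_len` and `first_range` are hypothetical. It shows that looking up "the first cumulative sum greater than the value" with which(...)[1] silently gives NA when the value exceeds every sum, so that case needs an explicit decision.

```r
## Hypothetical cumulative lengths for the ranges "100-150,200-260,600-900"
cum_len <- c(50, 110, 410)

## Index of the first range whose cumulative length exceeds val
first_range <- function(val, cum_len) which(val < cum_len)[1]

first_range(200, cum_len)  ## 3: 200 falls in the third range
first_range(500, cum_len)  ## NA: 500 exceeds every cumulative sum
```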
Your minimal example, while a little confusing (to me) and posted in HTML
(which can get mangled on this plain-text list, though apparently not
here), was very helpful. Essential, even. Here is a solution that
seems to work:
WARNING: There are a zillion ways that one might do this. Mine may be
far from the most efficient, elegant, or clear. I hope it is
understandable.
The chief task here is to parse your second column into numbers so
that your logic can be applied to it. Because the format is so simply
structured, I chose to do this by converting the dashes to commas and
using strsplit() to split each string into a character vector, which
can then be converted to numerics. Like this:
df <- data.frame(match.start = c(5, 10, 100, 200),
                 range.coordinates = c("1000-1050", "1500-1555",
                                       "5000-5050,6000-6180",
                                       "100-150,200-260,600-900"))
## Make sure the second column is character, not factor: before
## R 4.0.0, data.frame() converted strings to factors by default.
df[, 2] <- as.character(df[, 2])
numex <- gsub("-", ",", df[, 2], fixed = TRUE)  ## convert dashes to commas
## split on commas and convert to a list of numeric vectors
numex <- lapply(strsplit(numex, ",", fixed = TRUE), as.numeric)
## Here's what you get:
> numex
[[1]]
[1] 1000 1050

[[2]]
[1] 1500 1555

[[3]]
[1] 5000 5050 6000 6180

[[4]]
[1] 100 150 200 260 600 900
Because of the fixed format, we know that the odd-numbered indices in
each vector hold the lower ends of the ranges and the even-numbered
indices hold the upper ends. I just break these out into a convenient
form, a 2-column matrix whose first column gives the lower value of
each range and whose second gives the cumulative range lengths:
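numex <- lapply(numex, function(x) {
  i <- seq_along(x)
  odds <- i %% 2 == 1   ## positions of the lower ends
  evens <- i %% 2 == 0  ## positions of the upper ends
  ## column 1: range start; column 2: running total of range lengths
  cbind(x[odds], cumsum(x[evens] - x[odds]))
})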
## Giving:
> numex
[[1]]
     [,1] [,2]
[1,] 1000   50

[[2]]
     [,1] [,2]
[1,] 1500   55

[[3]]
     [,1] [,2]
[1,] 5000   50
[2,] 6000  230

[[4]]
     [,1] [,2]
[1,]  100   50
[2,]  200  110
[3,]  600  410
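An equivalent way to build the same two-column matrices, offered as an alternative sketch rather than as the approach used above: reshape each vector with matrix(..., ncol = 2, byrow = TRUE) so that every row is one (lower, upper) pair, which avoids computing odd and even index masks. The helper name `to_cum_matrix` is hypothetical.

```r
## Hypothetical helper: turn c(lo1, hi1, lo2, hi2, ...) into a 2-column
## matrix of (range start, cumulative range length)
to_cum_matrix <- function(x) {
  pairs <- matrix(x, ncol = 2, byrow = TRUE)  ## one row per "lo-hi" range
  cbind(pairs[, 1], cumsum(pairs[, 2] - pairs[, 1]))
}

to_cum_matrix(c(100, 150, 200, 260, 600, 900))
```

It would be applied the same way, e.g. lapply(numex, to_cum_matrix) on the list of numeric vectors produced by strsplit().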
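Now I just apply your logic row by row (i.e. index by index) to get
the desired column. For each row, find the first range whose cumulative
length exceeds match.start, then add match.start to that range's lower
value and subtract the total length of all the earlier ranges: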
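df$updated <- sapply(seq_len(nrow(df)), function(i) {
  test <- numex[[i]]  ## 2-column matrix: range start, cumulative length
  val <- df[i, 1]     ## match.start for this row
  if (nrow(test) == 1) test[1, 1] + val
  else {
    ## first range whose cumulative length exceeds val
    wm <- which(val < test[, 2])[1]
    ## subtract the total length of all earlier ranges (0 when wm == 1)
    test[wm, 1] + val - c(0, test[, 2])[wm]
  }
})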
> df$updated
[1] 1005 1510 6050  690
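For reuse, the whole per-row computation can be packaged into one function and applied with mapply(). This is a minimal sketch, not the code above; the name `update_start` is hypothetical, and it assumes that a match.start beyond the total length of all ranges should give NA (the open question in point 2).

```r
## Hypothetical helper: map an offset `val` into the coordinates given by
## a string such as "100-150,200-260,600-900"
update_start <- function(val, coords) {
  x <- as.numeric(strsplit(gsub("-", ",", coords, fixed = TRUE),
                           ",", fixed = TRUE)[[1]])
  pairs <- matrix(x, ncol = 2, byrow = TRUE)  ## columns: lower, upper
  cum_len <- cumsum(pairs[, 2] - pairs[, 1])  ## cumulative range lengths
  wm <- which(val < cum_len)[1]               ## NA if val exceeds them all
  if (is.na(wm)) return(NA_real_)             ## assumed convention
  pairs[wm, 1] + val - c(0, cum_len)[wm]      ## subtract earlier lengths
}

df <- data.frame(match.start = c(5, 10, 100, 200),
                 range.coordinates = c("1000-1050", "1500-1555",
                                       "5000-5050,6000-6180",
                                       "100-150,200-260,600-900"),
                 stringsAsFactors = FALSE)
df$updated <- mapply(update_start, df$match.start, df$range.coordinates)
df$updated
## [1] 1005 1510 6050  690
```

Parsing inside the function keeps each row self-contained; for many rows, splitting the whole column once (as above) avoids repeating that work.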
Cheers,
Bert
On Fri, Jul 21, 2017 at 4:24 PM, Honkit Wong <stephen66 at gmail.com> wrote:
> Sorry for the confusion; it was right. It should be 600 + (200 - 60 - 50) = 690.
> The 60 and 50 are the lengths of the previous two ranges. Thanks! Any clue?
>
> Stephen (Hon-Kit) Wong
>
>> On Jul 21, 2017, at 4:13 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> Shouldn't your last value in match.start.updated = 710, i.e. 600 + 60 + 50 ??
>>
>> If not, you will need to explain yourself more clearly (for me, anyway).
>>
>> Cheers,
>> Bert
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Fri, Jul 21, 2017 at 12:22 PM, Stephen HonKit Wong
>> <stephen66 at gmail.com> wrote:
>>> Hello,
>>>
>>> I have the following data frame with many rows.
>>> data.frame(match.start=c(5,10,100,200),range.coordinates=c("1000-1050","1500-1555","5000-5050,6000-6180","100-150,200-260,600-900"))
>>>
>>> match.start  range.coordinates
>>> 5            1000-1050
>>> 10           1500-1555
>>> 100          5000-5050,6000-6180
>>> 200          100-150,200-260,600-900
>>>
>>> For each row, I want to test whether the value in column "match.start"
>>> (e.g. 100 in the 3rd row) is less than the accumulated range (e.g. for
>>> 5000-5050,6000-6180 the accumulated ranges are 50 and 230), and if so
>>> update the match start as 6000 + (100 - 50) = 6050. The result goes in a
>>> third column.
>>>
>>> match.start  range.coordinates        match.start.updated
>>> 5            1000-1050                1005
>>> 10           1500-1555                1510
>>> 100          5000-5050,6000-6180      6050
>>> 200          100-150,200-260,600-900  690
>>>
>>> Many thanks.
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.