[R] a difficult situation, how to do this using base function.

Bert Gunter bgunter.4567 at gmail.com
Sat Jul 22 02:41:17 CEST 2017


1. Please always reply to the list, especially here so that others can
see your clarification.

2. What happens if your match.start value exceeds all the cumulative
sums?? -- you seem to imply that this cannot happen.
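
For what it's worth, a quick check for point 2 might look like the sketch below. It assumes the same two columns as in your example; the names total and too.big are mine, purely for illustration.

```r
## Sketch only: flag rows whose match.start is not below the total range width.
df <- data.frame(match.start = c(5, 10, 100, 200),
                 range.coordinates = c("1000-1050", "1500-1555",
                                       "5000-5050,6000-6180",
                                       "100-150,200-260,600-900"),
                 stringsAsFactors = FALSE)

## total width of all ranges in each row (illustrative names, not from your post)
total <- sapply(strsplit(df$range.coordinates, ",", fixed = TRUE), function(r) {
   ## one row per range: column 1 = lower end, column 2 = upper end
   ends <- do.call(rbind, lapply(strsplit(r, "-", fixed = TRUE), as.numeric))
   sum(ends[, 2] - ends[, 1])
})

too.big <- df$match.start >= total   ## TRUE where no range can hold the value
```

For your four rows too.big is FALSE everywhere, so the question does not arise in the example, but it may in your real data.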

Your minimal example was very helpful, essential even, although it was
a little confusing (to me) and posted in HTML, which can get mangled on
this plain text list (though seemingly not here). Here is a solution
that seems to work:

WARNING: There are a zillion ways that one might do this. Mine may be
far from the most efficient or the most elegant or the most clear. I
hope it is understandable.

The chief task here is to parse your second column so that it is
numeric and your logic can be applied to it. Because its format is so
regular, I chose to do this by converting the dashes to commas and
then using strsplit() to split each string into a character vector of
numbers, which can then be converted to numeric. Like this:


df <-data.frame(match.start=c(5,10,100,200),range.coordinates=c("1000-1050","1500-1555","5000-5050,6000-6180","100-150,200-260,600-900"))

## Note the following to convert the default factor to a character
## vector. This is essential!
df[,2]<- as.character(df[,2])

numex <-gsub("-",",",df[,2],fixed=TRUE) ## convert dashes

## convert to a list of numeric vectors
numex <-lapply(strsplit(numex,",",fixed = TRUE),as.numeric)

## Here's what you get:

> numex
[[1]]
[1] 1000 1050

[[2]]
[1] 1500 1555

[[3]]
[1] 5000 5050 6000 6180

[[4]]
[1] 100 150 200 260 600 900

Because of the fixed format, we know that the even numbered indices in
each vector are the upper values of the ranges, and the odd indices
are the lower values. I just break these out in a convenient form: a
2 column matrix, the first column giving the lower value of each range
and the second the cumulative sum of the range widths (upper - lower):

> numex <- lapply(numex,function(x){
+    i <- seq_along(x)
+    odds <- i %% 2 == 1
+    evens <- i %% 2 == 0
+    cbind(x[odds],cumsum(x[evens] - x[odds]))
+ })

## Giving:

> numex
[[1]]
     [,1] [,2]
[1,] 1000   50

[[2]]
     [,1] [,2]
[1,] 1500   55

[[3]]
     [,1] [,2]
[1,] 5000   50
[2,] 6000  230

[[4]]
     [,1] [,2]
[1,]  100   50
[2,]  200  110
[3,]  600  410
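
As a check on the arithmetic for the fourth row (match.start = 200), here is the calculation written out by hand. This is only an illustration; the names lower, cumw, k and res are mine.

```r
## Fourth row: ranges 100-150, 200-260, 600-900 and match.start = 200
lower <- c(100, 200, 600)          ## lower ends of the ranges
cumw  <- cumsum(c(50, 60, 300))    ## cumulative widths: 50 110 410
val   <- 200

k   <- which(val < cumw)[1]             ## first range that can hold val: the 3rd
res <- lower[k] + (val - cumw[k - 1])   ## 600 + (200 - 110) = 690
```

So the first two ranges use up 50 + 60 = 110 of the 200, and the remaining 90 is counted from 600.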

Now I just apply your logic row by row (i.e. index by index) to get
the desired column:

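df$updated <- sapply( seq_len(nrow(df)),function(i){
   test <- numex[[i]]
   val <- df[i,1]
   if(nrow(test) ==1 ) test[1,1]+val
   else {
      wm <- which(val < test[,2])[1]
      ## width used up by earlier ranges; 0 when val falls in the first range
      prev <- if(!is.na(wm) && wm > 1) test[wm-1,2] else 0
      test[wm,1]+val - prev  ## NA if val exceeds all the cumulative sums
   }
})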

> df$updated
[1] 1005 1510 6050  690
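
If you want all of the above in one place, something like the following sketch might do. The function name updateStart is made up for this example, and returning NA when match.start is not below the total width is my assumption about what you would want in that case.

```r
## Sketch only: updateStart() is a made-up name, not an existing function.
updateStart <- function(val, coords) {
   ## "100-150,200-260" -> c(100, 150, 200, 260)
   x <- as.numeric(strsplit(gsub("-", ",", coords, fixed = TRUE),
                            ",", fixed = TRUE)[[1]])
   odds  <- seq_along(x) %% 2 == 1
   lower <- x[odds]                      ## lower end of each range
   cumw  <- cumsum(x[!odds] - lower)     ## cumulative widths
   k <- which(val < cumw)[1]             ## first range that can hold val
   if (is.na(k)) return(NA_real_)        ## assumption: NA when val is too big
   lower[k] + val - c(0, cumw)[k]        ## subtract width used by earlier ranges
}

df <- data.frame(match.start = c(5, 10, 100, 200),
                 range.coordinates = c("1000-1050", "1500-1555",
                                       "5000-5050,6000-6180",
                                       "100-150,200-260,600-900"),
                 stringsAsFactors = FALSE)

## apply row by row: one match.start with its range string at a time
df$updated <- mapply(updateStart, df$match.start, df$range.coordinates)
```

Padding the cumulative widths with a leading 0 avoids a special case for the first range, and updateStart(500, "100-150,200-260,600-900") gives NA because 500 is not below the total width of 410.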


Cheers,
Bert




On Fri, Jul 21, 2017 at 4:24 PM, Honkit Wong <stephen66 at gmail.com> wrote:
> Sorry for the confusion. It was right; it should be: 600+(200-60-50)=690.
> 60 and 50 are the widths of the previous two ranges. Thanks! Any clue?
>
> Stephen (Hon-Kit) Wong
>
>> On Jul 21, 2017, at 4:13 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> Shouldn't your last value in match.start.updated = 710, i.e. 600 + 60 + 50  ??
>>
>> If not, you will need to explain yourself more clearly (for me, anyway).
>>
>> Cheers,
>> Bert
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Fri, Jul 21, 2017 at 12:22 PM, Stephen HonKit Wong
>> <stephen66 at gmail.com> wrote:
>>> Hello,
>>>
>>> I have the following data frame with many rows.
>>> data.frame(match.start=c(5,10,100,200),range.coordinates=c("1000-1050","1500-1555","5000-5050,6000-6180","100-150,200-260,600-900"))
>>>
>>> match.start   range.coordinates
>>>           5   1000-1050
>>>          10   1500-1555
>>>         100   5000-5050,6000-6180
>>>         200   100-150,200-260,600-900
>>>
>>> I want to test for each row element in column "match.start" (e.g. 100 on
>>> 3rd row) if it is less than the accumulated range (e.g. for 5000-5050,
>>> 6000-6180, the accumulated range is: 50, 230), then update the match start
>>> as 6000+ (100-50) = 6050. The result is put on third column.
>>>
>>> match.start   range.coordinates         match.start.updated
>>>           5   1000-1050                                1005
>>>          10   1500-1555                                1510
>>>         100   5000-5050,6000-6180                      6050
>>>         200   100-150,200-260,600-900                   690
>>>
>>> Many thanks.


