[R] merging and obtaining the nearest value

Francesco cariboupad at gmx.fr
Sun Aug 19 15:14:47 CEST 2012


Thank you very much Rui

On 19 August 2012 13:49, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Yes, you can. If you run into memory problems, say so and we'll deal with
> it then. In the meantime, there's something you should change so that, when
> there are several minima, only one row is returned per combination of TYPE
> and DATE.
>
> Replace this
>
> x[which(min(a) == a), ]
>
> by this
>
> x[which.min(a), ]
>
> Rui Barradas
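
A note on the replacement above: which.min(a) returns the index of the first minimum only, so on a tie it keeps whichever row happens to come first, which is not necessarily the latest special date asked for in the original question. A minimal sketch of one way to make the tie rule explicit follows; the helper name nearest_latest is hypothetical and not part of the thread.

```r
# Hypothetical helper (not from the thread): return the special date closest
# to `date`, preferring the later one when two are equally close.
nearest_latest <- function(date, special) {
  a <- abs(date - special)
  ties <- special[a == min(a)]  # every special date at the minimum distance
  max(ties)                     # keep the latest of the tied dates
}

special <- c(2, 6, 20, 22)
nearest_latest(5, special)      # 6 is the only closest date
nearest_latest(4, special)      # 2 and 6 are tied, so 6 is returned
which.min(abs(4 - special))     # 1: the index of the first minimum only
```

The difference itself is then date - nearest_latest(date, special).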
>
> On 19-08-2012 12:00, Francesco wrote:
>
>> Dear Rui, many thanks for your suggestion.
>>
>> However these are just simplified examples... in reality the dataset A
>> contains millions of observations and B several thousands of rows...
>> Could I still use a modified form of your suggestion?
>>
>> Thanks
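
For data of the size described above (millions of rows in A, thousands in B), one possible alternative, which is not proposed anywhere in this thread, is to skip the merge and look up the nearest special date per TYPE with findInterval() on the sorted special dates. The sketch below assumes numeric dates and at least one special date per type; the helper name nearest_special is hypothetical, and types in A with no match in B are left as NA.

```r
# Hypothetical helper: for each value in `date`, return the nearest value in
# `special`, preferring the later one on ties. Assumes `special` is non-empty.
nearest_special <- function(date, special) {
  s  <- sort(unique(special))
  i  <- findInterval(date, s)        # s[i] <= date < s[i + 1]; i can be 0
  lo <- s[pmax(i, 1L)]               # nearest candidate at or below `date`
  hi <- s[pmin(i + 1L, length(s))]   # nearest candidate above `date`
  ifelse(abs(hi - date) <= abs(date - lo), hi, lo)
}

A <- data.frame(TYPE = c("A", "A", "A", "B", "B"),
                DATE = c(2, 5, 20, 10, 2), stringsAsFactors = FALSE)
B <- data.frame(TYPE = c("A", "A", "A", "A", "B", "B"),
                Special_Date = c(2, 6, 20, 22, 5, 6), stringsAsFactors = FALSE)

special_by_type <- split(B$Special_Date, B$TYPE)
A$Difference <- NA_real_
for (tp in intersect(unique(A$TYPE), names(special_by_type))) {
  rows <- A$TYPE == tp
  A$Difference[rows] <- A$DATE[rows] -
    nearest_special(A$DATE[rows], special_by_type[[tp]])
}
A
```

Because no TYPE-by-TYPE cross join is built, the working memory stays proportional to the size of A, and duplicated rows in A are kept as they are.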
>>
>> On 19 August 2012 12:51, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> Try the following.
>>>
>>>
>>> A <- read.table(text="
>>>
>>> TYPE   DATE
>>> A            2
>>> A            5
>>> A            20
>>> B            10
>>> B            2
>>> ", header = TRUE)
>>>
>>>
>>> B <- read.table(text="
>>>
>>> TYPE  Special_Date
>>> A              2
>>> A              6
>>> A              20
>>> A              22
>>> B              5
>>> B              6
>>> ", header = TRUE)
>>>
>>> m <- merge(A, B)  # join on the shared column TYPE
>>> result <- do.call(rbind, lapply(split(m, list(m$DATE, m$TYPE)),
>>> function(x){
>>>          a <- abs(x$DATE - x$Special_Date)
>>>          if(nrow(x)) x[which(min(a) == a), ] }) )
>>> result$Difference <- result$DATE - result$Special_Date
>>> result$Special_Date <- NULL
>>> rownames(result) <- seq_len(nrow(result))
>>> result
>>>
>>>
>>> Also, it's a good practice to post data examples using dput(). For
>>> instance,
>>>
>>> dput(A)
>>> structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
>>> "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names =
>>> c("TYPE",
>>> "DATE"), class = "data.frame", row.names = c(NA, -5L))
>>>
>>> Now all we have to do is run the statement A <- structure(... etc...) to
>>> have an exact copy of the data example.
>>> Anyway, your example with input and the wanted result was very welcome.
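
The dput() advice above can be checked with a small round trip. This is only an illustrative sketch: the variable names txt and A2 are made up here, and eval(parse()) stands in for a reader pasting the posted text into their own session.

```r
A <- data.frame(TYPE = c("A", "A", "A", "B", "B"),
                DATE = c(2L, 5L, 20L, 10L, 2L))

# The text one would paste into a message
txt <- paste(capture.output(dput(A)), collapse = "\n")

# What a reader gets by running that text
A2 <- eval(parse(text = txt))

all.equal(A, A2)  # compares values, column types and names
```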
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> On 19-08-2012 11:10, Francesco wrote:
>>>>
>>>> Dear R-help
>>>>
>>>> I would like to know if there is a short solution in R for this
>>>> merging problem...
>>>>
>>>> Let's say I have a dataset A as:
>>>>
>>>> TYPE   DATE
>>>> A            2
>>>> A            5
>>>> A            20
>>>> B            10
>>>> B            2
>>>>
>>>> (there can be duplicates for the same type and date)
>>>>
>>>> and I have another dataset B as :
>>>>
>>>> TYPE  Special_Date
>>>> A              2
>>>> A              6
>>>> A              20
>>>> A              22
>>>> B              5
>>>> B              6
>>>>
>>>> The question is: I would like to obtain the difference between the
>>>> date of each observation in A and the closest special date in B with
>>>> the same type. In case of ties I would take the later of the two
>>>> dates.
>>>>
>>>> For example I would obtain here
>>>>
>>>> TYPE   DATE   Difference
>>>> A            2            0=2-2
>>>> A            5            -1=5-6
>>>> A            20            0=20-20
>>>> B            10           +4=10-6
>>>> B            2             -3=2-5
>>>>
>>>> Do you know how to (simply?) obtain this in R?
>>>>
>>>> Many thanks!
>>>> Best Regards
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
>



