[R] merging and obtaining the nearest value
Francesco
cariboupad at gmx.fr
Sun Aug 19 15:14:47 CEST 2012
Thank you very much Rui
On 19 August 2012 13:49, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Yes, you can. If you run into memory problems, say so and we'll deal with it then.
> In the meantime, there's something you should change so that, when there are
> several minima, only one row is returned per combination of TYPE and DATE.
>
> Replace this
>
> x[which(min(a) == a), ]
>
> by this
>
> x[which.min(a), ]
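
The difference between the two forms only shows up when distances tie. A minimal sketch, assuming a made-up group `x` with one DATE and two equally distant special dates (the data and the `order()` variant are illustrations, not part of the posted solution):

```r
# Hypothetical group: DATE 4 is equally far from special dates 2 and 6
x <- data.frame(TYPE = "A", DATE = 4, Special_Date = c(2, 6))
a <- abs(x$DATE - x$Special_Date)

all_ties  <- x[which(min(a) == a), ]  # every row that attains the minimum
first_tie <- x[which.min(a), ]        # only the first such row

# Smallest distance first, then the latest special date among ties
latest_tie <- x[order(a, -x$Special_Date)[1], ]
```

Note that `which.min()` keeps whichever tied row comes first, so the `order()` form may be closer to the original request to take the latest date in case of ties.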
>
> Rui Barradas
>
> Em 19-08-2012 12:00, Francesco escreveu:
>
>> Dear Rui, many thanks for your suggestion.
>>
>> However these are just simplified examples... in reality the dataset A
>> contains millions of observations and B several thousands of rows...
>> Could I still use a modified form of your suggestion?
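
For data of that size, one possible alternative avoids building the full merge of A and B. The following is only a sketch under assumptions: the helper name `nearest_diff` is hypothetical, it uses `findInterval()` on the sorted special dates of each type, and it resolves ties towards the later special date.

```r
# Hypothetical helper: difference between each DATE in A and the nearest
# Special_Date in B of the same TYPE, without merging A and B.
nearest_diff <- function(A, B) {
  out <- rep(NA_real_, nrow(A))
  for (tp in unique(as.character(A$TYPE))) {
    idx <- which(A$TYPE == tp)
    s <- sort(unique(B$Special_Date[B$TYPE == tp]))
    if (!length(s)) next                   # no special date for this type
    d <- A$DATE[idx]
    lo <- findInterval(d, s)               # position of the largest s <= d (0 if none)
    below <- s[pmax(lo, 1L)]               # nearest candidate at or below d
    above <- s[pmin(lo + 1L, length(s))]   # nearest candidate above d
    use_above <- abs(above - d) <= abs(d - below)  # ties go to the later date
    out[idx] <- d - ifelse(use_above, above, below)
  }
  out
}

# The example data from this thread
A <- data.frame(TYPE = c("A", "A", "A", "B", "B"), DATE = c(2, 5, 20, 10, 2))
B <- data.frame(TYPE = c("A", "A", "A", "A", "B", "B"),
                Special_Date = c(2, 6, 20, 22, 5, 6))
A$Difference <- nearest_diff(A, B)
```

The loop runs once per type, and each lookup is a binary search, so memory use stays proportional to the size of A rather than to the size of the merged table.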
>>
>> Thanks
>>
>> On 19 August 2012 12:51, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> Try the following.
>>>
>>>
>>> A <- read.table(text="
>>>
>>> TYPE DATE
>>> A 2
>>> A 5
>>> A 20
>>> B 10
>>> B 2
>>> ", header = TRUE)
>>>
>>>
>>> B <- read.table(text="
>>>
>>> TYPE Special_Date
>>> A 2
>>> A 6
>>> A 20
>>> A 22
>>> B 5
>>> B 6
>>> ", header = TRUE)
>>>
>>> m <- merge(A, B)   # all (DATE, Special_Date) pairs within each TYPE
>>> result <- do.call(rbind, lapply(split(m, list(m$DATE, m$TYPE)),
>>> function(x){
>>> a <- abs(x$DATE - x$Special_Date)
>>> if(nrow(x)) x[which(min(a) == a), ] }) )
>>> result$Difference <- result$DATE - result$Special_Date
>>> result$Special_Date <- NULL
>>> rownames(result) <- seq_len(nrow(result))
>>> result
>>>
>>>
>>> Also, it's good practice to post data examples using dput(). For
>>> instance,
>>>
>>> dput(A)
>>> structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
>>> "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names =
>>> c("TYPE",
>>> "DATE"), class = "data.frame", row.names = c(NA, -5L))
>>>
>>> Now all we have to do is run the statement A <- structure(... etc...) to
>>> have an exact copy of the data example.
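
The round trip can be checked directly. A small sketch, assuming the same example data; `capture.output()` stands in for copying the printed text into an email and `eval(parse())` for the reader running it:

```r
A <- data.frame(TYPE = c("A", "A", "A", "B", "B"),
                DATE = c(2L, 5L, 20L, 10L, 2L))

txt <- capture.output(dput(A))      # the text one would paste into a message
A_copy <- eval(parse(text = txt))   # what a reader gets by running that text

same <- isTRUE(all.equal(A, A_copy))
```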
>>> Anyway, your example with input and the wanted result was very welcome.
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> Em 19-08-2012 11:10, Francesco escreveu:
>>>>
>>>> Dear R-help
>>>>
>>>> I would like to know if there is a short solution in R for this
>>>> merging problem...
>>>>
>>>> Let's say I have a dataset A as:
>>>>
>>>> TYPE DATE
>>>> A 2
>>>> A 5
>>>> A 20
>>>> B 10
>>>> B 2
>>>>
>>>> (there can be duplicates for the same type and date)
>>>>
>>>> and I have another dataset B as :
>>>>
>>>> TYPE Special_Date
>>>> A 2
>>>> A 6
>>>> A 20
>>>> A 22
>>>> B 5
>>>> B 6
>>>>
>>>> The question is : I would like to obtain the difference between the
>>>> date of each observation in A and the closest special date in B with
>>>> the same type. In case of ties I would take the latest date of the
>>>> two.
>>>>
>>>> For example I would obtain here
>>>>
>>>> TYPE DATE Difference
>>>> A 2 0=2-2
>>>> A 5 -1=5-6
>>>> A 20 0=20-20
>>>> B 10 +4=10-6
>>>> B 2 -3=2-5
>>>>
>>>> Do you know how to (simply?) obtain this in R?
>>>>
>>>> Many thanks!
>>>> Best Regards
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
>