[R] merging and obtaining the nearest value
William Dunlap
wdunlap at tibco.com
Sun Aug 19 22:49:15 CEST 2012
The following, f2(A,B), should do well with lots of rows in A and B
as long as the number of types is not huge.
f2 <- function(A, B) {
types <- as.character(unique(A$TYPE))
result <- numeric(nrow(A))
Bs <- split(B$Special_Date, B$TYPE)
for(type in types) {
w <- A$TYPE == type
# can omit the sort() below if you know that B$Special_Date is sorted.
result[w] <- closestValue(A$DATE[w], sort(Bs[[type]]))
}
A$Difference <- A$DATE - result
A
}
closestValue <- function (x, vec)
{
# for each value in x, find closest value in vec.
# Break ties by using highest.
# Assume vec is sorted.
intervalNo <- findInterval(x, vec)
lowerValue <- vec[pmax(1, intervalNo)]
upperValue <- vec[pmin(length(vec), intervalNo+1)]
ifelse(x - lowerValue < upperValue - x, lowerValue, upperValue)
}
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of William Dunlap
> Sent: Sunday, August 19, 2012 9:43 AM
> To: Francesco; r-help at r-project.org
> Subject: Re: [R] merging and obtaining the nearest value
>
> How many different types are there? Just a handful or many thousands?
> For this sort of problem it is often handy to write a function which generates
> datasets of the sort you are thinking of but parameterized by the
> number of rows, levels, etc., so you can see how the execution time
> varies with these things.
>
> If there are just a few types, try looping over types and using findInterval
> to see where A$Date fits into the sequence of B$Special_Date.
>
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> > Of Francesco
> > Sent: Sunday, August 19, 2012 4:01 AM
> > To: r-help at r-project.org
> > Subject: Re: [R] merging and obtaining the nearest value
> >
> > Dear Riu, Many thanks for your suggestion
> >
> > However these are just simplified examples... in reality the dataset A
> > contains millions of observations and B several thousands of rows...
> > Could I still use a modified form of your suggestion?
> >
> > Thanks
> >
> > On 19 August 2012 12:51, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> > > Hello,
> > >
> > > Try the following.
> > >
> > >
> > > A <- read.table(text="
> > >
> > > TYPE DATE
> > > A 2
> > > A 5
> > > A 20
> > > B 10
> > > B 2
> > > ", header = TRUE)
> > >
> > >
> > > B <- read.table(text="
> > >
> > > TYPE Special_Date
> > > A 2
> > > A 6
> > > A 20
> > > A 22
> > > B 5
> > > B 6
> > > ", header = TRUE)
> > >
> > > result <- do.call( rbind, lapply(split(merge(A, B), list(m$DATE, m$TYPE)),
> > > function(x){
> > > a <- abs(x$DATE - x$Special_Date)
> > > if(nrow(x)) x[which(min(a) == a), ] }) )
> > > result$Difference <- result$DATE - result$Special_Date
> > > result$Special_Date <- NULL
> > > rownames(result) <- seq_len(nrow(result))
> > > result
> > >
> > >
> > > Also, it's a good practice to post data examples using dput(). For instance,
> > >
> > > dput(A)
> > > structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
> > > "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names = c("TYPE",
> > > "DATE"), class = "data.frame", row.names = c(NA, -5L))
> > >
> > > Now all we have to do is run the statement A <- structure(... etc...) to
> > > have an exact copy of the data example.
> > > Anyway, your example with input and the wanted result was very welcome.
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > Em 19-08-2012 11:10, Francesco escreveu:
> > >>
> > >> Dear R-help
> > >>
> > >> Î would like to know if there is a short solution in R for this
> > >> merging problem...
> > >>
> > >> Let say I have a dataset A as:
> > >>
> > >> TYPE DATE
> > >> A 2
> > >> A 5
> > >> A 20
> > >> B 10
> > >> B 2
> > >>
> > >> (there can be duplicates for the same type and date)
> > >>
> > >> and I have another dataset B as :
> > >>
> > >> TYPE Special_Date
> > >> A 2
> > >> A 6
> > >> A 20
> > >> A 22
> > >> B 5
> > >> B 6
> > >>
> > >> The question is : I would like to obtain the difference between the
> > >> date of each observation in A and the closest special date in B with
> > >> the same type. In case of ties I would take the latest date of the
> > >> two.
> > >>
> > >> For example I would obtain here
> > >>
> > >> TYPE DATE Difference
> > >> A 2 0=2-2
> > >> A 5 -1=5-6
> > >> A 20 0=20-20
> > >> B 10 +4=10-6
> > >> B 2 -3=2-5
> > >>
> > >> Do you know how to (simply?) obtain this in R?
> > >>
> > >> Many thanks!
> > >> Best Regards
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list