[R] Quickest way to match two vectors besides %in%?
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Nov 8 21:00:16 CET 2005
On 11/8/2005 2:28 PM, Pete Cap wrote:
> Hello list,
>
> I have two data frames, X (48469,2) and Y (79771,5).
>
> X[,1] contains distinct values of Y[,2].
> I want to match values in X[,1] and Y[,2], then take
> the corresponding value in X[,2] and place it in
> Y[,4].
>
> So far I have been doing it like so:
> for(i in 1:48469) {
>   Y[which(X[i,1]==Y[,2]),4] <- X[i,2]
> }
>
> But it chunks along so very slowly that I can't help
> but wonder if there's a faster way, mainly because on
> my box it takes R about 30 seconds to simply COUNT to
> 48,469 in the for loop.
>
> I have already tried using %in%. It tells me if the
> values in X[,1] are IN Y[,2], which is useful in
> removing unnecessary values from X[,1]. But it does
> not tell me exactly where they match. which(X[,1]
> %in% Y[,2]) does but it only matches on the first
> instance.
>
> This is the slowest part of the script I'm working
> on--if I could improve it I could shave off some
> serious operating time. Any pointers?
Look at the merge() function. Use it to combine the X and Y columns into
a new data frame, then process that to copy the X[,2] values over the
Y[,4] values wherever a match was found. It will be something like
    # In Z, column 1 is the key, column 2 is the old X[,2],
    # and column 5 is the old Y[,4].
    Z <- merge(X, Y, by.x=1, by.y=2, all.y=TRUE)
    changes <- !is.na(Z[,2])
    Z[changes,5] <- Z[changes,2]
but you are almost certainly better off (from a maintenance point of
view) using the names of the columns rather than guessing at column
numbers.
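As a rough illustration of the named-column advice, here is a minimal
sketch on toy data. The column names (key, value, id, flag, target, note)
are hypothetical stand-ins, since the original post does not give the real
ones. The second half shows a different technique, match(), which is the
function %in% is built on and which returns the position of each match
rather than just TRUE/FALSE. It assumes, as the post states, that the key
values in X are distinct. Whether it is faster than merge() on the real
data is not measured here.

```r
# Toy stand-ins for the poster's data frames; all column names are
# hypothetical and chosen only for illustration.
X <- data.frame(key = c("a", "b", "c"),
                value = c(10, 20, 30),
                stringsAsFactors = FALSE)
Y <- data.frame(id = 1:5,
                key = c("b", "a", "z", "b", "c"),
                flag = TRUE,
                target = NA_real_,
                note = "",
                stringsAsFactors = FALSE)

# Approach 1: merge() by column name, keeping every row of Y.
# Rows of Y with no partner in X get NA in the 'value' column.
Z <- merge(X, Y, by = "key", all.y = TRUE)
changes <- !is.na(Z$value)
Z$target[changes] <- Z$value[changes]

# Approach 2: match() returns, for each row of Y, the row of X holding
# the same key (NA where there is none), so Y can be updated in place
# without reordering its rows.
idx <- match(Y$key, X$key)
hit <- !is.na(idx)
Y$target[hit] <- X$value[idx[hit]]
```

Note that merge() returns its rows sorted by the key, so Z is in a
different row order from Y, whereas the match() version leaves Y's row
order untouched.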
Duncan Murdoch