[R] matching vectors against vectors

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Thu Mar 31 14:36:25 CEST 2005


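You can use merge(), but you first need to define a common key. This
can be the row names in the case of a matrix or the names in the case
of a vector. With by=0, merge() joins on the row names, and all=TRUE
keeps keys that occur in only one of the two inputs. The y column in
the output below depends on the random draw from sample().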

v1 <- 1:10
names(v1) <- LETTERS[1:10]

v2 <- 101:105
names(v2) <- sample( LETTERS[1:10], 5 )

> merge( v1, v2, by=0, all=TRUE )
   Row.names  x   y
1          A  1  NA
2          B  2 102
3          C  3 104
4          D  4 103
5          E  5 105
6          F  6  NA
7          G  7  NA
8          H  8 101
9          I  9  NA
10         J 10  NA
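
For several data sets, as in the original question, the same merge()
idea might be sketched as follows. This assumes each data set is held
in a data frame with an identifier column; the names ids, setA and
setB, the column name id, and the values in setB are all made up for
illustration.

```r
# All identifiers that should appear as rows of the final table
ids <- data.frame(id = 10:20)

# Data set A uses the numbers from the question;
# data set B is invented purely for illustration
setA <- data.frame(id = c(12, 13, 15, 16, 17, 18),
                   A  = c(112, 113, 115, 116, 117, 118))
setB <- data.frame(id = c(10, 13, 20),
                   B  = c(101, 104, 120))

# Merge each data set onto the full identifier list in turn.
# all.x = TRUE keeps every identifier; identifiers missing from a
# data set get NA in that data set's column.
tab <- Reduce(function(x, y) merge(x, y, by = "id", all.x = TRUE),
              list(setA, setB), ids)
print(tab)
```

Starting from the full identifier list means the table always has one
row per identifier, whichever data sets happen to contain it.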

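If only one column is being filled at a time, match() is a possible
alternative to merge() that avoids both loops and repeated which()
calls. A minimal sketch using the numbers from the question; the
variable names ids, ids1 and values1 are my own, chosen so that they
do not clash with the rep() function.

```r
# Numbers from the question, under illustrative variable names
ids     <- 10:20                              # all identifiers
ids1    <- c(12, 13, 15, 16, 17, 18)          # identifiers in this data set
values1 <- c(112, 113, 115, 116, 117, 118)    # measurements in this data set

# match() gives, for each element of ids, its position in ids1,
# or NA if it is absent; indexing with NA then yields NA
res <- values1[match(ids, ids1)]

print(data.frame(rep = ids, res = res))
```
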

Regards, Adai



On Tue, 2005-03-29 at 22:47 +0200, Piet van Remortel wrote:
> Hi all.
> 
> I have a recurring typical problem that I don't know how to solve 
> efficiently.
> 
> The situation is the following:   I have a number of data-sets 
> (A,B,C,...) , consisting of an identifier (e.g. 11,12,13,...,20) and a 
> measurement (e.g. in the range 100-120).   I want to compile a large 
> table, with all available identifiers in all data-sets in the rows, and 
> a column for every dataset.
> 
> Now, not all datasets have a measurement for every identifier, so I 
> want NA if the set does not contain the identifier.
> 
> an example for a single dataset:
> 
> #all identifiers
>  > rep <- c(10:20)
> 
> #Identifiers in my dataset (a subset of rep)
>  > rep1 <- c(12,13,15,16,17,18)
> 
> #measurements in this dataset
>  > rep1.r <- c(112,113,115,116,117,118)
> 
> #a vector which should become a column in the final table, now 
> #containing all NAs (one per identifier, so 11 entries for 10:20)
>  > res <- rep(NA,11)
> 
> #the IDs and values of my dataset together
>  > data <- cbind(rep1, rep1.r)
> 
> data looks like this:
>       rep1 rep1.r
> [1,]   12    112
> [2,]   13    113
> [3,]   15    115
> [4,]   16    116
> [5,]   17    117
> [6,]   18    118
> 
> Now, I want to put the values 112, 113, 115,... in the correct rows of 
> the final table, using the identifiers as an indicator of which row to 
> put it in, so that I finally obtain:
> 
> rep     res
> 10    NA
> 11    NA
> 12    112
> 13    113
> 14    NA
> 15    115
> 16    116
> 17    117
> 18    118
> 19    NA
> 20    NA
> 
> I try to avoid repeating 'which' a lot and filling in every 
> identifier's observation etc, since I will be doing this for thousands 
> of rows at once.    There must be an efficient way using factors, 
> tapply etc, but I have trouble finding it.  Ideal would be if this 
> could be done in one go, instead of looping.
> 
> Any suggestions ?
> 
> Thanks,
> 
> Piet
> 



