[R] Smart Indexing

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Mon Aug 9 11:07:18 CEST 2010


I think you just need merge(), e.g.

a <- data.frame(id = rep(1:3, each=3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

merge(a, b, by = "id")
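
Two defaults of merge() are worth keeping in mind. The result is sorted on the 'by' column, and rows of 'a' whose id has no partner in 'b' are dropped unless you ask for all.x = TRUE. A minimal sketch with the toy data from above (the extra data frame 'a2' and the id 4 are made up for illustration only):

```r
set.seed(1)  # only to make the toy data reproducible
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)

# hypothetical extra row whose id does not occur in 'b'
a2 <- rbind(a, data.frame(id = 4L, val = 0))

# default: only ids present in both data frames survive
inner <- merge(a2, b, by = "id")

# all.x = TRUE keeps every row of 'a2'; set1/set2 are NA for id 4
left <- merge(a2, b, by = "id", all.x = TRUE)

nrow(inner)
nrow(left)
```

If the original row order of 'a' matters, it is probably safer to carry an explicit row counter through the merge and reorder on it afterwards.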


I hope it helps.

Best,
Dimitris


On 8/9/2010 11:01 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote:
> Hi all,
>
> Suppose that I have two data frames, a and b say, both containing a column
> 'id'. While data frame 'a' contains multiple rows sharing the same id,
> data frame 'b' contains just one entry per id (i.e. a 1 to n
> relationship). For ease of modeling I now want to generate a new
> data frame c, which is basically a copy of data frame 'a' augmented by
> the values of b. If I have
>
> a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
> b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)
>
> the resulting data frame should look like:
>
> c <- data.frame(id = rep(1:3, each = 3), val = a$val,
>                 set1 = rep(LETTERS[1:3], each = 3), set2 = rep(5:7, each = 3))
>
> While this task is just an application of some 'rep's and 'c's for
> structured data frames, it is somewhat cumbersome (and error prone) to
> construct 'c' explicitly for less structured data. Thus, I was thinking
> of making use of R's smart indexing possibilities to generate an index
> vector, i.e.:
>
> ind <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
> c.prime <- cbind(a, b[ind, -1])
> rownames(c.prime) <- NULL
> all.equal(c.prime, c)  # TRUE
>
> The way I generate the index vector ind at the moment is
>
> tmp <- seq_along(b$id)
> names(tmp) <- b$id
> ind <- tmp[a$id]
>
> However, I think that there should be a smarter way of doing that
> without the need to define a temporary variable. Some combination of
> match, which, %in% maybe? Any hints?
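
If you do want the index vector itself, match() returns it directly, without a temporary variable. Note also that tmp[a$id] indexes by position, not by name, because a$id is numeric; it gives the right answer here only because the ids happen to be 1, 2, 3. A small sketch, using made-up ids (10, 20, 30, in a different order in 'b') to make that visible:

```r
# hypothetical ids that are not simply 1, 2, 3
a <- data.frame(id = rep(c(10L, 20L, 30L), each = 3), val = rnorm(9))
b <- data.frame(id = c(30L, 10L, 20L), set1 = LETTERS[1:3], set2 = 5:7)

# for each a$id, the row of 'b' holding that id (NA if there is none)
ind <- match(a$id, b$id)

# augment 'a' with all columns of 'b' except the id
c.prime <- cbind(a, b[ind, -1])
```

The order of rows in 'a' is preserved with this approach.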
>
> While writing these lines, I think
>
> ind <- pmatch(a$id, b$id, duplicates.ok = TRUE)
>
> could do the job? Or do I run into trouble regarding the "partial
> matching" involved in pmatch?
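
I would expect pmatch() to be risky for ids: it converts its arguments to character strings and accepts a unique partial match when there is no exact one (the full name of the argument controlling repeated matches is duplicates.ok). A small illustration with made-up values:

```r
# "1" is a prefix of "10" and of nothing else, so pmatch() reports a hit
p <- pmatch(1, c(10, 2))

# match() only accepts exact matches, so the same lookup fails
m <- match(1, c(10, 2))

p  # position 1
m  # NA
```

So an id that is absent from 'b' could silently be paired with a longer id that starts with the same digits; match() does not have that problem.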
>
> BTW, is there a way to prevent R from assigning [row|col]names? In the
> example above I had to remove the rownames generated by cbind
> explicitly, is there a one-liner?
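
For the row names, one option (a sketch, not the only way) is to build the result with data.frame() and pass row.names = NULL, which gives the plain sequence 1, 2, ... instead of the names inherited from b[ind, ]:

```r
a <- data.frame(id = rep(1:3, each = 3), val = rnorm(9))
b <- data.frame(id = 1:3, set1 = LETTERS[1:3], set2 = 5:7)
ind <- match(a$id, b$id)

# row.names = NULL stops data.frame() from taking row names from its inputs
c.prime <- data.frame(a, b[ind, -1], row.names = NULL)

rownames(c.prime)
```

The result of merge() also comes back with plain sequential row names, so no clean-up is needed there.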
>
> Thanks for your input + BR
>
> Thorn
>

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014


