[R] Restructure some data

William Dunlap wdunlap at tibco.com
Thu Feb 25 21:59:58 CET 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Doran, Harold
> Sent: Thursday, February 25, 2010 10:35 AM
> To: r-help at r-project.org
> Subject: [R] Restructure some data
> 
> Suppose I have a data frame like "dat" below. For some 
> context, this is the format that represents students taking 
> a computer adaptive test. first.item is the first item that 
> the student was administered, score1 is the student's 
> response to that item, and so forth.
> 
> item.pool <- paste("item", 1:10, sep = "")
> set.seed(54321)
> dat <- data.frame(id = c(1,2,3,4,5),
>                   first.item  = sample(item.pool, 5, replace = TRUE),
>                   second.item = sample(item.pool, 5, replace = TRUE),
>                   third.item  = sample(item.pool, 5, replace = TRUE),
>                   score1 = sample(c(0,1), 5, replace = TRUE),
>                   score2 = sample(c(0,1), 5, replace = TRUE),
>                   score3 = sample(c(0,1), 5, replace = TRUE))
> 
> I need to restructure this into a new format. The new matrix 
> df (after the loop) is exactly what I want in the end. But, 
> I'm annoyed at myself for not thinking of a more efficient 
> way to restructure this without using a loop.
> 
> df <- matrix(NA, ncol = length(item.pool), nrow = nrow(dat))
> colnames(df) <- unique(item.pool)
> 
> for(i in 1:5){
>     for(j in 2:4){
>         rr <- which(dat[i,j] == colnames(df))
>         df[i,rr] <- dat[i, (j+3)]
>     }
> }
> 
> Any thoughts?

You can try subscripting by a 2-column matrix, the first
column giving the row index and the second the column index.  E.g.,

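  > f <- function(dat) {
      allItems <- paste("item", 1:10, sep = "")
      items <- as.matrix(dat[2:4])     # item names administered to each student
      scores <- as.matrix(dat[, 5:7])  # responses, in the same layout as items
      retval <- matrix(NA_real_, nrow = nrow(dat), ncol = 10,
          dimnames = list(character(), allItems))
      # Each row of the index matrix is one (row, column) pair; dat$id is
      # recycled down the columns of items, so this assumes id is 1:nrow(dat).
      retval[cbind(dat$id, match(items, allItems))] <- scores
      retval
  }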
  > identical(f(dat), df)
  [1] TRUE
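
As a rough sketch of the mechanism (the toy matrix m and the
function name f2 are made up for illustration and are not from the
thread), the first part below shows a 2-column index matrix filling
one cell per row. The second part is a hypothetical variant of f()
that takes row positions from row(items) rather than dat$id, for the
case where the ids are not simply 1:nrow(dat).

```r
# Toy illustration: each row of idx is one (row, column) pair.
m <- matrix(0, nrow = 3, ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))
idx <- cbind(c(1, 2, 3),                            # row of each target cell
             match(c("b", "d", "a"), colnames(m)))  # column of each target cell
m[idx] <- c(10, 20, 30)   # sets m[1, "b"], m[2, "d"], m[3, "a"]

# Hypothetical variant of f(): row positions come from row(items), so it
# does not assume that dat$id equals 1:nrow(dat).
f2 <- function(dat, allItems = paste("item", 1:10, sep = "")) {
  items  <- as.matrix(dat[2:4])   # item names, one row per student
  scores <- as.matrix(dat[5:7])   # responses, same shape as items
  retval <- matrix(NA_real_, nrow = nrow(dat), ncol = length(allItems),
                   dimnames = list(NULL, allItems))
  # as.vector(row(items)) and match(items, allItems) are both read in
  # column-major order, which matches the order of the values in scores.
  retval[cbind(as.vector(row(items)), match(items, allItems))] <- scores
  retval
}
```

With the example data from the question, where id is 1:5, f2(dat)
should return the same matrix as f(dat).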

That was a very nice problem description, letting me
reproduce the example data and desired output with
just copy and paste.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Harold
