[R] Restructure some data

Phil Spector spector at stat.berkeley.edu
Thu Feb 25 23:38:15 CET 2010


Harold -
    Here's what I came up with:

>  tapply(as.vector(as.matrix(dat[5:7])),
+         list(rep(dat$id,3),as.vector(as.matrix(dat[2:4]))),I)
   item1 item10 item2 item3 item4 item5 item7 item9
1    NA     NA     1    NA    NA     1    NA     0
2     0     NA    NA    NA    NA     1     1    NA
3     1     NA     0     1    NA    NA    NA    NA
4    NA     NA    NA     1     0    NA     0    NA
5    NA      1    NA     0     1    NA    NA    NA

I thought there would be a way to use xtabs, but I had
trouble preserving the NAs.

The columns aren't in the right order, and the item6 column is
missing, but it's pretty close.
Thanks for the easily reproducible example, and the interesting
puzzle.

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


On Thu, 25 Feb 2010, Doran, Harold wrote:

> Suppose I have a data frame like "dat" below. For some context, this is the format that represents student's taking a computer adaptive test. first.item is the first item that student was administered and then score.1 is the student's response to that item and so forth.
>
> item.pool <- paste("item", 1:10, sep = "")
> set.seed(54321)
> dat <- data.frame(id = c(1,2,3,4,5), first.item = sample(item.pool, 5, replace=TRUE),
>                second.item = sample(item.pool, 5,replace=TRUE), third.item = sample(item.pool, 5,replace=TRUE),
>                score1 = sample(c(0,1), 5,replace=TRUE), score2 = sample(c(0,1), 5,replace=TRUE), score3 = sample(c(0,1), 5,replace=TRUE))
>
> I need to restructure this into a new format. The new matrix df (after the loop) is exactly what I want in the end. But, I'm annoyed at myself for not thinking of a more efficient way to restructure this without using a loop.
>
> df <- matrix(NA, ncol = length(item.pool), nrow = nrow(dat))
> colnames(df) <- unique(item.pool)
>
> for(i in 1:5){
>                for(j in 2:4){
>                                rr <- which(dat[i,j] == colnames(df))
>                                df[i,rr] <- dat[i, (j+3)]
>                }
> }
>
> Any thoughts?
>
> Harold
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list