[R] Reshaping data from wide to tall format for multilevel modeling

Jim Lemon jim at bitwrit.com.au
Thu Sep 8 13:36:42 CEST 2011


On 09/08/2011 12:02 AM, dadrivr wrote:
> Hi,
>
> I'm trying to reshape my data set from wide to tall format for multilevel
> modeling.  Unfortunately, the function I typically use (make.univ from the
> multilevel package) does not appear to work with unbalanced data frames,
> which is what I'm dealing with.
>
> Below is an example of the columns of a data frame similar to what I'm
> working with:
> ID  a1  a2  a4  b2  b3  b4  b5  b6
>
> Below is what I want the columns to be after reshaping the data to long
> format:
> ID  a  b time
>
> Here is an example data frame that I want to reshape:
> ID<- c(1,2,3)
> a1<- c(NA, rnorm(2))
> a2<- c(NA, rnorm(1), NA)
> a4<- c(NA, rnorm(2))
> b2<- c(rnorm(2), NA)
> b3<- rnorm(3)
> b4<- NA
> b5<- rnorm(3)
> b6<- rnorm(3)
> mydata<- as.data.frame(cbind(ID,a1,a2,a4,b2,b3,b4,b5,b6))
>
> What is the best way to do this efficiently with MANY variables with widely
> differing time ranges?  Note that I will have to manually enter the time for
> a given measurement because in the wide format, the time is in the variable
> names.  By the way, I have a fairly large data set, with some variables
> occurring at 2 time points and other variables occurring at 20 time points.
> Thanks for your help!
>
Hi dadrivr,
I think you can do what you want using the rep_n_stack function in the 
prettyR package. If you want a data frame at the end, you will have to 
pad out your input data frame so that the "a" and "b" variables have the 
same number of columns, one for each time point from 1 to 6. You'll get 
lots of NAs, but without them, you won't get a data frame.

mydata$a3<-NA
mydata$a5<-NA
mydata$a6<-NA
mydata$b1<-NA
mydata
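
With many variables, adding the padding columns one at a time gets tedious. 
A minimal sketch of automating it follows, assuming every column is named 
as a prefix followed by the time point. pad_wide is a hypothetical helper, 
not part of prettyR, and the data are rebuilt here only so that the example 
is self-contained.

```r
# Rebuild the example data (values are random, as in the original post)
set.seed(1)
mydata <- data.frame(
  ID = c(1, 2, 3),
  a1 = c(NA, rnorm(2)),
  a2 = c(NA, rnorm(1), NA),
  a4 = c(NA, rnorm(2)),
  b2 = c(rnorm(2), NA),
  b3 = rnorm(3),
  b4 = NA,
  b5 = rnorm(3),
  b6 = rnorm(3)
)

# Hypothetical helper: append an all-NA column for every prefix/time
# combination that is not already present in the data frame
pad_wide <- function(df, prefixes, times) {
  wanted <- unlist(lapply(prefixes, function(p) paste0(p, times)))
  for (nm in setdiff(wanted, names(df))) df[[nm]] <- NA
  df
}

padded <- pad_wide(mydata, prefixes = c("a", "b"), times = 1:6)
names(padded)
```

The missing columns are appended in the same order as the manual 
assignments (a3, a5, a6, b1), so the column positions match the ones used 
in the rest of this message.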

Now you have equal numbers of "a" and "b" columns. Reshaping this into 
three columns is easy:

rep_n_stack(mydata,to.stack=c("a1","a2","a3","a4","a5","a6",
  "b1","b2","b3","b4","b5","b6"),stack.names=c("ab","time"))

If you want the "a" and "b" columns separate, try this:

rep_n_stack(mydata,to.stack=matrix(c(2,3,10,4,11,12,13,5,6,7,8,9),nrow=2,
  byrow=TRUE),stack.names=c("a","time","b","time"))
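
The index matrix passed to to.stack can also be derived from the column 
names instead of being typed by hand. This is only a sketch: it assumes the 
column order of the padded mydata shown earlier, and cols, times and idx 
are illustrative names.

```r
# Column order of mydata after the four padding columns were appended
cols <- c("ID", "a1", "a2", "a4", "b2", "b3", "b4", "b5", "b6",
          "a3", "a5", "a6", "b1")
times <- 1:6

# One row per variable, one column per time point, holding column positions
idx <- rbind(match(paste0("a", times), cols),
             match(paste0("b", times), cols))
idx

# idx can then be supplied as the to.stack argument, for example:
# rep_n_stack(mydata, to.stack = idx, stack.names = c("a", "time", "b", "time"))
```

In practice cols would be names(mydata), so the positions stay correct even 
if the padding columns are added in a different order.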

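Currently you have to pass the column indices directly to get the 
correct order in the output. In the matrix above, 10, 11, 12 and 13 are 
the padded columns a3, a5, a6 and b1, which were appended at the end of 
the data frame. I hadn't anticipated the missing column problem when I 
wrote the function.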
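
As an alternative that is not part of the approach above, base R's 
reshape() can produce an ID/time/a/b layout from a padded data frame. The 
sketch below is hedged accordingly: wide is a stand-in for the padded 
mydata, and the variable names are illustrative.

```r
times  <- 1:6
a_cols <- paste0("a", times)
b_cols <- paste0("b", times)

# Stand-in for the padded data: every a/b column exists, unobserved ones are NA
set.seed(1)
wide <- data.frame(ID = 1:3)
wide[c(a_cols, b_cols)] <- NA_real_
wide$a1 <- c(NA, rnorm(2))
wide$a2 <- c(NA, rnorm(1), NA)
wide$a4 <- c(NA, rnorm(2))
wide$b2 <- c(rnorm(2), NA)
wide$b3 <- rnorm(3)
wide$b5 <- rnorm(3)
wide$b6 <- rnorm(3)

# Stack the a columns into "a" and the b columns into "b", one row per ID/time
long_all <- reshape(wide, direction = "long",
                    varying = list(a_cols, b_cols),
                    v.names = c("a", "b"),
                    timevar = "time", times = times, idvar = "ID")

# Optionally drop the rows in which neither variable was observed
long_obs <- long_all[!(is.na(long_all$a) & is.na(long_all$b)), ]
head(long_obs)
```

Dropping the all-NA rows afterwards removes most of the padding, which may 
matter when some variables occur at 2 time points and others at 20.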

Jim


