[R] Reshaping data from wide to tall format for multilevel modeling
Jim Lemon
jim at bitwrit.com.au
Thu Sep 8 13:36:42 CEST 2011
On 09/08/2011 12:02 AM, dadrivr wrote:
> Hi,
>
> I'm trying to reshape my data set from wide to tall format for multilevel
> modeling. Unfortunately, the function I typically use (make.univ from the
> multilevel package) does not appear to work with unbalanced data frames,
> which is what I'm dealing with.
>
> Below is an example of the columns of a data frame similar to what I'm
> working with:
> ID a1 a2 a4 b2 b3 b4 b5 b6
>
> Below is what I want the columns to be after reshaping the data to long
> format:
> ID a b time
>
> Here is an example data frame that I want to reshape:
> ID<- c(1,2,3)
> a1<- c(NA, rnorm(2))
> a2<- c(NA, rnorm(1), NA)
> a4<- c(NA, rnorm(2))
> b2<- c(rnorm(2), NA)
> b3<- rnorm(3)
> b4<- NA
> b5<- rnorm(3)
> b6<- rnorm(3)
> mydata<- as.data.frame(cbind(ID,a1,a2,a4,b2,b3,b4,b5,b6))
>
> What is the best way to do this efficiently with MANY variables with widely
> differing time ranges? Note that I will have to manually enter the time for
> a given measurement because in the wide format, the time is in the variable
> names. By the way, I have a fairly large data set, with some variables
> occurring at 2 time points and other variables occurring at 20 time points.
> Thanks for your help!
>
Hi dadrivr,
I think you can do what you want using the rep_n_stack function in the
prettyR package. If you want a data frame at the end, you will have to
pad out your input data frame with all-NA columns, so that every variable
has a column for every time point and the stacked columns come out the
same length. You'll get lots of NAs, but without them, you won't get a
data frame.
mydata$a3<-NA
mydata$a5<-NA
mydata$a6<-NA
mydata$b1<-NA
mydata
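With many variables, typing one padding line per missing column gets tedious. A minimal sketch of doing the padding programmatically, assuming every measurement column is named as a letter stem followed by a time number; pad_columns is a hypothetical helper, not part of prettyR. It appends the new columns in a different order than the manual lines above, so any hard-coded column indices would change.

```r
# Example data from the question (rnorm values, so the numbers vary;
# the seed only makes the run repeatable)
set.seed(42)
ID <- c(1, 2, 3)
a1 <- c(NA, rnorm(2))
a2 <- c(NA, rnorm(1), NA)
a4 <- c(NA, rnorm(2))
b2 <- c(rnorm(2), NA)
b3 <- rnorm(3)
b4 <- NA
b5 <- rnorm(3)
b6 <- rnorm(3)
mydata <- as.data.frame(cbind(ID, a1, a2, a4, b2, b3, b4, b5, b6))

# Hypothetical helper: add an all-NA column for every stem/time
# combination that is not already present in df
pad_columns <- function(df, stems, times) {
  wanted <- as.vector(outer(stems, times, paste0))  # "a1", "b1", "a2", ...
  for (nm in setdiff(wanted, names(df))) {
    df[[nm]] <- NA
  }
  df
}

mydata <- pad_columns(mydata, stems = c("a", "b"), times = 1:6)
names(mydata)
# ID a1 a2 a4 b2 b3 b4 b5 b6 b1 a3 a5 a6
```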
Now you have six "a" columns and six "b" columns. Reshaping this into
three columns is easy:
rep_n_stack(mydata,to.stack=c("a1","a2","a3","a4","a5","a6",
"b1","b2","b3","b4","b5","b6"),stack.names=c("ab","time"))
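For comparison, base R's utils::stack() gives a similar stacked layout from the padded data, with the source column name (which carries the time) in an "ind" column. This is a sketch of an alternative, not a description of what rep_n_stack does internally; the extra numeric time column is an addition for convenience.

```r
# Example data from the question, padded as in the post
set.seed(42)
ID <- c(1, 2, 3)
a1 <- c(NA, rnorm(2))
a2 <- c(NA, rnorm(1), NA)
a4 <- c(NA, rnorm(2))
b2 <- c(rnorm(2), NA)
b3 <- rnorm(3)
b4 <- NA
b5 <- rnorm(3)
b6 <- rnorm(3)
mydata <- as.data.frame(cbind(ID, a1, a2, a4, b2, b3, b4, b5, b6))
mydata$a3 <- NA
mydata$a5 <- NA
mydata$a6 <- NA
mydata$b1 <- NA

cols <- c(paste0("a", 1:6), paste0("b", 1:6))
# stack() puts the columns end to end: "values" holds the data and
# "ind" the source column name; ID is repeated once per stacked column
long_ab <- data.frame(ID = rep(mydata$ID, times = length(cols)),
                      stack(mydata[cols]))
# Take the time point from the column name by dropping the leading letter
long_ab$time <- as.integer(substring(as.character(long_ab$ind), 2))
head(long_ab)
```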
If you want the "a" and "b" columns separate, try this:
rep_n_stack(mydata,to.stack=matrix(c(2,3,10,4,11,12,13,5,6,7,8,9),nrow=2,
byrow=TRUE),stack.names=c("a","time","b","time"))
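Because the padding columns were appended at the end of the data frame, their positions (10 to 13) are easy to get wrong when typed by hand. A sketch that looks the positions up by name with base R's match(), assuming the padded mydata from the post; the prettyR call is left as a comment so the snippet runs without that package installed.

```r
# Example data from the question, padded as in the post
set.seed(42)
ID <- c(1, 2, 3)
a1 <- c(NA, rnorm(2))
a2 <- c(NA, rnorm(1), NA)
a4 <- c(NA, rnorm(2))
b2 <- c(rnorm(2), NA)
b3 <- rnorm(3)
b4 <- NA
b5 <- rnorm(3)
b6 <- rnorm(3)
mydata <- as.data.frame(cbind(ID, a1, a2, a4, b2, b3, b4, b5, b6))
mydata$a3 <- NA
mydata$a5 <- NA
mydata$a6 <- NA
mydata$b1 <- NA

# Row 1: positions of a1..a6; row 2: positions of b1..b6
idx <- rbind(match(paste0("a", 1:6), names(mydata)),
             match(paste0("b", 1:6), names(mydata)))
idx
# row 1:  2  3 10  4 11 12
# row 2: 13  5  6  7  8  9

# Same call as in the post, with the looked-up matrix:
# library(prettyR)
# rep_n_stack(mydata, to.stack = idx, stack.names = c("a", "time", "b", "time"))
```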
Currently you have to pass the column indices directly to get the
correct order in the output. I hadn't anticipated the missing column
problem when I wrote the function.
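For data with many variables observed at different sets of time points, one way to skip the padding step is to stack each variable separately and then merge the pieces on ID and time. A base-R sketch, assuming columns are named as a letter stem followed by the time (as in a1 ... b6) and that the ID column is called "ID"; stack_stem is a hypothetical helper, not part of prettyR or multilevel.

```r
# Example data from the question, NOT padded
set.seed(42)
ID <- c(1, 2, 3)
a1 <- c(NA, rnorm(2))
a2 <- c(NA, rnorm(1), NA)
a4 <- c(NA, rnorm(2))
b2 <- c(rnorm(2), NA)
b3 <- rnorm(3)
b4 <- NA
b5 <- rnorm(3)
b6 <- rnorm(3)
mydata <- as.data.frame(cbind(ID, a1, a2, a4, b2, b3, b4, b5, b6))

# Hypothetical helper: stack every column named <stem><digits> into one
# long column, taking the time from the digits in the column name
stack_stem <- function(df, stem, id = "ID") {
  cols <- grep(paste0("^", stem, "[0-9]+$"), names(df), value = TRUE)
  out <- data.frame(
    id   = rep(df[[id]], times = length(cols)),
    time = rep(as.integer(substring(cols, nchar(stem) + 1)), each = nrow(df))
  )
  names(out)[1] <- id
  out[[stem]] <- unlist(df[cols], use.names = FALSE)
  out
}

# One long piece per stem, then a full outer join on ID and time;
# times at which a variable was not measured come out as NA
long <- Reduce(function(x, y) merge(x, y, by = c("ID", "time"), all = TRUE),
               lapply(c("a", "b"), stack_stem, df = mydata))
long
```

Adding more stems only means extending the c("a", "b") vector. For a multilevel model you may then want to drop rows where every measure is missing, e.g. long[rowSums(!is.na(long[c("a", "b")])) > 0, ].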
Jim