[R] Data frame from list of lists (Quick Summary)

Mon Sep 22 21:33:13 CEST 2003

Here is a quick summary, since I always like it when people post the useful
answers they get (thanks very much to the three respondents).  What I learnt
was (and apologies to those list members for whom these are not exactly
revelations):

1) When making data frames, work column-wise, not row-wise, when possible.
This is likely to be much faster (e.g. cbind, not rbind) and friendlier
(data.frame(ListOfColumns) is a one-liner, whereas data.frame(ListOfRows)
does not give one row per sublist).
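A minimal sketch of point 1, assuming the column types are known in advance. make_row() below is a hypothetical stand-in for myfunc(). The idea is to preallocate one plain vector per column, fill it element by element, and call data.frame() once at the end:

```r
# Hypothetical row generator, standing in for myfunc() in the post
make_row <- function(i) list(A = i, L = letters[i])

n <- 5
# Preallocate one vector per column instead of a data frame of rows
A <- integer(n)
L <- character(n)
for (i in seq_len(n)) {
  r <- make_row(i)
  A[i] <- r$A
  L[i] <- r$L
}

# A single data.frame() call at the end; no repeated rbind
df <- data.frame(A = A, L = L, stringsAsFactors = FALSE)
print(df)
```

Because each column is an ordinary vector while the loop runs, there are no factor levels to manage until the final data.frame() call.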

2) To prevent classed vectors like POSIXct (a date-time) from being unclassed:
    do.call("c",lapply(list,FUN)) is better than sapply(list,FUN),
    since sapply() simplifies its result with unlist(), which drops the class.
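A small sketch of point 2 on toy time stamps. sapply() simplifies via unlist(), which discards the class attribute, whereas do.call("c", ...) dispatches to the c() method for POSIXct and keeps the class:

```r
# Three POSIXct time stamps held in a list (toy data)
times <- list(Sys.time(), Sys.time() + 60, Sys.time() + 120)

# sapply() simplifies via unlist(), which drops the POSIXct class
via_sapply <- sapply(times, function(t) t)
class(via_sapply)   # "numeric"

# do.call("c", ...) calls the POSIXct method of c(), so the class survives
via_c <- do.call("c", times)
class(via_c)        # "POSIXct" "POSIXt"
```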

Code:
#------------------------------------------------------------------------
# The input data was:
myfunc=function(x) return(list(A=x,L=letters[x],T=Sys.time()))
ToyListOfLists=lapply(1:4,myfunc)

# My Solution was:
#------------------------------------------------------------------------
FirstSubList=ToyListOfLists[[1]]
getCol=function(n) do.call( "c",lapply(ToyListOfLists,function(x) x[[n]]) )
ListOfCols=lapply(seq_along(FirstSubList),getCol)
names(ListOfCols)=names(FirstSubList)
df=data.frame(ListOfCols)

# Damon Wischik's solution
# this is essentially the same but better, since:
# 1) it will also work if the list returned by myfunc() includes a factor
# 2) the protection against NAs in the list _may_ be useful
#------------------------------------------------------------------------
transpose.list <- function(lst) {
  typicalrow <- lst[[1]]
# GJ However, I am not sure that protection against NAs in
# the next 6 lines is necessary
  if (length(lst)>1) for (i in 2:length(lst)) {
    if (!any(is.na(typicalrow))) break
    better <- (is.na(typicalrow) & !is.na(lst[[i]]))
    for (j in which(better))
      typicalrow[[j]] <- lst[[i]][[j]]
  }
  getfield <- function(i) {
    v <- lapply(lst, function(row) row[[i]] )
    typicalitem <- typicalrow[[i]]
    if (is.factor(typicalitem))
      # rebuild the factor from each row's labels, keeping the typical row's levels
      factor(unlist(lapply(v, as.character)), levels = levels(typicalitem))
    else
      do.call("c",v)
    }
  cols <- lapply(seq_along(typicalrow), getfield)
  names(cols) <- names(typicalrow)
# I think the next 2 lines could be replaced by:
# data.frame(cols) 
  df <- do.call("data.frame",cols)
  df
}

transpose.list(ToyListOfLists)
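Point 1) in the comments on Damon Wischik's solution concerns factors. How c() combines factors depends on the R version (before R 4.1.0 it returned the integer codes), whereas unlist() is documented to combine a list of factors into one factor whose levels are the union of the individual level sets. A minimal sketch on toy data:

```r
# Toy list: one single-element factor per row, with differing levels
fs <- list(factor("a"), factor("b"), factor("a", levels = c("a", "c")))

# unlist() on a list of factors returns a factor whose levels are the
# union of all level sets, so the labels are preserved across rows
u <- unlist(fs)
print(u)
levels(u)   # "a" "b" "c"
```

The union of levels follows the order in which levels first occur, so with rows that share one level set the result keeps that set unchanged.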

On 9/22/03 5:44, "Liaw, Andy" <andy_liaw at merck.com> wrote:

> Don't know if this will be any faster, and it doesn't give you a data frame,
> but the final conversion to data frame is probably fairly easy:
> 
>> xx <- do.call("rbind", lapply(ListOfLists, function(x) do.call("cbind", x)))
>> xx
>    A   L   T     
> [1,] "1" "a" "1064233098"
> [2,] "2" "b" "1064233098"
> [3,] "3" "c" "1064233098"
> [4,] "4" "d" "1064233098"
> 
> This gives you a character matrix.  The tricky part (for me) is how to get
> that last column back to POSIXct.  I have not dealt with date/time in R
> before.
> 
> HTH,
> Andy
> 
> 
>> -----Original Message-----
>> From: Gregory Jefferis [mailto:jefferis at stanford.edu]
>> Sent: Monday, September 22, 2003 5:15 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] Data frame from list of lists
>> 
>> 
>> This seems to be a simple problem, and I feel that there
>> ought to be a simple answer, but I can't seem to find it.
>> 
>> I have a function that returns a number of values as a
>> heterogeneous list - always the same length and same names(),
>> but a number of different data types, including character.  I
>> want to apply it to many inputs, resulting in a list of lists.
>> 
>> I would like to turn this list of lists into a single data
>> frame in which each row corresponds to one of the original sublists.
>> 
>> Here is a toy example:
>> 
>> myfunc=function(x) return(list(A=x,L=letters[x],T=Sys.time()))
>> ListOfLists=lapply(1:4,myfunc)
>> ListOfDataFrames=lapply(ListOfLists,as.data.frame)
>> df=do.call("rbind",ListOfDataFrames)
>> 
>> df
>> 
>> Which gives:
>> 
>>    A L                   T
>> 1  1 a 2003-09-22 02:08:44
>> 11 2 b 2003-09-22 02:08:44
>> 12 3 c 2003-09-22 02:08:44
>> 13 4 d 2003-09-22 02:08:44
>> 
>> Which is what I want (bar the rownames).  The problem is that
>> this can be very slow, particularly the last rbind step, when
>> I have a large data set (e.g. 5000 rows x 20 cols).
>> 
>> I thought that one improvement might be to preallocate the data
>> frame since I know how big it should be and then make
>> assignments row by row.  But it turns out that I can't then
>> assign rows to the data frame one at a time - I get errors
>> because the new rows' factor levels don't exist, e.g.:
>> 
>> df[5:10,]=df[4,]
>> for (i in 5:10){
>>     df[i,]=as.data.frame(myfunc(i))
>> }
>> 
>> I presume that rbind.data.frame normally looks after adding
>> extra levels to factors as they appear in the new rows being
>> appended to the data frame. If anyone has a solution that is
>> quick (and/or elegant), I would be extremely grateful.
>> 
>> Greg Jefferis.
>> 
>> __________________________________________________________________________
>> Greg Jefferis,                          Lab Address: Liqun Luo, Herrin 144
>> Neurosciences PhD Programme &                e-mail: jefferis at stanford.edu
>> Dept Biological Sciences,                       Lab: (650) 725 5809
>> Gilbert Biology Building,                       Fax: (650) 723 0589
>> 371 Serra Mall,
>> Stanford, CA 94305-5020.                       Home: (650) 326 9597
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> 
> 
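Andy's character-matrix route leaves open how to turn the T column back into POSIXct. A minimal sketch, assuming (as in his output) that the column holds seconds since 1970-01-01 UTC; the two-row matrix below is toy data shaped like his result, and tz = "UTC" only affects how the times print:

```r
# Toy character matrix shaped like Andy's output: T holds seconds since the epoch
xx <- cbind(A = c("1", "2"), L = c("a", "b"), T = c("1064233098", "1064233098"))

df <- data.frame(
  A = as.integer(xx[, "A"]),
  L = xx[, "L"],
  # numeric seconds -> POSIXct; origin given explicitly for older R versions
  T = as.POSIXct(as.numeric(xx[, "T"]), origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)
str(df)
```

Each column has to be converted back by hand, which is one reason the column-wise approach above is friendlier: the classes are never lost in the first place.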
