[R] Data frame from list of lists (Quick Summary)
Gregory Jefferis
jefferis at stanford.edu
Mon Sep 22 21:33:13 CEST 2003
Here is a quick summary, since I always like it when people post the useful
answers they get (thanks very much to the three respondents). What I learnt
was (and apologies to those list members for whom these are not exactly
revelations):
1) When making data frames, work column-wise rather than row-wise when
possible. This is likely to be much faster (e.g. cbind rather than rbind) and
friendlier (data.frame(ListOfColumns) is a one-liner, whereas
data.frame(ListOfRows) doesn't work).
2) To prevent complex classes like POSIXct (a date-time) from being unclassed,
do.call("c",lapply(list,FUN)) is better than sapply(list,FUN) (see the small
illustration just below).
Code:
#------------------------------------------------------------------------
# The input data was:
myfunc=function(x) return(list(A=x,L=letters[x],T=Sys.time()))
ToyListOfLists=lapply(1:4,myfunc)
# My Solution was:
#------------------------------------------------------------------------
FirstSubList=ToyListOfLists[[1]]
# getCol(n) pulls the n-th element out of every sublist and c()'s them
# together, which (unlike sapply) keeps classes such as POSIXct intact
getCol=function(n) do.call( "c",lapply(ToyListOfLists,function(x) x[[n]]) )
ListOfCols=lapply(seq(FirstSubList),getCol)
df=data.frame(ListOfCols)
names(df)=names(FirstSubList)
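As a quick check (again my addition), the column classes should survive;
depending on your version of R and the stringsAsFactors default, L will come
out as a factor or as character:
#------------------------------------------------------------------------
str(df)   # A integer, L factor/character, T still POSIXct
#------------------------------------------------------------------------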
# Damon Wischik's solution
# this is essentially the same but better, since:
# 1) it will also work if the list returned by myfunc() includes a factor
# 2) the protection against NAs in the list _may_ be useful
#------------------------------------------------------------------------
transpose.list <- function(lst) {
    typicalrow <- lst[[1]]
    # GJ However, I am not sure that protection against NAs in
    # the next 6 lines is necessary
    if (length(lst)>1) for (i in 2:length(lst)) {
        if (!any(is.na(typicalrow))) break
        better <- (is.na(typicalrow) & !is.na(lst[[i]]))
        for (j in which(better))
            typicalrow[[j]] <- lst[[i]][[j]]
    }
    getfield <- function(i) {
        v <- lapply(lst, function(row) row[[i]])
        vv <- do.call("c", v)
        typicalitem <- typicalrow[[i]]
        if (is.factor(typicalitem)) {
            vvf <- rep(typicalitem, length(vv))
            codes(vvf) <- vv
            vvf
        } else {
            vv
        }
    }
    cols <- lapply(1:length(typicalrow), function(i) getfield(i))
    names(cols) <- names(typicalrow)
    # I think the next 2 lines could be replaced by:
    # data.frame(cols)
    df <- do.call("data.frame", cols)
    df
}

transpose.list(ToyListOfLists)
On 9/22/03 5:44, "Liaw, Andy" <andy_liaw at merck.com> wrote:
> Don't know if this will be any faster, and it doesn't give you a data frame,
> but the final conversion to data frame is probably fairly easy:
>
>> xx <- do.call("rbind", lapply(ListOfLists, function(x) do.call("cbind", x)))
>> xx
> A L T
> [1,] "1" "a" "1064233098"
> [2,] "2" "b" "1064233098"
> [3,] "3" "c" "1064233098"
> [4,] "4" "d" "1064233098"
>
> This gives you a character matrix. The tricky part (for me) is how to get
> that last column back to POSIXct. I have not dealt with date/time in R
> before.
>
> HTH,
> Andy
>
>
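(Following up on Andy's question about getting that last column back to
POSIXct: cbind() has reduced the times to their underlying seconds-since-1970
counts, so in reasonably current versions of R something like the following
untested sketch should recover them, where xx is Andy's character matrix from
above.)
#------------------------------------------------------------------------
T.posix=as.POSIXct(as.numeric(xx[,"T"]),origin="1970-01-01")  # back to date-times
df2=data.frame(A=as.numeric(xx[,"A"]),L=xx[,"L"],T=T.posix)   # and into a data frame
#------------------------------------------------------------------------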
>> -----Original Message-----
>> From: Gregory Jefferis [mailto:jefferis at stanford.edu]
>> Sent: Monday, September 22, 2003 5:15 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] Data frame from list of lists
>>
>>
>> This seems to be a simple problem, and I feel that there
>> ought to be a simple answer, but I can't seem to find it.
>>
>> I have a function that returns a number of values as a
>> heterogeneous list - always the same length and same names(),
>> but a number of different data types, including character. I
>> want to apply it to many inputs, resulting in a list of lists.
>>
>> I would like to turn this list of lists into a single data
>> frame in which each row corresponds to one of the original sublists.
>>
>> Here is a toy example:
>>
>> myfunc=function(x) return(list(A=x,L=letters[x],T=Sys.time()))
>> ListOfLists=lapply(1:4,myfunc)
>> ListOfDataFrames=lapply(ListOfLists,as.data.frame)
>> df=do.call("rbind",ListOfDataFrames)
>>
>> df
>>
>> Which gives:
>>
>> A L T
>> 1 1 a 2003-09-22 02:08:44
>> 11 2 b 2003-09-22 02:08:44
>> 12 3 c 2003-09-22 02:08:44
>> 13 4 d 2003-09-22 02:08:44
>>
>> Which is what I want (bar the rownames). The problem is that
>> this can be very slow, particularly the last rbind step, when
>> I have a large data set (e.g. 5000 rows x20 cols).
>>
>> I thought that one improvement might be to preassign the data
>> frame since I know how big it should be and then make
>> assignments row by row. But it turns out that I can't then
>> assign rows to the data frame one at a time - I get errors
>> because factor levels don't exist e.g.:
>>
>> df[5:10,]=df[4,]
>> for (i in 5:10){
>>   df[i,]=as.data.frame(myfunc(i))
>> }
>>
>> I presume that rbind.data.frame normally looks after adding
>> extra levels to factors as they appear in the new rows being
>> appended to the data frame. If anyone has a solution that is
>> quick (and/or elegant), I would be extremely grateful,
>>
>> Greg Jefferis.
>>
>> __________________________________________________________________________
>> Greg Jefferis, Lab Address: Liqun Luo, Herrin 144
>> Neurosciences PhD Programme & e-mail: jefferis at stanford.edu
>> Dept Biological Sciences, Lab: (650) 725 5809
>> Gilbert Biology Building, Fax: (650) 723 0589
>> 371 Serra Mall,
>> Stanford, CA 94305-5020. Home: (650) 326 9597
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>>
>
__________________________________________________________________________
Greg Jefferis, Lab Address: Liqun Luo, Herrin 144
Neurosciences PhD Programme & e-mail: jefferis at stanford.edu
Dept Biological Sciences, Lab: (650) 725 5809
Gilbert Biology Building, Fax: (650) 723 0589
371 Serra Mall,
Stanford, CA 94305-5020. Home: (650) 326 9597