[R] How do I combine lists of data.frames into a single data frame?

Marc Schwartz marc_schwartz at me.com
Thu Jul 15 21:27:31 CEST 2010


On Jul 15, 2010, at 2:18 PM, Ted Byers wrote:

> The data.frame is constructed by one of the following functions:
> 
> funweek <- function(df)
>  if (length(df$elapsed_time) > 5) {
>    rv = fitdist(df$elapsed_time,"exp")
>    rv$year = df$sale_year[1]
>    rv$sample = df$sale_week[1]
>    rv$granularity = "week"
>    rv
>  }
> funmonth <- function(df)
>  if (length(df$elapsed_time) > 5) {
>    rv = fitdist(df$elapsed_time,"exp")
>    rv$year = df$sale_year[1]
>    rv$sample = df$sale_month[1]
>    rv$granularity = "month"
>    rv
>  }
> 
> It is basically the data.frame created by fitdist extended to include the
> variables used to distinguish one sample from another.
> 
> I have the following statement that gets me a set of IDs from my db:
> 
> ids <- dbGetQuery(con, "SELECT DISTINCT m_id FROM risk_input")
> 
> And then I have a loop that allows me to analyze one dataset after another:
> 
> for (i in 1:length(ids[,1])) {
>  print(i)
>  print(ids[i,1])
> 
> Then, after a set of statements that give me information about the dataset
> (such as its size), within a conditional block that ensures I apply the
> analysis only on sufficiently large samples, I have the following:
> 
> z <- lapply(split(moreinfo, list(moreinfo$sale_year, moreinfo$sale_week),
>                   drop = TRUE), funweek)
> 
> or
> 
> z <- lapply(split(moreinfo, list(moreinfo$sale_year, moreinfo$sale_month),
>                   drop = TRUE), funmonth)
> 
> followed by:
> 
> str(z)
> 
> Of course, I close the loop and disconnect from my db.
> 
> NB: I don't see any way to get rid of the loop by adding ID as a factor to
> split because I have to query the DB for several key bits of data in order
> to determine whether or not there is sufficient data to work on.
> 
> I have everything working, except the final step of storing the results back
> into the db.  Storing data in the Db is easy enough.  But I am at a loss as
> to how to combine the lists placed in z in most of the iterations through
> the ID loop into a single data.frame.
> 
> Now, I did take a look at rbind and cbind, but it isn't clear to me if
> either is appropriate.  All the data frames have the same structure, but the
> lists are of variable length, and I am not certain how either might be used
> inside the IDs loop.
> 
> So, what is the best way to combine all lists assigned to z into a single
> data.frame?
> 
> Thanks
> 
> Ted


Ted,

If each of the data frames in the list 'z' has the same column structure, you can use:

  do.call(rbind, z)

The result will be a single data frame containing all of the rows from each of the data frames in the list.
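
To cover the loop over IDs as well, here is a minimal sketch. It assumes each element of 'z' really is a data frame (an object that is only a list would need converting first). The helper make_z(), the column names, the ID values and the rate numbers are all made up for illustration; only do.call(rbind, ...) comes from the suggestion above. The idea is to bind each iteration's list, keep the result in a list of its own, and bind that list once after the loop:

```r
# Toy stand-in for one iteration's 'z': a list of one-row data frames.
# Column names and values are hypothetical. The NULL entry mimics a group
# that was skipped because it had too few observations.
make_z <- function(rate) {
  list(
    a = data.frame(year = 2010, sample = 1, granularity = "week", rate = rate),
    b = NULL,
    c = data.frame(year = 2010, sample = 3, granularity = "week", rate = rate * 2)
  )
}

# One list -> one data frame; rbind() ignores the NULL entry.
one <- do.call(rbind, make_z(0.1))

# Across the ID loop: store each iteration's data frame in a list,
# then bind everything once at the end.
ids <- c(101, 102)                       # hypothetical IDs
results <- vector("list", length(ids))
for (i in seq_along(ids)) {
  per_id <- do.call(rbind, make_z(0.1 * i))
  if (!is.null(per_id)) {                # NULL if every group was skipped
    per_id$m_id <- ids[i]                # tag rows with the ID they came from
    results[[i]] <- per_id
  }
}
combined <- do.call(rbind, results)
rownames(combined) <- NULL               # discard the auto-generated row names
```

Collecting the pieces in a list and calling rbind once avoids growing a data frame row by row inside the loop, and the extra m_id column keeps track of which ID each row belongs to when the result is written back to the database.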

HTH,

Marc Schwartz


