[R] Convert list of data frames to one data frame

Ira Sharenow |r@@h@renow100 @end|ng |rom y@hoo@com
Sat Jun 30 02:29:07 CEST 2018


 
Sarah and David,

Thank you for your responses. I will try to be clearer.

Base R solution: Sarah's method worked perfectly.

Is there a dplyr solution?

START: list of dataframes

FINISH: one data frame

DETAILS: The initial list of data frames might have hundreds or a few thousand data frames. Every data frame will have two columns. The first column will represent first names. The second column will represent last names. The column names are not consistent. Data frames will most likely have from one to five rows.

SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data frames. Then somehow do an rbind even though the number of columns differs from data frame to data frame.
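The first step of that strategy can be sketched in base R as follows. This is only a minimal illustration for a single data frame: the names df2, flat and wide are made up for the example, and stringsAsFactors = FALSE is an assumption added so that the columns stay character.

```r
# A 2 by 2 data frame like DF2 in the example (names df2, flat, wide are illustrative)
df2 <- data.frame(Start = c("John", "Thomas"),
                  End   = c("Adams", "Jefferson"),
                  stringsAsFactors = FALSE)

# t() turns the data frame into a transposed character matrix; c() then reads
# it column by column, which is row by row of the original data frame
flat <- c(t(df2))

# Rebuild a 1 by 2n data frame and give it position-based names
wide <- as.data.frame(t(flat), stringsAsFactors = FALSE)
names(wide) <- paste0(c("First", "Second"),
                      rep(seq_len(length(flat) / 2), each = 2))
wide
```

The second step, binding rows of different widths, still needs either padding with NA or a function that fills missing columns.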

EXAMPLE: List with two data frames

# DF1
First   Last
George  Washington

# DF2
Start   End
John    Adams
Thomas  Jefferson

# End result: one data frame
First1  Second1     First2  Second2
George  Washington  NA      NA
John    Adams       Thomas  Jefferson

DISCUSSION: As mentioned, I posted something on Stack Overflow. Unfortunately, my example was not general enough, so the suggested solutions worked on the easy case that I provided but not when the names were different.

The suggested solution was:

library(dplyr)

bind_rows(lapply(employees4List,function(x) rbind.data.frame(c(t(x)))))

 

On this site I pointed out that the inner call,

lapply(employees4List, function(x) rbind.data.frame(c(t(x))))

correctly spreads the rows of each data frame into a 1 by 2n data frame. However, the column names were derived from the values and were a mess. This caused a problem with bind_rows.

I felt that if I knew how to change the names of all of the data frames created by lapply, I could then use bind_rows. So if someone knows how to change all of the names at this intermediate stage, I hope that person will provide the solution.

In the end a 1 by 2 data frame would have the names First1, Second1. A 1 by 4 data frame would have the names First1, Second1, First2, Second2.
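One possible way to do the renaming at the intermediate stage is sketched below. It is not taken from the Stack Overflow answer: the helper name widen is hypothetical, and stringsAsFactors = FALSE is an assumption added to the sample data so that no factor columns are involved. Because every widened data frame gets position-based names, bind_rows can match columns by name and fill the missing ones with NA.

```r
library(dplyr)

# Sample list from this thread; stringsAsFactors = FALSE is an added assumption
employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = FALSE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = FALSE),
  data.frame(first3 = c("Al3", "Barbara", "Carol"),
             second3 = c("Jones", "Smith", "Adams"),
             stringsAsFactors = FALSE),
  data.frame(first4 = "Al", second4 = "Jones2", stringsAsFactors = FALSE))

# Hypothetical helper: spread an n by 2 data frame into a 1 by 2n data frame
# whose names depend only on position (First1, Second1, First2, ...)
widen <- function(df) {
  flat <- c(t(as.matrix(df)))   # values in row order
  names(flat) <- paste0(c("First", "Second"),
                        rep(seq_len(nrow(df)), each = 2))
  as.data.frame(as.list(flat), stringsAsFactors = FALSE)
}

# bind_rows matches columns by name and pads the shorter rows with NA
result <- bind_rows(lapply(employees4List, widen))
result
```

The original column names are ignored on purpose, since the goal is to combine by column position rather than by name.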

Ira


    On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius <dwinsemius using comcast.net> wrote:  
 
 
> On Jun 29, 2018, at 7:28 AM, Sarah Goslee <sarah.goslee using gmail.com> wrote:
> 
> Hi,
> 
> It isn't super clear to me what you're after.

Agree.

Had a different read of the request. I thought the request was for a first step that "harmonized" the names of the columns and then used `dplyr::bind_rows`:

library(dplyr)
 newList <- lapply( employees4List, 'names<-', names(employees4List[[1]]) ) 
 bind_rows(newList)

#---------

   first1 second1
1      Al   Jones
2     Al2   Jones
3    Barb   Smith
4     Al3   Jones
5 Barbara   Smith
6   Carol   Adams
7      Al  Jones2

Might want to wrap suppressWarnings around the right side of that assignment since there were many warnings regarding incongruent factor levels.
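An alternative to suppressing the warnings, sketched here as an assumption rather than something tested in this thread, is to convert factor columns to character before harmonizing the names. The helper name factors_to_character is hypothetical, and the sample data sets stringsAsFactors = TRUE explicitly to mimic the default of R versions before 4.0.0.

```r
# Shortened sample list; stringsAsFactors = TRUE mimics the pre-4.0.0 default
employees4List <- list(
  data.frame(first1 = "Al", second1 = "Jones", stringsAsFactors = TRUE),
  data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith"),
             stringsAsFactors = TRUE))

# Hypothetical helper: turn every factor column into a character column
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

charList <- lapply(employees4List, factors_to_character)

# Harmonize the names using those of the first data frame, as above
newList <- lapply(charList, setNames, names(charList[[1]]))
```

The resulting newList can then be passed to bind_rows as in the snippet above; with character columns there are no factor levels left to reconcile.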

-- 
David.
> Is this what you intend?
> 
>> dfbycol(employees4BList)
>   first1 last1 first2 last2 first3 last3
> 1     Al Jones   <NA>  <NA>   <NA>  <NA>
> 2     Al Jones   Barb Smith   <NA>  <NA>
> 3     Al Jones   Barb Smith  Carol Adams
> 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>> 
>> dfbycol(employees4List)
>   first1  last1  first2 last2 first3 last3
> 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
> 2    Al2  Jones    Barb Smith   <NA>  <NA>
> 3    Al3  Jones Barbara Smith  Carol Adams
> 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
> 
> 
> If so:
> 
> employees4BList = list(
> data.frame(first1 = "Al", second1 = "Jones"),
> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
> "Smith", "Adams")),
> data.frame(first1 = ("Al"), second1 = "Jones"))
> 
> employees4List = list(
> data.frame(first1 = ("Al"), second1 = "Jones"),
> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
> "Smith", "Adams")),
> data.frame(first4 = ("Al"), second4 = "Jones2"))
> 
> ###
> 
> dfbycol <- function(x) {
>  x <- lapply(x, function(y)as.vector(t(as.matrix(y))))
>  x <- lapply(x, function(y){length(y) <- max(sapply(x, length)); y})
>  x <- do.call(rbind, x)
>  x <- data.frame(x, stringsAsFactors=FALSE)
>  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
>  x
> }
> 
> ###
> 
> dfbycol(employees4BList)
> 
> dfbycol(employees4List)
> 
> On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
> <r-help using r-project.org> wrote:
>> I have a list of data frames which I would like to combine into one data
>> frame doing something like rbind. I wish to combine in column order and
>> not by names. However, there are issues.
>> 
>> The number of columns is not the same for each data frame. This is an
>> intermediate step to a problem and the number of columns could be
>> 2,4,6,8,or10. There might be a few thousand data frames. Another problem
>> is that the names of the columns produced by the first step are garbage.
>> 
>> Below is a method that I obtained by asking a question on stack
>> overflow. Unfortunately, my example was not general enough. The code
>> below works for the simple case where the names of the people are
>> consistent. It does not work when the names are realistically not the same.
>> 
>> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>> 
>> 
>> Please note that the lapply step sets things up except for the column
>> name issue. If I could figure out a way to change the column names, then
>> the bind_rows step will, I believe, work.
>> 
>> So I really have two questions. How to change all column names of all
>> the data frames and then how to solve the original problem.
>> 
>> # The non-general case works fine. It produces one data frame and I can
>> # then change the column names to
>> 
>> # c("first1", "last1", "first2", "last2", "first3", "last3")
>> 
>> #Non general easy case
>> 
>> employees4BList = list(data.frame(first1 = "Al", second1 = "Jones"),
>> 
>> data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>> 
>> data.frame(first1 = c("Al", "Barb", "Carol"), second1 = c("Jones",
>> "Smith", "Adams")),
>> 
>> data.frame(first1 = ("Al"), second1 = "Jones"))
>> 
>> employees4BList
>> 
>> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>> 
>> # This produces a nice list of data frames, except for the names
>> 
>> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>> 
>> # This list is a disaster. I am looking for a solution that works in
>> # this case.
>> 
>> employees4List = list(data.frame(first1 = ("Al"), second1 = "Jones"),
>> 
>> data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>> 
>> data.frame(first3 = c("Al3", "Barbara", "Carol"), second3 = c("Jones",
>> "Smith", "Adams")),
>> 
>> data.frame(first4 = ("Al"), second4 = "Jones2"))
>> 
>>  bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>> 
>> Thanks.
>> 
>> Ira
>> 
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
> 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'  -Gehm's Corollary to Clarke's Third Law




  