[R] merging dataframes in a list

jim holtman jholtman at gmail.com
Sat Jun 4 19:11:23 CEST 2016


Here is how you can do it with tidyr:

> x <-  list(data.frame(name="sample1", red=20)
+     , data.frame(name="sample1", green=15)
+     , data.frame(name="sample2", red=10)
+     , data.frame(name="sample2", green=30)
+     )
> library(dplyr)
> library(tidyr)
>
> # convert to 'name, type, value'; assumes dataframe with 2 variables
> x.conv <- lapply(x, function(df){
+     data.frame(name = as.character(df$name)
+         , type = names(df)[2L]  # use 'red'/'green' as indicators
+         , value = df[[2]]
+         , stringsAsFactors = FALSE
+         )
+     })
> print(x.conv)
[[1]]
     name type value
1 sample1  red    20
[[2]]
     name  type value
1 sample1 green    15
[[3]]
     name type value
1 sample2  red    10
[[4]]
     name  type value
1 sample2 green    30
>
> x.conv <- bind_rows(x.conv)  # create single dataframe
> print(x.conv)
Source: local data frame [4 x 3]
     name  type value
    (chr) (chr) (dbl)
1 sample1   red    20
2 sample1 green    15
3 sample2   red    10
4 sample2 green    30
>
> # create output
> spread(x.conv, type, value)  # uses tidyr 'spread'
Source: local data frame [2 x 3]
     name green   red
    (chr) (dbl) (dbl)
1 sample1    15    20
2 sample2    30    10
>
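
For readers on a newer tidyr (1.0.0 or later), where spread() is marked as
superseded, the same long-to-wide step can be sketched with pivot_wider().
This is only a sketch under the same assumption as above (each data frame
has exactly two columns, 'name' first); the names 'long' and 'wide' are
illustrative and not from the original post.

```r
library(dplyr)
library(tidyr)

# Same input as in the thread
x <- list(
  data.frame(name = "sample1", red = 20),
  data.frame(name = "sample1", green = 15),
  data.frame(name = "sample2", red = 10),
  data.frame(name = "sample2", green = 30)
)

# Convert each piece to 'name, type, value' and stack the pieces
long <- bind_rows(lapply(x, function(df) {
  data.frame(
    name  = as.character(df$name),
    type  = names(df)[2L],   # 'red' or 'green'
    value = df[[2L]],
    stringsAsFactors = FALSE
  )
}))

# One row per name, one column per type
wide <- pivot_wider(long, names_from = type, values_from = value)
print(wide)
```

If a name/type pair occurred more than once, pivot_wider() would warn and
return list-columns, so duplicates would need to be summarised first.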


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jun 3, 2016 at 4:02 PM, Ed Siefker <ebs15242 at gmail.com> wrote:

> Thanks, ldply got me a data frame straight away.  But it filled empty
> spaces with NA and merge no longer works.
>
> > ldply(mylist)
>      name red green
> 1 sample1  20    NA
> 2 sample1  NA    15
> 3 sample2  10    NA
> 4 sample2  NA    30
> > mydf <- ldply(mylist)
> > merge(mydf[1,],mydf[2,])
> [1] name  red   green
> <0 rows> (or 0-length row.names)
> > merge(mydf[1,],mydf[2,], by=1)
>      name red.x green.x red.y green.y
> 1 sample1    20      NA    NA      15
>
>
> How do I merge dataframes with NA?
>
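
One possible base R answer to the NA question above, as a sketch rather
than the only way: collapse the stacked rows per name by keeping the first
non-NA value in each column. The data frame below is retyped from the
quoted ldply() output; 'first_non_na' and 'collapsed' are hypothetical
names introduced here, and the approach assumes at most one non-NA value
per name and column.

```r
# Stacked data frame, values copied from the ldply() output quoted above
mydf <- data.frame(
  name  = c("sample1", "sample1", "sample2", "sample2"),
  red   = c(20, NA, 10, NA),
  green = c(NA, 15, NA, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing element of a vector (NA if none)
first_non_na <- function(v) v[!is.na(v)][1]

# Apply the helper to every value column within each name group
collapsed <- aggregate(
  mydf[c("red", "green")],
  by  = list(name = mydf$name),
  FUN = first_non_na
)
print(collapsed)
```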
> On Fri, Jun 3, 2016 at 2:17 PM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> > You can use ldply in the plyr package to bind all the data.frames together
> > (a regular loop will also work). Afterwards you can summarise using ddply.
> >
> > Hope this helps
> > Ulrik
> >
> >
> > Ed Siefker <ebs15242 at gmail.com> wrote on Fri, 3 June 2016 at 21:10:
> >>
> >> aggregate isn't really what I want.  Maybe tapply?  I still can't get
> >> it to work.
> >>
> >> > length(mylist)
> >> [1] 4
> >> > length(names)
> >> [1] 4
> >> > tapply(mylist, names, merge)
> >> Error in tapply(mylist, names, merge) : arguments must have same length
> >>
> >> I guess because a list isn't an atomic data type.  What function will
> >> do the same on lists?  lapply doesn't have a 'by' argument.
> >>
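
A grouped merge on a list, which is roughly what the tapply() attempt
above is reaching for, can be sketched in base R with split(), Reduce()
and merge(). This assumes every data frame has a single row and a shared
'name' column; 'sample_names', 'groups', 'merged' and 'result' are
illustrative names, not from the thread.

```r
mylist <- list(
  data.frame(name = "sample1", red = 20),
  data.frame(name = "sample1", green = 15),
  data.frame(name = "sample2", red = 10),
  data.frame(name = "sample2", green = 30)
)

# Sample name carried by each list element
sample_names <- vapply(mylist, function(df) as.character(df$name[1]),
                       character(1))

# Group the data frames by sample name; split() works on lists too
groups <- split(mylist, sample_names)

# Within each group, merge the data frames pairwise on their common column
merged <- lapply(groups, function(g) Reduce(merge, g))

# Stack the per-sample results into one data frame
result <- do.call(rbind, merged)
print(result)
```

Unlike tapply(), split() does not require an atomic vector, which is why
it can do the grouping step on a list of data frames.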
> >> On Fri, Jun 3, 2016 at 1:41 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> >> > I manually constructed the list of sample names and tried the
> >> > aggregate call I mentioned.
> >> > Merge works when called manually, but not when using aggregate.
> >> >
> >> >> mylist <- list(data.frame(name="sample1", red=20),
> >> >> data.frame(name="sample1", green=15),
> >> >> data.frame(name="sample2", red=10),
> >> >> data.frame(name="sample2", green=30))
> >> >>  names <- list("sample1", "sample1", "sample2", "sample2")
> >> >> merge(mylist[1], mylist[2])
> >> >      name red green
> >> > 1 sample1  20    15
> >> >> merge(mylist[3], mylist[4])
> >> >      name red green
> >> > 1 sample2  10    30
> >> >> aggregate(mylist, by=as.list(names), merge)
> >> > Error in as.data.frame(y) : argument "y" is missing, with no default
> >> >
> >> > What's the right way to do this?
> >> >
> >> > On Fri, Jun 3, 2016 at 1:20 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> >> >> I have a list of data as follows.
> >> >>
> >> >>> list(data.frame(name="sample1", red=20),
> >> >>> data.frame(name="sample1", green=15),
> >> >>> data.frame(name="sample2", red=10),
> >> >>> data.frame(name="sample2", green=30))
> >> >> [[1]]
> >> >>      name red
> >> >> 1 sample1  20
> >> >>
> >> >> [[2]]
> >> >>      name green
> >> >> 1 sample1    15
> >> >>
> >> >> [[3]]
> >> >>      name red
> >> >> 1 sample2  10
> >> >>
> >> >> [[4]]
> >> >>      name green
> >> >> 1 sample2    30
> >> >>
> >> >>
> >> >> I would like to massage this into a data frame like this:
> >> >>
> >> >>      name red green
> >> >> 1 sample1  20    15
> >> >> 2 sample2  10    30
> >> >>
> >> >>
> >> >> I'm imagining I can use aggregate(mylist, by=samplenames, merge)
> >> >> right?  But how do I get the list of samplenames?  How do I subset
> >> >> each dataframe inside the list?
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
