[R] aggregate dataframe by multiple factors

David Winsemius dwinsemius at comcast.net
Fri Nov 18 09:30:11 CET 2016


> On Nov 17, 2016, at 11:27 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
> 
> Dear all,
> 
> the dat  has missing values NA,
> 
>    first.Name   Name Department  DCE   DP       date
> 5      Auction Videos        YME 0.57 0.56 2013-09-30
> 18       Amish  Wives        TAS 0.59 0.56 2013-09-30
> 34     Ancient Nation        QLH 0.54 0.58 2013-09-30
> 53     Auction Videos        YME   NA   NA 2013-12-28
> 66       Amish  Wives        TAS   NA   NA 2013-12-28
> 82     Ancient Nation        QLH 0.28 0.29 2013-12-28
> 102    Auction Videos        YME 0.57 0.56 2014-03-30
> 115      Amish  Wives        TAS 0.59 0.56 2014-03-30
> 131    Ancient Nation        QLH 0.54 0.58 2014-03-30
> 150    Auction Videos        YME   NA   NA 2014-06-28
> 163      Amish  Wives        TAS   NA   NA 2014-06-28
> 179    Ancient Nation        QLH 0.28 0.29 2014-06-28
> 
> 
> agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
> list(dat$first.Name, dat$Name, dat$Department) , "sort"))

The closest I could get on a few attempts was:

(agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
list(dat$first.Name, dat$Name, dat$Department) , function(d) { unlist(d)}))
 )

  Group.1 Group.2 Group.3 DCE.1 DCE.2 DCE.3 DCE.4 DP.1 DP.2 DP.3 DP.4
1 Ancient  Nation     QLH  0.54  0.28  0.54  0.28 0.58 0.29 0.58 0.29
2   Amish   Wives     TAS  0.59    NA  0.59    NA 0.56   NA 0.56   NA
3 Auction  Videos     YME  0.57    NA  0.57    NA 0.56   NA 0.56   NA

I think the sort operation might be somewhat ambiguous in this instance. I tried:

 (agg <- as.data.frame(aggregate(dat[ , c("DCE","DP")], by=
list(dat$first.Name, dat$Name, dat$Department) , function(d) { unlist(lapply(d,sort))}))
 )

With no success, not even a sorted result.

-- 
David.
> 
> 
> agg has list of value. I would separate value in different columns.
> 
>  Group.1 Group.2 Group.3                    DCE                     DP
> 1 Ancient  Nation     QLH 0.28, 0.28, 0.54, 0.54 0.29, 0.29, 0.58, 0.58
> 2   Amish   Wives     TAS             0.59, 0.59             0.56, 0.56
> 3 Auction  Videos     YME             0.57, 0.57             0.56, 0.56
> 
> The  goal:
> 
> Group.1 Group.2 Group.3  DCE.1 DCE.2 DCE.3  DCE.4  DP.1  DP.2  DP.3  DP.4
> 1 Ancient  Nation     QLH    0.28     0.28    0.54     0.54     0.29, 0.29,
> 0.58, 0.58
> 2   Amish   Wives     TAS        NA     NA     0.59, 0.59           NA
> NA  0.56, 0.56
> 3 Auction  Videos     YME         NA   NA      0.57, 0.57             NA
> NA  0.56, 0.56
> 
> 
> 
> dat <- structure(list(first.Name = structure(c(3L, 1L, 2L, 3L, 1L, 2L,
> 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("Amish", "Ancient", "Auction",
> "Ax", "Bachelorette", "Basketball", "BBQ", "Cake", "Celebrity",
> "Chef", "Clean", "Colonial", "Comedy", "Comic", "Crocodile",
> "Dog", "Empire", "Extreme", "Farm", "Half Pint", "Hollywood",
> "House", "Ice Road", "Jersey", "Justice", "Love", "Mega", "Model",
> "Modern", "Mountain", "Mystery", "Myth", "New York", "Paradise",
> "Pioneer", "Queer", "Restaurant", "Road", "Royal", "Spouse",
> "Star", "Storage", "Survival", "The Great American", "Tool",
> "Treasure", "Wedding", "Wife"), class = "factor"), Name = structure(c(43L,
> 47L, 29L, 43L, 47L, 29L, 43L, 47L, 29L, 43L, 47L, 29L), .Label =
> c("Aliens",
> "Behavior", "Casino", "Casting Call", "Challenge", "Contest",
> "Crashers", "Crew", "Dad", "Dancing", "Date", "Disasters", "Dynasty",
> "Family", "Garage", "Greenlight", "Gypsies", "Haul", "Hot Rod",
> "Inventor", "Jail", "Job", "Justice", "Marvels", "Master", "Mates",
> "Model", "Moms", "Nation", "Ninja", "Patrol", "People", "Pitmasters",
> "Queens", "Rescue", "Rivals", "Room", "Rooms", "Rules", "Star",
> "Stars", "Superhero", "Videos", "VIP", "Wars", "Wishes", "Wives",
> "Wrangler"), class = "factor"), Department = structure(c(8L,
> 6L, 2L, 8L, 6L, 2L, 8L, 6L, 2L, 8L, 6L, 2L), .Label = c("HXW",
> "QLH", "RAR", "RYC", "SYI", "TAS", "VUV", "YME"), class = "factor"),
>    DCE = c(0.57, 0.59, 0.54, NA, NA, 0.28, 0.57, 0.59, 0.54,
>    NA, NA, 0.28), DP = c(0.56, 0.56, 0.58, NA, NA, 0.29, 0.56,
>    0.56, 0.58, NA, NA, 0.29), date = structure(c(15978, 15978,
>    15978, 16067, 16067, 16067, 16159, 16159, 16159, 16249, 16249,
>    16249), class = "Date")), description = "", row.names = c(5L,
> 18L, 34L, 53L, 66L, 82L, 102L, 115L, 131L, 150L, 163L, 179L), class =
> "data.frame", .Names = c("first.Name",
> "Name", "Department", "DCE", "DP", "date"))
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list