[R] Reference factors inside split

Ben Tupper btupper @end|ng |rom b|ge|ow@org
Mon Jul 11 16:00:22 CEST 2022


Hi,

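split() keeps the grouping column in each subgroup, but `account` is
not visible as a bare name inside your function, so `main = account`
would fail (the first attempt also uses `sales` where the column is
`sale`).  Instead of iterating over the elements of the split list,
you can iterate over the **names** of the elements.  In your case the
names are the account names, because account is the grouping variable.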
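
As a small illustration with made-up data (the names `toy`, `grp`, `val`
and `parts` are hypothetical, not from your post), the names of a split
list are the levels of the grouping variable, so they can serve as keys:

```r
# Hypothetical toy data; `toy`, `grp`, and `val` are illustrative names only
toy <- data.frame(grp = c("a", "a", "b"), val = c(1, 2, 3))
parts <- split(toy, toy$grp)

# The list is named by the levels of the grouping variable
names(parts)

# Each name can be used to pull out its subgroup
for (nm in names(parts)) {
  cat(nm, ":", nrow(parts[[nm]]), "rows\n")
}
```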


##start

library(lattice)

# Example data: 10 dates repeated for 4 clients across 2 accounts
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))

# Lookup table from your post (not used in this example)
account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
                            corp = c("ABC Corporation", "DEF LLC",
                                     "XYZ Incorporated"))

# The resulting list is named by account: "ABC", "XYZ"
mydf.split <- split(mydf, mydf$account)

# Iterate over the names so each one is available for the plot title
myplots <- sapply(names(mydf.split),
  function(name, x = NULL) {
    df <- x[[name]]
    myts <- aggregate(sale ~ date, FUN = sum, data = df)
    xyplot(sale ~ date, data = myts, main = name)
  }, x = mydf.split, USE.NAMES = TRUE, simplify = FALSE)

# Print one plot at a time
myplots[["ABC"]]
myplots[["XYZ"]]

## end
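
If you would rather have the corporation name as the title, one possible
sketch is to look it up from the list names with match().  The objects
`toy`, `lookup`, `toy.split`, `titles` and `plots` below are hypothetical
stand-ins (a cut-down version of your data), so treat this as an
assumption-laden example rather than the only way to do it:

```r
library(lattice)

# Hypothetical cut-down data standing in for mydf
toy <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 3), 2),
  account = rep(c("ABC", "XYZ"), each = 3),
  sale = c(10, 12, 11, 20, 18, 19))

# Hypothetical lookup table, same layout as account.names
lookup <- data.frame(account = c("ABC", "DEF", "XYZ"),
                     corp = c("ABC Corporation", "DEF LLC",
                              "XYZ Incorporated"))

toy.split <- split(toy, toy$account)

# match() gives the row of lookup for each list name
titles <- lookup$corp[match(names(toy.split), lookup$account)]
names(titles) <- names(toy.split)

# Iterate over the names and use the looked-up title for each plot
plots <- sapply(names(toy.split),
  function(name) {
    myts <- aggregate(sale ~ date, FUN = sum, data = toy.split[[name]])
    xyplot(sale ~ date, data = myts, main = titles[[name]])
  }, USE.NAMES = TRUE, simplify = FALSE)
```

Printing plots[["ABC"]] should then draw the ABC panel with its
corporation name as the title.  This avoids the merge, since the lookup
touches only one row per account.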

Does that help?

On Mon, Jul 11, 2022 at 9:14 AM Naresh Gurbuxani
<naresh_gurbuxani using hotmail.com> wrote:
>
>
> I want to split my dataframe according to a list of factors.  Then, in
> the resulting list, I want to reference the factors used in split.  Is
> it possible?
>
> Thanks,
> Naresh
>
> mydf <- data.frame(
> date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1, length.out =
> 10), 4),
> account = c(rep("ABC", 20), rep("XYZ", 20)),
> client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
> profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))
>
> account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
> corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))
>
> mydf.split <- split(mydf, mydf$account)
>
> # This does not work
> myplots <- lapply(mydf.split, function(df) {
> myts <- aggregate(sales ~ date, FUN = sum, data = df)
> xyplot(sales ~ date, data = myts, main = account)})
>
> # This works, but may have a large overhead
> mydf <- merge(mydf, account.names, by = "account", all.x = TRUE)
> mydf.split <- split(mydf, mydf$account)
> myplots <- lapply(mydf.split, function(df) {
> myts <- aggregate(sale ~ date, FUN = sum, data = df)
> xyplot(sale ~ date, data = myts, main = unique(myts$corp))})
>
> # Now I can print one plot at a time
> myplots[["ABC"]]
> myplots[["XYZ"]]
>



-- 
Ben Tupper (he/him)
Bigelow Laboratory for Ocean Science
East Boothbay, Maine
http://www.bigelow.org/
https://eco.bigelow.org


