[R] Reference factors inside split
Ben Tupper
btupper @end|ng |rom b|ge|ow@org
Mon Jul 11 16:00:22 CEST 2022
Hi,
Each element of the split list still contains the account column, but
inside your function there is no variable called account in scope, so
main = account fails. Instead of iterating over the elements of the
split list, you can iterate over the **names** of the elements. In your
case each name is a value of the grouping variable, the account.
## start
library(lattice)

# Example data: 10 days of sales for 4 clients across 2 accounts
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2),
  sale = round(runif(40, 10, 20), 2))

account.names <- data.frame(
  account = c("ABC", "DEF", "XYZ"),
  corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))

mydf.split <- split(mydf, mydf$account)

# Iterate over the names and look up each subgroup by name
myplots <- sapply(names(mydf.split),
  function(name, x = NULL) {
    df <- x[[name]]
    myts <- aggregate(sale ~ date, FUN = sum, data = df)
    xyplot(sale ~ date, data = myts, main = name)
  }, x = mydf.split, USE.NAMES = TRUE, simplify = FALSE)

# Print one plot at a time
myplots[["ABC"]]
myplots[["XYZ"]]
## end
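
If you would rather have the full corporation name as the title, one option is to look it up by account name while you iterate, which avoids the merge. This is only a sketch on made-up toy data: corp.lookup and summaries are names I invented for illustration, and I return the title and the aggregated data instead of plotting, on the assumption that you would pass the title to main = in xyplot.

```r
# Toy data (hypothetical values, smaller than the data in the thread)
mydf <- data.frame(
  date    = rep(as.Date("2022-06-01") + 0:1, 4),
  account = rep(c("ABC", "XYZ"), each = 4),
  sale    = c(1, 2, 3, 4, 10, 20, 30, 40)
)
account.names <- data.frame(
  account = c("ABC", "DEF", "XYZ"),
  corp    = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"),
  stringsAsFactors = FALSE
)

# Named lookup vector: account code -> corporation name
corp.lookup <- setNames(account.names$corp, account.names$account)

mydf.split <- split(mydf, mydf$account)

# Map() walks the names and the data frames in parallel,
# so each call sees both the group label and the group's rows
summaries <- Map(function(name, df) {
  myts <- aggregate(sale ~ date, FUN = sum, data = df)
  list(title = corp.lookup[[name]], data = myts)
}, names(mydf.split), mydf.split)

summaries[["ABC"]]$title
summaries[["XYZ"]]$data
```

Because the first argument to Map() is a character vector, the result is named by account, so you can still pull out one element at a time with summaries[["ABC"]].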
Does that help?
On Mon, Jul 11, 2022 at 9:14 AM Naresh Gurbuxani
<naresh_gurbuxani using hotmail.com> wrote:
>
>
> I want to split my dataframe according to a list of factors. Then, in
> the resulting list, I want to reference the factors used in split. Is
> it possible?
>
> Thanks,
> Naresh
>
> mydf <- data.frame(
> date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1, length.out =
> 10), 4),
> account = c(rep("ABC", 20), rep("XYZ", 20)),
> client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
> profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))
>
> account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
> corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))
>
> mydf.split <- split(mydf, mydf$account)
>
> # This does not work
> myplots <- lapply(mydf.split, function(df) {
> myts <- aggregate(sale ~ date, FUN = sum, data = df)
> xyplot(sale ~ date, data = myts, main = account)})
>
> # This works, but may have a large overhead
> mydf <- merge(mydf, account.names, by = "account", all.x = TRUE)
> mydf.split <- split(mydf, mydf$account)
> myplots <- lapply(mydf.split, function(df) {
> myts <- aggregate(sale ~ date, FUN = sum, data = df)
> xyplot(sale ~ date, data = myts, main = unique(df$corp))})
>
> # Now I can print one plot at a time
> myplots[["ABC"]]
> myplots[["XYZ"]]
>
--
Ben Tupper (he/him)
Bigelow Laboratory for Ocean Science
East Boothbay, Maine
http://www.bigelow.org/
https://eco.bigelow.org