[R] [External] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?

Eric Berger
Sat Apr 9 06:25:14 CEST 2022


library(dplyr)
my_df |> group_by(my_category) |> summarise(my_z = cor(my_x, my_y))
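
For anyone who prefers not to load dplyr, here is a minimal base-R sketch of the same grouped computation. It recreates the example data from the original post (quoted below) so it runs on its own; the names pieces and res_split are made up for this illustration only.

```r
# Example data, recreated from the original post so this block runs on its own
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# Split the data frame into one piece per category ...
pieces <- split(my_df, my_df$my_category)

# ... then apply cor() to the two columns of each piece.
# vapply() checks that every group returns a single number.
res_split <- vapply(pieces, function(d) cor(d$my_x, d$my_y), numeric(1))
res_split
```

The result is a named numeric vector with one correlation per category. Any function of two or more columns can be swapped in for cor() inside the anonymous function.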


On Sat, Apr 9, 2022 at 4:37 AM Richard M. Heiberger <rmh at temple.edu> wrote:

> look at
> ?mapply
> Apply a Function to Multiple List or Vector Arguments
>
> to see if that meets your needs
>
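
The mapply suggestion above could be sketched roughly as follows, assuming each vector is first split by the grouping variable. The helper apply_by_group is a hypothetical name introduced only for this illustration; it is not part of base R.

```r
# Example data, recreated from the original post so this block runs on its own
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)

# split() turns each vector into a list with one element per category;
# mapply() then calls cor() on the matching elements of the two lists.
res_mapply <- mapply(cor, split(my_x, my_category), split(my_y, my_category))
res_mapply

# Hypothetical helper: the same idea for any function of two or more vectors.
apply_by_group <- function(FUN, group, ...) {
  # Split every vector passed through ... by the same grouping variable
  pieces <- lapply(list(...), split, f = group)
  # Map() is mapply() without simplification, so a list is returned
  do.call(Map, c(list(f = FUN), pieces))
}

res_list <- apply_by_group(cor, my_category, my_x, my_y)
res_list
```

The first call returns a named numeric vector, the helper a named list, so it should also cope with functions whose result is not a single number.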
> > On Apr 08, 2022, at 21:26, Kelly Thompson <kt1572757 at gmail.com> wrote:
> >
> > #Q. How can I "apply" a function that takes two or more vectors as
> > #arguments, such as cor(x, y), over a "category" or "grouping variable"
> > #or "index"?
> > #I'm using cor() as an example; I'd like to find a way to do this for
> > #any function that takes 2 or more vectors as arguments.
> >
> >
> > #create example data
> >
> > my_category <- rep(c("a", "b", "c"), 4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If I wanted to get the correlation of x and y grouped by category, I
> > #could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> > for (i in seq_along(my_category_unique)) {
> >   my_criteria_i <- my_category == my_category_unique[i]
> >   my_x_i <- my_x[my_criteria_i]
> >   my_y_i <- my_y[my_criteria_i]
> >   my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >   my_results[[i]] <- my_correl_i
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(),
> > #aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message:
> > #"Error in FUN(dd[x, ], ...) : incompatible dimensions"
> > by(data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message:
> > #"Error in cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like 'x' "
> > by(data = my_df, INDICES = my_category, FUN = function(x, y) { cor(my_df$x, my_df$y) } )
> >
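
A note on the two failing by() calls above, with a sketch of one way to make by() work. In the first call, as far as I can tell, FUN receives only the rows of my_x for one category while y = my_y is still the full-length vector, hence "incompatible dimensions". In the second, my_df has no columns named x or y, so both arguments to cor() are NULL. The sketch below assumes the fix is to pass the whole data frame and let the function pick its columns from the per-group subset; the argument name d is arbitrary.

```r
# Example data, recreated from the original post so this block runs on its own
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# by() calls the function once per category, passing the rows of my_df
# that belong to that category as a data frame (here called d).
res_by <- by(data = my_df, INDICES = my_df$my_category,
             FUN = function(d) cor(d$my_x, d$my_y))
res_by
```

The key point is that the function refers to d, the subset for the current group, and not to my_df itself.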
> >
> > #If I wanted the mean of x by category, I could use by() or aggregate():
> > by(data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
