[R] aggregate and list elements of variables in data.frame
Ben Tupper
btupper sending from bigelow.org
Thu Jun 7 14:47:55 CEST 2018
Hi,
Does this do what you want? I had to change the id values to something more obvious. It uses tibbles, which allow a variable (column) to be a list.
library(tibble)
library(dplyr)
x <- tibble(id=LETTERS[1:10],
A=c(123,345,123,678,345,123,789,345,123,789))
uA <- unique(x$A)
idx <- lapply(uA, function(v) which(x$A %in% v))
vals <- lapply(idx, function(index) x$id[index])
r <- tibble(unique_A = uA, list_idx = idx, list_vals = vals)
> r
# A tibble: 4 x 3
  unique_A list_idx  list_vals
     <dbl> <list>    <list>
1     123. <int [4]> <chr [4]>
2     345. <int [3]> <chr [3]>
3     678. <int [1]> <chr [1]>
4     789. <int [2]> <chr [2]>
> r$list_idx[1]
[[1]]
[1] 1 3 6 9
> r$list_vals[1]
[[1]]
[1] "A" "C" "F" "I"
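The follow-up question quoted below asks for the id values of each group as one comma-separated string rather than as a list column. Here is a minimal base-R sketch of that last step, using the data from that follow-up. The name `ids_by_A` is introduced here purely for illustration, and `split()` stands in for the `lapply()` calls above, so treat it as one possible approach rather than the only way.

```r
# Data from the follow-up question quoted below
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# split() groups the id values by A; within each group the
# original row order is kept, and groups are ordered by sorted A
ids_by_A <- split(t$id, t$A)

# Collapse each group into a single comma-separated string.
# unname() drops the group names so the data frame gets plain row numbers.
r <- data.frame(
  unique_A = as.numeric(names(ids_by_A)),
  list_id  = unname(vapply(ids_by_A, paste, character(1), collapse = ",")),
  stringsAsFactors = FALSE
)
r
```

If a list column is preferred, `ids_by_A` can be used as is; the `paste(..., collapse = ",")` step is only needed to get the flat string form.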
Cheers,
ben
> On Jun 7, 2018, at 8:21 AM, Massimo Bressan <massimo.bressan using arpa.veneto.it> wrote:
>
> Sorry, but on looking further at the example I realised that the posted solution is not quite what I need: I do not need the indices back, but rather the corresponding values of the id column for each value of A.
>
> #please consider this new example
>
> t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
> t
>
> # I need to get this result
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('18,20,27,4','91,54,15','68','26,97'))
> r
>
> # any help for this, please?
>
> From: "Massimo Bressan" <massimo.bressan using arpa.veneto.it>
> To: "r-help" <R-help using r-project.org>
> Sent: Thursday, 7 June 2018 10:09:55
> Subject: Re: aggregate and list elements of variables in data.frame
>
> thanks for the help
>
> I'm posting here the complete solution
>
> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
> t$A <- factor(t$A)
> l<-sapply(levels(t$A), function(x) which(t$A==x))
> r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", ")))
> r<-cbind(unique_A=row.names(r),r)
> row.names(r)<-NULL
> r
>
> best
>
>
>
> From: "Massimo Bressan" <massimo.bressan using arpa.veneto.it>
> To: "r-help" <R-help using r-project.org>
> Sent: Wednesday, 6 June 2018 10:13:10
> Subject: aggregate and list elements of variables in data.frame
>
> #given the following reproducible and simplified example
>
> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
> t
>
> #I need to get the following result
>
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('1,3,6,9','2,5,8','4','7,10'))
> r
>
> # i.e. aggregate over the variable "A" and list all elements of the variable "id" that share the same corresponding value of "A"
> #any help for that?
>
> #so far I've just managed to "aggregate" and "count", like:
>
> library(sqldf)
> sqldf('select count(*) as count_id, A as unique_A from t group by A')
>
> library(dplyr)
> t%>%group_by(unique_A=A) %>% summarise(count_id = n())
>
> # thank you
>
>
> --
>
> ------------------------------------------------------------
> Massimo Bressan
>
> ARPAV
> Agenzia Regionale per la Prevenzione e
> Protezione Ambientale del Veneto
>
> Dipartimento Provinciale di Treviso
> Via Santa Barbara, 5/a
> 31100 Treviso, Italy
>
> tel: +39 0422 558545
> fax: +39 0422 558516
> e-mail: massimo.bressan using arpa.veneto.it
> ------------------------------------------------------------
>
>
Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org
Ecological Forecasting: https://eco.bigelow.org/