[R] Summarizing select columns in a data frame
David Winsemius
dw|n@em|u@ @end|ng |rom comc@@t@net
Sun Jan 17 23:59:20 CET 2021
On 1/17/21 12:15 PM, Bernard McGarvey wrote:
> I have a data frame that consists of several factor columns say A, B, C, D, and E and several columns containing numerical data, say X1, X2, .... X10. I would like to create statistics of some of the numerical columns by some of the factor columns. For example,
>
> Calculate the mean, min, and max of variables X1 and X7, by factors A, and E. The results should look like the table below:
>
> Factor A Factor E mean(X1) min(x1) max(X1) mean(X7) min(x7) max(X7) mean(X10) min(x10) max(X10)
> A1 E1
> A1 E2
> A1 E3
> A2 E1
> A2 E2
> A2 E3
>
> I would like the results to be returned to a data frame or other object that I can write out using the write.csv function. I have looked at the summarize and numSummary functions but they do not appear to be flexible enough to do the above.
The `aggregate` function will do the subsetting and function application.
> dfrm <- cbind(dfrm, matrix(rnorm(600), ncol=10 ) ); names(dfrm)[3:12]
<- paste0("X", 1:10)
> str(dfrm)
'data.frame': 60 obs. of 12 variables:
$ Factor_A: Factor w/ 2 levels "A1","A2": 1 1 1 2 2 2 1 1 1 2 ...
$ Factor_B: Factor w/ 3 levels "E1","E2","E3": 1 2 3 1 2 3 1 2 3 1 ...
$ X1 : num -0.02116 -0.00049 0.12875 -0.05412 0.51886 ...
$ X2 : num 1.6799 -0.0963 -0.5727 -0.3638 -0.322 ...
$ X3 : num -0.349 0.267 -0.666 -0.329 0.902 ...
$ X4 : num 0.1125 -0.5384 0.0924 0.6849 -0.4194 ...
$ X5 : num -0.421 0.372 1.316 1.323 -0.03 ...
$ X6 : num -0.0767 1.4972 0.1967 -0.7092 -1.0943 ...
$ X7 : num 0.1771 -0.2136 -1.0818 -0.0671 2.0015 ...
$ X8 : num 1.456 -0.383 -0.47 0.965 0.569 ...
$ X9 : num -1.795 -0.4546 0.0069 1.2245 -0.395 ...
$ X10 : num -1.931 1.708 0.274 0.73 -0.995 ...
aggregate( dfrm[ , c("X1", "X7", "X10")], # columns to analyze
dfrm[ c("Factor_A", "Factor_B")], # classifying
columns
FUN=function (x) c(mn =mean(x), min=min(x),
max=max(x) ) ) # desired "summarizers"
#--- result----
Factor_A Factor_B X1.mn X1.min X1.max X7.mn
X7.min X7.max
1 A1 E1 0.187513792 -0.866094155 2.310960164 0.22489729
-0.91442493 1.94095786
2 A2 E1 0.078361707 -1.515410191 1.382420050 -0.51309155
-1.67026123 0.70869034
3 A1 E2 -0.267416858 -1.995131138 1.392115793 -0.04772929
-2.45426692 2.02225946
4 A2 E2 -0.069807208 -0.703073589 1.879448658 -0.37770923
-2.66221239 2.00152154
5 A1 E3 -0.007800886 -1.297561250 1.216627848 -0.30395411
-1.08181218 1.09764895
6 A2 E3 -0.054466856 -1.577891927 1.674719118 0.35594015
-1.20865279 2.25765422
X10.mn X10.min X10.max
1 -0.3458888 -2.0312811 1.1483179
2 -0.1021727 -1.3230372 0.8045472
3 0.3514645 -3.2334010 1.7075298
4 -0.4988984 -2.1091311 0.5857192
5 0.2297461 -1.1336967 0.8483935
6 0.3700621 -1.5609424 2.2792024
--
David
>
> Any help would be appreciated,
>
> Thanks
>
> Bernard McGarvey
> Director, Fort Myers Beach Lions Foundation, Inc.
> Retired (Lilly Engineering Fellow).
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list