[R] Summarizing select columns in a data frame

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Sun Jan 17 23:59:20 CET 2021


On 1/17/21 12:15 PM, Bernard McGarvey wrote:
> I have a data frame that consists of several factor columns, say A, B, C, D, and E, and several columns containing numerical data, say X1, X2, ..., X10. I would like to compute statistics of some of the numerical columns grouped by some of the factor columns. For example,
>
> Calculate the mean, min, and max of variables X1, X7, and X10, by factors A and E. The results should look like the table below:
>
> Factor A Factor E     mean(X1) min(X1) max(X1) mean(X7) min(X7) max(X7) mean(X10) min(X10) max(X10)
> A1        E1
> A1        E2
> A1        E3
> A2        E1
> A2        E2
> A2        E3
>
> I would like the results to be returned to a data frame or other object that I can write out using the write.csv function. I have looked at the summarize and numSummary functions, but they do not appear to be flexible enough to do the above.


The base R `aggregate` function will do both the grouping (subsetting) and the function application. With some made-up data:

 > # dfrm starts out holding only the two factor columns, Factor_A and Factor_B
 > dfrm <- cbind(dfrm, matrix(rnorm(600), ncol=10)); names(dfrm)[3:12] <- paste0("X", 1:10)
 > str(dfrm)
'data.frame':    60 obs. of  12 variables:
  $ Factor_A: Factor w/ 2 levels "A1","A2": 1 1 1 2 2 2 1 1 1 2 ...
  $ Factor_B: Factor w/ 3 levels "E1","E2","E3": 1 2 3 1 2 3 1 2 3 1 ...
  $ X1      : num  -0.02116 -0.00049 0.12875 -0.05412 0.51886 ...
  $ X2      : num  1.6799 -0.0963 -0.5727 -0.3638 -0.322 ...
  $ X3      : num  -0.349 0.267 -0.666 -0.329 0.902 ...
  $ X4      : num  0.1125 -0.5384 0.0924 0.6849 -0.4194 ...
  $ X5      : num  -0.421 0.372 1.316 1.323 -0.03 ...
  $ X6      : num  -0.0767 1.4972 0.1967 -0.7092 -1.0943 ...
  $ X7      : num  0.1771 -0.2136 -1.0818 -0.0671 2.0015 ...
  $ X8      : num  1.456 -0.383 -0.47 0.965 0.569 ...
  $ X9      : num  -1.795 -0.4546 0.0069 1.2245 -0.395 ...
  $ X10     : num  -1.931 1.708 0.274 0.73 -0.995 ...
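The post does not show how `dfrm` got its two factor columns. For anyone wanting to reproduce the example, here is a hypothetical reconstruction whose level patterns match the str() output above. The use of rep() and the seed are assumptions, so the random numbers will not match the ones shown.

```r
# Hypothetical reconstruction: the original post does not show how dfrm was built.
# Factor_A cycles A1,A1,A1,A2,A2,A2,... and Factor_B cycles E1,E2,E3,...
set.seed(1)  # arbitrary seed, only to make the example reproducible
dfrm <- data.frame(
  Factor_A = factor(rep(rep(c("A1", "A2"), each = 3), length.out = 60)),
  Factor_B = factor(rep(c("E1", "E2", "E3"), length.out = 60))
)
# Append ten columns of standard-normal noise and name them X1 ... X10
dfrm <- cbind(dfrm, matrix(rnorm(600), ncol = 10))
names(dfrm)[3:12] <- paste0("X", 1:10)
str(dfrm)
```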



  aggregate(dfrm[ , c("X1", "X7", "X10")],    # columns to analyze
            dfrm[c("Factor_A", "Factor_B")],  # classifying columns
            # desired "summarizers"
            FUN = function(x) c(mn = mean(x), min = min(x), max = max(x)))

#--- result----

  Factor_A Factor_B        X1.mn       X1.min       X1.max       X7.mn      X7.min      X7.max
1       A1       E1  0.187513792 -0.866094155  2.310960164  0.22489729 -0.91442493  1.94095786
2       A2       E1  0.078361707 -1.515410191  1.382420050 -0.51309155 -1.67026123  0.70869034
3       A1       E2 -0.267416858 -1.995131138  1.392115793 -0.04772929 -2.45426692  2.02225946
4       A2       E2 -0.069807208 -0.703073589  1.879448658 -0.37770923 -2.66221239  2.00152154
5       A1       E3 -0.007800886 -1.297561250  1.216627848 -0.30395411 -1.08181218  1.09764895
6       A2       E3 -0.054466856 -1.577891927  1.674719118  0.35594015 -1.20865279  2.25765422

      X10.mn    X10.min    X10.max
1 -0.3458888 -2.0312811  1.1483179
2 -0.1021727 -1.3230372  0.8045472
3  0.3514645 -3.2334010  1.7075298
4 -0.4988984 -2.1091311  0.5857192
5  0.2297461 -1.1336967  0.8483935
6  0.3700621 -1.5609424  2.2792024
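Because FUN returns a named vector, the X1, X7 and X10 entries of the aggregate result are 3-column matrix columns (hence the X1.mn, X1.min, ... labels). A common way to spread them into ordinary columns before calling write.csv, as the original question asked, is do.call(data.frame, ...). The sketch below rebuilds the same hypothetical test data so it runs on its own; the output goes to a temporary file whose name is only an example.

```r
# Hypothetical test data with the same layout as in the post
set.seed(1)
dfrm <- data.frame(
  Factor_A = factor(rep(rep(c("A1", "A2"), each = 3), length.out = 60)),
  Factor_B = factor(rep(c("E1", "E2", "E3"), length.out = 60))
)
dfrm <- cbind(dfrm, matrix(rnorm(600), ncol = 10))
names(dfrm)[3:12] <- paste0("X", 1:10)

res <- aggregate(dfrm[ , c("X1", "X7", "X10")],    # columns to analyze
                 dfrm[c("Factor_A", "Factor_B")],  # classifying columns
                 FUN = function(x) c(mn = mean(x), min = min(x), max = max(x)))

# Passing the columns of res back to data.frame() splits each matrix column
# into separate columns named X1.mn, X1.min, X1.max, X7.mn, ...
flat <- do.call(data.frame, res)
names(flat)

# One row per Factor_A/Factor_B combination, one column per statistic
out <- tempfile(fileext = ".csv")
write.csv(flat, out, row.names = FALSE)
```

The result has 6 rows and 11 columns: the two grouping factors followed by three statistics for each of the three analyzed variables.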


-- 

David

>
> Any help would be appreciated,
>
> Thanks
>
> Bernard McGarvey
> Director, Fort Myers Beach Lions Foundation, Inc.
> Retired (Lilly Engineering Fellow).
>


