[R] Summarizing select columns in a data frame

Bernard Comcast mcg@rvey@bern@rd @end|ng |rom comc@@t@net
Mon Jan 18 00:19:30 CET 2021


Thanks David

Bernard
Sent from my iPhone so please excuse the spelling!"

> On Jan 17, 2021, at 5:59 PM, David Winsemius <dwinsemius using comcast.net> wrote:
> 
> 
>> On 1/17/21 12:15 PM, Bernard McGarvey wrote:
>> I have a data frame that consists of several factor columns say A, B, C, D, and E and several columns containing numerical data, say X1, X2, .... X10. I would like to create statistics of some of the numerical columns by some of the factor columns. For example,
>> 
>> Calculate the mean, min, and max of variables X1 and X7, by factors A, and E. The results should look like the table below:
>> 
>> Factor A Factor E     mean(X1) min(x1) max(X1) mean(X7) min(x7) max(X7) mean(X10) min(x10) max(X10)
>> A1        E1
>> A1        E2
>> A1        E3
>> A2        E1
>> A2        E2
>> A2        E3
>> 
>> I would like the results to be returned to a data frame or other object that I can write out using the write.csv function. I have looked at the summarize and numSummary functions but they do not appear to be flexible enough to do the above.
> 
> 
> The `aggregate` function will do the subsetting and function application.
> 
> > dfrm <- cbind(dfrm, matrix(rnorm(600), ncol=10 ) ); names(dfrm)[3:12] <- paste0("X", 1:10)
> > str(dfrm)
> 'data.frame':    60 obs. of  12 variables:
>  $ Factor_A: Factor w/ 2 levels "A1","A2": 1 1 1 2 2 2 1 1 1 2 ...
>  $ Factor_B: Factor w/ 3 levels "E1","E2","E3": 1 2 3 1 2 3 1 2 3 1 ...
>  $ X1      : num  -0.02116 -0.00049 0.12875 -0.05412 0.51886 ...
>  $ X2      : num  1.6799 -0.0963 -0.5727 -0.3638 -0.322 ...
>  $ X3      : num  -0.349 0.267 -0.666 -0.329 0.902 ...
>  $ X4      : num  0.1125 -0.5384 0.0924 0.6849 -0.4194 ...
>  $ X5      : num  -0.421 0.372 1.316 1.323 -0.03 ...
>  $ X6      : num  -0.0767 1.4972 0.1967 -0.7092 -1.0943 ...
>  $ X7      : num  0.1771 -0.2136 -1.0818 -0.0671 2.0015 ...
>  $ X8      : num  1.456 -0.383 -0.47 0.965 0.569 ...
>  $ X9      : num  -1.795 -0.4546 0.0069 1.2245 -0.395 ...
>  $ X10     : num  -1.931 1.708 0.274 0.73 -0.995 ...
> 
> 
> 
>  aggregate(  dfrm[ ,  c("X1", "X7", "X10")],    # columns to analyze
> 
>                       dfrm[ c("Factor_A", "Factor_B")],  # classifying columns
> 
>                       FUN=function (x) c(mn =mean(x), min=min(x), max=max(x) ) )  # desired "summarizers"
> 
> #--- result----
> 
>   Factor_A Factor_B        X1.mn       X1.min       X1.max X7.mn      X7.min      X7.max
> 1       A1       E1  0.187513792 -0.866094155  2.310960164 0.22489729 -0.91442493  1.94095786
> 2       A2       E1  0.078361707 -1.515410191  1.382420050 -0.51309155 -1.67026123  0.70869034
> 3       A1       E2 -0.267416858 -1.995131138  1.392115793 -0.04772929 -2.45426692  2.02225946
> 4       A2       E2 -0.069807208 -0.703073589  1.879448658 -0.37770923 -2.66221239  2.00152154
> 5       A1       E3 -0.007800886 -1.297561250  1.216627848 -0.30395411 -1.08181218  1.09764895
> 6       A2       E3 -0.054466856 -1.577891927  1.674719118 0.35594015 -1.20865279  2.25765422
>       X10.mn    X10.min    X10.max
> 1 -0.3458888 -2.0312811  1.1483179
> 2 -0.1021727 -1.3230372  0.8045472
> 3  0.3514645 -3.2334010  1.7075298
> 4 -0.4988984 -2.1091311  0.5857192
> 5  0.2297461 -1.1336967  0.8483935
> 
> 6  0.3700621 -1.5609424  2.2792024
> 
> 
> -- 
> 
> David
> 
>> 
>> Any help would be appreciated,
>> 
>> Thanks
>> 
>> Bernard McGarvey
>> Director, Fort Myers Beach Lions Foundation, Inc.
>> Retired (Lilly Engineering Fellow).
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list