[R] Summarizing select columns in a data frame
Bernard Comcast
mcg@rvey@bern@rd @end|ng |rom comc@@t@net
Mon Jan 18 00:19:30 CET 2021
Thanks David
Bernard
Sent from my iPhone so please excuse the spelling!
> On Jan 17, 2021, at 5:59 PM, David Winsemius <dwinsemius using comcast.net> wrote:
>
>
>> On 1/17/21 12:15 PM, Bernard McGarvey wrote:
>> I have a data frame that consists of several factor columns, say A, B, C, D, and E, and several columns containing numerical data, say X1, X2, ..., X10. I would like to compute statistics for some of the numerical columns, grouped by some of the factor columns. For example,
>>
>> Calculate the mean, min, and max of variables X1, X7, and X10, by factors A and E. The results should look like the table below:
>>
>> Factor A  Factor E  mean(X1)  min(X1)  max(X1)  mean(X7)  min(X7)  max(X7)  mean(X10)  min(X10)  max(X10)
>> A1        E1
>> A1        E2
>> A1        E3
>> A2        E1
>> A2        E2
>> A2        E3
>>
>> I would like the results to be returned to a data frame or other object that I can write out using the write.csv function. I have looked at the summarize and numSummary functions but they do not appear to be flexible enough to do the above.
>
>
> The `aggregate` function will do the subsetting and function application.
>
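> > # dfrm already holds the two factor columns, Factor_A and Factor_B (its creation is not shown)
> > dfrm <- cbind(dfrm, matrix(rnorm(600), ncol=10 ) )
> > names(dfrm)[3:12] <- paste0("X", 1:10)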
> > str(dfrm)
> 'data.frame': 60 obs. of 12 variables:
> $ Factor_A: Factor w/ 2 levels "A1","A2": 1 1 1 2 2 2 1 1 1 2 ...
> $ Factor_B: Factor w/ 3 levels "E1","E2","E3": 1 2 3 1 2 3 1 2 3 1 ...
> $ X1 : num -0.02116 -0.00049 0.12875 -0.05412 0.51886 ...
> $ X2 : num 1.6799 -0.0963 -0.5727 -0.3638 -0.322 ...
> $ X3 : num -0.349 0.267 -0.666 -0.329 0.902 ...
> $ X4 : num 0.1125 -0.5384 0.0924 0.6849 -0.4194 ...
> $ X5 : num -0.421 0.372 1.316 1.323 -0.03 ...
> $ X6 : num -0.0767 1.4972 0.1967 -0.7092 -1.0943 ...
> $ X7 : num 0.1771 -0.2136 -1.0818 -0.0671 2.0015 ...
> $ X8 : num 1.456 -0.383 -0.47 0.965 0.569 ...
> $ X9 : num -1.795 -0.4546 0.0069 1.2245 -0.395 ...
> $ X10 : num -1.931 1.708 0.274 0.73 -0.995 ...
>
>
>
> aggregate( dfrm[ , c("X1", "X7", "X10")],     # columns to analyze
>            dfrm[ c("Factor_A", "Factor_B")],   # classifying columns
>            FUN=function (x) c(mn=mean(x), min=min(x), max=max(x) ) )   # desired "summarizers"
>
> #--- result----
>
> Factor_A Factor_B X1.mn X1.min X1.max X7.mn X7.min X7.max
> 1 A1 E1 0.187513792 -0.866094155 2.310960164 0.22489729 -0.91442493 1.94095786
> 2 A2 E1 0.078361707 -1.515410191 1.382420050 -0.51309155 -1.67026123 0.70869034
> 3 A1 E2 -0.267416858 -1.995131138 1.392115793 -0.04772929 -2.45426692 2.02225946
> 4 A2 E2 -0.069807208 -0.703073589 1.879448658 -0.37770923 -2.66221239 2.00152154
> 5 A1 E3 -0.007800886 -1.297561250 1.216627848 -0.30395411 -1.08181218 1.09764895
> 6 A2 E3 -0.054466856 -1.577891927 1.674719118 0.35594015 -1.20865279 2.25765422
> X10.mn X10.min X10.max
> 1 -0.3458888 -2.0312811 1.1483179
> 2 -0.1021727 -1.3230372 0.8045472
> 3 0.3514645 -3.2334010 1.7075298
> 4 -0.4988984 -2.1091311 0.5857192
> 5 0.2297461 -1.1336967 0.8483935
> 6 0.3700621 -1.5609424 2.2792024
>
>
> --
>
> David
>
>>
>> Any help would be appreciated,
>>
>> Thanks
>>
>> Bernard McGarvey
>> Director, Fort Myers Beach Lions Foundation, Inc.
>> Retired (Lilly Engineering Fellow).
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
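
A self-contained sketch of the `aggregate` approach quoted above, carried through to the `write.csv` step the original question asked about. The construction of `dfrm`, the seed, and the names `res`, `flat`, and `out_file` are assumptions made for illustration; the factor layout is only a guess consistent with the `str` output in the thread. When `FUN` returns several values, `aggregate` stores each summarized variable as a matrix column, so one common idiom is to expand the result with `do.call(data.frame, ...)` before writing it out.

```r
# Hypothetical stand-in for the poster's data: two factor columns plus X1..X10.
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
dfrm <- data.frame(
  Factor_A = factor(rep(rep(c("A1", "A2"), each = 3), times = 10)),
  Factor_B = factor(rep(c("E1", "E2", "E3"), times = 20))
)
dfrm <- cbind(dfrm, matrix(rnorm(600), ncol = 10))
names(dfrm)[3:12] <- paste0("X", 1:10)

# Same call as in the thread: mean, min, and max of X1, X7, X10 per factor combination.
res <- aggregate(dfrm[, c("X1", "X7", "X10")],
                 dfrm[c("Factor_A", "Factor_B")],
                 FUN = function(x) c(mn = mean(x), min = min(x), max = max(x)))

# X1, X7, and X10 in 'res' are each a 3-column matrix; expand them into
# ordinary columns named X1.mn, X1.min, X1.max, and so on.
flat <- do.call(data.frame, res)

# Write the flattened table to a CSV file (hypothetical path in the temp directory).
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

After flattening, `flat` is a plain data frame with one row per factor combination and one column per statistic, which can also be inspected with `str(flat)` before it is written.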