[R] aggregate produces results in unexpected format

Enrico Schumann e@ @end|ng |rom enr|co@chum@nn@net
Wed Dec 11 21:54:04 CET 2024


On Wed, 11 Dec 2024, Sorkin, John writes:

> I am trying to use the aggregate function to run a function, catsbyday2, that produces the mean, minimum, maximum, and number of observations of the values in a dataframe, inJan2Test, by levels of the dataframe variable MyDay. The output should be in the form of a dataframe.
>
> #my code:
> # This function should process a data frame and return a data frame
> # containing the mean, minimum, maximum, and number of observations
> # in the data frame for each level of MyDay.
> catsbyday2 <- function(df){
>   # Create a matrix to hold the calculated values.
>   xx <- matrix(nrow=1,ncol=4)
>   # Give names to the columns.
>   colnames(xx) <- c("Mean","min","max","Nobs")
>   cat("This is the matrix that will hold the results\n",xx,"\n")
>
>   # For each level of the indexing variable, MyDay, compute the
>   # mean, minimum, maximum, and number of observations in the
>   # dataframe passed to the function.
>   xx[,1] <- mean(df)
>   xx[,2] <- min(df)
>   xx[,3] <- max(df)
>   xx[,4] <- length(df)
>   cat("These are the dimensions of the matrix in the function",dim(xx),"\n")
>   print(xx)
>   return(xx)
> }
>
> # Create data frame
> inJan2Test <- data.frame(MyDay=rep(c(1,2,3),4),AveragePM2_5=c(10,20,30,
>                                                               11,21,31,
>                                                               12,22,32,
>                                                               15,25,35))
> str(inJan2Test)
> cat("This is the data frame","\n")
> inJan2Test
>
> xx <- aggregate(inJan2Test[,"AveragePM2_5"],list(inJan2Test[,"MyDay"]),catsbyday2,simplify=FALSE)
> xx
> class(xx)
> str(xx)
> names(xx)
>
> # Create a data frame in the format that I expect aggregate would return
> examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
> examplar
> str(examplar)
>
>
> While the output is correct (the mean, minimum, etc. are correctly calculated), the format of the output is not what I want.
>
> (1) Although the returned object appears to be a data frame, it does not appear to be a "normal" data frame. (see the output of
> (2) The column names I define in the function are not part of the data frame that is created.
> (3) The returned values on each row are separated by commas. I would expect them to be separated by spaces.
> (4) When I run str() on the output it appears that the output dataframe contains a list. 
>> str(xx)
> 'data.frame':	3 obs. of  2 variables:
>  $ Group.1: num  1 2 3
>  $ x      :List of 3
>   ..$ : num [1, 1:4] 12 10 15 4
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : NULL
>   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
>   ..$ : num [1, 1:4] 22 20 25 4
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : NULL
>   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
>   ..$ : num [1, 1:4] 32 30 35 4
>   .. ..- attr(*, "dimnames")=List of 2
>   .. .. ..$ : NULL
>   .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
>
> I want it to simply be a numeric dataframe:
>
> mean  min  max  length
>   12   10   15       4
>   22   20   25       4
>   32   30   35       4
>
> which should return the following str
>
> examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
> examplar
> str(examplar)
>
> 'data.frame':	3 obs. of  4 variables:
>  $ mean  : num  12 22 32
>  $ min   : num  10 20 30
>  $ max   : num  15 25 35
>  $ length: num  4 4 4

You'll no doubt get answers that use 'aggregate', but
for such calculations I find 'tapply' much easier/clearer:

    res <- tapply(inJan2Test$AveragePM2_5,  ## what to compute on
                  inJan2Test$MyDay,         ## what to group by
                  function(x) c(mean = mean(x),  ## what to do for each group
                                min = min(x),
                                max = max(x),
                                length = length(x)))

The result will be a list of vectors, which you can
bind together:

    do.call(rbind, res)
    ##   mean min max length
    ## 1   12  10  15      4
    ## 2   22  20  25      4
    ## 3   32  30  35      4


(Though the result is a numeric matrix. But that is
 only one 'as.data.frame' away from a data.frame, if it
 has to be one.)
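
A minimal sketch of that last step, using the data from the original post. The helper name 'stats' is made up for illustration, and putting MyDay back as a column is only my guess at what is wanted. For comparison, the second half shows one way to get a similar result with 'aggregate': called without simplify = FALSE, it should return the four values as a single matrix column, which do.call(data.frame, ...) then splits into ordinary numeric columns.

```r
## Data from the original post
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30,
                                          11, 21, 31,
                                          12, 22, 32,
                                          15, 25, 35))

## Per-group summary as a named numeric vector
## ('stats' is a hypothetical helper name, not from the original post)
stats <- function(x) c(mean = mean(x), min = min(x),
                       max = max(x), length = length(x))

## Route 1: tapply, bind the pieces, convert to a data frame
res <- tapply(inJan2Test$AveragePM2_5, inJan2Test$MyDay, stats)
m   <- do.call(rbind, res)        ## numeric matrix, one row per group
df1 <- as.data.frame(m)           ## plain numeric data frame
## optional: carry the grouping variable as a column instead of row names
df1 <- cbind(MyDay = as.numeric(rownames(m)), df1)
rownames(df1) <- NULL
str(df1)

## Route 2: aggregate with the default simplify = TRUE gives a matrix
## column; do.call(data.frame, ...) flattens it into separate columns
agg <- aggregate(AveragePM2_5 ~ MyDay, data = inJan2Test, FUN = stats)
df2 <- do.call(data.frame, agg)
str(df2)
```

With the second route the column names carry the variable name as a prefix, so some renaming may be needed if the short names matter.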

kind regards
    Enrico




> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center; 
> PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
>
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382

-- 
Enrico Schumann
Lucerne, Switzerland
https://enricoschumann.net


