[R] aggregate produces results in unexpected format

Sorkin, John j@ork|n @end|ng |rom @om@um@ry|@nd@edu
Wed Dec 11 21:31:10 CET 2024


I am trying to use the aggregate function to run a function, catsbyday2, that computes the mean, minimum, maximum, and number of observations of the values in a data frame, inJan2Test, for each level of the variable MyDay. I want the output to be a data frame.

#my code:
# This function should process a data frame and return a data frame
# containing the mean, minimum, maximum, and number of observations
# in the data frame for each level of MyDay.
catsbyday2 <- function(df){
  # Create a matrix to hold the calculated values.
  xx <- matrix(nrow=1,ncol=4)
  # Give names to the columns.
  colnames(xx) <- c("Mean","min","max","Nobs")
  cat("This is the matrix that will hold the results\n",xx,"\n")

  # For each level of the indexing variable, MyDay, compute the
  # mean, minimum, maximum, and number of observations in the
  # dataframe passed to the function.
  xx[,1] <- mean(df)
  xx[,2] <- min(df)
  xx[,3] <- max(df)
  xx[,4] <- length(df)
  cat("These are the dimensions of the matrix in the function",dim(xx),"\n")
  print(xx)
  return(xx)
}

# Create data frame
inJan2Test <- data.frame(MyDay=rep(c(1,2,3),4),AveragePM2_5=c(10,20,30,
                                                              11,21,31,
                                                              12,22,32,
                                                              15,25,35))
str(inJan2Test)
cat("This is the data frame","\n")
inJan2Test

xx <- aggregate(inJan2Test[,"AveragePM2_5"],list(inJan2Test[,"MyDay"]),catsbyday2,simplify=FALSE)
xx
class(xx)
str(xx)
names(xx)

# Create a data frame in the format that I expect aggregate would return
examplar <- data.frame(mean=c(12,22,32),min=c(10,20,30),max=c(15,25,35),length=c(4,4,4))
examplar
str(examplar)
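A quick way to see what aggregate() actually passes to FUN is to give it a throwaway function that only describes its argument. This sketch assumes the inJan2Test data above. The anonymous probe function and the names `probe` and `groups` are illustrative additions, not part of the original code. Each group should arrive as a plain numeric vector of length 4, not as a data frame, and split() builds the same per-group subsets by hand.

```r
# Test data from the post.
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30, 11, 21, 31,
                                          12, 22, 32, 15, 25, 35))

# Probe: instead of computing statistics, report the class and length
# of whatever aggregate() passes to FUN for each level of MyDay.
probe <- aggregate(inJan2Test[, "AveragePM2_5"], list(inJan2Test[, "MyDay"]),
                   function(v) paste(class(v), length(v)))
print(probe)

# split() builds essentially the same per-group subsets by hand.
groups <- split(inJan2Test$AveragePM2_5, inJan2Test$MyDay)
print(groups)
```

Because the argument is a vector, mean(), min(), max() and length() inside catsbyday2 operate on that vector, which is why the numbers come out right even though the function's comments describe a data frame.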


While the output is correct (the mean, minimum, etc. are correctly calculated), the format of the output is not what I want.

(1) Although the returned object appears to be a data frame, it does not appear to be a "normal" data frame (see the output of ...).
(2) The column names I define in the function are not part of the data frame that is created.
(3) The returned values on each row are separated by commas. I would expect them to be separated by spaces.
(4) When I run str() on the output, it appears that the output data frame contains a list:
> str(xx)
'data.frame':	3 obs. of  2 variables:
 $ Group.1: num  1 2 3
 $ x      :List of 3
  ..$ : num [1, 1:4] 12 10 15 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
  ..$ : num [1, 1:4] 22 20 25 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
  ..$ : num [1, 1:4] 32 30 35 4
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:4] "Mean" "min" "max" "Nobs"
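All four points trace back to the list column x: with simplify = FALSE, aggregate() stores each group's 1 x 4 matrix as one element of that list, so the column names live inside each matrix and print() shows each row's values joined by commas. A minimal sketch, assuming the xx object produced by the code above (rebuilt here so the block runs on its own, with the cat()/print() debugging calls dropped from catsbyday2), stacks those matrices with do.call(rbind, ...) and rebuilds an ordinary data frame. The names `stats` and `flat` are illustrative.

```r
# Rebuild the post's objects (catsbyday2 without its cat()/print() debugging).
catsbyday2 <- function(df) {
  xx <- matrix(nrow = 1, ncol = 4)
  colnames(xx) <- c("Mean", "min", "max", "Nobs")
  xx[, 1] <- mean(df)
  xx[, 2] <- min(df)
  xx[, 3] <- max(df)
  xx[, 4] <- length(df)
  xx
}
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30, 11, 21, 31,
                                          12, 22, 32, 15, 25, 35))
xx <- aggregate(inJan2Test[, "AveragePM2_5"], list(inJan2Test[, "MyDay"]),
                catsbyday2, simplify = FALSE)

# xx$x is a list holding one 1 x 4 matrix per group. rbind() stacks them
# into a single 3 x 4 matrix; the column names come from the first matrix.
stats <- do.call(rbind, xx$x)

# An unnamed matrix argument to data.frame() becomes ordinary columns
# named after the matrix's column names.
flat <- data.frame(MyDay = xx$Group.1, stats)
str(flat)
```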

I want it to be a simple numeric data frame:

mean min max length
  12  10  15      4
  22  20  25      4
  32  30  35      4

which should give the following str() output:


'data.frame':	3 obs. of  4 variables:
 $ mean  : num  12 22 32
 $ min   : num  10 20 30
 $ max   : num  15 25 35
 $ length: num  4 4 4
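An alternative sketch that lands directly on the examplar layout: have the summary function return a named numeric vector instead of a 1 x 4 matrix, keep aggregate()'s default simplify = TRUE, and then expand the resulting matrix column with data.frame(). The helper name `summ4` is hypothetical, and this swaps in aggregate()'s formula interface in place of the x/by form used in the original call.

```r
inJan2Test <- data.frame(MyDay = rep(c(1, 2, 3), 4),
                         AveragePM2_5 = c(10, 20, 30, 11, 21, 31,
                                          12, 22, 32, 15, 25, 35))

# Hypothetical helper: a named numeric vector, not a 1 x 4 matrix.
# With aggregate()'s default simplify = TRUE, the per-group vectors are
# combined into one matrix column whose column names are these names.
summ4 <- function(v) {
  c(mean = mean(v), min = min(v), max = max(v), length = length(v))
}

res <- aggregate(AveragePM2_5 ~ MyDay, data = inJan2Test, FUN = summ4)

# res$AveragePM2_5 is a 3 x 4 matrix; data.frame() expands it into
# four plain numeric columns alongside MyDay.
out <- data.frame(MyDay = res$MyDay, res$AveragePM2_5)
str(out)
```

Because c() coerces the integer from length() to double, the length column comes out as num, as in examplar. out[-1] drops the MyDay column if only the four statistics are wanted, although keeping it records which day each row summarizes.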

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center; 
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




