[R] aggregate formula - differing results

Ivan Calandra |v@n@c@|@ndr@ @end|ng |rom |e|z@@de
Mon Sep 4 11:44:13 CEST 2023


Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot 
explain. Any help would be appreciated!

Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103", 
"HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41, 
121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83, 
1656.46, 1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2), 
Length = c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854, 
32.075, 21.337, 35.459), Width = c(45.982, 67.303, 52.679, 26.42, 
25.149, 33.427, 20.683, 62.783, 26.417, 35.297), PLATWIDTH = c(38.84, 
NA, 15.33, 30.37, 11.44, 14.88, 13.86, NA, NA, 26.71), PLATTHICK = 
c(8.67, NA, 7.99, 11.69, 3.3, 16.52, 4.58, NA, NA, 9.35), EPA = c(78, 
NA, 78, 54, 72, 49, 56, NA, NA, 56), THICKNESS = c(10.97, NA, 9.36, 6.4, 
5.89, 11.05, 4.9, NA, NA, 10.08), WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9, 
29.5, 4.5, NA, NA, 23), RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT", 
"FLINT", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")), 
row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), class = 
"data.frame")

1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean, 
na.rm = TRUE)

2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using dplyr, I get the same results as #1:
group_by(my_data, RAWMAT) %>%
   summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

4) It gets weirder: using all columns in #1 give the same results as in 
#2 but different from #1 and #3
aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH, 
PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT, data = my_data, FUN = mean, 
na.rm = TRUE)

So it seems it is not only due to the notation (cbind() vs. dot). Is it 
a bug? A peculiar thing in my dataset? I tend to think this could be due 
to some variables (or their names) as all notations seem to agree when I 
remove some variables (although I haven't found out which variable(s) is 
(are) at fault), e.g.:

my_data2 <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", "HORN-103", 
"HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41, 
121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = c(1736.87, 1571.83, 
1656.46, 1247.18, 1177.47, 1169.26, 444.61, 1791.48, 461.15, 1127.2), 
Length = c(44.384, 29.831, 43.869, 48.011, 54.109, 41.742, 23.854, 
32.075, 21.337, 35.459), Width = c(45.982, 67.303, 52.679, 26.42, 
25.149, 33.427, 20.683, 62.783, 26.417, 35.297), RAWMAT = c("FLINT", 
"FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", "HORNFELS", "HORNFELS", 
"HORNFELS", "HORNFELS")), row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 
113L, 114L, 115L), class = "data.frame")

aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT, data = 
my_data2, FUN = mean, na.rm = TRUE)

aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)

group_by(my_data2, RAWMAT) %>%
   summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))


Thank you in advance for any hint.
Best wishes,
Ivan




	*LEIBNIZ-ZENTRUM*
*FÜR ARCHÄOLOGIE*

*Dr. Ivan CALANDRA*
**Head of IMPALA (IMaging Platform At LeizA)

*MONREPOS* Archaeological Research Centre, Schloss Monrepos
56567 Neuwied, Germany

T: +49 2631 9772 243
T: +49 6131 8885 543
ivan.calandra using leiza.de

leiza.de <http://www.leiza.de/>
<http://www.leiza.de/>
ORCID <https://orcid.org/0000-0003-3816-6359>
ResearchGate
<https://www.researchgate.net/profile/Ivan_Calandra>

LEIZA is a foundation under public law of the State of 
Rhineland-Palatinate and the City of Mainz. Its headquarters are in 
Mainz. Supervision is carried out by the Ministry of Science and Health 
of the State of Rhineland-Palatinate. LEIZA is a research museum of the 
Leibniz Association.


More information about the R-help mailing list