[R] means over factors in mlm terms
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Nov 21 17:38:32 CET 2006
I am not sure I understand the question -- is it how to get the same
result you did but without warnings? The warnings are, of course,
there since you are trying to take means of factors and its not clear
what that really is supposed to do.
At any rate you could eliminate the factors and then take the means
of what is left or you could convert the factors to numeric if that makes
any sense. Here are a few alternatives using aggregate and
summaryBy from the doBy package
aggregate(data.matrix(soils), soils["Contour"], mean)
# same but removing factors
aggregate(data.matrix(soils[,-(2:4)]), soils["Contour"], mean)
library(doBy)
summaryBy(. ~ Contour, soils, keep.names = TRUE)
# same but removing factors
summaryBy(.~ Contour, soils[,-(3:4)], keep.names = TRUE)
On 11/21/06, Michael Friendly <friendly at yorku.ca> wrote:
> I'm trying to write a function to find the means over factors of the
> responses in a mlm (something I would do easily in SAS with PROC SUMMARY).
>
> The not-working stub of a function to do what I want is below,
> and my problem is that I don't know how to call aggregate (or
> some other function) in the context of terms in a linear model
> extracted from a lm/mlm object.
>
> means.mlm <- function(mod, terms, data) {
> responses <-dimnames(mod$fitted.values)[[2]]
> if (missing(terms)) terms <- mod$terms
> n.terms <- length(terms)
>
> for (term in 1:n.terms){
> label <- attr(terms[term], "term.labels")
> cat(label,":\n")
> means <- aggregate(data, list(label), FUN=mean)
> print(means)
> }
> }
>
> Here is a sample context-- a randomized block design
> with two crossed factors (Contour, Depth) and 9 responses (pH, N, ...
> Conduc). [An R-readable version of the data is appended at the bottom.]
>
> > soils <- read.delim("soils.dat")
> > str(soils)
> 'data.frame': 48 obs. of 14 variables:
> $ Group : int 1 1 1 1 2 2 2 2 3 3 ...
> $ Contour: Factor w/ 3 levels "Depression","Slope",..: 3 3 3 3 3 3 3 3
> 3 3 ...
> $ Depth : Factor w/ 4 levels "0-10","10-30",..: 1 1 1 1 2 2 2 2 3 3 ...
> $ Gp : Factor w/ 12 levels "D0","D1","D3",..: 9 9 9 9 10 10 10 10
> 11 11 ...
> $ Block : int 1 2 3 4 1 2 3 4 1 2 ...
> $ pH : num 5.4 5.65 5.14 5.14 5.14 5.1 4.7 4.46 4.37 4.39 ...
> $ N : num 0.188 0.165 0.26 0.169 0.164 0.094 0.1 0.112 0.112
> 0.058 ...
> $ Dens : num 0.92 1.04 0.95 1.1 1.12 1.22 1.52 1.47 1.07 1.54 ...
> $ P : int 215 208 300 248 174 129 117 170 121 115 ...
> $ Ca : num 16.4 12.3 13.0 11.9 14.2 ...
> $ Mg : num 7.65 5.15 5.68 7.88 8.12 ...
> $ K : num 0.72 0.71 0.68 1.09 0.7 0.81 0.39 0.7 0.74 0.77 ...
> $ Na : num 1.14 0.94 0.6 1.01 2.17 2.67 3.32 3.76 5.74 5.85 ...
> $ Conduc : num 1.09 1.35 1.41 1.64 1.85 3.18 4.16 5.14 5.73 6.45 ...
> >
> I fit a mlm model,
>
> > attach(soils)
> > soils.mod <- lm(cbind(pH,N,Dens,P,Ca,Mg,K,Na,Conduc) ~ Block +
> Contour*Depth)
>
> and then I want to get tables of the means over the factors in the model
> terms. From the console, I can get the means for a factor or model
> term (but with annoying warning messages
>
> > (m1<-aggregate(soils, list(Contour), mean))
> Group.1 Group Contour Depth Gp Block pH N Dens
> P Ca Mg
> 1 Depression 10.5 NA NA NA 2.5 4.691875 0.0873125 1.343125
> 188.1875 7.101250 8.98625
> 2 Slope 6.5 NA NA NA 2.5 4.746250 0.1059375 1.332500
> 159.4375 8.109375 8.32000
> 3 Top 2.5 NA NA NA 2.5 4.570000 0.1125625 1.271875
> 150.8750 8.877500 8.08750
> K Na Conduc
> 1 0.378125 5.823125 6.945625
> 2 0.415625 6.103125 6.963750
> 3 0.605000 4.872500 5.856250
> Warning messages:
> 1: argument is not numeric or logical: returning NA in:
> mean.default(X[[1]], ...)
> 2: argument is not numeric or logical: returning NA in:
> mean.default(X[[2]], ...)
> ... (suppressed) ...
> 9: argument is not numeric or logical: returning NA in:
> mean.default(X[[3]], ...)
> >
>
> When I call my function, I get:
>
> > means.mlm(soils.mod, data=soils)
> Block :
> Error in FUN(X[[1]], ...) : arguments must have same length
> > traceback()
> 6: stop("arguments must have same length")
> 5: FUN(X[[1]], ...)
> 4: lapply(x, tapply, by, FUN, ..., simplify = FALSE)
> 3: aggregate.data.frame(data, list(label), FUN = mean)
> 2: aggregate(data, list(label), FUN = mean)
> 1: means.mlm(soils.mod, data = soils)
> >
>
> The soils data:
>
> soils <-
> structure(list(Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
> 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9,
> 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12), Contour =
> structure(c(3,
> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1), .Label = c("Depression", "Slope", "Top"), class =
> "factor"),
> Depth = structure(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4,
> 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4,
> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), .Label = c("0-10",
> "10-30", "30-60", "60-90"), class = "factor"), Gp = structure(c(9,
> 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12,
> 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 1, 1, 1,
> 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), .Label = c("D0",
> "D1", "D3", "D6", "S0", "S1", "S3", "S6", "T0", "T1", "T3",
> "T6"), class = "factor"), Block = c(1, 2, 3, 4, 1, 2, 3,
> 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
> 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1,
> 2, 3, 4), pH = c(5.4, 5.65, 5.14, 5.14, 5.14, 5.1, 4.7, 4.46,
> 4.37, 4.39, 4.17, 3.89, 3.88, 4.07, 3.88, 3.74, 5.11, 5.46,
> 5.61, 5.85, 4.57, 5.11, 4.78, 6.67, 3.96, 4, 4.12, 4.99,
> 3.8, 3.96, 3.93, 4.02, 5.24, 5.2, 5.3, 5.67, 4.46, 4.91,
> 4.79, 5.36, 3.94, 4.52, 4.35, 4.64, 3.82, 4.24, 4.22, 4.41
> ), N = c(0.188, 0.165, 0.26, 0.169, 0.164, 0.094, 0.1, 0.112,
> 0.112, 0.058, 0.078, 0.07, 0.077, 0.046, 0.055, 0.053, 0.247,
> 0.298, 0.145, 0.186, 0.102, 0.097, 0.122, 0.083, 0.059, 0.05,
> 0.086, 0.048, 0.049, 0.036, 0.048, 0.039, 0.194, 0.256, 0.136,
> 0.127, 0.087, 0.092, 0.047, 0.095, 0.054, 0.051, 0.032, 0.065,
> 0.038, 0.035, 0.03, 0.058), Dens = c(0.92, 1.04, 0.95, 1.1,
> 1.12, 1.22, 1.52, 1.47, 1.07, 1.54, 1.26, 1.42, 1.25, 1.54,
> 1.53, 1.4, 0.94, 0.96, 1.1, 1.2, 1.37, 1.3, 1.3, 1.42, 1.53,
> 1.5, 1.55, 1.46, 1.48, 1.28, 1.42, 1.51, 1, 0.78, 1, 1.13,
> 1.24, 1.47, 1.46, 1.26, 1.6, 1.53, 1.55, 1.46, 1.4, 1.47,
> 1.56, 1.58), P = c(215, 208, 300, 248, 174, 129, 117, 170,
> 121, 115, 112, 117, 127, 91, 91, 79, 261, 300, 242, 229,
> 156, 139, 214, 132, 98, 115, 148, 97, 108, 103, 109, 100,
> 445, 380, 259, 248, 276, 158, 121, 195, 148, 115, 82, 152,
> 105, 100, 97, 130), Ca = c(16.35, 12.25, 13.02, 11.92, 14.17,
> 8.55, 8.74, 9.49, 8.85, 4.73, 6.29, 6.61, 6.41, 3.82, 4.98,
> 5.86, 13.25, 12.3, 9.66, 13.78, 8.58, 8.58, 8.22, 12.68,
> 4.8, 5.06, 6.16, 7.49, 3.82, 4.78, 4.93, 5.66, 12.27, 11.39,
> 9.96, 9.12, 7.24, 7.37, 6.99, 8.59, 4.85, 6.34, 5.99, 4.43,
> 4.65, 4.56, 5.29, 4.58), Mg = c(7.65, 5.15, 5.68, 7.88, 8.12,
> 6.92, 8.16, 9.16, 10.35, 6.91, 7.95, 9.76, 10.96, 6.61, 8,
> 10.14, 7.55, 7.5, 6.76, 7.12, 9.92, 8.69, 7.75, 9.56, 10,
> 8.91, 7.58, 9.38, 8.8, 7.29, 7.47, 8.84, 6.27, 7.55, 8.08,
> 7.04, 9.4, 10.57, 9.91, 8.66, 9.62, 9.78, 9.73, 10.54, 9.85,
> 8.95, 8.37, 9.46), K = c(0.72, 0.71, 0.68, 1.09, 0.7, 0.81,
> 0.39, 0.7, 0.74, 0.77, 0.26, 0.41, 0.56, 0.5, 0.23, 0.41,
> 0.61, 0.68, 0.63, 0.62, 0.63, 0.42, 0.32, 0.55, 0.36, 0.28,
> 0.16, 0.4, 0.24, 0.24, 0.14, 0.37, 0.72, 0.78, 0.45, 0.55,
> 0.43, 0.59, 0.3, 0.48, 0.18, 0.34, 0.22, 0.22, 0.18, 0.33,
> 0.14, 0.14), Na = c(1.14, 0.94, 0.6, 1.01, 2.17, 2.67, 3.32,
> 3.76, 5.74, 5.85, 5.3, 8.3, 9.67, 7.67, 8.78, 11.04, 1.86,
> 2, 1.01, 3.09, 3.67, 4.7, 3.07, 8.3, 6.52, 7.91, 6.39, 9.7,
> 9.57, 9.67, 9.65, 10.54, 1.02, 1.63, 1.97, 1.43, 4.17, 5.07,
> 5.15, 4.17, 7.2, 8.52, 7.02, 7.61, 10.15, 10.51, 8.27, 9.28
> ), Conduc = c(1.09, 1.35, 1.41, 1.64, 1.85, 3.18, 4.16, 5.14,
> 5.73, 6.45, 8.37, 9.21, 10.64, 10.07, 11.26, 12.15, 2.61,
> 1.98, 0.76, 2.85, 3.24, 4.63, 3.67, 8.1, 7.72, 9.78, 9.07,
> 9.13, 11.57, 11.42, 13.32, 11.57, 0.75, 2.2, 2.27, 0.67,
> 5.08, 6.37, 6.82, 3.65, 10.14, 9.74, 8.6, 9.09, 12.26, 11.29,
> 9.51, 12.69)), .Names = c("Group", "Contour", "Depth", "Gp",
> "Block", "pH", "N", "Dens", "P", "Ca", "Mg", "K", "Na", "Conduc"
> ), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
> "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16",
> "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27",
> "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38",
> "39", "40", "41", "42", "43", "44", "45", "46", "47", "48"))
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT M3J 1P3 CANADA
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list