[R] manipulating data via the factors of a term in a lm()
Michael Friendly
friendly at yorku.ca
Thu Apr 16 18:05:40 CEST 2009
[Env: R 2.8.1, Win XP]
For a package I'm working on, I need two small helper functions to
manipulate the
data used in an lm or mlm object, given the *name* of a term, which will
always be
a character string representing
a factor ("A") or an interaction of two or more factors ("A:B", "A:B:C",
...). I'd
prefer not to have to require() additional packages if possible. The
functions I need
are
[1] termMeans(mod, term): Find the (possibly vector) response means for
a model term, specified by a string, "A", "A:B", ...
[2] dataIndices(mod, term): Find indices of the observations in the data
corresponding to the combination of levels in the term
Here is a small example to illustrate what I need: a 3-factor mlm
factors <- expand.grid(A=factor(1:3),B=factor(1:2),C=factor(1:2))
n <- nrow(factors)
responses <-data.frame(Y1=10+round(10*rnorm(n)),Y2=10+round(10*rnorm(n)))
test <- data.frame(factors, responses)
mod <- lm(cbind(Y1,Y2) ~ A*B, data=test)
[1] Find the means for a model term, specified by a string, "A", "A:B",
"A:C", ...
Similar to:
> # main effect means
> aggregate(responses,by=list(test$A), mean)
Group.1 Y1 Y2
1 1 12.25 9.50
2 2 10.75 6.25
3 3 11.75 11.00
4 4 10.25 4.25
>
> #interaction means
> aggregate(responses,by=list(test$A,test$B), mean)
Group.1 Group.2 Y1 Y2
1 1 1 16.0 5.5
2 2 1 13.5 8.0
3 3 1 16.0 7.5
4 4 1 10.0 10.5
5 1 2 8.5 13.5
6 2 2 8.0 4.5
7 3 2 7.5 14.5
8 4 2 10.5 -2.0
>
Here is what I've tried so far:
# find means for a model term in an lm or mlm object
termMeans <- function(mod, term){
data <- model.frame(mod)
Y <- model.response(data)
factors <- data[, sapply(data, is.factor), drop=FALSE]
term.factors <- strsplit(term, ":")
means <- aggregate(Y, by=factors[,term.factors], mean)
}
but I'm missing something. Maybe there is an easier path using apply()
or by()?
> termMeans(mod, "A")
Error in .subset(x, j) : invalid subscript type 'list'
[2] For some plots, I need to be able to assign symbol characteristics
(col, pch, etc) by indexing a vector
in the form of arguments
col=col[index], pch=pch[index], ...
where index is a vector of integers 1:# levels in a model term. Thus,
for my example, with the data 'test'
> test
A B C Y1 Y2
1 1 1 1 22 6
2 2 1 1 22 2
3 3 1 1 16 11
4 4 1 1 2 13
5 1 2 1 0 19
6 2 2 1 -2 -7
7 3 2 1 5 8
8 4 2 1 37 -9
9 1 1 2 10 5
10 2 1 2 5 14
11 3 1 2 16 4
12 4 1 2 18 8
13 1 2 2 17 8
14 2 2 2 18 16
15 3 2 2 10 21
16 4 2 2 -16 5
I need a similar function dataIndex(mod, term):
dataIndex(mod,"A")
--> same as
> test$A
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
dataIndex(mod,"A:B") should deliver
[1] 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
# find sequential indices for observations in a data frame
# corresponding to the unique combinations of the levels
# of a given model term
dataIndex <- function(mod, term){
data <- model.frame(mod)
Y <- model.response(data)
factors <- data[, sapply(data, is.factor), drop=FALSE]
term.factors <- strsplit(term, ":")
???
}
thanks,
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
More information about the R-help
mailing list