[R] manipulating data via the factors of a term in a lm()

Michael Friendly friendly at yorku.ca
Thu Apr 16 18:05:40 CEST 2009


[Env: R 2.8.1, Win XP]

For a package I'm working on, I need two small helper functions to 
manipulate the
data used in an lm or mlm object, given the *name* of a term, which will 
always be
a character string representing
a factor ("A") or an interaction of two or more factors ("A:B", "A:B:C", 
...).  I'd
prefer not to have to require() additional packages if possible. The 
functions I need
are
[1] termMeans(mod, term): Find the (possibly vector) response means for 
a model term, specified by a string, "A", "A:B", ...
[2] dataIndices(mod, term): Find indices of the observations in the data 
corresponding to the combination of levels in the term

Here is a small example to illustrate what I need: a 3-factor mlm

factors <- expand.grid(A=factor(1:3),B=factor(1:2),C=factor(1:2))
n <- nrow(factors)
responses <-data.frame(Y1=10+round(10*rnorm(n)),Y2=10+round(10*rnorm(n)))
test <- data.frame(factors, responses)
mod <- lm(cbind(Y1,Y2) ~ A*B, data=test)

[1] Find the means for a model term, specified by a string, "A", "A:B", 
"A:C", ...
Similar to:

 > # main effect means
 >  aggregate(responses,by=list(test$A), mean)
  Group.1    Y1    Y2
1       1 12.25  9.50
2       2 10.75  6.25
3       3 11.75 11.00
4       4 10.25  4.25
 > 
 > #interaction means
 >  aggregate(responses,by=list(test$A,test$B), mean)
  Group.1 Group.2   Y1   Y2
1       1       1 16.0  5.5
2       2       1 13.5  8.0
3       3       1 16.0  7.5
4       4       1 10.0 10.5
5       1       2  8.5 13.5
6       2       2  8.0  4.5
7       3       2  7.5 14.5
8       4       2 10.5 -2.0
 > 

Here is what I've tried so far:

# find means for a model term in an lm or mlm object
termMeans <- function(mod, term){
    data <- model.frame(mod)
    Y <- model.response(data)
    factors <- data[, sapply(data, is.factor), drop=FALSE]
    term.factors <- strsplit(term, ":")
    means <- aggregate(Y, by=factors[,term.factors], mean)
}

but I'm missing something.  Maybe there is an easier path using apply() 
or by()?

 > termMeans(mod, "A")
Error in .subset(x, j) : invalid subscript type 'list'

[2] For some plots, I need to be able to assign symbol characteristics 
(col, pch, etc) by indexing a vector
in the form of arguments
     col=col[index], pch=pch[index], ...

where index is a vector of integers 1:# levels in a model term.  Thus, 
for my example, with the data 'test'

 > test
   A B C  Y1 Y2
1  1 1 1  22  6
2  2 1 1  22  2
3  3 1 1  16 11
4  4 1 1   2 13
5  1 2 1   0 19
6  2 2 1  -2 -7
7  3 2 1   5  8
8  4 2 1  37 -9
9  1 1 2  10  5
10 2 1 2   5 14
11 3 1 2  16  4
12 4 1 2  18  8
13 1 2 2  17  8
14 2 2 2  18 16
15 3 2 2  10 21
16 4 2 2 -16  5


I need a similar function dataIndex(mod, term):

dataIndex(mod,"A")
--> same as

 > test$A
 [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

dataIndex(mod,"A:B") should deliver

[1] 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

# find sequential indices for observations in a data frame
# corresponding to the unique combinations of the levels
# of a given model term

dataIndex <- function(mod, term){
    data <- model.frame(mod)
    Y <- model.response(data)
    factors <- data[, sapply(data, is.factor), drop=FALSE]
    term.factors <- strsplit(term, ":")
    ???
}


thanks,
-Michael


-- 
Michael Friendly     Email: friendly AT yorku DOT ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA




More information about the R-help mailing list