[R] matching columns of model matrix to those in original data.frame
Ross Boylan
ross at biostat.ucsf.edu
Sat Jul 27 04:23:10 CEST 2013
What is a reliable way to go from a column of a model matrix back to the column (or columns) of the original data source used to make the model
matrix? I can come up with a method that seems to work, but I don't see guarantees in the documentation that it will.
In particular, does the order of term.labels match the order of the columns of the "factors" attribute in a terms object? The documentation says the
"assign" attribute of a model matrix uses the ordering of term.labels.
If anyone can tell me whether this approach is reliable, or point me to one that is, I would appreciate it.
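One way to probe the ordering question empirically, as a minimal sketch rather than a guarantee, is to build a terms object and a model matrix from toy data and compare the column names of the "factors" attribute with term.labels. The data frame d and its columns f, g and x are made up for illustration. A check like this only shows agreement for the formulas tried, not that the documentation promises it.

```r
# Toy data; the names d, f, g, x are hypothetical, chosen only for illustration.
d <- data.frame(f = factor(c("a", "b", "c", "a", "b", "c")),
                g = factor(c("u", "v", "u", "v", "u", "v")),
                x = c(1.5, 2.0, 2.5, 3.0, 3.5, 4.0))
form <- ~ f + g * x

tt <- terms(form, data = d)
mm <- model.matrix(form, d)

labels <- attr(tt, "term.labels")          # labels of the terms
fcols  <- colnames(attr(tt, "factors"))    # columns of the variables-by-terms matrix
asg    <- attr(mm, "assign")               # term index for each column of mm

# Do the columns of "factors" come in the same order as term.labels here?
same_order <- identical(fcols, labels)

# Every nonzero assign value should index an existing term.
valid_index <- all(asg[asg > 0] <= length(labels))

print(same_order)
print(valid_index)
```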
Ross Boylan
Proposed function and a little example follow.
# Return a vector v such that data[, v[i]] contributed to the i-th
# non-intercept column of mm.
# mm   = model matrix produced by model.matrix(form, data)
# form = formula
# data = data frame used to build mm
reverse.map <- function(mm, form, data){
    tt <- terms(form, data=data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    # this depends on assign using same order as columns of factors
    # entries in mmi that are 0 (the intercept) are silently dropped
    # drop=FALSE keeps a matrix even when only one column is selected
    ttf2 <- ttf[, mmi, drop=FALSE]
    # take the first row that contributes
    r <- apply(ttf2, 2, function(is) rownames(ttf)[is > 0][1])
    match(r, colnames(data))
}
> ### experiment with mapping model matrix to original columns
> df <- sp2b[sample(nrow(sp2b), 8), c("pEthnic", "ethnic_sg", "rac_gay")]
> form <- ~pEthnic+ethnic_sg*rac_gay
> mm <- model.matrix(form, df)
> tt <- terms(form, data=df)
> ttf <- attr(tt, "factors")
> mmi <- attr(mm, "assign")
> df
      pEthnic ethnic_sg rac_gay
1366 Afr Amer  Afr Amer    3.25
3052 Afr Amer  Afr Amer    1.75
3012   Latino  Afr Amer    2.00
369  Afr Amer  Asian/PI    2.00
529     White  Asian/PI    2.00
194  Asian/PI  Asian/PI    3.25
126     White  Asian/PI    2.25
2147   Latino    Latino    2.75
> colnames(mm)
[1] "(Intercept)" "pEthnicAsian/PI"
[3] "pEthnicLatino" "pEthnicOther"
[5] "pEthnicWhite" "ethnic_sgAsian/PI"
[7] "ethnic_sgLatino" "rac_gay"
[9] "ethnic_sgAsian/PI:rac_gay" "ethnic_sgLatino:rac_gay"
> ttf # term "factors"
          pEthnic ethnic_sg rac_gay ethnic_sg:rac_gay
pEthnic         1         0       0                 0
ethnic_sg       0         1       0                 1
rac_gay         0         0       1                 1
> mmi #model matrix "assign"
[1] 0 1 1 1 1 2 2 3 4 4
> reverse.map(mm, form, df)
[1] 1 1 1 1 2 2 3 2 2
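The function above keeps only the first contributing column and drops the intercept, so its result is one element shorter than ncol(mm). A possible variant, sketched under the same assumption that "assign" indexes the columns of "factors", returns every contributing data column for each model-matrix column. The name reverse.map.all and the toy data frame d are hypothetical. Variables that appear inside a function call in the formula, such as log(x), would not match a column name and would come back as NA.

```r
# Hypothetical variant of reverse.map: one list element per column of mm,
# holding the indices of all data columns that feed into that column.
# Assumes attr(mm, "assign") indexes the columns of attr(terms, "factors").
reverse.map.all <- function(mm, form, data) {
    tt  <- terms(form, data = data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    out <- lapply(mmi, function(k) {
        if (k == 0) return(integer(0))       # intercept: no source column
        vars <- rownames(ttf)[ttf[, k] > 0]  # variables appearing in term k
        match(vars, colnames(data))
    })
    names(out) <- colnames(mm)
    out
}

# Toy data; the names d, f, g, x are made up for illustration.
d <- data.frame(f = factor(c("a", "b", "c", "a", "b", "c")),
                g = factor(c("u", "v", "u", "v", "u", "v")),
                x = c(1.5, 2.0, 2.5, 3.0, 3.5, 4.0))
form <- ~ f + g * x
mm <- model.matrix(form, d)
res <- reverse.map.all(mm, form, d)
print(res)
```

Returning a list rather than a vector keeps the result aligned with the columns of mm and lets an interaction column report both of its source columns.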