[R] matching columns of model matrix to those in original data.frame

Ross Boylan ross at biostat.ucsf.edu
Sat Jul 27 04:23:10 CEST 2013


What is a reliable way to go from a column of a model matrix back to the column (or columns) of the original data frame used to build the model matrix?
I have a method that seems to work, but I don't see anything in the documentation that guarantees it will.

In particular, do the columns of the "factors" attribute of a terms object appear in the same order as its term.labels?  The documentation says the
"assign" attribute of the model matrix uses the ordering of term.labels.
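
The ordering in question can be checked empirically. The sketch below uses the built-in warpbreaks data, which is an assumption made for illustration and not the data in this post. It compares the column names of the "factors" matrix with term.labels and looks up the term label for each model matrix column. It shows the behavior for one formula only and is not a guarantee from the documentation.

```r
# Illustrative check on a built-in data set (warpbreaks is an assumption
# chosen for the example; it is not the data used in this post).
form <- ~ wool * tension
tt <- terms(form, data = warpbreaks)
mm <- model.matrix(form, warpbreaks)

labs <- attr(tt, "term.labels")   # labels of the terms in the formula
ttf  <- attr(tt, "factors")       # variables (rows) by terms (columns)
asgn <- attr(mm, "assign")        # one entry per column of mm, 0 = intercept

# Do the columns of "factors" appear in term.labels order for this formula?
same_order <- identical(colnames(ttf), labs)

# Term label for each column of mm; the intercept (assign 0) maps to NA
col_terms <- c(NA, labs)[asgn + 1]
names(col_terms) <- colnames(mm)
print(same_order)
print(col_terms)
```

If same_order is FALSE for some formula, indexing the columns of "factors" by "assign" would not be safe for that formula.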

If anyone can tell me whether this approach is reliable, or point me to one that is, I would appreciate it.

Ross Boylan

Proposed function and a little example follow.

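# return a vector v with one entry per non-intercept column of mm, in column
# order, such that data[,v[j]] contributed to that column
# (if mm has an intercept, v[j] therefore describes mm[,j+1])
# mm = model matrix produced by model.matrix(form, data)
# form = formula
# data = data frame
reverse.map <- function(mm, form, data){
    tt <- terms(form, data=data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    # this depends on assign using same order as columns of factors
    # entries in mmi that are 0 (the intercept) are silently dropped
    # drop=FALSE keeps a matrix even when only one column is selected
    ttf2 <- ttf[,mmi, drop=FALSE]
    # take the first row that contributes, so an interaction column
    # is mapped only to its first variable
    r <- apply(ttf2, 2, function(is) rownames(ttf)[is > 0][1])
    match(r, colnames(data))
}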

> ### experiment with mapping model matrix to original columns
> df <- sp2b[sample(nrow(sp2b), 8), c("pEthnic", "ethnic_sg", "rac_gay")]
> form <- ~pEthnic+ethnic_sg*rac_gay
> mm <- model.matrix(form, df)
> tt <- terms(form, data=df)
> ttf <- attr(tt, "factors")
> mmi <- attr(mm, "assign")
> df
      pEthnic ethnic_sg rac_gay
1366 Afr Amer  Afr Amer    3.25
3052 Afr Amer  Afr Amer    1.75
3012   Latino  Afr Amer    2.00
369  Afr Amer  Asian/PI    2.00
529     White  Asian/PI    2.00
194  Asian/PI  Asian/PI    3.25
126     White  Asian/PI    2.25
2147   Latino    Latino    2.75
> colnames(mm)
 [1] "(Intercept)"               "pEthnicAsian/PI"          
 [3] "pEthnicLatino"             "pEthnicOther"             
 [5] "pEthnicWhite"              "ethnic_sgAsian/PI"        
 [7] "ethnic_sgLatino"           "rac_gay"                  
 [9] "ethnic_sgAsian/PI:rac_gay" "ethnic_sgLatino:rac_gay"  
> ttf  # term "factors"
          pEthnic ethnic_sg rac_gay ethnic_sg:rac_gay
pEthnic         1         0       0                 0
ethnic_sg       0         1       0                 1
rac_gay         0         0       1                 1
> mmi  #model matrix "assign"
 [1] 0 1 1 1 1 2 2 3 4 4
> reverse.map(mm, form, df)
[1] 1 1 1 1 2 2 3 2 2
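
The result above has 9 entries for the 10 columns of mm, because the intercept is dropped, and each interaction column is mapped only to its first variable. A possible variant, sketched here under the same assumption that "assign" indexes the columns of "factors", returns one list element per column of mm holding every contributing data column. The name reverse_map_all is hypothetical, and the built-in warpbreaks data stands in for the data used in this post.

```r
# Sketch of a variant that keeps one entry per column of mm.
# reverse_map_all is a hypothetical name, not part of base R.
# Assumes attr(mm, "assign") indexes the columns of attr(terms, "factors").
reverse_map_all <- function(mm, form, data) {
    tt  <- terms(form, data = data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    out <- lapply(mmi, function(k) {
        if (k == 0) return(integer(0))        # intercept: no source column
        vars <- rownames(ttf)[ttf[, k] > 0]   # every variable in term k
        match(vars, colnames(data))           # positions in the data frame
    })
    names(out) <- colnames(mm)
    out
}

# warpbreaks (built in) is used only for illustration
form <- ~ wool * tension
mm <- model.matrix(form, warpbreaks)
res <- reverse_map_all(mm, form, warpbreaks)
str(res)
```

As in the original function, the row names of "factors" are variable expressions, so a term such as log(x) would not match a column name and would come back as NA.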


