[R] About the interaction A:B

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Fri Mar 5 18:19:15 CET 2010


You neglected to state your name and affiliation, and your question 
demonstrates an allergy to R documentation.

Frank

blue sky wrote:
> The following is the code for the model.matrix. But it still doesn't
> answer why A:B is interpreted differently in Y~A+B+A:B and Y~A:B. By
> 'why', I mean how R internally does it and what is the rational behind
> the way of doing it?
> 
> And it didn't answer why in the model.matrix of Y~A, there are a-1
> terms from A plus the intercept, but in the model.matrix of Y~A:B,
> there are a*b terms (rather than a*b-1 terms) plus the intercept.
> Since the one coefficient of the lm of Y~A:B is going to be NA, why
> bother to include the corresponding term in the model matrix?
> 
> ############code below
> 
> set.seed(0)
> 
> a=3
> b=4
> 
> AB_effect=data.frame(
>   name=paste(
>     unlist(
>       do.call(
>         rbind
>         , rep(list(paste('A', letters[1:a],sep='')), b)
>         )
>       )
>     , unlist(
>       do.call(
>         cbind
>         , rep(list(paste('B', letters[1:b],sep='')), a)
>         )
>       )
>     , sep=':'
>     )
>   , value=rnorm(a*b)
>   , stringsAsFactors=F
>   )
> 
> max_n=10
> n=sample.int(max_n, a*b, replace=T)
> 
> AB=mapply(function(name, n){rep(name,n)}, AB_effect$name, n)
> 
> Y=AB_effect$value[match(unlist(AB), AB_effect$name)]
> 
> Y=Y+a*b*rnorm(length(Y))
> 
> sub_fr=as.data.frame(do.call(rbind, strsplit(unlist(AB), ':')))
> rownames(sub_fr)=NULL
> colnames(sub_fr)=c('A', 'B')
> 
> fr=data.frame(Y=Y,sub_fr)
> 
> my_subset=function(amm) {
>   coding=apply(
>     amm
>     ,1
>     , function(x) {
>       paste(x, collapse='')
>     }
>     )
>   amm[match(unique(coding), coding),]
> }
> 
> my_subset(model.matrix(Y ~ A*B,fr))
> my_subset(model.matrix(Y ~ (A+B)^2,fr))
> my_subset(model.matrix(Y ~ A + B + A:B,fr))
> my_subset(model.matrix(Y ~ A:B - 1,fr))
> my_subset(model.matrix(Y ~ A:B,fr))
> 
> On Fri, Mar 5, 2010 at 8:45 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> The way to understand this is to look at the output of model.matrix:
>>
>> model.matrix(fo, fr)
>>
>> for each formula you tried.  If your data is large you will have to
>> use a subset not to be overwhelmed with output.
>>
>> On Fri, Mar 5, 2010 at 9:08 AM, blue sky <bluesky315 at gmail.com> wrote:
>>> Suppose, 'fr' is data.frame with columns 'Y', 'A' and 'B'. 'A' has levels 'Aa'
>>> 'Ab' and 'Ac', and 'B' has levels 'Ba', 'Bb', 'Bc' and 'Bd'. 'Y'
>>> columns are numbers.
>>>
>>> I tried the following three sets of commands. I understand that A*B is
>>> equivalent to A+B+A:B. However, A:B in A+B+A:B is different from A:B
>>> just by itself (see the 3rd and 4th set of commands). Would you please
>>> help me understand why the meanings of A:B are different in different
>>> contexts?
>>>
>>> I also see the coefficient of AAc:BBd is NA (the last set of
>>> commands). I'm wondering why this coefficient is not removed from the
>>> 'coefficients' vector. Since lm(Y~A) has coefficients for (intercept),
>>> Ab, Ac (there are no NA's), I think that it is reasonable to make sure
>>> that there are no NA's when there are interaction terms, namely, A:B
>>> in this case.
>>>
>>> Thank you for answering my questions!
>>>
>>>> alm=lm(Y ~ A*B,fr)
>>>> alm$coefficients
>>> (Intercept)         AAb         AAc         BBb         BBc         BBd
>>>  -3.548176   -2.086586    7.003857    4.367800   11.887356   -3.470840
>>>   AAb:BBb     AAc:BBb     AAb:BBc     AAc:BBc     AAb:BBd     AAc:BBd
>>>  5.160865  -11.858425  -12.853116  -20.289611    6.727401   -2.327173
>>>> alm=lm(Y ~ A + B + A:B,fr)
>>>> alm$coefficients
>>> (Intercept)         AAb         AAc         BBb         BBc         BBd
>>>  -3.548176   -2.086586    7.003857    4.367800   11.887356   -3.470840
>>>   AAb:BBb     AAc:BBb     AAb:BBc     AAc:BBc     AAb:BBd     AAc:BBd
>>>  5.160865  -11.858425  -12.853116  -20.289611    6.727401   -2.327173
>>>> alm=lm(Y ~ A:B - 1,fr)
>>>> alm$coefficients
>>>  AAa:BBa    AAb:BBa    AAc:BBa    AAa:BBb    AAb:BBb    AAc:BBb    AAa:BBc
>>> -3.5481765 -5.6347625  3.4556808  0.8196231  3.8939016 -4.0349449  8.3391795
>>>  AAb:BBc    AAc:BBc    AAa:BBd    AAb:BBd    AAc:BBd
>>> -6.6005222 -4.9465744 -7.0190168 -2.3782017 -2.3423322
>>>> alm=lm(Y ~ A:B,fr)
>>>> alm$coefficients
>>> (Intercept)     AAa:BBa     AAb:BBa     AAc:BBa     AAa:BBb     AAb:BBb
>>> -2.34233221 -1.20584424 -3.29243033  5.79801305  3.16195534  6.23623377
>>>   AAc:BBb     AAa:BBc     AAb:BBc     AAc:BBc     AAa:BBd     AAb:BBd
>>> -1.69261273 10.68151168 -4.25819000 -2.60424217 -4.67668454 -0.03586951
>>>   AAc:BBd
>>>        NA
>>>

-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University



More information about the R-help mailing list