[R] Translating lm.object to SQL, C, etc function

Fri Feb 14 13:54:03 CET 2003

Dear Jeremy,

I've written replacements for the standard R contrast functions that 
produce the kind of more easily parsed (and more readable) contrast names 
that I think you have in mind. I intend to include these in the next 
release of the car package for R but haven't done so yet. Since the code 
isn't very long, I've appended it (and the .Rd documentation file to this 
note). Note that R does separate terms in an interaction with a colon.

I hope that this does what you need.
  John

---------------------------- Contrasts.R -----------------------------

# last modified 2 Dec 2002 by J. Fox
# all of these functions are adapted from functions in the R base package

contr.Treatment <- function (n, base = 1, contrasts = TRUE) {
     if (is.numeric(n) && length(n) == 1)
         levs <- 1:n
     else {
         levs <- n
         n <- length(n)
     }
     lev.opt <- getOption("decorate.contrasts")
     pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
     suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
     dec <- getOption("decorate.contr.Treatment")
     dec <- if (!contrasts) ""
            else if (is.null(dec)) "T."
            else dec
     contr.names <- paste(pre, dec, levs, suf, sep="")
     contr <- array(0, c(n, n), list(levs, contr.names))
     diag(contr) <- 1
     if (contrasts) {
         if (n < 2)
             stop(paste("Contrasts not defined for", n - 1, "degrees of 
freedom"))
         if (base < 1 | base > n)
             stop("Baseline group number out of range")
         contr <- contr[, -base, drop = FALSE]
     }
     contr
}

contr.Sum <- function (n, contrasts = TRUE)
{
     if (length(n) <= 1) {
         if (is.numeric(n) && length(n) == 1 && n > 1)
             levels <- 1:n
         else stop("Not enough degrees of freedom to define contrasts")
     }
     else levels <- n
     lenglev <- length(levels)
     lev.opt <- getOption("decorate.contrasts")
     pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
     suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
     dec <- getOption("decorate.contr.Sum")
     dec <- if (!contrasts) ""
            else if (is.null(dec)) "S."
            else dec
     show.lev <- getOption("contr.Sum.show.levels")
     contr.names <- if ((is.null(show.lev)) || show.lev) paste(pre, dec, 
levels, suf, sep="")
     if (contrasts) {
         cont <- array(0, c(lenglev, lenglev - 1), list(levels,
             contr.names[-lenglev]))
         cont[col(cont) == row(cont)] <- 1
         cont[lenglev, ] <- -1
     }
     else {
         cont <- array(0, c(lenglev, lenglev), list(levels,
             contr.names))
         cont[col(cont) == row(cont)] <- 1
     }
     cont
}

contr.Helmert <- function (n, contrasts = TRUE)
{
     if (length(n) <= 1) {
         if (is.numeric(n) && length(n) == 1 && n > 1)
             levels <- 1:n
         else stop("contrasts are not defined for 0 degrees of freedom")
     }
     else levels <- n
     lenglev <- length(levels)
     lev.opt <- getOption("decorate.contrasts")
     pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
     suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
     dec <- getOption("decorate.contr.Helmert")
     dec <- if (!contrasts) ""
            else if (is.null(dec)) "H."
            else dec
     nms <- if (contrasts) 1:lenglev else levels
     contr.names <- paste(pre, dec, nms, suf, sep="")
     if (contrasts) {
         cont <- array(-1, c(lenglev, lenglev - 1), list(levels,
             contr.names[-lenglev]))
         cont[col(cont) <= row(cont) - 2] <- 0
         cont[col(cont) == row(cont) - 1] <- 1:(lenglev - 1)
     }
     else {
         cont <- array(0, c(lenglev, lenglev), list(levels, contr.names))
         cont[col(cont) == row(cont)] <- 1
     }
     cont
}

------------------------------- Contrasts.Rd 
------------------------------------------

\name{Contrasts}
\alias{Contrasts}
\alias{contr.Treatment}
\alias{contr.Sum}
\alias{contr.Helmert}

\title{Functions to Construct Contrasts}
\description{
     These are substitutes for similarly named functions in the base package
     (note the uppercase letter starting the second word in each function 
name).
     The only difference is that the contrast functions from the car package
     produce easier-to-read names for the contrasts when they are used in 
statistical models.

     The functions and this documentation are adapted from the base package.
     }

\usage{
contr.Treatment(n, base = 1, contrasts = TRUE)

contr.Sum(n, contrasts = TRUE)

contr.Helmert(n, contrasts = TRUE)
}

\arguments{
   \item{n}{a vector of levels for a factor, or the number of levels.}
   \item{base}{an integer specifying which level is considered the baseline 
level.
     Ignored if \code{contrasts} is \code{FALSE}.}
   \item{contrasts}{a logical indicating whether contrasts should be computed.}
}

\details{
     These functions are used for creating contrast matrices for use in 
fitting analysis of variance and regression models.
     The columns of the resulting matrices contain contrasts which can be 
used for coding a factor with \code{n} levels.
     The returned value contains the computed contrasts. If the argument 
\code{contrasts} is \code{FALSE} then a square matrix is returned.

     Several aspects of these contrast functions are controlled by options 
set via the \code{options} command:
     \describe{
         \item{\code{decorate.contrasts}}{This option should be set to a 
2-element character vector containing the prefix and suffix
             characters to surround contrast names. If the option is not 
set, then \code{c("[", "]")} is used. For example, setting
             \code{options(decorate.contrasts=c(".", ""))} produces 
contrast names that are separated from factor names by a period.
             Setting \code{options(decorate.contrasts=c("", ""))} 
reproduces the behaviour of the R base contrast functions.}
         \item{\code{decorate.contr.Treatment}}{A character string to be 
appended to contrast names to signify treatment contrasts;
             if the option is unset, then \code{"T."} is used.}
         \item{\code{decorate.contr.Sum}}{Similar to the above, with 
default \code{"S."}.}
         \item{\code{decorate.contr.Helmert}}{Similar to the above, with 
default \code{"H."}.}
         \item{\code{contr.Sum.show.levels}}{Logical value: if \code{TRUE} 
(the default if unset),
             then level names are used for contrasts; if \code{FALSE}, then 
numbers are used, as in \code{contr.sum}
             in the \code{base} package.}
         }

     Note that there is no replacement for \code{contr.poly} in the 
\code{base} package (which produces
     orthogonal-polynomial contrasts) since this function already 
constructs easy-to-read contrast names.
}

\value{
     A matrix with \code{n} rows and \code{k} columns, with \code{k = n - 
1} if \code{contrasts} is \code{TRUE}
     and \code{k = n} if \code{contrasts} is \code{FALSE}.
}

\author{John Fox \email{jfox at mcmaster.ca}}

\seealso{\code{\link[base]{contr.treatment}}, \code{\link[base]{contr.sum}},
   \code{\link[base]{contr.helmert}}, \code{\link[base]{contr.poly}} }

\examples{
# contr.Treatment vs. contr.treatment in the base package:

data(Prestige)
lm(prestige ~ (income + education)*type, data=Prestige,
     contrasts=list(type="contr.Treatment"))

##  Call:
##  lm(formula = prestige ~ (income + education) * type, data = Prestige,
##      contrasts = list(type = "contr.Treatment"))
##
##  Coefficients:
##          (Intercept)                  income               education
##              2.275753                0.003522                1.713275
##          type[T.prof]              type[T.wc]     income:type[T.prof]
##              15.351896              -33.536652               -0.002903
##      income:type[T.wc]  education:type[T.prof]    education:type[T.wc]
##              -0.002072                1.387809                4.290875

lm(prestige ~ (income + education)*type, data=Prestige,
     contrasts=list(type="contr.treatment"))

##  Call:
##  lm(formula = prestige ~ (income + education) * type, data = Prestige,
##      contrasts = list(type = "contr.treatment"))
##
##  Coefficients:
##      (Intercept)              income           education
##          2.275753            0.003522            1.713275
##          typeprof              typewc     income:typeprof
##          15.351896          -33.536652           -0.002903
##      income:typewc  education:typeprof    education:typewc
##          -0.002072            1.387809            4.290875
}

\keyword{models}
\keyword{regression}

-------------------------------------------------------------------------------------------------------------------------------------------------

At 04:37 PM 2/14/2003 +1100, j+rhelp at howard.fm wrote:
>This is my first post to this list so I suppose a quick intro is in
>order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days,
>and love S already. I'm reading MASS and also John Fox's book - both have
>been very useful. My background in stat software was mainly SPSS (which
>I've never much liked - thanks heavens I've found S!), and Perl is my
>tool of choice for general-purpose programming (I chaired the
>perl6-language-data working group, responsible for improving the data
>analysis capabilities in Perl).
>
>I have just completed my first S project, and I now have 8 lm.objects.
>The models are all reasonably complex with multiple numeric and factor
>variables and some 2-way and 3-way interactions. I now need to use these
>models in other environments, such as C code, SQL functions (using CASE)
>and in Perl - I can not work out how to do this.
>
>The difficulty I am having is that the output of coef() is not really
>parsable, since there is no marker in the name of an coefficient of
>separate out the components. For instance, in SPSS the name of a
>coefficient might be:
>
>   var1=[a]*var2=[b]*var3
>
>...which is easy to write a little script to pull that apart and turn it
>into a line of SQL, C, or whatever. In S however the name looks like:
>
>   var1avar2bvar3
>
>...which provides no way to pull the bits apart.
>
>So my question is, how do I export an lm.object in some form that I can
>then apply to prediction in C, SQL, or some other language? All I'm
>looking for is some well-structured textual or data frame output that I
>can then manipulate with appropriate tools, whether it be S itself, or
>something like Perl.
>
>Thanks in advance for any suggestions (and apologies in advance if this
>is well documented somewhere!),