[R] Translating lm.object to SQL, C, etc function
John Fox
jfox at mcmaster.ca
Fri Feb 14 13:54:03 CET 2003
Dear Jeremy,
I've written replacements for the standard R contrast functions that
produce the kind of more easily parsed (and more readable) contrast names
that I think you have in mind. I intend to include these in the next
release of the car package for R but haven't done so yet. Since the code
isn't very long, I've appended it (and the .Rd documentation file) to this
note. Note that R does separate terms in an interaction with a colon.
I hope that this does what you need.
John
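Because interaction components are joined by a colon and, with the default options, each treatment contrast is written as [T.level], a coefficient name can be split mechanically. Below is a minimal sketch in R. It assumes the default decorations and that no variable or level name itself contains ":" or "["; parse_coef_name is a hypothetical helper, not part of car.

```r
# Split one coefficient name, as produced by contr.Treatment with its default
# decorations, into a table of (variable, level) pairs.
# Assumptions: default "[", "T.", "]" decorations, and no ":" or "[" inside
# variable or level names. parse_coef_name is a hypothetical helper.
parse_coef_name <- function(name) {
    # interaction components are separated by ":"
    terms <- strsplit(name, ":", fixed = TRUE)[[1]]
    # the variable is everything before the first "["
    variable <- sub("\\[.*$", "", terms)
    # factor terms look like "type[T.prof]"; numeric terms have no level (NA)
    is_factor <- grepl("\\[T\\..*\\]$", terms)
    level <- ifelse(is_factor, sub("^.*\\[T\\.(.*)\\]$", "\\1", terms), NA)
    data.frame(variable = variable, level = level, stringsAsFactors = FALSE)
}

# a coefficient name taken from the example output in the documentation
parse_coef_name("education:type[T.wc]")
```

For that name the result has variable c("education", "type") and level c(NA, "wc"), which is the kind of structure that can be handed on to Perl or turned into SQL.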
---------------------------- Contrasts.R -----------------------------
# last modified 2 Dec 2002 by J. Fox
# all of these functions are adapted from functions in the R base package
contr.Treatment <- function (n, base = 1, contrasts = TRUE) {
    if (is.numeric(n) && length(n) == 1)
        levs <- 1:n
    else {
        levs <- n
        n <- length(n)
    }
    # prefix and suffix surrounding each contrast name (default "[" and "]")
    lev.opt <- getOption("decorate.contrasts")
    pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
    suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
    # marker for treatment contrasts (default "T."), omitted if !contrasts
    dec <- getOption("decorate.contr.Treatment")
    dec <- if (!contrasts) "" else if (is.null(dec)) "T." else dec
    contr.names <- paste(pre, dec, levs, suf, sep="")
    contr <- array(0, c(n, n), list(levs, contr.names))
    diag(contr) <- 1
    if (contrasts) {
        if (n < 2)
            stop(paste("Contrasts not defined for", n - 1, "degrees of freedom"))
        if (base < 1 || base > n)
            stop("Baseline group number out of range")
        # drop the column for the baseline level
        contr <- contr[, -base, drop = FALSE]
    }
    contr
}
contr.Sum <- function (n, contrasts = TRUE)
{
    if (length(n) <= 1) {
        if (is.numeric(n) && length(n) == 1 && n > 1)
            levels <- 1:n
        else stop("Not enough degrees of freedom to define contrasts")
    }
    else levels <- n
    lenglev <- length(levels)
    lev.opt <- getOption("decorate.contrasts")
    pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
    suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
    dec <- getOption("decorate.contr.Sum")
    dec <- if (!contrasts) "" else if (is.null(dec)) "S." else dec
    # level names are used unless options(contr.Sum.show.levels=FALSE);
    # in that case contr.names is NULL and R numbers the columns
    show.lev <- getOption("contr.Sum.show.levels")
    contr.names <- if (is.null(show.lev) || show.lev)
        paste(pre, dec, levels, suf, sep="")
    if (contrasts) {
        # 1 on the diagonal of the first n - 1 rows, -1 throughout the last row
        cont <- array(0, c(lenglev, lenglev - 1),
                      list(levels, contr.names[-lenglev]))
        cont[col(cont) == row(cont)] <- 1
        cont[lenglev, ] <- -1
    }
    else {
        cont <- array(0, c(lenglev, lenglev), list(levels, contr.names))
        cont[col(cont) == row(cont)] <- 1
    }
    cont
}
contr.Helmert <- function (n, contrasts = TRUE)
{
    if (length(n) <= 1) {
        if (is.numeric(n) && length(n) == 1 && n > 1)
            levels <- 1:n
        else stop("contrasts are not defined for 0 degrees of freedom")
    }
    else levels <- n
    lenglev <- length(levels)
    lev.opt <- getOption("decorate.contrasts")
    pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
    suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
    dec <- getOption("decorate.contr.Helmert")
    dec <- if (!contrasts) "" else if (is.null(dec)) "H." else dec
    # Helmert contrasts are numbered; the identity matrix keeps level names
    nms <- if (contrasts) 1:lenglev else levels
    contr.names <- paste(pre, dec, nms, suf, sep="")
    if (contrasts) {
        # column j: -1 in rows 1..j, j in row j + 1, 0 in the rows below
        cont <- array(-1, c(lenglev, lenglev - 1),
                      list(levels, contr.names[-lenglev]))
        cont[col(cont) <= row(cont) - 2] <- 0
        cont[col(cont) == row(cont) - 1] <- 1:(lenglev - 1)
    }
    else {
        cont <- array(0, c(lenglev, lenglev), list(levels, contr.names))
        cont[col(cont) == row(cont)] <- 1
    }
    cont
}
------------------------------- Contrasts.Rd ------------------------------------------
\name{Contrasts}
\alias{Contrasts}
\alias{contr.Treatment}
\alias{contr.Sum}
\alias{contr.Helmert}
\title{Functions to Construct Contrasts}
\description{
These are substitutes for similarly named functions in the base package
(note the uppercase letter starting the second word in each function
name).
The only difference is that the contrast functions from the car package
produce easier-to-read names for the contrasts when they are used in
statistical models.
The functions and this documentation are adapted from the base package.
}
\usage{
contr.Treatment(n, base = 1, contrasts = TRUE)
contr.Sum(n, contrasts = TRUE)
contr.Helmert(n, contrasts = TRUE)
}
\arguments{
\item{n}{a vector of levels for a factor, or the number of levels.}
\item{base}{an integer specifying which level is considered the baseline
level.
Ignored if \code{contrasts} is \code{FALSE}.}
\item{contrasts}{a logical indicating whether contrasts should be computed.}
}
\details{
These functions are used for creating contrast matrices for use in
fitting analysis of variance and regression models.
The columns of the resulting matrices contain contrasts which can be
used for coding a factor with \code{n} levels.
The returned value contains the computed contrasts. If the argument
\code{contrasts} is \code{FALSE} then a square matrix is returned.
Several aspects of these contrast functions are controlled by options
set via the \code{options} command:
\describe{
\item{\code{decorate.contrasts}}{This option should be set to a
2-element character vector containing the prefix and suffix
characters to surround contrast names. If the option is not
set, then \code{c("[", "]")} is used. For example, setting
\code{options(decorate.contrasts=c(".", ""))} produces
contrast names that are separated from factor names by a period.
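Setting \code{options(decorate.contrasts=c("", ""))} removes the
brackets; to reproduce the contrast names of the R base contrast
functions, the decoration options described below must also be set
to \code{""} (and \code{contr.Sum.show.levels} to \code{FALSE}).}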
\item{\code{decorate.contr.Treatment}}{A character string placed
before the level name in each contrast name to signify treatment
contrasts; if the option is unset, then \code{"T."} is used.}
\item{\code{decorate.contr.Sum}}{Similar to the above, with
default \code{"S."}.}
\item{\code{decorate.contr.Helmert}}{Similar to the above, with
default \code{"H."}.}
\item{\code{contr.Sum.show.levels}}{Logical value: if \code{TRUE}
(the default if unset),
then level names are used for contrasts; if \code{FALSE}, then
numbers are used, as in \code{contr.sum}
in the \code{base} package.}
}
Note that there is no replacement for \code{contr.poly} (the
\code{base} function that produces orthogonal-polynomial contrasts),
since it already constructs easy-to-read contrast names.
}
\value{
A matrix with \code{n} rows and \code{k} columns, with \code{k = n - 1}
if \code{contrasts} is \code{TRUE} and \code{k = n} if \code{contrasts}
is \code{FALSE}.
}
\author{John Fox \email{jfox at mcmaster.ca}}
\seealso{\code{\link[base]{contr.treatment}}, \code{\link[base]{contr.sum}},
\code{\link[base]{contr.helmert}}, \code{\link[base]{contr.poly}} }
\examples{
# contr.Treatment vs. contr.treatment in the base package:
data(Prestige)
lm(prestige ~ (income + education)*type, data=Prestige,
contrasts=list(type="contr.Treatment"))
## Call:
## lm(formula = prestige ~ (income + education) * type, data = Prestige,
## contrasts = list(type = "contr.Treatment"))
##
## Coefficients:
## (Intercept) income education
## 2.275753 0.003522 1.713275
## type[T.prof] type[T.wc] income:type[T.prof]
## 15.351896 -33.536652 -0.002903
## income:type[T.wc] education:type[T.prof] education:type[T.wc]
## -0.002072 1.387809 4.290875
lm(prestige ~ (income + education)*type, data=Prestige,
contrasts=list(type="contr.treatment"))
## Call:
## lm(formula = prestige ~ (income + education) * type, data = Prestige,
## contrasts = list(type = "contr.treatment"))
##
## Coefficients:
## (Intercept) income education
## 2.275753 0.003522 1.713275
## typeprof typewc income:typeprof
## 15.351896 -33.536652 -0.002903
## income:typewc education:typeprof education:typewc
## -0.002072 1.387809 4.290875
}
\keyword{models}
\keyword{regression}
-------------------------------------------------------------------------------------------------------------------------------------------------
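The options documented above change only the contrast column names; when the model is fitted, the factor name is pasted in front of each column name, which is why the example prints type[T.prof] with contr.Treatment and typeprof with contr.treatment. A minimal R sketch that mimics this concatenation for a single level, so the effect of each option setting can be checked without fitting a model. coef_label is a hypothetical helper whose defaults assume contr.Treatment with the decorate.* options unset.

```r
# Mimic how a treatment-contrast coefficient name is built: the factor name,
# then prefix + decoration + level + suffix (the contrast column name).
# coef_label is a hypothetical helper; its defaults correspond to
# contr.Treatment with the decorate.* options unset.
coef_label <- function(factor, level, pre = "[", dec = "T.", suf = "]") {
    paste(factor, pre, dec, level, suf, sep = "")
}

coef_label("type", "prof")                               # "type[T.prof]"
coef_label("type", "prof", pre = ".", suf = "")          # "type.T.prof"
coef_label("type", "prof", pre = "", suf = "")           # "typeT.prof"
coef_label("type", "prof", pre = "", dec = "", suf = "") # "typeprof"
```

The third call shows that clearing decorate.contrasts alone leaves the "T." marker in place; only the last call, with the decoration also empty, matches the name produced by the base contr.treatment.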
At 04:37 PM 2/14/2003 +1100, j+rhelp at howard.fm wrote:
>This is my first post to this list so I suppose a quick intro is in
>order. I've been using SPLUS 2000 and R 1.6.2 for just a couple of days,
>and love S already. I'm reading MASS and also John Fox's book - both have
>been very useful. My background in stat software was mainly SPSS (which
>I've never much liked - thank heavens I've found S!), and Perl is my
>tool of choice for general-purpose programming (I chaired the
>perl6-language-data working group, responsible for improving the data
>analysis capabilities in Perl).
>
>I have just completed my first S project, and I now have 8 lm.objects.
>The models are all reasonably complex with multiple numeric and factor
>variables and some 2-way and 3-way interactions. I now need to use these
>models in other environments, such as C code, SQL functions (using CASE)
>and in Perl, but I cannot work out how to do this.
>
>The difficulty I am having is that the output of coef() is not really
>parsable, since there is no marker in the name of a coefficient to
>separate out the components. For instance, in SPSS the name of a
>coefficient might be:
>
> var1=[a]*var2=[b]*var3
>
>...for which it is easy to write a little script to pull it apart and turn
>it into a line of SQL, C, or whatever. In S, however, the name looks like:
>
> var1avar2bvar3
>
>...which provides no way to pull the bits apart.
>
>So my question is, how do I export an lm.object in some form that I can
>then apply to prediction in C, SQL, or some other language? All I'm
>looking for is some well-structured textual or data frame output that I
>can then manipulate with appropriate tools, whether it be S itself, or
>something like Perl.
>
>Thanks in advance for any suggestions (and apologies in advance if this
>is well documented somewhere!),
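The question above asks for output that can be turned into SQL. One possible sketch in R, assuming the model was fitted with contr.Treatment under its default decorations, that the database columns carry the same names as the model variables, and that level strings contain no quote characters: each coefficient becomes a product of the coefficient, the numeric columns, and CASE indicators for the factor levels. coef_to_sql is a hypothetical helper, not part of R or car.

```r
# Turn a named coefficient vector (names decorated as by contr.Treatment,
# e.g. "income:type[T.prof]") into one SQL expression for the linear
# predictor. Assumptions: default "[T." / "]" decorations, SQL column names
# equal to model variable names, no quotes in level strings.
# coef_to_sql is a hypothetical helper.
coef_to_sql <- function(b) {
    # translate one interaction component into SQL
    term_sql <- function(term) {
        if (grepl("\\[T\\..*\\]$", term)) {
            var <- sub("\\[.*$", "", term)
            lev <- sub("^.*\\[T\\.(.*)\\]$", "\\1", term)
            sprintf("(CASE WHEN %s = '%s' THEN 1 ELSE 0 END)", var, lev)
        } else {
            term  # numeric variable: use the column as is
        }
    }
    pieces <- vapply(names(b), function(nm) {
        coef <- format(unname(b[nm]), digits = 15)
        if (nm == "(Intercept)") return(coef)
        terms <- strsplit(nm, ":", fixed = TRUE)[[1]]
        paste(c(coef, vapply(terms, term_sql, character(1))), collapse = " * ")
    }, character(1))
    paste(pieces, collapse = " + ")
}

# a subset of the coefficients printed in the documentation example
# (rounded as printed there)
b <- c("(Intercept)" = 2.275753, income = 0.003522,
       "type[T.prof]" = 15.351896, "income:type[T.prof]" = -0.002903)
cat(coef_to_sql(b), "\n")
```

Applied to coef() of a full contr.Treatment fit, this would give a single expression to embed in a SELECT; the same term-by-term product is what a C or Perl translation would loop over.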
More information about the R-help mailing list