[R] defining a formula method for a weighted lm()

Thu Jan 6 15:33:25 CET 2011

No one replied to this, so I'll try again, with a simple example.  I 
calculate a set of log odds ratios, and turn them into a data frame as 
follows:

 > library(vcdExtra)
 > (lor.CM <- loddsratio(CoalMiners))
log odds ratios for Wheeze and Breathlessness by Age

    25-29    30-34    35-39    40-44    45-49    50-54    55-59    60-64
3.695261 3.398339 3.140658 3.014687 2.782049 2.926395 2.440571 2.637954
 >
 > (lor.CM.df <- as.data.frame(lor.CM))
   Wheeze Breathlessness   Age      LOR        ASE
1  W:NoW          B:NoB 25-29 3.695261 0.16471778
2  W:NoW          B:NoB 30-34 3.398339 0.07733658
3  W:NoW          B:NoB 35-39 3.140658 0.03341311
4  W:NoW          B:NoB 40-44 3.014687 0.02866111
5  W:NoW          B:NoB 45-49 2.782049 0.01875164
6  W:NoW          B:NoB 50-54 2.926395 0.01585918
7  W:NoW          B:NoB 55-59 2.440571 0.01452057
8  W:NoW          B:NoB 60-64 2.637954 0.02159903

Now I want to fit a linear model by WLS, LOR ~ Age, which can do directly as

 > lm(LOR ~ as.numeric(Age), weights=1/ASE, data=lor.CM.df)

Call:
lm(formula = LOR ~ as.numeric(Age), data = lor.CM.df, weights = 1/ASE)

Coefficients:
     (Intercept)  as.numeric(Age)
          3.5850          -0.1376

But, I want to do the fitting in my own function, the simplest version is

my.lm <- function(formula, data, subset, weights) {
	lm(formula, data, subset, weights)
}

But there is obviously some magic about formula objects and evaluation 
environments, because I don't understand why this doesn't work.

 > my.lm(LOR ~ as.numeric(Age), weights=1/ASE, data=lor.CM.df)
Error in model.frame.default(formula = formula, data = data, subset = 
subset,  :
   invalid type (closure) for variable '(weights)'
 >

A second question: Age is a factor, and as.numeric(Age) gives me 1:8.

What simple expression on lor.CM.df$Age would give me either the lower
limits (here: seq(25, 60, by = 5)) or midpoints of these Age intervals
(here: seq(27, 62, by = 5))?

best,
-Michael

On 12/16/2010 9:43 AM, Michael Friendly wrote:
> In the vcdExtra package on R-Forge, I have functions and generic methods
> for calculating log odds ratios
> for R x C x strata tables. I'd like to define methods for fitting
> weighted lm()s to the resulting loddsratio objects,
> but I'm having problems figuring out how to do this generally.
>
> # install.packages("vcdExtra", repos="http://R-Forge.R-Project.org")
> library(vcdExtra)
>
>  > fung.lor <- loddsratio(Fungicide)
>  > fung.lor
> log odds ratios for group and outcome by sex, strain
>
> strain
> sex 1 2
> M -1.596015 -0.8266786
> F -1.386294 -0.6317782
>  >
>  > fung.lor.df <- as.data.frame(fung.lor)
>  > fung.lor.df
> group outcome sex strain LOR ASE
> 1 Control:Treated Tumor:NoTumor M 1 -1.5960149 0.7394909
> 2 Control:Treated Tumor:NoTumor F 1 -1.3862944 0.9574271
> 3 Control:Treated Tumor:NoTumor M 2 -0.8266786 0.6587325
> 4 Control:Treated Tumor:NoTumor F 2 -0.6317782 1.1905545
>  >
>
> Now, I want to test whether the odds ratios differ by sex or strain, so
> I do a weighted lm()
>
>  > fung.mod <- lm(LOR ~ sex + strain, data=fung.lor.df, weights=1/ASE^2)
>  > anova(fung.mod)
> Analysis of Variance Table
>
> Response: LOR
> Df Sum Sq Mean Sq F value Pr(>F)
> sex 1 0.00744 0.00744 112.3 0.05990 .
> strain 1 0.84732 0.84732 12788.1 0.00563 **
> Residuals 1 0.00007 0.00007
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>  >
>
> I tried to write a generic formula method to do this, but I keep running
> into errors:
>
> lor <- function(x, ...)
> UseMethod("lor")
>
> lor.formula <- function(formula, data, subset, weights,
> model = TRUE, x = FALSE, y = FALSE,
> contrasts = NULL, ...) {
>
> data <- as.data.frame(data)
> if (missing(weights)) {
> if (! "ASE" %in% names(data)) stop("data does not contain an ASE column")
> data$weights <- 1/data$ASE^2
> }
> lm(formula, data, subset, weights=weights,
> model = model, x = x, y = y,
> contrasts = contrasts, ...)
> }
>
>  > lor(LOR ~ strain+sex, fung.lor)
> Error in xj[i] : invalid subscript type 'closure'
>  > lor(LOR ~ strain+sex, fung.lor.df)
> Error in xj[i] : invalid subscript type 'closure'
>  >
>  > traceback()
> 8: `[.data.frame`(list(LOR = c(-1.59601489210196, -1.38629436111989,
> -0.826678573184468, -0.631778234183653), strain = c(1L, 1L, 2L,
> 2L), sex = c(1L, 2L, 1L, 2L), `(weights)` = c(1.82866556836903,
> 1.09090909090909, 2.30452674897119, 0.705507123112907)), function (x,
> ...)
> UseMethod("subset"), , FALSE)
> 7: model.frame.default(formula = formula, data = data, subset = subset,
> weights = weights, drop.unused.levels = TRUE)
> 6: model.frame(formula = formula, data = data, subset = subset,
> weights = weights, drop.unused.levels = TRUE)
> 5: eval(expr, envir, enclos)
> 4: eval(mf, parent.frame())
> 3: lm(formula, data, subset, weights = weights, model = model, x = x,
> y = y, contrasts = contrasts, ...)
> 2: lor.formula(LOR ~ strain + sex, fung.lor.df)
> 1: lor(LOR ~ strain + sex, fung.lor.df)
>  >
>
> How can I make this work?
>
>
>

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA