[R] defining a formula method for a weighted lm()
Michael Friendly
friendly at yorku.ca
Fri Jan 7 15:39:53 CET 2011
Thanks, Martin
Now I understand 'standard non-standard evaluation' magic, and the code in
http://developer.r-project.org/model-fitting-functions.txt
explains how this works.
Still, I can't help but think of this as evil-magic, for which some
anti-magic would be extremely useful, so that
a simple function like
my.lm <- function(formula, data, subset, weights, ...) {
lm(formula, data, subset, weights, ...)
}
would work as expected. Oh dear, I think I need some syntactic sugar in
my coffee!
-Michael
On 1/6/2011 2:16 PM, Martin Maechler wrote:
>>>>>> Michael Friendly<friendly at yorku.ca>
>>>>>> on Thu, 06 Jan 2011 09:33:25 -0500 writes:
> > No one replied to this, so I'll try again, with a simple example. I
> > calculate a set of log odds ratios, and turn them into a data frame as
> > follows:
>
> >> library(vcdExtra)
> >> (lor.CM<- loddsratio(CoalMiners))
> > log odds ratios for Wheeze and Breathlessness by Age
>
> > 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64
> > 3.695261 3.398339 3.140658 3.014687 2.782049 2.926395 2.440571 2.637954
> >>
> >> (lor.CM.df<- as.data.frame(lor.CM))
> > Wheeze Breathlessness Age LOR ASE
> > 1 W:NoW B:NoB 25-29 3.695261 0.16471778
> > 2 W:NoW B:NoB 30-34 3.398339 0.07733658
> > 3 W:NoW B:NoB 35-39 3.140658 0.03341311
> > 4 W:NoW B:NoB 40-44 3.014687 0.02866111
> > 5 W:NoW B:NoB 45-49 2.782049 0.01875164
> > 6 W:NoW B:NoB 50-54 2.926395 0.01585918
> > 7 W:NoW B:NoB 55-59 2.440571 0.01452057
> > 8 W:NoW B:NoB 60-64 2.637954 0.02159903
>
> > Now I want to fit a linear model by WLS, LOR ~ Age, which can do directly as
>
> >> lm(LOR ~ as.numeric(Age), weights=1/ASE, data=lor.CM.df)
>
> > Call:
> > lm(formula = LOR ~ as.numeric(Age), data = lor.CM.df, weights = 1/ASE)
>
> > Coefficients:
> > (Intercept) as.numeric(Age)
> > 3.5850 -0.1376
>
> > But, I want to do the fitting in my own function, the simplest version is
>
> > my.lm<- function(formula, data, subset, weights) {
> > lm(formula, data, subset, weights)
> > }
>
> > But there is obviously some magic about formula objects and evaluation
> > environments, because I don't understand why this doesn't work.
>
>
> >> my.lm(LOR ~ as.numeric(Age), weights=1/ASE, data=lor.CM.df)
> > Error in model.frame.default(formula = formula, data = data, subset =
> > subset, :
> > invalid type (closure) for variable '(weights)'
> >>
>
> Yes, the "magic" has been called "standard non-standard evaluation"
> for a while (since August 2002, to be precise),
> and the http://developer.r-project.org/ web page has had two
> very relevant links since then, namely those mentioned in the
> following two lines there:
> ----------------------------
> # Description of the nonstandard evaluation rules in R 1.5.1 and some suggestions. (updated). Also an R function and docn for making model frames from multiple formulas.
>
> # Notes on model-fitting functions in R, and especially on how to enable all the safety features.
> ----------------------------
>
> For what you want, I think (but haven't tried) the second link, which is
> http://developer.r-project.org/model-fitting-functions.txt
> is still very relevant.
>
> Many many people (package authors) had to use something like
> that or just directly taken the lm function as an example..
> {{ but then probably failed the more subtle points on how to
> program residuals() , predict() , etc functions which you can
> also learn from model-fitting-functions.txt}}
>
>
> > A second question: Age is a factor, and as.numeric(Age) gives me 1:8.
>
> > What simple expression on lor.CM.df$Age would give me either the lower
> > limits (here: seq(25, 60, by = 5)) or midpoints of these Age intervals
> > (here: seq(27, 62, by = 5))?
>
> With
>
> data(CoalMiners, package = "vcd")
>
> here are some variations :
>
> > (Astr<- dimnames(CoalMiners)[[3]])
> [1] "25-29" "30-34" "35-39" "40-44" "45-49" "50-54" "55-59" "60-64"
> > sapply(lapply(strsplit(Astr, "-"), as.numeric), `[[`, 1)
> [1] 25 30 35 40 45 50 55 60
> > sapply(lapply(strsplit(Astr, "-"), as.numeric), `[[`, 2)
> [1] 29 34 39 44 49 54 59 64
> > sapply(lapply(strsplit(Astr, "-"), as.numeric), mean)
> [1] 27 32 37 42 47 52 57 62
>
> Or use the 2-row matrix and apply(*, 1) to that :
>
> > sapply(strsplit(Astr, "-"), as.numeric)
> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
> [1,] 25 30 35 40 45 50 55 60
> [2,] 29 34 39 44 49 54 59 64
>
> Regards,
> Martin Maechler, ETH Zurich
>
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
More information about the R-help
mailing list