[R] Translating lm.object to SQL, C, etc function
ripley at stats.ox.ac.uk
Fri Feb 14 09:07:03 CET 2003
The issue here is that coef() tells you the coefficients in R's internal
parametrization of the model, and that is of no use to you unless you have
a means of creating a model matrix in C, SQL or (heaven forbid) Perl. The
information needed to re-create a model matrix is stored in the lm fit,
but in ways that are going to be hard to use anywhere else (since they
include R functions). This is not perverse: what R does is very general,
*far* more so than SPSS. Formulae in lm can include poly() and ns()
terms, for example.
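A small sketch of why the coefficients alone are not enough, assuming a toy data frame (the names d, x and y are invented for illustration). With a poly() term the coefficients refer to an orthogonal basis computed from the data, and the recipe for rebuilding that basis is stored with the fit as an R call:

```r
# Toy data: d, x and y are hypothetical names used only for this sketch.
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- d$x^2 + rnorm(10)

fit <- lm(y ~ poly(x, 2), data = d)

# The coefficient names refer to columns of the poly() basis, not to x or x^2.
print(names(coef(fit)))

# The stored recipe for re-creating those columns is an R call to poly()
# carrying the basis constants estimated from the original data.
print(attr(terms(fit), "predvars"))
```

Reproducing this in C or SQL would mean re-implementing what poly() does with those stored constants, not just reading off coef().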
The only practical solution, it seems to us, is to ask R to create the model
matrix for new data. Then the things you are talking about are just the
colnames of that matrix, and don't need to be interpreted.
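As a minimal sketch of that approach, assuming a toy model (the names d, x, g, y and newd are invented here, not taken from the original poster's models): build the model frame and model matrix for the new data from the fit's own terms, then multiply by the coefficients. This is roughly what predict() does for an lm fit, so the two should agree.

```r
# Toy data and model: all variable names here are hypothetical.
set.seed(1)
d <- data.frame(x = runif(20), g = factor(rep(c("a", "b"), 10)))
d$y <- 1 + 2 * d$x + (d$g == "b") + rnorm(20, sd = 0.1)
fit <- lm(y ~ x * g, data = d)

# New data must use the same variable names and factor levels as the fit.
newd <- data.frame(x = c(0.1, 0.5),
                   g = factor(c("a", "b"), levels = levels(d$g)))

# Re-create the model matrix for the new data from the fit's terms,
# factor levels and contrasts.
tt <- delete.response(terms(fit))
mf <- model.frame(tt, newd, xlev = fit$xlevels)
X  <- model.matrix(tt, mf, contrasts.arg = fit$contrasts)

# The column names of X line up with the coefficient names,
# so prediction is a plain matrix product.
pred <- drop(X %*% coef(fit))
print(colnames(X))
print(pred)
```

The matrix X (or its rows written out as text) could then be handed to C, SQL or Perl code, which only needs the numeric coefficients and never has to parse their names.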
You may want to read the sources to find out how R does it: that area is
one of the most complex parts of the internals, and one in which bugs
continue to emerge.
On Fri, 14 Feb 2003 j+rhelp at howard.fm wrote:
> This is my first post to this list so I suppose a quick intro is in
> order. I've been using S-PLUS 2000 and R 1.6.2 for just a couple of days,
> and love S already. I'm reading MASS and also John Fox's book - both have
> been very useful. My background in stat software was mainly SPSS (which
> I've never much liked - thank heavens I've found S!), and Perl is my
> tool of choice for general-purpose programming (I chaired the
> perl6-language-data working group, responsible for improving the data
> analysis capabilities in Perl).
>
> I have just completed my first S project, and I now have 8 lm.objects.
> The models are all reasonably complex with multiple numeric and factor
> variables and some 2-way and 3-way interactions. I now need to use these
> models in other environments, such as C code, SQL functions (using CASE)
> and in Perl - I cannot work out how to do this.
>
> The difficulty I am having is that the output of coef() is not really
> parsable, since there is no marker in the name of a coefficient to
> separate out the components. For instance, in SPSS the name of a
> coefficient might be:
>
> var1=[a]*var2=[b]*var3
>
> ...which makes it easy to write a little script to pull it apart and turn it
> into a line of SQL, C, or whatever. In S however the name looks like:
>
> var1avar2bvar3
>
> ...which provides no way to pull the bits apart.
I find that impossible to understand anyway, but doubt that it corresponds
to SPSS. For a variable V, label Va does not mean V=[a] except in unusual
special cases.
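One way to see this, as a sketch only: the meaning of a factor's columns depends on the contrast coding in use. The factor V and the choice of Helmert contrasts below are assumptions made for illustration.

```r
# Hypothetical three-level factor, coded with Helmert contrasts.
V <- factor(c("a", "b", "c"))
Xh <- model.matrix(~ V, contrasts.arg = list(V = "contr.helmert"))

# Columns are labelled V1 and V2, and neither is a 0/1 indicator
# of a single level of V.
print(Xh)
```

Here the column V1 takes the values -1, 1, 0 for levels a, b, c, so a coefficient label cannot be read as "V equals some level" without knowing the coding.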
> So my question is, how do I export an lm.object in some form that I can
> then apply to prediction in C, SQL, or some other language? All I'm
> looking for is some well-structured textual or data frame output that I
> can then manipulate with appropriate tools, whether it be S itself, or
> something like Perl.
>
> Thanks in advance for any suggestions (and apologies in advance if this
> is well documented somewhere!),
>
> Jeremy
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595