[R] No LOGLM coefficients

Bill Venables wvenable at arcola.stats.adelaide.edu.au
Thu Apr 15 02:42:23 CEST 1999

>>>>> "Cor" == Cor en Aylin <Cor.Berrevoets at gironet.nl> writes:

    Cor> Dear R-helpers, I'm trying to fit a log-linear model to a
    Cor> dataset of bird counts from 60 sites over 14 years, for
    Cor> 12 months each (site, year and month are the factors in the
    Cor> models).  One of the aims is to predict the missing values
    Cor> in this dataset with model predictions.  I first tried
    Cor> GLMs, which worked fine except for models with one or more
    Cor> interaction terms: those GLMs run and run ... for hours.
    Cor> So I switched to LOGLM (MASS library), which was fast, and
    Cor> the deviances were the same, so that worked well.  The
    Cor> only problem is that my data contain both zeros (rarely)
    Cor> and quite a lot of NAs.  LOGLM doesn't report the
    Cor> parameters in these cases (the help file says so), but I
    Cor> still want them for predicting my missing values.

    Cor> Does anybody have a suggestion for getting around this problem?

Yes, but you may not like it...

>>>>> "BDR" == Prof Brian D Ripley <ripley at stats.ox.ac.uk> writes:

    BDR> What you want is the fitted values, not the
    BDR> coefficients, I think.  You can get those, I believe,
    BDR> from loglm/loglin.  If not, you know who to bug: the
    BDR> author of loglm is WNV, although I made it work under R.

Gee, thanks a bunch Brian.  I was looking for something to do...

loglm (note: all lower case, even on that other OS) fits
log-linear models by iterative proportional scaling (IPS).  It
will give correct deviances and fitted values for all present
observations.  Structural zeros can be distinguished from observed
zeros, too.  It will not automatically give fitted values for
missing observations, though, since the iterative scaling
algorithm provides no simple way to do this.  This is why no
predict.loglm function is provided, but one would not be all that
difficult to construct, in fact.  The idea is as follows:

Take the fitted values you have.  These have the correct
multiplicative structure, so their logarithms have an exact (up
to iteration and round-off error) additive structure.  Fit the
corresponding ordinary unweighted linear model to these logged
values (present cells only), and its parameter estimates are the
parameters you need to predict the missing (or new) data.  Predict
accordingly.  Bingo.
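A minimal sketch of this recipe, written in Python rather than S/R and resting on several assumptions: a two-way table with a row + column (main-effects) model, a hand-written IPS routine standing in for loglm, missing cells handled like structural zeros during IPS (they start at 0 and are never rescaled), and treatment coding for the linear model. The table `COUNTS` and every function name are hypothetical.

```python
import math

# Hypothetical 3 x 3 table of counts; None marks a missing cell.
COUNTS = [
    [10, 20, None],
    [15, 30, 45],
    [5, 12, 16],
]


def ips_quasi_independence(counts, n_iter=200):
    """Fit the row + column model by iterative proportional scaling.
    Assumption: missing cells start at 0 (like structural zeros), so
    scaling never changes them and they get no fitted value."""
    nr, nc = len(counts), len(counts[0])
    m = [[0.0 if counts[i][j] is None else 1.0 for j in range(nc)]
         for i in range(nr)]
    row_tot = [sum(x for x in row if x is not None) for row in counts]
    col_tot = [sum(counts[i][j] for i in range(nr) if counts[i][j] is not None)
               for j in range(nc)]
    for _ in range(n_iter):
        for i in range(nr):  # rescale each row to its observed margin
            s = sum(m[i])
            m[i] = [v * row_tot[i] / s for v in m[i]]
        for j in range(nc):  # rescale each column to its observed margin
            s = sum(m[i][j] for i in range(nr))
            for i in range(nr):
                m[i][j] *= col_tot[j] / s
    return m


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[k][n] / aug[k][k] for k in range(n)]


def design_row(i, j, nr, nc):
    """Treatment-coded row: intercept, rows 2..nr, columns 2..nc."""
    return ([1.0] + [1.0 if i == r else 0.0 for r in range(1, nr)]
            + [1.0 if j == c else 0.0 for c in range(1, nc)])


def fit_log_additive(fitted, counts):
    """Unweighted least squares of log(fitted) on the main-effects design,
    present cells only, via the normal equations X'X beta = X'y."""
    nr, nc = len(counts), len(counts[0])
    p = 1 + (nr - 1) + (nc - 1)
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for i in range(nr):
        for j in range(nc):
            if counts[i][j] is None:
                continue
            x = design_row(i, j, nr, nc)
            y = math.log(fitted[i][j])
            for u in range(p):
                xty[u] += x[u] * y
                for v in range(p):
                    xtx[u][v] += x[u] * x[v]
    return solve(xtx, xty)


def predict(beta, i, j, nr, nc):
    """Back-transform the linear predictor for cell (i, j)."""
    x = design_row(i, j, nr, nc)
    return math.exp(sum(b * v for b, v in zip(beta, x)))


fitted = ips_quasi_independence(COUNTS)
beta = fit_log_additive(fitted, COUNTS)
print(predict(beta, 0, 2, 3, 3))  # prediction for the missing cell
```

In this two-way main-effects case every IPS step multiplies a whole row or column by a constant, so the fitted values are exact products of row and column factors and the log-scale least squares fit reproduces them up to round-off. Building X'X this way needs the full design, which is where the memory cost for large problems comes from.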

Well, nearly bingo.  What is lacking is any easy way to find
standard errors for such parameter estimates and hence for the
predicted values.  All the bits are there, but digging them out
is a bit of a chore, possibly as work-intensive as the very glm
procedure that loglm was intended to sidestep.

In summary, if all you want are predicted values, in principle
this is not too hard, but if you are a true statistician you
would never be happy with parameter estimates that do not have
at least some vague hint of error estimates attached.  That's why
I didn't bother writing it myself in the first place...

Added little note: Fitting the linear model looks as if you
would need to construct the same model matrix as the glm fitting
procedure would need, and if this were a large problem, as IPS
problems often are, it might pose a memory difficulty.  However
if you are really on the ball you could overcome this one by
using iterative *additive* scaling, aka Stephens' algorithm.  It
would be handy to have an IAS function in S+/R to handle large
linear models.  Does anyone have a spare long weekend coming up?
You could also accumulate the parameter estimates as the IPS
procedure progresses, but this would involve hacking into the C
routine, which on S+ is (I think) not part of the open code.
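A sketch of the additive analogue of IPS, assuming a two-way main-effects model on the log scale: instead of rescaling cells to match margins, each row effect is reset to the mean of its partial residuals over the present cells, then each column effect, until nothing changes. This is plain backfitting (Gauss-Seidel on the normal equations); treating it as equivalent to Stephens' algorithm is an assumption, not something the text above confirms. `ROW`, `COL` and `MISSING` are made-up values chosen so that the logged fitted values are exactly additive, as they would be after an IPS fit.

```python
import math

# Hypothetical row and column factors: fitted values m_ij = ROW[i] * COL[j]
# stand in for the output of an IPS fit; MISSING cells have no fitted value.
ROW = [2.0, 3.0, 0.5, 1.5]
COL = [4.0, 1.0, 2.5]
MISSING = {(0, 2), (3, 0)}

LOG_FITTED = {
    (i, j): math.log(ROW[i] * COL[j])
    for i in range(len(ROW)) for j in range(len(COL))
    if (i, j) not in MISSING
}


def iterative_additive_fit(y, nr, nc, n_iter=500, tol=1e-12):
    """Fit y_ij = a_i + b_j over the present cells by cyclically replacing
    each effect with the mean of its partial residuals (backfitting).
    Assumption: this stands in for 'iterative additive scaling'.
    No model matrix is ever formed."""
    a = [0.0] * nr
    b = [0.0] * nc
    rows = [[j for j in range(nc) if (i, j) in y] for i in range(nr)]
    cols = [[i for i in range(nr) if (i, j) in y] for j in range(nc)]
    for _ in range(n_iter):
        change = 0.0
        for i in range(nr):  # row sweep
            new = sum(y[i, j] - b[j] for j in rows[i]) / len(rows[i])
            change = max(change, abs(new - a[i]))
            a[i] = new
        for j in range(nc):  # column sweep
            new = sum(y[i, j] - a[i] for i in cols[j]) / len(cols[j])
            change = max(change, abs(new - b[j]))
            b[j] = new
        if change < tol:  # stop once a full cycle no longer moves anything
            break
    return a, b


a, b = iterative_additive_fit(LOG_FITTED, len(ROW), len(COL))
for i, j in sorted(MISSING):
    print((i, j), round(math.exp(a[i] + b[j]), 6))
```

Only the present cells and one vector of effects per factor are held in memory. Because the made-up data are exactly additive, the missing cells should come out as 2.0 * 2.5 = 5.0 for cell (0, 2) and 1.5 * 4.0 = 6.0 for cell (3, 0).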

(There may be a note to JCGS here if anyone has the time and grit
to poke around.  It would make a good little Honours project.)

(The following contact details become official on 1 May 1999, but
the email address works now and may be used from now on.)
Bill Venables, Statistician, CMIS Environmetrics Project.

Physical address:                            Postal address:
CSIRO Marine Laboratories,                   PO Box 120,       
233 Middle St, Cleveland, Queensland         Cleveland, Qld, 4163
AUSTRALIA                                    AUSTRALIA

Telephone: +61 7 3826 7200     Email: Bill.Venables at cmis.csiro.au     
      Fax: +61 7 3826 7304
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html