lm-residuals and NA {was [R] Can't understand error message :-{}

Martin Maechler maechler at stat.math.ethz.ch
Fri Mar 5 09:23:15 CET 1999

>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:

    PD> John Logsdon <j.logsdon at lancaster.ac.uk> writes:
    >> On 2 Mar 1999, Peter Dalgaard BSA wrote:
    >> > 
    >> > (1) Missing values in response and/or regressors cause cases to be
    >> > discarded.
    >> > (2) Plotting which of the y's against which x's ?
    >> >
    >> > plot(mschmod$residuals ~ size94[complete.cases(mavgres,crimesch,
    >> >      socstat,povnojob,ploinc94,aa94,hisp94,minty94,mixed94)])
    >> >
    >> > should do the trick. Or, simpler but sneakier:
    >> >
    >> > attach(sizef[rownames(mschmod$model),])
    >> > plot(residuals(mschmod) ~ size94)
    >> > detach()
    >> >
    >> > It should also work with:
    >> >
    >> > evalq(plot(residuals(mschmod) ~ size94),
    >> >       sizef[rownames(mschmod$model),])
    >> >
    >> > (none of the above is tested, since I don't have your data of
    >> > course)
    >>
    >> The problems of plotting residuals vs fitted data/covariates where
    >> there are NAs caught me out a little while ago.  Would it not be
    >> better if the fitting functions lm, glm etc and plot were
    >> consistent?  Thus either (a) plot() omitted cases in the X or the Y
    >> which were NA before checking for length consistency or (b)
    >> residuals() etc included NA in the appropriate places.

    PD> (a) won't work if you think closer about it.
Yes, agreed.

    PD> (b) might. I wouldn't
    PD> be surprised if there's a rationale for the way things are now, but
    PD> I can't seem to reconstruct it. Well, there's space saving of
    PD> course, but given the waste in other areas, that is hardly a
    PD> crucial point.  Possibly, consistent behaviour of drop(), etc. has
    PD> something to do with it.

Werner Stahel (in our stat group) has been using hacked versions of lm
and of some lm methods that address exactly this, i.e. they follow
approach (b). However, I think it is still a hack that works only in
some (the most commonly used) cases.
To do it properly, one would probably have to change quite a few
lm/glm/... methods.
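
To illustrate what approach (b) amounts to, here is a minimal sketch.
It is not Werner Stahel's code: padded_residuals() is a hypothetical
helper and the data frame d is made up. It assumes the fit keeps its
model frame (the lm default, model = TRUE) and that the row names of
the data have not been changed since fitting.

```r
# Hypothetical helper (an assumption, not part of R): return residuals
# with NA in the positions of the cases that lm() discarded.
padded_residuals <- function(fit, data) {
  res <- rep(NA_real_, nrow(data))
  names(res) <- rownames(data)
  # The row names of the model frame identify the cases actually used.
  res[rownames(fit$model)] <- residuals(fit)
  res
}

# Made-up data: case 2 has a missing response, case 3 a missing regressor.
d <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                y = c(1.1, NA, 2.9, 4.2, 4.8, 6.1))
fit <- lm(y ~ x, data = d)

length(residuals(fit))   # 4: two cases were discarded
r <- padded_residuals(fit, d)
length(r)                # 6: same length as the original data
```

With the padded vector, plot(r ~ d$x) no longer trips over a length
mismatch; the NA cases are simply not drawn.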

I do think it'd be a worthwhile route, though incompatible with S.

Would one want a global options() setting to toggle this behavior?
It looks dangerous and undesirable (à la Octave ...) to have functions
return different results depending on options().
The "contrasts" option is already half a step in that direction, and it
has had all kinds of adverse consequences.
Ideally, options() should only affect the way results are *displayed*,
not the way they are computed (and stored).
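
The "contrasts" case can be made concrete with a small sketch (the
data are made up, and the behaviour shown is that of a current R):
the same lm() call stores different coefficients depending on a
global option, although the fitted values are identical.

```r
# Made-up data: one three-level factor, two observations per level.
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 2)),
                y = c(1, 2, 3, 5, 8, 9))

# Fit under treatment contrasts; keep the old setting to restore it later.
old <- options(contrasts = c("contr.treatment", "contr.poly"))
fit_treat <- lm(y ~ g, data = d)
b_treat <- coef(fit_treat)   # intercept is the mean of group "a"

# Exactly the same call, but with sum-to-zero contrasts in effect.
options(contrasts = c("contr.sum", "contr.poly"))
fit_sum <- lm(y ~ g, data = d)
b_sum <- coef(fit_sum)       # intercept is the mean of the group means

options(old)                 # restore the previous setting

b_treat
b_sum
```

Passing contrasts explicitly to the fitting call, rather than relying
on the option, keeps the result independent of the global state.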

Other opinions?