[Rd] Documentation examples for lm and glm

Fox, John jfox at mcmaster.ca
Mon Dec 17 15:21:33 CET 2018


Dear Martin,

I think that everyone agrees that it’s generally preferable to use the data argument to lm() and I have nothing significant to add to the substance of the discussion, but I think that it’s a mistake not to add to the current examples, for the following reasons:

(1) Relegating examples using the data argument to “see also” doesn’t suggest that using the argument is a best practice. Most users won’t bother to click the links.

(2) In my opinion, a new initial example using the data argument would more clearly suggest that this is normally the best option.

(3) I think that it would also be desirable to add a remark to the explanation of the data argument, something like, “Although the argument is optional, it's generally preferable to specify it explicitly.” And similarly on the help page for glm().
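
As a rough sketch of points (2) and (3), an initial example along the following lines would show the data argument in use. It is illustrative only: it assumes the built-in cars data set rather than the data on the current help page, and the names fit, new_speeds and pred are made up for the illustration.

```r
# Illustrative only: uses the built-in 'cars' data set,
# not the data from the current ?lm example.
fit <- lm(dist ~ speed, data = cars)

# Hypothetical new observations; the column name must match the
# predictor used in the formula.
new_speeds <- data.frame(speed = c(10, 20))

# Because the model was fitted with data =, predict() can look up
# 'speed' in newdata by name.
pred <- predict(fit, newdata = new_speeds)
print(pred)
```

No variables other than fit, new_speeds and pred are left in the workspace, and the call to predict() touches on the concern about predict() raised at the start of the thread.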

My two (or three) cents.

John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

> On Dec 17, 2018, at 3:05 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> 
>>>>>> David Hugh-Jones 
>>>>>>    on Sat, 15 Dec 2018 08:47:28 +0100 writes:
> 
>> I would argue examples should encourage good
>> practice. Beginners ought to learn to keep data in data
>> frames and not to overuse attach(). 
> 
> Note there's no attach() there in any of these examples!
> 
>> otherwise at their own risk, but they have less need of
>> explicit examples.
> 
> The glm examples are nice insofar as they show both uses.
> 
> I agree the lm() example(s) are  "didactically misleading" by
> not using data frames at all.
> 
> I disagree that only data frame examples should be shown.
> If  lm()  is one of the first R functions a beginneR must use --
> because they are in a basic stats class, say --  it may be
> *better* didactically to focus on lm()  in the very first
> example, and use data frames in a next one ...
> .... and instead of next one, we have the pretty clear comment
> 
>  ### less simple examples in "See Also" above
> 
> I'm not convinced (but you can try more) we should change those
> examples or add more there.
> 
> Martin
> 
>> On Fri, 14 Dec 2018 at 14:51, S Ellison
>> <S.Ellison at lgcgroup.com> wrote:
> 
>>> FWIW, before all the examples are changed to data frame
>>> variants, I think there's fairly good reason to have at
>>> least _one_ example that does _not_ place variables in a
>>> data frame.
>>> 
>>> The data argument in lm() is optional. And there is more
>>> than one way to manage data in a project. I personally
>>> don't much like lots of stray variables lurking about,
>>> but if those are the only variables out there and we can
>>> be sure they aren't affected by other code, it's hardly
>>> essential to create a data frame to hold something you
>>> already have.  Also, attach() is still part of R, for
>>> those folk who have a data frame but want to reference
>>> the contents across a wider range of functions without
>>> using with() a lot. lm() can reasonably omit the data
>>> argument there, too.
>>> 
>>> So while there are good reasons to use data frames, there
>>> are also good reasons to provide examples that don't.
>>> 
>>> Steve Ellison
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Ben Bolker
>>>> Sent: 13 December 2018 20:36
>>>> To: r-devel at r-project.org
>>>> Subject: Re: [Rd] Documentation examples for lm and glm
>>>> 
>>>> 
>>>> Agree.  Or just create the data frame with those variables in it
>>>> directly ...
>>>> 
>>>> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
>>>>> Hello,
>>>>> 
>>>>> something that has been on my mind for a decade or two has
>>>>> been the examples for lm() and glm(). They encourage poor style
>>>>> because of mismanagement of data frames. Also, having the
>>>>> variables in a data frame means that predict()
>>>>> is more likely to work properly.
>>>>> 
>>>>> For lm(), the variables should be put into a data frame.
>>>>> As 2 vectors are assigned first in the general workspace they
>>>>> should be deleted afterwards.
>>>>> 
>>>>> For the glm(), the data frame d.AD is constructed but not used. Also,
>>>>> its 3 components were assigned first in the general workspace, so they
>>>>> float around dangerously afterwards like in the lm() example.
>>>>> 
>>>>> Rather than attached improved .Rd files here, they are put at
>>>>> www.stat.auckland.ac.nz/~yee/Rdfiles
>>>>> You are welcome to use them!
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Thomas
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel