[Rd] Documentation examples for lm and glm

Fox, John jfox at mcmaster.ca
Mon Dec 17 16:23:07 CET 2018


Dear Heinz,

  ----------------------------------------------
> On Dec 17, 2018, at 10:19 AM, Heinz Tuechler <tuechler using gmx.at> wrote:
> 
> Dear All,
> 
> do you think that use of a data argument is best practice in the example below?

No, but it is *normally* or *usually* the best option, in my opinion.

Best,
 John

> 
> regards,
> 
> Heinz
> 
> ### trivial example
> plotwithline <- function(x, y) {
>    plot(x, y)
>    abline(lm(y~x)) ## data argument?
> }
> 
> set.seed(25)
> df0 <- data.frame(x=rnorm(20), y=rnorm(20))
> 
> plotwithline(df0[['x']], df0[['y']])
> 
> 
> 
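A minimal sketch of why the data argument is dispensable in Heinz's example. Inside the function, the formula `y ~ x` is created in the function's own evaluation frame. Without `data`, `lm()` looks variables up in the formula's environment, so it picks up the arguments `x` and `y` rather than any global objects with those names. The variant `plotwithline_df()` and the choice to return the fit invisibly are hypothetical additions for comparison, not part of Heinz's code.

```r
## Heinz's function, changed only to return the fitted model invisibly.
plotwithline <- function(x, y) {
  plot(x, y)
  fit <- lm(y ~ x)  # x, y found in this call's frame via the formula's environment
  abline(fit)
  invisible(fit)
}

## Hypothetical variant that bundles the arguments into a local data frame.
plotwithline_df <- function(x, y) {
  d <- data.frame(x = x, y = y)
  plot(y ~ x, data = d)
  fit <- lm(y ~ x, data = d)
  abline(fit)
  invisible(fit)
}

set.seed(25)
df0 <- data.frame(x = rnorm(20), y = rnorm(20))

## Global objects named x and y do not leak into either fit.
x <- 1:3
y <- c(10, 0, 5)
fit1 <- plotwithline(df0[["x"]], df0[["y"]])
fit2 <- plotwithline_df(df0[["x"]], df0[["y"]])
all.equal(coef(fit1), coef(fit2))  # TRUE
```

Both versions are safe here; the data frame version simply makes the source of the variables explicit.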
> Fox, John wrote on 17.12.2018 15:21:
>> Dear Martin,
>> 
>> I think that everyone agrees that it’s generally preferable to use the data argument to lm() and I have nothing significant to add to the substance of the discussion, but I think that it’s a mistake not to add to the current examples, for the following reasons:
>> 
>> (1) Relegating examples using the data argument to “see also” doesn’t suggest that using the argument is a best practice. Most users won’t bother to click the links.
>> 
>> (2) In my opinion, a new initial example using the data argument would more clearly suggest that this is normally the best option.
>> 
>> (3) I think that it would also be desirable to add a remark to the explanation of the data argument, something like, “Although the argument is optional, it's generally preferable to specify it explicitly.” And similarly on the help page for glm().
>> 
>> My two (or three) cents.
>> 
>> John
>> 
>>  -------------------------------------------------
>>  John Fox, Professor Emeritus
>>  McMaster University
>>  Hamilton, Ontario, Canada
>>  Web: http://socserv.mcmaster.ca/jfox
>> 
>>> On Dec 17, 2018, at 3:05 AM, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>>> 
>>>>>>>> David Hugh-Jones
>>>>>>>>   on Sat, 15 Dec 2018 08:47:28 +0100 writes:
>>> 
>>>> I would argue examples should encourage good
>>>> practice. Beginners ought to learn to keep data in data
>>>> frames and not to overuse attach().
>>> 
>>> Note there's no attach() there in any of these examples!
>>> 
>>>> otherwise at their own risk, but they have less need of
>>>> explicit examples.
>>> 
>>> The glm examples are nice insofar as they show both uses.
>>> 
>>> I agree the lm() example(s) are "didactically misleading" by
>>> not using data frames at all.
>>>
>>> I disagree that only data frame examples should be shown.
>>> If lm() is one of the first R functions a beginneR must use --
>>> because they are in a basic stats class, say -- it may be
>>> *better* didactically to focus on lm() in the very first
>>> example, and use data frames in a next one ...
>>> ... and instead of a next one, we have the pretty clear comment
>>> 
>>> ### less simple examples in "See Also" above
>>> 
>>> I'm not convinced (but you can try more) we should change those
>>> examples or add more there.
>>> 
>>> Martin
>>> 
>>>> On Fri, 14 Dec 2018 at 14:51, S Ellison
>>>> <S.Ellison using lgcgroup.com> wrote:
>>> 
>>>>> FWIW, before all the examples are changed to data frame
>>>>> variants, I think there's fairly good reason to have at
>>>>> least _one_ example that does _not_ place variables in a
>>>>> data frame.
>>>>> 
>>>>> The data argument in lm() is optional. And there is more
>>>>> than one way to manage data in a project. I personally
>>>>> don't much like lots of stray variables lurking about,
>>>>> but if those are the only variables out there and we can
>>>>> be sure they aren't affected by other code, it's hardly
>>>>> essential to create a data frame to hold something you
>>>>> already have.  Also, attach() is still part of R, for
>>>>> those folk who have a data frame but want to reference
>>>>> the contents across a wider range of functions without
>>>>> using with() a lot. lm() can reasonably omit the data
>>>>> argument there, too.
>>>>> 
>>>>> So while there are good reasons to use data frames, there
>>>>> are also good reasons to provide examples that don't.
>>>>> 
>>>>> Steve Ellison
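The three routes Steve Ellison mentions (an explicit data argument, with(), and attach()) produce the same fit. A minimal sketch with a hypothetical data frame `dat` of simulated columns `dose` and `response`; the comments note why attach() is the riskiest of the three.

```r
## Hypothetical example data: 20 simulated (dose, response) pairs.
set.seed(25)
dat <- data.frame(dose = rnorm(20), response = rnorm(20))

fit_data <- lm(response ~ dose, data = dat)  # explicit data argument
fit_with <- with(dat, lm(response ~ dose))   # formula created in an environment built from dat

## attach() places dat's columns on the search path behind the global
## workspace, so a stray global `dose` or `response` would silently be used.
attach(dat)
fit_attach <- lm(response ~ dose)
detach(dat)

all.equal(coef(fit_data), coef(fit_with))    # TRUE
all.equal(coef(fit_data), coef(fit_attach))  # TRUE
```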
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: R-devel [mailto:r-devel-bounces using r-project.org] On Behalf Of Ben Bolker
>>>>>> Sent: 13 December 2018 20:36
>>>>>> To: r-devel using r-project.org
>>>>>> Subject: Re: [Rd] Documentation examples for lm and glm
>>>>>> 
>>>>>> 
>>>>>> Agree. Or just create the data frame with those variables in it
>>>>>> directly ...
>>>>>> 
>>>>>> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> something that has been on my mind for a decade or two has
>>>>>>> been the examples for lm() and glm(). They encourage poor style
>>>>>>> because of mismanagement of data frames. Also, having the
>>>>>>> variables in a data frame means that predict()
>>>>>>> is more likely to work properly.
>>>>>>> 
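To make the predict() point concrete, a minimal sketch (the data frame `d`, the grid `new`, and the fit names are hypothetical). One common way predict() goes wrong without a data argument is a formula written with `$`: the model's variables are then `d$x` and `d$y` themselves, so the `x` column of `newdata` is never consulted.

```r
set.seed(1)
d <- data.frame(x = rnorm(20), y = rnorm(20))
new <- data.frame(x = c(-1, 0, 1))

## Formula written with `$`: the model "variables" are d$x and d$y.
fit_dollar <- lm(d$y ~ d$x)
p_dollar <- predict(fit_dollar, newdata = new)  # warns that the row counts differ
length(p_dollar)  # 20: the original fitted values, not 3 predictions

## Bare names plus the data argument: newdata is used as intended.
fit_data <- lm(y ~ x, data = d)
p_data <- predict(fit_data, newdata = new)
length(p_data)  # 3
```

Fitting `lm(y ~ x)` to free-standing vectors `x` and `y` does not have this particular problem, since predict() then finds `x` in `newdata`; the failure is specific to formulas that hard-wire the data source.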
>>>>>>> For lm(), the variables should be put into a data frame.
>>>>>>> As 2 vectors are assigned first in the general workspace they
>>>>>>> should be deleted afterwards.
>>>>>>> 
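A minimal sketch of the lm() suggestion, combined with Ben Bolker's "create the data frame directly" above. It assumes the current ?lm example is Dobson's plant-weight data built from vectors `ctl` and `trt`; the data frame name `plant` is hypothetical, and the fit names follow the help page.

```r
## Annette Dobson (1990), plant weight data, with both columns created
## directly inside a data frame; no ctl/trt vectors are left in the workspace.
plant <- data.frame(
  weight = c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14,   # control
             4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69),  # treatment
  group  = gl(2, 10, 20, labels = c("Ctl", "Trt"))
)
lm.D9  <- lm(weight ~ group, data = plant)
lm.D90 <- lm(weight ~ group - 1, data = plant)  # omitting the intercept
coef(lm.D9)
```

Keeping the separate vectors and removing them with rm() once the data frame exists would be the cleanup Thomas describes instead.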
>>>>>>> For glm(), the data frame d.AD is constructed but not used. Also,
>>>>>>> its 3 components were assigned first in the general workspace, so they
>>>>>>> float around dangerously afterwards like in the lm() example.
>>>>>>> 
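Similarly for glm(), a minimal sketch that assumes the current ?glm example is Dobson's (1990) randomized controlled trial: the three columns are created only inside `d.AD`, which is then actually passed as `data`.

```r
## Dobson (1990) p. 93 randomized controlled trial; the columns exist only
## inside d.AD, and d.AD is passed to glm() as the data argument.
d.AD <- data.frame(
  treatment = gl(3, 3),
  outcome   = gl(3, 1, 9),
  counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12)
)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(), data = d.AD)
coef(glm.D93)
```

Because every treatment total is 50, the treatment coefficients come out as zero up to rounding; that is a property of these data, not of the rewrite.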
>>>>>>> Rather than attaching improved .Rd files here, I have put them at
>>>>>>> www.stat.auckland.ac.nz/~yee/Rdfiles
>>>>>>> You are welcome to use them!
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Thomas
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-devel using r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>> 


