[Rd] Documentation examples for lm and glm

Achim Zeileis Achim.Zeileis using uibk.ac.at
Mon Dec 17 00:26:46 CET 2018


On Sat, 15 Dec 2018, frederik using ofb.net wrote:

> I agree with Steve and Achim that we should keep some examples with no
> data frame. That's Objectively Simpler, whether or not it leads to
> clutter in the wrong hands. As Steve points out, we have attach()
> which is an excellent language feature - not to mention with().

Just for the record: Personally, I wouldn't recommend using lm() with 
attach() or with() but would always encourage using data= instead.

In my previous e-mail I just wanted to point out that a pragmatic step for 
the man page could be to keep one example without data= argument when 
adding examples with data=.
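
A minimal sketch of what such a pair of examples could look like. The data are made up, and the names x, y, d, new_d, and fit_* are illustrative only, not taken from the current man page. The last part shows one way predict() can go wrong when the model formula does not refer to variables by name:

```r
# Made-up illustrative data (not from the lm() man page).
set.seed(1)
x <- 1:10
y <- 2 + 0.5 * x + rnorm(10, sd = 0.2)

# Example without data=: variables are found in the calling environment.
fit_env <- lm(y ~ x)

# Example with data=: the same variables kept together in a data frame.
d <- data.frame(x = x, y = y)
fit_df <- lm(y ~ x, data = d)

# Both calls fit the same model.
all.equal(coef(fit_env), coef(fit_df))

# predict() matches the columns of newdata to the formula variables by name.
new_d <- data.frame(x = c(11, 12))
predict(fit_df, newdata = new_d)

# With d$y ~ d$x the formula variables are not called "x" and "y", so
# newdata is effectively ignored: predict() warns and returns fitted
# values for the original 10 rows instead of predictions for the 2 new rows.
fit_dollar <- lm(d$y ~ d$x)
p_bad <- suppressWarnings(predict(fit_dollar, newdata = new_d))
length(p_bad)
```

The first two fits are interchangeable here; the difference only shows up once predict() or other functions need to find the variables again.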

> I would go even further and say that the examples that are in lm() now
> should stay at the top. Because people may be used to referring to
> them, and also because Historical Order is generally a good order in
> which to learn things. However, if there is an important function
> argument ("data=") not in the examples, then we should add examples
> which use it. Likewise if there is a popular programming style
> (putting things in a data frame). So let's do something along the
> lines of what Thomas is requesting, but put it after the existing
> documentation? Please?
>
> On a bit of a tangent, I would like to see an example in lm() which
> plots my data with a fitted line through it. I'm probably betraying my
> ignorance here, but I was asked how to do this when showing R to a
> friend and I thought it should be in lm(), after all it seems a bit
> more basic than displaying a Normal Q-Q plot (whatever that is!
> gasp...). Similarly for glm(). Perhaps all this can be accomplished
> with merely doubling the size of the existing examples.
>
> Thanks.
>
> Frederick
>
> On Sat, Dec 15, 2018 at 02:15:52PM +0100, Achim Zeileis wrote:
>> A pragmatic solution could be to create a simple linear regression example 
>> with variables in the global environment and then another example with a 
>> data.frame.
>> 
>> The latter might be somewhat more complex, e.g., with several regressors 
>> and/or mixed categorical and numeric covariates to illustrate how 
>> regression and analysis of (co-)variance can be combined. I like to use 
>> MASS's whiteside data for this:
>> 
>> data("whiteside", package = "MASS")
>> m1 <- lm(Gas ~ Temp, data = whiteside)
>> m2 <- lm(Gas ~ Insul + Temp, data = whiteside)
>> m3 <- lm(Gas ~ Insul * Temp, data = whiteside)
>> anova(m1, m2, m3)
>> 
>> Moreover, some binary response data.frame with a few covariates might be a 
>> useful addition to "datasets". For example a more granular version of the 
>> "Titanic" data (in addition to the 4-way table ?Titanic). Or another 
>> relatively straightforward data set, popular in econometrics and social 
>> sciences, is the "Mroz" data; see e.g. help("PSID1976", package = "AER").
>> 
>> I would be happy to help with these if such additions were considered for 
>> datasets/stats.
>> 
>> 
>> On Sat, 15 Dec 2018, David Hugh-Jones wrote:
>> 
>>> I would argue examples should encourage good practice. Beginners ought to
>>> learn to keep data in data frames and not to overuse attach(). Experts can
>>> do otherwise at their own risk, but they have less need of explicit
>>> examples.
>>> 
>>> On Fri, 14 Dec 2018 at 14:51, S Ellison <S.Ellison using lgcgroup.com> wrote:
>>> 
>>>> FWIW, before all the examples are changed to data frame variants, I think
>>>> there's fairly good reason to have at least _one_ example that does _not_
>>>> place variables in a data frame.
>>>> 
>>>> The data argument in lm() is optional. And there is more than one way to
>>>> manage data in a project. I personally don't much like lots of stray
>>>> variables lurking about, but if those are the only variables out there and
>>>> we can be sure they aren't affected by other code, it's hardly essential to
>>>> create a data frame to hold something you already have.
>>>> Also, attach() is still part of R, for those folk who have a data frame
>>>> but want to reference the contents across a wider range of functions
>>>> without using with() a lot. lm() can reasonably omit the data argument
>>>> there, too.
>>>> 
>>>> So while there are good reasons to use data frames, there are also good
>>>> reasons to provide examples that don't.
>>>> 
>>>> Steve Ellison
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: R-devel [mailto:r-devel-bounces using r-project.org] On Behalf Of Ben
>>>>> Bolker
>>>>> Sent: 13 December 2018 20:36
>>>>> To: r-devel using r-project.org
>>>>> Subject: Re: [Rd] Documentation examples for lm and glm
>>>>> 
>>>>>
>>>>>  Agree.  Or just create the data frame with those variables in it
>>>>> directly ...
>>>>> 
>>>>> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> something that has been on my mind for a decade or two has
>>>>>> been the examples for lm() and glm(). They encourage poor style
>>>>>> because of mismanagement of data frames. Also, having the
>>>>>> variables in a data frame means that predict()
>>>>>> is more likely to work properly.
>>>>>> 
>>>>>> For lm(), the variables should be put into a data frame.
>>>>>> As 2 vectors are assigned first in the general workspace they
>>>>>> should be deleted afterwards.
>>>>>> 
>>>>>> For the glm(), the data frame d.AD is constructed but not used. Also,
>>>>>> its 3 components were assigned first in the general workspace, so they
>>>>>> float around dangerously afterwards like in the lm() example.
>>>>>> 
>>>>>> Rather than attaching improved .Rd files here, they are put at
>>>>>> www.stat.auckland.ac.nz/~yee/Rdfiles
>>>>>> You are welcome to use them!
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Thomas
>>>>>> 
>>>>>> ______________________________________________
>>>>>> R-devel using r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>
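
For the fitted-line plot asked about above, a minimal sketch using the whiteside model m1 from the quoted example might look like the following. It assumes the MASS package is installed; this is one simple way to draw the line, not a proposal for the exact man page text:

```r
# Assumes the recommended package MASS is available.
data("whiteside", package = "MASS")

# Simple linear regression of gas consumption on outside temperature.
m1 <- lm(Gas ~ Temp, data = whiteside)

# Scatter plot of the data, then the fitted regression line on top.
plot(Gas ~ Temp, data = whiteside)
abline(m1)
```

abline() accepts the fitted lm object directly when there is a single regressor; for models with several terms, lines() with values from predict() would be needed instead.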


