[Rd] Posting Guide

Sat Jun 7 17:07:18 CEST 2008

Might I suggest the following two additions:

For item (1), I suggest adding to the end of it something like
"Consider attaching this output/data as a txt file if it is too
large, or consider using one of the built-in data sets (as listed
e.g. by data() ) if they suffice to illustrate the problem."
I find it rather distracting to have to wade through pages and pages
of dput output before I can read the questions to be answered. Often
these are questions that can be answered without that output, in
which case pasting it straight into the text just gets in the way.
Failing that, perhaps we can at least convince posters to append the
output at the end of the message rather than placing it in the
middle.
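
To make the suggestion concrete, here is a rough sketch of the three
options, not a prescribed workflow. The data frame mydata and the file
name are hypothetical stand-ins; mtcars is one of the data sets that
ships with R.

```r
# Option 1: use a built-in data set, so no data needs to be pasted at all.
# Calling data() with no arguments lists what is available.
fit <- lm(mpg ~ wt, data = mtcars)

# Option 2: if your own data is needed, post only a small slice of it.
# 'mydata' is a hypothetical stand-in for the poster's data frame.
mydata <- data.frame(id = 1:1000,
                     x = rep(c("a", "b"), 500),
                     stringsAsFactors = FALSE)
small <- head(mydata, 6)
dput(small)

# Option 3: write the full dput output to a text file and attach that file.
# The file name is an arbitrary choice for this sketch.
dput_file <- file.path(tempdir(), "mydata.txt")
dput(mydata, file = dput_file)

# A reader can rebuild the object from the attachment with dget().
restored <- dget(dput_file)
```

Either of the last two keeps the body of the message short while still
letting a reader recreate the data.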

With regard to sessionInfo, I would consider it equally important,
many times, to have the output of ls(), to make sure that functions
etc. are not masked by user-defined global variables. But perhaps I'm
alone in that? At least mention clearly that the code provided should
be reproducible in a clean R workspace, or something like that?
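
As an illustration of the kind of masking I have in mind, here is a
small sketch. The redefinition of mean is a contrived example, and the
check only compares against the base package, so it is by no means a
complete test for conflicts.

```r
# A leftover user-defined object with the same name as a base function ...
mean <- function(x, ...) "not the real mean"

# ... silently changes what later code does.
mean(c(1, 2, 3))

# ls() shows the leftover object in the workspace.
ls()

# Compare workspace names against the base package to flag masked names.
# Only 'base' is checked here; other attached packages are ignored.
masked <- intersect(ls(), ls(baseenv()))
masked

# Removing the object makes the base function visible again.
rm(mean)
mean(c(1, 2, 3))  # 2

# Version and package details to include in the post.
sessionInfo()
```

Running the example in a session started with R --vanilla, which does
not restore a saved workspace, is one way to confirm that the code
does not depend on leftover objects.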

I think creating this summary section for the posting guide is a great
idea. The posting guide, though chock-full of useful information on
how to write a proper post, ends up having just way too much
information, with the result, as we have seen, that people do not
follow it.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

On Jun 7, 2008, at 10:48 AM, hadley wickham wrote:

> Here's my attempt at making it a little more friendly:
>
> Removed self-contained - implied by reproducible
> Used slightly less formal language (and you instead of the questioner)
> Fixed a couple of spelling mistakes
> Removed references to testing framework - I don't think that that term
> needs to be introduced
>
> -------
>
> For most questions, the main problem isn't answering the question, but
> understanding exactly what the question is, reproducing the problem,
> and checking the answer. To make it easy for others to help you, you
> should provide:
>
>  (1) reproducible, minimal code, and the data needed to run it.
>      That means others can copy and paste from your email and see
>      the same output that you did.  An easy way to include data in
>      an email is to include the output of dput(mydata).
>
>  (2) comments/explanations of what the code is supposed to do, and
>
>  (3) the version of R and the packages that you used, easily
>      produced by sessionInfo().
>
> Without reproducible code, others have to spend a lot of time
> recreating the problem so that they can provide an answer that works.
> Do NOT assume the problem is so simple that it is not necessary.
>
> This can seem like a lot of work, but it often pays off by revealing
> the solution without having to ask anyone else. Even if it doesn't,
> your effort shows the list that you have tried to solve it yourself.
>
> It's also worthwhile spending some time writing a good subject line
> that succinctly summarises your problem. This also helps others
> trying to solve the same problem in the future as they can more
> easily locate relevant messages.
>
> Hadley
>
> On Sat, Jun 7, 2008 at 8:38 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> Here is a second version of the summary.  It's been rearranged to
>> place the most important info at the top.  Also shortened it a bit.
>>
>> It still needs links to example posts, as suggested.  Anyone?
>>
>> Summary
>>
>> Surprisingly, the main problem for responders is not to answer the
>> posted questions but to quickly figure out what the question is,
>> reproduce it in their own R session and test their answer.
>>
>> Test Framework.  To facilitate that, provide a test framework of:
>>
>>  (1) reproducible self-contained minimal code and data.  That means
>>      responders can copy it from the questioner's post and paste it
>>      into their session to see the same output without having to
>>      enter even one R command.
>>      NB. dput(mydata) produces mydata in reproducible form.
>>  (2) comments/explanations of what the code is intended to produce, and
>>  (3) versions of all software used, e.g. sessionInfo().
>>
>> Without self-contained reproducible code the responder must not only
>> understand the question but must also create a test framework, and
>> that typically takes more time than answering the question!  It's not
>> fair to ask the responder to provide all that on top of answering the
>> question.  Do NOT assume the problem is so simple that it is not
>> necessary.
>>
>> Effort. The effort taken to reduce the problem to its essentials and
>> produce a test framework often solves the problem, avoiding the need
>> for a post in the first place.  At the least, it shows that the
>> questioner tried to solve it themselves.
>>
>> Subscribers.  The questioner should ensure that the thread is
>> complete and that it has an appropriate Subject.  The purpose of the
>> post is not only to help the questioner but also the other list
>> subscribers and those later searching the archives.
>>
>>
>>
>> On Fri, Jun 6, 2008 at 1:30 PM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>> People read the posting guide yet they are still unable to create
>>> an acceptable post, e.g.
>>> https://stat.ethz.ch/pipermail/r-help/2008-June/164092.html
>>>
>>> I think the problem is that the guide is not clear or concise
>>> enough.  I suggest we add a summary at the beginning which gets to
>>> the heart of what a poster is expected to provide:
>>>
>>> Summary
>>>
>>> To maximize your chance of getting a response when posting, provide
>>> (1) commented, (2) minimal, (3) self-contained and (4) reproducible
>>> code.  (This one-line summary also appears at the end of each
>>> message to r-help.)
>>>
>>> "Self-contained" and "reproducible" mean that a responder can copy
>>> the questioner's code to the clipboard, paste it into their R
>>> session and see the same problem that you as the questioner see.
>>> Note that dput(mydata) will display mydata in a reproducible way.
>>> Self-contained and reproducible are needed because:
>>> (1) Self-Effort. It shows that the questioner tried to solve the
>>> problem by themselves first.
>>> (2) Test framework. Often the responder needs to play with the code
>>> a bit in order to respond or at least to give the best answer.  They
>>> can't do that without a test framework that includes the data and
>>> the code to run it, and it's not fair to ask them to not only answer
>>> the question but also to come up with test data and to complete
>>> incomplete code.
>>> (3) Archives. Questions and answers go into the archives so they are
>>> not only for the benefit of the questioner but also for the benefit
>>> of all future searchers of the archive.  That means that it's not
>>> finished if you have solved the problem for yourself.  You still
>>> need to ensure that the thread has a complete solution. (For that
>>> reason it's also important to give a meaningful subject to each
>>> post.)
>>>
>>> "Commented" and "minimal" also reduce the time it takes to
>>> understand the problem.  Don't just dump your code as is into the
>>> message since you are just wasting your own time. It's not likely
>>> anyone will answer a message if the questioner has not taken the
>>> time to reduce it to its essential elements.  Surprisingly, quite
>>> often understanding what the problem is takes the responder most of
>>> the time -- not solving the problem. Once the question is actually
>>> understood it's often quite fast to answer.  Thus in addition to
>>> posting it in a minimal form, comment on it sufficiently so that the
>>> responder knows what the code does and is intended to produce.  It
>>> may be obvious to the questioner who is embroiled in the problem but
>>> that does not mean it's obvious to others.
>>>
>>> Introduction
>>>
>>> .... rest of posting guide ...
>>>