[R] Documenting data

Bert Gunter bgunter.4567 at gmail.com
Thu Jun 30 18:43:01 CEST 2016


Private, since this is a trivial comment. Also, just my opinion, so
feel free to ignore.

Capture it, yes, but not necessarily as a function; a script might do
just as well, and the tools mentioned can handle that. As others have
said, your instincts are good, and you should just choose the methods
that work best for you.
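
For illustration only, here is a minimal sketch of what such a script
might look like. The data, the column names (height_cm, weight_kg, bmi)
and the cleaning steps are all made up for the example; the point is
just that the script itself, run against the untouched raw data, is the
record of how each derived column came to be.

```r
# Hypothetical cleaning script; every name and value here is illustrative.

# Stand-in for the raw data. In practice this would be read from the
# untouched raw file, e.g. with read.csv().
raw <- data.frame(
  id        = 1:4,
  height_cm = c(170, 165, NA, 180),
  weight_kg = c(70, 60, 80, 90)
)

# Step 1: drop rows with a missing height, since BMI cannot be computed.
clean <- raw[!is.na(raw$height_cm), ]

# Step 2: derived column. BMI = weight (kg) / height (m)^2.
clean$bmi <- clean$weight_kg / (clean$height_cm / 100)^2

# Optionally attach a short note to the data frame itself, so the
# description travels with the object.
comment(clean) <- "Derived from raw: rows with NA height dropped; bmi added."
```

Calling comment(clean) later returns the note. Whether to wrap the
steps in a function or leave them as a plain script is a matter of
taste.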

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Jun 30, 2016 at 8:46 AM, Pito Salas <pitosalas at brandeis.edu> wrote:
> Thanks to you both. I think you’re saying/implying that once I “test drive” a particular bit of cleaning I should capture it in a function which does it reproducibly against the raw data, and that becomes the best documentation for it. That makes sense.
>
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
>
>> On Jun 30, 2016, at 11:44 AM, Robert Baer <rbaer at atsu.edu> wrote:
>>
>> You might look at:
>>
>> http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets
>>
>> You might also try File | Compile Notebook from within RStudio (https://www.rstudio.com/) on your well-documented R scripts to get a nice reproducible recording/report of your data analysis workflow. Similar functionality is available from basic R, but involves more work. There are many other approaches, but the best choice depends on your precise needs.
>>
>> And, as a programmer, you are probably already familiar with things like:
>> https://google.github.io/styleguide/Rguide.xml
>>
>>
>>
>> On 6/30/2016 9:51 AM, Pito Salas wrote:
>>> I am studying statistics and using R in doing it. I come from software development where we document everything we do.
>>>
>>> As I “massage” my data, adding columns to a frame, computing on other data, perhaps cleaning, I feel the need to document in detail the meaning, background, and calculations behind the data. After all, it is now derived from my raw data (which may have been well documented), but it is “new.”
>>>
>>> Is this a real problem? Is there a “best practice” to address this?
>>>
>>> Thanks!
>>>
>>> Pito Salas
>>> Brandeis Computer Science
>>> Feldberg 131
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>


