[R] What is your system for WorkFlow and Source Code Organizing in R ?
Jeff.Laake at noaa.gov
Sun Feb 21 18:24:35 CET 2010
On 2/20/2010 9:49 AM, Tal Galili wrote:
> Hello dear R users,
> Recently there have been several fascinating threads on the website
> Stack Overflow <http://stackoverflow.com> regarding the subject of R WorkFlow
> best practices:
> - What best practices do you use for programming in
> - Workflow for statistical analysis and report
> - How to organize large R
> - Organizing R Source
> And although many people there gave very detailed answers, I have the
> feeling that there is much more wisdom on the subject that is still only
> available in this mailing list.
Thanks for this post, as I was unaware of Stack Overflow. Most of my work
is single-user development, with some projects involving 2 or 3
collaborators. I'm a scientific analyst, so my needs may differ from
those of others. To sum up my comments, I'm a big proponent of package
development for analysis, and I've recently started incorporating
Sweave/TeX via LyX into my publication/reporting.
> So I'll phrase a few questions, and I hope as many people will participate
> in sharing work styles tips:
> 1) What IDE do you use (and which of its features are most important to
> you, besides syntax highlighting and indentation)?
I'm using Tinn-R. Until recently I've only used it for syntax
highlighting but I'm learning that the project capability is quite
useful and I'm integrating that. However, a lot more could be done to
support package development.
> 2) Do you use a version control system? If so, which one, and how often do
> you use it?
Since I do all my work in packages as a single user, I simply zip up the
source directory and store it in an archive directory. My own experience
using SourceForge on a project has not left me impressed, but maybe that
reflects my own lack of experience with it.
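
A minimal sketch of that kind of timestamped archiving, done from within R.
It is not the poster's own procedure: it swaps in utils::tar() with the
internal implementation instead of zip, so that no external zip program is
needed, and the function name archive_source and the naming scheme of the
archive file are assumptions made for illustration.

```r
# Hypothetical helper: writes <archive_dir>/<pkgname>_<timestamp>.tar.gz
# containing the whole source directory.
archive_source <- function(src_dir, archive_dir) {
  stopifnot(dir.exists(src_dir))
  dir.create(archive_dir, recursive = TRUE, showWarnings = FALSE)

  src_dir <- normalizePath(src_dir)
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  out <- file.path(normalizePath(archive_dir),
                   paste0(basename(src_dir), "_", stamp, ".tar.gz"))

  # Archive relative to the parent directory so that paths inside the
  # archive start with the package directory name.
  old_wd <- setwd(dirname(src_dir))
  on.exit(setwd(old_wd), add = TRUE)

  utils::tar(out, files = basename(src_dir),
             compression = "gzip", tar = "internal")
  invisible(out)
}
```

Calling archive_source("path/to/mypkg", "path/to/archive") before each round
of changes leaves a dated copy to fall back on.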
> 3) How (and when) do you document your code ?
I document each function (to some degree) as I go and then use those
comments and more in each package help file.
> 4) What guides you when you build/organize your folders, files (data, code
> and results) for a given project (when the project is small, medium and
> large)?
I develop everything as a package unless it is a very small, one-time
analysis, so I use the package structure. I can see from the Stack
Overflow discussions that some use packages the way I do, but in general
I think most users view package development as something for CRAN
submission only. They are missing out on a great system for organizing
projects: it lets you document the data and code for an analysis, and it
provides a simple way to share your analytical techniques and analyses
with others. If the data are included in or with the package, another
researcher can repeat your analysis and look at each component. I use
the example() functionality in the package man file for the main script
for the analysis. If I understood correctly, this handles the issue,
discussed in one of the Stack Overflow threads, of using source() rather
than functions for reading data, cleaning data, etc. The script is
contained in your example, and your true functions for analysis etc. are
in the package along with any data (if appropriate). I'm not a naturally
organized person, and this gives me the structure I need to stay
organized.
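
As an illustration of the package-per-project layout described above, here
is a minimal sketch that uses utils::package.skeleton() to generate the
standard directories from one function and one data set. The package name
myanalysis and the objects fit_model and survey are hypothetical, and the
skeleton is written under tempdir() only to keep the example self-contained.

```r
# Hypothetical analysis objects, kept in their own environment.
work <- new.env()
work$fit_model <- function(dat) lm(y ~ x, data = dat)
work$survey <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Directory that will hold the generated package source.
pkg_root <- file.path(tempdir(), "pkgdemo")
dir.create(pkg_root, recursive = TRUE, showWarnings = FALSE)

# Writes DESCRIPTION, R/ (code), man/ (help file stubs) and data/.
utils::package.skeleton(name = "myanalysis",
                        list = c("fit_model", "survey"),
                        environment = work,
                        path = pkg_root,
                        force = TRUE)

list.files(file.path(pkg_root, "myanalysis"))
```

The generated man/ stubs are where the function and data documentation go,
and the examples section of a help file is where a main analysis script
could live so that example() runs it.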
> 5) Did I leave any important aspect of this subject out?
With regard to one of the comments on Stack Overflow, I recently wrote a
small function to detach, build and reattach a package from within R. I
used to work on a package using a command window to build the package,
Tinn-R to edit functions and R to test. At a recent Seattle RUsers
meeting, Joe Cheng suggested building the package from within Rgui, so I
wrote this function, which gets rid of the need for the command window
and automates the detach() and the library() call to reattach. I'd be
interested to know how others do this. What I'd like to see is some way
to initiate that from within Tinn-R. I understand it is possible in
Eclipse.
# Detaches package if it is loaded; installs and builds package and
# reloads the package
# pkg - name of your package
# pkg.dir - directory containing all of your package directories
# package source is assumed to be in pkg.dir/pkg/pkg
# I use this structure because this lets me keep binary zip's and
# other files under pkg.dir/pkg. I also usually create
# pkg.dir/pkg/archive and keep previous zipped source and binary
# versions in the archive directory. If you don't want to create
# this nested structure use setwd(file.path(pkg.dir)) below and it
# will assume that pkg.dir/pkg is the source package directory.
# You can modify the above defaults for your own use and in particular
# Value: None
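
A minimal sketch of a helper along the lines described in the comments
above. It is not the original function: the names package_source_dir and
rebuild_package and the nested argument are assumptions, and it passes the
source path to R CMD INSTALL directly rather than calling setwd(). It only
installs the package and does not build the binary zip.

```r
# Hypothetical helper: locate the package source directory.
# nested = TRUE  -> pkg.dir/pkg/pkg (the layout described above)
# nested = FALSE -> pkg.dir/pkg
package_source_dir <- function(pkg, pkg.dir, nested = TRUE) {
  parent <- if (nested) file.path(pkg.dir, pkg) else pkg.dir
  file.path(parent, pkg)
}

# Hypothetical helper: detach, reinstall and reattach a package.
rebuild_package <- function(pkg, pkg.dir, nested = TRUE) {
  src <- package_source_dir(pkg, pkg.dir, nested)
  stopifnot(dir.exists(src))

  # Detach (and unload) the package if it is currently attached.
  search_name <- paste0("package:", pkg)
  if (search_name %in% search()) {
    detach(search_name, unload = TRUE, character.only = TRUE)
  }

  # Install from source with the R executable that is currently running.
  rcmd <- file.path(R.home("bin"), "R")
  status <- system2(rcmd, c("CMD", "INSTALL", shQuote(src)))
  if (status != 0) {
    stop("R CMD INSTALL failed for package ", pkg)
  }

  # Reattach the freshly installed version.
  library(pkg, character.only = TRUE)
  invisible(NULL)
}
```

With this in place, rebuild_package("mypkg", "C:/projects") after editing a
function would replace the separate command window step.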
> In this question I am (personally) less interested in how to style your
> code, or how to deal with doing R coding when several people are working on
> the project.
> P.S.: If you know of older threads that dealt with these subjects (as I am
> sure there were several), sharing the link would be nice...
> Thanks for any insights shared,
> Contact me: Tal.Galili at gmail.com | 972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)