[R] recommended combo of apps for new user?
ggrothendieck at gmail.com
Sun Aug 19 02:58:21 CEST 2007
On 8/18/07, Martin Brown <mjb2000 at gmail.com> wrote:
> Hi there,
> I would like some advice, not so much about how to use R, but about software
> that I need to complement R. I've rooted around in the FAQ's and done a few
> searches on this mailing list but haven't quite found the perspective I
> I am an experienced data analyst in my field (forest ecology and ecological
> monitoring) but new to R. I am a long time user of SPSS and have gotten
> pretty handy with it. However, I am frustrated with SPSS for several
> reasons: There's the cost (I'm a freelancer; I pay for my software
> myself); the Windows dependence (I use Kubuntu as my usual OS now, and
> switching back and forth is a pain); the horrible inefficiency when I do
> certain types of file manipulations; and the inability to do the kind of
> publication-quality graphs I want... I've usually ended up using a
> commercial graphing program (another source of expense and limitation).
> I'd like to switch to using R on Kubuntu, for all those reasons. In
> addition I think the mathematical formality that R encourages might be good
> for me.
From a strictly language perspective, mathematical formality is pretty
far from R. It's actually quite loose. Underneath there are some Lisp/Scheme
ideas but you are not very close to that as a user.
> However, reviewing the FAQ's on the R project web site makes me realize that
> I've been using SPSS as three kinds of software really: a DBMS; a
> statistical analysis package; and a graphing package. It looks like moving
> to R might involve learning three kinds of software, not just one. I
> 1) What open-source DBMS works most seamlessly with R? I have seen MySQL
> recommended but wonder if there are alternatives. I sometimes need to
> handle big data files. In fact a lot of my work involves exploratory and
> descriptive analyses of rather large and messy databases from ecological
> monitoring, rather than statistical tests per se. In SPSS the data files I
> have been generating have dozens of columns and thousands of rows, often
> with value and variable labels helpful for documenting my work.
Databases. SQLite is the easiest to install since it's embedded rather
than client/server, so I would use that unless your application requires
client/server or other features of MySQL. MySQL is probably the most
popular of the free databases so that would be the next one to go with.
If you intend to create a commercial application you might want to
consider Postgres instead of MySQL as the latter charges for
commercial implementations but Postgres does not. Some heavy
Postgres users might feel that it should be considered after SQLite
rather than MySQL and there is a certain amount of arbitrariness here.
See the R packages RSQLite, RMySQL and DBI. The R packages sqldf and
SQLiteDF are beginning to blur the boundary between R and the database.
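
As a rough sketch of what the SQLite route can look like from R, the
following uses DBI with RSQLite to put a data frame into an in-memory
database and query it with SQL. The table, column names and values are
invented for illustration, and the calls assume reasonably current
versions of both packages.

```r
library(DBI)      # generic database interface
library(RSQLite)  # SQLite driver; the database engine ships with the package

# Hypothetical monitoring data, invented purely for illustration
plots <- data.frame(
  plot_id = c(1, 2, 3, 4),
  species = c("Picea", "Abies", "Picea", "Betula"),
  dbh_cm  = c(31.2, 18.5, 44.0, 12.3),
  stringsAsFactors = FALSE
)

# An in-memory database; pass a file name instead to keep the data on disk
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "plots", plots)

# Do the aggregation in SQL and get an ordinary data frame back
res <- dbGetQuery(con,
  "SELECT species, COUNT(*) AS n, AVG(dbh_cm) AS mean_dbh
   FROM plots GROUP BY species ORDER BY species")
print(res)

dbDisconnect(con)
```

Since nothing has to be installed or administered outside of R, this is
probably the lowest-effort way to try a database before committing to
MySQL or Postgres.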
> 2) For the purpose of creating publication-quality graphs, do R users
> typically need to go outside of the R system? If so, what open-source
> programs would you all recommend?
Graphics. R should be ok. Check out:
and also google for
R Graphics Gallery
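
For a feel of what staying inside R looks like, here is a minimal sketch
that writes a scatterplot with a fitted line to a PDF using only base
graphics. The data are simulated and the file name is made up; treat it
as a starting point rather than a recipe.

```r
# Simulated data, for illustration only
set.seed(1)
dbh    <- runif(50, min = 10, max = 60)       # stem diameter, cm
height <- 5 + 0.4 * dbh + rnorm(50, sd = 2)   # tree height, m

# Hypothetical output file in the session's temporary directory
out <- file.path(tempdir(), "height_vs_dbh.pdf")

pdf(out, width = 5, height = 4)   # vector output; size is in inches
par(mar = c(4, 4, 1, 1))          # trim the default margins
plot(dbh, height, pch = 19,
     xlab = "Diameter at breast height (cm)",
     ylab = "Height (m)")
abline(lm(height ~ dbh), lwd = 2) # least-squares line
dev.off()                         # close the device so the file is written
```

A vector format such as PDF scales without loss, which is usually what a
journal wants.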
> 3) Any other software I need to learn that would make my work in R more
> productive? (for example, a code editor).
Other. You need to know a text editor. I use vim but there are
many good choices here, with Emacs plus ESS being one that is often mentioned.
If you intend to write C routines to run with R then, of course, you
need to know C.
For certain R packages that interface with outside software (tcltk, Rgraphviz,
Ryacas, XML, etc.) you will need to know something about the interfaced-to
software if you intend to use those packages.
For package development you will need to know LaTeX and possibly Subversion,
i.e. svn, the UNIX screen program, tar and various other UNIX commands.
Certain auxiliary programs that come with and are used with R are written
in Perl, although it's unlikely you will need to know it.