[R] recommended combo of apps for new user?

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Aug 19 07:28:23 CEST 2007

Some additional comments on the DBMS front.

(a) SPSS is not a DBMS, so it is not clear that you need this. But if you 
do and are storing valuable data in a DBMS a lot of further questions come 
into play, like how you are going to do backups.  I'd say PostgreSQL was 
really only for professional-level administrators.  My sysadmins recommend 
MySQL for most people.  We do also run PostgreSQL and they find it a lot 
trickier to maintain.

'dozens of columns and thousands of rows' is not big.  A data frame with 
50 numeric columns and 5000 rows would only take about 2MB to store, and R 
will easily handle 100 times that with 4GB of RAM (and if you have less, 
get 4GB).  So storing data in .rda files (R's save() format) is most likely 
viable.  R's indexing and related operations make it good at data 
manipulation, and using a DBMS will involve learning SQL, a non-trivial cost.
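
As a rough check of the 2MB figure: a numeric value takes 8 bytes, so 
50 x 5000 x 8 = 2,000,000 bytes, and 100 times that is about 200MB, well 
within 4GB.  The following is a minimal R sketch, assuming simulated data 
in place of real monitoring files (the object, column and file names are 
hypothetical).  It checks the in-memory size with object.size() and 
round-trips the data frame through save() and load().

```r
# Minimal sketch: simulated 5000 x 50 numeric data stand in for real
# monitoring data; the column and file names are hypothetical.
set.seed(1)
df <- as.data.frame(matrix(rnorm(5000 * 50), nrow = 5000, ncol = 50))
names(df) <- paste0("var", 1:50)

# In-memory size: roughly 5000 * 50 * 8 bytes plus a little overhead
# for column headers, names and row names.
print(object.size(df), units = "Mb")

# Store the data frame in R's own save() format (.rda) and read it back.
f <- file.path(tempdir(), "monitoring.rda")
save(df, file = f)
rm(df)
load(f)   # recreates the object 'df' in the workspace
dim(df)
```

The .rda file keeps R attributes such as factor levels along with the 
data, so nothing needs to be re-coded after load().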

(b) You have a choice of interfaces to a DBMS, RODBC and the DBI+ family, 
e.g. DBI+RMySQL and DBI+RSQLite.  I'm biased, but I find RODBC more 
intuitive, and many people have reported it to be faster.  If all you want 
is non-permanent storage for manipulation of large data sets, consider 
also SQLiteDF.
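
For comparison, here is a minimal sketch of the DBI+RSQLite route, 
assuming the DBI and RSQLite packages are installed; the table and its 
columns are hypothetical.  It copies a small data frame into a temporary 
in-memory SQLite database, summarises it in SQL, and then does the same 
summary in plain R, which is the kind of manipulation that may make a 
DBMS unnecessary for data of this size.

```r
# Minimal sketch of DBI+RSQLite; assumes DBI and RSQLite are installed.
# The 'plots' table and its columns are hypothetical.
library(DBI)

plots <- data.frame(
  plot_id = 1:6,
  stand   = c("oak", "oak", "pine", "pine", "beech", "beech"),
  basal   = c(21.3, 18.7, 30.2, 27.9, 15.4, 17.1),
  stringsAsFactors = FALSE
)

# An in-memory SQLite database: nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "plots", plots)

# The summary is computed in SQL and returned as an ordinary data frame.
res <- dbGetQuery(con,
  "SELECT stand, AVG(basal) AS mean_basal FROM plots GROUP BY stand ORDER BY stand")
print(res)

dbDisconnect(con)

# The same summary in R alone, without any DBMS.
aggregate(basal ~ stand, data = plots, FUN = mean)
```

With RODBC the corresponding calls are odbcConnect() (against a DSN 
configured in your ODBC driver manager, so the DSN name is specific to 
your setup), sqlSave(), sqlQuery() and odbcClose().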

On Sat, 18 Aug 2007, Duncan Murdoch wrote:

> Martin Brown wrote:
>> [I sent this message earlier but apparently should have sent it as plain
>> text, as follows...]
>> Hi there,
>> I would like some advice, not so much about how to use R, but about
>> software that I need to complement R.  I've rooted around in the FAQs
>> and done a few searches on this mailing list but haven't quite found
>> the perspective I need.
>> I am an experienced data analyst in my field (forest ecology and
>> ecological monitoring) but new to R. I am a long time user of SPSS and
>> have gotten pretty handy with it.  However, I am frustrated with SPSS
>> for several reasons:  There's the cost (I'm a freelancer; I pay for my
>> software myself);  the Windows dependence (I use Kubuntu as my usual
>> OS now, and switching back and forth is a pain); the horrible
>> inefficiency when I do certain types of file manipulations; and the
>> inability to do the kind of publication-quality graphs I want... I've
>> usually ended up using a commercial graphing program (another source
>> of expense and limitation).
>> I'd like to switch to using R on Kubuntu, for all those reasons.  In
>> addition I think the mathematical formality that R encourages might be
>> good for me.
>> However, reviewing the FAQs on the R project web site makes me
>> realize that I've been using SPSS as three kinds of software really:
>> a DBMS; a statistical analysis package; and a graphing package.  It
>> looks like moving to R might involve learning three kinds of software,
>> not just one.  I wonder:
>> 1) What open-source DBMS works most seamlessly with R?  I have seen
>> MySQL recommended but wonder if there are alternatives.  I sometimes
>> need to handle big data files.  In fact a lot of my work involves
>> exploratory and descriptive analyses of rather large and messy
>> databases from ecological monitoring, rather than statistical tests
>> per se.  In SPSS the data files I have been generating have dozens of
>> columns and thousands of rows, often with value and variable labels
>> helpful for documenting my work.

See above.

> I think you won't find much difference in the R interface between MySQL,
> PostgreSQL, or SQLite.  The choice should be made based on the qualities
> of the database (and I don't know enough about the differences to give a
> recommendation.)
>> 2) For the purpose of creating publication-quality graphs, do R users
>> typically need to go outside of the R system? If so, what open-source
>> programs would you all recommend?
> R is great for this, but you might need to go outside for some
> specialized stuff (e.g. medical imaging).
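
A minimal sketch of keeping graphs inside R for publication, assuming 
base graphics and a vector PDF as the target format; the simulated 
diameter/height data, the file name and the figure size are all 
hypothetical.  Opening the pdf() device before plotting and closing it 
with dev.off() writes a scalable file that can be included directly in a 
manuscript.

```r
# Minimal sketch: write a base-graphics figure straight to a vector PDF.
# The data, file name and dimensions are hypothetical.
set.seed(42)
dbh    <- runif(100, 5, 60)                              # stem diameter (cm)
height <- 1.3 + 25 * (1 - exp(-0.05 * dbh)) + rnorm(100, sd = 1.5)  # height (m)

out <- file.path(tempdir(), "height_dbh.pdf")
pdf(out, width = 3.5, height = 3.5, pointsize = 9)   # size in inches
par(mar = c(4, 4, 1, 1))                             # trim the default margins
plot(dbh, height, pch = 16, cex = 0.6,
     xlab = "Diameter at breast height (cm)", ylab = "Height (m)")
lines(lowess(dbh, height), lwd = 2)                  # smooth trend line
dev.off()                                            # close the file
```

Swapping pdf() for postscript() or png() changes only the output format; 
the plotting calls stay the same.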
>> 3) Any other software I need to learn that would make my work in R
>> more productive? (for example, a code editor).
> A lot of people are happy with ESS mode in Emacs.
> Duncan Murdoch
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
