[R] recommended combo of apps for new user?

Sun Aug 19 17:46:00 CEST 2007

Regarding RODBC vs. the DBI-based packages (RSQLite, RMySQL, etc.), it's
my perception, possibly mistaken, that apart from any consideration of
the R packages themselves, ODBC (which originated in the Windows world)
is more widely used on Windows than on UNIX.  ODBC also has to be
configured before use, which adds an extra step to the process, and clear
documentation on how to do that configuration can be difficult to find.

On the other hand, the RODBC package itself seems to be very well
maintained and is typically available for new versions of R before the
DBI-based packages are.
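
To illustrate the configuration point, here is a minimal sketch, assuming
the DBI and RSQLite packages are installed.  The table, its values and the
DSN name "monitoring" are made up for illustration, and RSQLite::SQLite()
is the current DBI idiom rather than anything specific to this thread.
The SQLite route needs no setup outside R, whereas the RODBC calls shown
in comments only work once a data source has been configured in the ODBC
driver manager.

```r
library(DBI)

# Throwaway in-memory SQLite database: no server, no DSN, nothing to configure.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical monitoring data, invented purely for this example.
plots <- data.frame(plot_id = 1:4,
                    species = c("oak", "pine", "oak", "birch"),
                    dbh_cm  = c(31.5, 22.0, 40.2, 18.7))

# Copy the data frame into a table, then query it back with SQL.
dbWriteTable(con, "plots", plots)
oaks <- dbGetQuery(con,
  "SELECT plot_id, dbh_cm FROM plots WHERE species = 'oak' ORDER BY plot_id")
dbDisconnect(con)

# Rough RODBC counterpart.  It is left commented out because it requires a
# DSN (here the hypothetical name "monitoring") to exist beforehand.
# library(RODBC)
# ch   <- odbcConnect("monitoring")
# oaks <- sqlQuery(ch, "SELECT plot_id, dbh_cm FROM plots WHERE species = 'oak'")
# odbcClose(ch)
```

Either way the result comes back as an ordinary data frame, so the choice
mostly affects setup rather than what one does in R afterwards.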

On 8/19/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> Some additional comments on the DBMS front.
>
> (a) SPSS is not a DBMS, so it is not clear that you need this. But if you
> do and are storing valuable data in a DBMS a lot of further questions come
> into play, like how you are going to do backups.  I'd say PostgreSQL was
> really only for professional-level administrators.  My sysadmins recommend
> MySQL for most people.  We do also run PostgreSQL and they find it a lot
> trickier to maintain.
>
> 'dozens of columns and thousands of rows' is not big.  A data frame with
> 50 columns and 5000 rows would only take 2MB to store, and R will easily
> handle 100x that with 4GB of RAM (and if you have less, get 4GB).  So
> storing data in .rda (R's save() format) is most likely viable.  R's
> indexing etc. operations make it good at data manipulation, and using a
> DBMS will involve learning SQL, a non-trivial cost.
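
The size estimate in the quoted paragraph above can be checked with a
little arithmetic.  This is a rough sketch that assumes every column is
numeric, stored by R as 8-byte doubles; factors, character columns and
labels would change the figure.

```r
# Back-of-the-envelope size of a 5000 x 50 all-numeric data frame.
n_rows <- 5000
n_cols <- 50
bytes_per_double <- 8                  # assumption: every cell is a double

estimate <- n_rows * n_cols * bytes_per_double   # bytes for the raw values
estimate_100x <- 100 * estimate                  # the "100x" case

# Compare with what R itself reports for a data frame of that shape
# (slightly more than the estimate because of names and other overhead).
df <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows))
print(object.size(df), units = "Mb")
```

The estimate comes to 2,000,000 bytes (about 2 MB), and 100 times that is
about 200 MB, which is consistent with the claim that 4GB of RAM leaves
plenty of headroom.
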
>
> (b) You have a choice of interfaces to a DBMS, RODBC and the DBI+ family,
> e.g. DBI+RMySQL and DBI+RSQLite.  I'm biased, but I find RODBC more
> intuitive, and many people have reported it to be faster.  If all you want
> is non-permanent storage for manipulation of large data sets, consider
> also SQLiteDF.
>
> On Sat, 18 Aug 2007, Duncan Murdoch wrote:
>
> > Martin Brown wrote:
> >> [I sent this message earlier but apparently should have sent it plain
> >> text, as follows..]
> >>
> >> Hi there,
> >>
> >> I would like some advice, not so much about how to use R, but about
> >> software that I need to complement R.  I've rooted around in the FAQs
> >> and done a few searches on this mailing list but haven't quite found
> >> the perspective I need.
> >>
> >> I am an experienced data analyst in my field (forest ecology and
> >> ecological monitoring) but new to R. I am a long time user of SPSS and
> >> have gotten pretty handy with it.  However, I am frustrated with SPSS
> >> for several reasons:  There's the cost (I'm a freelancer; I pay for my
> >> software myself);  the Windows dependence (I use Kubuntu as my usual
> >> OS now, and switching back and forth is a pain); the horrible
> >> inefficiency when I do certain types of file manipulations; and the
> >> inability to do the kind of publication-quality graphs I want... I've
> >> usually ended up using a commercial graphing program (another source
> >> of expense and limitation).
> >>
> >> I'd like to switch to using R on Kubuntu, for all those reasons.  In
> >> addition I think the mathematical formality that R encourages might be
> >> good for me.
> >>
> >> However, reviewing the FAQs on the R project web site makes me
> >> realize that I've been using SPSS as three kinds of software really:
> >> a DBMS; a statistical analysis package; and a graphing package.  It
> >> looks like moving to R might involve learning three kinds of software,
> >> not just one.  I wonder:
> >>
> >> 1) What open-source DBMS works most seamlessly with R?  I have seen
> >> MySQL recommended but wonder if there are alternatives.  I sometimes
> >> need to handle big data files.  In fact a lot of my work involves
> >> exploratory and descriptive analyses of rather large and messy
> >> databases from ecological monitoring, rather than statistical tests
> >> per se.  In SPSS the data files I have been generating have dozens of
> >> columns and thousands of rows, often with value and variable labels
> >> helpful for documenting my work.
>
> See above.
>
> >
> > I think you won't find much difference in the R interface between MySQL,
> > PostgreSQL, or SQLite.  The choice should be made based on the qualities
> > of the database (and I don't know enough about the differences to give a
> > recommendation).
> >> 2) For the purpose of creating publication-quality graphs, do R users
> >> typically need to go outside of the R system? If so, what open-source
> >> programs would you all recommend?
> >>
> > R is great for this, but you might need to go outside for some
> > specialized stuff (e.g. medical imaging).
> >
> >> 3) Any other software I need to learn that would make my work in R
> >> more productive? (for example, a code editor).
> >
> > A lot of people are happy with ESS mode in Emacs.
> >
> > Duncan Murdoch
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595