[R] A comment about R:
rb.glists at gmail.com
Thu Jan 5 13:05:27 CET 2006
As someone who has been using Stata for a while now (and I started without a programming background), I recently had to
move to R because of its rich spatial packages. Here is my 0.001 cent contribution to this thread.
-----------------WHAT I LOVE ABOUT STATA--------------------------
a) Total control
In Stata I feel like I have TOTAL CONTROL. I put my data in a directory, I can look at it, generate new variables
(columns), reshape, collapse, and expand my data, and all the while I use the list command [list (my variables) in
1/10] over and over again to make sure I am doing what I want. List is probably my favorite Stata command.
As far as I am concerned, Stata has three main types of files:
1. The data file (*.dta), which is my "spreadsheet" in which I have my variables (columns, vectors or whatever you want
to call them)
2. The do file (*.do), which is my set of commands for a particular analysis
3. The log file (*.log), which is the text or SMCL output from my do file
Just by looking at the extensions in any given directory, I know what is what and I am able to organise my project
(in fact I put the three types in different sub-directories but work in one main project directory). Some have said R
forces you to think through your analysis. Well, I can swear that Stata has brought the same discipline to me. Key
questions I always ask myself:
- what peculiarities are there about my data (do I have unique observations...1 record per household, or multiple
records and what does this mean for my analysis...do I need to collapse it, or reshape it).
- what do I want to do (write down a few lines of what I want to do and expected output)
b) Ease of use
I feel that most of Stata's commands are intuitively named, and I find Stata easy to use (with a choice of the GUI, the
command prompt, or the do-file editor).
-----------------MY FIRST 30 DAYS WITH R--------------------------
Moving to R was a totally different experience, and in part it's the whole concept of objects (and I still don't get them
:-) ). My first assignment was to find the R equivalents of the three files as well as of my main Stata commands (and
frankly, the only one that is clear now is the script, which is R's equivalent of the Stata do file).
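For readers making the same move, here is a rough sketch of how the three Stata file types might map onto base R. The
toy data frame `hh` and all file names are invented for illustration, and this is only one possible arrangement, not an
official correspondence: saveRDS()/readRDS() store a data frame somewhat like a .dta file, a plain-text script stands in
for the do file, and sink() diverts printed output to a text file, a little like a log.

```r
# Hypothetical toy data: one row per household (names invented for illustration)
hh <- data.frame(hhid = 1:3, income = c(1200, 800, 1500))

# 1. Data file: saveRDS()/readRDS() store a single object, roughly like a .dta file
data_file <- file.path(tempdir(), "households.rds")
saveRDS(hh, data_file)
hh2 <- readRDS(data_file)

# 2. Do file: the commands themselves live in a plain-text script
#    (say analysis.R, a made-up name) that is run with source("analysis.R")

# 3. Log file: sink() sends printed output to a text file until sink() is called again
log_file <- file.path(tempdir(), "analysis.log")
sink(log_file)
print(summary(hh2$income))
sink()

# Read the "log" back in to confirm something was written
log_lines <- readLines(log_file)
```

Unlike a Stata log, sink() records only the printed output, not the commands that produced it.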
A few have asked about the relevance of reproducing Stata (or SAS for that matter) commands in R. Well, someone correctly
pointed out that the challenge is in the mindset. Stata users have a Stata mindset, so by being able to reproduce in R
some basic work done in Stata, you are many steps closer to understanding the workings of R.
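As an illustration of that kind of reproduction, here is a minimal sketch of rough base R counterparts to the Stata
commands mentioned earlier (list, generate, collapse, reshape, and checking for one record per household). The data
frame `d` and its column names are made up, the Stata lines in the comments are approximate, and other R idioms would
do the same job.

```r
# Hypothetical toy data: several records per household (all names invented)
d <- data.frame(
  hhid   = c(1, 1, 2, 3, 3, 3),
  member = c(1, 2, 1, 1, 2, 3),
  income = c(500, 700, 800, 300, 400, 800)
)

# Stata: list hhid income in 1/3
first_rows <- head(d[, c("hhid", "income")], 3)
print(first_rows)

# Stata: generate income_k = income / 1000
d$income_k <- d$income / 1000

# Stata: isid hhid  (is there exactly one record per household?)
one_per_hh <- !any(duplicated(d$hhid))

# Stata: collapse (sum) income, by(hhid)
hh_income <- aggregate(income ~ hhid, data = d, FUN = sum)

# Stata: reshape wide income, i(hhid) j(member)
wide <- reshape(d[, c("hhid", "member", "income")],
                idvar = "hhid", timevar = "member", direction = "wide")
```

Each result is an ordinary object (`first_rows`, `hh_income`, `wide`) that can be printed or inspected with head(),
which is about as close as base R comes to the habit of running list after every step.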
So yes, I did whine in the first few weeks about how hard R is. Some have attributed the whining about R to laziness...I
disagree, the learning curve is simply steep. I therefore salute Roger Bivand's effort to reproduce the example on the
Stata website and I second efforts by others to do this for other programs.
Now don't get me wrong, I am not ungrateful for the tons of material made freely available by the R community (top of
this list being R itself). However, most of this material is terse, and most of the time I have had to go over it a few
times (and may still not get it).
But even more, I am yet to find material dedicated to basic data management. Indeed, bits of data management are dropped
here and there in the manuals and online material, but a dedicated book (which I would gladly buy) is lacking.
In this same thread there has been discussion on splitting the R-Help list. I have reservations about this (we had the
same discussion on the Stata list and the consensus was to maintain the status quo). Geographically splitting the list
simply reinforces the inequalities birthed out of the original development of R. Some countries or regions are bound to
have more exciting lists thanks to the initial distribution of resource persons. Sending the beginners to their own list
is nothing short of crippling them (let the one-eyed lead the blind....hmmm....bad idea). Not only will it cripple their
thinking, but it can instill bad programming practices that may be hard to drop. I look back at the Stata stuff I wrote 6
years ago and I am ashamed of how much real estate I wasted writing line upon line that could be cut down to less than
1/10th. How did I learn...well, I passively and faithfully read each email that was posted and saved elegant bits of
code in my scrap book.
Finally, I have been on the Statalist for close to six years and we do get our fair share of "homework type" questions
and people get told off (though not with the frequency and "harshness" of this list). In fact someone once whined about
a rude reply he got to his posting, and someone wrote to inform him that there were much harsher lists, adding that the
R-Help list is not for the faint-hearted (two reasons, one being that the typical posting may sound like rocket science
to most, the other being that there is very little tolerance for those who fail to adhere to the posting guide). Maybe
this is a good thing because it forces people to think twice (100 times for me) before posting, but on the downside,
this could traumatize a poor soul and put him/her off R altogether (but then you may say....this is not a church, nor
is it Dr Phil's show, and we are not in the business of making you feel good. Well....R is open source and the notion of
strength in numbers certainly holds). It is not hard to see who is posting a cry for help for the first time (my first
subject line was mayday mayday and I was told off :-), of course now I get it). My approach is usually to help such a
person but point them towards the posting guide (hopefully, they don't make the same mistake again and yet they don't
feel like big "fools").
This thread was birthed out of Michael Mitchell's article (I have read his book as well as the great amounts of
helpful material he has made available on his website). The key questions asked as a result of his article were:
- Was he praising with damnation or damning with praise?
- Did what he posted about R hold water? If so, what can be made better?
From the emails posted so far, the jury is still out on these questions and I am enjoying the discussion.