[R] A comment about R:

Fri Jan 6 16:18:16 CET 2006

~~~~~~~~~~~~~~~
... blame me for not having sent below message initially in
plain text format. Sorry!
~~~~~~~~~~~~~~~

I got into R over most of the Xmas vacation and was about to ask for 
some pointers on how to get a handle on R when I came across this 
thread. I've read through most of it and would like to comment from a 
novice user's point of view. I have a strong programming background but 
limited statistical experience and no knowledge of competing packages. 
I'm working as a senior engineer in electronics.

Yes, the learning curve is steep. Most of the documentation is extremely 
terse. Learning is mostly from examples (a wiki was proposed in another 
mail...), and the documentation uses no graphical elements at all. So, 
when it comes to things like xyplot in lattice: where would I learn the 
concepts behind panels, superpanels, and the like?

OK, this is steep and terse, but after a while I'll get over it... 
That's life. The general concept is great, things can be expressed very 
densely: the potential is there... I quickly had 200 lines of my own 
code together, doing what it should - or so I believed.

Next I did:
  matrix<-matrix(1:100, 10, 10)
  image(matrix)
  locator()
Great: I can interactively work with my graphs... But then:
  filled.contour(matrix)
  locator()
Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't 
realize that filled.contour() has a color bar to the right and scales x 
wrongly...
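
A possible workaround, only as a sketch: the help page for 
filled.contour() says the output is really two plots (contour and 
legend), that their coordinate systems are lost once the function 
returns, and that annotation commands can be given in the plot.axes 
argument. Assuming that, the coordinates of the contour panel can be 
queried from inside plot.axes. The names z and usr_inside are my own, 
and pdf(NULL) is only there so the sketch runs without a screen.

```r
z <- matrix(1:100, 10, 10)     # same data as above, without reusing the name "matrix"

pdf(NULL)                      # null device: nothing is written to disk
usr_inside <- NULL
filled.contour(z, plot.axes = {
  axis(1); axis(2)             # draw the usual axes ourselves
  usr_inside <- par("usr")     # coordinate system of the contour panel itself
  points(0.5, 0.5, pch = 3)    # an annotation placed in contour coordinates
})
invisible(dev.off())

print(usr_inside)              # x and y limits as seen inside the contour panel
```

In an interactive session one could presumably call locator() at that 
same point, but I have not verified this.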

Here is what really shocked me:

> str(bar)
`data.frame':   206858 obs. of  12 variables:
  ...
> str(mean(bar[,6:12]))
  Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
  ...
> str(sd(bar[,6:12]))
  Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
  ...
> prcomp(bar[,6:12])->foo
> str(foo$x)
  num [1:206858, 1:7] -0.4187 -0.4015  0.0218 -0.4438 -0.3650 ...
  ...
> str(mean(foo$x))
  num -1.07e-13
> str(sd(foo$x))
  Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
  ...

So, sd returns a vector regardless of whether the argument is a matrix 
or a data.frame, but mean behaves differently and returns a vector only 
for a data.frame?

The problem here is not that this is difficult to learn - the problem is 
the complete absence of a concept. Is a data.frame an 'extended' matrix 
with columns of different types, or something different? Because the 
single number returned by mean (where I expected a vector) is silently 
recycled when used in a vector context, debugging code becomes close to 
impossible. And because sd does return a vector, expressions like 
mean + 4*sd still vary enough across the columns that I assumed the code 
was working... I don't get any warning signal that something is wrong 
here.
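
To make the column-wise intent explicit, something like the following 
sketch should give the same answer for a data.frame and for a matrix, 
without relying on how mean() and sd() dispatch on either class. The 
data and the names df, m, and limit_* are made up for illustration; they 
are not my real data.

```r
set.seed(1)
# Hypothetical stand-in for a few numeric columns of a data.frame
df <- data.frame(a = rnorm(100, mean = 1.8, sd = 0.07),
                 b = rnorm(100, mean = 2.5, sd = 0.12),
                 c = rnorm(100, mean = 3.2, sd = 0.16))
m <- as.matrix(df)             # the same numbers as a matrix

# Column means: colMeans() accepts both data.frames and matrices
means_df <- colMeans(df)
means_m  <- colMeans(m)

# Column standard deviations, stated explicitly per column
sds_df <- sapply(df, sd)       # a data.frame is a list of columns
sds_m  <- apply(m, 2, sd)      # margin 2 = columns of a matrix

# Fail loudly if a scalar sneaks in where a per-column vector is expected
stopifnot(length(means_df) == ncol(df), length(sds_df) == ncol(df),
          length(means_m)  == ncol(m),  length(sds_m)  == ncol(m))

limit_df <- means_df + 4 * sds_df
limit_m  <- means_m  + 4 * sds_m
print(all.equal(limit_df, limit_m))
```

The stopifnot() line is the part I was missing: it turns the silent 
recycling of a length-one result into an error.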

A case in point is the behavior of locator() on a filled.contour() 
plot: things apparently have been programmed and debugged from 
example rather than from concept.

Now, in another posting I read that all this is a feature to discourage 
inexperienced users from statistics and to force you to think before you 
do things. Whilst I support this idea of thinking: did I miss something 
in statistics? I was under the impression that mean and sd were 
relatively close to each other conceptually... (here, they are even in 
different packages...)

I will continue using R for the time being. But whether I can recommend 
it to my work colleagues remains to be seen: how could I ever trust the 
results returned?

I'm still impressed by some of the efficiency, but my trust is deeply 
shaken...
-----------------------------------------------------------------------
Stefan Eichenberger        mailto:Stefan.Eichenberger at se-kleve.com