[R] A comment about R:
Stefan Eichenberger
Stefan.Eichenberger at se-kleve.com
Fri Jan 6 16:18:16 CET 2006
~~~~~~~~~~~~~~~
... blame me for not having sent below message initially in
plain text format. Sorry!
~~~~~~~~~~~~~~~
I just got into R over most of the Xmas vacation and was about to ask
for pointers on how to get a handle on R when I came across this
thread. I've read through most of it and would like to comment from a
novice user's point of view. I have a strong programming background but
limited statistical experience and no knowledge of competing packages.
I'm working as a senior engineer in electronics.
Yes, the learning curve is steep. Most of the documentation is extremely
terse. Learning is mostly from examples (a wiki was proposed in another
mail...), and the documentation uses no graphical elements at all. So,
when it comes to things like xyplot in lattice: where would I get the
concepts behind panels, superpanels, and the like?
OK, this is steep and terse, but after a while I'll get over it...
That's life. The general concept is great, things can be expressed very
densely: the potential is there... I quickly had 200 lines of my own code
together, doing what it should - or so I believed.
Next I did:
matrix<-matrix(1:100, 10, 10)
image(matrix)
locator()
Great: I can interactively work with my graphs... But then:
filled.contour(matrix)
locator()
Oops - wrong coordinates returned. Bug. Apparently, locator() doesn't
realize that filled.contour() has a color bar to the right and scales x
wrongly...
Here is what really shocked me:
> str(bar)
`data.frame': 206858 obs. of 12 variables: ...
> str(mean(bar[,6:12]))
Named num [1:7] 1.828 2.551 3.221 1.875 0.915 ...
...
> str(sd(bar[,6:12]))
Named num [1:7] 0.0702 0.1238 0.1600 0.1008 0.0465 ...
...
> prcomp(bar[,6:12])->foo
> str(foo$x)
num [1:206858, 1:7] -0.4187 -0.4015 0.0218 -0.4438 -0.3650 ...
...
> str(mean(foo$x))
num -1.07e-13
> str(sd(foo$x))
Named num [1:7] 0.32235 0.06380 0.02254 0.00337 0.00270 ...
...
So, sd returns a vector regardless of whether the argument is a matrix
or a data.frame, but mean reacts differently and returns a vector only
for a data.frame?
The problem here is not that this is difficult to learn - the problem is
the complete absence of a concept. Is a data.frame an 'extended' matrix
with columns of different types or something different? Since the
scalar mean (where I expected a vector) is silently recycled when used
in a vector context, this makes debugging code close to impossible.
Since sd returns a vector, things like mean + 4*sd vary sufficiently
across the data elements that I assumed the code was working... I don't
get any warning signal that something is wrong here.
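A minimal sketch of one way to sidestep the ambiguity: ask for the
column-wise statistics explicitly with colMeans() and apply()/sapply(),
which give the same answer for a matrix and a data.frame, and check the
length of the result before using it. The data frame bar_demo and its
values are made up for illustration and only stand in for bar above.

```r
# Hypothetical stand-in for 'bar'; the numbers are invented for the demo.
set.seed(1)
bar_demo <- data.frame(a = rnorm(50, mean = 2, sd = 0.10),
                       b = rnorm(50, mean = 3, sd = 0.20),
                       c = rnorm(50, mean = 1, sd = 0.05))
m <- as.matrix(bar_demo)

# Column-wise statistics, requested explicitly. The result no longer
# depends on whether the container is a data.frame or a matrix.
col_mean_df <- colMeans(bar_demo)
col_mean_m  <- colMeans(m)
col_sd_df   <- sapply(bar_demo, sd)   # one sd per data.frame column
col_sd_m    <- apply(m, 2, sd)        # one sd per matrix column

# Mean over all elements, written so that the intent is obvious.
grand_mean <- mean(m)

# Per-column upper limit (mean + 4*sd). The length check stops the script
# if a single recycled value slipped in where a vector was expected.
upper <- col_mean_df + 4 * col_sd_df
stopifnot(length(upper) == ncol(bar_demo))
```

Spelling out the direction of the reduction costs a few extra characters,
but it makes the code independent of how mean() and sd() happen to
dispatch on a given class.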
A case in point is the behavior of locator() on a filled.contour()
plot: things apparently have been programmed and debugged from
example rather than concept.
Now, in another posting I read that all this is a feature to discourage
inexperienced users from statistics and force you to think before you do
things. Whilst I support this concept of thinking: Did I miss something
in statistics? I was under the impression that mean and sd were
relatively close to each other conceptually... (here, they are even in
different packages...)
I will continue using R for the time being. But whether I can recommend
it to my colleagues at work remains to be seen: How could I ever trust
the results returned?
I'm still impressed by some of the efficiency, but my trust is deeply
shaken...
-----------------------------------------------------------------------
Stefan Eichenberger mailto:Stefan.Eichenberger at se-kleve.com