[R] A comment about R:
Bob Green
bgreen at dyson.brisnet.org.au
Wed Jan 4 02:36:46 CET 2006
>Hello,
>Unlike most posts on the R mailing list I feel qualified to comment on
>this one. For about 3 months I have been trying to learn to use R, after
>having used various versions of SPSS for about 10 years.
I think it is far too simplistic to ascribe non-use of R to laziness. This
may well be the case for some; however, I have read 5-6 books on R, waded
through on-line resources, read the documentation and asked multiple
questions via e-mail - and still find even some of the basics very difficult.
There are several reasons for this:
1) For some tasks R is extremely user-unfriendly. Some comparative examples:
(a) In running a chi-square analysis in SPSS the following syntax is included
/STATISTIC=CHISQ
/CELLS= COUNT EXPECTED ROW COLUMN TOTAL RESID .
this produces expected and observed counts, row & column percentages,
residuals, chi-square & Fisher's exact test + other output.
In R, it is a herculean task to produce similar output. It certainly
can't be produced in 2 lines, as far as I can tell.
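
For comparison, a minimal sketch of how most of these pieces can be pulled from base R. The data frame and its column names are hypothetical, and this is only one possible approach; it takes more than two lines and does not reproduce the SPSS layout.

```r
# Hypothetical data: two categorical variables (names are illustrative only).
set.seed(1)
dat <- data.frame(
  group   = factor(sample(c("a", "b"), 100, replace = TRUE)),
  outcome = factor(sample(c("yes", "no"), 100, replace = TRUE))
)

tab <- table(dat$outcome, dat$group)     # observed counts
# correct = FALSE turns off the continuity correction that chisq.test
# applies by default to 2 x 2 tables.
chi <- chisq.test(tab, correct = FALSE)

chi                                      # Pearson chi-square test
chi$expected                             # expected counts
chi$observed - chi$expected              # raw residuals
chi$residuals                            # Pearson residuals
round(100 * prop.table(tab, 1), 1)       # row percentages
round(100 * prop.table(tab, 2), 1)       # column percentages
round(100 * prop.table(tab), 1)          # total percentages
fisher.test(tab)                         # Fisher's exact test
```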
(b) In SPSS, if I want to compare multiple variables by a single dependent
variable, this is readily performed:
CROSSTABS
/TABLES=baserdis baserenh basersoc baseradd socbest disbest entbest
addbest worsdis worsphy by group
I used the chi-square example again, but the same applies for a t-test. I
started looking into how to do something similar in R, with the t-test
command but gave up. R does force the user to take a more considered
approach to analysis.
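
One way this might be approached in R is to apply the same test over a list of columns with lapply(). The sketch below assumes a hypothetical data frame mydat; only some of the variable names are borrowed from the SPSS syntax above, and the data are simulated.

```r
# Hypothetical data frame standing in for the SPSS file.
set.seed(2)
n <- 60
mydat <- data.frame(
  group    = factor(rep(c("g1", "g2"), each = n / 2)),
  baserdis = sample(c("yes", "no"), n, replace = TRUE),
  baserenh = sample(c("yes", "no"), n, replace = TRUE),
  avfreq   = rnorm(n),
  avdayamt = rnorm(n)
)

# Cross-tabulate several categorical variables against group,
# then run a chi-square test on each table.
cat.vars <- c("baserdis", "baserenh")
tabs <- lapply(mydat[cat.vars], function(v) table(v, mydat$group))
chis <- lapply(tabs, chisq.test)

# Run a t-test of several numeric variables by group.
num.vars <- c("avfreq", "avdayamt")
tts <- lapply(mydat[num.vars], function(v) t.test(v ~ mydat$group))

# Collect the p-values into one named vector.
sapply(tts, function(tt) tt$p.value)
```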
(c) Obtaining a correlation matrix in R with both the correlation & p-value
is no simple task -
In SPSS this is obtained via:
GET
FILE='D:\a study\data\dat\key data\master data.sav'.
NONPAR CORR
/VARIABLES= goodnum badnum good5 bad5 avfreq avdayamt
/PRINT=KENDALL TWOTAIL
/MISSING=PAIRWISE .
In R something like this is required -
> by(mydat, mydat$group, function(x) {
+ nm <- names(x)
+ rho <- matrix(, 6, 2)
+ rho.nm <- matrix(, 6, 2)
+ k <- 1
+ for(i in 2:4) {
+ for(j in (i + 1):5) {
+ x.i <- x[, i]
+ x.j <- x[, j]
+ ct <- cor.test(x.i, x.j, method = "kendall", alternative = "two.sided")
+ rho[k, 1] <- ct$estimate
+ rho[k, 2] <- round(ct$p.value, 3)
+ rho.nm[k, ] <- c(nm[i], nm[j])
+ k <- k + 1
+ }
+ }
+ rho <- cbind(as.data.frame(rho.nm), as.data.frame(rho))
+ names(rho) <- c("freq.i", "freq.j", "cor", "p-value")
+ rho
+ })
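
The pairwise loop above can probably be written more compactly with combn(), which enumerates the variable pairs. This is a hedged sketch on simulated data, not a drop-in replacement; the data frame x is hypothetical and only its column names come from the SPSS syntax.

```r
# Hypothetical data: four numeric columns (names taken from the SPSS syntax).
set.seed(3)
x <- data.frame(goodnum = rnorm(30), badnum = rnorm(30),
                good5 = rnorm(30), bad5 = rnorm(30))

# Each column of 'pairs' holds one pair of variable names.
pairs <- combn(names(x), 2)

# Kendall correlation and two-sided p-value for every pair.
res <- apply(pairs, 2, function(p) {
  ct <- cor.test(x[[p[1]]], x[[p[2]]], method = "kendall")
  c(tau = unname(ct$estimate), p.value = ct$p.value)
})

# One row per pair: the two variable names, tau and the p-value.
out <- data.frame(var.i = pairs[1, ], var.j = pairs[2, ], t(round(res, 3)))
out
```

Wrapping the same steps in by(mydat, mydat$group, ...) would give one such table per group.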
2) It is not always clear what the output produced by R is. The
Mann-Whitney U-test is a good example. In R, it seems a standardised value
is obtained. I was advised that it is easy enough to check this as R is
open-source, but at least for me, I don't believe I would understand this
code anyway. It is confusing when comparable programs such as R and SPSS
produce dissimilar results. For the user it is important to be able to
reconcile such differences fairly easily, to engender confidence in results.
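
As an illustration of how such a difference might be reconciled without reading the source, the statistic W reported by wilcox.test() can be recomputed by hand from the ranks. The samples below are made up. That another program reports the smaller of the two U values is an assumption to be checked against that program's documentation.

```r
# Hypothetical samples for two groups.
x <- c(1.1, 2.4, 3.8, 5.0, 6.3)
y <- c(0.7, 1.9, 2.2, 3.1)

wt <- wilcox.test(x, y)
W  <- unname(wt$statistic)

# W is the rank sum of x in the pooled sample minus n.x * (n.x + 1) / 2,
# which is the Mann-Whitney U statistic for x.
r   <- rank(c(x, y))
n.x <- length(x)
n.y <- length(y)
U.x <- sum(r[seq_along(x)]) - n.x * (n.x + 1) / 2
U.y <- n.x * n.y - U.x

# Compare R's W with both U values and the smaller of the two.
c(W = W, U.x = U.x, U.y = U.y, U.min = min(U.x, U.y))
```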
3) I find the help files in R quite difficult to understand. For example,
see help(t.test). It is almost assumed by the examples that you know what
to do. Personally, I would find some form of simple decision tree easier
-e.g. If you want to perform a t-test with the dependent variable in one
column and the grouping variable in another, use t.test(AVFREQ~GROUP). If you
want to perform a t-test with the dependent variable in separate columns
(each column representing a different group), use t.test(AVFREQ1, AVFREQ2).
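
A small sketch of the two call forms on simulated data, to show that they give the same result. The data frame long is hypothetical; AVFREQ, GROUP, AVFREQ1 and AVFREQ2 follow the names used above.

```r
# Hypothetical data in "long" layout: one outcome column, one group column.
set.seed(4)
long <- data.frame(
  AVFREQ = rnorm(40),
  GROUP  = factor(rep(c("A", "B"), each = 20))
)

# Outcome in one column, grouping variable in another:
t1 <- t.test(AVFREQ ~ GROUP, data = long)

# One column (vector) per group:
AVFREQ1 <- long$AVFREQ[long$GROUP == "A"]
AVFREQ2 <- long$AVFREQ[long$GROUP == "B"]
t2 <- t.test(AVFREQ1, AVFREQ2)

# Both forms run the same Welch t-test by default.
all.equal(unname(t1$statistic), unname(t2$statistic))
```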
4) My initial approach to using R was to run commands I had used commonly
in SPSS and compare the results. I have only got as far as basic ANOVA.
This has been time-consuming and at times it has been difficult to obtain
advice. Some people on the R list have been extremely generous with their
time and knowledge, and I have much appreciated this assistance. At other
times I see responses met with something like arrogance. With the
sophistication of R, there is also an elitism. This is a barrier to R
being more widely accepted and used.
5) Differences in terminology - this is just part of the learning process,
but I still found it took quite some time to work out simple commands and
what different analyses were called.
6) System administrators may be wary of freeware.
No doubt for the sophisticated user, my comments may seem trite and easily
resolved; however, I believe they have some relevance as to why R is
not more readily used or accepted.
Bob Green