[R] A comment about R:

Wed Jan 4 02:36:46 CET 2006

>Hello,

>Unlike most posts on the R mailing list, I feel qualified to comment on 
>this one. For about 3 months I have been trying to learn to use R, after 
>having used various versions of SPSS for about 10 years.

I think it is far too simplistic to ascribe non-use of R to laziness. This 
may well be the case for some. However, I have read 5-6 books on R, waded 
through online resources, read the documentation and asked multiple 
questions by e-mail, and I still find even some of the basics very difficult.

There are several reasons for this:

1) For some tasks R is extremely user-unfriendly. Some comparative examples:

(a) In running a chi-square analysis in SPSS the following syntax is included

/STATISTIC=CHISQ
   /CELLS= COUNT EXPECTED ROW COLUMN TOTAL RESID .

This produces expected and observed counts, row & column percentages, 
residuals, chi-square & Fisher's exact test + other output.

In R, it is a herculean task to produce similar output. It certainly 
can't be produced in 2 lines, as far as I can tell.
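
For comparison, here is a minimal sketch of how the pieces of that output 
could be assembled in base R. The data and the names group and outcome are 
made up for illustration, and the layout will not match the SPSS table; it 
also takes rather more than 2 lines.

```r
# Hypothetical data: two categorical variables (names are illustrative only)
dat <- data.frame(
  group   = factor(rep(c("a", "b"), each = 30)),
  outcome = factor(c(rep("no", 12), rep("yes", 18),
                     rep("no", 20), rep("yes", 10)))
)

tab <- table(dat$group, dat$outcome)      # observed counts (COUNT)
chi <- chisq.test(tab, correct = FALSE)   # Pearson chi-square, no continuity correction

tab                            # observed counts
chi$expected                   # expected counts
prop.table(tab, 1) * 100       # row percentages
prop.table(tab, 2) * 100       # column percentages
prop.table(tab) * 100          # percentages of the total
chi$observed - chi$expected    # raw residuals (observed minus expected)
chi                            # chi-square statistic, df and p-value
fisher.test(tab)               # Fisher's exact test
```

Each line returns one piece of the SPSS output as a separate object, which 
is why the result reads as several small tables rather than one.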

(b) In SPSS, if I want to compare multiple variables by a single grouping 
variable, this is readily performed:

CROSSTABS
   /TABLES=baserdis  baserenh  basersoc baseradd socbest disbest entbest 
addbest worsdis worsphy by group

I used the chi-square example again, but the same applies for a t-test. I 
started looking into how to do something similar in R with the t.test 
command, but gave up. R does force the user to take a more considered 
approach to analysis.
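
One possible way to run the same test over several variables in R is to 
loop over the column names. This is only a sketch: the data frame and the 
column names v1, v2, s1, s2 and group are invented for illustration.

```r
# Hypothetical data frame; all column names are illustrative only
dat <- data.frame(
  group = factor(rep(c("a", "b"), each = 20)),
  v1 = factor(c(rep("no", 14), rep("yes", 6), rep("no", 7), rep("yes", 13))),
  v2 = factor(c(rep("no", 10), rep("yes", 10), rep("no", 9), rep("yes", 11))),
  s1 = c(1:20, 6:25),
  s2 = c(seq(2, 40, by = 2), seq(1, 39, by = 2))
)

# Chi-square test of each categorical variable against group
cat_vars <- c("v1", "v2")
p_chi <- sapply(cat_vars, function(v) {
  chisq.test(table(dat[[v]], dat$group))$p.value
})

# t-test of each numeric variable against group, building the formula by name
num_vars <- c("s1", "s2")
p_t <- sapply(num_vars, function(v) {
  t.test(reformulate("group", response = v), data = dat)$p.value
})

p_chi   # one p-value per categorical variable
p_t     # one p-value per numeric variable
```

Replacing sapply with lapply and dropping $p.value keeps the full test 
objects, so each one can be printed in turn.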

(c) Obtaining a correlation matrix in R with both the correlation and the 
p-value is no simple task.

In SPSS this is obtained via:

GET
   FILE='D:\a study\data\dat\key data\master data.sav'.
NONPAR CORR
   /VARIABLES= goodnum badnum good5 bad5 avfreq avdayamt
   /PRINT=KENDALL TWOTAIL
   /MISSING=PAIRWISE .

In R something like this is required -

> # For each group, compute Kendall's tau and its p-value for every
> # pair of columns 2 to 5 of mydat
> by(mydat, mydat$group, function(x) {
+   nm <- names(x)
+   rho <- matrix(, 6, 2)      # one row per pair: estimate, p-value
+   rho.nm <- matrix(, 6, 2)   # one row per pair: the two variable names
+   k <- 1
+   for(i in 2:4) {
+     for(j in (i + 1):5) {
+       x.i <- x[, i]
+       x.j <- x[, j]
+       ct <- cor.test(x.i, x.j, method = "kendall", alternative = "two.sided")
+       rho[k, 1] <- ct$estimate
+       rho[k, 2] <- round(ct$p.value, 3)
+       rho.nm[k, ] <- c(nm[i], nm[j])
+       k <- k + 1
+     }
+   }
+   rho <- cbind(as.data.frame(rho.nm), as.data.frame(rho))
+   names(rho) <- c("freq.i", "freq.j", "cor", "p-value")
+   rho
+ })
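
The same idea can be written somewhat more compactly by letting combn 
generate the pairs of variables. This is a sketch only: kendall_table is a 
hypothetical helper name, and the data and column names x1 to x4 are 
simulated for illustration.

```r
# Hypothetical numeric data; column names are illustrative only
set.seed(42)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30), x4 = rnorm(30))

# kendall_table is a made-up helper: one row per pair of columns of d,
# holding Kendall's tau and its two-sided p-value
kendall_table <- function(d) {
  var_pairs <- combn(names(d), 2)          # each column is one pair of names
  stats <- apply(var_pairs, 2, function(p) {
    ct <- cor.test(d[[p[1]]], d[[p[2]]], method = "kendall")
    c(tau = unname(ct$estimate), p.value = ct$p.value)
  })
  data.frame(var.i = var_pairs[1, ], var.j = var_pairs[2, ], t(stats))
}

out <- kendall_table(dat)
out

# The same table within each level of a (hypothetical) grouping variable
grp <- rep(c("g1", "g2"), each = 15)
by_group <- by(dat, grp, kendall_table)
```

Pairs with missing values are dropped by cor.test one pair at a time, which 
is roughly what /MISSING=PAIRWISE requests in the SPSS syntax.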

2) It is not always clear what the output produced by R is. The 
Mann-Whitney U-test is a good example. In R, it seems a standardised value 
is obtained. I was advised that it is easy enough to check this as R is 
open-source, but, at least for me, I don't believe I would understand the 
code anyway. It is confusing when comparable programs such as R and SPSS 
produce dissimilar results. For the user it is important to be able to 
reconcile such differences fairly easily, to engender confidence in results.
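
One way to reconcile the numbers without reading the source is to rebuild 
the statistic by hand. According to help(wilcox.test), the reported W is 
the rank sum of the first sample minus its minimum possible value. Other 
programs may instead print the smaller of the two U values, a raw rank sum, 
or a standardised Z, so which figure a given SPSS version shows is an 
assumption to check against its own output. The data below are made up.

```r
# Hypothetical samples with no ties
x <- c(1.1, 2.3, 3.8, 4.0, 5.6)
y <- c(2.9, 4.7, 6.1, 7.4, 8.2, 9.5)

wt <- wilcox.test(x, y)

n_x <- length(x)
n_y <- length(y)
r <- rank(c(x, y))                        # ranks in the pooled sample
rank_sum_x <- sum(r[seq_len(n_x)])        # rank sum of the first sample
u_x <- rank_sum_x - n_x * (n_x + 1) / 2   # U for x; this is R's W
u_y <- n_x * n_y - u_x                    # U for y

wt$statistic      # W as reported by R
c(u_x, u_y)       # the two U values
min(u_x, u_y)     # the smaller U, which some programs report
```

Swapping the order of x and y in the call changes W from one U value to the 
other, but leaves the two-sided p-value unchanged.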

3) I find the help files in R quite difficult to understand. For example, 
see help(t.test). The examples almost assume that you already know what 
to do. Personally, I would find some form of simple decision tree easier, 
e.g.: If you want to perform a t-test with the dependent variable in one 
column and the grouping variable in another, use t.test(AVFREQ~GROUP). If 
you want to perform a t-test with the dependent variable in separate columns 
(each column representing a different group), use t.test(AVFREQ1, AVFREQ2).
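
As a small illustration that the two forms are the same test on 
differently arranged data, the sketch below runs both on made-up values. 
The variable names follow the ones used above; the numbers are invented.

```r
# Hypothetical data in "long" layout: one value column, one grouping column
dat <- data.frame(
  AVFREQ = c(3, 5, 4, 6, 7, 8, 9, 7, 10, 11),
  GROUP  = factor(rep(c("g1", "g2"), each = 5))
)

# Form 1: dependent variable in one column, grouping variable in another
long <- t.test(AVFREQ ~ GROUP, data = dat)

# Form 2: one vector (or column) per group
AVFREQ1 <- dat$AVFREQ[dat$GROUP == "g1"]
AVFREQ2 <- dat$AVFREQ[dat$GROUP == "g2"]
wide <- t.test(AVFREQ1, AVFREQ2)

# Both calls run the same Welch two-sample test
all.equal(unname(long$statistic), unname(wide$statistic))
all.equal(long$p.value, wide$p.value)
```

By default both forms do not assume equal variances; adding var.equal = TRUE 
to either call gives the pooled-variance test instead.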

4) My initial approach to using R was to run commands I had commonly used 
in SPSS and compare the results. I have only got as far as basic ANOVA. 
This has been time-consuming, and at times it has been difficult to obtain 
advice. Some people on the R list have been extremely generous with their 
time and knowledge, and I have much appreciated this assistance. At other 
times I see questions met with something like arrogance. With the 
sophistication of R, there is also an elitism. This is a barrier to R 
being more widely accepted and used.

5) Differences in terminology. This is just part of the learning process, 
but I still found it took quite some time to work out simple commands and 
what different analyses were called.

6) System administrators may be wary of freeware.

No doubt to the sophisticated user my comments may seem trite and easily 
resolved. However, I believe they have some relevance as to why R is 
not more readily used or accepted.

Bob Green