Hello list,

*** In short ***
R is too complex for entry-level, non-statistician students, especially 
when they compare it to the %$*#& Excel. It could be interesting to 
create a package that offers the most basic tools with a very 
simplified structure.


*** More precisely ***
Reading this list, I have the feeling that R is less and less for 
specialists and more and more for occasional users. On my side, I am 
teaching R to sports students... Statistics is definitely NOT their 
favorite topic. They just need to be basic users: they need to know how 
to compute a mean, a median, some basic tests (parametric or not), and 
some regressions. They also need to be able to deal with real data with 
missing values.
From their point of view (a point of view that I share most of the 
time), one major difficulty in R is the "non-uniformity" of its syntax:

- Why does median not work on an ordered factor?
- Why does cor(age,size,na.rm=TRUE) not work?
- Why does summary report the quartiles but not the standard deviation?
- Why is it t.test(age~groupe) and cor.test(age,size)?
- Why is it boxplot(age) and barplot(table(age))?
- Why is it as.numeric but ordered?
- Sir, we hate changing the working directory and using the %$*#& 
function "read.csv". Why not stop working with R and start using Excel 
again, as in all the other classes with all the other teachers? (I do 
NOT share this one :-) )
- ...
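
To make a few of these points concrete, here is a small sketch of what 
students currently have to type in base R. The data are made up for 
illustration only (the values of age and size are assumptions):

```r
# Hypothetical data with missing values, for illustration only
age  <- c(15, 16, NA, 15, 16, 12)
size <- c(160, 165, 170, NA, 168, 150)

# mean() handles missing values through na.rm ...
m <- mean(age, na.rm = TRUE)

# ... but cor() has no na.rm argument: it expects `use` instead
r <- cor(age, size, use = "complete.obs")

# summary(age) reports the quartiles; the standard deviation needs sd()
s <- sd(age, na.rm = TRUE)
```

Three closely related tasks, three different ways to say "ignore the 
missing values" or "give me this statistic".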

Computing a statistic for a specific modality (such as mean(size), 
first for men only and then for women) is also a basic need, one of the 
first things they want to do. Unfortunately, it is very complicated: it 
requires introducing vectors, logical operators, and row selection 
using a logical vector. And every syntax mistake is fatal. In Excel, 
you only need to know how to sort the data and then use the function 
"=mean"...
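
For comparison, this is roughly what the same task looks like in base R 
today. It is only a minimal sketch with made-up data (the values of 
size and gender are assumptions, not real data):

```r
# Hypothetical data, for illustration only
size   <- c(180, 165, 175, 160, NA, 170)
gender <- c("M", "F", "M", "F", "M", "F")

# Selection with a logical vector: one call per modality
mean_men   <- mean(size[gender == "M"], na.rm = TRUE)
mean_women <- mean(size[gender == "F"], na.rm = TRUE)

# The same result in a single call, but with yet another function to learn
by_group <- tapply(size, gender, mean, na.rm = TRUE)
```

Both versions work, but each one requires a concept (logical indexing, 
or passing a function as an argument) that a beginner has not seen yet.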

This is why I have started to create a package (called "R light", but 
that can change) that would offer a simplified and more uniform version 
of the tools that low-level users need (call them the "light tools"?). 
The syntax could be very simple but should suffer NO exceptions:
 
function(variable): works on a variable
function(variable~groups): works on a variable within each modality of 
the grouping variable.
Then Lmean will compute the mean, Lvar will compute the variance, 
Lmedian the median, and so on.
 
*** Examples ***
>  age <- c(15,16,18,15,16,12)
>  groups <- c("M","F","M","F","M","F")
 
>  Lmean(age)
[1] 15.33333
>  Lmean(age~groups)
       F        M
14.33333 16.33333
 
>  Lvar(age)
[1] 3.866667
 
>  Lvar(age~groups)
       F        M
4.333333 2.333333
 
>  marks <- ordered(c("A","B","A","C","B","C"),levels=c("C","B","A"))
>  Lmedian(marks)
[1] B
Levels: C < B < A
>  Lmedian(marks~groups)
F M
C A
Levels: C < B < A
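
To give an idea of how the two syntaxes could be supported by a single 
function, here is a minimal sketch. It is NOT the code from Rlight.r: 
the helper name Lstat and the choice to silently drop missing values 
are assumptions made for illustration only.

```r
# Hypothetical helper (not taken from Rlight.r): applies `fun` either to a
# plain variable or to each group of a `variable ~ groups` formula.
Lstat <- function(x, fun) {
  if (inherits(x, "formula")) {
    if (length(x) != 3) stop("expected a formula of the form variable ~ groups")
    env      <- environment(x)        # environment where the formula was written
    variable <- eval(x[[2]], env)     # left-hand side of the formula
    groups   <- eval(x[[3]], env)     # right-hand side of the formula
    keep <- !is.na(variable) & !is.na(groups)  # assumption: drop missing values
    return(sapply(split(variable[keep], groups[keep]), fun))
  }
  fun(x[!is.na(x)])                   # assumption: drop missing values
}

Lmean <- function(x) Lstat(x, mean)
Lvar  <- function(x) Lstat(x, var)

age    <- c(15, 16, 18, 15, 16, 12)
groups <- c("M", "F", "M", "F", "M", "F")

Lmean(age)           # mean of the whole variable
Lmean(age ~ groups)  # one mean per group, named F and M
Lvar(age ~ groups)   # one variance per group
```

Adding a new "light" function is then a single line, which is one way 
to guarantee that the syntax suffers no exception.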
 

That way, R would be simpler than Excel...
 
 
*** What next? ***
So I have started to create the package, but it takes me a lot of time 
(as always when creating a package). Since this could be useful to a lot 
of R teachers, I think that some people might be interested in 
collaborating on this project. There are several needs:
 
- Brainstorming on what should be in Rlight: which tools do basic 
users, who know nothing about computing and not much about statistics, 
actually need?
- Brainstorming on the syntax: I have defined "function(variable)" and 
"function(variable~groups)"; other syntaxes might be interesting, 
especially for more advanced concepts.
- Writing documentation: well, you guessed it from reading this email, 
I am NOT a native English speaker...
- Writing some L functions.
 
You will find the code that already exists and some tests here:
http://christophe.genolini.free.fr/aTelecharger/Rlight.r
 


Christophe Genolini

Associate professor
      University of Paris X
      INSERM U699
R package maintainer:
     - r2lUniv
     - kml
R Tutorial (on CRAN):
     - A (Not So) Short Introduction to S4
     - Petit Manuel de S4
     - Lire; Compter ; Tester... avec R
     - Construire un Package
     - R, bonnes pratiques



