[R] Seeking feedback on my first attempt at R programming

Thu Dec 23 06:53:41 CET 2010

On Wed, Dec 22, 2010 at 4:26 PM, Paul Miller <pjmiller_57 at yahoo.com> wrote:
> Hello Everyone,
>
> Below is my first attempt at R programming. The code replicates example 5.1 from Common Statistical Methods for Clinical Research with SAS Examples. I was hoping that people more experienced than myself would be willing to take a look and let me know what I did well and what could have been done better. I'd be particularly grateful if anyone could tell me why the two user defined functions commented out at the bottom of the code don't work whereas the one I ultimately went with did.
>
> Thanks,
>
> Paul
>
> ######################################
> #### Chapter 5: The Two-Sample T-Test ####
> ######################################
>
> #### Example 5.1: Two-Sample t-Test using FEV1 Changes Data ####
>
> connection <- textConnection("
> 101 A 1.35 <NA>   103 A 3.22 3.55   106 A 2.78 3.15
> 108 A 2.45 2.30   109 A 1.84 2.37   110 A 2.81 3.20
> 113 A 1.90 2.65   116 A 3.00 3.96   118 A 2.25 2.97
> 120 A 2.86 2.28   121 A 1.56 2.67   124 A 2.66 3.76
> 102 P 3.01 3.90   104 P 2.24 3.01   105 P 2.25 2.47
> 107 P 1.65 1.99   111 P 1.95 <NA>   112 P 3.05 3.26
> 114 P 2.75 2.55   115 P 1.60 2.20   117 P 2.77 2.56
> 119 P 2.06 2.90   122 P 1.71 <NA>   123 P 3.54 2.92
> ")
>
> fev <- data.frame(scan(connection,
> list(patno = 0, trtgrp = "", fev0 = 0, fev6 = 0), na.strings="<NA>"))
>
> fev <- within(fev,trtgrp <- factor(trtgrp, labels = c("ABC-123","Placebo")))
> fev <- within(fev,chg <- fev6 - fev0)
>
> summaryfn <- function(x)
> data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
> t.value=mean(x)/(sd(x)/sqrt(sum(complete.cases(x)))),
> p.value=2*pt(-abs(mean(x)/(sd(x)/sqrt(sum(complete.cases(x))))),df=sum(complete.cases(x))-1))
>
> cfev <- na.omit(fev)
> by(cfev[3:5],cfev[2],summaryfn)
>
> fligner.test(chg ~ trtgrp, data=fev)
> t.test(chg ~ trtgrp, data=fev, var.equal=TRUE)
>
> #summaryfn <- function(x)
> #   data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
> #    t.value=t.test(x)$statistic,
> #    p.value=t.test(x)$p.value)
> #cfev <- na.omit(fev)
> #by(cfev[3:5],cfev[2],summaryfn)
>
> #summaryfn <- function(x)
> #   data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
> #    t.value=as.numeric(t.test(x)[1]),
> #    p.value=as.numeric(t.test(x)[3]))
> #cfev <- na.omit(fev)
> #by(cfev[3:5],cfev[2],summaryfn)
>

1.  Better to stay away from within unless you really need it.   Use
with or transform:

fev <- transform(fev,
  trtgrp = factor(trtgrp, labels = c("ABC-123","Placebo")),
  chg = fev6 - fev0
)

2. mean and sd act column-wise but t.test does not.   A minimal change
to make it work would be to add an apply to the t.value and p.value:

summaryfn2a <- function(x)
   data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x),
    t.value=apply(x, 2, function(x) (t.test)(x)$statistic),
    p.value=apply(x, 2, function(x) (t.test)(x)$p.value))
cfev <- na.omit(fev)
by(cfev[3:5],cfev[2],summaryfn2a)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com