[R] by (tapply) and for loop differences
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Tue Jul 5 12:37:11 CEST 2005
"Bashir Saghir (Aztek Global)" <Saghir.Bashir at ucb-group.com> writes:
> I am getting a difference in results when running some analysis using by and
> tapply compared to using a for loop. I've tried searching the web but had no
> luck with the keywords I used.
>
> I've attached a simple example below to illustrate my problem. I get a
> difference in the mean of yvar, diff and the p-value using tapply & by
> compared to a for loop. I cannot see what I am doing wrong. Can anyone help?
>
> > # Simulate some data - I'll do 2 simulations...
> >
> > xvar = rnorm(40, 20, 5)
> > yvar = rnorm(40, 22, 2)
> > num = factor(rep(1:2, each=20))
> > sdat = data.frame(cbind(num, xvar, yvar))
> >
> > # Define a function to do a simple t test and return some values...
> >
> > kindtest = function(varx, vary){
> + res = t.test(varx, vary)
> + x.mn = res$estimate[1]
> + y.mn = res$estimate[2]
> + diff = y.mn-x.mn
> + pval = res$p.value
> + cat("Mean xvar =", x.mn, " Mean yvar =", y.mn)
> + cat(" diff =", diff, " p-value=", pval, "\n\n")
> + list(x.mn=x.mn, y.mn=y.mn, diff=diff, pval=pval)
> + }
>
> ## Results from by and tapply
>
> > attach(sdat)
> > bres = by(xvar, num, kindtest, yvar)
> Mean xvar = 19.8904 Mean yvar = 21.97729 diff = 2.086891 p-value= 0.06222805
> Mean xvar = 19.88329 Mean yvar = 21.97729 diff = 2.093996 p-value= 0.05245329
>
> > tres = tapply(xvar, num, kindtest, yvar)
> Mean xvar = 19.8904 Mean yvar = 21.97729 diff = 2.086891 p-value= 0.06222805
> Mean xvar = 19.88329 Mean yvar = 21.97729 diff = 2.093996 p-value= 0.05245329
>
> > detach(sdat,1)
>
> ## Results from for
>
> > for(i in 1:2) {
> + subdat= subset(sdat, num==i)
> + kindtest(subdat$xvar, subdat$yvar)
> + }
> Mean xvar = 19.8904 Mean yvar = 21.98615 diff = 2.095746 p-value= 0.07319223
> Mean xvar = 19.88329 Mean yvar = 21.96843 diff = 2.085141 p-value= 0.05850057
>
The fact that the by/tapply approach is giving you the same Mean yvar
for both groups should be a dead giveaway....
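Stick print(varx) and print(vary) into kindtest, and you'll see the
point. You are passing yvar *without* subsetting: by and tapply split
only their first argument by num, and extra arguments are handed to
every call unchanged, so each group's 20 x values are compared against
all 40 y values (and since the t.test isn't paired, it can hardly be
expected to complain that x and y differ in length...).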
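A minimal sketch of that diagnosis, assuming data shaped like the example above. The helper show_lengths is hypothetical and is used here instead of print() only so the lengths can be inspected afterwards:

```r
# Illustrative data in the same shape as the original post
set.seed(1)
xvar <- rnorm(40, 20, 5)
yvar <- rnorm(40, 22, 2)
num  <- factor(rep(1:2, each = 20))

# Hypothetical helper: report how many values each argument receives
show_lengths <- function(varx, vary) {
  c(nx = length(varx), ny = length(vary))
}

# tapply splits only its first argument (xvar) by num;
# yvar travels through '...' and reaches every call whole
lens <- tapply(xvar, num, show_lengths, yvar)
print(lens[[1]])
print(lens[[2]])
```

Each call should report a shorter varx than vary, which is why Mean yvar comes out identical in both groups.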
This is probably closer to the mark:
by(sdat, num, with, kindtest(xvar, yvar))
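An equivalent way to spell this out, as a sketch only: pass by() the whole data frame and an explicit function, so that both columns are subsetted together. The name group_test is hypothetical (a trimmed-down stand-in for kindtest), and the data are simulated here just to make the example self-contained:

```r
# Illustrative data frame in the same shape as sdat
set.seed(1)
sdat <- data.frame(num  = rep(1:2, each = 20),
                   xvar = rnorm(40, 20, 5),
                   yvar = rnorm(40, 22, 2))

# Hypothetical stand-in for kindtest: takes one group's rows,
# so xvar and yvar are always subsetted together
group_test <- function(d) {
  res <- t.test(d$xvar, d$yvar)
  c(x.mn = unname(res$estimate[1]),
    y.mn = unname(res$estimate[2]),
    pval = res$p.value)
}

# by() on a data frame hands each group's rows to the function
bres <- by(sdat, sdat$num, group_test)

# The same computation via split(), mirroring the for-loop version
fres <- lapply(split(sdat, sdat$num), group_test)

print(bres)
```

With the grouping done on the data frame rather than on xvar alone, the by() results should agree with the loop-style results group by group.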
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907