[R] Getting the groupmean for each person
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon May 10 13:52:59 CEST 2004
On Mon, 10 May 2004, Liaw, Andy wrote:
> Both of you might have missed my question from Friday: For very long `x'
> (e.g., length=50000), indexing by names can take a long time. See that
> thread for detail. (For small data, you can hardly tell the difference.)
That's solved in R-devel as of this morning. You now need a vector of
about a million elements before indexing by names takes a noticeable time.
However, I think that in this case you should be indexing by the codes of
a factor, as tapply is guaranteed to produce results in the order of the
levels of f (after conversion to a factor). So the natural way to index
by a factor is the default one.
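
A minimal sketch of that point, on made-up data (the names x, f, m and
centred are only for illustration): the means come back in the order of
levels(f), and indexing by the factor itself uses its integer codes.

```r
## Made-up data: the levels sort as "a", "b", "c", but "c" appears first.
x <- c(10, 1, 12, 3, 5, 7)
f <- factor(c("c", "a", "c", "a", "b", "b"))

m <- tapply(x, f, mean)   # one mean per level, in the order of levels(f)
centred <- x - m[f]       # a factor used as an index acts via its integer codes

## The same thing with the codes spelled out.
centred2 <- x - m[as.integer(f)]
```

Both forms subtract each observation's own group mean, whatever order the
groups happen to appear in.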
It may come as no surprise then that lda has code like
    group.means <- tapply(x, list(rep(g, p), col(x)), mean)
    X <- x - group.means[g, ]
where g is a factor.
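
To see what those two lines do, here is a small self-contained sketch. The
data are made up, and I am assuming x is a numeric matrix with p columns,
which is how the fragment reads.

```r
## Made-up data: 6 observations on p = 2 variables, in 2 groups.
x <- matrix(c(1, 3, 5, 7, 9, 11,
              2, 2, 4, 4, 6, 6), ncol = 2)
g <- factor(c("u", "v", "u", "v", "u", "v"))
p <- ncol(x)

## Table of means: rows follow levels(g), columns follow the columns of x.
group.means <- tapply(x, list(rep(g, p), col(x)), mean)

## Row-indexing by the factor picks each observation's group row via its code.
X <- x - group.means[g, ]
```

Each column of X then has mean zero within every level of g.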
> Also, I'm trying to write the function in a way that one can pass in more
> than one grouping variable in a list, much like tapply. The version I
> showed is a simplified version to demonstrate the `problem' I had. I
> obviously missed the fact that tapply returns a 1D array...
>
> Best,
> Andy
>
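
For several grouping variables the same idea should carry over, since
tapply then returns an array with one dimension per factor, and a matrix of
codes picks one cell per observation. A rough sketch only: centre.by is a
hypothetical name, not an existing function, and the data are made up.

```r
## Hypothetical helper (not from the thread): INDEX is a list of grouping
## variables, as for tapply.
centre.by <- function(x, INDEX) {
    INDEX <- lapply(INDEX, as.factor)
    m <- tapply(x, INDEX, mean)                       # one dimension per factor
    codes <- do.call(cbind, lapply(INDEX, as.integer)) # one row per observation
    ans <- x - m[codes]                               # matrix index: one cell per row
    names(ans) <- names(x)
    ans
}

## Made-up data with two grouping variables.
x  <- c(1, 3, 5, 7, 10, 20)
f1 <- c("a", "a", "b", "b", "a", "b")
f2 <- c("p", "p", "p", "p", "q", "q")
res <- centre.by(x, list(f1, f2))
```

Empty cells of the table are NA, but they are never looked up because no
observation carries that combination of codes.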
> > From: kjetil at acelerate.com
> >
> > On 10 May 2004 at 10:09, Christophe Pallier wrote:
> >
> > >
> > >
> > > Liaw, Andy wrote:
> > >
> > > > >Suppose I define the function:
> > > >
> > > >fun <- function(x, f) {
> > > > m <- tapply(x, f, mean)
> > > > ans <- x - m[match(f, unique(f))]
> > > > names(ans) <- names(x)
> > > > ans
> > > >}
> > > >
> > > >
> > > >
> > >
> > > May I ask what is the purpose of match(f,unique(f)) ?
> > >
> > > To remove the group means, I have been using:
> > >
> > > x-tapply(x,f,mean)[f]
> > >
> > > for a while (and I am now changing to
> > > x-tapply(x,f,mean)[as.character(f)] because of the peculiarities of
> > > indexing named vectors with factors)
> >
> > wouldn't
> > sweep(as.array(x), 1, tapply(x,f,mean)[as.character(f)] , "-")
> >
> > be more natural?
> >
> > Kjetil Halvorsen
> >
> > >
> > > The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular
> > > order in the result of tapply, no? It seems a bit dangerous to me.
> > >
> > >
> > > Christophe Pallier
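
The concern about match(f, unique(f)) is easy to check on a toy case. This
is only an illustration on made-up data: unique(f) follows order of first
appearance, whereas tapply follows the sorted levels, so the two disagree
whenever the groups do not first appear in level order.

```r
## Made-up data: "b" appears first but sorts after "a".
x <- c(10, 12, 1, 3)
f <- c("b", "b", "a", "a")

m <- tapply(x, f, mean)                 # ordered by level: a, b

wrong <- x - m[match(f, unique(f))]     # positions by first appearance: b, a
right <- x - m[as.integer(factor(f))]   # positions by level code
```

Here wrong subtracts the mean of "a" from the "b" observations and vice
versa, while right gives deviations from each observation's own group mean.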
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595