[R] Still trying to avoid loops

William Dunlap wdunlap at tibco.com
Wed Feb 4 21:49:16 CET 2015


A useful technique when a vector is easy to compute from an ordered
data.frame but you need it for an unordered one: compute the ordering
vector 'ord', compute the vector from df[ord,], and assign the result
with x[ord] <- vector, so that each value lands back in its original
row.  In your case you could do:
  > dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
  +                   D=c(5,3,1,3,2,4))
  > ord <- with(dat_2, order(S, D)) # order by subject, break ties by date
  > dat_2$visitNo <- integer(nrow(dat_2)) # will fill this in next
  > dat_2$visitNo[ord] <- with(dat_2[ord,], ave(visitNo, S, FUN=seq_along))
  > dat_2
    S D visitNo
  1 a 5       2
  2 c 3       2
  3 a 1       1
  4 b 3       1
  5 c 2       1
  6 c 4       3

Now this is different from your answer, c(2,2,1,1,2,3).  Which is correct?
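
One way to cross-check the result independently of 'ord' is to rank the
dates within each subject.  This is only a sketch: visitNo_check is a name
made up for illustration, and ties.method="first" assumes that two visits
by one subject on the same date should be numbered by row position.

```r
dat_2 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = c(5,3,1,3,2,4))
# rank each date among the dates of the same subject;
# ave() returns the ranks in the original row order
visitNo_check <- with(dat_2,
                      ave(D, S, FUN = function(d) rank(d, ties.method = "first")))
visitNo_check
```

For this data it gives 2 2 1 1 1 3, the same as the visitNo column above.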

You can also do the reordering of the result from the ordered dataset by
subscripting the right hand side with [order(ord)], but I find using [ord]
on the left side easier to remember.
  > with(dat_2[ord,], ave(visitNo, S, FUN=seq_along))[order(ord)]
  [1] 2 2 1 1 1 3
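
The reason [order(ord)] works is that order(ord) is the inverse of the
permutation 'ord'.  A small sketch with made-up numbers (x, sorted,
via_left and via_right are illustrative names only):

```r
x <- c(30, 10, 20)          # made-up data in its original order
ord <- order(x)             # permutation that sorts x
sorted <- x[ord]            # the values of x in increasing order

# put the values back with [ord] on the left side
via_left <- numeric(length(x))
via_left[ord] <- sorted

# or undo the permutation with [order(ord)] on the right side
via_right <- sorted[order(ord)]

# both routes recover the original order
stopifnot(identical(via_left, x), identical(via_right, x))
```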



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Feb 4, 2015 at 12:07 PM, Tom Wright <tom at maladmin.com> wrote:

> Thanks, I was not aware of order().
> I did deliberately mess up the order of S. The following example breaks
> your solution
> dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
>                   D=c(5,3,1,3,2,4))
>
> which should give the answer c(2,2,1,1,2,3)
>
> Your solution does indicate that sorting the data correctly before
> starting might solve the problem.
>
>
> On Wed, 2015-02-04 at 19:49 +0000, Rui Barradas wrote:
> > Hello,
> >
> > Aren't the levels of your example wrong? If the levels are
> > levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do
> > the job.
> >
> > unname(unlist(tapply(dat$D, dat$S, order)))
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Em 04-02-2015 19:34, Tom Wright escreveu:
> > > Given a dataframe:
> > >
> > > dat<-data.frame(S=factor(c('a','b','a','c','c','c'),levels=c('b','a','c')),
> > >             D=c(1,5,3,2,3,4))
> > >
> > > where S is a subject identifier and D a visit (actually a date in my
> > > real dataset). I would like to generate another column giving the visit
> > > number
> > >
> > > R=c(2,1,1,1,2,3)
> > >
> > > My current solution uses nested loops and is slow and ugly. I've looked
> > > at by() but can't see how to keep the order of R correct.
> > >
> > > Thanks,
> > > Tom
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
