[R] the first and last observation for each subject

Fri Jan 2 19:16:27 CET 2009

I think there's a pretty simple solution here, though probably not the
most efficient:

t(sapply(split(a,a$ID),
    function(q) with(q,c(ID=unique(ID),x=unique(x),y=max(y)-min(y)))))

Using 'unique' instead of min or [[1]] has the advantage that if x is
in fact not time-invariant, this gives an error rather than silently
ignoring inconsistencies.
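
For what it's worth, the consistency check can also be made explicit
instead of relying on what unique() returns. A minimal sketch, assuming
the rows shown in the quoted message are the whole data set and that
"first" and "last" are defined by the time column; first_last_diff is a
made-up helper name. Note that it takes last minus first, which agrees
with max(y)-min(y) only when y increases over time, as it does here:

```r
# Rows shown in the quoted message (the elided rows are not reproduced)
a <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# first_last_diff is a hypothetical helper, not part of the original post
first_last_diff <- function(q) {
  # Fail loudly if x varies within a subject
  if (length(unique(q$x)) != 1L)
    stop("x is not constant within ID ", q$ID[1])
  # Sort by time so that "first" and "last" are well defined
  q <- q[order(q$time), ]
  c(ID = q$ID[1], x = q$x[1], y = q$y[nrow(q)] - q$y[1])
}

# One row per subject, columns ID, x, y
res <- t(sapply(split(a, a$ID), first_last_diff))
res
```

Sorting inside the helper means the result does not depend on the row
order of the input.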

Trying to package up this idiom into a function leads to:

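select <- function(df, groupby, selection)
{
    pf <- parent.frame()              # caller's frame, for names not found in df
    fields <- substitute(selection)   # the selection expression, unevaluated
    # Split df by the grouping expression, then evaluate the selection
    # within each piece
    t(sapply(split(df, eval(substitute(groupby), df, enclos = pf)),
             function(q) eval(fields, q, enclos = pf)))
}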

which I admit is rather ugly (and does no error-checking), but it does work:

> select(a,ID,list(min(ID),unique(x),max(y)-min(y)))
    [,1] [,2] [,3]
  1 1    10   20
  2 2    12   15
  3 3    5    5
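
The list(...) call above makes sapply return a matrix of mode list with
no column names. Passing a named c(...) instead should give a plain
numeric matrix with labelled columns. A sketch, assuming the rows from
the quoted message, with the select() definition from this post
repeated so the block runs on its own:

```r
# Rows shown in the quoted message (the elided rows are not reproduced)
a <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# select() as defined in this post
select <- function(df, groupby, selection)
{
    pf <- parent.frame()
    fields <- substitute(selection)
    t(sapply(split(df, eval(substitute(groupby), df, enclos = pf)),
             function(q) eval(fields, q, enclos = pf)))
}

# Named c(...) instead of list(...): numeric result, columns ID, x, y
res <- select(a, ID, c(ID = unique(ID), x = unique(x), y = max(y) - min(y)))

# For comparison, the list(...) form from the post gives a list-mode matrix
res_list <- select(a, ID, list(min(ID), unique(x), max(y) - min(y)))

is.numeric(res)
is.list(res_list)
```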

Perhaps some of the more experienced people on the list could show me
how to write this more cleanly.

           -s

On Fri, Jan 2, 2009 at 4:20 AM, gallon li <gallon.li at gmail.com> wrote:
> I have the following data
>
> ID  x  y time
>  1 10 20    0
>  1 10 30    1
>  1 10 40    2
>  2 12 23    0
>  2 12 25    1
>  2 12 28    2
>  2 12 38    3
>  3  5 10    0
>  3  5 15    2
> .....
>
> x is time invariant, ID is the subject id number, y is changing over time.
>
> I want to find out the difference between the first and last observed y
> value for each subject and get a table like
>
> ID x y
> 1 10 20
> 2 12 15
> 3 5 5
> ......
>
> Is there any easy way to generate the data set?