[R] the first and last observation for each subject
hadley wickham
h.wickham at gmail.com
Fri Jan 2 14:52:42 CET 2009
On Fri, Jan 2, 2009 at 3:20 AM, gallon li <gallon.li at gmail.com> wrote:
> I have the following data
>
> ID x y time
> 1 10 20 0
> 1 10 30 1
> 1 10 40 2
> 2 12 23 0
> 2 12 25 1
> 2 12 28 2
> 2 12 38 3
> 3 5 10 0
> 3 5 15 2
> .....
>
> x is time invariant, ID is the subject id number, y is changing over time.
>
> I want to find out the difference between the first and last observed y
> value for each subject and get a table like
>
> ID x y
> 1 10 20
> 2 12 15
> 3 5 5
> ......
>
> Is there any easy way to generate the data set?
One approach is to use the plyr package, as documented at
http://had.co.nz/plyr. The basic idea is that your problem is easy to
solve if you have a subset for a single subject value:
one <- subset(DF, ID == 1)
with(one, y[length(y)] - y[1])
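To try the single-subject step on the rows quoted in the question, something like the following should work. It is only a sketch: the name DF is assumed (the question never names the data frame), only the nine rows shown are entered, and first_to_last is a name made up for illustration.

```r
# The nine rows shown in the question; the name DF is an assumption
DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Keep the rows for one subject, then subtract the first y from the last y
one <- subset(DF, ID == 1)
first_to_last <- with(one, y[length(y)] - y[1])
first_to_last
```

For subject 1 this is 40 - 20, which matches the first row of the table requested in the question.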
The difficulty is splitting up the original dataset into subjects,
applying the solution to each piece and then joining all the results
back together. This is what the plyr package does for you:
library(plyr)
# ddply is for splitting up data frames and combining the results
# into a data frame. .(ID) says to split up the data frame by the subject
# variable
ddply(DF, .(ID), function(one) with(one, y[length(y)] - y[1]))
# if you want a more informative variable name in the result
# return a named vector:
ddply(DF, .(ID), function(one) c(diff = with(one, y[length(y)] - y[1])))
# plyr takes care of labelling the result for you.
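To make the split, apply and combine steps explicit, here is a rough base R equivalent. It is not how plyr is implemented, just a sketch of the same idea; DF is rebuilt from the rows in the question, and pieces, diffs and result are names made up for illustration.

```r
# The nine rows shown in the question; the name DF is an assumption
DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Split: one data frame per subject, named by ID
pieces <- split(DF, DF$ID)

# Apply: the single-subject solution on each piece
diffs <- sapply(pieces, function(one) with(one, y[length(y)] - y[1]))

# Combine: put the labels and the results back into one data frame
result <- data.frame(ID = as.numeric(names(diffs)), diff = unname(diffs))
result
```

The labelling in the last step is the part that ddply handles automatically.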
You don't say why you want to include x, or what to do if x is not
invariant, but here are a couple of options:
# Split up by ID and x
ddply(DF, .(ID, x), function(one) c(diff = with(one, y[length(y)] - y[1])))
# Return the first x value
ddply(DF, .(ID), function(one) {
  with(one, c(
    x = x[1],
    diff = y[length(y)] - y[1]
  ))
})
# Throw an error if x is not unique
ddply(DF, .(ID), function(one) {
  stopifnot(length(unique(one$x)) == 1)
  with(one, c(
    x = x[1],
    diff = y[length(y)] - y[1]
  ))
})
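One caveat: all of the snippets above take "first" and "last" from row order, so they assume the rows for each subject are already sorted by time. If that is not guaranteed, one option is to sort before splitting. The sketch below uses base R only; DF is rebuilt from the rows in the question and deliberately reversed to simulate unsorted input, and result is a name made up for illustration.

```r
# The nine rows shown in the question; the name DF is an assumption
DF <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Reverse the rows to simulate data that is not in time order
DF <- DF[nrow(DF):1, ]

# Sort by subject, then by time within subject
DF <- DF[order(DF$ID, DF$time), ]

# Per subject: check that x is constant, then take last y minus first y
result <- do.call(rbind, lapply(split(DF, DF$ID), function(one) {
  stopifnot(length(unique(one$x)) == 1)
  data.frame(
    ID   = one$ID[1],
    x    = one$x[1],
    diff = one$y[nrow(one)] - one$y[1]
  )
}))
rownames(result) <- NULL
result
```

With the rows from the question this gives the same three rows as the table requested there.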
Regards,
Hadley
--
http://had.co.nz/