[R] combining same-day lab measurements with 'apply'

hadley wickham h.wickham at gmail.com
Thu Oct 16 00:54:11 CEST 2008


Hi Dylan,

You might want to have a look at the plyr package, which is designed to
make these sorts of tasks easier: http://had.co.nz/plyr.  The site
includes an introductory PDF of about 20 pages.
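
As a rough sketch of what that could look like for the sample data in
the quoted message (the x values are copied from the printed 'dum', and
the use of ddply() with summarise is an assumption about how one might
apply plyr here, not the only way to do it):

```r
library(plyr)

# Sample data copied from the printed 'dum' in the quoted message
dum <- data.frame(
  id   = c(2, 2, 2, 3, 4, 4, 4, 4, 5, 5),
  x    = c(21, 22, 23, 17, 18, 27, 25, 24, 26, 20),
  date = as.Date(c("12/15/2005", "1/13/2006", "1/13/2006", "4/5/2000",
                   "5/23/2003", "7/8/2003", "11/30/2003", "11/30/2003",
                   "4/19/2001", "4/19/2001"), format = "%m/%d/%Y")
)

# Split by subject and date, then average x within each piece;
# only the id/date combinations that actually occur are returned
averaged <- ddply(dum, .(id, date), summarise, x = mean(x))
averaged
```

The result is an ordinary data frame with one row per subject and day,
so no names need to be pulled apart afterwards.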

Hadley

On Wed, Oct 15, 2008 at 3:45 PM, dylan boyd <dylan.boyd at gmail.com> wrote:
> Another request for help implementing the 'apply' functions to avoid a
> loop structure...
>
> I am working with a data set that includes lab measurements taken at
> different dates for the subjects, with some subjects having more
> results than others.  I would like to average lab results for each
> subject that were taken on the same day.  I can do this using a for
> loop, but would like to know how to efficiently accomplish the same
> thing without looping as I will likely have to do the same with a much
> larger data set.
>
> At the end of this post are examples of what I'm starting with and
> what I want the result to look like:
>
> I tried another suggestion I saw on this list using a list object for
> the index of a call to 'tapply' as in:
>
>> new.x <- with(dum, tapply(x, list(id, date), mean))
>
> but this produced a table-like object with a cell for every
> combination of subject id and date in the data set.  That is too large
> for the full data set, and it would also require serious reworking (at
> least with the tools I know) to return to the original data frame
> structure.
>
> Another attempt was pasting the id and date together to create a
> single indexing vector.  This works, but it seems clumsy to substring
> the names attribute of the tapply result, and ids that range from 1 to
> 3 digits complicate things further:
>
>> new.x <- with(dum, tapply(x, paste(id, date), mean))
>> data.frame(
> +   id  = substr(names(new.x),start=1,stop=1),
> +   x   = new.x,
> +   date  = as.Date(substr(names(new.x),start=3,stop=100)))
>              id    x       date
> 2 2005-12-15  2 21.0 2005-12-15
> 2 2006-01-13  2 22.5 2006-01-13
> 3 2000-04-05  3 17.0 2000-04-05
> 4 2003-05-23  4 18.0 2003-05-23
> 4 2003-07-08  4 27.0 2003-07-08
> 4 2003-11-30  4 24.5 2003-11-30
> 5 2001-04-19  5 23.0 2001-04-19
>
> The fixed positions in the 'substr' calls only work for single-digit
> ids, and the full data set has subject ids that range from 1 to 3
> digits (although it just occurred to me that I could use strsplit
> instead).
>
> It may be irrelevant, but the 'date' variable is a Date class object.
> I've tried first converting this to a character object but didn't get
> anywhere.  Further, I'll use the dates later with difftime to figure
> the subjects' age at the onset of their condition, so I'd like to
> avoid converting between classes too much.
>
> Any advice would be greatly appreciated.  Here is the code to build
> the sample data and the working for loop as well:
>
>> dum <- data.frame(
> +   id  = c(2,2,2,3,4,4,4,4,5,5),
> +   x   = sample(15:30, 10),  # one value per row; 'id' is not visible inside data.frame()
> +   date  = as.Date(c("12/15/2005","1/13/2006","1/13/2006","4/5/2000","5/23/2003",
> +     "7/8/2003","11/30/2003","11/30/2003","4/19/2001","4/19/2001"),format="%m/%d/%Y")
> +   )
>> id.list <- unique(dum$id)
>> dum
>    id  x       date
> 1   2 21 2005-12-15
> 2   2 22 2006-01-13
> 3   2 23 2006-01-13
> 4   3 17 2000-04-05
> 5   4 18 2003-05-23
> 6   4 27 2003-07-08
> 7   4 25 2003-11-30
> 8   4 24 2003-11-30
> 9   5 26 2001-04-19
> 10  5 20 2001-04-19
>>
>
>
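>> output <- NULL
>> for (i in seq(along=id.list)) {
> +   sel <- dum$id==id.list[i]    # rows belonging to this subject
> +   # average x within each date for this subject
> +   x.averaged  <- tapply(dum$x[sel], dum$date[sel], mean, na.rm=TRUE)
> +   # tapply keeps the dates only as character names, so convert them back to Date
> +   dat  <-  data.frame(id.list[i], x.averaged, as.Date(names(x.averaged)))
> +   output  <- rbind(output, dat)
> + }
>> names(output) <- names(dum)
>> rownames(output)  <- NULL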
>> output
>   id    x       date
> 1  2 21.0 2005-12-15
> 2  2 22.5 2006-01-13
> 3  3 17.0 2000-04-05
> 4  4 18.0 2003-05-23
> 5  4 27.0 2003-07-08
> 6  4 24.5 2003-11-30
> 7  5 23.0 2001-04-19
>>
>
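
For comparison, here is a minimal base R sketch of the same averaging
that avoids both the loop and the pasted index.  It assumes the formula
interface of aggregate() and uses the x values from the printed 'dum'
above; the date column should come back as a Date, so difftime can
still be used on it afterwards.

```r
# Sample data copied from the printed 'dum' in the quoted message
dum <- data.frame(
  id   = c(2, 2, 2, 3, 4, 4, 4, 4, 5, 5),
  x    = c(21, 22, 23, 17, 18, 27, 25, 24, 26, 20),
  date = as.Date(c("12/15/2005", "1/13/2006", "1/13/2006", "4/5/2000",
                   "5/23/2003", "7/8/2003", "11/30/2003", "11/30/2003",
                   "4/19/2001", "4/19/2001"), format = "%m/%d/%Y")
)

# Average x for every id/date combination that occurs in the data
averaged <- aggregate(x ~ id + date, data = dum, FUN = mean)

# Put the rows back in subject order, then date order within subject
averaged <- averaged[order(averaged$id, averaged$date), ]
rownames(averaged) <- NULL
averaged
```

Unlike tapply with a list index, this returns only the combinations
that are present, so the result stays the size of the data rather than
ids times dates.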



-- 
http://had.co.nz/


