[R] foreloop? aggregating time series data into groups

Joshua Wiley jwiley.psych at gmail.com
Mon Nov 1 22:04:17 CET 2010


Welcome to R and the help list!

On Mon, Nov 1, 2010 at 12:34 PM, blurg <ian.jhsph at gmail.com> wrote:
> I have a data set similar to the set below where 1 and 2 indicate test
> results and 0 indicates time points in between where there are no test
> results.  I would like to allocate the time points leading up to a test
> result with the value of the test result.
> What I have:     What I want:
> 1                     1
> 0                     1
> 0                     1
> 0                     1
> 1                     1
> 0                     2
> 0                     2
> 2                     2
> 0                     1
> 0                     1
> 1                     1
> 0                     2
> 2                     2
> I have attempted methods creating a data.frame of the breaks/changes in
> values from 0 to 1 or to 2.
> x<-c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
> x1 <- which(diff(x) == 1)
> x2 <- which(diff(x) == 2)

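## Function that *I think* does what you want
myfun <- function(x) {
  dat <- rle(x)                      # runs of identical values
  i <- which(dat$values == 0)        # positions of the runs of zeros
  ## fold each run of zeros into the run that follows it
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  ## drop the zero runs and expand back to a full-length vector
  return(with(dat, rep(values[-i], lengths[-i])))
}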

## Three test pieces of data
x <- c(0,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
y <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,1)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)

## your example, works
myfun(x)
## test case 2 (begins with a number), works
myfun(y)
## test case 3 (ends with 0), fails
myfun(z)

So, if things work how I think they do, that function should do what
you need as long as the last value is not 0, which kind of makes sense
because what value would be assigned anyways?
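
One possible way around the trailing-0 case, as a sketch rather than a tested replacement: treat 0 as missing and carry the next test result backward, leaving NA where no later result exists. The name fill_backward is hypothetical and not part of the original reply, and returning NA for trailing zeros is an assumption about what would be wanted there.

```r
## Hypothetical helper (not from the original post): replace each 0 with the
## next non-zero value; trailing zeros have no later result and become NA.
fill_backward <- function(x) {
  x[x == 0] <- NA                         # treat 0 as "no test result"
  r <- rev(x)                             # reverse, so filling runs forward
  idx <- cumsum(!is.na(r))                # how many results seen so far
  filled <- c(NA, r[!is.na(r)])[idx + 1]  # idx 0 (nothing seen yet) maps to NA
  rev(filled)                             # restore the original order
}

## test case 3 from above (ends with 0)
z <- c(1,2,0,1,0,0,0,0,1,0,1,0,0,0,2,1,0,0,0,2,0,0,0,0)
fill_backward(z)   # the last four elements come back as NA
```

Unlike myfun, this sketch also treats any NA already present in x as "no result".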

Side note, I created a sample vector with 10 million elements, and it
took about 9 seconds to run it through my function.
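
A rough way to reproduce that kind of timing, as a sketch: the original message does not say how the sample vector was built, so the sample() call, its probabilities, and the smaller default size below are all assumptions. Timings will differ from machine to machine.

```r
myfun <- function(x) {
  dat <- rle(x)
  i <- which(dat$values == 0)
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  return(with(dat, rep(values[-i], lengths[-i])))
}

set.seed(1)
n <- 1e6                       # raise to 1e7 to match the size in the question
big <- sample(c(0, 1, 2), n, replace = TRUE, prob = c(0.8, 0.1, 0.1))
big[n] <- 1                    # make sure the vector does not end in 0

timing <- system.time(res <- myfun(big))
print(timing)                  # elapsed time in seconds
```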

@list members, I welcome someone checking my work; I'm uneasy about
whether a couple of aspects generalize properly.
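
One way such a check could be done, as a sketch only: compare myfun against a deliberately simple loop on the example from the question and on random inputs. The helper ref_fill is hypothetical and written purely as a reference for checking; it is not part of the original reply.

```r
myfun <- function(x) {
  dat <- rle(x)
  i <- which(dat$values == 0)
  dat$lengths[i + 1] <- with(dat, lengths[i + 1] + lengths[i])
  return(with(dat, rep(values[-i], lengths[-i])))
}

## Hypothetical reference: walk backward, remembering the most recent
## non-zero value seen, and write it into every position.
ref_fill <- function(x) {
  out <- x
  nxt <- NA
  for (k in rev(seq_along(x))) {
    if (x[k] != 0) nxt <- x[k]
    out[k] <- nxt
  }
  out
}

## 1. The example from the question ("What I have" / "What I want")
have <- c(1, 0, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 2)
want <- c(1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2)
stopifnot(identical(myfun(have), want))

## 2. Random vectors that contain at least one 0 and end on a test result
set.seed(42)
for (trial in 1:200) {
  v <- c(sample(c(0, 1, 2), 28, replace = TRUE), 0, sample(c(1, 2), 1))
  stopifnot(identical(myfun(v), ref_fill(v)))
}

## 3. An edge case worth knowing about: a vector with no zeros at all
no_zero <- c(1, 2, 1)
res <- tryCatch(myfun(no_zero), error = function(e) NULL)
identical(res, no_zero)   # FALSE: the input is not returned unchanged
```

The third check fails because values[-i] with an empty i selects nothing, so an input without any 0 would need to be handled separately.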

> Whatever the solution, it can't be entered by hand due to the size of the
> dataset (>10 million and change). Any ideas?  This is my first time posting
> to this forum and I am relatively new to R, so please don't flame me too
> hard.

Although this list can certainly be tough at times, for your peace of
mind you pretty much did everything right as far as I am concerned.
You described your problem, included a small set of sample data that
was easily read into R (for future reference, if you have a more
complex object that is not as easy to create, dput() will save you and
us trouble), and even showed what you tried to do.
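
For illustration, a small sketch of what dput() does. The data frame dat is made up for the example and does not come from the original question.

```r
## A made-up object that would be tedious to describe in words
dat <- data.frame(time = 1:5, result = c(1, 0, 0, 2, 0))

## dput() prints R code that recreates the object; that printed code is
## what you would paste into a message to the list
dput(dat)

## Round trip: evaluating the printed code gives back the same object
txt <- paste(capture.output(dput(dat)), collapse = "\n")
dat2 <- eval(parse(text = txt))
identical(dat, dat2)
```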

Finally, in your explanation you gave both sample data AND desired
outcome.  This gives us a "gold standard" to test our code against,
rather than hoping our results match what you described wanting.  I
am always thrilled when I'm not left re-reading a paragraph-long
English explanation of something that can be shown nicely with a few
numbers.

> Desperate times call for desperate measures.

and assuming you have put forth some effort trying to solve it
yourself and took the time to help us answer your question (as you
clearly did here), the help list should not be a desperate measure :)




Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
