[R] Help with a problem

Sun Jul 18 15:51:30 CEST 2010

You can also use 'embed' to build a matrix of row indices, one row per window of 7 consecutive rows, and run the test on each window:

> dat
           ds    c1 c2
1  2010-04-03   100  0
2  2010-04-30 11141 15
3  2010-05-01     3 16
4  2010-05-02  7615 14
5  2010-05-03  6910 17
6  2010-05-04  5035  3
7  2010-05-05  3007 15
8  2010-05-06     4 14
9  2010-05-07  8335 17
10 2010-05-08  2897 13
11 2010-05-09  6377 17
12 2010-05-10  3177 17
13 2010-05-11  7946 15
14 2010-05-12  8705  0
15 2010-05-13  9030 16
16 2010-05-14  8682 16
17 2010-05-15  8440  1
> # create index to check against
> indx <- embed(seq(nrow(dat)), 7)
> result <- apply(indx, 1, function(x){
+     # whatever condition you want
+     sum(dat$c1[x] >= 5000 & dat$c2[x] > 0)
+ })
>
> result
 [1] 4 4 4 4 4 3 3 3 4 4 5
>
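
To get from those per-window counts to the 0/1 variable Michael asked for, something like the following might work. It is only a sketch under a few assumptions: the data are rebuilt by hand from the dput() output quoted below, the thresholds are Michael's stated ones (c1 >= 100 AND c2 >= 8 on the same row, 5 of 7 rows), and the names ok, hits, span and flag are made up for illustration. The calendar check in span is my own addition, prompted by Josh's remark that the dates are not contiguous.

```r
# Rebuild the example data (values copied from the dput() output in the thread)
dat <- data.frame(
  ds = c(as.Date("2010-04-03"),
         seq(as.Date("2010-04-30"), by = "day", length.out = 16)),
  c1 = c(100, 11141, 3, 7615, 6910, 5035, 3007, 4, 8335, 2897,
         6377, 3177, 7946, 8705, 9030, 8682, 8440),
  c2 = c(0, 15, 16, 14, 17, 3, 15, 14, 17, 13, 17, 17, 15, 0, 16, 16, 1)
)
dat <- dat[order(dat$ds), ]

# One logical per row: TRUE when both criteria hold on that row
ok <- dat$c1 >= 100 & dat$c2 >= 8

# Each row of indx holds the row numbers of one 7-row window
indx <- embed(seq_len(nrow(dat)), 7)

# Integer count of qualifying rows per window, so no floating-point tolerance is needed
hits <- apply(indx, 1, function(x) sum(ok[x]))
# hits is 4 4 4 5 5 5 6 5 6 6 5

# Assumption: a window only counts if its 7 rows cover 7 calendar days
span <- apply(indx, 1, function(x) as.numeric(diff(range(dat$ds[x]))))

# 1 if any valid window has at least 5 qualifying rows, otherwise 0
flag <- as.integer(any(hits >= 5 & span == 6))
```

Drop the span condition if 7 consecutive rows are what matters rather than 7 consecutive dates.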

On Sun, Jul 18, 2010 at 3:06 AM, Stephan Kolassa <Stephan.Kolassa at gmx.de> wrote:
> Hi all,
>
> zoo::rollmean() is a nice idea. But if I understand Mike correctly, he wants
> 5 out of any 7 consecutive logicals to be TRUE, where these 5 do not
> necessarily need to be consecutive themselves. (remaining open question:
> could, e.g., the condition on c1 be TRUE for rows 1,2,3,4,5 and on c2 for
> rows 3,4,5,6,7, or would it need to be TRUE for the same rows?). Then
> something like this would make sense:
>
> any(rollmean(dat$c1>=100,7)>=5/7-.01 & rollmean(dat$c2>=8,7)>=5/7-.01)
>
> or
>
> any(rollmean(dat$c1>=100 & dat$c2>=8,7)>=5/7-.01)
>
> depending on the open question above.
>
> The "-.01" above may be necessary in light of FAQ 7.31.
>
> HTH,
> Stephan
>
>
>
> Joshua Wiley wrote:
>>
>> Hi Michael,
>>
>> The days in your example do not look continuous (at least from my
>> thinking), so you may have extra requirements in mind, but take a look
>> at this code.  My general thought was first to turn each column into a
>> logical vector (c1 >= 100 and c2 >= 8).  Taking advantage of the fact
>> that R treats TRUE as 1 and FALSE as 0, compute a rolling mean.  If
>> (and only if) 5 consecutive values are TRUE, the mean will be 1.  Next
>> I added the rolling means for each column, and then tested whether any
>> were 2 (i.e., 1 + 1).
>>
>> Cheers,
>>
>> Josh
>>
>> ###################
>> #Load required package
>> library(zoo)
>>
>> #Your data with ds converted to Date
>> #from dput()
>> dat <-
>> structure(list(ds = structure(c(14702, 14729, 14730, 14731, 14732,
>> 14733, 14734, 14735, 14736, 14737, 14738, 14739, 14740, 14741,
>> 14742, 14743, 14744), class = "Date"), c1 = c(100L, 11141L, 3L,
>> 7615L, 6910L, 5035L, 3007L, 4L, 8335L, 2897L, 6377L, 3177L, 7946L,
>> 8705L, 9030L, 8682L, 8440L), c2 = c(0L, 15L, 16L, 14L, 17L, 3L,
>> 15L, 14L, 17L, 13L, 17L, 17L, 15L, 0L, 16L, 16L, 1L)), .Names = c("ds",
>> "c1", "c2"), row.names = c(NA, -17L), class = "data.frame")
>>
>> #Order by ds
>> dat <- dat[order(dat$ds), ]
>>
>> yourvar <- 0
>>
>> #Test that 5 consecutive values from c1 AND c2 meet requirements
>> if(any(
>>  c(rollmean(dat$c1 >= 100, 5) + rollmean(dat$c2 >= 8, 5)) == 2)
>>   ) {yourvar <- 1}
>>
>> ###################
>>
>> On Sat, Jul 17, 2010 at 2:38 PM, Michael Hess <mlhess at med.umich.edu>
>> wrote:
>>>
>>> Sorry for not being clear.
>>>
>>> In the dataset there are around 100 or so days of data (in this case, also
>>> rows of data).
>>>
>>> I need to make sure that c1 is at least 100 AND c2 is at least 8 for 5 of
>>> 7 consecutive days.
>>>
>>> I will play with what I have and see if I can find out how to do this.
>>>
>>> Thanks for the help!
>>>
>>> Michael
>>>
>>>>>> Stephan Kolassa  07/17/10 4:50 PM >>>
>>>
>>> Mike,
>>>
>>> I am slightly unclear on what you want to do. Do you want to check rows
>>> 1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all*
>>> rows you are looking at, and same for c2?
>>>
>>> You can sort your data like this:
>>> data <- data[order(data$ds),]
>>>
>>> Type ?order for help. But also do this for added enlightenment...:
>>>
>>> library(fortunes)
>>> fortune("dog")
>>>
>>> Next, your analysis on the sorted data frame. As I said, I am not
>>> entirely clear on what you are looking at, but the following may solve
>>> your problem with choices "1 to 7" and "any one" above.
>>>
>>> foo <- 0
>>> for ( ii in 1:(nrow(data)-6) ) {
>>>  if (any(data$c1[ii+seq(0,6)]>=100) && any(data$c2[ii+seq(0,6)]>=8)) {
>>>    foo <- 1
>>>    break
>>>  }
>>> }
>>>
>>> The variable "foo" should contain what you want it to. Look at ?any
>>> (and, if this does not do what you want it to, at ?all) for further info.
>>>
>>> No doubt this could be vectorized, but I think the loop is clear enough.
>>>
>>> Good luck!
>>> Stephan
>>>
>>>
>>>
>>> Michael Hess wrote:
>>>>
>>>> Hello R users,
>>>>
>>>> I am a researcher at the University of Michigan looking for a solution
>>>> to an R problem.  I have loaded my data in from a MySQL database and it
>>>> looks like this:
>>>>
>>>>> data
>>>>
>>>>            ds    c1 c2
>>>> 1  2010-04-03   100  0
>>>> 2  2010-04-30 11141 15
>>>> 3  2010-05-01     3 16
>>>> 4  2010-05-02  7615 14
>>>> 5  2010-05-03  6910 17
>>>> 6  2010-05-04  5035  3
>>>> 7  2010-05-05  3007 15
>>>> 8  2010-05-06     4 14
>>>> 9  2010-05-07  8335 17
>>>> 10 2010-05-08  2897 13
>>>> 11 2010-05-09  6377 17
>>>> 12 2010-05-10  3177 17
>>>> 13 2010-05-11  7946 15
>>>> 14 2010-05-12  8705  0
>>>> 15 2010-05-13  9030 16
>>>> 16 2010-05-14  8682 16
>>>> 17 2010-05-15  8440 15
>>>>
>>>>
>>>> What I am trying to do is sort by ds, take rows 1,7, and see if c1 is at
>>>> least 100 AND c2 is at least 8. If not, check rows 2,8, then 3,9, and so
>>>> on until it has looped over the entire file. If it finds a set that
>>>> matches, set a new variable equal to 1; if it never finds a match, set it
>>>> equal to 0.
>>>>
>>>> I have done this in Stata, but on this project we are trying to use R.
>>>> Is this something that can be done in R? If so, could someone point me in
>>>> the correct direction?
>>>>
>>>> Thanks,
>>>>
>>>> Michael Hess
>>>> University of Michigan
>>>> Health System
>>>>
>>>> **********************************************************
>>>> Electronic Mail is not secure, may not be read every day, and should not
>>>> be used for urgent or sensitive issues
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>>
>>
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?