[R] Help with a problem
jim holtman
jholtman at gmail.com
Sun Jul 18 15:51:30 CEST 2010
You can also use 'embed' to create a list of indices you can use to do the test:
> dat
ds c1 c2
1 2010-04-03 100 0
2 2010-04-30 11141 15
3 2010-05-01 3 16
4 2010-05-02 7615 14
5 2010-05-03 6910 17
6 2010-05-04 5035 3
7 2010-05-05 3007 15
8 2010-05-06 4 14
9 2010-05-07 8335 17
10 2010-05-08 2897 13
11 2010-05-09 6377 17
12 2010-05-10 3177 17
13 2010-05-11 7946 15
14 2010-05-12 8705 0
15 2010-05-13 9030 16
16 2010-05-14 8682 16
17 2010-05-15 8440 1
> # create index to check against
> indx <- embed(seq(nrow(dat)), 7)
> result <- apply(indx, 1, function(x){
+ # whatever condition you want
+ sum(dat$c1[x] >= 5000 & dat$c2[x] > 0)
+ })
>
> result
[1] 4 4 4 4 4 3 3 3 4 4 5
>
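The condition above is only a placeholder. As a minimal sketch, the same 'embed' indexing can be applied to the rule Michael described (c1 at least 100 AND c2 at least 8 on the same row, for at least 5 of any 7 consecutive rows). The names ok, hits and found are illustrative and not from the thread. This assumes one row per day and at least 7 rows; it counts rows, not calendar days, so the gap between 2010-04-03 and 2010-04-30 is ignored.

```r
# Data as posted in the thread (values match the dput() output quoted below).
dat <- data.frame(
  ds = c(as.Date("2010-04-03"),
         seq(as.Date("2010-04-30"), by = "day", length.out = 16)),
  c1 = c(100, 11141, 3, 7615, 6910, 5035, 3007, 4, 8335, 2897,
         6377, 3177, 7946, 8705, 9030, 8682, 8440),
  c2 = c(0, 15, 16, 14, 17, 3, 15, 14, 17, 13, 17, 17, 15, 0, 16, 16, 1)
)

# Sort by date so that consecutive rows are consecutive observations.
dat <- dat[order(dat$ds), ]

# One logical flag per row: both requirements met on that row.
ok <- dat$c1 >= 100 & dat$c2 >= 8

# Each row of indx holds the row numbers of one 7-row window.
indx <- embed(seq_len(nrow(dat)), 7)

# Count the flagged rows in every window (ok[indx] is a plain vector,
# so it is reshaped back to the layout of indx before summing by row).
hits <- rowSums(matrix(ok[indx], nrow = nrow(indx)))

# 1 if any window has at least 5 qualifying rows, otherwise 0.
found <- as.integer(any(hits >= 5))
```

With the data above, hits is 4 4 4 5 5 5 6 5 6 6 5, so found is 1. If the windows must be 7 calendar days rather than 7 rows, missing dates would have to be filled in first.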
On Sun, Jul 18, 2010 at 3:06 AM, Stephan Kolassa <Stephan.Kolassa at gmx.de> wrote:
> Hi all,
>
> zoo::rollmean() is a nice idea. But if I understand Mike correctly, he wants
> 5 out of any 7 consecutive logicals to be TRUE, where these 5 do not
> necessarily need to be consecutive themselves. (remaining open question:
> could, e.g., the condition on c1 be TRUE for rows 1,2,3,4,5 and on c2 for
> rows 3,4,5,6,7, or would it need to be TRUE for the same rows?). Then
> something like this would make sense:
>
> any(rollmean(dat$c1>=100,7)>=5/7-.01 & rollmean(dat$c2>=8,7)>=5/7-.01)
>
> or
>
> any(rollmean(dat$c1>=100 & dat$c2>=8,7)>=5/7-.01)
>
> depending on the open question above.
>
> The "-.01" above may be necessary in light of FAQ 7.31.
>
> HTH,
> Stephan
>
>
>
> Joshua Wiley schrieb:
>>
>> Hi Michael,
>>
>> The days in your example do not look continuous (at least from my
>> thinking), so you may have extra requirements in mind, but take a look
>> at this code. My general thought was first to turn each column into a
>> logical vector (c1 >= 100 and c2 >= 8). Taking advantage of the fact
>> that R treats TRUE as 1 and FALSE as 0, compute a rolling mean. If
>> (and only if) 5 consecutive values are TRUE, the mean will be 1. Next
>> I added the rolling means for each column, and then tested whether any
>> were 2 (i.e., 1 + 1).
>>
>> Cheers,
>>
>> Josh
>>
>> ###################
>> #Load required package
>> library(zoo)
>>
>> #Your data with ds converted to Date
>> #from dput()
>> dat <-
>> structure(list(ds = structure(c(14702, 14729, 14730, 14731, 14732,
>> 14733, 14734, 14735, 14736, 14737, 14738, 14739, 14740, 14741,
>> 14742, 14743, 14744), class = "Date"), c1 = c(100L, 11141L, 3L,
>> 7615L, 6910L, 5035L, 3007L, 4L, 8335L, 2897L, 6377L, 3177L, 7946L,
>> 8705L, 9030L, 8682L, 8440L), c2 = c(0L, 15L, 16L, 14L, 17L, 3L,
>> 15L, 14L, 17L, 13L, 17L, 17L, 15L, 0L, 16L, 16L, 1L)), .Names = c("ds",
>> "c1", "c2"), row.names = c(NA, -17L), class = "data.frame")
>>
>> #Order by ds
>> dat <- dat[order(dat$ds), ]
>>
>> yourvar <- 0
>>
>> #Test that 5 consecutive values from c1 AND c2 meet requirements
>> if(any(
>> c(rollmean(dat$c1 >= 100, 5) + rollmean(dat$c2 >= 8, 5)) == 2)
>> ) {yourvar <- 1}
>>
>> ###################
>>
>> On Sat, Jul 17, 2010 at 2:38 PM, Michael Hess <mlhess at med.umich.edu>
>> wrote:
>>>
>>> Sorry for not being clear.
>>>
>>> The dataset has around 100 or so days of data (in this case, also
>>> rows of data).
>>>
>>> I need to check that the person has c1 of at least 100 AND c2 of at
>>> least 8 on 5 of 7 consecutive days.
>>>
>>> I will play with what I have and see if I can find out how to do this.
>>>
>>> Thanks for the help!
>>>
>>> Michael
>>>
>>>>>> Stephan Kolassa 07/17/10 4:50 PM >>>
>>>
>>> Mike,
>>>
>>> I am slightly unclear on what you want to do. Do you want to check rows
>>> 1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all*
>>> rows you are looking at, and same for c2?
>>>
>>> You can sort your data like this:
>>> data <- data[order(data$ds),]
>>>
>>> Type ?order for help. But also do this for added enlightenment...:
>>>
>>> library(fortunes)
>>> fortune("dog")
>>>
>>> Next, your analysis on the sorted data frame. As I said, I am not
>>> entirely clear on what you are looking at, but the following may solve
>>> your problem with choices "1 to 7" and "any one" above.
>>>
>>> foo <- 0
>>> for ( ii in 1:(nrow(data)-6) ) {
>>> if (any(data$c1[ii+seq(0,6)]>=100) && any(data$c2[ii+seq(0,6)]>=8)) {
>>> foo <- 1
>>> break
>>> }
>>> }
>>>
>>> The variable "foo" should contain what you want it to. Look at ?any
>>> (and, if this does not do what you want it to, at ?all) for further info.
>>>
>>> No doubt this could be vectorized, but I think the loop is clear enough.
>>>
>>> Good luck!
>>> Stephan
>>>
>>>
>>>
>>> Michael Hess schrieb:
>>>>
>>>> Hello R users,
>>>>
>>>> I am a researcher at the University of Michigan looking for a solution
>>>> to an R problem. I have loaded my data in from a mysql database and it
>>>> looks like this
>>>>
>>>>> data
>>>>
>>>> ds c1 c2
>>>> 1 2010-04-03 100 0
>>>> 2 2010-04-30 11141 15
>>>> 3 2010-05-01 3 16
>>>> 4 2010-05-02 7615 14
>>>> 5 2010-05-03 6910 17
>>>> 6 2010-05-04 5035 3
>>>> 7 2010-05-05 3007 15
>>>> 8 2010-05-06 4 14
>>>> 9 2010-05-07 8335 17
>>>> 10 2010-05-08 2897 13
>>>> 11 2010-05-09 6377 17
>>>> 12 2010-05-10 3177 17
>>>> 13 2010-05-11 7946 15
>>>> 14 2010-05-12 8705 0
>>>> 15 2010-05-13 9030 16
>>>> 16 2010-05-14 8682 16
>>>> 17 2010-05-15 8440 15
>>>>
>>>>
>>>> What I am trying to do is sort by ds, and take rows 1,7, see if c1 is at
>>>> least 100 AND c2 is at least 8. If it is not, check rows 2,8 next, and
>>>> if not there 3,9....until it loops over the entire file. If it finds a set
>>>> that matches, set a new variable equal to 1; if it never finds a match,
>>>> set it equal to 0.
>>>>
>>>> I have done this in stata but on this project we are trying to use R.
>>>> Is this something that can be done in R, if so, could someone point me in
>>>> the correct direction.
>>>>
>>>> Thanks,
>>>>
>>>> Michael Hess
>>>> University of Michigan
>>>> Health System
>>>>
>>>> **********************************************************
>>>> Electronic Mail is not secure, may not be read every day, and should not
>>>> be used for urgent or sensitive issues
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>>
>>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?