[R] complicated IF

jim holtman jholtman at gmail.com
Wed Jan 22 14:48:16 CET 2014


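Here is the change to create a Sunday entry in a week that does not have one.
I took out the Sunday (2009-11-01) for testing, and it is now recreated as
the mean of that week's Friday and Saturday. You will notice that week 201129
has no Friday, Saturday or Sunday at all, so its new Sunday has NaN as the
result.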

> x <- read.table(text = "       Date nrec
+ 1 2011-07-17  667
+ 2 2011-07-18  266
+ 3 2009-10-29   29
+ 4 2009-10-30  211
+ 5 2009-10-31  237", header = TRUE, as.is = TRUE)
> # convert to Date
> x$Date <- as.Date(x$Date)
> # add week of year
> x$week <- format(x$Date, "%Y%W")
> # add the day of week
> x$day <- format(x$Date, "%w")
> # process each week: create Sunday if missing, then set it to the Fri/Sat/Sun mean
> result <- do.call(rbind
+     , lapply(split(x, x$week), function(.week){
+         means <- mean(.week$nrec[.week$day %in% c('0', '5', '6')])
+         # check if Sunday exists; if not, create it
+         if (!any(.week$day == '0')){
+             # create a new entry for Sunday
+             .week <- rbind(.week[1, ], .week)  # new entry in row 1
+             # convert date to the Sunday ending this week: back off to the
+             # previous Sunday, then add 7 days
+             .week$Date[1L] <- .week$Date[1L] - as.numeric(.week$day[1L]) + 7
+             .week$day[1L] <- '0'  # make it a Sunday
+         }
+         .week$nrec[.week$day == '0'] <- means
+         .week
+         })
+     )
>
>
> result
                Date nrec   week day
200943.3  2009-11-01  224 200943   0  #<<< added
200943.31 2009-10-29   29 200943   4
200943.4  2009-10-30  211 200943   5
200943.5  2009-10-31  237 200943   6
201128    2011-07-17  667 201128   0
201129.2  2011-07-24  NaN 201129   0  # no other days to average
201129.21 2011-07-18  266 201129   1
> x
        Date nrec   week day
1 2011-07-17  667 201128   0
2 2011-07-18  266 201129   1
3 2009-10-29   29 200943   4
4 2009-10-30  211 200943   5
5 2009-10-31  237 200943   6
>
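
One caveat with the version above: the original rule says that a week with no
Friday, Saturday or Sunday should be left alone, but here such a week still
gets a Sunday row with NaN. Below is a minimal sketch of one way to follow
that rule, by returning the week unchanged when none of the three days is
present. The names `weekend` and `result2` are introduced here only for
illustration; everything else follows the code above.

```r
# Same test data as above (Sunday 2009-11-01 removed)
x <- read.table(text = "Date nrec
2011-07-17  667
2011-07-18  266
2009-10-29   29
2009-10-30  211
2009-10-31  237", header = TRUE, as.is = TRUE)
x$Date <- as.Date(x$Date)
x$week <- format(x$Date, "%Y%W")  # %W: weeks start on Monday, so Sunday ends the week
x$day  <- format(x$Date, "%w")    # "0" = Sunday ... "6" = Saturday

result2 <- do.call(rbind, lapply(split(x, x$week), function(.week) {
    # rows of this week that fall on Friday, Saturday or Sunday
    weekend <- .week$day %in% c("0", "5", "6")
    # none of the three days present: leave the week untouched
    if (!any(weekend)) return(.week)
    means <- mean(.week$nrec[weekend])
    # create a Sunday row if the week does not have one
    if (!any(.week$day == "0")) {
        .week <- rbind(.week[1, ], .week)
        .week$Date[1L] <- .week$Date[1L] - as.numeric(.week$day[1L]) + 7
        .week$day[1L] <- "0"
    }
    .week$nrec[.week$day == "0"] <- means
    .week
}))
result2
```

With this change week 201129 should keep only its Monday row. Note that
grouping on "%Y%W" is itself an assumption that can fail at a year boundary,
where a Friday and the following Sunday may get different week labels.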

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Jan 22, 2014 at 8:25 AM, Bill <william108 at gmail.com> wrote:
> Hello Jim,
>
> Thanks for this. I will study it. One thing, you wrote "# process each week,
> substituting the mean if Sunday exists". Even if Sunday's data does not
> exist, I need an entry for Sunday if Friday or Saturday (or both) exist. I
> don't yet understand what you wrote so I am not sure if that is the case.
> Bill
>
>
> On Wed, Jan 22, 2014 at 10:04 PM, jim holtman <jholtman at gmail.com> wrote:
>>
>> Here's one way of doing it.  Does not use "complicated" IFs; just
>> splits the data and works on it.
>>
>> > x <- read.table(text = "       Date nrec
>> + 1 2011-07-17  667
>> + 2 2011-07-18  266
>> + 3 2009-10-29   29
>> + 4 2009-10-30  211
>> + 5 2009-10-31  237
>> + 6 2009-11-01  898", header = TRUE, as.is = TRUE)
>> > # convert to Date
>> > x$Date <- as.Date(x$Date)
>> > # add week of year
>> > x$week <- format(x$Date, "%Y%W")
>> > # add the day of week
>> > x$day <- format(x$Date, "%w")
>> > # process each week, substituting the mean if Sunday exists
>> > result <- do.call(rbind
>> +     , lapply(split(x, x$week), function(.week){
>> +         means <- mean(.week$nrec[.week$day %in% c('0', '5', '6')])
>> +         .week$nrec[.week$day == '0'] <- means
>> +         .week
>> +         })
>> +     )
>> >
>> >
>> > result
>>                Date     nrec   week day
>> 200943.3 2009-10-29  29.0000 200943   4
>> 200943.4 2009-10-30 211.0000 200943   5
>> 200943.5 2009-10-31 237.0000 200943   6
>> 200943.6 2009-11-01 448.6667 200943   0
>> 201128   2011-07-17 667.0000 201128   0
>> 201129   2011-07-18 266.0000 201129   1
>> > x
>>         Date nrec   week day
>> 1 2011-07-17  667 201128   0
>> 2 2011-07-18  266 201129   1
>> 3 2009-10-29   29 200943   4
>> 4 2009-10-30  211 200943   5
>> 5 2009-10-31  237 200943   6
>> 6 2009-11-01  898 200943   0
>> >
>>
>>
>> On Wed, Jan 22, 2014 at 6:33 AM, Bill <william108 at gmail.com> wrote:
>> > Hello. I am trying to work out some complicated if() logic.
>> > I thought of using which() and if() but cannot get it.
>> >
>> > I have a dataframe that looks like this:
>> >
>> > head(deleteFridayTest)
>> >
>> >        Date nrec
>> > 1 2011-07-17  667
>> > 2 2011-07-18  266
>> > 3 2009-10-29   29
>> > 4 2009-10-30  211
>> > 5 2009-10-31  237
>> > 6 2009-11-01  898
>> >
>> > I want to take the values in nrec for each consecutive Friday, Saturday and
>> > Sunday, average them, and replace Sunday's value with that average.
>> >
>> > I came up with this:
>> >
>> > deleteFridayTest[dayOfWeek(deleteFridayTest$Date)=="Sun",]$nrec <-
>> > (deleteFridayTest[dayOfWeek(deleteFridayTest$Date)=="Sun",]$nrec +
>> > deleteFridayTest[dayOfWeek(deleteFridayTest$Date)=="Sat",]$nrec +
>> > deleteFridayTest[dayOfWeek(deleteFridayTest$Date)=="Fri",]$nrec)/3
>> >
>> > but this won't work for my data because sometimes one or more of the days
>> > of data may be missing. For example Friday's data could be missing, or
>> > Friday and Saturday, or Sunday may be missing, or they all may be missing,
>> > etc.
>> >
>> > The rule I want to implement is that
>> > if any of Friday, Saturday, or Sunday is available then I want to have an
>> > entry for Sunday (call it 'X'). If all 3 days are missing then nothing
>> > should be done and there will be no entry for X. If any of the days Fri,
>> > Sat, Sun are available then X should be the "average" of those values (e.g.
>> > if two days are available then sum and divide by 2, if just one day is
>> > available then just use that value for X).
>> >
>> > Can anyone suggest how to go about this?
>> >  Thank you.
>> >
>
>