[R] How to select max data according to week?
Eric Berger
er|cjberger @end|ng |rom gm@||@com
Wed Jun 19 16:08:33 CEST 2019
Hi Bert,
I agree that our interpretation is slightly different, which is why I wrote:
"If one can take the actual week of the year as an acceptable definition of
week, then here's my approach."
Regards,
Eric
On Wed, Jun 19, 2019 at 5:04 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> Eric:
>
> I believe you're doing something different than I did. I broke up each
> month into biweekly periods, 2+ per month. You seem to be grouping the
> overall entire period into biweekly intervals -- apologies if I'm wrong,
> but if I understood correctly, that's not the same thing. I do not know
> which of us -- if either -- has interpreted her query correctly.
>
> Cheers,
> Bert
>
> On Wed, Jun 19, 2019 at 2:35 AM Eric Berger <ericjberger using gmail.com> wrote:
>
>> Hi Siti,
>> I didn't test Bert's code but I assume it's fine. :-)
>> I would take a different approach than Bert. I prefer to use a package
>> such as lubridate to handle the date wrangling, and a package such as dplyr
>> to handle the grouping and max extraction.
>> It may be overkill for this problem, but these are great packages to
>> become familiar with.
>> If one can take the actual week of the year as an acceptable definition
>> of week, then here's my approach.
>>
>> library(lubridate)
>> library(dplyr)
>>
>> # Step 1: start with Bert's code to create sample data
>> ## create some example data for 3 months in 2000
>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates
>> conc <- runif(length(d)) # concentrations
>>
>> # Step 2: collect the data into a data frame
>> df <- data.frame( dt=d, conc=conc)
>>
>> # Step 3: use lubridate's ymd() function to parse the dates, its week()
>> function to identify the week of the year, and define the new column
>> 'wkpair' that groups the weeks 2-at-a-time
>> df2 <- dplyr::mutate( df,
>> wkpair=as.integer(floor(lubridate::week(lubridate::ymd(dt) )/2)) )
>>
>> # Step 4: group by the wkpair and use dplyr's summarise to get the info
>> you wanted
>> df3 <- dplyr::group_by(df2,wkpair) %>%
>> dplyr::summarise( from=min(dt), to=max(dt), maxconc=max(conc))
>> %>%
>> dplyr::select(from,to,maxconc)
>>
>> df3
>>
>> # A tibble: 6 x 3
>> from to maxconc
>> <dbl> <dbl> <dbl>
>> 1 20000113 20000121 0.963
>> 2 20000122 20000204 0.988
>> 3 20000205 20000218 0.939
>> 4 20000219 20000303 0.883
>> 5 20000304 20000317 0.863
>> 6 20000318 20000330 0.765
>>
>> HTH,
>> Eric
>>
>>
>>
>> On Tue, Jun 18, 2019 at 9:39 PM Bert Gunter <bgunter.4567 using gmail.com>
>> wrote:
>>
>>> My apologies. I negected to cc r-help. -- Bert
>>>
>>>
>>>
>>> On Tue, Jun 18, 2019 at 11:21 AM Bert Gunter <bgunter.4567 using gmail.com>
>>> wrote:
>>>
>>> >
>>> > I assume that 20000215 means year 2000, month 2, day 15.
>>> > I also assume that you want maxes for the first 2 weeks of a month, the
>>> > second 2 weeks, and any remaining days.
>>> > I also assume that this might be desired for arbitrary numbers of
>>> years,
>>> > months, and days.
>>> >
>>> > The following is one way to do this. As it's a slight pain to cut and
>>> > paste email data as text into R (use ?dput or R code to run to provide
>>> > example data instead), I just made up my own. You'll have to do the
>>> > following within a data frame through extraction or by using with() of
>>> > course.
>>> >
>>> > ## create some example data for 3 months in 2000
>>> > d<- 2e7 +c(113:131,201:228, 301:330) ## dates
>>> > conc <- runif(length(d)) # concentrations
>>> >
>>> > ## convert the date to character to extract year, month, and day
>>> > cdate <- as.character(d)
>>> > ## use substr to to the extraction
>>> > year <- substr(cdate,1,4)
>>> > mon <- substr(cdate,5,6)
>>> > day <- substr(cdate, 7,8)
>>> >
>>> > ## convert day to numeric and use cut() to group into the biweekly
>>> periods.
>>> > d14 <- cut(as.numeric(day), c(0,14.5,28.5, 32))
>>> >
>>> > ## Use tapply() to create your desired table of results.
>>> > tapply(conc, list(year, d14, mon), max, na.rm = TRUE)
>>> >
>>> > ## Results
>>> >
>>> > , , 01
>>> >
>>> > (0,14.5] (14.5,28.5] (28.5,32]
>>> > 2000 0.7357389 0.9655391 0.7962965
>>> >
>>> > , , 02
>>> >
>>> > (0,14.5] (14.5,28.5] (28.5,32]
>>> > 2000 0.8193979 0.9487207 NA
>>> >
>>> > , , 03
>>> >
>>> > (0,14.5] (14.5,28.5] (28.5,32]
>>> > 2000 0.9718919 0.9997093 0.168659
>>> >
>>> >
>>> > Cheers,
>>> > Bert
>>> >
>>> > Bert Gunter
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Jun 18, 2019 at 8:53 AM SITI AISYAH BINTI ZAKARIA <
>>> > aisyahzakaria using unimap.edu.my> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I'm Aisyah..I have a problem to run my R coding. I want to select
>>> maximum
>>> >> value according to week.
>>> >>
>>> >> here is my data
>>> >>
>>> >> Date O3_Conc
>>> >> 20000101 0.033
>>> >> 20000102 0.023
>>> >> 20000103 0.025
>>> >> 20000104 0.041
>>> >> 20000105 0.063
>>> >> 20000106 0.028
>>> >> 20000107 0.068
>>> >> 20000108 0.048
>>> >> 20000109 0.037
>>> >> 20000110 0.042
>>> >> 20000111 0.027
>>> >> 20000112 0.035
>>> >> 20000113 0.063
>>> >> 20000114 0.035
>>> >> 20000115 0.042
>>> >> 20000116 0.028
>>> >>
>>> >> I want to find the max value from column O3_Conc for only 14 days that
>>> >> refer to biweekly in month. And the next 14 days for the max value.
>>> >>
>>> >> I hope that I can get the result like this:
>>> >>
>>> >> Date Max O3_Conc
>>> >> 20000101 - 20000114 0.068
>>> >> 20000115 - 20000129 0.061
>>> >>
>>> >> I try many coding but still unavailable.
>>> >>
>>> >> this example my coding
>>> >>
>>> >> library(plyr)
>>> >> data.frame(CA0003)
>>> >>
>>> >> # format weeks as per requirement (replace "00" with "52" and
>>> >> adjust corresponding year)
>>> >> tmp <- list()
>>> >> tmp$y <- format(df$Date, format="%Y")
>>> >> tmp$w <- format(df$Date, format="%U")
>>> >> tmp$y[tmp$w=="00"] <-
>>> as.character(as.numeric(tmp$y[tmp$w=="00"]) -
>>> >> 14)
>>> >> tmp$w[tmp$w=="00"] <- "884"
>>> >> df$week <- paste(tmp$y, tmp$w, sep = "-")
>>> >>
>>> >> # get summary
>>> >> df2 <- ddply(df, .(week),transform, O3_Conc=max(O3_Conc))
>>> >>
>>> >> # include week ending date
>>> >> tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week
>>> ==x,
>>> >> "O3_Conc"])[[1]])
>>> >> df2$week.ending <- sapply(tmp$week.ending, max(O3_Conc, TRUE)
>>> >>
>>> >> output
>>> >> Site_Id Site_Location
>>> Date
>>> >> Year O3_Conc Month Day week
>>> >> 1 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000101
>>> >> 2000 0.033 1 1 NULL-NULL
>>> >> 2 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000102
>>> >> 2000 0.023 1 2 NULL-NULL
>>> >> 3 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000103
>>> >> 2000 0.025 1 3 NULL-NULL
>>> >> 4 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000104
>>> >> 2000 0.041 1 4 NULL-NULL
>>> >> 5 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000105
>>> >> 2000 0.063 1 5 NULL-NULL
>>> >> 6 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000106
>>> >> 2000 0.028 1 6 NULL-NULL
>>> >> 7 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000107
>>> >> 2000 0.068 1 7 NULL-NULL
>>> >> 8 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000108
>>> >> 2000 0.048 1 8 NULL-NULL
>>> >> 9 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000109
>>> >> 2000 0.037 1 9 NULL-NULL
>>> >> 10 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000110
>>> >> 2000 0.042 1 10 NULL-NULL
>>> >> 11 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>> 20000111
>>> >> 2000 0.027 1 11 NULL-NULL
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> This message has been scanned by E.F.A. Project and is believed to be
>>> >> clean.
>>> >>
>>> >> ______________________________________________
>>> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> >>
>>> >
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
[[alternative HTML version deleted]]
More information about the R-help
mailing list