[R] Sum data according to date in sequence
@vi@e@gross m@iii@g oii gm@ii@com
@vi@e@gross m@iii@g oii gm@ii@com
Sat Nov 4 18:47:53 CET 2023
There may be a point to consider about the field containing dates in the request below. Yes, much code will "work" just fine if the column are is seen as text as you can group by that too. The results will perhaps not be in the order by row that you expected but you can do your re-sorting perhaps even more efficiently after your summarise() either by converting the fewer remaining rows to a form of date or by transforming the text dates into an order of year/month/date that then sorts properly in forward or reverse order as needed.
Converting lots of rows to date is not a cheap process and grouping by that more complex date data structure may be harder. Heck, it may even make sense to use the text form of dates organized as a factor as the grouping becomes sort of pre-done.
The above comments are not saying any other solutions offered are wrong but simply discussing whether, especially for larger data sets, there are ways that could be more efficient.
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
Sent: Saturday, November 4, 2023 12:56 PM
To: roslinazairimah zakaria <roslinaump using gmail.com>; jim holtman <jholtman using gmail.com>
Cc: r-help mailing list <r-help using r-project.org>
Subject: Re: [R] Sum data according to date in sequence
Às 01:49 de 03/11/2023, roslinazairimah zakaria escreveu:
> Hi all,
>
> This is the data:
>
>> dput(head(dt1,20))structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> 20L), class = "data.frame")
>
>
> I would like to sum EnergykW data by the date. E.g. all values for
> EnergykWh on 1/14/2016
>
>
> On Fri, Nov 3, 2023 at 8:10 AM jim holtman <jholtman using gmail.com> wrote:
>
>> How about send a 'dput' of some sample data. My guess is that your date
>> is 'character' and not 'Date'.
>>
>> Thanks
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve?Tell me what you want to
>> do, not how you want to do it.*
>>
>>
>> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
>> roslinaump using gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I have this set of data. I would like to sum the EnergykWh according date
>>> sequences.
>>>
>>>> head(dt1,20) StationName date time EnergykWh
>>> 1 PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09 4.680496
>>> 2 PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50 6.272414
>>> 3 PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22 1.032782
>>> 4 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 8:25 11.004884
>>> 5 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>>> 6 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17 6.658797
>>> 7 PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46 4.808874
>>> 8 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19 1.469384
>>> 9 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12 2.996239
>>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12 0.303222
>>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22 4.988339
>>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16 8.131804
>>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19 0.117156
>>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24 3.285669
>>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 9:54 1.175608
>>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16 3.677487
>>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53 1.068393
>>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03 8.820755
>>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00 8.138583
>>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016 8:58 9.057500
>>>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>> group_by(date) %>%
>>> summarise(EnergykWh = sum(EnergykWh))
>>>
>>> head(sums,20)
>>>
>>> The date is not by daily sequence but by year sequence.
>>>
>>>> head(sums,20)# A tibble: 20 × 2
>>> date EnergykWh
>>> <chr> <dbl> 1 1/1/2017 25.3 2 1/1/2018 61.0 3
>>> 1/1/2019 0.627 4 1/1/2020 10.7 5 1/10/2017 69.4 6
>>> 1/10/2018 54.5 7 1/10/2019 49.1 8 1/10/2020 45.9 9
>>> 1/11/2017 73.9 10 1/11/2018 53.3 11 1/11/2019 93.5 12
>>> 1/11/2020 66.7 13 1/12/2017 78.6 14 1/12/2018 42.2 15
>>> 1/12/2019 22.7 16 1/12/2020 80.9 17 1/13/2017 85.6 18
>>> 1/13/2018 46.4 19 1/13/2019 40.0 20 1/13/2020 121.
>>>
>>>
>>>
>>> Thank you very much for any help given.
>>>
>>>
>>> --
>>> *Roslinazairimah Zakaria*
>>> *Tel: +609-5492370; Fax. No.+609-5492766*
>>>
>>> *Email: roslinazairimah using ump.edu.my <roslinazairimah using ump.edu.my>;
>>> roslinaump using gmail.com <roslinaump using gmail.com>*
>>> Faculty of Industrial Sciences & Technology
>>> University Malaysia Pahang
>>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
Hello,
Here are two solutions.
1. Base R
Though I don't coerce the date column to class "Date", it seems to work.
aggregate(EnergykWh ~ date, dt1, sum)
#> date EnergykWh
#> 1 1/14/2016 11.98569
#> 2 1/15/2016 32.56938
#> 3 1/16/2016 21.29181
#> 4 1/17/2016 22.88083
#> 5 1/18/2016 9.05750
2. Package dplyr.
First column date is coerced from class "character" to class "Date".
Then the grouped sums are computed.
suppressPackageStartupMessages(
library(dplyr)
)
dt1 %>%
mutate(date = as.Date(date, "%m/%d/%Y")) %>%
summarise(EnergykWh = sum(EnergykWh), .by = date)
#> date EnergykWh
#> 1 2016-01-14 11.98569
#> 2 2016-01-15 32.56938
#> 3 2016-01-16 21.29181
#> 4 2016-01-17 22.88083
#> 5 2016-01-18 9.05750
As you can see, the results are the same.
Also, this exact problem is one of the most asked on StackOverflow.
Maybe you could try searching there for a solution. My code above is
also exactly the code in [1], though I had already this answer written.
I only checked after :(.
[1]
https://stackoverflow.com/questions/61548758/r-how-sum-values-by-group-by-date
Hope this helps,
Rui Barradas
--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus.
www.avg.com
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list