[R] Query about calculating the monthly average of daily data columns

jim holtman jho|tm@n @end|ng |rom gm@||@com
Sun Oct 20 18:17:43 CEST 2019


Does this do what you want:

> library(tidyverse)

> input <- read_delim("PERMNO DATE Spread
+ 111 19940103 0.025464308
+ 111 19940104 0.064424296
+ 111 19940105 0.018579337
+ 111 19940106 0.018872211
 ..." ... [TRUNCATED]

> # drop last two digits to get the month
> monthly <- input %>%
+   group_by(PERMNO, month = DATE %/% 100) %>%
+   summarise(avg = mean(Spread))
> monthly
# A tibble: 12 x 3
# Groups:   PERMNO [3]
   PERMNO  month      avg
    <dbl>  <dbl>    <dbl>
 1    111 199401 0.0416
 2    111 199402 0.0508
 3    111 199403 0.0567
 4    111 199404 0.0466
 5    112 199401 0.000533
 6    112 199402 0.000593
 7    112 199403 0.000471
 8    112 199404 0.000587
 9    113 199401 0.000692
10    113 199402 0.000591
11    113 199403 0.000677
12    113 199404 0.000555
>


Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?Tell me what you want to
do, not how you want to do it.*


On Sun, Oct 20, 2019 at 5:10 AM Subhamitra Patra <subhamitra.patra using gmail.com>
wrote:

> Dear Sir,
>
> Thank you very much for your suggestions.
>
> Due to certain inconveniences, I was unable to work on your suggestions.
>
> Today I worked on both suggestions and got the result that I really wanted
> that monthly averages for each country.
>
> Here, I am asking one more query (just for learning purpose) that if my
> country name and its respective variable is in the panel format, and I want
> to take the monthly average for each country, how the code will be
> arranged. For your convenience, I am providing a small data sample below.
>
> PERMNO DATE Spread
> 111 19940103 0.025464308
> 111 19940104 0.064424296
> 111 19940105 0.018579337
> 111 19940106 0.018872211
> 111 19940107 0.065279782
> 111 19940110 0.063485905
> 111 19940111 0.018355453
> 111 19940112 0.064135683
> 111 19940113 0.063519987
> 111 19940114 0.018277351
> 111 19940117 0.018628417
> 111 19940118 0.065630229
> 111 19940119 0.018713152
> 111 19940120 0.019119037
> 111 19940121 0.068342043
> 111 19940124 0.020843244
> 111 19940125 0.019954211
> 111 19940126 0.018980321
> 111 19940127 0.066827165
> 111 19940128 0.067459235
> 111 19940131 0.068682559
> 111 19940201 0.02081465
> 111 19940202 0.068236091
> 111 19940203 0.068821406
> 111 19940204 0.020075648
> 111 19940207 0.066070584
> 111 19940208 0.066068837
> 111 19940209 0.019077072
> 111 19940210 0.065894875
> 111 19940211 0.018847478
> 111 19940214 0.065040844
> 111 19940215 0.01880332
> 111 19940216 0.018836199
> 111 19940217 0.066888865
> 111 19940218 0.067116793
> 111 19940221 0.068809742
> 111 19940222 0.068230213
> 111 19940223 0.069502855
> 111 19940224 0.070383523
> 111 19940225 0.020430811
> 111 19940228 0.067087257
> 111 19940301 0.066776479
> 111 19940302 0.019959031
> 111 19940303 0.066596469
> 111 19940304 0.019131334
> 111 19940307 0.019312528
> 111 19940308 0.067349909
> 111 19940309 0.068916431
> 111 19940310 0.068620043
> 111 19940311 0.070494844
> 111 19940314 0.071056842
> 111 19940315 0.071042517
> 111 19940316 0.072401771
> 111 19940317 0.071940001
> 111 19940318 0.07352884
> 111 19940321 0.072671688
> 111 19940322 0.072652595
> 111 19940323 0.021352138
> 111 19940324 0.069933727
> 111 19940325 0.068717467
> 111 19940328 0.020470748
> 111 19940329 0.020003748
> 111 19940330 0.065833717
> 111 19940331 0.065268388
> 111 19940401 0.018762356
> 111 19940404 0.064914179
> 111 19940405 0.064706743
> 111 19940406 0.018764175
> 111 19940407 0.06524806
> 111 19940408 0.018593449
> 111 19940411 0.064913949
> 111 19940412 0.01872089
> 111 19940413 0.018729328
> 111 19940414 0.018978773
> 111 19940415 0.065477137
> 111 19940418 0.064614365
> 111 19940419 0.064184148
> 111 19940420 0.018553192
> 111 19940421 0.066872771
> 111 19940422 0.06680782
> 111 19940425 0.067467961
> 111 19940426 0.02014297
> 111 19940427 0.062464016
> 111 19940428 0.062357052
> 112 19940429 0.000233993
> 112 19940103 0.000815264
> 112 19940104 0.000238165
> 112 19940105 0.000813632
> 112 19940106 0.000236915
> 112 19940107 0.000809102
> 112 19940110 0.000801642
> 112 19940111 0.000797932
> 112 19940112 0.000795251
> 112 19940113 0.000795186
> 112 19940114 0.000231359
> 112 19940117 0.000232134
> 112 19940118 0.000233718
> 112 19940119 0.000233993
> 112 19940120 0.000234694
> 112 19940121 0.000235753
> 112 19940124 0.000808653
> 112 19940125 0.000235604
> 112 19940126 0.000805068
> 112 19940127 0.000802337
> 112 19940128 0.000801768
> 112 19940131 0.000233517
> 112 19940201 0.000797431
> 112 19940202 0.000233338
> 112 19940203 0.000233826
> 112 19940204 0.000799519
> 112 19940207 0.000798105
> 112 19940208 0.000792245
> 112 19940209 0.000231113
> 112 19940210 0.000233413
> 112 19940211 0.000798168
> 112 19940214 0.000233282
> 112 19940215 0.000797848
> 112 19940216 0.000785165
> 112 19940217 0.000228426
> 112 19940218 0.000786783
> 112 19940221 0.00078343
> 112 19940222 0.000781459
> 112 19940223 0.000776264
> 112 19940224 0.000226399
> 112 19940225 0.000779066
> 112 19940228 0.000773603
> 112 19940301 0.000226487
> 112 19940302 0.000775233
> 112 19940303 0.000227017
> 112 19940304 0.000227854
> 112 19940307 0.000782814
> 112 19940308 0.000229164
> 112 19940309 0.000787033
> 112 19940310 0.000784049
> 112 19940311 0.000228984
> 112 19940314 0.00078697
> 112 19940315 0.000782567
> 112 19940316 0.000228516
> 112 19940317 0.000786347
> 112 19940318 0.000229236
> 112 19940321 0.000230107
> 112 19940322 0.000792689
> 112 19940323 0.000787284
> 112 19940324 0.000787221
> 112 19940325 0.000227978
> 112 19940328 0.000228713
> 112 19940329 0.000228894
> 112 19940330 0.000229255
> 112 19940331 0.000231003
> 112 19940401 0.000796567
> 112 19940404 0.000790668
> 112 19940405 0.00078195
> 112 19940406 0.000780475
> 112 19940407 0.000228355
> 112 19940408 0.000781723
> 112 19940411 0.000775741
> 112 19940412 0.000226647
> 112 19940413 0.000778876
> 112 19940414 0.000777336
> 112 19940415 0.000775253
> 112 19940418 0.000226362
> 112 19940419 0.000779554
> 112 19940420 0.000774824
> 112 19940421 0.000225582
> 112 19940422 0.000225724
> 112 19940425 0.000773361
> 112 19940426 0.0002256
> 112 19940427 0.000776416
> 113 19940428 0.000280542
> 113 19940429 0.000964148
> 113 19940103 0.000962654
> 113 19940104 0.000281768
> 113 19940105 0.000962219
> 113 19940106 0.000961965
> 113 19940107 0.000958602
> 113 19940110 0.000280056
> 113 19940111 0.000956348
> 113 19940112 0.000952171
> 113 19940113 0.000948176
> 113 19940114 0.000275607
> 113 19940117 0.000275773
> 113 19940118 0.000276738
> 113 19940119 0.000947068
> 113 19940120 0.000940959
> 113 19940121 0.000275224
> 113 19940124 0.000948489
> 113 19940125 0.000940076
> 113 19940126 0.0009309
> 113 19940127 0.000269955
> 113 19940128 0.000270328
> 113 19940131 0.000924234
> 113 19940201 0.000924038
> 113 19940202 0.000269088
> 113 19940203 0.000270247
> 113 19940204 0.000270562
> 113 19940207 0.00092656
> 113 19940208 0.000921819
> 113 19940209 0.000920361
> 113 19940210 0.000268958
> 113 19940211 0.000924758
> 113 19940214 0.000266768
> 113 19940215 0.000911325
> 113 19940216 0.000909294
> 113 19940217 0.000905887
> 113 19940218 0.000262919
> 113 19940221 0.000262978
> 113 19940222 0.000263189
> 113 19940223 0.000904439
> 113 19940224 0.000263512
> 113 19940225 0.000906184
> 113 19940228 0.000265198
> 113 19940301 0.000906126
> 113 19940302 0.000264357
> 113 19940303 0.000265392
> 113 19940304 0.000912495
> 113 19940307 0.000910641
> 113 19940308 0.000266143
> 113 19940309 0.000910113
> 113 19940310 0.000909277
> 113 19940311 0.000905056
> 113 19940314 0.00090285
> 113 19940315 0.000898831
> 113 19940316 0.000896118
> 113 19940317 0.000261294
> 113 19940318 0.000892563
> 113 19940321 0.000890852
> 113 19940322 0.00088639
> 113 19940323 0.000258509
> 113 19940324 0.000260286
> 113 19940325 0.000889354
> 113 19940328 0.000888373
> 113 19940329 0.000885049
> 113 19940330 0.000259116
> 113 19940331 0.000259474
> 113 19940401 0.000260316
> 113 19940404 0.000897493
> 113 19940405 0.000894592
> 113 19940406 0.000260435
> 113 19940407 0.000260989
> 113 19940408 0.000262061
> 113 19940411 0.000262262
> 113 19940412 0.000263604
> 113 19940413 0.000908682
> 113 19940414 0.000265348
> 113 19940415 0.000265637
> 113 19940418 0.00026862
> 113 19940419 0.000918882
> 113 19940420 0.000909904
> 113 19940421 0.000901725
> 113 19940422 0.000900062
> 113 19940425 0.000893547
> 113 19940426 0.000260899
> Here, the 1st column is the name of the countries panel which I identified
> in number, 2nd is the date column arranged in yyyy-mm-dd format, and 3rd is
> the daily variable for each country for 4 months of 1994. I need to take
> the monthly average of spread variable for each country (i.e. noted as 111,
> 112, and 113) in the above example. In short, my monthly spread variable
> should be sorted on the basis of both PERMNO (i.e. country identifier), and
> months of the particular year.
>
> Please educate me that in this data format, how the average code can be
> written?
>
> Thank you very much.
>
>
>
> [image: Mailtrack]
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> Sender
> notified by
> Mailtrack
> <
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> >
> 10/20/19,
> 05:31:23 PM
>
> On Mon, Sep 16, 2019 at 4:31 PM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>
> > Original email did not come through (some problems with formating).
> >
> > Hi
> >
> > No, on contrary. I **am** suggesting to change date column to real date
> > asi it is easy to handle with appropriate functions.
> >
> > Here are some fake data
> >
> > > str(spdat)
> > 'data.frame':   260 obs. of  3 variables:
> > $ dates   : Date, format: "1995-01-01" "1995-01-02" "1995-01-03"
> > "1995-01-04" ...
> > $ coutryA : num  0.188 0.405 -0.107 -0.596 -0.529 ...
> > $ countryB: num  9.4 10.76 11.24 8.26 10.71 ..
> >
> > > head(spdat)
> >        dates    coutryA  countryB
> > 1 1995-01-01  0.1875060  9.402851
> > 2 1995-01-02  0.4045193 10.755112
> > 3 1995-01-03 -0.1073904 11.243663
> > 4 1995-01-04 -0.5959683  8.256424
> > 5 1995-01-05 -0.5293772 10.705431
> > 6 1995-01-06 -0.2228029 10.171461
> >
> > First I melt it
> > spdat.m <- melt(spdat, id.var="dates")
> >
> > > head(spdat.m)
> >        dates variable      value
> > 1 1995-01-01  coutryA  0.1875060
> > 2 1995-01-02  coutryA  0.4045193
> > 3 1995-01-03  coutryA -0.1073904
> > 4 1995-01-04  coutryA -0.5959683
> > 5 1995-01-05  coutryA -0.5293772
> > 6 1995-01-06  coutryA -0.2228029
> >
> > I do aggregation
> >
> > > spdat.ag <- aggregate(spdat.m$value, list(format(spdat.m$dates,
> > "%m.%Y"), spdat.m$variable), mean)
> >
> > And now I use dcast  to get required result.
> >
> > > dcast(spdat.ag, Group.1~Group.2)
> > Using x as value column: use value.var to override.
> >    Group.1      coutryA  countryB
> > 1  01.1995  0.098688137 10.177696
> > 2  02.1995  0.352264682  9.609261
> > 3  03.1995  0.155521876 10.043503
> > 4  04.1995 -0.166092393 10.129844
> > 5  05.1995  0.164665188 10.308275
> > 6  06.1995  0.260633585 10.210129
> > 7  07.1995  0.003671979 10.549016
> > 8  08.1995  0.045295990 10.087435
> > 9  09.1995 -0.145488206  9.689876
> > 10 10.1995 -0.225645950  9.743744
> > 11 11.1995  0.030273383 10.025435
> > 12 12.1995  0.043557468 10.105626
> >
> > Cheers
> > Petr
> >
> > Here are the data.
> >
> > > dput(spdat)
> > spdat  <- structure(list(dates = structure(c(9131, 9132, 9133, 9134,
> 9135,
> > 9136, 9137, 9138, 9139, 9140, 9141, 9142, 9143, 9144, 9145, 9146,
> > 9147, 9148, 9149, 9150, 9151, 9152, 9162, 9163, 9164, 9165, 9166,
> > 9167, 9168, 9169, 9170, 9171, 9172, 9173, 9174, 9175, 9176, 9177,
> > 9178, 9179, 9180, 9181, 9190, 9191, 9192, 9193, 9194, 9195, 9196,
> > 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207,
> > 9208, 9209, 9210, 9211, 9212, 9221, 9222, 9223, 9224, 9225, 9226,
> > 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237,
> > 9238, 9239, 9240, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258,
> > 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269,
> > 9270, 9271, 9272, 9273, 9282, 9283, 9284, 9285, 9286, 9287, 9288,
> > 9289, 9290, 9291, 9292, 9293, 9294, 9295, 9296, 9297, 9298, 9299,
> > 9300, 9301, 9302, 9303, 9312, 9313, 9314, 9315, 9316, 9317, 9318,
> > 9319, 9320, 9321, 9322, 9323, 9324, 9325, 9326, 9327, 9328, 9329,
> > 9330, 9331, 9332, 9343, 9344, 9345, 9346, 9347, 9348, 9349, 9350,
> > 9351, 9352, 9353, 9354, 9355, 9356, 9357, 9358, 9359, 9360, 9361,
> > 9362, 9363, 9364, 9365, 9374, 9375, 9376, 9377, 9378, 9379, 9380,
> > 9381, 9382, 9383, 9384, 9385, 9386, 9387, 9388, 9389, 9390, 9391,
> > 9392, 9393, 9394, 9404, 9405, 9406, 9407, 9408, 9409, 9410, 9411,
> > 9412, 9413, 9414, 9415, 9416, 9417, 9418, 9419, 9420, 9421, 9422,
> > 9423, 9424, 9456, 9435, 9436, 9437, 9438, 9439, 9440, 9441, 9442,
> > 9443, 9444, 9445, 9446, 9447, 9448, 9449, 9450, 9451, 9452, 9453,
> > 9454, 9455, 9486, 9465, 9466, 9467, 9468, 9469, 9470, 9471, 9472,
> > 9473, 9474, 9475, 9476, 9477, 9478, 9479, 9480, 9481, 9482, 9483,
> > 9484, 9151), class = "Date"), coutryA = c(0.187506004416315,
> > 0.404519257417805, -0.107390371811605, -0.595968278805544,
> > -0.529377240936012,
> > -0.222802921207767, 0.413182392872818, 0.689673026532298,
> > -1.2768723266992,
> > -0.506308625809406, 0.113859233745174, -0.0963423819877653,
> > 0.323987304768398,
> > 1.63846917270538, 0.893233423250338, 0.297732439150487,
> 0.949323101836486,
> > -0.599518074708052, 0.366372319197032, -2.25734971953878,
> > -0.190971733204918,
> > -0.0874143568874351, 1.46699645184047, 0.00702170238687361,
> > 0.11221346278474,
> > -0.8060359607624, 0.340842350476532, 0.798838328074708,
> 0.449214745851041,
> > -0.664972890558734, 0.521830282184173, -1.35020467264521,
> > -0.95240631225826,
> > 1.25607320999849, 1.57018988549379, 0.99477900888445, -0.936218787378207,
> > 1.48489932847779, 0.529222943794807, 0.0995675049147771,
> > 0.477770516727839,
> > 1.64567253670186, -0.0212651530684566, 0.558952796713992,
> > 0.0409979382929057,
> > 0.428675380654606, 0.0919422583362682, -0.819694497340459,
> > 1.23998830450888,
> > 0.607498144489643, -1.27724580163097, 1.41634774644371,
> > -0.579094515769707,
> > 2.02039606694223, 0.0740478208705996, -1.69826944583929,
> > -0.321482399813063,
> > -0.489198601400924, 2.0066750686302, -1.90624857752322, 0.46762405849973,
> > 1.31264724137396, -0.0473627194710677, 0.141362267796145,
> > 0.329709761206515,
> > 0.518454586458572, -1.39489985851779, -0.388303591187678,
> > -0.668922704543522,
> > 0.0735115674875065, 1.30737242978235, 0.198503397980751,
> > 0.257831448122427,
> > -1.31173539205588, -1.45147941969116, 0.359725782295977,
> > 0.612882118056585,
> > -0.0733768753346202, -0.508349204402508, 1.35776663767231,
> > 0.997807735669086,
> > -1.41717534266382, -0.894170593324238, -0.68578120845151,
> > -0.211509378018794,
> > 0.436738904337909, -1.46932152770435, 0.0817388759874159,
> > -0.0389350881653141,
> > 0.709198476466861, -0.963669144724435, -0.548607422521798,
> > -0.896886885575286,
> > 0.322231150840934, 1.37327611339939, 0.0310213133870952,
> > 0.796577750757324,
> > -0.2010067423637, -0.241723752424226, 1.37547329580654,
> -1.15382202538982,
> > 0.101454200596915, 0.273663839664217, 1.8315140887841, 1.86096518756473,
> > -0.536393730924719, -0.45845011727266, 1.10226256157127,
> > -0.385596991265563,
> > 3.20218061566932, -1.25865250042183, -0.13613128784276,
> 0.483329357746514,
> > -0.597187329618306, 0.710977603908319, -1.07945708269043,
> > -0.477626236401394,
> > 1.51034914684104, 2.35886426985999, -0.0250526828683629,
> > -0.29439443478131,
> > 0.665774016744828, 0.464027472251246, 0.226658374792016,
> > -0.802597030454373,
> > 0.825517059805602, -1.11293193130819, -1.27677400513873,
> 1.60776237113347,
> > 1.12490009531342, 0.95767047134623, 0.0475745549797055,
> > -0.0591587460876868,
> > -0.690617365240342, -1.62111622938126, -1.3545210707469, 1.8607927043106,
> > 0.764367674339969, 1.49261525602638, 0.549570728337346,
> -1.29658399741794,
> > -1.6289903797869, 0.00573336252135834, 0.0300702149640632,
> > 0.440810830115721,
> > 0.663568666361326, -0.126685900835146, -0.00221628368438927,
> > 0.815321995886579, -0.499280888368945, -0.271814047751667,
> > -0.071025546459042,
> > 1.73165491816826, -0.0294770299043331, 0.833605607221529,
> > -0.670108794857159,
> > -0.303323318026829, 1.29039844459134, -0.818806702120603,
> > -0.445515595649677,
> > -0.0128796557666887, 0.320923705586147, 0.230597275812536,
> > -1.54009153212366,
> > -0.294702981688559, 0.581209734391958, 0.121384768986639,
> > 0.502914098451111,
> > -1.59018268505718, -0.635101104166451, 1.48005776676403,
> > -0.25631761189957,
> > 0.171947814411552, 0.444646195980014, 0.172655758440111,
> > -0.00432159794094836,
> > -0.549321974240026, 0.585055026451421, -1.22813371480849,
> > 0.846807540195381,
> > 0.319629441352597, 0.393525732059709, -1.40275675444594,
> 1.11062585584811,
> > 0.214809571213853, -0.636432711800391, -0.283087127251573,
> > -1.46385553207618,
> > 0.436928676930225, -1.34231945433777, 0.451281957595763,
> > -0.523155001924496,
> > -2.69416779107545, 1.5513477373689, 0.989632029400036, 1.34636075948993,
> > 0.346147428691405, -0.464527560160041, 0.337233933370495,
> > 1.11331396366389,
> > -1.00060600083316, -0.734784444487169, 1.40476315358621,
> 1.01671092179193,
> > -0.0144306250829694, -0.923555930346906, -1.02275966525015,
> > 0.619422010219383,
> > 0.603484309754755, -0.774553813657576, 0.0932792545556387,
> > -0.651884521428279,
> > -0.61965612647073, -1.22104834441579, -1.31439612639271,
> > -2.87707752518163,
> > -0.0343801084491906, -0.640678302378492, -1.38653452986558,
> > 0.884963139028743,
> > -0.657454283462004, 0.462842665244993, -0.20881674837534,
> 0.6345884135548,
> > 0.707165108434729, -0.162090928425892, -0.998662309785188,
> > 1.3130254639318,
> > 0.191890764940071, -0.0493619237876962, -0.55183232511689,
> > 0.470263932874487,
> > -0.217088645692971, 0.231550037620628, -0.530406537266415,
> > -0.616522469083808,
> > 0.329347084038265, 1.49420692610475, 1.91750823142859, 0.753536143872474,
> > 0.766584887163714, -0.259803384094296, -0.402463714097741,
> > -0.0229799209735185,
> > -0.259677990559218, -1.41529707261105, 0.191362852138627,
> > 1.54483266684747,
> > -1.17947655378489, -0.426265411073274, 0.723010460481118,
> > 1.37405142869537,
> > -0.374771207936141, 0.0513905365832423, -0.369432731236118,
> > -0.945441984794364,
> > 0.179506648255554, 0.31971255725438, -1.25117960937319, 2.46228549580083
> > ), countryB = c(9.4028512714591, 10.7551115504652, 11.2436629228434,
> > 8.25642360904389, 10.7054313972395, 10.1714609666091, 10.3726975056206,
> > 10.6594299429162, 8.56250595443296, 10.5612153841067, 8.07612112826519,
> > 9.94704207511951, 11.392407273156, 10.4257690445268, 10.6339442533038,
> > 10.5314883825356, 10.3506665399033, 10.2426403362978, 10.8437715647564,
> > 10.8247200587034, 11.2449815064171, 9.2898697883112, 9.05418978124619,
> > 10.6080277672463, 9.19882175737148, 11.3589722806948, 10.4139334238189,
> > 9.44305216810892, 9.58426470056472, 8.84208362003176, 10.8125431356391,
> > 7.71357872650814, 8.73526671289828, 10.714693958224, 9.49976972594189,
> > 9.41896864941478, 7.33073349261249, 10.5268398982262, 9.92255671125184,
> > 10.5665378092202, 10.5035704895405, 7.93682068228084, 10.882421050424,
> > 9.3237610577468, 8.42128120513304, 9.46103753451174, 10.3472215515392,
> > 11.0483414005193, 10.3421689244599, 7.85120280889754, 11.6327644046254,
> > 9.57620745972098, 10.6553844719749, 10.8490250129346, 10.2742492933876,
> > 9.55428072119304, 9.0976351049804, 10.0731951766966, 10.6956488509516,
> > 11.1530744146062, 10.3496303024767, 9.52734980693306, 9.64478424894734,
> > 9.28301632295047, 10.9568790570688, 11.6052870914912, 9.58530202776537,
> > 11.1338134902295, 8.66685735149472, 11.0230863576875, 10.8000609212302,
> > 10.6510296259782, 11.831292326569, 9.53836122448974, 9.55805411377422,
> > 9.90336204189518, 9.36377040999133, 11.7041009433341, 9.95628297574831,
> > 10.718111342931, 10.4562688422472, 8.85976383099186, 8.94085496683824,
> > 8.19538394018434, 10.1058448260449, 9.70821289789561, 9.08757962570738,
> > 10.6599997541876, 10.0521137258219, 9.9924295699559, 11.8730269098299,
> > 11.2634470795907, 11.3854762443416, 9.56742053529845, 10.4101561978503,
> > 9.53376547865009, 9.75410966995361, 9.92804558924886, 8.36231430067066,
> > 10.7486459346681, 12.0143881312685, 11.0083060332839, 9.32820954213586,
> > 10.8420346742049, 9.73064414798223, 10.7593902723319, 10.976622155215,
> > 10.1039774975157, 8.36317871802524, 9.21809894958653, 10.1015362220683,
> > 11.4655736295123, 9.65528297274543, 9.67844310028008, 10.1516820910267,
> > 8.38764450852642, 10.163558398201, 11.1432463477237, 12.0509818193223,
> > 10.9896913965091, 11.1772406550953, 9.14396687337779, 9.93338627749979,
> > 10.9548864433126, 8.64911301751956, 11.706463972364, 11.1012846649741,
> > 8.7805267197408, 11.5802098773954, 10.2268513542863, 10.3509617168731,
> > 9.09646558899397, 11.2706666647314, 11.3984335011704, 11.4808985388742,
> > 10.5608771133999, 10.3684356806175, 10.4815588822618, 10.5818867877558,
> > 12.2561035284691, 8.6464271477849, 10.3412351841865, 10.7577574534162,
> > 11.1124067479261, 9.91627943243343, 10.6356898895291, 10.2107566441478,
> > 10.0672734202575, 10.2385787014999, 11.7112606160069, 10.0453801263575,
> > 8.84654136100724, 10.2173421609193, 9.27919801705716, 10.4755578829547,
> > 7.69340209082122, 9.24705253848083, 10.8415406794597, 8.69603117680965,
> > 11.2589214416702, 10.5425642239737, 10.1389355042458, 9.17267675180435,
> > 12.3052338002213, 10.0181674985326, 12.2715476751051, 9.64516268052739,
> > 10.6305299379912, 10.1829347684655, 9.97983942366781, 10.2559847744715,
> > 10.3092266661814, 9.75215330673072, 10.250464278709, 9.31118800061454,
> > 10.3310666767171, 9.09703848990093, 10.241195105962, 8.57290406448477,
> > 8.98090855172704, 8.64653101832229, 12.6791587435376, 9.56000538681993,
> > 10.4062255533723, 11.067091476284, 10.5255014737268, 10.2240941949978,
> > 9.13081571869084, 9.5942352120783, 9.2753466212409, 10.2789293993548,
> > 8.10255065585342, 9.48751297655077, 8.51198576785003, 9.46310532206947,
> > 9.86727270762806, 11.5149248124739, 9.31557156735022, 9.34351230206303,
> > 10.022139448869, 11.4111350893792, 8.57891783464065, 10.3761090924661,
> > 9.38300408584683, 9.33694577526158, 9.25815555686085, 9.29856853889735,
> > 8.4250073823245, 8.83022950824832, 9.1510846172981, 10.2553042376765,
> > 10.0739540955956, 9.04955917463259, 10.8927827168631, 9.44611041690694,
> > 10.7883395708593, 10.6010088332078, 7.72560864006592, 10.1760839916637,
> > 11.5576569894392, 11.384809257294, 8.73504353987083, 9.00585942714512,
> > 9.62327893504013, 10.3527072699866, 10.5220100705827, 8.74921668696853,
> > 8.56415116683662, 12.1348451793815, 10.9496674323819, 9.64443817181322,
> > 9.52977454697087, 10.4281877186725, 8.52701721410292, 11.6911584965782,
> > 10.2300108250139, 8.65368821276485, 11.7733431942379, 10.2060233777681,
> > 9.57291673029552, 9.82687667895106, 10.5939736188493, 11.2510605726337,
> > 10.3383384488323, 9.92301237292945, 10.0164623230529, 10.4939857044034,
> > 10.5631769648289, 10.935731043532, 11.0659359187168, 8.51697010486427,
> > 9.79512310587405, 9.35132038807071, 11.3286703149903, 10.4621597293933,
> > 10.4099459919071, 8.86246315190942, 9.30054044639769, 9.40346575227191,
> > 9.59278722974697)), row.names = c(NA, -260L), class = "data.frame")
> >
> >
> >
> >
> >
> > From: Subhamitra Patra <mailto:subhamitra.patra using gmail.com>
> > Sent: Friday, September 13, 2019 3:59 PM
> > To: PIKAL Petr <mailto:petr.pikal using precheza.cz>; r-help mailing list
> > <mailto:r-help using r-project.org>
> > Subject: Re: [R] Query about calculating the monthly average of daily
> data
> > columns
> >
> > Dear PIKAL,
> >
> > Thank you very much for your suggestion.
> >
> > I tried your previous suggested code and getting the average value for
> > each month for both country A, and B. But in your recent email, you are
> > suggesting not to change the date column to real date. If I am going
> > through your recently suggested code, i.e.
> >
> >  "aggregate(value column, list(format(date column, "%m.%Y"), country
> > column), mean)"
> >
> > I am getting an Error that "aggregate(value, list(format(date, "%m.%Y"),
> > country), mean) : object 'value' not found".
> >
> > Here, my query "may I need to define the date column, country column, and
> > value column separately?"
> >
> > Further, I need something the average value result like below in the data
> > frame
> >
> > Month       Country A   Country B
> > Jan 1994    26.66         35.78
> > Feb 1994    26.13         29.14
> >
> > so that it will be easy for me to export to excel, and to use for the
> > further calculations.
> >
> > Please suggest me in this regard.
> >
> > Thank you.
> >
> >
> >
> >
> >
> >
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> > Sender notified by
> >
> >
> https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5&
> > 09/13/19, 07:22:53 PM
> >
> >
> >
> > On Fri, Sep 13, 2019 at 7:03 PM PIKAL Petr <mailto:
> petr.pikal using precheza.cz>
> > wrote:
> > Hi
> >
> > I am almost 100% sure that you would spare yourself much trouble if you
> > changed your date column to real date
> >
> > ?as.Date
> >
> > reshape your wide format to long one
> > library(reshape2)
> > ?melt
> >
> > to get 3 column data.frame with one date column, one country column and
> > one value column
> >
> > use ?aggregate and ?format to get summary value
> >
> > something like
> > aggregate(value column, list(format(date column, "%m.%Y"), country
> > column), mean)
> >
> > But if you insist to scratch your left ear with right hand accross your
> > head, you could continue your way.
> >
> > Cheers
> > Petr
> >
> > > -----Original Message-----
> > > From: R-help <mailto:r-help-bounces using r-project.org> On Behalf Of
> > Subhamitra
> > > Patra
> > > Sent: Friday, September 13, 2019 3:20 PM
> > > To: Jim Lemon <mailto:drjimlemon using gmail.com>; r-help mailing list
> > <r-help using r-
> > > http://project.org>
> > > Subject: Re: [R] Query about calculating the monthly average of daily
> > data
> > > columns
> > >
> > > Dear Sir,
> > >
> > > Yes, I understood the logic. But, still, I have a few queries that I
> > mentioned
> > > below your answers.
> > >
> > > "# if you only have to get the monthly averages, it can be done this
> way
> > > > spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > > >
> > > > B. Here, I need to define the no. of months, and years separately,
> > right?
> > > > or else what 2, and 3 (in bold) indicates?
> > > >
> > >
> > > To get the grouping variable of sequential months that you want, you
> only
> > > need the month and year values of the dates in the first column. First
> I
> > used
> > > the "strsplit" function to split the date field at the hyphens, then
> used
> > > "sapply" to extract ("[") the second (month) and *third (year)* parts
> as
> > two
> > > new columns. Because you have more than one year of data, you need the
> > > year values or you will group all Januarys, all Februarys and so on.
> > > Notice how I pass both of the new columns as a list (a data frame is a
> > type of
> > > list) in the call to get the mean of each month.
> > >
> > > 1. Here, as per my understanding, the "3" indicates the 3rd year,
> right?
> > > But, you showed an average for 2 months of the same year. Then, what
> "3"
> > > in the  spdat$year object indicate?
> > >
> > >
> > > C. From this part, I got the exact average values of both January and
> > > > February of 1994 for country A, and B. But, in code, I have a query
> > > > that I need to define  spdat$returnA, and  spdat$returnB separately
> > > > before writing this code, right? Like this, I need to define for each
> > > > 84 countries separately with their respective number of months, and
> > > > years before writing this code, right?
> > > >
> > >
> > > I don't think so. Because I don't know what your data looks like, I am
> > > guessing that for each row, it has columns for each of the 84
> countries.
> > I
> > > don't know what these columns are named, either. Maybe:
> > >
> > > date             Australia   Belarus   ...    Zambia
> > > 01/01/1994   20             21                 22
> > > ...
> > >
> > > Here, due to my misunderstanding about the code, I was wrong. But, what
> > > data structure you guessed, it is absolutely right that for each row, I
> > have
> > > columns for each of the 84 countries. So, I think, I need to define the
> > date
> > > column with no. of months, and years once for all the countries.
> > > Therefore, I got my answer to the first and third question in the
> > previous
> > > email (what you suggested) that I no need to define the column of each
> > > country, as the date, and no. of observations are same for all
> countries.
> > > But, the no. of days are different for each month, and similarly, for
> > each
> > > year. So, I think I need to define date for each year separately.
> > Hence, I have
> > > given an example of 12 months, for 2 years (i.e. 1994, and 1995), and
> > have
> > > written the following code. Please correct me in case I am wrong.
> > >
> > >  spdat<-data.frame(
> > >
> > >
> >
> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(r
> > > ep(1,21),rep(2,20),
> > > rep(3,23), rep(4,21),
> > >
> >
> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12
> > > ,22)
> > > ),rep(1994,260)
> > >  dates1=
> > >
> >
> paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,2
> > > 2),rep(2,20),
> > > rep(3,23), rep(4,20),
> > >
> >
> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12
> > > ,21)
> > > ),rep(1995,259) ,sep="-")
> > >
> > > Concerning the exporting of structure of the dataset to excel, I will
> > have
> > > 12*84 matrix. But, please suggest me the way to proceed for the large
> > > sample. I have mentioned below what I understood from your code. Please
> > > correct me if I am wrong.
> > > 1. I need to define the date for each year as the no. of days in each
> > month
> > > are different for each year (as mentioned in my above code). For
> > instance, in
> > > my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
> > > 2. Need to define the date column as character.
> > > 3. Need to define the monthly average for each month, and year. So, now
> > > code will be as follows.
> > >
> >
> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
> > >   %%%%As I need all months average sequentially.
> > > spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
> > >
> > > Here, this meaning of "3", I am really unable to get.
> > >
> > > 4. Need to define each country with each month and year as mentioned in
> > > the last part of your code.
> > >
> > > Please suggest me in this regard.
> > >
> > > Thank you.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > [image: Mailtrack]
> > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > > mpaign=signaturevirality5&>
> > > Sender
> > > notified by
> > > Mailtrack
> > > <https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_ca
> > > mpaign=signaturevirality5&>
> > > 09/13/19,
> > > 06:41:41 PM
> > >
> > > On Fri, Sep 13, 2019 at 4:24 PM Jim Lemon <mailto:drjimlemon using gmail.com
> >
> > wrote:
> > >
> > > > Hi Subhamitra,
> > > > I'll try to write my answers adjacent to your questions below.
> > > >
> > > > On Fri, Sep 13, 2019 at 6:08 PM Subhamitra Patra <
> > > > mailto:subhamitra.patra using gmail.com> wrote:
> > > >
> > > >> Dear Sir,
> > > >>
> > > >> Thank you very much for your suggestion.
> > > >>
> > > >> Yes, your suggested code worked. But, actually, I have data from 3rd
> > > >> January 1994 to 3rd August 2017 for very large (i.e. for 84
> > > >> countries) sample. From this, I have given the example of the years
> > > >> up to 2000. Before applying the same code for the long 24 years, I
> > > >> want to learn the logic behind the code. Actually, some part of the
> > > >> code is not understandable to me which I mentioned in the bold
> letter
> > as
> > > follows.
> > > >>
> > > >> "spdat<-data.frame(
> > > >>
> >  dates=paste(c(1:30,1:28),c(rep(1,30),rep(2,28)),rep(1994,58),sep="-"),
> > > >>   returnA=sample(*15:50*,58,TRUE),returnB=sample(*10:45*,58,TRUE))"
> > > >>
> > > >> A. Here, I need to define the no. of days in a month, and the no. of
> > > >> countries name separately, right? But, what is meant by 15:50, and
> > > >> 10:45 in return A, and B respectively?
> > > >>
> > > >
> > > > To paraphrase Donald Trump, this is FAKE DATA! I have no idea what
> the
> > > > real values of return are, so I made them up using the "sample"
> > function.
> > > > However, this is not meant to mislead anyone, just to show how
> > > > whatever numbers are in your data can be used in calculations. The
> > > > colon (":") operator creates a sequence of numbers starting with the
> > > > one to the left and ending with the one to the right.
> > > >
> > > >>
> > > >> "# if you only have to get the monthly averages, it can be done this
> > > >> way
> > > >> spdat$month<-sapply(strsplit(spdat$dates,"-"),"["*,2*)
> > > >> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",*3*)"
> > > >>
> > > >> B. Here, I need to define the no. of months, and years separately,
> > right?
> > > >> or else what 2, and 3 (in bold) indicates?
> > > >>
> > > >
> > > > To get the grouping variable of sequential months that you want, you
> > > > only need the month and year values of the dates in the first column.
> > > > First I used the "strsplit" function to split the date field at the
> > > > hyphens, then used "sapply" to extract ("[") the second (month) and
> > > > third (year) parts as two new columns. Because you have more than one
> > > > year of data, you need the year values or you will group all
> Januarys,
> > > > all Februarys and so on. Notice how I pass both of the new columns as
> > > > a list (a data frame is a type of
> > > > list) in the call to get the mean of each month.
> > > >
> > > >>
> > > >> "# get the averages by month and year - is this correct?
> > > >> monthlyA<-by(*spdat$returnA*,spdat[,c("month","year")],mean)
> > > >> monthlyB<-by(*spdat$returnB*,spdat[,c("month","year")],mean)"
> > > >>
> > > >> C. From this part, I got the exact average values of both January
> and
> > > >> February of 1994 for country A, and B. But, in code, I have a query
> > > >> that I need to define  spdat$returnA, and  spdat$returnB separately
> > > >> before writing this code, right? Like this, I need to define for
> each
> > > >> 84 countries separately with their respective number of months, and
> > > >> years before writing this code, right?
> > > >>
> > > >
> > > > I don't think so. Because I don't know what your data looks like, I
> am
> > > > guessing that for each row, it has columns for each of the 84
> > > > countries. I don't know what these columns are named, either. Maybe:
> > > >
> > > > date             Australia   Belarus   ...    Zambia
> > > > 01/01/1994   20             21                 22
> > > > ...
> > > >
> > > >
> > > >> Yes, after obtaining the monthly average for each country's data, I
> > > >> need to use them for further calculations. So, I want to export the
> > > >> result to excel. But, until understanding the code, I think I
> willn't
> > > >> able to apply for the entire sample, and cannot be able to discuss
> > > >> the format of the resulted column to export to excel.
> > > >>
> > > >
> > > > Say that we perform the grouped mean calculation for the first two
> > > > country columns like this:
> > > > monmeans<-sapply(spdat[,2:3],by,spdat[,c("month","year")],mean)
> > > > monmeans
> > > >     Australia  Belarus
> > > > [1,]  29.70000 30.43333
> > > > [2,]  34.17857 27.39286
> > > >
> > > > We are presented with a 2x2 matrix of monthly means in just the
> format
> > > > someone might use for importing into Excel. The first row is January
> > > > 1994, the second February 1994 and so on. By expanding the columns to
> > > > include all the countries in your data, You should have the result
> you
> > want.
> > > >
> > > > Jim
> > > >
> > >
> > >
> > > --
> > > *Best Regards,*
> > > *Subhamitra Patra*
> > > *Phd. Research Scholar*
> > > *Department of Humanities and Social Sciences* *Indian Institute of
> > > Technology, Kharagpur*
> > > *INDIA*
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > mailto:R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-
> > > guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> > partnerů PRECHEZA a.s. jsou zveřejněny na:
> > https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> > about processing and protection of business partner’s personal data are
> > available on website:
> > https://www.precheza.cz/en/personal-data-protection-principles/
> > Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> > důvěrné a podléhají tomuto právně závaznému prohláąení o vyloučení
> > odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> > documents attached to it may be confidential and are subject to the
> legally
> > binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
> >
> >
> >
> > --
> > Best Regards,
> > Subhamitra Patra
> > Phd. Research Scholar
> > Department of Humanities and Social Sciences
> > Indian Institute of Technology, Kharagpur
> > INDIA
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> *Best Regards,*
> *Subhamitra Patra*
> *Phd. Research Scholar*
> *Department of Humanities and Social Sciences*
> *Indian Institute of Technology, Kharagpur*
> *INDIA*
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list