[R] monthly median in a daily dataset
Uwe Ligges
Thu Dec 23 11:59:16 CET 2010
On 21.12.2010 08:15, SNV Krishna wrote:
> Hi Dennis,
>
> I am looking for a similar function and this post is useful. But something
> strange happens when I try it, which I couldn't figure out (details below).
> Could you or anyone else help me understand why?
>
>> df = data.frame(date = seq(as.Date("2010-1-1"), by = "days", length = 250))
>> df$value = cumsum(rnorm(250))
>
> When I use the statement (as given in ?aggregate help file) the following
> error is displayed
>> aggregate(df$value, by = months(df$date), FUN = median)
> Error in aggregate.data.frame(as.data.frame(x), ...) :
> 'by' must be a list
The error message is quite helpful: 'by' needs a list containing all the
elements you would put after the "~" in a formula, in this case only the date:
aggregate(df$value, by = list(date = months(df$date)), FUN = median)
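A minimal sketch of the two call styles side by side, assuming toy data like that in the question. The seed and the names res_by and res_formula are illustrative additions, not part of the original post.

```r
# Toy data mirroring the question; the seed is an assumption for reproducibility
set.seed(1)
df <- data.frame(date = seq(as.Date("2010-01-01"), by = "day", length.out = 250))
df$value <- cumsum(rnorm(250))

# 'by' takes a list of grouping vectors; the list name becomes the column name
res_by <- aggregate(df$value, by = list(month = months(df$date)), FUN = median)

# Formula interface: the grouping term goes after the "~"
res_formula <- aggregate(value ~ months(date), data = df, FUN = median)

# Both interfaces should report the same medians, group for group
all.equal(res_by$x, res_formula$value)
```

Both interfaces sort the groups by month name rather than by calendar order, which is why April comes first in the output quoted in this thread.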
> But it works when I use as was suggested
>> aggregate(value~months(date), data = df, FUN = median)
>   months(date)      value
> 1        April 15.5721440
> 2       August -0.1261205
> 3     February -1.0230631
> 4      January -0.9277885
> 5         July -2.1890907
> 6         June  1.3045260
> 7        March 11.4126371
> 8          May  2.1625091
>
> My second question: is it possible to get the median for each month of each
> year? Say I have daily data for the last five years; the above function will
> give me the median of January across all five years, while I want Jan-2010,
> Jan-2009 and so on. I hope my question is clear.
Just use Year-Month as the grouping criterion as follows:
aggregate(x = df$value, by = list(date = format(df$date, "%Y-%m")), FUN = median)
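To see the Year-Month grouping separate the same month in different years, here is a small sketch on a hypothetical two-year daily series. The data, the seed and the column name ym are assumptions made for illustration.

```r
# Hypothetical two-year daily series, so every calendar month occurs twice
set.seed(42)
df <- data.frame(date = seq(as.Date("2009-01-01"), as.Date("2010-12-31"), by = "day"))
df$value <- cumsum(rnorm(nrow(df)))

# "%Y-%m" gives keys such as "2009-01"; as text they sort chronologically
df$ym <- format(df$date, "%Y-%m")
monthly <- aggregate(value ~ ym, data = df, FUN = median)

head(monthly, 3)
nrow(monthly)   # one row per year-month combination
```

Because the year comes first in the key, the rows come back in time order without any extra sorting step.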
Uwe Ligges
> Any assistance will be greatly appreciated and many thanks for the same.
>
> Regards,
>
> Krishna
>
>
Dennis Murphy<djmuser at gmail.com>
> Hi:
>
> There is a months() function associated with Date objects, so you should be
> able to do something like
>
> aggregate(value ~ months(date), data = data$flow$daily, FUN = median)
>
> Here's a toy example because your data are not in a ready form:
>
> df <- data.frame(date = seq(as.Date('2010-01-01'), by = 'days', length = 250),
>                  val = rnorm(250))
>> aggregate(val ~ months(date), data = df, FUN = median)
>   months(date)         val
> 1        April -0.18864817
> 2       August -0.16203705
> 3     February  0.03671700
> 4      January  0.04500988
> 5         July -0.12753151
> 6         June  0.09864811
> 7        March  0.23652105
> 8          May  0.25879994
> 9    September  0.53570764
>
> HTH,
> Dennis
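The original question also asked for missing values to be dropped and for a plot. A possible sketch, assuming the same kind of toy data with a few NA values injected by hand: the formula interface of aggregate() omits rows with NA by default (na.action = na.omit), and grouping on the two-digit month number keeps the months in calendar order. The column name month and the choice of barplot() are illustrative.

```r
# Toy data with a few artificial gaps (the NA positions are an assumption)
set.seed(123)
df <- data.frame(date = seq(as.Date("2010-01-01"), by = "day", length.out = 250),
                 val  = rnorm(250))
df$val[c(5, 40, 200)] <- NA

# Two-digit month number ("01" ... "12") sorts in calendar order
df$month <- format(df$date, "%m")

# The formula interface drops rows with NA before computing the medians
med <- aggregate(val ~ month, data = df, FUN = median)

# Label the bars with abbreviated month names
month_labels <- month.abb[as.integer(as.character(med$month))]
barplot(med$val, names.arg = month_labels, ylab = "median of val")
```

With the x/by interface the NA values are kept, so there one would pass na.rm = TRUE through to median() instead.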
>
On Sun, Dec 19, 2010 at 2:31 PM, HUXTERE<emilyhuxter at gmail.com> wrote:
>
>>
>> Hello,
>>
>> I have a multi-year dataset (see below) with date, a data value and a flag
>> for the data value. I want to find the monthly median for each month in
>> this dataset and then plot it. If anyone has suggestions they would be
>> greatly appreciated. It should be noted that there are some dates with no
>> values and they should be removed.
>>
>> Thanks
>> Emily
>>
>>> print ( str(data$flow$daily) )
>> 'data.frame': 16071 obs. of 3 variables:
>> $ date :Class 'Date' num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
>> $ value: num NA NA NA NA NA NA NA NA NA NA ...
>> $ flag : chr "" "" "" "" ...
>> NULL
>>
>> 520 2008-11-01 0.034
>> 1041 2008-11-02 0.034
>> 1562 2008-11-03 0.034
>> 2083 2008-11-04 0.038
>> 2604 2008-11-05 0.036
>> 3125 2008-11-06 0.035
>> 3646 2008-11-07 0.036
>> 4167 2008-11-08 0.039
>> 4688 2008-11-09 0.039
>> 5209 2008-11-10 0.039
>> 5730 2008-11-11 0.038
>> 6251 2008-11-12 0.039
>> 6772 2008-11-13 0.039
>> 7293 2008-11-14 0.038
>> 7814 2008-11-15 0.037
>> 8335 2008-11-16 0.037
>> 8855 2008-11-17 0.037
>> 9375 2008-11-18 0.037
>> 9895 2008-11-19 0.034 B
>> 10415 2008-11-20 0.034 B
>> 10935 2008-11-21 0.033 B
>> 11455 2008-11-22 0.034 B
>> 11975 2008-11-23 0.034 B
>> 12495 2008-11-24 0.034 B
>> 13016 2008-11-25 0.034 B
>> 13537 2008-11-26 0.033 B
>> 14058 2008-11-27 0.033 B
>> 14579 2008-11-28 0.033 B
>> 15068 2008-11-29 0.034 B
>> 15546 2008-11-30 0.035 B
>> 521 2008-12-01 0.035 B
>> 1042 2008-12-02 0.034 B
>> 1563 2008-12-03 0.033 B
>> 2084 2008-12-04 0.031 B
>> 2605 2008-12-05 0.031 B
>> 3126 2008-12-06 0.031 B
>> 3647 2008-12-07 0.032 B
>> 4168 2008-12-08 0.032 B
>> 4689 2008-12-09 0.032 B
>> 5210 2008-12-10 0.033 B
>> 5731 2008-12-11 0.033 B
>> 6252 2008-12-12 0.032 B
>> 6773 2008-12-13 0.031 B
>> 7294 2008-12-14 0.030 B
>> 7815 2008-12-15 0.030 B
>> 8336 2008-12-16 0.029 B
>> 8856 2008-12-17 0.028 B
>> 9376 2008-12-18 0.028 B
>> 9896 2008-12-19 0.028 B
>> 10416 2008-12-20 0.027 B
>> 10936 2008-12-21 0.027 B
>> 11456 2008-12-22 0.028 B
>> 11976 2008-12-23 0.028 B
>> 12496 2008-12-24 0.029 B
>> 13017 2008-12-25 0.029 B
>> 13538 2008-12-26 0.029 B
>> 14059 2008-12-27 0.030 B
>> 14580 2008-12-28 0.030 B
>> 15069 2008-12-29 0.030 B
>> 15547 2008-12-30 0.031 B
>> 15851 2008-12-31 0.031 B
>
Paolo Rossi<statmailinglists at googlemail.com>
> I would like to know how to turn a variable into a string. I have tried
> as.symbol and as.name but they don't work for what I'd like to do.
>
> Essentially, I'd like to feed the function below with two variables. This
> works fine in the bit working out number of elements in each variable.
>
> In the print(sprintf("OK with %s and %s\n", var1, var2)) line I would like
> var1 and var2 to be magically substituted with a string containing the name
> of var1 and name of var2.
>
> Thanks in advance
>
> Paolo
>
>
>
> haveSameLength <- function(var1, var2) {
>   if (length(var1) == length(var2)) {
>     print(sprintf("OK with %s and %s\n", var1, var2))
>   } else {
>     print("Problems!!")
>   }
> }
>
>
>
>
Phil Spector<spector at stat.berkeley.edu>
>
> Paolo -
> One way to make the function do what you want is to replace
> the line
>
> print(sprintf("OK with %s and %s\n", var1, var2))
>
> with
>
> cat('OK with',substitute(var1),'and',substitute(var2),'\n')
>
> With sprintf, you'd need
>
> print(sprintf("OK with %s and %s\n", deparse(substitute(var1)),
> deparse(substitute(var2))))
>
> but since you're just printing the string returned by sprintf, I'd
> go with cat.
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
spector at stat.berkeley.edu
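Putting the suggestion together, a sketch of the whole function might look like the following. The invisible return value, the local names name1, name2 and same, and the sample vectors a and b are additions for illustration. deparse() is used here because it also copes with arguments that are calls such as x[1:3], which cat() on a bare substitute() result may not handle.

```r
haveSameLength <- function(var1, var2) {
  # Recover the expressions the caller typed, as character strings
  name1 <- paste(deparse(substitute(var1)), collapse = " ")
  name2 <- paste(deparse(substitute(var2)), collapse = " ")

  same <- length(var1) == length(var2)
  if (same) {
    cat(sprintf("OK with %s and %s\n", name1, name2))
  } else {
    cat("Problems!!\n")
  }
  invisible(same)  # return the result quietly so callers can test it
}

a <- 1:5
b <- letters[1:5]
haveSameLength(a, b)   # prints: OK with a and b
```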
>
>
Duncan Murdoch<murdoch.duncan at gmail.com>
On 19/12/2010 7:21 PM, Paolo Rossi wrote:
>> In the print(sprintf("OK with %s and %s\n", var1, var2)) line I would like
>> var1 and var2 to be magically substituted with a string containing the name
>> of var1 and name of var2.
>
> The name of var1 is var1, so I assume you mean the expression passed to
> your function and bound to var1. In that case, what you want is
>
> deparse(substitute(var1))
>
> Watch out: if the expression is really long, that can be a vector with
> more than one element. See ?deparse for ways to deal with that.
>
> Duncan Murdoch
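One way to guard against the multi-element case is to collapse the pieces into a single string. This is only a sketch; the helper name argLabel and the long example call are made up for illustration, and ?deparse describes other options such as a larger width.cutoff.

```r
# Hypothetical helper: always returns one string for the supplied expression
argLabel <- function(x) {
  pieces <- deparse(substitute(x))      # may have length > 1 for long expressions
  paste(trimws(pieces), collapse = " ") # join the pieces into a single label
}

# The argument is never evaluated, so 'height' does not need to exist
short <- argLabel(height)

# A long call is split over several elements by deparse() ...
long_call <- quote(c(alpha_value + beta_value, gamma_value * delta_value,
                     epsilon_value / zeta_value, eta_value - theta_value))
pieces <- deparse(long_call)
length(pieces)

# ... but the helper still yields a single string
long <- argLabel(c(alpha_value + beta_value, gamma_value * delta_value,
                   epsilon_value / zeta_value, eta_value - theta_value))
length(long)
```

Recent versions of R (4.0.0 and later) also provide deparse1(), which performs a similar collapse.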
>
Duncan Murdoch<murdoch.duncan at gmail.com>
On 17/12/2010 4:36 PM, Jeff Breiwick wrote:
>> All,
>>
>> I had a simple function call I used to open up a dos shell running R under
>> Win XP:
>> system("cmd.exe", wait=FALSE, invisible=FALSE).
>>
>> This does not work with R 2.12.1 - I get a window that briefly flashes open
>> but then disappears. Does anyone know how to open a DOS command window
>> from a running R session under Win XP? Thank you.
>
> This is a new bug in 2.12.1, which I am about to fix in R-patched. The
> problem was that it was passing a null input stream to cmd.exe, which
> saw an immediate EOF, and quit. A similar thing happened in Rterm,
> where system("cmd") should drop into a command shell in the same window,
> but it would immediately exit.
>
> Duncan Murdoch
>
>
>
Dennis Murphy<djmuser at gmail.com>
>
> Hi:
>
> You can get a violin plot in lattice rather straightforwardly. It's easiest
> if time is an ordered factor, but you can also do it if time is numeric; in
> the latter case, the code associated with Figure 10.14 in the Lattice book
> provides a template to start with:
> http://lmdvr.r-forge.r-project.org/figures/figures.html
>
> To get horizontal violin plots, use time as the y variable and start by
> replacing panel.boxplot with panel.violin; see the help page of the latter
> if more specific options are required. It also contains an example using a
> panel function.
>
> I don't know how you expect to get horizontal histograms without setting the
> time variable to be a factor. If you have enough time periods, the result
> will not be pretty. If you have a fairly large number of time periods, the
> best distributional displays are boxplots, violin plots, beanplots or some
> variation of that general concept.
>
> Since neither data nor code were offered, one can only speculate so far as
> to what your intentions might be. A reproducible example with data and code
> would undoubtedly elicit more useful responses.
>
> HTH,
> Dennis
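In the absence of the poster's data, a rough sketch of the lattice approach on simulated values could look like this. The data frame sim, its column names and the simulated shapes are assumptions; only bwplot() and panel.violin() come from lattice itself.

```r
library(lattice)

# Simulated data: four time periods drifting from unimodal to bimodal
set.seed(7)
n <- 200
sim <- data.frame(
  time = factor(rep(1:4, each = n)),
  y = c(rnorm(n),
        rnorm(n, sd = 1.5),
        rnorm(n, mean = rep(c(-2, 2), length.out = n)),
        rnorm(n, mean = rep(c(-3, 3), length.out = n)))
)

# Time on the left-hand side of the formula gives horizontal violins
p <- bwplot(time ~ y, data = sim, panel = panel.violin,
            xlab = "value", ylab = "time period")
print(p)
```

Note that panel.violin() draws kernel density estimates, so this illustrates the suggestion above rather than the histogram-style display originally asked for.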
>
>
On Sun, Dec 19, 2010 at 4:03 PM, Enrico R. Crema wrote:
>
>> Dear List,
>>
>> I have a set of distributions recorded at an equal interval of time and I
>> would like to plot them as a series of horizontal histograms (with the
>> x-axis representing time, and y-axis representing the bins) since the
>> distribution shifts from unimodal to multimodal on several occasions. What
>> I would like to see is something close to a violinplot, but I do not want a
>> kernel density estimate...
>>
>> Thanks in Advance,
>> Enrico
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
>
> Many Thanks Dennis,
>
> The distributions are simulated ordinal data, all bounded by the same upper
> and lower limits, and I wanted to plot how the distribution changes through
> time. Since the distributions are often multimodal, boxplots were not useful,
> so I made some violinplots... My practical solution, which I'm testing right
> now, is to create a matrix of frequencies and then plot these as a series of
> horizontal barplots (after normalising each distribution), using the
> offset parameter to control the temporal sequence... It actually works fine,
> but I was wondering if there were better ways...
>
>
> Enrico
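The frequency-matrix idea can be sketched in base graphics roughly as follows. This version draws the bars with rect() instead of barplot() with an offset, which is a substitution made here for brevity; the simulated ordinal data and all object names (bins, n_time, sim_one, freq, rel) are assumptions.

```r
# Simulated ordinal data on bins 1..10 at 8 time steps,
# drifting from one mode to two (purely illustrative)
set.seed(11)
bins   <- 1:10
n_time <- 8
sim_one <- function(step) {
  w <- (step - 1) / (n_time - 1)   # 0 = unimodal, 1 = bimodal
  p <- (1 - w) * dnorm(bins, 5.5, 1.2) +
       w * (dnorm(bins, 3, 1) + dnorm(bins, 8, 1)) / 2
  sample(bins, 300, replace = TRUE, prob = p)
}
draws <- lapply(seq_len(n_time), sim_one)

# Frequency matrix: rows are bins, columns are time steps
freq <- sapply(draws, function(d) table(factor(d, levels = bins)))

# Normalise each column so its tallest bar spans 0.9 of a time step
rel <- sweep(freq, 2, apply(freq, 2, max), "/") * 0.9

# One horizontal histogram per time step, shifted along the x-axis
plot(NA, xlim = c(1, n_time + 1), ylim = range(bins) + c(-0.5, 0.5),
     xlab = "time", ylab = "bin")
for (step in seq_len(n_time)) {
  rect(xleft = step, ybottom = bins - 0.5,
       xright = step + rel[, step], ytop = bins + 0.5,
       col = "grey70", border = "white")
}
```

Scaling by the column maximum keeps neighbouring histograms from overlapping; scaling by the column sum would instead make bar lengths comparable as proportions.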
>
>
>
Jorge Ivan Velez<jorgeivanvelez at gmail.com>
>
> Hi Enrico,
>
> Is this close to what you want to do?
>
> http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=109
>
> HTH,
> Jorge
>
>
Bill.Venables at csiro.au
> I find this function useful for digging out months from Date objects
>
> Month<- function(date, ...)
> factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)
>
> For the little data set in the original post this is what it gives
>
>> with(data, tapply(value, Month(date), median, na.rm = TRUE))
>   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
>    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 0.035 0.030
>
> Here is another useful little one:
>
> Year<- function(date, ...)
> as.POSIXlt(date)$year + 1900
>
> So if you wanted the median by year and month you could do
>
>> with(data, tapply(value, list(Year(date), Month(date)), median, na.rm = TRUE))
>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct   Nov  Dec
> 2008  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 0.035 0.03
>
> (The result is a matrix, which in this case has only one row, of course.)
>
> See how you go.
>
> Bill Venables.
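For anyone wanting to try the two helpers without the full dataset, here is a self-contained sketch on a few made-up rows. The data frame dat and its values are hypothetical, and the conversion to a long data frame at the end is an optional extra not taken from the message above.

```r
# Helpers as given above
Month <- function(date, ...)
  factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)
Year <- function(date, ...)
  as.POSIXlt(date)$year + 1900

# A handful of made-up observations across two years, with one gap
dat <- data.frame(
  date  = as.Date(c("2008-11-01", "2008-11-02", "2008-11-03",
                    "2008-12-01", "2008-12-02", "2009-11-01")),
  value = c(0.034, 0.038, NA, 0.035, 0.031, 0.050)
)

# Year-by-month matrix of medians; cells without data are NA
tab <- with(dat, tapply(value, list(Year(date), Month(date)), median, na.rm = TRUE))
tab

# Optional: long format with the empty cells dropped
long <- na.omit(as.data.frame(as.table(tab), responseName = "median"))
long
```

Because Month() returns a factor with all twelve levels, the columns always appear in calendar order, even for months that have no data.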
