[R] Struggling with zoo and aggregate
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Mar 26 13:22:27 CEST 2012
On Sun, Mar 25, 2012 at 10:20 PM, Thomas Adams <thomas.adams at noaa.gov> wrote:
> Gabor,
>
> Thank you for your help -- it did help me a lot. However, with my data:
>
> lead_time cycle r_squared fcst_date
> 1 6 0 5.405095e-02 07/31/2010
> 2 12 0 5.521620e-06 07/31/2010
> 3 18 0 1.565910e-04 07/31/2010
> 4 24 0 8.646822e-02 07/31/2010
> 5 30 0 1.719604e-02 07/31/2010
> 6 36 0 5.768113e-04 07/31/2010
> 7 42 0 2.501269e-06 07/31/2010
> 8 48 0 6.451727e-02 07/31/2010
> 9 6 12 2.857931e-01 07/31/2010
> 10 12 12 1.138635e-01 07/31/2010
> 11 18 12 2.225503e-02 07/31/2010
> 12 24 12 1.182031e-03 07/31/2010
> 13 30 12 8.841142e-04 07/31/2010
> 14 36 12 1.082490e-01 07/31/2010
> 15 42 12 1.502887e-05 07/31/2010
> 17 6 0 8.689588e-02 08/01/2010
> 18 12 0 5.884336e-04 08/01/2010
> 19 18 0 2.219316e-07 08/01/2010
> 20 24 0 3.960752e-02 08/01/2010
> 21 30 0 1.087413e-04 08/01/2010
> 23 42 0 3.583030e-07 08/01/2010
> 24 48 0 2.907109e-05 08/01/2010
> 25 6 12 8.693451e-02 08/01/2010
> 26 12 12 3.208215e-02 08/01/2010
> 27 18 12 0.000000e+00 08/01/2010
> 28 6 0 2.962669e-02 08/02/2010
> 29 6 12 2.363506e-05 08/02/2010
> 30 12 12 9.050178e-03 08/02/2010
>
> from:
>
>> z <- read.zoo(q,index = 4, FUN = as.yearmon, format = "%m/%d/%Y",aggregate
>> = mean)
>
> I get:
>> z
> lead_time cycle r_squared
> Jul 2010 25.60000 5.600000 0.05034771
> Aug 2010 18.46154 4.615385 0.02191903
>
> what I need is to NOT have the lead_time and cycle averaged, but only have
> the r_squared values averaged by lead_time and cycle. I can not seem to
> figure out the correct syntax to do this. I assume I use something like:
>
> q_agg<-aggregate(q,by=list(q$lead_time,q$cycle),index = 4, FUN = as.yearmon,
> format = "%m/%d/%Y")
>
> but I get errors or nonsense when I follow with...
>
> z <- read.zoo(q_agg,index = 4, FUN = as.yearmon, format =
> "%m/%d/%Y",aggregate = mean)
>
> or some variation of this.
>
> Regards,
> Tom
>
>
>
> On Sat, Mar 24, 2012 at 10:58 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>>
>> On Sat, Mar 24, 2012 at 10:44 PM, Thomas Adams <thomas.adams at noaa.gov>
>> wrote:
>> > All:
>> >
>> > I have a SQlite database where I have stored some verification data by
>> > date
>> > & time (cycle Z/UTC), lead_time as well as type, duration, etc. I would
>> > like to analyze & plot the data as monthly averages. I have looked at a
>> > bunch of examples which use some combination of zoo and aggregate, but I
>> > have not been able to successfully apply bits and pieces from the
>> > examples
>> > I have found. Any help is appreciated. BTW, I calculate mae (mean
>> > absolute
>> > error), mse (mean squared error), me (mean error), and other measures
>> > obtained by using the R verification package.
>> >
>> > The example below is limited to 20 records and shows lead_time,
>> > r_squared,
>> > (forecast) cycle, fcst_date (forecast date) -- the full data set is just
>> > over 2 years of daily data with 3 forecast cycles (00Z, 12Z, and 18Z)
>> > daily.
>> >
>> > From my query (below), how do I construct an appropriate data structure
>> > to
>> > analyze & plot the data as monthly averages?
>> >
>> > Regards,
>> > Tom
>> >
>> >> q<-dbGetQuery(con,"select lead_time,r_squared,cycle,fcst_date from
>> > verify_table where duration=6 limit 20")
>> >> q
>> > lead_time r_squared cycle fcst_date
>> > 1 6 5.405095e-02 00 07/31/2010
>> > 2 12 5.521620e-06 00 07/31/2010
>> > 3 18 1.565910e-04 00 07/31/2010
>> > 4 24 8.646822e-02 00 07/31/2010
>> > 5 30 1.719604e-02 00 07/31/2010
>> > 6 36 5.768113e-04 00 07/31/2010
>> > 7 42 2.501269e-06 00 07/31/2010
>> > 8 48 6.451727e-02 00 07/31/2010
>> > 9 6 2.857931e-01 12 07/31/2010
>> > 10 12 1.138635e-01 12 07/31/2010
>> > 11 18 2.225503e-02 12 07/31/2010
>> > 12 24 1.182031e-03 12 07/31/2010
>> > 13 30 8.841142e-04 12 07/31/2010
>> > 14 36 1.082490e-01 12 07/31/2010
>> > 15 42 1.502887e-05 12 07/31/2010
>> > 16 48 NA 12 07/31/2010
>> > 17 6 8.689588e-02 00 08/01/2010
>> > 18 12 5.884336e-04 00 08/01/2010
>> > 19 18 2.219316e-07 00 08/01/2010
>> > 20 24 3.960752e-02 00 08/01/2010
>> >
>>
>> Try this:
>>
>> Lines <- "lead_time r_squared cycle fcst_date
>> 1 6 5.405095e-02 00 07/31/2010
>> 2 12 5.521620e-06 00 07/31/2010
>> 3 18 1.565910e-04 00 07/31/2010
>> 4 24 8.646822e-02 00 07/31/2010
>> 5 30 1.719604e-02 00 07/31/2010
>> 6 36 5.768113e-04 00 07/31/2010
>> 7 42 2.501269e-06 00 07/31/2010
>> 8 48 6.451727e-02 00 07/31/2010
>> 9 6 2.857931e-01 12 07/31/2010
>> 10 12 1.138635e-01 12 07/31/2010
>> 11 18 2.225503e-02 12 07/31/2010
>> 12 24 1.182031e-03 12 07/31/2010
>> 13 30 8.841142e-04 12 07/31/2010
>> 14 36 1.082490e-01 12 07/31/2010"
>>
>> library(zoo)
>> q <- read.table(text = Lines)
>>
>> z <- read.zoo(q, index = 4, FUN = as.yearmon, format = "%m/%d/%Y",
>> aggregate = mean)
>> plot(z)
>>
>> See the 5 vignettes that come with zoo as well as ?read.zoo, ?plot.zoo
>> and ?xyplot.zoo
>>
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>
>
>
>
> --
>
> Thomas E Adams
> National Weather Service
> Ohio River Forecast Center
> 1901 South State Route 134
> Wilmington, OH 45177
>
> EMAIL: thomas.adams at noaa.gov
>
> VOICE: 937-383-0528
> FAX: 937-383-0033
>
>
Regarding the revised question:

library(zoo)     # yearmon
library(chron)   # chron
library(ggplot2) # qplot

# Lines is the character string defined in the earlier message quoted above
q <- read.table(text = Lines, as.is = TRUE)

# aggregate by lead_time, cycle and year/month
q.ag <- aggregate(r_squared ~ .,
                  transform(q, fcst_date =
                            as.Date(as.yearmon(chron(fcst_date)))),
                  mean)

# plot in cycle by lead_time grid
qplot(fcst_date, r_squared, data = q.ag) + facet_grid(cycle ~ lead_time)

See ?aggregate and the ggplot2 package documentation.
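
For readers without chron or ggplot2 installed, the same grouping idea can be sketched in base R alone. This is only an illustration, not a replacement for the code above: the data frame holds eight rows copied from the data posted in this thread, the `month` column is a name introduced here for the example, and the month key is a plain "YYYY-MM" string rather than a yearmon. Only r_squared is averaged; lead_time and cycle act as grouping variables.

```r
# Eight rows taken from the data posted in the thread (rows 1, 9, 17, 25, 26, 28, 29, 30)
q <- data.frame(
  lead_time = c(6, 6, 6, 6, 12, 6, 6, 12),
  cycle     = c(0, 12, 0, 12, 12, 0, 12, 12),
  r_squared = c(5.405095e-02, 2.857931e-01, 8.689588e-02, 8.693451e-02,
                3.208215e-02, 2.962669e-02, 2.363506e-05, 9.050178e-03),
  fcst_date = c("07/31/2010", "07/31/2010", "08/01/2010", "08/01/2010",
                "08/01/2010", "08/02/2010", "08/02/2010", "08/02/2010"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: year-month key derived from the m/d/Y date string
q$month <- format(as.Date(q$fcst_date, format = "%m/%d/%Y"), "%Y-%m")

# Average r_squared within each lead_time / cycle / month combination.
# With the formula interface, rows whose r_squared is NA are dropped by default.
q.ag <- aggregate(r_squared ~ lead_time + cycle + month, data = q, FUN = mean)

print(q.ag)
```

Each row of q.ag is one lead_time / cycle / month combination, so for a fixed lead_time and cycle the rows form a monthly series that can be plotted or passed to read.zoo.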
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com