[R] Looking for the first observation within the month

Gabor Grothendieck ggrothendieck at gmail.com
Sun May 27 17:15:13 CEST 2007


One additional simplification.  If we use simplify = FALSE then
tapply won't simplify its answer to numeric and we can
avoid using as.Date in the last solution:

 window(z, tapply(time(z), as.yearmon(time(z)), head, 1, simplify = FALSE))
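
To see what simplify = FALSE changes, here is a minimal sketch in base R only. It assumes the dates from the original question, and it uses format(d, "%Y-%m") as a stand-in for as.yearmon(time(z)) so that it runs without zoo. The names d, ym, simplified, kept and first_dates are made up for this illustration.

```r
# Dates from the original question, sorted ascending as time(z) would be.
d <- sort(as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
                    "2007-04-10", "2007-04-09", "2007-04-07",
                    "2007-03-05")))

# Stand-in for as.yearmon(time(z)): a "YYYY-MM" label for each date.
ym <- format(d, "%Y-%m")

# Default (simplify = TRUE): the per-month results are collapsed into a
# plain numeric array, so the Date class is lost.
simplified <- tapply(d, ym, head, 1)

# simplify = FALSE: the result stays a list holding one Date per month.
kept <- tapply(d, ym, head, 1, simplify = FALSE)

# Combine the list elements back into a single Date vector.
first_dates <- do.call(c, kept)
```

Here class(kept[[1]]) is "Date", whereas simplified carries no Date class, which is why the earlier solution needed as.Date.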


On 5/27/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Here is one additional solution, also using zoo.  Using z from
> the prior solution, as.yearmon(time(z)) is, as before, the year/month
> of each date, and tapply(time(z), as.yearmon(time(z)), head, 1)
> gets the first date within each month.  However, tapply converts the
> result to numeric, so we use as.Date to convert it back again.  Then
> we use window to select those dates.
>
> window(z, as.Date(tapply(time(z), as.yearmon(time(z)), head, 1)))
>
>
> On 5/27/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > Use the zoo package to represent data like this.
> >
> > Here time(z) is a vector of the dates and as.yearmon(time(z))
> > is the year/month of each date.  With FUN = head1, ave replaces each
> > date with the first date in its month, and aggregate then aggregates
> > over all values that share that date, choosing the first one.
> >
> > Lines <- "Date                    Observation
> >
> > 2007-05-23              20
> > 2007-05-22              30
> > 2007-05-21              10
> >
> > 2007-04-10              50
> > 2007-04-09              40
> > 2007-04-07              30
> >
> > 2007-03-05              10
> > "
> >
> > library(zoo)
> >
> > # z <- read.zoo("myfile.dat", header = TRUE)
> > z <- read.zoo(textConnection(Lines), header = TRUE)
> >
> > head1 <- function(x, n = 1) head(x, n)
> > aggregate(z, ave(time(z), as.yearmon(time(z)), FUN = head1), head1)
> >
> >
> > For more on zoo try:
> >
> > library(zoo)
> > vignette("zoo")
> >
> > and also read the Help Desk article in R News 4/1 about dates.
> >
> >
> >
> > On 5/27/07, Albert Pang <albert.pang at mac.com> wrote:
> > > Hi all, I have a simple data frame: the first column is a list of
> > > dates (in "%Y-%m-%d" format) and the second column is an observation
> > > on that particular date.  There might not be observations every day.
> > > Let's just say there are no observations on Saturdays and Sundays.
> > > Now I want to select the first observation of every month into a
> > > list.  Is there an easy way to do that?
> > >
> > > Date                    Observation
> > > ----                    -----------
> > > 2007-05-23              20
> > > 2007-05-22              30
> > > 2007-05-21              10
> > >
> > > 2007-04-10              50
> > > 2007-04-09              40
> > > 2007-04-07              30
> > >
> > > 2007-03-05              10
> > >
> > > The result I need is the data frame
> > >
> > > 2007-05-21              10
> > > 2007-04-07              30
> > > 2007-03-05              10
> > >
> > > or I am equally happy with just the vector c(10, 30, 10)
> > >
> > > I am new to R and after going through the manuals and the
> > > documentation I can gather, I have come up with a convoluted way of
> > > doing it
> > >
> > > 1)  I first get the Date into a vector.  (I am artificially
> > > reproducing this vector below and calling it A)
> > >
> > >  > A <- as.Date(c("2007-05-23", "2007-05-22", "2007-05-21", "2007-04-10",
> > >  +               "2007-04-09", "2007-04-07", "2007-03-05"))
> > >  > A
> > > [1] "2007-05-23" "2007-05-22" "2007-05-21" "2007-04-10" "2007-04-09"
> > > [6] "2007-04-07" "2007-03-05"
> > >
> > >
> > > 2)  use cut with breaks falling on the months
> > >
> > >  > B<-cut(A, breaks="month")
> > >  > B
> > > [1] 2007-05-01 2007-05-01 2007-05-01 2007-04-01 2007-04-01 2007-04-01
> > > [7] 2007-03-01
> > > Levels: 2007-03-01 2007-04-01 2007-05-01
> > >
> > >
> > > 3)  then split to get a list of vectors grouped by the month
> > > boundary of the date
> > >
> > >  > C<-split(A, B)
> > >  > C
> > > $`2007-03-01`
> > > [1] "2007-03-05"
> > >
> > > $`2007-04-01`
> > > [1] "2007-04-10" "2007-04-09" "2007-04-07"
> > >
> > > $`2007-05-01`
> > > [1] "2007-05-23" "2007-05-22" "2007-05-21"
> > >
> > >
> > > 4)  in a for loop I loop through the elements of the list (each
> > > element is a vector of dates); for each vector I find the minimum
> > > and concatenate it to a final vector D
> > >
> > >  > D<-numeric()
> > >  > for ( i in 1:length(C)){ D <- c( D, min(C[[i]]))}
> > >  > class(D)<-"Date"
> > >  > D
> > > [1] "2007-03-05" "2007-04-07" "2007-05-21"
> > >
> > > Next, with D, I go back and find the positions of the elements
> > > of D within A, and then use the result as an index vector into
> > > the vector of observations (which is not shown here).  I feel
> > > sure I am doing it the stupid way (or the procedural way).
> > >
> > > Is there a more declarative way of doing it?  Any pointers will be
> > > greatly appreciated!
> > >
> > > Thanks a lot in advance,
> > >
> > > Albert Pang
> >
>
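
For readers without zoo, the four procedural steps in the original question (cut, split, loop over minima, match back) can be condensed in base R. This is only a sketch under stated assumptions: the data frame is named df with columns Date and Observation, and the names month and first are made up for this illustration. It is not one of the zoo solutions given above.

```r
# Data from the original question (hypothetical data frame name: df).
df <- data.frame(
  Date = as.Date(c("2007-05-23", "2007-05-22", "2007-05-21",
                   "2007-04-10", "2007-04-09", "2007-04-07",
                   "2007-03-05")),
  Observation = c(20, 30, 10, 50, 40, 30, 10)
)

# Sort by date so the earliest row of each month comes first.
df <- df[order(df$Date), ]

# Label each row with its year and month, e.g. "2007-04".
month <- format(df$Date, "%Y-%m")

# Keep only the first row seen for each month label.
first <- df[!duplicated(month), ]

first$Observation
```

The rows come out in ascending date order (March, April, May), and first$Observation gives the plain vector of first observations that the question also asked for.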


