[R] "Holes" in a data frame with time intervals

jim holtman jholtman at gmail.com
Mon Mar 28 14:29:42 CEST 2011


Does this do what you want:

> x <- read.table(textConnection("B1          A15         30          2001-01-01  2001-10-15
+ B1          A15         28          2001-10-16  2001-12-31
+ B1          A15         32          2002-01-01  2003-04-18
+ B1          A15         33          2003-04-19  2004-12-31
+ B1          A15         29          2005-03-01  2010-12-31
+ B1          A15         30          2011-02-12  9999-12-31"))
> closeAllConnections()
> x$V4 <- as.Date(x$V4)
> x$V5 <- as.Date(x$V5)
> # create dataframe with intervals
> y <- rbind(cbind(x$V4, 1), cbind(x$V5 + 1, -1))
> y <- y[order(y[,1]),]
> y <- cbind(y, count = cumsum(y[,2]))
> y
                 count
 [1,]   11323  1     1
 [2,]   11611  1     2
 [3,]   11611 -1     1
 [4,]   11688  1     2
 [5,]   11688 -1     1
 [6,]   12161  1     2
 [7,]   12161 -1     1
 [8,]   12784 -1     0
 [9,]   12843  1     1
[10,]   14975 -1     0
[11,]   15017  1     1
[12,] 2932897 -1     0
> # find counts of zero to determine intervals
> missing <- which(y[,'count'] == 0)
> # remove any index equal to last one
> missing <- missing[missing != nrow(y)]
> # print intervals
> paste(as.Date(y[missing,1], origin = '1970-1-1')
+     , 'to'
+     , as.Date(y[missing + 1, 1] - 1, origin = '1970-1-1')
+     )
[1] "2005-01-01 to 2005-02-28" "2011-01-01 to 2011-02-11"
>
>


On Mon, Mar 28, 2011 at 7:56 AM,  <ANGELO.LINARDI at bancaditalia.it> wrote:
> Good morning,
>
>
>
> I am facing a problem very easy to solve with a program, but not too
> easy (at least IMHO) with a "declarative" approach.
>
> I have a dataframe df with some information about bank branches with a
> validity time associated (start date/end date, format YYYY-MM-DD) to
> some attributes (for example number of employees assigned).
>
>
>
> The following example will clarify this description:
>
>
>
> BANK_ID     BRANCH_ID   EMPLOYEE #  STARTDATE   ENDDATE
>
> B1          A15         30          2001-01-01  2001-10-15
>
> B1          A15         28          2001-10-16  2001-12-31
>
> B1          A15         32          2002-01-01  2003-04-18
>
> B1          A15         33          2003-04-19  2004-12-31
>
> B1          A15         29          2005-03-01  2010-12-31
>
> B1          A15         30          2011-02-12  9999-12-31
>
> ........................................................................
> ........................................................................
> .....................
>
>
>
> I would like to find the "missing time intervals" ("holes" - in the
> example 2005-01-01 to 2005-02-28 and from 2011-01-01 to 2011-02-11).
>
> The "programmer's way" would be:
>
>
>
> *         Sort the data by "key" + start date
>
> *         For each occurrence add 1 day to end date and compare the
> result with the start date of the following occurrence
>
>
>
> Can someone help me in finding a "declarative" way to do it ?
>
>
>
> Thank you in advance
>
>
>
> Angelo Linardi
>
>
>
>
>
>
>
>
>
>
> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e non
> comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che cio' non
> sia espressamente previsto da un accordo scritto.
> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La preghiamo di
> comunicarne via e-mail la ricezione al mittente e di distruggerne il contenuto. La
> informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi allegati
> potrebbe costituire reato. Grazie per la collaborazione.
> -- E-mails from the Bank of Italy are sent in good faith but they are neither binding on
> the Bank nor to be understood as creating any obligation on its part except where
> provided for in a written agreement. This e-mail is confidential. If you have received it
> by mistake, please inform the sender by reply e-mail and delete it from your system.
> Please also note that the unauthorized disclosure or use of the message or any
> attachments could be an offence. Thank you for your cooperation. **
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



More information about the R-help mailing list