[R] simplify a dataframe

Arnaud Michel michel.arnaud at cirad.fr
Wed Jul 17 22:44:50 CEST 2013


Thanks, Arun and Rui, for your help.
Michel
On 17/07/2013 22:20, arun wrote:
> #or
> library(plyr)
> res<-ddply(df1,.(INDX),summarize,Debut=head(Debut,1),Fin=tail(Fin,1))
> res$INDX<-factor(res$INDX,levels=unique(df1$INDX))
> res[order(res$INDX),-1]
> #       Debut        Fin
> #3 24/01/1995 31/12/1997
> #4 02/02/1995 12/03/1995
> #1 13/03/1995 30/06/1995
> #2 01/01/1996 31/01/1996
> A.K.
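As the row names 3, 4, 1, 2 in the output above suggest, ddply returns the groups sorted by INDX (4, 5, 6, 11), which is why the factor is re-levelled before ordering. For comparison, a minimal base R sketch that keeps the groups in order of first appearance from the start. It assumes df1 holds character columns; `parts` and `res` are illustrative names only.

```r
df1 <- data.frame(
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995",
            "28/02/1995", "01/03/1995", "13/03/1995", "01/01/1996",
            "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995",
            "28/02/1995", "12/03/1995", "30/06/1995", "30/01/1996",
            "31/01/1996"),
  INDX  = c(6, 6, 6, 11, 11, 11, 4, 5, 5),
  stringsAsFactors = FALSE
)

# Split in order of first appearance (6, 11, 4, 5), not in sorted order
parts <- split(df1, factor(df1$INDX, levels = unique(df1$INDX)))

# First Debut and last Fin of each piece
res <- data.frame(
  Debut = vapply(parts, function(d) d$Debut[1], character(1)),
  Fin   = vapply(parts, function(d) d$Fin[nrow(d)], character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
res
```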
>
>
>
> ----- Original Message -----
> From: arun <smartpink111 at yahoo.com>
> To: Arnaud Michel <michel.arnaud at cirad.fr>
> Cc: R help <r-help at r-project.org>; Rui Barradas <ruipbarradas at sapo.pt>
> Sent: Wednesday, July 17, 2013 4:14 PM
> Subject: Re: [R] simplify a dataframe
>
> Hi,
> You could try:
>
> df1[,1:2] <- lapply(df1[,1:2], as.character)
> df2New <- data.frame(
>   Deb = unique(with(df1, ave(Debut, INDX, FUN=function(x) head(x,1)))),
>   Fin = unique(with(df1, ave(Fin, INDX, FUN=function(x) tail(x,1)))))
> identical(df2New,df2)
> #[1] TRUE
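One caveat with the call above: unique() is applied to the values, so two groups that happen to share the same first Debut (or the same last Fin) would collapse into a single entry. A minimal base R sketch that picks rows by position instead, assuming that grouping by INDX is what is wanted; `first`, `last` and `df2_alt` are illustrative names.

```r
df1 <- data.frame(
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995",
            "28/02/1995", "01/03/1995", "13/03/1995", "01/01/1996",
            "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995",
            "28/02/1995", "12/03/1995", "30/06/1995", "30/01/1996",
            "31/01/1996"),
  INDX  = c(6, 6, 6, 11, 11, 11, 4, 5, 5),
  stringsAsFactors = FALSE
)

first <- !duplicated(df1$INDX)                   # first row of each INDX value
last  <- !duplicated(df1$INDX, fromLast = TRUE)  # last row of each INDX value

df2_alt <- data.frame(Deb = df1$Debut[first], Fin = df1$Fin[last],
                      stringsAsFactors = FALSE)
df2_alt
```

Because rows are selected through logical indices on INDX, repeated date values in different groups are kept.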
>
> A.K.
>
>
> ----- Original Message -----
> From: Arnaud Michel <michel.arnaud at cirad.fr>
> To: Rui Barradas <ruipbarradas at sapo.pt>; R help <r-help at r-project.org>; arun <smartpink111 at yahoo.com>
> Cc:
> Sent: Wednesday, July 17, 2013 4:03 PM
> Subject: Re: [R] simplify a dataframe
>
> Thank you for the answer to question (1).
> Sorry for the imprecision in question (2):
> Suppose the data frame df1:
> df1 <- data.frame(
>   Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995",
>             "28/02/1995", "01/03/1995", "13/03/1995", "01/01/1996",
>             "31/01/1996"),
>   Fin = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995",
>           "28/02/1995", "12/03/1995", "30/06/1995", "30/01/1996",
>           "31/01/1996"),
>   INDX = c(6, 6, 6, 11, 11, 11, 4, 5, 5))
>
>
> I would like to replace df1 with df2:
>
> df2 <- data.frame(
>   Deb = c("24/01/1995", "02/02/1995", "13/03/1995", "01/01/1996"),
>   Fin = c("31/12/1997", "12/03/1995", "30/06/1995", "31/01/1996"))
>
> Explanation:
> Lines 1, 2 and 3 of df1 (which share INDX = 6) are replaced by a
> single line with
> Deb of df2 = Debut of line 1 of df1
> Fin of df2 = Fin of line 3 of df1
>
> Lines 4, 5 and 6 of df1 (which share INDX = 11) are replaced by a
> single line with
> Deb of df2 = Debut of line 4 of df1
> Fin of df2 = Fin of line 6 of df1
>
> Line 7 of df1 (the only line with INDX = 4) is replaced by a single
> line with
> Deb of df2 = Debut of line 7 of df1
> Fin of df2 = Fin of line 7 of df1
> ==> No change
>
> Lines 8 and 9 of df1 (which share INDX = 5) are replaced by a single
> line with
> Deb of df2 = Debut of line 8 of df1
> Fin of df2 = Fin of line 9 of df1
>
> df1
>          Debut        Fin INDX
> 1 24/01/1995 30/04/1997    6
> 2 01/05/1997 30/12/1997    6
> 3 31/12/1997 31/12/1997    6
> 4 02/02/1995 27/02/1995   11
> 5 28/02/1995 28/02/1995   11
> 6 01/03/1995 12/03/1995   11
> 7 13/03/1995 30/06/1995    4
> 8 01/01/1996 30/01/1996    5
> 9 31/01/1996 31/01/1996    5
>
> df2
>            Deb        Fin
> 1 24/01/1995 31/12/1997
> 2 02/02/1995 12/03/1995
> 3 13/03/1995 30/06/1995
> 4 01/01/1996 31/01/1996
> Thank you for your help
> Michel
>
> On 17/07/2013 19:57, Rui Barradas wrote:
>> Hello,
>>
>> As for question (1), try the following.
>>
>>
>> y2 <- cumsum(c(TRUE, diff(x1) > 0))
>> identical(as.integer(y1), y2)  # y1 is of class "numeric"
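To see why this works: diff(x1) > 0 is TRUE wherever the value rises, which here happens at each step from -1000 back to 1, that is, where a new group starts; cumsum() then counts those starts. A short worked sketch with x1 and y1 from the original question (`starts` and `y3` are illustrative names; `y3` is a variant that is an assumption, not part of the reply above):

```r
x1 <- c(1, 1, 1, -1000, 1, -1000, 1, 1, 1, 1, 1, 1, -1000)
y1 <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3)

starts <- c(TRUE, diff(x1) > 0)  # TRUE at position 1 and right after each -1000
y2 <- cumsum(starts)             # running count of group starts

which(starts)  # 1 5 7
all(y2 == y1)  # TRUE

# Variant: start a new group after every -1000, even if two -1000 values
# were adjacent (the diff() version would not split in that case)
y3 <- cumsum(c(TRUE, head(x1, -1) == -1000))
all(y3 == y1)  # TRUE
```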
>>
>>
>> As for question (2), I don't understand it.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 17-07-2013 18:21, Arnaud Michel wrote:
>>> Hi Arun
>>>
>>> I have two more questions, still about simplifying a data frame.
>>>
>>> I would like:
>>> 1) to transform the vector x1 into the vector y1
>>> x1 <- c(1,1,1,-1000,         1,-1000, 1,1,1,1,1,1,-1000)
>>> y1 <- c(1,1,1,1,                    2,2, 3,3,3,3,3,3,3)
>>>
>>>
>>> 2) to transform the vectors Debut and Fin by taking into account INDX
>>> into the two vectors Deb and Fin
>>> Debut <- c(
>>>   "24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995", "28/02/1995",
>>>   "01/03/1995", "13/03/1995", "01/01/1996", "31/01/1996", "24/01/1995",
>>>   "01/07/1995", "01/09/1995", "01/07/1997", "01/01/1998", "01/08/1998",
>>>   "01/01/2000", "17/01/2000", "29/02/2000")
>>>
>>> Fin <- c(
>>>   "30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", "28/02/1995",
>>>   "12/03/1995", "30/06/1995", "30/01/1996", "31/01/1996", "30/06/1995",
>>>   "31/08/1995", "30/06/1997", "31/12/1997", "31/07/1998", "31/12/1999",
>>>   "16/01/2000", "28/02/2000", "29/02/2000")
>>>
>>> INDX <- c(6, 6, 6, 11, 11, 11, 4, 5, 5)
>>>
>>>
>>> Deb <- c("24/01/1995", "02/02/1995", "13/03/1995", "01/01/1996")
>>> Fin <- c("31/12/1997", "12/03/1995", "30/06/1995", "31/01/1996")
>>>
>>>
>>>          Debut        Fin INDX
>>> *24/01/1995* 30/04/1997    6
>>> 01/05/1997 30/12/1997    6
>>> 31/12/1997 *31/12/1997*    6
>>> *02/02/1995* 27/02/1995   11
>>> 28/02/1995 28/02/1995   11
>>> 01/03/1995 *12/03/1995*   11
>>> *13/03/1995* *30/06/1995*    4
>>> *01/01/1996* 30/01/1996    5
>>> 31/01/1996 *31/01/1996*    5
>>> ................
>>>
>>> Thanks for your help
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>

-- 
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31


