[R] Change values in a data frame: speed test

Arnaud Michel michel.arnaud at cirad.fr
Thu Jul 25 08:35:56 CEST 2013


But I have just noticed that the two solutions are not comparable:
Berend's solution changes only Nom and Prenom, whereas Arun's solution
takes whole rows, so Sexe, Date.de.naissance and any other variable
that can differ are replaced as well. That said, my question was badly put.
Michel
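To make the difference concrete, here is a minimal sketch on a hypothetical three-row excerpt. Unlike in TEST, Date.de.naissance here also differs within one Matricule; the second birth date for Matricule 67 is invented purely for illustration. Selecting whole rows by position, as Arun's `ave()` answer does, replaces every column, while assigning only `c("Nom", "Prenom")`, as Berend's solution does, leaves the other variables untouched. The object names (`toy`, `first_row`, `whole_rows`, `names_only`) are illustrative only.

```r
# Hypothetical excerpt: the "05/03/1946" birth date is invented so that a
# non-name column differs within one person (Matricule 67).
toy <- data.frame(
  Matricule = c(67L, 67L, 68L),
  Nom = c("GEOF", "GEOF", "JACQUE"),
  Prenom = c("Jeanine", "Jeannine", "Michel"),
  Date.de.naissance = c("04/03/1946", "05/03/1946", "30/03/1935"),
  stringsAsFactors = TRUE
)

# Position of the first row of each Matricule, repeated on every row
first_row <- ave(seq_len(nrow(toy)), toy$Matricule, FUN = min)

# Whole-row replacement (Arun's approach): every column comes from the first row
whole_rows <- droplevels(toy[first_row, ])
row.names(whole_rows) <- NULL

# Column-restricted replacement (Berend's semantics, here using the first row):
# only Nom and Prenom are overwritten
name_cols <- c("Nom", "Prenom")
names_only <- toy
names_only[name_cols] <- toy[first_row, name_cols]
names_only <- droplevels(names_only)

as.character(whole_rows$Date.de.naissance)  # "04/03/1946" "04/03/1946" "30/03/1935"
as.character(names_only$Date.de.naissance)  # "04/03/1946" "05/03/1946" "30/03/1935"
```

With TEST itself both approaches give the same result, because Sexe and Date.de.naissance never differ within a Matricule; they diverge only on data like this excerpt.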

On 25/07/2013 08:06, Arnaud Michel wrote:
> Hi
>
> For a data frame named PaysContrat1 with
> nrow(PaysContrat1)
> [1] 52366
>
> the system.time() results are:
>
> # Berend's solution: copy the last row's Nom and Prénom within each Matricule
> system.time(droplevels(do.call(rbind, lapply(split(PaysContrat1, PaysContrat1$Matricule),
>     FUN = function(x) {
>         x[, c("Nom", "Prénom")] <- x[nrow(x), c("Nom", "Prénom"), drop = TRUE]
>         x
>     }))))
>    user  system elapsed
>   14.03    0.00   14.04
>
> # Arun's solution: keep the whole first row of each Matricule
> system.time(droplevels(PaysContrat1[with(PaysContrat1,
>     ave(seq_along(Matricule), Matricule, FUN = min)), ]))
>    user  system elapsed
>     0.2     0.0     0.2
>
> Michel
>
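Besides doing different things (the `split()` version keeps the last row's Nom and Prénom, `x[nrow(x), ...]`, while the `ave()` version keeps the whole first row, `FUN = min`), the two calls differ in how they work: `split()` + `lapply()` + `do.call(rbind, ...)` builds one small data frame per Matricule and binds them all back together, whereas `ave()` computes a single index vector and subsets once. The sketch below is a like-for-like comparison under stated assumptions: the data are synthetic (size, seed and values are arbitrary), the column is spelled `Prenom` as in TEST, and both hypothetical functions copy the last row's Nom and Prenom within each Matricule while leaving Sexe unchanged. Timings depend on the machine and R version, so none are shown.

```r
set.seed(1)  # arbitrary seed so the synthetic data are reproducible
n <- 2000    # arbitrary size; PaysContrat1 had 52366 rows
ids <- sort(sample(1:500, n, replace = TRUE))  # sorted by Matricule
big <- data.frame(
  Matricule = ids,
  # Some people get two spellings of Nom, e.g. "NOM12" and "NOM12-X"
  Nom = factor(paste0("NOM", ids, sample(c("", "-X"), n, replace = TRUE))),
  Prenom = factor(paste0("P", ids)),
  Sexe = factor(ifelse(ids %% 2 == 0, "F", "M"))
)
name_cols <- c("Nom", "Prenom")

# split() + lapply() + rbind(), in the style of Berend's solution:
# one small data frame per Matricule, modified, then bound back together
last_names_split <- function(df) {
  parts <- lapply(split(df, df$Matricule), function(x) {
    x[, name_cols] <- x[nrow(x), name_cols, drop = TRUE]
    x
  })
  droplevels(do.call(rbind, parts))
}

# ave() version: one index vector (last row of each Matricule), one assignment
last_names_ave <- function(df) {
  last_row <- ave(seq_len(nrow(df)), df$Matricule, FUN = max)
  df[name_cols] <- df[last_row, name_cols]
  droplevels(df)
}

system.time(res_split <- last_names_split(big))
system.time(res_ave <- last_names_ave(big))

# Row names and level order can differ, so compare the values as character
identical(as.character(res_split$Nom), as.character(res_ave$Nom))
identical(as.character(res_split$Prenom), as.character(res_ave$Prenom))
```

Because `big` is sorted by Matricule, `split()` returns the groups in the original row order, which is what makes the element-wise comparison valid; on unsorted data the `split()` result would have to be reordered first.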
> On 24/07/2013 15:29, arun wrote:
>> Hi Michel,
>> You could try:
>>
>>
>> # Keep the whole first row (FUN = min) or last row (FUN = max) of each Matricule
>> df1New <- droplevels(TEST[with(TEST, ave(seq_along(Matricule), Matricule, FUN = min)), ])
>> row.names(df1New) <- 1:nrow(df1New)
>> df2New <- droplevels(TEST[with(TEST, ave(seq_along(Matricule), Matricule, FUN = max)), ])
>> row.names(df2New) <- 1:nrow(df2New)
>> identical(df1New, df1)
>> #[1] TRUE
>> identical(df2New, df2)
>> #[1] TRUE
>> A.K.
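The core of Arun's answer is the index vector `ave(seq_along(Matricule), Matricule, FUN = min)`. `ave()` applies `FUN` to the row positions within each Matricule group and writes the result back to every row of that group, so each row receives the position of the first (`min`) or last (`max`) row for that person; subsetting TEST with this vector repeats the chosen row. The sketch below uses the Matricule values of TEST. The `match()` variant is an alternative added here for comparison, not part of the original answer: `match(x, x)` returns the first position of each value, and matching against `rev(x)` gives the last one.

```r
# Matricule column of TEST (11 rows, 7 people)
Matricule <- c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L)
rows <- seq_along(Matricule)

# Position of the first / last row of each person, repeated on every row
first_idx <- ave(rows, Matricule, FUN = min)
last_idx  <- ave(rows, Matricule, FUN = max)
first_idx  # 1 2 2 4 5 6 6 8 9 9 9
last_idx   # 1 3 3 4 5 7 7 8 11 11 11

# Same indices without grouping: first match, and first match in the reversed
# vector converted back to a position in the original order
first_alt <- match(Matricule, Matricule)
last_alt  <- length(Matricule) + 1L - match(Matricule, rev(Matricule))
stopifnot(identical(first_idx, first_alt), identical(last_idx, last_alt))
```

Because whole rows are indexed, every column (Sexe and Date.de.naissance included) is taken from the chosen row.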
>>
>>
>>
>> ----- Original Message -----
>> From: Arnaud Michel <michel.arnaud at cirad.fr>
>> To: R help <r-help at r-project.org>
>> Cc:
>> Sent: Wednesday, July 24, 2013 2:39 AM
>> Subject: [R] Change values in a data frame
>>
>> Hello
>>
>> I have the following problem:
>> the data frame TEST has several rows for the same person because
>> Nom or Prenom take different values, while the values of
>> Matricule, Sexe and Date.de.naissance are the same.
>>
>> TEST <- structure(list(
>>     Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
>>     Nom = structure(c(1L, 2L, 2L, 4L, 8L, 5L, 6L, 9L, 3L, 3L, 7L),
>>         .Label = c("CHICHE", "GEOF", "GUTIER", "JACQUE", "LANGUE",
>>                    "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"),
>>         class = "factor"),
>>     Prenom = structure(c(8L, 3L, 4L, 5L, 1L, 2L, 2L, 9L, 6L, 7L, 7L),
>>         .Label = c("Edgar", "Elodie", "Jeanine", "Jeannine", "Michel",
>>                    "Michele", "Michèle", "Michelle", "Victor"),
>>         class = "factor"),
>>     Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
>>         .Label = c("Féminin", "Masculin"), class = "factor"),
>>     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
>>         .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945",
>>                    "27/09/1947", "29/12/1936", "30/03/1935"),
>>         class = "factor")),
>>     .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
>>     class = "data.frame", row.names = c(NA, -11L))
>>
>>
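Before choosing which row to keep, it can help to list the people whose names actually conflict and to check the assumption that Sexe and Date.de.naissance are constant per person. A minimal sketch on a few rows copied from TEST, stored as character columns rather than factors (the accented letters are written as `\u00e9` and `\u00e8` escapes); the names `excerpt`, `n_distinct`, `name_variants` and `conflicting` are illustrative only.

```r
# A few rows copied from TEST ("F\u00e9minin" = "Féminin", "Mich\u00e8le" = "Michèle")
excerpt <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel",
             "Michele", "Mich\u00e8le", "Mich\u00e8le"),
  Sexe = c(rep("F\u00e9minin", 3), "Masculin", rep("F\u00e9minin", 3)),
  Date.de.naissance = c("18/11/1945", "04/03/1946", "04/03/1946", "30/03/1935",
                        "07/12/1947", "07/12/1947", "07/12/1947"),
  stringsAsFactors = FALSE
)

# Number of distinct values in a vector
n_distinct <- function(v) length(unique(v))

# Distinct Nom/Prenom combinations per Matricule (> 1 means conflicting names)
name_variants <- with(excerpt, tapply(paste(Nom, Prenom), Matricule, n_distinct))
conflicting <- as.integer(names(name_variants)[name_variants > 1])
conflicting  # 67 108

# The question assumes these never vary within a person
with(excerpt, all(tapply(Sexe, Matricule, n_distinct) == 1))               # TRUE
with(excerpt, all(tapply(Date.de.naissance, Matricule, n_distinct) == 1))  # TRUE
```

If either check returned FALSE on the real data, the whole-row and names-only approaches discussed in this thread would give different results.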
>> I would like to make the information consistent and build two
>> data frames:
>> df1, in which Nom and Prenom take the values from the first row of
>> TEST for each person whenever those values differ. The other variables
>> (Matricule, Sexe and Date.de.naissance) are unchanged.
>>
>> df1 <- structure(list(
>>     Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
>>     Nom = structure(c(1L, 2L, 2L, 4L, 6L, 5L, 5L, 7L, 3L, 3L, 3L),
>>         .Label = c("CHICHE", "GEOF", "GUTIER", "JACQUE", "LANGUE",
>>                    "TRU", "VINCENT"),
>>         class = "factor"),
>>     Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L),
>>         .Label = c("Edgar", "Elodie", "Jeanine", "Michel", "Michele",
>>                    "Michelle", "Victor"),
>>         class = "factor"),
>>     Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
>>         .Label = c("Féminin", "Masculin"), class = "factor"),
>>     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
>>         .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945",
>>                    "27/09/1947", "29/12/1936", "30/03/1935"),
>>         class = "factor")),
>>     .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
>>     class = "data.frame", row.names = c(NA, -11L))
>>
>> df2, in which Nom and Prenom take the values from the last row of
>> TEST for each person whenever those values differ. The other variables
>> (Matricule, Sexe and Date.de.naissance) are unchanged.
>>
>> df2 <- structure(list(
>>     Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
>>     Nom = structure(c(1L, 2L, 2L, 3L, 6L, 4L, 4L, 7L, 5L, 5L, 5L),
>>         .Label = c("CHICHE", "GEOF", "JACQUE", "LANGUE-LOPEZ", "RIVIER",
>>                    "TRU", "VINCENT"),
>>         class = "factor"),
>>     Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L),
>>         .Label = c("Edgar", "Elodie", "Jeannine", "Michel", "Michèle",
>>                    "Michelle", "Victor"),
>>         class = "factor"),
>>     Sexe = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L),
>>         .Label = c("Féminin", "Masculin"), class = "factor"),
>>     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L),
>>         .Label = c("03/09/1940", "04/03/1946", "07/12/1947", "18/11/1945",
>>                    "27/09/1947", "29/12/1936", "30/03/1935"),
>>         class = "factor")),
>>     .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
>>     class = "data.frame", row.names = c(NA, -11L))
>>
>> Thanks for your help
>> Michel
>>
>

-- 
Michel ARNAUD
Project officer to the Human Resources Director (DRH)
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
mobile: 06.47.43.55.31


