[R] Change values in a dateframe-Speed TEST

Thu Jul 25 08:50:36 CEST 2013

On 25-07-2013, at 08:35, Arnaud Michel <michel.arnaud at cirad.fr> wrote:

> But I just noticed that the two solutions are not comparable:
> Berend's solution changes only Nom and Prenom, whereas Arun's solution also changes Sexe, Date.de.naissance, and any other variables that may differ. My question was badly put.

Indeed:-)

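But that can be remedied by replacing all columns rather than only Nom and Prenom (small correction with respect to my initial solution: drop=TRUE has been removed, as it is not needed here). Taking the first row of each Matricule group: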

r1 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
                    FUN=function(x) {x[,1:ncol(x)] <- x[1,1:ncol(x)];x})))

and, taking the last row of each group instead,

r2 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
                    FUN=function(x) {x[,1:ncol(x)] <- x[nrow(x),1:ncol(x)];x})))
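
As a quick sanity check, the split-based construction above can be compared with an ave-based row index on a small made-up data frame. This is only a sketch: toy, by_split, by_ave and same_first are illustrative names, and the data are invented (Matricule is kept sorted, as in TEST, so that split() followed by rbind() preserves the row order).

```r
# Toy data (hypothetical, not the poster's data); Matricule is already sorted.
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 90L, 90L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "LANGUE", "LANGUE-LOPEZ"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine", "Elodie", "Elodie"),
  stringsAsFactors = TRUE
)

# split/lapply/rbind: overwrite every row of a group with the group's first row.
by_split <- droplevels(do.call(rbind, lapply(split(toy, toy$Matricule),
  FUN = function(x) { x[, 1:ncol(x)] <- x[1, 1:ncol(x)]; x })))

# ave: for each row, look up the index of the first row with the same Matricule,
# then select whole rows by that index.
first_idx <- with(toy, ave(seq_along(Matricule), Matricule, FUN = min))
by_ave <- droplevels(toy[first_idx, ])

# The two constructions produce different row names, so reset them before comparing.
row.names(by_split) <- NULL
row.names(by_ave)   <- NULL
same_first <- isTRUE(all.equal(by_split, by_ave))
```

If the input were not sorted by Matricule, the split-based result would come back grouped by Matricule while the ave-based result would keep the original row order, so the comparison would need a sort first.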

Less elegant than the alternative with ave and, judging by the timings quoted below for the earlier version, considerably slower.
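
For the narrower case where only Nom and Prenom should change, the ave index can presumably be applied to just those columns instead of to whole rows. A minimal sketch on invented data follows; toy, first_idx, all_cols and names_only are illustrative names, and Sexe is deliberately made inconsistent within one Matricule (which is not the case in TEST) so that the difference between the two variants is visible.

```r
# Toy data (hypothetical): the two rows for Matricule 67 differ in Prenom
# and, unlike in TEST, also in Sexe.
toy <- data.frame(
  Matricule = c(66L, 67L, 67L),
  Nom       = c("CHICHE", "GEOF", "GEOF"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine"),
  Sexe      = c("F", "F", "M"),
  stringsAsFactors = TRUE
)

# For each row, the index of the first row sharing its Matricule.
first_idx <- with(toy, ave(seq_along(Matricule), Matricule, FUN = min))

# Whole-row replacement: every column is taken from the first row of the group.
all_cols <- droplevels(toy[first_idx, ])
row.names(all_cols) <- NULL

# Column-restricted replacement: only Nom and Prenom are harmonized,
# all other columns keep their original values.
names_only <- toy
names_only[c("Nom", "Prenom")] <- toy[first_idx, c("Nom", "Prenom")]
names_only <- droplevels(names_only)
```

Using FUN = max in place of FUN = min would take the last row of each group instead of the first.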

Berend

> Michel
> 
> On 25/07/2013 08:06, Arnaud Michel wrote:
>> Hi
>> 
>> For a data frame named PaysContrat1, with
>> nrow(PaysContrat1)
>> [1] 52366
>> 
>> the system.time results are:
>> 
>> system.time(droplevels(do.call(rbind,lapply(split(PaysContrat1,PaysContrat1$Matricule), 
>> FUN=function(x) {x[,c("Nom","Prénom")] <- x[nrow(x),c("Nom","Prénom"),drop=TRUE];x}))))
>>   user  system elapsed
>>  14.03    0.00   14.04
>> 
>> system.time(droplevels(PaysContrat1[with(PaysContrat1,ave(seq_along(Matricule),Matricule,FUN=min)) ,]  ))
>>   user  system elapsed
>>    0.2     0.0     0.2
>> 
>> Michel
>> 
>> On 24/07/2013 15:29, arun wrote:
>>> Hi Michel,
>>> You could try:
>>> 
>>> 
>>> df1New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=min)),]) 
>>> row.names(df1New)<-1:nrow(df1New)
>>> df2New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=max)),]) 
>>> row.names(df2New)<-1:nrow(df2New)
>>>  identical(df1New,df1)
>>> #[1] TRUE
>>>  identical(df2New,df2)
>>> #[1] TRUE
>>> A.K.
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Arnaud Michel <michel.arnaud at cirad.fr>
>>> To: R help <r-help at r-project.org>
>>> Cc:
>>> Sent: Wednesday, July 24, 2013 2:39 AM
>>> Subject: [R] Change values in a dateframe
>>> 
>>> Hello
>>> 
>>> I have the following problem:
>>> The data frame TEST has multiple rows for the same person because
>>> the values of Nom or Prenom differ between rows,
>>> while the values of Matricule, Sexe, and Date.de.naissance are the same.
>>> 
>>> TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
>>> 5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
>>> "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
>>> ), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
>>> 2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
>>> "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
>>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
>>> "factor"),
>>>      Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>>      1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>>      "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
>>> "factor")), .Names = c("Matricule",
>>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
>>> row.names = c(NA,
>>> -11L))
>>> 
>>> 
>>> I would like to make the information homogeneous by building 2
>>> data frames:
>>> df1, which takes the values of Nom and Prenom from the first row for each
>>> person in TEST when the values differ. The other values (Matricule, Sexe,
>>> and Date.de.naissance) are unchanged.
>>> 
>>> df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
>>> 5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
>>> "JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom =
>>> structure(c(6L,
>>> 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
>>> "Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
>>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
>>> "factor"),
>>>      Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>>      1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>>      "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
>>> "factor")), .Names = c("Matricule",
>>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
>>> row.names = c(NA,
>>> -11L))
>>> 
>>> df2, which takes the values of Nom and Prenom from the last row for each
>>> person in TEST when the values differ. The other values (Matricule, Sexe,
>>> and Date.de.naissance) are unchanged.
>>> 
>>> df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
>>> 4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
>>> "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
>>>      Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
>>>      5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
>>>      "Michèle", "Michelle", "Victor"), class = "factor"), Sexe =
>>> structure(c(1L,
>>>      1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
>>>      "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
>>>      2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
>>>      "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
>>>      "30/03/1935"), class = "factor")), .Names = c("Matricule",
>>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
>>> row.names = c(NA,
>>> -11L))
>>> 
>>> Thanks for your help
>>> Michel
>>> 
>> 
> 
> -- 
> Michel ARNAUD
> Chargé de mission auprès du DRH
> DGDRD-Drh - TA 174/04
> Av Agropolis 34398 Montpellier cedex 5
> tel : 04.67.61.75.38
> fax : 04.67.61.57.87
> port: 06.47.43.55.31
>