[R] Change values in a data frame

Arnaud Michel michel.arnaud at cirad.fr
Wed Jul 24 10:18:56 CEST 2013


Thank you, Berend.
This is exactly what I wanted.
Michel
On 24/07/2013 09:48, Berend Hasselman wrote:
> On 24-07-2013, at 08:39, Arnaud Michel <michel.arnaud at cirad.fr> wrote:
>
>> Hello
>>
>> I have the following problem:
>> the data frame TEST has multiple rows for the same person because
>> there are different values of Nom or different values of Prenom,
>> while the values of Matricule, Sexe and Date.de.naissance are the same.
>>
>> TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
>> 5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
>> "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
>> ), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
>> 2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
>> "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
>>     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>     1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>     "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")), .Names = c("Matricule",
>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", row.names = c(NA,
>> -11L))
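To see which persons are affected before changing anything, one can count the distinct Nom/Prenom combinations recorded for each Matricule. This is a minimal sketch, not part of the original thread: it rebuilds only the Matricule, Nom and Prenom columns of TEST, as character vectors rather than factors (an assumption made to keep the example simple), and `n_names` and `inconsistent` are hypothetical names.

```r
# Only the columns needed here are rebuilt from the TEST data above,
# as character vectors rather than factors (an assumption for simplicity)
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE",
          "LANGUE-LOPEZ", "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar",
             "Elodie", "Elodie", "Victor", "Michele",
             "Mich\u00e8le", "Mich\u00e8le"),
  stringsAsFactors = FALSE
)

# Count the distinct (Nom, Prenom) combinations recorded for each Matricule
n_names <- tapply(paste(TEST$Nom, TEST$Prenom, sep = "|"), TEST$Matricule,
                  function(v) length(unique(v)))

# Matricule values whose rows disagree on Nom or Prenom
inconsistent <- as.integer(names(n_names)[n_names > 1])
print(inconsistent)  # 67, 90 and 108 for this data
```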
>>
>>
>> I would like to make the information consistent and build 2 data frames:
>> df1, which uses the Nom and Prenom values from the first row of TEST for a person whenever that person's rows have different values. The other columns (Matricule, Sexe and Date.de.naissance) are unchanged.
>>
>> df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
>> 5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
>> "JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom = structure(c(6L,
>> 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
>> "Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
>>     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>     1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>     "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")), .Names = c("Matricule",
>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", row.names = c(NA,
>> -11L))
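The same df1 can probably also be obtained without splitting the data. A minimal sketch, not the method used in this thread: `match(TEST$Matricule, TEST$Matricule)` returns, for every row, the position of the first row with the same Matricule, so indexing Nom and Prenom with it copies each person's first values to all of that person's rows. TEST is rebuilt here with character columns (an assumption; with the original factor columns the result would additionally need `droplevels()` to match df1 exactly), and `first_row` is a hypothetical name.

```r
# TEST from the post, rebuilt with character columns instead of factors
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE",
          "LANGUE-LOPEZ", "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar",
             "Elodie", "Elodie", "Victor", "Michele",
             "Mich\u00e8le", "Mich\u00e8le"),
  Sexe = c("F\u00e9minin", "F\u00e9minin", "F\u00e9minin", "Masculin",
           "Masculin", "F\u00e9minin", "F\u00e9minin", "Masculin",
           "F\u00e9minin", "F\u00e9minin", "F\u00e9minin"),
  Date.de.naissance = c("18/11/1945", "04/03/1946", "04/03/1946",
                        "30/03/1935", "29/12/1936", "27/09/1947",
                        "27/09/1947", "03/09/1940", "07/12/1947",
                        "07/12/1947", "07/12/1947"),
  stringsAsFactors = FALSE
)

# For each row, the position of the first row sharing its Matricule
first_row <- match(TEST$Matricule, TEST$Matricule)

# Copy each person's first Nom/Prenom to all of that person's rows;
# Matricule, Sexe and Date.de.naissance are left as they are
df1 <- TEST
df1$Nom <- TEST$Nom[first_row]
df1$Prenom <- TEST$Prenom[first_row]
df1
```

Unlike a split()/rbind() approach, which returns the groups sorted by Matricule, this keeps the original row order even if TEST is not sorted by Matricule.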
>>
>> df2, which uses the Nom and Prenom values from the last row of TEST for a person whenever that person's rows have different values. The other columns (Matricule, Sexe and Date.de.naissance) are unchanged.
>>
>> df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
>> 4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
>> "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
>>     Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
>>     5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
>>     "Michèle", "Michelle", "Victor"), class = "factor"), Sexe = structure(c(1L,
>>     1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
>>     "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
>>     2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
>>     "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
>>     "30/03/1935"), class = "factor")), .Names = c("Matricule",
>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", row.names = c(NA,
>> -11L))
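For df2 the only change from a "first row" lookup is to find the last row of each Matricule instead. A minimal sketch, not taken from the thread: matching against the reversed Matricule vector finds each key's last occurrence counted from the end, and `n + 1 - k` converts that back into a row index. Only the Matricule, Nom and Prenom columns are rebuilt, as character vectors (an assumption), and `last_row` is a hypothetical name.

```r
# Columns of TEST needed here, rebuilt as character vectors
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE",
          "LANGUE-LOPEZ", "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar",
             "Elodie", "Elodie", "Victor", "Michele",
             "Mich\u00e8le", "Mich\u00e8le"),
  stringsAsFactors = FALSE
)

n <- nrow(TEST)
# match() against the reversed Matricule vector finds each key's last
# occurrence counted from the end; n + 1 - k turns that into a row index
last_row <- n + 1L - match(TEST$Matricule, rev(TEST$Matricule))

# Copy each person's last Nom/Prenom to all of that person's rows
df2 <- TEST
df2$Nom <- TEST$Nom[last_row]
df2$Prenom <- TEST$Prenom[last_row]
df2
```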
>>
> Something like this
>
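> # Split TEST by Matricule, copy the first row's Nom and Prenom to every
> # row of each group, stack the groups again and drop unused factor levels
> r1 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule),
>                      FUN = function(x) {
>                        x[, c("Nom", "Prenom")] <- x[1, c("Nom", "Prenom"), drop = TRUE]
>                        x
>                      })))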
> rownames(r1) <- NULL
> r1
>
> # Same as r1, but copy the Nom and Prenom of each group's last row
> r2 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule),
>                      FUN = function(x) {
>                        x[, c("Nom", "Prenom")] <- x[nrow(x), c("Nom", "Prenom"), drop = TRUE]
>                        x
>                      })))
> rownames(r2) <- NULL
> r2
>
> #> identical(r1,df1)
> #[1] TRUE
> #> identical(r2,df2)
> #[1] TRUE
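The `droplevels()` call matters for these `identical()` checks: overwriting values in a factor column keeps all of the column's original levels, whereas df1 and df2 only carry the levels that are actually used, and `identical()` compares the levels too. A small illustration, assuming a made-up two-row data frame `x` based on Matricule 90:

```r
# Hypothetical two-row extract for Matricule 90, with Nom as a factor
x <- data.frame(Matricule = c(90L, 90L),
                Nom = factor(c("LANGUE", "LANGUE-LOPEZ")))

# Copy the first row's Nom to every row, as r1 does within each group
x$Nom <- rep(x$Nom[1], nrow(x))

levels(x$Nom)              # both levels are still present
levels(droplevels(x)$Nom)  # only "LANGUE" remains
```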
>
> Note: I had to change the Prenom and Sexe columns because of encoding issues, but that shouldn't affect the result above.
>
> Berend
>
>
>

-- 
Michel ARNAUD
Chargé de mission (project officer) to the DRH (Director of Human Resources)
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31



More information about the R-help mailing list