[R] Change values in a dataframe

Arnaud Michel michel.arnaud at cirad.fr
Wed Jul 24 08:39:38 CEST 2013


Hello

I have the following problem:
The data frame TEST has multiple rows for the same person because
the values of Nom or Prenom differ between those rows,
while the values of Matricule, Sexe and Date.de.naissance are the same.

TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
"JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
"Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = 
"factor"),
     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
     1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
     "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = 
"factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", 
row.names = c(NA,
-11L))
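
To make the problem concrete, here is a minimal sketch that lists the Matricule values carrying more than one spelling of Nom or Prenom. It assumes that Matricule identifies a person, and it rebuilds only the three relevant columns of TEST in compact form so that it runs on its own; the name n_variants is purely illustrative.

```r
# Compact rebuild of the Matricule, Nom and Prenom columns of TEST
# (Sexe and Date.de.naissance are left out; they play no role here)
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE",
          "LANGUE-LOPEZ", "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar", "Elodie",
             "Elodie", "Victor", "Michele", "Michèle", "Michèle"),
  stringsAsFactors = FALSE
)

# Number of distinct Nom/Prenom combinations for each Matricule
n_variants <- tapply(paste(TEST$Nom, TEST$Prenom), TEST$Matricule,
                     function(x) length(unique(x)))

# Matricule values that appear with more than one spelling
inconsistent <- as.integer(names(n_variants)[n_variants > 1])

# The rows that need to be made consistent
TEST[TEST$Matricule %in% inconsistent, ]
```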


I would like to make the information consistent by building two
data frames.
df1 takes, for each person, the values of Nom and Prenom from the first of
that person's rows in TEST when the values differ. The other columns
(Matricule, Sexe, Date.de.naissance) are unchanged.

df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
"JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom = 
structure(c(6L,
3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
"Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = 
"factor"),
     Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
     1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
     "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = 
"factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", 
row.names = c(NA,
-11L))

df2 takes, for each person, the values of Nom and Prenom from the last of
that person's rows in TEST when the values differ. The other columns
(Matricule, Sexe, Date.de.naissance) are unchanged.

df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
"LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
     Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
     5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
     "Michèle", "Michelle", "Victor"), class = "factor"), Sexe = 
structure(c(1L,
     1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
     "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
     2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
     "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
     "30/03/1935"), class = "factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", 
row.names = c(NA,
-11L))
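
One possible way to obtain both results in base R is sketched below; it is an illustration under stated assumptions, not necessarily the best solution. It assumes that Matricule identifies a person and that "first" and "last" refer to the current row order of TEST. The index vectors first and last are names introduced only for this sketch, and TEST is rebuilt compactly with three columns so that the code runs on its own; with the full TEST, Sexe and Date.de.naissance are simply carried over unchanged.

```r
# Compact rebuild of the Matricule, Nom and Prenom columns of TEST
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE",
          "LANGUE-LOPEZ", "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar", "Elodie",
             "Elodie", "Victor", "Michele", "Michèle", "Michèle"),
  stringsAsFactors = TRUE
)

# For every row, the index of the first and of the last row
# that shares the same Matricule
n     <- nrow(TEST)
first <- match(TEST$Matricule, TEST$Matricule)
last  <- n + 1L - match(TEST$Matricule, rev(TEST$Matricule))

# df1: Nom and Prenom taken from the first row of each person
df1 <- TEST
df1$Nom    <- TEST$Nom[first]
df1$Prenom <- TEST$Prenom[first]
df1 <- droplevels(df1)   # drop factor levels that are no longer used

# df2: Nom and Prenom taken from the last row of each person
df2 <- TEST
df2$Nom    <- TEST$Nom[last]
df2$Prenom <- TEST$Prenom[last]
df2 <- droplevels(df2)
```

match() returns the position of the first occurrence, so applying it to the reversed Matricule vector and mapping the position back gives the last occurrence. If the rows were not already in the wanted order, TEST would have to be sorted first.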

Thanks for your help
Michel

-- 
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31


