[R] Change values in a dataframe - Speed TEST
Arnaud Michel
michel.arnaud at cirad.fr
Thu Jul 25 08:06:11 CEST 2013
Hi
For a data frame named PaysContrat1, with
nrow(PaysContrat1)
[1] 52366
the system.time() results for the two approaches are:
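# split/lapply approach: give every row of a Matricule the last Nom/Prénom of that Matricule
system.time(droplevels(do.call(rbind, lapply(split(PaysContrat1, PaysContrat1$Matricule),
    FUN = function(x) {
        x[, c("Nom", "Prénom")] <- x[nrow(x), c("Nom", "Prénom"), drop = TRUE]
        x
    }))))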
user system elapsed
14.03 0.00 14.04
# ave() indexing approach: replace every row by the first row of its Matricule
system.time(droplevels(PaysContrat1[with(PaysContrat1,
    ave(seq_along(Matricule), Matricule, FUN = min)), ]))
user system elapsed
0.2 0.0 0.2
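Why the second call is so much faster: ave(seq_along(Matricule), Matricule, FUN = min) gives each row the position of the first row sharing its Matricule, so the whole job becomes one vectorised row index instead of one small data frame per person glued back together with rbind(). The same index may be obtained even more cheaply with match(), which looks every value up in a single hashed pass; this has not been timed on PaysContrat1, so treat it as a sketch. It uses only the Matricule column of TEST from the question below:

```r
# Matricule column of the TEST data frame from the original question
Matricule <- c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L)

# Index of the first / last row of each Matricule, via ave() as in the thread
first_ave <- ave(seq_along(Matricule), Matricule, FUN = min)
last_ave  <- ave(seq_along(Matricule), Matricule, FUN = max)

# Same indices via match(): the first hit of each value is its first occurrence;
# matching against the reversed vector finds the last occurrence instead
first_match <- match(Matricule, Matricule)
last_match  <- length(Matricule) + 1L - match(Matricule, rev(Matricule))

print(first_match)
print(last_match)
```

match(Matricule, Matricule) plays the role of FUN = min and the rev() variant the role of FUN = max; either vector can be used as the row index in place of the ave() call.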
Michel
On 24/07/2013 15:29, arun wrote:
> Hi Michel,
> You could try:
>
>
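> # First row of each Matricule: indexing repeats that row for the whole group
> df1New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=min)),])
> # Reset the row names (indexing creates "2.1", "9.1", ...) so identical() can succeed
> row.names(df1New)<-1:nrow(df1New)
> # Same with the last row of each Matricule
> df2New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=max)),])
> row.names(df2New)<-1:nrow(df2New)
> identical(df1New,df1)
> #[1] TRUE
> identical(df2New,df2)
> #[1] TRUE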
> A.K.
>
>
>
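One caveat with indexing whole rows: TEST[idx, ] copies every column of the first (or last) row, so the result equals the df1/df2 targets in the question below only because Matricule, Sexe and Date.de.naissance are identical within each person. If PaysContrat1 also carries columns that legitimately differ between rows of the same person (its other columns are not shown, so this is an assumption), a safer sketch overwrites only Nom and Prenom. The Contrat column below is hypothetical, added just to show that it is left untouched:

```r
# Cut-down copy of TEST plus a hypothetical per-row column 'Contrat'
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L),
  Nom = c("CHICHE", "GEOF", "GEOF", "JACQUE", "TRU", "LANGUE", "LANGUE-LOPEZ",
          "VINCENT", "GUTIER", "GUTIER", "RIVIER"),
  Prenom = c("Michelle", "Jeanine", "Jeannine", "Michel", "Edgar", "Elodie",
             "Elodie", "Victor", "Michele", "Mich\u00e8le", "Mich\u00e8le"),
  Contrat = 1:11,  # hypothetical column that differs on every row
  stringsAsFactors = TRUE
)

# Row index of the first / last occurrence of each Matricule
first <- with(TEST, ave(seq_along(Matricule), Matricule, FUN = min))
last  <- with(TEST, ave(seq_along(Matricule), Matricule, FUN = max))

# Overwrite only the name columns; every other column keeps its own value
cols <- c("Nom", "Prenom")
df1b <- TEST
df1b[cols] <- TEST[first, cols]
df1b <- droplevels(df1b)   # drop name levels no longer used

df2b <- TEST
df2b[cols] <- TEST[last, cols]
df2b <- droplevels(df2b)
```

Because only the two name columns are replaced, the row names of TEST are kept, so no row.names() reset is needed afterwards.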
> ----- Original Message -----
> From: Arnaud Michel <michel.arnaud at cirad.fr>
> To: R help <r-help at r-project.org>
> Cc:
> Sent: Wednesday, July 24, 2013 2:39 AM
> Subject: [R] Change values in a dataframe
>
> Hello
>
> I have the following problem:
> The data frame TEST has several rows for the same person because
> Nom or Prenom differ between those rows, while Matricule, Sexe and
> Date.de.naissance are the same.
>
> TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
> 5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
> "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
> ), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
> 2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
> "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
> "factor"),
> Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
> 1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
> "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
> "factor")), .Names = c("Matricule",
> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
> row.names = c(NA,
> -11L))
>
>
> I would like to make the information consistent and build two
> data frames:
> df1, which takes Nom and Prenom from the first row of each person in TEST
> whenever they differ; the other columns (Matricule, Sexe,
> Date.de.naissance) are unchanged.
>
> df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
> 5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
> "JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom =
> structure(c(6L,
> 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
> "Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
> "factor"),
> Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
> 1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
> "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
> "factor")), .Names = c("Matricule",
> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
> row.names = c(NA,
> -11L))
>
> df2, which takes Nom and Prenom from the last row of each person in TEST
> whenever they differ; the other columns (Matricule, Sexe,
> Date.de.naissance) are unchanged.
>
> df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
> 4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
> "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
> Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
> 5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
> "Michèle", "Michelle", "Victor"), class = "factor"), Sexe =
> structure(c(1L,
> 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
> "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
> 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
> "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
> "30/03/1935"), class = "factor")), .Names = c("Matricule",
> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
> row.names = c(NA,
> -11L))
>
> Thanks for your help
> Michel
>
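Both df1 and df2 rest on the premise stated in the question: within one Matricule, Sexe and Date.de.naissance never differ. A quick check of that premise before homogenising could look like the sketch below; inconsistent_ids() is a hypothetical helper that returns the Matricule values having more than one distinct combination of the columns that are supposed to be constant:

```r
# Hypothetical helper: Matricule values whose 'fixed' columns are not constant
inconsistent_ids <- function(df, key = "Matricule",
                             fixed = c("Sexe", "Date.de.naissance")) {
  combos <- unique(df[c(key, fixed)])  # one row per distinct combination
  ids <- combos[[key]]
  unique(ids[duplicated(ids)])         # keys seen with more than one combination
}

# Cut-down copy of TEST (Matricule 66, 67, 68 and 108 only)
TEST <- data.frame(
  Matricule = c(66L, 67L, 67L, 68L, 108L, 108L, 108L),
  Sexe = c("F\u00e9minin", "F\u00e9minin", "F\u00e9minin", "Masculin",
           "F\u00e9minin", "F\u00e9minin", "F\u00e9minin"),
  Date.de.naissance = c("18/11/1945", "04/03/1946", "04/03/1946", "30/03/1935",
                        "07/12/1947", "07/12/1947", "07/12/1947"),
  stringsAsFactors = TRUE
)
print(inconsistent_ids(TEST))      # integer(0)

# Hypothetical data-entry error: a different (existing) date on one row of 108
TEST_bad <- TEST
TEST_bad$Date.de.naissance[7] <- "18/11/1945"
print(inconsistent_ids(TEST_bad))  # [1] 108
```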
--
Michel ARNAUD
Chargé de mission auprès du DRH (project officer to the Director of Human Resources)
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
mobile: 06.47.43.55.31