[R] Problems of data processing

Jacques VESLOT jacques.veslot at cirad.fr
Tue Jan 17 12:28:15 CET 2006


sorry, i let toto$Date in the function within lapply() instead of x$Date !

now, it works :

 > toto
  Num         Date Place  X  Y
1   1  1/1/04 0:48    x1  1  1
2   2  1/1/04 8:02    x1 NA NA
3   4  1/1/04 1:55    x4  3  7
4   3  1/1/04 2:14    x3  2  9
5   4  1/1/04 1:19    x4  3  7
6   4  1/1/04 1:02    x4  3  7
7   5 1/1/04 11:15    x5  6  8
8   5  1/1/04 9:06    x5  6  8
9   5 1/1/04 10:32    x5  6  8

 > toto <- do.call("rbind", lapply(split(toto, toto$Num),
+    function(x) x[which.min(as.POSIXct(strptime(x$Date, "%d/%m/%y 
%H:%M"))),]))

 > toto
  Num        Date Place  X  Y
1   1 1/1/04 0:48    x1  1  1
2   2 1/1/04 8:02    x1 NA NA
3   3 1/1/04 2:14    x3  2  9
4   4 1/1/04 1:02    x4  3  7
5   5 1/1/04 9:06    x5  6  8

 > toto <- merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
 > toto
  Place Num        Date X Y
1    x1   1 1/1/04 0:48 1 1
2    x1   2 1/1/04 8:02 1 1
3    x3   3 1/1/04 2:14 2 9
4    x4   4 1/1/04 1:02 3 7
5    x5   5 1/1/04 9:06 6 8



Florent Bonneu a écrit :

> Thank you very much for your help but I think there is an error for 
> the answer to the first problem  I spent time on searching the 
> solution but I failed to find it. I tried to put "which.max" instead 
> of "which.min" but it doesn't work. I tried to do my best but i didn't 
> have any idea to solve this problem.
>
> An example :
>
> Num <- c(1,2,4,3,4,4,5,5,5)
> Date <- c("1/1/04 0:48","1/1/04 8:02", "1/1/04 1:55", "1/1/04 2:14", 
> "1/1/04 1:19", "1/1/04 1:02", "1/1/04 11:15", "1/1/04 9:06", "1/1/04 
> 10:32")
> Place <- c("x1","x1","x4","x3","x4","x4","x5","x5","x5")
> X <- c(1,NA,3,2,3,3,6,6,6)
> Y <- c(1,NA,7,9,7,7,8,8,8)
> toto <- data.frame(Num,Date,Place,X,Y)
> toto[order(toto$Num,as.numeric(as.POSIXct(strptime(toto$Date, 
> "%d/%m/%y %H:%M")))),]
> toto <- merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
> help <- do.call("rbind", lapply(split(toto, toto$Num),
>   function(x) x[which.min(as.numeric(as.POSIXct(strptime(toto$Date, 
> "%d/%m/%y %H:%M")))),]))
> help
>
> The solution must be
>
> Num <- c(1,2,3,4,5)
> Date <- c("1/1/04 0:48","1/1/04 8:02", "1/1/04 2:14", "1/1/04 1:02", 
> "1/1/04 9:06")
> Place <- c("x1","x1","x3","x4","x5")
> X <- c(1,1,2,3,6)
> Y <- c(1,1,9,7,8)
> toto <- data.frame(Num,Date,Place,X,Y)
>
>
> Any suggestion is welcome.
>
> Florent Bonneu.
>
>
>
> Jacques VESLOT wrote:
>
>> OK ! so try this:
>> merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
>>
>>
>> Florent Bonneu a écrit :
>>
>>> Indeed,
>>> X <- c(1,Na,2,3,3,3,6,6)
>>> Y <- c(1,Na,9,7,7,7,8,8)
>>>
>>> I want to obtain one line for each Num. It's not a problem if there 
>>> are several lines for the same place, because my identifier is Num. 
>>> I just want to get X and Y well-informed in an other line for the 
>>> same place. For example, "Num=2" is at the place "x1", like "Num=1", 
>>> but we don't have the coordinates X and Y for "Num=2".  Now, the 
>>> same coordinates are well-informed for "Num=1", so i want to 
>>> retrieve this coordinates in my line "Num=2" for my columns X and Y.
>>>
>>>
>>>
>>> Jacques VESLOT wrote:
>>>
>>>> something wrong in X and Y definitions... but this could work:
>>>>
>>>> do.call("rbind", lapply(split(toto, toto$Num),
>>>>    function(x) x[which.min(as.POSIXct(strptime(toto$Date, "%d/%m/%y 
>>>> %H:%M"))),]))
>>>>
>>>> i don't understand the second query; do you want to keep the first 
>>>> line when there are several lines for the same place ?
>>>>
>>>>
>>>> Florent Bonneu a écrit :
>>>>
>>>>> I have two problems for the data processing of my large data base 
>>>>> (50000 rows). For example, a sample is as follows
>>>>>
>>>>> Num <- c(1,2,3,4,4,4,5,5)
>>>>> Date <- c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 
>>>>> 2:14", "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
>>>>> Place <- c("x1","x1","x3","x4","x4","x4","x5","x5")
>>>>> X <- c(1,””,2,3,3,3,6,6)
>>>>> Y <- c(1,””,9,7,7,7,8,8)
>>>>>
>>>>> toto <- data.frame(Num,Date,Place,X,Y)
>>>>>
>>>>> The first problem is to keep one line for each Num with the 
>>>>> “minimum” date. I managed to do it with loops but i would like a 
>>>>> solution without using loops. It will be better for my large data 
>>>>> base.
>>>>>
>>>>> The other one is to retrieve the coordinates ill-informed. For 
>>>>> example, for the same place “x1”, Num=2 doesn't have X and Y. But, 
>>>>> we have this information for Num=1.
>>>>>
>>>>> The example data base must be like this
>>>>>
>>>>> Num <- c(1,2,3,4,5)
>>>>> Date <- c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 
>>>>> 2:14", "1/1/04 9:05")
>>>>> Place <- c("x1","x1","x3","x4","x5")
>>>>> X <- c(1,1,2,3,6)
>>>>> Y <- c(1,1,9,7,8)
>>>>>
>>>>> toto <- data.frame(Num,Date,Place,X,Y)
>>>>> Somebody know how to do ?
>>>>> Thanks.
>>>>>
>>>>> Florent Bonneu
>>>>> Laboratoire de Statistique et Probabilités
>>>>> bureau 148  bât. 1R2
>>>>> Université Toulouse 3
>>>>> 118 route de Narbonne - 31062 Toulouse cedex 9
>>>>> bonneu at cict.fr <mailto:bonneu at cict.fr>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at stat.math.ethz.ch mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide! 
>>>>> http://www.R-project.org/posting-guide.html
>>>>>
>>>>>  
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>




More information about the R-help mailing list