[R] Time value not sorting properly

Joshua Wiley jwiley.psych at gmail.com
Fri Jul 9 02:49:41 CEST 2010


Jared,

I am not sure how you converted your 'time' variable from a factor to
numeric, but you probably actually want to convert it to one of the
'time' classes.  To learn more about them in R, see ?DateTimeClasses.
Another nice feature of these special time classes is that they can
handle year, month, day, and time all in one column.  This means you
only need to sort by two columns (ID and time).  You can also look at
?strptime for details on converting character strings into time
variables.  An example using your data follows below.

Best regards,

Josh

samp.dat <- structure(list(ID = c(2836L, 2836L, 2836L, 2836L, 2836L, 2836L,
2836L, 2836L, 2836L, 2836L, 2836L, 2836L, 2836L, 2836L, 2836L,
2836L), year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L
), month = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L), day = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), time = structure(c(12L, 13L, 14L,
15L, 16L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L), .Label = c("0:01:35",
"10:00:15", "11:00:44", "12:00:17", "13:00:38", "14:00:25", "15:00:53",
"16:00:11", "17:00:23", "18:00:47", "21:01:13", "3:00:50", "6:00:20",
"7:00:42", "8:00:42", "9:00:12"), class = "factor"), Lat = c(-1.2402597,
-1.2397508, -1.2431248, -1.2396636, -1.2304111, -1.2255532, -1.2248113,
-1.2251362, -1.2246384, -1.2245949, -1.2269631, -1.2264911, -1.2251153,
-1.2315372, -1.2578944, -1.242075), Long = c(35.5405911, 35.5406318,
35.5388285, 35.5285848, 35.5139149, 35.5162895, 35.5147305, 35.491731,
35.4918846, 35.4918647, 35.4880909, 35.4837137, 35.4817967, 35.4806165,
35.4670629, 35.5449559), test = c(77L, 120L, 214L, 300L, 345L,
436L, 528L, 585L, 665L, 727L, 813L, 846L, 928L, 1027L, 1093L,
1132L)), .Names = c("ID", "year", "month", "day", "time", "Lat",
"Long", "test"), class = "data.frame", row.names = c(NA, -16L
))

str(samp.dat)
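
A side note on the factor-to-numeric conversion: I cannot tell from here
exactly what happened in your case, but as.numeric() on a factor returns
the internal integer level codes rather than clock times, and the levels
themselves are sorted as character strings.  A small sketch on a made-up
vector ('tm' is a hypothetical name, not part of your data):

```r
# hypothetical toy factor of clock times, not taken from the real data
tm <- factor(c("3:00:50", "10:00:15", "0:01:35", "21:01:13"))

levels(tm)        # levels are sorted as character strings, not as times
as.numeric(tm)    # integer level codes, one per element
as.character(tm)  # the original strings, which is what strptime() needs

# ordering on the factor follows the level order, so "3:00:50" ends up
# after "21:01:13"
as.character(tm[order(tm)])
```

This is why the example below works from the character strings and not
from the numeric codes.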

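#first combine all time columns using paste()
#then parse with strptime() (which returns POSIXlt) and convert to
#POSIXct, the more compact class for a data frame column
samp.dat$time2 <- as.POSIXct(strptime(x = paste(samp.dat$year, "-",
           samp.dat$month, "-",
           samp.dat$day, " ",
           samp.dat$time,
           sep=""),
         format = "%Y-%m-%d %H:%M:%S"))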

str(samp.dat) #note how 'time2' is actually a time class now
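
With many records it may be worth checking that every string was
actually parsed, since strptime() returns NA for input that does not
match the format instead of stopping with an error.  A minimal sketch on
made-up strings ('raw' and 'parsed' are hypothetical names); tz = "UTC"
is an assumption here, used only to keep the result independent of the
local time zone:

```r
# hypothetical date-time strings; the last one is deliberately malformed
raw <- c("2010-7-1 0:01:35", "2010-7-1 21:01:13", "2010-7-1 not a time")

# tz = "UTC" is an assumption, not something known about the real data
parsed <- as.POSIXct(strptime(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

is.na(parsed)       # TRUE marks strings that failed to parse
raw[is.na(parsed)]  # the offending input, for inspection
```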

#ordering becomes easier
temp.or <- order(samp.dat$ID, samp.dat$time2, decreasing=FALSE)

samp.dat <- samp.dat[temp.or, ]

samp.dat #print to screen
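
Once the rows are in order, the distance between consecutive time steps
could be computed along the following lines.  This is only a sketch under
stated assumptions: it uses the haversine great-circle formula with an
assumed mean Earth radius of 6371 km, and 'haversine.km', 'toy', and
'step.km' are hypothetical names with made-up values, not part of your
data.

```r
# great-circle distance in km between points given in decimal degrees
# (haversine formula; r = 6371 km is an assumed mean Earth radius)
haversine.km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# toy data, assumed to be already sorted by ID and time
toy <- data.frame(ID = c(1L, 1L, 1L, 2L, 2L),
                  Lat = c(-1.240, -1.239, -1.243, -1.225, -1.226),
                  Long = c(35.540, 35.541, 35.539, 35.516, 35.515))
n <- nrow(toy)

# distance from the previous row to the current row
toy$step.km <- c(NA, haversine.km(toy$Lat[-n], toy$Long[-n],
                                  toy$Lat[-1], toy$Long[-1]))

# the first row of each ID has no previous location, so set it to NA
toy$step.km[c(TRUE, toy$ID[-1] != toy$ID[-n])] <- NA

toy
```

The vectorized form avoids an explicit loop, which should matter with
several hundred thousand rows.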


On Thu, Jul 8, 2010 at 4:28 PM, Jared Stabach
<jstabach at rams.colostate.edu> wrote:
> I have a dataframe of animal locations that I need to have in incremental
> order so that I can calculate the distance traveled between each time step.
> However, I have identified a few values that don't seem to sort properly.
> For instance, the last value in the table below should be the first value
> after sorting, since its time value is '00:01:35'.  But, for some reason, it
> seems to be recognized after the '21:01:13' value.  I also defined the time
> column as a numeric value (originally a factor) with the result shown in the
> 'test' column.  As the value is reported as '1132', it seems there is an
> issue with the time value listed.
>
> ID    year  month  day  time      Lat         Long        test
> 2836  2010  7      1    03:00:50  -1.2402597  35.5405911  77
> 2836  2010  7      1    06:00:20  -1.2397508  35.5406318  120
> 2836  2010  7      1    07:00:42  -1.2431248  35.5388285  214
> 2836  2010  7      1    08:00:42  -1.2396636  35.5285848  300
> 2836  2010  7      1    09:00:12  -1.2304111  35.5139149  345
> 2836  2010  7      1    10:00:15  -1.2255532  35.5162895  436
> 2836  2010  7      1    11:00:44  -1.2248113  35.5147305  528
> 2836  2010  7      1    12:00:17  -1.2251362  35.4917310  585
> 2836  2010  7      1    13:00:38  -1.2246384  35.4918846  665
> 2836  2010  7      1    14:00:25  -1.2245949  35.4918647  727
> 2836  2010  7      1    15:00:53  -1.2269631  35.4880909  813
> 2836  2010  7      1    16:00:11  -1.2264911  35.4837137  846
> 2836  2010  7      1    17:00:23  -1.2251153  35.4817967  928
> 2836  2010  7      1    18:00:47  -1.2315372  35.4806165  1027
> 2836  2010  7      1    21:01:13  -1.2578944  35.4670629  1093
> 2836  2010  7      1    00:01:35  -1.2420750  35.5449559  1132
>
> The code I used to sort the dataframe is:
>
> # Sort dataset so values are in incremental order
> temp.or <- order(wildebeest$ID, wildebeest$year, wildebeest$month,
>                  wildebeest$day, wildebeest$time, decreasing=FALSE)
> wildebeest <- wildebeest[temp.or,]
>
> Eventually, I will have around 400,000 records, so my script is designed at
> problem solving these errors.  Is there something that I am missing or is
> there something in this field that could possibly be hidden?  Any
> suggestions?
>
> Thanks in advance for any help.
>
> Jared
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


