[R] cut POSIX results in NA - bug?
Petr Pikal
petr.pikal at precheza.cz
Wed Nov 3 16:51:01 CET 2004
Dear Prof. Ripley,
Thank you very much for the explanation (without it I would not have
considered that include.lowest had anything to do with my observation).
I changed my code to get rid of single final POSIX dates.
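A minimal sketch of one way to group by hour without depending on where cut() places its final break: truncate each timestamp to the start of its hour and group on that. This is not necessarily the change described above. The `value` vector is made-up placeholder data, and `hour_start`, `counts` and `hourly_mean` are names chosen only for this illustration.

```r
# Same minute-by-minute sequence as in the thread.
datum <- seq(ISOdate(2004, 8, 31), ISOdate(2004, 9, 1), "min")

# 60 minutes of one hour plus the first minute of the next hour.
x <- datum[1321:1381]

# Placeholder measurements (assumption: invented for this sketch).
value <- seq_along(x)

# Truncate each timestamp to the start of its hour and use that as the group.
hour_start <- format(trunc(x, "hours"), "%Y-%m-%d %H:%M:%S")

counts      <- table(hour_start)              # observations per hour
hourly_mean <- tapply(value, hour_start, mean) # hourly averages

print(counts)
print(hourly_mean)
```

Here the single trailing timestamp forms its own group of size one, instead of becoming NA or being merged into the preceding hour.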
BTW, the cut.POSIXt help page does not mention include.lowest, and I
think that in the case of dates it does something that is perhaps not
so *understandable* (61 minutes in one hour).
datum<-seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")
# part of a datum variable
datum[1379:1381]
[1] "2004-09-01 12:58:00 Střední Evropa (letní čas)"
    "2004-09-01 12:59:00 Střední Evropa (letní čas)"
[3] "2004-09-01 13:00:00 Střední Evropa (letní čas)"
# the last item seems to me to belong to the interval from 13:00:00 to
# 13:59:00, i.e. it is part of the thirteenth hour of the day
cut(datum[1370:1381], "hour", include.lowest=TRUE)
# this puts it into the previous hour
[1] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
    2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
[7] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
    2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
Levels: 2004-09-01 12:00:00
cut(datum[1370:1381],"hour")
# this drops it from the result (NA): correct but unfortunate
[1] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
    2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
[7] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
    2004-09-01 12:00:00 2004-09-01 12:00:00 <NA>
Levels: 2004-09-01 12:00:00
# so, as a result, one hour can contain 61 minutes
levels(cut(datum[1321:1381],"hour", include.lowest=T))
[1] "2004-09-01 12:00:00"
length(cut(datum[1321:1381],"hour", include.lowest=T)) #???
[1] 61
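The same interval rule can be seen with plain numbers. The sketch below uses the default numeric cut() rather than cut.POSIXt, so it only illustrates the rule, with minutes 0 to 60 standing in for the timestamps and a single bin from 0 to 60. The variable names are illustrative only.

```r
minutes <- 0:60        # 61 values: 0, 1, ..., 60
breaks  <- c(0, 60)    # one bin meant to cover a single hour

# right = FALSE gives a half-open bin [0, 60): the upper break is excluded.
half_open <- cut(minutes, breaks, right = FALSE)

# include.lowest = TRUE closes the last bin, giving [0, 60].
closed <- cut(minutes, breaks, right = FALSE, include.lowest = TRUE)

sum(is.na(half_open))  # 1: only the value 60, which sits on the upper break
sum(!is.na(closed))    # 61: the upper break is now included in the bin
```

With right = FALSE, include.lowest = TRUE applies to the highest break, so the value lying exactly on the upper boundary joins the bin below it. That is how 61 one-minute values can end up in a single one-hour level.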
Is it correct?
Thank you again.
Best regards
Petr Pikal
On 3 Nov 2004 at 11:20, Prof Brian Ripley wrote:
> On Wed, 3 Nov 2004, Petr Pikal wrote:
>
> > Dear all
> >
> > I am trying to compute hourly averages with the cut() function, which
> > almost works as *I* expected. What puzzled me is that if there is only
> > one item at the end of the data, it results in NA.
> >
> > An example will explain what I mean:
> >
> > datum<-seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")
> >
> > cut(datum[1370:1381],"hour", labels=F)
> > [1] 1 1 1 1 1 1 1 1 1 1 1 NA
> >
> > cut(datum[1370:1382],"hour", labels=F)
> > [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
> >
> > I do not understand why the last item in the first call is NA. I found
> > this only when there was a switch from DST to standard time, as it
> > caused trouble in one of my functions and I found an NA
> > value where I did not expect it.
>
> cut(datum[1370:1381],"hour", labels=F, include.lowest=T)
>
> is what you need. See ?cut, in the See Also, which says
>
> include.lowest: logical, indicating if an 'x[i]' equal to the lowest
>           (or highest, for 'right = FALSE') 'breaks' value should be
>           included.
>
> > I can make some workaround, but can you please explain to me why the
> > first call results in an NA value at the end of the vector, and whether
> > it is *intended* behaviour?
>
> It is the documented behaviour, for better or for worse.
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
Petr Pikal
petr.pikal at precheza.cz