[R] duplicate values
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Nov 16 19:43:03 CET 2008
Is the question 'duplicated next to each other' or 'duplicated anywhere
later'? I read it as the latter, so would use
dup <- duplicated(x$dt)
or
dup <- duplicated(x[c("Date", "time")])
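Here is a minimal sketch of how the two readings differ. It uses a small hypothetical data frame with the same column names as the read.table call quoted below (`Date`, `time`, `Temperature`). The shuffled row order, and parsing in `tz = "UTC"` to sidestep DST, are assumptions added for illustration. Comparing each row with the previous one (the `diff()` approach) misses a repeat that is not next to its twin, while `duplicated()` catches it. Once the rows are sorted by time, the two approaches agree.

```r
# Hypothetical data: 03:00:00 and 04:00:00 each appear twice, never in adjacent rows
x <- data.frame(
  Date        = rep("2008-6-1", 5),
  time        = c("03:00:00", "04:00:00", "03:00:00", "04:00:00", "05:00:00"),
  Temperature = c(6, 6, 0, 0, 7),
  stringsAsFactors = FALSE
)
# tz = "UTC" is an assumption, chosen so that no DST transition can interfere
x$dt <- as.POSIXct(paste(x$Date, x$time), tz = "UTC")

# 'duplicated next to each other': compare each row with the previous one
adjacent <- c(FALSE, diff(x$dt) == 0)

# 'duplicated anywhere later': flag any repeat of an earlier value
anywhere_dt   <- duplicated(x$dt)
anywhere_cols <- duplicated(x[c("Date", "time")])

x[!anywhere_dt, ]  # keeps the first row seen for each timestamp

# Sorting first makes every repeat adjacent, so both approaches then agree
s <- x[order(x$dt), ]
adjacent_sorted <- c(FALSE, diff(s$dt) == 0)
```

Note that `order()` leaves tied rows in their original order, so after sorting, the row that came first in the file is still the one kept.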
Also, be very careful: date-time values like these can be duplicated and yet
refer to different times on days when DST ends. E.g. there are both
"2008-10-26 02:30:00 CEST"
"2008-10-26 02:30:00 CET"
in the timezone of Germany (at least with the names my system gives me in
English).
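The DST trap can be reproduced without depending on the machine's own time zone. This sketch assumes the system time-zone database knows "Europe/Berlin". It builds two instants one hour apart from UTC and shows that their German wall-clock text is identical. As a result, `duplicated()` on the text (as with the `Date` and `time` columns) flags the second one, while the `POSIXct` values themselves are distinct. Storing or parsing times in UTC is one way to avoid the ambiguity; that remedy is a suggestion, not something stated in the reply.

```r
# Two instants one hour apart, given in UTC: 00:30 and 01:30 on 2008-10-26
t_utc <- as.POSIXct("2008-10-26 00:30:00", tz = "UTC") + c(0, 3600)

# German local wall-clock text (DST ended at 01:00 UTC that day)
wall <- format(t_utc, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")
print(wall)  # both read "2008-10-26 02:30:00"

# With the zone abbreviation shown, they differ (names depend on the system)
print(format(t_utc, tz = "Europe/Berlin", usetz = TRUE))

duplicated(wall)   # the second value is flagged as a duplicate
duplicated(t_utc)  # the underlying instants are distinct
```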
On Sun, 16 Nov 2008, jim holtman wrote:
> This should do it for you:
>
>> x <- read.table(textConnection( "Date time Temperature
> + 1 2008-6-1 00:00:00 5
> + 2 2008-6-1 02:00:00 5
> + 3 2008-6-1 03:00:00 6
> + 4 2008-6-1 03:00:00 0
> + 5 2008-6-1 04:00:00 6
> + 6 2008-6-1 04:00:00 0
> + 7 2008-6-1 05:00:00 7
> + 8 2008-6-1 06:00:00 7"), header=TRUE)
>> closeAllConnections()
>> # create datetime
>> x$dt <- as.POSIXct(paste(x$Date, x$time))
>> # flag rows whose datetime equals that of the previous row
>> dup <- c(FALSE, diff(x$dt) == 0)
>> # remove
>> x[!dup,]
> Date time Temperature dt
> 1 2008-6-1 00:00:00 5 2008-06-01 00:00:00
> 2 2008-6-1 02:00:00 5 2008-06-01 02:00:00
> 3 2008-6-1 03:00:00 6 2008-06-01 03:00:00
> 5 2008-6-1 04:00:00 6 2008-06-01 04:00:00
> 7 2008-6-1 05:00:00 7 2008-06-01 05:00:00
> 8 2008-6-1 06:00:00 7 2008-06-01 06:00:00
>
>
> On Sun, Nov 16, 2008 at 1:10 PM, Antje Nöthlich <antno at web.de> wrote:
>> Hi R users,
>>
>> I have the following data frame:
>>
>> Datetime Temperature and many more columns
>> 1 2008-6-1 00:00:00 5
>> 2 2008-6-1 02:00:00 5
>> 3 2008-6-1 03:00:00 6
>> 4 2008-6-1 03:00:00 0
>> 5 2008-6-1 04:00:00 6
>> 6 2008-6-1 04:00:00 0
>> 7 2008-6-1 05:00:00 7
>> 8 2008-6-1 06:00:00 7
>> . . .
>> . . .
>> . . .
>> 3000 2008-8-31 00:00:00 3
>>
>>
>> The problem is that rows 3 & 4 and rows 5 & 6 have the same "Datetime" value but differ in the values of the "Temperature" column.
>> For the whole data frame, I would like to delete rows that have the same "Datetime" value as the prior row.
>> I have tried unique(dataframe), but it does not work here because the rows are not exact duplicates of each other.
>> Thanks in advance for your help!
>>
>> Antje
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595