[R] Selecting ranges of dates from a dataframe
David Winsemius
dwinsemius at comcast.net
Fri Mar 11 13:12:08 CET 2011
On Mar 10, 2011, at 8:23 AM, Benjamin Stier wrote:
> Hello list!
>
> I have a data.frame which looks like this:
>> serv
>                   datum op.read op.write   read   write
> 1   2011-01-29 10:00:00       0        0      0       0
> 2   2011-01-29 10:00:01       0        0      0       0
> 3   2011-01-29 10:00:02       0        0      0       0
> 4   2011-01-29 10:00:03       0        4      0  647168
> 5   2011-01-29 10:00:04       0        0      0       0
> 6   2011-01-29 10:00:05       0       14      0 1960837
> 7   2011-01-29 10:00:06       0        0      0       0
> ...
> 115 2011-01-30 10:00:54       0        0      0       0
> 116 2011-01-30 10:00:55       0        0      0       0
> 117 2011-01-30 10:00:56       0        0      0       0
> 118 2011-01-30 10:00:57      54        0  29184       0
> 119 2011-01-30 10:00:58     204        0 122880       0
> 120 2011-01-30 10:00:59       0        0      0       0
> ...
>
> I want to compare read/write from each day. I already have a solution,
> but it is pretty slow.
See if this is any faster:
> aggregate(serv[, c("read", "write")], list(format(serv$datum, "%Y-%m-%d")), sum)
     Group.1    read    write
1 2011-01-29 1021439 11726356
2 2011-01-30 1089534  4634910
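
For anyone without cut.inp, here is a minimal self-contained sketch of the same grouping idea. The data frame is a made-up stand-in (six rows, invented values), so the totals will not match the output above. It also assumes datum is stored as POSIXct; if it is still POSIXlt from strptime(), converting once with as.POSIXct() is one option.

```r
# Hypothetical stand-in for 'serv': two days, three one-second rows each.
# The values are invented for illustration and are not from cut.inp.
serv <- data.frame(
  datum = as.POSIXct(c("2011-01-29 10:00:00", "2011-01-29 10:00:01",
                       "2011-01-29 10:00:02", "2011-01-30 10:00:00",
                       "2011-01-30 10:00:01", "2011-01-30 10:00:02"),
                     tz = "UTC"),
  read  = c(0, 10, 20, 1, 2, 3),
  write = c(5, 0, 5, 0, 0, 7)
)

# Build one character key per row holding only the calendar day
day <- format(serv$datum, "%Y-%m-%d")

# Sum both columns within each day in a single pass
daily <- aggregate(serv[, c("read", "write")], list(day = day), sum)
daily
```

The grouping key is computed once for the whole column, so there is no per-day subsetting and no repeated rbind().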
>
> # read the data
> serv <- read.delim("cut.inp")
>
> # Reformat the dates from the file
> serv$datum <- strptime(serv$datum, "%Y-%m-%d %H:%M:%S")
>
> # select all single days
> dates.serv <- unique(strptime(serv$datum, format="%Y-%m-%d"))
>
> # create a data.frame
> values <- data.frame(row.names=1, datum=numeric(0),
> write=numeric(0), read=numeric(0))
> for(i in as.character(dates.serv)) {
> # build up a values for a day-range
> searchstart <- as.POSIXlt(paste(i, "00:00:00", sep=" "))
> searchend <- as.POSIXlt(paste(i, "23:59:59", sep=" "))
> # select all values from a specific day
> day <- serv[(serv$datum >= searchstart & serv$datum <=
> searchend),]
> write <- as.numeric(sum(as.numeric(day$write)))
> read <- as.numeric(sum(as.numeric(day$read)))
> # add to the data.frame
> values <- rbind(values, data.frame(datum=i, write=write,
> read=read))
> }
>
> This is my first try using R for statistics so I'm sure this isn't the
> best solution.
> The for-loop does its job, but as I said is really slow. My data is for
> 21 days and 1 line per second.
> Is there a better way to select the date-ranges instead of a for-loop?
> The line where I select all values for "day" seems to be the heaviest.
> Any idea?
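
On the heavy line: strptime() returns POSIXlt, which stores each date-time as a list of components, and the loop compares the whole column against the day boundaries once per day and then grows the result with rbind(). I have not profiled it, so treat that as a likely explanation rather than a measurement. A sketch that avoids both, assuming a single conversion to POSIXct and using rowsum() in place of the loop (serv2 and day_key are illustrative names; the four rows are the non-zero rows 4, 6, 118 and 119 from the listing above):

```r
# Four rows taken from the sample listing in the post (rows 4, 6, 118, 119)
datum_chr <- c("2011-01-29 10:00:03", "2011-01-29 10:00:05",
               "2011-01-30 10:00:57", "2011-01-30 10:00:58")

serv2 <- data.frame(
  # Parse once, then keep the compact POSIXct form instead of POSIXlt
  datum = as.POSIXct(strptime(datum_chr, "%Y-%m-%d %H:%M:%S", tz = "UTC")),
  read  = c(0, 0, 29184, 122880),
  write = c(647168, 1960837, 0, 0)
)

# One day label per row, computed for the whole column at once
day_key <- format(serv2$datum, "%Y-%m-%d")

# rowsum() adds up the rows of each column within each group
by_day <- rowsum(serv2[, c("read", "write")], group = day_key)
by_day
```

The result has one row per day, with the day as the row name.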
>
> Kind regards,
>
> Benjamin
>
> PS: I attached some sample data, in case you want to try for yourself.
> <cut.inp>
David Winsemius, MD
West Hartford, CT