[R] How to extract Friday data from daily data.
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Nov 6 01:24:42 CET 2010
On Fri, Nov 5, 2010 at 1:22 PM, thornbird <huachang396 at gmail.com> wrote:
>
> I am new to using R for data analysis. I have an incomplete time series
> dataset that is in daily format. I want to extract only Friday data from it.
> However, there are two problems with it.
>
> First, if Friday data is missing in that week, I need to extract the data of
> the day prior to that Friday (e.g. Thursday).
>
> Second, sometimes there are duplicate Friday data (say Friday morning and
> afternoon), but I only need the latest one (Friday afternoon).
>
> My question is how I can only extract the Friday data and make it a new
> dataset so that I have data for every single week for the convenience of
> data analysis.
>
There are several approaches depending on exactly what is to be
produced. We show two of them here using zoo.
# read in data
Lines <- " views number timestamp day time
1 views 910401 1246192687 Sun 6/28/2009 12:38
2 views 921537 1246278917 Mon 6/29/2009 12:35
3 views 934280 1246365403 Tue 6/30/2009 12:36
4 views 986463 1246888699 Mon 7/6/2009 13:58
5 views 995002 1246970243 Tue 7/7/2009 12:37
6 views 1005211 1247079398 Wed 7/8/2009 18:56
7 views 1011144 1247135553 Thu 7/9/2009 10:32
8 views 1026765 1247308591 Sat 7/11/2009 10:36
9 views 1036856 1247436951 Sun 7/12/2009 22:15
10 views 1040909 1247481564 Mon 7/13/2009 10:39
11 views 1057337 1247568387 Tue 7/14/2009 10:46
12 views 1066999 1247665787 Wed 7/15/2009 13:49
13 views 1077726 1247778752 Thu 7/16/2009 21:12
14 views 1083059 1247845413 Fri 7/17/2009 15:43
15 views 1083059 1247845824 Fri 7/17/2009 18:45
16 views 1089529 1247914194 Sat 7/18/2009 10:49"
library(zoo)
# read in and create a zoo series
# - skip = 1 skips over the header line
# - colClasses= specifies "NULL" for columns we want to remove
# - col.names= as specified (NA for the removed columns)
# - index = 3: the time index is the third non-removed column
# - format= converts the index to Date class using the indicated format
# - aggregate= collapses duplicate dates, keeping the last value
colClasses <-
  c("NULL", "NULL", "numeric", "numeric", "NULL", "character", "NULL")
col.names <- c(NA, NA, "views", "number", NA, NA, NA)
# to read from a file instead, start the call like this:
# z <- read.zoo("myfile.dat", skip = 1, index = 3,
z <- read.zoo(textConnection(Lines), skip = 1, index = 3,
  format = "%m/%d/%Y", col.names = col.names,
  aggregate = function(x) tail(x, 1), colClasses = colClasses)
## Now that we have read it in, let's process it
## 1.
# extract all Thursdays and Fridays
z45 <- z[format(time(z), "%w") %in% 4:5,]
# keep last entry in each week
# and show result on R console
z45[!duplicated(format(time(z45), "%U"), fromLast = TRUE), ]
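
One caveat with approach 1: "%U" is only the week of the year, so if the
series covers more than one year, the same week number in different years
is treated as a duplicate. A minimal base R sketch of the difference,
using made-up dates (not the poster's data) and a year-plus-week key as
one possible workaround:

```r
# hypothetical dates: a Thursday and a Friday in week 28 of two different years
d <- as.Date(c("2009-07-16", "2009-07-17", "2010-07-15", "2010-07-16"))

wk <- format(d, "%U")      # week of year only
yw <- format(d, "%Y-%U")   # year and week of year

# keying on the week alone keeps a single date across both years
keep_week_only <- !duplicated(wk, fromLast = TRUE)
d[keep_week_only]

# keying on year and week keeps the last date of each week in each year
keep_year_week <- !duplicated(yw, fromLast = TRUE)
d[keep_year_week]
```

With the zoo series above, the same idea would amount to using "%Y-%U" in
place of "%U" in the duplicated() call.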
# 2. alternative approach
# The approach above keeps each point's original date,
# so if Thursday is used the result carries that Thursday's date.
# Another approach is to always label the resulting point as Friday
# and to use the last available value even if it's not from Thursday.
# create daily grid
g <- seq(start(z), end(z), by = "day")
# fill in daily grid so Friday is filled in with prior value
# if Friday is NA
z.filled <- na.locf(z, xout = g)
# extract Fridays (including those filled in from previous)
# and show result on R console
z.filled[format(time(z.filled), "%w") == "5", ]
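
For readers who want to see what approach 2 does without zoo, here is a
rough base R sketch of the same idea: for each Friday in the range, take
the last observation on or before it. It uses four dates and values taken
from the sample data; the names dates, vals, fridays, idx and res are
made up for this illustration and the sketch is not a replacement for the
na.locf code above.

```r
# a few observations from the sample data (date and the "number" column)
dates <- as.Date(c("2009-06-30", "2009-07-09", "2009-07-11", "2009-07-17"))
vals  <- c(934280, 1011144, 1026765, 1083059)

# every Friday between the first and last observation
grid    <- seq(min(dates), max(dates), by = "day")
fridays <- grid[format(grid, "%w") == "5"]

# index of the last observation on or before each Friday
# (dates must be sorted in increasing order for findInterval)
idx <- findInterval(as.numeric(fridays), as.numeric(dates))

res <- data.frame(friday = fridays, value = vals[idx])
res
```

Here the Friday 2009-07-03 is filled from Tuesday 2009-06-30 and the
Friday 2009-07-10 from Thursday 2009-07-09, which is the practical
difference from approach 1: every Friday in the range gets a row.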
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com