[R] Irregular time series frequencies

sartene at voila.fr
Fri Nov 1 16:27:38 CET 2013


Thanks a lot Achim!

This helped a lot. I do not have exactly what I want yet, but I now have promising ideas to gather my data and find what I'm looking for (especially as.numeric(x, units = "hours")).

Regards,


Sartene Bel


> Message of 31/10/13 at 08:48
> From: "Achim Zeileis" 
> To: sartene at voila.fr
> Cc: r-help at r-project.org
> Subject: Re: [R] Irregular time series frequencies
> 
> On Wed, 30 Oct 2013, sartene at voila.fr wrote:
> 
> > Hi everyone,
> >
> > I have a data frame with email addresses in the first column and, in the second column, a list of times (of different lengths) at which an email was sent from the user in the first column.
> >
> > Here is an example of my data:
> >
> > Email Email_sent
> > john at doe.com "2013-09-26 15:59:55" "2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54" 
> > jane at shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24" "2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02" "2013-09-27 14:41:10" "2013-09-27 15:37:36"
> > ...
> >
> > I cannot find any way to calculate the frequencies between each email sent for each user:
> > john at doe.com 0.02 email / hour
> > jane at shoe.com 0.15 email / hour
> > ...
> >
> > Can anyone help me on this problem?
> 
> You could do something like this:
> 
> ## scan your data file (supply its path as the first argument to scan())
> d <- scan(, what = "character")
> 
> ## here I use the data from above
> d <- scan(textConnection('john@doe.com "2013-09-26 15:59:55"
> "2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
> jane@shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24"
> "2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02"
> "2013-09-27 14:41:10" "2013-09-27 15:37:36"'), what = "character")
> 
> ## find position of e-mail addresses
> n <- grep("@", d, fixed = TRUE)
> 
> ## extract list of dates
> n <- c(n, length(d) + 1)
> x <- lapply(1:(length(n) - 1),
>   function(i) as.POSIXct(d[(n[i] + 1):(n[i+1] - 1)]))
> 
> ## add e-mail addresses as names
> names(x) <- d[head(n, -1)]
> 
> ## functions that could extract quantities of interest such as
> ## number of mails per hour or mean time difference etc.
> meantime <- function(timevec)
>   mean(as.numeric(diff(timevec), units = "hours"))
> numperhour <- function(timevec)
>   length(timevec) / as.numeric(diff(range(timevec)), units = "hours")
> 
> ## apply to full list
> sapply(x, numperhour)
> sapply(x, meantime)
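
Note that the two helpers above measure slightly different things: numperhour divides the number of emails n by the time span, whereas the reciprocal of meantime equals (n - 1) divided by the span, since n emails have only n - 1 gaps between them. A small sketch with john's timestamps from the thread (tz = "UTC" and the names span and gaps are assumptions for illustration):

```r
# john's timestamps from the example data (tz = "UTC" is an assumption)
timevec <- as.POSIXct(c("2013-09-26 15:59:55", "2013-09-27 09:48:29",
                        "2013-09-27 10:00:02", "2013-09-27 10:12:54"),
                      tz = "UTC")

# Total time between first and last email, in hours
span <- as.numeric(diff(range(timevec)), units = "hours")

# Individual gaps between consecutive emails, in hours
gaps <- as.numeric(diff(timevec), units = "hours")

per_hour_n    <- length(timevec) / span        # what numperhour computes
per_hour_gaps <- (length(timevec) - 1) / span  # intervals per hour
inv_meantime  <- 1 / mean(gaps)                # reciprocal of meantime

per_hour_n
per_hour_gaps
inv_meantime
```

Which of the two is the "frequency" of interest depends on the question being asked; for users with many emails the difference becomes small.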
> 
> ## apply to list by date
> sapply(x, function(timevec) tapply(timevec, as.Date(timevec), numperhour))
> sapply(x, function(timevec) tapply(timevec, as.Date(timevec), meantime))
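
Two details may matter for the by-date variant: a day with a single email has a span of zero, so the rate divides by zero, and the time zone used to split timestamps into days should match the one the timestamps were recorded in. The following is a sketch under those assumptions, not a replacement for the code above; rate_per_hour is a hypothetical helper name and tz = "UTC" is assumed.

```r
# jane's timestamps from the example data (tz = "UTC" is an assumption)
timevec <- as.POSIXct(c("2013-09-26 09:50:28", "2013-09-26 14:41:24",
                        "2013-09-26 14:51:36", "2013-09-26 17:50:10",
                        "2013-09-27 13:34:02", "2013-09-27 14:41:10",
                        "2013-09-27 15:37:36"), tz = "UTC")

# Hypothetical helper: emails per hour between the first and last email,
# returning NA when there are fewer than two emails or the span is zero
rate_per_hour <- function(tv) {
  if (length(tv) < 2) return(NA_real_)
  span <- as.numeric(diff(range(tv)), units = "hours")
  if (span == 0) NA_real_ else length(tv) / span
}

# Label each timestamp with its calendar day, in the same time zone as the data
day <- format(timevec, "%Y-%m-%d", tz = "UTC")

# One rate per day, measured from the first to the last email of that day
daily <- sapply(split(timevec, day), rate_per_hour)
daily
```

Applied to the full list, something like lapply(x, function(tv) sapply(split(tv, format(tv, "%Y-%m-%d")), rate_per_hour)) would give one named vector per user.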
> 
> hth,
> Z
> 
> > The ultimate goal (which seems ambitious at this time) is to calculate, for each user, the frequencies between each mail per day, between the first email sent 
> > and the last email sent each day (to avoid taking nights into account), i.e.:
> >
> > 2013-09-26 2013-09-27
> > john at doe.com 1.32 emails / hour 0.56 emails / hour
> > jane at shoe.com 10.57 emails / hour 2.54 emails / hour
> > ...
> >
> > At this time it seems pretty impossible, but I guess I will eventually find a way :-)
> >
> > Thanks a lot,
> >
> >
> > Sartene Bel
> > R learner
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.