[R] scaling to multiple data files
jholtman at gmail.com
Tue Jan 11 17:39:59 CET 2011
I am not sure exactly what your data represents. For example, from
looking at the data it appears that user1 and user2 have been logged
on for about 4 days; is that what the data is saying? If you are
keeping track of users, why not write out a file that has the
start/end time for each user's session? The first time you see them,
put an entry in a table, and as soon as they don't show up in your
sample, write out a record for them. With that information it is easy
to create a report of the number of unique people over time.
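A rough sketch of that bookkeeping in base R follows. It is only an illustration under assumptions: the sample data, the function names build_sessions() and users_per_day(), and the idea that an empty users string means "nobody logged on" are all made up here and are not from your data. A session ends at the first sample in which the user is missing; sessions still open at the last sample are closed at that time.

```r
# Hypothetical samples in the same shape as the posted data (users, datetime).
samples <- data.frame(
  users = c("user1 & user2", "user1 & user2 & user3", "user2 & user3", ""),
  datetime = as.POSIXct(c("2007-04-02 12:00:00", "2007-04-02 16:34:05",
                          "2007-04-02 18:00:00", "2007-04-03 10:00:00"),
                        tz = "US/Eastern"),
  stringsAsFactors = FALSE
)

# Turn consecutive samples into one start/end record per user session.
build_sessions <- function(samples, sep = " & ") {
  samples <- samples[order(samples$datetime), ]
  active <- list()   # user name -> time the user was first seen
  out <- list()      # finished session records
  close_session <- function(u, end) {
    data.frame(user = u, start = active[[u]], end = end,
               stringsAsFactors = FALSE)
  }
  for (i in seq_len(nrow(samples))) {
    now <- samples$datetime[i]
    present <- strsplit(as.character(samples$users[i]), sep, fixed = TRUE)[[1]]
    present <- present[nzchar(present)]
    # Users who no longer show up: write out a record for them.
    for (u in setdiff(names(active), present)) {
      out[[length(out) + 1]] <- close_session(u, now)
      active[[u]] <- NULL
    }
    # Users seen for the first time: put an entry in the table.
    for (u in setdiff(present, names(active))) active[[u]] <- now
  }
  # Close anything still open at the time of the last sample.
  last <- samples$datetime[nrow(samples)]
  for (u in names(active)) out[[length(out) + 1]] <- close_session(u, last)
  do.call(rbind, out)
}

# Count unique users per calendar day, covering every day a session spans.
users_per_day <- function(sessions, tz = "US/Eastern") {
  days <- lapply(seq_len(nrow(sessions)), function(i) {
    d <- seq(as.Date(sessions$start[i], tz = tz),
             as.Date(sessions$end[i], tz = tz), by = "day")
    data.frame(day = d, user = sessions$user[i], stringsAsFactors = FALSE)
  })
  days <- unique(do.call(rbind, days))
  sapply(split(days$user, days$day), function(u) length(unique(u)))
}

sessions <- build_sessions(samples)
counts <- users_per_day(sessions)
```

Here sessions has one row per user session and counts is a named vector indexed by day. For several hosts you could run build_sessions() per host, rbind the results, and call users_per_day() once on the combined table.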
On Tue, Jan 11, 2011 at 10:47 AM, Jason Edgecombe
<jason at rampaginggeek.com> wrote:
> I have logging information for multiple machines, which I am trying to
> summarize and graph. So far, I process each host individually, but I would
> like to summarize the user count across multiple hosts. I want to answer the
> question "how many unique users logged in on a certain day across a group of
> I'm not quite sure how to scale the data frame and analysis to summarize
> multiple hosts, though. I'm still getting a feel for using R.
> Here is a snippet of data for one host. The user_count column is generated
> from the users column using my custom function "usercount()". The samples
> are taken roughly once per minute, and only unique samples are recorded
> (i.e. use na.locf() to uncompress the data). Samples may occur twice in the
> same minute and are rarely aligned on the same time.
> Here is the original data before I turn it into a zoo series and run
> na.locf() over it so I can aggregate a single host by day. I'm open to a
> better way.
>   users                  datetime             user_count
> 1 user1 & user2          2007-03-29 19:16:30  2
> 2 user1 & user2          2007-03-31 00:04:46  2
> 3 user1 & user2          2007-04-02 11:49:20  2
> 4 user1 & user2          2007-04-02 12:02:04  2
> 5 user1 & user2          2007-04-02 12:44:02  2
> 6 user1 & user2 & user3  2007-04-02 16:34:05  3
> structure(list(users = c("user1 & user2", "user1 & user2", "user1 & user2",
> "user1 & user2", "user1 & user2", "user1 & user2 & user3"), datetime =
> structure(c(1175210190, 1175313886, 1175528960, 1175529724, 1175532242,
> 1175546045), class = c("POSIXct", "POSIXt"), tzone = "US/Eastern"),
> user_count = c(2, 2, 2, 2, 2, 3)), .Names = c("users", "datetime",
> "user_count"), row.names = c(NA, 6L), class = "data.frame")
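If you would rather stay with the per-sample data, one possible way to scale to several hosts is to expand each host into one row per (day, user), carrying the last known sample forward into days that have no sample of their own, and then stack the hosts and count unique users per day. The sketch below uses base R only as a stand-in for the zoo/na.locf() step; hostB, the function name users_by_day(), and the " & " separator handling are assumptions for illustration.

```r
# hostA mirrors the posted snippet; hostB is made-up data for a second host.
hostA <- data.frame(
  users = c("user1 & user2", "user1 & user2", "user1 & user2",
            "user1 & user2", "user1 & user2", "user1 & user2 & user3"),
  datetime = as.POSIXct(c("2007-03-29 19:16:30", "2007-03-31 00:04:46",
                          "2007-04-02 11:49:20", "2007-04-02 12:02:04",
                          "2007-04-02 12:44:02", "2007-04-02 16:34:05"),
                        tz = "US/Eastern"),
  stringsAsFactors = FALSE
)
hostB <- data.frame(
  users = c("user2 & user4", "user4"),
  datetime = as.POSIXct(c("2007-03-30 08:00:00", "2007-04-02 09:00:00"),
                        tz = "US/Eastern"),
  stringsAsFactors = FALSE
)

# One row per (day, user) for a single host. Each day uses its own samples
# plus the last sample before that day, since only changes are recorded.
users_by_day <- function(df, tz = "US/Eastern", sep = " & ") {
  df <- df[order(df$datetime), ]
  day <- as.Date(df$datetime, tz = tz)
  all_days <- seq(min(day), max(day), by = "day")
  rows <- lapply(seq_along(all_days), function(k) {
    d <- all_days[k]
    idx <- which(day == d)
    prev <- which(day < d)
    if (length(prev)) idx <- c(max(prev), idx)  # carry last state forward
    u <- unique(unlist(strsplit(as.character(df$users[idx]), sep,
                                fixed = TRUE)))
    if (!length(u)) return(NULL)
    data.frame(day = d, user = u, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Stack all hosts, then count unique users per day across the group.
hosts <- list(hostA = hostA, hostB = hostB)
combined <- do.call(rbind, lapply(hosts, users_by_day))
counts <- sapply(split(combined$user, combined$day),
                 function(u) length(unique(u)))
```

Adding another host is then just another entry in the hosts list. Note that each host only contributes days between its own first and last sample.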
Data Munger Guru
What is the problem that you are trying to solve?