[R] Odp: Problem with aggregating data across time points

Petr PIKAL petr.pikal at precheza.cz
Fri Jul 2 17:55:13 CEST 2010


Hi

did you try aggregate?

> aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)
   Group.1 verbal self.harm violence_objects violence
1               0         0                0        0
2 01/04/07     25        15                3        9
3 02/04/07     24         6                8       13
4 03/04/07     17        13                0       10
> aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)

BTW, why do you have empty rows?

Regards
Petr




> Hello-
> 
> I have a dataset which basically looks like this:
> 
> Location   Sex       Date          Time   Verbal    Self harm
> Violence_objects   Violence
>   A             1      1-4-2007       1800      3             0
>             1                       3
>   A             1      1-4-2007       1230      2            1
>            2                       4
>   D             2      2-4-2007       1100      0            4
>            0                       0
> ...
> 
> I've put a dput of the first section of the data at the end of this
> email. Basically I have these data for several days across all of the
> dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
> until 31-12-2009. The last four variables which you can see at the end
> of the email are my dependent variables, they are different types of
> violent and self harming behaviour shown by patients in a psychiatric
> hospital.
> 
> What I want to do is:
> 
> A) sum each of the dependent variables for each of the dates (so e.g.
> in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and
> 3+4=7 for each of the variables)
> 
> B) do this sum, but only in each location this time (location is the
> first variable)- so the sum for 1-4-2007 in location A, sum for
> 1-4-2007 in location B, and so on and so on. Because this is divided
> across locations, some dates will have no data going into them and
> will return 0 sums. Crucially I still want these dates to appear- so
> e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
> 0 0, then 23-5-2008 0 0 0 0 again, and etc.
> 
> I've had several abortive attempts and done some Googling but have got
> nowhere. I'd greatly appreciate any advice.
> 
> Many thanks,
> Chris Beeley
> (Institute of Mental Health, UK)
> 
> 
> structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L,
> 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L,
> 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L,
> 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L,
> 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L,
> 4L, 4L, 4L), .Label = c("", "A", "B", "C", "D", "E", "F"), class = 
"factor"),
>     Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L,
>     2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L,
>     1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L,
>     2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA,
>     1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L
>     ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
>     2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
>     2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L,
>     4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L,
>     4L, 4L, 4L, 4L), .Label = c("", "01/04/07", "02/04/07", "03/04/07"
>     ), class = "factor"), Time = structure(c(1L, 28L, 1L, 1L,
>     1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L,
>     34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L,
>     10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L,
>     36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L,
>     29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L,
>     7L, 12L, 1L, 15L), .Label = c("", " ", "02:24:00", "03:44:00",
>     "04:30:00", "07:00:00", "08:35:00", "09:20:00", "09:30:00",
>     "10:00:00", "10:15:00", "10:45:00", "11:00:00", "11:20:00",
>     "11:30:00", "11:35:00", "11:50:00", "12:00:00", "12:25:00",
>     "12:30:00", "12:45:00", "15:00:00", "15:15:00", "15:30:00",
>     "15:35:00", "17:15:00", "17:50:00", "18:00:00", "19:00:00",
>     "19:30:00", "19:50:00", "20:00:00", "20:30:00", "20:55:00",
>     "22:15:00", "22:30:00", "22:35:00", "22:40:00", "23:10:00"
>     ), class = "factor"), verbal = c(NA, 3L, NA, NA, NA, 3L,
>     0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L,
>     NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA,
>     NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L,
>     0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L,
>     0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA,
>     NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L,
>     2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA,
>     0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L,
>     NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA,
>     1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA,
>     0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L,
>     NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
>     0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>     4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA,
>     0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA,
>     0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L,
>     NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
>     3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
>     0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA,
>     0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names = 
c("Location",
> "Sex", "Date", "Time", "verbal", "self.harm", "violence_objects",
> "violence"), class = "data.frame", row.names = c(NA, -73L))
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list