[R] Odp: Problem with aggregating data across time points

Chris Beeley chris.beeley at gmail.com
Sat Jul 3 14:06:13 CEST 2010


Thanks for all your help, that has worked a treat. To answer your questions, I want to include the zero rows because I am going to analyse using mixed models (with dummies for day of week, location etc.) and I thought it was necessary to include a complete list of time variables, but now I'm wondering if it is necessary. As for the empty rows, the database is generated automatically by the incidents reporting system and is a bit of a mess, so I want to make sure that the code doesn't stumble over such things. 

Thanks again all!



On 2 Jul 2010, at 17:14, David Winsemius <dwinsemius at comcast.net> wrote:

> 
> On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote:
> 
>> Hi
>> 
>> did you try aggregate?
>> 
>>> aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)
>>  Group.1 verbal self.harm violence_objects violence
>> 1               0         0                0        0
>> 2 01/04/07     25        15                3        9
>> 3 02/04/07     24         6                8       13
>> 4 03/04/07     17        13                0       10
>>> aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)
> 
> That address his A) request:
> 
> Here is the application of aggregate to his B) request (I think):
> 
> # Not e that Date is not of class Date but is rather a factor that includes "" as a level.
> 
> > aggregate(series[, 5:8],list(series$Date, series$Location), sum, na.rm=T)
>    Group.1 Group.2 verbal self.harm violence_objects violence
> 1                        0         0                0        0
> 2                 A      0         0                0        0
> 3  01/04/07       A      7         1                0        3
> 4  02/04/07       A      8         2                0        1
> 5  03/04/07       A      0         0                0        2
> 6                 B      0         0                0        0
> 7  01/04/07       B      3         2                0        1
> 8  02/04/07       B      4         2                0        0
> 9  03/04/07       B      4         0                0        3
> 10                C      0         0                0        0
> 11 01/04/07       C      4         2                3        2
> 12 02/04/07       C      0         0                4        2
> 13 03/04/07       C      1         1                0        5
> 14                D      0         0                0        0
> 15 01/04/07       D      7         6                0        3
> 16 02/04/07       D      0         0                0        9
> 17 03/04/07       D      4        11                0        0
> 18                E      0         0                0        0
> 19 01/04/07       E      4         3                0        0
> 20 02/04/07       E      4         0                4        0
> 21 03/04/07       E      8         1                0        0
> 22                F      0         0                0        0
> 23 01/04/07       F      0         1                0        0
> 24 02/04/07       F      8         2                0        1
> 
> So perhaps an output with less extraneous input would be better:
> 
> > with(series[series$Date != "", ],
>        aggregate(list(verbal=verbal, self.harm=self.harm, viol_obj=violence_objects, violence=violence),
>                  list(Date, Location),
>                  sum, na.rm=T)
>       )
> 
>    Group.1 Group.2 verbal self.harm viol_obj violence
> 1  01/04/07       A      7         1        0        3
> 2  02/04/07       A      8         2        0        1
> 3  03/04/07       A      0         0        0        2
> 4  01/04/07       B      3         2        0        1
> 5  02/04/07       B      4         2        0        0
> 6  03/04/07       B      4         0        0        3
> 7  01/04/07       C      4         2        3        2
> 8  02/04/07       C      0         0        4        2
> 9  03/04/07       C      1         1        0        5
> 10 01/04/07       D      7         6        0        3
> 11 02/04/07       D      0         0        0        9
> 12 03/04/07       D      4        11        0        0
> 13 01/04/07       E      4         3        0        0
> 14 02/04/07       E      4         0        4        0
> 15 03/04/07       E      8         1        0        0
> 16 01/04/07       F      0         1        0        0
> 17 02/04/07       F      8         2        0        1
>> 
>> BTW, why do you have empty rows?
>> 
>> Regards
>> Petr
>> 
>> 
>> 
>> 
>>> Hello-
>>> 
>>> I have a dataset which basically looks like this:
>>> 
>>> Location   Sex       Date          Time   Verbal    Self harm
>>> Violence_objects   Violence
>>> A             1      1-4-2007       1800      3             0
>>>           1                       3
>>> A             1      1-4-2007       1230      2            1
>>>          2                       4
>>> D             2      2-4-2007       1100      0            4
>>>          0                       0
>>> ...
>>> 
>>> I've put a dput of the first section of the data at the end of this
>>> email. Basically I have these data for several days across all of the
>>> dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
>>> until 31-12-2009. The last four variables which you can see at the end
>>> of the email are my dependent variables, they are different types of
>>> violent and self harming behaviour shown by patients in a psychiatric
>>> hospital.
>>> 
>>> What I want to do is:
>>> 
>>> A) sum each of the dependent variables for each of the dates (so e.g.
>>> in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and
>>> 3+4=7 for each of the variables)
>>> 
>>> B) do this sum, but only in each location this time (location is the
>>> first variable)- so the sum for 1-4-2007 in location A, sum for
>>> 1-4-2007 in location B, and so on and so on. Because this is divided
>>> across locations, some dates will have no data going into them and
>>> will return 0 sums. Crucially I still want these dates to appear- so
>>> e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
>>> 0 0, then 23-5-2008 0 0 0 0 again, and etc.
>>> 
>>> I've had several abortive attempts and done some Googling but have got
>>> nowhere. I'd greatly appreciate any advice.
>>> 
>>> Many thanks,
>>> Chris Beeley
>>> (Institute of Mental Health, UK)
>>> 
>>> 
>>> structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L,
>>> 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L,
>>> 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L,
>>> 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L,
>>> 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L,
>>> 4L, 4L, 4L), .Label = c("", "A", "B", "C", "D", "E", "F"), class =
>> "factor"),
>>>   Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L,
>>>   2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L,
>>>   1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L,
>>>   2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA,
>>>   1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L
>>>   ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
>>>   2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
>>>   2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L,
>>>   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L,
>>>   4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L,
>>>   4L, 4L, 4L, 4L), .Label = c("", "01/04/07", "02/04/07", "03/04/07"
>>>   ), class = "factor"), Time = structure(c(1L, 28L, 1L, 1L,
>>>   1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L,
>>>   34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L,
>>>   10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L,
>>>   36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L,
>>>   29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L,
>>>   7L, 12L, 1L, 15L), .Label = c("", " ", "02:24:00", "03:44:00",
>>>   "04:30:00", "07:00:00", "08:35:00", "09:20:00", "09:30:00",
>>>   "10:00:00", "10:15:00", "10:45:00", "11:00:00", "11:20:00",
>>>   "11:30:00", "11:35:00", "11:50:00", "12:00:00", "12:25:00",
>>>   "12:30:00", "12:45:00", "15:00:00", "15:15:00", "15:30:00",
>>>   "15:35:00", "17:15:00", "17:50:00", "18:00:00", "19:00:00",
>>>   "19:30:00", "19:50:00", "20:00:00", "20:30:00", "20:55:00",
>>>   "22:15:00", "22:30:00", "22:35:00", "22:40:00", "23:10:00"
>>>   ), class = "factor"), verbal = c(NA, 3L, NA, NA, NA, 3L,
>>>   0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L,
>>>   NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA,
>>>   NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L,
>>>   0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L,
>>>   0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA,
>>>   NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L,
>>>   2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA,
>>>   0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L,
>>>   NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA,
>>>   1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA,
>>>   0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L,
>>>   NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
>>>   0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>>   4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA,
>>>   0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA,
>>>   0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L,
>>>   NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
>>>   3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
>>>   0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA,
>>>   0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names =
>> c("Location",
>>> "Sex", "Date", "Time", "verbal", "self.harm", "violence_objects",
>>> "violence"), class = "data.frame", row.names = c(NA, -73L))
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius, MD
> West Hartford, CT
> 



More information about the R-help mailing list