[R] Sum Question

Dennis Murphy djmuser at gmail.com
Thu Jun 30 22:15:47 CEST 2011


Hi:

Here's a data.table solution. After I read in your data as a data
frame named dd, I used str() to check its contents:
> str(dd)
'data.frame':   19 obs. of  5 variables:
 $ RID     : int  43 95 230 230 235 235 247 247 321 321 ...
 $ SCRNO   : Factor w/ 6 levels "HBA0020036","HBA0020087",..: 1 2 3 3 4 4 5 5 6 6 ...
 $ VISCODE : Factor w/ 1 level "bl": 1 1 1 1 1 1 1 1 1 1 ...
 $ RECNO   : int  1 1 2 1 2 1 1 2 13 5 ...
 $ CONTTIME: int  9 3 3 28 5 6 5 4 2 13 ...

If CONTTIME came through as a factor, my guess is that you put all of
this into a matrix (cbind?) and then passed it to data.table. If so,
it's worth reading up on the differences between matrices and data
frames; a data table is meant to be a generalization of a data frame.
Calling as.integer() on a factor returns its internal level codes, not
the values you see printed, which is why your sums came out too large.
It's important to know the classes of your objects and how to coerce
them from one class to another when necessary.
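
As a rough illustration of the coercion issue, here is a minimal
sketch. The vector x is hypothetical (a few values picked for the
example, not your actual column), and it assumes the factor's levels
are the numbers stored as text:

```r
# Hypothetical values, chosen only to illustrate the coercion issue
x <- factor(c("9", "3", "28", "5"))

levels(x)            # levels are sorted as strings, not as numbers
as.integer(x)        # returns the level codes, not 9, 3, 28, 5
sum(as.integer(x))   # sums the codes, so the total is wrong

# Go through character first to recover the printed values
vals <- as.integer(as.character(x))
sum(vals)            # 9 + 3 + 28 + 5
```

If the column really is meant to be numeric, it is usually cleaner to
fix the class once when the data are read in than to coerce inside
every call.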
That aside,

> library(data.table)
data.table 1.6
Quick start guide : vignette("datatable-intro")
Homepage : http://datatable.r-forge.r-project.org/
Help : help("data.table") or ?data.table (includes fast start examples)
> dt <- data.table(dd, key = 'SCRNO')
> dt[, list(csum = sum(CONTTIME)), by = SCRNO]
          SCRNO csum
[1,] HBA0020036    9
[2,] HBA0020087    3
[3,] HBA0020209   31
[4,] HBA0020213   11
[5,] HBA0020222    9
[6,] HBA0020292   70

Using the list() wrapper is useful, especially if you want to output
multiple variables or if you want to assign a name to the derived
summary variable.
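
For instance, a sketch of returning several named summaries at once
might look like the following. It uses four rows copied from the
dataset quoted below; the column names cmean and n are just names I
made up for the example:

```r
library(data.table)

# Four rows copied from the original dataset in this thread
dd <- data.frame(
  SCRNO    = c("HBA0020209", "HBA0020209", "HBA0020213", "HBA0020213"),
  CONTTIME = c(3L, 28L, 5L, 6L)
)
dt <- data.table(dd, key = "SCRNO")

# Each named element of list() becomes a column in the result
res <- dt[, list(csum  = sum(CONTTIME),
                 cmean = mean(CONTTIME),
                 n     = length(CONTTIME)),
          by = SCRNO]
res
```

Without the names, data.table falls back to default column names such
as V1, which is what you saw in your own output.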

HTH,
Dennis



On Thu, Jun 30, 2011 at 9:20 AM, Edgar Alminar <eaalminar at ucsd.edu> wrote:
>>> I did this:
>>>
>>> library(data.table)
>>>
>>> dd <- data.table(bl)
>>> dd[,sum(as.integer(CONTTIME)), by = SCRNO]
>>>
>>> (I used as.integer because I got an error message: sum not meaningful for factors)
>>>
>>> And got this:
>>>
>>>            SCRNO  V1
>>>  [1,] HBA0020036 111
>>>  [2,] HBA0020087  71
>>>  [3,] HBA0020209 140
>>>  [4,] HBA0020213 189
>>>  [5,] HBA0020222 174
>>>  [6,] HBA0020292 747
>>>  [7,] HBA0020310  57
>>>  [8,] HBA0020317 291
>>>  [9,] HBA0020365 417
>>> [10,] HBA0020366 124
>>>
>>> All the sums are way too big. Is there something making it not add up correctly?
>>>
>>> Original dataset:
>>>
>     RID      SCRNO VISCODE RECNO CONTTIME
> 338   43 HBA0020036      bl     1        9
> 1187  95 HBA0020087      bl     1        3
> 3251 230 HBA0020209      bl     2        3
> 3258 230 HBA0020209      bl     1       28
> 3321 235 HBA0020213      bl     2        5
> 3351 235 HBA0020213      bl     1        6
> 3436 247 HBA0020222      bl     1        5
> 3456 247 HBA0020222      bl     2        4
> 4569 321 HBA0020292      bl    13        2
> 4572 321 HBA0020292      bl     5       13
> 4573 321 HBA0020292      bl     1       25
> 4576 321 HBA0020292      bl     7        5
> 4578 321 HBA0020292      bl     8        2
> 4581 321 HBA0020292      bl     4        4
> 4582 321 HBA0020292      bl     9        5
> 4586 321 HBA0020292      bl    12        2
> 4587 321 HBA0020292      bl     6        2
> 4590 321 HBA0020292      bl    10        3
> 4591 321 HBA0020292      bl    11        7
>


