[R] Sum Question

Marc Schwartz marc_schwartz at me.com
Thu Jun 30 19:30:27 CEST 2011


On Jun 30, 2011, at 11:20 AM, Edgar Alminar wrote:

>>> I did this:
>>> 
>>> library(data.table)
>>> 
>>> dd <- data.table(bl)
>>> dd[,sum(as.integer(CONTTIME)), by = SCRNO]
>>> 
>>> (I used as.integer because I got an error message: sum not meaningful for factors)
>>> 
>>> And got this:
>>> 
>>>           SCRNO  V1
>>> [1,] HBA0020036 111
>>> [2,] HBA0020087  71
>>> [3,] HBA0020209 140
>>> [4,] HBA0020213 189
>>> [5,] HBA0020222 174
>>> [6,] HBA0020292 747
>>> [7,] HBA0020310  57
>>> [8,] HBA0020317 291
>>> [9,] HBA0020365 417
>>> [10,] HBA0020366 124
>>> 
>>> All the sums are way too big. Is there something making it not add up correctly?
>>> 
>>> Original dataset:
>>> 
>     RID      SCRNO VISCODE RECNO CONTTIME
> 338   43 HBA0020036      bl     1        9
> 1187  95 HBA0020087      bl     1        3
> 3251 230 HBA0020209      bl     2        3
> 3258 230 HBA0020209      bl     1       28
> 3321 235 HBA0020213      bl     2        5
> 3351 235 HBA0020213      bl     1        6
> 3436 247 HBA0020222      bl     1        5
> 3456 247 HBA0020222      bl     2        4
> 4569 321 HBA0020292      bl    13        2
> 4572 321 HBA0020292      bl     5       13
> 4573 321 HBA0020292      bl     1       25
> 4576 321 HBA0020292      bl     7        5
> 4578 321 HBA0020292      bl     8        2
> 4581 321 HBA0020292      bl     4        4
> 4582 321 HBA0020292      bl     9        5
> 4586 321 HBA0020292      bl    12        2
> 4587 321 HBA0020292      bl     6        2
> 4590 321 HBA0020292      bl    10        3
> 4591 321 HBA0020292      bl    11        7


That is not the entire dataset; HBA0020366, for example, is missing.

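I don't use the data.table package, but if you are getting an error indicating that CONTTIME is a factor, then something is wrong with either the data itself (there are non-numeric entries) or the way in which it was entered/imported into R. Note also that as.integer() on a factor returns the internal level codes, not the original values, which would explain the inflated sums.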

Thus, I would first check your data for errors. Use str(YourDataSet) to review its structure, and if CONTTIME is a factor, look through the data to see why.
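A minimal sketch of that check, using a hypothetical factor in place of your CONTTIME column, with "n/a" standing in for whatever stray entry your file might actually contain. Values that come back NA after conversion are the ones to look at:

```r
# Hypothetical CONTTIME values; "n/a" stands in for a stray non-numeric entry
conttime <- factor(c("9", "3", "28", "n/a", "5"))
str(conttime)  # reports a factor, not a numeric vector

# Convert via the labels; non-numeric entries become NA (warning suppressed)
num <- suppressWarnings(as.numeric(as.character(conttime)))

# The offending entries are those that failed to convert
bad <- as.character(conttime[is.na(num)])
print(bad)
```

On the real data you would run the same two lines on YourDataSet$CONTTIME and fix the source file rather than the converted column.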

Lastly, review this R FAQ entry:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
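To illustrate what that FAQ entry covers, here is a short sketch using the CONTTIME values for HBA0020292 from your posting, stored as a factor to mimic the import problem. as.integer() returns the level codes, so its sum does not come out as 70; converting through the labels, as the FAQ recommends, gives the right total:

```r
# CONTTIME values for SCRNO HBA0020292, taken from the posted data,
# stored as a factor to mimic the problem in the original question
f <- factor(c("2", "13", "25", "5", "2", "4", "5", "2", "2", "3", "7"))

# Wrong: as.integer() returns the internal level codes, not the values
codes_sum <- sum(as.integer(f))

# Right (per the R FAQ): convert via the level labels
vals_a <- as.numeric(as.character(f))
vals_b <- as.numeric(levels(f))[f]  # same result, converts each level once

sum(vals_a)
```

Either conversion works; the second converts only the distinct levels, which can matter on long columns.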

Just as an alternative, with your data in 'DF':

> DF
     RID      SCRNO VISCODE RECNO CONTTIME
338   43 HBA0020036      bl     1        9
1187  95 HBA0020087      bl     1        3
3251 230 HBA0020209      bl     2        3
3258 230 HBA0020209      bl     1       28
3321 235 HBA0020213      bl     2        5
3351 235 HBA0020213      bl     1        6
3436 247 HBA0020222      bl     1        5
3456 247 HBA0020222      bl     2        4
4569 321 HBA0020292      bl    13        2
4572 321 HBA0020292      bl     5       13
4573 321 HBA0020292      bl     1       25
4576 321 HBA0020292      bl     7        5
4578 321 HBA0020292      bl     8        2
4581 321 HBA0020292      bl     4        4
4582 321 HBA0020292      bl     9        5
4586 321 HBA0020292      bl    12        2
4587 321 HBA0020292      bl     6        2
4590 321 HBA0020292      bl    10        3
4591 321 HBA0020292      bl    11        7


> aggregate(CONTTIME ~ SCRNO, data = DF, FUN = sum)
       SCRNO CONTTIME
1 HBA0020036        9
2 HBA0020087        3
3 HBA0020209       31
4 HBA0020213       11
5 HBA0020222        9
6 HBA0020292       70


See ?aggregate
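For completeness, a self-contained sketch of the above on a subset of your posted rows, with CONTTIME deliberately stored as a factor to reproduce your situation. The conversion step comes first; tapply() is shown as another base-R way to get the same per-SCRNO totals:

```r
# A subset of the posted data; CONTTIME is deliberately stored as a factor
DF <- data.frame(
  SCRNO = c("HBA0020036", "HBA0020087", "HBA0020209", "HBA0020209",
            "HBA0020213", "HBA0020213"),
  CONTTIME = factor(c("9", "3", "3", "28", "5", "6"))
)

# Convert the factor back to numbers before summing (see the R FAQ entry)
DF$CONTTIME <- as.numeric(as.character(DF$CONTTIME))

# Formula interface: refer to columns of 'data' by name
res <- aggregate(CONTTIME ~ SCRNO, data = DF, FUN = sum)
print(res)

# Base-R alternative returning a named array of totals
tot <- tapply(DF$CONTTIME, DF$SCRNO, sum)
print(tot)
```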

HTH,

Marc Schwartz


