[R] Problem with "apply"
Tobias Verbeke
tobias.verbeke at telenet.be
Wed Apr 22 21:23:28 CEST 2009
Marc Schwartz wrote:
> The cut() function will do what you want in a vectorized fashion. See ?cut
>
> However, that being said, I would strongly advise that you read Frank's
> page on the categorizing of continuous variables:
>
> http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous
>
> before you proceed.
A simple example of how to use it for your problem would be:
set.seed(158)
ages <- sample(0:100, 50, TRUE)
head(ages)
ageGroups <- cut(ages, breaks = c(-1,5,15,30,70,80,150), right = FALSE,
                 labels = c("0-4", "5-14", "15-29", "30-69", "70-79", "80+"))
head(ageGroups)
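With right = FALSE each interval is closed on the left and open on the right, so the boundary ages are worth a quick check. The following is only a sketch: the test vector edgeAges, the name edgeGroups and the open-ended upper break Inf are my own choices for illustration.

```r
# Lowest and highest age of each class (125 is the largest age in the table)
edgeAges <- c(0, 4, 5, 14, 15, 29, 30, 69, 70, 79, 80, 125)
groupLabels <- c("0-4", "5-14", "15-29", "30-69", "70-79", "80+")

# [0,5), [5,15), [15,30), [30,70), [70,80), [80,Inf)
edgeGroups <- cut(edgeAges, breaks = c(0, 5, 15, 30, 70, 80, Inf),
                  right = FALSE, labels = groupLabels)

# Each pair of boundary ages should land in the same class
data.frame(age = edgeAges, group = edgeGroups)
```

If a boundary age ends up in the neighbouring class, the breaks or the right argument need adjusting.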
See ?cut
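As to why the apply() version misbehaves, one plausible explanation (an assumption, since str(mds) was not posted) is that mds is a data frame with at least one non-numeric column. apply() coerces a data frame with as.matrix(), which in that case returns a character matrix in which numeric columns are passed through format() and padded to a common width. A one-digit age then becomes " 3" and no longer matches 0:4. The data frame toy below is a made-up stand-in for mds:

```r
# Hypothetical stand-in for mds: one character column plus a numeric age
toy <- data.frame(id = c("a", "b", "c"), age = c(3, 82, 40),
                  stringsAsFactors = FALSE)

# This is the coercion apply() performs on a data frame
m <- as.matrix(toy)
m[, "age"]                           # character, padded to equal width

inMatrix <- m[1, "age"] %in% 0:4     # compares " 3" with "0".."4"
inColumn <- toy$age[1] %in% 0:4      # compares 3 with 0:4
c(inMatrix = inMatrix, inColumn = inColumn)
```

That would also fit the observation that a subset containing a three-digit age loses all ages below 100, and that the purely numeric matrix xx works fine. If this is indeed the cause, working on the column itself, e.g. cut(mds[, 11], ...), sidesteps the coercion.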
HTH,
Tobias
> On Apr 22, 2009, at 1:56 PM, Alan Cohen wrote:
>
>> Hi R users,
>>
>> I am trying to assign ages to age classes for a large data set
>> (123,000 records), and using a for-loop was too slow, so I wrote a
>> function and used apply. However, the function does not properly
>> assign the first two classes (the rest are fine). It appears that
>> when age is one digit, it does not get assigned properly.
>>
>> I tried to provide a small-scale work-up (at the end of the email) but
>> it does not reproduce the problem; the best I can do is to provide my
>> code and the output below. As you can see, I've confirmed that age is
>> numeric, that all values are integers, and that pieces of the code
>> work independently. Any thoughts would be appreciated.
>>
>> To add to the mystery, depending which rows of my data set I select, I
>> get different problems. mds[1:100,] gives the problem above, as do
>> mds[100:200,] , mds[150:250,] and mds[10000:10100,]. However, with
>> mds[200:300,], mds[250:350,] and mds[1000:1100,], only ages with 3
>> digits are correctly assigned - all ages <100 are returned as NA.
>>
>> I'm using R v 2.8.1 on Windows XP.
>>
>> Cheers,
>> Alan Cohen
>> Centre for Global Health Research,
>> Toronto,ON
>>
>>> ageassign <- function(x){
>> + y <- NA
>> + if (x[11] %in% c(0:4)) {y <- "0-4"}
>> + else if (x[11] %in% c(5:14)) {y <- "5-14" }
>> + else if (x[11] %in% c(15:29)) {y <- "15-29" }
>> + else if (x[11] %in% c(30:69)) {y <- "30-69"}
>> + else if (x[11] %in% c(70:79)) {y <- "70-79"}
>> + else if (x[11] %in% c(80:125)) {y <- "80+"}
>> + return(y)
>> + }
>>> jj <- apply(mds[1:100,],1,FUN=ageassign)
>>> jj
>> 1 2 3 4 5 6 7 8
>> 9 10 11 12 13
>> NA "80+" "30-69" "30-69" "80+" NA "30-69" "30-69" "70-79"
>> "15-29" "15-29" "30-69" "70-79"
>> 14 15 16 17 18 19 20 21
>> 22 23 24 25 26
>> "80+" NA "30-69" "30-69" "30-69" "80+" "80+" "15-29" "70-79"
>> "30-69" "70-79" "70-79" "30-69"
>> 27 28 29 30 31 32 33 34
>> 35 36 37 38 39
>> "70-79" "80+" NA "80+" "70-79" NA "15-29" "15-29"
>> NA NA "70-79" "30-69" "30-69"
>> 40 41 42 43 44 45 46 47
>> 48 49 50 51 52
>> "70-79" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "70-79"
>> "15-29" "30-69" NA "15-29" "30-69"
>> 53 54 55 56 57 58 59 60
>> 61 62 63 64 65
>> "30-69" NA "70-79" "30-69" "30-69" "30-69" "30-69" "15-29"
>> "30-69" "30-69" "70-79" "30-69" NA
>> 66 67 68 69 70 71 72 73
>> 74 75 76 77 78
>> "30-69" "30-69" "30-69" "30-69" "30-69" "80+" "30-69" "80+"
>> "70-79" "30-69" "30-69" "30-69" NA
>> 79 80 81 82 83 84 85 86
>> 87 88 89 90 91
>> "30-69" "30-69" "30-69" NA "80+" "30-69" "30-69" "30-69"
>> NA "15-29" "30-69" "30-69" "30-69"
>> 92 93 94 95 96 97 98 99 100
>> "30-69" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "30-69" "30-69"
>>> mds[1:100,11]
>>  [1]  3 82 40 35 82  1 37 57 71 22 21 52 73 86  1 43 60 63 84 88
>> [21] 29 73 69 75 73 43 75 83  4 83 77  1 27 15  1  6 76 51 45 71
>> [41] 54 64 69 70 48 38 74 26 37  4 18 63 59  8 78 63 67 62 50 21
>> [61] 66 69 75 57  4 50 58 60 61 62 83 69 92 75 30 49 69  1 69 63
>> [81] 69  0 93 64 59 69  2 25 32 60 66 67 54 53 64 79 59 49 59 64
>>> table(mds[,11])
>>
>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13
>> 14 15 16 17 18 19
>> 3123 6441 3856 2884 1968 1615 1386 1088 1098 721 943 681 511 380
>> 426 835 571 555 719 653
>> 20 21 22 23 24 25 26 27 28 29 30 31 32 33
>> 34 35 36 37 38 39
>> 879 715 672 631 655 773 680 713 769 538 685 566 729 702
>> 652 766 683 723 821 675
>> 40 41 42 43 44 45 46 47 48 49 50 51 52 53
>> 54 55 56 57 58 59
>> 774 650 908 892 784 925 781 1043 1161 924 1087 827 1261 1356
>> 1297 1272 1277 1614 1831 1523
>> 60 61 62 63 64 65 66 67 68 69 70 71 72 73
>> 74 75 76 77 78 79
>> 1702 1251 1954 2157 1901 2090 1874 2705 3085 2529 2488 1777 2701 2586
>> 2308 2020 1801 2269 2486 1856
>> 80 81 82 83 84 85 86 87 88 89 90 91 92 93
>> 94 95 96 97 98 99
>> 1762 1047 1413 1326 967 1013 753 870 884 531 601 277 364 301
>> 193 288 149 174 169 470
>> 100 101 102 103 104 105 106 107 108 114 115 117 118 120 125
>> 15 2 5 7 2 4 1 1 2 1 1 2 2 2 1
>>> mode(mds[,11])
>> [1] "numeric"
>>
>>> mds[1,11] %in% c(0:4)
>> [1] TRUE
>>> if (mds[1,11] %in% c(0:4)) {y <- "0-4"}
>>> y
>> [1] "0-4"
>>
>>> xx <- matrix(trunc(runif(30,0,125)),15,2)
>>> aassign <- function(x){
>> + y <- NA
>> + if (x[2] %in% c(0:4)) {y <- "0-4"}
>> + else if (x[2] %in% c(5:14)) {y <- "5-14" }
>> + else if (x[2] %in% c(15:29)) {y <- "15-29" }
>> + else if (x[2] %in% c(30:69)) {y <- "30-69"}
>> + else if (x[2] %in% c(70:79)) {y <- "70-79"}
>> + else if (x[2] %in% c(80:125)) {y <- "80+"}
>> + return(y)
>> + }
>>> jj <- apply(xx,1,FUN=aassign)
>>> t(xx)
>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
>> [1,]   23   98  107   94   76  103  106   40
>> [2,]   11   57   58   91   43  123  103   77
>>      [,9] [,10] [,11] [,12] [,13] [,14] [,15]
>> [1,]   66    11   109   101    96    37    18
>> [2,]    4    79    64    10     8   105    76
>>> jj
>>  [1] "5-14"  "30-69" "30-69" "80+"   "30-69" "80+"   "80+"   "70-79"
>>  [9] "0-4"   "70-79" "30-69" "5-14"  "5-14"  "80+"   "70-79"
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.