[R] applying a set of rules to each row
Bert Gunter
gunter.berton at gene.com
Wed Jan 26 22:28:35 CET 2011
... or perhaps just break things up with assignments and do it in stages.
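As a rough sketch of what "in stages" could look like (not a tested solution to the full rule set): compute the intermediate pieces first, then fill the result one rule at a time with logical indexing. Everything here is an assumption made for illustration: the toy data, the helper yrs(), the use of base Date instead of chron dates so the example runs without extra packages, and the age cutoffs, which loosely follow the FERS branch of the posted function and have not been checked against the OPM document.

```r
## Toy data: column names follow the original post, values are invented.
emp <- data.frame(
  DOB         = as.Date(c("1950-06-15", "1960-03-01", "1975-09-30", "1948-01-20")),
  srvCompDT   = as.Date(c("1975-01-01", "2000-05-01", "2005-07-15", "1970-02-01")),
  retirePlan  = c(2, 2, 2, 1),            # 1 = CSRS, 2 = FERS, 3 = other
  ageFedStart = c(24.5, 40.2, 29.8, 22.0)
)

## Hypothetical helper: number of whole days in n years.
yrs <- function(n) round(365.25 * n)

## Stage 1: which rows are FERS, and their minimum retirement age (MRA)
## by birth cohort (55 / 56 / 57, as in the comments of the posted code).
fers <- emp$retirePlan == 2
mra  <- ifelse(emp$DOB < as.Date("1953-01-01"), 55,
        ifelse(emp$DOB < as.Date("1970-01-01"), 56, 57))

## Stage 2: start with all-NA dates; rows matching no rule stay NA.
elig <- rep(as.Date(NA), nrow(emp))
a    <- emp$ageFedStart

## Stage 3: one assignment per rule, each on its own logical index.
i <- fers & a < mra - 30
elig[i] <- emp$DOB[i] + yrs(mra[i])
i <- fers & a >= mra - 30 & a < 30
elig[i] <- emp$srvCompDT[i] + yrs(30)
i <- fers & a >= 30 & a < 40
elig[i] <- emp$DOB[i] + yrs(60)
i <- fers & a >= 40 & a < 42
elig[i] <- emp$srvCompDT[i] + yrs(20)
i <- fers & a >= 42 & a < 57
elig[i] <- emp$DOB[i] + yrs(62)
i <- fers & a >= 57
elig[i] <- emp$srvCompDT[i] + yrs(5)

emp$regRetireDT <- elig
```

Each stage can be inspected on its own (for example with table(mra) or sum(i)), which is much easier to debug than a single deeply nested ifelse() call, and it stays vectorized, so no explicit loop over rows is needed.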
-- Bert
On Wed, Jan 26, 2011 at 12:52 PM, David Winsemius
<dwinsemius at comcast.net> wrote:
> I remember something about the degree of nesting of ifelse calls being
> limited to 7 deep (???) that makes me worry about this approach. You may
> want to look at the arules package or the data.table package or the sqldf
> package for approaches that are specifically constructed with this sort of
> processing in mind.
>
> --
> David.
>
> On Jan 26, 2011, at 3:42 PM, KATSCHKE, ADRIAN CIV DFAS wrote:
>
>> Yes. That is exactly what I would like to have running. Here is the first
>> attempt I made at using a nested ?ifelse statement for one of the retirement
>> plans. The variables are all there but with different names: ageYOSstart is
>> ageFedStart, and SCDCivLeave is srvCompDT. I haven't gotten this working, and
>> I am not sure that it is the correct way to do what I would like.
>>
>> ## Regular retirement eligibility date for FERS employees
>> retData.All$regRetireDT2[retData.All$retireSystem == "FERS"] <-
>> with(retData.All[retData.All$retireSystem == "FERS",],
>> ifelse(DOB < "01/01/53", ## Born before 1953: minimum retirement age of 55
>>   ifelse(ageYOSstart < 26,
>>          dates(DOB+(365.25*55)),
>>   ifelse((ageYOSstart >= 26 & ageYOSstart < 31),
>>          dates(SCDCivLeave+(365.25*30)),
>>   ifelse((ageYOSstart >= 31 & ageYOSstart < 41),
>>          dates(DOB+(365.25*60)),
>>   ifelse((ageYOSstart >= 41 & ageYOSstart < 43),
>>          dates(SCDCivLeave+(365.25*20)),
>>   ifelse((ageYOSstart >= 43 & ageYOSstart < 58),
>>          dates(DOB+(365.25*62)),
>>   ifelse(ageYOSstart >= 58,
>>          dates(SCDCivLeave+(365.25*5)),
>>          NA)))))),
>> ifelse((DOB < "12/31/69" & DOB > "01/01/53"), ## Born between 1953 and 1969: MRA of 56
>>   ifelse(ageYOSstart < 27,
>>          dates(DOB+(365.25*56)),
>>   ifelse((ageYOSstart >= 27 & ageYOSstart < 31),
>>          dates(SCDCivLeave+(365.25*30)),
>>   ifelse((ageYOSstart >= 31 & ageYOSstart < 41),
>>          dates(DOB+(365.25*60)),
>>   ifelse((ageYOSstart >= 41 & ageYOSstart < 43),
>>          dates(SCDCivLeave+(365.25*20)),
>>   ifelse((ageYOSstart >= 43 & ageYOSstart < 58),
>>          dates(DOB+(365.25*62)),
>>   ifelse(ageYOSstart >= 58,
>>          dates(SCDCivLeave+(365.25*5)),
>>          NA)))))),
>> ifelse(DOB >= "01/01/69", ## Born after 1969: minimum retirement age of 57
>>   ifelse(ageYOSstart < 28,
>>          dates(DOB+(365.25*57)),
>>   ifelse((ageYOSstart >= 28 & ageYOSstart < 31),
>>          dates(SCDCivLeave+(365.25*30)),
>>   ifelse((ageYOSstart >= 31 & ageYOSstart < 41),
>>          dates(DOB+(365.25*60)),
>>   ifelse((ageYOSstart >= 41 & ageYOSstart < 43),
>>          dates(SCDCivLeave+(365.25*20)),
>>   ifelse((ageYOSstart >= 43 & ageYOSstart < 58),
>>          dates(DOB+(365.25*62)),
>>   ifelse(ageYOSstart >= 58,
>>          dates(SCDCivLeave+(365.25*5)),
>>          NA)))))),
>> NA))))
>>
>> Adrian
>>
>>
>>> If I understand you correctly, you want ?ifelse, which works on the
>>> full logical vectors of rules applied to the variables, not
>>> if....else, which works on only a single logical.
>>
>>> -- Bert Gunter
>>
>> On Wed, Jan 26, 2011 at 12:18 PM, KATSCHKE, ADRIAN CIV DFAS
>> <ADRIAN.KATSCHKE at dfas.mil> wrote:
>>>
>>> All,
>>>
>>> I would like to apply a set of rules to each row of the sample data set
>>> below. The rule sets are the guidelines for determining an individual's
>>> date for retirement eligibility. The rules are found in this document,
>>> http://www.opm.gov/feddata/RetirementPaperFinal_v4.pdf. I am only
>>> interested in the top two categories for retirement eligibility, the
>>> CSRS and FERS plans.
>>>
>>> The data set has four variables: date of birth (DOB), service computation
>>> date (srvCompDT), retirement plan (retirePlan), and the age at which the
>>> employee entered federal service (ageFedStart). The service computation
>>> date is used to compute the date eligible for retirement. The retirement
>>> plan indicates what system the employee is enrolled under.
>>>
>>> The data does contain a few other retirement plans; for now I want to
>>> just ignore those. I have labeled the plans as 1-CSRS, 2-FERS, and
>>> 3-Other. My first attempt at applying the rules was through a complex
>>> nesting of ifelse statements, which was not very successful and quite
>>> difficult to follow. I then wrote a function and tried using "apply",
>>> unsuccessfully. The function is shown below.
>>>
>>> I would like to put a short script or function together that would allow
>>> for an efficient application of the rules to each of the employees. I am
>>> trying to avoid a loop, because my data set is quite large, and I may
>>> need to update my data set regularly and re-run the analysis and reports
>>> that will come from this work.
>>>
>>> Any advice or guidance on building the function or code to apply the
>>> rules would be quite helpful.
>>>
>>> retireHelp <-
>>> structure(list(DOB = structure(c(-6642, -5134, -3444, -5598,
>>> -4356, 5737, -4894, -1951, -2950, 2467, 6945, 4908, -7930, -7236,
>>> -7727, -77, 4158, -7892, -6028, -7132, -5959, 2309, -2494, -3513,
>>> -383, -216, -3369, -5861, 3674, -10265, -8986, -5023, -4862,
>>> 1526, -1022, 2175, -11790, -278, -7275, -5084, -1842, 430, -2220,
>>> -7444, 440, 4285, -7812, 3335, -7271, -6825, -1098, -1670, -10219,
>>> -7131, 5963, 704, -7662, 4219, -2813, 5147, -7334, -8223, -5922,
>>> -7497, -9276, -1291, -11640, -5631, 518, -7268, -2105, -5901,
>>> -690, -8146, -7059, 133, 1176, -6091, -2895, -6020, -4724, -3616,
>>> -5059, -8253, -2604, -12400, -4776, -3671, -9326, -7000, -5574,
>>> -3248, 4255, -1358, -6255, 8, -7115, -1701, -5227, 9, -517, -8674,
>>> -2554, -4069, -2077, -9872, -6534, 2970, -8307, -3020, -1343,
>>> -8897, -2304, -7424, 2078, -8274, -5559, -8888, -9262, -8473,
>>> -4088, -2429, -8006, -1091, 5015, 2765, 4036, 3101, -3743, 5103,
>>> -10018, -12095, -7646, -5966, -6208, -5784, -1325, -4288, -1665,
>>> -1409, 4685, -7881, -3413, 2738, -2201, 1217, -5113, 206, -1292,
>>> -1725, 10, -2978, -1895, -830, -105, -2395, -3496, -8244, -9956,
>>> -6494, -4678, -4077, 575, 2013, -3411, 3824, -4356, 4523, -5836,
>>> -6350, -5337, -41, -2001, -6632, -970, -6790, -2828, -4061, 476,
>>> 5854, -9648, -4227, 850, 2619, -7747, -2672, 4069, -12618, -6898,
>>> -4178, -1772, -1643, -2064, -157, 4551, -8688, -6087, -2040,
>>> -7239, -783), format = "m/d/y", origin = structure(c(1, 1, 1970
>>> ), .Names = c("month", "day", "year")), class = c("dates", "times"
>>> )), srvCompDT = structure(c(743, 12429, 3585, 4364, 13227, 13578,
>>> 13591, 8585, 9587, 13913, 14753, 13247, 2246, 1439, 8845, 7018,
>>> 12625, -552, 5688, 7080, 13255, 13549, 12709, 13969, 13997, 9532,
>>> 13689, 1226, 13549, 4093, 13423, 13801, 3181, 14809, 13353, 9457,
>>> 7745, 8986, 4759, 4486, 6449, 11172, 8669, 3344, 13745, 12275,
>>> 5081, 13605, 8006, 3048, 6330, 13521, 5254, 1733, 14095, 8516,
>>> 4848, 13521, 5970, 14697, 8291, 139, 11435, 3567, 8961, 5775,
>>> 3602, 1409, 11577, 12163, 12258, 13156, 9472, 7963, 1362, 10332,
>>> 9557, 3997, 7509, 4691, 3133, 5877, 6782, 11449, 13283, 8040,
>>> 11565, 3425, 7860, 1790, 10778, 13199, 12625, 5889, 3317, 9831,
>>> 1068, 8040, 7123, 9104, 12836, 7928, 12764, 8922, 5324, -1004,
>>> 1806, 10263, 5635, 10310, 5625, 8861, 14613, 3896, 10316, 5725,
>>> 12751, 6113, 2997, 112, 5707, 4987, -1018, 8055, 13885, 13073,
>>> 14585, 14865, 14935, 14390, 9735, 7654, 4557, 661, 1638, 1112,
>>> 14011, 3086, 7032, 13942, 13325, 6735, 13900, 12673, 10148, 14193,
>>> 14767, 8447, 6114, 10688, 13544, 7106, 8587, 14753, 7886, 12280,
>>> 11946, 13662, 3332, 2108, 13977, 6203, 8369, 13857, 8369, 11486,
>>> 8306, 12466, 12639, 7270, 4325, 13843, 14026, 14039, 6147, 7676,
>>> 5781, 7038, 9187, 14640, 6174, 11491, 13913, 13787, 13465, 8854,
>>> 13152, 1826, 1412, 4317, 5794, 5548, 8951, 12947, 12639, 5345,
>>> 5961, 4637, 6465, 13717), format = "m/d/y", origin = structure(c(1,
>>> 1, 1970), .Names = c("month", "day", "year")), class = c("dates",
>>> "times")), retirePlan = c(1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>>> 1, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1,
>>> 2, 2, 2, 2, 3, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1,
>>> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1,
>>> 3, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2,
>>> 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,
>>> 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2,
>>> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>>> 1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>>> 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
>>> ageFedStart = c(20.22, 48.08, 19.24, 27.27, 48.14, 21.47,
>>> 50.61, 28.85, 34.32, 31.34, 21.38, 22.83, 27.86, 23.75, 45.37,
>>> 19.43, 23.18, 20.1, 32.08, 38.91, 52.61, 30.77, 41.62, 47.86,
>>> 39.37, 26.69, 46.7, 19.4, 27.04, 39.31, 61.35, 51.54, 22.02,
>>> 36.37, 39.36, 19.94, 53.48, 25.36, 32.95, 26.2, 22.7, 29.41,
>>> 29.81, 29.54, 36.43, 21.88, 35.3, 28.12, 41.83, 27.03, 20.34,
>>> 41.59, 42.36, 24.27, 22.26, 21.39, 34.25, 25.47, 24.05, 26.15,
>>> 42.78, 22.89, 47.52, 30.29, 49.93, 19.35, 41.73, 19.27, 30.28,
>>> 53.2, 39.32, 52.18, 27.82, 44.1, 23.06, 27.92, 22.95, 27.62,
>>> 28.48, 29.33, 21.51, 25.99, 32.42, 53.94, 43.5, 55.96, 44.74,
>>> 19.43, 47.05, 24.07, 44.77, 45.03, 22.92, 19.84, 26.21, 26.89,
>>> 22.4, 26.67, 33.81, 24.9, 36.56, 45.45, 41.94, 35.57, 20.26,
>>> 24.28, 22.83, 19.97, 38.17, 36.5, 19.08, 48.62, 46.32, 30.99,
>>> 22.55, 38.33, 50.13, 41.07, 33.56, 23.5, 26.82, 20.3, 19.13,
>>> 25.04, 24.28, 28.22, 28.88, 32.21, 51.14, 25.43, 54.08, 54.07,
>>> 33.41, 18.14, 21.48, 18.88, 41.99, 20.19, 23.81, 42.03, 23.66,
>>> 40.02, 47.4, 27.2, 33.81, 35.53, 54.43, 22.56, 20.28, 33.98,
>>> 37.05, 27.61, 28.7, 42.66, 21.88, 40.18, 42.28, 59.98, 36.38,
>>> 23.55, 51.07, 28.15, 21.34, 32.43, 32.25, 20.98, 34.67, 21.75,
>>> 50.58, 37.29, 26.45, 38.01, 43.88, 56.59, 19.49, 39.61, 23.57,
>>> 30.39, 23.85, 24.05, 43.32, 43.03, 35.76, 30.58, 58.08, 31.56,
>>> 24.87, 39.55, 22.75, 23.26, 20.71, 19.69, 30.16, 35.88, 22.14,
>>> 38.42, 32.99, 18.28, 37.52, 39.7)), .Names = c("DOB", "srvCompDT",
>>> "retirePlan", "ageFedStart"), row.names = c(NA, 200L), class =
>>> "data.frame")
>>>
>>> rrDT <- function(retSys, ageFedStart, birthDT, serviceCompDT){
>>> if(retSys == "CSRS") {
>>> if(ageFedStart < 25) rtDT <- dates(birthDT+(365.25*55))
>>> else if (ageFedStart >= 25 & ageFedStart < 30) rtDT <-
>>> dates(serviceCompDT+(365.25*30))
>>> else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>>> dates(birthDT+(365.25*60))
>>> else if (ageFedStart >= 40 & ageFedStart < 45) rtDT <-
>>> dates(serviceCompDT+(365.25*20))
>>> else if (ageFedStart >= 45 & ageFedStart < 60) rtDT <-
>>> dates(birthDT+(365.25*65))
>>> else if (ageFedStart >= 60) rtDT <-
>>> dates(serviceCompDT+(365.25*5))
>>> else rtDT <- NA
>>> }
>>> else if (retSys == "FERS") {
>>> if (birthDT < "01/01/53") {
>>> if(ageFedStart < 25) rtDT <- dates(birthDT+(365.25*55))
>>> else if (ageFedStart >= 25 & ageFedStart < 30) rtDT <-
>>> dates(serviceCompDT+(365.25*30))
>>> else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>>> dates(birthDT+(365.25*60))
>>> else if (ageFedStart >= 40 & ageFedStart < 42) rtDT <-
>>> dates(serviceCompDT+(365.25*20))
>>> else if (ageFedStart >= 42 & ageFedStart < 57) rtDT <-
>>> dates(birthDT+(365.25*62))
>>> else if (ageFedStart >= 57) rtDT <-
>>> dates(serviceCompDT+(365.25*5))
>>> else rtDT <- NA
>>> }
>>> else if (birthDT >= "01/01/53" & birthDT < "01/01/70") {
>>> if(ageFedStart < 26) rtDT <- dates(birthDT+(365.25*56))
>>> else if (ageFedStart >= 26 & ageFedStart < 30) rtDT <-
>>> dates(serviceCompDT+(365.25*30))
>>> else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>>> dates(birthDT+(365.25*60))
>>> else if (ageFedStart >= 40 & ageFedStart < 42) rtDT <-
>>> dates(serviceCompDT+(365.25*20))
>>> else if (ageFedStart >= 42 & ageFedStart < 57) rtDT <-
>>> dates(birthDT+(365.25*62))
>>> else if (ageFedStart >= 57) rtDT <-
>>> dates(serviceCompDT+(365.25*5))
>>> else rtDT <- NA
>>> }
>>> else if (birthDT >= "01/01/70"){
>>> if(ageFedStart < 27) rtDT <- dates(birthDT+(365.25*57))
>>> else if (ageFedStart >= 27 & ageFedStart < 30) rtDT <-
>>> dates(serviceCompDT+(365.25*30))
>>> else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>>> dates(birthDT+(365.25*60))
>>> else if (ageFedStart >= 40 & ageFedStart < 42) rtDT <-
>>> dates(serviceCompDT+(365.25*20))
>>> else if (ageFedStart >= 42 & ageFedStart < 57) rtDT <-
>>> dates(birthDT+(365.25*62))
>>> else if (ageFedStart >= 57) rtDT <-
>>> dates(serviceCompDT+(365.25*5))
>>> else rtDT <- NA
>>> }
>>> }
>>> else rtDT <- NA
>>> return(rtDT)
>>> }
>>>
>>> Adrian R. Katschke
>>> Data Analytics Specialist
>>> Human Capital Program Office
>>> Human Resources
>>> PH: 317-212-7813
>>> DSN: 699-7813
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>
> David Winsemius, MD
> West Hartford, CT
>
>
--
Bert Gunter
Genentech Nonclinical Biostatistics