[R] Multiple-Response Analysis: Cleaning of Duplicate Codes
Boris Steipe
boris.steipe at utoronto.ca
Tue Apr 25 19:28:35 CEST 2017
How about:
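# Recode sub codes to main codes (e.g. 130 -> 100, 495 -> 400, 999 -> 900).
# Assumes d_sample holds only the five original c05_ columns (before the cbind).
d_sample_1 <- floor(d_sample/100) * 100
for (i in 1:nrow(d_sample_1)) {
  # Within each respondent (row), set repeated main codes to NA
  d_sample_1[i, duplicated(unlist(d_sample_1[i, ]))] <- NA
}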
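If the original codes should be kept next to the cleaned ones (the layout Georg sketches below), the same idea can be applied row-wise with apply() and the result appended with cbind(). A minimal sketch on a small hypothetical data frame `d`, not the thread's data:

```r
# Hypothetical toy data: three respondents, three response slots each
d <- data.frame(c05_01 = c(130, 110, 410),
                c05_02 = c(120, NA, 130),
                c05_03 = rep(NA_real_, 3))

# Recode every sub code to its main code (e.g. 130 -> 100, 410 -> 400)
main <- floor(as.matrix(d) / 100) * 100

# For each respondent (row), blank out main codes already seen earlier in that row.
# apply() returns one column per row, so transpose back with t().
dedup <- t(apply(main, 1, function(row) replace(row, duplicated(row), NA)))

# Keep the original columns and append the cleaned ones with an _r01 suffix
colnames(dedup) <- paste0(colnames(d), "_r01")
result <- cbind(d, dedup)
result
```

Here respondent 1 (130, 120) keeps a single 100, while respondent 3 (410, 130) keeps both 400 and 100.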
B.
> On Apr 25, 2017, at 1:10 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> If I understand you correctly, one way is:
>
>> z <- rep(LETTERS[1:3],4)
>> z
> [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
>> z[!duplicated(z)]
> [1] "A" "B" "C"
>
>
> ?duplicated
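Note that duplicated() treats NA like any other value, so within a respondent's row repeated empty slots are flagged as well; setting them to NA again is harmless. A small illustration (the vector `row` is hypothetical):

```r
z <- rep(LETTERS[1:3], 4)
# Keeping only the first occurrence of each value gives the same as unique()
first_only <- z[!duplicated(z)]

# One respondent's recoded answers: a repeated main code plus two empty slots
row <- c(100, 100, 400, NA, NA)
duplicated(row)                     # FALSE TRUE FALSE FALSE TRUE
replace(row, duplicated(row), NA)   # 100 NA 400 NA NA
```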
>
> -- Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Apr 25, 2017 at 9:36 AM, <G.Maubach at weinwolf.de> wrote:
>> Hi All,
>>
>> in my current project I am working with multiple-response questions
>> (MRSets):
>>
>> -- Coding --
>> 100 Main Code 1
>> 110 Sub Code 1.1
>> 120 Sub Code 1.2
>> 130 Sub Code 1.3
>>
>> 200 Main Code 2
>> 210 Sub Code 2.1
>> 220 Sub Code 2.2
>> 230 Sub Code 2.3
>>
>> 300 Main Code 3
>> 310 Sub Code 3.1
>> 320 Sub Code 3.2
>>
>> The coding for the variables is too detailed. Therefore I have recoded all
>> sub codes to the respective main code, e.g. all 110, 120 and 130 to 100,
>> all 210, 220 and 230 to 200 and all 310 and 320 to 300.
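Since each main code is the sub code rounded down to the full hundred, the mapping can also be sketched with integer division. This is an alternative to the expss ranges, not an exact equivalent: it sends every value from 100 to 199 to 100 (and so on), whereas the recode() call in the script lists only 110 to 195, 210 to 295, etc. The vector `codes` is a hypothetical example built from values in this thread.

```r
codes <- c(110, 130, 160, 410, 495, 999)
# %/% binds tighter than *, so this is (codes %/% 100) * 100
main <- codes %/% 100 * 100
main   # 100 100 100 400 400 900
```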
>>
>> Now some respondents end up with the same main code several times. For
>> example, if respondent 1 was coded 120 and 130, both values become 100
>> after recoding. Counting both would weight this respondent's answer by a
>> factor of 2, which is not my aim: I would like to count the 100 only once
>> for that respondent.
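The double counting described above can be made concrete with a small sketch: a naive table over all slots counts respondent 1 twice, while counting unique main codes per respondent counts each respondent at most once per code. The matrix `codes` is hypothetical toy data.

```r
# Hypothetical toy data: respondent 1 was coded 120 and 130, respondent 2 only 110
codes <- rbind(resp1 = c(120, 130), resp2 = c(110, NA))
main <- codes %/% 100 * 100   # both respondents now carry main code 100

# Naive count over all slots (NA is dropped by table()): 100 is counted 3 times
naive <- table(main)

# Count each main code at most once per respondent: 100 is counted 2 times
per_resp <- lapply(seq_len(nrow(main)),
                   function(i) unique(main[i, !is.na(main[i, ])]))
once <- table(unlist(per_resp))

naive
once
```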
>>
>> Here is my script so far:
>>
>> # -- cut --
>>
>> library(expss)
>>
>> d_sample <- structure(
>>   list(
>>     c05_01 = c(110, 110, 130, 110, 110, 110, 110, 110, 110, 110, 110,
>>                999, 110, 495, 160, 110, 410),
>>     c05_02 = c(NA, NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA,
>>                NA, 170, NA, 130),
>>     c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410, NA, NA, NA, NA,
>>                NA, NA, NA),
>>     c05_04 = rep(NA_real_, 17),
>>     c05_05 = rep(NA_real_, 17)
>>   ),
>>   .Names = c("c05_01", "c05_02", "c05_03", "c05_04", "c05_05"),
>>   row.names = c("1", "2", "3", "4", "5", "10", "11", "12", "13", "14",
>>                 "15", "20", "21", "22", "23", "24", "25"),
>>   class = "data.frame"
>> )
>>
>> c05_xx_r01 <- d_sample %>%
>>   select(starts_with("c05_")) %>%
>>   recode(c(
>>     110 %thru% 195 ~ 100,
>>     210 %thru% 295 ~ 200,
>>     310 %thru% 395 ~ 300,
>>     410 %thru% 495 ~ 400,
>>     510 %thru% 595 ~ 500,
>>     810 %thru% 895 ~ 800,
>>     910 %thru% 999 ~ 900))
>> names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")
>> d_sample <- cbind(d_sample, c05_xx_r01)
>>
>> # -- cut --
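For readers without expss, a base-R sketch of the same range recoding. Assumptions: the `%thru%` bounds are inclusive, values outside all listed ranges are left unchanged, and `ranges`, `recode_main()` and the small data frame `d` are hypothetical names used only for illustration.

```r
# Hypothetical lookup table: the ranges from the recode() call above
ranges <- data.frame(lo = c(110, 210, 310, 410, 510, 810, 910),
                     hi = c(195, 295, 395, 495, 595, 895, 999),
                     to = c(100, 200, 300, 400, 500, 800, 900))

# Map each value to the main code of the range it falls into (bounds inclusive);
# NA and values outside every range are left unchanged
recode_main <- function(x) {
  out <- x
  for (k in seq_len(nrow(ranges))) {
    hit <- !is.na(x) & x >= ranges$lo[k] & x <= ranges$hi[k]
    out[hit] <- ranges$to[k]
  }
  out
}

d <- data.frame(c05_01 = c(130, 999, 410), c05_02 = c(120, NA, 610))
d_r01 <- as.data.frame(lapply(d, recode_main))
names(d_r01) <- paste0(names(d), "_r01")
cbind(d, d_r01)
```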
>>
>> I would like to eliminate all duplicate codes, e.g. reduce 100 and 100 to a
>> single 100 for the respondents in rows 3, 6, 9, 10 and 15 (row names 3, 10,
>> 13, 14 and 23):
>>
>> # -- cut --
>> d_sample_1 <- structure(
>>   list(
>>     c05_01 = c(110, 110, 130, 110, 110, 110, 110, 110, 110, 110, 110,
>>                999, 110, 495, 160, 110, 410),
>>     c05_02 = c(NA, NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA,
>>                NA, 170, NA, 130),
>>     c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410, NA, NA, NA, NA,
>>                NA, NA, NA),
>>     c05_04 = rep(NA_real_, 17),
>>     c05_05 = rep(NA_real_, 17),
>>     c05_01_r01 = c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
>>                    900, 100, 400, 100, 100, 400),
>>     c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>                    NA, NA, NA, 100),
>>     c05_03_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA,
>>                    NA, NA, NA, NA),
>>     c05_04_r01 = rep(NA, 17),
>>     c05_05_r01 = rep(NA, 17)
>>   ),
>>   .Names = c("c05_01", "c05_02", "c05_03", "c05_04", "c05_05",
>>              "c05_01_r01", "c05_02_r01", "c05_03_r01", "c05_04_r01",
>>              "c05_05_r01"),
>>   row.names = c("1", "2", "3", "4", "5", "10", "11", "12", "13", "14",
>>                 "15", "20", "21", "22", "23", "24", "25"),
>>   class = "data.frame"
>> )
>>
>> # -- cut --
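To check which respondents actually receive a repeated main code, here is a small sketch on the c05_01 to c05_03 values from the dput above (c05_04 and c05_05 are all NA and are left out). It recodes by integer division, which for these particular values gives the same main codes as the recode() call; rows are reported both by position and by row name.

```r
c05_01 <- c(110, 110, 130, 110, 110, 110, 110, 110, 110, 110, 110,
            999, 110, 495, 160, 110, 410)
c05_02 <- c(NA, NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA,
            NA, 170, NA, 130)
c05_03 <- c(rep(NA, 9), 410, rep(NA, 7))

# Recode to main codes and keep the original row names
main <- cbind(c05_01, c05_02, c05_03) %/% 100 * 100
rownames(main) <- c("1", "2", "3", "4", "5", "10", "11", "12", "13", "14",
                    "15", "20", "21", "22", "23", "24", "25")

# TRUE where a respondent has the same non-missing main code more than once
has_dup <- apply(main, 1, function(r) anyDuplicated(r[!is.na(r)]) > 0)
which(has_dup)   # positions 3, 6, 9, 10, 15 with row names 3, 10, 13, 14, 23
```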
>>
>> How could I achieve this?
>>
>> Kind regards
>>
>> Georg
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.