[R] Partial LookUP
gary chimuzinga
gkchimz28 @ending from hotm@il@com
Tue Nov 20 17:06:17 CET 2018
I am working in R, using RStudio.
I have a dataframe with 4 columns. Column A contains the passenger ID, B contains the passenger name, and C contains the husband's name.
I am attempting to create a new column which checks whether the husband's name in column C is listed in any of the records in column B. If so, it should return the passenger ID of the husband from column A.
To make things more complicated, in some cases (as in the first example) the husband's name given in column C might not include his second name, which would be included in column B.
Reproducible Example
library(stringr)
rm(list=ls())
passengerid <- c(0908,9883,7767,3302)
Name<- c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
"Backstrom, Mr. Karl Alfred John",
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
"Cumings, Mr. John Bradley")
HusbandName <- c("Backstrom, Mr. Karl Alfred","","Cumings, Mr. John Bradley","")
df1<- data.frame(cbind(passengerid,Name,HusbandName))
df1$Name <- as.character(df1$Name)
df1$HusbandName <- as.character(df1$HusbandName)
I have tried using stringr, but I am facing problems because 1) I need the code to take only one element of the vector HusbandName at a time and search for it in the whole vector Name, and 2) I found it difficult to use regular expressions given that the pattern I am looking for is itself a vector (HusbandName).
This is what I have tried so far:
Attempt 1 - only finds exact matches & doesn't return the passengerID & doesn't add column to df
df1$Husbandid <- for (i in 1:NROW(df1$HusbandName)) {
print(HusbandName[i] %in% Name)}
Attempt 2 - finds partial matches, but does not ignore blanks & does not tell me passenger id & doesn't add column to df
df1$Husbandid <- for (i in 1:NROW(df1$HusbandName)) {
print(which(str_detect(df1$Name,df1$HusbandName[i])))}
#Attempt 3 - almost works, but the printed results are different from those added to the dataframe as a new column. How can I correct this? Ultimately I need the values in the df to be correct. The error is that passengers without husbands are showing a Husbandid when this should be blank or NA. Can this be corrected, or is there a way to convert the output of the for loop into a vector that can be added to the df?
for (i in 1:NROW(df1$HusbandName)) {
if (df1$HusbandName[i] =="") {
print("Man") & next()
}
FoundHusbandNames<- c(which(str_detect(df1$Name,df1$HusbandName[i])))
print(df1$passengerid[FoundHusbandNames]) -> df1$Husbandid[i] }
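For reference, a minimal sketch of one way the loop in Attempt 3 could collect its results in a vector before adding it to the dataframe. It rests on some assumptions: the unwanted IDs most likely appear because df1$Husbandid does not exist when the first element is assigned, so R creates a length-1 column and recycles it across all rows; the name husband_id is made up for illustration; fixed() is used so the husband's name is treated as literal text rather than a regular expression; and only the first match is kept if several names match.

```r
library(stringr)

# Data from the example above, with each HusbandName string on one line.
passengerid <- c(908, 9883, 7767, 3302)
Name <- c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
          "Backstrom, Mr. Karl Alfred John",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Cumings, Mr. John Bradley")
HusbandName <- c("Backstrom, Mr. Karl Alfred", "", "Cumings, Mr. John Bradley", "")
df1 <- data.frame(passengerid, Name, HusbandName, stringsAsFactors = FALSE)

# Pre-allocate the result with NA so rows without a husband stay NA
# (husband_id is a hypothetical name, not from the original post).
husband_id <- rep(NA_real_, nrow(df1))

for (i in seq_len(nrow(df1))) {
  # Skip passengers with no husband listed.
  if (df1$HusbandName[i] == "") next
  # Partial match: is the husband's name contained in any entry of Name?
  hits <- which(str_detect(df1$Name, fixed(df1$HusbandName[i])))
  # Assumption: keep only the first match if there is more than one.
  if (length(hits) > 0) husband_id[i] <- df1$passengerid[hits[1]]
}

# Add the whole vector as a new column in one step.
df1$Husbandid <- husband_id
print(df1[, c("passengerid", "Husbandid")])
```

Filling a full-length vector first and assigning it once avoids the recycling that occurs when individual elements of a not-yet-existing column are assigned inside the loop.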