[R] Split strings based on multiple patterns
Joe Ceradini
joeceradini at gmail.com
Sat Oct 15 01:16:23 CEST 2016
Afternoon,
I unfortunately inherited a dataframe with a column that has many fields
smashed together. My goal is to split the strings in the column into
separate columns based on patterns.
Example of what I'm working with:
ugly <- paste(
  "Water temp:14: F Waterbody type:Permanent Lake/Pond: Water pH:Unkwn:",
  "Conductivity:Unkwn: Water color: Clear: Water turbidity: clear:",
  "Manmade:no Permanence:permanent: Max water depth: <3:",
  "Primary substrate: Silt/Mud: Evidence of cattle grazing: none:",
  "Shoreline Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no",
  "amphibians observed")  # pieces joined with single spaces, no embedded newlines
ugly
Far as I can tell, there is not a single pattern that would work for
splitting this string. Splitting on ":" is close but not quite consistent.
Each of these attributes should be a separate column:
attributes <- c("Water temp", "Waterbody type", "Water pH", "Conductivity",
                "Water color", "Water turbidity", "Manmade", "Permanence",
                "Max water depth", "Primary substrate",
                "Evidence of cattle grazing", "Shoreline Emergent Veg(%)",
                "Fish present", "Fish species")
So, conceptually, I want to do something like this, where the string is
split at each of the patterns in attributes. However, strsplit only uses
the 1st value of attributes:
strsplit(ugly, attributes)
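As far as I understand the strsplit documentation, this happens because the split argument is recycled along x, so with a single input string only the first pattern is ever applied. A toy illustration (the string x below is made up for this example and is not part of my data):

```r
# Made-up input, only to show how strsplit treats a vector of patterns.
x <- "a:1 b:2 c:3"

# One input string, two patterns: only the first pattern ("b") is used.
res <- strsplit(x, c("b", "c"))
res[[1]]

# Two input strings, two patterns: patterns are paired with inputs,
# so the first string is split on "b" and the second on "c".
res2 <- strsplit(c(x, x), c("b", "c"))
res2
```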
Should I loop through the values of "attributes"?
Is there an argument in strsplit I'm missing that will do what I want?
Different approach altogether?
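One possible "different approach", as a rough sketch rather than a definitive answer: locate each attribute name in the string with fixed (non-regex) matching, then take the text between one name and the next as that attribute's value. The helper name split_fields is hypothetical, and the sketch assumes each attribute name occurs at most once per string and that stray colons and spaces around a value can be trimmed.

```r
ugly <- paste(
  "Water temp:14: F Waterbody type:Permanent Lake/Pond: Water pH:Unkwn:",
  "Conductivity:Unkwn: Water color: Clear: Water turbidity: clear:",
  "Manmade:no Permanence:permanent: Max water depth: <3:",
  "Primary substrate: Silt/Mud: Evidence of cattle grazing: none:",
  "Shoreline Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no",
  "amphibians observed")

attributes <- c("Water temp", "Waterbody type", "Water pH", "Conductivity",
                "Water color", "Water turbidity", "Manmade", "Permanence",
                "Max water depth", "Primary substrate",
                "Evidence of cattle grazing", "Shoreline Emergent Veg(%)",
                "Fish present", "Fish species")

# Hypothetical helper: returns a named character vector, one value per field.
split_fields <- function(x, fields) {
  # Fields that are not found stay NA.
  out <- setNames(rep(NA_character_, length(fields)), fields)

  # Start position of each field name in x (-1 when absent). fixed = TRUE
  # keeps the parentheses in "Shoreline Emergent Veg(%)" from being
  # interpreted as regex syntax.
  starts <- vapply(fields,
                   function(f) as.integer(regexpr(f, x, fixed = TRUE)),
                   integer(1))
  found <- sort(starts[starts > 0])
  if (length(found) == 0) return(out)

  # A value runs from the end of its field name to just before the next
  # field name; the last value runs to the end of the string.
  from <- found + nchar(names(found))
  to <- c(found[-1] - 1L, nchar(x))
  vals <- substring(x, from, to)

  # Trim leading/trailing colons and whitespace.
  vals <- gsub("^[[:space:]:]+|[[:space:]:]+$", "", vals)

  out[names(found)] <- vals
  out
}

fields <- split_fields(ugly, attributes)

# One row, one column per attribute; check.names = FALSE keeps the
# original names (spaces, parentheses) as column names.
tidy <- data.frame(t(fields), check.names = FALSE, stringsAsFactors = FALSE)
tidy
```

Note that trailing free text such as "no amphibians observed" ends up attached to the last field ("Fish species") and would need separate handling. For a whole column, something like do.call(rbind, lapply(column, split_fields, fields = attributes)) should give a character matrix with one row per string.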
Thanks! Happy Friday.
Joe