[R] Split strings based on multiple patterns
dwinsemius at comcast.net
Sat Oct 15 01:49:19 CEST 2016
> On Oct 14, 2016, at 4:16 PM, Joe Ceradini <joeceradini at gmail.com> wrote:
> I unfortunately inherited a dataframe with a column that has many fields
> smashed together. My goal is to split the strings in the column into
> separate columns based on patterns.
> Example of what I'm working with:
> ugly <- c("Water temp:14: F Waterbody type:Permanent Lake/Pond: Water
> Conductivity:Unkwn: Water color: Clear: Water turbidity: clear:
> Manmade:no Permanence:permanent: Max water depth: <3: Primary
> substrate: Silt/Mud: Evidence of cattle grazing: none:
> Shoreline Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no
> amphibians observed")
> Far as I can tell, there is not a single pattern that would work for
> splitting this string. Splitting on ":" is close but not quite consistent.
> Each of these attributes should be a separate column:
> attributes <- c("Water temp", "Waterbody type", "Water pH", "Conductivity",
> "Water color", "Water turbidity", "Manmade", "Permanence", "Max water
> depth", "Primary substrate", "Evidence of cattle grazing", "Shoreline
> Emergent Veg(%)", "Fish present", "Fish species")
> So, conceptually, I want to do something like this, where the string is
> split for each of the patterns in attributes. However, strsplit only uses
> the 1st value of attributes:
> strsplit(ugly, attributes)
> Should I loop through the values of "attributes"?
> Is there an argument in strsplit I'm missing that will do what I want?
> Different approach altogether?
> Thanks! Happy Friday.
> [[alternative HTML version deleted]]
You need to post in plain text. We cannot see where the line breaks (your "carriage returns") fall in that data. HTML encodes them with some other character(s) that our mail server does not translate.
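That said, if each element of the column really is one long string and the breaks shown above are only mail wrapping (an assumption, since the post does not show it), one possible approach is to locate each label literally with regexpr(..., fixed = TRUE) and take the text between consecutive labels as the value. Below is a minimal sketch on a shortened version of your string; split_by_labels is a hypothetical helper name, not an existing function, and the trimming of colons and spaces is a guess at what you want to keep.

```r
# Shortened, single-line stand-in for the poster's string (assumption:
# the line breaks in the original post are mail wrapping, not data).
ugly <- paste("Water temp:14: F Waterbody type:Permanent Lake/Pond:",
              "Water color: Clear: Manmade:no Permanence:permanent:",
              "Shoreline Emergent Veg(%): 1-25: Fish present: yes")
attributes <- c("Water temp", "Waterbody type", "Water pH", "Water color",
                "Manmade", "Permanence", "Shoreline Emergent Veg(%)",
                "Fish present")

# Hypothetical helper: find each label as a literal string, then take the
# text between the end of one label and the start of the next as its value.
split_by_labels <- function(x, labels) {
  # Start position of each label in x (-1 when the label is absent);
  # fixed = TRUE means "(%)" needs no regex escaping.
  start <- vapply(labels, function(l) as.integer(regexpr(l, x, fixed = TRUE)),
                  integer(1))
  out <- setNames(rep(NA_character_, length(labels)), labels)
  found <- sort(start[start > 0])        # labels present, in string order
  if (!length(found)) return(out)
  from <- found + nchar(names(found))    # first character after each label
  to <- c(found[-1] - 1L, nchar(x))      # last character before the next label
  vals <- substring(x, from, to)
  out[names(found)] <- gsub("^[: ]+|[: ]+$", "", vals)  # trim colons/spaces
  out
}

res <- split_by_labels(ugly, attributes)
print(res)
```

Labels that do not occur in the string (here "Water pH") come back as NA. For a whole column, something like do.call(rbind, lapply(your_column, split_by_labels, labels = attributes)) should give a character matrix with one column per attribute. Note that a label which is a substring of the text actually present (your "Conductivity" versus "Water Conductivity") would leave a stray "Water" in the preceding value, so the label list has to match the data exactly.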
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
Yes, please do read that.
> and provide commented, minimal, self-contained, reproducible code.
Alameda, CA, USA