[R] Split strings based on multiple patterns
538280 at gmail.com
Wed Oct 19 18:19:24 CEST 2016
I would suggest looking at the strapply function in the gsubfn
package. It gives you more flexibility: you specify what to look
for in the structure of the data and then extract only the pieces
you need.
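Here is a rough sketch of the same "describe what to extract" idea in base R. It swaps in regexpr() and substr() for strapply() so that it has no package dependency. The helper parse_fields() is hypothetical. It assumes the attribute names are known in advance, finds each one with a fixed-string search, and takes the text up to the next name as that attribute's value. Attributes that are missing become NA, so applying the helper to every row gives one column per attribute. The sample strings are made up for illustration.

```r
# Hypothetical helper (not part of gsubfn): parse one "key: value key: value ..."
# string into a named character vector, given the known attribute names.
parse_fields <- function(x, keys) {
  # Start position of each key; fixed = TRUE avoids escaping "(" in "Veg(%)".
  # regexpr() returns -1 when a key does not occur in x.
  start <- vapply(keys, function(k) as.integer(regexpr(k, x, fixed = TRUE)),
                  integer(1))
  out <- setNames(rep(NA_character_, length(keys)), keys)
  found <- keys[start > 0]
  found <- found[order(start[found])]          # keys in order of appearance
  bounds <- c(start[found], nchar(x) + 1L)     # next key's start, or end of string
  for (i in seq_along(found)) {
    # Text between the end of this key and the start of the next one
    value <- substr(x, bounds[i] + nchar(found[i]), bounds[i + 1L] - 1L)
    # Drop the ":" separators and surrounding whitespace
    out[found[i]] <- gsub("^[[:space:]:]+|[[:space:]:]+$", "", value)
  }
  out
}

# Made-up sample rows in the style of the question
keys <- c("Water temp", "Water pH", "Manmade", "Max water depth", "Fish present")
column <- c("Water temp:14: F Manmade:no Max water depth: <3: Fish present: yes:",
            "Water temp:18: F Water pH: 7: Manmade:yes Fish present: no")

# One row per string, one column per attribute (NA where an attribute is absent)
rows <- lapply(column, parse_fields, keys = keys)
df <- data.frame(do.call(rbind, rows), check.names = FALSE,
                 stringsAsFactors = FALSE)
df
```

The fixed-string search avoids regex escaping entirely. The trade-off is an assumption that no attribute name also appears inside another field's value.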
On Fri, Oct 14, 2016 at 5:16 PM, Joe Ceradini <joeceradini at gmail.com> wrote:
> I unfortunately inherited a dataframe with a column that has many fields
> smashed together. My goal is to split the strings in the column into
> separate columns based on patterns.
> Example of what I'm working with:
> ugly <- paste("Water temp:14: F Waterbody type:Permanent Lake/Pond: Water",
>   "Conductivity:Unkwn: Water color: Clear: Water turbidity: clear:",
>   "Manmade:no Permanence:permanent: Max water depth: <3: Primary",
>   "substrate: Silt/Mud: Evidence of cattle grazing: none:",
>   "Shoreline Emergent Veg(%): 1-25: Fish present: yes: Fish species: unkwn: no",
>   "amphibians observed")
> As far as I can tell, there is not a single pattern that would work for
> splitting this string. Splitting on ":" is close but not quite consistent.
> Each of these attributes should be a separate column:
> attributes <- c("Water temp", "Waterbody type", "Water pH", "Conductivity",
>   "Water color", "Water turbidity", "Manmade", "Permanence",
>   "Max water depth", "Primary substrate", "Evidence of cattle grazing",
>   "Shoreline Emergent Veg(%)", "Fish present", "Fish species")
> So, conceptually, I want to do something like this, where the string is
> split on each of the patterns in attributes. However, strsplit only uses
> the first value of attributes:
> strsplit(ugly, attributes)
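A sketch on a made-up excerpt shows why this happens and one way around it without a loop. strsplit() recycles its split argument along x, so with a single string only the first attribute is ever used. One alternative is to join the attribute names into a single alternation pattern. The regex metacharacters, such as the parentheses in "Veg(%)", must be escaped first. A single strsplit() call can then split on all of the names, and regmatches() recovers which name preceded each value. escape_regex() is a hypothetical helper, not a base R function.

```r
# Made-up excerpt; note the regex metacharacters "(" and ")" in one name
x <- "Water temp:14: F Manmade:no Shoreline Emergent Veg(%): 1-25: Fish present: yes"
keys <- c("Water temp", "Manmade", "Shoreline Emergent Veg(%)", "Fish present")

# split is recycled along x; x has length 1, so only keys[1] is used
recycled <- strsplit(x, keys)[[1]]
# c("", ":14: F Manmade:no Shoreline Emergent Veg(%): 1-25: Fish present: yes")

# Hypothetical helper: backslash-escape regex metacharacters so names match literally
escape_regex <- function(s) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", s)

# One alternation pattern that matches any of the names
pattern <- paste(escape_regex(keys), collapse = "|")

# Values: the pieces between names. x is assumed to start with a name,
# so the first piece is the empty string before it and is dropped.
values <- strsplit(x, pattern)[[1]][-1]
values <- gsub("^[[:space:]:]+|[[:space:]:]+$", "", values)

# Names: the attribute names actually matched, in order of appearance
names(values) <- regmatches(x, gregexpr(pattern, x))[[1]]
values
```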
> Should I loop through the values of "attributes"?
> Is there an argument in strsplit I'm missing that will do what I want?
> Different approach altogether?
> Thanks! Happy Friday.
Gregory (Greg) L. Snow Ph.D.