[R] Fw: Regex problem
suttoncarl at ymail.com
Thu Jan 5 19:09:20 CET 2017
Re-sending help request, went to wrong addy first time.
r-help-request at r-project.org
Belated Happy New Year to the gurus:
I have a data frame with 570+ columns, and in those column headers yours truly has made a few blunders. Namely, I somehow managed to end some of them with both an apostrophe ' and an ending quote. I think the attached code finds the occurrences (not 100% sure), and feedback is appreciated. This is my first attempt at regex, and I have been googling and reading for the last few days (including an R-Exercise).
Confused as to why the column names show a "." instead of a " ' ".
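One likely explanation for the ".", offered as a minimal sketch rather than a diagnosis of the real 570-column data: data.frame() defaults to check.names = TRUE, which passes the names through make.names() and replaces characters that are not valid in a syntactic name with ".". The names df_checked and df_raw are made up for this illustration.

```r
# Default behaviour: check.names = TRUE runs the names through make.names(),
# so the apostrophe is replaced by "."
df_checked <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
colnames(df_checked)

# check.names = FALSE keeps the names exactly as they were typed
df_raw <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15,
                     check.names = FALSE)
colnames(df_raw)
```

If the real column names came from a file import, the same renaming may have happened there, since read.table() and read.csv() also have a check.names argument.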
Ignorant of why gregexpr and regexpr show attr(,"useBytes") as TRUE when the default is FALSE. Is it possible I somehow messed them up last week? Simply typing the function name in the console shows the defaults as FALSE.
I have not been able to build a construct to simply delete the apostrophe. I have made several attempts to do this, and left one for your perusal. The others were just too "off the wall" and embarrassing.
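A possible way to delete the apostrophe, sketched under the assumption that the names really contain a literal ' character. The vector nms and the data frame df_demo are hypothetical stand-ins. Note that sub() and gsub() operate on a character vector, so the sketch passes the column names rather than the data frame itself.

```r
# Hypothetical names for illustration only
nms <- c("WhatAmI'", "WhoAreYou", "It's'")

# fixed = TRUE treats "'" as a literal character, so no regex escaping is needed;
# gsub() removes every apostrophe in each name
cleaned <- gsub("'", "", nms, fixed = TRUE)

# To remove only an apostrophe at the very end of a name, anchor with "$"
trailing_only <- sub("'$", "", nms)

# Assign the cleaned names back to a data frame (df_demo is a stand-in)
df_demo <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15,
                      check.names = FALSE)
colnames(df_demo) <- sub("'$", "", colnames(df_demo))
```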
Lastly, is there a way for me to check that all of my column names end with a letter followed by a quote? I am thinking something along the lines of "[[:alpha:]\\"" but I expect that will throw an error. I stumbled upon the ' " problem when dplyr complained about it last week, and it is unsettling to think I may have more goofs.
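A hedged sketch of such a check, assuming the names literally end in a double-quote character; if the quote is only the string delimiter in the code, the plain "[[:alpha:]]$" variant is the relevant one. The vector nms is invented test data. The bracket expression needs its closing "]]", and inside an R string a double quote is written as \" (a single backslash), with "$" anchoring the match at the end of the name.

```r
# Hypothetical names: only the first ends with a letter followed by a double quote
nms <- c("abc\"", "abc'\"", "abc", "ab1\"")

# TRUE where the name ends with a letter immediately followed by "
ok <- grepl("[[:alpha:]]\"$", nms)

# Names that fail the check and deserve a closer look
suspects <- nms[!ok]

# Variant: names that simply end with a letter (no literal quote expected)
ends_with_letter <- grepl("[[:alpha:]]$", nms)
```

all(ok) then gives a single TRUE/FALSE answer for the whole set of names.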
Any suggestions for a good reference book are much appreciated. I can see extended use of regex coming toward me and I am so ignorant it is frightening (all volunteer work, no $'s involved, but I dislike being incompetent).
# regex problem
df1 <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
ma_pattern <- "[[:punct:]][[:punct:]]" # Need single ][ in the middle??
ma_pattern <- "[[:punct:][:punct:]]" # single ][ worked
grep(ma_pattern,colnames(df1),value = TRUE) # found it
gregexpr(ma_pattern,colnames(df1)) # at position 8
#sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
# fixed = FALSE, useBytes = FALSE)
#sub(ma_pattern,replacement = "'\\"",df1)
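On the two candidate patterns above, a small sketch of how they differ, using made-up names in nms. "[[:punct:]][[:punct:]]" is two bracket expressions and matches two punctuation characters in a row, whereas "[[:punct:][:punct:]]" is one bracket expression that lists the same class twice, so it matches any single punctuation character.

```r
# Hypothetical names: one with a single ".", one clean, one ending in ' then "
nms <- c("WhatAmI.", "WhoAreYou", "Oops'\"")

two_in_a_row <- "[[:punct:]][[:punct:]]"  # two consecutive punctuation characters
any_single   <- "[[:punct:][:punct:]]"    # behaves like "[[:punct:]]"

hits_two    <- grepl(two_in_a_row, nms)
hits_single <- grepl(any_single, nms)

# Position of the first punctuation character in "WhatAmI."
pos <- as.integer(regexpr(any_single, "WhatAmI."))
```

This is presumably why only the second pattern found "WhatAmI." at position 8: after the apostrophe became ".", the name holds just one punctuation character.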