[R] using a regular expression

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Mon Sep 12 18:32:30 CEST 2016


If you think you might want to put this function into a package, it would be much better to use gsub instead of passing the job off to an external program, because non-POSIX operating systems (Windows) will be a headache to support.
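
A minimal sketch of that approach, using only base R. One caveat: R character strings cannot hold embedded NULs, so the NUL-to-space step is done here on raw bytes with readBin rather than with gsub; gsub then strips the trailing spaces. The helper names read_nul_as_space and strip_trailing_spaces are made up for illustration and are not part of the original function.

```r
# Hypothetical helper: read a file as raw bytes, turn every NUL (0x00)
# into a space (0x20), and return the contents as one character string.
read_nul_as_space <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  bytes[bytes == as.raw(0x00)] <- as.raw(0x20)
  rawToChar(bytes)
}

# Hypothetical helper: drop trailing spaces from each element of a
# character vector; leading and interior spaces are left alone.
strip_trailing_spaces <- function(x) {
  gsub(" +$", "", x)
}
```

Usage might look like txt <- read_nul_as_space(paste0(filepath, "arm.txt")), followed by cutting txt into fixed-width records (for example with substring) and passing them through strip_trailing_spaces. Separately, note that in sed an expression of the form '/\x20/d' deletes every line that contains a space; removing the trailing spaces themselves would need a substitution such as 's/\x20*$//'.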
-- 
Sent from my phone. Please excuse my brevity.

On September 10, 2016 12:23:37 PM PDT, Glenn Schultz <glennmschultz at me.com> wrote:
>I have a file that basically carries three datasets of differing
>lengths.  To make this a single downloadable file, the creator of the
>file has used both NUL (hex 00) and space (hex 20) to normalize the
>lengths.
>
>Below is the function that I am writing.  I am using sed to replace the
>hex characters.  First, to get past NUL I use sed to replace hex 00
>with hex 20.  This has worked.  Once the NUL is removed I can
>successfully parse the file with readLines and str_sub.  The final step
>before delimiting the file and making it nice and tidy is to remove the
>hex 20 characters.  I am using the same strategy to eliminate the
>spaces, and the sed command works in a shell but does not work in the R
>function.  What am I doing wrong?  I have included dput output of some
>of the nastier lines with hex 20 characters below my code.
>
>Any advice is appreciated.
>
>Glenn
>
>arm <- function(filepath){
>callpath <- paste(filepath, "arm.txt", sep ="")
>ARMReturn <- paste(filepath, "arm.csv", sep = "")
>ARMPoolReturnPath <- paste(filepath,"armatpool.csv", sep = "")
>ARMNextChgReturnPath <- paste(filepath,"nexratechangedate.csv", sep =
>"")
>ARMFirstPmtReturnPath <- paste(filepath,"firstpaymentdate.csv", sep =
>"")
>
># This file contains NUL hex characters. Before parsing the file, replace
># the hex NUL x00 with space x20 and save as a csv file. Use a system
># command.
>sedcommand <- paste("sed -e 's/\\x00/\\x20/g' <", 
>filepath, "arm.txt", 
>">", "arm.csv", sep = " ")
>system(sedcommand)
>
># Read the arm quartile data to a file. Once NULs are skipped, the length
># of each record set changes and the data map provided by FNMA is no
># longer valid with respect to the length of each embedded data set.
>data <- readLines(ARMReturn, encoding = "ascii")
>
>quartile <- NULL
>numchar <- nchar(x = data, type = "chars")
>start <- c(seq(1, numchar, 399))
>end <- c(seq(399, numchar, 399))
>quartile <- str_sub(data, start[1:length(start)], end[1:length(end)])
>write(quartile, ARMReturn)
>
># The file has been parsed according to length 400 for each data
># element. The next step is to remove all the trailing white space
># (hex character x20).
>
>sedcommand2 <- paste("sed -e '/\\x20/d' <", 
>filepath, "arm.csv", 
>">", "arm2.csv", sep = "")
>system(sedcommand2)
>} # end of function
>
>
>c("                                                 555556
>WS320021201006125{000378{000348{                                       
>                                                                    ", 
>"                                                  555556
>WS320021201006250{000954{000880{                                       
>                                                                    ", 
>"                                                   555556
>WS320021201005625{001062{000983{                                       
>                                                                    ", 
>"                                                    555556
>WS320030101005250{000027{000025{                                       
>                                                                    ", 
>"                                                     555556
>WS320030101006500{000033{000030{                                       
>                                                                    ", 
>"                                                      555556
>WS320030101005125{000061{000056{                                       
>                                                                    ", 
>"                                                       555556
>WS320030101005375{000095{000088{                                       
>                                                                    ", 
>"                                                        555556
>WS320030101005350{000217{000200{                                       
>                                                                    ", 
>"                                                         555556
>WS320030101006125{000400{000369{                                       
>                                                                    ", 
>"                                                          555556
>WS320030101005310{000439{000406{                                       
>                                                                    ", 
>"                                                           555556
>WS320030101006000{000573{000529{                                       
>                                                                      "
>)
