[R] Split fixed width data in R

Zilefac Elvis zilefacelvis at yahoo.com
Wed Oct 22 17:37:42 CEST 2014


Hi,
I have fixed-width data that I would like to split into columns. Here is a snapshot of the data (the actual data is a list object):
lst1Sub <- c(
"20131124GGG1 23.00",
"20131125GGG1 15.00",
"20131128GGG1  0.00",
"201312 1GGG1  0.00",
"201312 4GGG1  0.00",
"201312 7GGG1 10.00",
"20131210GGG1  0.00",
"20131213GGG1  0.00",
"20131216GGG1  0.00",
"20131219GGG1  0.00",
"20131222GGG1  0.00",
"20131225GGG1  0.00",
"20131228GGG1  0.00")

The following script is meant to split each record into [Year Month Day Site Precipitation]:
------------------------------------------------------------------------------------------------------
library(stringr)
# Date and site prefix of every record
dateSite <- gsub("(.*G.{3}).*", "\\1", lst1Sub)
dat1 <- data.frame(Year = as.numeric(substr(dateSite, 1, 4)),
                   Month = as.numeric(substr(dateSite, 5, 6)),
                   Day = as.numeric(substr(dateSite, 7, 8)),
                   Site = substr(dateSite, 9, 12),
                   Rain = substr(dateSite, 13, 18),
                   stringsAsFactors = FALSE)
lst3 <- lapply(lst1Sub, function(x) {
  dateSite <- gsub("(.*G.{3}).*", "\\1", x)
  dat1 <- data.frame(Year = as.numeric(substr(dateSite, 1, 4)),
                     Month = as.numeric(substr(dateSite, 5, 6)),
                     Day = as.numeric(substr(dateSite, 7, 8)),
                     Site = substr(dateSite, 9, 12),
                     stringsAsFactors = FALSE)
  # Everything after the site code, with surrounding whitespace trimmed
  Sims <- str_trim(gsub(".*G.{3}\\s?(.*)", "\\1", x))
  # Insert a space where a negative value runs into the previous number
  Sims[grep("\\d+-", Sims)] <- gsub("(.*)([-][0-9]+\\.[0-9]+)", "\\1 \\2",
    gsub("^([0-9]+\\.[0-9]+)(.*)", "\\1 \\2", Sims[grep("\\d+-", Sims)]))
  Sims1 <- read.table(text = Sims, header = FALSE)
  names(Sims1) <- c("Precipitation")
  dat2 <- cbind(dat1, Sims1)
})
------------------------------------------------------------------------------------------------------------------------------------------

Problem: the above script drops the first digit of each precipitation value. For example, after splitting, "20131124GGG1 23.00" becomes
2013 11 24 GGG1 3.00 instead of the correct 2013 11 24 GGG1 23.00.
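
To make the symptom reproducible, here is a minimal sketch that applies the two patterns from the script to a single record. The variable names `x`, `prefix` and `rest` are introduced only for this illustration. It suggests that the greedy `.*G` runs up to the last "G" in "GGG1", so `.{3}` then takes the "1", the space and the leading "2" of the value:

```r
x <- "20131124GGG1 23.00"

# Greedy ".*G" extends to the last "G"; ".{3}" then consumes "1", " " and "2".
prefix <- gsub("(.*G.{3}).*", "\\1", x)

# The same prefix is discarded here, so only the remainder is captured.
rest <- gsub(".*G.{3}\\s?(.*)", "\\1", x)

print(prefix)  # "20131124GGG1 2"
print(rest)    # "3.00"
```

If that reading is right, the pattern only behaves as intended when the site code has exactly three characters after its last "G".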

Is something wrong with the string trimming? Is there another way to arrive at the right answer?
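
One alternative that might be worth trying, since the records are fixed width, is to slice by column position rather than by regular expression. The sketch below assumes that every record has the year in columns 1-4, the month in 5-6, the day in 7-8, the site in 9-12 and a single precipitation value from column 13 onward; it does not cover records with several values run together. `recs` and `split_fixed` are hypothetical names used only for this example:

```r
# A few sample records from the snapshot, as a plain character vector (assumed).
recs <- c("20131124GGG1 23.00", "201312 1GGG1  0.00", "201312 7GGG1 10.00")

# Hypothetical helper: cut each record at fixed column positions.
split_fixed <- function(x) {
  data.frame(
    Year          = as.numeric(substr(x, 1, 4)),
    Month         = as.numeric(substr(x, 5, 6)),
    Day           = as.numeric(substr(x, 7, 8)),    # " 1" converts to 1
    Site          = substr(x, 9, 12),
    Precipitation = as.numeric(substr(x, 13, nchar(x))),
    stringsAsFactors = FALSE
  )
}

dat <- split_fixed(recs)
print(dat)
```

If the real object is a list of character vectors, the same helper could presumably be applied with `lapply(lst1Sub, split_fixed)`.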

Thanks,
AT.


