[R] Split data frame and create a new column
arun
smartpink111 at yahoo.com
Sat Nov 17 17:59:47 CET 2012
Hi,
Just a modification of Rui's function:
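fun1 <- function(x){
  # Split on the lag prefix, the station letter and the ignored tokens,
  # then drop the empty pieces
  r1 <- unlist(strsplit(x, "L\\d+|G|P|S|max|mean|10"))
  r1 <- r1[r1 != ""]
  # Pieces without an underscore are the pollutant names
  r2 <- r1[!grepl("\\_", r1)]
  # Lag: 0 by default, else the digits of the "L<n>" prefix or "_<nn>" suffix
  r3 <- integer(length(x))
  r3[grepl("^L", x)] <- gsub("L(\\d+).*", "\\1", x[grep("L\\d+", x)])
  r3[grepl("_\\d+$", x)] <- gsub("[\\_]", "", r1[grepl("\\_", r1)])
  # Station: the capital G, P or S
  r4 <- gsub(".*(G|P|S).*", "\\1", x)
  data.frame(col1 = r2, col2 = r3, col3 = r4)
}
fun1(x)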
#  col1 col2 col3
#1   o3    1    G
#2   o3    1    P
#3   o3    2    G
#4  nox    0    P
#5 pm25   01    S
#6   co   03    S
#7  nox   04    P
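
One caveat with splitting on "10": a pollutant name that itself contains "10" (say pm10, which is not in the sample data) would be cut as well. A possible variant, only a sketch, anchors each pattern to its position in the string instead. The name split_names() is made up for illustration, and the pattern assumes pollutant names are lower-case letters and digits:

```r
# Sketch only: split_names() is a hypothetical helper, not from the thread.
split_names <- function(x) {
  # Lag defaults to "0"; kept as character so "01" keeps its leading zero
  lag <- rep("0", length(x))
  has_prefix <- grepl("^L[0-9]+", x)
  has_suffix <- grepl("_[0-9]+$", x)
  lag[has_prefix] <- sub("^L([0-9]+).*$", "\\1", x[has_prefix])
  lag[has_suffix] <- sub("^.*_([0-9]+)$", "\\1", x[has_suffix])
  # Station: the first capital G, P or S
  station <- sub("^[^GPS]*([GPS]).*$", "\\1", x)
  # Pollutant: lower-case letters/digits after the optional lag prefix,
  # up to an optional "max"/"mean" and the station letter
  pollutant <- sub("^(L[0-9]+)?([a-z0-9]+?)(max|mean)?[GPS].*$", "\\2",
                   x, perl = TRUE)
  data.frame(pollutant, lag, station, stringsAsFactors = FALSE)
}

x <- c("L1o3maxG10", "L1o3P10", "L2o3G10", "noxP10",
       "pm25S_01", "comeanS_03", "noxP_04")
res <- split_names(x)
print(res)
```

On the sample data this should give the same three columns as fun1(x), with the lag kept as character.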
A.K.
----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: Zlatan <pollaroid at gmail.com>
Cc: r-help at r-project.org
Sent: Saturday, November 17, 2012 10:22 AM
Subject: Re: [R] Split data frame and create a new column
Hello,
I don't know if this is general-purpose, but try:
x <- scan(what = "character", text="
L1o3maxG10
L1o3P10
L2o3G10
noxP10
pm25S_01
comeanS_03
noxP_04")
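fun <- function(x){
    # Pollutant: after splitting on the lag prefix and the station letter,
    # each sample string leaves two non-empty pieces (name, trailing part);
    # keep the odd-numbered ones and strip "max"/"mean"
    r1 <- unlist(strsplit(x, "L[[:digit:]]+|G|P|S"))
    r1 <- r1[nchar(r1) != 0]
    r1 <- r1[rep(c(TRUE, FALSE), length(r1)/2)]
    r1 <- unlist(strsplit(r1, "max|mean"))
    r1 <- r1[nchar(r1) != 0]

    # Lag: 0 by default, overwritten by the digits of an "L<n>" prefix ...
    r2 <- integer(length(x))
    w2 <- grep("L[[:digit:]]+", x)
    re2 <- regexpr("L[[:digit:]]+", x)
    re2 <- unlist(strsplit(regmatches(x, re2), "L"))
    re2 <- re2[nchar(re2) != 0]
    r2[w2] <- re2
    # ... or by the digits after "G_", "P_" or "S_"
    w2 <- grep("G_|P_|S_", x)
    re2 <- regmatches(x, regexpr("(G_|P_|S_)[[:digit:]]+", x))
    re2 <- unlist(strsplit(re2, "G_|P_|S_"))
    re2 <- re2[nchar(re2) != 0]
    r2[w2] <- re2

    # Station: the first G, P or S in each string
    r3 <- regmatches(x, regexpr("G|P|S", x))
    data.frame(r1, r2, r3)
}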
fun(x)
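
The rep(c(TRUE, FALSE), ...) step may be easier to follow in isolation. A small illustration on three of the sample strings follows; it relies on every string leaving exactly two non-empty pieces after the split, which holds for this data but is an assumption in general:

```r
x <- c("L1o3maxG10", "noxP10", "pm25S_01")

# Split on the lag prefix and the station letter
pieces <- unlist(strsplit(x, "L[[:digit:]]+|G|P|S"))
print(pieces)
# "" "o3max" "10" "nox" "10" "pm25" "_01"

# Drop the empty piece left by the leading "L1", then keep every other
# element: the names sit at the odd positions, the trailing parts at the even ones
pieces <- pieces[nchar(pieces) != 0]
keep <- rep(c(TRUE, FALSE), length(pieces) / 2)
print(pieces[keep])
# "o3max" "nox" "pm25"
```

The second strsplit() on "max|mean" then reduces "o3max" to "o3".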
Hope this helps,
Rui Barradas
On 16-11-2012 00:05, Zlatan wrote:
> I need to split one column of a data frame into 3 columns. The column
> contains the lag index (prefix L1 or L2, or suffix 01, 03, 04), the station
> name (shown in the sample data as capital G, P and S) and the pollutant
> name. Names with no “L” prefix and no 01/03/04 suffix are lag 0. Lag 01 is
> the average of lags 0 and 1, and 04 is the average of days 0 to 4. How can
> one do that in R? I will ignore the other components (e.g. 10, max or mean).
>
>
>
> Current state:
>
> L1o3maxG10
> L1o3P10
> L2o3G10
> noxP10
> pm25S_01
> comeanS_03
> noxP_04
>
> What I want to get :
>
> pollutant  Lag  station
> o3         1    G
> o3         1    P
> o3         2    G
> nox        0    P
> pm25       01   S
> co         03   S
> nox        04   P
>
>
> Thanks
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Split-data-frame-and-create-a-new-column-tp4649683.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.