[R] R help
arun
smartpink111 at yahoo.com
Tue Feb 11 17:51:22 CET 2014
Hi,
My solution was based on the input dataset you showed. If "xy@12_g.com" should actually be "xy12_g@gmail.com" (or do both exist in the dataset? That is not clear), then try:
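dat <- read.table(text="Emails
Mal123@gmail.com
Mahi.r@gmail.com
xyz@gmail.com
Ravi_123@yahoo.com
Lavk.lll@rediff.com
xy12_g@gmail.com",sep="",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
vec1 <- dat$Emails
# insert "_" between the leading letters and the first digit run, then drop the trailing ".com"-style suffix
vec2 <- gsub("\\.[[:alnum:]]+$","",gsub("^([[:alpha:]]+)(\\d+.*)","\\1_\\2",vec1))
# entries with more than one "_": turn the first one into "*" so that only it is used as the split point
indx <- which(str_count(vec2,"\\_")>1)
vec2[indx] <- str_replace(vec2[indx],"_","*")
# remaining entries whose part before "@" still contains punctuation ("_" or ".")
indx1 <- setdiff(grep("[[:punct:]]+",gsub("\\@.*","",vec2)),indx)
res <- setNames(cbind(dat,do.call(rbind,lapply(seq_along(vec2),function(i) {
  if(i %in% indx1) {
    strsplit(vec2[i],"[_@.]")[[1]]
  } else if(i %in% indx) {
    strsplit(vec2[i],"[*@]")[[1]]
  } else {
    # no separator at all: add "*" before "@" so that l.name comes out as an empty string
    strsplit(gsub("(.*)(\\@.*)","\\1*\\2",vec2[i]),"[*@]")[[1]]
  }
}))),c("Emails","f.name","l.name","domain"))
# depending on the R version, cbind() may create factor columns; convert them back to character
res[sapply(res,is.factor)] <- lapply(res[sapply(res,is.factor)],as.character)
res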
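If the rule is simply "leading letters are the first name, whatever follows (after an optional "." or "_") is the last name", the same split can probably be sketched in base R without the index bookkeeping. This is only a rough sketch under that assumption: split_email() is a made-up helper name, it assumes every address starts with letters, and it takes the domain as the text between "@" and the first ".".

```r
# Hypothetical helper (an assumption, not part of the solution above):
# split addresses into first name, last name and domain with base R only.
split_email <- function(x) {
  local  <- sub("@.*$", "", x)                      # part before "@"
  domain <- sub("\\..*$", "", sub("^.*@", "", x))   # e.g. "gmail" from "gmail.com"
  f.name <- sub("^([[:alpha:]]+).*$", "\\1", local) # leading run of letters
  l.name <- sub("^[[:alpha:]]+[._]?", "", local)    # rest, minus one optional "." or "_"
  data.frame(Emails = x, f.name = f.name, l.name = l.name,
             domain = domain, stringsAsFactors = FALSE)
}

emails <- c("Mal123@gmail.com", "Mahi.r@gmail.com", "xyz@gmail.com",
            "Ravi_123@yahoo.com", "Lavk.lll@rediff.com", "xy12_g@gmail.com")
res2 <- split_email(emails)
res2
```

With this rule "xy12_g@gmail.com" gives f.name "xy" and l.name "12_g", and "xyz@gmail.com" gets an empty l.name. Addresses that do not begin with a letter would need separate handling.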
A.K.
On Tuesday, February 11, 2014 5:31 AM, Malyadri Putchakayala <malyadri.putchakayala at nuevora.com> wrote:
Hi,
   Emails               f.name  l.name  domain
#1 Mal123@gmail.com     Mal     123     Gmail
#2 Mahi.r@gmail.com     Mahi    r       Gmail
#3 xyz@gmail.com        xyz             Gmail
#4 Ravi_123@yahoo.com   Ravi    123     yahoo
#5 Lavk.lll@rediff.com  Lavk    lll     rediff
#6 xy@12_g.com          xy              12_g
All of the above is right, but my requirement is that 12_g is also treated as the last name:
   Emails               f.name  l.name  domain
#1 Mal123@gmail.com     Mal     123     Gmail
#2 Mahi.r@gmail.com     Mahi    r       Gmail
#3 xyz@gmail.com        xyz             Gmail
#4 Ravi_123@yahoo.com   Ravi    123     yahoo
#5 Lavk.lll@rediff.com  Lavk    lll     rediff
#6 xy12_g@gmail.com     xy      12_g    Gmail
This is the final output I need. Please help if possible.