[R] Splitting dataframes and cleaning extraneous characters
arun
smartpink111 at yahoo.com
Wed Jul 17 19:47:00 CEST 2013
Hi,
One problem with using ?substring() is that it depends on the number of digits, characters, etc.
For example:
substring("-005-190",6)
#[1] "190"
substring("-0057-190",6)
#[1] "-190"
#whereas
gsub("^-[^-]*-","","-0057-190")
#[1] "190"
Probably, your dataset doesn't have that sort of problem.
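If it does, the same gsub() call can be applied to the whole column at once, since gsub() is vectorized. A minimal sketch, assuming a data frame called ats with a character column Project_NBR; the values and the new column name Project_clean are made up for illustration, because the real data were not posted:

```r
# Hypothetical stand-in for the poster's data; only the column name
# Project_NBR comes from the thread, the values are invented.
ats <- data.frame(Project_NBR = c("-005-190", "-0057-190", "-005-212"),
                  stringsAsFactors = FALSE)

# Drop the leading "-<anything but a dash>-" prefix, whatever its length.
ats$Project_clean <- gsub("^-[^-]*-", "", ats$Project_NBR)
ats$Project_clean
#[1] "190" "190" "212"
```

Because the pattern is anchored at the start and stops at the second dash, it does not matter how many characters sit between the two dashes.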
dat1<- read.table(text="
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-split(dat1,gsub("\\.","",as.character(interaction(dat1[,2],dat1[,1]))))
res
#$k134
# project boro
#2 134 k
#7 134 k
#
#$m123
# project boro
#1 123 m
#3 123 m
#4 123 m
#
#$q543
# project boro
#5 543 q
#6 543 q
str(res$k134)
#'data.frame': 2 obs. of 2 variables:
# $ project: int 134 134
# $ boro : chr "k" "k"
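The grouping label can also be built directly with paste0(), which avoids stripping the "." that interaction() inserts. This is only a sketch of an alternative, not a correction of the code above. The names res2 and env are made up here, and the list2env() step is optional, for the case where separate objects named m123, k134, etc. are really wanted rather than one list:

```r
# Same toy data as dat1 above, rebuilt so the block runs on its own.
dat1 <- data.frame(project = c(123L, 134L, 123L, 123L, 543L, 543L, 134L),
                   boro = c("m", "k", "m", "m", "q", "q", "k"),
                   stringsAsFactors = FALSE)

# Build the "boro + project" label directly and split on it.
res2 <- split(dat1, paste0(dat1$boro, dat1$project))
names(res2)
#[1] "k134" "m123" "q543"

# Optional: copy each list element into an environment as its own object.
# 'env' is a hypothetical scratch environment; .GlobalEnv would also work.
env <- new.env()
invisible(list2env(res2, envir = env))
nrow(env$m123)
#[1] 3
```

With about 400 groups, keeping everything in the list and using lapply() on it is usually easier to manage than 400 separate objects.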
A.K.
I was able to split the extraneous stuff using
a<-substring(Project_NBR, first=6)
and then cbind to add the edited column to the df. I have a
sample but I am not sure how to provide it to you. I will try to produce
an example that's similar to what I have:
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
Basically I am trying to subset the data frame according to
project and boro with the name of the subset being boro-project (ex.
m123, k134)
I hope this provides more clarity to my problem.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Wednesday, July 17, 2013 11:06 AM
Subject: Re: Splitting dataframes and cleaning extraneous characters
Hi,
You could try:
?split()
split(ats,ats$Project_NBR)
You also mentioned two columns:
split(ats,list(ats$col1, ats$col2))
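One thing to watch with the two-column form: by default split() returns an element for every combination of the two columns, including combinations that never occur in the data (those come back as zero-row data frames). Passing drop=TRUE keeps only the combinations that are present. A small sketch, with made-up values for col1 and col2 since the real columns were not shown:

```r
# Hypothetical data; col1/col2 are the placeholder names used above.
ats <- data.frame(col1 = c("m", "k", "m", "q"),
                  col2 = c(123, 134, 123, 543),
                  stringsAsFactors = FALSE)

# Default: 3 levels x 3 levels = 9 elements, 6 of them empty.
parts_all <- split(ats, list(ats$col1, ats$col2))
length(parts_all)
#[1] 9

# drop=TRUE keeps only the combinations that actually occur.
parts <- split(ats, list(ats$col1, ats$col2), drop = TRUE)
length(parts)
#[1] 3
```

The element names are the two values joined by ".", for example "m.123"; the sep argument of split() can change that separator.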
You should have provided an example dataset using ?dput() ( dput(head(data,10)) ) for testing.
Also,
gsub("^-[^-]*-","","-005-190")
#[1] "190"
A.K.
Problem: I have a large data set and need to separate based on factors
in 2 columns. The final output would be a collection of dataframes
renamed to the corresponding factor levels.
So far I know that for each corresponding factor I can execute
x190<-ats[which(Project_NBR=='-005-190'),]
However there are about 400 factors needing to be separated.
Also, I would like to remove the "-005-". Any guidance will be greatly
appreciated.