[R] Combining CSV data
arun
smartpink111 at yahoo.com
Tue Jun 11 23:09:45 CEST 2013
HI,
You could use:
result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
colnames(result3)[5:7]<- paste0("DataComment",1:3)
A.K.
________________________________
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, June 11, 2013 4:22 PM
Subject: Re: [R] Combining CSV data
Hey Arun,
I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))
So with these commands I am able to get the data in this format:
> fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3
+ 1, aa, bb, cc
+ 2, dd, ee, ff", as.is = TRUE)
>
> fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1
+ 1a, 1, This is comment 1
+ 2a, 1, This is comment 2
+ 3a, 2, This is comment 1
+ 4a, 1, This is comment 3", as.is = TRUE)
>
> # get rid of leading/trailing blanks on comments
> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
>
> # merge together
> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
>
> # now partition by Row_ID_CR and aggregate the comments
> result2 <- do.call(rbind,
+ lapply(split(result, result$Row_ID_CR), function(.grp){
+ cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
+ })
+ )
Row_ID_CR Data1 Data2 Data3 comment
1 1 aa bb cc This is comment 1| This is comment 2| This is comment 3
2 2 dd ee ff This is comment 1| This is Comment 2
I can even split the last column by this: strsplit(as.character(result2$comment), split='\\|')
[[1]]
[1] "This is comment 1" "This is comment 2" " This is comment 3"
[[2]]
[1] "This is comment 1" "This is comment 2"
but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)
Thanks!!
Shreya
On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
>If the dataset is like this with the comments in the order:
>
>dat2<-read.table(text="
>Row_ID_N, Src_Row_ID, DataN1
>1a, 1, This is comment 1
>2a, 1, This is comment 2
>3a, 2, This is comment 1
>4a, 1, This is comment 3
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>dat3<-read.table(text="
>Row_ID_N, Src_Row_ID, DataN1
>1a, 1, This is comment 1
>2a, 1, This is comment 2
>3a, 2, This is comment 1 #
>
>4a, 1, This is comment 3
>5a, 2, This is comment 2 #
>
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>
>library(stringr)
>library(plyr)
>fun1<- function(data1,data2){
> data2$DataN1<- str_trim(data2$DataN1)
> res<- merge(data1,data2,by.x=1,by.y=2)
> res1<- res[,-5]
> res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> Mx1<- max(sapply(res2[,5],length))
> res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> c(x,rep(NA,Mx1-length(x)))
>
> })),stringsAsFactors=FALSE)
> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> res3
> }
>
>
>fun1(dat1,dat2)
># Row_ID_CR Data1 Data2 Data3 DataComment1
>#1 1 aa bb cc This is comment 1
>
>#2 2 dd ee ff This is comment 1
># DataComment2 DataComment3
>#1 This is comment 2 This is comment 3
>#2 <NA> <NA>
>
> fun1(dat1,dat3)
># Row_ID_CR Data1 Data2 Data3 DataComment1
>#1 1 aa bb cc This is comment 1
>
>#2 2 dd ee ff This is comment 1
> # DataComment2 DataComment3
>#1 This is comment 2 This is comment 3
>
>#2 This is comment 2 <NA>
>
>
>Otherwise, you need to provide an example that matches the real dataset.
>A.K.
>
>________________________________
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Cc: R help <r-help at r-project.org>
>Sent: Tuesday, June 11, 2013 12:22 PM
>
>Subject: Re: [R] Combining CSV data
>
>
>
>Hi Arun,
>
>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
>
>Thanks for your help, I appreciate!
>
>
>
>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
>
>HI,
>>I am not sure about your DataN1 column. If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>> My previous solution is not helpful in situations like these:
>>
>>dat2<-read.table(text="
>>Row_ID_N, Src_Row_ID, DataN1
>>1a, 1, This is comment 1
>>2a, 1, This is comment 2
>>3a, 2, This is comment 2
>>4a, 1, This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>dat3<-read.table(text="
>>
>>Row_ID_N, Src_Row_ID, DataN1
>>1a, 1, This is comment 1
>>2a, 1, This is comment 2
>>3a, 2, This is comment 3
>>4a, 1, This is comment 3
>>5a, 2, This is comment 2
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>library(stringr)
>>library(plyr)
>>fun1<- function(data1,data2){
>> data2$DataN1<- str_trim(data2$DataN1)
>> res<- merge(data1,data2,by.x=1,by.y=2)
>> res1<- res[,-5]
>> res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>> Mx1<- max(sapply(res2[,5],length))
>> res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>> indx<- as.numeric(gsub("[[:alpha:]]","",x))
>> x[match(seq(Mx1),indx)]
>> })),stringsAsFactors=FALSE)
>>
>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>> res3
>> }
>>fun1(dat1,dat2)
>>
>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>#1 1 aa bb cc This is comment 1
>>#2 2 dd ee ff <NA>
>>
>># DataComment2 DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 This is comment 2 <NA>
>> fun1(dat1,dat3)
>>
>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>#1 1 aa bb cc This is comment 1
>>#2 2 dd ee ff <NA>
>>
>># DataComment2 DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 This is comment 2 This is comment 3
>>
>>
>>
>>A.K.
>>
>>
>>----- Original Message -----
>>
>>From: arun <smartpink111 at yahoo.com>
>>To: Shreya Rawal <rawal.shreya at gmail.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Monday, June 10, 2013 6:41 PM
>>Subject: Re: [R] Combining CSV data
>>
>>Hi,
>>Try this:
>>
>>dat1<-read.table(text="
>>Row_ID_CR, Data1, Data2, Data3
>>1, aa, bb, cc
>>2, dd, ee, ff
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>dat2<-read.table(text="
>>Row_ID_N, Src_Row_ID, DataN1
>>1a, 1, This is comment 1
>>2a, 1, This is comment 2
>>3a, 2, This is comment 1
>>4a, 1, This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>library(stringr)
>>dat2$DataN1<-str_trim(dat2$DataN1)
>>res<- merge(dat1,dat2,by.x=1,by.y=2)
>> res1<-res[,-5]
>>library(plyr)
>> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>> res2
>> # Row_ID_CR Data1 Data2 Data3
>>#1 1 aa bb cc
>>#2 2 dd ee ff
>># DataN1
>>#1 This is comment 1, This is comment 2, This is comment 3
>>#2 This is comment 1
>>
>>
>>
>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>res3
>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>#1 1 aa bb cc This is comment 1
>>#2 2 dd ee ff This is comment 1
>># DataComment2 DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 <NA> <NA>
>>
>>A.K.
>>
>>
>>----- Original Message -----
>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>To: r-help at r-project.org
>>Cc:
>>Sent: Monday, June 10, 2013 4:38 PM
>>Subject: [R] Combining CSV data
>>
>>Hello R community,
>>
>>I am trying to combine two CSV files that look like this:
>>
>>File A
>>
>>Row_ID_CR, Data1, Data2, Data3
>>1, aa, bb, cc
>>2, dd, ee, ff
>>
>>
>>File B
>>
>>Row_ID_N, Src_Row_ID, DataN1
>>1a, 1, This is comment 1
>>2a, 1, This is comment 2
>>3a, 2, This is comment 1
>>4a, 1, This is comment 3
>>
>>And the output I am looking for is, comparing the values of Row_ID_CR and
>>Src_Row_ID
>>
>>Output
>>
>>ROW_ID_CR, Data1, Data2, Data3, DataComment1,
>>DataComment2, DataComment3
>>1, aa, bb, cc, This is
>>comment1, This is comment2, This is comment 3
>>2, dd, ee, ff, This is
>>comment1
>>
>>
>>I am a novice R user, I am able to replicate a left join but I need a bit
>>more in the final result.
>>
>>
>>Thanks!!
>>
>> [[alternative HTML version deleted]]
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
More information about the R-help
mailing list