[R] Combining CSV data
arun
smartpink111 at yahoo.com
Wed Jun 12 15:10:29 CEST 2013
HI Shreya,
#Looks like you run the two line code as a single line.
result3<-
data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
colnames(result3)[5:7]<- paste0("DataComment",1:3)
result3
# Row_ID_CR Data1 Data2 Data3 DataComment1
#1 1 aa bb cc This is comment 1
#2 2 dd ee ff This is comment 1
# DataComment2 DataComment3
#1 This is comment 2 This is comment 3
#2 <NA> <NA>
A.K.
________________________________
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; jim holtman <jholtman at gmail.com>
Sent: Wednesday, June 12, 2013 8:58 AM
Subject: Re: [R] Combining CSV data
Great, thanks Arun, but I seem to be running into this error. Not sure what did I miss.
> result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3)
Error: unexpected symbol in "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames"
On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink111 at yahoo.com> wrote:
>
>
>HI,
>You could use:
>result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
>colnames(result3)[5:7]<- paste0("DataComment",1:3)
>
>A.K.
>________________________________
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, June 11, 2013 4:22 PM
>
>Subject: Re: [R] Combining CSV data
>
>
>
>Hey Arun,
>
>I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))
>
>So with these commands I am able to get the data in this format:
>
>> fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3
>+ 1, aa, bb, cc
>+ 2, dd, ee, ff", as.is = TRUE)
>>
>> fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1
>+ 1a, 1, This is comment 1
>+ 2a, 1, This is comment 2
>+ 3a, 2, This is comment 1
>+ 4a, 1, This is comment 3", as.is = TRUE)
>>
>> # get rid of leading/trailing blanks on comments
>> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
>>
>> # merge together
>> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
>>
>> # now partition by Row_ID_CR and aggregate the comments
>> result2 <- do.call(rbind,
>+ lapply(split(result, result$Row_ID_CR), function(.grp){
>+ cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
>+ })
>+ )
>
>Row_ID_CR Data1 Data2 Data3 comment
>1 1 aa bb cc This is comment 1| This is comment 2| This is comment 3
>2 2 dd ee ff This is comment 1| This is Comment 2
>
>I can even split the last column by this: strsplit(as.character(result2$comment), split='\\|')
>
>[[1]]
>[1] "This is comment 1" "This is comment 2" " This is comment 3"
>
>[[2]]
>[1] "This is comment 1" "This is comment 2"
>
>
>but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)
>
>Thanks!!
>Shreya
>
>
>
>
>
>
>On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi,
>>If the dataset is like this with the comments in the order:
>>
>>dat2<-read.table(text="
>>Row_ID_N, Src_Row_ID, DataN1
>>1a, 1, This is comment 1
>>2a, 1, This is comment 2
>>3a, 2, This is comment 1
>>4a, 1, This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>dat3<-read.table(text="
>>Row_ID_N, Src_Row_ID, DataN1
>>1a, 1, This is comment 1
>>2a, 1, This is comment 2
>>3a, 2, This is comment 1 #
>>
>>4a, 1, This is comment 3
>>5a, 2, This is comment 2 #
>>
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>library(stringr)
>>library(plyr)
>>fun1<- function(data1,data2){
>> data2$DataN1<- str_trim(data2$DataN1)
>> res<- merge(data1,data2,by.x=1,by.y=2)
>> res1<- res[,-5]
>> res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>> Mx1<- max(sapply(res2[,5],length))
>> res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>> c(x,rep(NA,Mx1-length(x)))
>>
>> })),stringsAsFactors=FALSE)
>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>> res3
>> }
>>
>>
>>fun1(dat1,dat2)
>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>#1 1 aa bb cc This is comment 1
>>
>>#2 2 dd ee ff This is comment 1
>># DataComment2 DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 <NA> <NA>
>>
>> fun1(dat1,dat3)
>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>#1 1 aa bb cc This is comment 1
>>
>>#2 2 dd ee ff This is comment 1
>> # DataComment2 DataComment3
>>#1 This is comment 2 This is comment 3
>>
>>#2 This is comment 2 <NA>
>>
>>
>>Otherwise, you need to provide an example that matches the real dataset.
>>A.K.
>>
>>________________________________
>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Tuesday, June 11, 2013 12:22 PM
>>
>>Subject: Re: [R] Combining CSV data
>>
>>
>>
>>Hi Arun,
>>
>>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
>>
>>Thanks for your help, I appreciate!
>>
>>
>>
>>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>HI,
>>>I am not sure about your DataN1 column. If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>>> My previous solution is not helpful in situations like these:
>>>
>>>dat2<-read.table(text="
>>>Row_ID_N, Src_Row_ID, DataN1
>>>1a, 1, This is comment 1
>>>2a, 1, This is comment 2
>>>3a, 2, This is comment 2
>>>4a, 1, This is comment 3
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>dat3<-read.table(text="
>>>
>>>Row_ID_N, Src_Row_ID, DataN1
>>>1a, 1, This is comment 1
>>>2a, 1, This is comment 2
>>>3a, 2, This is comment 3
>>>4a, 1, This is comment 3
>>>5a, 2, This is comment 2
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>
>>>library(stringr)
>>>library(plyr)
>>>fun1<- function(data1,data2){
>>> data2$DataN1<- str_trim(data2$DataN1)
>>> res<- merge(data1,data2,by.x=1,by.y=2)
>>> res1<- res[,-5]
>>> res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>> Mx1<- max(sapply(res2[,5],length))
>>> res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>> indx<- as.numeric(gsub("[[:alpha:]]","",x))
>>> x[match(seq(Mx1),indx)]
>>> })),stringsAsFactors=FALSE)
>>>
>>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>> res3
>>> }
>>>fun1(dat1,dat2)
>>>
>>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>>#1 1 aa bb cc This is comment 1
>>>#2 2 dd ee ff <NA>
>>>
>>># DataComment2 DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 This is comment 2 <NA>
>>> fun1(dat1,dat3)
>>>
>>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>>#1 1 aa bb cc This is comment 1
>>>#2 2 dd ee ff <NA>
>>>
>>># DataComment2 DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 This is comment 2 This is comment 3
>>>
>>>
>>>
>>>A.K.
>>>
>>>
>>>----- Original Message -----
>>>
>>>From: arun <smartpink111 at yahoo.com>
>>>To: Shreya Rawal <rawal.shreya at gmail.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Monday, June 10, 2013 6:41 PM
>>>Subject: Re: [R] Combining CSV data
>>>
>>>Hi,
>>>Try this:
>>>
>>>dat1<-read.table(text="
>>>Row_ID_CR, Data1, Data2, Data3
>>>1, aa, bb, cc
>>>2, dd, ee, ff
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>dat2<-read.table(text="
>>>Row_ID_N, Src_Row_ID, DataN1
>>>1a, 1, This is comment 1
>>>2a, 1, This is comment 2
>>>3a, 2, This is comment 1
>>>4a, 1, This is comment 3
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>library(stringr)
>>>dat2$DataN1<-str_trim(dat2$DataN1)
>>>res<- merge(dat1,dat2,by.x=1,by.y=2)
>>> res1<-res[,-5]
>>>library(plyr)
>>> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>>> res2
>>> # Row_ID_CR Data1 Data2 Data3
>>>#1 1 aa bb cc
>>>#2 2 dd ee ff
>>># DataN1
>>>#1 This is comment 1, This is comment 2, This is comment 3
>>>#2 This is comment 1
>>>
>>>
>>>
>>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>>res3
>>># Row_ID_CR Data1 Data2 Data3 DataComment1
>>>#1 1 aa bb cc This is comment 1
>>>#2 2 dd ee ff This is comment 1
>>># DataComment2 DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 <NA> <NA>
>>>
>>>A.K.
>>>
>>>
>>>----- Original Message -----
>>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>>To: r-help at r-project.org
>>>Cc:
>>>Sent: Monday, June 10, 2013 4:38 PM
>>>Subject: [R] Combining CSV data
>>>
>>>Hello R community,
>>>
>>>I am trying to combine two CSV files that look like this:
>>>
>>>File A
>>>
>>>Row_ID_CR, Data1, Data2, Data3
>>>1, aa, bb, cc
>>>2, dd, ee, ff
>>>
>>>
>>>File B
>>>
>>>Row_ID_N, Src_Row_ID, DataN1
>>>1a, 1, This is comment 1
>>>2a, 1, This is comment 2
>>>3a, 2, This is comment 1
>>>4a, 1, This is comment 3
>>>
>>>And the output I am looking for is, comparing the values of Row_ID_CR and
>>>Src_Row_ID
>>>
>>>Output
>>>
>>>ROW_ID_CR, Data1, Data2, Data3, DataComment1,
>>>DataComment2, DataComment3
>>>1, aa, bb, cc, This is
>>>comment1, This is comment2, This is comment 3
>>>2, dd, ee, ff, This is
>>>comment1
>>>
>>>
>>>I am a novice R user, I am able to replicate a left join but I need a bit
>>>more in the final result.
>>>
>>>
>>>Thanks!!
>>>
>>> [[alternative HTML version deleted]]
>>>
>>>______________________________________________
>>>R-help at r-project.org mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>
More information about the R-help
mailing list