[R] Combining CSV data

arun smartpink111 at yahoo.com
Wed Jun 12 15:10:29 CEST 2013


HI Shreya,
#Looks like you run the two line code as a single line.

result3<- 
data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)


colnames(result3)[5:7]<- paste0("DataComment",1:3)

 result3
#  Row_ID_CR                 Data1        Data2        Data3      DataComment1
#1         1                    aa           bb           cc This is comment 1
#2         2                    dd           ee           ff This is comment 1
#       DataComment2      DataComment3
#1 This is comment 2 This is comment 3
#2              <NA>              <NA>



A.K.


________________________________
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: R help <r-help at r-project.org>; jim holtman <jholtman at gmail.com> 
Sent: Wednesday, June 12, 2013 8:58 AM
Subject: Re: [R] Combining CSV data



Great, thanks Arun, but I seem to be running into this error. Not sure what did I miss.

> result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3)
Error: unexpected symbol in "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames"



On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink111 at yahoo.com> wrote:


>
>
>HI,
>You could use:
>result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
>colnames(result3)[5:7]<- paste0("DataComment",1:3)
>
>A.K.
>________________________________
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Tuesday, June 11, 2013 4:22 PM
>
>Subject: Re: [R] Combining CSV data
>
>
>
>Hey Arun,
>
>I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))
>
>So with these commands I am able to get the data in this format:
>
>> fileA <- read.csv(text = "Row_ID_CR,   Data1,    Data2,    Data3
>+ 1,                   aa,          bb,          cc
>+ 2,                   dd,          ee,          ff", as.is = TRUE)
>> 
>> fileB <- read.csv(text = "Row_ID_N,   Src_Row_ID,   DataN1
>+ 1a,               1,                   This is comment 1
>+ 2a,               1,                   This is comment 2
>+ 3a,               2,                   This is comment 1
>+ 4a,               1,                   This is comment 3", as.is = TRUE)
>> 
>> # get rid of leading/trailing blanks on comments
>> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
>> 
>> # merge together
>> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
>> 
>> # now partition by Row_ID_CR and aggregate the comments
>> result2 <- do.call(rbind, 
>+     lapply(split(result, result$Row_ID_CR), function(.grp){
>+         cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
>+     })
>+ )
>
>Row_ID_CR                 Data1        Data2        Data3                                                 comment
>1         1                    aa           bb           cc                                              This is comment 1| This is comment 2| This is comment 3
>2         2                    dd           ee           ff                                                This is comment 1| This is Comment 2
>
>I can even split the last column by this: strsplit(as.character(result2$comment), split='\\|')
>
>[[1]]
>[1] "This is comment 1" "This is comment 2" " This is comment 3"
>
>[[2]]
>[1] "This is comment 1" "This is comment 2"
>
>
>but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)
>
>Thanks!!
>Shreya
>
>
>
>
>
>
>On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Hi,
>>If the dataset is like this with the comments in the order:
>>
>>dat2<-read.table(text="
>>Row_ID_N,  Src_Row_ID,  DataN1
>>1a,              1,                  This is comment 1
>>2a,              1,                  This is comment 2
>>3a,              2,                  This is comment 1
>>4a,              1,                  This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>dat3<-read.table(text="
>>Row_ID_N,  Src_Row_ID,  DataN1
>>1a,              1,                  This is comment 1
>>2a,              1,                  This is comment 2
>>3a,              2,                  This is comment 1   #
>>
>>4a,              1,                  This is comment 3
>>5a,         2,                  This is comment 2  #
>>
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>library(stringr)
>>library(plyr)
>>fun1<- function(data1,data2){
>>    data2$DataN1<- str_trim(data2$DataN1)  
>>        res<- merge(data1,data2,by.x=1,by.y=2)
>>    res1<- res[,-5]
>>    res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>    Mx1<- max(sapply(res2[,5],length))
>>    res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>                                  c(x,rep(NA,Mx1-length(x)))
>>
>>                                  })),stringsAsFactors=FALSE)
>>    colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>    res3
>>    }   
>>
>>     
>>fun1(dat1,dat2)
>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>#1         1                   aa           bb           cc This is comment 1
>>
>>#2         2                   dd           ee           ff This is comment 1
>>#       DataComment2      DataComment3
>>#1 This is comment 2 This is comment 3
>>#2              <NA>              <NA>
>>
>> fun1(dat1,dat3)
>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>#1         1                   aa           bb           cc This is comment 1
>>
>>#2         2                   dd           ee           ff This is comment 1
>> #      DataComment2      DataComment3
>>#1 This is comment 2 This is comment 3
>>
>>#2 This is comment 2              <NA>
>>
>>
>>Otherwise, you need to provide an example that matches the real dataset.
>>A.K.
>>
>>________________________________
>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Tuesday, June 11, 2013 12:22 PM
>>
>>Subject: Re: [R] Combining CSV data
>>
>>
>>
>>Hi Arun,
>>
>>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
>>
>>Thanks for your help, I appreciate!   
>>
>>
>>
>>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>HI,
>>>I am not sure about your DataN1 column.  If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>>>  My previous solution is not helpful in situations like these:
>>>
>>>dat2<-read.table(text="
>>>Row_ID_N,  Src_Row_ID,  DataN1
>>>1a,              1,                  This is comment 1
>>>2a,              1,                  This is comment 2
>>>3a,              2,                  This is comment 2
>>>4a,              1,                  This is comment 3
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>dat3<-read.table(text="
>>>
>>>Row_ID_N,  Src_Row_ID,  DataN1
>>>1a,              1,                  This is comment 1
>>>2a,              1,                  This is comment 2
>>>3a,              2,                  This is comment 3
>>>4a,              1,                  This is comment 3
>>>5a,         2,                  This is comment 2
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>
>>>library(stringr)
>>>library(plyr)
>>>fun1<- function(data1,data2){
>>>    data2$DataN1<- str_trim(data2$DataN1)   
>>>        res<- merge(data1,data2,by.x=1,by.y=2)
>>>    res1<- res[,-5]
>>>    res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>>    Mx1<- max(sapply(res2[,5],length))
>>>    res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>>                                  indx<- as.numeric(gsub("[[:alpha:]]","",x))
>>>                                  x[match(seq(Mx1),indx)]
>>>                                  })),stringsAsFactors=FALSE)
>>>
>>>    colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>>    res3
>>>    }          
>>>fun1(dat1,dat2)
>>>
>>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>>#1         1                   aa           bb           cc This is comment 1
>>>#2         2                   dd           ee           ff              <NA>
>>>
>>>#       DataComment2      DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 This is comment 2              <NA>
>>> fun1(dat1,dat3)
>>>
>>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>>#1         1                   aa           bb           cc This is comment 1
>>>#2         2                   dd           ee           ff              <NA>
>>>
>>>#       DataComment2      DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2 This is comment 2 This is comment 3
>>>
>>>
>>>
>>>A.K.
>>>
>>>
>>>----- Original Message -----
>>>
>>>From: arun <smartpink111 at yahoo.com>
>>>To: Shreya Rawal <rawal.shreya at gmail.com>
>>>Cc: R help <r-help at r-project.org>
>>>Sent: Monday, June 10, 2013 6:41 PM
>>>Subject: Re: [R] Combining CSV data
>>>
>>>Hi,
>>>Try this:
>>>
>>>dat1<-read.table(text="
>>>Row_ID_CR,  Data1,    Data2,    Data3
>>>1,                  aa,          bb,          cc
>>>2,                  dd,          ee,          ff
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>
>>>dat2<-read.table(text="
>>>Row_ID_N,  Src_Row_ID,  DataN1
>>>1a,              1,                  This is comment 1
>>>2a,              1,                  This is comment 2
>>>3a,              2,                  This is comment 1
>>>4a,              1,                  This is comment 3
>>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>library(stringr)
>>>dat2$DataN1<-str_trim(dat2$DataN1)
>>>res<- merge(dat1,dat2,by.x=1,by.y=2)
>>> res1<-res[,-5]
>>>library(plyr)
>>> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>>> res2
>>> # Row_ID_CR                Data1        Data2        Data3
>>>#1         1                   aa           bb           cc
>>>#2         2                   dd           ee           ff
>>>#                                                   DataN1
>>>#1 This is comment 1, This is comment 2, This is comment 3
>>>#2                                       This is comment 1
>>>
>>>
>>>
>>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>>res3
>>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>>#1         1                   aa           bb           cc This is comment 1
>>>#2         2                   dd           ee           ff This is comment 1
>>>#       DataComment2      DataComment3
>>>#1 This is comment 2 This is comment 3
>>>#2              <NA>              <NA>
>>>
>>>A.K.
>>>
>>>
>>>----- Original Message -----
>>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>>To: r-help at r-project.org
>>>Cc:
>>>Sent: Monday, June 10, 2013 4:38 PM
>>>Subject: [R] Combining CSV data
>>>
>>>Hello R community,
>>>
>>>I am trying to combine two CSV files that look like this:
>>>
>>>File A
>>>
>>>Row_ID_CR,   Data1,    Data2,    Data3
>>>1,                   aa,          bb,          cc
>>>2,                   dd,          ee,          ff
>>>
>>>
>>>File B
>>>
>>>Row_ID_N,   Src_Row_ID,   DataN1
>>>1a,               1,                   This is comment 1
>>>2a,               1,                   This is comment 2
>>>3a,               2,                   This is comment 1
>>>4a,               1,                   This is comment 3
>>>
>>>And the output I am looking for is, comparing the values of Row_ID_CR and
>>>Src_Row_ID
>>>
>>>Output
>>>
>>>ROW_ID_CR,    Data1,    Data2,    Data3,    DataComment1,
>>>DataComment2,          DataComment3
>>>1,                      aa,         bb,         cc,        This is
>>>comment1,    This is comment2,     This is comment 3
>>>2,                      dd,          ee,         ff,          This is
>>>comment1
>>>
>>>
>>>I am a novice R user, I am able to replicate a left join but I need a bit
>>>more in the final result.
>>>
>>>
>>>Thanks!!
>>>
>>>    [[alternative HTML version deleted]]
>>>
>>>______________________________________________
>>>R-help at r-project.org mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>



More information about the R-help mailing list