[R] Problem with comparing multiple data sets

Jim Lemon drjimlemon at gmail.com
Sun May 24 08:15:57 CEST 2015


Hi Mohammad,
You know, I thought this would be fairly easy, but it wasn't really.

df1<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df2<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
dflist<-list(df1,df2,df3)
dflist

# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value<-function(x,field1,value1,field2) {
 return(x[x[,field1]==value1,field2])
}

# define another function that sets field2 to value2
# in every row where field1 equals value1
sub_value<-function(x,field1,value1,field2,value2) {
 x[x[,field1]==value1,field2]<-value2
 return(x)
}

conformity<-function(x,fieldname1,value1,fieldname2) {
 # get the most frequent value in fieldname2
 # for the desired value in fieldname1
 # (which.max takes the first value if counts are tied,
 # and as.numeric assumes the values are numeric)
 most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
  extract_by_value,fieldname1,value1,fieldname2))))))
 # now set all the values to the most frequent
 for(i in seq_along(x))
  x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
 return(x)
}

conformity(dflist,"Text","text1","Class")
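
The call above only conforms the rows where Text is "text1". Here is a
minimal sketch of one way to cover every Text value in a single pass,
assuming the Class values are numeric and that every data frame has the
same key column. The name conform_all is just a hypothetical helper for
illustration, and ties go to whichever value which.max() meets first in
table order.

```r
# Sample data in the same layout as above (Comment and Term omitted for brevity)
df1 <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Class = c(2, 1, 0), Text = c("text1", "text2", "text3"),
                  stringsAsFactors = FALSE)
dflist <- list(df1, df2, df3)

# conform_all is a hypothetical helper, not part of base R
conform_all <- function(x, keyfield, classfield) {
  # stack the key and class columns from every data frame
  stacked <- do.call(rbind,
                     lapply(x, function(d) d[, c(keyfield, classfield)]))
  # most frequent class for each key value
  winners <- tapply(stacked[[classfield]],
                    as.character(stacked[[keyfield]]),
                    function(v) as.numeric(names(which.max(table(v)))))
  # write the winning class back, keeping each data frame's layout
  lapply(x, function(d) {
    d[[classfield]] <- as.numeric(winners[as.character(d[[keyfield]])])
    d
  })
}

result <- conform_all(dflist, "Text", "Class")
result
```

Stacking first means the vote is counted once per Text value rather than
once per call, which may matter if there are many distinct texts.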

Jim

On Sat, May 23, 2015 at 11:23 PM, John Kane <jrkrideau at inbox.com> wrote:
> Hi Mohammad
>
> Welcome to the R-help list.
>
> There probably is a fairly easy way to do what you want, but I think we need a bit more background information on what you are trying to achieve.  I know I'm not exactly clear on your decision rule(s).
>
> It would also be very useful to see some actual sample data in usable R format. Have a look at these links http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html for some hints on what you might want to include in your question.
>
> In particular, read up about dput() in those links and/or see ?dput.  This is the generally preferred way to supply sample or illustrative data to the R-help list.  It basically creates a perfect copy of the data as it exists on 'your' machine so that R-help readers see exactly what you do.
>
>
> John Kane
> Kingston ON Canada
>
>
>> -----Original Message-----
>> From: mxalimohamma at ualr.edu
>> Sent: Fri, 22 May 2015 12:37:50 -0500
>> To: r-help at r-project.org
>> Subject: [R] Problem with comparing multiple data sets
>>
>> Hi everyone,
>>
>> I am very new to R and I have a task to do. I appreciate any help. I have
>> 3 data sets. Each data set has 4 columns. For example:
>>
>> Class  Comment  Term  Text
>> 0      com1     aac   text1
>> 2      com2     aax   text2
>> 1      com3     vvx   text3
>>
>> Now I need to compare the class column between the 3 data sets and assign
>> the most common class to that text. For example, if text1 is assigned to
>> class 0 in data sets 1 and 2 but assigned to class 2 in data set 3, then
>> it should be assigned to class 0. If they are all the same, the class
>> stays the same. The ideal thing would be to keep the same format and just
>> update the class. Is there any easy way to do this?
>>
>> Thanks a lot.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.