[R] Comparing elements for equality
markleeds at verizon.net
Tue Jan 13 20:57:16 CET 2009
Hi Harold: The code below works on your data set, but please check it
carefully because I am a little worried that I could have missed
something. Hopefully someone can send a clearer way.
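# Example data from Harold's question
dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25),
                  var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))
print(dat)
# For each id: count its rows and test whether every value in a column
# equals that column's first value within the group
temp <- lapply(split(dat, dat$id), function(.df) {
  data.frame(id = .df$id[1], freq = nrow(.df),
             var1 = all(.df$var1 %in% .df$var1[1]),
             var2 = all(.df$var2 %in% .df$var2[1]))
})
# Stack the one-row data frames into a single result
result <- do.call(rbind, temp)
print(result)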
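A possibly simpler variant, offered only as a sketch: a vector of any
length has all-equal elements exactly when it has one unique value, so
length(unique(x)) == 1 can serve as the test and tapply can apply it per
id. The names all_same and summary_by_id are made up for this
illustration. Note the assumptions: unique() compares numbers exactly
(no tolerance, unlike all.equal) and counts NA as a value of its own.

```r
# Hypothetical helper: TRUE when x holds a single distinct value.
# Assumes exact comparison is wanted and that NA counts as its own value.
all_same <- function(x) length(unique(x)) == 1

dat <- data.frame(id = c(1, 1, 2, 2, 2),
                  var1 = c(10, 10, 20, 20, 25),
                  var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))

# One row per id: the group size plus one logical flag per column.
# table() and tapply() both order groups by the sorted id values,
# so the columns line up with sort(unique(dat$id)).
summary_by_id <- data.frame(
  id   = sort(unique(dat$id)),
  freq = as.vector(table(dat$id)),
  var1 = as.vector(tapply(dat$var1, dat$id, all_same)),
  var2 = as.vector(tapply(dat$var2, dat$id, all_same))
)
print(summary_by_id)
```

With more columns to check, the two tapply lines could be replaced by an
sapply over the column names, but for two variables spelling them out
seems easier to read.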
On Tue, Jan 13, 2009 at 2:17 PM, Doran, Harold wrote:
> Suppose I have a dataframe as follows:
>
> dat <- data.frame(id = c(1,1,2,2,2), var1 = c(10,10,20,20,25),
>                   var2 = c('foo', 'foo', 'foo', 'foobar', 'foo'))
>
> Now, if I were to subset by id, such as:
>
>> subset(dat, id==1)
>   id var1 var2
> 1  1   10  foo
> 2  1   10  foo
>
> I can see that the elements in var1 are exactly the same and the
> elements in var2 are exactly the same. However,
>
>> subset(dat, id==2)
>   id var1   var2
> 3  2   20    foo
> 4  2   20 foobar
> 5  2   25    foo
>
> Shows the elements are not the same for either variable in this
> instance. So, what I am looking to create is a data frame that would
> be like this
>
> id freq var1  var2
> 1  2    TRUE  TRUE
> 2  3    FALSE FALSE
>
> Where freq is the number of times the ID is repeated in the dataframe.
> A TRUE appears in the cell if all elements in the column are the same
> for the ID and FALSE otherwise. It is insignificant which values
> differ for my problem.
>
> The way I am thinking about tackling this is to loop through the ID
> variable and compare the values in the various columns of the
> dataframe. The problem I am encountering is that I don't think
> all.equal or identical are the right functions in this case.
>
> So, say I was wanting to compare the elements of var1 for id ==1. I
> would have
>
> x <- c(10,10)
>
> Of course, the following works
>
>> all.equal(x[1], x[2])
> [1] TRUE
>
> As would a similar call to identical. However, what if I only have a
> vector of values (or if the column consists of names) that I want to
> assess for equality when I am trying to automate a process over
> thousands of cases? As in the example above, the vector may contain
> only two values or it may contain many more. The number of values in
> the vector differs by id.
>
> Any thoughts?
>
> Harold