[R] Testing if all elements are equal in a vector/matrix
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 16 17:18:57 CEST 2009
On Tue, 16 Jun 2009, Prof Brian Ripley wrote:
> On Tue, 16 Jun 2009, jim holtman wrote:
>
>> I think the only way that you are going to get it to stop on the first
>> mismatch is to write your own function in C if you are concerned about the
>> time. Matching on character vectors will be even more costly since it is
>> having to loop to check the equality of each character in each element.
>> This is one of the places it might pay to convert to factors and then the
>> comparison only uses the integer values assigned to the factors.
>
> Not so in a recent R: comparison of character vectors is now done by
> comparing pointers in the first instance so (at least on a 32-bit platform)
> is as fast as comparing integers. And on x86_64 Linux:
>
>> x <- as.character(c(1,2,rep(1,10000000)))
>> system.time(print(all(x[1] == x)))
> [1] FALSE
> user system elapsed
> 0.123 0.019 0.142
>
>> system.time(xx <- as.factor(x))
> user system elapsed
> 9.874 0.284 10.159
>> system.time(print(all(xx[1] == xx)))
> [1] FALSE
> user system elapsed
> 0.511 0.145 0.656
>
> Recent pre-release versions of R (e.g. 2.9.1 beta) allow
>
>> system.time(anyDuplicated(x))
> user system elapsed
> 0.034 0.078 0.113
>> system.time(anyDuplicated(xx))
> user system elapsed
> 0.037 0.076 0.113
I'm sorry, a line got reverted here: I had edited this to say
'which is a C-level speedup of the sort the original poster seemed to
be looking for'
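
For the early-exit behaviour the original poster asked about, here is a minimal sketch in plain R. It is not from the thread: `all_equal_chunked` and `chunk_size` are hypothetical names, and the block size is an arbitrary assumption. It compares blocks of the vector against the first element and returns as soon as one block contains a mismatch, so a difference near the start is found without scanning the rest.

```r
# Hypothetical helper: TRUE if every element of x equals x[1].
# The vector is scanned in blocks so the loop can stop at the first
# block that contains a mismatch.
all_equal_chunked <- function(x, chunk_size = 10000L) {
  n <- length(x)
  if (n < 2L) return(TRUE)
  first <- x[1]
  start <- 1L
  while (start <= n) {
    end <- min(start + chunk_size - 1L, n)
    # isTRUE() makes a comparison involving NA count as "not all equal"
    if (!isTRUE(all(x[start:end] == first))) return(FALSE)
    start <- end + 1L
  }
  TRUE
}

x <- c(1, 2, rep(1, 1e6))
all_equal_chunked(x)             # FALSE, decided within the first block
all_equal_chunked(rep("a", 25))  # TRUE
```

Whether this beats a single vectorized all(x[1] == x) depends on where the first mismatch sits: when all elements really are equal, the loop only adds overhead.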
>
>>
>> On Tue, Jun 16, 2009 at 8:31 AM, utkarshsinghal <
>> utkarsh.singhal at global-analytics.com> wrote:
>>
>>> Hi Jim,
>>>
>>> What you are saying is correct, although my computer may not be as fast;
>>> I get the following for 10M entries:
>>>
>>> user system elapsed
>>> 0.559 0.038 0.607
>>>
>>> Moreover, for character vectors the time more than doubles.
>>>
>>> In my modeling, which is already highly time-consuming, I need to run this
>>> check for a few thousand vectors, each of which can easily have 10M
>>> entries, so I am looking for any possible time saving. I am pretty sure
>>> that whenever the elements are not all equal, this can usually be
>>> concluded from just a few entries. It would be worthwhile to find a way
>>> that stops checking the moment it finds two distinct elements.
>>>
>>> Regards
>>> Utkarsh
>>>
>>>
>>>
>>> jim holtman wrote:
>>>
>>> Just check that the first (or any other element) is equal to all the rest:
>>>
>>>> x = c(1,2,rep(1,10000000)) # 10,000,000
>>>> system.time(print(all(x[1] == x)))
>>> [1] FALSE
>>> user system elapsed
>>> 0.18 0.00 0.19
>>>
>>>>
>>> This was for 10M entries.
>>>
>>> On Tue, Jun 16, 2009 at 7:42 AM, utkarshsinghal <
>>> utkarsh.singhal at global-analytics.com> wrote:
>>>
>>>>
>>>> Hi All,
>>>>
>>>> There are several replies to the question below, but I think there must
>>>> be a better way of doing this.
>>>> I just want to check whether all the elements of a vector are the same. My
>>>> vector has one million elements, and it is highly likely that there are
>>>> distinct elements among the first few. For example:
>>>>
>>>> > x = c(1,2,rep(1,100000))
>>>>
>>>> I want the answer FALSE, which is clear from the first two observations
>>>> alone, so there is no need to check the rest.
>>>>
>>>> Does anybody know the most efficient way of doing this?
>>>>
>>>> Regards
>>>> Utkarsh
>>>>
>>>>
>>>>
>>>> From: Francisco J. Zagmutt <gerifalte28_at_hotmail.com>
>>>>
>>>> Date: Tue 30 Aug 2005 - 06:05:20 EST
>>>>
>>>>
>>>> Hi Doran
>>>>
>>>> The documentation for isTRUE reads 'isTRUE(x)' is an abbreviation of
>>>> 'identical(TRUE,x)' so actually Vincent's solution is "cleaner" than
>>>> using identical :)
>>>>
>>>> Cheers
>>>>
>>>> Francisco
>>>>
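
To make the point about isTRUE() concrete, here is a small sketch (the vectors `x` and `y` are made-up illustrations, not data from the thread). all.equal() returns TRUE when values agree within its tolerance and a character description otherwise, which is why it is wrapped in isTRUE() before being used as a logical test.

```r
# Values that differ only by a tiny amount
x <- c(1, 1 + 1e-10, 1)

all(x == x[1])                              # FALSE: == compares exactly
res <- all.equal(x, rep(x[1], length(x)))
isTRUE(res)                                 # TRUE: within the default tolerance

# Values that genuinely differ
y <- c(1, 1.1, 1)
res_y <- all.equal(y, rep(y[1], length(y)))
is.character(res_y)                         # TRUE: a description of the difference
isTRUE(res_y)                               # FALSE
```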
>>>> > From: "Doran, Harold" <HDoran at air.org>
>>>> > To: <vincent.goulet at act.ulaval.ca>, <r-help at stat.math.ethz.ch>
>>>> > Subject: Re: [R] Testing if all elements are equal in a vector/matrix
>>>> > Date: Mon, 29 Aug 2005 15:49:20 -0400
>>>> >
>>>> > See ?identical
>>>> >
>>>> > -----Original Message-----
>>>> > From: r-help-bounces at stat.math.ethz.ch
>>>> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Vincent Goulet
>>>> > Sent: Monday, August 29, 2005 3:35 PM
>>>> > To: r-help at stat.math.ethz.ch
>>>> > Subject: [R] Testing if all elements are equal in a vector/matrix
>>>> >
>>>> >
>>>> > Is there a canonical way to check if all elements of a vector or matrix
>>>> > are the same? Solutions below work, but look hackish to me.
>>>> >
>>>> > > x <- rep(1, 10)
>>>> > > all(x == x[1]) # == operator does not provide for small differences
>>>> > [1] TRUE
>>>> > > isTRUE(all.equal(x, rep(x[1], length(x)))) # ugly
>>>> > [1] TRUE
>>>> >
>>>> > Best,
>>>> >
>>>> > Vincent
>>>> > --
>>>> > Vincent Goulet, Associate Professor
>>>> > École d'actuariat
>>>> > Université Laval, Québec
>>>> > Vincent.Goulet_at_act.ulaval.ca http://vgoulet.act.ulaval.ca
>>>> >
>>>> > ______________________________________________
>>>> > R-help at stat.math.ethz.ch mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide!
>>>> > http://www.R-project.org/posting-guide.html
>>>>
>>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>>
>>>
>>>
>>
>>
>>
>>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595