[R] Mathematical working procedure of duplicated() function in r
Greg Snow
538280 using gmail.com
Tue Aug 4 21:22:13 CEST 2020
Rui pointed out that you can examine the source yourself. FAQ 7.40
has a link to an article with detail on finding and examining the
source code.
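
As a starting point for looking at the source, the R-level methods can be listed and printed straight from the console. This is only a small sketch; what it prints depends on your R version, and any compiled code behind a method has to be read in the R sources, as the article linked from the FAQ describes.

```r
# List the methods registered for the duplicated() generic
methods(duplicated)

# Typing a function name without parentheses prints its R-level source.
# The default method is a thin wrapper that calls compiled code through
# .Internal(), so the hashing itself is not visible here.
duplicated.default

# The data frame method shows how rows are prepared before the
# default method is called.
duplicated.data.frame
```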
A general algorithm for checking for duplicates follows (I have not
examined the R source code to see if it uses something more clever).
1. Create an empty object (I will call it seen). This could be a simple
   vector, but for efficiency it is better to use an object type that has
   fast lookup, e.g. binary tree, associative array/hash/dictionary, etc.
2. Create a vector of logicals the same length as x (I will call it result).
3. Loop from 1 to the length of x (or from the length down to 1 if
   fromLast=TRUE). On each iteration, check whether the value of x[i]
   is in seen:
     - If it is: set result[i] to TRUE.
     - If it is not: add the current value to seen and set result[i] to FALSE.
4. After the loop finishes, throw away seen and reclaim the memory, then
   return result.
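
The steps above can be sketched in R roughly as follows. This is an illustration of the general algorithm only, not the code R actually runs. The name my_duplicated is made up for the example, and an environment stands in for the hash table. Because environment keys must be strings, each value is converted to character first, so this sketch is only reasonable for character or integer vectors (NA and the string "NA" would be treated as the same value, for instance).

```r
# Illustrative sketch of the loop described above; NOT R's internal code.
my_duplicated <- function(x, fromLast = FALSE) {
  seen <- new.env(hash = TRUE)       # environment used as a hash set
  result <- logical(length(x))       # starts out as all FALSE
  idx <- if (fromLast) rev(seq_along(x)) else seq_along(x)
  for (i in idx) {
    # Keys must be non-empty strings, hence the "k" prefix
    key <- paste0("k", x[i])
    if (exists(key, envir = seen, inherits = FALSE)) {
      result[i] <- TRUE              # value was met earlier in the scan
    } else {
      assign(key, TRUE, envir = seen)  # first time this value is met
    }
  }
  result                             # seen is garbage collected on return
}

x <- c("a", "b", "a", "c", "b")
my_duplicated(x)                     # compare with duplicated(x)
my_duplicated(x, fromLast = TRUE)    # compare with duplicated(x, fromLast = TRUE)
```

With a hash-based seen, each lookup takes roughly constant time, so the whole scan is about linear in the length of x. That is why thousands of records are not a problem.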
Since it looks like you are using this on a matrix or data frame,
there is probably a preprocessing step that combines all the values on
each row into a single character string.
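
A small sketch of that idea, applied to the expression from the original question. The data frame and the "\r" separator are made up for the example, and I am assuming (not asserting) that the real method does something similar. The forward scan flags the later copies of a repeated row and the fromLast scan flags the earlier copies, so combining them with | keeps every row that has at least one twin.

```r
# Hypothetical example data
x <- data.frame(id  = c(1, 2, 1, 3, 2),
                grp = c("a", "b", "a", "c", "z"))

# Collapse each row into one string. The "\r" separator is an
# assumption, chosen because it is unlikely to occur in the data.
row_key <- do.call(paste, c(x, sep = "\r"))

# Later copies (forward scan) OR earlier copies (backward scan)
dup_any <- duplicated(row_key) | duplicated(row_key, fromLast = TRUE)

# All rows that occur more than once
x[dup_any, ]
```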
On Tue, Aug 4, 2020 at 6:45 AM K Purna Prakash <prakash.nani using gmail.com> wrote:
>
> Dear Sir(s),
> I request you to provide the detailed* internal mathematical working
> mechanism of the following function *for better understanding.
> *x[duplicated(x) | duplicated(x, fromLast=TRUE), ]*
> I am having some confusion in understanding how duplicates are being
> identified when thousands of records are there.
> I will look for a positive response.
> Thank you,
> K.Purna Prakash.
--
Gregory (Greg) L. Snow Ph.D.
538280 using gmail.com