[R] Adding SORT to UNIQUE

Jeff Newmiller jdnewmil using dcn.davis.ca.us
Tue Dec 21 18:58:29 CET 2021


When your brain is wired to treat a data frame like a matrix, you think that code like

for ( col in colnames( D ) ) {
  idx <- expr
  D[ idx, col ] <- otherexpr
}

is reasonable, when

for ( col in colnames( D ) ) {
  idx <- expr
  D[[ col ]][ idx ] <- otherexpr
}

actually runs significantly faster.
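
A rough sketch of that comparison follows. The data frame D, the "negative values" condition standing in for expr, and the replacement value 0 standing in for otherexpr are all made up for illustration, and the timings will vary with machine and R version.

```r
# Illustrative only: D, the condition, and the replacement value are assumptions.
set.seed(1)
D <- data.frame(matrix(rnorm(2000 * 20), nrow = 2000, ncol = 20))

# Matrix-style assignment: each step dispatches to `[<-.data.frame`.
fill_matrix_style <- function(D) {
  for (col in colnames(D)) {
    idx <- which(D[[col]] < 0)   # rows to overwrite in this column
    D[idx, col] <- 0
  }
  D
}

# List-style assignment: extract the column vector, modify it, put it back.
fill_list_style <- function(D) {
  for (col in colnames(D)) {
    idx <- which(D[[col]] < 0)
    D[[col]][idx] <- 0
  }
  D
}

res_matrix <- fill_matrix_style(D)
res_list <- fill_list_style(D)

# The two forms are expected to agree on the result.
print(all.equal(res_matrix, res_list))

# Rough timings over repeated runs; compare the "elapsed" values.
print(system.time(for (i in 1:20) fill_matrix_style(D)))
print(system.time(for (i in 1:20) fill_list_style(D)))
```

Base system.time() is used here only to avoid a package dependency; microbenchmark would give finer-grained numbers.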


On December 21, 2021 9:28:52 AM PST, "Fox, John" <jfox using mcmaster.ca> wrote:
>Dear Jeff,
>
>On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" <r-help-bounces using r-project.org on behalf of jdnewmil using dcn.davis.ca.us> wrote:
>
>    Intuitive, perhaps, but noticeably slower.
>
>I think that in most applications, one wouldn't notice the difference; for example:
>
>> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))
>
>> microbenchmark(D[, 1])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100
>
>> microbenchmark(D[[1]])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
>
>Best,
> John
>
>
>    And it doesn't work on tibbles by design. Data frames are lists of columns.
>
>
>    On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>    >On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>    >> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>    >>> Thanks for the reply.
>    >>>
>    >>> sort(unique(Data[1]))
>    >>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>    >>> decreasing)) :
>    >>>      undefined columns selected
>    >> 
>    >> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>    >> Data[[1]] for that, so
>    >> 
>    >>     sort(unique(Data[[1]]))
>    >
>    >Actually, I'd probably recommend
>    >
>    >   sort(unique(Data[, 1]))
>    >
>    >instead.  This treats Data as a matrix rather than as a list. 
>    >Dataframes are lists that look like matrices, but to me the matrix 
>    >aspect is usually more intuitive.
>    >
>    >Duncan Murdoch
>    >
>    >> 
>    >> I think Rui already pointed out the typo in the quoted text below...
>    >> 
>    >> Duncan Murdoch
>    >> 
>    >>>
>    >>> The recommended syntax did not work, as listed above.
>    >>>
>    >>> What I want is the sort of distinct column output. Again, the column may
>    >>> be text or numbers. This is a huge analysis effort with data coming at
>    >>> me from many different sources.
>    >>>
>    >>>
>    >>> *Stephen Dawson, DSL*
>    >>> /Executive Strategy Consultant/
>    >>> Business & Technology
>    >>> +1 (865) 804-3454
>    >>> http://www.shdawson.com <http://www.shdawson.com>
>    >>>
>    >>>
>    >>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>    >>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>    >>>>> Thanks everyone for the replies.
>    >>>>>
>    >>>>> It is clear one either needs to write a function or put the unique
>    >>>>> entries into another dataframe.
>    >>>>>
>    >>>>> It seems odd R cannot sort a list of unique column entries with ease.
>    >>>>> Python and SQL can do it with ease.
>    >>>>
>    >>>> I've seen several responses that looked pretty simple.  It's hard to
>    >>>> beat sort(unique(x)), though there's a fair bit of confusion about
>    >>>> what you actually want.  Maybe you should post an example of the code
>    >>>> you'd use in Python?
>    >>>>
>    >>>> Duncan Murdoch
>    >>>>
>    >>>>>
>    >>>>> QUESTION
>    >>>>> Is there a simpler means, other than the unique function, to capture
>    >>>>> distinct column entries and then sort that list?
>    >>>>>
>    >>>>>
>    >>>>>
>    >>>>>
>    >>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>    >>>>>> Hello,
>    >>>>>>
>    >>>>>> Inline.
>    >>>>>>
>    >>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>    >>>>>>> Thanks.
>    >>>>>>>
>    >>>>>>> sort(unique(Data[[1]]))
>    >>>>>>>
>    >>>>>>> This syntax provides row numbers, not column values.
>    >>>>>>
>    >>>>>> This is not right.
>    >>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>    >>>>>> extracts the column vector.
>    >>>>>>
>    >>>>>> As for my previous answer, it was not addressing the question, I
>    >>>>>> misinterpreted it as being a question on how to sort by numeric order
>    >>>>>> when the data is not numeric. Here is a, hopefully, complete answer.
>    >>>>>> Still with package stringr.
>    >>>>>>
>    >>>>>>
>    >>>>>> cols_to_sort <- 1:4
>    >>>>>>
>    >>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>    >>>>>>      stringr::str_sort(unique(x), numeric = TRUE)
>    >>>>>> })
>    >>>>>>
>    >>>>>>
>    >>>>>> Or using Avi's suggestion of writing a function to do all the work and
>    >>>>>> simplify the lapply loop later,
>    >>>>>>
>    >>>>>>
>    >>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>    >>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>    >>>>>>
>    >>>>>>
>    >>>>>> Hope this helps,
>    >>>>>>
>    >>>>>> Rui Barradas
>    >>>>>>
>    >>>>>>
>    >>>>>>>
>    >>>>>>>
>    >>>>>>>
>    >>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>    >>>>>>>> Hi,
>    >>>>>>>>
>    >>>>>>>>
>    >>>>>>>> Running a simple syntax set to review entries in dataframe columns.
>    >>>>>>>> Here is the working code.
>    >>>>>>>>
>    >>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>    >>>>>>>> describe(Data)
>    >>>>>>>> summary(Data)
>    >>>>>>>> unique(Data[1])
>    >>>>>>>> unique(Data[2])
>    >>>>>>>> unique(Data[3])
>    >>>>>>>> unique(Data[4])
>    >>>>>>>>
>    >>>>>>>> I would like to sort the unique entries. The data in the various
>    >>>>>>>> columns are not all defined as numbers; some are text. I realize 1 and
>    >>>>>>>> 10 will not sort properly, as the column is not defined as a number,
>    >>>>>>>> but want to see what I have in the columns viewed as sorted.
>    >>>>>>>>
>    >>>>>>>> QUESTION
>    >>>>>>>> What is the best process to sort unique output, please?
>    >>>>>>>>
>    >>>>>>>>
>    >>>>>>>> Thanks.
>    >>>>>>>
>    >>>>>>> ______________________________________________
>    >>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>    >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>    >>>>>>> PLEASE do read the posting guide
>    >>>>>>> http://www.R-project.org/posting-guide.html
>    >>>>>>> and provide commented, minimal, self-contained, reproducible code.
>    >>>>>>
>    >>>>>
>    >>>>
>    >>>>
>    >>>
>    >>>
>    >>
>    >
>
>    -- 
>    Sent from my phone. Please excuse my brevity.
>
>

-- 
Sent from my phone. Please excuse my brevity.
