[R] Adding SORT to UNIQUE

Duncan Murdoch murdoch.duncan at gmail.com
Wed Dec 22 17:08:06 CET 2021


On 22/12/2021 10:55 a.m., Stephen H. Dawson, DSL wrote:
> I see.
> 
> So, we are talking about taking the output into a new dataframe. I was
> hoping to have the output rendered on screen without another dataframe,
> but I can live with this option if it must occur.
> 
> Am I correct that the desired vertical output must first go to a dataframe?

No, that's just one option.  The other 3 don't use dataframes.
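
As a minimal sketch of the non-dataframe route, assuming a small made-up
data frame standing in for the Data read from Source.csv earlier in the
thread (the column names and values are hypothetical), the sorted unique
values of a column can be printed one per line with cat() or writeLines():

```r
# Hypothetical stand-in for the Data read from Source.csv in the original post.
Data <- data.frame(
  colour = c("red", "green", "blue", "green", "red"),
  size   = c("10", "1", "2", "10", "1"),
  stringsAsFactors = FALSE
)

# Sorted unique values of the first column, as a plain character vector.
vals <- sort(unique(Data[[1]]))

# One element per line, with no intermediate data frame.
cat(vals, sep = "\n")

# writeLines() gives the same display for a character vector.
writeLines(vals)
```

Neither call creates a new object; both only change how the vector is
displayed.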

Duncan Murdoch
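
For readers following the quoted discussion below, here is a small sketch
of why sort(unique(Data[1])) failed while the Data[[1]] and Data[, 1]
forms work. The data frame is made up for illustration and is not the
poster's data.

```r
# Hypothetical data frame; the real Data came from a CSV file.
Data <- data.frame(id = c("10", "1", "2", "1"), stringsAsFactors = FALSE)

class(Data[1])     # "data.frame": a one-column sub-data-frame
class(Data[[1]])   # "character": the column itself (list-style indexing)
class(Data[, 1])   # "character": matrix-style indexing, dropped to a vector

# sort() on the one-column data frame fails (in the thread the message was
# "undefined columns selected"); trap the error so the script keeps running.
res <- tryCatch(sort(unique(Data[1])), error = function(e) conditionMessage(e))

# Both vector forms sort without trouble. The values are text, so they are
# ordered as strings, not as numbers.
by_list   <- sort(unique(Data[[1]]))
by_matrix <- sort(unique(Data[, 1]))
```

The two vector forms give identical results; which one to use is a matter
of taste.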
> 
> 
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com <http://www.shdawson.com>
> 
> 
> On 12/22/21 10:47 AM, Duncan Murdoch wrote:
>> On 22/12/2021 10:20 a.m., Stephen H. Dawson, DSL wrote:
>>> Thanks for the reply.
>>>
>>> Both syntax options work to render the correct (unique) output. However,
>>> the output is rendered horizontally. What needs to happen to get the
>>> output to render vertically, please?
>>
>> The result of those expressions is a vector of the same type as the
>> column, so your question is really about how to get a vector to print
>> one element per line.
>>
>> Probably the simplest way is to put the vector in a dataframe (or
>> matrix, or tibble, depending on which formatting you prefer).  For
>> example,
>>
>> > v <- c("red", "green", "blue")
>> > data.frame(v)
>>        v
>> 1   red
>> 2 green
>> 3  blue
>>
>> If you want a more minimal display, try
>>
>> > cat(v, sep = "\n")
>> red
>> green
>> blue
>>
>> or
>>
>> > cat(format(v, justify = "right"), sep = "\n")
>>    red
>> green
>>   blue
>>
>> If you want this to happen when you auto-print the object, you can
>> give it a class attribute and write a function to print that class, e.g.
>>
>> > class(v) <- "oneperline"
>> >
>> > print.oneperline <- function(x, ...)
>> +     cat(format(x, justify = "right"), sep = "\n")
>> >
>> > v
>>    red
>> green
>>   blue
>>
>> Duncan Murdoch
>>
>>>
>>>
>>> On 12/21/21 11:38 AM, Duncan Murdoch wrote:
>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> sort(unique(Data[1]))
>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>>>> decreasing)) :
>>>>>>        undefined columns selected
>>>>>
>>>>> That's the wrong syntax:  Data[1] is not "column one of Data". Use
>>>>> Data[[1]] for that, so
>>>>>
>>>>>       sort(unique(Data[[1]]))
>>>>
>>>> Actually, I'd probably recommend
>>>>
>>>>     sort(unique(Data[, 1]))
>>>>
>>>> instead.  This treats Data as a matrix rather than as a list.
>>>> Dataframes are lists that look like matrices, but to me the matrix
>>>> aspect is usually more intuitive.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>>
>>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>>
>>>>>> The recommended syntax did not work, as listed above.
>>>>>>
>>>>>> What I want is the sorted distinct column output. Again, the column
>>>>>> may be text or numbers. This is a huge analysis effort with data
>>>>>> coming at me from many different sources.
>>>>>>
>>>>>>
>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>> Thanks everyone for the replies.
>>>>>>>>
>>>>>>>> It is clear one either needs to write a function or put the unique
>>>>>>>> entries into another dataframe.
>>>>>>>>
>>>>>>>> It seems odd R cannot sort a list of unique column entries with ease.
>>>>>>>> Python and SQL can do it with ease.
>>>>>>>
>>>>>>> I've seen several responses that looked pretty simple. It's hard to
>>>>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>>>>> what you actually want.  Maybe you should post an example of the
>>>>>>> code you'd use in Python?
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>>
>>>>>>>> QUESTION
>>>>>>>> Is there a simpler means, other than the unique function, to capture
>>>>>>>> distinct column entries and then sort that list?
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Inline.
>>>>>>>>>
>>>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>
>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>
>>>>>>>>> This is not right.
>>>>>>>>> The syntax Data[1] extracts a sub-data.frame; the syntax Data[[1]]
>>>>>>>>> extracts the column vector.
>>>>>>>>>
>>>>>>>>> As for my previous answer, it did not address the question. I
>>>>>>>>> misinterpreted it as a question about how to sort in numeric order
>>>>>>>>> when the data is not numeric. Here is a hopefully complete answer,
>>>>>>>>> still using the stringr package.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>
>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>        stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>> })
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Or, using Avi's suggestion of writing a function to do all the work
>>>>>>>>> and simplify the lapply loop later,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Rui Barradas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Running a simple syntax set to review entries in dataframe columns.
>>>>>>>>>>> Here is the working code.
>>>>>>>>>>>
>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>>>>> describe(Data)
>>>>>>>>>>> summary(Data)
>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>
>>>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>>>> columns are not all defined as numbers; some are text. I realize
>>>>>>>>>>> 1 and 10 will not sort properly, as the column is not defined as
>>>>>>>>>>> a number, but I want to see what I have in the columns, sorted.
>>>>>>>>>>>
>>>>>>>>>>> QUESTION
>>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.