[R] Adding SORT to UNIQUE
Stephen H. Dawson, DSL
@erv|ce @end|ng |rom @hd@w@on@com
Wed Dec 22 18:03:39 CET 2021
Wow! Thanks.
I will work through the logic you have presented next week, when I have
the time to focus. For now, I need to get some productive work done
based on my current understanding.
Kindest Regards,
*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com
On 12/22/21 11:57 AM, Rui Barradas wrote:
> Hello,
>
> The problem is that the vectors of unique values in each column of the
> original data.frame Data need not be the same length, so applying
> sort(unique(.)) to each column gives a list of vectors of different
> lengths. And lists print "horizontally", each vector on its own.
>
> Like Duncan said, one of the ways of getting a vertical display is to
> have the list of sorted, unique values be of a custom class and write
> a print method for that class. Here is an example of this. The
> function to sort outputs an object of a class that sub-classes class
> "list". And a print method takes care of the printing. This method
> creates a temp data.frame, prints that df and invisibly returns its
> input.
>
> # Create a test data set
> set.seed(2021)
> Data <- replicate(4, as.character(sample(20, 20, TRUE)))
> Data <- as.data.frame(Data)
>
>
> # Now the functions
> sort_unique <- function(x){
>   y <- lapply(x, \(.x) stringr::str_sort(unique(.x), numeric = TRUE))
>   old_class <- class(y)
>   class(y) <- c("sortUnique", old_class)
>   y
> }
> print.sortUnique <- function(x, ...){
>   n <- max(lengths(x))
>   y <- lapply(x, \(.x) c(.x, rep("", n - length(.x))))
>   y <- do.call(cbind.data.frame, y)
>   names(y) <- names(x)
>   print(y)
>   invisible(x)
> }
>
> # Test the functions above
> Data2 <- sort_unique(Data)
>
> class(Data2)
> Data2
> print(Data2)
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 15:55 on 22/12/21, Stephen H. Dawson, DSL wrote:
>> I see.
>>
>> So, we are talking about taking the output into a new dataframe. I was
>> hoping to have the output rendered on screen without another
>> dataframe, but I can live with this option if it must occur.
>>
>> Am I correct that the desired vertical output must first go to a dataframe?
>>
>>
>> On 12/22/21 10:47 AM, Duncan Murdoch wrote:
>>> On 22/12/2021 10:20 a.m., Stephen H. Dawson, DSL wrote:
>>>> Thanks for the reply.
>>>>
>>>> Both syntax options work to render the correct (unique) output.
>>>> However, the output is rendered horizontally. What needs to happen
>>>> to get the output to render vertically, please?
>>>
>>> The result of those expressions is a vector of the same type as the
>>> column, so your question is really about how to get a vector to
>>> print one element per line.
>>>
>>> Probably the simplest way is to put the vector in a dataframe (or
>>> matrix, or tibble, depending on which formatting you prefer). For
>>> example,
>>>
>>> > v <- c("red", "green", "blue")
>>> > data.frame(v)
>>>       v
>>> 1   red
>>> 2 green
>>> 3  blue
>>>
>>> If you want a more minimal display, try
>>>
>>> > cat(v, sep = "\n")
>>> red
>>> green
>>> blue
>>>
>>> or
>>>
>>> > cat(format(v, justify = "right"), sep = "\n")
>>>   red
>>> green
>>>  blue
>>>
>>> If you want this to happen when you auto-print the object, you can
>>> give it a class attribute and write a function to print that class,
>>> e.g.
>>>
>>> > class(v) <- "oneperline"
>>> >
>>> > print.oneperline <- function(x, ...) cat(format(x, justify = "right"), sep = "\n")
>>> >
>>> > v
>>>   red
>>> green
>>>  blue
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>>
>>>> On 12/21/21 11:38 AM, Duncan Murdoch wrote:
>>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> sort(unique(Data[1]))
>>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>>>>> decreasing)) :
>>>>>>> undefined columns selected
>>>>>>
>>>>>> That's the wrong syntax: Data[1] is not "column one of Data". Use
>>>>>> Data[[1]] for that, so
>>>>>>
>>>>>> sort(unique(Data[[1]]))
>>>>>
>>>>> Actually, I'd probably recommend
>>>>>
>>>>> sort(unique(Data[, 1]))
>>>>>
>>>>> instead. This treats Data as a matrix rather than as a list.
>>>>> Dataframes are lists that look like matrices, but to me the matrix
>>>>> aspect is usually more intuitive.
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>>
>>>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>>
>>>>>>> The recommended syntax did not work, as listed above.
>>>>>>>
>>>>>>> What I want is the sorted distinct output of a column. Again, the
>>>>>>> column may be text or numbers. This is a huge analysis effort
>>>>>>> with data coming at me from many different sources.
>>>>>>>
>>>>>>>
>>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>> Thanks everyone for the replies.
>>>>>>>>>
>>>>>>>>> It is clear one either needs to write a function or put the
>>>>>>>>> unique entries into another dataframe.
>>>>>>>>>
>>>>>>>>> It seems odd R cannot sort a list of unique column entries
>>>>>>>>> with ease. Python and SQL can do it with ease.
>>>>>>>>
>>>>>>>> I've seen several responses that looked pretty simple. It's
>>>>>>>> hard to beat sort(unique(x)), though there's a fair bit of
>>>>>>>> confusion about what you actually want. Maybe you should post
>>>>>>>> an example of the code you'd use in Python?
>>>>>>>>
>>>>>>>> Duncan Murdoch
>>>>>>>>
>>>>>>>>>
>>>>>>>>> QUESTION
>>>>>>>>> Is there a simpler means, other than the unique function, to
>>>>>>>>> capture distinct column entries, then sort that list?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Inline.
>>>>>>>>>>
>>>>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>>
>>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>>
>>>>>>>>>> This is not right.
>>>>>>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax
>>>>>>>>>> Data[[1]] extracts the column vector.
>>>>>>>>>>
>>>>>>>>>> As for my previous answer, it did not address the question; I
>>>>>>>>>> misinterpreted it as a question on how to sort in numeric
>>>>>>>>>> order when the data is not numeric. Here is a, hopefully,
>>>>>>>>>> complete answer. Still with package stringr.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>>
>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>>   stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>>> })
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Or using Avi's suggestion of writing a function to do all the
>>>>>>>>>> work and simplify the lapply loop later,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rui Barradas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Running a simple syntax set to review entries in dataframe
>>>>>>>>>>>> columns. Here is the working code.
>>>>>>>>>>>>
>>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>>>>>> describe(Data)
>>>>>>>>>>>> summary(Data)
>>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to sort the unique entries. The data in the
>>>>>>>>>>>> various columns are not only numbers, but also text. I
>>>>>>>>>>>> realize 1 and 10 will not sort properly, as the column is not
>>>>>>>>>>>> defined as a number, but I want to see what I have in the
>>>>>>>>>>>> columns viewed as sorted.
>>>>>>>>>>>>
>>>>>>>>>>>> QUESTION
>>>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> ______________________________________________
>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and
>>>>>>>>>>> more, see
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>>>>>>> code.
>>>>>>>>>>
>>>>>>>>>
>
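
A minimal sketch of the indexing distinction discussed in this thread, using a small made-up data frame in place of the original Source.csv (the column names `a` and `b` and their values are assumptions for illustration only). It also shows one possible base-R alternative to stringr::str_sort(numeric = TRUE); that alternative only applies when every value in the column parses as a number.

```r
# Hypothetical test data: character columns, one holding numeric-looking text
Data <- data.frame(
  a = c("10", "2", "1", "2"),
  b = c("pear", "apple", "pear", "fig"),
  stringsAsFactors = FALSE
)

# The three indexing forms mentioned in the thread
class(Data[1])    # a one-column sub-data.frame
class(Data[[1]])  # list-style: the column vector itself
class(Data[, 1])  # matrix-style: the same column vector

# sort() works on the vector; character sorting compares text, not magnitude
x <- unique(Data[[1]])
sorted_a <- sort(x)

# Numeric-aware ordering in base R (assumes every value parses as a number)
natural_a <- x[order(as.numeric(x))]

# One value per line on screen, without building another data frame
cat(natural_a, sep = "\n")
```

Data[1] is the form that triggers the "undefined columns selected" error quoted above, because sort() then tries to index a data frame rather than a vector.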