[R] Adding SORT to UNIQUE
Stephen H. Dawson, DSL
service at shdawson.com
Thu Dec 23 11:36:34 CET 2021
Yes, I saw that the period was where the problem occurred. However,
I wanted to loop back with the poster to close the discussion.
*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>
On 12/22/21 5:37 PM, Rui Barradas wrote:
> Hello,
>
> The error is a simple typo: instead of the period after
> names(Data[,1]), there should be a comma.
>
> cat(format(names(Data[,1]), "\n", v1, justify = "right"), sep = "\n")
>
> (The error message accurately points out where the error is. In
> these cases, try to read the instruction more carefully; typos can be
> hard to find.)
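For reference, a minimal sketch that prints the header above the sorted values. It assumes a small made-up data frame in place of Source.csv, and the column name Code is hypothetical. Two details differ from the one-liner above: Data[, 1] is a plain vector, so names(Data[, 1]) is NULL and the header is taken from names(Data)[1] instead; and because format() treats its second positional argument as trim, the header and the values are combined with c() into one vector before formatting.

```r
# Hypothetical stand-in for read.csv("./input/Source.csv", header = TRUE).
Data <- data.frame(Code = c(5, 3, 10, 3, 2))

v1 <- sort(unique(Data[, 1]))

# The column name lives on the data frame, not on the extracted vector.
header <- names(Data)[1]

# Put the header and the values into one character vector so that
# format() pads them all to a common width before cat() prints one per line.
cat(format(c(header, v1), justify = "right"), sep = "\n")
# Prints:
# Code
#    2
#    3
#    5
#   10
```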
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 17:59 on 22/12/21, Stephen H. Dawson, DSL via R-help wrote:
>> Thanks.
>>
>> I am pondering label names, not set on one as of yet. I like your
>> recommendation.
>>
>>
>> > cat(format(names(Data[,1]). "\n", v1, justify = "right"), sep = "\n")
>> Error: unexpected symbol in "cat(format(names(Data[,1])."
>> >
>>
>> Your proposed syntax has an error.
>>
>> QUESTION
>> Can you identify the error and reply with another recommendation,
>> please?
>>
>>
>>
>> On 12/22/21 12:33 PM, Avi Gross via R-help wrote:
>>> Stephen,
>>>
>>> Why should there be a column header when you take your data and
>>> reformat it?
>>>
>>> cat(format(v1, justify = "right"), sep = "\n")
>>>
>>> The output above is no longer your original data structure; it is
>>> exactly what you asked to have printed. Your column header and the
>>> other names associated with your original data.frame are stored as
>>> attributes, which you discarded when you extracted the column.
>>>
>>> The name you want is associated not with v1 but with what you call
>>> Data[,1] and you can get that name using names(Data[,1]) and put it
>>> where you want. In your case, if you want the single line above your
>>> values to have that name, this would do it:
>>>
>>> cat(format(names(Data[,1]). "\n", v1, justify = "right"), sep = "\n")
>>>
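A small sketch of where the header actually lives, again with a made-up one-column data frame (the name Code is hypothetical). The names are an attribute of the data frame itself; the vector returned by Data[, 1] carries no attributes, while Data[, 1, drop = FALSE] keeps a one-column data frame and therefore keeps the header.

```r
# Hypothetical one-column data frame standing in for the CSV data.
Data <- data.frame(Code = c(5, 3, 10, 3, 2))

# The header is stored in the data frame's "names" attribute ...
attributes(Data)$names      # "Code"

# ... while the extracted column is a bare vector with no attributes at all.
attributes(Data[, 1])       # NULL

# drop = FALSE keeps a one-column data frame, and with it the header.
sub <- Data[, 1, drop = FALSE]
sub <- unique(sub)                            # distinct rows
sub <- sub[order(sub$Code), , drop = FALSE]   # sorted by value
print(sub, row.names = FALSE)
```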
>>> -----Original Message-----
>>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Stephen H.
>>> Dawson, DSL via R-help
>>> Sent: Wednesday, December 22, 2021 12:02 PM
>>> To: Duncan Murdoch <murdoch.duncan using gmail.com>; Rui Barradas
>>> <ruipbarradas using sapo.pt>; Stephen H. Dawson, DSL via R-help
>>> <r-help using r-project.org>
>>> Subject: Re: [R] Adding SORT to UNIQUE
>>>
>>> Data <- read.csv("./input/Source.csv", header=T)
>>> v1 <- sort(unique(Data[, 1]))
>>> cat(format(v1, justify = "right"), sep = "\n")
>>>
>>> OK, working with the options you presented. This is the combination
>>> where I gain the most benefit.
>>>
>>> However, the output of this syntax does not include a column
>>> header.
>>>
>>> > cat(format(v1, justify = "right"), sep = "\n")
>>> 2
>>> 3
>>> 4
>>> 5
>>> 6
>>> 7
>>> 8
>>> 9
>>> 10
>>> >
>>>
>>> NOTE
>>> The output here is correct (unique) based on the entries from the
>>> column.
>>>
>>> QUESTION
>>> How does one add a text label of something as simple as v1 to the
>>> vertical output of this syntax, please?
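One option that attaches the label v1 without writing a custom printer is to wrap the vector in a one-column data frame and suppress the row names when printing. The sketch below assumes v1 holds the values shown in the output above.

```r
# Values taken from the output above, standing in for sort(unique(Data[, 1])).
v1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10)

# Wrapping the vector in a one-column data frame supplies the label "v1";
# row.names = FALSE suppresses the 1, 2, 3, ... row labels.
print(data.frame(v1 = v1), row.names = FALSE)
```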
>>>
>>>
>>> On 12/22/21 11:13 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>> OK, now I get what you are suggesting.
>>>>
>>>> Much appreciated.
>>>>
>>>>
>>>> Kindest Regards,
>>>>
>>>>
>>>> On 12/22/21 11:08 AM, Duncan Murdoch wrote:
>>>>> On 22/12/2021 10:55 a.m., Stephen H. Dawson, DSL wrote:
>>>>>> I see.
>>>>>>
>>>>>> So, we are talking about putting the output into a new dataframe. I
>>>>>> was hoping to have the output rendered on screen without another
>>>>>> dataframe, but I can live with this option if it must occur.
>>>>>>
>>>>>> Am I correct that the desired vertical output must first go to a
>>>>>> dataframe?
>>>>> No, that's just one option. The other 3 don't use dataframes.
>>>>>
>>>>> Duncan Murdoch
>>>>>>
>>>>>>
>>>>>> On 12/22/21 10:47 AM, Duncan Murdoch wrote:
>>>>>>> On 22/12/2021 10:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>>> Thanks for the reply.
>>>>>>>>
>>>>>>>> Both syntax options work to render the correct (unique) output.
>>>>>>>> However, the output is rendered horizontally. What needs to happen
>>>>>>>> to get the output to render vertically, please?
>>>>>>> The result of those expressions is a vector of the same type as the
>>>>>>> column, so your question is really about how to get a vector to
>>>>>>> print one element per line.
>>>>>>>
>>>>>>> Probably the simplest way is to put the vector in a dataframe (or
>>>>>>> matrix, or tibble, depending on which formatting you prefer). For
>>>>>>> example,
>>>>>>>
>>>>>>>> v <- c("red", "green", "blue")
>>>>>>>> data.frame(v)
>>>>>>>       v
>>>>>>> 1   red
>>>>>>> 2 green
>>>>>>> 3  blue
>>>>>>>
>>>>>>> If you want a more minimal display, try
>>>>>>>
>>>>>>>> cat(v, sep = "\n")
>>>>>>> red
>>>>>>> green
>>>>>>> blue
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>>> cat(format(v, justify = "right"), sep = "\n")
>>>>>>>   red
>>>>>>> green
>>>>>>>  blue
>>>>>>>
>>>>>>> If you want this to happen when you auto-print the object, you can
>>>>>>> give it a class attribute and write a function to print that class,
>>>>>>> e.g.
>>>>>>>
>>>>>>>> class(v) <- "oneperline"
>>>>>>>>
>>>>>>>> print.oneperline <- function(x, ...) cat(format(x, justify = "right"), sep = "\n")
>>>>>>>> v
>>>>>>>   red
>>>>>>> green
>>>>>>>  blue
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/21/21 11:38 AM, Duncan Murdoch wrote:
>>>>>>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>>>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>>
>>>>>>>>>>> sort(unique(Data[1]))
>>>>>>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
>>>>>>>>>>>   undefined columns selected
>>>>>>>>>> That's the wrong syntax: Data[1] is not "column one of Data".
>>>>>>>>>> Use Data[[1]] for that, so
>>>>>>>>>>
>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>> Actually, I'd probably recommend
>>>>>>>>>
>>>>>>>>> sort(unique(Data[, 1]))
>>>>>>>>>
>>>>>>>>> instead. This treats Data as a matrix rather than as a list.
>>>>>>>>> Dataframes are lists that look like matrices, but to me the
>>>>>>>>> matrix aspect is usually more intuitive.
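A quick sketch of the three forms side by side, using a hypothetical data frame (the column names Code and n are made up). Data[1] keeps the data frame structure, while Data[[1]] and Data[, 1] return the column vector. Because unique() on a data frame works on whole rows, printing unique(Data[1]) also shows the row names of the first occurrences, which can look like row numbers.

```r
# Hypothetical data frame; column names are made up for illustration.
Data <- data.frame(Code = c("b", "a", "b"), n = c(2, 1, 2),
                   stringsAsFactors = FALSE)

# Data[1] is a one-column data frame (list-style subsetting keeps the frame).
is.data.frame(Data[1])                 # TRUE

# Data[[1]] and Data[, 1] both return the column itself as a plain vector.
identical(Data[[1]], Data[, 1])        # TRUE

sort(unique(Data[[1]]))                # "a" "b"

# unique() on the one-column data frame keeps whole rows, so printing it
# shows the row names of the first occurrences alongside the values.
unique(Data[1])
```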
>>>>>>>>>
>>>>>>>>> Duncan Murdoch
>>>>>>>>>
>>>>>>>>>> I think Rui already pointed out the typo in the quoted text
>>>>>>>>>> below...
>>>>>>>>>>
>>>>>>>>>> Duncan Murdoch
>>>>>>>>>>
>>>>>>>>>>> The recommended syntax did not work, as listed above.
>>>>>>>>>>>
>>>>>>>>>>> What I want is the sorted list of distinct values in a column.
>>>>>>>>>>> Again, the column may be text or numbers. This is a huge analysis
>>>>>>>>>>> effort with data coming at me from many different sources.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Thanks everyone for the replies.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is clear one either needs to write a function or put the
>>>>>>>>>>>>> unique entries into another dataframe.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It seems odd that R cannot easily sort a list of unique column
>>>>>>>>>>>>> entries. Python and SQL can do it with ease.
>>>>>>>>>>>> I've seen several responses that looked pretty simple. It's
>>>>>>>>>>>> hard to beat sort(unique(x)), though there's a fair bit of
>>>>>>>>>>>> confusion about what you actually want. Maybe you should post
>>>>>>>>>>>> an example of the code you'd use in Python?
>>>>>>>>>>>>
>>>>>>>>>>>> Duncan Murdoch
>>>>>>>>>>>>
>>>>>>>>>>>>> QUESTION
>>>>>>>>>>>>> Is there a simpler means than the unique function to capture
>>>>>>>>>>>>> distinct column entries and then sort that list?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Inline.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>>>>>> This is not right.
>>>>>>>>>>>>>> The syntax Data[1] extracts a sub-data.frame; the syntax
>>>>>>>>>>>>>> Data[[1]] extracts the column vector.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for my previous answer, it did not address the question: I
>>>>>>>>>>>>>> misinterpreted it as a question on how to sort in numeric order
>>>>>>>>>>>>>> when the data is not numeric. Here is a, hopefully, complete
>>>>>>>>>>>>>> answer, still using package stringr.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>>>>>> stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>>>>>>> })
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Or using Avi's suggestion of writing a function to do all
>>>>>>>>>>>>>> the work and simplify the lapply loop later,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> unisort2 <- function(vec, ...)
>>>>>>>>>>>>>> stringr::str_sort(unique(vec), ...)
>>>>>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>>>>>>>>
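If stringr is not available, a rough base-R analogue of unisort2 might look like the sketch below. The helper name unisort_base is hypothetical, and it is not equivalent to str_sort(numeric = TRUE), which also orders digits embedded inside text (for example "a2" before "a10"). It only places values that parse entirely as numbers first, in numeric order, followed by the remaining text values sorted with sort(). Every column comes back as character.

```r
# Hypothetical base-R helper (not part of any package): distinct values,
# numeric-looking ones first in numeric order, then the rest alphabetically.
unisort_base <- function(vec) {
  u <- unique(as.character(vec))
  # as.numeric() gives NA for text; suppress the coercion warning.
  num <- suppressWarnings(as.numeric(u))
  is_num <- !is.na(num)
  # sort() drops NA values by default, so missing entries are not returned.
  c(u[is_num][order(num[is_num])], sort(u[!is_num]))
}

# Made-up data frame with one mixed column and one numeric column.
Data <- data.frame(A = c("b", "10", "2", "a", "2"),
                   B = c(3, 1, 3, 2, 1),
                   stringsAsFactors = FALSE)
Data2 <- lapply(Data, unisort_base)
Data2
```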
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Rui Barradas
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Running a simple syntax set to review entries in dataframe
>>>>>>>>>>>>>>>> columns.
>>>>>>>>>>>>>>>> Here is the working code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header=T)
>>>>>>>>>>>>>>>> describe(Data)
>>>>>>>>>>>>>>>> summary(Data)
>>>>>>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>>>>>>>>> columns are not all defined as numbers; some are text. I realize 1
>>>>>>>>>>>>>>>> and 10 will not sort properly, as the column is not defined as a
>>>>>>>>>>>>>>>> number, but I want to see what I have in the columns, viewed as
>>>>>>>>>>>>>>>> sorted.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> QUESTION
>>>>>>>>>>>>>>>> What is the best process to sort unique output, please?
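To illustrate the 1 versus 10 issue with base R alone, a minimal sketch on a made-up character vector: sort() on text compares strings character by character, so "10" lands before "2". When every distinct value parses as a number, ordering by as.numeric() while keeping the original strings gives numeric order.

```r
# Made-up character column, as read.csv() produces when a column is read as text.
x <- c("10", "2", "1", "2", "9")

# Character sorting compares strings position by position, so "10" sorts before "2".
sort(unique(x))                   # "1" "10" "2" "9"

# When every distinct value parses as a number, ordering by the numeric
# value while keeping the original strings gives numeric order.
u <- unique(x)
u[order(as.numeric(u))]           # "1" "2" "9" "10"
```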
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> ______________________________________________
>>>>>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and
>>>>>>>>>>>>>>> more, see https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>>>>>> and provide commented, minimal, self-contained,
>>>>>>>>>>>>>>> reproducible code.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>