[R] Adding SORT to UNIQUE

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Wed Dec 22 17:59:35 CET 2021


Stephen,

Understanding a bit better where you are coming from, I come back to how people think about things. Languages like R often focus on doing things incrementally. I don't mean the language exactly as much as many of the people using the language.

So it is perfectly normal to make multiple versions of something as you go along and let older versions no longer in use be garbage collected if needed.

Your latest question was how to display your data a certain way. You want it written down the screen (or paper) rather than across. Duncan provided you with a few of the many ways to do that. Unless you are working with giant amounts of data already using up most of your available memory, it is fairly harmless to make some temporary copies that get you what you want.

What you may not have known is that some of the systems in R use a concept of generic functions. In particular, when you use print(), or just put a variable name on a line by itself, which normally calls print() implicitly for you, R does not just magically print. It examines the class of what you asked to print and looks up a method for printing it. My setup currently has 303 such methods defined, with names like print.factor, print.Date and print.data.frame, each of which is given control to print its kind of object. So one way to change how something is printed is to make it into an object of some class and let the system then print it. Of course, you can also design a new class of your own and write a print method for it, and I suspect someone has already done what you want in some package.
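As a minimal sketch of that dispatch mechanism, the following defines a print method for a made-up class. The class name "vertical" and the helper as_vertical() are assumptions for illustration only; they are not part of base R or of anything posted in this thread.

```r
# Hypothetical constructor: tags an object with the made-up class "vertical".
as_vertical <- function(x) {
  class(x) <- c("vertical", class(x))
  x
}

# S3 method that the generic print() dispatches to for class "vertical".
# writeLines() emits one element per line.
print.vertical <- function(x, ...) {
  writeLines(format(unclass(x)))
  invisible(x)
}

v <- as_vertical(c(3, 1, 2))
print(v)                  # typing just `v` at the console does the same
inherits(v, "vertical")   # TRUE

# Number of print methods currently visible; varies with loaded packages.
length(methods("print"))
```

The data itself is untouched; only the class attribute differs, so unclass(v) gets back the ordinary numeric vector.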

Changing the class of an existing object, even a large object, is fairly inexpensive. Other attributes also control behavior, such as "dim", which specifies dimensions. Say I have a vector containing 1:5 that I want to print vertically.

  > vec <- 1:5

  > vec
  [1] 1 2 3 4 5

You can see it normally prints horizontally. Transposing it might sound like a good way to go, except R vectors generally do not have the concept. Transposing a vector makes a one-row matrix, which prints a bit differently but is still horizontal; a second transpose works better:

  > class(t(vec))
  [1] "matrix" "array"

  > t(vec)
       [,1] [,2] [,3] [,4] [,5]
  [1,]    1    2    3    4    5

  > t(t(vec))
       [,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5

Of course, you can probably as easily make it a matrix:

  > as.matrix(vec)
       [,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5

  > matrix(vec, ncol=1)
       [,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5

The above made a copy, of course.

You can change the original into a matrix by just changing an attribute:

  > dim(vec) <- c(length(vec), 1)

  > vec
       [,1]
  [1,]    1
  [2,]    2
  [3,]    3
  [4,]    4
  [5,]    5

  > attributes(vec)
  $dim
  [1] 5 1

  > class(vec)
  [1] "matrix" "array"

BUT you need to be careful as in your earlier experience. Some places that accept a vector will not accept a 1-column or 1-row matrix, or a data.frame with one column or just one row. Best to be careful about mixing.
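To make that caution concrete, here is a small sketch showing how a one-column matrix differs from a plain vector and a few ways to convert back. The variable names (mat, v1, v2, v3, df) are mine, chosen only for illustration.

```r
vec <- 1:5
mat <- matrix(vec, ncol = 1)   # 5 x 1 matrix holding the same values

is.vector(vec)   # TRUE
is.vector(mat)   # FALSE: the "dim" attribute makes it a matrix

# Three ways to get a plain vector back from a one-column matrix.
v1 <- as.vector(mat)
v2 <- drop(mat)      # drops dimensions of extent 1
v3 <- mat
dim(v3) <- NULL      # removes the attribute directly

identical(v1, vec) && identical(v2, vec) && identical(v3, vec)

# A one-column data.frame is a list; [[1]] or [, 1] extracts the column vector,
# while [1] returns another data.frame.
df <- data.frame(x = vec)
identical(df[[1]], vec)
identical(df[, 1], vec)
is.data.frame(df[1])
```

Which conversion to use is mostly a matter of taste; the point is only that functions expecting a vector may need one of them first.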

So look again at what Duncan sent; some of his suggestions are quite nice. You can specifically use the cat() function instead of the default print(), and it has added functionality. Various packages also exist, including some that do various kinds of pretty printing.

He left out one of the simplest ones, which is simply to write your own print routine. Here you define a trivial one-line function that calls print() multiple times to make your output vertical:

  > vertprint <- function(horiz) for (item in horiz) print(item)

  > vertprint(vec)
  [1] 1
  [1] 2
  [1] 3
  [1] 4
  [1] 5

Obviously if you are printing huge amounts of data, this is not necessarily any more efficient. But it does not necessarily make many copies of your data if that bothers you.

May I end with a suggestion. It can be fun to start a discussion in a place like this but it can also be a waste of time for many people, especially those who provide longer answers and do some experimenting to illustrate. Often a simple search like the following can rapidly get you an answer before feeling the need to ask here. I did a simple search just now for what I assumed was a very frequent question:

"R how to print data vertically"

I looked at a few of the answers and noted a few other suggestions with one similar but different:

cat(paste(x),sep="\n")
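A short sketch of why the paste() step can matter, using made-up example data rather than anything from the thread. For plain character or numeric vectors cat() works directly, but for a factor cat() writes the underlying integer codes, while paste() converts to the labels first.

```r
# Example data (an assumption for illustration).
f <- factor(c("red", "green", "blue"))

# paste() converts the factor to its labels before cat() writes them.
cat(paste(f), sep = "\n")
cat("\n")   # end the last line

# Without paste(), cat() writes the integer codes of the factor instead.
cat(f, sep = "\n")
cat("\n")
```

as.character(f) would serve the same purpose as paste(f) here.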

And of course various packages that implemented something like print.vertical().

Your earlier statement suggests you may be interested in what is the canonical or best way and by now, you may note there are very often MANY ways and some programmers prefer one or another. And, I note, after enough questions of a fairly basic or even naïve nature, some responders in these groups stop responding for some reason.

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Stephen H. Dawson, DSL via R-help
Sent: Wednesday, December 22, 2021 10:55 AM
To: Duncan Murdoch <murdoch.duncan using gmail.com>; Rui Barradas <ruipbarradas using sapo.pt>; Stephen H. Dawson, DSL via R-help <r-help using r-project.org>
Subject: Re: [R] Adding SORT to UNIQUE

I see.

So, we are talking about taking the output into a new dataframe. I was hoping to have the output rendered on screen without another dataframe, but I can live with this option if it must occur.

Am I correct the desired vertical output must first go to a dataframe?


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 12/22/21 10:47 AM, Duncan Murdoch wrote:
> On 22/12/2021 10:20 a.m., Stephen H. Dawson, DSL wrote:
>> Thanks for the reply.
>>
>> Both syntax options work to render the correct (unique) output. 
>> However, the output is rendered as horizontal. What needs to happen 
>> to get the output to render vertical, please?
>
> The result of those expressions is a vector of the same type as the 
> column, so your question is really about how to get a vector to print 
> one element per line.
>
> Probably the simplest way is to put the vector in a dataframe (or 
> matrix, or tibble, depending on which formatting you prefer).  For 
> example,
>
> >   v <- c("red", "green", "blue")
> >   data.frame(v)
>       v
> 1   red
> 2 green
> 3  blue
>
> If you want a more minimal display, try
>
> > cat(v, sep = "\n")
> red
> green
> blue
>
> or
>
> > cat(format(v, justify = "right"), sep = "\n")
>   red
> green
>  blue
>
> If you want this to happen when you auto-print the object, you can 
> give it a class attribute and write a function to print that class, e.g.
>
> >  class(v) <- "oneperline"
> >
> >   print.oneperline <- function(x, ...) cat(format(x, justify = "right"), sep = "\n")
> >
> >   v
>   red
> green
>  blue
>
> Duncan Murdoch
>
>>
>>
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com <http://www.shdawson.com>
>>
>>
>> On 12/21/21 11:38 AM, Duncan Murdoch wrote:
>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>> Thanks for the reply.
>>>>>
>>>>> sort(unique(Data[1]))
>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
>>>>>       undefined columns selected
>>>>
>>>> That's the wrong syntax:  Data[1] is not "column one of Data". Use 
>>>> Data[[1]] for that, so
>>>>
>>>>      sort(unique(Data[[1]]))
>>>
>>> Actually, I'd probably recommend
>>>
>>>    sort(unique(Data[, 1]))
>>>
>>> instead.  This treats Data as a matrix rather than as a list.
>>> Dataframes are lists that look like matrices, but to me the matrix 
>>> aspect is usually more intuitive.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>
>>>> Duncan Murdoch
>>>>
>>>>>
>>>>> The recommended syntax did not work, as listed above.
>>>>>
>>>>> What I want is the sort of distinct column output. Again, the 
>>>>> column may be text or numbers. This is a huge analysis effort with 
>>>>> data coming at me from many different sources.
>>>>>
>>>>>
>>>>> *Stephen Dawson, DSL*
>>>>> /Executive Strategy Consultant/
>>>>> Business & Technology
>>>>> +1 (865) 804-3454
>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>
>>>>>
>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>>> Thanks everyone for the replies.
>>>>>>>
>>>>>>> It is clear one either needs to write a function or put the 
>>>>>>> unique entries into another dataframe.
>>>>>>>
>>>>>>> It seems odd R cannot sort a list of unique column entries with 
>>>>>>> ease.
>>>>>>> Python and SQL can do it with ease.
>>>>>>
>>>>>> I've seen several responses that looked pretty simple. It's hard 
>>>>>> to beat sort(unique(x)), though there's a fair bit of confusion 
>>>>>> about what you actually want.  Maybe you should post an example 
>>>>>> of the code you'd use in Python?
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>>
>>>>>>> QUESTION
>>>>>>> Is there a simpler means other than the unique function to 
>>>>>>> capture distinct column entries, then sort that list?
>>>>>>>
>>>>>>>
>>>>>>> *Stephen Dawson, DSL*
>>>>>>> /Executive Strategy Consultant/
>>>>>>> Business & Technology
>>>>>>> +1 (865) 804-3454
>>>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>>>
>>>>>>>
>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Inline.
>>>>>>>>
>>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>
>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>
>>>>>>>> This is not right.
>>>>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax 
>>>>>>>> Data[[1]] extracts the column vector.
>>>>>>>>
>>>>>>>> As for my previous answer, it was not addressing the question, 
>>>>>>>> I misinterpreted it as being a question on how to sort by 
>>>>>>>> numeric order when the data is not numeric. Here is a, 
>>>>>>>> hopefully, complete answer.
>>>>>>>> Still with package stringr.
>>>>>>>>
>>>>>>>>
>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>
>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>       stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>> })
>>>>>>>>
>>>>>>>>
>>>>>>>> Or using Avi's suggestion of writing a function to do all the 
>>>>>>>> work and simplify the lapply loop later,
>>>>>>>>
>>>>>>>>
>>>>>>>> unisort <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort, numeric = TRUE)
>>>>>>>>
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Rui Barradas
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Stephen Dawson, DSL*
>>>>>>>>> /Executive Strategy Consultant/ Business & Technology
>>>>>>>>> +1 (865) 804-3454
>>>>>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Running a simple syntax set to review entries in dataframe 
>>>>>>>>>> columns.
>>>>>>>>>> Here is the working code.
>>>>>>>>>>
>>>>>>>>>> Data <- read.csv("./input/Source.csv", header=T)
>>>>>>>>>> describe(Data)
>>>>>>>>>> summary(Data)
>>>>>>>>>> unique(Data[1])
>>>>>>>>>> unique(Data[2])
>>>>>>>>>> unique(Data[3])
>>>>>>>>>> unique(Data[4])
>>>>>>>>>>
>>>>>>>>>> I would like to sort the unique entries. The data in the 
>>>>>>>>>> various columns are not only defined as numbers, but also as 
>>>>>>>>>> text. I realize 1 and 10 will not sort properly, as the column 
>>>>>>>>>> is not defined as a number, but I want to see what I have in 
>>>>>>>>>> the columns viewed as sorted.
>>>>>>>>>>
>>>>>>>>>> QUESTION
>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, 
>>>>>>>>> see https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide 
>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>> and provide commented, minimal, self-contained, reproducible 
>>>>>>>>> code.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>



