[R] Getting minimum value of a column according a factor column of a dataframe

Thu Aug 25 18:26:19 CEST 2022

Rui wrote:

<<<<
Yes, I agree with you.
In the raw data the Code is already ordered so it doesn't matter if it is not addressed in the aggregation code. But from the OP's last post I conclude (wrongly?) that in the original data this need not be the case. 
Hence the double order() in my last post. I wonder (a) if it's really necessary and (b) if it would make a difference if the aggregated output is sorted.
>>>>

My reply (truncated here, but given in full in earlier messages) has to do with the nature of ordering. We were given what looks like a brief excerpt of some data showing 4 unique values in the column "Code", where the rows happened to be fully sorted: several rows with the value 41003, then several with 41005, then two more values, in what looked like ascending numerical order.

My personal guess is that if the Code column is already a factor, its level order is left alone; otherwise the groups are sorted by value, as if the column had been made into a factor with default levels, and that order may not be what you want. As an experiment I changed one Code value to a higher number, and it was moved to the last position in the report even though its row was near the top of the data. The default ordering is thus going to result in a summary that may not match what is wanted UNLESS you make adjustments.
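To make the default behaviour concrete, here is a minimal base-R sketch. The vector of codes is made up for illustration (it is not the OP's data), and the variable names are my own. It only shows that factor() sorts the levels of a numeric vector, so a large code that appears early still ends up last:

```r
# Hypothetical codes, with 88888 in second position
codes <- c(41003, 88888, 41003, 41005, 41009, 41017)

# Default: the levels are the sorted unique values
default_levels <- levels(factor(codes))
print(default_levels)

# Order of first appearance, for comparison
appearance_order <- unique(codes)
print(appearance_order)
```

The two printed vectors differ only in where 88888 lands, which is the effect described above.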

ONE METHOD I can think of is to take what I call mydf (the data.frame or tibble you read in) and ask for the unique values of Code like so before making it into a factor:

my_order <- unique(mydf$Code)

I had inserted an 88888 into the data in second position so the result is this:

> my_order
[1] 41003 88888 41005 41009 41017

The order of first appearance has been preserved, and I can use it to make the factor levels follow the same order:

mydf$Code <- factor(mydf$Code, levels=my_order)

The full code shown below now produces output in that order:

   Code  minQ
1 41003 0.160
2 88888 0.160
3 41005 0.210
4 41009 0.218
5 41017 0.240
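
The same idea can be checked without dplyr. This is only a sketch on a small made-up sample (the names codes, q, code_f and min_by_code are mine, not from the thread), assuming all you need is one minimum per group in order of first appearance:

```r
# Made-up sample, not the full data set
codes <- c(41003, 88888, 41003, 41005, 41005, 41009)
q     <- c(0.16, 0.16, 0.197, 0.21, 0.217, 0.218)

# Levels fixed to the order of first appearance
code_f <- factor(codes, levels = unique(codes))

# tapply() returns one value per level, in level order
min_by_code <- tapply(q, code_f, min)
print(min_by_code)
```

Because the levels were set explicitly, the result is labelled in the order 41003, 88888, 41005, 41009 rather than in sorted order.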

I will only briefly mention a second method: leave Code as it is and afterwards reorder the rows of the output into whatever order you want. I did not use that method in my own code.
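
For anyone curious, one possible way to do that second method is sketched here in base R. It is untested against the OP's real data and uses a small made-up sample; small_df, first_seen and res are names I chose for illustration. It assumes the wanted order is the order of first appearance and uses match() to rearrange the aggregated rows:

```r
# Small made-up sample, not the full data set
small_df <- data.frame(
  Code = c(41003, 88888, 41003, 41005, 41005, 41009),
  Q    = c(0.16, 0.16, 0.197, 0.21, 0.217, 0.218)
)

# Order in which each Code first appears
first_seen <- unique(small_df$Code)

# aggregate() returns the groups sorted by Code
res <- aggregate(Q ~ Code, data = small_df, FUN = min)
names(res)[2] <- "minQ"

# Reorder the rows to follow the order of first appearance
res <- res[match(first_seen, res$Code), ]
rownames(res) <- NULL
print(res)
```

Any other vector of codes could be used in place of first_seen if a different row order is wanted.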

NOTE: the code below shows a changed set of data from the original containing an 88888 in row 2 for illustration:
-----CODE-----

# Load required libraries

library(dplyr)

# Simulate reading in from a file by using in-line text version

file_contents <- 'Code Y M D Q N O
41003 81 1 19 0.16 7.17 2.5
88888 81 1 19 0.16 7.17 2.5
41003 77 9 22 0.197 6.8 2.2
41003 79 7 28 0.21 4.7 6.2
41005 79 8 17 0.21 5.5 7.2
41005 80 10 30 0.21 6.84 2.6
41005 80 12 20 0.21 6.84 2.4
41005 79 6 14 0.217 5.61 3.55
41009 79 2 21 0.218 5.56 4.04
41009 79 5 27 0.218 6.4 3.12
41009 80 11 29 0.22 6.84 2.8
41009 78 5 28 0.232 6 3.2
41009 81 8 20 0.233 6.39 1.6
41009 79 9 30 0.24 5.6 7.5
41017 79 10 20 0.24 5.3 7.1
41017 80 7 30 0.24 6.73 2.6'

mydf <- read.table(text=file_contents, header=TRUE)

# Make a factor of column Code in the same order as
# the numbers are first introduced in rows.

my_order <- unique(mydf$Code)

mydf$Code <- factor(mydf$Code, levels=my_order)

# Group the results and provide a summary of anything you
# want calculated by group:

mydf %>%
  group_by(Code) %>%
  summarize(minQ = min(Q)) %>%
  as.data.frame()