[R] categorizing data

David Carlson dcarlson using tamu.edu
Mon May 30 06:33:29 CEST 2022

Here is one way to get the table you are describing. First some made up data:

dta <- structure(list(tree = c(27, 47, 33, 31, 45, 54, 47, 27, 33, 26,
14, 43, 36, 0, 29, 24, 43, 38, 32, 21, 21, 23, 12, 42, 34), shrub = c(19,
29, 27, 31, 5, 24, 6, 37, 4, 6, 59, 7, 23, 15, 32, 1, 31, 37,
30, 44, 40, 10, 28, 23, 32), grass = c(44, 14, 30, 28, 40, 12,
37, 26, 53, 58, 17, 40, 31, 75, 29, 65, 16, 15, 28, 25, 29, 57,
50, 25, 24)), class = "data.frame", row.names = c(NA, -25L))

rnks <- data.frame(t(apply(dta, 1, rank, ties.method="first")))
rnks <- sapply(rnks, factor, labels=c("Low", "Med", "High"))
head(rnks)
     tree   shrub  grass
[1,] "Med"  "Low"  "High"
[2,] "High" "Med"  "Low"
[3,] "High" "Low"  "Med"
[4,] "Med"  "High" "Low"
[5,] "High" "Low"  "Med"
[6,] "High" "Med"  "Low"
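
A note on ties.method="first": the made up data contain a tie in row 4 (tree and shrub are both 31). The short sketch below only illustrates how rank() treats that row with and without the argument; the name x is made up for this illustration.

```r
# Row 4 of dta: tree and shrub are tied at 31
x <- c(tree = 31, shrub = 31, grass = 28)

# Default ties.method = "average": the tied values share rank 2.5
rank(x)

# ties.method = "first": ties are broken by position, giving whole-number ranks
rank(x, ties.method = "first")
```

With the default, a tied row produces a rank of 2.5, so the ranks would no longer line up with the three labels Low/Med/High. Whether breaking ties by column order is acceptable for the real data is a separate question.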

table(apply(rnks, 1, paste, collapse="/"))

High/Low/Med High/Med/Low Low/High/Med Low/Med/High Med/High/Low Med/Low/High
6            6            4            2            2            5
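
If the recoded values from the original question (10, 30, 50) are wanted as well, the same row ranks can be used as an index into a vector of values. This is only a sketch, using the three example rows from the original post; the names orig, vals, rk, and new are made up for this illustration.

```r
# The three example rows from the original question
orig <- data.frame(tree  = c(32, 23, 49),
                   shrub = c(11, 41, 23),
                   grass = c(47, 26, 18))
vals <- c(10, 30, 50)   # values for low, medium, high

# Rank within each row; apply() returns the rows as columns, so transpose back
rk <- t(apply(orig, 1, rank, ties.method = "first"))

new <- rk               # copy to keep the dimensions and column names
new[] <- vals[rk]       # replace each rank (1, 2, 3) by the matching value
new <- as.data.frame(new)
new
rowSums(new)            # each row still adds up to 90
```

This gives the same result as the "Data: New" table in the original question for those three rows.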

David L Carlson
Texas A&M University

On Sun, May 29, 2022 at 5:08 PM Roy Mendelssohn - NOAA Federal via
R-help <r-help using r-project.org> wrote:
>
> Hi Janet:
>
> here is a start to give you the idea. Now you need to loop, using either a "for" loop or one of the apply functions.
>
> 1.  Preallocate the new data (I am lazy, so it is an array, for example of size three).
>
> 2.  Order the data and set the values.
>
> junk <- array(0, dim = c(2,3))
> values <- c(10, 30, 50)
> junk[1, order(c(32, 11, 17))] <- values
> junk[1, ]
> [1] 50 10 30
>
>
> This works because order() returns the index of the ordering, not the values.
>
> HTH,
>
> -Roy
> > On May 29, 2022, at 1:31 PM, Janet Choate <jsc.eco using gmail.com> wrote:
> >
> > I'm sorry if this has come across as a homework assignment! I was trying to
> > provide a simple example.
> > There are actually 38323 rows of data, each row is an observation of the
> > percent that each of those veg types occupies in a spatial unit - where
> > each line adds to 90 - and values are different every line.
> > I need a way to categorize the data, so I can reduce the number of unique
> > observations.
> >
> > So instead of 38323 unique observations - I can reduce this to
> > X number of High/Med/Low
> > X number of Med/Low/High
> > X number of Low/High/Med
> > etc... for all combinations
> >
> > I hope this makes it more clear......
> > thank you all for your responses,
> > JC
> >
> > On Sun, May 29, 2022 at 1:16 PM Avi Gross via R-help <r-help using r-project.org>
> > wrote:
> >
> >> Tom,
> >> You may have a very different impression of what was asked! LOL!
> >> Unless Janet clarifies what seems a bit like a homework assignment, it
> >> seems to be a fairly simple and straightforward assignment with exactly
> >> three rows/columns and asking how to replace the variables, in a sense, by
> >> finding the high and low and perhaps thus identifying the medium, but to do
> >> this for each row without changing the order of the resulting data.frame.
> >> I note most techniques people have used focus on columns, not rows, but an
> >> all-numeric data.frame can be transposed, or converted to a matrix and
> >> later converted back.
> >> If this is HW, the question becomes what has been taught so far and is
> >> supposed to be used in solving it. Can they make their own functions
> >> perhaps to be called three times, once per row or column, to replace that
> >> row/column, or can they use some form of loop to iterate over the columns?
> >> Does it need to sort of be done in place or can they create gradually a
> >> second data.frame and then move the pointer to it and lots of other similar
> >> ideas.
> >> I am not sure, other than as a HW assignment, why this transformation
> >> would need to be done but of course, there may well be a reason.
> >> I note that the particular example shown just happens to create almost a
> >> magic square, as the sums of the rows, the columns, and the major diagonal
> >> all happen to be 90, albeit the reverse diagonal is all 50's.
> >> Again, there are many solutions imaginable but the goal may be more
> >> specific and I shudder to supply one given that too often questions here
> >> are not detailed enough and are misunderstood. In this case, I thought I
> >> understood until I saw what Tom wrote! LOL!
> >> I will add this. Is it guaranteed that no two items in the same row are
> >> ever equal, or is there some requirement for how to handle a tie? And note
> >> there are base R functions called min() and max() and you can ask for
> >> things like:
> >>
> >> if ( current == min(mydata[1,])) ...
> >>
> >>
> >> -----Original Message-----
> >> From: Tom Woolman <twoolman using ontargettek.com>
> >> To: Janet Choate <jsc.eco using gmail.com>
> >> Cc: r-help using r-project.org
> >> Sent: Sun, May 29, 2022 3:42 pm
> >> Subject: Re: [R] categorizing data
> >>
> >>
> >> Some ideas:
> >>
> >> You could create a cluster model with k=3 for each of the 3 variables,
> >> to determine what constitutes high/medium/low centroid values for each
> >> of the 3 types of plant types. Centroid values could then be used as the
> >> upper/lower boundary ranges for high/med/low.
> >>
> >> Or utilize a histogram for each variable, and use quantiles or
> >> densities, etc. to determine the natural breaks for the high/med/low
> >> ranges for each of the IVs.
> >>
> >>
> >>
> >>
> >> On 2022-05-29 15:28, Janet Choate wrote:
> >>> Hi R community,
> >>> I have a data frame with three variables, where each row adds up to 90.
> >>> I want to assign a category of low, medium, or high to the values in
> >>> each row - where the lowest value per row will be set to 10, the medium
> >>> value set to 30, and the high value set to 50 - so each row still adds
> >>> up to 90.
> >>>
> >>> For example:
> >>> Data: Orig
> >>> tree  shrub  grass
> >>> 32    11     47
> >>> 23    41     26
> >>> 49    23     18
> >>>
> >>> Data: New
> >>> tree  shrub  grass
> >>> 30    10     50
> >>> 10    50     30
> >>> 50    30     10
> >>>
> >>> I am not attaching any code here as I have not been able to write
> >>> anything effective! I would appreciate help with this!
> >>> thank you,
> >>> JC
> >>>
> >>>
> >>> ______________________________________________
> >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
> > --
> > Tague Team Lab Manager
> > 1005 Bren Hall
> > UCSB, Santa Barbara, CA.
> >
>
> **********************
> "The contents of this message do not reflect any position of the U.S. Government or NOAA."
> **********************
> Roy Mendelssohn
> Supervisory Operations Research Analyst
> NOAA/NMFS
> Environmental Research Division
> Southwest Fisheries Science Center
> 110 McAllister Way
> Santa Cruz, CA 95060
> Phone: (831)-420-3666
> Fax: (831) 420-3980
> e-mail: Roy.Mendelssohn using noaa.gov www: https://www.pfeg.noaa.gov/
>
> "Old age and treachery will overcome youth and skill."
> "From those who have been given much, much will be expected"
> "the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
>