[R] categorizing data

Janet Choate jsc.eco using gmail.com
Sun May 29 22:31:45 CEST 2022


I'm sorry if this has come across as a homework assignment! I was trying to
provide a simple example.
There are actually 38323 rows of data. Each row is an observation of the
percent of a spatial unit occupied by each of those vegetation types; each
row adds up to 90, and the values differ from row to row.
I need a way to categorize the data, so I can reduce the number of unique
observations.

So instead of 38323 unique observations, I can reduce this to:
X number of High/Med/Low
X number of Med/Low/High
X number of Low/High/Med
etc... for all combinations
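A minimal base-R sketch of this counting step, assuming a data frame named `veg` (a hypothetical name) with columns `tree`, `shrub` and `grass`, and breaking any ties by column order (an assumption; the thread does not say how ties should be handled). Each row is turned into a pattern string such as "Med/Low/High" by ranking its three values, and `table()` counts the rows per pattern. Without ties there are only 3! = 6 possible patterns, so the 38323 rows would collapse into at most six groups.

```r
# Toy data from the original example; `veg` is a hypothetical name
veg <- data.frame(
  tree  = c(32, 23, 49),
  shrub = c(11, 41, 23),
  grass = c(47, 26, 18)
)

labs <- c("Low", "Med", "High")

# One pattern string per row, e.g. "Med/Low/High".
# ties.method = "first" breaks ties by column order (assumed rule).
pattern <- apply(as.matrix(veg), 1, function(x) {
  paste(labs[rank(x, ties.method = "first")], collapse = "/")
})

veg$pattern <- pattern

# Number of rows (spatial units) falling into each pattern
counts <- table(veg$pattern)
print(counts)
```

For the toy data each of the three patterns occurs once; on the full data set the same `table()` call gives the "X number of ..." counts directly.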

I hope this makes it clearer.
thank you all for your responses,
JC

On Sun, May 29, 2022 at 1:16 PM Avi Gross via R-help <r-help using r-project.org>
wrote:

> Tom,
> You may have a very different impression of what was asked! LOL!
> Unless Janet clarifies, this reads a bit like a homework assignment: a
> fairly simple one with exactly three rows/columns, asking how to replace
> the values in each row by finding the high and the low (and thus, by
> elimination, the medium) without changing the order of the resulting
> data.frame.
> I note most techniques people have used focus on columns, not rows, but an
> all-numeric data.frame can be transposed, or converted to a matrix and
> later converted back.
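A small base-R illustration of the row-wise route, using the example values from the original question. `apply(m, 1, f)` works on rows, but because each call returns a vector, the results come back as columns, so `t()` is needed before converting back to a data.frame. The helper `f` and the names `orig`/`new_df` are hypothetical; ties are broken by column order, which is an assumption.

```r
orig <- data.frame(
  tree  = c(32, 23, 49),
  shrub = c(11, 41, 23),
  grass = c(47, 26, 18)
)

m <- as.matrix(orig)  # all-numeric data.frame -> matrix

# Hypothetical helper: rank one row, map ranks 1/2/3 to 10/30/50
f <- function(x) c(10, 30, 50)[rank(x, ties.method = "first")]

res <- apply(m, 1, f)          # 3 x 3, one COLUMN per original row
res <- t(res)                  # transpose back: one row per original row
colnames(res) <- colnames(m)

new_df <- as.data.frame(res)   # convert back to a data.frame
print(new_df)
```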
> If this is HW, the question becomes what has been taught so far and is
> supposed to be used in solving it. Can they make their own functions
> perhaps to be called three times, once per row or column, to replace that
> row/column, or can they use some form of loop to iterate over the columns?
> Does it need to be done in place, or can they gradually build a second
> data.frame and then reassign the variable name to it? And there are lots
> of other similar ideas.
> I am not sure, other than as a HW assignment, why this transformation
> would need to be done but of course, there may well be a reason.
> I note that the particular example shown just happens to create almost a
> magic square: the sums of the rows, the columns, and the major diagonal all
> happen to be 90, albeit the reverse diagonal is all 50's.
> Again, there are many solutions imaginable but the goal may be more
> specific and I shudder to supply one given that too often questions here
> are not detailed enough and are misunderstood. In this case, I thought I
> understood until I saw what Tom wrote! LOL!
> I will add this. Is it guaranteed that no two items in the same row are
> ever equal, or is there some requirement for how to handle a tie? And note
> there are base R functions called min() and max() and you can ask for
> things like:
>
> if ( current == min(mydata[1,])) ...
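A sketch of how ties could be detected before relying on `min()`/`max()`, using toy rows (not real data) in a hypothetical matrix `m`. Counting how many entries equal the row minimum or maximum shows when those values are ambiguous, and the `ties.method` argument of `rank()` makes the tie rule explicit: "average" leaves tied values with equal ranks, while "first" breaks ties by column order.

```r
# Toy rows: no tie, all tied, two tied at the maximum
m <- rbind(c(32, 11, 47),
           c(30, 30, 30),
           c(40, 40, 10))

# TRUE for rows containing at least one repeated value
has_tie <- apply(m, 1, function(x) anyDuplicated(x) > 0)

# How many entries share the row minimum / maximum (> 1 means ambiguous)
n_at_min <- apply(m, 1, function(x) sum(x == min(x)))
n_at_max <- apply(m, 1, function(x) sum(x == max(x)))

# Two explicit tie rules for the third row
r_avg   <- rank(m[3, ], ties.method = "average")  # tied values share a rank
r_first <- rank(m[3, ], ties.method = "first")    # ties broken left to right

print(has_tie)
print(r_avg)
print(r_first)
```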
>
>
> -----Original Message-----
> From: Tom Woolman <twoolman using ontargettek.com>
> To: Janet Choate <jsc.eco using gmail.com>
> Cc: r-help using r-project.org
> Sent: Sun, May 29, 2022 3:42 pm
> Subject: Re: [R] categorizing data
>
>
> Some ideas:
>
> You could create a cluster model with k=3 for each of the 3 variables,
> to determine what constitutes high/medium/low centroid values for each
> of the 3 plant types. The centroid values could then be used as the
> upper/lower boundary ranges for high/med/low.
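A sketch of the per-variable clustering idea using base R's `kmeans()` on one simulated column (the data are made up, not the poster's). Cluster numbers from `kmeans()` are arbitrary, so they are relabelled by sorting the centroids; taking midpoints between adjacent centroids as cut points is one possible reading of "using centroids as boundaries", not necessarily what was intended. Note that this labels values relative to the whole column, not within each row as in the original question.

```r
# Simulated stand-in for ONE vegetation column (not real data)
set.seed(1)  # kmeans() uses random starts
x <- c(rnorm(20, 10, 2), rnorm(20, 30, 2), rnorm(20, 50, 2))

km <- kmeans(x, centers = 3, nstart = 10)

# Cluster ids are arbitrary: order them by centroid value
ord <- order(km$centers[, 1])
lab <- c("Low", "Med", "High")[match(km$cluster, ord)]

# One possible set of cut points: midpoints between sorted centroids
ctr  <- sort(km$centers[, 1])
cuts <- (head(ctr, -1) + tail(ctr, -1)) / 2
print(ctr)
print(cuts)
```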
>
> Or utilize a histogram for each variable, and use quantiles or
> densities, etc. to determine the natural breaks for the high/med/low
> ranges for each of the IVs.
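A sketch of the quantile variant with `quantile()` and `cut()`, splitting one variable into terciles. The nine values are reused from the example table purely for illustration; with real data `x` would be a single column. The guard matters because heavily tied data can produce duplicate quantiles, which `cut()` rejects.

```r
# Illustrative values only; in practice x would be one column
x <- c(32, 23, 49, 11, 41, 23, 47, 26, 18)

# Tercile break points (min, 1/3, 2/3, max)
brks <- quantile(x, probs = c(0, 1/3, 2/3, 1))

# cut() needs unique breaks; ties can make quantiles coincide
stopifnot(!anyDuplicated(brks))

cat_x <- cut(x, breaks = brks, labels = c("Low", "Med", "High"),
             include.lowest = TRUE)
print(table(cat_x))
```

With these values the groups come out uneven (4 Low, 2 Med, 3 High) because two values sit exactly on the lower tercile break, which illustrates why quantile breaks need checking on real data.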
>
> On 2022-05-29 15:28, Janet Choate wrote:
> > Hi R community,
> > I have a data frame with three variables, where each row adds up to 90.
> > I want to assign a category of low, medium, or high to the values in each
> > row, where the lowest value per row will be set to 10, the medium value
> > set to 30, and the high value set to 50, so each row still adds up to 90.
> >
> > For example:
> > Data: Orig
> > tree  shrub  grass
> >   32     11     47
> >   23     41     26
> >   49     23     18
> >
> > Data: New
> > tree  shrub  grass
> >   30     10     50
> >   10     50     30
> >   50     30     10
> >
> > I am not attaching any code here, as I have not been able to write
> > anything effective! I'd appreciate help with this!
> > thank you,
> > JC
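A vectorised sketch of the requested transformation for a three-column data frame (named `orig` here, a hypothetical name), which avoids calling a function once per row and should scale to tens of thousands of rows. `max.col()` finds the column of each row's maximum; applying it to `-m` finds the minimum. Using "first" for the maximum and "last" for the minimum is an assumed tie rule that keeps the two in different columns, so every row still sums to 10 + 30 + 50 = 90. The middle value defaults to 30, which only works with exactly three columns.

```r
orig <- data.frame(
  tree  = c(32, 23, 49),
  shrub = c(11, 41, 23),
  grass = c(47, 26, 18)
)

m  <- as.matrix(orig)
n  <- nrow(m)
hi <- max.col(m,  ties.method = "first")  # column of each row's maximum
lo <- max.col(-m, ties.method = "last")   # column of each row's minimum

# Start every cell at the "medium" value, then overwrite high and low
res <- matrix(30, nrow = n, ncol = ncol(m),
              dimnames = list(NULL, colnames(m)))
res[cbind(seq_len(n), hi)] <- 50
res[cbind(seq_len(n), lo)] <- 10

new_df <- as.data.frame(res)
print(new_df)
```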
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Tague Team Lab Manager
1005 Bren Hall
UCSB, Santa Barbara, CA.
