[R] as.factor and floating point numbers
Tobias Fellinger
tobby @end|ng |rom htu@@t
Wed Jan 25 21:57:33 CET 2023
Hello,
I'll reply in one mail to all.
Thank you for your suggestions. I already tried Andrew's solution of
increasing the digits. In the most extreme case I encountered I had to use
the maximum number of digits that format allows, but it worked.
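For readers of the archive, a minimal sketch of the "more digits" idea, using the example from the original post. The choice of 17 significant digits and the variable names are assumptions for illustration; Andrew's actual code is not reproduced in this message.

```r
# Two doubles that differ only in the last bit (example from the original post).
x <- c(1, 1 + .Machine$double.eps)

# Format with 17 significant digits so the two values get different labels,
# then tabulate the resulting strings instead of the raw numbers.
labels <- format(x, digits = 17)
counts <- table(labels)
counts
```

The table then has one entry per distinct string, so its length agrees with length(unique(x)) for this input.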
Tim's solution is also a good workaround, but in this case I would have to
know a lot about the user input.
Valentin's solution works and is surely the safest of the options, but it is
somewhat more than I need. The case I encountered does not really need to deal
with the levels, only with the counts of every unique value across another
variable.
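As a rough illustration of counting exactly distinct doubles without going through as.character: the sketch below is only a guess at the general idea, not Valentin's attached file, and the names u and counts are made up for the example.

```r
# Values that are distinct as doubles but share the same default text form.
x <- c(1, 1 + .Machine$double.eps, 2, 2)

# unique() and match() compare doubles exactly, so no text conversion occurs.
u <- sort(unique(x))

# Map every element to the position of its value in u, then count positions.
counts <- tabulate(match(x, u), nbins = length(u))
counts
```

Here counts lines up element by element with u, so both can be used as columns of the same data.frame.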
After thinking about it a little longer I came up with another solution that
works all right for my purposes: I use table on the ranks. Since in the case
I encountered the vector has no duplicates and is already sorted, calling
table on the ranks of the vector gives the counts in the right order.
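A small sketch of the rank approach described above. The data and the names time and arm are invented for illustration, and ties.method = "min" is an added assumption so that exactly equal times would still share one row; it is not needed when the vector has no duplicates.

```r
# Hypothetical event times (sorted) and treatment arms.
time <- c(1, 1, 1 + .Machine$double.eps, 2)
arm  <- c("A", "B", "A", "B")

# rank() compares doubles exactly, so 1 and 1 + eps receive different ranks,
# while the two exact 1s share the same (minimum) rank.
r <- rank(time, ties.method = "min")

# Cross-tabulate ranks against arm: one row per distinct event time.
counts <- table(r, arm)
counts
```

The number of rows of counts matches length(unique(time)), and the rows come out in increasing order of time.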
Thanks everyone, Tobias
On Wednesday, 25 January 2023 20:59:16 CET Valentin Petzel wrote:
> Hello Tobias,
>
> A factor is basically a way to get a character vector to behave like an
> integer vector. It consists of an integer vector with values from 1 to nlev,
> and a character vector of levels that gives a level name for each value.
>
> But this means that factors only really make sense with characters, and
> anything that is not a character will be coerced to character. Thus two
> values that have the same as.character representation will be treated as
> the same level.
>
> Now this is probably reasonable most of the time, as numeric values will
> usually represent metric data, which tends to make little sense as a factor.
> But if we want to do this we can easily build our own factors from floats,
> and even write a convenience wrapper around table, as shown in the
> appended file.
>
> Best regards,
> Valentin
>
> On Wednesday, 25 January 2023, 10:03:01 CET, Tobias Fellinger wrote:
> > Hello,
> >
> > I'm encountering the following error:
> >
> > In a package for survival analysis that I use, a data.frame is created:
> > one column is created by applying unique to the event times, while others
> > are created by running table on the event times and the treatment arm.
> >
> > When event times are very close together, they are put in the same factor
> > level when coerced to factor, while unique outputs both values, leading to
> > columns of different lengths.
> >
> > Try this to reproduce:
> > x <- c(1, 1+.Machine$double.eps)
> > unique(x)
> > table(x)
> >
> > Is there a general best practice to deal with such issues?
> >
> > Should calling table on floats be avoided in general?
> >
> > What can one use instead?
> >
> > One could easily iterate over the unique values and compare each of them
> > with the whole vector, but that is N*N comparisons, compared to N*log(N)
> > when sorting first and exploiting the fact that the vector is sorted.
> >
> > I think for my purposes I'll round to a hundredth of a day before calling
> > the function, but any advice on avoiding this issue and writing more
> > fault-tolerant code is greatly appreciated.
> >
> > all the best, Tobias