[Rd] Proposal: Enhance as.character() for Consistency with as.numeric() Precision

龙华 longhua880 sending from foxmail.com
Thu Sep 25 10:19:56 CEST 2025


Dear R Core Team and R-devel Community,


I hope this message finds you well. I am writing to propose an enhancement to the `as.character()` function in R's base package to address an inconsistency with `as.numeric()` when handling high-precision floating-point numbers. This issue has practical implications for code reliability, especially in scientific computing and data analysis, and I believe a small adjustment could align the behavior more closely with modern user expectations and R's evolving use cases.


Problem Description
The current behavior of `as.character()` and `as.numeric()` leads to logical inconsistencies when converting high-precision decimal strings. For example, consider the string `"7.999999999999999111822"` (22 significant digits):


- `as.numeric("7.999999999999999111822")` converts this to a double-precision floating-point number (per IEEE 754), which is stored as approximately `7.9999999999999991118` (verifiable with `print(x, digits = 20)`). The difference from 8 (`8 - x ≈ 8.88178e-16`) equals the spacing between adjacent doubles just below 8 (`4 * .Machine$double.eps`, i.e. `2^-50`), so `x` is the largest double less than 8 and is not rounded to `8.0`.
- However, `as.character(as.numeric("7.999999999999999111822"))` returns `"8"`, simplifying the value and losing the small difference. This leads to a mismatch: `x < 8` is `TRUE`, but `as.numeric(as.character(x)) == 8` is also `TRUE`.
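The arithmetic behind this can be checked directly. The following is a minimal sketch, assuming IEEE 754 doubles; it constructs the largest double below 8 explicitly (as `8 - 2^-50`) rather than relying on string parsing, and then compares the parsed string against it without asserting the result.

```r
# The largest double below 8; the string in the example is expected to parse to it.
x <- 8 - 2^-50

gap <- 8 - x
gap == 2^-50                     # TRUE: spacing of adjacent doubles in [4, 8)
gap == 4 * .Machine$double.eps   # TRUE: double.eps is 2^-52 on IEEE 754 platforms

# Compare the parsed string with that value (result depends on the platform's parsing)
as.numeric("7.999999999999999111822") == x

# Show more digits than the default print() does
print(x, digits = 20)
```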


This inconsistency arises because `as.numeric()` preserves the full precision of the IEEE 754 double (15 to 17 significant decimal digits), while `as.character()` represents doubles to 15 significant digits. A value that differs from a shorter decimal only beyond the 15th significant digit is therefore rounded to that decimal (here, to `"8"`).


Proposed Solution
I suggest either of the following enhancements to improve consistency:


1. Swap the Functionality of `format()` and `as.character()`:
   - Redefine `as.character(x)` to inherit `format()`'s behavior, providing a default precision (e.g., `digits = 17`) to match the effective decimal precision of double-precision floats. This would output `"7.9999999999999991"` (17 significant digits) for the example above.
   - Redefine `format(x)` to inherit `as.character()`'s current behavior, serving as a utility for concise, human-readable output (e.g., `"8"`).
   - Naming would then align with intent: `as.character()` for type conversion with precision, `format()` for formatting adjustments.


2. Add a `digits` Parameter to `as.character()`:
   - Extend `as.character()` to accept a `digits` argument (defaulting to `NULL` for current behavior, or e.g., `17` for precision matching). Example:


     x <- as.numeric("7.999999999999999111822")
     as.character(x, digits = 17)  # proposed: "7.9999999999999991"
     as.character(x)               # "8" (current default)


   - This would allow users to opt for precise conversion while preserving backward compatibility.
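To make Option 2 concrete, here is a rough sketch of how the proposed argument might behave, written as a user-level wrapper. The name `as_character_digits` is hypothetical and not part of base R, and using `sprintf()` with a `%g` format is only one possible way to get a fixed number of significant digits; an actual patch to `as.character()` could look quite different.

```r
# Hypothetical helper (not in base R): emulates the proposed `digits` argument.
as_character_digits <- function(x, digits = NULL) {
  stopifnot(is.numeric(x))
  if (is.null(digits)) {
    return(as.character(x))                 # current behaviour, unchanged
  }
  # Format each element to `digits` significant digits, e.g. "%.17g"
  out <- sprintf(paste0("%.", digits, "g"), x)
  # Keep missing values as NA_character_, as as.character() does (NaN stays "NaN")
  out[is.na(x) & !is.nan(x)] <- NA_character_
  out
}

x <- 8 - 2^-50   # the largest double below 8, i.e. the value discussed above
as_character_digits(x, digits = 17)   # "7.9999999999999991"
as_character_digits(x)                # "8"
```

Defaulting `digits` to `NULL` keeps existing calls unchanged, which is the backward-compatibility property this option is meant to have.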


Rationale
- Consistency: `as.numeric()` and `as.character()` are similarly named base functions, suggesting they should follow analogous precision rules. The current discrepancy violates the expectation of round-trip consistency (numeric → string → numeric).
- Modern Use Cases: With R's growing use in scientific computing and data science, high-precision handling is increasingly critical. The proposed change aligns R with tools like NumPy and Python, where `str(float(s))` returns the shortest string that converts back to the same double.
- User Experience: Explicit control via `digits` or a redefined `as.character()` would reduce confusion, especially for users relying on type conversion for logical operations.


Use Case
Consider a data validation script:


s1 <- "7.999999999999999111822"
x <- as.numeric(s1)
if (x < 8) print("Less than 8")  # TRUE, correct
if (as.numeric(as.character(x)) == 8) print("Equal to 8")  # TRUE, inconsistent


The second condition is also `TRUE`, contradicting the first, because `as.character(x)` simplifies to `"8"`. With the proposed change (e.g., `as.character(x, digits = 17)`), both checks would agree with the stored value (`< 8`).
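For comparison, the same check can already be made consistent in current R by requesting 17 significant digits explicitly. This is a small sketch using `format(x, digits = 17)`; the value is again constructed as `8 - 2^-50`, on the assumption that this is the double the example string parses to.

```r
x <- 8 - 2^-50                    # assumed parsed value of the example string

s15 <- as.character(x)            # 15 significant digits
s17 <- format(x, digits = 17)     # 17 significant digits

s15                               # "8"
s17                               # "7.9999999999999991"

as.numeric(s15) == 8              # TRUE: the difference was lost in the string
as.numeric(s17) < 8               # TRUE: the difference survives the conversion
```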


Implementation Considerations
- Backward Compatibility: Option 2 (adding `digits`) is less disruptive, allowing existing code to use the default `as.character()` behavior. Option 1 requires a transition period or deprecation notice.
- Performance: High-precision formatting may have minor overhead, but this is negligible for modern hardware.
- Documentation: Clear guidance on the new `digits` parameter or redefined roles would be essential.


Next Steps
I would be happy to assist with testing or drafting a patch if this proposal gains traction. Please let me know your thoughts or any additional considerations. This issue was identified with the help of Grok (xAI), and I believe community feedback could refine the approach.


Thank you for your time and the incredible work on R!


Best regards


龙华
longhua880 using foxmail.com