# Change log

## 2.0.2

- Fix a bug with `read_naaccr_xml_plain` when a `record_format` is provided

## 2.0.0

- New functions
  - `read_naaccr_xml_plain` and `read_naaccr_xml` for reading files in NAACCR's XML formats
- New data
  - `naaccr_formats`, a named list containing `record_format` objects for the official NAACCR formats
  - `naaccr_format_21`, `naaccr_format_22`, and `naaccr_format_23` for the NAACCR XML formats, versions 21, 22 and 23
- Possible breaking changes
  - New columns in `record_format`
    - `"parent"`: the field's parent XML node
    - `"width"`: the field's length in characters
    - `"cleaner"`: function used to clean/standardize data when reading
    - `"unknown_finder"`: function to replace values with `NA` when reading
  - Deprecated columns in `record_format`
    - `"end_col"`: not used for XML files, and inferred from `"start_col"` and `"width"` for fixed-width files
  - Package data now compressed using the `"xz"` method, which requires R version 2.10 or higher

## 1.0.0

- Added `field_levels` and `field_levels_all` lists, which make it easy to see the possible values for factor fields.
- All factor and sentinel levels are now easy to type on a standard U.S. QWERTY keyboard.
- Repeatedly applying `naaccr_factor` or `naaccr_sentinel` (e.g., `naaccr_factor(naaccr_factor(...))`) is now a no-op.
- Removed AJCC-copyright material.
- Functions are safer: more argument checking, warnings, and option-robustness.
- Fixed bugs with handling custom formats and columns not in the format.
- Fixed bug with formats that included fields not found in text files.

## 0.5.0

- Processed date and time fields now store their original text values in the `"original"` attribute, so partial dates and times are still useful.
- New functions
  - `unknown_to_na`: Convert levels of NAACCR factor to `NA` if they mean the value is unknown.
  - `naaccr_encode`: Convert processed values back to their NAACCR text codes.
  - `as.record_format`: Create a custom record format from an object (this was incorrectly an internal function before).
  - `naaccr_date`, `naaccr_datetime`: Parse NAACCR-formatted dates and times.
  - `read_naaccr_plain`: Read a NAACCR file and divide it into fields, but don't do any more processing.
  - `split_sequence_number`: Divide the different pieces of information in a tumor sequence number into multiple columns (i.e., order, uniqueness, and invasiveness).
- Revamped `read_naaccr` and `write_naaccr`
  - *Much* faster!
  - `read_naaccr` has more parameters to handle how a file is read: `skip`, `nrows`, and `encoding` like in `read.table`, and `buffersize` to ease the pain of reading large files.
- `naaccr_record` and its kin now have a `format` parameter to use custom formats.
- Treat behavior fields as coded factors, as they should've been.
- **Potentially breaking changes**
  - Cleaning functions (e.g., `clean_count`) no longer use a field name to retrieve cleaning parameters (e.g., field width). Those parameters now need to be passed directly.
  - The `type` and `alignment` columns of `record_format` objects are now factors.
  - The standard NAACCR record format objects (`naaccr_format_12`, ...) are no longer lazy-loaded. They are now immediately loaded.
- Improved documentation throughout.

## 0.4.0

- Major improvements to handling of field-specific codes

## 0.3.0

### New features

- New function: `write_naaccr` for writing `naaccr_record` data sets to text files according to one of the NAACCR formats. For now, it writes all unknown and missing values as blanks, not the standard codes. Still, `write_naaccr` and `read_naaccr` are inverse functions.
- Fields with country codes are now converted to factors.

### Backend

- Using package `ISOcodes` instead of `maps` for location code tables