[R] Can file size affect how na.strings operates in a read.table call?

Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com>
Thu Nov 14 16:57:10 CET 2019

The data file is a CSV file. Some text variables contain spaces.

"Check for extraneous spaces"
Are there specific locations that would be more critical than others?
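
One way to locate such spaces without inspecting the file by hand is to read every field as text and count, per column, the values that match a missing-value code only after trimming. The following is a minimal sketch, not a tested diagnostic for the actual file: the helper name find_padded_codes, the default codes, and the small temporary file are all made up for illustration.

```r
# Hypothetical helper: for each column, count the fields that equal one of
# the missing-value codes only once surrounding white space is removed.
find_padded_codes <- function(file, codes = c("-99", ".")) {
  # Read every field as character and switch off NA conversion and
  # white-space stripping, so the fields are seen as they are in the file.
  fields <- read.csv(file, colClasses = "character",
                     na.strings = character(0), strip.white = FALSE)
  counts <- vapply(fields, function(x) {
    sum(trimws(x) %in% codes & !(x %in% codes))
  }, integer(1))
  # Keep only the columns with at least one padded code.
  counts[counts > 0]
}

# Made-up data: one leading space in 'score', one trailing space in 'dose'.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score,dose", "1,-99,10", "2, -99,-99 ", "3,.,5"), tmp)
padded_counts <- find_padded_codes(tmp)
print(padded_counts)
```

Columns that show up in the result are the ones where a padded code would not match na.strings exactly, so they are the first places to look.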

From: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
Sent: Thursday, November 14, 2019 10:52
To: Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com>; Sebastien Bihorel via R-help <r-help using r-project.org>; r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] Can file size affect how na.strings operates in a read.table call?

Check for extraneous spaces. You may need more variations of the na.strings.
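
As a rough illustration of both suggestions, the sketch below uses a made-up four-row CSV in which one -99 carries a leading space. It assumes the padding is a single leading space; real files may need other variants. strip.white and na.strings are documented arguments of read.table/read.csv; the data and object names are hypothetical.

```r
# Made-up data: the second -99 is preceded by a space.
csv_text <- "id,score\n1,-99\n2, -99\n3,.\n4,12"

# As in the original call: fields are compared to na.strings as read, so
# the padded field " -99" may not match "-99" and can come through as a
# number instead of NA.
as_is <- read.csv(text = csv_text, na.strings = c("-99", "."))

# Option 1: strip leading and trailing white space from every field
# before the comparison with na.strings.
stripped <- read.csv(text = csv_text, na.strings = c("-99", "."),
                     strip.white = TRUE)

# Option 2: list the padded variant explicitly in na.strings.
padded <- read.csv(text = csv_text, na.strings = c("-99", " -99", "."))

print(is.na(stripped$score))
print(is.na(padded$score))
```

Option 1 also trims spaces at the ends of text variables, which may or may not be acceptable for a generic reader function; option 2 leaves text fields untouched but only covers the variants that are listed.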

On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help <r-help using r-project.org> wrote:
>I have this generic function to read ASCII data files. It is
>essentially a wrapper around the read.table function. My function is
>used in a large variety of situations and has no a priori knowledge
>about the data file it is asked to read. Nothing is known about file
>size, variable types, variable names, or data table dimensions.
>One argument of my function is na.strings, which is passed down to
>read.table.
>Recently, a user tried to read a data file of ~ 80 MB (~ 93000 rows by
>~ 160 columns) using na.strings = c('-99', '.') with the intention of
>interpreting '.' and '-99' strings as the internal missing data NA.
>Dots were converted to NA
>appropriately. However, not all -99 values in the data were interpreted
>as NA. In some variables, -99 were converted to NA, while in others -99
>was read as a number. More surprisingly, when the data file was cut in
>smaller chunks (i.e., by dropping either rows or columns) saved in
>multiple files, the function calls applied on the new data files
>resulted in the correct conversion of the -99 values into NAs.
>In all cases, the data frames produced by read.table contained the
>expected number of records.
>While, at face value, it appears that file size affects how the
>na.strings argument operates, I am wondering if there is something else
>at play here.
>Unfortunately, I cannot share the data file for confidentiality reasons
>but was wondering if you could suggest some checks I could perform to
>get to the bottom of this issue.
>Thank you in advance for your help and sorry for the lack of
>reproducible example.

Sent from my phone. Please excuse my brevity.
