[R] read.table and NaN

Sebastien Bihorel Sebastien.Bihorel using cognigencorp.com
Fri Oct 25 22:12:27 CEST 2019


My bad, Bert 😉

My point is that my function/framework has very minimal expectations about the source data (mostly, that it is a rectangular table of data fields separated by some separator) and has no a priori knowledge of what the first, second, etc. columns of the data files contain. So, while it would be possible to pass down a class vector to be used as the colClasses argument of read.table, that is not necessarily reasonable in the context of the overall framework.

I guess I was surprised that read.table interprets NaN in an input file as the internal "Not a Number" value rather than as a string; there is nothing in ?read.table about that.
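The behaviour can be reproduced without read.table at all. A minimal sketch, relying on ?read.table's statement that columns without a colClasses entry are converted with type.convert(): na.strings only decides which strings become NA, while "NaN" is parsed as a valid double, so changing na.strings has no effect on it.

```r
# read.table() hands columns without a colClasses entry to type.convert().
# "NaN" parses as a double, so na.strings (which only controls NA) cannot stop it.
x <- type.convert(c("1", "NaN", "2"), na.strings = "", as.is = TRUE)
class(x)   # "numeric"
is.nan(x)  # FALSE  TRUE FALSE
```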

Anyway, as I said, I need to think more about this in the context of the framework where this function operates...

Thanks for the input


________________________________
From: Bert Gunter <bgunter.4567 using gmail.com>
Sent: Thursday, October 24, 2019 10:39
To: Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com>
Cc: r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] read.table and NaN

Not so. Read ?read.table carefully. You can use NA as a colClasses entry to get the default conversion for that column. Moreover, you **specified** that you want NaN read as character, which means that any column containing NaN **must** be character. That's part of the specification for data frames (all values in a column must be of one data type). So either change your specification or change your data structure.
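A minimal sketch of that suggestion, reusing the two-column example from this thread and assuming only column B is known to hold NaN strings: NA in colClasses leaves column A to the default conversion, while "character" keeps the literal "NaN" in column B.

```r
con <- textConnection('A,B\n1,NaN\nNA,2')
# NA in colClasses keeps the default type.convert() conversion for column A;
# "character" forces column B to stay text, so the string "NaN" survives.
tmp <- read.table(con, header = TRUE, sep = ",", stringsAsFactors = FALSE,
                  colClasses = c(NA, "character"))
close(con)
sapply(tmp, class)  # A: "integer", B: "character"
tmp$B               # "NaN" "2"
```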

And, incidentally, my first name is "Bert".

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Oct 24, 2019 at 6:43 AM Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com> wrote:
Thanks Gunter

It seems that one has to know the structure of the data and adapt the read.table call accordingly. I am working on a framework that is meant to process data files with unknown structure, so I have to think a bit more about that...
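For data of unknown structure, one possible approach (a sketch, not something proposed in the thread) is to read every column as character and decide on conversion afterwards. The helper name read_keep_nan is hypothetical, and it assumes that any column containing the literal string "NaN" should stay character.

```r
# Hypothetical helper (not part of base R or the framework discussed here).
read_keep_nan <- function(file, na.strings = "NA", ...) {
  # Read every column as text so that no type conversion happens yet.
  raw <- read.table(file, colClasses = "character", na.strings = na.strings, ...)
  raw[] <- lapply(raw, function(col) {
    # Keep columns holding a literal "NaN" as character...
    if (any(col == "NaN", na.rm = TRUE)) col
    # ...and let type.convert() pick a type for all the others.
    else type.convert(col, na.strings = na.strings, as.is = TRUE)
  })
  raw
}

con <- textConnection('A,B\n1,NaN\nNA,2')
tmp <- read_keep_nan(con, header = TRUE, sep = ",")
close(con)
sapply(tmp, class)  # A: "integer", B: "character"
```

The check is an exact string match, so other spellings that type.convert() parses as numbers (for example "Inf") are still converted.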
________________________________
From: Bert Gunter <bgunter.4567 using gmail.com>
Sent: Thursday, October 24, 2019 00:08
To: Sebastien Bihorel <Sebastien.Bihorel using cognigencorp.com>
Cc: r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] read.table and NaN

Like this?

> con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
> tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', stringsAsFactors = FALSE,
+                   colClasses = c("numeric", "character"))
> close.connection(con)
> tmp
   A   B
1  1 NaN
2 NA   2
> class(tmp[,1])
[1] "numeric"
> class(tmp[,2])
[1] "character"
> tmp[,2]
[1] "NaN" "2"




On Wed, Oct 23, 2019 at 6:31 PM Sebastien Bihorel via R-help <r-help using r-project.org> wrote:
Hi,

Is there a way to make read.table treat NaN as a character string rather than as the internal NaN value? Changing the na.strings argument does not seem to have any effect on how R interprets the NaN string (while it does for the NA string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])


______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



