[R] fread transforms numbers

Matt Dowle mattjdowle at gmail.com
Wed Mar 22 20:17:51 CET 2017


Thanks Bill for cc.

Santosh,

I'm almost certain you don't have package bit64 installed.  Once it is
installed, it works fine:

> remove.packages("bit64")
> data.table::fread("9876543210\n")
              V1
1: 4.879661e-314
> install.packages("bit64")
> data.table::fread("9876543210\n")
           V1
1: 9876543210
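
For anyone wondering where 4.879661e-314 comes from: as far as I know, an
integer64 column keeps each value in the 8 bytes of a double, and without
bit64's print method those bytes are simply shown as a double. Below is a
minimal base-R sketch (no bit64 or data.table needed; the variable names are
only for illustration) that reinterprets the 64-bit integer 9876543210 as a
double:

```r
# 9876543210 does not fit in a 32-bit integer, so split it into two 32-bit words.
big  <- 9876543210
low  <- as.integer(big %% 2^32)    # low 32 bits of the 64-bit integer
high <- as.integer(big %/% 2^32)   # high 32 bits of the 64-bit integer

# Lay the two words out as 8 little-endian bytes (low word first) ...
bytes <- writeBin(c(low, high), raw(), endian = "little")

# ... and read the same 8 bytes back as one double.
as_double <- readBin(bytes, what = "double", n = 1, endian = "little")
print(as_double)   # 4.879661e-314
```

The result is a denormal double equal to 9876543210 * 2^-1074, which is why
the printed values look like tiny floating point numbers.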

The NEWS for data.table v1.10.2 (on CRAN 31 Jan 2017) contained:

* When fread() or print() see integer64 columns are present, bit64's
namespace is now automatically loaded for convenience.

However, there is a bug in the function data.table uses to load that
namespace:

> data.table:::require_bit64
function ()
{
    tt = try(requireNamespace("bit64", quietly = TRUE))
    if (inherits(tt, "try-error"))
        warning("Some columns are type 'integer64' but package bit64 is not
installed. Those columns will print as strange looking floating point data.
There is no need to reload the data. Simply install.packages('bit64') to
obtain the integer64 print method and print the data again.")
}

The intent was to display that helpful message to you.  Thanks to this
report, I can now see that I shouldn't have wrapped requireNamespace() in
try(), because requireNamespace() returns TRUE or FALSE anyway.  Even though
requireNamespace() prints 'Failed with error', it doesn't actually throw an
error, so the warning was never reached.  I'll change data.table's function
to the following:

if (!requireNamespace("bit64", quietly = TRUE))
    warning("Some columns ...")
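
To illustrate the difference, here is a small sketch in base R. The package
name "aPackageThatIsNotInstalled" and the helper check_pkg() are made up for
this example and are not part of data.table; the sketch assumes no package of
that name exists on your system.

```r
missing_pkg <- "aPackageThatIsNotInstalled"   # hypothetical, assumed absent

# Old pattern: requireNamespace() returns FALSE instead of throwing,
# so try() never yields a "try-error" and the warning branch is skipped.
tt <- try(requireNamespace(missing_pkg, quietly = TRUE), silent = TRUE)
old_pattern_fires <- inherits(tt, "try-error")

# New pattern: test the logical return value directly.
check_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    warning("package '", pkg, "' is not installed")
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Capture the warning message (or NULL if no warning was raised).
new_pattern_msg <- tryCatch({ check_pkg(missing_pkg); NULL },
                            warning = function(w) conditionMessage(w))
```

Here old_pattern_fires is FALSE, whereas new_pattern_msg holds the warning
text, which is the behaviour that was intended all along.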

bit64 is correctly listed under Suggests, not Depends.  It's just
unfortunate that the intended message wasn't displayed.

Santosh, in future please follow the data.table support guide here:
https://github.com/Rdatatable/data.table/wiki/Support.  r-help is not
supposed to be used for package support.  The main thing though is thanks
for helping me find this bug.

Thanks,
Matt


On Wed, Mar 22, 2017 at 10:22 AM, William Dunlap <wdunlap at tibco.com> wrote:

> Here is a way to reproduce the problem:
>   > data.table::fread("9876543210\n") # number bigger than 2^31-1
>                 V1
>   1: 4.879661e-314
> and your work-around does fix things up
>   > data.table::fread("9876543210\n", colClasses="numeric")
>              V1
>   1: 9876543210
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Wed, Mar 22, 2017 at 9:58 AM, Jeff Newmiller
> <jdnewmil at dcn.davis.ca.us> wrote:
> > You failed to provide a reproducible example, and you posted HTML so the
> > quality of any answer will be limited by the quality of your question.
> >
> > My stab at your problem is that you should read ?fread, and in
> > particular should try using the colClasses argument.
> > --
> > Sent from my phone. Please excuse my brevity.
> >
> > On March 22, 2017 8:52:55 AM PDT, Santosh <santosh2005 at gmail.com> wrote:
> >>Hi
> >>
> >>I have been using the "fread" utility of the "data.table" package on a
> >>dataset of
> >>about 20 million rows. It's a fantastic package to read datasets. Thank
> >>you, Matt D.
> >>
> >>However, I am faced with a peculiar instance of  certain numbers in a
> >>column being transformed.
> >>
> >>In the dataset, a column has values ranging from 1 to 9##########
> >>(nchar(x)=11, e.g. 98765432109). After using "fread" to read the
> >>dataset,
> >>values in all the columns are displayed correctly up to the first 1000
> >>rows.
> >>If "fread" is applied for reading >1000 rows of the total of 20 million
> >>rows, the values in only this column (having a wide range of values) are
> >>displayed as x.xxxxxxxe-3yy (e.g. 3.5639877e-324).
> >>
> >>I tried reading all the columns as "character" and it didn't help.
> >>
> >>Would highly appreciate your assistance!
> >>
> >>Thanks so much in advance.
> >>
> >>Best regards,
> >>Santosh
> >>
> >>       [[alternative HTML version deleted]]
> >>
> >>______________________________________________
> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide
> >>http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
> >
>
