[R] Does function read.sas7bdat() have some memory limitations?
xiaochun at iupui.edu
Thu Nov 21 15:45:57 CET 2013
Thank you, Peter. I'll generate a self-contained example and contact the maintainer.
From: peter dalgaard [mailto:pdalgd at gmail.com]
Sent: Thursday, November 21, 2013 9:19 AM
To: Li, Xiaochun
Cc: r-help at R-project.org
Subject: Re: [R] Does function read.sas7bdat() have some memory limitations?
This certainly looks like a bug, and there are many ways of inducing bugs that only show up with large datasets: buffer overruns, fields that are too small to hold the number of rows, etc. Remember that there is NO official documentation of the .sas7bdat format; everything has been reverse engineered, and if something in the format is different for very large datasets, it may well have gone unnoticed.
However, read.sas7bdat is from the sas7bdat package, which has a maintainer. He may well be interested in tracking down the root cause if you show him how to generate SAS datasets that reproduce the issue.
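As a starting point for such a report, it may help to look at the bit patterns of the garbled values. The sketch below is only an illustration of the arithmetic, not a statement about what the sas7bdat code actually does: it assumes that the four high-order bytes of an 8-byte IEEE 754 double end up in the low-order half of the buffer, with the rest zeroed. The helper name misplace_high_bytes is hypothetical; writeBin() and readBin() are base R.

```r
# Hypothetical helper: simulate one way a double could be mis-assembled.
# Assumption (not confirmed for sas7bdat): the 4 most significant bytes
# of the value land in the 4 least significant byte positions.
misplace_high_bytes <- function(x) {
  # 8-byte little-endian representation: bytes 1-4 are the low-order half,
  # bytes 5-8 hold the sign, exponent and top of the mantissa
  bytes <- writeBin(as.double(x), raw(), size = 8, endian = "little")
  # Put the high-order half where the low-order half belongs, zero the rest
  shifted <- c(bytes[5:8], raw(4))
  readBin(shifted, what = "double", size = 8, endian = "little")
}

misplace_high_bytes(612569)  # a subnormal close to 5.399114e-315
misplace_high_bytes(55)      # a subnormal close to 5.329436e-315
```

These results agree with the first row of the incorrect output quoted below (PERSON_ID 612569, age 55), so the reported numbers are at least consistent with a byte-placement problem rather than random corruption. Whether that is the actual cause is for the maintainer to confirm.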
On 19 Nov 2013, at 22:40 , Li, Xiaochun <xiaochun at iupui.edu> wrote:
> Dear R-ers,
> I was trying to read in a large sas7bdat file (size 148094976 bytes) using 'read.sas7bdat()', but it did not read in the data correctly. For example, the first 5 rows come out like this (I'm omitting other columns to keep it readable):
>       PERSON_ID           age
> 1 5.399114e-315 5.329436e-315
> 2 5.399114e-315 5.328302e-315
> 3 5.399114e-315 5.332026e-315
> 4 5.399114e-315 5.329112e-315
> 5 5.399114e-315 5.331055e-315
> When I reduced the original SAS dataset to its first 5 rows, 'read.sas7bdat' read them in correctly:
>   PERSON_ID age
> 1    612569  55
> 2    612571  48
> 3    612580  78
> 4    612606  53
> 5    612617  66
> So for now I first save the SAS dataset as .csv and then read it with 'read.csv', which works fine.
> Any suggestions as to why 'read.sas7bdat' didn't work, and whether a fix in its code could make it work?
> Thank you.
> Xiaochun Li, Ph.D.
> Department of Biostatistics
> Indiana University
> School of Medicine and
> Richard M. Fairbanks School of Public Health
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com