[R] Does function read.sas7bdat() have some memory limitations?

Li, Xiaochun xiaochun at iupui.edu
Thu Nov 21 15:45:57 CET 2013


Thank you, Peter.  I'll generate a self-contained example and contact the maintainer.

Xiaochun

-----Original Message-----
From: peter dalgaard [mailto:pdalgd at gmail.com] 
Sent: Thursday, November 21, 2013 9:19 AM
To: Li, Xiaochun
Cc: r-help at R-project.org
Subject: Re: [R] Does function read.sas7bdat() have some memory limitations?

This certainly looks like a bug, and there are many ways of inducing bugs that only show up with large datasets: buffer overruns, fields that are too small to hold the number of rows, etc. Remember that there is NO official documentation of the .sas7bdat format; everything has been reverse engineered, and if something in the format is different for very large datasets, it may well have gone unnoticed.
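
As an illustration of how a misread offset could produce values of this kind (a sketch only, not code from the sas7bdat package): the garbled numbers quoted in the original post are what one gets when the 8 bytes of each double are decoded 4 bytes out of step, so that the high half of the true value lands in the low half and the remaining bytes are zero. Whether this is what actually happens inside read.sas7bdat is an assumption, and the helper name misread_shifted is made up for this example.

```r
# Hypothetical helper, not part of the sas7bdat package: encode x as an
# 8-byte little-endian double, then decode it again 4 bytes out of step,
# padding the missing high bytes with zeros.
misread_shifted <- function(x) {
  bytes <- writeBin(as.double(x), raw(), size = 8, endian = "little")
  shifted <- c(bytes[5:8], as.raw(c(0, 0, 0, 0)))
  readBin(shifted, what = "double", n = 1, size = 8, endian = "little")
}

ids  <- c(612569, 612571, 612580)
ages <- c(55, 48, 78)

# Print with 7 significant digits, as in the R output quoted in the thread.
print(signif(vapply(ids,  misread_shifted, numeric(1)), 7))
print(signif(vapply(ages, misread_shifted, numeric(1)), 7))
```

Applied to the correct values from the 5-row dataset, this gives 5.399114e-315 for every PERSON_ID shown (the IDs differ only in their low-order bytes) and 5.329436e-315, 5.328302e-315, 5.332026e-315 for ages 55, 48 and 78, which matches the garbled output to the digits printed. That is consistent with, but does not prove, a 4-byte misalignment when the large file is read.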

However, read.sas7bdat is from the sas7bdat package, which has a maintainer.  He may well be interested in tracking down the root cause if you show him how to generate SAS datasets that reproduce the issue.
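
When putting together such a report, a quick check like the following might help confirm that a data frame shows the same symptom. It is a minimal sketch: the function name looks_misread and the rule it applies (every non-zero value in a numeric column is subnormal, i.e. smaller in magnitude than .Machine$double.xmin) are assumptions made for this example, not something the sas7bdat package provides.

```r
# Hypothetical check, not part of the sas7bdat package: flag double
# columns whose non-missing, non-zero values are all subnormal, which
# real-world measurements essentially never are.
looks_misread <- function(df) {
  vapply(df, function(col) {
    if (!is.double(col)) return(FALSE)
    nz <- col[!is.na(col) & col != 0]
    length(nz) > 0 && all(abs(nz) < .Machine$double.xmin)
  }, logical(1))
}

# Values taken from the two outputs quoted in the original post.
bad  <- data.frame(PERSON_ID = c(5.399114e-315, 5.399114e-315),
                   age       = c(5.329436e-315, 5.328302e-315))
good <- data.frame(PERSON_ID = c(612569, 612571),
                   age       = c(55, 48))

print(looks_misread(bad))
print(looks_misread(good))
```

The result is a named logical vector with one entry per column, so it can be used to list the affected variables in a bug report.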

Best,
Peter D.

On 19 Nov 2013, at 22:40 , Li, Xiaochun <xiaochun at iupui.edu> wrote:

> Dear R-ers,
> 
> I was trying to read in a large sas7bdat file (size 148094976 bytes) using 'read.sas7bdat()', but it did not read in the data correctly.  For example, the first 5 rows come out like this (I'm omitting other columns to keep it readable):
> 
>        PERSON_ID           age
> 1  5.399114e-315 5.329436e-315
> 2  5.399114e-315 5.328302e-315
> 3  5.399114e-315 5.332026e-315
> 4  5.399114e-315 5.329112e-315
> 5  5.399114e-315 5.331055e-315
> 
> When I reduced the original SAS dataset to its first 5 rows, 'read.sas7bdat' read them in correctly:
> 
>   PERSON_ID age
> 1    612569  55
> 2    612571  48
> 3    612580  78
> 4    612606  53
> 5    612617  66
> 
> So for now I first save the SAS dataset as .csv and then read it using 'read.csv'; that works fine.
> 
> Any suggestions as to why 'read.sas7bdat' didn't work, and whether some fix in its code could make it work?
> 
> Thank you.
> _____________________________
> Xiaochun Li, Ph.D. 
> Department of Biostatistics
> Indiana University
> School of Medicine and
> Richard M. Fairbanks School of Public Health

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com