[R] Loading large .xpt and .asc datasets causes issues.

Federman, Douglas Douglas.Federman at utoledo.edu
Tue Feb 23 22:39:32 CET 2016


You might want to look at Anthony Damico's work at

http://www.asdfree.com/search/label/behavioral%20risk%20factor%20surveillance%20system%20%28brfss%29
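
On the .ASC errors quoted below: import.asc() and read.asc() read ESRI ASCII raster grids, which is a different format from a survey data file. The run-together value in the error message ('1191.8808943.38209868648.960119') suggests the BRFSS .ASC file is fixed-width text, where each field occupies a set range of character positions. A minimal sketch of reading that kind of file with base R's read.fwf() follows. It uses a toy file so that it runs on its own. The widths and column names are made up for illustration; the real ones would have to come from the CDC's variable layout for 2006.

```r
# Toy fixed-width file standing in for the real .ASC file.
# The three fields and their widths (2, 4, 1) are hypothetical.
tmp <- tempfile(fileext = ".asc")
writeLines(c("0120061",
             "0220062",
             "3620061"), tmp)

# read.fwf() splits each line by character widths, not by a separator.
# colClasses = "character" keeps leading zeros in coded fields.
toy <- read.fwf(tmp,
                widths     = c(2, 4, 1),
                col.names  = c("state", "year", "sex"),  # hypothetical names
                colClasses = "character")

str(toy)
unlink(tmp)
```

For a file of about 500 MB, read.fwf() is likely to be slow and memory-hungry; readr::read_fwf() is one alternative worth trying, though I have not tested it on this dataset.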

--
Better name for the general practitioner might be multispecialist. 
~Martin H. Fischer (1879-1962)


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Torvon
Sent: Tuesday, February 23, 2016 2:13 PM
To: r-help at r-project.org
Subject: [R] Loading large .xpt and .asc datasets causes issues.

Hi,

I want to load a dataset into R. This dataset is available in two formats:
.XPT and .ASC. The dataset is available at http://www.cdc.gov/brfss/annual_data/annual_2006.htm.

The files are about 40 MB zipped and about 500 MB unzipped.

I can get the .xpt data to load, using:

> library(Hmisc)
> data <- sasxport.get("CDBRFS06.XPT")

The data look fine, with no error messages. However, the data contain only 302 columns, fewer than the documentation says there should be. My variables of interest are missing, so either the documentation or the data file is wrong, and I want to make sure it's not the data file.

Hence I wanted to see if I get the same results loading the .ASC file.
However, multiple ways to do so have failed.

> library(adehabitat)
> import.asc("CDBRFS06.asc")

Results in:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got '1191.8808943.38209868648.960119'

> library(SDMTools)
> read.asc("CDBRFS06.asc")

Results in:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got '1191.8808943.38209868648.960119'
In addition: Warning messages:
1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  number of items read is not a multiple of the number of columns
2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  number of items read is not a multiple of the number of columns
3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  number of items read is not a multiple of the number of columns
4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  number of items read is not a multiple of the number of columns
5: In scan(file, nmax = nl * nc, skip = 6, quiet = TRUE) :
  NAs introduced by coercion to integer range

Thank you for your help.
   Eiko

