[R] Getting codebook data into R

Chris Stubben stubben at lanl.gov
Tue Feb 14 05:05:46 CET 2012


Just to follow up on Dan's code - once you have a data.frame listing the
column positions, it takes only a couple of steps to build the widths for
read.fwf and read the file straight from the FTP site...

x <- data.frame(
  name  = c('caseid', 'nbrnaliv', 'babysex', 'birthwgt_lb', 'birthwgt_oz',
            'prglength', 'outcome', 'birthord', 'agepreg', 'finalwgt'),
  begin = c(1, 22, 56, 57, 59, 275, 277, 278, 284, 423),
  end   = c(12, 22, 56, 58, 60, 276, 277, 279, 287, 440)
)


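# field width in characters
x$width <- x$end - x$begin + 1
# gap to the next field, stored as a negative number so read.fwf skips it
x$skip <- -c(x$begin[-1] - x$end[-nrow(x)] - 1, 0)

# interleave width, skip, width, skip, ... and drop the zero-length gaps
widths <- c(t(x[, c("width", "skip")]))
widths <- widths[widths != 0]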
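
As a quick check of the width/skip arithmetic above, here is a minimal
sketch that wraps the same steps in a function. The name fwf_widths is
hypothetical (it is not part of base R); the begin/end positions are the
ones from the codebook listing.

```r
# Hypothetical helper, not part of base R: turn begin/end column positions
# into a widths vector for read.fwf(), with negative entries for gaps.
fwf_widths <- function(begin, end) {
  width <- end - begin + 1                        # characters in each field
  gap <- c(begin[-1] - end[-length(end)] - 1, 0)  # unused characters after each field
  w <- c(rbind(width, -gap))                      # interleave: field, gap, field, gap, ...
  w[w != 0]                                       # drop zero-length gaps, as in the code above
}

begin <- c(1, 22, 56, 57, 59, 275, 277, 278, 284, 423)
end   <- c(12, 22, 56, 58, 60, 276, 277, 279, 287, 440)

w <- fwf_widths(begin, end)
print(w)
# 12 -9 1 -33 1 2 2 -214 2 1 2 -4 4 -135 18
```

The absolute values add up to 440, the last column used, which is an easy
way to confirm that no field or gap was dropped.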

ftp <- "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NSFG/2002FemPreg.dat"
# drop the n=10 option to get all lines
y <- read.fwf(ftp, widths, n = 10)
names(y) <- x$name
y
   caseid nbrnaliv babysex birthwgt_lb birthwgt_oz prglength outcome birthord agepreg  finalwgt
1       1        1       1           8          13        39       1        1    3316  6448.271
2       1        1       2           7          14        39       1        2    3925  6448.271
3       2        3       1           9           2        39       1        1    1433 12999.542
4       2        1       2           7           0        39       1        2    1783 12999.542
5       2        1       2           6           3        39       1        3    1833 12999.542
6       6        1       1           8           9        38       1        1    2700  8874.441
7       6        1       2           9           9        40       1        2    2883  8874.441
8       6        1       2           8           6        42       1        3    3016  8874.441
9       7        1       1           7           9        39       1        1    2808  6911.880
10      7        1       2           6          10        35       1        2    3233  6911.880
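
The original question also gave a type for each field (int or float).
read.fwf passes extra arguments on to read.table, so col.names and
colClasses can be set in the same call. The sketch below is only an
illustration: the three-field layout and the two data lines are made up
so that it runs without the FTP download.

```r
# Made-up layout and data, for illustration only:
# id in columns 1-3, two unused columns, sex in column 6, wgt in columns 7-12.
lines <- c("001xx1  7.25",
           "002xx2 10.50")
tmp <- tempfile(fileext = ".dat")
writeLines(lines, tmp)

demo <- read.fwf(tmp,
                 widths = c(3, -2, 1, 6),          # negative width = skip
                 col.names = c("id", "sex", "wgt"),
                 colClasses = c("integer", "integer", "numeric"))
unlink(tmp)                                        # remove the temporary file

str(demo)
```

For the NSFG file, the same idea would presumably mean nine "integer"
entries followed by one "numeric" entry for finalwgt, but I have not
checked that every record parses cleanly with those types.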


Chris Stubben




Daniel Nordlund-4 wrote
>> I've been trying to get some data from the National Survey of Family
>> Growth into R - however, the data is in a .dat file and the data I need
>> doesn't have any spaces or commas separating fields - rather you have to
>> look in the codebook to find which columns along the line hold the data
>> you need. The data I want are the following, where 1,12,int means that
>> the data I'm interested in starts in column 1, finishes in column 12 and
>> is an integer.
>> 
>>             ('caseid', 1, 12, int),
>>             ('nbrnaliv', 22, 22, int),
>>             ('babysex', 56, 56, int),
>>             ('birthwgt_lb', 57, 58, int),
>>             ('birthwgt_oz', 59, 60, int),
>>             ('prglength', 275, 276, int),
>>             ('outcome', 277, 277, int),
>>             ('birthord', 278, 279, int),
>>             ('agepreg', 284, 287, int),
>>             ('finalwgt', 423, 440, float)
>> 
>> How can I do this using R? I've written a Python programme which basically
>> does it but it'd be nicer if I could skip the Python bit and just do it
>> using R. Cheers for any help.
>> 
> 
> 
> Dan
> 
> 
> 

