[R] multiple record types from a single file efficiently?
jcress410
boywonder410 at gmail.com
Wed Oct 6 18:02:00 CEST 2010
The Current Population Survey March Supplements contain records on
households, families and individuals, each with a distinct record type, all in
the same file. I'm trying to read these files efficiently.
The following function reads the data file "indata". The record layouts are
described in lists contained in "dd_by_type", and flag_pos gives the
character position in each line that indicates which record type to use.
The function I've written "works" (see below, please) but it's awful at its
job: it has to perform two read operations and one write operation per line,
and I don't know how to make it more efficient.
There is a function for handling similar problems (read.fwf.multi), but
it requires each collection of records to have a fixed number of rows (one
household entry = n rows for all households), and that doesn't hold here.
So the function needs to read.fwf (or scan) a line and parse it
according to the flag in the character position given by flag_pos.
Thoughts?
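loadhierarchy <- function(indata, dd_by_type, flag_pos) {
  # Read indata line by line and add each row based on its record type.
  # We grab the line, compare the character in column position flag_pos to
  # type_flag (column 3 in dd_by_type) and append the row to the output
  # file for that type with the right formatting.
  i <- 1
  # Total width over all record types (computed but not used below).
  width <- max(sum(unlist(dd_by_type[1:dim(dd_by_type)[1], 3])))
  # Disabled: open one output connection per record type up front.
  # for (i in 1:dim(dd_by_type)[1]) {
  #   assign(paste("con_", i, sep = ""),
  #          file(paste(file.path(indata), "_", i, ".csv", sep = ""), open = "w"))
  # }
  # First read: fetch line i (this re-scans the file from the top each time).
  while (length(line <- scan(indata, skip = i - 1, nlines = 1,
                             what = character(), fill = TRUE, sep = ",")) > 0) {
    # Use line[1] so a line containing commas still yields a single flag.
    typeflag <- as.integer(substr(line[1], flag_pos, flag_pos))
    # Second read: parse the same line with the widths for its record type.
    inline <- read.fwf(indata, skip = i - 1, n = 1,
                       widths = as.vector(unlist(dd_by_type[typeflag, 3])))
    inline <- matrix(unlist(inline), nrow = 1)
    # Append the parsed row to the CSV file for this record type.
    write.table(inline, paste(file.path(indata), "_", typeflag, ".csv", sep = ""),
                row.names = FALSE, col.names = FALSE, append = TRUE, sep = ",")
    print(i)
    i <- i + 1
  }
}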
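For comparison, one possible way to avoid re-reading the file for every line is to read all lines once, group them by the flag character, and cut each group into fields with substr. This is only a sketch under assumptions: split_by_type and parse_group are hypothetical helper names, widths_by_type is a hypothetical named list of field widths keyed by the flag character (standing in for the widths held in dd_by_type), and the sample records are made up for illustration, not real CPS data.

```r
# Hypothetical helper: cut a character vector of same-type records into
# fixed-width fields given a vector of field widths w.
parse_group <- function(recs, w) {
  ends <- cumsum(w)
  starts <- ends - w + 1
  # One column per field; substr is vectorised over all records in the group.
  cols <- lapply(seq_along(w), function(j) trimws(substr(recs, starts[j], ends[j])))
  names(cols) <- paste0("V", seq_along(w))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Hypothetical helper: split lines by the flag at flag_pos, then parse each
# group with the widths registered for that flag.
split_by_type <- function(lines, widths_by_type, flag_pos) {
  flags <- substr(lines, flag_pos, flag_pos)
  groups <- split(lines, flags)
  tables <- lapply(names(groups), function(f) parse_group(groups[[f]], widths_by_type[[f]]))
  names(tables) <- names(groups)
  tables
}

# Made-up sample data; in practice this would be lines <- readLines(indata).
lines <- c("1H001  NY", "2F00101", "3P0010134", "3P0010129", "1H002  CA")
widths_by_type <- list("1" = c(1, 4, 4), "2" = c(1, 4, 2), "3" = c(1, 4, 2, 2))
tables <- split_by_type(lines, widths_by_type, flag_pos = 1)
str(tables)
```

Each element of tables could then be written out in a single call, for example write.csv(tables[["1"]], "households.csv", row.names = FALSE), so the file is read once and each output is written once. All fields come back as character and would still need converting to their proper types.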
--
View this message in context: http://r.789695.n4.nabble.com/multiple-record-types-from-a-single-file-efficiently-tp2965249p2965249.html
Sent from the R help mailing list archive at Nabble.com.