[R] Reading hierarchical data
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Feb 8 01:01:47 CET 2010
Here is a further simplification. We use the colClasses= argument
with "NULL" for the columns we do not want, so those columns are never
read in and we do not have to remove them later.
# input is the character vector of raw lines defined in Jim's post
# record type ("1" or "2")
rectype <- substr(input, 7, 7)
# read in record type "1"
input1 <- input[rectype == "1"]
DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
col.names = c("familyid", "", "", "", "dwelling"),
colClasses = c("numeric", "NULL", "NULL", "NULL", "numeric"))
# read in record type "2"
input2 <- input[rectype == "2"]
DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
col.names = c("personalid", "", "", "age", "", "sex"),
colClasses = c("numeric", "NULL", "NULL", "numeric", "NULL", "numeric"))
# ix is the index in DF1 of the family row corresponding to each personal row in DF2
ix <- cumsum(rectype == "1")[rectype == "2"]
DF <- cbind(DF1[ix,], DF2)
DF
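To see what the "NULL" entries do on their own, here is a minimal sketch on two family records typed in from Richard's post. The vector name `fam` and the comparison read without colClasses are illustrative additions, not part of the thread. Reading the same lines both ways shows that the "NULL" fields are skipped while reading rather than removed afterwards:

```r
# Two family records in the layout from the question:
# cols 1-5 family id, col 7 record type "1", col 9 dwelling code.
fam <- c("06470 1 1", "07470 1 0")

# Without colClasses every fixed-width field becomes a column,
# including the blank separators and the record-type flag.
all_cols <- read.fwf(textConnection(fam), widths = c(5, 1, 1, 1, 1))
ncol(all_cols)
# 5

# With "NULL" entries the unwanted fields are skipped during the read.
kept <- read.fwf(textConnection(fam), widths = c(5, 1, 1, 1, 1),
                 col.names = c("familyid", "", "", "", "dwelling"),
                 colClasses = c("numeric", "NULL", "NULL", "NULL", "numeric"))
names(kept)
# "familyid" "dwelling"
```

Only the two wanted columns come back, so no follow-up subsetting step is needed.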
On Sun, Feb 7, 2010 at 6:30 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Try this. It uses input defined in Jim's post and defines the rectype
> of each row ("1" or "2"). It then reads the rectype "1" records into
> DF1 using read.fwf and the rectype "2" records into DF2 also using
> read.fwf. ix is defined to have one component per personal record
> giving the row number in DF1 of the corresponding family. We combine
> DF1 and DF2 using ix and drop the columns whose names start with "X".
>
> # record type ("1" or "2")
> rectype <- substr(input, 7, 7)
>
> # read in record type "1"
> input1 <- input[rectype == "1"]
> DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
> col.names = c("familyid", "X", "X", "X", "dwelling"))
>
> # read in record type "2"
> input2 <- input[rectype == "2"]
> DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
> col.names = c("personalid", "X", "X", "age", "X", "sex"))
>
> # ix is the index in DF1 of the family row corresponding to each personal row in DF2
> ix <- cumsum(rectype == "1")[rectype == "2"]
> DF <- cbind(DF1[ix,], DF2)
> DF <- DF[substr(names(DF), 1, 1) != "X"]
>
> so DF looks like this:
>
>> DF
>     familyid dwelling personalid age sex
> 1       6470        1          1  32   0
> 1.1     6470        1          2  30   1
> 2       7470        0          1  40   1
> 3       8470        0          1  27   0
> 4       9470        0          1  13   1
> 4.1     9470        0          2  22   0
> 4.2     9470        0          3  24   1
> 5      10470        1          1  20   0
> 5.1    10470        1          2  11   1
> 6      11470        0          1  17   0
> 6.1    11470        0          2  10   1
> 6.2    11470        0          3  26   1
>
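The ix line is the step that attaches each person to a family. A minimal sketch of what it computes, using the record-type sequence of the sample data typed in by hand (the intermediate name `fam_no` is an illustrative addition; `rectype` and `ix` mirror the code above):

```r
# Record types in the order they appear in the sample data
# ("1" = family record, "2" = personal record).
rectype <- c("1", "2", "2", "1", "2", "1", "2", "1", "2", "2", "2",
             "1", "2", "2", "1", "2", "2", "2")

# cumsum() over the family indicator numbers the families 1, 2, 3, ...
# and carries each number forward onto the personal records below it.
fam_no <- cumsum(rectype == "1")

# Keeping only the personal records gives, for each person, the row
# of DF1 (the family data frame) that the person belongs to.
ix <- fam_no[rectype == "2"]
ix
# [1] 1 1 2 3 4 4 4 5 5 6 6 6
```

Because DF1[ix, ] repeats rows, R makes the duplicated row names unique by appending .1, .2, ..., which is where row names such as 1.1 and 4.2 in the printed DF come from.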
> On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) <sabaric at charter.net> wrote:
>>
>> I would like to read the following hierarchical data set. There is a family
>> record followed by one or more personal records.
>> If col. 7 is "1" it is a family record. If it is "2" it is a personal
>> record.
>> The family record is formatted as follows:
>> col. 1-5 family id
>> col. 7 "1"
>> col. 9 dwelling type code
>> The personal record is formatted as follows:
>> col. 1-5 personal id
>> col. 7 "2"
>> col. 8-9 age
>> col. 11 sex code
>>
>> The first six family and accompanying personal records look like this:
>> 06470 1 1
>>     1 232 0
>>     2 230 1
>> 07470 1 0
>>     1 240 1
>> 08470 1 0
>>     1 227 0
>> 09470 1 0
>>     1 213 1
>>     2 222 0
>>     3 224 1
>> 10470 1 1
>>     1 220 0
>>     2 211 1
>> 11470 1 0
>>     1 217 0
>>     2 210 1
>>     3 226 1
>>
>> I want to create a dataset containing:
>>  - family ID
>>  - dwelling code
>>  - person ID
>>  - age
>>  - sex code
>> The dataset will contain one observation per person, with the family
>> information repeated for people in the same family.
>> Can anyone help?
>> Thanks,
>> Richard Saba
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
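For comparison, the same one-row-per-person result can be built without read.fwf by cutting each field out of the raw lines with substr(), using the column positions from Richard's layout. This is a sketch, not code from the thread: the helper name `parse_hier` is hypothetical, and the sample lines assume the personal ids are right-justified in cols 1-5, so those lines begin with blanks (which is what puts the "2" in col. 7).

```r
# Raw lines reconstructed from the question (assumption: personal ids
# are right-justified in cols 1-5, so personal lines start with blanks).
input <- c("06470 1 1",
           "    1 232 0",
           "    2 230 1",
           "07470 1 0",
           "    1 240 1",
           "08470 1 0",
           "    1 227 0",
           "09470 1 0",
           "    1 213 1",
           "    2 222 0",
           "    3 224 1",
           "10470 1 1",
           "    1 220 0",
           "    2 211 1",
           "11470 1 0",
           "    1 217 0",
           "    2 210 1",
           "    3 226 1")

# Hypothetical helper: extract fields by column position with substr()
# instead of read.fwf(); as.numeric() ignores the leading blanks.
parse_hier <- function(lines) {
  rectype <- substr(lines, 7, 7)
  fam <- lines[rectype == "1"]
  per <- lines[rectype == "2"]
  # family row for each personal record, same cumsum idea as the thread
  ix <- cumsum(rectype == "1")[rectype == "2"]
  data.frame(
    familyid   = as.numeric(substr(fam, 1, 5))[ix],
    dwelling   = as.numeric(substr(fam, 9, 9))[ix],
    personalid = as.numeric(substr(per, 1, 5)),
    age        = as.numeric(substr(per, 8, 9)),
    sex        = as.numeric(substr(per, 11, 11))
  )
}

DF <- parse_hier(input)
DF
```

Because this builds the data frame directly, the row names are simply 1 to 12 rather than 1, 1.1, 2, ...; the column values match the DF printed above.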