[R] Reading hierarchical data

Gabor Grothendieck ggrothendieck at gmail.com
Mon Feb 8 01:01:47 CET 2010


Here is a further simplification.  As before, input is the character
vector of raw records from Jim's post.  Giving colClasses= the value
"NULL" for the columns we do not want drops them while reading, so we
do not have to remove them afterwards.

# record type ("1" or "2")
rectype <- substr(input, 7, 7)

# read in record type "1"
input1 <- input[rectype == "1"]
DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
	col.names = c("familyid", "", "", "", "dwelling"),
	colClasses = c("numeric", "NULL", "NULL", "NULL", "numeric"))

# read in record type "2"
input2 <- input[rectype == "2"]
DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
	col.names = c("personalid", "", "", "age", "", "sex"),
	colClasses = c("numeric", "NULL", "NULL", "numeric", "NULL", "numeric"))

# ix gives, for each personal row in DF2, the row index in DF1 of its family
ix <- cumsum(rectype == "1")[rectype == "2"]
DF <- cbind(DF1[ix,], DF2)

DF
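Because input is only defined in Jim's post, here is a self-contained sketch that runs the simplified code end to end. It assumes input holds the sample records from the question, with personal ids right-justified in columns 1-5 so that the record type sits in column 7 as the stated layout requires. The comment on ix spells out what the cumsum() trick computes; print(DF) should reproduce the table shown in the earlier message.

```r
# ASSUMPTION: input rebuilt from the sample records in the question;
# personal ids are right-justified in cols 1-5 so col 7 is the record type.
input <- c(
  "06470 1 1",
  "    1 232 0",
  "    2 230 1",
  "07470 1 0",
  "    1 240 1",
  "08470 1 0",
  "    1 227 0",
  "09470 1 0",
  "    1 213 1",
  "    2 222 0",
  "    3 224 1",
  "10470 1 1",
  "    1 220 0",
  "    2 211 1",
  "11470 1 0",
  "    1 217 0",
  "    2 210 1",
  "    3 226 1"
)

# record type ("1" = family, "2" = person) is in column 7
rectype <- substr(input, 7, 7)

# family records: keep cols 1-5 (id) and 9 (dwelling); "NULL" drops the rest
DF1 <- read.fwf(textConnection(input[rectype == "1"]),
                widths = c(5, 1, 1, 1, 1),
                col.names = c("familyid", "", "", "", "dwelling"),
                colClasses = c("numeric", "NULL", "NULL", "NULL", "numeric"))

# personal records: keep cols 1-5 (id), 8-9 (age) and 11 (sex)
DF2 <- read.fwf(textConnection(input[rectype == "2"]),
                widths = c(5, 1, 1, 2, 1, 1),
                col.names = c("personalid", "", "", "age", "", "sex"),
                colClasses = c("numeric", "NULL", "NULL", "numeric", "NULL", "numeric"))

# cumsum() counts the family records seen so far, so on every line it is the
# number of the most recent family; keeping only the personal lines gives
# one DF1 row index per person: 1 1 2 3 4 4 4 5 5 6 6 6
ix <- cumsum(rectype == "1")[rectype == "2"]

# repeat each family row once per person and bind the person columns on
DF <- cbind(DF1[ix, ], DF2)
print(DF)
```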


On Sun, Feb 7, 2010 at 6:30 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Try this. It uses the input vector defined in Jim's post and computes the
> rectype of each row ("1" or "2").  It then reads the rectype "1" records
> into DF1 and the rectype "2" records into DF2, in both cases using
> read.fwf.  ix has one component per personal record, giving the row
> number in DF1 of the corresponding family.  We combine DF1 and DF2 using
> ix and then drop the columns whose names start with "X".
>
> # record type ("1" or "2")
> rectype <- substr(input, 7, 7)
>
> # read in record type "1"
> input1 <- input[rectype == "1"]
> DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
>        col.names = c("familyid", "X", "X", "X", "dwelling"))
>
> # read in record type "2"
> input2 <- input[rectype == "2"]
> DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
>        col.names = c("personalid", "X", "X", "age", "X", "sex"))
>
> # ix gives, for each personal row in DF2, the row index in DF1 of its family
> ix <- cumsum(rectype == "1")[rectype == "2"]
> DF <- cbind(DF1[ix,], DF2)
> DF <- DF[substr(names(DF), 1, 1) != "X"]
>
> So DF looks like this:
>
>> DF
>    familyid dwelling personalid age sex
> 1       6470        1          1  32   0
> 1.1     6470        1          2  30   1
> 2       7470        0          1  40   1
> 3       8470        0          1  27   0
> 4       9470        0          1  13   1
> 4.1     9470        0          2  22   0
> 4.2     9470        0          3  24   1
> 5      10470        1          1  20   0
> 5.1    10470        1          2  11   1
> 6      11470        0          1  17   0
> 6.1    11470        0          2  10   1
> 6.2    11470        0          3  26   1
>
> On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) <sabaric at charter.net> wrote:
>>
>> I would like to read the following hierarchical data set. There is a family
>> record followed by one or more personal records.
>> If col. 7 is "1" it is a family record. If it is "2" it is a personal
>> record.
>> The family record is formatted as follows:
>> col. 1-5   family id
>> col. 7     "1"
>> col. 9     dwelling type code
>> The personal record is formatted as follows:
>> col. 1-5   personal id
>> col. 7     "2"
>> col. 8-9   age
>> col. 11    sex code
>>
>> The first six family and accompanying personal records look like this:
>> 06470 1 1
>>    1 232 0
>>    2 230 1
>> 07470 1 0
>>    1 240 1
>> 08470 1 0
>>    1 227 0
>> 09470 1 0
>>    1 213 1
>>    2 222 0
>>    3 224 1
>> 10470 1 1
>>    1 220 0
>>    2 211 1
>> 11470 1 0
>>    1 217 0
>>    2 210 1
>>    3 226 1
>>
>> I want to create a dataset containing
>> - family ID
>> - dwelling code
>> - person ID
>> - age
>> - sex code
>> The dataset will contain one observation per person, with the family
>> information repeated for people in the same family.
>> Can anyone help?
>> Thanks,
>> Richard Saba
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
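A further variation, offered as a sketch rather than as part of the thread: read.fwf() treats a negative width as a number of columns to skip, which removes the need for both the "X" names and the "NULL" colClasses. Reading familyid as character also keeps the leading zero of ids such as "06470", which the numeric conversion turns into 6470. The input vector below is a hypothetical two-family excerpt of the sample data, with personal ids assumed right-justified in columns 1-5 so that column 7 holds the record type.

```r
# ASSUMPTION: hypothetical excerpt of the sample data from the question
input <- c("06470 1 1", "    1 232 0", "    2 230 1",
           "07470 1 0", "    1 240 1")

rectype <- substr(input, 7, 7)

# family: cols 1-5 id, skip cols 6-8 (width -3), col 9 dwelling;
# id read as character so "06470" keeps its leading zero
DF1 <- read.fwf(textConnection(input[rectype == "1"]),
                widths = c(5, -3, 1),
                col.names = c("familyid", "dwelling"),
                colClasses = c("character", "numeric"))

# person: cols 1-5 id, skip 6-7, cols 8-9 age, skip 10, col 11 sex
DF2 <- read.fwf(textConnection(input[rectype == "2"]),
                widths = c(5, -2, 2, -1, 1),
                col.names = c("personalid", "age", "sex"),
                colClasses = rep("numeric", 3))

# same family-index trick as in the thread
ix <- cumsum(rectype == "1")[rectype == "2"]
DF <- cbind(DF1[ix, ], DF2)
print(DF)
```

Negative widths keep the skipping in one place (the widths vector), at the cost of no longer mirroring the record layout column for column.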


