[R] Reading hierarchical data

Gabor Grothendieck ggrothendieck at gmail.com
Mon Feb 8 00:30:49 CET 2010


Try this. It uses input defined in Jim's post and defines the rectype
of each row ("1" or "2").  It then reads the rectype "1" records into
DF1 using read.fwf and the rectype "2" records into DF2 also using
read.fwf.  ix is defined to have one component per personal record
giving the row number in DF1 of the corresponding family.  We combine
DF1 and DF2 using ix and remove the column names that start with "X".

# record type ("1" or "2")
rectype <- substr(input, 7, 7)

# read in record type "1"
input1 <- input[rectype == "1"]
DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
	col.names = c("familyid", "X", "X", "X", "dwelling"))

# read in record type "2"
input2 <- input[rectype == "2"]
DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
	col.names = c("personalid", "X", "X", "age", "X", "sex"))

# ix is the index in DF1 of family row corresponding to each personal row in DF2
ix <- cumsum(rectype == "1")[rectype == "2"]
DF <- cbind(DF1[ix,], DF2)
DF <- DF[substr(names(DF), 1, 1) != "X"]

so DF looks like this:

> DF
    familyid dwelling personalid age sex
1       6470        1          1  32   0
1.1     6470        1          2  30   1
2       7470        0          1  40   1
3       8470        0          1  27   0
4       9470        0          1  13   1
4.1     9470        0          2  22   0
4.2     9470        0          3  24   1
5      10470        1          1  20   0
5.1    10470        1          2  11   1
6      11470        0          1  17   0
6.1    11470        0          2  10   1
6.2    11470        0          3  26   1

On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) <sabaric at charter.net> wrote:
>
> I would like to read the following hierarchical data set. There is a family
> record followed by one or more personal records.
> If col. 7 is "1" it is a family record. If it is "2" it is a personal
> record.
> The family record is formatted as follows:
> col. 1-5     family id
> col. 7        "1"
> col. 9        dwelling type code
> The personal record is formatted as follows:
> col. 1-5        personal id
> col. 7   "2"
> col. 8-9        age
> col. 11 sex code
>
> The first six family and accompanying personal records look like this:
> 06470 1 1
>    1 232 0
>    2 230 1
> 07470 1 0
>    1 240 1
> 08470 1 0
>    1 227 0
> 09470 1 0
>    1 213 1
>    2 222 0
>    3 224 1
> 10470 1 1
>    1 220 0
>    2 211 1
> 11470 1 0
>    1 217 0
>    2 210 1
>    3 226 1
>
> I want to create a dataset containing
> . family ID
> . dwelling code
> . person ID
> . age
> . sex code
> The dataset will contain one observation per person, and the with family
> information repeated for people in the same family.
> Can anyone help?
> Thanks,
> Richard Saba
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list