[R] Referencing columns and pulling selected data

Brian Diggs diggsb at ohsu.edu
Wed Aug 5 22:16:11 CEST 2009


PDXRugger wrote:
> Please consider the following inputs:
> PrsnSerialno<-c(735,1147,2019,4131,4131,4217,4629,4822,4822,5979,5979,6128,6128,7004,7004,
> 7004,7004,7004,7438,7438,9402,9402,9402,10115,10115,11605,12693,12693,12693)
> 
> PrsnAge<-c(59,48,42,24,24,89,60,43,47,57,56,76,76,66,70,14,7,3,62,62,30,10,7,20,21,50,53,44,29)
> 
> IsHead<-c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,
> FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)
> 
> PrsnData<-cbind(PrsnSerialno,PrsnAge,IsHead)

This is easier to handle with data.frames than with matrices. cbind gives you a matrix, and a matrix can hold only one type, so your logical IsHead is promoted to numeric; a data.frame keeps each column's own type.

PrsnData<-data.frame(PrsnSerialno,PrsnAge,IsHead)
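A minimal sketch of the difference, using three made-up rows rather than the full vectors above (the names m and d are introduced only for this illustration):

```r
# Toy vectors standing in for the real columns (values are illustrative)
PrsnSerialno <- c(735, 1147, 4131)
PrsnAge      <- c(59, 48, 24)
IsHead       <- c(TRUE, TRUE, FALSE)

# cbind() builds a matrix; a matrix holds one type, so the logicals become 1/0
m <- cbind(PrsnSerialno, PrsnAge, IsHead)
is.matrix(m)       # TRUE
typeof(m)          # "double"
m[, "IsHead"]      # 1 1 0

# data.frame() keeps each column's own type
d <- data.frame(PrsnSerialno, PrsnAge, IsHead)
sapply(d, class)   # IsHead stays "logical"
```

With the data.frame, IsHead can be used directly as a logical index, which is what the merge further down relies on.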

> HhSerialno<-c(735,1147,2019,4131,4217,4629,4822,5979,6128,7004,7438,9402,10115,11605,12693)
> HhData<-cbind(HhSerialno)

Same for HhData:

HhData<-data.frame(HhSerialno)


> What I would like to do is add an age column to HhData that corresponds
> to the serial number and holds the age of the oldest person in the
> house, i.e. the person marked "TRUE" (which designates the oldest person).
> The TRUE/FALSE flag doesn't have to be considered, but using it is preferable.
> 
> The result would then be:
> HhSerialno HhAge
> 735	59
> 1147	48
> 2019	42
> 4131	24
> 4217	89
> 4629	60
> 4822	47
> 5979	57
> 6128	76
> 7004	70
> 7438	62
> 9402	30
> 10115	21
> 11605	50
> 12693	53
> 
> I tried
> PumsHh..$Age<-PumsPrsn[PumsPrsn$SERIALNO==PumsHh..$Serialno,PumsPrsn$AGE]
> but because the data frames are of different lengths it doesn't work, so I'm
> unsure of another way of doing this.  Thanks in advance.

merge will pull together two data.frames based on matching key values, whether or not they have the same number of rows.

HhData <- merge(HhData,
                PrsnData[PrsnData$IsHead==TRUE,
                         c("PrsnSerialno","PrsnAge")],
                by.x = "HhSerialno", 
                by.y = "PrsnSerialno")

That is, merge the data.frame HhData with a selected subset of PrsnData (those cases with IsHead == TRUE, and only the serial number and age columns).  Since the variables to be matched have different names in the two data.frames, by.x and by.y must be specified.

names(HhData)[2] <- "HhAge"

This will change the variable name from PrsnAge (which it inherited from PrsnData) to HhAge.
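Renaming by position relies on PrsnAge being the second column of the merged result. A small sketch of renaming by name instead, shown on a made-up two-row data.frame (toy is a hypothetical name used only here):

```r
# Toy stand-in for the merged result (values are illustrative)
toy <- data.frame(HhSerialno = c(735, 1147), PrsnAge = c(59, 48))

# Rename whichever column is called "PrsnAge", wherever it sits
names(toy)[names(toy) == "PrsnAge"] <- "HhAge"
names(toy)
```

Either form gives the same column names for HhData here.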

HhData
   HhSerialno HhAge
1         735    59
2        1147    48
3        2019    42
4        4131    24
5        4217    89
6        4629    60
7        4822    47
8        5979    57
9        6128    76
10       7004    70
11       7438    62
12       9402    30
13      10115    21
14      12693    53
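
Note that this result has 14 rows while HhData started with 15: in the sample data nobody in household 11605 is flagged IsHead == TRUE, and merge by default keeps only rows that match in both data.frames. Below is a sketch of two possible ways around that, starting again from the original inputs. The names HeadAges, KeepAll and Oldest are introduced here only for illustration. The first option keeps every household with all.x = TRUE (an unmatched age becomes NA); the second ignores IsHead and takes the maximum age per serial number with aggregate, which the question allowed as an alternative.

```r
# Original inputs from the question
PrsnSerialno <- c(735,1147,2019,4131,4131,4217,4629,4822,4822,5979,5979,
                  6128,6128,7004,7004,7004,7004,7004,7438,7438,9402,9402,
                  9402,10115,10115,11605,12693,12693,12693)
PrsnAge <- c(59,48,42,24,24,89,60,43,47,57,56,76,76,66,70,14,7,3,62,62,
             30,10,7,20,21,50,53,44,29)
IsHead <- c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,
            TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,
            FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)
HhSerialno <- c(735,1147,2019,4131,4217,4629,4822,5979,6128,7004,7438,
                9402,10115,11605,12693)

PrsnData <- data.frame(PrsnSerialno, PrsnAge, IsHead)
HhData   <- data.frame(HhSerialno)

# Option 1: keep every household; one without a flagged head gets NA
HeadAges <- PrsnData[PrsnData$IsHead, c("PrsnSerialno", "PrsnAge")]
KeepAll  <- merge(HhData, HeadAges,
                  by.x = "HhSerialno", by.y = "PrsnSerialno",
                  all.x = TRUE)
names(KeepAll)[2] <- "HhAge"
KeepAll[is.na(KeepAll$HhAge), ]        # the row for household 11605

# Option 2: ignore IsHead and take the oldest person per household
Oldest <- aggregate(PrsnAge ~ PrsnSerialno, data = PrsnData, FUN = max)
names(Oldest) <- c("HhSerialno", "HhAge")
Oldest[Oldest$HhSerialno == 11605, ]   # HhAge is 50
```

Which option fits depends on whether a household without a flagged head should show a missing age or the age of its oldest member.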


> JR
> 

--
Brian Diggs, Ph.D.
Senior Research Associate, Department of Surgery, Oregon Health & Science University



