[R] Advice needed on awkward tables

Jim Lemon jim at bitwrit.com.au
Tue May 11 12:39:16 CEST 2010


On 05/11/2010 02:05 PM, Greg Orm wrote:
> Dear r-help list members,
>
> I am quite new to R, and hope to seek advice from you about a problem I have
> been cracking my head over. Apologies if this seems like a simple problem.
>
> I have essentially two tables. The first (Table A) is a standard patient
> clinicopathological data table, where rows correspond to patient IDs and
> columns correspond to clinical features. Records in this table are stored as
> 1 or 0 (denoting presence or absence). An example is provided below.
>
> The second (Table B) is a table that represents a 'key' to Table A. This
> Table B has a category field, as well as a feature field which links to
> Table A. Unfortunately, this is a one-to-many relationship, and the numbers
> in the feature field represent the respective columns in Table A, delimited
> by semicolons. So in the example below, I need to collapse the data in Table
> A into a table with nrow equal to the number of patients and ncol equal to
> the number of categories. The collapsing of each category will be based on a
> Boolean OR, or the equivalent any() in R (so long as one of the features is
> true, the resulting outcome will be true).
>
> data.table.a<-
>  matrix(data=round(runif(100)),nrow=10,ncol=10,
>   dimnames=list(paste("Patient",1:10),paste("Feature",1:10)))
> data.table.b<-data.frame(ID=c(1,2,3,4,5,6,7),
>  CATEGORY=c(1,2,3,3,4,5,5),
>  FEATURE=c("9","3;5","7","4","6;10","1;2","8"))
>
> In the example tables above, we hope to collapse the features by category,
> so the final desired output will have the 10 patients as rows and the
> 5 categories as columns, after collapsing the features by a Boolean OR
> (i.e. if any of the features in a category is present, the result will be
> TRUE).
>
> I apologize for the apparently awkward table, but this is what I had to
> start with. I tried expanding data.table.b$FEATURE using strsplit, which
> resulted in a list, and then I got stuck there for a long time.

Hi Greg,
Messy, but I think it works.

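feature2category<-function(dta,dtb) {
  # one output row per patient, one output column per distinct category
  categories<-unique(dtb$CATEGORY)
  category.table<-matrix(0,nrow=dim(dta)[1],
   ncol=length(categories))
  colnames(category.table)<-paste("Category",categories)
  for(patrow in 1:dim(dta)[1]) {
   for(catrow in 1:dim(dtb)[1]) {
    # columns of dta listed in this row of dtb, e.g. "3;5" -> 3 5
    acols<-as.numeric(unlist(strsplit(
     as.character(dtb[catrow,"FEATURE"]),";")))
    # progress trace, safe to delete
    cat("patrow",patrow,"catrow",catrow,acols,"\n")
    # output column belonging to this category
    catcol<-match(dtb[catrow,"CATEGORY"],categories)
    # OR with the value already stored: a category can span several
    # rows of dtb, and a plain assignment would keep only the last one
    category.table[patrow,catcol]<-
     category.table[patrow,catcol] || any(dta[patrow,acols]>0)
   }
  }
  return(as.data.frame(category.table))
}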
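
A version without the loops is also possible. What follows is only a sketch, not tested beyond a toy case: collapse.by.category is a name made up for illustration (it is not from any package), and it assumes that FEATURE holds semicolon-separated column numbers of Table A and that there are no missing values. The feature numbers of all rows sharing a category are pooled first, so a category spread over several rows of Table B is still combined with OR.

```r
# Illustrative sketch; collapse.by.category is a hypothetical helper name.
collapse.by.category<-function(dta,dtb) {
  # split "3;5" into c(3,5): one integer vector per row of dtb
  cols.by.row<-lapply(strsplit(as.character(dtb$FEATURE),";"),as.integer)
  # pool the column numbers of all rows that share a category
  cols.by.cat<-lapply(split(cols.by.row,dtb$CATEGORY),unlist)
  # a patient is TRUE for a category if any pooled feature is non-zero
  flags<-lapply(cols.by.cat,function(cols)
   rowSums(dta[,cols,drop=FALSE]!=0)>0)
  out<-do.call(cbind,flags)
  colnames(out)<-paste("Category",names(cols.by.cat))
  out
}

# small made-up example: 3 patients, 4 features, 2 categories
a<-matrix(c(1,0,0, 0,0,0, 0,1,0, 0,0,0),nrow=3,
 dimnames=list(paste("Patient",1:3),paste("Feature",1:4)))
b<-data.frame(CATEGORY=c(1,2,2),FEATURE=c("1;2","3","4"))
res<-collapse.by.category(a,b)
```

Here res is a logical matrix with patients as rows and categories as columns. Wrap it in as.data.frame() if a data frame is preferred.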

Jim


