[R] lists: removing elements, iterating over elements,

Paul Johnson pauljohn at ku.edu
Tue Apr 5 19:36:22 CEST 2005


I'm writing R code to calculate Hierarchical Social Entropy, a diversity 
index that Tucker Balch proposed.  One article on this was published in 
Autonomous Robots in 2000. You can find that and others through his web 
page at Georgia Tech.

http://www.cc.gatech.edu/~tucker/index2.html

While I work on this, I realize (again) that I'm a C programmer 
masquerading in R, and it's really tricky working with R lists.  Here are 
the things that surprise me; I wonder what your experience/advice is.

I need to calculate overlapping U-diametric clusters of a given radius. 
(Again, I apologize that this looks so much like C.)


## Returns a list of all U-diametric clusters of a given radius
## Give an R distance matrix
## Clusters may overlap.  Clusters may be identical (redundant)
getUDClusters <-function(distmat,radius){
   mem <- list()

   nItems <- dim(distmat)[1]
   for ( i in 1:nItems ){
     mem[[i]] <- c(i)
   }


   for ( m in 1:nItems ){
     for ( n in 1:nItems ){
       ## && is the scalar (short-circuit) "and", which is what if() wants
       if (m != n && (distmat[m,n] <= radius)){
         ##item is within radius, so add to collection m
         mem[[m]] <- sort(c( mem[[m]],n))
       }
     }
   }

   return(mem)
}
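
A more R-like way to build the same list might look like the sketch
below.  It is only an illustration, not a tested replacement: the name
getUDClustersVec is made up here, and it assumes distmat is a full
square matrix (for example as.matrix(dist(x))), not a bare "dist"
object.

```r
## Sketch: build each cluster with which() instead of a double loop.
## getUDClustersVec is a hypothetical name; distmat is assumed to be
## a full square numeric matrix.
getUDClustersVec <- function(distmat, radius) {
  distmat <- as.matrix(distmat)
  lapply(seq_len(nrow(distmat)), function(m) {
    ## indices of all items within radius of item m
    near <- which(distmat[m, ] <= radius)
    ## always include m itself; unique() also drops the names
    ## that which() picks up from the matrix dimnames
    sort(unique(c(m, near)))
  })
}

## Small usage example with made-up points on a line
x <- c(0, 1, 2, 10)
res <- getUDClustersVec(as.matrix(dist(x)), radius = 1.5)
```

Each list element is filled in one step, so the vector is not re-sorted
every time a member is added.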


That generates the list, like this:

[[1]]
[1]  1  3  4  5  6  7  8  9 10

[[2]]
[1]  2  3  4 10

[[3]]
[1]  1  2  3  4  5  6  7  8 10

[[4]]
[1]  1  2  3  4 10

[[5]]
[1]  1  3  5  6  7  8  9 10

[[6]]
[1]  1  3  5  6  7  8  9 10

[[7]]
[1]  1  3  5  6  7  8  9 10

[[8]]
[1]  1  3  5  6  7  8  9 10

[[9]]
[1]  1  5  6  7  8  9 10

[[10]]
  [1]  1  2  3  4  5  6  7  8  9 10


The next task is to eliminate the redundant elements.  As far as I can 
tell, unique() does not apply to lists, so I have to scan one by one.


   cluslist <- getUDClusters(distmat,radius)

   ##find redundant (same) clusters
   redundantCluster <- c()
   for (m in 1:(length(cluslist)-1)) {
     for ( n in (m+1): length(cluslist) ){
       if ( m != n && length(cluslist[[m]]) == length(cluslist[[n]]) ){
         ## same length, so compare element by element; every pair must match
         if ( all(cluslist[[m]] == cluslist[[n]]) ){
           redundantCluster <- c( redundantCluster,n)
         }
       }
     }
   }


   ##make sure they are sorted in reverse order
   if (length(redundantCluster)>0)
     {
       redundantCluster <- unique(sort(redundantCluster, decreasing=TRUE))

       ## remove redundant clusters (must do in reverse order to preserve
       ## index of cluslist)
       for (i in redundantCluster) cluslist[[i]] <- NULL
     }


Question: am I deleting the list elements properly?

I do not find explicit documentation for R on how to remove elements 
from lists, but trial and error tells me

myList[[5]] <- NULL

will remove the 5th element and then "close up" the hole caused by 
deletion of that element.  That shuffles the index values, so I have to 
be careful in dropping elements: I must work from the back of the list 
to the front.
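
For comparison, here is a small sketch of two ways to drop several
elements in one step, which avoids the back-to-front bookkeeping.  The
list contents are made up for illustration.

```r
myList <- list("a", "b", "c", "d", "e")

## Negative indices return a copy without elements 2 and 5
shorter <- myList[-c(2, 5)]

## Assigning NULL through single brackets deletes both positions
## at once, so the index shift never comes into play
myList[c(2, 5)] <- NULL
```

One caveat with the negative-index form: if the vector of positions is
empty, myList[-integer(0)] selects nothing and returns an empty list, so
that case needs a guard.  The NULL assignment leaves the list unchanged
when the index vector is empty.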


Is there an easier or faster way to remove the redundant clusters?
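
One possible shortcut, sketched under the assumption of a reasonably
current version of R, where unique() and duplicated() accept lists and
compare whole elements.  The toy cluslist below is made up.

```r
## Toy list with repeated clusters (hypothetical data)
cluslist <- list(c(1, 3, 5), c(2, 4), c(1, 3, 5), c(2, 4), c(1, 5))

## unique() keeps the first copy of each distinct element
uniqueClusters <- unique(cluslist)

## duplicated() flags the later copies, if their positions are wanted
redundantCluster <- which(duplicated(cluslist))
```

Because each cluster is already sorted, two clusters with the same
members are stored identically and are treated as duplicates.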


Now, the next question.  After eliminating the redundant sets from the 
list, I need to calculate the total number of items present in the whole 
list, figure how many are in each subset--each list item--and do some 
calculations.

I expected this would iterate over the members of the list--one step for 
each subcollection

for (i in cluslist){

}

but it does not.  It iterates over the items within the subsets of the 
list "cluslist."  I mean, if cluslist has 5 sets, each with 10 elements, 
this for loop takes 50 steps, one for each individual item.

I find this does what I want

for (i in 1:length(cluslist))

But I found out the hard way :)
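
A small check of both loop styles may be useful here.  In the versions
of R I am aware of, for() over a list takes one step per top-level
element, so a 50-step loop might mean the list had been flattened
somewhere (for example by unlist()); that is a guess, not something the
code above shows.  The toy data are made up.

```r
cluslist <- list(c(1, 3, 5), c(2, 4), c(1, 5, 6, 7))

## seq_along() behaves like 1:length(), but gives zero steps
## for an empty list instead of looping over c(1, 0)
steps <- 0
for (i in seq_along(cluslist)) {
  steps <- steps + 1
}

## Looping over the list itself: count the steps to see what happens
directSteps <- 0
for (cl in cluslist) {
  directSteps <- directSteps + 1
}

## Size of each subset, total membership, and distinct items overall
sizes <- sapply(cluslist, length)
totalItems <- sum(sizes)
distinctItems <- length(unique(unlist(cluslist)))
```

sapply() with length gives the per-subset counts without an explicit
loop.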


Oh, one more quirk that fooled me.  Why does unique() applied to a 
distance matrix throw away the 0's????  I think that's really bad!

 > x <- rnorm(5)
 > myDist <- dist(x,diag=T,upper=T)
 > myDist
           1         2         3         4         5
1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
3 1.6658710 0.3728735 0.0000000 0.9989292 1.1163793
4 2.6648003 1.3718027 0.9989292 0.0000000 2.1153085
5 0.5494918 0.7435058 1.1163793 2.1153085 0.0000000
 > unique(myDist)
  [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
  [8] 0.9989292 1.1163793 2.1153085
 >
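
A likely explanation, as a sketch: a "dist" object stores only the
n*(n-1)/2 below-diagonal distances as a plain vector, and diag=T,
upper=T only change how it is printed.  The zeros are therefore never in
the data that unique() sees.  Converting with as.matrix() first brings
them back.  The seed below is arbitrary.

```r
set.seed(1)                      # arbitrary seed, for repeatability
x <- rnorm(5)
myDist <- dist(x, diag = TRUE, upper = TRUE)

## Only the 10 pairwise distances are stored for n = 5
nStored <- length(myDist)

## A full matrix has the diagonal zeros; as.vector() is needed
## because unique() on a matrix removes duplicate rows instead
fullMat <- as.matrix(myDist)
uniqueVals <- unique(as.vector(fullMat))
```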

-- 
Paul E. Johnson                       email: pauljohn at ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700



