[R] wanting to count instances of values in each cell of a series of simulated symmetric matrices of the same size

R. Mark Sharp rmsharp at me.com
Wed Jun 2 03:59:50 CEST 2021


I want to capture the entire distribution of values for each cell in a sequence of symmetric matrices of the same size. The diagonal values are all 0.5 so I need only the values above or below the diagonal. 

A small example with three of the matrices I want to count follows:
       F      G      H      I     J
F 0.6250 0.3750 0.2500 0.1875 0.125
G 0.3750 0.6250 0.2500 0.1875 0.125
H 0.2500 0.2500 0.5000 0.1875 0.125
I 0.1875 0.1875 0.1875 0.5000 0.250
J 0.1250 0.1250 0.1250 0.2500 0.500

       F      G      H      I     J
F 0.5625 0.3125 0.1875 0.1250 0.125
G 0.3125 0.5625 0.1875 0.1250 0.125
H 0.1875 0.1875 0.5000 0.1875 0.125
I 0.1250 0.1250 0.1875 0.5000 0.250
J 0.1250 0.1250 0.1250 0.2500 0.500

        F       G      H       I      J
F 0.50000 0.25000 0.1250 0.09375 0.0625
G 0.25000 0.50000 0.1250 0.09375 0.0625
H 0.12500 0.12500 0.5000 0.18750 0.1250
I 0.09375 0.09375 0.1875 0.50000 0.2500
J 0.06250 0.06250 0.1250 0.25000 0.5000
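
One possible way to lay out the example for counting is sketched below: each matrix is reduced to its upper triangle, and the results are stacked into a matrix with one row per pair and one column per simulated matrix. The names ids, toMatrix, kMats, and pairValues are assumptions made for illustration; only the numbers come from the example above.

```r
# Hypothetical helper and object names; the values are the three example matrices.
ids <- c("F", "G", "H", "I", "J")
toMatrix <- function(x) {
  matrix(x, nrow = 5, byrow = TRUE, dimnames = list(ids, ids))
}
k1 <- toMatrix(c(0.6250, 0.3750, 0.2500, 0.1875, 0.125,
                 0.3750, 0.6250, 0.2500, 0.1875, 0.125,
                 0.2500, 0.2500, 0.5000, 0.1875, 0.125,
                 0.1875, 0.1875, 0.1875, 0.5000, 0.250,
                 0.1250, 0.1250, 0.1250, 0.2500, 0.500))
k2 <- toMatrix(c(0.5625, 0.3125, 0.1875, 0.1250, 0.125,
                 0.3125, 0.5625, 0.1875, 0.1250, 0.125,
                 0.1875, 0.1875, 0.5000, 0.1875, 0.125,
                 0.1250, 0.1250, 0.1875, 0.5000, 0.250,
                 0.1250, 0.1250, 0.1250, 0.2500, 0.500))
k3 <- toMatrix(c(0.50000, 0.25000, 0.1250, 0.09375, 0.0625,
                 0.25000, 0.50000, 0.1250, 0.09375, 0.0625,
                 0.12500, 0.12500, 0.5000, 0.18750, 0.1250,
                 0.09375, 0.09375, 0.1875, 0.50000, 0.2500,
                 0.06250, 0.06250, 0.1250, 0.25000, 0.5000))
kMats <- list(k1, k2, k3)

# Logical mask that is TRUE strictly above the diagonal.
upper <- upper.tri(k1)

# Row and column index of each kept cell, in the same column-major order
# that m[upper] uses, so labels and values line up.
idx <- which(upper, arr.ind = TRUE)
pairNames <- paste(ids[idx[, "row"]], ids[idx[, "col"]], sep = "-")

# One row per pair, one column per simulated matrix.
pairValues <- vapply(kMats, function(m) m[upper], numeric(sum(upper)))
rownames(pairValues) <- pairNames

# The sequence of values seen for the F-G pair across the three matrices.
pairValues["F-G", ]
```

With this layout, each row is exactly the "sequence of values (one from each matrix) in a vector" for one cell.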


To be more specific, I have coded up a solution for a single cell with the sequence of values (one from each matrix) in a vector. 

I used match() below, and it also works on a matrix, but I do not know how to do what is in the if statement with matrices. Because the number of distinct values, and the values themselves, will differ among the various cells, a simple array structure does not seem appropriate, and I assume I will need to use a list. Still, I would like to do as much as I can with matrices for speed and clarity.

#' Counts the number of occurrences of each kinship value seen for a pair of
#' individuals.
#'
#' @examples
#' \donttest{
#' set.seed(20210529)
#' kSamples <- sample(c(0, 0.0675, 0.125, 0.25, 0.5, 0.75), 10000, replace = TRUE,
#'                    prob = c(0.005, 0.3, 0.15, 0.075, 0.0375, 0.01875))
#' kVC <- list(kinshipValues = numeric(0),
#'             kinshipCounts = numeric(0))
#' for (kSample in kSamples) {
#'   kVC <- countKinshipValues(kSample, kVC$kinshipValues, kVC$kinshipCounts)
#' }
#' kVC
#' ## $kinshipValues
#' ## [1] 0.2500 0.1250 0.0675 0.7500 0.5000 0.0000
#' ##
#' ## $kinshipCounts
#' ## [1]  301 2592 5096 1322  592   97
#' }
#'
#' @param kValue numeric value being counted (kinship value in
#' \emph{nprcgenekeepr})
#' @param kinshipValues vector of unique values of \code{kValue} seen
#' thus far.
#' @param kinshipCounts vector of the counts of the unique values of
#' \code{kValue} seen thus far.
#' @return A list with the updated \code{kinshipValues} and
#' \code{kinshipCounts} vectors.
#' @export
countKinshipValues <- function(kValue, kinshipValues = numeric(0),
                              kinshipCounts = numeric(0)) {
  kinshipValue <- match(kValue, kinshipValues, nomatch = -1L)
  if (kinshipValue == -1L) {
    kinshipValues <- c(kinshipValues, kValue)
    kinshipCounts[length(kinshipCounts) + 1] <- 1
  } else {
    kinshipCounts[kinshipValue] <- kinshipCounts[kinshipValue] + 1
  }
  list(kinshipValues = kinshipValues,
       kinshipCounts = kinshipCounts)
}
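
For comparison, here is a minimal sketch of a non-incremental alternative, assuming all values for each cell are available at once as the rows of a matrix. It uses base R's table() for each row and returns a list, so the set of distinct values may differ from cell to cell. The object names pairValues and countsByPair and the stand-in data are hypothetical, not part of the function above.

```r
set.seed(1)  # arbitrary seed, only to make the illustration reproducible

# Stand-in data (an assumption): 3 pairs (rows) by 1000 simulations (columns).
pairValues <- matrix(sample(c(0.0625, 0.125, 0.25), 3000, replace = TRUE),
                     nrow = 3,
                     dimnames = list(c("F-G", "F-H", "G-H"), NULL))

# table() counts every distinct value in a row in one pass. The result is
# ordered by value, not by first appearance as in the incremental function.
countsByPair <- lapply(seq_len(nrow(pairValues)), function(i) {
  tab <- table(pairValues[i, ])
  list(kinshipValues = as.numeric(names(tab)),
       kinshipCounts = as.integer(tab))
})
names(countsByPair) <- rownames(pairValues)

countsByPair[["F-G"]]
```

One caveat: table() groups values by their character representation, so values that differ only beyond about 15 significant digits would be counted together. That should not matter for exact binary fractions such as 0.125 or 0.0625.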

Mark


R. Mark Sharp, Ph.D.
Data Scientist and Biomedical Statistical Consultant
7526 Meadow Green St.
San Antonio, TX 78251
mobile: 210-218-2868
rmsharp using me.com


