How to use this package (and why)

Introduction

The CFAcoop package equips you with functions to analyze data from the clonogenic assay (also called ‘Colony Formation Assay’) in presence or absence of cellular cooperation. Thus, this package allows you to robustly extract clonogenic survival information for your cell lines under a given treatment.

This vignette is meant to enable you to process your data following the method first presented in Brix et al., Radiation Oncology, 2020: “The clonogenic assay: robustness of plating efficiency-based analysis is strongly compromised by cellular cooperation”. Therefore, the data presented in Figure 2 in Brix et al., which is provided within the package, is used to illustrate how to use the CFAcoop package. Further, it is shown, how the survival fractions of cooperative cell lines would look like, if cellular cooperation was ignored.

Cellular Cooperation

In order to avoid confusion, cellular cooperation should be defined. Understanding this concept comes easy, when starting from the opposite. Clonogenic growth of non-cooperative cells is independent of cell density. Thus, there is a constant relationship between cells seeded \(S\) and colonies counted \(C\): \[C = a \cdot S,\] where \(a\) corresponds just to the conventional plating efficiency (\(PE = \frac{C}{S}\)).

Now, cellular cooperation refers to the benefit cells can have from their surrounding neighbors which results in a non-constant relation of cells seeded and colonies counted (For details, see Brix et al., Radiation Oncology, 2020). The probability of clonogenic growth for a single cell increases with the number of surrounding cells to cooperate with. It has turned out that generalizing the equation above by a parameter \(b\) adequately models the colonies counted of cooperative and non-cooperative cell lines: \[C = a \cdot S^b.\] In this model, \(b = 1\) gives the non-cooperative case and \(b > 1\) corresponds to cooperative growth.

In short, a cell line is called cooperative, if \(b > 1\).

Clonogenic Survival

Conventionally, clonogenic survival at a given treatment \(x\) was determined as the ratio of colonies counted \(C_x\) and the cells seeded \(S_x\) scaled to the plating efficiency of a reference \(PE_0 = \frac{C_0}{S_0}\) \[SF'_x = \frac{\frac{C_x}{S_x}}{PE_0}.\]

The new method now shifts the focus of the survival fraction directly to the number of cells needed to be seeded under the two conditions (treated and untreated) in order to achieve an identical expectation of the number of colonies formed \(C\). Essentially, the new method does not focus on the number of colonies formed after growth in different cell densities, but on the number of seeded single cells with clonogenic potential before growing to identical colony numbers.

\[SF_x(C) = \frac{S_0(C)}{S_x(C)} = exp\left( \frac{log\left(\frac{C}{a_0}\right)}{b_0} - \frac{log\left(\frac{C}{a_x}\right)}{b_x}\right)\]

Obviously, for \(b_x = b_0 = 1\) the equivalence \(SF_x(C) \equiv SF'\) holds for all \(C\), and thus, the non-cooperative case is well covered by the new method. Importantly, the conventional determination of clonogenic survival is heavily compromised by cellular cooperation, if present.

Getting Started

The data as presented in Figure 2 in Brix et al. is included in the package in form of a `data.frame` CFAdata. It can be loaded and summarized by:

library(CFAcoop)
data("CFAdata")
summary(CFAdata)
#>      cell.line   Cells seeded           0 Gy             1 Gy       
#>  T47D     :54   Min.   :   14.68   Min.   :  0.00   Min.   :  0.00  
#>  MDA-MB231:84   1st Qu.:  146.78   1st Qu.:  8.00   1st Qu.: 10.00  
#>  A549     :72   Median : 1000.00   Median : 35.50   Median : 35.00  
#>  HCC1806  :48   Mean   : 3980.14   Mean   : 79.55   Mean   : 83.97  
#>  SKBR3    :80   3rd Qu.: 4641.59   3rd Qu.:114.00   3rd Qu.:125.50  
#>  SKLU1    :56   Max.   :31622.78   Max.   :535.00   Max.   :510.00  
#>  BT20     :60                      NA's   :214      NA's   :214     
#>       2 Gy             4 Gy             6 Gy             8 Gy       
#>  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
#>  1st Qu.: 10.00   1st Qu.:  7.50   1st Qu.:  4.00   1st Qu.:  1.00  
#>  Median : 35.00   Median : 29.00   Median : 19.00   Median :  7.00  
#>  Mean   : 85.31   Mean   : 70.01   Mean   : 56.17   Mean   : 30.21  
#>  3rd Qu.:119.00   3rd Qu.:105.00   3rd Qu.: 73.25   3rd Qu.: 30.00  
#>  Max.   :480.00   Max.   :400.00   Max.   :450.00   Max.   :250.00  
#>  NA's   :219      NA's   :215      NA's   :214      NA's   :213

Fast Analysis and Plotting of Results

The shortcut to analyze data, is using the wrapper function `analyze_survival(RD, name, xtreat)` where RD is a `data.frame` or `matrix` containing your numbers of seeded cells (first column) and numbers of colonies counted under the treatments (numeric argument, e.g. the dose applied `xtreat = c(0,1,2,4,6,8)`).

The returned objects should be concatenated in a list-object and can be plotted by `plot_sf()`.

data("CFAdata")
data1 <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "T47D")
data2 <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
SF <- vector("list", 2)
SF[[1]] <- analyze_survival(
  RD = data1[, c("Cells seeded","0 Gy","1 Gy","2 Gy","4 Gy","6 Gy","8 Gy")], 
  name = as.character(data1[1,1]), 
  xtreat = c(0, 1, 2, 4, 6, 8), 
  C = 20)
SF[[2]] <- analyze_survival(
  RD = data2[,-1], 
  name = as.character(data2[1,1]), 
  xtreat = c(0, 1, 2, 4, 6, 8))
plot_sf(SF = SF)

Raw data from single replicates is plotted as ‘\(+\)’-symbols. Corresponding regression lines are calculated using the mean values of replicates of identical numbers of cells seeded, which are plotted as dots. The color indicates the treatment (irradiation with 0 to 8 Gy) and links the numbers of colonies counted from the upper plot to the calculated clonogenic survival in the lower plot. Shaded areas indicate the span from \(C = 5\) to \(C = 100\), which is within the target region of good experimental practice. Dashed lines show regression lines with a slope of \(b=1\) (at \(log10(a)\) varying from 0 to 4) for orientation, so that any substantially non-linear relation (i.e. \(b \neq 1\)) between the number of colonies counted (\(C\)) and the number of cells seeded (\(S\)) can be spotted easily.
The dots in the treatment response curves correspond to the survival fractions at \(C = 20\) with error bars indicating the uncertainty of the estimated survival fractions in terms of its standard deviation. This uncertainty is calculated via First-Order-Taylor-Series-Approximation of \(SF_x(C)\).

All information used for plotting is contained in the objects returned by `analyze_survival`.

A `data.frame` with a summary of the estimated survival fractions can be generated by

summary_df <- export_sf(SF)

to export this `data.frame` in a csv-File, execute: ` write.csv(x = summary_df,file = "CFAcoopResult.csv") `

The `data.frame` includes the following columns

colnames(summary_df)
#>  [1] "cell.line"   "treatment"   "ln.a"        "sd.ln.a"     "b"          
#>  [6] "sd.b"        "r.ln.a.b"    "SF"          "sd.SF"       "lb.SF"      
#> [11] "ub.SF"       "log10.SF"    "sd.log10.SF" "lb.log10.SF" "ub.log10.SF"
head(summary_df)
#>   cell.line treatment   ln.a sd.ln.a     b  sd.b r.ln.a.b     SF  sd.SF  lb.SF
#> 1      T47D         0 -1.040   0.108 1.024 0.020   -0.983 1.0000     NA     NA
#> 2      T47D         1 -1.277   0.094 1.015 0.016   -0.986 0.7630 0.0340 0.6992
#> 3      T47D         2 -1.984   0.163 1.055 0.026   -0.987 0.4595 0.0251 0.4129
#> 4      T47D         4 -2.375   0.377 0.960 0.054   -0.990 0.1914 0.0189 0.1578
#> 5      T47D         6 -4.247   0.359 0.996 0.044   -0.993 0.0358 0.0023 0.0315
#> 6      T47D         8 -5.460   0.873 0.882 0.098   -0.994 0.0035 0.0005 0.0027
#>    ub.SF log10.SF sd.log10.SF lb.log10.SF ub.log10.SF
#> 1     NA   0.0000          NA          NA          NA
#> 2 0.8325  -0.1175      0.0193     -0.1554     -0.0796
#> 3 0.5114  -0.3377      0.0237     -0.3842     -0.2913
#> 4 0.2322  -0.7180      0.0428     -0.8020     -0.6341
#> 5 0.0407  -1.4460      0.0285     -1.5018     -1.3902
#> 6 0.0047  -2.4526      0.0622     -2.5746     -2.3307
summary(summary_df)
#>   cell.line           treatment        ln.a            sd.ln.a      
#>  Length:12          Min.   :0.0   Min.   :-16.049   Min.   :0.0940  
#>  Class :character   1st Qu.:1.0   1st Qu.:-12.333   1st Qu.:0.3100  
#>  Mode  :character   Median :3.0   Median : -7.246   Median :0.8135  
#>                     Mean   :3.5   Mean   : -7.676   Mean   :0.7391  
#>                     3rd Qu.:6.0   3rd Qu.: -2.277   3rd Qu.:1.0542  
#>                     Max.   :8.0   Max.   : -1.040   Max.   :1.7440  
#>                                                                     
#>        b              sd.b            r.ln.a.b             SF        
#>  Min.   :0.882   Min.   :0.01600   Min.   :-0.9940   Min.   :0.0035  
#>  1st Qu.:1.010   1st Qu.:0.03950   1st Qu.:-0.9915   1st Qu.:0.1717  
#>  Median :1.407   Median :0.10450   Median :-0.9900   Median :0.4451  
#>  Mean   :1.502   Mean   :0.09942   Mean   :-0.9891   Mean   :0.4708  
#>  3rd Qu.:2.069   3rd Qu.:0.14800   3rd Qu.:-0.9868   3rd Qu.:0.7641  
#>  Max.   :2.155   Max.   :0.21000   Max.   :-0.9830   Max.   :1.0000  
#>                                                                      
#>      sd.SF             lb.SF            ub.SF           log10.SF      
#>  Min.   :0.00050   Min.   :0.0027   Min.   :0.0047   Min.   :-2.4526  
#>  1st Qu.:0.01792   1st Qu.:0.1016   1st Qu.:0.1727   1st Qu.:-0.7757  
#>  Median :0.02415   Median :0.2696   Median :0.3951   Median :-0.3517  
#>  Mean   :0.03517   Mean   :0.3030   Mean   :0.4420   Mean   :-0.6104  
#>  3rd Qu.:0.04188   3rd Qu.:0.5000   3rd Qu.:0.7431   3rd Qu.:-0.1169  
#>  Max.   :0.11330   Max.   :0.6992   Max.   :1.0249   Max.   : 0.0000  
#>  NA's   :2         NA's   :2        NA's   :2                         
#>   sd.log10.SF       lb.log10.SF       ub.log10.SF     
#>  Min.   :0.01930   Min.   :-2.5746   Min.   :-2.3307  
#>  1st Qu.:0.03207   1st Qu.:-1.0118   1st Qu.:-0.7703  
#>  Median :0.04445   Median :-0.5906   Median :-0.4231  
#>  Mean   :0.04452   Mean   :-0.8198   Mean   :-0.6452  
#>  3rd Qu.:0.05860   3rd Qu.:-0.3034   3rd Qu.:-0.1361  
#>  Max.   :0.06790   Max.   :-0.1554   Max.   : 0.0107  
#>  NA's   :2         NA's   :2         NA's   :2

All information of this `data.frame` is also accessible directly in the object returned by `analyze_survival`. For instance, the information about the regression of the 0 Gy reference of the cell line BT20 or the survival fractions of the 5 treatments for T47D (at \(C = 20\)) can be recalled by:

SF[[2]]$fit[1]
#> [[1]]
#> 
#> Call:
#> lm(formula = "lnC ~ 1 + lnS", data = x)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.60869 -0.30974  0.07812  0.28621  0.59499 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -9.0318     0.7538  -11.98 2.17e-06 ***
#> lnS           1.7586     0.1173   14.99 3.86e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4088 on 8 degrees of freedom
#> Multiple R-squared:  0.9656, Adjusted R-squared:  0.9613 
#> F-statistic: 224.8 on 1 and 8 DF,  p-value: 3.865e-07
SF[[1]]$SF
#> [1] 0.76295935 0.45950014 0.19141889 0.03581058 0.00352676

Details for Focussed Analysis

Assess Cellular Cooperation

Key to the robust analysis of clonogenic analysis data is the modeling of the cellular cooperation. We assume that the underlying functional dependency of seeded cells and counted colonies is of the form \[C = a \cdot S^{b},\] where \(b\) indicates the degree of cellular cooperation (\(b = 1\) is implicitly assumed for the PE-based approach).

The coefficient \(b\) is estimated in a linear regression model \[log(C) = log(a) + b \cdot log(S) + \varepsilon, \varepsilon \sim \mathcal{N}(0,\sigma^2). \]

The function `pwr_reg(seede, counted)` provides this regression and returns a `summary.lm` object.

Note that the analysis of cellular cooperation is restricted to the range of seeded cells, where at least one colony was observed. Outside this range, the attempt of studying clonogenic survival based on no observed colony counts is not reasonable and thus, `pwr_reg` will remove those data points from analysis. Thus, it is strongly recommended to use the averaged data for regression. By doing so, the range of the independent variable of the regression is widened. (Removing only those replicates with no colonies at one or few specific cell densities would bias the model fitting.)

data <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
par_0 <- pwr_reg(seeded = data$`Cells seeded`, counted = data$`0 Gy`)
par_0$coefficients
#>              Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept) -9.031773  0.7538225 -11.98130 2.169497e-06
#> lnS          1.758593  0.1172839  14.99433 3.864744e-07
plot(x = log10(data$`Cells seeded`), y = log10(data$`0 Gy`),xlim = c(2,3.5))
abline(a = log10(exp(1)) * par_0$coefficients[1, 1], b = par_0$coefficients[2, 1])

With the results of this function, we can also test for cellular cooperation. Note, that the p-value in the coefficients table corresponds to the null hypothesis \(b = 0\), but we are interested in the null hypothesis of \(b = 1\). Thus, we find our p-value of interest by computing

p_value <-
  (1 - pt(
    q = (par_0$coefficients[2, 1] - 1) / par_0$coefficients[2, 2],
    df = par_0$df[2]
  ))

Thus, BT20 is higly cooperative (\(b = 1.76\), \(\hat{\sigma}_b = 0.12\), \(p < 0.001\)).

Determine Clonogenic Survival Fractions

In this package, the survival fraction \(SF(C)\) for clonogenic survival is calculated as the number of cells that need to be seeded without treatment divided by the number of cells needed to be seeded with treatment for obtaining the same expectation of colonies counted \(C\). \[ SF(C) = \frac{S_0(C)}{S_x(C)} = exp\left( \frac{log\left(\frac{C}{a_0}\right)}{b_0} - \frac{log\left(\frac{C}{a_x}\right)}{b_x} \right)\]

Given two parameter sets of clonogenic assay data, the clonogenic survival can be calculated as:

data <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
par_0 <- pwr_reg(seeded = data$`Cells seeded`, counted = data$`0 Gy`)
par_4 <- pwr_reg(seeded = data$`Cells seeded`, counted = data$`4 Gy`)
calculate_sf(par_ref = par_0, par_treat = par_4, C = 20)
#>        20 
#> 0.4308404

Remark on Ignoring Cellular Cooperation

Note that in case of cooperative cell lines, the parameter \(a\) does not correspond to a plating efficiency as for non-cooperative cell lines. The concept of a characteristic plating efficiency does not apply to cooperative cell lines.

Determine Uncertainty of Survival Fractions

The survival fraction at \(C\) for treatment \(x\) is calculated by the function \[SF_x(C) = \frac{S_0(C)}{S_x(C)} = exp\left( \frac{log\left(\frac{C}{a_0}\right)}{b_0} - \frac{log\left(\frac{C}{a_x}\right)}{b_x}\right).\]

Since the SF-values are solely dependent on the estimated parameters in the power regression (and the chosen \(C\)), the inherent uncertainty can be assessed via parametric bootstrapping (e.g. using the package mvtnorm to generate parameter sets according to the variance-covariance matrix of the fit), or by following the laws of error propagation (First-Order Taylor-Series Approximation).

We choose the analytic approximation. In order to build meaningful uncertainty intervals (i.e. respect that survival fractions will never be below zero), we work on the log-scale and transform the boundaries to the linear scale at the end.

For the sake of a shorter notation, we write: \[g = log(SF_x(C)) = \frac{d-\alpha_0}{b_0} - \frac{d-\alpha_x}{b_x} \]

According to \(\Sigma_g \approx J \Sigma_pJ^T\), where \(J\) denotes the Jacobian of \(g\) and \(\Sigma_p\) the variance-covariance matrix of the estimated parameters \(\alpha = log(a)\) and \(b\) at \(0\) and \(x\) Gy, we find \[\sigma_g^2 \approx \frac{1}{b_0^2} z_0 \Sigma_0 z_0^T + \frac{1}{b_x^2} z_x \Sigma_x z_x^T \] with \[ z_x = \left(\begin{matrix}1 & \frac{d-\alpha_x}{b_x}\end{matrix}\right) \] and \[\Sigma_x = \left(\begin{matrix} \sigma_{\alpha_x}^2 & \sigma_{\alpha_x b_x}\\ \sigma_{\alpha_x b_x} & \sigma_{b_x}^2 \end{matrix} \right).\]

What’s the problem with PE-based analysis?

In short: The plating efficiency frequently is not as constant as it needs to be in order to serve as an adequate normalization factor.

To illustrate this, we compare the PE-based calculated \(SF'\)-values with the \(SF(C = 20)\)-values calculated with the new method for (1) the non-cooperative cell line T47D and (2) the cooperative cell line (BT20).

(1) The non-cooperative case

For calculating the survival fraction, plating efficiencies are required. Plating efficiencies (\(C/S\)) are calculated easily as:

data(CFAdata)
data <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "T47D")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
PE_x <- data$`4 Gy` / data$`Cells seeded`
PE_0 <- data$`0 Gy` / data$`Cells seeded`
plot(x = rep(c(0, 1), each = 18), y = c(PE_0, PE_x), lty = 0, ylim = c(0,0.5),xlim = c(-0.1,1.1),
     xlab = "treatment", ylab = "C / S", axes = FALSE, main = "T47D")
axis(side = 1,at = c(0,1),labels = c("0 Gy","4 Gy"))
axis(side = 2,at = seq(0,0.5,0.1))

Now, which values are to be compared?

When there is no effect of the cell density, as assumed by the conventional approach (not taking cellular cooperation into account), each combination (of a \(C_{0 Gy}/S_{0 Gy}\)- and \(C_{4 Gy}/S_{4 Gy}\)-value) is equally reliable.

Thus, the full set of all combinations is:

SF_resample <- rep(PE_x, each = length(PE_0)) / rep(PE_0, times = length(PE_x))
hist(SF_resample, breaks = 25,xlim = c(0.12,0.25),xlab = "(C_4/S_4) / (C_0/S_0)",
     main = "valid PE-based SF'-values")

Without the assessment of cellular cooperation, conventional calculation of an \(SF'\)-value, corresponds to picking randomly a sample from the distribution shown in the histogram above.

A comparison of the range of this distribution and the calculated uncertainties of the new method shows, that for non-cooperative cell lines such as T47D, there is no big difference in this variability/uncertainty.

range(SF_resample,na.rm = TRUE)
#> [1] 0.1322039 0.2383178
as_nc_0 <- analyze_survival(RD = data[,-1],C = 20)
as_nc_0$uncertainty[4,c(3,5,6)]
#>          SF     lb.SF     ub.SF
#> 4 0.1914189 0.1577691 0.2322457

(2) The cooperative case

Now, making the same comparison as in (1) for the cooperative cell line BT20 shows the disastrous effect of ignoring the coefficient \(b\), when it is in fact different from \(1\).

data(CFAdata)
data <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
PE_x <- data$`4 Gy` / data$`Cells seeded`
PE_0 <- data$`0 Gy` / data$`Cells seeded`
plot(x = rep(c(0, 1), each = length(PE_x)), 
     y = c(PE_0, PE_x), lty = 0, ylim = c(0,0.08),xlim = c(-0.1,1.1),
     xlab = "treatment", ylab = "C / S", axes = FALSE, main = "BT20")
axis(side = 1,at = c(0,1),labels = c("0 Gy","4 Gy"))
axis(side = 2,at = seq(0,0.08,0.02),las = 1)

SF_resample <- rep(PE_x, each = length(PE_0)) / rep(PE_0, times = length(PE_x))
hist(SF_resample, breaks = 100,xlim = c(0,10),xlab = "(C_4/S_4) / (C_0/S_0)",
     main = "valid PE-based SF'-values")

The range of the PE-based \(SF'\)-values does not correspond to the uncertainty of the \(SF(20)\)-values:

range(SF_resample,na.rm = TRUE)
#> [1] 0.02511682 8.53525274
as_c_4 <- analyze_survival(RD = data[,-1],C = 20)
as_c_4$uncertainty[4,c(3,5,6)]
#>          SF     lb.SF     ub.SF
#> 4 0.4308404 0.3518497 0.5275645

Even though the survival fraction can be accurately estimated under consideration of cellular cooperation the PE-based approach fails in returning a trustworthy estimate of the fraction of cells losing their potential due to the treatment (see histogram).

In particular, the average of PE-based \(SF'\) calculations does not asymptotically tend to a meaningful value. In case of strong cellular cooperation, the PE-based calculated \(SF'\)-value is heavily affected by this cellular cooperation and the treatment effect of interest is degraded to a side effect.

Conclusion from (1) and (2)

Before calculating PE-based survival fractions, one must check whether there is cellular cooperation or not.

Essentially, to decide, whether you have a cooperativity issue or not, you need to conduct the same analysis and to generate the same data that is necessary to solve this issue anyways.