CINmetrics

Vishal H. Oza

2-Dec-2020

CINmetrics

The goal of the CINmetrics package is to provide different methods of calculating chromosomal instability (CIN) metrics from the literature that can be applied to any cancer data set, including The Cancer Genome Atlas (TCGA).

library(CINmetrics)

The dataset provided with the CINmetrics package is masked copy number variation data for breast cancer (BRCA) from 10 unique samples selected randomly from TCGA.

dim(maskCNV_BRCA)
#> [1] 1650    7

Alternatively, you can download the entire dataset from TCGA using the TCGAbiolinks package:

## Not run:
#library(TCGAbiolinks)
#query.maskCNV.hg38.BRCA <- GDCquery(project = "TCGA-BRCA",
#              data.category = "Copy Number Variation",
#              data.type = "Masked Copy Number Segment", legacy=FALSE)
#GDCdownload(query = query.maskCNV.hg38.BRCA)
#maskCNV.BRCA <- GDCprepare(query = query.maskCNV.hg38.BRCA, summarizedExperiment = FALSE)
#maskCNV.BRCA <- data.frame(maskCNV.BRCA, stringsAsFactors = FALSE)
#tai.test <- tai(cnvData = maskCNV.BRCA)
## End(Not run)

Total Aberration Index

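tai calculates the Total Aberration Index (TAI; Baumbusch LO, et al.), “a measure of the abundance of genomic size of copy number changes in a tumour”. It is defined as the length-weighted average of the absolute segment means (\(|\bar{y}_{S_i}|\)), where the weight \(d_i\) is the length of segment \(i\) and only segments in an aberrant copy number state are included.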

Biologically, it can also be interpreted as the absolute deviation from the normal copy number state averaged over all genomic locations.

\[ Total\ Aberration\ Index = \frac {\sum^{R}_{i = 1} {d_i} \cdot |{\bar{y}_{S_i}}|} {\sum^{R}_{i = 1} {d_i}}\ \ where\ |\bar{y}_{S_i}| \ge |\log_2 1.7| \]

tai.test <- tai(cnvData = maskCNV_BRCA)
head(tai.test)
#>                      sample_id       tai
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 0.4574789
#> 2 TCGA-E2-A153-11A-31D-A12A-01 1.4916264
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 0.9886191
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 0.4944296
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 0.3531782
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 0.3706400
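
To make the weighting in the TAI formula concrete, here is a minimal base R sketch on a made-up set of segments. The data frame, its column names (`start`, `end`, `segment_mean`), and the choice of `end - start + 1` as the segment length are assumptions for illustration. The cutoff is the one written in the formula above, and the packaged `tai` function may use different defaults.

```r
# Toy segments for one hypothetical sample (column names are illustrative)
segs <- data.frame(
  start        = c(1, 1001, 3001, 6001),
  end          = c(1000, 3000, 6000, 10000),
  segment_mean = c(0.05, 0.9, -1.2, 0.1)
)
segs$length <- segs$end - segs$start + 1   # d_i (assumed definition of length)

threshold <- abs(log2(1.7))                # cutoff as written in the formula
aberrant  <- segs[abs(segs$segment_mean) >= threshold, ]

# Length-weighted average of |segment mean| over the aberrant segments
tai_toy <- sum(aberrant$length * abs(aberrant$segment_mean)) / sum(aberrant$length)
tai_toy
```

Only the second and fourth-longest segments (means 0.9 and -1.2) pass the cutoff here, so the result is (2000 × 0.9 + 3000 × 1.2) / 5000 = 1.08.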

Modified Total Aberration Index

taiModified calculates a modified Total Aberration Index that uses all segments of a sample rather than only those in an aberrant copy number state, and that does not take absolute values, so the score retains the directionality (gain versus loss) of the copy number changes.

\[ Modified\ Total\ Aberration\ Index = \frac {\sum^{R}_{i = 1} {d_i} \cdot {\bar{y}_{S_i}}} {\sum^{R}_{i = 1} {d_i}} \]

modified.tai.test <- taiModified(cnvData = maskCNV_BRCA)
head(modified.tai.test)
#>                      sample_id modified_tai
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01  0.014579640
#> 2 TCGA-E2-A153-11A-31D-A12A-01  0.012139011
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01  0.015385256
#> 4 TCGA-A2-A0YD-01A-11D-A107-01  0.006692841
#> 5 TCGA-BH-A0BR-01A-21D-A111-01  0.004983911
#> 6 TCGA-D8-A27T-01A-11D-A16C-01  0.014940306
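
The effect of keeping the sign can be seen on a small made-up example. This is a sketch under the same assumptions as any toy data: the column names and the `end - start + 1` length are hypothetical and are not taken from the package.

```r
# Toy segments for one hypothetical sample (column names are illustrative)
segs <- data.frame(
  start        = c(1, 1001, 3001, 6001),
  end          = c(1000, 3000, 6000, 10000),
  segment_mean = c(0.05, 0.9, -1.2, 0.1)
)
segs$length <- segs$end - segs$start + 1   # d_i (assumed definition of length)

# All segments contribute, and gains and losses can cancel each other
modified_tai_toy <- sum(segs$length * segs$segment_mean) / sum(segs$length)
modified_tai_toy
```

Here the large loss (-1.2 over 3000 bases) outweighs the gains, giving (50 + 1800 - 3600 + 400) / 10000 = -0.135. Such cancellation is consistent with the modified scores in the output above being much closer to zero than the TAI values.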

Copy Number Aberration

cna calculates the total number of copy number aberrations (CNA; Davidson JM, et al.), defined as a segment with copy number outside the pre-defined range of 1.7-2.3 (in log2-ratio terms, the range \((\log_2 1.7 -1) \le \bar{y}_{S_i} \le (\log_2 2.3 -1)\)) that is not contiguous with an adjacent independent CNA of identical copy number. For our purposes, we have adapted the range to be \(|\bar{y}_{S_i}| \ge |\log_2 1.7|\), which is only slightly larger than the original.
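
As a quick check of how the copy number range 1.7-2.3 maps onto segment means, the conversion can be evaluated directly. This sketch assumes, as the expressions above do, that a segment mean is the log2 ratio of the observed copy number to the diploid value of 2.

```r
# Copy number bounds of the original range
copy_number <- c(1.7, 2.3)

# log2 ratio relative to two copies; identical to log2(copy_number) - 1
log2_ratio <- log2(copy_number / 2)
round(log2_ratio, 3)
```

The two bounds come out at roughly -0.234 and 0.202 on the segment mean scale.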

This metric is very similar to the number of break points, but it comes with the caveat that adjacent segments need to have a difference in segmentation mean values.

\[ Total\ Copy\ Number\ Aberration = \sum^{R}_{i = 1} n_i \ \ where\ \ \begin{aligned} &|\bar{y}_{S_i}| \ge |\log_2{1.7}|, \\ &|\bar{y}_{S_{i-1}} - \bar{y}_{S_i}| \ge 0.2, \\ &d_i \ge 10 \end{aligned} \]

cna.test <- cna(cnvData = maskCNV_BRCA)
head(cna.test)
#>                      sample_id cna
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01  33
#> 2 TCGA-E2-A153-11A-31D-A12A-01  14
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01   7
#> 4 TCGA-A2-A0YD-01A-11D-A107-01  14
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 212
#> 6 TCGA-D8-A27T-01A-11D-A16C-01  31
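
The three conditions in the CNA formula can be applied literally to a toy set of ordered segments. This is a sketch rather than the package's implementation: the column names are hypothetical, \(d_i\) is read as the segment length as in the other formulas, and the first segment is assumed to count as different from its (nonexistent) predecessor.

```r
# Ordered toy segments from one hypothetical sample and chromosome
segs <- data.frame(
  segment_mean = c(0.05, 0.9, 0.95, -1.2, 0.1, 1.0),
  length       = c(5000, 2000, 1500, 8, 4000, 3000)
)

threshold <- abs(log2(1.7))                       # cutoff as written in the formula

# Segment mean of the preceding segment (NA for the first one)
prev_mean <- c(NA, head(segs$segment_mean, -1))

# Condition 2: differs from the previous segment by at least 0.2
differs <- is.na(prev_mean) | abs(prev_mean - segs$segment_mean) >= 0.2

# All three conditions together
is_cna <- abs(segs$segment_mean) >= threshold & differs & segs$length >= 10
cna_toy <- sum(is_cna)
cna_toy
```

Two segments qualify. The third is excluded because it is within 0.2 of its neighbour, and the fourth because its length is below 10, which illustrates why this count can be lower than a plain count of aberrant segments.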

Counting Altered Base segments

countingBaseSegments calculates the number of altered bases, defined as the sum of the lengths of segments (\(d_i\)) with an absolute segment mean (\(|\bar{y}_{S_i}|\)) of at least 0.2.

Biologically, this value can be thought to quantify numerical chromosomal instability. This is also a simpler representation of how much of the genome has been altered, and it does not run into the issue of sequencing coverage affecting the fraction of the genome altered.

\[ Number\ of\ Altered\ Bases = \sum^{R}_{i = 1} d_i\ where\ |\bar{y}_{S_i}| \ge 0.2 \]

base.seg.test <- countingBaseSegments(cnvData = maskCNV_BRCA)
head(base.seg.test)
#>                      sample_id base_segments
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01      55853059
#> 2 TCGA-E2-A153-11A-31D-A12A-01        131157
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01         80000
#> 4 TCGA-A2-A0YD-01A-11D-A107-01     271941966
#> 5 TCGA-BH-A0BR-01A-21D-A111-01    1314597331
#> 6 TCGA-D8-A27T-01A-11D-A16C-01     536984944
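
A per-sample version of this sum can be sketched in base R with `tapply`. The sample identifiers, column names, and the `end - start + 1` length are made up for illustration and are not the columns of `maskCNV_BRCA`.

```r
# Toy segments for two hypothetical samples
segs <- data.frame(
  sample_id    = c("S1", "S1", "S1", "S2", "S2"),
  start        = c(1, 1001, 3001, 1, 2001),
  end          = c(1000, 3000, 6000, 2000, 5000),
  segment_mean = c(0.05, 0.9, -1.2, 0.25, -0.1)
)
segs$length <- segs$end - segs$start + 1   # d_i (assumed definition of length)

# Keep segments whose absolute segment mean reaches the 0.2 cutoff
altered <- segs[abs(segs$segment_mean) >= 0.2, ]

# Sum the lengths of the altered segments within each sample
altered_bases <- tapply(altered$length, altered$sample_id, sum)
altered_bases
```

Sample S1 has two altered segments totalling 5000 bases, and S2 has one of 2000 bases.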

Counting Number of Break Points

countingBreakPoints calculates the number of break points, defined as the number of segments (\(n_i\)) with an absolute segment mean of at least 0.2, doubled to account for the 5’ and 3’ break points of each segment.

Biologically, this value can be thought to quantify structural chromosomal instability.

\[ Number\ of \ Break\ Points = \sum^{R}_{i = 1} (n_i \cdot 2)\ where\ |\bar{y}_{S_i}| \ge 0.2 \]

break.points.test <- countingBreakPoints(cnvData = maskCNV_BRCA)
head(break.points.test)
#>                      sample_id break_points
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01          104
#> 2 TCGA-E2-A153-11A-31D-A12A-01           40
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01           22
#> 4 TCGA-A2-A0YD-01A-11D-A107-01           40
#> 5 TCGA-BH-A0BR-01A-21D-A111-01          626
#> 6 TCGA-D8-A27T-01A-11D-A16C-01          102
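
The doubling step is easy to see on toy data. The sketch below uses hypothetical sample identifiers and column names and simply counts qualifying segments per sample before multiplying by two.

```r
# Toy segments for two hypothetical samples
segs <- data.frame(
  sample_id    = c("S1", "S1", "S1", "S2", "S2"),
  segment_mean = c(0.05, 0.9, -1.2, 0.25, -0.1)
)

# Segments whose absolute segment mean reaches the 0.2 cutoff
altered <- segs[abs(segs$segment_mean) >= 0.2, ]

# Two break points (5' and 3') per altered segment
break_points <- 2 * tapply(altered$segment_mean, altered$sample_id, length)
break_points
```

S1 has two altered segments and therefore four break points; S2 has one segment and two break points. By construction, the result is always an even number, as in the output above.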

Fraction of Genome Altered

fga calculates the fraction of the genome altered (FGA; Chin SF, et al.), measured by taking the sum of the number of bases altered and dividing it by the genome length covered (\(G\)). Genome length covered was calculated by summing the lengths of each probe on the Affymetrix 6.0 array. This calculation excludes sex chromosomes.

\[ Fraction\ Genome\ Altered = \frac {\sum^{R}_{i = 1} d_i} {G} \ \ where\ |\bar{y}_{S_i}| \ge 0.2 \]

fraction.genome.test <- fga(cnvData = maskCNV_BRCA)
head(fraction.genome.test)
#>                      sample_id          fga
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 1.943930e-02
#> 2 TCGA-E2-A153-11A-31D-A12A-01 4.564835e-05
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 2.784349e-05
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 9.464765e-02
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 4.126128e-01
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 1.868942e-01
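
The FGA is the altered-base count rescaled by a constant, which a toy calculation makes explicit. In this sketch the column names and the value of `genome_length` are hypothetical. In the example output above, dividing `base_segments` by `fga` gives roughly 2.87 billion bases for each sample, which indicates the scale of \(G\) used there.

```r
# Toy segments for one hypothetical sample
segs <- data.frame(
  start        = c(1, 1001, 3001, 6001),
  end          = c(1000, 3000, 6000, 10000),
  segment_mean = c(0.05, 0.9, -1.2, 0.1)
)
segs$length <- segs$end - segs$start + 1   # d_i (assumed definition of length)

genome_length <- 20000                     # hypothetical G for the toy genome

# Bases in segments whose absolute segment mean reaches the 0.2 cutoff
altered_bases <- sum(segs$length[abs(segs$segment_mean) >= 0.2])

fga_toy <- altered_bases / genome_length
fga_toy
```

With 5000 altered bases out of a toy covered length of 20000, the fraction altered is 0.25.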

CINmetrics

CINmetrics calculates tai, cna, the number of altered bases, the number of break points, and the fraction of genome altered, and returns them as a single data frame with one row per sample.

cinmetrics.test <- CINmetrics(cnvData = maskCNV_BRCA)
head(cinmetrics.test)
#>                      sample_id       tai cna base_segments break_points
#> 1 TCGA-A2-A0YD-01A-11D-A107-01 0.4944296  14     271941966           40
#> 2 TCGA-A8-A086-01A-11D-A011-01 0.6721224  70     805881366          214
#> 3 TCGA-AO-A0J5-10A-01D-A037-01 0.8889885  12         41816           34
#> 4 TCGA-AR-A0TV-01A-21D-A087-01 0.5861162 187    1099228749          624
#> 5 TCGA-B6-A0RP-01A-21D-A087-01 0.3184316  41    1291153635          142
#> 6 TCGA-BH-A0BR-01A-21D-A111-01 0.3531782 212    1314597331          626
#>            fga
#> 1 9.464765e-02
#> 2 2.804818e-01
#> 3 1.455379e-05
#> 4 3.825795e-01
#> 5 4.493777e-01
#> 6 4.126128e-01
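
For readers who compute metrics separately, one way to assemble a comparable table is to join the per-metric data frames on `sample_id`. The sketch below uses made-up values and base R's `merge`. It is not a description of how the `CINmetrics` function works internally.

```r
# Per-metric results for two hypothetical samples (values are invented)
tai_df <- data.frame(sample_id = c("S2", "S1"), tai = c(0.8, 1.08))
bp_df  <- data.frame(sample_id = c("S1", "S2"), break_points = c(4, 2))
fga_df <- data.frame(sample_id = c("S1", "S2"), fga = c(0.25, 0.1))

# Join all tables on sample_id; merge() sorts rows by the key by default
combined <- Reduce(
  function(x, y) merge(x, y, by = "sample_id"),
  list(tai_df, bp_df, fga_df)
)
combined
```

Because `merge` sorts on the key, the rows come back ordered by `sample_id` regardless of the order in the input tables.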