expect_equal for comparing numerical values
TransformCatalog in case R was configured and built in a way that did not support long double.
Updated documentation of ReadCatalog and ReadCatalogInternal as there are no ID96 catalogs in COSMIC v3.2.
Changed the URL of COSMIC mutational signatures page to the redirected URL.
Updated some tests for TransformCatalog in case R was configured and built in a way that did not support long double.
Added the argument strict back to ReadCatalog for backward compatibility; strict is now ignored and deprecated.
Made function StandardChromNameNew more robust: it now selects the column that contains chromosome names by name instead of by column index.
Fixed a bug in function CheckSeqContextInVCF.
Fixed a bug in function PlotCatalog.SBS96Catalog when plotting the X axis after setting par(tck = 0).
Changed PlotCatalog to round the mutation counts for each main type for SBS96, SBS192, DBS78 and ID counts catalogs in case the input is a reconstructed counts catalog.
Updated function AdjustNumberOfCores not to emit a message on MS Windows machines.
Added an additional argument ylabels to PlotCatalog and PlotCatalogToPdf. When ylabels = FALSE, don’t plot the y axis labels. Implemented for SBS96Catalog, DBS78Catalog, IndelCatalog.
Enabled arguments grid, upper, xlabels in PlotCatalog and PlotCatalogToPdf for DBS78Catalog, IndelCatalog.
ReadCatalog to import files with:
ReadCatalog function, e.g. ReadCatalog.SBS96Catalog. Now they are in data-raw/obsolete-files/ReadCatalogMethods.R.
ConvCatalogToICAMS to convert SigProfiler/COSMIC-formatted catalog files into ICAMS catalog objects. Now these functions are in data-raw/obsolete-files/ConvCatalogToICAMS.R, and their functionalities are integrated into ReadCatalog.
ReadCatalog to remove rows which have NA in the data table read in. Otherwise the number of rows will not be accurate to infer the correct catalog type.
InferClassOfCatalogForRead to data-raw/obsolete-files/InferClassOfCatalogForRead.R.
CreateOneColDBSMatrix when returning 1-column DBS144 matrix with all values being 0 and the correct row labels.
Added an additional argument tmpdir in function AddRunInformation.
Updated functions CheckAndRemoveDiscardedVariants and MakeDataFrameFromVCF to check for variants that have the same REF and ALT.
Create a new temp directory when generating a zip archive from VCFs to avoid zipping unnecessary files in the output.
Fixed a bug in function AddRunInformation to allow ref.genome to be a Bioconductor package.
Fixed bugs in functions CreateOneColSBSMatrix, CreateOneColDBSMatrix and CreateOneColIDMatrix when the variants in the input VCFs should all be discarded.
Updated function CheckAndFixChrNames to give a warning instead of an error when “23” and “X”, or “24” and “Y”, appear together in the chromosome names of the VCF. CheckAndFixChrNames will change “23” to “X” or “24” to “Y” internally for downstream processing.
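The renaming described in the entry above can be sketched in base R. This is only an illustration under stated assumptions: fix_chr_names is a hypothetical helper, not the ICAMS implementation, and it assumes the conversion is applied whenever “23” or “24” is present, with a warning only when the numeric and letter names coexist.

```r
# Hypothetical helper (not ICAMS code): warn when numeric and letter names
# for the same sex chromosome coexist, then convert "23" -> "X", "24" -> "Y".
fix_chr_names <- function(chr.names) {
  has.both.x <- all(c("23", "X") %in% chr.names)
  has.both.y <- all(c("24", "Y") %in% chr.names)
  if (has.both.x || has.both.y) {
    warning("Numeric and letter names found for the same sex chromosome; ",
            "converting \"23\" to \"X\" and \"24\" to \"Y\"")
  }
  chr.names[chr.names == "23"] <- "X"
  chr.names[chr.names == "24"] <- "Y"
  chr.names
}

# Example input mixing "23" and "X"; the warning is suppressed for the demo.
fixed <- suppressWarnings(fix_chr_names(c("1", "23", "X", "24")))
```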
Changed some code in functions AddTranscript, CreateOneColSBSMatrix and CreateOneColDBSMatrix to use functions from package dplyr instead of data.table due to segfault error.
RemoveRowsWithDuplicatedCHROMAndPOSNew to remove variants that have same CHROM, POS, REF.
files in function VCFsToZipFile.
Fixed a bug in ReadAndSplitVCFs for merging adjacent SBSs into DBS when variant.caller is mutect.
Fixed a bug in CheckAndRemoveDiscardedVariants for removing wrong DBS variants.
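A wrong DBS variant in this changelog is one whose REF and ALT have the same base in the same position (e.g. TA > TT or GT > CT). The check can be sketched in base R as follows; shares_base_at_same_position and the example data frame are hypothetical and are not taken from ICAMS.

```r
# Hypothetical helper (not ICAMS code): TRUE when a 2-base REF and ALT
# share a base at position 1 or at position 2.
shares_base_at_same_position <- function(ref, alt) {
  substr(ref, 1, 1) == substr(alt, 1, 1) |
    substr(ref, 2, 2) == substr(alt, 2, 2)
}

# Made-up DBS calls: the first two are "wrong", the third is a true DBS.
dbs <- data.frame(REF = c("TA", "GT", "AC"),
                  ALT = c("TT", "CT", "GT"),
                  stringsAsFactors = FALSE)

wrong     <- shares_base_at_same_position(dbs$REF, dbs$ALT)
kept      <- dbs[!wrong, ]   # variants passed on for further processing
discarded <- dbs[wrong, ]    # variants set aside as wrong DBSs
```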
CheckAndRemoveDiscardedVariants to remove wrong DBS variants that have same base in the same position in REF and ALT (e.g. TA > TT or GT > CT).
name.of.VCF in function MakeDataFrameFromVCF for better error reporting.
Updated function MakeDataFrameFromVCF for better error reporting when reading in files that are actually not VCFs.
Updated function ReadVCFs to automatically change the number of cores to 1 on Windows instead of throwing an error.
CheckAndFixChrNames for returning the correct number of chromosome names.
stop.on.error and code tryCatch in function VCFsToCatalogs for better tracing if the function stops on error.
Added argument stop.on.error to VCFsToCatalogs; if FALSE, return a list with a single element named error.
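The stop.on.error behavior described above follows a common tryCatch pattern, sketched here in base R. run_with_error_flag is a hypothetical wrapper for illustration only; how VCFsToCatalogs implements this internally is not shown in this changelog.

```r
# Hypothetical wrapper (not ICAMS code) illustrating the stop.on.error pattern.
run_with_error_flag <- function(fun, stop.on.error = TRUE) {
  tryCatch(
    fun(),
    error = function(err) {
      if (stop.on.error) {
        stop(err)  # re-signal the original error to the caller
      }
      # Otherwise return a list with a single element named "error"
      list(error = conditionMessage(err))
    }
  )
}

# With stop.on.error = FALSE the caller gets the message back instead of a stop.
result <- run_with_error_flag(function() stop("not a VCF"),
                              stop.on.error = FALSE)
```

A caller can then test for the element with !is.null(result$error) before using the return value.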
Added new internal function CheckAndFixChrNamesForTransRanges. The chromosome names in exported data TranscriptRanges don’t have “chr”. ICAMS now checks the chromosome name format in the input VCF and updates the trans.ranges chromosome names in function AddTranscript if needed.
Added new argument name.of.VCF in functions AnnotateSBSVCF and AnnotateDBSVCF for better error reporting.
Changed return from ReadCatalog to include possible attribute “error” and allow for not calling stop() on error.
For a stranded catalog, as.catalog and ReadCatalog will silently convert region = “genome” to “transcript”.
Updated function AddTranscript to check whether the format of VCF chromosome names is consistent with that in trans.ranges used.
Removed documentation warnings related to
Some file reorganization.
CreateOneColSBSMatrix for showing message that SBS variant whose reference base in ref.genome does not match the reference base in the VCF file.
Enabled functions PlotCatalog and PlotCatalogToPdf to plot a numeric matrix, numeric data.frame, or a vector denoting the mutation counts.
Added new internal function AdjustNumberOfCores to change the number of cores automatically to 1 if the operating system is Windows.
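The core adjustment can be sketched in base R. The motivation is that forking-based parallelism such as parallel::mclapply does not support more than one core on Windows. adjust_number_of_cores is a hypothetical stand-in, not the ICAMS function, and its os.type argument is an assumption added only so the sketch can be exercised on any platform.

```r
# Hypothetical stand-in (not ICAMS code): fall back to a single core on Windows.
# os.type defaults to the running platform; it is a parameter only for testing.
adjust_number_of_cores <- function(num.of.cores,
                                   os.type = .Platform$OS.type) {
  if (os.type == "windows" && num.of.cores > 1) {
    return(1L)
  }
  num.of.cores
}

cores.on.windows <- adjust_number_of_cores(4, os.type = "windows")
cores.on.unix    <- adjust_number_of_cores(4, os.type = "unix")
```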
Added test processing VCF with unknown variant caller.
Added new internal functions SplitSBSVCF, SplitOneVCF, SplitListOfVCFs, VCFsToZipFileXtra, WriteSBS96CatalogAsTsv, ReadSBS96CatalogFromTsv and GetConsensusVAF.
Added new exported functions ReadAndSplitVCFs, VCFsToCatalogs, VCFsToCatalogsAndPlotToPdf and VCFsToZipFile.
Added new arguments filter.status and get.vaf.function in functions ReadVCF, ReadVCFs, ReadAndSplitVCFs, VCFsToCatalogs, VCFsToCatalogsAndPlotToPdf and VCFsToZipFile.
Added new internal data catalog.row.headers.SBS.96.v1.
Added new argument max.vaf.diff in internal functions SplitOneVCF, SplitListOfVCFs and exported functions ReadAndSplitVCFs, VCFsToCatalogs, VCFsToCatalogsAndPlotToPdf and VCFsToZipFile.
Added new dependency package parallel.
Added new dependency package R.utils for data.table::fread to read gz and bz2 files directly.
Added new argument num.of.cores in internal functions ReadVCFs, SplitListOfVCFs and exported functions ReadAndSplitVCFs, VCFsToCatalogsAndPlotToPdf, VCFsToCatalogs, VCFsToZipFile, VCFsToIDCatalogs, VCFsToSBSCatalogs, VCFsToDBSCatalogs.
Added new argument ... in internal functions ReadVCF, ReadVCFs and exported functions ReadAndSplitVCFs, VCFsToCatalogsAndPlotToPdf, VCFsToCatalogs, VCFsToZipFile.
Added new argument mc.cores in internal function GetConsensusVAF.
MakeDataFrameFromVCF to use data.table::fread instead of read.csv.
MakeDataFrameFromVCF when reading in VCF from URL.
Updated function CreateOneColSBSMatrix to emit a message instead of an error when there are SBS variants whose reference base in ref.genome does not match the reference base in the VCF file.
Updated function MakeVCFDBSdf to inherit column information from the original SBS VCF.
Changed the words in legend for DBS144 plot from “Transcribed”, “Untranscribed” to “Transcribed strand” and “Untranscribed strand”.
Updated the documentation for exported data all.abundance.
Updated function ReadCatalog.COMPOSITECatalog not to convert “::” to “..” in the column names.
Updated various functions in ICAMS to generate catalogs with zero mutation counts from empty vcfs.
Added a section “ID classification” in the documentation for exported data catalog.row.order.
New argument suppress.discarded.variants.warnings in exported function AnnotateIDVCF with default value TRUE.
Added information on another paper in AddRunInformation: “Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types”, Genome Research 2020 https://doi.org/10.1101/gr.255620.119.
Changed the format of DOIs in DESCRIPTION according to CRAN policy.
Changed back the return value of ReadStrelkaIDVCFs, ReadStrelkaSBSVCFs, ReadMutectVCFs to a list of data frames with no variants discarded.
Combined all the discarded variants from ReadAndSplitMutectVCFs and ReadAndSplitStrelkaSBSVCFs under one element discarded.variants in the return value. An extra column discarded.reason was added to show the details.
Updated internal functions ReadVCF and ReadVCFs not to remove any discarded variants.
No more removal of “chr” in the CHROM column when reading in VCFs.
CheckAndReturnSBSMatrix, CheckAndReturnDBSMatrix, CreateOneColSBSMatrix, CreateOneColDBSMatrix, VCFsToSBSCatalogs, VCFsToDBSCatalogs.
CalculateExpressionLevel for the edge case.
CreateOneColIDMatrix when the ID.class contains non canonical representation of the ID mutation type.
The return value of exported function ReadStrelkaIDVCFs now sometimes contains a new element, discarded.variants. This appears when there are variants that were discarded immediately after reading in the VCFs. At present these are variants that have duplicated chromosome/positions and variants that have illegal chromosome names. This means that the user must check the return to see if discarded.variants is present and remove it before passing the return to a function that expects a list of VCFs. Code in ICAMS that takes lists of VCFs already checks for this element and removes it if present.
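The check-and-remove step that the user is asked to perform can be sketched in base R. The list below is a made-up placeholder shaped like the description above, not real ReadStrelkaIDVCFs output; the sample names and columns are assumptions for illustration.

```r
# Placeholder return value (not real ICAMS output): two VCF data frames
# plus the optional discarded.variants element.
vcfs <- list(
  sample1 = data.frame(CHROM = "1", POS = 100L),
  sample2 = data.frame(CHROM = "2", POS = 200L),
  discarded.variants = data.frame(CHROM = "chrUn", POS = 5L)
)

# Check for the element, keep a copy for inspection, then drop it so that
# only VCF data frames remain in the list.
if ("discarded.variants" %in% names(vcfs)) {
  discarded <- vcfs$discarded.variants
  vcfs$discarded.variants <- NULL
}
```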
Added argument return.annotated.vcfs to exported function VCFsToIDCatalogs. The default value for the argument is FALSE to be consistent with other functions.
Argument return.annotated.vcfs in functions VCFsToSBSCatalogs, VCFsToDBSCatalogs, VCFsToIDCatalogs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToZipFile, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToZipFile.
Argument suppress.discarded.variants.warnings in functions ReadAndSplitMutectVCFs, ReadAndSplitStrelkaSBSVCFs, VCFsToSBSCatalogs, VCFsToDBSCatalogs, VCFsToIDCatalogs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToZipFile, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToZipFile.
Added documentation to exported functions ReadAndSplitStrelkaSBSVCFs, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf and StrelkaSBSVCFFilesToZipFile.
Added information on the “ID classification” in documentation of functions generating ID catalogs, FindDelMH and FindMaxRepeatDel.
Minor changes to documentation of functions PlotCatalog, PlotCatalogToPdf, StrelkaSBSVCFFilesToZipFile, StrelkaIDVCFFilesToZipFile and MutectVCFFilesToZipFile.
Updated documentation for the return value of functions StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf, StrelkaIDVCFFilesToZipFile and VCFsToIDCatalogs to make it clearer to the user.
Added new exported data of catalog row order for SBS96, SBS1536 and DBS78 in SigProfiler format to catalog.row.order.sp.
New internal functions ConvertICAMSCatalogToSigProSBS96, ReadVCF, ReadVCFs.
New exported function GetFreebayesVAF for calculating variant allele frequencies from Freebayes VCF.
New test data for Strelka mixed VCF.
Added time zone information to file “run-information.txt” when calling functions MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToZipFile and StrelkaIDVCFFilesToZipFile.
Enabled “counts” -> “counts.signature” catalog transformation when the source catalog has NULL abundance.
Added legend for SBS192 plot and changed the legend text for SBS12 plot.
Added a second element plot.object to the return list from function PlotCatalog for catalog types “SBS192Catalog”, “DBS78Catalog”, “DBS144Catalog” and “IndelCatalog”. The second element is a numeric vector giving the coordinates of the bar midpoints, useful for adding to the graph.
Made the returns from PlotCatalog and PlotCatalogToPdf invisible.
Improved time performance of GetMutectVAF, CanonicalizeDBS, CanonicalizeQUAD.
if statements in GetCustomKmerCounts, GetStrandedKmerCounts and GetGenomeKmerCounts.
CreateOneColIDMatrix when there is NA ID category.
GetMutectVAF to check if the VCF is indeed a Mutect VCF.
CreateOneColDBSMatrix when the VCF does not have any variant in the transcribed region.
CalculatePValues when there is only a single expression value.
Created an internal function MakeDataFrameFromVCF to read in data lines of a VCF.
New argument name.of.VCF in internal function CheckAndFixChrNames to make the error message more informative.
New argument name.of.VCF in exported function AnnotateIDVCF to make the error message more informative.
ReadStrelkaIDVCF to make the error message more informative.
AnnotateIDVCF to a list. The first element annotated.vcf contains the annotated VCF. If there are rows that are discarded, the function will generate a warning and a second element discarded.variants will be included in the returned list.
flag.mismatches deprecated in exported function AnnotateIDVCF. If there are mismatches to references, the function will automatically discard these rows. Users can refer to the element discarded.variants in the return value for the discarded variants.
SplitStrelkaSBSVCF when there are no non.SBS mutations in the input.
MakeDataFrameFromMutectVCF when a Mutect VCF has no meta-information lines.
CreateOneColSBSMatrix when an annotated SBS VCF has variants on transcribed regions that all fall on transcripts on both strands.
CreateOneColDBSMatrix when an annotated DBS VCF has variants on transcribed regions that all fall on transcripts on both strands.
ReadAndSplitStrelkaSBSVCFs.
MutectVCFFilesToZipFile, StrelkaSBSVCFFilesToZipFile and StrelkaIDVCFFilesToZipFile.
trans.ranges to make it optional.
name.of.VCF in internal functions ReadStrelkaSBSVCF, ReadStrelkaIDVCF and exported function GetStrelkaVAF.
flag.mismatches in functions VCFsToIDCatalogs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, MutectVCFFilesToZipFile, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf and StrelkaIDVCFFilesToZipFile.
GetStrelkaVAF and GetMutectVAF to a data frame which contains the VAF and read depth information.
PlotCatalogToPdf a list. The first element is a logical value indicating whether the plot is successful. The second element is a list containing the strand bias statistics (only for SBS192Catalog with “counts” catalog.type and non-NULL abundance and argument plot.SBS12 = TRUE).
PlotCatalog and PlotCatalogToPdf:
For class SBS96Catalog:
(New) Allow setting ylim and cex.
(New) For PlotCatalog (not PlotCatalogToPdf), allow plotting of a 96 x 2 catalog, in which case behavior is a stacked bar chart.
(New) Plot x axis tick marks if xlabels is not TRUE; set par(tck = 0) to suppress.
For class IndelCatalog:
(New) Allow setting ylim.
GetCustomKmerCounts.
PlotTransBiasGeneExpToPdf so that ymax on the plot will be changed based on plot.type.
flat.abundance from “numeric” to “integer”.
TransformCatalog; see documentation for rationale.
TransformCatalog and updated its documentation for parameter target.abundance.
CheckAndFixChrNames and updated the automated tests.
TransformCatalog.
GetMutectVAF and updated the warning message to make it more informative.
cbind to check the attributes of the incoming catalogs and assign attributes accordingly.
TransformCatalog to check the attributes of the catalog to be transformed in the first place.
AnnotateSBSVCF, AnnotateDBSVCF and AnnotateIDVCF.
PlotTransBiasGeneExp and PlotTransBiasGeneExpToPdf.
names.of.VCFs in functions ReadAndSplitMutectVCFs, ReadAndSplitStrelkaSBSVCFs, ReadStrelkaIDVCFs, MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, StrelkaIDVCFFilesToCatalog, StrelkaIDVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToCatalog and StrelkaSBSVCFFilesToCatalogAndPlotToPdf for users to specify the names of samples in the VCF files.
as.catalog.
gene.expression.data.HepG2 and gene.expression.data.MCF10A.
tumor.col.names in functions ReadAndSplitMutectVCFs, MutectVCFFilesToCatalog and MutectVCFFilesToCatalogAndPlotToPdf to specify the column of the VCF that contains sequencing statistics such as sequencing depth; this column is often called “unknown” in Mutect.
MutectVCFFilesToCatalog, MutectVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToCatalog, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, VCFsToSBSCatalogs, VCFsToDBSCatalogs, ReadCatalog informing the user how to change attributes of the generated catalog.
VCFsToIDCatalogs, StrelkaIDVCFFilesToCatalog and StrelkaIDVCFFilesToCatalogAndPlotToPdf a list; 1st element is the spectrum catalog (previously the only return); 2nd element is a list of VCFs with additional annotations.
PlotCatalog a list. The first element is a logical value indicating whether the plot is successful. The second element is a numeric vector giving the coordinates of all the bar midpoints drawn, useful for adding to the graph (only implemented for SBS96Catalog).
output.file argument in MutectVCFFilesToCatalogAndPlotToPdf, StrelkaSBSVCFFilesToCatalogAndPlotToPdf, and StrelkaIDVCFFilesToCatalogAndPlotToPdf so that an indicator of the catalog type plus “.pdf” is simply appended to the base output.file name. Also made this argument optional with sensible default behavior.
trans.ranges.GRCh37, trans.ranges.GRCh38 and trans.ranges.GRCm38.
FindDelMH, cryptic repeats (i.e. un-normalized deletions in a repeat such as GAGG deleted from CCCAGGGAGGGTCCC, which should be normalized to a deletion of AGGG) are now ignored with a warning rather than causing a stop.
FindDelMH, which previously did not flag the cryptic repeat in what is now the second example in the function documentation.
as.catalog supports creation of the catalog from a vector (interpreted as a 1-column matrix) and optionally infers the class from the number of rows in the input.
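The idea of inferring a catalog class from the number of rows can be sketched in base R. infer_catalog_class is a hypothetical helper, not the as.catalog implementation, and the row counts in the lookup table (including 83 rows for IndelCatalog and the class name SBS1536Catalog) are assumptions based on the usual mutation classifications rather than values stated in this changelog.

```r
# Hypothetical helper (not ICAMS code): map a row count to a catalog class.
infer_catalog_class <- function(x) {
  if (is.null(dim(x))) {
    x <- matrix(x, ncol = 1)  # a vector is treated as a 1-column matrix
  }
  # Assumed row counts for each class; adjust if the package differs.
  classes <- c("96"   = "SBS96Catalog",
               "192"  = "SBS192Catalog",
               "1536" = "SBS1536Catalog",
               "78"   = "DBS78Catalog",
               "144"  = "DBS144Catalog",
               "83"   = "IndelCatalog")
  cls <- classes[as.character(nrow(x))]
  if (is.na(cls)) {
    stop("Cannot infer catalog class from ", nrow(x), " rows")
  }
  unname(cls)
}

class.from.vector <- infer_catalog_class(rep(0, 96))
class.from.matrix <- infer_catalog_class(matrix(0, nrow = 78, ncol = 2))
```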