[Bioc-devel] read_bed()
Bhagwat, Aditya
Ad|ty@@Bh@gw@t @end|ng |rom mp|-bn@mpg@de
Tue Sep 17 12:58:28 CEST 2019
Dear bioc-devel,
I had two feedback requests regarding the function read_bed().
1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational package, e.g. `GenomicRanges`, given the rather basal nature of this functionality?
It reads a bedfile into a GRanges, converts the coordinates from 0-based (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome info (to allow for implicit range validity checking<https://support.bioconductor.org/p/124250>) and plots the karyogram<https://support.bioconductor.org/p/124328>.
Thank you for your feedback.
Cheers,
Aditya
#' Read bedfile into GRanges
#'
#' @param bedfile file path
#' @param bsgenome BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates logical(1)
#' @param plot logical(1)
#' @param verbose logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#' A good discussion on these two alternative codings is given
#' by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom data.table :=
#' @export
read_bed <- function(
bedfile,
bsgenome,
zero_based = TRUE,
rm_duplicates = TRUE,
plot = TRUE,
verbose = TRUE
){
# Assert
assert_all_are_existing_files(bedfile)
assert_is_a_bool(verbose)
assert_is_a_bool(rm_duplicates)
assert_is_a_bool(zero_based)
# Comply
seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
# Read
if (verbose) cmessage('\tRead %s', bedfile)
dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
col.names = c('seqnames', 'start', 'end', 'strand'))
data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
# Transform coordinates: 0-based -> 1-based
if (zero_based){
if (verbose) cmessage('\t\tConvert 0 -> 1-based')
dt[, start := start + 1]
}
if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
nrow(dt), length(unique(dt$seqnames)))
# Drop duplicates
if (rm_duplicates){
is_duplicated <- cduplicated(dt)
if (any(is_duplicated)){
if (verbose) cmessage('\t\t %d after removing duplicates')
dt %<>% extract(!duplicated)
}
}
# Turn into GRanges
gr <- add_seqinfo(as(dt, 'GRanges'), bsgenome)
# Plot and return
title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
if (plot) plot_karyogram(gr, title)
gr
}
[[alternative HTML version deleted]]
More information about the Bioc-devel
mailing list