[BioC] MEDIPS, GRanges, IRanges error processing BAM file
Jonathan Ellis
Jonathan.Ellis at qimr.edu.au
Tue Aug 20 08:50:53 CEST 2013
Hi Lukas,
Thanks for your reply.
> Do they contain non-mapped reads?
I hadn't realised, but yes they do. Each of the problematic BAMs
contains a small number of unmapped reads (between 1 and 3).
> MEDIPS 1.10.0 reports an error, if there are none mapped reads in the
> bam files,
Running MEDIPS 1.10.0, I do not see any errors or warnings about
unmapped reads only the error I previously posted.
I tried the code you suggested and ended up with a data frame that did
not contain any NA values; however, if I try the code without the
scanFlag parameter I get a data frame that does contain NA values. I've
created a set of BAMs with unmapped reads filtered out, and I've run
MEDIPS again. This time MEDIPS successfully runs on the data set.
Thanks again, your reply has saved me a great deal of time,
Jonathan
On Fri, Aug 16, 2013 at 12:26:06PM -0700, Lukas Chavez wrote:
> Hi Jonathan,
>
> I have not encountered this problem previously and do not have an
> immediate solution.
>
> You state that 16 out of 20 bam files can be processed without a
> problem by applying
> mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam", BSgenome =
> "Hsapiens", sample_name = "0139202").
> This surprises me a little bit, because you have to state the whole
> BSgenome name, for example BSgenome ="BSgenome.Hsapiens.UCSC.hg19".
> [It was pointed out few days ago that I can make the package much more
> flexible by allowing other data types- Hervé, thank you for the hint,
> we will definitely consider this]. However, this does not seem to be
> your problem as you encounter problems for only 4 out of 20 bam files.
>
> Therefore, I assume that there is something strange with these
> specific bam files. Do they contain non-mapped reads? MEDIPS 1.10.0
> reports an error, if there are none mapped reads in the bam files, but
> the latest version of MEDIPS available in the development branch can
> deal with this. However, based on the example read you've sent this
> might not cause the error. The last regular status report you get is
> 'Creating GRange Object...'. This happens in the getGRange() function
> called by MEDIPS.createSet() after the bam files has been imported.
> The command that probably causes the error is
>
> regions_GRange = GRanges(seqnames=regions$chr,
> ranges=IRanges(start=regions$start, end=regions$stop),
> strand=regions$strand)
>
> but I do not understand why you have missing values. In order to
> understand this, you might want to try to import your bam file by
> yourself using
>
> library(Rsamtools)
> scanFlag = scanBamFlag(isUnmappedQuery = F) #this will ignore unmapped
> reads in the bam file and is implemented in the latest dev version
> (1.11.16).
> scanParam=ScanBamParam(flag=scanFlag, what = c("rname", "pos", "strand",
> "qwidth"))
> regions = scanBam(file="0139202.fq.sam.noDUP.bam.qf.bam", param=scanParam)
> regions = do.call(rbind,lapply(regions, as.data.frame, stringsAsFactors=F))
> regions = data.frame(chr=as.character(as.vector(regions$rname)),
> start=as.numeric(as.vector(regions$pos)),
> stop=as.numeric(as.vector(regions$pos)+as.vector(regions$qwidth)-1),
> strand=as.character(as.vector(regions$strand)), stringsAsFactors=F)
>
> Afterwards, it will be necessary to investigate where your regions
> object has NA values.
>
> Another thing: do you have bam index files in the same directory where
> you store the bam file? MEDIPS will make use of these, if they are
> available (using slightly different code as stated above). In case
> there is a discrepancy between the index and bam files (having the
> same prefix file name), this might cause a problem as well.
>
> Finally, you can convert your bam files into bed files (by yourself)
> and try uploading simple txt based bed files into MEDIPS.
>
> I hope we will identify the root of your error.
> Lukas
>
> On Thu, Aug 15, 2013 at 9:28 PM, Jonathan Ellis
> <Jonathan.Ellis at qimr.edu.au>wrote:
>
> > Dear Lukas and list,
> >
> > I'm trying to process a set of BAM files using the latest version of
> > MEDIPS (1.10.0), but have run into problems creating MEDIPSset
> > objects for some BAM file. The following is an example, but I'm
> > getting the same error for 4 out of 20 BAM files.
> >
> > > mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam",
> > + BSgenome = "Hsapiens", sample_name = "0139202")
> > Reading bam alignment 0139202.fq.sam.noDUP.bam.qf.bam
> > Total number of imported short reads: 13813686
> > Creating GRange Object...
> > Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges")
> > :
> > solving row 11345799: range cannot be determined from the supplied
> > arguments
> > (too many NAs)
> >
> > The data are single-end reads that were aligned with bwa version
> > 0.5.9-r16 (the alignments were done some time ago hence the older
> > version of bwa), and the corresponding line from the BAM (11,345,799)
> > is:
> >
> > 19441177 16 chr17 38161416 18 49M * 0
> > 0 TAGAGTCCGGCGTTCAGGGGCAGGAAGCATCCAGCACGGGAGAAAGATG
> > BBBBBBBBBBBBBBBBBd_IIX[ffgb[[YJb^d^[[ggee^^cc^^^_ X0:i:1 X1:i:3
> > XA:Z:chr1,+56337006,49M,3;chr12,-55896993,49M,3;chr4,-22778322,49M,3;
> > MD:Z:8A7A32 XG:i:0 NM:i:2 XM:i:2 XO:i:0XT:A:U
> >
> > I'm unsure whether this is a problem with the MEDIPS package or
> > something from GRanges/IRanges. As far as I understand the .Call2
> > function is part of IRanges, but I assume it's failing due to
> > something passed by MEDIPS. Any advice or pointers would be greatly
> > appreciated.
> >
> > > sessionInfo()
> > R version 3.0.1 (2013-05-16)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=C LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel stats graphics grDevices utils datasets methods
> > [8] base
> >
> > other attached packages:
> > [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 MEDIPS_1.10.0
> > [3] DNAcopy_1.34.0 BSgenome_1.28.0
> > [5] FDb.InfiniumMethylation.hg19_1.0.1 rtracklayer_1.20.4
> > [7] Biostrings_2.28.0 GenomicFeatures_1.12.3
> > [9] AnnotationDbi_1.22.6 Biobase_2.20.1
> > [11] GenomicRanges_1.12.4 IRanges_1.18.2
> > [13] BiocGenerics_0.6.0 BiocInstaller_1.10.3
> >
> > loaded via a namespace (and not attached):
> > [1] biomaRt_2.16.0 bitops_1.0-5 DBI_0.2-7 gtools_3.0.0
> > [5] RCurl_1.95-4.1 Rsamtools_1.12.3 RSQLite_0.11.4 stats4_3.0.1
> > [9] tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
More information about the Bioconductor
mailing list