[BioC] MEDIPS error extending reads
Stephen Turner
vustephen at gmail.com
Mon Apr 15 20:49:30 CEST 2013
Lukas,
Thanks for the help and the cautionary note about this design.
One more question for you: if I want to test these gene regions in
windows (kind of combining the functionality of MEDIPS.meth(...,
window_size=500) with a MEDIPSroiSet class, will that window_size
argument do what I would expect - calculate DMR in 500bp windows only
across the regions of interest?
If not, would a reasonable strategy be to filter out alignments that
are not in my regions of interest, then running MEDIPS.meth() with a
minRowSum and window_size argument?
Thanks,
Stephen
On Thu, Apr 11, 2013 at 2:35 PM, Lukas Chavez
<lukas.chavez.mailings at googlemail.com> wrote:
> Hi Stephen,
>
> I am glad to hear that genome wide analysis of your data works fine.
>
> I assume that you will be able to calculate differential coverage for you
> regions of interest (using MEDIPS.createROIset instead of MEDIPS.createSet)
> by specifying the parameter MeDIP=FALSE in the MEDIPS.meth() function.
>
> The NA's in the seqlengths should not be the problem. The error occurs when
> the calibration curve is calculated. I assume that the amount and
> characteristic of your selected regions of interest is not suitable for
> modeling the dependency of CpG density and MeDIP-seq signals as it applies
> for genome wide small windows. CpG density normalization can be avoided as
> mentioned above and, thus, no rms values will be calculated. However,
> differential coverage will still be calculated. Please let me know, if your
> error message will remain anyway.
>
> Nevertheless, let me emphasize again that I have not investigated a
> necessary minimal and representative number of genomic regions of interest
> so that the applied models- CpG density normalization and differential
> coverage- remain reasonable. This becomes a more general question that goes
> beyond the support I can provide here.
>
> All the best,
> Lukas
>
>
>
>
> On Thu, Apr 11, 2013 at 8:08 AM, Stephen Turner <vustephen at gmail.com> wrote:
>>
>> Lukas,
>>
>> Thanks for the quick response. I went back and mapped my IDs using
>> mm9, and MEDIPS.selectROIs() worked without a problem.
>>
>> However, MEDIPS.createROIset() is what I was looking for - to conduct
>> differential methylation on gene regions of interest. I'm able to
>> create my MEDIPSroiSets, but when I run:
>>
>> dmr_rois <- MEDIPS.meth(MSet1=ctlset.roi, MSet2=bpaset.roi,
>> CSet=CSet.roi, p.adj="fdr", diff.method="edgeR", minRowSum=10)
>>
>> I get the following error:
>>
>> Preprocessing MEDIPS SET 1 in MSet1...
>> Calculating calibration curve...
>> Error in if (is.null(max_signal_index) & mean_signal[i] < mean_signal[i -
>> :
>> missing value where TRUE/FALSE needed
>>
>> Not sure what to do here. Looking at my CSet.roi created from the
>> control MEDIPSroiSet above looks like this:
>>
>> > CSet.roi
>> S4 Object of class COUPLINGset
>> =======================================
>> Sequence pattern: CG
>> Genome: BSgenome.Mmusculus.UCSC.mm9
>> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
>> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY
>> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259
>> 149517037 152524553 131738871 124076172 129993255 121843856 121257530
>> 120284312 125194864 103494974 98319150 95272651 90772031 61342430
>> 16299 166650296 15902555
>> Number of sequence pattern in genome: 21342779
>> Window size: -1
>>
>> My control MEDIPSroiSet (ctlset.roi) looks like this. I'm wondering if
>> it's the NA's in the seqlengths that are causing the problem:
>>
>> > ctlset.roi
>> [[1]]
>> S4 Object of class MEDIPSset
>> =======================================
>> Regions file: nounmapped.m54_ctl_fem_medip_r1.bam
>> File path:
>> /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/newmedips
>> Genome: BSgenome.Mmusculus.UCSC.mm9
>> Number of regions: 18106486
>> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
>> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY
>> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259
>> 149517037 152524553 131738871 124076172 129993255 121843856 121257530
>> 120284312 125194864 103494974 98319150 95272651 90772031 61342430
>> 16299 166650296 15902555
>> Number of bins per ROI: 1
>> Reads extended to: 0
>> Reads shifted by: 0
>> Parameter uniq: TRUE
>> ROIs:
>> GRanges with 2508 ranges and 0 metadata columns:
>> seqnames ranges strand
>> <Rle> <IRanges> <Rle>
>> Mrpl15 chr1 [ 4763287, 4775820] *
>> Sntg1 chr1 [ 8351556, 9289959] *
>> A830018L16Rik chr1 [11404186, 11965982] *
>> Sulf1 chr1 [12708626, 12850452] *
>> Prdm14 chr1 [13103538, 13117244] *
>> ... ... ... ...
>> Il13ra2 chrX [143818019, 143863735] *
>> Sh3kbp1 chrX [156065204, 156416001] *
>> Txlng chrX [159216851, 159267386] *
>> Pir chrX [160707303, 160810943] *
>> Frmpd4 chrX [163909241, 165015163] *
>> ---
>> seqlengths:
>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9
>> chrX
>> NA NA NA NA NA NA ... NA NA NA NA NA
>> NA
>>
>> [[2]]
>> S4 Object of class MEDIPSset
>> =======================================
>> Regions file: nounmapped.m58_ctl_mal_medip_r1.bam
>> File path:
>> /.automount/nas8-s.itc.virginia.edu/export/vol46/uvabx/projects/connelly/bpa/newmedips
>> Genome: BSgenome.Mmusculus.UCSC.mm9
>> Number of regions: 16441823
>> Chromosomes: chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11
>> chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrM chrX chrY
>> Chromosome lengths: 197195432 181748087 159599783 155630120 152537259
>> 149517037 152524553 131738871 124076172 129993255 121843856 121257530
>> 120284312 125194864 103494974 98319150 95272651 90772031 61342430
>> 16299 166650296 15902555
>> Number of bins per ROI: 1
>> Reads extended to: 0
>> Reads shifted by: 0
>> Parameter uniq: TRUE
>> ROIs:
>> GRanges with 2508 ranges and 0 metadata columns:
>> seqnames ranges strand
>> <Rle> <IRanges> <Rle>
>> Mrpl15 chr1 [ 4763287, 4775820] *
>> Sntg1 chr1 [ 8351556, 9289959] *
>> A830018L16Rik chr1 [11404186, 11965982] *
>> Sulf1 chr1 [12708626, 12850452] *
>> Prdm14 chr1 [13103538, 13117244] *
>> ... ... ... ...
>> Il13ra2 chrX [143818019, 143863735] *
>> Sh3kbp1 chrX [156065204, 156416001] *
>> Txlng chrX [159216851, 159267386] *
>> Pir chrX [160707303, 160810943] *
>> Frmpd4 chrX [163909241, 165015163] *
>> ---
>> seqlengths:
>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9
>> chrX
>> NA NA NA NA NA NA ... NA NA NA NA NA
>> NA
>>
>> Thanks for your continuing help!
>>
>> Stephen
>>
>> > sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.19 MEDIPS_1.10.0
>> [3] DNAcopy_1.34.0 BSgenome_1.28.0
>> [5] Biostrings_2.28.0 GenomicRanges_1.12.1
>> [7] IRanges_1.18.0 BiocGenerics_0.6.0
>> [9] BiocInstaller_1.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.95-4.1 Rsamtools_1.12.0 XML_3.96-1.1 biomaRt_2.16.0
>> [5] bitops_1.0-5 gtools_2.7.1 stats4_3.0.0 tools_3.0.0
>> [9] zlibbioc_1.6.0
>>
>> On Wed, Apr 10, 2013 at 11:03 PM, Lukas Chavez
>> <lukas.chavez.mailings at googlemail.com> wrote:
>> >
>> > Hi Stephen,
>> >
>> > we have slightly modified the MEDIPS.selectROIs() function and I forgot
>> > to
>> > update the manual accordingly, please excuse any inconveniences.
>> >
>> > It is not possible to simply specify columns=c("counts") any more but
>> > concrete column names must be given. In case you still want to summarize
>> > all
>> > 'count' columns of a result table 'resultTable' you have to specify
>> > something like:
>> >
>> > columns = colnames(resultTable)[grep("counts", colnames(resultTable))]
>> >
>> > Another example is shown via ?MEDIPS.selectROIs:
>> > columns=names(resultTable)[grep("counts|rpkm|logFC",names(resultTable))]
>> >
>> > Please let me know, if this works for you.
>> >
>> > Two more things:
>> > 1) In addition to the MEDIPS.createSet() function, there is a
>> > MEDIPS.createROIset() function not described in the manual (see also
>> > ?MEDIPS.createROIset). In case you want to calculate coverage, rms,
>> > and/or
>> > differential coverage for selected regions only, you might want to try
>> > this
>> > function. Here, the coverage is only calculated at the given regions
>> > instead
>> > of genome wide windows. Please note, normalization, CpG calibration etc.
>> > are
>> > calculated based on the given regions only and I have not yet
>> > investigated a
>> > minimal number of genomic regions so that the applied models remain
>> > reasonable.
>> >
>> > 2) Your annotation file contains genomic regions for chrs 1 to 19 plus
>> > the X
>> > chromosome. Therefore, I assume that this refers to the mouse genome. I
>> > have
>> > processed some sequencing data (bam files) from mouse mapped against mm9
>> > and
>> > specified BSgenome.Mmusculus.UCSC.mm9 in MEDIPS. When specifying the
>> > counts
>> > parameter of the MEDIPS.selectROIs() as described above (summarize = T),
>> > I
>> > receive only 2602 entries although there are 2611 genomic regions in
>> > your
>> > annotation file (as reported in your error message). See below the
>> > annotation IDs that are not returned by the MEDIPS.selectROIs()
>> > function. I
>> > have checked the first three of them and the genomic coordinates are
>> > outside
>> > of the length of the chromosomes. Your annotations might fit to mm10. I
>> > recommend to check, if your annotations and your bam files refer to the
>> > same
>> > genome version.
>> >
>> > All the best,
>> > Lukas
>> >
>> > ###
>> > chr10 130111841 130112788 Olfr823 --> chr10 length: 129993255 (mm9)
>> > chr15 103503298 103530052 Pde1b --> chr15 length: 103494974 (mm9)
>> > chr2 181763332 181827797 Myt1 --> chr2 length: 181748087 (mm9)
>> > Polr3k
>> > Vwa1
>> > Agrn
>> > Plekhn1
>> > Frmpd4
>> > CR536618.1
>> >
>> >
>> > On Tue, Apr 9, 2013 at 7:33 AM, Stephen Turner <vustephen at gmail.com>
>> > wrote:
>> >>
>> >> Thanks Lukas. That solved that problem. I now have another problem.
>> >> I'm trying to look at regions of interest as before. After I've called
>> >> MEDIPS.meth(), I then try to run:
>> >>
>> >> dmr_rois <- MEDIPS.selectROIs(results=dmr_all, rois=roi_file,
>> >> columns=c("counts"), summarize=TRUE)
>> >>
>> >> where roi_file is created using read.csv() on this file of ROIs:
>> >> https://gist.github.com/stephenturner/5346095
>> >>
>> >> I'm now getting the error:
>> >>
>> >> Error in data.frame(..., check.names = FALSE) :
>> >> arguments imply differing number of rows: 2602, 0
>> >>
>> >> I'm guessing this has something to do with my ROI file, but I'm having
>> >> trouble figuring out what to do. I have 2611 ROIs, not 2602. I've
>> >> double checked that I have no duplicates - these both return 2611:
>> >>
>> >> length(unique(roi_file$ID))
>> >> length(unique(with(roi_file, paste(chr, start, stop))))
>> >>
>> >> There may be some overlapping regions, but I'm unsure how to best
>> >> handle this. Could this be causing the error?
>> >>
>> >> Thanks,
>> >>
>> >> Stephen
>> >>
>> >> On Mon, Apr 8, 2013 at 3:58 PM, Lukas Chavez
>> >> <lukas.chavez.mailings at googlemail.com> wrote:
>> >> >
>> >> > Hi Stephen,
>> >> >
>> >> > I was able to reproduce the error. It appears that you have unmapped
>> >> > reads
>> >> > in your bam file which we always filter out during conversion of sam
>> >> > to
>> >> > bam
>> >> > files using samtools view -S -F 4 -b -o out.bam in.sam (single end
>> >> > sequening
>> >> > data). In case you filter your bam files accordingly, the error
>> >> > message
>> >> > should disappear. However, I will add an additional flag
>> >> > scanBamFlag(isUnmappedQuery = F) to the according scanBam() function
>> >> > in
>> >> > dev.
>> >> > Therefore, unmapped reads in a given bam file should be ignored in
>> >> > the
>> >> > future (MEDIPS version >=1.11.1).
>> >> >
>> >> > Lukas
>> >
>> >
>
>
More information about the Bioconductor
mailing list