Thank you very much for your quick answers and looking in detail into this
as well as updating the package.
The updateObject(OBJECT) function works perfectly ! :-)
Yes the corrected design matrix makes sense. I was really exploring and did
not fully understand how blockfinder worked yet. Will look at the bumphunter
function as well.
Looking forward to the more detailed example :-)
Thanks again,
Florence
2013/10/24 Kasper Daniel Hansen
> Btw, thank you for an extremely useful example/bug report.
>
> Best,
> Kasper
>
>
> On Wed, Oct 23, 2013 at 11:59 PM, Kasper Daniel Hansen <
> kasperdanielhansen@gmail.com> wrote:
>
>> Here is a list of the reported issues
>> 1) The object returned by mapToGenome has is.unsorted return TRUE. This
>> is now fixed, see NEWS below.
>> 2) When bumphunter() returns zero bumps, it throws an error. On the todo
>> list.
>> 3) Is there an issue with the renaming of Relation_to_UCSC_CpG_Island to
>> Relation_to_Island in cpgCollapse and helper function? The answer is yes,
>> it was a bug and it is fixed now.
>>
>> Re. your example, your design matrix for the block finding is wrong. You
>> are testing whether samples in group B has a methylation of 0.5, which is
>> obviously not true. This is why your inferred cutoff is 12033998108438140
>> - you're easily off by a factor 10^15 (!!!!). You need to remove the 0+ in
>> the model matrix line, and just have design = model.matrix(~+factor()). In
>> fact, even better is
>> design = model.matrix(~ gmSet$Sample_Group)
>>
>> Finally, we are working on a more detailed example for the vignette,
>> but it is not done yet.
>>
>> Fixes are in minfi 1.8.3 (1.9.3 devel) which should be available from the
>> build system in around 36h.
>>
>> * The function mapToGenome would return something that looked
>> like an unordered GenomicMethylSet. Actually, loci were
>> correctly ordered within chromosomes, the issue had to do
>> with whether the chromosomes were ordered as chr1, chr2, chr3
>> (used in minfi) or chr1, chr10, chr11 (lexigraphically).
>> Reported by Florence Cavalli .
>>
>> * Bug in cpgCollapse led to incorrect results. Your output is
>> affected if table(granges(output[[1]])$type) is all
>> 'OpenSea'. Reported by Florence Cavalli
>> .
>>
>> Best,
>> Kasper
>>
>>
>>
>>
>>
>> On Tue, Oct 22, 2013 at 8:27 PM, Kasper Daniel Hansen <
>> kasperdanielhansen@gmail.com> wrote:
>>
>>> 1) You can update your objects by running
>>> OBJECT = updateObject(OBJECT)
>>> This fixes everything and there should be no problems. Unless you have
>>> been using the hg18 coordinates from the old annotation package, in which
>>> case you're out of luck.
>>>
>>> The reason for this is that I realized that a substantial amount of the
>>> annotation material provided by illumina of course depends on the genome
>>> build, for example relation to CpG island. And Illumina only provides this
>>> for hg19, not hg18.
>>>
>>> 2) A lot of the other errors you have seems to be real errors probably
>>> caused by some late re-organization of the code. Thanks for providing an
>>> extremely nice example where they fail, which should make it very easy to
>>> track down. I suspect some late code cleaning errors to be behind this. I
>>> will investigate and report back very soon.
>>>
>>> 3) If you want to find DMRs, which we define as smaller regions like
>>> 100bp-5kb or so, you want to use
>>> bumphunter()
>>> The function blockFinder is for finding "blocks" which are large scale
>>> DMRs, on the megabase scale, like in our 'recent' Nat Genet paper. Not
>>> sure if that is what you want?
>>>
>>> Best,
>>> Kasper
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:04 PM, Florence Cavalli wrote:
>>>
>>>> Hello,
>>>>
>>>> I am working with 450k methylation data.
>>>> - Is there a way to modify the annotation linked to an MethylSet object,
>>>> (or is this something we are not supposed to do)?
>>>> I created my large MethylSet object under the pervious version of R and
>>>> Bioc and at the time the annotation package
>>>> IlluminaHumanMethylation450kannotation.ilmn.v1.2 was used. However this
>>>> package is not available in BioC 2.13
>>>> but IlluminaHumanMethylation450kanno.ilmn12.hg19 replaced it (as far as
>>>> I
>>>> understood).
>>>>
>>>> I would like to be able to do this since I'd like to use the
>>>> new cpgCollapse() and blockFinder() functions from minfi that are under
>>>> BioC 2.13. Indeed cpgCollapse() calls getAnnotation() and since I cannot
>>>> load the annotation package in R-3.0.2/BioC 2.13 an error is returned
>>>> (and
>>>> the new functions are not available under the previous version of BioC).
>>>>
>>>> Is there a way around this problem ? I tried the annotation() function
>>>> but
>>>> without success. Or do I have to reconstruct my MethylSet object from
>>>> the
>>>> very beginning reading the raw data using the library
>>>> IlluminaHumanMethylation450kanno.ilmn12.hg19, normalysing (and
>>>> re-running
>>>> my clustering to be sure to have consistent results) ? It is just that I
>>>> would save quite some time if I could actually convert my object.
>>>>
>>>> - Also I am planning to use the the following workflow to run
>>>> blockFinder()
>>>> and find differentially methylated regions.
>>>>
>>>> To be able to ultimately run blockFinder(), using your the minfi test
>>>> data
>>>> I created a GenomicMethylSet with mapToGenome() function from
>>>> the MethylSet object then I ran cpgCollapse() to obtain an
>>>> GenomicRatioSet object and finally used this object and a design
>>>> matrix to
>>>> run blockFinder().
>>>> See example:
>>>>
>>>> As in the vignette:
>>>> library(IlluminaHumanMethylation450kmanifest)
>>>> library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
>>>> library(minfi)
>>>> require(minfiData)
>>>>
>>>> baseDir <- system.file("extdata", package = "minfiData")
>>>> targets <- read.450k.sheet(baseDir)
>>>> targets
>>>>
>>>> sub(baseDir, "", targets$Basename)
>>>> RGset <- read.450k.exp(base = baseDir, targets = targets)
>>>> RGset
>>>> pd <- pData(RGset)
>>>> pd[,1:4]
>>>>
>>>> Then
>>>>
>>>> MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize =
>>>> "controls")
>>>> Mset.swan <- preprocessSWAN(RGset, MSet.norm)
>>>>
>>>> mset <- Mset.swan
>>>>
>>>>
>>>> > mset
>>>> MethylSet (storageMode: lockedEnvironment)
>>>> assayData: 485512 features, 6 samples
>>>> element names: Meth, Unmeth
>>>> phenoData
>>>> sampleNames: 5723646052_R02C02 5723646052_R04C01 ...
>>>> 5723646053_R06C02 (6 total)
>>>> varLabels: Sample_Name Sample_Well ... filenames (13 total)
>>>> varMetadata: labelDescription
>>>> Annotation
>>>> array: IlluminaHumanMethylation450k
>>>> annotation: ilmn12.hg19
>>>> Preprocessing
>>>> Method: SWAN (based on a MethylSet preprocesses as 'Illumina,
>>>> bg.correct
>>>> = TRUE, normalize = controls, reference = 1'
>>>> minfi version: 1.8.1
>>>> Manifest version: 0.4.0
>>>> >
>>>>
>>>> > class(mset)
>>>> [1] "MethylSet"
>>>> attr(,"package")
>>>> [1] "minfi"
>>>> > gmSet = mapToGenome(mset)
>>>> > class(gmSet)
>>>> [1] "GenomicMethylSet"
>>>> attr(,"package")
>>>> [1] "minfi"
>>>> > gmSet_cpgCollapse=cpgCollapse(gmSet,what="M",returnBlockInfo=FALSE)
>>>> [cpgCollapse] Creating annotation.
>>>> Error in cpgCollapseAnnotation(gr, relationToIsland, islandName, maxGap
>>>> =
>>>> maxGap, :
>>>> object has to be ordered.
>>>>
>>>> Not sure about the following !? But I seems that sorting the object
>>>> works
>>>> > gmSet_s=sort(gmSet)
>>>> > gmSet_cpgCollapse=cpgCollapse(gmSet_s,what="M",returnBlockInfo=FALSE)
>>>> [cpgCollapse] Creating annotation.
>>>> [cpgCollapseAnnotation] Clustering islands and clusters of probes.
>>>> [cpgCollapseAnnotation] Computing new annotation.
>>>> [cpgCollapseAnnotation] Defining blocks.
>>>> [cpgCollapse] Collapsing data......
>>>> > class(gmSet_cpgCollapse)
>>>> [1] "GenomicRatioSet"
>>>> attr(,"package")
>>>> [1] "minfi"
>>>>
>>>> ## Sample groups
>>>> ## > targets[,"Sample_Group"]
>>>> ## [1] "GroupA" "GroupA" "GroupB" "GroupB" "GroupA" "GroupB"
>>>> > design <- model.matrix(~ 0+factor(c(1,1,2,2,1,2)))
>>>> > colnames(design) <- c("groupA", "groupB")
>>>> > head(design)
>>>> groupA groupB
>>>> 1 1 0
>>>> 2 1 0
>>>> 3 0 1
>>>> 4 0 1
>>>> 5 1 0
>>>> 6 0 1
>>>>
>>>> > Block_g2=blockFinder(gmSet_cpgCollapse, design=design, coef = 2, what
>>>> =
>>>> "M",pickCutoff=TRUE,smooth=FALSE)
>>>> [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.4.1).
>>>> [bumphunterEngine] Computing coefficients.
>>>> [bumphunterEngine] Performing 1000 permutations.
>>>> [bumphunterEngine] Computing marginal permutation p-values.
>>>> [bumphunterEngine] cutoff: 12033998108438140
>>>> [bumphunterEngine] Finding regions.
>>>> [bumphunterEngine] No bumps found!
>>>> Error in res$table$indexStart : $ operator is invalid for atomic vectors
>>>>
>>>> I assume that if some bumps are found it would not return an error!?
>>>> Would you advice to smooth the signal or not?
>>>>
>>>> Is that the proper way of running this analysis ? (I wonder since I did
>>>> not
>>>> find any example for such analysis)
>>>>
>>>> - Finally I noticed that in the cpgCollapse function the attribute
>>>> Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name from a data frame
>>>> return by getAnnotation object are called.
>>>> However looking at the result of the getAnnotation() function called
>>>> there
>>>> isn't any Relation_to_UCSC_CpG_Island and UCSC_CpG_Islands_Name columns
>>>> attributes (see below)? Are we supposed to use another annotation
>>>> package?
>>>>
>>>>
>>>> > cpgCollapse
>>>> function (object, what = c("Beta", "M"), maxGap = 500, blockMaxGap =
>>>> 2.5 *
>>>> 10^5, maxClusterWidth = 1500, dataSummary = colMeans, na.rm = FALSE,
>>>> returnBlockInfo = TRUE, verbose = TRUE, ...)
>>>> {
>>>> what <- match.arg(what)
>>>> gr <- granges(object)
>>>> ann <- getAnnotation(object)
>>>> relationToIsland <- ann$Relation_to_UCSC_CpG_Island
>>>> islandName <- ann$UCSC_CpG_Islands_Name
>>>> ....
>>>> ....
>>>> }
>>>>
>>>> class(gmSet)
>>>> gmSet
>>>> > class(gmSet)
>>>> [1] "GenomicMethylSet"
>>>> attr(,"package")
>>>> [1] "minfi"
>>>> > gmSet
>>>> class: GenomicMethylSet
>>>> dim: 485512 6
>>>> exptData(0):
>>>> assays(2): Meth Unmeth
>>>> rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923
>>>> rowData metadata column names(0):
>>>> colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02
>>>> 5723646053_R06C02
>>>> colData names(13): Sample_Name Sample_Well ... Basename filenames
>>>> Annotation
>>>> array: IlluminaHumanMethylation450k
>>>> annotation: ilmn12.hg19
>>>> Preprocessing
>>>> Method: SWAN (based on a MethylSet preprocesses as 'Illumina,
>>>> bg.correct
>>>> = TRUE, normalize = controls, reference = 1'
>>>> minfi version: 1.8.1
>>>> Manifest version: 0.4.0
>>>>
>>>>
>>>> A=getAnnotation(gmSet)
>>>> colnames(A)
>>>>
>>>> > A=getAnnotation(gmSet)
>>>> > colnames(A)
>>>> [1] "chr" "pos"
>>>> [3] "strand" "Name"
>>>> [5] "AddressA" "AddressB"
>>>> [7] "ProbeSeqA" "ProbeSeqB"
>>>> [9] "Type" "NextBase"
>>>> [11] "Color" "Probe_rs"
>>>> [13] "Probe_maf" "CpG_rs"
>>>> [15] "CpG_maf" "SBE_rs"
>>>> [17] "SBE_maf" "Islands_Name"
>>>> [19] "Relation_to_Island" "Forward_Sequence"
>>>> [21] "SourceSeq" "Random_Loci"
>>>> [23] "Methyl27_Loci" "UCSC_RefGene_Name"
>>>> [25] "UCSC_RefGene_Accession" "UCSC_RefGene_Group"
>>>> [27] "Phantom" "DMR"
>>>> [29] "Enhancer" "HMM_Island"
>>>> [31] "Regulatory_Feature_Name" "Regulatory_Feature_Group"
>>>> [33] "DHS"
>>>> > A$Relation_to_UCSC_CpG_Island
>>>> NULL
>>>> > A$UCSC_CpG_Islands_Name
>>>> NULL
>>>>
>>>> sessionInfo()
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] minfiData_0.4.2
>>>> [2] BiocInstaller_1.12.0
>>>> [3] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.0
>>>> [4] IlluminaHumanMethylation450kmanifest_0.4.0
>>>> [5] minfi_1.8.1
>>>> [6] bumphunter_1.2.0
>>>> [7] locfit_1.5-9.1
>>>> [8] iterators_1.0.6
>>>> [9] foreach_1.4.1
>>>> [10] Biostrings_2.30.0
>>>> [11] GenomicRanges_1.14.1
>>>> [12] XVector_0.2.0
>>>> [13] IRanges_1.20.0
>>>> [14] reshape_0.8.4
>>>> [15] plyr_1.8
>>>> [16] lattice_0.20-24
>>>> [17] Biobase_2.22.0
>>>> [18] BiocGenerics_0.8.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.40.0 AnnotationDbi_1.24.0 base64_1.1
>>>> [4] beanplot_1.1 codetools_0.2-8 DBI_0.2-7
>>>> [7] digest_0.6.3 doRNG_1.5.5 genefilter_1.44.0
>>>> [10] grid_3.0.2 illuminaio_0.4.0 itertools_0.1-1
>>>> [13] limma_3.18.0 MASS_7.3-29 matrixStats_0.8.12
>>>> [16] mclust_4.2 multtest_2.18.0 nlme_3.1-111
>>>> [19] nor1mix_1.1-4 pkgmaker_0.17.4 preprocessCore_1.24.0
>>>> [22] R.methodsS3_1.5.2 RColorBrewer_1.0-5 registry_0.2
>>>> [25] rngtools_1.2.3 RSQLite_0.11.4 siggenes_1.36.0
>>>> [28] splines_3.0.2 stats4_3.0.2 stringr_0.6.2
>>>> [31] survival_2.37-4 tools_3.0.2 XML_3.98-1.1
>>>> [34] xtable_1.7-1
>>>>
>>>>
>>>> Thank you very much for developing these very useful packages!
>>>> Best,
>>>> Florence
>>>>
>>>> --
>>>> Florence Cavalli, PhD.
>>>> The Hospital for Sick Children
>>>> Brain Tumour Research Centre
>>>> 101 College Street, TMDT-11-701
>>>> Toronto, ON M5G1L7
>>>> Canada
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>
>
[[alternative HTML version deleted]]