[BioC] Help with DMPFinder in minfi package
James W. MacDonald
jmacdon at uw.edu
Wed Jun 19 18:34:33 CEST 2013
Hi Srikanth,
On 6/19/2013 12:02 PM, Srinivas Srikanth Manda wrote:
> Hi James,
>
> Thanks for the response. I am a bit new to this kind of analysis. When
> I use dmpFinder on three different groups vs. control and I export
> the results as data.frame, I only get columns corresponding to
> probeid, intercept and F, p and q values but not the group name. I
> want to have data in the form how the plotCpG function works (to see a
> difference in methylation across the three groups). How can I achieve
> this?
>
> My initial goal:
> Find significantly different probes across three different grades of
> cancer samples.
Note that dmpFinder() does just what you are asking: it finds probes
that differ significantly between at least two of your sample types.
However, it fits one particular model that you might not like, and does
an F-test for ANY difference, so it cannot tell you which groups differ.
You may want to do more directed analyses, such as comparing each grade
to the control.
To that end, note that dmpFinder() is just a nice wrapper for doing the
analysis, and you don't have to use it. You can just as easily use limma
to do the univariate analyses. I won't get into details here, as the
limma User's Guide has any number of examples you could emulate (and I
have already given you a head start below).
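
As a rough sketch of what a more directed analysis could look like in limma, the following fits a cell-means model and then tests each grade against the control. The data are simulated, and the group names (Control, Grade1, Grade2, Grade3), sample counts and contrast names are assumptions made up for illustration; in practice M would come from getM() and the groups from pd$Sample_Group.

```r
library(limma)

set.seed(1)
# Simulated M-values: 100 probes x 12 samples (hypothetical stand-in for getM() output)
group <- factor(rep(c("Control", "Grade1", "Grade2", "Grade3"), each = 3))
M <- matrix(rnorm(100 * 12), nrow = 100,
            dimnames = list(paste0("probe", 1:100), paste0("s", 1:12)))
# Shift the first five probes in Grade3 only, so there is something to find
M[1:5, group == "Grade3"] <- M[1:5, group == "Grade3"] + 3

# Cell-means design: one coefficient (the group mean) per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
fit <- lmFit(M, design)

# Directed comparisons: each grade against the control
contr <- makeContrasts(G1vsC = Grade1 - Control,
                       G2vsC = Grade2 - Control,
                       G3vsC = Grade3 - Control,
                       levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))

# Results for one contrast; logFC is the difference in mean M-value
top_g3 <- topTable(fit2, coef = "G3vsC", number = 10)

# Per-probe group means, one column per group
group_means <- fit$coefficients
```

Here group_means gives the table of probes by group, and topTable() can be called once per contrast to see which grades differ from the control. Whether these particular contrasts are the right ones depends on your question.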
> Map the probes to gene regions (like promoters, gene body, UTR, etc)
See ?mapToGenome for mapping the probes to genomic coordinates. You will
then need to map significantly differentially methylated regions to
genomic features and maybe make some nice plots. This isn't super
difficult, but it does require a fair amount of base knowledge of a
bunch of packages.
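
To give a flavor of the overlap step, here is a minimal sketch using GenomicRanges. The probe names, coordinates and feature annotations are all hypothetical toy values; in a real analysis the probe positions would come from the mapped object and the features from an annotation package.

```r
library(GenomicRanges)

# Hypothetical positions for three significant probes (toy values)
probes <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                  ranges = IRanges(start = c(150, 950, 500), width = 1),
                  probe = c("cgA", "cgB", "cgC"))

# Hypothetical feature annotation (toy values, not real gene models)
features <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                    ranges = IRanges(start = c(100, 900, 1000),
                                     end = c(200, 2000, 3000)),
                    feature = c("promoter", "gene_body", "promoter"))

# Find which probes fall inside which features
hits <- findOverlaps(probes, features)

# One row per probe/feature overlap; probes with no overlap are dropped
annotated <- data.frame(probe = probes$probe[queryHits(hits)],
                        feature = features$feature[subjectHits(hits)])
```

A probe that overlaps several features would appear once per feature in annotated, so you may need to decide how to prioritize them.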
You should probably peruse the following:
http://bioconductor.org/help/course-materials/2013/SeattleFeb2013/IntermediateSequenceAnalysis2013.pdf
which covers a lot of what you want to do. Differential methylation
isn't particularly different, in a lot of respects, from e.g. RNA-seq:
you have a genomic position that you think is 'interesting', and you
might want to know if there is any known <something> nearby. The only
difference is why you think it is interesting.
There is no substitute for just trying to do something and doing your
best to figure out why things aren't working. Search the list, read the
vignettes, read the help pages.
Best,
Jim
> Find enriched regions.
>
> Any help for the above tasks is appreciated.
>
>
> Thanks
> Srikanth
>
>
>
>
>
> On Wed, Jun 19, 2013 at 7:06 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
> Hi Srinivas,
>
>
> On 6/19/2013 5:21 AM, Srinivas Srikanth Manda wrote:
>
> Hello Members,
>
> I am using Minfi package to analyze 450k data. I have three different
> groups of samples and one common control. I did the normalization and
> other steps according to manual, but stuck at the differential
> methylation positions. When I use:
>
> M<- getM(MSet.norm, type = "beta", betaThreshold = 0.001)
> dmp1<- dmpFinder(M, pheno=pd$Sample_Group, type="categorical")
>
> I want to get a table with probes and corresponding values in each
> group. the data.frame dmp1 does not tell me which group has what
> value? How can I do that?
>
>
> It's not clear what you mean by 'probes and corresponding values
> in each group'. I am not sure what a corresponding value is.
>
> If I make the assumption that you want the coefficients from the
> model fit, then you can do
>
> design <- model.matrix(~pd$Sample_Group)
> fit <- lmFit(M, design)
>
> and then fit$coefficients has the coefficients. Or perhaps you
> just want the methylation values? The M-values are in your M
> matrix, and if you prefer betas, you can use getBeta(MSet.norm).
>
> You might also just want the mean of each group. In which case it
> would be easier to do
>
> design <- model.matrix(~0+pd$Sample_Group)
> fit <- lmFit(M, design)
>
> and then fit$coefficients will contain the mean value for each
> group, by probe.
>
> Best,
>
> Jim
>
>
>
>
>
> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] minfiData_0.3.1
> [2] IlluminaHumanMethylation450kmanifest_0.4.0
> [3] minfi_1.4.0
> [4] Biostrings_2.26.3
> [5] GenomicRanges_1.10.7
> [6] IRanges_1.16.6
> [7] reshape_0.8.4
> [8] plyr_1.8
> [9] lattice_0.20-15
> [10] Biobase_2.18.0
> [11] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.26.0 annotate_1.36.0 AnnotationDbi_1.20.7
> [4] beanplot_1.1 BiocInstaller_1.8.3 bit_1.1-10
> [7] codetools_0.2-8 crlmm_1.16.9 DBI_0.2-7
> [10] ellipse_0.3-8 ff_2.2-11 foreach_1.4.0
> [13] genefilter_1.40.0 grid_2.15.2 iterators_1.0.6
> [16] limma_3.14.4 MASS_7.3-23 Matrix_1.0-12
> [19] matrixStats_0.8.1 mclust_4.1 multtest_2.14.0
> [22] mvtnorm_0.9-9994 nor1mix_1.1-4 oligoClasses_1.20.0
> [25] parallel_2.15.2 preprocessCore_1.20.0 RColorBrewer_1.0-5
> [28] RcppEigen_0.3.1.2.1 R.methodsS3_1.4.2 RSQLite_0.11.3
> [31] siggenes_1.32.0 splines_2.15.2 stats4_2.15.2
> [34] survival_2.37-4 tools_2.15.2 XML_3.96-1.1
> [37] xtable_1.7-1 zlibbioc_1.4.0
>
>
>
> Regards,
> Srikanth
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
>
> --
> Srinivas Srikanth Manda
> Ph.D. Student
> Institute of Bioinformatics
> Discoverer, 7th Floor,
> International Technology Park,
> Bangalore, India
> Mob:+919019114878
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099