[BioC] FW: Request for the assistance to use MEDME
Mattia Pelizzola
mattia.pelizzola at yale.edu
Mon Dec 1 16:41:41 CET 2008
Hi Prashantha,
regarding the "pos" slot you have to provide a single number for each
probe. This can be usually be the mid position. You can determine it
from a matrix where you have start and stop columns using the rowMeans
function. Otherwise, you can just use the start position especially if
the probes are short in respect to the window size (60bp or so for the
probe compared to 1000bp window size).
In the specific case of you dataset, since you have the positions in the
"chr3:071885975-071886019" format it is a little bit more complicated,
and you could do as follow:
data = read.table("test.txt", sep="\t", header=TRUE, row.names = 6,
stringsAsFactors = FALSE)
pos = strsplit(split=":", data$SystematicName, fixed=TRUE)
chr = sapply(pos, function(x) x[1])
pos = sapply(pos, function(x) x[2])
pos = strsplit(split='-', pos, fixed=TRUE)
pos = sapply(pos, function(x) mean(as.numeric(x)))
now you have to build the matrix with the MeDIP enrichment data:
logRmat = as.matrix(data$logFC)
rownames(logRmat) = rownames(data)
if you have more than one array, you can you the cbind or data.frame
functions to build logRmat. For example:
logRmat = data.frame(array1 = data1$logFC, array2 = data2$logFC, array2
= data2$logFC)
rownames(logRmat) = rownames(data)
finally you are ready to build the MEDMEset object (has for human, mmu
for mouse):
Mset = new("MEDMEset", pos=pos, chr=chr, logR = logRmat, organism="hsa")
I am sorry but I did not understand the second point, as "weighting of
MeDIP enrichment" and determination of probe position are not related. I
hope I have clarified this above.
Regarding the dataset to be used for fitting the model, there are two
possibilities. On one hand you could generate artificially fully
methylated DNA, as described in the paper. On the other hand, you could
try to use directly the MeDIP data that you already have. The latter is
expected to work for samples where high level of methylation is
expected. This is for example true for human or mouse genomic DNA
hybridized on chromosome-wide tiling arrays. Rather, if you array
represent mostly promoter regions this is not a good idea as these
regions are usually hypo-methylated.
I hope this helps,
mattia
Prashantha Hebbar Kiradi [MU-MLSC] wrote:
>
> Dear Mattia,
>
> I am little confused. Though read.table is working on .txt, I have now
> converted normalized .txt output to .gff output using script. We get
> chromosomal assignment and probe position information in raw as well as
> normalized data. We get position information as range parameter (Ex:
> strat - end) but what I see in the example below that you have provided
> single parameter for "pos" slot. Now I have again gone through the paper
> and understood that you have done some more work (as explained in
> "Weighting of MeDIP enrichment" section) to get positon in single
> parameter. Can you please explian me little more about "Weighting of
> MeDIP enrichment" and the care needs to be taken in this step? Is there
> any command to this in R or we have to script for it? To implement MEDME
> sucessfully in our lab as a third party, do we need to have refrence
> genome where all the "C" are methylated (As explianed in the "Derivation
> of fully methylated DNA" section of paper)?
>
> As you see in attachment files logRatio is not in matrix form. we get
> only logFC value. So, Can you please tell me how should I proceed further?
>
> Thanking you and Dr. Molinaro in anticipation for the support.
>
> Regards,
>
> Prashantha
>
>
>
>
>
> -----Original Message-----
> From: Mattia Pelizzola [mailto:mattia.pelizzola at yale.edu]
> Sent: Wed 11/26/2008 8:25 PM
> To: Prashantha Hebbar Kiradi [MU-MLSC]
> Cc: Annette Molinaro
> Subject: Re: FW: Request for the assistance to use MEDME
>
> Hi Prashantha,
>
> thanks for using MEDME. Unfortunately, the Agilent file format that you
> are using is not supported and MEDME.readFiles can't be used. In this
> case you have to use a little bit basic R functions to "manually" load
> the data into R and create a MEDMEset object.
>
> As you can find in the documentation of the MEDMEset class (type
> "class?MEDMEset" in R) there are several data slots (chr, pos, logR,
> etc..). Few of these are actually mandatory to create a minimal MEDMEset
> (chr, pos, logR and organims). You can find details on all of these in
> the documentation page above, but I'll provide here some examples:
>
> > chr = c("chr1","chr1","chr5","chr6")
> > pos = c(1000, 3000, 100, 5000)
> > logR = cbind(c(2.2, 4.1, -0.5, 0.1),c(3, 0, 0.2, -1)) # a matrix
> with N columns for N samples
> > rownames(logR) = letters[1:4] # probes are named a,b,c,d here
> > organism = "hsa" # for human or "mmu" for mouse
>
> finally you use these to initialize a new MEDMEset:
> Mset = new('MEDMEset', chr=chr, pos=pos, logR=logR, organism=organism)
>
> Now, in you case you have to load all these data from your files. I
> could not find chromosomal assignment and probe position in the header
> of your file (e.g. "test_cis_reg.gff") so I guess these are available
> (or can be exported from the Agilent software) in a separate file. Be
> careful of matching chr and pos with the data in "test_cis_reg.gff"
> respecting the probe order.
>
> You can load the data from datafile.txt with the function read.table:
>
> > data = read.table(file = "test_cis_reg.gff", sep="\t", header=TRUE,
> row.names = NULL, stringsAsFactors = FALSE)
> now you can extract the MedIP data with:
> > MeDIPdata = data[,"logFC"] # assuming that this is the column
> containing the MeDIP logRatio ..
> and the probe names with:
> > probeNames = data[,"ProbeName"]
>
>
> In case you have many samples you have to repeat that many times (you
> can use a "for" cycle to iterate on file names in case ..) and you can
> put them together with the "cbind" function that I used in the example
> above (be careful about the order of probeNames being consistent through
> files!). Finally you assign probe names and you are set.
>
> You have to do something similar with another file to get probe chr and
> pos. Then you can initialize the MEDMEset object.
>
> Let me know if you have any problem,
>
> mattia
>
>
>
> Annette Molinaro wrote:
> > Hi Mattia -
> > Can you follow up on this email?
> >
> > Many thanks,
> >
> > Annette
> >
> >
> >
> >
> >
> > *From:* Prashantha Hebbar Kiradi [MU-MLSC]
> > [mailto:prashantha.hebbar at manipal.edu]
> > *Sent:* Wednesday, November 26, 2008 1:41 AM
> > *To:* annette.molinaro at yale.edu
> > *Subject:* Request for the assistance to use MEDME
> >
> >
> >
> >
> >
> > Dear Dr. Molinaro,
> >
> > I am Prashantha from Manipal Life Sciences Center, Manipal University,
> > India. Recently, I have gone through MEDME (Pelizzola et.al, Genome Res.
> > 2008) and tried to implement the algorithm on our data. I used limma for
> > the normalization. But now I am stucked in MEDME. So, I would like to
> > discuss the issues with you. Incase you do not find the time, Please put
> > me in touch with any of your lab mates.
> >
> > Following are my doubts:
> >
> > 1. The normalized data is in .tsv format and has 'Row', 'Col',
> > 'ProbeUID', 'ControlType', 'ProbeName', 'GeneName', 'SystematicName',
> > 'Description', 'logFC', 'AveExpr', 't', 'P.Value', 'adj.P.Val', 'B' as
> > columns. As you know the GFF format has 9 fields (seqname, source,
> > feature, start, end, score, strand, frame, group). So, for the "score"
> > field in GFF format which field you have chosen from the fileds in
> > normalized data. We use here Agilents image extraction software for
> > feature extraction and I think, there is no way to get the data in GFF.
> >
> > 2. In order to test my data now I have created a dummy GFF by taking
> > logFc as "score". When I use following command
> >> MEDME.readFiles(path = getwd(), files ="test_cis_reg.gff",
> > format="gff", organism="hsp")
> >
> > I will get following error message,
> >
> ---------------------------------------------------------------------------------------
> > Error in initialize(value, ...) :
> > logR has to be a matrix with probeIds as rownames ..
> > In addition: Warning message:
> > In MEDME.readFiles(path = getwd(), files = "test_cis_reg.gff", format =
> > "gff", :
> > no unique probe names provided on column 3; the resulting dataset
> > lacks rownames ..
> >
> -----------------------------------------------------------------------------------------
> >
> > I request you to kindly help me to come out of these problems.
> >
> > Thank you in anticipation for the support.
> >
> > Sincerely,
> >
> > Mr. Prashantha Hebbar
> > Bioinformatician
> > Manipal Life Sciences Center,
> > Manipal University,
> > Manipal, India
> > PIN:576104
> > Ph:+91-9886359007
> >
> > ------------------------------------------------------------------------
> >
> > This e-mail is privileged and confidential. If you are not the
> > intended recipient please delete the message and notify the sender.
> > Any views or opinions presented are solely those of the author.
> >
> > ------------------------------------------------------------------------
>
>
> ------------------------------------------------------------------------
> This e-mail is privileged and confidential. If you are not the
> intended recipient please delete the message and notify the sender.
> Any views or opinions presented are solely those of the author.
> ------------------------------------------------------------------------
>
More information about the Bioconductor
mailing list