[BioC] Obtaining Raw Intensity Values and Metadata for Exon Arrays in xps
Steve Piccolo
stephen.piccolo at hsc.utah.edu
Thu May 22 02:07:34 CEST 2008
Dear List Members:
I asked Christian Stratowa, maintainer of the xps package, how to get
raw intensity values for exon arrays, along with some other meta
information, using his package. My question and his very helpful
reply are below.
Regards,
-Steve
-----Original Message-----
From: cstrato [mailto:cstrato at aon.at]
Sent: Wednesday, May 21, 2008 1:49 PM
To: Steve Piccolo
Subject: Re: Obtaining a Matrix of Raw Values in xps
Dear Steve
This is possible in principle; however, it requires exporting a couple of
scheme trees in addition to the cel-trees.
Here is what you need to do to get the columns you mention:
3) raw intensity value:
You can get the CEL-file intensities for (x,y)-coordinates using:
export(data.exon, treetype="cel", varlist = "fInten",
outfile="Exon_int_cel.txt")
However, for 75 CEL-files the exported file will be pretty large, so I
suggest exporting subsets:
export.data(data.exon, treename=c("BreastA","BreastB"),
varlist="fInten", outfile="Exon_BreastAB_int_cel.txt")
You can even create a data.frame in R using:
cel <- export.data(data.exon, treename=c("BreastA","BreastB"),
varlist="fInten", outfile="Exon_BreastAB_int_cel.txt", as.dataframe=TRUE)
head(cel)
1) probe_id:
The probe_id for (x,y) can be obtained by exporting:
export(scheme.exon, treetype="cxy", outfile="HuExon_cxy.txt")
However, currently the probe_id for exon arrays can be calculated from
(x,y):
probe_id = x + ncol * y + 1
where ncol is the number of columns of the array, i.e. for exon array
ncol=2560
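As a minimal sketch, the formula above can be wrapped in a small R helper.
The function name probe_id_from_xy is hypothetical and not part of xps, and
the sketch assumes that x and y are zero-based, which is what the "+ 1" in
the formula suggests:

```r
# Hypothetical helper (not part of xps): compute the probe_id from (x,y).
# Assumes zero-based coordinates and ncol = 2560 for the exon array.
probe_id_from_xy <- function(x, y, ncol = 2560) {
  x + ncol * y + 1
}

# Works on vectors too, e.g. on the x and y columns of an exported table.
ids <- probe_id_from_xy(c(0, 5), c(0, 2))
print(ids)
```

Here (0,0) maps to probe_id 1 and (5,2) maps to 5 + 2560 * 2 + 1 = 5126.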
2) genomic sequence:
I am not sure what you mean by "genomic sequence", but the probe
sequences can be obtained by exporting:
export(scheme.exon, treetype="prb", outfile="HuExon_prb.txt")
which gives you the probe sequence at (x,y)
4) probeset_id:
You can get the probeset_id by exporting:
export(scheme.exon, treetype="pbs", outfile="HuExon_pbs.txt")
The first column will contain an internal UNIT_ID followed by the
probeset_id.
The internal UNIT_ID corresponds to the internal ProbeSetID of file
"HuExon_pbs.txt".
Alternatively, you get the internal UNIT_ID for (x,y) by exporting:
export(scheme.exon, treetype="scm", outfile="HuExon_scm.txt")
Now, you need to combine the data from these files in order to get the
matrix you want.
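One possible way to do the combining is sketched below with base R's merge().
The small data frames are toy stand-ins for the exported tables, and all
column names (X, Y, Intensity, ProbeSequence, UNIT_ID, ProbesetID) as well as
the values are assumptions made for illustration; the real exported files may
use different headers, so check them first:

```r
# Toy stand-ins for the exported tables. Column names and values are
# assumptions for illustration, not the actual headers written by xps.
cel <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 0),
                  Intensity = c(120.5, 88.0, 301.2))
prb <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 0),
                  ProbeSequence = c("ACGT", "TTGA", "GGCA"))
scm <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 0),
                  UNIT_ID = c(10, 10, 11))
pbs <- data.frame(UNIT_ID = c(10, 11),
                  ProbesetID = c(1001, 1002))

ncol_array <- 2560  # number of columns of the exon array

# Join the (x,y)-keyed tables first, then attach probeset_id via UNIT_ID.
xy   <- merge(merge(cel, prb, by = c("X", "Y")), scm, by = c("X", "Y"))
full <- merge(xy, pbs, by = "UNIT_ID")

# probe_id computed from (x,y) as described above.
full$ProbeID <- full$X + ncol_array * full$Y + 1

# Keep the four requested columns, ordered by probe_id.
result <- full[order(full$ProbeID),
               c("ProbeID", "ProbeSequence", "Intensity", "ProbesetID")]
print(result)
```

With real data, the toy data frames would be replaced by the exported text
files (read back into R, e.g. with read.delim) or by the data.frames returned
when as.dataframe is set.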
I know that this seems to be complex, but sadly the exon array has a
very complex structure.
BTW, every export method can also return the data directly as a
data.frame in R:
dataframe <- export(object, ..., as.dataframe=TRUE)
Please let me know if you succeeded with this info.
Best regards
Christian
Steve Piccolo wrote:
> Hi Christian,
>
> One thing I'm trying to do is develop a novel statistical approach for
> "normalizing" exon data. What I need for a given CEL file is basically a
> matrix with the following columns: 1) probe_id, 2) genomic sequence, 3)
> raw intensity value, and 4) probeset_id. Can you tell me a little about
> how to get started in accomplishing this with xps?
>
> Regards,
> -Steve