[BioC] Obtaining Raw Intensity Values and Metadata for Exon Arrays in xps

Steve Piccolo stephen.piccolo at hsc.utah.edu
Thu May 22 02:07:34 CEST 2008


Dear List Members:

I asked Christian Stratowa, maintainer of the xps package, how to get raw
intensity values for exon arrays, along with some other meta-information,
using his package. My question and his very helpful reply are below.

Regards,
-Steve

-----Original Message-----
From: cstrato [mailto:cstrato at aon.at] 
Sent: Wednesday, May 21, 2008 1:49 PM
To: Steve Piccolo
Subject: Re: Obtaining a Matrix of Raw Values in xps

Dear Steve

This is possible in principle; however, it requires exporting a couple of
scheme trees in addition to the CEL trees.
Here is what you need to do to get the columns you mentioned:

3) raw intensity value:
You can get the CEL-file intensities for the (x,y) coordinates using:
   export(data.exon, treetype="cel", varlist="fInten",
          outfile="Exon_int_cel.txt")

However, for 75 CEL files the exported file will be quite large, so I
suggest exporting subsets:
   export.data(data.exon, treename=c("BreastA","BreastB"), varlist="fInten",
               outfile="Exon_BreastAB_int_cel.txt")

You can even create a data.frame in R directly:
   cel <- export.data(data.exon, treename=c("BreastA","BreastB"),
                      varlist="fInten", outfile="Exon_BreastAB_int_cel.txt",
                      as.dataframe=TRUE)
   head(cel)
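If you write subsets to disk instead of using as.dataframe, the text file can be read back with base R. This is a minimal sketch, assuming the export is tab-separated with a single header row; the column names below (X, Y, BreastA.cel_INT, BreastB.cel_INT) are hypothetical placeholders, so check the header of your own export before relying on them.

```r
# Minimal sketch: read an exported xps text file back into R.
# Assumption: tab-separated, one header row; column names are hypothetical.
read_xps_export <- function(path) {
  read.delim(path, header = TRUE, sep = "\t",
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Self-contained demo: write a tiny file in the assumed layout, then read it.
tmp <- tempfile(fileext = ".txt")
writeLines(c("X\tY\tBreastA.cel_INT\tBreastB.cel_INT",
             "0\t0\t123.0\t110.5",
             "1\t0\t98.5\t101.0"), tmp)
cel <- read_xps_export(tmp)
str(cel)
```

check.names = FALSE keeps the header exactly as written, which makes it easier to match columns back to CEL file names.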

1) probe_id:
The probe_id for each (x,y) can be obtained by exporting:
   export(scheme.exon, treetype="cxy", outfile="HuExon_cxy.txt")
Alternatively, for exon arrays the probe_id can currently be calculated
directly from (x,y):
   probe_id = x + ncol * y + 1
where ncol is the number of columns of the array; for the exon array,
ncol = 2560.
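The formula can be wrapped in two small helper functions. This is a sketch that assumes zero-based x and y coordinates (which the "+ 1" implies, so that the first cell gets probe_id 1); the function names are illustrative and not part of xps.

```r
# Hypothetical helpers (not part of xps): probe_id <-> (x, y).
#   probe_id = x + ncol * y + 1, with zero-based x, y and ncol = 2560.
xy_to_probe_id <- function(x, y, ncol = 2560) {
  x + ncol * y + 1
}

# Inverse: recover the zero-based (x, y) coordinates from a probe_id.
probe_id_to_xy <- function(probe_id, ncol = 2560) {
  idx <- probe_id - 1
  data.frame(x = idx %% ncol, y = idx %/% ncol)
}

ids <- xy_to_probe_id(c(0, 5, 0), c(0, 1, 2))  # 1, 2566, 5121
xy  <- probe_id_to_xy(2566)                    # x = 5, y = 1
```

Because ncol is a parameter, the same helpers would work for other array layouts if their column count is known.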

2) genomic sequence:
I am not sure what you mean by "genomic sequence", but the probe
sequences can be obtained by exporting:
   export(scheme.exon, treetype="prb", outfile="HuExon_prb.txt")
which gives you the probe sequence at each (x,y).

4) probeset_id:
You can get the probeset_id by exporting:
   export(scheme.exon, treetype="pbs", outfile="HuExon_pbs.txt")
The first column contains an internal UNIT_ID, followed by the
probeset_id.
The internal UNIT_ID corresponds to the internal ProbeSetID of file
"HuExon_pbs.txt".
Alternatively, you can get the internal UNIT_ID for each (x,y) by exporting:
   export(scheme.exon, treetype="scm", outfile="HuExon_scm.txt")
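Since both scheme exports carry the UNIT_ID, it acts as the key linking each (x,y) cell to its probeset_id. Below is a minimal sketch using toy data frames; the column names (X, Y, UNIT_ID, PROBESET_ID) and all values are hypothetical stand-ins for the real exports.

```r
# Toy stand-ins for the "scm" and "pbs" exports (names and values hypothetical).
scm <- data.frame(X = c(0, 1, 2, 3), Y = c(0, 0, 0, 0),
                  UNIT_ID = c(10, 10, 11, 12))
pbs <- data.frame(UNIT_ID = c(10, 11, 12),
                  PROBESET_ID = c(900001, 900002, 900003))

# For each (x, y) row, find the pbs row with the same UNIT_ID and copy its
# probeset_id across; a UNIT_ID with no match would yield NA.
scm$PROBESET_ID <- pbs$PROBESET_ID[match(scm$UNIT_ID, pbs$UNIT_ID)]
scm
```

match() keeps the row order of the (x,y) table, so it is a light alternative to merge() when only one column needs to be attached.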

Now you need to combine the data from these files to get the matrix
you want. I know this seems complex, but unfortunately the exon array
has a very complex structure.
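One way to do the combination is to join everything on the shared (x,y) key and then derive probe_id with the formula given under 1). This sketch uses toy tables; it assumes the probeset_id has already been attached to each (x,y) (for example via the UNIT_ID, as described under 4)), and all column names and values are hypothetical. A data.frame is used rather than a matrix because the columns have mixed types.

```r
# Toy per-(x, y) tables standing in for the three exports (all hypothetical).
inten <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 1),
                    INTENSITY = c(123.0, 98.5, 410.2))
prb <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 1),
                  SEQUENCE = c("ACGTACGTACGTACGTACGTACGTA",
                               "TTGCATTGCATTGCATTGCATTGCA",
                               "GGCCAAGGCCAAGGCCAAGGCCAAG"),
                  stringsAsFactors = FALSE)
xy_pbs <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 1),
                     PROBESET_ID = c(900001, 900001, 900002))

# Inner joins on (X, Y): keep only cells present in all three tables.
combined <- merge(merge(inten, prb, by = c("X", "Y")), xy_pbs, by = c("X", "Y"))

# probe_id = x + ncol * y + 1, with ncol = 2560 for the exon array.
combined$PROBE_ID <- combined$X + 2560 * combined$Y + 1

# Requested column order: probe_id, sequence, raw intensity, probeset_id.
result <- combined[order(combined$PROBE_ID),
                   c("PROBE_ID", "SEQUENCE", "INTENSITY", "PROBESET_ID")]
rownames(result) <- NULL
result
```

With 75 CEL files, doing this per file (or per exported subset) keeps each join small.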

BTW, every export method can also return the data directly to R as a
data.frame:
   dataframe <- export(object, ..., as.dataframe=TRUE)

Please let me know whether you succeed with this information.

Best regards
Christian


Steve Piccolo wrote:
> Hi Christian,
>
> One thing I'm trying to do is develop a novel statistical approach for
> "normalizing" exon data. What I need for a given CEL file is basically a
> matrix with the following columns: 1) probe_id, 2) genomic sequence,
> 3) raw intensity value, and 4) probeset_id. Can you tell me a little about
> how to get started in accomplishing this with xps?
>  
> Regards,
> -Steve 
