[BioC] Reading Agilent files using read.maimages

Wed Aug 15 15:49:44 CEST 2007

Hi Steve,

Most of the columns you have are also present in an Agilent output file
(except for the "H_C1 A_mock A and B/" prefix). You appear to have all
the columns needed for the analysis, listed below. (I have left out the
extra header prefix to keep the overview clear; it may be a good idea to
remove it from your files as well.)

- r/gMeanSignal (Mean signal intensity for the red/green channel)
- r/gBGMeanSignal (Mean background signal intensity for the red/green
  channel)
- r/gMedianSignal (Median signal, red/green)
- r/gBGMedianSignal (Median background signal, red/green)
- r/gBGUsed (Background estimated with a specific algorithm called
  spatial detrending. This value is usually lower than the
  r/gBGMe(di)anSignal values and, in my humble opinion, is the one to
  use for background subtraction, if you intend to do that.)
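
If you do want to strip the extra header prefix, something along these
lines might work. It is only a sketch in base R: the `header` vector is
a hand-typed fragment of the column names quoted further down, and the
`prefix` string is an assumption that would differ for each
hybridisation.

```r
# Hand-typed fragment of the ArrayExpress header quoted in this thread
# (illustration only, not read from a real file)
header <- c("Reporter identifier",
            "H_C1 A_mock A and B/FeatureNum",
            "H_C1 A_mock A and B/rMeanSignal",
            "H_C1 A_mock A and B/gBGUsed")

# Per-hybridisation prefix; an assumption, adjust it for each file
prefix <- "H_C1 A_mock A and B/"

# fixed = TRUE treats the prefix as a literal string, not a regular expression
clean_header <- sub(prefix, "", header, fixed = TRUE)
print(clean_header)
```

Columns without the prefix (such as "Reporter identifier") are left
untouched, so the same call can be applied to the whole header line.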

So what I usually do is the following:

    library(limma)

    # Read the targets file (experimental description); see the limma
    # user guide for more information on this
    targets <- readTargets("description.txt", sep="\t", quote="\"")

    # Read the images (datapath is the directory that holds the files
    # listed in targets$FileName)
    Agilent.RG <- read.maimages(targets$FileName, source="agilent",
        path=datapath, names=targets$Description,
        columns=list(R="rMeanSignal", G="gMeanSignal",
                     Rb="rBGUsed", Gb="gBGUsed",
                     Rb.real="rBGMeanSignal", Gb.real="gBGMeanSignal"),
        annotation=c("FeatureNum", "Row", "Col", "ProbeName",
                     "ControlType", "GeneName", "Description",
                     "SystematicName"))

This way you can check both background values, estimated
(Agilent.RG$Rb, Agilent.RG$Gb) versus actually measured
(Agilent.RG$Rb.real, Agilent.RG$Gb.real), during your quality control
checks.
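
As a rough illustration of such a check, the sketch below compares the
two background values per array. It does not use limma at all: the
`Agilent.RG` object here is a plain list of made-up matrices standing in
for the object that read.maimages would return, so the numbers mean
nothing.

```r
# Toy stand-in for the object returned by read.maimages(): a plain list of
# matrices (rows = features, columns = arrays). All numbers are made up.
Agilent.RG <- list(
  Rb      = matrix(c(40, 42, 41, 39, 55, 54, 56, 53), ncol = 2,
                   dimnames = list(NULL, c("array1", "array2"))),
  Rb.real = matrix(c(48, 51, 47, 50, 60, 66, 61, 59), ncol = 2,
                   dimnames = list(NULL, c("array1", "array2")))
)

# Per-array mean difference: measured background minus estimated background
bg_diff <- colMeans(Agilent.RG$Rb.real - Agilent.RG$Rb)
print(bg_diff)

# Fraction of features where the estimate lies below the measured background
frac_lower <- colMeans(Agilent.RG$Rb < Agilent.RG$Rb.real)
print(frac_lower)
```

The same two lines can be repeated for the green channel (Gb and
Gb.real). An array whose difference stands out from the others would be
worth a closer look.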

As Sean mentioned, the normalisation used for the r/gProcessedSignal
columns depends on the scanner type and the software used for image
conversion. If the original Feature Extraction software (and an Agilent
scanner) was used, I would guess a LOESS algorithm (for within-array
normalisation) followed by scaling to a reference value.

If I recall correctly (I keep forgetting the minor details), the
processed signal (e.g. for red) is calculated as
( rMeanSignal - rBGUsed ) --> LOESS normalisation --> scaling -->
processed value. I think each version of the Feature Extraction software
comes with a built-in manual where these procedures are explained more
clearly. I would suggest reading that.
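
To make the order of the steps concrete, here is a minimal sketch in
base R. It is not Agilent's algorithm: base R's `lowess` stands in for
whatever LOESS variant the Feature Extraction software uses, the input
values are invented, and the final scaling step is left out because I
do not know the reference value it uses.

```r
# Made-up raw values for one array; nothing here comes from a real file
n <- 200
i <- seq_len(n)
g_true <- 2^seq(6, 14, length.out = n)                        # green signal
dye_bias <- 0.3 + 0.1 * (log2(g_true) - 10) + 0.02 * sin(i)   # intensity-dependent bias
r_true <- g_true * 2^dye_bias                                 # red signal

rBGUsed <- rep(50, n)
gBGUsed <- rep(40, n)
rMeanSignal <- r_true + rBGUsed
gMeanSignal <- g_true + gBGUsed

# Step 1: background subtraction
r_sub <- rMeanSignal - rBGUsed
g_sub <- gMeanSignal - gBGUsed

# Step 2: intensity-dependent correction on the M/A scale
M <- log2(r_sub) - log2(g_sub)           # log2 ratio
A <- (log2(r_sub) + log2(g_sub)) / 2     # average log2 intensity
fit <- lowess(A, M, f = 0.3)             # smooth trend of M against A
trend <- approx(fit$x, fit$y, xout = A, ties = mean)$y
M_norm <- M - trend

# Back-transform to corrected channel intensities
r_norm <- 2^(A + M_norm / 2)
g_norm <- 2^(A - M_norm / 2)

# Step 3 (scaling to a reference value) is deliberately omitted here
print(c(before = mean(abs(M)), after = mean(abs(M_norm))))
```

Within limma, the corresponding steps on an RGList would normally be
done with backgroundCorrect (method="subtract") and
normalizeWithinArrays (method="loess") rather than by hand.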

The ratio between rProcessedSignal and gProcessedSignal is then
calculated and transformed to a log10 scale! (Note: Bioconductor usually
works on a log2 scale, so don't compare the Agilent LogRatio directly
with the log2 ratios (M-values) you will get from, for instance, the
limma package in R.)
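
If you do want to put the two on the same scale, dividing a log10 ratio
by log10(2) gives the log2 ratio. A small sketch, where the helper name
`log10_to_log2` and the example values are made up for illustration:

```r
# Convert log10 ratios (Agilent LogRatio scale) to log2 ratios
log10_to_log2 <- function(log10_ratio) log10_ratio / log10(2)

# Made-up LogRatio values, not taken from any file
agilent_log_ratio <- c(-0.30103, 0, 0.30103, 1)
print(round(log10_to_log2(agilent_log_ratio), 3))
```

A two-fold change is therefore about 0.301 on the Agilent scale but 1 on
the log2 scale.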

I hope this clarifies things a bit.

  -- Stan

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Steve
Taylor
Sent: 15 August 2007 10:03
To: Sean Davis
Cc: Bioconductor
Subject: Re: [BioC] Reading Agilent files using read.maimages

Hi Sean,

>>A typical header for one of these raw files is
>>
>>CompositeSequence Identifier    Database ebi.ac.uk:Database:embl
Database ebi.ac.uk:Database:ensembl     Database
ebi.ac.uk:Database:locus       Database ebi.ac.uk:Database:refseq
Database 
>>ebi.ac.uk:Database:tigr_thc    Database
www.chem.agilent.com:Database:agc      Database
www.chem.agilent.com:Database:agp      Feature coordinates: metaColumn
metaRow column  row     Reporter control 
>>type   Reporter group  Reporter identifier     Reporter name
Reporter sequence type  H_C1 A_mock A and B/FEATURES    H_C1 A_mock A
and B/FeatureNum  H_C1 A_mock A and B/gbpri       H_C1 A_mock A and 
>>B/gp  H_C1 A_mock A and B/sp  H_C1 A_mock A and B/ProbeUID    H_C1
A_mock A and B/ControlType H_C1 A_mock A and B/ProbeName   H_C1 A_mock A
and B/GeneName    H_C1 A_mock A and B/SystematicName 
>>H_C1 A_mock A and B/Description H_C1 A_mock A and B/LogRatio    H_C1
A_mock A and B/LogRatioError       H_C1 A_mock A and B/PValueLogRatio
H_C1 A_mock A and B/gSurrogateUsed      H_C1 A_mock A 
>>and B/rSurrogateUsed      H_C1 A_mock A and B/gIsFound    H_C1 A_mock
A and B/rIsFound    H_C1 A_mock A and B/gProcessedSignal    H_C1 A_mock
A and B/rProcessedSignal    H_C1 A_mock A and 
>>B/gProcessedSigError  H_C1 A_mock A and B/rProcessedSigError  H_C1
A_mock A and B/gNumPixOLHi H_C1 A_mock A and B/rNumPixOLHi H_C1 A_mock A
and B/gNumPixOLLo H_C1 A_mock A and B/rNumPixOLLo H_C1 
>>A_mock A and B/gNumPix     H_C1 A_mock A and B/rNumPix     H_C1 A_mock
A and B/gMeanSignal H_C1 A_mock A and B/rMeanSignal H_C1 A_mock A and
B/gMedianSignal       H_C1 A_mock A and B/rMedianSignal 
>>    H_C1 A_mock A and B/gPixSDev    H_C1 A_mock A and B/rPixSDev
H_C1 A_mock A and B/gBGNumPix   H_C1 A_mock A and B/rBGNumPix   H_C1
A_mock A and B/gBGMeanSignal       H_C1 A_mock A and 
>>B/rBGMeanSignal       H_C1 A_mock A and B/gBGMedianSignal     H_C1
A_mock A and B/rBGMedianSignal     H_C1 A_mock A and B/gBGPixSDev  H_C1
A_mock A and B/rBGPixSDev  H_C1 A_mock A and B/gNumSatPix 
>>H_C1 A_mock A and B/rNumSatPix  H_C1 A_mock A and B/gIsSaturated
H_C1 A_mock A and B/rIsSaturated        H_C1 A_mock A and
B/PixCorrelation      H_C1 A_mock A and B/BGPixCorrelation    H_C1 
>>A_mock A and B/gIsFeatNonUnifOL    H_C1 A_mock A and
B/rIsFeatNonUnifOL    H_C1 A_mock A and B/gIsBGNonUnifOL      H_C1
A_mock A and B/rIsBGNonUnifOL      H_C1 A_mock A and B/gIsFeatPopnOL
H_C1 
>>A_mock A and B/rIsFeatPopnOL       H_C1 A_mock A and B/gIsBGPopnOL
H_C1 A_mock A and B/rIsBGPopnOL H_C1 A_mock A and B/IsManualFlag
H_C1 A_mock A and B/gBGSubSignal        H_C1 A_mock A and 
>>B/rBGSubSignal        H_C1 A_mock A and B/gBGSubSigError      H_C1
A_mock A and B/rBGSubSigError      H_C1 A_mock A and
B/BGSubSigCorrelation H_C1 A_mock A and B/gIsPosAndSignif     H_C1
A_mock A and 
>>B/rIsPosAndSignif     H_C1 A_mock A and B/gPValFeatEqBG       H_C1
A_mock A and B/rPValFeatEqBG       H_C1 A_mock A and B/gNumBGUsed  H_C1
A_mock A and B/rNumBGUsed  H_C1 A_mock A and B/gIsWellAboveBG 
>>      H_C1 A_mock A and B/rIsWellAboveBG      H_C1 A_mock A and
B/IsUsedBGAdjust      H_C1 A_mock A and B/gBGUsed     H_C1 A_mock A and
B/rBGUsed     H_C1 A_mock A and B/gBGSDUsed   H_C1 A_mock A and 
>>B/rBGSDUsed   H_C1 A_mock A and B/IsNormalization     H_C1 A_mock A
and B/gDyeNormSignal      H_C1 A_mock A and B/rDyeNormSignal      H_C1
A_mock A and B/gDyeNormError       H_C1 A_mock A and 
>>B/rDyeNormError       H_C1 A_mock A and B/DyeNormCorrelation  H_C1
A_mock A and B/ErrorModel
> 
> 
> This is not an Agilent Raw Data file, I do not think.  The column names
> are similar, but ArrayExpress has significantly changed the file from
> its original format.  That said, the columns with "LogRatio",
> "rProcessedSignal" and "gProcessedSignal" are the columns of interest
> that have already been background corrected and, typically, a
> normalization method applied (not sure which one without some more
> description of the scanner settings).
> 

Ok. Thanks. That's useful information. In the protocols section of AE
it says 'Default settings'
(http://www.ebi.ac.uk/aerep/details?class=MAGE.Experiment_protocols&criteria=Experiment%3D921408317&contextClass=MAGE.Protocol&templateName=Protocol.vm).
If that means it has been normalised I will have a look at LogRatio,
rProcessedSignal and gProcessedSignal, though it would be nice to know
how it had been processed...

> 
>>Does this look correct? How do I get access to the intensities, for
>>example to do a boxplot?
> 
> 
> I'm not sure if the files loaded correctly, given my comments above.
> RG$R and RG$G contain the Red and Green intensities, if it loaded
> correctly.
> 

That's what I thought. Thanks for the advice,

Steve
