[BioC] Problems normalizing scanarray express data with limma
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jan 12 05:52:13 CET 2012
Dear Matthew,
This question hasn't been asked for many years! It used to be quite a
common question, see for example:
https://stat.ethz.ch/pipermail/bioconductor/2005-July/009886.html
The problem is not that you have an extra row, but rather that you have
too few rows. Your arrays have 16 blocks (4 x 4) with 6 rows and 14
columns of spots in each block. So limma assumes your arrays to have
4x4x6x14 = 1344 spots, but your files actually contain only 1152 rows of
data. The reason is almost certainly that a number of empty spots have
been removed from the files.
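
The mismatch can be checked directly. A minimal sketch in base R, using the
layout values and row count shown in the session quoted below (with a real
RGList one would presumably compare against nrow(RG$R) rather than a
hard-coded count):

```r
# Printer layout as reported by show(RG) in the quoted session
printer <- list(ngrid.r = 4, ngrid.c = 4, nspot.r = 6, nspot.c = 14)

# Number of spots implied by the layout
expected <- with(printer, ngrid.r * ngrid.c * nspot.r * nspot.c)

# Rows actually present in the data (5 shown + "1147 more rows")
actual <- 5 + 1147

# Spots that appear to have been dropped from the files
n_missing <- expected - actual
cat("expected:", expected, " actual:", actual, " missing:", n_missing, "\n")
```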
One easy workaround is simply to do global loess instead of
print-tip-loess normalization:
MA <- normalizeWithinArrays(RG, method="loess")
Another workaround is to make up a block count variable:
block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]
and then to use the solution that I suggested back in July 2005.
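
To illustrate what the block variable does, here is a rough sketch in base R
on simulated values. It is not the July 2005 solution itself, only one way a
per-block loess could be run once block exists. The data frame genes, the
simulated M and A values, the random removal of 192 spots, and the use of
stats::loess with span = 0.4 are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical annotation for a 4 x 4 grid of blocks, 6 x 14 spots each
genes <- expand.grid(sc = 1:14, sr = 1:6, ac = 1:4, ar = 1:4)
names(genes) <- c("Spot Column", "Spot Row", "Array Column", "Array Row")

# Drop 192 spots at random to mimic files with empty spots removed
genes <- genes[-sample(nrow(genes), 192), ]

# Block index 1..16, counting across array columns within each array row
block <- 4 * (genes[, "Array Row"] - 1) + genes[, "Array Column"]

# Simulated average log-intensities (A) and log-ratios (M) for one array
A <- runif(nrow(genes), 6, 14)
M <- 0.1 * (A - 10) + rnorm(nrow(genes), sd = 0.3)

# Subtract a loess curve fitted separately within each block
M.norm <- M
for (b in sort(unique(block))) {
  i <- block == b
  d <- data.frame(M = M[i], A = A[i])
  fit <- loess(M ~ A, data = d, span = 0.4, degree = 1, family = "symmetric")
  M.norm[i] <- residuals(fit)
}

table(block)
```

Because the fit is indexed by block rather than by the printer layout, it
does not matter that the number of rows differs from 4x4x6x14. With limma
loaded, loessFit() would be the more natural choice inside the loop;
stats::loess is used here only to keep the sketch free of package
dependencies.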
As for deleting the 74 lines of headers and so forth, have you tried
simply running
RG <- read.maimages(targets, source="scanarrayexpress", sep=",")
on your original unedited files? The whole reason for having a
"scanarrayexpress" method for read.maimages() is that it takes care of all
the editing and reading for you.
Best wishes
Gordon
> Date: Tue, 10 Jan 2012 14:34:53 -0500
> From: Matthew Ouellette <ouellet5 at uwindsor.ca>
> To: bioconductor at r-project.org
> Subject: [BioC] Problems normalizing scanarray express data with limma
>
> Hello,
>
> I'm having trouble analyzing my custom arrays with limma. I've searched
> the archives and I seem to be running into a similar problem that was
> previously dealt with here (
> https://stat.ethz.ch/pipermail/bioconductor/2005-October/010482.html).
>
> I'm also using outputs from a scanarray express, although I've modified my
> .csv's accordingly and removed the final line of useless data as indicated
> in the archives. Also, being an R newbie I wasn't sure how to tell R that
> my data started after some 74 lines of headers (output info from the
> scanner), so I deleted those headers out as well (and input $printer info
> manually), leaving only a header for the columns of intensity data. For
> simplicity's sake I've pasted below a shortened session of what I'm trying
> to do (my apologies for the lengthy e-mail). I appreciate the help and
> comments.
>
>
>
> R version 2.14.0 (2011-10-31)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> [R.app GUI 1.42 (5933) i386-apple-darwin9.8.0]
>
>> setwd("***")
>> library(limma)
>> targets<-readTargets()
>> RG <-read.maimages(targets, source="scanarrayexpress",annotation=c("Array
> Row", "Array Column", "Spot Row", "Spot Column", "Name", "ID"),
> other.columns=c("Ch1 SignalNoiseRatio", "Ch2 SignalNoiseRatio"), sep=",")
> Read 01-13_B.csv
> Read 01-13_M.csv
> Read 01-13_T.csv
>> RG$printer <-getLayout2("ChinookBOT.gal")
>> spottypes<-readSpotTypes()
>> RG$genes$Status<- controlStatus(spottypes, RG)
> Matching patterns for: Name
> Found 1116 oligo
> Found 21 blank
> Found 15 serial
> Setting attributes: values Color
>> show(RG)
> An object of class "RGList"
> $G
> 01-13_B 01-13_M 01-13_T
> [1,] 102 119 239
> [2,] 100 122 339
> [3,] 102 135 251
> [4,] 90 112 242
> [5,] 110 141 239
> 1147 more rows ...
>
> $Gb
> 01-13_B 01-13_M 01-13_T
> [1,] 89 94 147
> [2,] 88 84 181
> [3,] 88 91 161
> [4,] 92 90 175
> [5,] 86 87 154
> 1147 more rows ...
>
> $R
> 01-13_B 01-13_M 01-13_T
> [1,] 120 678 202
> [2,] 154 610 312
> [3,] 146 614 306
> [4,] 108 654 310
> [5,] 122 710 291
> 1147 more rows ...
>
> $Rb
> 01-13_B 01-13_M 01-13_T
> [1,] 108 119 135
> [2,] 109 137 159
> [3,] 113 124 169
> [4,] 115 124 180
> [5,] 119 104 159
> 1147 more rows ...
>
> $targets
> FileName Cy3 Cy5
> 1 01-13_B.csv B1 B2
> 2 01-13_M.csv M1 M2
> 3 01-13_T.csv T1 T2
>
> $genes
> Array Row Array Column Spot Row Spot Column Name ID Status
> 1 1 1 1 1 HEATH049 Gene A4 oligo
> 2 1 1 1 2 HEATH049 Gene A4 oligo
> 3 1 1 1 3 HEATH049 Gene A4 oligo
> 4 1 1 1 4 HEATH113 Gene A8 oligo
> 5 1 1 1 5 HEATH113 Gene A8 oligo
> 1147 more rows ...
>
> $source
> [1] "scanarrayexpress"
>
> $other
> $Ch1 SignalNoiseRatio
> 01-13_B 01-13_M 01-13_T
> [1,] 3.06 2.55 3.02
> [2,] 2.72 3.06 2.35
> [3,] 2.68 3.60 3.34
> [4,] 2.51 3.12 0.95
> [5,] 3.33 3.82 2.66
> 1147 more rows ...
>
> $Ch2 SignalNoiseRatio
> 01-13_B 01-13_M 01-13_T
> [1,] 2.31 12.41 2.85
> [2,] 2.42 11.82 3.57
> [3,] 2.66 11.71 4.14
> [4,] 1.75 14.41 0.65
> [5,] 2.09 15.90 4.62
> 1147 more rows ...
>
>
> $printer
> $ngrid.r
> [1] 4
>
> $ngrid.c
> [1] 4
>
> $nspot.r
> [1] 6
>
> $nspot.c
> [1] 14
>
>
>> MA<- normalizeWithinArrays(RG)
> Error in normalizeWithinArrays(RG) :
> printer layout information does not match M row dimension
>
>
>
> --
> Matthew Ouellette, M.Sc. Candidate
> Great Lakes Institute for Environmental Research
> University of Windsor