[BioC] Problems normalizing scanarray express data with limma
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jan 12 23:57:54 CET 2012
Dear Matthew,
On Thu, 12 Jan 2012, Matthew Ouellette wrote:
> Dear Gordon,
>
> I feel I should have come to that conclusion myself; you're correct
> that there are missing spots. However, this is not the result of
> software removing blank spots; it is in fact the way we printed the
> array. Each block consists of 6 rows x 14 columns, but the 6th row
> only has spots in columns 1 and 2 (i.e. there are only 2 spots on row
> 6 in each block). This layout is the result of spotting 384 probes in
> the smallest area possible in order to cut down the amount of RT
> reagents needed to produce significant results.
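As a quick arithmetic check in plain R (no limma needed), the layout described here accounts exactly for the 1152 rows of data in each file. It also accounts for the 192 positions that a full 6 x 14 grid per block would add. The Status counts reported later in the thread (1116 oligo + 21 blank + 15 serial) also sum to 1152.

```r
# Layout described above: 4 x 4 blocks, each with 5 full rows of 14 spots
# plus 2 spots in row 6.
nblocks       <- 4 * 4
spotsPerBlock <- 5 * 14 + 2            # 72 spots per block
spotsPerBlock * nblocks                # 1152 rows, as in the files
fullGrid <- nblocks * 6 * 14           # 1344, what a complete 6 x 14 layout implies
fullGrid - spotsPerBlock * nblocks     # 192 = 16 blocks x 12 empty positions in row 6
```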
>
> As I mentioned earlier, I am very new to R. I am trying to use the
> block count variable you suggested, but I'm having difficulty fitting
> it into the code you suggested in 2005. How would I modify the
> following to fit my particular array?
>
> for (b in 1:48) {
> i <- RG$genes$Block==b
> MA2 <- normalizeWithinArrays(RG[i,],method="loess")
> if(b==1)
> MA <- MA2
> else
> MA <- rbind(MA,MA2)
> }
Replace "48" with "16" and "RG$genes$Block" with "block".
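With those two substitutions, the quoted loop might be written as the small function sketched below. The name `normalizeWithinBlocks` and its `nblocks` and `normalize` arguments are illustrative and are not part of limma. The function uses `which()`, so any spot with a missing block label is skipped. The rows of the result come back grouped by block (block 1 first), which matches the original spot order only if the file is already sorted by block.

```r
# Block-by-block loess normalization: the quoted loop with 48 replaced by
# the number of blocks (16 here) and RG$genes$Block replaced by `block`.
normalizeWithinBlocks <- function(RG, block, nblocks = 16,
                                  normalize = function(x)
                                    limma::normalizeWithinArrays(x, method = "loess")) {
  stopifnot(length(block) == nrow(RG))
  pieces <- lapply(seq_len(nblocks), function(b) {
    i <- which(block == b)   # which() ignores NA block labels
    normalize(RG[i, ])
  })
  # Rows of the result are ordered block 1, block 2, ..., block nblocks
  do.call(rbind, pieces)
}

# Usage with the objects from this thread:
# library(limma)
# block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]
# MA <- normalizeWithinBlocks(RG, block)
```

Passing the normalization step in as a function keeps the loop itself independent of the method. For example, `normalize = function(x) limma::normalizeWithinArrays(x, method = "loess", span = 0.4)` changes the span without touching the loop.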
> As for the input files, I have attempted to use:
>
> RG <-read.maimages(targets, source="scanarrayexpress", sep=",")
>
> on unedited files, but the following warnings come up (when analyzing
> 15 unedited array files this time):
>
>> RG<-read.maimages(targets, source="scanarrayexpress", sep=",")
> Read 01-13_B.csv
> Read 01-13_M.csv
> Read 01-13_T.csv
> Read 01-14_B.csv
> Read 01-14_M.csv
> Read 01-14_T.csv
> Read 01-15_B.csv
> Read 01-15_M.csv
> Read 01-15_T.csv
> Read 01-16_B.csv
> Read 01-16_M.csv
> Read 01-16_T.csv
> Read 01-17_B.csv
> Read 01-17_M.csv
> Read 01-17_T.csv
> There were 45 warnings (use warnings() to see them)
>> warnings()
> Warning messages:
> 1: In grep(a, txt) : input string 1 is invalid in this locale
> 2: In grep(a, txt) : input string 1 is invalid in this locale
> 3: In grep(a, txt) : input string 1 is invalid in this locale
> 4: In grep(a, txt) : input string 1 is invalid in this locale
> 5: In grep(a, txt) : input string 1 is invalid in this locale
> 6: In grep(a, txt) : input string 1 is invalid in this locale
> 7: In grep(a, txt) : input string 1 is invalid in this locale
> 8: In grep(a, txt) : input string 1 is invalid in this locale
> 9: In grep(a, txt) : input string 1 is invalid in this locale
> 10: In grep(a, txt) : input string 1 is invalid in this locale
> [... to 45]
This is most likely caused by your copy of R running in a different
locale (language and character encoding) from the one used by the software
that wrote your data. E.g., it could be that your R is running in American
English but the files were written using an extended French alphabet, so
your files contain non-English letters.
Typing sessionInfo() will reveal the language (locale) your R session is
running in.
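The sketch below shows the locale from within R and finds which lines of a scanner file contain bytes that are not valid UTF-8. It assumes the session runs in a UTF-8 locale (the usual Mac OS X default). It also relies on `validUTF8()`, which needs R 3.3.0 or later, newer than the R 2.14.0 used in this thread. `invalidLines` is a hypothetical helper name, and treating the files as Latin-1 in the final comment is an assumption.

```r
# Locale used for character handling in this session; sessionInfo() reports
# the same information under "locale:".
Sys.getlocale("LC_CTYPE")

# Hypothetical helper: line numbers of a text file that are not valid UTF-8.
invalidLines <- function(path) {
  txt <- readLines(path, warn = FALSE)   # warn = FALSE tolerates a missing final newline
  which(!validUTF8(txt))
}

# Usage (file name from the thread):
# invalidLines("01-13_B.csv")
# If the offending lines turn out to be Latin-1 (an assumption), they could be
# re-encoded with: iconv(readLines("01-13_B.csv"), from = "latin1", to = "UTF-8")
```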
> I have contacted a co-worker about this problem and he claims that he
> doesn't get this error when using R in Windows XP (I am currently
> using Mac OS X). At first, I thought these errors would skew my
> results so I opted to edit the files myself just to get the hang of
> limma.
Nothing to do with Windows or Mac. Probably doesn't affect your limma
results.
Best wishes
Gordon
> I appreciate your help,
>
> Matthew
>
> On Wed, Jan 11, 2012 at 11:52 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>
>> Dear Matthew,
>>
>> This question hasn't been asked for many years! It used to be quite a common question, see for example:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2005-July/009886.html
>>
>> The problem is not that you have an extra row, but rather that you have too few rows. Your arrays have 16 blocks (4 x 4) with 6 rows and 14 columns of spots in each block. So limma assumes your arrays have 4x4x6x14 = 1344 spots, but your files actually contain only 1152 rows of data. The reason is almost certainly that a number of empty spots have been removed from the files.
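The error in the original post ("printer layout information does not match M row dimension") amounts to this mismatch. The sketch below is a rough base-R version of the comparison, using the `$printer` values shown in the original post. `layoutSpots` is a hypothetical helper and is not limma's own code.

```r
# $printer as shown for these arrays: 4 x 4 blocks of 6 x 14 spots
printer <- list(ngrid.r = 4, ngrid.c = 4, nspot.r = 6, nspot.c = 14)

# Hypothetical helper: number of spots a complete printer layout implies
layoutSpots <- function(printer) {
  with(printer, ngrid.r * ngrid.c * nspot.r * nspot.c)
}

layoutSpots(printer)            # 1344
layoutSpots(printer) == 1152    # FALSE: the files hold only 1152 rows
# With a real RGList, compare layoutSpots(RG$printer) with nrow(RG) instead.
```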
>>
>> One easy workaround is simply to do global loess instead of print-tip-loess normalization:
>>
>> MA <- normalizeWithinArrays(RG, method="loess")
>>
>> Another workaround is to make up a block count variable:
>>
>> block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]
>>
>> and then to use the solution that I suggested back in July 2005.
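The formula above numbers the 16 blocks row by row across the 4 x 4 grid. The sketch below wraps it in a hypothetical helper, `blockIndex`, with the grid width as a parameter. It checks the helper on a made-up annotation table laid out like these arrays (72 spots in every block). With the real data you would pass `RG$genes`.

```r
# Hypothetical helper: block number from the "Array Row"/"Array Column"
# annotation columns, numbering blocks row by row across a grid ngrid.c wide.
blockIndex <- function(genes, ngrid.c = 4) {
  ngrid.c * (genes[, "Array Row"] - 1) + genes[, "Array Column"]
}

# Made-up annotation for a 4 x 4 grid with 72 spots per block
genes <- data.frame(
  `Array Row`    = rep(1:4, each = 4 * 72),
  `Array Column` = rep(rep(1:4, each = 72), times = 4),
  check.names = FALSE
)
block <- blockIndex(genes)
table(block)   # each of blocks 1 to 16 has 72 spots
```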
>>
>>
>> With respect to deleting the 74 lines of headers and so forth, have you tried simply running
>>
>> RG <-read.maimages(targets, source="scanarrayexpress", sep=",")
>>
>> on your original unedited files? The whole reason for having a "scanarrayexpress" method for read.maimages() is that it takes care of all the editing and reading for you.
>>
>> Best wishes
>> Gordon
>>
>>
>>> Date: Tue, 10 Jan 2012 14:34:53 -0500
>>> From: Matthew Ouellette <ouellet5 at uwindsor.ca>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] Problems normalizing scanarray express data with limma
>>>
>>> Hello,
>>>
>>> I'm having trouble analyzing my custom arrays with limma. I've searched
>>> the archives and I seem to be running into a similar problem that was
>>> previously dealt with here (
>>> https://stat.ethz.ch/pipermail/bioconductor/2005-October/010482.html).
>>>
>>> I'm also using output from a ScanArray Express scanner, although I've modified my
>>> .csv files accordingly and removed the final line of useless data as indicated
>>> in the archives. Also, being an R newbie, I wasn't sure how to tell R that
>>> my data started after some 74 lines of headers (output info from the
>>> scanner), so I deleted those headers as well (and entered the $printer info
>>> manually), leaving only a header for the columns of intensity data. For
>>> simplicity's sake I've pasted below a shortened session of what I'm trying
>>> to do (my apologies for the lengthy e-mail). I appreciate the help and
>>> comments.
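Elsewhere in the thread, the recommended route is to let `read.maimages(source = "scanarrayexpress")` handle the header lines. To simply inspect a file in base R, one option is to locate the column-header line and have `read.csv()` skip everything above it. This sketch assumes the header line is the first line containing "Array Row", one of the column names used in this thread. `readAfterHeader` is a hypothetical helper.

```r
# Hypothetical helper: read a scanner .csv starting at its column-header line.
readAfterHeader <- function(path, marker = "Array Row") {
  txt <- readLines(path, warn = FALSE)
  # fixed = TRUE matches the marker literally; useBytes = TRUE matches on raw
  # bytes, so lines that are invalid in the current locale are not an issue
  start <- grep(marker, txt, fixed = TRUE, useBytes = TRUE)[1]
  if (is.na(start)) stop("marker not found in ", path)
  read.csv(path, skip = start - 1, check.names = FALSE)
}

# Usage (file name from the thread):
# dat <- readAfterHeader("01-13_B.csv")
```

Any trailing non-data line at the end of the file (the "final line of useless data" mentioned above) would still come through as an extra row and would need to be dropped separately.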
>>>
>>>
>>>
>>> R version 2.14.0 (2011-10-31)
>>> Copyright (C) 2011 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>> [R.app GUI 1.42 (5933) i386-apple-darwin9.8.0]
>>>
>>>> setwd("***")
>>>> library(limma)
>>>> targets<-readTargets()
>>>> RG <- read.maimages(targets, source="scanarrayexpress",
>>> + annotation=c("Array Row", "Array Column", "Spot Row", "Spot Column", "Name", "ID"),
>>> + other.columns=c("Ch1 SignalNoiseRatio", "Ch2 SignalNoiseRatio"), sep=",")
>>> Read 01-13_B.csv
>>> Read 01-13_M.csv
>>> Read 01-13_T.csv
>>>>
>>>> RG$printer <-getLayout2("ChinookBOT.gal")
>>>> spottypes<-readSpotTypes()
>>>> RG$genes$Status<- controlStatus(spottypes, RG)
>>>
>>> Matching patterns for: Name
>>> Found 1116 oligo
>>> Found 21 blank
>>> Found 15 serial
>>> Setting attributes: values Color
>>>>
>>>> show(RG)
>>>
>>> An object of class "RGList"
>>> $G
>>> 01-13_B 01-13_M 01-13_T
>>> [1,] 102 119 239
>>> [2,] 100 122 339
>>> [3,] 102 135 251
>>> [4,] 90 112 242
>>> [5,] 110 141 239
>>> 1147 more rows ...
>>>
>>> $Gb
>>> 01-13_B 01-13_M 01-13_T
>>> [1,] 89 94 147
>>> [2,] 88 84 181
>>> [3,] 88 91 161
>>> [4,] 92 90 175
>>> [5,] 86 87 154
>>> 1147 more rows ...
>>>
>>> $R
>>> 01-13_B 01-13_M 01-13_T
>>> [1,] 120 678 202
>>> [2,] 154 610 312
>>> [3,] 146 614 306
>>> [4,] 108 654 310
>>> [5,] 122 710 291
>>> 1147 more rows ...
>>>
>>> $Rb
>>> 01-13_B 01-13_M 01-13_T
>>> [1,] 108 119 135
>>> [2,] 109 137 159
>>> [3,] 113 124 169
>>> [4,] 115 124 180
>>> [5,] 119 104 159
>>> 1147 more rows ...
>>>
>>> $targets
>>> FileName Cy3 Cy5
>>> 1 01-13_B.csv B1 B2
>>> 2 01-13_M.csv M1 M2
>>> 3 01-13_T.csv T1 T2
>>>
>>> $genes
>>>   Array Row Array Column Spot Row Spot Column     Name      ID Status
>>> 1         1            1        1           1 HEATH049 Gene A4  oligo
>>> 2         1            1        1           2 HEATH049 Gene A4  oligo
>>> 3         1            1        1           3 HEATH049 Gene A4  oligo
>>> 4         1            1        1           4 HEATH113 Gene A8  oligo
>>> 5         1            1        1           5 HEATH113 Gene A8  oligo
>>> 1147 more rows ...
>>>
>>> $source
>>> [1] "scanarrayexpress"
>>>
>>> $other
>>> $Ch1 SignalNoiseRatio
>>> 01-13_B 01-13_M 01-13_T
>>> [1,] 3.06 2.55 3.02
>>> [2,] 2.72 3.06 2.35
>>> [3,] 2.68 3.60 3.34
>>> [4,] 2.51 3.12 0.95
>>> [5,] 3.33 3.82 2.66
>>> 1147 more rows ...
>>>
>>> $Ch2 SignalNoiseRatio
>>> 01-13_B 01-13_M 01-13_T
>>> [1,] 2.31 12.41 2.85
>>> [2,] 2.42 11.82 3.57
>>> [3,] 2.66 11.71 4.14
>>> [4,] 1.75 14.41 0.65
>>> [5,] 2.09 15.90 4.62
>>> 1147 more rows ...
>>>
>>>
>>> $printer
>>> $ngrid.r
>>> [1] 4
>>>
>>> $ngrid.c
>>> [1] 4
>>>
>>> $nspot.r
>>> [1] 6
>>>
>>> $nspot.c
>>> [1] 14
>>>
>>>
>>>> MA<- normalizeWithinArrays(RG)
>>>
>>> Error in normalizeWithinArrays(RG) :
>>> printer layout information does not match M row dimension
>>>
>>>
>
>
> --
> Matthew Ouellette, M.Sc. Candidate
> Great Lakes Institute for Environmental Research
> University of Windsor
> 401 Sunset Ave., Windsor, ON, N9B 3P4
> Phone: (519) 253-3000, Ext 4248
> Fax: (519) 971-3616
> Email: ouellet5 at uwindsor.ca
>