[BioC] Problems normalizing scanarray express data with limma

Thu Jan 12 23:57:54 CET 2012

Dear Matthew,

On Thu, 12 Jan 2012, Matthew Ouellette wrote:

> Dear Gordon,
>
> I feel I should have come to that conclusion myself - you're correct
> that there are missing spots.  However, this is not the result of
> software removing blank spots; it is in fact the way we printed the
> array.  Each block consists of 6 rows x 14 columns, but the 6th row
> only has spots in columns 1 and 2 (i.e. there are only 2 spots on row
> 6 in each block).  This layout is the result of spotting 384 probes in
> the smallest area possible in order to cut down the amount of RT
> reagents needed to produce significant results.
>
> As I mentioned earlier, I am very new to R.  I am in the process of
> attempting to use the block count variable you suggested, however I'm
> having difficulties adjusting it to the code you suggested in 2005.
> How would I modify the following to fit my particular array?
>
> for (b in 1:48) {
>         i <- RG$genes$Block==b
>         MA2 <- normalizeWithinArrays(RG[i,],method="loess")
>         if(b==1)
>                 MA <- MA2
>         else
>                 MA <- rbind(MA,MA2)
> }

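In that loop, replace "48" with "16" (your arrays have 4 x 4 = 16 blocks)
and "RG$genes$Block" with "block", the block count variable from my
previous email.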
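
As a minimal sketch of the loop with those two substitutions made: the
RGList below is synthetic stand-in data, invented only so that the example
runs on its own (the intensities and the helper sim() are assumptions, not
your data).  With your real arrays you would skip that part and use the RG
returned by read.maimages().  The layout follows what you describe, 16
blocks of 72 spots (5 full rows of 14 plus 2 spots), i.e. 1152 rows.

```r
library(limma)

# Synthetic stand-in for the real data (an assumption for illustration only):
# 4 x 4 blocks, 72 spots per block (5 full rows of 14 plus 2 spots), 3 arrays.
set.seed(1)
genes <- data.frame(
  "Array Row"    = rep(rep(1:4, each = 4), each = 72),
  "Array Column" = rep(rep(1:4, times = 4), each = 72),
  check.names = FALSE
)
n <- nrow(genes)                      # 16 * 72 = 1152 rows, as in the files
sim <- function(lo, hi) matrix(runif(n * 3, lo, hi), nrow = n, ncol = 3)
RG <- new("RGList", list(R = sim(200, 5000), G = sim(200, 5000),
                         Rb = sim(50, 100), Gb = sim(50, 100),
                         genes = genes))

# Block count variable: 1..16, numbered across the rows of the 4 x 4 grid.
block <- 4 * (RG$genes[, "Array Row"] - 1) + RG$genes[, "Array Column"]
table(block)                          # spots per block

# Loess normalization applied separately within each block, then the
# per-block results are stacked back together.
for (b in 1:16) {
  i <- block == b
  MA2 <- normalizeWithinArrays(RG[i, ], method = "loess")
  if (b == 1)
    MA <- MA2
  else
    MA <- rbind(MA, MA2)
}
dim(MA)
```

Note that the rows of MA come out ordered by block, so if your files are
not already sorted that way the row order of MA will differ from that of RG.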

> As for the input files, I have attempted to use:
>
> RG <-read.maimages(targets, source="scanarrayexpress", sep=",")
>
> on unedited files, but the following warnings come up (when analyzing
> 15 unedited array files this time):
>
>> RG<-read.maimages(targets, source="scanarrayexpress", sep=",")
> Read 01-13_B.csv
> Read 01-13_M.csv
> Read 01-13_T.csv
> Read 01-14_B.csv
> Read 01-14_M.csv
> Read 01-14_T.csv
> Read 01-15_B.csv
> Read 01-15_M.csv
> Read 01-15_T.csv
> Read 01-16_B.csv
> Read 01-16_M.csv
> Read 01-16_T.csv
> Read 01-17_B.csv
> Read 01-17_M.csv
> Read 01-17_T.csv
> There were 45 warnings (use warnings() to see them)
>> warnings()
> Warning messages:
> 1: In grep(a, txt) : input string 1 is invalid in this locale
> 2: In grep(a, txt) : input string 1 is invalid in this locale
> 3: In grep(a, txt) : input string 1 is invalid in this locale
> 4: In grep(a, txt) : input string 1 is invalid in this locale
> 5: In grep(a, txt) : input string 1 is invalid in this locale
> 6: In grep(a, txt) : input string 1 is invalid in this locale
> 7: In grep(a, txt) : input string 1 is invalid in this locale
> 8: In grep(a, txt) : input string 1 is invalid in this locale
> 9: In grep(a, txt) : input string 1 is invalid in this locale
> 10: In grep(a, txt) : input string 1 is invalid in this locale
> [... to 45]

This is most likely because your copy of R is running in a different
locale (language and character encoding) from the one used by the software
that wrote your data.  E.g., it could be that your R is set to American
English but the files were written using an extended French alphabet, so
your files contain non-English letters.

Typing sessionInfo() will reveal the locale your R session is running in.
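
To illustrate the kind of mismatch I mean, here is a small base R sketch.
The example string and the assumption that the source encoding is Latin-1
are hypothetical; I don't know what encoding your scanner software actually
used.  validUTF8() needs a more recent R than the 2.14.0 used in this
thread.

```r
# A Latin-1 encoded string ("cafe" with an acute accent, byte 0xE9), built
# from raw bytes so the example does not depend on this script's encoding.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

Sys.getlocale("LC_CTYPE")   # character-encoding part of the current locale
validUTF8(x)                # FALSE: these bytes are not valid UTF-8

# Re-encode, assuming (hypothetically) that the source was Latin-1.
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)                # TRUE
```

A string like x is what triggers "input string 1 is invalid in this locale"
when grep() meets it in a UTF-8 locale.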

> I have contacted a co-worker about this problem and he claims that he
> doesn't get this error when using R in Windows XP (I am currently
> using Mac OS X).  At first, I thought these errors would skew my
> results so I opted to edit the files myself just to get the hang of
> limma.

Nothing to do with Windows or Mac.  Probably doesn't affect your limma 
results.

Best wishes
Gordon

> I appreciate your help,
>
> Matthew
>
> On Wed, Jan 11, 2012 at 11:52 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>
>> Dear Matthew,
>>
>> This question hasn't been asked for many years!  It used to be quite a common question, see for example:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2005-July/009886.html
>>
>> The problem is not that you have an extra row, but rather that you have too few rows.  Your arrays have 16 blocks (4 x 4) with 6 rows and 14 columns of spots in each block.  So limma assumes your arrays to have 4x4x6x14 = 1344 spots, but your files actually contain only 1152 rows of data.  The reason is almost certainly that a number of empty spots have been removed from the files.
>>
>> One easy workaround is simply to do global loess instead of print-tip-loess normalization:
>>
>>  MA <- normalizeWithinArrays(RG, method="loess")
>>
>> Another workaround is to make up a block count variable:
>>
>>  block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]
>>
>> and then to use the solution that I suggested back in July 2005.
>>
>>
>> With respect to the deleting of 74 lines of headers and so forth, have you tried simply using
>>
>>  RG <-read.maimages(targets, source="scanarrayexpress", sep=",")
>>
>> using your original unedited files?  The whole reason for having a "scanarrayexpress" method for read.maimages() is that it takes care of all the editing and reading for you.
>>
>> Best wishes
>> Gordon
>>
>>
>>> Date: Tue, 10 Jan 2012 14:34:53 -0500
>>> From: Matthew Ouellette <ouellet5 at uwindsor.ca>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] Problems normalizing scanarray express data with limma
>>>
>>> Hello,
>>>
>>> I'm having trouble analyzing my custom arrays with limma.  I've searched
>>> the archives and I seem to be running into a similar problem that was
>>> previously dealt with here (
>>> https://stat.ethz.ch/pipermail/bioconductor/2005-October/010482.html).
>>>
>>> I'm also using outputs from a scanarray express, although I've modified my
>>> .csv's accordingly and removed the final line of useless data as indicated
>>> in the archives.  Also, being an R newbie I wasn't sure how to tell R that
>>> my data started after some 74 lines of headers (output info from the
>>> scanner), so I deleted those headers out as well (and input $printer info
>>> manually), leaving only a header for the columns of intensity data.   For
>>> simplicity's sake I've pasted below a shortened session of what I'm trying
>>> to do (my apologies for the lengthy e-mail).  I appreciate the help and
>>> comments.
>>>
>>>
>>>
>>> R version 2.14.0 (2011-10-31)
>>> Copyright (C) 2011 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>> [R.app GUI 1.42 (5933) i386-apple-darwin9.8.0]
>>>
>>>> setwd("***")
>>>> library(limma)
>>>> targets<-readTargets()
>>>> RG <-read.maimages(targets, source="scanarrayexpress", annotation=c("Array Row", "Array Column", "Spot Row", "Spot Column", "Name", "ID"), other.columns=c("Ch1 SignalNoiseRatio", "Ch2 SignalNoiseRatio"), sep=",")
>>> Read 01-13_B.csv
>>> Read 01-13_M.csv
>>> Read 01-13_T.csv
>>>>
>>>> RG$printer <-getLayout2("ChinookBOT.gal")
>>>> spottypes<-readSpotTypes()
>>>> RG$genes$Status<- controlStatus(spottypes, RG)
>>>
>>> Matching patterns for: Name
>>> Found 1116 oligo
>>> Found 21 blank
>>> Found 15 serial
>>> Setting attributes: values Color
>>>>
>>>> show(RG)
>>>
>>> An object of class "RGList"
>>> $G
>>>    01-13_B 01-13_M 01-13_T
>>> [1,]     102     119     239
>>> [2,]     100     122     339
>>> [3,]     102     135     251
>>> [4,]      90     112     242
>>> [5,]     110     141     239
>>> 1147 more rows ...
>>>
>>> $Gb
>>>    01-13_B 01-13_M 01-13_T
>>> [1,]      89      94     147
>>> [2,]      88      84     181
>>> [3,]      88      91     161
>>> [4,]      92      90     175
>>> [5,]      86      87     154
>>> 1147 more rows ...
>>>
>>> $R
>>>    01-13_B 01-13_M 01-13_T
>>> [1,]     120     678     202
>>> [2,]     154     610     312
>>> [3,]     146     614     306
>>> [4,]     108     654     310
>>> [5,]     122     710     291
>>> 1147 more rows ...
>>>
>>> $Rb
>>>    01-13_B 01-13_M 01-13_T
>>> [1,]     108     119     135
>>> [2,]     109     137     159
>>> [3,]     113     124     169
>>> [4,]     115     124     180
>>> [5,]     119     104     159
>>> 1147 more rows ...
>>>
>>> $targets
>>>    FileName Cy3 Cy5
>>> 1 01-13_B.csv  B1  B2
>>> 2 01-13_M.csv  M1  M2
>>> 3 01-13_T.csv  T1  T2
>>>
>>> $genes
>>>  Array Row Array Column Spot Row Spot Column     Name      ID Status
>>> 1         1            1        1           1 HEATH049 Gene A4  oligo
>>> 2         1            1        1           2 HEATH049 Gene A4  oligo
>>> 3         1            1        1           3 HEATH049 Gene A4  oligo
>>> 4         1            1        1           4 HEATH113 Gene A8  oligo
>>> 5         1            1        1           5 HEATH113 Gene A8  oligo
>>> 1147 more rows ...
>>>
>>> $source
>>> [1] "scanarrayexpress"
>>>
>>> $other
>>> $Ch1 SignalNoiseRatio
>>>    01-13_B 01-13_M 01-13_T
>>> [1,]    3.06    2.55    3.02
>>> [2,]    2.72    3.06    2.35
>>> [3,]    2.68    3.60    3.34
>>> [4,]    2.51    3.12    0.95
>>> [5,]    3.33    3.82    2.66
>>> 1147 more rows ...
>>>
>>> $Ch2 SignalNoiseRatio
>>>    01-13_B 01-13_M 01-13_T
>>> [1,]    2.31   12.41    2.85
>>> [2,]    2.42   11.82    3.57
>>> [3,]    2.66   11.71    4.14
>>> [4,]    1.75   14.41    0.65
>>> [5,]    2.09   15.90    4.62
>>> 1147 more rows ...
>>>
>>>
>>> $printer
>>> $ngrid.r
>>> [1] 4
>>>
>>> $ngrid.c
>>> [1] 4
>>>
>>> $nspot.r
>>> [1] 6
>>>
>>> $nspot.c
>>> [1] 14
>>>
>>>
>>>> MA<- normalizeWithinArrays(RG)
>>>
>>> Error in normalizeWithinArrays(RG) :
>>>  printer layout information does not match M row dimension
>>>
>>>
>
>
> --
> Matthew Ouellette, M.Sc. Candidate
> Great Lakes Institute for Environmental Research
> University of Windsor
> 401 Sunset Ave., Windsor, ON, N9B 3P4
> Phone: (519) 253-3000, Ext 4248
> Fax: (519) 971-3616
> Email: ouellet5 at uwindsor.ca
>
