[BioC] limma and marray data import problem

Gordon K Smyth smyth at wehi.EDU.AU
Wed Jun 4 06:56:26 CEST 2008


Dear Piotr,

I can't diagnose your problem, because the shortened version of your data 
file that you emailed reads fine for me when I put the lines in a text 
file, as I show below.  I used sep="" in my code because email doesn't 
preserve tab separators.  Presumably the problem appears further into the 
file, perhaps near the bottom.  Or else your file has inconsistent 
separators.
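
One way to check for inconsistent separators is to count the fields on every line. The following is only a minimal sketch in base R: it writes a small mock file to a temporary path (the contents are invented for illustration) and flags lines whose field count differs from the header line. For a real gpr file the `skip` argument of `count.fields` would presumably be needed to pass over the header records.

```r
# Mock tab-separated file: a header line plus four data rows, one of which
# is missing its last field. Path and contents are illustrative only.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Block\tColumn\tRow\tName\tID",
  "1\t1\t1\tERG_Operon\t2078",
  "1\t2\t1\tERG_Operon\t2078",
  "1\t3\t1\tERG_Operon",
  "1\t4\t1\tFLT3_Operon\t2322"
), path)

# Count tab-separated fields on every line. quote = "" and comment.char = ""
# stop quote or '#' characters in names from changing the counts.
n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Lines (numbered from the top of the file) whose count differs from line 1.
bad_lines <- which(n_fields != n_fields[1])
print(table(n_fields))
print(bad_lines)
```

A table with more than one distinct count points to the lines worth inspecting by hand.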

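Can you try reading the file with the arguments nrows=2 and then nrows=2640 
(the total number of spots)?  If the first works and the second fails, the 
problem lies further down the file.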
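
The idea behind varying nrows can be sketched with base R `read.table`, to which `read.maimages` passes extra arguments. This is an illustration on an invented file, not a reproduction of the problem: the mock data, the helper name `reads_ok`, and the position of the damaged row are all assumptions.

```r
# Mock file with ten data rows; row 8 is deliberately missing a field.
path <- tempfile(fileext = ".txt")
rows <- sprintf("1\t%d\tspot%d", 1:10, 1:10)
rows[8] <- "1\t8"
writeLines(c("Block\tColumn\tName", rows), path)

# Hypothetical helper: TRUE if the first n data rows read without error.
reads_ok <- function(n) {
  res <- try(read.table(path, header = TRUE, sep = "\t", nrows = n),
             silent = TRUE)
  !inherits(res, "try-error")
}

# Try every prefix length and report the first one that fails.
ok <- vapply(1:10, reads_ok, logical(1))
first_bad <- which(!ok)[1]
print(first_bad)
```

On a file with thousands of rows one would try a handful of nrows values rather than every one, narrowing the range each time.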

I would also expect the csv file to read with the following, using the 
column names that appear in your csv header:

   read.maimages("file.csv",columns=list(G="Ch1 Median",Gb="Ch1 B Median",
      R="Ch2 Median",Rb="Ch2 B Median"),sep=",",nrows=2640)
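
To look at the csv data block outside limma, a rough base R sketch follows. It imitates the section layout quoted later in this thread with a tiny mock file; the trailing "END DATA" line is an assumption (the posted excerpt stops before the end of the file), and the column subset is shortened for illustration.

```r
# Mock file imitating the ScanArray Express csv layout quoted in this thread.
# The final "END DATA" line is assumed, not confirmed by the post.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "BEGIN ARRAY PATTERN INFO",
  "Total Spots,3",
  "END ARRAY PATTERN INFO",
  "",
  "BEGIN DATA",
  "Index,Name,Ch1 Median,Ch1 B Median,Ch2 Median,Ch2 B Median",
  "1,\"ERG_Operon\",5946,2490,1604,683",
  "2,\"ERG_Operon\",5368,2330,1624,651",
  "3,\"ERG_Operon\",4368,2206,1476,646",
  "END DATA"
), path)

lines <- readLines(path)

# Number of spots as declared in the header block.
n_spots <- as.integer(sub("^Total Spots,", "",
                          grep("^Total Spots,", lines, value = TRUE)))

# The column headings sit on the line after "BEGIN DATA".
skip <- grep("^BEGIN DATA", lines)

# nrows stops the read before any trailing non-data lines;
# check.names = FALSE keeps headings such as "Ch1 Median" unchanged.
spots <- read.csv(path, skip = skip, nrows = n_spots, check.names = FALSE)
print(spots[, c("Ch1 Median", "Ch1 B Median", "Ch2 Median", "Ch2 B Median")])
```

Limiting the read with nrows is what keeps any lines after the data block from being parsed as spots.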

Best wishes
Gordon

My code:

> read.maimages("temp.txt",source="genepix",columns=list(G="F543 Median", 
Gb="B543 Median", R="F633 Median", Rb="B633 Median"),sep="")
Read temp.txt
An object of class "RGList"
$G
      temp
[1,] 5946
[2,] 5368

$Gb
      temp
[1,] 2490
[2,] 2330

$R
      temp
[1,] 1604
[2,] 1624

$Rb
      temp
[1,]  683
[2,]  651

$targets
      FileName
temp temp.txt

$genes
   Block Row Column   ID       Name
1     1   1      1 2078 ERG_Operon
2     1   1      2 2078 ERG_Operon

$source
[1] "genepix"

$printer
$ngrid.r
[1] 1

$ngrid.c
[1] 1

$nspot.r
[1] 1

$nspot.c
[1] 2

attr(,"class")
[1] "PrintLayout"
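
On the "duplicate 'row.names' are not allowed" error quoted below: base R `read.table` treats the first column as row names when the header line has one fewer field than the data rows, and `row.names=NULL` suppresses that at the cost of misaligned headings. The sketch below shows this on an invented file. Whether the same mechanism is at work inside `read.maimages` for these particular files is an assumption.

```r
# Mock file whose data rows have one more field than the header line.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Block\tColumn\tName",
  "1\t1\tERG_Operon\t2078",
  "1\t2\tERG_Operon\t2078"
), path)

# Default: the first column is taken as row names, and since its values
# repeat, the read fails.
default_read <- try(read.table(path, header = TRUE, sep = "\t"),
                    silent = TRUE)
print(inherits(default_read, "try-error"))

# row.names = NULL avoids the error, but the headings now sit one column
# to the right of the data they belong to.
shifted <- read.table(path, header = TRUE, sep = "\t", row.names = NULL)
print(shifted)
```

In the second read the first data column is labelled "row.names", and the values under "Block" are really the Column values.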


On Mon, 2 Jun 2008, Piotr Stępniak wrote:

> Dear Gordon,
>
> Thank you for your reply.
>
> I tried using source="genepix"; it did not work any better than "scanarray".
> The following commands give:
>
>> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix")
> Error in read.table(file = file, header = TRUE, col.names = allcnames,  :
>  duplicate 'row.names' are not allowed
>
> It turns out the format is not 100% valid GenePix, e.g. it does not
> have any index column, so I tried this:
>
>> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix", row.names=NULL)
> Error in RG[[a]][, i] <- obj[, columns[[a]]] :
>  number of items to replace is not a multiple of replacement length
> In addition: Warning message:
> In getLayout(RG$genes, guessdups = FALSE) : NAs introduced by coercion
>
> I tried different parameter combinations which got me to the command
> you've seen in the previous messages (I'm sorry for sending it 3
> times...).
>
> The file is finally read, but wrongly as described earlier.
>
> The same happens with the gal file:
>
>> gal<-readGAL("Bialko.gal")
> Error in read.table(file = file, header = TRUE, col.names = allcnames,  :
>  duplicate 'row.names' are not allowed
>
>> gal<-readGAL("Bialko.gal", row.names=NULL)
> Error in if (is.int(totalPlate)) { : argument is of length zero
>
> To answer your further questions briefly:
> 2. Yes, these are the files straight from the scanner software.
> ScanArrayExpress also offers csv export, but reading those files is another
> problem. They do have an Index column, and
>> bialkoRaw<- read.maimages( dir(pattern="csv"), columns=list(G="Ch1\ Median", Gb="Ch1\ B\ Median", R="Ch2\ Median", Rb="Ch2\ B\ Median"), sep=",")
> reads the file, and the values are under the correct columns, but no
> printer layout is read and other functions that process the data give:
> Error in if (is.int(totalPlate)) { : argument is of length zero
>
> 3. Yes, I'd be happy if you would look at it:
>
> Beginning of GPR file:
>
> ATF	1.0
>
> 21	82
>
> "Type=GenePix Results 2"
>
> "DateTime=2008/03/28 10:30:03"
>
> "Settings=Easy Quant"
>
> "GalFile=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\BIALACZKI_2_25luty2008_popr.gal"
>
> "Scanner=Model: Express Serial No.: 432617"
>
> "Comment=<F1>Alexa 555<F2>Alexa 647<F1 Offset>0,0<F2 Offset>0,0<Comment>"
>
> "PixelSize=10"
>
> "Wavelengths=543 nm	633 nm"
>
> "ImageFiles=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\12_03_2008\Skan
> Agi\HL60_szk13_PMT65_roz10_Alexa555.tif	D:\Luiza\Grant
> bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan
> Agi\26sz_szk13_PMT60_roz10_Alexa647.tif"
>
> "PMTGain=65	60"
>
> "NormalizationMethod=LOWESS"
>
> "NormalizationFactors=0.000	0.000"
>
> "JpegImage="
>
> "RatioFormulations=W2/W1(633/543)"
>
> "Barcode="
>
> "ImageOrigin=1500	11600"
>
> "JpegOrigin=0	0"
>
> "Creator=ScanArray Express, Microarray Analysis System 3.0.0.16"
>
> "Temperature=0.0"
>
> "LaserPower=90	90	0	0"
>
> "LaserOnTime=0	0	0	0"
>
> "Block"	"Column"	"Row"	"Name"	"ID"	"X"	"Y"	"Dia."	"F543 Median"	"F543
> Mean"	"F543 SD"	"B543 Median"	"B543 Mean"	"B543 SD"	"% > B543+1SD"	"% > B543+2SD"	"F543 % Sat."	"F633 Median"	"F633 Mean"	"F633 SD"	"B633
> Median"	"B633 Mean"	"B633 SD"	"% > B633+1SD"	"% > B633+2SD"	"F633 %
> Sat."	"F3 Median"	"F3 Mean"	"F3 SD"	"B3 Median"	"B3 Mean"	"B3 SD"	"% >
> B3+1SD"	"% > B3+2SD"	"F3 % Sat."	"F4 Median"	"F4 Mean"	"F4 SD"	"B4
> Median"	"B4 Mean"	"B4 SD"	"% > B4+1SD"	"% > B4+2SD"	"F4 % Sat."	"Ratio
> of Medians (633/543)"	"Ratio of Means (633/543)"	"Median of Ratios
> (633/543)"	"Mean of Ratios (633/543)"	"Ratios SD (633/543)"	"Rgn Ratio
> (633/543)"	"Rgn R² (633/543)"	"Ratio of Medians (Ratio/2)"	"Ratio of
> Means (Ratio/2)"	"Median of Ratios (Ratio/2)"	"Mean of Ratios
> (Ratio/2)"	"Ratios SD (Ratio/2)"	"Rgn Ratio (Ratio/2)"	"Rgn R²
> (Ratio/2)"	"Ratio of Medians (Ratio/3)"	"Ratio of Means
> (Ratio/3)"	"Median of Ratios (Ratio/3)"	"Mean of Ratios
> (Ratio/3)"	"Ratios SD (Ratio/3)"	"Rgn Ratio (Ratio/3)"	"Rgn R²
> (Ratio/3)"	"F Pixels"	"B Pixels"	"Sum of Medians"	"Sum of Means"	"Log
> Ratio (633/543)"	"Log Ratio (Ratio/2)"	"Log Ratio (Ratio/3)"	"F543
> Median - B543"	"F633 Median - B633"	"F3 Median - B3"	"F4 Median -
> B4"	"F543 Mean - B543"	"F633 Mean - B633"	"F3 Mean - B3"	"F4 Mean -
> B4"	"Flags"	"Normalize"
>
> 1	1	1	ERG_Operon	2078	2805	13125	230	5946	6035	1754	2490	2506	529	97	92	0	1604	1636	517	683	698	194	94	84	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.266	0.269	0.270	0.329	0.329	0.232	0.621	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	384	734	4377	4498	-1.908	0.000	0.000	3456	921	0	0	3545	953	0	0	100	1
>
> 1	2	1	ERG_Operon	2078	3250	13128	220	5368	5457	1634	2330	2378	537	96	91	0	1624	1651	531	651	671	188	95	88	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.320	0.320	0.318	0.567	0.567	0.254	0.608	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	351	858	4011	4127	-1.643	0.000	0.000	3038	973	0	0	3127	1000	0	0	100	1
>
> 1	3	1	ERG_Operon	2078	3698	13124	220	4368	4676	1646	2206	2240	490	90	81	0	1476	1562	592	646	673	182	90	80	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.384	0.371	0.377	0.498	0.498	0.281	0.610	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	348	858	2992	3386	-1.381	0.000	0.000	2162	830	0	0	2470	916	0	0	100	1
>
> And for comparison here is a corresponding csv:
>
> BEGIN HEADER
>
> PerkinElmer Inc.
>
> ScanArrayCSVFileFormat,2.00
>
> ScanArray Express,2.00
>
> Number_of_Columns,62
>
> END HEADER
>
>
>
> BEGIN GENERAL INFO
>
> DateTime,2008/03/28 10:30
>
> GalFile,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\BIALACZKI_2_25luty2008_popr.gal
>
> Scanner,Model: Express Serial No.: 432617
>
> User Name,Luiza
>
> Computer Name,
>
> Protocol,Easy Quant
>
> Quantitation Method,Adaptive Circle
>
> Quality Confidence Calculation,Footprint
>
> User comments,
>
> Image Origin,1500,11600
>
> Temperature,0
>
> Laser Powers,90,90
>
> Laser On Time,0
>
> PMT Voltages,65,60
>
> END GENERAL INFO
>
>
>
> BEGIN QUANTITATION PARAMETERS
>
> Min Percentile,30
>
> Max Percentile,300
>
> END QUANTITATION PARAMETERS
>
>
>
> BEGIN QUALITY MEASUREMENTS
>
> Max Footprint,100
>
> END QUALITY MEASUREMENTS
>
>
>
> BEGIN ARRAY PATTERN INFO
>
> Units,µm
>
> Array Rows,10
>
> Array Columns,4
>
> Spot Rows,9
>
> Spot Columns,9
>
> Array Row Spacing,4500.000000
>
> Array Column Spacing,4500.000000
>
> Spot Row Spacing,450.000000
>
> Spot Column Spacing,450.000000
>
> Spot Diameter,200
>
> Interstitial,0
>
> Spots Per Array,81
>
> Total Spots,2640
>
> END ARRAY PATTERN INFO
>
>
>
> BEGIN IMAGE INFO
>
> ImageID,Channel,Image,Fluorophore,Barcode,Units,X Units Per Pixel,Y
> Units Per Pixel,X Offset,Y Offset,Status
>
> -1,CH1,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif,Alexa
> 555,,µm,10.000000,10.000000,0.000000,0.000000,Control Image
>
> -1,CH2,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif,Alexa
> 647,,µm,10.000000,10.000000,0.000000,0.000000,
>
> END IMAGE INFO
>
>
>
> BEGIN NORMALIZATION INFO
>
> Normalization Method,LOWESS
>
> END NORMALIZATION INFO
>
>
>
> BEGIN DATA
>
> Index,Array Row,Array Column,Spot Row,Spot
> Column,Name,ID,X,Y,Diameter,F Pixels,B Pixels,Footprint,Flags,Ch1
> Median,Ch1 Mean,Ch1 SD,Ch1 B Median,Ch1 B Mean,Ch1 B SD,Ch1 % > B + 1
> SD,Ch1 % > B + 2 SD,Ch1 F % Sat.,Ch1 Median - B,Ch1 Mean - B,Ch1
> SignalNoiseRatio,Ch2 Median,Ch2 Mean,Ch2 SD,Ch2 B Median,Ch2 B
> Mean,Ch2 B SD,Ch2 % > B + 1 SD,Ch2 % > B + 2 SD,Ch2 F % Sat.,Ch2
> Median - B,Ch2 Mean - B,Ch2 SignalNoiseRatio,Ch2 Ratio of Medians,Ch2
> Ratio of Means,Ch2 Median of Ratios,Ch2 Mean of Ratios,Ch2 Ratios
> SD,Ch2 Rgn Ratio,Ch2 Rgn R²,Ch2 Log Ratio,Sum of Medians,Sum of
> Means,Ch1 N Median,Ch1 N Mean,Ch1 N (Median-B),Ch1 N (Mean-B),Ch2 N
> Median,Ch2 N Mean,Ch2 N (Median-B),Ch2 N (Mean-B),Ch2 N Ratio of
> Medians,Ch2 N Ratio of Means,Ch2 N Median of Ratios,Ch2 N Mean of
> Ratios,Ch2 N Rgn Ratio,Ch2 N Log Ratio
>
> 1,1,1,1,1,"ERG_Operon","2078",2805,13125,230,384,734,0,3,5946,6035,1754.26,2490,2506,529.19,97.4,92.2,0.0,3456,3545,11.24,1604,1636,517.27,683,698,194.19,94.3,84.1,0.0,921,953,8.26,0.27,0.27,0.27,0.33,0.39,0.23,0.62,-1.908,4377,4498,5946,6035,3456,3545,3027,2984,1446,2664,0.42,0.75,0.42,0.92,0.44,-1.257
>
> 2,1,1,1,2,"ERG_Operon","2078",3250,13128,220,351,858,0,3,5368,5457,1634.22,2330,2378,537.27,96.0,90.9,0.0,3038,3127,9.99,1624,1651,531.34,651,671,188.42,94.9,88.0,0.0,973,1000,8.62,0.32,0.32,0.32,0.57,2.14,0.25,0.61,-1.643,4011,4127,5368,5457,3038,3127,3100,3039,1536,2956,0.51,0.95,0.50,1.68,0.48,-0.984
>
> 3,1,1,1,3,"ERG_Operon","2078",3698,13124,220,348,858,0,3,4368,4676,1645.59,2206,2240,490.01,90.2,81.0,0.0,2162,2470,8.91,1476,1562,591.68,646,673,182.34,90.2,80.2,0.0,830,916,8.09,0.38,0.37,0.38,0.50,0.92,0.28,0.61,-1.381,2992,3386,4368,4676,2162,2470,2947,2941,283,797,0.13,0.32,0.13,0.43,0.56,-2.934
>
>
> Kind Regards,
> Piotr
>
> On Mon, Jun 2, 2008 at 3:57 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear Piotr,
>>
>> The file extension "gpr" is short for GenePix Results file.  If ScanArray
>> Express outputs a file with this extension, you should have every
>> expectation that it is formatted exactly the same as a gpr file from GenePix,
>> and therefore you should be able to read it using
>> read.maimages(source="genepix").  If this is not true, then ScanArray is
>> irresponsible to use this extension.
>>
>> Same comments for the GAL file.  It is obviously not a GAL file as defined
>> by GenePix, otherwise it would be read using readGAL().
>>
>> From your description below, a possible explanation for the problem is that
>> your files have an extra column with no corresponding heading, e.g., a
>> column of row numbers.  However no one on this mailing list can tell that
>> for sure without you showing us some lines from your file.
>>
>> Questions:
>> 1. Why have you set row.names=NULL? This prevents R from detecting a column
>> of row numbers. What happens if you remove this?
>>
>> 2. Are these files exactly as output by ScanArray, or have they been further
>> processed?
>>
>> 3. Can you post the first few lines of an example file?
>>
>> Best wishes
>> Gordon
>>
>> PS. You posted the same question to the BioC mailing list on three
>> consecutive days during the weekend.  Please post the question just once.
>>
>>
>>> Date: Sat, 31 May 2008 12:55:25 +0200
>>> From: "Piotr Stępniak" <piotrek.stepniak at gmail.com>
>>> Subject: [BioC] limma and marray data import problem
>>> To: bioconductor at stat.math.ethz.ch
>>>
>>> Hello Everyone,
>>>
>>> I am Piotr Stępniak, B.Sc. in Biotechnology, currently in an M.Sc.
>>> course at Adam Mickiewicz University in Poznań, Poland. I am working
>>> at the Polish Academy of Sciences in the microarray experiments group.
>>>
>>> I'm a newbie in R and BioC, so please forgive me if my question is easy...
>>>
>>> I'm having a problem importing data into RGList or marrayRaw objects.
>>> I use the following instruction:
>>> bialkoRaw<- read.maimages( dir(pattern="gpr"), columns=list(G="F543
>>> Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"),
>>> annotation=c("Block", "Column", "Row", "Name", "ID"), row.names=NULL)
>>> The data seem to load, but the $genes table looks odd; I guess the column
>>> names are shifted right by 1 column:
>>> $genes
>>>  Block Column         Row Name   ID
>>> 1     1      1  ERG_Operon 2078 2647
>>> 2     2      1  ERG_Operon 2078 3102
>>> 3     3      1  ERG_Operon 2078 3549
>>> 4     4      1 FLT3_Operon 2322 3994
>>> 5     5      1 FLT3_Operon 2322 4444
>>> 2635 more rows ...
>>> This, I think, causes the printer layout to be imported wrongly, and then any
>>> other attempt to process the data (e.g. quality tests) produces this error
>>> message:
>>> Error in if (is.int(totalPlate)) { : argument is of length zero
>>>
>>> The data were obtained with ScanArrayExpress software, so I have them in
>>> gpr or csv files. Both give similar errors, but loading the csv files
>>> also seems to fail to import the values for each channel and gets only the
>>> file name headers.
>>>
>>> Marray import also fails, I will skip the info about it not to enlarge
>>> the mail unnecessarily.
>>>
>>> My R session info is as follows:
>>>>
>>>> sessionInfo()
>>>
>>> R version 2.6.2 (2008-02-08)
>>> i486-pc-linux-gnu
>>>
>>> locale:
>>> C
>>>
>>> attached base packages:
>>> [1] grid      splines   tools     stats     graphics  grDevices utils
>>> [8] datasets  methods   base
>>>
>>> other attached packages:
>>> [1] arrayQuality_1.18.0  gridBase_0.4-3       hexbin_1.14.0
>>> [4] convert_1.16.0       RColorBrewer_1.0-2   cluster_1.11.10
>>> [7] arrayMagic_1.16.1    genefilter_1.16.0    survival_2.34-1
>>> [10] marray_1.18.0        vsn_3.6.0            limma_2.14.1
>>> [13] affy_1.16.0          preprocessCore_1.0.0 affyio_1.8.0
>>> [16] Biobase_1.16.3       lattice_0.17-7
>>>
>>> loaded via a namespace (and not attached):
>>> [1] AnnotationDbi_1.0.6 DBI_0.2-4           RSQLite_0.6-8
>>> [4] annotate_1.18.0     rcompgen_0.1-17
>>>
>>>
>>> I think I should also say that these data cause import problems in
>>> any other data analysis software :( I also tried to read the printer
>>> layout from the gal file, but all I got was a "Block, Row, Column, ID
>>> columns not found" error.
>>>
>>> I'd greatly appreciate any help, please.
>>>
>>> Yours faithfully,
>>> Piotr Stępniak
>>
>

