[BioC] limma and marray data import problem

Piotr Stępniak piotrek.stepniak at gmail.com
Mon Jun 2 11:38:45 CEST 2008


Dear Gordon,

Thank you for your reply.

I tried using source="genepix", but it worked no better than "scanarray".
The following commands give:

> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix")
Error in read.table(file = file, header = TRUE, col.names = allcnames,  :
  duplicate 'row.names' are not allowed

It turns out the format is not 100% valid GenePix (for example, it does
not have any index column), so I tried this:

> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix", row.names=NULL)
Error in RG[[a]][, i] <- obj[, columns[[a]]] :
  number of items to replace is not a multiple of replacement length
In addition: Warning message:
In getLayout(RG$genes, guessdups = FALSE) : NAs introduced by coercion
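
One common cause of "duplicate 'row.names' are not allowed" is a header line with exactly one field fewer than the data lines, in which case read.table() takes the first column as row names. The sketch below is only a quick base-R check for that situation; it uses a made-up three-line file (not the real GPR), and the names `path`, `n_fields` and `first_col_becomes_row_names` are illustrative.

```r
# Toy tab-delimited file whose header has one field fewer than the data rows
# (hypothetical content, not the real GPR file).
path <- tempfile(fileext = ".txt")
writeLines(c("Block\tColumn\tRow",
             "1\t1\t1\tERG_Operon",
             "1\t2\t1\tERG_Operon"), path)

# Number of tab-separated fields on each line.
n_fields <- count.fields(path, sep = "\t", quote = "\"", comment.char = "")
header_fields <- n_fields[1]
data_fields   <- unique(n_fields[-1])

# TRUE means read.table(header = TRUE) will treat column 1 as row names.
first_col_becomes_row_names <- all(data_fields == header_fields + 1)
print(n_fields)
print(first_col_becomes_row_names)

# Show what read.table() does with such a file; any error message is
# captured and printed rather than stopping the script.
res <- tryCatch(read.table(path, header = TRUE, sep = "\t"),
                error = function(e) conditionMessage(e))
print(res)
```

For a real file, the same count.fields() call needs a `skip` argument so that the first line counted is the column-title line.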

I tried different parameter combinations which got me to the command
you've seen in the previous messages (I'm sorry for sending it 3
times...).

The file is finally read, but incorrectly, as described earlier.

The same happens with the GAL file:

> gal<-readGAL("Bialko.gal")
Error in read.table(file = file, header = TRUE, col.names = allcnames,  :
  duplicate 'row.names' are not allowed

> gal<-readGAL("Bialko.gal", row.names=NULL)
Error in if (is.int(totalPlate)) { : argument is of length zero

To answer your further questions briefly:
2. Yes, these are the files straight from the scanner software.
ScanArrayExpress also offers csv export, but reading those files is
another problem. They do have an Index column, and
> bialkoRaw<- read.maimages( dir(pattern="csv"), columns=list(G="Ch1\ Median", Gb="Ch1\ B\ Median", R="Ch2\ Median", Rb="Ch2\ B\ Median"), sep=",")
reads the file and puts the values under the correct columns, but no
printer layout is read, and other functions that process the data give:
Error in if (is.int(totalPlate)) { : argument is of length zero

3. Yes, I'd be happy to; please have a look:

Beginning of GPR file:

ATF	1.0

21	82

"Type=GenePix Results 2"

"DateTime=2008/03/28 10:30:03"

"Settings=Easy Quant"

"GalFile=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\BIALACZKI_2_25luty2008_popr.gal"

"Scanner=Model: Express Serial No.: 432617"

"Comment=<F1>Alexa 555<F2>Alexa 647<F1 Offset>0,0<F2 Offset>0,0<Comment>"

"PixelSize=10"

"Wavelengths=543 nm	633 nm"

"ImageFiles=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif	D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif"

"PMTGain=65	60"

"NormalizationMethod=LOWESS"

"NormalizationFactors=0.000	0.000"

"JpegImage="

"RatioFormulations=W2/W1(633/543)"

"Barcode="

"ImageOrigin=1500	11600"

"JpegOrigin=0	0"

"Creator=ScanArray Express, Microarray Analysis System 3.0.0.16"

"Temperature=0.0"

"LaserPower=90	90	0	0"

"LaserOnTime=0	0	0	0"

"Block"	"Column"	"Row"	"Name"	"ID"	"X"	"Y"	"Dia."	"F543 Median"	"F543 Mean"	"F543 SD"	"B543 Median"	"B543 Mean"	"B543 SD"	"% > B543+1SD"	"% > B543+2SD"	"F543 % Sat."	"F633 Median"	"F633 Mean"	"F633 SD"	"B633 Median"	"B633 Mean"	"B633 SD"	"% > B633+1SD"	"% > B633+2SD"	"F633 % Sat."	"F3 Median"	"F3 Mean"	"F3 SD"	"B3 Median"	"B3 Mean"	"B3 SD"	"% > B3+1SD"	"% > B3+2SD"	"F3 % Sat."	"F4 Median"	"F4 Mean"	"F4 SD"	"B4 Median"	"B4 Mean"	"B4 SD"	"% > B4+1SD"	"% > B4+2SD"	"F4 % Sat."	"Ratio of Medians (633/543)"	"Ratio of Means (633/543)"	"Median of Ratios (633/543)"	"Mean of Ratios (633/543)"	"Ratios SD (633/543)"	"Rgn Ratio (633/543)"	"Rgn R² (633/543)"	"Ratio of Medians (Ratio/2)"	"Ratio of Means (Ratio/2)"	"Median of Ratios (Ratio/2)"	"Mean of Ratios (Ratio/2)"	"Ratios SD (Ratio/2)"	"Rgn Ratio (Ratio/2)"	"Rgn R² (Ratio/2)"	"Ratio of Medians (Ratio/3)"	"Ratio of Means (Ratio/3)"	"Median of Ratios (Ratio/3)"	"Mean of Ratios (Ratio/3)"	"Ratios SD (Ratio/3)"	"Rgn Ratio (Ratio/3)"	"Rgn R² (Ratio/3)"	"F Pixels"	"B Pixels"	"Sum of Medians"	"Sum of Means"	"Log Ratio (633/543)"	"Log Ratio (Ratio/2)"	"Log Ratio (Ratio/3)"	"F543 Median - B543"	"F633 Median - B633"	"F3 Median - B3"	"F4 Median - B4"	"F543 Mean - B543"	"F633 Mean - B633"	"F3 Mean - B3"	"F4 Mean - B4"	"Flags"	"Normalize"

1	1	1	ERG_Operon	2078	2805	13125	230	5946	6035	1754	2490	2506	529	97	92	0	1604	1636	517	683	698	194	94	84	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.266	0.269	0.270	0.329	0.329	0.232	0.621	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	384	734	4377	4498	-1.908	0.000	0.000	3456	921	0	0	3545	953	0	0	100	1

1	2	1	ERG_Operon	2078	3250	13128	220	5368	5457	1634	2330	2378	537	96	91	0	1624	1651	531	651	671	188	95	88	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.320	0.320	0.318	0.567	0.567	0.254	0.608	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	351	858	4011	4127	-1.643	0.000	0.000	3038	973	0	0	3127	1000	0	0	100	1

1	3	1	ERG_Operon	2078	3698	13124	220	4368	4676	1646	2206	2240	490	90	81	0	1476	1562	592	646	673	182	90	80	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.384	0.371	0.377	0.498	0.498	0.281	0.610	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	348	858	2992	3386	-1.381	0.000	0.000	2162	830	0	0	2470	916	0	0	100	1
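
In the ATF layout used by GPR files, the second line gives the number of header records and the number of data columns; in the excerpt above it reads "21 82". A minimal base-R sketch of using that line to locate the column titles follows. It works on a made-up, heavily shortened file, and the names `path`, `counts`, `n_header`, `n_columns` and `spots` are illustrative; it is not how limma itself reads the file.

```r
# Toy ATF file (hypothetical, heavily shortened): 2 header records, 3 columns.
path <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0",
             "2\t3",
             "\"Type=GenePix Results 2\"",
             "\"PixelSize=10\"",
             "\"Block\"\t\"Name\"\t\"F543 Median\"",
             "1\tERG_Operon\t5946",
             "1\tERG_Operon\t5368"), path)

# Line 2 holds the header-record count and the declared column count.
counts <- scan(path, what = integer(), skip = 1, nlines = 1, quiet = TRUE)
n_header  <- counts[1]
n_columns <- counts[2]

# The column titles follow the two fixed lines plus the header records.
spots <- read.delim(path, skip = 2 + n_header, check.names = FALSE,
                    stringsAsFactors = FALSE)

# A mismatch here would point to a malformed column-title line.
stopifnot(ncol(spots) == n_columns)
print(names(spots))
print(spots)
```

Note that `skip` counts physical lines, so any blank lines before the column titles in a real file would have to be added to the offset.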

And for comparison here is a corresponding csv:

BEGIN HEADER

PerkinElmer Inc.

ScanArrayCSVFileFormat,2.00

ScanArray Express,2.00

Number_of_Columns,62

END HEADER



BEGIN GENERAL INFO

DateTime,2008/03/28 10:30

GalFile,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\BIALACZKI_2_25luty2008_popr.gal

Scanner,Model: Express Serial No.: 432617

User Name,Luiza

Computer Name,

Protocol,Easy Quant

Quantitation Method,Adaptive Circle

Quality Confidence Calculation,Footprint

User comments,

Image Origin,1500,11600

Temperature,0

Laser Powers,90,90

Laser On Time,0

PMT Voltages,65,60

END GENERAL INFO



BEGIN QUANTITATION PARAMETERS

Min Percentile,30

Max Percentile,300

END QUANTITATION PARAMETERS



BEGIN QUALITY MEASUREMENTS

Max Footprint,100

END QUALITY MEASUREMENTS



BEGIN ARRAY PATTERN INFO

Units,µm

Array Rows,10

Array Columns,4

Spot Rows,9

Spot Columns,9

Array Row Spacing,4500.000000

Array Column Spacing,4500.000000

Spot Row Spacing,450.000000

Spot Column Spacing,450.000000

Spot Diameter,200

Interstitial,0

Spots Per Array,81

Total Spots,2640

END ARRAY PATTERN INFO



BEGIN IMAGE INFO

ImageID,Channel,Image,Fluorophore,Barcode,Units,X Units Per Pixel,Y Units Per Pixel,X Offset,Y Offset,Status

-1,CH1,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif,Alexa 555,,µm,10.000000,10.000000,0.000000,0.000000,Control Image

-1,CH2,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif,Alexa 647,,µm,10.000000,10.000000,0.000000,0.000000,

END IMAGE INFO



BEGIN NORMALIZATION INFO

Normalization Method,LOWESS

END NORMALIZATION INFO



BEGIN DATA

Index,Array Row,Array Column,Spot Row,Spot Column,Name,ID,X,Y,Diameter,F Pixels,B Pixels,Footprint,Flags,Ch1 Median,Ch1 Mean,Ch1 SD,Ch1 B Median,Ch1 B Mean,Ch1 B SD,Ch1 % > B + 1 SD,Ch1 % > B + 2 SD,Ch1 F % Sat.,Ch1 Median - B,Ch1 Mean - B,Ch1 SignalNoiseRatio,Ch2 Median,Ch2 Mean,Ch2 SD,Ch2 B Median,Ch2 B Mean,Ch2 B SD,Ch2 % > B + 1 SD,Ch2 % > B + 2 SD,Ch2 F % Sat.,Ch2 Median - B,Ch2 Mean - B,Ch2 SignalNoiseRatio,Ch2 Ratio of Medians,Ch2 Ratio of Means,Ch2 Median of Ratios,Ch2 Mean of Ratios,Ch2 Ratios SD,Ch2 Rgn Ratio,Ch2 Rgn R²,Ch2 Log Ratio,Sum of Medians,Sum of Means,Ch1 N Median,Ch1 N Mean,Ch1 N (Median-B),Ch1 N (Mean-B),Ch2 N Median,Ch2 N Mean,Ch2 N (Median-B),Ch2 N (Mean-B),Ch2 N Ratio of Medians,Ch2 N Ratio of Means,Ch2 N Median of Ratios,Ch2 N Mean of Ratios,Ch2 N Rgn Ratio,Ch2 N Log Ratio

1,1,1,1,1,"ERG_Operon","2078",2805,13125,230,384,734,0,3,5946,6035,1754.26,2490,2506,529.19,97.4,92.2,0.0,3456,3545,11.24,1604,1636,517.27,683,698,194.19,94.3,84.1,0.0,921,953,8.26,0.27,0.27,0.27,0.33,0.39,0.23,0.62,-1.908,4377,4498,5946,6035,3456,3545,3027,2984,1446,2664,0.42,0.75,0.42,0.92,0.44,-1.257

2,1,1,1,2,"ERG_Operon","2078",3250,13128,220,351,858,0,3,5368,5457,1634.22,2330,2378,537.27,96.0,90.9,0.0,3038,3127,9.99,1624,1651,531.34,651,671,188.42,94.9,88.0,0.0,973,1000,8.62,0.32,0.32,0.32,0.57,2.14,0.25,0.61,-1.643,4011,4127,5368,5457,3038,3127,3100,3039,1536,2956,0.51,0.95,0.50,1.68,0.48,-0.984

3,1,1,1,3,"ERG_Operon","2078",3698,13124,220,348,858,0,3,4368,4676,1645.59,2206,2240,490.01,90.2,81.0,0.0,2162,2470,8.91,1476,1562,591.68,646,673,182.34,90.2,80.2,0.0,830,916,8.09,0.38,0.37,0.38,0.50,0.92,0.28,0.61,-1.381,2992,3386,4368,4676,2162,2470,2947,2941,283,797,0.13,0.32,0.13,0.43,0.56,-2.934
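
Since the csv export is split into BEGIN/END sections, one way to inspect the spot table with base R alone is to find the "BEGIN DATA" line and read from the line after it. The sketch below does this on a made-up, shortened file. The names `path`, `data_start` and `spots` are illustrative, and it assumes nothing follows the data rows; any trailing lines in a real export would need to be trimmed first.

```r
# Toy ScanArray-style csv (hypothetical, heavily shortened).
path <- tempfile(fileext = ".csv")
writeLines(c("BEGIN HEADER",
             "ScanArrayCSVFileFormat,2.00",
             "END HEADER",
             "",
             "BEGIN DATA",
             "Index,Name,ID,Ch1 Median,Ch1 B Median,Ch2 Median,Ch2 B Median",
             "1,\"ERG_Operon\",\"2078\",5946,2490,1604,683",
             "2,\"ERG_Operon\",\"2078\",5368,2330,1624,651"), path)

# Locate the marker line; the column titles are on the next line.
lines <- readLines(path)
data_start <- grep("^BEGIN DATA", lines)
stopifnot(length(data_start) == 1)

# Skip everything up to and including the marker, keep column names verbatim.
spots <- read.csv(path, skip = data_start, check.names = FALSE,
                  stringsAsFactors = FALSE)
print(names(spots))
print(spots[, c("Ch1 Median", "Ch1 B Median", "Ch2 Median", "Ch2 B Median")])
```

Keeping `check.names = FALSE` preserves titles such as "Ch1 Median" exactly, so they can be compared against the names passed in the `columns` list.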


Kind Regards,
Piotr

On Mon, Jun 2, 2008 at 3:57 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear Piotr,
>
> The file extension "gpr" is short for GenePix Results file.  If ScanArray
> Express outputs a file with this extension, you should have every
> expectation that it is formatted exactly the same as a gpr file from GenePix,
> and therefore you should be able to read it using
> read.maimages(source="genepix").  If this is not true, then ScanArray is
> irresponsible to use this extension.
>
> Same comments for the GAL file.  It is obviously not a GAL file as defined
> by GenePix, otherwise it would be read using readGAL().
>
> From your description below, a possible explanation for the problem is that
> your files have an extra column with no corresponding heading, e.g., a
> column of row numbers.  However no one on this mailing list can tell that
> for sure without you showing us some lines from your file.
>
> Questions:
> 1. Why have you set row.names=NULL? This prevents R from detecting a column
> of row numbers. What happens if you remove this?
>
> 2. Are these files exactly as output by ScanArray, or have they been further
> processed?
>
> 3. Can you post the first few lines of an example file?
>
> Best wishes
> Gordon
>
> PS. You posted the same question to the BioC mailing list on three
> consecutive days during the weekend.  Please post the question just once.
>
>
>> Date: Sat, 31 May 2008 12:55:25 +0200
>> From: "Piotr Stępniak" <piotrek.stepniak at gmail.com>
>> Subject: [BioC] limma and marray data import problem
>> To: bioconductor at stat.math.ethz.ch
>>
>> Hello Everyone,
>>
>> I am Piotr Stępniak, B.Sc. in Biotechnology, currently in an M.Sc.
>> course at Adam Mickiewicz University in Poznań, Poland. I am working
>> at the Polish Academy of Sciences in a microarray experiments group.
>>
>> I'm a newbie in R and BioC, so please forgive me if my question is easy...
>>
>> I'm having a problem importing data into RGList or marrayRaw objects.
>> Using the following instruction:
>> bialkoRaw<- read.maimages( dir(pattern="gpr"), columns=list(G="F543
>> Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"),
>> annotation=c("Block", "Column", "Row", "Name", "ID"), row.names=NULL)
>> The data seem to load, but the $genes table looks odd; I guess the column
>> names are shifted right by 1 column:
>> $genes
>>  Block Column         Row Name   ID
>> 1     1      1  ERG_Operon 2078 2647
>> 2     2      1  ERG_Operon 2078 3102
>> 3     3      1  ERG_Operon 2078 3549
>> 4     4      1 FLT3_Operon 2322 3994
>> 5     5      1 FLT3_Operon 2322 4444
>> 2635 more rows ...
>> I think this causes the printer layout to be imported wrongly, and then any
>> other attempt to process the data (e.g. quality tests) produces this error
>> message:
>> Error in if (is.int(totalPlate)) { : argument is of length zero
>>
>> The data were obtained with ScanArrayExpress software, so I have them as
>> gpr or csv files. Both give similar errors, but loading the csv files
>> also seems to fail to import the values for each channel and gets only
>> the file name headers.
>>
>> Marray import also fails, I will skip the info about it not to enlarge
>> the mail unnecessarily.
>>
>> My R session info is as follows:
>> > sessionInfo()
>>
>> R version 2.6.2 (2008-02-08)
>> i486-pc-linux-gnu
>>
>> locale:
>> C
>>
>> attached base packages:
>> [1] grid      splines   tools     stats     graphics  grDevices utils
>> [8] datasets  methods   base
>>
>> other attached packages:
>> [1] arrayQuality_1.18.0  gridBase_0.4-3       hexbin_1.14.0
>> [4] convert_1.16.0       RColorBrewer_1.0-2   cluster_1.11.10
>> [7] arrayMagic_1.16.1    genefilter_1.16.0    survival_2.34-1
>> [10] marray_1.18.0        vsn_3.6.0            limma_2.14.1
>> [13] affy_1.16.0          preprocessCore_1.0.0 affyio_1.8.0
>> [16] Biobase_1.16.3       lattice_0.17-7
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.0.6 DBI_0.2-4           RSQLite_0.6-8
>> [4] annotate_1.18.0     rcompgen_0.1-17
>>
>>
>> I think I should also say that these data cause import problems in
>> any other data analysis software :( I also tried to read the printer
>> layout from the gal file, but all I got was a "Block, Row, Column, ID
>> columns not found" error.
>>
>> I'd greatly appreciate any help, please.
>>
>> Yours faithfully,
>> Piotr Stępniak
>


