[BioC] limma and marray data import problem
Piotr Stępniak
piotrek.stepniak at gmail.com
Mon Jun 2 11:38:45 CEST 2008
Dear Gordon,
Thank you for your reply.
I tried using source="genepix", but it did not work any better than "scanarray".
The following commands give:
> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix")
Error in read.table(file = file, header = TRUE, col.names = allcnames, :
duplicate 'row.names' are not allowed
It turns out the format is not 100% valid GenePix, e.g. it does not
have any index column, so I tried this:
> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix", row.names=NULL)
Error in RG[[a]][, i] <- obj[, columns[[a]]] :
number of items to replace is not a multiple of replacement length
In addition: Warning message:
In getLayout(RG$genes, guessdups = FALSE) : NAs introduced by coercion
I tried different parameter combinations which got me to the command
you've seen in the previous messages (I'm sorry for sending it 3
times...).
The file is finally read, but wrongly as described earlier.
The same happens with the GAL file:
> gal<-readGAL("Bialko.gal")
Error in read.table(file = file, header = TRUE, col.names = allcnames, :
duplicate 'row.names' are not allowed
> gal<-readGAL("Bialko.gal", row.names=NULL)
Error in if (is.int(totalPlate)) { : argument is of length zero
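
One way to check whether the column-name line and the data rows really
differ in width is to count the fields on each line with base R's
count.fields(). read.table() uses the first column as row names when the
header line has one fewer field than the data rows, so a mismatch of one
would be consistent with the "duplicate 'row.names'" error, although that
is not confirmed for these files. The following is only a minimal sketch:
it assumes a tab-separated, ATF-style file whose second line starts with
the number of header records, and the tiny demo file it writes is made up
for illustration, not the real data.

```r
# Hypothetical miniature of an ATF-style file: two preamble lines, one
# header record, a column-name line, then tab-separated data rows.
demo <- tempfile(fileext = ".gpr")
writeLines(c(
  "ATF\t1.0",
  "1\t5",
  "\"Type=GenePix Results 2\"",
  "\"Block\"\t\"Column\"\t\"Row\"\t\"Name\"\t\"ID\"",
  "1\t1\t1\tERG_Operon\t2078",
  "1\t2\t1\tERG_Operon\t2078"
), demo)

# The second line starts with the number of header records, so the
# column-name line comes after 2 + n_header lines.
second   <- readLines(demo, n = 2)[2]
n_header <- as.integer(strsplit(second, "[[:space:]]+")[[1]][1])
skip     <- 2 + n_header

# Count tab-separated fields on the column-name line and on every data row.
fields  <- count.fields(demo, sep = "\t", quote = "\"",
                        skip = skip, comment.char = "")
n_names <- fields[1]           # width of the column-name line
n_data  <- unique(fields[-1])  # distinct widths among the data rows

cat("column names:", n_names, " data fields:", n_data, "\n")
```

If n_data came out one larger than n_names on the real file, that would
point to an unnamed extra column (or a trailing separator) in the data
rows.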
To answer your further questions briefly:
2. Yes, these are the files straight from the scanner software.
ScanArrayExpress also offers csv export, but reading those files is another
problem. They do have an Index column, and
> bialkoRaw<- read.maimages( dir(pattern="csv"), columns=list(G="Ch1 Median", Gb="Ch1 B Median", R="Ch2 Median", Rb="Ch2 B Median"), sep=",")
reads the file with the values under the correct columns, but no printer
layout is read, and other functions that process the data give:
Error in if (is.int(totalPlate)) { : argument is of length zero
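
To look at the spot table of such a csv independently of limma, one option
is to skip everything up to the "BEGIN DATA" marker and parse the rest with
read.csv(). This is only a sketch under assumptions: the demo file below is
a made-up miniature of the export, and the closing "END DATA" line is
assumed rather than taken from the real file (the code also works when that
line is absent).

```r
# Hypothetical miniature of a ScanArray-style csv export.
demo <- tempfile(fileext = ".csv")
writeLines(c(
  "BEGIN HEADER",
  "ScanArrayCSVFileFormat,2.00",
  "END HEADER",
  "BEGIN DATA",
  "Index,Array Row,Array Column,Spot Row,Spot Column,Name,ID,Ch1 Median,Ch2 Median",
  "1,1,1,1,1,\"ERG_Operon\",\"2078\",5946,1604",
  "2,1,1,1,2,\"ERG_Operon\",\"2078\",5368,1624",
  "END DATA"
), demo)

lines <- readLines(demo)

# The column names are on the line right after "BEGIN DATA".
start <- grep("^BEGIN DATA", lines)[1]

# Stop before an "END DATA" line if there is one (assumed marker),
# otherwise read to the end of the file.
end     <- grep("^END DATA", lines)
stop_at <- if (length(end) > 0) end[1] - 1 else length(lines)

# check.names = FALSE keeps headings such as "Ch1 Median" unchanged.
spots <- read.csv(text = paste(lines[(start + 1):stop_at], collapse = "\n"),
                  check.names = FALSE, stringsAsFactors = FALSE)

print(spots[, c("Index", "Name", "Ch1 Median", "Ch2 Median")])
```

With check.names = FALSE the column names keep their spaces, so they match
the strings passed in the columns= list above.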
3. Yes, I'd be happy if you would look at it:
Beginning of GPR file:
ATF 1.0
21 82
"Type=GenePix Results 2"
"DateTime=2008/03/28 10:30:03"
"Settings=Easy Quant"
"GalFile=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\BIALACZKI_2_25luty2008_popr.gal"
"Scanner=Model: Express Serial No.: 432617"
"Comment=<F1>Alexa 555<F2>Alexa 647<F1 Offset>0,0<F2 Offset>0,0<Comment>"
"PixelSize=10"
"Wavelengths=543 nm 633 nm"
"ImageFiles=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif"
"PMTGain=65 60"
"NormalizationMethod=LOWESS"
"NormalizationFactors=0.000 0.000"
"JpegImage="
"RatioFormulations=W2/W1(633/543)"
"Barcode="
"ImageOrigin=1500 11600"
"JpegOrigin=0 0"
"Creator=ScanArray Express, Microarray Analysis System 3.0.0.16"
"Temperature=0.0"
"LaserPower=90 90 0 0"
"LaserOnTime=0 0 0 0"
"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F543 Median" "F543 Mean" "F543 SD" "B543 Median" "B543 Mean" "B543 SD" "% > B543+1SD" "% > B543+2SD" "F543 % Sat." "F633 Median" "F633 Mean" "F633 SD" "B633 Median" "B633 Mean" "B633 SD" "% > B633+1SD" "% > B633+2SD" "F633 % Sat." "F3 Median" "F3 Mean" "F3 SD" "B3 Median" "B3 Mean" "B3 SD" "% > B3+1SD" "% > B3+2SD" "F3 % Sat." "F4 Median" "F4 Mean" "F4 SD" "B4 Median" "B4 Mean" "B4 SD" "% > B4+1SD" "% > B4+2SD" "F4 % Sat." "Ratio of Medians (633/543)" "Ratio of Means (633/543)" "Median of Ratios (633/543)" "Mean of Ratios (633/543)" "Ratios SD (633/543)" "Rgn Ratio (633/543)" "Rgn R² (633/543)" "Ratio of Medians (Ratio/2)" "Ratio of Means (Ratio/2)" "Median of Ratios (Ratio/2)" "Mean of Ratios (Ratio/2)" "Ratios SD (Ratio/2)" "Rgn Ratio (Ratio/2)" "Rgn R² (Ratio/2)" "Ratio of Medians (Ratio/3)" "Ratio of Means (Ratio/3)" "Median of Ratios (Ratio/3)" "Mean of Ratios (Ratio/3)" "Ratios SD (Ratio/3)" "Rgn Ratio (Ratio/3)" "Rgn R² (Ratio/3)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (633/543)" "Log Ratio (Ratio/2)" "Log Ratio (Ratio/3)" "F543 Median - B543" "F633 Median - B633" "F3 Median - B3" "F4 Median - B4" "F543 Mean - B543" "F633 Mean - B633" "F3 Mean - B3" "F4 Mean - B4" "Flags" "Normalize"
1 1 1 ERG_Operon 2078 2805 13125 230 5946 6035 1754 2490 2506 529 97 92 0 1604 1636 517 683 698 194 94 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.266 0.269 0.270 0.329 0.329 0.232 0.621 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 384 734 4377 4498 -1.908 0.000 0.000 3456 921 0 0 3545 953 0 0 100 1
1 2 1 ERG_Operon 2078 3250 13128 220 5368 5457 1634 2330 2378 537 96 91 0 1624 1651 531 651 671 188 95 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.320 0.320 0.318 0.567 0.567 0.254 0.608 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 351 858 4011 4127 -1.643 0.000 0.000 3038 973 0 0 3127 1000 0 0 100 1
1 3 1 ERG_Operon 2078 3698 13124 220 4368 4676 1646 2206 2240 490 90 81 0 1476 1562 592 646 673 182 90 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.384 0.371 0.377 0.498 0.498 0.281 0.610 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 348 858 2992 3386 -1.381 0.000 0.000 2162 830 0 0 2470 916 0 0 100 1
And for comparison, here is the beginning of the corresponding csv file:
BEGIN HEADER
PerkinElmer Inc.
ScanArrayCSVFileFormat,2.00
ScanArray Express,2.00
Number_of_Columns,62
END HEADER
BEGIN GENERAL INFO
DateTime,2008/03/28 10:30
GalFile,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\BIALACZKI_2_25luty2008_popr.gal
Scanner,Model: Express Serial No.: 432617
User Name,Luiza
Computer Name,
Protocol,Easy Quant
Quantitation Method,Adaptive Circle
Quality Confidence Calculation,Footprint
User comments,
Image Origin,1500,11600
Temperature,0
Laser Powers,90,90
Laser On Time,0
PMT Voltages,65,60
END GENERAL INFO
BEGIN QUANTITATION PARAMETERS
Min Percentile,30
Max Percentile,300
END QUANTITATION PARAMETERS
BEGIN QUALITY MEASUREMENTS
Max Footprint,100
END QUALITY MEASUREMENTS
BEGIN ARRAY PATTERN INFO
Units,µm
Array Rows,10
Array Columns,4
Spot Rows,9
Spot Columns,9
Array Row Spacing,4500.000000
Array Column Spacing,4500.000000
Spot Row Spacing,450.000000
Spot Column Spacing,450.000000
Spot Diameter,200
Interstitial,0
Spots Per Array,81
Total Spots,2640
END ARRAY PATTERN INFO
BEGIN IMAGE INFO
ImageID,Channel,Image,Fluorophore,Barcode,Units,X Units Per Pixel,Y Units Per Pixel,X Offset,Y Offset,Status
-1,CH1,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif,Alexa 555,,µm,10.000000,10.000000,0.000000,0.000000,Control Image
-1,CH2,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif,Alexa 647,,µm,10.000000,10.000000,0.000000,0.000000,
END IMAGE INFO
BEGIN NORMALIZATION INFO
Normalization Method,LOWESS
END NORMALIZATION INFO
BEGIN DATA
Index,Array Row,Array Column,Spot Row,Spot Column,Name,ID,X,Y,Diameter,F Pixels,B Pixels,Footprint,Flags,Ch1 Median,Ch1 Mean,Ch1 SD,Ch1 B Median,Ch1 B Mean,Ch1 B SD,Ch1 % > B + 1 SD,Ch1 % > B + 2 SD,Ch1 F % Sat.,Ch1 Median - B,Ch1 Mean - B,Ch1 SignalNoiseRatio,Ch2 Median,Ch2 Mean,Ch2 SD,Ch2 B Median,Ch2 B Mean,Ch2 B SD,Ch2 % > B + 1 SD,Ch2 % > B + 2 SD,Ch2 F % Sat.,Ch2 Median - B,Ch2 Mean - B,Ch2 SignalNoiseRatio,Ch2 Ratio of Medians,Ch2 Ratio of Means,Ch2 Median of Ratios,Ch2 Mean of Ratios,Ch2 Ratios SD,Ch2 Rgn Ratio,Ch2 Rgn R²,Ch2 Log Ratio,Sum of Medians,Sum of Means,Ch1 N Median,Ch1 N Mean,Ch1 N (Median-B),Ch1 N (Mean-B),Ch2 N Median,Ch2 N Mean,Ch2 N (Median-B),Ch2 N (Mean-B),Ch2 N Ratio of Medians,Ch2 N Ratio of Means,Ch2 N Median of Ratios,Ch2 N Mean of Ratios,Ch2 N Rgn Ratio,Ch2 N Log Ratio
1,1,1,1,1,"ERG_Operon","2078",2805,13125,230,384,734,0,3,5946,6035,1754.26,2490,2506,529.19,97.4,92.2,0.0,3456,3545,11.24,1604,1636,517.27,683,698,194.19,94.3,84.1,0.0,921,953,8.26,0.27,0.27,0.27,0.33,0.39,0.23,0.62,-1.908,4377,4498,5946,6035,3456,3545,3027,2984,1446,2664,0.42,0.75,0.42,0.92,0.44,-1.257
2,1,1,1,2,"ERG_Operon","2078",3250,13128,220,351,858,0,3,5368,5457,1634.22,2330,2378,537.27,96.0,90.9,0.0,3038,3127,9.99,1624,1651,531.34,651,671,188.42,94.9,88.0,0.0,973,1000,8.62,0.32,0.32,0.32,0.57,2.14,0.25,0.61,-1.643,4011,4127,5368,5457,3038,3127,3100,3039,1536,2956,0.51,0.95,0.50,1.68,0.48,-0.984
3,1,1,1,3,"ERG_Operon","2078",3698,13124,220,348,858,0,3,4368,4676,1645.59,2206,2240,490.01,90.2,81.0,0.0,2162,2470,8.91,1476,1562,591.68,646,673,182.34,90.2,80.2,0.0,830,916,8.09,0.38,0.37,0.38,0.50,0.92,0.28,0.61,-1.381,2992,3386,4368,4676,2162,2470,2947,2941,283,797,0.13,0.32,0.13,0.43,0.56,-2.934
Kind Regards,
Piotr
On Mon, Jun 2, 2008 at 3:57 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear Piotr,
>
> The file extension "gpr" is short for GenePix Results file. If ScanArray
> Express outputs a file with this extension, you should have every
> expectation that it is formatted exactly the same as a gpr file from GenePix,
> and therefore you should be able to read it using
> read.maimages(source="genepix"). If this is not true, then ScanArray is
> irresponsible to use this extension.
>
> Same comments for the GAL file. It is obviously not a GAL file as defined
> by GenePix, otherwise it would be read using readGAL().
>
> From your description below, a possible explanation for the problem is that
> your files have an extra column with no corresponding heading, e.g., a
> column of row numbers. However no one on this mailing list can tell that
> for sure without you showing us some lines from your file.
>
> Questions:
> 1. Why have you set row.names=NULL? This prevents R from detecting a column
> of row numbers. What happens if you remove this?
>
> 2. Are these files exactly as output by ScanArray, or have they been further
> processed?
>
> 3. Can you post the first few lines of an example file?
>
> Best wishes
> Gordon
>
> PS. You posted the same question to the BioC mailing list on three
> consecutive days during the weekend. Please post the question just once.
>
>
>> Date: Sat, 31 May 2008 12:55:25 +0200
>> From: "Piotr Stępniak" <piotrek.stepniak at gmail.com>
>> Subject: [BioC] limma and marray data import problem
>> To: bioconductor at stat.math.ethz.ch
>>
>> Hello Everyone,
>>
>> I am Piotr Stępniak, B.Sc. in Biotechnology, currently in an M.Sc.
>> course at Adam Mickiewicz University in Poznań, Poland. I am working
>> at the Polish Academy of Sciences in a microarray experiments group.
>>
>> I'm a newbie in R and BioC, so please forgive me if my question is easy...
>>
>> I'm having a problem importing data into RGList or marrayRaw objects.
>> Using the following command:
>> bialkoRaw<- read.maimages( dir(pattern="gpr"), columns=list(G="F543
>> Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"),
>> annotation=c("Block", "Column", "Row", "Name", "ID"), row.names=NULL)
>> The data seem to load, but the $genes table looks odd. I guess the column
>> names are shifted right by 1 column:
>> $genes
>> Block Column Row Name ID
>> 1 1 1 ERG_Operon 2078 2647
>> 2 2 1 ERG_Operon 2078 3102
>> 3 3 1 ERG_Operon 2078 3549
>> 4 4 1 FLT3_Operon 2322 3994
>> 5 5 1 FLT3_Operon 2322 4444
>> 2635 more rows ...
>> I think this causes the printer layout to be imported wrongly, and then
>> any other attempt to process the data (e.g. quality tests) produces this
>> error message:
>> Error in if (is.int(totalPlate)) { : argument is of length zero
>>
>> The data were obtained with ScanArrayExpress software, so I have them in
>> gpr or csv files. Both give similar errors, but loading the csv files
>> also seems to fail to import the values for each channel and gets only
>> the file name headers.
>>
>> Marray import also fails, but I will skip the details so as not to
>> lengthen this mail unnecessarily.
>>
>> My R session info is as follows:
>> > sessionInfo()
>> R version 2.6.2 (2008-02-08)
>> i486-pc-linux-gnu
>>
>> locale:
>> C
>>
>> attached base packages:
>> [1] grid splines tools stats graphics grDevices utils
>> [8] datasets methods base
>>
>> other attached packages:
>> [1] arrayQuality_1.18.0 gridBase_0.4-3 hexbin_1.14.0
>> [4] convert_1.16.0 RColorBrewer_1.0-2 cluster_1.11.10
>> [7] arrayMagic_1.16.1 genefilter_1.16.0 survival_2.34-1
>> [10] marray_1.18.0 vsn_3.6.0 limma_2.14.1
>> [13] affy_1.16.0 preprocessCore_1.0.0 affyio_1.8.0
>> [16] Biobase_1.16.3 lattice_0.17-7
>>
>> loaded via a namespace (and not attached):
>> [1] AnnotationDbi_1.0.6 DBI_0.2-4 RSQLite_0.6-8
>> [4] annotate_1.18.0 rcompgen_0.1-17
>>
>>
>> I think I should also say that these data cause import problems in
>> any other data analysis software too :( I also tried to read the printer
>> layout from the gal file, but all I got was a "Block, Row, Column, ID
>> columns not found" error.
>>
>> I'd greatly appreciate any help, please.
>>
>> Yours faithfully,
>> Piotr St?pniak
>