[BioC] limma and marray data import problem
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Jun 4 06:56:26 CEST 2008
Dear Piotr,
I can't diagnose your problem, because the shortened version of your data
file that you emailed reads fine for me when I put the lines in a text
file, as I show below. I used sep="" in my code because email doesn't
preserve tab separators. Presumably the problem appears further into the
file, perhaps near the bottom. Or else your file has inconsistent
separators.
Can you try the arguments nrows=2 and nrows=2640?
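One way to check for inconsistent separators is to count the fields on every line with count.fields() from base R. The sketch below is only an illustration: it builds a small tab-separated temporary file (invented rows, not your data) in which one row has lost a tab, and reports which line differs from the heading line. For a real gpr file you would pass the file name and a suitable skip value for the header records instead.

```r
# Build a small tab-separated example; the rows are invented for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Block\tColumn\tRow\tName\tID",
  "1\t1\t1\tERG_Operon\t2078",
  "1\t2\t1\tERG_Operon 2078",      # a tab replaced by a space: only 4 fields
  "1\t3\t1\tERG_Operon\t2078"
), tmp)

# Number of tab-separated fields on each line (heading line included).
n <- count.fields(tmp, sep = "\t", quote = "\"")
print(table(n))

# Lines whose field count differs from that of the heading line.
bad <- which(n != n[1])
print(bad)
unlink(tmp)
```

If every line reports the same count, the separators are at least consistent, and the problem lies elsewhere.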
I would also expect the csv file to read with the
following, using the column headings that appear in your csv file:
read.maimages("file.csv",columns=list(G="Ch1 Median",Gb="Ch1 B Median",
R="Ch2 Median", Rb="Ch2 B Median"),sep=",",nrows=2640)
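If read.maimages() still stumbles over the csv file, it can help to confirm with base R alone that the data block reads cleanly. The sketch below is an illustration under assumptions: it writes a tiny stand-in file laid out like the csv you posted (a few header records, a BEGIN DATA line, then the column headings, with most columns trimmed away), finds the heading line, and reads the rows that follow with read.csv(). The stand-in file and the variable names are hypothetical.

```r
# Tiny stand-in for a ScanArray Express csv export (trimmed, illustrative only).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "BEGIN HEADER",
  "ScanArrayCSVFileFormat,2.00",
  "END HEADER",
  "",
  "BEGIN DATA",
  "Index,Name,ID,Ch1 Median,Ch1 B Median,Ch2 Median,Ch2 B Median",
  "1,\"ERG_Operon\",\"2078\",5946,2490,1604,683",
  "2,\"ERG_Operon\",\"2078\",5368,2330,1624,651"
), tmp)

# Locate the BEGIN DATA marker; the column headings are on the next line.
txt <- readLines(tmp)
start <- grep("^BEGIN DATA", txt)[1]

# skip counts physical lines, so the blank line above is included in the count.
# check.names = FALSE keeps headings such as "Ch1 Median" unchanged.
dat <- read.csv(tmp, skip = start, check.names = FALSE, nrows = 2)

print(names(dat))
print(dat[, c("Ch1 Median", "Ch1 B Median", "Ch2 Median", "Ch2 B Median")])
unlink(tmp)
```

If this reads the expected number of rows and columns from the real file, the remaining question is only which arguments read.maimages() needs.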
Best wishes
Gordon
My code:
> read.maimages("temp.txt",source="genepix",columns=list(G="F543 Median",
Gb="B543 Median", R="F633 Median", Rb="B633 Median"),sep="")
Read temp.txt
An object of class "RGList"
$G
temp
[1,] 5946
[2,] 5368
$Gb
temp
[1,] 2490
[2,] 2330
$R
temp
[1,] 1604
[2,] 1624
$Rb
temp
[1,] 683
[2,] 651
$targets
FileName
temp temp.txt
$genes
Block Row Column ID Name
1 1 1 1 2078 ERG_Operon
2 1 1 2 2078 ERG_Operon
$source
[1] "genepix"
$printer
$ngrid.r
[1] 1
$ngrid.c
[1] 1
$nspot.r
[1] 1
$nspot.c
[1] 2
attr(,"class")
[1] "PrintLayout"
On Mon, 2 Jun 2008, Piotr Stępniak wrote:
> Dear Gordon,
>
> Thank you for your reply.
>
> I tried using source="genepix", it did not work better than "scanarray".
> The following commands give:
>
>> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix")
> Error in read.table(file = file, header = TRUE, col.names = allcnames, :
> duplicate 'row.names' are not allowed
>
> It turns out the format is not 100% valid GenePix, e.g. it does not
> have any index column, so I tried this:
>
>> bialkoRaw<-read.maimages(dir(pattern="gpr"), source="genepix", row.names=NULL)
> Error in RG[[a]][, i] <- obj[, columns[[a]]] :
> number of items to replace is not a multiple of replacement length
> In addition: Warning message:
> In getLayout(RG$genes, guessdups = FALSE) : NAs introduced by coercion
>
> I tried different parameter combinations which got me to the command
> you've seen in the previous messages (I'm sorry for sending it 3
> times...).
>
> The file is finally read, but wrongly, as described earlier.
>
> The same happens with the gal file:
>
>> gal<-readGAL("Bialko.gal")
> Error in read.table(file = file, header = TRUE, col.names = allcnames, :
> duplicate 'row.names' are not allowed
>
>> gal<-readGAL("Bialko.gal", row.names=NULL)
> Error in if (is.int(totalPlate)) { : argument is of length zero
>
> To answer your further questions briefly:
> 2. Yes, these are the files straight from the scanner software.
> ScanArrayExpress also offers csv export, but reading those is another
> problem. They do have an Index column, and
>> bialkoRaw<- read.maimages( dir(pattern="csv"), columns=list(G="Ch1\ Median", Gb="Ch1\ B\ Median", R="Ch2\ Median", Rb="Ch2\ B\ Median"), sep=",")
> reads the file and the values are under the correct columns, but no
> printer layout is read and other functions that process the data give:
> Error in if (is.int(totalPlate)) { : argument is of length zero
>
> 3. Yes, I'd be happy to. Please have a look:
>
> Beginning of GPR file:
>
> ATF 1.0
>
> 21 82
>
> "Type=GenePix Results 2"
>
> "DateTime=2008/03/28 10:30:03"
>
> "Settings=Easy Quant"
>
> "GalFile=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\BIALACZKI_2_25luty2008_popr.gal"
>
> "Scanner=Model: Express Serial No.: 432617"
>
> "Comment=<F1>Alexa 555<F2>Alexa 647<F1 Offset>0,0<F2 Offset>0,0<Comment>"
>
> "PixelSize=10"
>
> "Wavelengths=543 nm 633 nm"
>
> "ImageFiles=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\12_03_2008\Skan
> Agi\HL60_szk13_PMT65_roz10_Alexa555.tif D:\Luiza\Grant
> bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan
> Agi\26sz_szk13_PMT60_roz10_Alexa647.tif"
>
> "PMTGain=65 60"
>
> "NormalizationMethod=LOWESS"
>
> "NormalizationFactors=0.000 0.000"
>
> "JpegImage="
>
> "RatioFormulations=W2/W1(633/543)"
>
> "Barcode="
>
> "ImageOrigin=1500 11600"
>
> "JpegOrigin=0 0"
>
> "Creator=ScanArray Express, Microarray Analysis System 3.0.0.16"
>
> "Temperature=0.0"
>
> "LaserPower=90 90 0 0"
>
> "LaserOnTime=0 0 0 0"
>
> "Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F543 Median" "F543
> Mean" "F543 SD" "B543 Median" "B543 Mean" "B543 SD" "% > B543+1SD" "%
> > B543+2SD" "F543 % Sat." "F633 Median" "F633 Mean" "F633 SD" "B633
> Median" "B633 Mean" "B633 SD" "% > B633+1SD" "% > B633+2SD" "F633 %
> Sat." "F3 Median" "F3 Mean" "F3 SD" "B3 Median" "B3 Mean" "B3 SD" "% >
> B3+1SD" "% > B3+2SD" "F3 % Sat." "F4 Median" "F4 Mean" "F4 SD" "B4
> Median" "B4 Mean" "B4 SD" "% > B4+1SD" "% > B4+2SD" "F4 % Sat." "Ratio
> of Medians (633/543)" "Ratio of Means (633/543)" "Median of Ratios
> (633/543)" "Mean of Ratios (633/543)" "Ratios SD (633/543)" "Rgn Ratio
> (633/543)" "Rgn R² (633/543)" "Ratio of Medians (Ratio/2)" "Ratio of
> Means (Ratio/2)" "Median of Ratios (Ratio/2)" "Mean of Ratios
> (Ratio/2)" "Ratios SD (Ratio/2)" "Rgn Ratio (Ratio/2)" "Rgn R²
> (Ratio/2)" "Ratio of Medians (Ratio/3)" "Ratio of Means
> (Ratio/3)" "Median of Ratios (Ratio/3)" "Mean of Ratios
> (Ratio/3)" "Ratios SD (Ratio/3)" "Rgn Ratio (Ratio/3)" "Rgn R²
> (Ratio/3)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log
> Ratio (633/543)" "Log Ratio (Ratio/2)" "Log Ratio (Ratio/3)" "F543
> Median - B543" "F633 Median - B633" "F3 Median - B3" "F4 Median -
> B4" "F543 Mean - B543" "F633 Mean - B633" "F3 Mean - B3" "F4 Mean -
> B4" "Flags" "Normalize"
>
> 1 1 1 ERG_Operon 2078 2805 13125 230 5946 6035 1754 2490 2506 529 97 92 0 1604 1636 517 683 698 194 94 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.266 0.269 0.270 0.329 0.329 0.232 0.621 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 384 734 4377 4498 -1.908 0.000 0.000 3456 921 0 0 3545 953 0 0 100 1
>
> 1 2 1 ERG_Operon 2078 3250 13128 220 5368 5457 1634 2330 2378 537 96 91 0 1624 1651 531 651 671 188 95 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.320 0.320 0.318 0.567 0.567 0.254 0.608 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 351 858 4011 4127 -1.643 0.000 0.000 3038 973 0 0 3127 1000 0 0 100 1
>
> 1 3 1 ERG_Operon 2078 3698 13124 220 4368 4676 1646 2206 2240 490 90 81 0 1476 1562 592 646 673 182 90 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.384 0.371 0.377 0.498 0.498 0.281 0.610 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 348 858 2992 3386 -1.381 0.000 0.000 2162 830 0 0 2470 916 0 0 100 1
>
> And for comparison here is a corresponding csv:
>
> BEGIN HEADER
>
> PerkinElmer Inc.
>
> ScanArrayCSVFileFormat,2.00
>
> ScanArray Express,2.00
>
> Number_of_Columns,62
>
> END HEADER
>
>
>
> BEGIN GENERAL INFO
>
> DateTime,2008/03/28 10:30
>
> GalFile,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\BIALACZKI_2_25luty2008_popr.gal
>
> Scanner,Model: Express Serial No.: 432617
>
> User Name,Luiza
>
> Computer Name,
>
> Protocol,Easy Quant
>
> Quantitation Method,Adaptive Circle
>
> Quality Confidence Calculation,Footprint
>
> User comments,
>
> Image Origin,1500,11600
>
> Temperature,0
>
> Laser Powers,90,90
>
> Laser On Time,0
>
> PMT Voltages,65,60
>
> END GENERAL INFO
>
>
>
> BEGIN QUANTITATION PARAMETERS
>
> Min Percentile,30
>
> Max Percentile,300
>
> END QUANTITATION PARAMETERS
>
>
>
> BEGIN QUALITY MEASUREMENTS
>
> Max Footprint,100
>
> END QUALITY MEASUREMENTS
>
>
>
> BEGIN ARRAY PATTERN INFO
>
> Units,µm
>
> Array Rows,10
>
> Array Columns,4
>
> Spot Rows,9
>
> Spot Columns,9
>
> Array Row Spacing,4500.000000
>
> Array Column Spacing,4500.000000
>
> Spot Row Spacing,450.000000
>
> Spot Column Spacing,450.000000
>
> Spot Diameter,200
>
> Interstitial,0
>
> Spots Per Array,81
>
> Total Spots,2640
>
> END ARRAY PATTERN INFO
>
>
>
> BEGIN IMAGE INFO
>
> ImageID,Channel,Image,Fluorophore,Barcode,Units,X Units Per Pixel,Y
> Units Per Pixel,X Offset,Y Offset,Status
>
> -1,CH1,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif,Alexa
> 555,,µm,10.000000,10.000000,0.000000,0.000000,Control Image
>
> -1,CH2,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI
> RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif,Alexa
> 647,,µm,10.000000,10.000000,0.000000,0.000000,
>
> END IMAGE INFO
>
>
>
> BEGIN NORMALIZATION INFO
>
> Normalization Method,LOWESS
>
> END NORMALIZATION INFO
>
>
>
> BEGIN DATA
>
> Index,Array Row,Array Column,Spot Row,Spot
> Column,Name,ID,X,Y,Diameter,F Pixels,B Pixels,Footprint,Flags,Ch1
> Median,Ch1 Mean,Ch1 SD,Ch1 B Median,Ch1 B Mean,Ch1 B SD,Ch1 % > B + 1
> SD,Ch1 % > B + 2 SD,Ch1 F % Sat.,Ch1 Median - B,Ch1 Mean - B,Ch1
> SignalNoiseRatio,Ch2 Median,Ch2 Mean,Ch2 SD,Ch2 B Median,Ch2 B
> Mean,Ch2 B SD,Ch2 % > B + 1 SD,Ch2 % > B + 2 SD,Ch2 F % Sat.,Ch2
> Median - B,Ch2 Mean - B,Ch2 SignalNoiseRatio,Ch2 Ratio of Medians,Ch2
> Ratio of Means,Ch2 Median of Ratios,Ch2 Mean of Ratios,Ch2 Ratios
> SD,Ch2 Rgn Ratio,Ch2 Rgn R²,Ch2 Log Ratio,Sum of Medians,Sum of
> Means,Ch1 N Median,Ch1 N Mean,Ch1 N (Median-B),Ch1 N (Mean-B),Ch2 N
> Median,Ch2 N Mean,Ch2 N (Median-B),Ch2 N (Mean-B),Ch2 N Ratio of
> Medians,Ch2 N Ratio of Means,Ch2 N Median of Ratios,Ch2 N Mean of
> Ratios,Ch2 N Rgn Ratio,Ch2 N Log Ratio
>
> 1,1,1,1,1,"ERG_Operon","2078",2805,13125,230,384,734,0,3,5946,6035,1754.26,2490,2506,529.19,97.4,92.2,0.0,3456,3545,11.24,1604,1636,517.27,683,698,194.19,94.3,84.1,0.0,921,953,8.26,0.27,0.27,0.27,0.33,0.39,0.23,0.62,-1.908,4377,4498,5946,6035,3456,3545,3027,2984,1446,2664,0.42,0.75,0.42,0.92,0.44,-1.257
>
> 2,1,1,1,2,"ERG_Operon","2078",3250,13128,220,351,858,0,3,5368,5457,1634.22,2330,2378,537.27,96.0,90.9,0.0,3038,3127,9.99,1624,1651,531.34,651,671,188.42,94.9,88.0,0.0,973,1000,8.62,0.32,0.32,0.32,0.57,2.14,0.25,0.61,-1.643,4011,4127,5368,5457,3038,3127,3100,3039,1536,2956,0.51,0.95,0.50,1.68,0.48,-0.984
>
> 3,1,1,1,3,"ERG_Operon","2078",3698,13124,220,348,858,0,3,4368,4676,1645.59,2206,2240,490.01,90.2,81.0,0.0,2162,2470,8.91,1476,1562,591.68,646,673,182.34,90.2,80.2,0.0,830,916,8.09,0.38,0.37,0.38,0.50,0.92,0.28,0.61,-1.381,2992,3386,4368,4676,2162,2470,2947,2941,283,797,0.13,0.32,0.13,0.43,0.56,-2.934
>
>
> Kind Regards,
> Piotr
>
> On Mon, Jun 2, 2008 at 3:57 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear Piotr,
>>
>> The file extension "gpr" is short for GenePix Results file. If ScanArray
>> Express outputs a file with this extension, you should have every
>> expectation that it is formatted exactly the same as a gpr file from GenePix,
>> and therefore you should be able to read it using
>> read.maimages(source="genepix"). If this is not true, then ScanArray is
>> irresponsible to use this extension.
>>
>> Same comments for the GAL file. It is obviously not a GAL file as defined
>> by GenePix, otherwise it would be read using readGAL().
>>
>> From your description below, a possible explanation for the problem is that
>> your files have an extra column with no corresponding heading, e.g., a
>> column of row numbers. However no one on this mailing list can tell that
>> for sure without you showing us some lines from your file.
>>
>> Questions:
>> 1. Why have you set row.names=NULL? This prevents R from detecting a column
>> of row numbers. What happens if you remove this?
>>
>> 2. Are these files exactly as output by ScanArray, or have they been further
>> processed?
>>
>> 3. Can you post the first few lines of an example file?
>>
>> Best wishes
>> Gordon
>>
>> PS. You posted the same question to the BioC mailing list on three
>> consecutive days during the weekend. Please post the question just once.
>>
>>
>>> Date: Sat, 31 May 2008 12:55:25 +0200
>>> From: "Piotr Stępniak" <piotrek.stepniak at gmail.com>
>>> Subject: [BioC] limma and marray data import problem
>>> To: bioconductor at stat.math.ethz.ch
>>>
>>> Hello Everyone,
>>>
>>> I am Piotr Stępniak, B.Sc. in Biotechnology, currently on an M.Sc.
>>> course at Adam Mickiewicz University in Poznań, Poland. I am working
>>> in the microarray experiments group at the Polish Academy of Sciences.
>>>
>>> I'm a newbie in R and BioC, so please forgive me if my question is easy...
>>>
>>> I'm having a problem with data import to RGList or marrayRaw objects.
>>> Using the following instruction:
>>> bialkoRaw<- read.maimages( dir(pattern="gpr"), columns=list(G="F543
>>> Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"),
>>> annotation=c("Block", "Column", "Row", "Name", "ID"), row.names=NULL)
>>> The data seem to load, but the $genes table looks odd. I guess the column
>>> names are shifted right by 1 column:
>>> $genes
>>> Block Column Row Name ID
>>> 1 1 1 ERG_Operon 2078 2647
>>> 2 2 1 ERG_Operon 2078 3102
>>> 3 3 1 ERG_Operon 2078 3549
>>> 4 4 1 FLT3_Operon 2322 3994
>>> 5 5 1 FLT3_Operon 2322 4444
>>> 2635 more rows ...
>>> I think this causes the printer layout to be imported wrongly, and then any
>>> other attempt to process the data (e.g. quality tests) produces this error
>>> message:
>>> Error in if (is.int(totalPlate)) { : argument is of length zero
>>>
>>> The data were obtained with ScanArrayExpress software, so I have them in
>>> gpr or csv files. Both give similar errors, but loading the csv files
>>> also seems to fail to import the values for each channel and gets only the
>>> file name headers.
>>>
>>> Marray import also fails, I will skip the info about it not to enlarge
>>> the mail unnecessarily.
>>>
>>> My R session info is as follows:
>>>>
>>>> sessionInfo()
>>>
>>> R version 2.6.2 (2008-02-08)
>>> i486-pc-linux-gnu
>>>
>>> locale:
>>> C
>>>
>>> attached base packages:
>>> [1] grid splines tools stats graphics grDevices utils
>>> [8] datasets methods base
>>>
>>> other attached packages:
>>> [1] arrayQuality_1.18.0 gridBase_0.4-3 hexbin_1.14.0
>>> [4] convert_1.16.0 RColorBrewer_1.0-2 cluster_1.11.10
>>> [7] arrayMagic_1.16.1 genefilter_1.16.0 survival_2.34-1
>>> [10] marray_1.18.0 vsn_3.6.0 limma_2.14.1
>>> [13] affy_1.16.0 preprocessCore_1.0.0 affyio_1.8.0
>>> [16] Biobase_1.16.3 lattice_0.17-7
>>>
>>> loaded via a namespace (and not attached):
>>> [1] AnnotationDbi_1.0.6 DBI_0.2-4 RSQLite_0.6-8
>>> [4] annotate_1.18.0 rcompgen_0.1-17
>>>
>>>
>>> I think I should also say that these data cause import problems in
>>> any other data analysis software :( I also tried to read the printer
>>> layout from the gal file, but all I got was a "Block, Row, Column, ID
>>> columns not found" error.
>>>
>>> I'd greatly appreciate any help, please.
>>>
>>> Yours faithfully,
>>> Piotr Stępniak
>>
>