[BioC] BeadarraySNP/read.SnpSetIllumina

Boris Umylny umylny at apbri.org
Wed Jul 29 11:50:47 CEST 2009


Hi,

We are trying to use beadarraySNP on Illumina sample files, using R 2.9 with
Bioconductor 3.4.

If anyone can help us, it would be much appreciated.

At this time we have access only to the final report files, so we are
constructing the sample sheet manually, based on the documentation:

[Header],,,,,,
Investigator Name,Test,,,,,
Project Name,Test,,,,,
Experiment Name,Test,,,,,
Date,27072009,,,,,

[Data],,,,,,
Sample_Name,Sample_Well,Sample_Plate,Sample_Group,Pool_ID,Sentrix_ID,Sentrix_Position
NA12155,well1,plate1,group,GS001-OPA,1280260,R001_C00
NA10861,well2,plate1,group,GS001-OPA,1280260,R002_C00
NA12814,well3,plate1,group,GS001-OPA,1280260,R003_C00
NA11829,well4,plate1,group,GS001-OPA,1280260,R004_C00
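For anyone else building a sample sheet by hand, the same file can be written from R. This is a minimal base-R sketch, not part of beadarraySNP: the function name write_sample_sheet and its arguments are hypothetical, and the well, plate, group and position values are the same placeholders used in the sheet above.

```r
# Hypothetical helper (not part of beadarraySNP): writes a sample sheet laid out
# like the one above. Well, plate, group and position values are placeholders.
write_sample_sheet <- function(path, samples, sentrix_id, date,
                               pool_id = "GS001-OPA") {
  n <- length(samples)
  sheet <- data.frame(
    Sample_Name      = samples,
    Sample_Well      = paste0("well", seq_len(n)),
    Sample_Plate     = "plate1",
    Sample_Group     = "group",
    Pool_ID          = pool_id,
    Sentrix_ID       = sentrix_id,  # pass as character to keep it verbatim
    Sentrix_Position = sprintf("R%03d_C00", seq_len(n)),
    stringsAsFactors = FALSE
  )
  con <- file(path, "w")
  on.exit(close(con))
  # [Header] block padded to seven fields, as in the hand-written sheet
  writeLines(c("[Header],,,,,,",
               "Investigator Name,Test,,,,,",
               "Project Name,Test,,,,,",
               "Experiment Name,Test,,,,,",
               paste0("Date,", date, ",,,,,"),
               "",
               "[Data],,,,,,"), con)
  # Column header plus one unquoted row per sample
  write.csv(sheet, con, row.names = FALSE, quote = FALSE)
}

write_sample_sheet("sample_sheet.csv",
                   c("NA12155", "NA10861", "NA12814", "NA11829"),
                   sentrix_id = "1280260", date = "27072009")
```

Writing the sheet from a vector of sample names avoids typos in the Sentrix columns when the number of samples grows.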

The final report file we have looks like this (first few data rows shown):

[Header]
Processing Date,29/7/2009 14:28
Content,Human610_Quad_v1
Num SNPs,620901
Total SNPs,620901
Num Samples,4
Total Samples,4
[Data]
SNP Name,Sample ID,Allele1 - Forward,Allele2 - Forward,GC Score,X,Y,X Raw,Y Raw,Log R Ratio,B Allele Freq
200003,NA12155,A,G,0.9299,0.633,0.530,7898,7050,-0.1552,0.5069
200006,NA12155,T,T,0.7877,1.563,0.143,18645,2500,-0.1099,0.0102
200047,NA12155,A,A,0.8612,0.472,0.048,5916,1201,0.0005,0.0242
200050,NA12155,C,C,0.8331,0.009,1.209,761,15072,0.0075,1.0000

This differs from the documentation: in particular, it lacks the GT Score,
Chr and Position columns.

For our purposes those columns are not important, since we take annotation
information from dbSNP and the Illumina platform annotations. After modifying
the file to add these columns and to rename Allele1/2 - Forward to
Allele1/2 - AB, we have:

[Header]
Processing Date,29/7/2009 14:30
Content,Human610_Quad_v1
Num SNPs,620901
Total SNPs,620901
Num Samples,4
Total Samples,4
[Data]
SNP Name,Sample ID,Allele1 - AB,Allele2 - AB,GC Score,X,Y,X Raw,Y Raw,Log R Ratio,B Allele Freq,GT Score,Chr,Position
200003,NA12155,A,G,0.9299,0.633,0.530,7898,7050,-0.1552,0.5069,0.0,1,1
200006,NA12155,T,T,0.7877,1.563,0.143,18645,2500,-0.1099,0.0102,0.0,1,1
200047,NA12155,A,A,0.8612,0.472,0.048,5916,1201,0.0005,0.0242,0.0,1,1
200050,NA12155,C,C,0.8331,0.009,1.209,761,15072,0.0075,1.0000,0.0,1,1
200052,NA12155,T,T,0.9466,0.012,0.901,961,12084,-0.0236,1.0000,0.0,1,1
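The modification described above can also be scripted instead of done by hand. This is a minimal base-R sketch under stated assumptions: the function name add_placeholder_columns is hypothetical, the column header is assumed to be the line right after [Data], and the values 0.0, 1 and 1 are the same dummies used in the modified file, not real GT scores or coordinates. It only renames the allele headers; the values remain forward-strand bases, not A/B calls.

```r
# Hypothetical helper (not part of beadarraySNP): renames "Allele1/2 - Forward"
# to "Allele1/2 - AB" and appends placeholder GT Score, Chr and Position columns.
# Only the header names change; allele values stay as forward-strand bases.
add_placeholder_columns <- function(infile, outfile) {
  head_lines <- readLines(infile, n = 50)           # [Header] block is short
  data_line <- grep("^\\[Data\\]", head_lines)[1]
  if (is.na(data_line)) stop("no [Data] line found in ", infile)
  # Read every column as text so values are written back exactly as read
  report <- read.csv(infile, skip = data_line, check.names = FALSE,
                     colClasses = "character")
  names(report) <- sub(" - Forward$", " - AB", names(report))
  report[["GT Score"]] <- "0.0"   # dummy values, as in the modified file
  report[["Chr"]] <- "1"
  report[["Position"]] <- "1"
  con <- file(outfile, "w")
  on.exit(close(con))
  writeLines(head_lines[seq_len(data_line)], con)   # keep [Header] ... [Data]
  write.csv(report, con, row.names = FALSE, quote = FALSE)
}
```

Reading with colClasses = "character" keeps values such as 0.530 and 1.0000 unchanged on output, and stops a column of bare T/F alleles from being converted to logicals.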

In both cases we got the same error:

xx <- read.SnpSetIllumina(samplesheet="sample_sheet.csv", reportfile = "report_file.csv")
Error in read.SnpSetIllumina(samplesheet = "sample_sheet.csv", reportfile = "report_file.csv") :
  Columns:SNP Name, Sample ID, GC Score, GT Score, X Raw, Y Raw, Chr, Position are missing in the report file

We also tried removing the columns that are not mentioned in the documentation
(for example X, Y and Log R Ratio); the result was the same.
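To see which of the columns named in the error are actually present in a report's column header, a few lines of base R are enough. This is only a sketch: missing_report_columns is a hypothetical name, it assumes the column header is the first line after [Data], and it does not reproduce how read.SnpSetIllumina itself parses the file.

```r
# Hypothetical helper: returns the required columns (taken from the error
# message above) that are absent from the report's column header line.
missing_report_columns <- function(path,
                                   required = c("SNP Name", "Sample ID",
                                                "GC Score", "GT Score",
                                                "X Raw", "Y Raw",
                                                "Chr", "Position"),
                                   sep = ",") {
  head_lines <- readLines(path, n = 50)            # [Header] block is short
  data_line <- grep("^\\[Data\\]", head_lines)[1]
  if (is.na(data_line)) stop("no [Data] line found in ", path)
  fields <- strsplit(head_lines[data_line + 1], sep, fixed = TRUE)[[1]]
  fields <- gsub("^\\s+|\\s+$", "", fields)        # trim stray whitespace
  setdiff(required, fields)
}
```

Called as missing_report_columns("report_file.csv"), it returns GT Score, Chr and Position for the original report and character(0) for the modified one. With sep = "\t" on a comma-separated file the header line is not split at all, so all eight columns come back as missing, the same list as in the error message.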


Many, many thanks in advance!


Sincerely,


Boris Umylny


