[BioC] snpStats, read.long, alleles in two columns

David Clayton dc208 at cam.ac.uk
Wed Mar 7 17:17:45 CET 2012


What _should_ work is

..., fields=c(snp=1, sample=2, genotype=NA, allele.A=3, allele.B=4), ...

but I have to agree that the documentation is distinctly lacking.

Let me know if this doesn't work.
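
If it doesn't, one possible fallback is sketched below. It is an assumption based on the error message ("expecting a 2-character genotype field"), not something the read.long documentation describes: paste the two allele columns into a single two-character genotype column and write the result to a new file. The helper name merge_alleles is hypothetical, and the sketch assumes a tab-delimited file with no header line and the four columns in the order snp, sample, allele 1, allele 2.

```r
# Hypothetical helper (not part of snpStats): collapse the two allele columns
# of a long-format file into one two-character genotype column, so that
# "T" and "C" become "TC".
merge_alleles <- function(infile, outfile) {
  # Read everything as character; otherwise R would turn allele "T" into TRUE.
  long <- read.table(infile, sep = "\t", header = FALSE,
                     colClasses = "character", quote = "",
                     comment.char = "",
                     col.names = c("snp", "sample", "allele1", "allele2"))
  long$genotype <- paste0(long$allele1, long$allele2)
  # Write snp, sample, genotype as three tab-separated columns, no header.
  write.table(long[, c("snp", "sample", "genotype")], outfile,
              sep = "\t", quote = FALSE, row.names = FALSE,
              col.names = FALSE)
  invisible(outfile)
}

# Small self-check on three of the sample lines from the original message.
infile <- tempfile(fileext = ".txt")
writeLines(c("BICF2G630100019\t04-0677/J279\tC\tC",
             "BICF2G630100063\t04-0677/J279\tT\tC",
             "BICF2G630100075\t04-0677/J279\tT\tT"), infile)
outfile <- merge_alleles(infile, tempfile(fileext = ".txt"))
merged <- readLines(outfile)
```

The merged file then has the three-column layout that fields=c(snp=1, sample=2, genotype=3) describes. Whether read.long accepts it without further arguments (gcodes, for instance) is untested here.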

David Clayton


On 07/03/12 15:16, Liz Hare wrote:
> Hello,
>
> I am trying to read an Illumina final format .txt file (tab-delimited)
> into snpStats. The file contains 4 columns: snp, sample, allele 1, and
> allele 2. Some sample lines:
>
> BICF2G630100019 04-0677/J279 C C
> BICF2G630100032 04-0677/J279 T T
> BICF2G630100034 04-0677/J279 G G
> BICF2G630100043 04-0677/J279 A A
> BICF2G630100054 04-0677/J279 T T
> BICF2G630100063 04-0677/J279 T C
> BICF2G630100075 04-0677/J279 T T
> BICF2G63010009 04-0677/J279 G G
> BICF2G630100090 04-0677/J279 C C
>
> I can't figure out from the documentation or vignette on data input how
> to specify that the alleles are in two columns.
>
> This doesn't work:
>
>  > CanineHD <- read.long(file="filename",
> + fields=c(snp=1, sample=2, genotype=3, genotype=4),
> + verbose=TRUE)
> Data to be read from the file filename
> No confidence thresholds specified
> Genotype read as a single field of two characters (which specify the
> alleles)
> Initial scan of file
> First sample: 04-0677/J279
> First snp: BICF2G630100019
> Last snp: YNp1-608
> Last sample: 10-1160
> 96x173662 matrix to be read
> Reading genotypes from file
> 20% 40% 60% 80% 100%
> .........|.........|.........|.........|.........|
> -Error in read.long(file = "filename", :
> at line 1: C (expecting a 2-character genotype field)
> In addition: Warning message:
> closing unused connection 3 (filename)
>
> So I tried:
>
>  > CanineHD <- read.long(file="filename",
> + fields=c(snp=1, sample=2, genotype=3),
> + gcodes="\t", codes="nucleotide", verbose=TRUE)
> Error in read.long(file = "filename", :
> unused argument(s) (codes = "nucleotide")
>  > CanineHD <- read.long(file="filename",
> + fields=c(snp=1, sample=2, genotype=3),
> + split="\t", verbose=TRUE)
> Data to be read from the file filename
> No confidence thresholds specified
> Genotype read as a single field of two characters (which specify the
> alleles)
> Initial scan of file
> First sample: 04-0677/J279
> First snp: BICF2G630100019
> Last snp: YNp1-608
> Last sample: 10-1160
> 96x173662 matrix to be read
> Reading genotypes from file
> 20% 40% 60% 80% 100%
> .........|.........|.........|.........|.........|
> -Error in read.long(file = "filename", :
> at line 1: C (expecting a 2-character genotype field)
> In addition: Warning message:
> closing unused connection 12 (filename)
>
> Is there a keyword for alleles rather than genotypes? I tried
> substituting the word 'allele' but didn't get anywhere. I suspect I'm
> not understanding something in the Details section of the documentation.
>
> Thanks,
> Liz
>

-- 
Professor David Clayton
Wellcome Trust/Juvenile Diabetes Research Foundation Principal Research Fellow

Diabetes and Inflammation Laboratory
Cambridge University, Department of Medical Genetics
Cambridge Institute for Medical Research
Wellcome Trust/MRC Building
Addenbrooke's Hospital
Hills Road
Cambridge
CB2 0XY

Tel: (44) 1223 762669
Email: david.clayton at cimr.cam.ac.uk


