[BioC] snpStats, read.long, alleles in two columns

Liz Hare doggene at earthlink.net
Wed Mar 7 16:16:25 CET 2012


Hello,

I am trying to read an Illumina final format .txt file (tab-delimited) 
into snpStats. The file contains 4 columns: snp, sample, allele 1, and 
allele 2. Some sample lines:

BICF2G630100019	04-0677/J279	C	C
BICF2G630100032	04-0677/J279	T	T
BICF2G630100034	04-0677/J279	G	G
BICF2G630100043	04-0677/J279	A	A
BICF2G630100054	04-0677/J279	T	T
BICF2G630100063	04-0677/J279	T	C
BICF2G630100075	04-0677/J279	T	T
BICF2G63010009	04-0677/J279	G	G
BICF2G630100090	04-0677/J279	C	C

I can't figure out from the documentation or vignette on data input how 
to specify that the alleles are in two columns.

This doesn't work:

 > CanineHD <- read.long(file="filename",
+                       fields=c(snp=1, sample=2, genotype=3, genotype=4),
+                       verbose=TRUE)
Data to be read from the file filename
No confidence thresholds specified
Genotype read as a single field of two characters (which specify the alleles)
Initial scan of file
First sample: 04-0677/J279
First snp: BICF2G630100019
Last snp: YNp1-608
Last sample: 10-1160
96x173662 matrix to be read
Reading genotypes from file
         20%       40%       60%       80%       100%
.........|.........|.........|.........|.........|
-Error in read.long(file = "filename",  :
   at line 1: C (expecting a 2-character genotype field)
In addition: Warning message:
closing unused connection 3 (filename)

So I tried:

 > CanineHD <- read.long(file="filename",
+                       fields=c(snp=1, sample=2, genotype=3),
+                       gcodes="\t", codes="nucleotide", verbose=TRUE)
Error in read.long(file = "filename",  :
   unused argument(s) (codes = "nucleotide")
 > CanineHD <- read.long(file="filename",
+                       fields=c(snp=1, sample=2, genotype=3),
+                       split="\t", verbose=TRUE)
Data to be read from the file filename
No confidence thresholds specified
Genotype read as a single field of two characters (which specify the alleles)
Initial scan of file
First sample: 04-0677/J279
First snp: BICF2G630100019
Last snp: YNp1-608
Last sample: 10-1160
96x173662 matrix to be read
Reading genotypes from file
         20%       40%       60%       80%       100%
.........|.........|.........|.........|.........|
-Error in read.long(file = "filename",  :
   at line 1: C (expecting a 2-character genotype field)
In addition: Warning message:
closing unused connection 12 (filename)

Is there a keyword for alleles rather than genotypes? I tried 
substituting the word 'allele' but didn't get anywhere. I suspect I'm 
not understanding something in the Details section of the documentation.
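
If read.long really does need a single two-character genotype field (which
is what the error message suggests), one fallback would be to paste the two
allele columns together before calling it. Below is a minimal sketch in
base R, not something taken from the snpStats documentation. It assumes
the file has no header line and exactly four tab-delimited columns; the
column names and temporary files are made up for illustration, and the
final read.long call is left as a comment because I have not confirmed it
works on the rewritten file.

```r
# Tiny stand-in for the real file, built from a few of the lines quoted above.
infile <- tempfile(fileext = ".txt")
writeLines(c("BICF2G630100019\t04-0677/J279\tC\tC",
             "BICF2G630100032\t04-0677/J279\tT\tT",
             "BICF2G630100063\t04-0677/J279\tT\tC"), infile)

# Read every column as character; otherwise a column containing only "T"
# alleles would be converted to logical TRUE.
long <- read.delim(infile, header = FALSE, colClasses = "character",
                   quote = "", col.names = c("snp", "sample", "a1", "a2"))

# Paste the two allele columns into one two-character genotype field.
long$genotype <- paste0(long$a1, long$a2)

# Write a three-column, tab-delimited file with no header and no quoting.
outfile <- tempfile(fileext = ".txt")
write.table(long[, c("snp", "sample", "genotype")], outfile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# The rewritten file could then be tried with the same call as above
# (untested assumption):
# CanineHD <- read.long(file = outfile,
#                       fields = c(snp = 1, sample = 2, genotype = 3),
#                       verbose = TRUE)
```

I would still prefer to read the original file directly if there is a way
to do so.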

Thanks,
Liz

-- 
Liz Hare PhD
Dog Genetics LLC
doggene at earthlink.net
http://www.doggenetics.com


