[BioC] Problems Reading SNPs into snpMatrix
Chris K. Fuller
chris at genome.ucsf.edu
Wed May 11 20:38:37 CEST 2011
Hello List,
I'm at a loss to explain why the read.snps.long function in snpStats
refuses to properly read my SNPs. I've attempted this several
different ways without success. I have 195 samples (individuals) and
440,000 SNPs, arranged in the "one call per line" format. I'm running
this on Linux Ubuntu, and all R packages are up to date. I've
reproduced these cases below.
Example 1: Reading SNPs with Nucleotide Genotype
With a file like:
1110043 rs2847443 TT
1110059 rs2847443 TT
1110060 rs2847443 TT
1110063 rs2847443 TA
1110066 rs2847443 TT
1110067 rs2847443 TT
1110070 rs2847443 TT
I use the command:
foo = read.snps.long('one_call_per_line.txt',
    sample.id = as.character(sample_id[[1]]),
    snp.id = as.character(snp_id_mod[[1]]),
    fields = c(sample = 1, snp = 2, genotype = 3),
    codes = "nucleotide", sep = '\t', verbose = TRUE, in.order = TRUE)
And receive the result:
Error in read.snps.long("one_call_per_line.txt", sample.id = as.character(sample_id[[1]]), :
  Nucleotide coded genotype should be 2 character string: line 1
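One way to see what the parser is being handed in Example 1 is to look at the raw bytes of the first few lines and measure the width of the last field after splitting on the separator. The sketch below uses base R only and assumes nothing about snpStats internals. The helper name inspect_fields and the temporary demo file are hypothetical stand-ins for 'one_call_per_line.txt'. The demo file deliberately contains one space-separated line and one line ending in a carriage return, to show how each would appear.

```r
# Hypothetical helper: report field count and last-field width per line.
# Reads raw bytes so that a stray "\r" is kept rather than silently dropped
# (readLines() accepts CRLF as a line ending and would hide it).
inspect_fields <- function(path, sep = "\t", n_bytes = 4096) {
  raw_txt <- rawToChar(readBin(path, what = "raw", n = n_bytes))
  lines <- strsplit(raw_txt, "\n", fixed = TRUE)[[1]]
  fields <- strsplit(lines, sep, fixed = TRUE)
  # Note: if n_bytes cuts the file mid-line, the last row is a partial line.
  data.frame(
    n_fields   = vapply(fields, length, integer(1)),
    last_field = vapply(fields, function(f) f[length(f)], character(1)),
    last_width = vapply(fields, function(f) nchar(f[length(f)]), integer(1)),
    stringsAsFactors = FALSE
  )
}

# Demo data (assumed, not the real file): line 1 is clean, line 2 uses
# spaces instead of tabs, line 3 ends with a Windows carriage return.
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw(paste0(
  "1110043\trs2847443\tTT\n",
  "1110059 rs2847443 TT\n",
  "1110060\trs2847443\tTA\r\n"
)), tmp)

res <- inspect_fields(tmp)
print(res)
```

On the real file, any row where n_fields is not 3 or last_width is not 2 would point at a separator or line-ending mismatch rather than at the genotype values themselves.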
Example 2: Perform my own SNP genotype coding and try again
With a file like:
1110043 rs2847443 3
1110059 rs2847443 3
1110060 rs2847443 3
1110063 rs2847443 2
1110066 rs2847443 3
1110067 rs2847443 3
1110070 rs2847443 3
I use this command:
foo = read.snps.long('one_call_per_line_coded.txt',
    sample.id = as.character(sample_id[[1]]),
    snp.id = as.character(snp_id_mod[[1]]),
    fields = c(sample = 1, snp = 2, genotype = 3),
    codes = c("1", "2", "3"), sep = '\t', verbose = TRUE, in.order = TRUE)
And receive this result:
Reading SnpMatrix with 195 rows and 443136 columns
Cumulative totals
-----------------------------------
File Line Accepted Rejected No call Skipped File name
1 9142000 0 0 9141999 0 0 ...er_line_coded.txt
In the first example, it apparently doesn't treat the third column of
my tab-delimited text file as two characters. Why? I also generated
a file with quotes around each nucleotide (e.g. 'AA'), but this did
not change the behavior. In the second example, it seems to treat
everything as no call. Why?
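For the second example, a base-R cross-check can at least rule out mismatches between the file and the arguments: which genotype strings actually occur, and whether every sample and SNP identifier in the file appears in the vectors passed as sample.id and snp.id. This is only a sketch under assumptions. The temporary file and the sample_ids and snp_ids vectors below are hypothetical stand-ins for 'one_call_per_line_coded.txt', sample_id[[1]] and snp_id_mod[[1]], and the check does not by itself explain why read.snps.long reports "No call".

```r
# Hypothetical stand-ins for the coded file and the two ID vectors.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1110043\trs2847443\t3",
             "1110059\trs2847443\t3",
             "1110063\trs2847443\t2"), tmp)
sample_ids <- c("1110043", "1110059", "1110063")
snp_ids <- c("rs2847443")

# Read only the first rows, everything as character, with no quote or
# comment processing, so the strings are compared exactly as stored.
calls <- read.table(tmp, sep = "\t", header = FALSE,
                    col.names = c("sample", "snp", "genotype"),
                    colClasses = "character", nrows = 1000,
                    quote = "", comment.char = "")

# Which genotype strings occur, and how often?
code_counts <- table(calls$genotype, useNA = "ifany")
print(code_counts)

# Genotype strings not covered by the intended codes.
unexpected <- setdiff(unique(calls$genotype), c("1", "2", "3"))

# Identifiers present in the file but absent from the ID vectors.
missing_samples <- setdiff(unique(calls$sample), sample_ids)
missing_snps <- setdiff(unique(calls$snp), snp_ids)

print(list(unexpected = unexpected,
           missing_samples = missing_samples,
           missing_snps = missing_snps))
```

If all three of unexpected, missing_samples and missing_snps come back empty on the real data, the values and identifiers match the arguments, and the remaining suspects are the ordering of the file relative to the ID vectors and the separator.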
I've been at this for far longer than I care to admit, so any insight
would be greatly appreciated.
Thank you,
Chris Fuller
chris at genome.ucsf.edu