[BioC] RE : RE : RE : maping SNPs
Simon Noël
simon.noel.2 at ulaval.ca
Tue Jan 11 22:18:38 CET 2011
Hi,
You said that one of the warnings was because I only had SNPs from
chr1. I decided to add the first 2 SNPs I have from chr2, but I get a new
error...
> myids <- c("rs7547453", "rs2840542", "rs1999527", "rs4648545",
"rs10915459", "rs16838750", "rs12128230", "rs4637157", "rs11900053",
"rs999999999")
> mysnps <- makeGRangesFromRefSNPids(myids)
Error in solveUserSEW0(start = start, end = end, width = width) :
solving row 8: range cannot be determined from the supplied arguments
(too many NAs)
In addition: Warning messages:
1: In ans_locs[!is.na(myrows)] <- locs$loc[myrows] :
number of items to replace is not a multiple of replacement length
2: In ans_locs[!is.na(myrows)] <- locs$loc[myrows] :
number of items to replace is not a multiple of replacement length
But if I try with only the first 2 SNPs on chr2, I get:
> myids <- c("rs4637157", "rs11900053", "rs999999999")
> mysnps <- makeGRangesFromRefSNPids(myids)
Warning message:
In ans_locs[!is.na(myrows)] <- locs$loc[myrows] :
number of items to replace is not a multiple of replacement length
> mysnps
GRanges with 3 ranges and 1 elementMetadata value
    seqnames         ranges strand |   RefSNP_id
       <Rle>      <IRanges>  <Rle> | <character>
[1]      ch2 [29443, 29443]      * |   rs4637157
[2]      ch2 [36787, 36787]      * |  rs11900053
[3]  unknown [    0,     0]      * | rs999999999

seqlengths
     ch2 unknown
      NA      NA
So does that mean I will have to go chromosome by chromosome and split my big file?
Now for the problem of changing ch1 to chr1:
> seqnames(mysnps)
'factor' Rle of length 8 with 2 runs
Lengths: 7 1
Values : ch1 unknown
Levels(2): ch1 unknown
> seqnames(mysnps) <- sub("ch", "chr", seqnames(mysnps))
> seqnames(mysnps)
'factor' Rle of length 8 with 2 runs
Lengths: 7 1
Values : chr1 unknown
Levels(2): chr1 unknown
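
A side note on the renaming step: an unanchored sub("ch", "chr", ...) turns a name that is already "chr1" into "chrr1" if it is ever run twice. Below is a minimal sketch in base R of a safer pattern. It assumes a plain factor as a stand-in for seqnames(mysnps) (not a real Rle), and add_chr_prefix is a hypothetical helper name, not part of any package.

```r
# Hypothetical stand-in for seqnames(mysnps): a plain factor, not an Rle
seqs <- factor(c(rep("ch1", 7), "unknown"))

# Rewrite only a leading "ch" that is not already followed by "r"
add_chr_prefix <- function(x) sub("^ch(?!r)", "chr", x, perl = TRUE)

levels(seqs) <- add_chr_prefix(levels(seqs))
# Running it a second time leaves the levels unchanged
levels(seqs) <- add_chr_prefix(levels(seqs))
print(levels(seqs))
```

Whether the same assignment works directly on the seqnames of a GRanges object depends on the installed GenomicRanges version, so treat this only as an illustration of the regular expression.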
> map <- as.matrix(findOverlaps(mysnps, tx))
Warning message:
In .local(query, subject, maxgap, minoverlap, type, select, ...) :
Only some seqnames from 'query' and 'subject' were not identical
> mapExon <- as.matrix(findOverlaps(mysnps, txExon))
Warning message:
In .local(query, subject, maxgap, minoverlap, type, select, ...) :
Only some seqnames from 'query' and 'subject' were not identical
>
> mapped_genes <- values(tx)$gene_id[map[, 2]]
> mapped_snps <- rep.int(values(mysnps)$RefSNP_id[map[, 1]],
elementLengths(mapped_genes))
> snp2gene <- unique(data.frame(snp_id=mapped_snps,
gene_id=unlist(mapped_genes)))
> rownames(snp2gene) <- NULL
> snp2gene[1:4, ]
     snp_id gene_id
1 rs7547453    6497
2 rs2840542   79906
3 rs1999527   63976
4 rs4648545    7161
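
For anyone following the snp2gene step, here is a small base R sketch of what the rep.int() call does. It assumes made-up SNP and gene IDs and an ordinary list in place of the CharacterList returned by values(tx)$gene_id, and it uses base lengths() where the session above uses elementLengths().

```r
# Hypothetical data: one entry per overlap hit, each holding 0..n gene IDs
mapped_genes <- list("6497", c("79906", "63976"), character(0))
hit_snps <- c("rsA", "rsB", "rsC")

# Repeat each SNP ID once per gene it hit, so both vectors line up
mapped_snps <- rep.int(hit_snps, lengths(mapped_genes))

snp2gene <- unique(data.frame(snp_id = mapped_snps,
                              gene_id = unlist(mapped_genes),
                              stringsAsFactors = FALSE))
rownames(snp2gene) <- NULL
print(snp2gene)
```

A hit with no gene ID (here "rsC") simply drops out, because it is repeated zero times.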
So now it's working on my computer :) but, as I said, I am only able to
process SNPs from one chromosome at a time.
On the supercomputer it still doesn't work, and on my
computer it still takes a lot of time. What isn't working is:
> txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene")
Download the refGene table ... OK
Download the refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... Error in .writeMetadataTable(conn,
metadata) : subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first
50)
Simon Noël
CdeC