[BioC] RE : RE : maping SNPs
Simon Noël
simon.noel.2 at ulaval.ca
Thu Jan 6 18:47:53 CET 2011
Thank you. Sorry for the delay. As I say, on my computer, thing run really
slowy... I have no error but I have some warning and no result...
library("IRanges")
library("GenomicRanges")
library("GenomicFeatures")
#À changer si une version plus récente de la librairie est téléchargée.
library(SNPlocs.Hsapiens.dbSNP.20101109)
> getSNPcount()
ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 c
h10
1849438 1936836 1613418 1613633 1453710 1446827 1335745 1243129 995075
1158707
ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 c
h20
1147722 1105364 815729 740129 657719 757926 641905 645646 520666
586708
ch21 ch22 chX chY chMT
338254 331060 529608 67438 624
> ch22snps <- getSNPlocs("ch22")
> ch22snps[1:5, ]
RefSNP_id alleles_as_ambig loc
1 56342815 K 16050353
2 7288968 S 16050994
3 6518357 M 16051107
4 7292503 R 16051209
5 6518368 Y 16051241
>
>
> makeGRangesFromRefSNPids <- function(myids)
+ {
+ ans_seqnames <- character(length(myids))
+ ans_seqnames[] <- "unknown"
+ ans_locs <- integer(length(myids))
+ for (seqname in names(getSNPcount())) {
+ locs <- getSNPlocs(seqname)
+ ids <- paste("rs", locs$RefSNP_id, sep="")
+ myrows <- match(myids, ids)
+ ans_seqnames[!is.na(myrows)] <- seqname
+ ans_locs[!is.na(myrows)] <- locs$loc[myrows]
+ }
+ GRanges(seqnames=ans_seqnames,
+ IRanges(start=ans_locs, width=1),
+ RefSNP_id=myids)
+ }
>
>
> myids <- c("rs7547453", "rs2840542", "rs1999527", "rs4648545",
"rs10915459", "rs16838750", "rs12128230", "rs999999999")
> mysnps <- makeGRangesFromRefSNPids(myids)
Warning message:
In ans_locs[!is.na(myrows)] <- locs$loc[myrows] :
number of items to replace is not a multiple of replacement length
> mysnps # a GRanges object with 1 SNP per row
GRangeswith 8 ranges and 1 elementMetadata value
seqnames ranges strand | RefSNP_id
<Rle> <IRanges> <Rle> | <character>
[1] ch1 [2195117, 2195117] * | rs7547453
[2] ch1 [2291680, 2291680] * | rs2840542
[3] ch1 [3256108, 3256108] * | rs1999527
[4] ch1 [3577321, 3577321] * | rs4648545
[5] ch1 [4230463, 4230463] * | rs10915459
[6] ch1 [4404344, 4404344] * | rs16838750
[7] ch1 [4501911, 4501911] * | rs12128230
[8] unknown [ 0, 0] * | rs999999999
seqlengths
ch1 unknown
NA NA
> txdb <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene")
Downloadthe refGene table ... OK
Downloadthe refLink table ... OK
Extractthe 'transcripts' data frame ... OK
Extractthe 'splicings' data frame ... OK
Downloadand preprocess the 'chrominfo' data frame ... OK
Preparethe 'metadata' data frame ... OK
Makethe TranscriptDb object ... OK
Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50 premiers)
> txdb
TranscriptDbobject:
| Db type: TranscriptDb
| Data source: UCSC
| Genome: hg19
| UCSC Table: refGene
| Type of Gene ID: Entrez Gene ID
| Full dataset: yes
| transcript_nrow: 38098
| exon_nrow: 230201
| cds_nrow: 204683
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2011-01-06 11:55:44 -0500 (Thu, 06 Jan 2011)
| GenomicFeatures version at creation time: 1.2.3
| RSQLite version at creation time: 0.9-4
| DBSCHEMAVERSION: 1.0
>
> #voir pour exonsBy() Ã la place de tx
> txExon = exonsBy(txdb, by= "tx")
> txExon
GRangesListof length 38098
$1
GRanges with 25 ranges and 3 elementMetadata values
seqnames ranges strand
| exon_id exon_name exon_rank
<Rle> <IRanges> <Rle> | <integer> <character>
<integer>
[1] chr1 [66999825, 67000051] + | 1 NA
1
[2] chr1 [67091530, 67091593] + | 2 NA
2
[3] chr1 [67098753, 67098777] + | 3 NA
3
[4] chr1 [67101627, 67101698] + | 4 NA
4
[5] chr1 [67105460, 67105516] + | 5 NA
5
[6] chr1 [67108493, 67108547] + | 6 NA
6
[7] chr1 [67109227, 67109402] + | 7 NA
7
[8] chr1 [67126196, 67126207] + | 8 NA
8
[9] chr1 [67133213, 67133224] + | 9 NA
9
... ... ... ... ... ... ...
...
[17] chr1 [67155873, 67155999] + | 17 NA
17
[18] chr1 [67161117, 67161176] + | 18 NA
18
[19] chr1 [67184977, 67185088] + | 19 NA
19
[20] chr1 [67194947, 67195102] + | 20 NA
20
[21] chr1 [67199431, 67199563] + | 21 NA
21
[22] chr1 [67205018, 67205220] + | 22 NA
22
[23] chr1 [67206341, 67206405] + | 23 NA
23
[24] chr1 [67206955, 67207119] + | 24 NA
24
[25] chr1 [67208756, 67210768] + | 25 NA
25
...
<38097 more elements>
seqlengths
chr1 chr2 ... chr18_gl000207_random
249250621 243199373 ... 4262
>
> tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id"))
> tx # a GRanges object with 1 transcript per row
GRangeswith 38098 ranges and 3 elementMetadata values
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr1 [ 69091, 70008] + | 1021 NM_001005484
[2] chr1 [323892, 328581] + | 1023 NR_028327
[3] chr1 [323892, 328581] + | 1024 NR_028322
[4] chr1 [323892, 328581] + | 1025 NR_028325
[5] chr1 [367659, 368597] + | 1022 NM_001005277
[6] chr1 [367659, 368597] + | 1026 NM_001005221
[7] chr1 [367659, 368597] + | 1027 NM_001005224
[8] chr1 [763064, 789740] + | 174 NR_015368
[9] chr1 [861121, 879961] + | 1035 NM_152486
... ... ... ... ... ... ...
[38090] chrY [27177050, 27198251] - | 18991 NM_004678
[38091] chrY [27177050, 27198251] - | 18992 NM_001002760
[38092] chrY [27177050, 27198251] - | 18993 NM_001002761
[38093] chrY [27209230, 27246039] - | 18994 NR_001525
[38094] chrY [27209230, 27246039] - | 18995 NR_002177
[38095] chrY [27209230, 27246039] - | 18996 NR_002178
[38096] chrY [27329790, 27330920] - | 18997 NR_001526
[38097] chrY [27329790, 27330920] - | 18998 NR_002179
[38098] chrY [27329790, 27330920] - | 18999 NR_002180
gene_id
<CompressedCharacterList>
[1] 79501
[2] 100133331
[3] 100132287
[4] 100132062
[5] 81399
[6] 729759
[7] 26683
[8] 643837
[9] 148398
... ...
[38090] 9083
[38091] 442867
[38092] 442868
[38093] 114761
[38094] 474150
[38095] 474149
[38096] 252949
[38097] 474152
[38098] 474151
seqlengths
chr1 chr2 ... chr18_gl000207_random
249250621 243199373 ... 4262
>
> seqnames(mysnps) <- sub("ch", "chr", seqnames(mysnps))
> map <- as.matrix(findOverlaps(mysnps, tx))
Message d'avis :
In .local(query, subject, maxgap, minoverlap, type, select, ...) :
No seqnames from the 'query' and 'subject' were identical
> mapExon <- as.matrix(findOverlaps(mysnps, txExon))
Message d'avis :
In .local(query, subject, maxgap, minoverlap, type, select, ...) :
No seqnames from the 'query' and 'subject' were identical
>
> mapped_genes <- values(tx)$gene_id[map[, 2]]
> mapped_snps <- rep.int(values(mysnps)$RefSNP_id[map[, 1]],
elementLengths(mapped_genes))
> snp2gene <- unique(data.frame(snp_id=mapped_snps,
gene_id=unlist(mapped_genes)))
> rownames(snp2gene) <- NULL
> snp2gene[1:4, ]
snp_id gene_id
NA <NA> <NA>
NA.1 <NA> <NA>
NA.2 <NA> <NA>
NA.3 <NA> <NA>
On our super computer, I can't update manualy the package. They are suposed
to update automatically once per month according to our
informactician. The result of sessionInfo() is :
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attachedbase packages:
[1] stats graphics grDevices utils datasets methods base
otherattached packages:
[1] GenomicFeatures_1.2.3
[2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2
[3] GenomicRanges_1.2.2
[4] IRanges_1.8.7
loadedvia a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.
2
[5] DBI_0.2-5 RCurl_1.5-0
RSQLite_0.9-4 rtracklayer_1.10.6
[9] tools_2.12.0 XML_3.2-0
On my computer , something stragge happen. The package get unstalled when
I run
source([1]"http://bioconductor.org/biocLite.R")
update.packages(repos=biocinstallRepos(), ask=FALSE, checkBuilt=TRUE)
And when I try to reinstall SNPlocs.Hsapiens.dbSNP.20101109, I now get
an error message...
> source("[2]http://www.bioconductor.org/biocLite.R")
BioC_mirror= [3]http://www.bioconductor.org
Change using chooseBioCmirror().
> biocLite("SNPlocs.Hsapiens.dbSNP.20101109")
Using R version 2.12.0, biocinstall version 2.7.4.
InstallingBioconductor version 2.7 packages:
[1] "SNPlocs.Hsapiens.dbSNP.20101109"
Pleasewait...
Message d'avis :
In getDependencies(pkgs, dependencies, available, lib) :
package ‘SNPlocs.Hsapiens.dbSNP.20101109’ is not available
But for sessionInfo, it's :
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=French_Canada.1252 LC_CTYPE=French_Canada.1252
[3] LC_MONETARY=French_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=French_Canada.1252
attachedbase packages:
[1] stats graphics grDevices utils datasets methods base
otherattached packages:
[1] GenomicFeatures_1.2.3 GenomicRanges_1.2.2 IRanges_1.8.8
loadedvia a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2 BSgenome_1.18.2
[5] DBI_0.2-5 RCurl_1.5-0.1 RSQLite_0.9-4
rtracklayer_1.10.6
[9] XML_3.2-0.2
Now I have a super computer that can't run the program because of
a subscript out of bounds and a personal computer than have a missing
library... So now nothing work:( Other problem : why when
I install again IRanges I got a version older than your for exemple?
Simon Noël
CdeC
References
1. http://bioconductor.org/biocLite.R
2. http://www.bioconductor.org/biocLite.R
3. http://www.bioconductor.org/
More information about the Bioconductor
mailing list