[BioC] About the bioconductor package SNPlocs.Hsapiens.dbSNP.20071016.

Hervé Pagès hpages at fhcrc.org
Thu Feb 12 01:17:07 CET 2009


Hi Praveen,

Praveen Surendran wrote:
> Dear Herve Pages,
>  
> I am working on the identification of non-synonymous snp's in humans 
> from an Affymetrix Data Source.
> Currently I am using a version of bioconductor package which will fetch 
> the information on these snps with the variation and provides 
> the information on whether a snp is non-synonymous or not.
> But I just found that the database does not have enough information on 
> all the non-synonymous snp's and would like to use this 
> bioconductor package.
>  
> Please have your comments on whether I will be able to use the package 
> to get information on whether the snp is non-synonymous from dbsnp using 
> this package.

Please post to the Bioconductor mailing list (I'm cc'ing it right now).
You'll benefit from a wider audience and the answers you will get will
be archived so other people can find them and refer to them in the future.

If I understand correctly you want to be able to determine whether
the SNPs stored in SNPlocs.Hsapiens.dbSNP.20071016 are synonymous or not.
Note that you give very little information about which BioC package you
are currently using to fetch the information for the Human SNPs, where the
package is fetching them from and why it "does not have enough information".

SNPlocs.Hsapiens.dbSNP.20071016 only provides the location and
alleles of each SNP (see this recent thread for the details:
https://stat.ethz.ch/pipermail/bioconductor/2009-February/026231.html),
so it's unlikely that you will get more information by using this
package than by "fetching SNPs" directly from a public database like
dbSNP.
The information in SNPlocs.Hsapiens.dbSNP.20071016 was retrieved
from dbSNP, from this location to be precise:

   ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/

(note that the content of this folder has been updated since
SNPlocs.Hsapiens.dbSNP.20071016 was made).

My understanding is that in order to determine whether a SNP is synonymous
or not you need to know the context of the SNP, i.e. does it occur in a gene?
If yes, which strand does the gene belong to? Does it occur in a codon,
and where in the codon, i.e. at position 1, 2 or 3? Also a SNP can have more
than 1 alternate allele: some of them can be synonymous to the reference
allele, others not.
For example the SNP with RefSNP id 6474828 (in chr9) has alleles C, G and T:

   > library(Biostrings)
   > library(SNPlocs.Hsapiens.dbSNP.20071016)
   > chr9snps <- getSNPlocs("chr9")
   > subset(chr9snps, RefSNP_id=="6474828")
         RefSNP_id alleles_as_ambig      loc
   61589   6474828                B 14279138
   > IUPAC_CODE_MAP[["B"]]
   [1] "CGT"

The reference allele (T) can be determined by looking at the reference genome:

   > library(BSgenome.Hsapiens.UCSC.hg18)
   > dna <- subseq(unmasked(Hsapiens$chr9), 14279138-2, 14279138+2)
   > dna
     5-letter "DNAString" instance
   seq: TATAC
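
With the ambiguity code decoded and the reference base read from the genome,
the alleles can be split into reference and alternates. A minimal base R
sketch, with the values from the two outputs above typed in by hand (the
variable names are only illustrative):

```r
# Values copied by hand from the outputs above
alleles_str <- "CGT"    # IUPAC_CODE_MAP[["B"]]
ref_window  <- "TATAC"  # plus-strand bases at 14279138-2 .. 14279138+2

alleles    <- strsplit(alleles_str, "")[[1]]
ref_allele <- substr(ref_window, 3, 3)   # middle letter = the SNP position

# The reference base should be one of the reported alleles
stopifnot(ref_allele %in% alleles)
alt_alleles <- setdiff(alleles, ref_allele)

ref_allele   # "T"
alt_alleles  # "C" "G"
```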

The UCSC genome browser will confirm this and will also show that this
chromosome location is inside a gene (NFIB) that belongs to the minus
strand. So the coding DNA is:

   > codingdna <- reverseComplement(dna)
   > codingdna
     5-letter "DNAString" instance
   seq: GTATA

Note that the letters in this short sequence are on the minus strand of chr9
and now correspond to positions 14279138+2 down to 14279138-2, in this order.
The SNP is still at position 14279138 (the middle letter, A), and the set of
alleles originally reported for the plus strand (C, G, T) now becomes G, C
and A.
I don't know if the SNP belongs to a codon (this would need to be checked)
but if it does, I would also need to know its position in the codon.
If it's at position 1:

   GTATA
     123

   > GENETIC_CODE[c("ATA", "CTA", "GTA")]
   ATA CTA GTA
   "I" "L" "V"

so no alternate allele is synonymous to the reference allele for this SNP.
If it's at position 2:

   GTATA
    123

   > GENETIC_CODE[c("TAT", "TCT", "TGT")]
   TAT TCT TGT
   "Y" "S" "C"

same conclusion.
But if it's at position 3:

   GTATA
   123

   > GENETIC_CODE[c("GTA", "GTC", "GTG")]
   GTA GTC GTG
   "V" "V" "V"

then all alleles are synonymous.
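
The three cases above can be folded into one helper. This is only a sketch in
base R: classify_alleles() and its arguments are names made up for
illustration, not part of Biostrings or of the SNPlocs package, and the codon
table is built by hand from the standard genetic code (GENETIC_CODE from
Biostrings could be used instead). It assumes the window is already the
coding strand and that the whole codon fits inside it.

```r
# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
genetic_code <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
names(genetic_code) <- paste0(grid$first, grid$second, grid$third)

# Hypothetical helper. coding_window: coding-strand sequence; snp_index:
# index of the SNP within it; codon_pos: assumed position of the SNP in its
# codon (1, 2 or 3); alleles: coding-strand alleles, reference allele first.
classify_alleles <- function(coding_window, snp_index, codon_pos, alleles) {
  start <- snp_index - (codon_pos - 1)
  stopifnot(start >= 1, start + 2 <= nchar(coding_window))
  ref_codon <- substr(coding_window, start, start + 2)
  stopifnot(substr(ref_codon, codon_pos, codon_pos) == alleles[1])
  # Substitute each allele into the reference codon
  codons <- vapply(alleles, function(a) {
    codon <- ref_codon
    substr(codon, codon_pos, codon_pos) <- a
    codon
  }, character(1))
  aa <- unname(genetic_code[codons])
  # The first row is the reference allele, so it is trivially "synonymous"
  data.frame(allele = alleles, codon = unname(codons), aa = aa,
             synonymous = aa == aa[1], stringsAsFactors = FALSE)
}

# Plus-strand alleles (reference T first), complemented for the minus strand
coding_alleles <- chartr("ACGT", "TGCA", c("T", "C", "G"))  # "A" "G" "C"

for (pos in 1:3) print(classify_alleles("GTATA", 3, pos, coding_alleles))
```

Which of the three codon positions actually applies (if any) still has to
come from the gene annotation; the helper only automates the lookup once
that is known.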

Hope this helps,

H.

>  
> Appreciate your kind attention on this query.
>  
> Kind Regards,
>  
> Praveen Surendran
> Shields Lab.
> School of Medicine & Medical Science.
> Complex & Adaptive Systems Laboratory (CASL).
> 8 Belfield Office.
> University College Dublin (UCD).
> Dublin 4, Ireland.
> Mob : +353 8793 13071
> Off : +353 171 65334
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


