[BioC] variantAnnotation: alternative GENETIC_CODE, and circular chromosomes?
Taylor, Sean D
sdtaylor at fhcrc.org
Sun Feb 9 01:24:31 CET 2014
Valerie,
I am excited about these changes and can't wait to use them.
For clarification, to use genetic.code, I would pass in an argument like this:
predictCoding(..., genetic.code=mycode)
where 'mycode' is a named vector in the same format as GENETIC_CODE, as used in translate().
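
To make that format concrete, here is a minimal base-R sketch of such a vector for the yeast mitochondrial code, using the codon-to-amino-acid assignments printed by getGeneticCode("SGC2") further down in this thread. The helper names (bases, grid, codons, aas) are only illustrative, and the predictCoding() call is left as a comment because its other arguments depend on your data. With a recent Biostrings, getGeneticCode("SGC2") should give the same mapping directly.

```r
# Build the 64 codons in T, C, A, G order, third base varying fastest,
# which is the order used in the getGeneticCode("SGC2") printout.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)

# Amino acids for the yeast mitochondrial code, one letter per codon,
# transcribed from the table quoted later in this thread.
aas <- "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Named character vector: names are codons, values are amino acids.
mycode <- setNames(strsplit(aas, "")[[1]], codons)

# Hypothetical use, following the call shape above:
# predictCoding(..., genetic.code = mycode)
```

The lookups mycode[["TGA"]] and mycode[["CTT"]] pick out two of the codons that differ from the standard code.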
I haven't been using devel in a while. Any idea when this might make its way into release?
Thanks,
Sean
> -----Original Message-----
> From: Valerie Obenchain [mailto:vobencha at fhcrc.org]
> Sent: Tuesday, February 04, 2014 11:38 AM
> To: Pages, Herve; Young, Janet; bioconductor at r-project.org; Taylor, Sean D
> Subject: Re: [BioC] variantAnnotation: alternative GENETIC_CODE, and
> circular chromosomes?
>
> Hi Janet,
>
> Last week we enabled findOverlaps(..., type='within') to work on circular
> chromosomes. It was this restriction that prevented
> locateVariants() and predictCoding() from handling ChrM.
> VariantAnnotation 1.9.34 in devel has the most recent changes.
>
> The output of locateVariants() includes ChrM because the function is
> reporting where the range falls wrt the gene (coding, utr, intron, etc.).
> predictCoding() however only reports coding variants. If the annotation you
> supply does not have any coding regions for ChrM or if the ranges don't fall in
> the coding regions then none will be reported.
> To confirm your annotation has coding regions for ChrM:
>
> cds <- cdsBy(txdb)
> cds[seqnames(cds) %in% "chrM"] ## or whatever the proper name is
>
> #1:
> I've added support for the 'genetic.code' and 'if.fuzzy.codon' args to
> translate(). To use a different genetic code just pass the named arg
> ('genetic.code') to predictCoding().
>
> #2:
> We've tried to handle the issue of ChrM by making findOverlaps()
> behave appropriately with circular chromosomes. The 'type' argument has
> several options (start, end, any, within, equal), which allows quite a bit of
> flexibility. When the annotation has an ORF that spans the start/end,
> findOverlaps() will still behave appropriately according to 'type'.
>
> ## annotation with seqlength 9
> genes <- GRanges(seqnames=rep.int("A", 4),
>                  IRanges(start=c(2, 4, 6, 8), width=3))
> seqinfo(genes) <- Seqinfo(seqnames="A", seqlengths=9, isCircular=TRUE)
>
> ## both ranges span the start/end
> ranges <- GRanges(seqnames=rep.int("A", 2),
>                   IRanges(9, width=c(2, 4)))
>
> > findOverlaps(ranges, genes, type="any")
> Hits of length 3
> queryLength: 2
> subjectLength: 4
>   queryHits subjectHits
>   <integer>   <integer>
> 1         1           4
> 2         2           1
> 3         2           4
>
> > findOverlaps(ranges, genes, type="within")
> Hits of length 1
> queryLength: 2
> subjectLength: 4
>   queryHits subjectHits
>   <integer>   <integer>
> 1         1           4
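
To see why those hits come out the way they do, here is a rough base-R sketch that expands each range into the positions it covers on a circular sequence of length 9 and compares the position sets. This is only an illustration under that assumption, not how findOverlaps() is implemented, and the names circ_positions, gene_pos, query_pos and hits are made up for the example.

```r
# Positions (1-based) covered by a range on a circular sequence of length n;
# positions past the end wrap around to the start.
circ_positions <- function(start, width, n) {
  ((start - 1 + seq_len(width) - 1) %% n) + 1
}

n <- 9
gene_pos  <- lapply(c(2, 4, 6, 8), circ_positions, width = 3, n = n)
query_pos <- lapply(c(2, 4), function(w) circ_positions(9, w, n))

# Compare every query with every gene; `test` decides what counts as a hit.
hits <- function(test) {
  out <- expand.grid(subject = seq_along(gene_pos),
                     query = seq_along(query_pos))
  keep <- mapply(function(q, s) test(query_pos[[q]], gene_pos[[s]]),
                 out$query, out$subject)
  out[keep, c("query", "subject")]
}

any_hits    <- hits(function(q, s) any(q %in% s))  # like type = "any"
within_hits <- hits(function(q, s) all(q %in% s))  # like type = "within"
```

The fourth gene (start 8, width 3) covers positions 8, 9 and 1, which is why the first query (positions 9 and 1) falls within it while the second query (positions 9, 1, 2, 3) only overlaps it.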
>
> With the combination of findOverlaps() now working on circular
> chromosomes for all values of 'type' and Herve adding the new genetic codes
> to Biostrings there should be no need to ignore ChrM. If you run into trouble
> or if anything looks strange please let us know.
>
> Thanks.
> Valerie
>
>
>
> On 02/04/2014 04:08 AM, Hervé Pagès wrote:
> > Hi Janet,
> >
> > On 02/03/2014 07:47 PM, Janet Young wrote:
> >> Hi there, (I think it'll probably be Valerie looking at this question
> >> - hi Valerie),
> >>
> >> I'm just beginning to look at using VariantAnnotation to annotate
> >> some SNPs I've called on some yeast data (sacCer3). I can see this
> >> will be a really useful package for me - thanks!
> >>
> >> I can see that chrM (mitochondrial) SNPs are currently not included
> >> in the output of predictCoding, and then using locateVariants, all of
> >> the chrM SNPs get annotated as intergenic/NA (with a warning, that we
> >> ignore circular chromosomes). I can understand why that is -
> >> circular chromosomes, and a different genetic code make it trickier.
> >> Fair enough.
> >>
> >> I'm wondering what the prospects are regarding chrM SNPs in the future
> >> - any plans to include those later?
> >>
> >> I'm also wondering whether I can use some hacks to get chrM SNPs
> >> annotated. Two questions/potential issues related to that I wanted to
> >> ask you guys about:
> >>
> >> 1. are alternative codon tables already supported anywhere in
> >> Bioconductor? Using "?GENETIC_CODE" it looks like this is defined
> >> in Biostrings, and it looks like only the standard nuclear code is
> >> defined. Are the various alternative genetic codes defined anywhere?
> >> For this project, I'm interested in the yeast mitochondrial code, and
> >> for another I'm interested in the fly mitochondrial code. It'd be
> >> great if we could have all the codes available (I've got another
> >> project looking at ciliate nuclear sequences, for example - not
> >> working with translations yet, but maybe later...)
> >>
> >> With a little work, I'll be able to save flat files from NCBI
> >> (http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi), and read
> >> those in and transform them to a character vector that looks like
> >> GENETIC_CODE. But I realise it might be something useful to have
> >> encoded more centrally, so thought I'd ask.
> >
> > What a timely question! I'll let Val answer the questions about
> > support of mitochondrial DNA in predictCoding() but I can answer that
> > particular one. Last week I added a bunch of non-standard genetic
> > codes to Biostrings (2.31.12). To get the genetic code for Yeast
> > Mitochondrial, do:
> >
> > > getGeneticCode("SGC2")
> > TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG
> > "F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W"
> > CTT CTC CTA CTG CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG
> > "T" "T" "T" "T" "P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R"
> > ATT ATC ATA ATG ACT ACC ACA ACG AAT AAC AAA AAG AGT AGC AGA AGG
> > "I" "I" "M" "M" "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R"
> > GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG GGT GGC GGA GGG
> > "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E" "G" "G" "G" "G"
> >
> > Its format is the same as for GENETIC_CODE. See ?GENETIC_CODE for the
> > details.
> >
> > I also added the 'genetic.code' arg to translate() so you can supply
> > an alternate genetic code to use for translation. See ?translate for
> > the details.
> >
> > Please let me know if you find any issues, have questions, or want to
> > suggest improvements to these new features.
> >
> > Thanks,
> > H.
> >
> >>
> >> 2. What issues should I think about for the circular chromosomes?
> >> I'm thinking of a slightly hacky solution where I ignore any
> >> annotated ORFs that wrap around from the end of the chromosome to the
> >> beginning, and then just treat it as a linear chromosome.
> >> Actually, in my case (using sacCer3) there are no ORFs spanning the
> >> break in the circular chromosome, so I don't think I'll miss any
> >> annotations. Turns out the same is true for human (hg19 knownGene
> >> annotations), so maybe the circular chromosome issue isn't such a big
> >> issue after all?
> >>
> >> It seems like that should work, but any thoughts from you - you've
> >> thought about these questions a lot more than I have?
> >>
> >> Looking forward to hearing any thoughts you have. I know sometimes
> >> people just ignore the chrM SNPs, but it'd be nice to take a slightly
> >> more comprehensive approach if possible.
> >>
> >> thanks in advance for any input you have,
> >>
> >> Janet
> >>
> >>
> >> -------------------------------------------------------------------
> >>
> >> Dr. Janet Young
> >>
> >> Malik lab
> >> http://research.fhcrc.org/malik/en.html
> >>
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Avenue N., A2-025,
> >> P.O. Box 19024, Seattle, WA 98109-1024, USA.
> >>
> >> tel: (206) 667 4512
> >> email: jayoung ...at... fhcrc.org
> >>
> >> -------------------------------------------------------------------
> >>
> >
>
>
> --
> Valerie Obenchain
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B155
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: vobencha at fhcrc.org
> Phone: (206) 667-3158
> Fax: (206) 667-1319